Clustering Numerical Multidimensional Data
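In this post we will implement bio-inspired optimization for clustering multidimensional data, using particle swarm optimization (PSO) from the inspyred library. We use a two-dimensional data array “data”, but the code can be used for data of any reasonable dimension. To do this, set the parameter num_dimensions to the number of coordinates in each data point. We use 2 clusters, as defined by the parameter num_clusters, which can also be changed to a different number.
We use custom functions for the generator, evaluator and bounder settings. The generator creates random starting centroids, the evaluator sums the squared distance from each data point to its nearest centroid, and the bounder keeps every centroid coordinate between low_b and hi_b.
Below you can find the Python source code.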
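As a minimal sketch of that adjustment, the dimension can be derived from the data instead of being typed in by hand. The three-dimensional points in data_3d are invented purely for illustration and are not part of the original example.

```python
# Hypothetical three-dimensional data set, invented for illustration only.
data_3d = [(3, 3, 1), (2, 2, 2), (8, 8, 9), (7, 7, 8)]

# Derive the dimension from the data instead of hard-coding it.
num_dimensions = len(data_3d[0])
num_clusters = 2

# Every point must have the same number of coordinates.
assert all(len(point) == num_dimensions for point in data_3d)

# A candidate solution holds one centroid per cluster, so it has this shape.
candidate_shape = (num_clusters, num_dimensions)
print(candidate_shape)  # (2, 3)
```

Deriving num_dimensions this way avoids a mismatch between the parameter and the data when the data set is swapped.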
# -*- coding: utf-8 -*-
# Clustering for multidimensional data (including 1 dimensional)
from time import time
from random import Random

import inspyred
import numpy as np

data = [(3, 3), (2, 2), (8, 8), (7, 7)]
num_dimensions = 2   # number of coordinates per data point
num_clusters = 2     # number of centroids to search for
low_b = 1            # lower bound for every centroid coordinate
hi_b = 20            # upper bound for every centroid coordinate


def my_observer(population, num_generations, num_evaluations, args):
    # Print the generation number, the best fitness and the best centroids.
    best = max(population)
    print('{0:6} -- {1} : {2}'.format(num_generations,
                                      best.fitness,
                                      str(best.candidate)))


def generate(random, args):
    # A candidate is a (num_clusters x num_dimensions) matrix of centroids.
    matrix = np.zeros((num_clusters, num_dimensions))
    for i in range(num_clusters):
        matrix[i] = np.array([random.uniform(low_b, hi_b)
                              for j in range(num_dimensions)])
    return matrix


def evaluate(candidates, args):
    # Fitness = sum over all data points of the squared distance
    # to the nearest centroid (lower is better).
    fitness = []
    for cand in candidates:
        fit = 0
        for d in range(len(data)):
            distance = float('inf')
            for c in cand:
                temp = 0
                for z in range(num_dimensions):
                    temp = temp + (data[d][z] - c[z]) ** 2
                if temp < distance:
                    tempc = c
                    distance = temp
            # Show the centroid assigned to data[d]; this runs on every
            # evaluation, so the output is very verbose.
            print(d, tempc)
            fit = fit + distance
        fitness.append(fit)
    return fitness


def bound_function(candidate, args):
    # Clamp every centroid coordinate into [low_b, hi_b].
    for i, c in enumerate(candidate):
        for j in range(num_dimensions):
            candidate[i][j] = max(min(c[j], hi_b), low_b)
    return candidate


def main(prng=None, display=False):
    # The display argument is not used in this example.
    if prng is None:
        prng = Random()
        prng.seed(time())
    ea = inspyred.swarm.PSO(prng)
    ea.observer = my_observer
    ea.terminator = inspyred.ec.terminators.evaluation_termination
    ea.topology = inspyred.swarm.topologies.ring_topology
    final_pop = ea.evolve(generator=generate,
                          evaluator=evaluate,
                          pop_size=12,
                          bounder=bound_function,
                          maximize=False,
                          max_evaluations=25100,
                          neighborhood_size=3)
    return final_pop


if __name__ == '__main__':
    main(display=True)
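The bounder's clamping step can also be written with NumPy. The sketch below uses a hypothetical function name, bound_with_clip, and shows only the clamping itself; it returns a new array instead of modifying the candidate in place, and it has not been run inside inspyred here, so treat it as an illustration of the idea and not as a drop-in replacement.

```python
import numpy as np

low_b = 1
hi_b = 20


def bound_with_clip(candidate, args=None):
    """Clamp every centroid coordinate into [low_b, hi_b] (hypothetical helper)."""
    return np.clip(np.asarray(candidate, dtype=float), low_b, hi_b)


# Two centroids with some coordinates outside the allowed range.
out_of_range = [[-3.0, 25.0], [5.0, 0.5]]
bounded = bound_with_clip(out_of_range)
print(bounded.tolist())  # [[1.0, 20.0], [5.0, 1.0]]
```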
Below is an example of the final output. The numbers 0, 1, 2, 3 are indexes into the data array, so 0 means that we are looking at data[0]. To the right of each index are the coordinates of the centroid nearest to that data point. All indexes that have the same centroid belong to the same cluster. The last line shows the generation number (2091), the fitness value (2.0), which is the sum of squared distances from each data point to its nearest centroid, and the coordinates of the centroids.
0 [ 2.5 2.50000001]
1 [ 2.5 2.50000001]
2 [ 7.49999999 7.5 ]
3 [ 7.49999999 7.5 ]
2091 -- 2.0 : [array([ 7.50000001, 7.5 ]), array([ 2.5 , 2.50000001])]
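The reported fitness can be checked by hand. The short sketch below is not part of the original program: it takes the centroids from the output above, rounded to (7.5, 7.5) and (2.5, 2.5), assigns each data point to its nearest centroid, and sums the squared distances. The names sq_dist, labels and fitness are chosen here for illustration.

```python
import numpy as np

data = np.array([(3, 3), (2, 2), (8, 8), (7, 7)], dtype=float)
# Centroids taken from the final output line, rounded to one decimal.
centroids = np.array([(7.5, 7.5), (2.5, 2.5)])

# Squared distance from every point to every centroid, shape (4, 2).
sq_dist = ((data[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)

# Index of the nearest centroid for each data point.
labels = sq_dist.argmin(axis=1)

# Sum of squared distances to the nearest centroid.
fitness = sq_dist.min(axis=1).sum()

print(labels.tolist())  # [1, 1, 0, 0]
print(fitness)          # 2.0
```

Each point lies 0.5 away from its centroid in both coordinates, so it contributes 0.25 + 0.25 = 0.5, and the four points together give 2.0.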
In the next post we will move from numerical data to text data.
References
1. Bio-Inspired Optimization for Text Mining-1 Motivation
2. Bio-Inspired Optimization for Text Mining-2 Numerical One Dimensional Example