Bio-Inspired Optimization for Text Mining-3

Clustering Numerical Multidimensional Data
In this post we will use bio-inspired optimization, here particle swarm optimization (PSO) from the inspyred library, to cluster multidimensional data. The example uses the two-dimensional data array “data”, but the code works for data points of any reasonable dimensionality: just set the parameter num_dimensions to the number of coordinates in each data point. The number of clusters is 2. It is set by the parameter num_clusters and can also be changed.
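Because a candidate solution is simply a matrix with one row per centroid, moving to other data only means changing the parameters. The sketch below is illustrative and assumes a hypothetical 3-D data set. It also uses a hypothetical helper, make_candidate, that mirrors the post's generate function, and it derives num_dimensions from the data instead of setting it by hand.

```python
import random
import numpy as np

# Hypothetical 3-D data set (not from the post) to show that only the
# parameters change when the dimensionality changes.
data = [(1, 1, 1), (2, 2, 2), (9, 9, 9), (10, 10, 10), (5, 15, 5), (6, 14, 6)]
num_dimensions = len(data[0])  # derived from the data instead of hard-coded
assert all(len(p) == num_dimensions for p in data), "points must share one dimension"
num_clusters = 3
low_b, hi_b = 1, 20


def make_candidate(rng):
    """Hypothetical helper mirroring generate(): one candidate solution is a
    num_clusters x num_dimensions matrix of centroid coordinates drawn
    uniformly from [low_b, hi_b]."""
    return np.array([[rng.uniform(low_b, hi_b) for _ in range(num_dimensions)]
                     for _ in range(num_clusters)])


rng = random.Random(0)
candidate = make_candidate(rng)
print(candidate.shape)  # (3, 3): one row per centroid, one column per dimension
```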

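We supply custom functions for the generator (creates a random initial candidate, i.e. a set of centroids), the evaluator (computes the fitness of each candidate) and the bounder (keeps centroid coordinates inside the allowed range).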
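inspyred calls these hooks as generator(random, args), which returns one candidate, evaluator(candidates, args), which returns one fitness value per candidate, and bounder(candidate, args), which returns the bounded candidate. The NumPy sketch below shows the same evaluator and bounder logic without explicit Python loops over dimensions. The names evaluate_vectorized and bound_vectorized are hypothetical. They are intended as drop-in replacements for the post's evaluate and bound_function, but that is an assumption and is not tested here against inspyred itself.

```python
import numpy as np

data = np.array([(3, 3), (2, 2), (8, 8), (7, 7)], dtype=float)
low_b, hi_b = 1, 20


def evaluate_vectorized(candidates, args):
    """Hypothetical NumPy version of evaluate(): for each candidate
    (a num_clusters x num_dimensions set of centroids) return the sum of
    squared distances from every data point to its nearest centroid."""
    fitness = []
    for cand in candidates:
        centroids = np.asarray(cand, dtype=float)
        # diff has shape (n_points, num_clusters, num_dimensions)
        diff = data[:, None, :] - centroids[None, :, :]
        sq_dist = (diff ** 2).sum(axis=2)  # shape (n_points, num_clusters)
        fitness.append(float(sq_dist.min(axis=1).sum()))
    return fitness


def bound_vectorized(candidate, args):
    """Hypothetical NumPy version of bound_function(): clamp every
    centroid coordinate into [low_b, hi_b]."""
    return np.clip(np.asarray(candidate, dtype=float), low_b, hi_b)


print(evaluate_vectorized([[(2.5, 2.5), (7.5, 7.5)], [(0, 0), (0, 0)]], {}))
```

The first candidate places one centroid in the middle of each pair of points, so it scores much better (lower) than the second, which stacks both centroids at the origin.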

Below is the Python source code.


# -*- coding: utf-8 -*-

# Clustering for multidimensional data (including 1 dimensional)

from time import time
from random import Random
import inspyred
import numpy as np



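# Data points to cluster (one tuple per point)
data = [(3,3), (2,2), (8,8), (7,7)]
num_dimensions = 2   # number of coordinates per data point
num_clusters = 2     # number of clusters (centroids) to find
low_b = 1            # lower bound for centroid coordinates
hi_b = 20            # upper bound for centroid coordinates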

def my_observer(population, num_generations, num_evaluations, args):
    best = max(population)
    print('{0:6} -- {1} : {2}'.format(num_generations, 
                                      best.fitness, 
                                      str(best.candidate)))

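def generate(random, args):
    # A candidate is a num_clusters x num_dimensions matrix: row i holds the
    # coordinates of centroid i, drawn uniformly from [low_b, hi_b].
    matrix = np.zeros((num_clusters, num_dimensions))
    for i in range(num_clusters):
        matrix[i] = np.array([random.uniform(low_b, hi_b) for j in range(num_dimensions)])
    return matrix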
      
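def evaluate(candidates, args):
    # Fitness of a candidate = sum over all data points of the squared
    # Euclidean distance to the nearest centroid (smaller is better).
    fitness = []
    for cand in candidates:
        fit = 0
        for d in range(len(data)):
            distance = float('inf')
            nearest = None
            for c in cand:
                # squared distance between data[d] and centroid c
                temp = 0
                for z in range(num_dimensions):
                    temp = temp + (data[d][z] - c[z]) ** 2
                if temp < distance:
                    nearest = c
                    distance = temp
            # Debug output: data point index and its nearest centroid.
            # This runs on every evaluation, so it prints a lot of lines.
            print(d, nearest)
            fit = fit + distance
        fitness.append(fit)
    return fitness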


def bound_function(candidate, args):
    # Clamp every centroid coordinate into [low_b, hi_b]
    for i, c in enumerate(candidate):
        for j in range(num_dimensions):
            candidate[i][j] = max(min(c[j], hi_b), low_b)
    return candidate
 

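def main(prng=None, display=False):
    if prng is None:
        prng = Random()
        prng.seed(time())

    # Particle swarm optimization with a ring topology: each particle shares
    # information only with the neighborhood_size particles adjacent to it in the ring.
    ea = inspyred.swarm.PSO(prng)
    ea.observer = my_observer
    ea.terminator = inspyred.ec.terminators.evaluation_termination
    ea.topology = inspyred.swarm.topologies.ring_topology
    final_pop = ea.evolve(generator=generate,
                          evaluator=evaluate,
                          pop_size=12,
                          bounder=bound_function,
                          maximize=False,  # minimize the sum of squared distances
                          max_evaluations=25100,
                          neighborhood_size=3)
    return final_pop
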
if __name__ == '__main__':
    main(display=True)
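The print inside evaluate reports cluster membership for every evaluation. An alternative is to compute the membership once, from the best centroids after the run (for example the candidate of max(final_pop), since inspyred compares individuals according to the maximize setting). The helper assign_clusters below is hypothetical. It is a minimal sketch that labels each point with the index of its nearest centroid, using the centroids from the example output in this post.

```python
import numpy as np


def assign_clusters(data, centroids):
    """Hypothetical helper: return, for each data point, the index of its
    nearest centroid by squared Euclidean distance."""
    points = np.asarray(data, dtype=float)
    cents = np.asarray(centroids, dtype=float)
    sq_dist = ((points[:, None, :] - cents[None, :, :]) ** 2).sum(axis=2)
    return sq_dist.argmin(axis=1).tolist()


data = [(3, 3), (2, 2), (8, 8), (7, 7)]
# Centroids as reported in the post's example output (order as printed there)
best_centroids = [np.array([7.50000001, 7.5]), np.array([2.5, 2.50000001])]
print(assign_clusters(data, best_centroids))  # [1, 1, 0, 0]
```

Points with the same label form one cluster: data[0] and data[1] go to the centroid near (2.5, 2.5), while data[2] and data[3] go to the one near (7.5, 7.5).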

Below is an example of the final output. The numbers 0, 1, 2, 3 are indexes into the data array; for example, 0 refers to data[0]. To the right of each index are the coordinates of the nearest centroid, so all indexes that share the same centroid belong to the same cluster. The last line, printed by the observer, shows the generation number (2091), the fitness value (2.0), which is the sum of squared distances from each point to its nearest centroid, and the coordinates of the centroids.


0 [ 2.5         2.50000001]
1 [ 2.5         2.50000001]
2 [ 7.49999999  7.5       ]
3 [ 7.49999999  7.5       ]
  2091 -- 2.0 : [array([ 7.50000001,  7.5       ]), array([ 2.5       ,  2.50000001])]
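As a quick check of the reported fitness: with centroids at approximately c_1 = (2.5, 2.5) and c_2 = (7.5, 7.5), every data point lies 0.5 from its nearest centroid in each of its two coordinates. The generation count also fits the settings, assuming inspyred counts the 12 initial evaluations as generation 0, so that generation g has used 12(g+1) evaluations.

```latex
F = \sum_{i=1}^{4} \min_{k \in \{1,2\}} \lVert x_i - c_k \rVert^2
  = 4 \times \left(0.5^2 + 0.5^2\right) = 4 \times 0.5 = 2.0

\min\{\, g : 12\,(g+1) \ge 25100 \,\} = 2091
```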

In the next post we will move from numerical data to text data.

References
1. Bio-Inspired Optimization for Text Mining-1 Motivation
2. Bio-Inspired Optimization for Text Mining-2 Numerical One Dimensional Example


