Genetically oriented clustering using variable length chromosomes

Authors: Paraskevi Zacharia, Nikos Aspragathos

Abstract

In most cases, the sheer volume of data makes it unrealistic for domain experts to mine useful knowledge from a database. Motivated by this, this paper develops a novel approach to optimized clustering. The proposed approach is genetically oriented: it mines vital information contained in large databases while avoiding entrapment in local optima and sensitivity to initialization. The aim is to find an optimum set of clusters that properly classifies all training data without much computational burden. The contribution is twofold: first, it gleans the valuable information hidden in the database; second, it automatically evolves the appropriate number of cluster centers, as well as the partitioning of the data, without a priori assumptions about the cluster centers.
In this paper, the effectiveness of a Genetic Algorithm with variable length chromosomes is demonstrated for clustering data sets into an unknown number of clusters. The flexibility of the proposed variable length Genetic Algorithm in detecting the optimum number of clusters and the corresponding partition is evaluated through various experimental tests. The results of the proposed algorithm are compared with those obtained by the well-known fuzzy c-means algorithm, which is applicable only for a predefined, fixed number of clusters, and by the subtractive clustering method, where data points are considered as candidate cluster centers.
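
The idea of a variable length chromosome can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the cost function (mean distance to the nearest center plus an assumed per-cluster penalty), the cut-and-splice crossover and the add/delete/jitter mutation are all hypothetical choices made only to show how the number of cluster centers can evolve together with their positions.

```python
import math
import random

def random_chromosome(bounds, k_max, rng):
    # A chromosome is a list of centers; its length is the number of clusters.
    k = rng.randint(2, k_max)
    return [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(k)]

def cost(data, centers, penalty=0.5):
    # Assumed objective (not from the paper): mean distance to the nearest
    # center plus a per-cluster penalty, so extra centers must pay for themselves.
    spread = sum(min(math.dist(p, c) for c in centers) for p in data) / len(data)
    return spread + penalty * len(centers)

def crossover(a, b, k_max, rng):
    # Cut-and-splice: a head of one parent joined to a tail of the other,
    # so the child's length may differ from both parents.
    child = a[:rng.randint(1, len(a))] + b[rng.randint(0, len(b)):]
    return child[:k_max]

def mutate(centers, bounds, k_max, rng, sigma=0.3):
    centers = list(centers)
    move = rng.random()
    if move < 0.2 and len(centers) < k_max:    # add a random center
        centers.append(tuple(rng.uniform(lo, hi) for lo, hi in bounds))
    elif move < 0.4 and len(centers) > 1:      # delete a center
        centers.pop(rng.randrange(len(centers)))
    else:                                      # jitter one center
        i = rng.randrange(len(centers))
        centers[i] = tuple(x + rng.gauss(0, sigma) for x in centers[i])
    return centers

def evolve(data, k_max=8, pop_size=40, generations=60, seed=0):
    rng = random.Random(seed)
    dims = len(data[0])
    # Centers may lie anywhere inside the bounding box of the data.
    bounds = [(min(p[d] for p in data), max(p[d] for p in data)) for d in range(dims)]
    pop = [random_chromosome(bounds, k_max, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: cost(data, c))
        elite = pop[: pop_size // 2]           # truncation selection with elitism
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            children.append(mutate(crossover(a, b, k_max, rng), bounds, k_max, rng))
        pop = elite + children
    return min(pop, key=lambda c: cost(data, c))
```

Calling `evolve(points)` on a list of coordinate tuples returns a list of centers whose length is the evolved number of clusters. In this sketch the result depends strongly on the assumed `penalty` weight, which trades compactness against the number of clusters.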

Attachment: IPROMS2008.wmv (2.82 MB)

a pdf file
Submitted by ang on Sat, 12/07/2008 - 2:59pm.

Dear authors,
(1) In your paper, you write that you compare GA, FCM, and SCM. Can you explain the meaning of GA-FCM, GA-SCM and FCM-SCM in Table 1? I am confused by these labels; for example, GA-FCM could be read as a hybrid GA-FCM method.

(2) In summary, can you give me some insight into why the approach using GA is better than SCM and FCM?

(3) Work on the Bees Algorithm is reported in the special session "Bees algorithm", where it is reported to perform better than GA. Why not have a look at that session?

Thank you.

ang


Submitted by Zacharia on Mon, 14/07/2008 - 7:39am.

Dear Madam,

Table 1 shows the RMS deviations for the three Data Sets resulting from the comparison between the three methods. The label 'GA-FCM' means that the proposed GA and the FCM are compared for each Data Set. Perhaps 'GA vs FCM' would have caused less confusion.

The proposed GA has two advantages over FCM: it automatically evolves the appropriate number of cluster centers without user intervention, and it is able to escape local optima. Our approach benefits from the advantages of SCM, but the search for candidate cluster centers is not limited to the data points; instead, any point in the search space is a possible cluster center.

Thank you for your suggestion concerning the Bees algorithm.

Vivi Zacharia

