Statistical approach to numerical databases: clustering using normalised Minkowski metrics

Pre-processing, or normalisation, of data sets is widely used in many fields of machine intelligence. In contrast to the overwhelming majority of normalisation procedures, which scale the data to a unit range, the paper argues that after normalisation of a data set the average contributions of all features to the measure used to assess the similarity of the data should be equal to one another.
Using the Minkowski distance as an example of a similarity metric, new normalised metrics are introduced such that the means of all attributes are the same and, hence, the contributions of the features to the similarity measure are approximately equalised. This normalisation is achieved by scaling the numerical attributes, that is, by dividing the database values by the means of the corresponding components of the metric.
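The scaling described above can be illustrated with a minimal sketch. It is one possible reading of the abstract, not the paper's own formulation: it assumes that the "component" of feature j is |x_ij - x_kj|^p, that its mean is taken over all pairs of records, and that each feature is divided by the p-th root of that mean so the mean component becomes 1. The function names `mean_components`, `normalise` and `minkowski` are hypothetical.

```python
from itertools import combinations


def mean_components(rows, p=2):
    """Per-feature mean of |x_ij - x_kj|**p over all pairs of records.

    Assumption: this pairwise mean is what the abstract calls the mean
    of a component of the metric.
    """
    pairs = list(combinations(rows, 2))
    n_features = len(rows[0])
    return [
        sum(abs(a[j] - b[j]) ** p for a, b in pairs) / len(pairs)
        for j in range(n_features)
    ]


def normalise(rows, p=2):
    """Scale each feature so that its mean metric component equals 1."""
    scales = [m ** (1.0 / p) for m in mean_components(rows, p)]
    # A constant feature has a zero mean component; leave it unscaled.
    scales = [s if s > 0 else 1.0 for s in scales]
    return [[x / s for x, s in zip(row, scales)] for row in rows]


def minkowski(a, b, p=2):
    """Ordinary Minkowski distance of order p between two records."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1.0 / p)


# Two features on very different scales (made-up illustrative values).
data = [[1.0, 100.0], [2.0, 300.0], [4.0, 200.0]]
scaled = normalise(data, p=2)
equalised = mean_components(scaled, p=2)
```

Under these assumptions, every entry of `equalised` is 1 up to rounding error, so no feature dominates `minkowski(scaled[i], scaled[k])` on average, whereas on the raw `data` the second feature accounts for almost all of the distance.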

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

NOTE: TO READ THIS AND OTHER IPROMS 2006 PAPERS, PLEASE REGISTER FOR THE CONFERENCE.

Submitted by ashraf_afify on Tue, 04/07/2006 - 6:45pm.

Hi,

Your proposed method of normalisation is based on the idea that the means of individual features and, hence, their contributions to the overall similarity measure should be equal. It is not clear why this should be the case, or how the means of different features can be equalised.

Thank you,
Afify


Submitted by mariasuarez on Thu, 06/07/2006 - 1:00pm.

Thank you for your interesting questions.
1. As we wrote in the paper, when there is no prior information about the relative importance of the attributes, one has to assume that all attributes are equally relevant. This leads to our main assumption that the attribute contributions to the overall similarity measure should be equal.
2. The means of different features can be equalised by the statistical approach described in the paper, which also gives some examples of the new normalised metrics.


Submitted by Mr Olivier Dent on Fri, 14/07/2006 - 1:22pm.

Hi!
Thank you for a very interesting paper.
It would be of interest to have examples of problems to which you have applied your approach.


Submitted by Pham on Sun, 23/07/2006 - 12:30pm.

We are working on them and invite you to join us again in July next year at IPROMS 2007, when we hope to discuss interesting applications of our proposed technique.

