With the development of the Internet, an era of information explosion has arrived. Finding truly useful information in the vast ocean of data is a new challenge. Data mining is a technology that emerged against this background and is now a very important research area. The goal of data mining is to extract knowledge that is useful to users and to present it in an understandable structure. It is related to many other areas, such as databases, data management, modeling and inference, complexity assessment, vision technology, and online updating. Clustering, the process of grouping abstract data objects according to their similarity, is a core subject of data mining and is now widely applied in mathematics, statistics, biology, and economics.

This paper analyzes and systematically introduces widely used clustering techniques. On that basis, two improved algorithms are proposed. The first is a modified k-means algorithm that incorporates two improvements: a policy for selecting the initial centroids and a policy for deleting outlier points. The modification effectively removes the lack of control caused by random selection of the initial centroids, and it adapts the traditional k-means algorithm to the scenario of overlapped clustering. The second is the NOV-SOM algorithm, a modification of the SOM algorithm that replaces its units with function modules and thereby extends SOM to the clustering of non-vectorized data.

Finally, several groups of comparative experiments were conducted. The results show that the two improved algorithms effectively improve both the precision and the efficiency of the corresponding traditional algorithms.
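The abstract does not state how the initial centroids are chosen or which points count as outliers. Purely as an illustration, the Python sketch below assumes two swapped-in policies that are not taken from this paper: a point is treated as an outlier if its mean distance to its `m` nearest neighbours exceeds `factor` times the median of that score, and the initial centroids are seeded deterministically by farthest-first traversal. The function names and the parameters `m` and `factor` are hypothetical.

```python
import math
import statistics
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def remove_outliers(points: List[Point], m: int = 3, factor: float = 3.0) -> List[Point]:
    """Hypothetical outlier rule: drop points whose mean distance to their
    m nearest neighbours exceeds `factor` times the median of that score."""
    scores = []
    for i, p in enumerate(points):
        d = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        n = min(m, len(d))
        scores.append(sum(d[:n]) / n)
    cutoff = factor * statistics.median(scores)
    return [p for p, s in zip(points, scores) if s <= cutoff]


def farthest_first_init(points: List[Point], k: int) -> List[Point]:
    """Hypothetical deterministic seeding: start at the point nearest the data
    mean, then repeatedly add the point farthest from all chosen centroids."""
    dim = len(points[0])
    mean = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    centroids = [min(points, key=lambda p: dist(p, mean))]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist(p, c) for c in centroids)))
    return centroids


def kmeans(points: List[Point], k: int, max_iter: int = 100):
    """Lloyd iterations on outlier-free data with deterministic seeds.
    Returns (centroids, cleaned data, label of each cleaned point)."""
    data = remove_outliers(points)
    centroids = farthest_first_init(data, k)
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid (hard assignment).
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in data]
        # Update step: move each centroid to the mean of its members;
        # keep the old centroid if a cluster becomes empty.
        new = []
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[c])
        if new == centroids:  # converged: centroids stopped moving
            break
        centroids = new
    return centroids, data, labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1),
           (10, 10), (10, 11), (11, 10), (11, 11),
           (100, 100)]  # the last point is an obvious outlier
    centroids, data, labels = kmeans(pts, k=2)
    print(centroids, len(data))
```

Because the seeding is deterministic, repeated runs on the same data produce the same clustering, which is one way to obtain the kind of controllability the abstract describes; removing outliers before seeding keeps farthest-first traversal from choosing an isolated point as a centroid. This sketch uses hard assignment only and does not attempt the overlapped-clustering adaptation mentioned in the abstract.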