Research on Improving the K-Means Clustering Algorithm

Posted on: 2014-02-07  Degree: Master  Type: Thesis
Country: China  Candidate: F Q Liu  Full Text: PDF
GTID: 2248330398957738  Subject: Computer application technology
Abstract/Summary:
Data mining is the process of extracting implicit, previously unknown, and potentially valuable information and knowledge from large data sets. It integrates knowledge from databases, machine learning, statistics, artificial intelligence, pattern recognition, and other fields, which makes it an interdisciplinary science and technology.

Clustering is an important approach in data mining and an effective technique for exploring and extracting the inner relationships between different things. Its main function is to partition a given data set according to certain rules so that data objects in the same class are highly similar, while objects in different classes are less similar. At present, cluster analysis is widely used in all walks of life. According to their principles, cluster analysis algorithms can be divided into five categories: hierarchical methods, partitioning methods, grid-based methods, density-based methods, and model-based methods.

K-means is a classical partition-based algorithm. It is widely used because it is simple to operate, efficient, and scalable. The algorithm, however, has some defects: its clustering results are sensitive to, and dependent on, the choice of initial cluster centers and the user-supplied value of k, and it easily falls into a local optimum under the influence of outliers. This thesis mainly aims to improve the K-means clustering process, because the K-means algorithm is easily influenced by outliers and depends on the user-supplied value of k. It provides a grid-based data preprocessing algorithm and an algorithm that generates the value of k automatically based on maximum distance.
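The abstract does not spell out either proposed algorithm, so the following is only a minimal sketch of how the two ideas might fit together, not the thesis's actual method. It assumes that grid-based preprocessing means discarding points that fall in sparsely populated grid cells, and that "maximum distance" means repeatedly choosing the point farthest from the centers picked so far until that distance drops below a threshold. The function names and the parameters `cell_size`, `min_count`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def grid_filter(points, cell_size, min_count):
    """Assumed preprocessing: drop points lying in sparse grid cells."""
    def cell(p):
        return tuple(math.floor(x / cell_size) for x in p)
    counts = Counter(cell(p) for p in points)
    return [p for p in points if counts[cell(p)] >= min_count]


def max_distance_centers(points, threshold):
    """Assumed rule: keep adding the point farthest from all chosen
    centers; stop when that distance is below `threshold`.
    The number of centers returned is used as k."""
    centers = [points[0]]
    while True:
        # Distance from each point to its nearest already-chosen center.
        nearest = [min(math.dist(p, c) for c in centers) for p in points]
        far_idx = max(range(len(points)), key=nearest.__getitem__)
        if nearest[far_idx] < threshold:
            return centers
        centers.append(points[far_idx])


def kmeans(points, centers, max_iter=100):
    """Standard K-means iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    clusters = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)),
                      key=lambda i: math.dist(p, centers[i]))
            clusters[idx].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(axis) / len(cl) for axis in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters


# Three tight groups plus one isolated point acting as an outlier.
points = [
    (0.1, 0.1), (0.2, 0.3), (0.4, 0.2),
    (5.1, 5.1), (5.3, 5.2), (5.2, 5.4),
    (10.1, 0.2), (10.3, 0.1), (10.2, 0.4),
    (50.0, 50.0),
]
filtered = grid_filter(points, cell_size=1.0, min_count=2)
initial = max_distance_centers(filtered, threshold=2.0)
centers, clusters = kmeans(filtered, initial)
print(len(centers), [len(c) for c in clusters])
```

In this toy data the isolated point is removed before clustering, and the number of clusters is derived from the data rather than supplied by the user. Both outcomes depend heavily on the chosen cell size and distance threshold, which the abstract does not specify.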
Keywords/Search Tags: Data Mining, Clustering, K-means Algorithm, Outlier, Network