
Grid Clustering Research Based On Impact Factor Of Grid Density

Posted on: 2015-01-20 | Degree: Master | Type: Thesis
Country: China | Candidate: B Yang | Full Text: PDF
GTID: 2268330428464956 | Subject: Basic mathematics
Abstract/Summary:
Data mining arose from the need to extract valuable information from huge amounts of data. As one of the important branches of data mining, cluster analysis can automatically identify classes composed of similar, unlabeled data points. Among the various kinds of clustering algorithms, density-based algorithms can identify classes of differing density and shape. However, density-based clustering often requires global parameters to be set, together with more than one other parameter, and it runs into difficulty when density varies strongly between classes. Grid-based clustering algorithms compute on the statistics of grid cells instead of individual data points, which improves processing speed, but the speed comes at the expense of precision. The cell size influences clustering quality: the smaller the cells, the more precise the clustering but the higher the computation cost; the larger the cells, the cruder the clustering.

To address these disadvantages of density-based and grid-based clustering, this paper considers the impact factor of grid density and proposes an improved clustering algorithm: the Impact Factor of Grid Density based Clustering algorithm (IFGDC). The main work on the algorithm includes: (1) clustering grid cells instead of data points by partitioning the data space, which effectively reduces the complexity of the clustering operation; (2) defining several concepts based on grid adjacency relations, which avoids the inconvenience of choosing a radius in traditional density-based algorithms; (3) introducing the concept of the impact factor of grid density to determine the core cells among the high-density grid cells; (4) giving a method for extracting cluster boundary points to further improve clustering accuracy.
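
As a rough illustration of clustering grid cells instead of data points, the following sketch bins points into cells, keeps the cells that hold enough points, and merges adjacent dense cells into clusters. It is not the IFGDC algorithm: the abstract does not define the impact factor of grid density or the boundary-point extraction, so a plain count threshold stands in for the core-cell test. The names `grid_cluster`, `cell_size`, and `min_count` are assumptions made for this example.

```python
from collections import defaultdict, deque
from itertools import product


def grid_cluster(points, cell_size, min_count):
    """Label points by connected components of dense grid cells.

    cell_size and min_count are illustrative parameters (assumptions),
    not values or definitions taken from the thesis.
    Returns one label per point; -1 marks points in sparse cells.
    """
    # Map each point to the integer coordinates of the cell containing it.
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        key = tuple(int(c // cell_size) for c in p)
        cells[key].append(idx)

    # A plain count threshold stands in for the thesis's core-cell test.
    dense = {key for key, members in cells.items() if len(members) >= min_count}

    # Neighbouring cells differ by -1, 0 or +1 in each coordinate.
    dim = len(points[0]) if points else 0
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if any(o)]

    # Breadth-first search over adjacent dense cells; sorting the start
    # cells keeps the labels independent of the input order.
    cell_label = {}
    next_label = 0
    for start in sorted(dense):
        if start in cell_label:
            continue
        cell_label[start] = next_label
        queue = deque([start])
        while queue:
            cur = queue.popleft()
            for off in offsets:
                nb = tuple(c + d for c, d in zip(cur, off))
                if nb in dense and nb not in cell_label:
                    cell_label[nb] = next_label
                    queue.append(nb)
        next_label += 1

    # Every point inherits the label of its cell.
    labels = [-1] * len(points)
    for key, lab in cell_label.items():
        for idx in cells[key]:
            labels[idx] = lab
    return labels
```

The search visits cells rather than points, so its cost depends on the number of occupied cells. A larger `cell_size` means fewer cells and coarser cluster boundaries, which is the speed versus precision trade-off described above.
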
Finally, we test the IFGDC algorithm in a series of experiments and verify its correctness and validity.

The K-means clustering algorithm is simple and has become a classical clustering algorithm. However, K-means is sensitive to its parameters: it relies on the user's experience to select the number of clusters and the initial cluster centers. The algorithm is also susceptible to noise, and its results depend on the order of data input. To address these deficiencies, this paper presents an improved K-means algorithm based on IFGDC. First, the new algorithm exploits the speed of grid clustering to preprocess the data points, finding the general structure and distribution of the data set and obtaining the number of clusters k and the initial centroids of the clusters. These two parameters are then used for K-means clustering. Compared with using K-means directly, the experimental results show that the improved algorithm yields a better-quality k and better initial centroids. The new algorithm is less sensitive to noise, its clustering results are deterministic, and they do not depend on the order of data entry. The new algorithm therefore improves the clustering results effectively.

The paper closes with a summary of the main content and an outlook on future work.
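
The seeding idea for K-means described above can be sketched as follows, assuming a preliminary labeling of the points is already available (for instance from a grid-based pass), with -1 marking noise. The number of distinct labels gives k and the per-label means give the initial centers, which are then refined with standard Lloyd iterations. The function names `centroids_from_labels` and `kmeans` are hypothetical, and this is a minimal sketch rather than the thesis's implementation.

```python
import math


def centroids_from_labels(points, labels):
    """Return one centroid per non-negative label, in ascending label order."""
    sums, counts = {}, {}
    for p, lab in zip(points, labels):
        if lab < 0:  # skip points the preliminary pass flagged as noise
            continue
        acc = sums.setdefault(lab, [0.0] * len(p))
        for i, c in enumerate(p):
            acc[i] += c
        counts[lab] = counts.get(lab, 0) + 1
    return [tuple(s / counts[lab] for s in sums[lab]) for lab in sorted(sums)]


def kmeans(points, centers, max_iter=100):
    """Standard Lloyd iterations starting from the given centers."""
    centers = [tuple(c) for c in centers]
    assign = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        assign = [
            min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for j, old in enumerate(centers):
            members = [p for p, a in zip(points, assign) if a == j]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(old)  # keep a center that lost all members
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, assign
```

Because the seeds are computed from the labeling instead of being drawn at random, repeated runs on the same data give the same result, and k is `len(seeds)` instead of a user-supplied guess.
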
Keywords/Search Tags: data mining, clustering, grid, density, impact factor of grid density, K-means