
Research On Clustering Algorithm Of Uncertain Data

Posted on: 2016-06-18 | Degree: Master | Type: Thesis
Country: China | Candidate: D M Pan | Full Text: PDF
GTID: 2308330464469410 | Subject: Computer technology
Abstract/Summary:
The continuous development of information technology has brought explosive growth in data across all fields of society. Massive amounts of data come from Internet data transmission, financial and commercial transactions, and social networking and media, and these data are of great importance for data mining. Clustering is one of the critical areas of data mining. Because uncertain data are ubiquitous and unavoidable, the mining of uncertain data is receiving increasing attention. For example, tolerance data arising from limited machine accuracy and customer data altered by human intervention for security reasons are uncertain data that cannot be ignored, because the uncertainty affects the results. How to handle uncertain data is therefore becoming critical. To address these issues, this dissertation covers the following.

First, the origin and research status of data mining are outlined, followed by uncertain data: the causes of uncertainty, the data types, and the research status. Clustering is then introduced in terms of its definition, similarity measures, and mathematical models, and the common types of clustering algorithms are briefly described. This provides the theoretical basis for the discussion of uncertain data clustering.

Second, the study of clustering algorithms shows that some clustering algorithms for uncertain data share the defects of the original algorithms they are derived from. The dissertation presents a new approach based on relative density, which redefines both the dissimilarity measure and the relative density. This approach addresses the above problems effectively and produces better clustering results.

Third, a connected component-based clustering algorithm is proposed for uncertain data with categorical attributes. It defines a dual-weighted attribute similarity for uncertain attributes, based on the probabilities of the categorical attribute values. The algorithm obtains its clusters by finding the connected components of an undirected graph, which can be computed. In this way, it handles the clustering of categorical attributes efficiently.
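
The idea of clustering by connected components can be illustrated with a minimal sketch. The abstract does not give the dual-weighted similarity, so the code below substitutes a simple stand-in: the average, over attributes, of the probability that two objects take the same categorical value. The function names, the threshold parameter, and the toy data are all assumptions made for illustration and are not taken from the dissertation.

```python
from itertools import combinations


def match_probability(dist_a, dist_b):
    """Probability that two independent categorical distributions yield the same value."""
    return sum(p * dist_b.get(value, 0.0) for value, p in dist_a.items())


def similarity(obj_a, obj_b):
    """Average per-attribute match probability (a stand-in, NOT the thesis's measure)."""
    scores = [match_probability(a, b) for a, b in zip(obj_a, obj_b)]
    return sum(scores) / len(scores)


def connected_component_clusters(objects, threshold):
    """Link objects whose similarity reaches the threshold; return graph components."""
    parent = list(range(len(objects)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Each sufficiently similar pair is an edge of the undirected graph.
    for i, j in combinations(range(len(objects)), 2):
        if similarity(objects[i], objects[j]) >= threshold:
            parent[find(i)] = find(j)

    # Group object indices by the root of their component.
    clusters = {}
    for i in range(len(objects)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


# Hypothetical uncertain objects: each attribute is a distribution over categories.
objects = [
    [{"red": 0.9, "blue": 0.1}, {"small": 1.0}],
    [{"red": 0.8, "blue": 0.2}, {"small": 0.9, "large": 0.1}],
    [{"blue": 1.0}, {"large": 1.0}],
    [{"blue": 0.9, "red": 0.1}, {"large": 0.8, "small": 0.2}],
]
clusters = connected_component_clusters(objects, threshold=0.6)
print(clusters)
```

In this sketch the threshold controls how dense the graph is: a lower value merges more objects into one component, while a higher value leaves more singletons. How the dissertation chooses or avoids such a parameter is not stated in the abstract.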
Keywords/Search Tags: uncertain data, clustering, relative density, categorical attribute, data mining