Research And Implementation Of Clustering Algorithm Over Uncertain Data Streams

Posted on: 2010-08-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y Fang
Full Text: PDF
GTID: 2218330368999608
Subject: Computer system architecture
Abstract/Summary:
The management and processing of traditional, certain data streams have advanced greatly over recent decades and now form a mature research field. Research on uncertain data has been driven by developments across the information industry and by the widespread use of hardware that measures data only approximately. In real-world applications, inaccuracy arises in measurement, in transmission, and in the data sources themselves, so uncertainty is ubiquitous and becomes an inherent feature of data. Consequently, the management and processing of uncertain data streams has gradually become a research focus of great importance.

Because data mining is so widely applied, much recent work has addressed the mining of uncertain data. Clustering is an important topic in data mining, which makes clustering over uncertain data streams a significant research problem. Clustering uses the distance between objects to measure their similarity, so how that distance is measured is critical to clustering algorithms for uncertain data.

This thesis presents a method called ASM that improves the accuracy of the distance between uncertain objects by taking the distribution of the objects into account and incorporating their variance into the calculation. It then proposes a method named MPD for calculating the distance between an object and a cluster. Unlike traditional approaches, MPD takes the effect of all elements of the cluster into account and uses the average distance between the uncertain object and the objects in the cluster. Next, a vector describing the statistical information of a cluster, such as the square sum of its elements, is presented. Finally, an algorithm, UKluStream, is proposed for clustering over uncertain data streams. Experiments verify the efficiency of the proposed algorithm.
Keywords/Search Tags: uncertainty, data stream, clustering, data mining
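The abstract does not give the formulas behind ASM, MPD, or the cluster statistics vector, so the following is only an illustrative sketch of the general ideas it names, not the thesis's actual definitions. It assumes each uncertain object is described by a per-dimension mean and variance with independent errors, and uses the expected squared Euclidean distance as a stand-in for a variance-aware distance. The names `UncertainPoint`, `expected_sq_dist`, and `ClusterSummary` are hypothetical. The summary stores a count, a linear sum, a square sum, and a variance sum, which is enough to obtain the average distance from a new object to every member of a cluster without keeping the members.

```python
from dataclasses import dataclass, field
from typing import List, Sequence


@dataclass
class UncertainPoint:
    """Hypothetical uncertain object: per-dimension mean and variance."""
    mean: Sequence[float]
    var: Sequence[float]


def expected_sq_dist(a: UncertainPoint, b: UncertainPoint) -> float:
    """Expected squared Euclidean distance between two independent objects.

    Assumption (not from the thesis): E||A - B||^2 is the squared distance
    between the means plus the total variance of both objects.
    """
    return sum(
        (ma - mb) ** 2 + va + vb
        for ma, mb, va, vb in zip(a.mean, b.mean, a.var, b.var)
    )


@dataclass
class ClusterSummary:
    """Hypothetical additive statistics vector for one cluster."""
    n: int = 0
    linear_sum: List[float] = field(default_factory=list)  # sum of member means
    square_sum: float = 0.0  # sum of squared norms of member means
    var_sum: float = 0.0     # sum of total variances of members

    def add(self, p: UncertainPoint) -> None:
        """Absorb one object; each update costs O(dimensions)."""
        if not self.linear_sum:
            self.linear_sum = [0.0] * len(p.mean)
        self.n += 1
        for i, m in enumerate(p.mean):
            self.linear_sum[i] += m
        self.square_sum += sum(m * m for m in p.mean)
        self.var_sum += sum(p.var)

    def mean_sq_dist_to(self, p: UncertainPoint) -> float:
        """Average of expected_sq_dist(p, member) over all members,
        computed from the summary alone."""
        if self.n == 0:
            raise ValueError("cluster is empty")
        dot = sum(m * s for m, s in zip(p.mean, self.linear_sum))
        return (
            sum(m * m for m in p.mean)
            + sum(p.var)
            - 2.0 * dot / self.n
            + (self.square_sum + self.var_sum) / self.n
        )
```

Because every field of the summary is a plain sum, a stream algorithm can update it in one pass and discard the raw objects, which is the usual reason for keeping statistics such as square sums per cluster.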