The Research Of Text Data Streams Clustering Algorithm Based On Affinity Propagation

Posted on: 2017-05-31
Degree: Master
Type: Thesis
Country: China
Candidate: Y M Li
GTID: 2348330488454466
Subject: Management Science and Engineering
Abstract/Summary:
With the advent of the big data era, large volumes of unstructured text data streams have emerged online. These streams are dynamic, high-dimensional, and sparse. Because the data are generated in real time, in huge volume, and with complex, unstructured content, there is an urgent need to extract valuable information and knowledge from them. Text data stream clustering is a commonly used technique for analyzing such unstructured data. It has achieved good results in news filtering, topic detection and tracking (TDT), and recommendation based on user characteristics, and has quickly become a research hotspot.

To address these characteristics, this thesis proposes OAP-s, a text data stream clustering algorithm built on the traditional affinity propagation (AP) algorithm. OAP-s extends AP with weighting so that it can cluster text data streams. By introducing an attenuation factor into the AP algorithm, OAP-s passes the cluster centers of the current window on to the next window while attenuating those results. OAP-s nevertheless has some shortcomings, so the author proposes a second text data stream clustering algorithm, OWAP-s. Building on OAP-s, OWAP-s defines a weighted similarity and introduces an attraction factor that makes historical cluster centers more attractive, which yields more accurate clustering results. Both algorithms adopt the sliding time window model, which reflects the temporal characteristics as well as the distribution of the data stream. Experimental results show that both algorithms are flexible and extensible, and that they are more accurate and more stable than the OSKM algorithm.

Finally, the OWAP-s algorithm is applied to crawled stock news data, in which three stock news events are detected.
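
The abstract does not give the formulas behind the attenuation factor or the extra pull given to historical cluster centers, so the following is only a rough sketch of the general idea, not the thesis's OAP-s or OWAP-s. It runs plain affinity propagation on each window and carries the resulting centers forward with decayed weights. The names `cluster_window`, `decay`, and `boost`, the use of cosine similarity, and the way a weight raises a carried center's preference are all assumptions made for illustration.

```python
import math
from statistics import median

def cosine(u, v):
    """Cosine similarity between two sparse term-frequency dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def affinity_propagation(S, damping=0.5, iters=200):
    """Standard AP message passing; S[k][k] holds the preference of point k."""
    n = len(S)
    if n == 1:
        return [0]
    R = [[0.0] * n for _ in range(n)]  # responsibilities
    A = [[0.0] * n for _ in range(n)]  # availabilities
    for _ in range(iters):
        for i in range(n):
            AS = [A[i][k] + S[i][k] for k in range(n)]
            first = max(range(n), key=AS.__getitem__)
            second = max(AS[k] for k in range(n) if k != first)
            for k in range(n):
                new = S[i][k] - (second if k == first else AS[first])
                R[i][k] = damping * R[i][k] + (1 - damping) * new
        for k in range(n):
            pos = [max(0.0, R[i][k]) for i in range(n)]
            total = sum(pos) - pos[k]  # sum over i' != k
            for i in range(n):
                new = total if i == k else min(0.0, R[k][k] + total - pos[i])
                A[i][k] = damping * A[i][k] + (1 - damping) * new
    score = [R[k][k] + A[k][k] for k in range(n)]
    exemplars = [k for k in range(n) if score[k] > 0]
    if not exemplars:  # fallback if no point qualified as an exemplar
        exemplars = [max(range(n), key=score.__getitem__)]
    labels = [max(exemplars, key=lambda k: S[i][k]) for i in range(n)]
    for k in exemplars:
        labels[k] = k
    return labels

def cluster_window(docs, carried, decay=0.5, boost=0.5):
    """Cluster one window together with centers carried from the last one.

    carried: list of (vector, weight) pairs from the previous window.
    decay:   hypothetical attenuation factor applied to carried weights.
    boost:   hypothetical extra preference given to carried centers.
    Returns (labels, new_carried); labels index into carried + docs.
    """
    points = [vec for vec, _ in carried] + list(docs)
    if not points:
        return [], []
    weights = [w * decay for _, w in carried] + [1.0] * len(docs)
    n = len(points)
    S = [[cosine(points[i], points[j]) for j in range(n)] for i in range(n)]
    off = [S[i][j] for i in range(n) for j in range(n) if i != j]
    base = median(off) if off else 0.0  # common default preference
    wmax = max(weights)
    for i in range(n):
        is_old = i < len(carried)
        S[i][i] = base + (boost * weights[i] / wmax if is_old else 0.0)
    labels = affinity_propagation(S)
    merged = {}  # exemplar index -> total weight of its members
    for i, k in enumerate(labels):
        merged[k] = merged.get(k, 0.0) + weights[i]
    return labels, [(points[k], w) for k, w in merged.items()]
```

Calling `cluster_window(first_docs, [])` on the first window and feeding its second return value into the next call slides the window forward. In this sketch the total weight after a window equals the decayed weight carried in plus one unit per new document, so a center that stops attracting documents fades geometrically.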
Keywords/Search Tags: Data Mining, AP Clustering, Text Data, Sliding Window, Weight