
Supporting Element Count Of Frequent Pattern Mining Based On Social Media Data

Posted on: 2017-11-04
Degree: Master
Type: Thesis
Country: China
Candidate: L L Lv
Full Text: PDF
GTID: 2348330503972501
Subject: Computer technology
Abstract/Summary:
In recent years, social platforms such as Twitter, Facebook, Instagram, and LinkedIn abroad, and Sina Weibo and WeChat in China, have sprung up and quietly become part of people's lives. The data that users generate on these platforms holds considerable economic and social value. However, social communication streams are hard to process because of their inherent characteristics: large data volume, rapid generation, out-of-order arrival, and the constraint that each data item can be processed only a limited number of times. First, this thesis uses streaming text data and user information acquired from the Sina Weibo platform to design a text clustering algorithm based on user information. When clusters from the history window and the current window are merged, the algorithm uses user information as one of the factors that decide whether two clusters should be combined, which makes the mining results more accurate. Second, for mining frequent patterns about an event, the thesis proposes FP-growth-EC, a frequent pattern mining algorithm that supports element counts. FP-growth-EC is based on the frequent pattern growth (FP-growth) algorithm. While the frequent pattern tree is being built, each item's count information is appended to its item node; during mining, the count information of the current node and its child nodes is combined. By adding a count factor to the tree nodes in this way, FP-growth-EC not only mines frequent patterns but also obtains the count information of those patterns. The results show that: (1) on the "Wei Zexi event" data, DBSCAN-IB, the clustering algorithm that uses user information, is more accurate than clustering algorithms that do not use user information; (2) on 10 randomly generated data sets with element counts, FP-growth-EC takes less time than the frequent pattern algorithm without the count factor.
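The tree-construction idea described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis's FP-growth-EC implementation: the abstract does not specify the node layout or how counts are merged during mining, so the sketch only covers building an FP-tree whose nodes carry an extra count field. The names `Node`, `build_tree`, `item_totals`, and `elem_count`, and the representation of a post as a mapping from item to element count, are all assumptions made for illustration.

```python
from collections import defaultdict


class Node:
    """FP-tree node with an extra element-count field (assumed layout)."""

    def __init__(self, item, parent=None):
        self.item = item
        self.parent = parent
        self.children = {}
        self.support = 0     # number of transactions passing through this node
        self.elem_count = 0  # summed element count of `item` in those transactions


def build_tree(transactions, min_support):
    """Build an FP-tree from transactions given as {item: element_count} dicts."""
    # First pass: count in how many transactions each item occurs.
    support = defaultdict(int)
    for t in transactions:
        for item in t:
            support[item] += 1
    frequent = {i for i, s in support.items() if s >= min_support}

    root = Node(None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    # Second pass: insert each transaction, frequent items in descending support order.
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-support[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.support += 1
            child.elem_count += t[item]  # the added count factor
            node = child
    return root, header


def item_totals(header):
    """Return {item: (support, total element count)} for each frequent item."""
    return {item: (sum(n.support for n in nodes),
                   sum(n.elem_count for n in nodes))
            for item, nodes in header.items()}


# Hypothetical data: each post maps a word to how often it occurs in that post.
posts = [{"a": 2, "b": 1}, {"a": 1, "c": 3}, {"a": 1, "b": 2, "c": 1}, {"d": 5}]
root, header = build_tree(posts, min_support=2)
print(item_totals(header))
```

Storing the count on the node means the element totals come out of the same two passes that build the tree, which is one plausible reading of why adding a count factor need not require a separate counting step.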
Keywords/Search Tags: social communicating stream, data mining, clustering, frequent pattern