Bursty Topic Detection From Microblog Streams Based On Bursty Pattern Mining

Posted on: 2019-03-25
Degree: Master
Type: Thesis
Country: China
Candidate: Y S Ou
Full Text: PDF
GTID: 2428330545499756
Subject: Computer application technology
Abstract/Summary:
Detecting bursty topics in microblogs not only helps users filter information and acquire it more efficiently, but also helps governments and companies learn of hot topics in advance, so that they can make appropriate decisions and take timely measures. However, microblogs are short, weak in semantic expression, and full of colloquial words and noisy data, so their vocabulary is not standardized. Detecting bursty topics quickly, accurately, and in a scalable manner is therefore a very challenging task.

The continuous development of high utility pattern mining algorithms brings new opportunities and design ideas for the topic detection task. The internal and external utility emphasized by high utility pattern mining can be used to express the hot and bursty features of topics. However, the application of high utility pattern mining is still at an early stage, and no research has tried to apply it to bursty topic detection. How to combine the two seamlessly is therefore a pressing issue.

In this thesis, we incorporate high utility pattern mining into the bursty topic detection task, aiming to detect bursty topics of high practical significance more accurately and in a more scalable way. The work is divided into three parts.

Firstly, we propose a framework for bursty topic detection, named ET-EPM, which transforms the bursty topic detection problem into a bursty pattern mining and clustering problem. To measure the burstiness of terms and patterns, we propose a novelty calculation method based on the locally weighted linear regression algorithm. Both theoretical analysis and empirical data show that this method helps the framework achieve ideal topic detection performance.

Secondly, we incorporate word embedding vectors into a modularity-based graph partitioning method and use it to incrementally cluster the bursty patterns into bursty topics. A pattern representation based on word embedding vectors carries much more information, so it significantly improves similarity measurement. The modularity-based graph partitioning method can converge to the global optimal solution and cluster the patterns into topics without the number of clusters being specified, which is ideal for clustering over data streams.

Finally, we propose a new topic description based on hash expressions and, at the same time, use the traditional topic description based on topic words as a supplement. By combining the two kinds of expression, words and phrases, we obtain a set of topic descriptions with richer semantic expression and ensure that no topic is omitted.

An empirical study conducted on a Sina microblog dataset demonstrates that the proposed ET-EPM framework outperforms several state-of-the-art bursty topic detection algorithms in both burstiness and overall performance. These improvements also indicate the effectiveness of our framework for bursty topic detection from microblog streams.
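
The abstract does not give the novelty formula, so the following is only a minimal sketch of how a burstiness score could be built on locally weighted linear regression. It assumes that a term's per-window counts are forecast from its recent history and that the score is the amount by which the observed count exceeds that forecast. The function names `lwlr_predict` and `novelty`, the Gaussian kernel, and the bandwidth `tau` are all illustrative assumptions, not taken from the thesis.

```python
import math


def lwlr_predict(xs, ys, x_query, tau=2.0):
    """Locally weighted linear regression (1-D) evaluated at x_query.

    Assumption: a Gaussian kernel with bandwidth tau; the thesis may use
    a different kernel or feature set.
    """
    # Points close to x_query receive larger weights.
    ws = [math.exp(-((x - x_query) ** 2) / (2.0 * tau ** 2)) for x in xs]
    sw = sum(ws)
    # Weighted means of x and y.
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    # Closed-form weighted least-squares slope.
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x_query - mx)


def novelty(history, current, tau=2.0):
    """Hypothetical novelty score: observed count minus the LWLR forecast.

    history: term counts in past time windows (oldest first).
    current: term count in the newest window.
    The score is floored at 0 so that only upward surprises count.
    """
    xs = list(range(len(history)))
    expected = lwlr_predict(xs, history, len(history), tau)
    return max(0.0, current - expected)


if __name__ == "__main__":
    steady = [3, 3, 3, 3]          # flat history: forecast stays at 3
    print(novelty(steady, 10))     # large positive score: a burst
    growing = [1, 2, 3, 4, 5]      # steady growth: forecast continues the trend
    print(novelty(growing, 6))     # near zero: growth was expected
```

Under these assumptions, a term that keeps growing at its usual rate scores close to zero, while a term that jumps well above its local trend scores high, which is the behavior a burstiness measure needs in order to separate bursty terms from merely popular ones.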
Keywords/Search Tags: high utility pattern mining, bursty pattern, bursty topic detection, word embedding