
Topic Clustering Analysis Of Popular Micro Blog Event

Posted on: 2017-02-01; Degree: Master; Type: Thesis
Country: China; Candidate: J Wang
GTID: 2308330485464104; Subject: Computer application technology
Abstract/Summary:
In modern social life, the Internet has become one of the most important carriers of information and has given rise to a variety of online media, including online forms of television, newspapers, and radio. With the rapid development of Internet applications, the number of users has grown quickly, and the Internet has become an important venue where people can express their opinions freely. Typical applications include online video sites, micro-blogs, and WeChat. Through these platforms, users can share news anytime and anywhere, or comment on popular micro-blog topics. Moreover, the public information on these platforms can genuinely reflect people’s opinions, so it is important to mine valuable information from these media. Studying the content of these platforms is meaningful for applications such as public opinion analysis, new media marketing, and brand maintenance.
Micro-blog is a social network built on follow relationships between users. Users can post short texts limited to 140 characters and can like, comment on, or repost others’ posts. With the rapid growth in the number of users in recent years, vast amounts of data are generated every day, and users find it more and more difficult to locate valuable information in this ocean of data. First, micro-blog content varies widely in quality and includes a great deal of spam. Second, for a particular event, different people have different purposes and emotions, so the same event attracts differing views. Third, the number of posts about an event changes considerably over time, and new developments can also affect public opinion; how to track this evolution accurately is a problem that needs to be solved.
Analyzing micro-blogs can help reveal public opinion and emotional tendencies, providing reliable and valuable information to support decision-making and forecasting.
This thesis first reviews the basic concepts of text mining, the related technologies and algorithms, text representation, and the relevant text mining theory. A subsequent chapter then describes the LDA (Latent Dirichlet Allocation) topic model in detail, including the mathematical basis for modeling, evaluation, and inference. The main research of this thesis is summarized as follows:
1. Word-level feature selection is performed through new-word discovery, TF (term frequency), and IDF (inverse document frequency), which retains informative features and discards uninformative ones.
2. An LDA topic model is built and its topics are clustered to analyze topic evolution. The key words of each topic and their weights are used as two-tuples and combined with a dynamic threshold to discover new topics.
3. The hypothesis that "a short text has one and only one main topic" is put forward. Based on it, this thesis proposes a new method that uses the main topic as the decisive factor for classifying short texts.
In the experiments, the proposed algorithms are applied to micro-blog data on the terrorist attacks in Paris and to Baidu Knows data, and the results are analyzed to verify the proposed methods. The results show that the feature selection method improves the modeling effect of the LDA topic model; that the new-topic discovery algorithm, based on key words and their weights, discovers new topics well, and these correspond to the sub-topics of the hot topic; and that the classification method based on topic clusters proposed in this thesis performs better than the traditional K-means method.
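Two of the steps summarized above, TF and IDF based feature selection and classification by a single main topic, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names (`select_features`, `main_topic`, `group_by_main_topic`) are hypothetical, the choice to rank each word by its highest TF-IDF score in any document is an assumption, new-word discovery is omitted, and the document-topic distributions are assumed to come from an already trained LDA model.

```python
import math
from collections import Counter, defaultdict


def select_features(docs, top_k):
    """Keep the top_k words ranked by their highest TF-IDF score in any document.

    docs is a list of tokenized documents (lists of words). Ranking by the
    maximum score over documents is an assumption made for this sketch.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    best = defaultdict(float)
    for doc in docs:
        counts = Counter(doc)
        for word, count in counts.items():
            tf = count / len(doc)                # term frequency within the document
            idf = math.log(n_docs / df[word])    # zero for words found in every document
            best[word] = max(best[word], tf * idf)
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(best, key=lambda w: (-best[w], w))
    return ranked[:top_k]


def main_topic(topic_dist):
    """Return the index of the highest-probability topic of one short text."""
    return max(range(len(topic_dist)), key=lambda k: topic_dist[k])


def group_by_main_topic(doc_topic_dists):
    """Group document indices by their main topic (one topic per short text)."""
    groups = {}
    for doc_id, dist in enumerate(doc_topic_dists):
        groups.setdefault(main_topic(dist), []).append(doc_id)
    return groups
```

In use, `select_features` would be applied to the tokenized posts before topic modeling, and `group_by_main_topic` would take the per-document topic distributions inferred by the LDA model, so that each short text is assigned to exactly one class.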
Keywords/Search Tags: Text Mining, Micro-blog, LDA Topic Model, Topic Evolution