
Research On Text Mining Based On Topic Model

Posted on: 2016-01-04
Degree: Master
Type: Thesis
Country: China
Candidate: L Wang
Full Text: PDF
GTID: 2308330461976515
Subject: Computer application technology
Abstract/Summary:
With the continuous development of the Internet, online resources, led by text, are growing explosively, and finding the valuable information that interests a user is becoming more and more difficult. To address this, many researchers have studied a variety of algorithms and developed numerous text tools that let users organize and manage text information effectively, helping them find the information they need quickly, accurately, and comprehensively. This work focuses mainly on topic mining. The topic is the soul of a text; mining topic information is a process of discarding the dross and keeping the essence, of moving from perceptual to rational knowledge, and of developing the text further. We first used the LDA topic model to mine the topic information of papers and found it useful for journal recommendation, but we also found problems: the number of topics is very difficult to determine, and topics change over time. We therefore studied how to uncover topic evolution and visualize it more vividly. Uncovering topic evolution matters for understanding topic hotspots, following evolution trends, and predicting topics. The main contents are as follows.

First, we studied the value of topic models for journal recommendation. Combining the latent Dirichlet allocation (LDA) topic model with an SVM classification model greatly improved the journal recommendation results. Choosing where to submit a paper is a difficult academic and practical problem: it involves not only the research topic but also the quality of both the paper and the journal. To help scholars choose the right journal when they submit for publication, this thesis combines the output of the LDA topic model with the SVM classification method to recommend suitable journals.
Compared with other models (SVM-based, content-based, user-based, and journal-similarity-based journal recommendation), topic-based journal recommendation performs better. We also discovered that some journals have published papers that are not consistent with their research topics.

Second, this thesis uses the hierarchical Dirichlet process (HDP) topic mining method to study topic evolution, that is, topics that shrink, expand, are newly born, perish, increase, or decrease, and uses the ThemeRiver visualization method to display these changes vividly. Taking vehicle patents as a starting point for studying topic evolution in the automotive field, it uses the HDP topic model to cluster the patent data, mines the splitting and merging of topics by comparing each year's topics with the topics that HDP clusters from the historical data, and then visualizes the relationships among the topics with a stacked graph. The thesis finds three major topics in the vehicle patent data, and it observes splitting and merging among different topics as well as shrinking, expanding, newborn, and perishing topics.
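The year-to-year topic comparison described above can be illustrated with a minimal sketch. The abstract does not state how topics are compared, so the cosine similarity, the 0.5 threshold, the function names `cosine` and `label_evolution`, and the toy word weights below are all assumptions made for illustration; they are not the thesis's method or data. The sketch takes topic-word weights for two time slices (as an HDP or LDA run might produce) and labels topics as newborn, perishing, splitting, or merging.

```python
from math import sqrt


def cosine(p, q):
    """Cosine similarity between two sparse {word: weight} dicts."""
    dot = sum(w * q.get(word, 0.0) for word, w in p.items())
    norm_p = sqrt(sum(w * w for w in p.values()))
    norm_q = sqrt(sum(w * w for w in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def label_evolution(prev_topics, curr_topics, threshold=0.5):
    """Label topic events between two time slices.

    prev_topics / curr_topics map a topic id to {word: weight}.
    threshold is a hypothetical cut-off, not a value from the thesis.
    """
    # A link joins an earlier topic to a later one that is similar enough.
    links = [(a, b)
             for a, p in prev_topics.items()
             for b, q in curr_topics.items()
             if cosine(p, q) >= threshold]
    events = {"newborn": [], "perishing": [], "splitting": [], "merging": []}
    for a in prev_topics:
        successors = [b for x, b in links if x == a]
        if not successors:
            events["perishing"].append(a)   # no later topic resembles it
        elif len(successors) > 1:
            events["splitting"].append(a)   # continues as several topics
    for b in curr_topics:
        predecessors = [a for a, y in links if y == b]
        if not predecessors:
            events["newborn"].append(b)     # no earlier topic resembles it
        elif len(predecessors) > 1:
            events["merging"].append(b)     # several topics flow into it
    return events


# Toy topic-word weights (invented for illustration only).
prev_year = {
    "engine": {"engine": 0.5, "fuel": 0.5},
    "brake": {"brake": 0.7, "disc": 0.3},
}
curr_year = {
    "hybrid": {"engine": 0.7, "battery": 0.3},
    "fuel_system": {"fuel": 0.8, "pump": 0.2},
    "sensor": {"radar": 0.6, "camera": 0.4},
}
events = label_evolution(prev_year, curr_year)
print(events)
```

In this toy data the "engine" topic links to both "hybrid" and "fuel_system" and is labeled as splitting, "brake" has no successor and is labeled as perishing, and "sensor" has no predecessor and is labeled as newborn. The threshold controls how readily topics are linked, so in practice it would need tuning against real topic distributions.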
Keywords/Search Tags: Topic Mining, Hierarchical Dirichlet Processes, Latent Dirichlet Allocation