Design And Implementation Of A Topic Learning And Ranking System Of Short Texts

Posted on: 2017-07-16
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xin
GTID: 2348330503989877
Subject: Computer software and theory
Abstract/Summary:
With the development of the Internet, people have produced a large amount of short text data online. These data relate to people's daily lives and contain a great deal of valuable knowledge. However, topic learning on short texts faces three challenges. First, short texts are brief and non-standardized. Second, there is no efficient approach for accurately estimating the number of topics. Third, traditional topic models do not rank topics by importance. To address these problems, this thesis designs and implements a topic learning and ranking system for short texts.

The system consists of a data acquisition module, a data preprocessing module, a topic learning module and a topic ranking module. In the data acquisition module, a crawler is designed to collect short text data by topic. In the data preprocessing module, to convert the data into a form suitable for topic models, we design a procedure comprising data cleaning, new word detection, word segmentation and stop word removal. In the topic learning module, we apply the biterm topic model to discover the topics of short texts, and design a procedure comprising dictionary establishment, rare word removal and a parameter estimation algorithm. In the topic ranking module, to make better use of the information produced by topic learning, we design and implement topic ranking by means of topic filtering and topic importance ranking.

To test the effectiveness of the system, this thesis carries out functional testing and experimental analysis on each module. We first analyze the data acquisition and data preprocessing modules, which are validated to be effective. For the short text mining (topic learning) module, the accuracy of the module is confirmed. Finally, we assess the results of topic ranking, which prove to be effective and accurate.
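As a rough illustration of the preprocessing and topic learning steps named above (stop word removal, dictionary establishment, rare word removal, and the biterms that the biterm topic model is fitted on), the following is a minimal Python sketch. It is not the thesis's implementation: the function names, the stop word list and the `min_count` threshold are assumptions made for the example, and the input is assumed to be already cleaned and segmented into words. A biterm here is an unordered pair of words co-occurring in the same short text.

```python
from collections import Counter
from itertools import combinations

# Hypothetical stop word list; a real system would load a full list.
STOP_WORDS = {"the", "a", "of", "is", "in"}


def build_vocabulary(docs, min_count=2):
    """Dictionary establishment with stop word and rare word removal.

    docs is a list of token lists. Words that are stop words or occur
    fewer than min_count times in the corpus are dropped. The threshold
    is an assumed parameter, not a value taken from the thesis.
    """
    counts = Counter(w for doc in docs for w in doc if w not in STOP_WORDS)
    kept = sorted(w for w, c in counts.items() if c >= min_count)
    return {w: i for i, w in enumerate(kept)}


def extract_biterms(doc, vocab):
    """Return all unordered pairs of in-vocabulary word ids in one short text."""
    ids = [vocab[w] for w in doc if w in vocab]
    return [tuple(sorted(pair)) for pair in combinations(ids, 2)]


docs = [
    ["apple", "phone", "price"],
    ["apple", "phone", "the"],
    ["price", "drop"],
]
vocab = build_vocabulary(docs)
biterms = [extract_biterms(doc, vocab) for doc in docs]
```

The resulting biterm lists, pooled over the whole corpus, are what a parameter estimation procedure for the biterm topic model (commonly Gibbs sampling) would take as input; that step is omitted here.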
Keywords/Search Tags: Short text, Topic model, Topic learning, Topic filtering, Topic ranking