
Research And System Implementation Of Topic Model Over Short Text

Posted on: 2020-05-24 | Degree: Master | Type: Thesis
Country: China | Candidate: L Y Li | Full Text: PDF
GTID: 2428330572972303 | Subject: Software engineering
Abstract/Summary:
With the rapid development of the mobile Internet, short text has become a popular information carrier because it makes communication convenient and efficient, which greatly satisfies people's need to take part in online activities anytime and anywhere. Massive short text data also faithfully reflects people's lives in both the real world and the online world. Analyzing and mining this information can better guide real-world behavior and promote new applications that better serve people. Topic mining is a basic text analysis task that infers latent topic information from large collections of short texts. Traditional topic models are mature and stable for topic mining over long text, but short text is limited in length: its content is brief and informal, it is severely sparse, and it lacks context. The sparsity of word co-occurrence information also makes accurate model inference very challenging. To address the sparsity and semantic deficiency of short text, this thesis analyzes existing short text topic models and proposes the SEI-BTM topic model (Semantic Enhancement-TFIDF based Biterm Topic Model). The model has four main features: (1) it uses word pairs as the modeling objects to alleviate the lack of word co-occurrence information in statistical inference; (2) it uses word embedding to train word representations on large-scale short text collections, and uses the similarity of each word pair in that representation as prior information on semantic association, to alleviate the semantic deficiency caused by the limited information in short text; (3) it uses knowledge embedding to train knowledge representations on a large-scale knowledge base, and uses the similarity of each word pair in that representation as prior information on entity association, to supplement the limited content of short text; (4) it adds TF-IDF prior information to limit the inference bias caused by high-frequency words. Four representative short text topic models are selected for comparison experiments, and the results show that the SEI-BTM model performs better than the other models in classification and topic consistency.

As a representative form of short text, online comments are an important way for people to express their views on the Internet, and they genuinely reflect people's lives and thoughts. Online comments are also an information source for many applications, so extracting valuable information from them has long been a research hot spot. In this thesis, the SEI-BTM model is applied to online comment mining, and an online comment mining system is designed and implemented. The system collects comment data from the Internet by web crawling, then cleans and pre-processes the data. The topic mining module covers topic acquisition, topic abstracts, and topic evolution; the opinion mining module covers comment attribute extraction, opinion extraction, and sentiment orientation analysis. The analysis results are displayed visually.
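
The three text-side ingredients named in the abstract (word pairs, embedding similarity, and a frequency-based weight) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the function names, the plain `log(N / df)` form of IDF, and the toy vectors are all assumptions made for illustration, and the abstract does not say how SEI-BTM combines these signals during inference.

```python
import math
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered pair of distinct words in one short text."""
    return [tuple(sorted(pair))
            for pair in combinations(tokens, 2)
            if pair[0] != pair[1]]


def idf_weights(docs):
    """Plain IDF, log(N / df); words found in every document get weight 0."""
    n_docs = len(docs)
    doc_freq = Counter(word for doc in docs for word in set(doc))
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy corpus and toy 2-d "embeddings"; both are made up for this sketch.
    docs = [["apple", "phone", "screen"], ["phone", "battery"]]
    vectors = {"apple": [1.0, 0.0], "phone": [1.0, 1.0],
               "screen": [0.0, 1.0], "battery": [0.5, 1.0]}
    idf = idf_weights(docs)
    for doc in docs:
        for w1, w2 in extract_biterms(doc):
            print(w1, w2, cosine(vectors[w1], vectors[w2]), idf[w1], idf[w2])
```

In a real setting the vectors would come from a trained word embedding or knowledge embedding model, and the similarity and IDF values would enter the topic model as priors rather than being printed.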
Keywords/Search Tags: topic model over short text, online comments, SEI-BTM, sparse, semantics