A Biterm Pseudo Document Topic Model For Short Text

Posted on: 2017-02-07
Degree: Master
Type: Thesis
Country: China
Candidate: L Jiang
Full Text: PDF
GTID: 2308330485961761
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, people are overloaded with more information than they can handle. Classifying text and forwarding relevant information to the right people has therefore become an important problem. Many social network platforms and online news media, including Weibo, news websites and online Q&A communities, now expose people to large volumes of short text, which makes automatic short text clustering particularly useful.

This thesis focuses on accurately discovering topic patterns in short text corpora. To tackle the sparsity of short text, we propose a biterm pseudo document topic model (BPDTM), based on the idea that co-occurring words reveal more semantic information. We conducted several experiments on two real-world short text datasets, evaluating topic coherence, document clustering and document classification. In these experiments BPDTM performed best compared with LDA and BTM, demonstrating its effectiveness for short text topic modeling.

The major contributions of this thesis are three-fold:
1) A method for constructing biterm pseudo documents, based on the word triangle relation defined in this thesis.
2) A biterm pseudo document topic model for short text.
3) A pseudo corpus scaling-down method that reduces time consumption.
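
To make the notion of a biterm concrete, the following is a minimal sketch assuming the usual BTM-style definition: a biterm is an unordered pair of words that co-occur in the same short text. It is not the thesis's pseudo document construction, which relies on a word triangle relation that the abstract does not spell out. The function names `extract_biterms` and `count_biterms` and the toy corpus are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import combinations


def extract_biterms(tokens):
    """Return every unordered word pair (biterm) in one tokenized short text."""
    # Sort each pair so that (a, b) and (b, a) are treated as the same biterm.
    return [tuple(sorted(pair)) for pair in combinations(tokens, 2)]


def count_biterms(corpus):
    """Count how often each biterm occurs across a corpus of tokenized texts."""
    counts = Counter()
    for tokens in corpus:
        counts.update(extract_biterms(tokens))
    return counts


# Hypothetical toy corpus: each short text is already tokenized.
corpus = [
    ["apple", "iphone", "launch"],
    ["apple", "iphone", "price"],
]
biterm_counts = count_biterms(corpus)
```

Aggregating pairs over the whole corpus, rather than counting words per document, is one way co-occurrence evidence can be pooled when each individual text is too short to give reliable statistics.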
Keywords/Search Tags:Topic Model, Topic Clustering, Machine Learning, Short Text, Text Analysis