
Topic Detection On Scientific Research Papers Based On Topic Model

Posted on: 2014-03-20
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Lei
GTID: 2268330422459305
Subject: Software engineering
Abstract/Summary:
With the arrival of the information age, scientific research has developed rapidly and the volume of scientific literature has grown massively. As a result, it has become harder to find what we are looking for: across a large number of journals, simple keyword search alone returns inaccurate results. This thesis makes a new attempt in the field of bioinformatics. Taking the bioinformatics literature as an example, it applies the LDA model, a method well suited to text classification, to topic detection on that literature.

The system discussed in this thesis aims to discover topics in a set of scientific research papers, both to obtain a broad view of the dataset and to identify the hottest topics. The system first uses K-means clustering to obtain the distribution of topics. K-means is easy to understand and works well on short texts, but it did not perform well enough on scientific papers, whose texts are relatively similar to one another. The system then uses the LDA (Latent Dirichlet Allocation) topic model to detect topics. LDA is a probabilistic model with hyperparameters; over a large number of iterations it refines its estimates and yields both the topic distribution of each document and the word distribution of each topic. Moreover, LDA gives a considerable improvement over K-means in both efficiency and results. Finally, the results of topic detection on abstracts of papers published in Bioinformatics are presented together with their analysis.
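The abstract does not say how the LDA distributions are inferred. As an illustration only, the sketch below assumes collapsed Gibbs sampling, one common inference method for LDA, and is not necessarily the procedure used in this thesis. The function name `lda_gibbs`, the defaults (α = 0.1, β = 0.01, 200 iterations, fixed seed) and the toy corpus are hypothetical choices. In this sketch the hyperparameters stay fixed; what the repeated iterations refine are the per-token topic assignments, from which the document-topic distribution θ and the topic-word distribution φ are read off at the end.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Hypothetical sketch of LDA via collapsed Gibbs sampling (assumed method).

    docs: list of documents, each a list of word tokens.
    Returns (theta, phi, vocab): theta[d][k] estimates P(topic k | doc d),
    phi[k][v] estimates P(word vocab[v] | topic k).
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics

    # Count tables: document-topic, topic-word, and tokens per topic.
    n_dk = [[0] * K for _ in docs]
    n_kw = [[0] * V for _ in range(K)]
    n_k = [0] * K

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(K)
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][word_id[w]] += 1
            n_k[k] += 1
        z.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the current token from all counts.
                n_dk[d][k] -= 1
                n_kw[k][v] -= 1
                n_k[k] -= 1
                # Full conditional p(z_i = t | all other assignments), unnormalized.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][v] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Record the resampled topic and restore the counts.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][v] += 1
                n_k[k] += 1

    # Point estimates of the two distributions from the final counts.
    theta = [
        [(n_dk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_kw[k][v] + beta) / (n_k[k] + V * beta) for v in range(V)]
        for k in range(K)
    ]
    return theta, phi, vocab


if __name__ == "__main__":
    # Toy corpus with two loosely separated vocabularies (illustrative data only).
    docs = [
        ["gene", "protein", "sequence", "gene"],
        ["protein", "gene", "expression", "sequence"],
        ["cluster", "kmeans", "distance", "cluster"],
        ["distance", "cluster", "centroid", "kmeans"],
    ]
    theta, phi, vocab = lda_gibbs(docs, num_topics=2, iterations=500)
    for k, row in enumerate(phi):
        top = sorted(range(len(vocab)), key=lambda v: -row[v])[:3]
        print(f"topic {k}:", [vocab[v] for v in top])
    for d, row in enumerate(theta):
        print(f"doc {d}:", [round(p, 2) for p in row])
```

On a real corpus such as the paper abstracts, the text would first need tokenization and stop-word removal, which this sketch omits; a pure-Python sampler is also slow on large collections, so a library implementation would normally be used instead.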
Keywords/Search Tags: Topic detection, K-means algorithm, LDA, Topic Model