The Improvement And Application Of TFIDF Algorithm Based On LDA Topic Model

Posted on: 2016-10-28  Degree: Master  Type: Thesis
Country: China  Candidate: Y Gao  Full Text: PDF
GTID: 2348330485499985  Subject: Software engineering
Abstract/Summary:
The vector space model (VSM), commonly used in topic discovery, represents a text as a vector of coordinates and thereby turns the abstract problem of similarity between texts into a problem of distance between vectors. Although the model is intuitive and easy to understand, it suffers from semantic deficiency: it cannot capture the semantic information of the text.

This thesis aims to improve TFIDF, the weighting algorithm of the VSM. Because each coordinate of a text vector is a term weight, adding semantic information to the vector means adding it to the weights, so the thesis introduces the topic concept of LDA to add topic-level semantic information to keyword weights. Two improved algorithms are proposed, SI-TFIDF and TFIDF-TDF. First, an LDA model is built to obtain θ and φ, the document-topic and topic-word probability distributions. Second, the top-ranked keywords are selected as topic terms and used to compute the topic distribution frequency (TDF), from which the TFIDF-TDF weight is derived. The experiments use the streamlined version of the Sogou Lab corpus. The results show that keyword extraction improves clearly under both algorithms. The improvement from SI-TFIDF is stable, while TFIDF-TDF gives better clustering results than SI-TFIDF when the texts cover many topics.

Finally, news data from a certain period were collected from Sohu News, and the two improved algorithms were used to extract hot topics in online news. The extracted hot topics are consistent with the facts, which further demonstrates the feasibility and effectiveness of the improved algorithms.
Keywords/Search Tags: LDA model, topic, TFIDF, semantic influence, topic distribution frequency
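
The baseline that the abstract starts from, TFIDF weights in a vector space model compared by a vector measure, might be sketched as below. This is only a minimal illustration of the standard, unimproved scheme: the abstract does not give the formulas for SI-TFIDF or TFIDF-TDF, so they are not reproduced here. The function names `tfidf_vectors` and `cosine`, the toy documents, and the choice of cosine similarity and an unsmoothed logarithmic IDF are assumptions made for the example.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn tokenized documents into TF-IDF weight vectors.

    docs: list of token lists. Returns (vocab, vectors), where each
    vector holds one weight per vocabulary term, in vocab order.
    """
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Plain (unsmoothed) inverse document frequency; an assumption.
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for empty docs
        vectors.append([counts[term] / total * idf[term] for term in vocab])
    return vocab, vectors


def cosine(u, v):
    """Cosine similarity between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (hypothetical): two finance texts and one sports text.
docs = [
    ["stock", "market", "rises"],
    ["stock", "market", "falls"],
    ["team", "wins", "match"],
]
vocab, vecs = tfidf_vectors(docs)
print(cosine(vecs[0], vecs[1]))  # the two finance texts share terms
print(cosine(vecs[0], vecs[2]))  # no shared terms
```

Because the weights depend only on term counts, two texts about the same topic that use different words score as unrelated, which is the semantic deficiency the thesis sets out to address with LDA topic information.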