Research On Automatic Keyword Extraction Algorithm Based On Improved TFIDF

Posted on: 2016-08-11
Degree: Master
Type: Thesis
Country: China
Candidate: K Y Yang
Full Text: PDF
GTID: 2308330470460383
Subject: Computer Science and Technology
Abstract/Summary:
The traditional TFIDF algorithm has two shortcomings. Within a text, TFIDF relies on term frequency alone and ignores the other characteristics of a feature item that also express how important a word is. Across the text set, TFIDF leaves the dependence between a feature item and the categories of the text set out of account. Known improvements to TFIDF mostly address the second problem: they either modify the IDF formula directly or add a new factor, based on category distribution information, to adjust the IDF result. Applied to text categorization, these improved algorithms give better classification results, but for keyword extraction they either cannot be applied directly or give unsatisfactory results. This thesis first addresses the problem that TFIDF computes IDF without considering how words are distributed across the text set and depends heavily on a word's document frequency. It proposes combining information gain with dispersion to quantify the distribution of words in the text set, then using that quantity to adjust the IDF result. The thesis then addresses the problem that term frequency (TF) alone carries too little information about a word within a text. It proposes representing a word's importance to the text with multiple features, fusing word frequency with word length, part of speech, word location, and word span. The experimental results show that the improved algorithm is noticeably effective at extracting keywords.
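The abstract gives no formulas, so the following Python sketch only illustrates the two standard ingredients it names: the baseline TFIDF score and the information gain of a word with respect to text categories. The function names, the toy documents, and the category labels are assumptions made for illustration. How the thesis combines information gain with dispersion to adjust IDF, and how it fuses the in-text features, is not stated in the abstract and is not reproduced here.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Baseline TFIDF. docs is a list of token lists; returns one {term: score} dict per doc."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


def entropy(counts):
    """Shannon entropy (base 2) of a list of class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def information_gain(term, docs, labels):
    """Standard information gain: H(C) - P(t) H(C|t) - P(not t) H(C|not t)."""
    classes = sorted(set(labels))
    with_term = [lab for doc, lab in zip(docs, labels) if term in doc]
    without_term = [lab for doc, lab in zip(docs, labels) if term not in doc]
    h_conditional = 0.0
    for subset in (with_term, without_term):
        if subset:  # skip an empty split to avoid dividing by zero
            weight = len(subset) / len(labels)
            h_conditional += weight * entropy([subset.count(c) for c in classes])
    return entropy([labels.count(c) for c in classes]) - h_conditional


# Hypothetical toy corpus with two categories.
docs = [
    ["keyword", "extraction", "keyword"],
    ["keyword", "ranking"],
    ["football", "match"],
    ["football", "keyword"],
]
labels = ["nlp", "nlp", "sport", "sport"]

scores = tf_idf(docs)
for term in ("keyword", "football"):
    print(term, round(information_gain(term, docs, labels), 4))
```

In this toy corpus, "keyword" is the most frequent word in the first document but scores below "extraction" there, because IDF depends only on how many documents contain a word. Information gain adds the category view that plain IDF lacks: "football" separates the two categories perfectly, while "keyword" appears in both.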
Keywords/Search Tags: automatic keyword extraction, TFIDF, information gain, dispersion, multiple feature fusion