The Research Of Keyword Extraction Technology In Multi-Document

Posted on: 2010-03-04; Degree: Master; Type: Thesis
Country: China; Candidate: J Yang; Full Text: PDF
GTID: 2178360272985241; Subject: Computer application technology
Abstract/Summary:
Automatic keyword extraction uses a computer to extract the words that reflect the subject content of a document. It is also regarded as automatic keyword labeling, and it gives users a compact summary that makes locating information simpler. This thesis studies algorithms that find the topic of a set of documents sharing the same topic by extracting keywords from them. The main results are as follows. First, the thesis proposes the ATF*PDF method for computing word weights across multiple documents. The more documents contain a given word, the more likely that word is an important element expressing the topic of the document set. When a word's weight in ATF*PDF grows exponentially with its document frequency, keyword extraction results are better than with a linear relationship. ATF*PDF also takes into account the influence of the size of each single document on a word's weight. Second, the thesis provides a keyword extraction method based on united weight and improves the TextRank method for extracting keywords from multiple documents. To address redundancy among candidates, the united-weight method unites the weights of words that have high semantic similarity to one another and thereby adjusts the candidate ranking used to select keywords. In addition, because words expressing the same topic have strong semantic relations, the improved TextRank method lets words with stronger semantic relations reinforce one another's importance and recalculates the candidate weights in a TextRank model.
Experiments show that, compared with a keyword-based cluster-labeling algorithm, the two methods proposed in this thesis improve keyword extraction. The thesis also combines clustering with a multi-document keyword extraction method to build a new clustering search engine, compares it with the commercial search engine Vivisimo, and discusses the respective advantages and disadvantages of the two. Finally, the work of the thesis is summarized, and directions for further study of multi-document keyword extraction are presented...
Keywords/Search Tags:Keyword Extraction, United Weight, TextRank Model, Multi-document, Clustering
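
The abstract does not give the ATF*PDF formula, so the following is only a rough sketch of the idea it describes: a word's weight grows exponentially with the proportion of documents that contain it, and term frequency is normalized by document length so that long documents do not dominate. The function name atf_pdf_weights, the choice to average the normalized frequency over all documents, and the exp(df / N) factor are assumptions made for illustration, not the thesis's actual definition.

```python
import math
from collections import Counter


def atf_pdf_weights(docs):
    """Hypothetical ATF*PDF-style weighting (illustrative assumption only).

    docs: list of documents, each a list of tokens.
    Returns a dict mapping each word to its weight.
    """
    n_docs = len(docs)
    tf_sum = Counter()  # sum of length-normalized term frequencies per word
    df = Counter()      # number of documents containing each word

    for tokens in docs:
        if not tokens:
            continue  # skip empty documents to avoid division by zero
        counts = Counter(tokens)
        for word, count in counts.items():
            tf_sum[word] += count / len(tokens)  # normalize by document size
            df[word] += 1

    # Average normalized frequency times an exponential document-frequency factor.
    return {
        word: (tf_sum[word] / n_docs) * math.exp(df[word] / n_docs)
        for word in tf_sum
    }


docs = [
    ["keyword", "extraction", "topic"],
    ["keyword", "cluster", "topic", "topic"],
    ["keyword", "search"],
]
weights = atf_pdf_weights(docs)
ranked = sorted(weights, key=weights.get, reverse=True)
```

In this toy input, "keyword" appears in all three documents and therefore receives the largest exponential factor, which is the behavior the abstract attributes to the exponential form of the weight.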