The Research Of Keyword Extraction Technology In Multi-Document

Posted on:2010-03-04

Degree:Master

Type:Thesis

Country:China

Candidate:J Yang

Full Text:PDF

GTID:2178360272985241

Subject:Computer application technology

Abstract/Summary:

Automatic keyword extraction uses a computer to extract the words that reflect the subject content of a document. It is also regarded as automatic keyword labeling, and it gives users a compact summary that makes information easier to locate. This thesis studies algorithms that find the topic of a set of documents sharing the same topic by extracting keywords from them. The main results are as follows.

First, this thesis proposes the ATF*PDF method for computing word weights across multiple documents. The more documents contain a given word, the more likely that word is an important element expressing the topic of the document set. When the word weight in ATF*PDF grows exponentially with document frequency, keyword extraction results are better than with a linear relationship. ATF*PDF also takes into account the influence of the size of each single document on a word's weight.

Second, this thesis provides a keyword extraction method based on united weight and improves the TextRank method for keyword extraction from multiple documents. To deal with redundancy among candidate words, the united-weight method combines the weights of candidates that have high semantic similarity to one another, which adjusts the ranking of candidates before keywords are selected. In addition, because words expressing the same topic have strong semantic relations, the improved TextRank method lets words with stronger semantic relations reinforce each other's importance, and the candidate weights are recalculated in the TextRank model.
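The abstract does not give the ATF*PDF formula, so the following is only a minimal sketch of the idea it describes. It assumes that "ATF" is a term frequency normalized by document length and averaged over the document set, and that the exponential dependence on document frequency takes the form exp(df / N). The function name `atf_pdf_weights` and both of these choices are illustrative assumptions, not the thesis's definition.

```python
import math
from collections import Counter


def atf_pdf_weights(docs):
    """Sketch of an ATF*PDF-style weight (assumed form, not the thesis's formula).

    docs: list of token lists, one per document on the same topic.
    Returns a dict mapping each word to its weight.
    """
    n_docs = len(docs)
    tf_sum = Counter()  # sum of length-normalized term frequencies per word
    df = Counter()      # number of documents that contain each word

    for tokens in docs:
        if not tokens:
            continue  # an empty document contributes nothing
        counts = Counter(tokens)
        for word, count in counts.items():
            # Dividing by document length accounts for document size.
            tf_sum[word] += count / len(tokens)
            df[word] += 1

    # Assumed combination: average normalized TF times exp(df / N),
    # so the weight grows exponentially with document frequency.
    return {
        word: (tf_sum[word] / n_docs) * math.exp(df[word] / n_docs)
        for word in tf_sum
    }


if __name__ == "__main__":
    sample = [
        ["keyword", "extraction", "topic"],
        ["keyword", "clustering"],
        ["keyword", "topic", "search", "engine"],
    ]
    for word, weight in sorted(atf_pdf_weights(sample).items(),
                               key=lambda item: -item[1]):
        print(f"{word}\t{weight:.3f}")
```

Under these assumptions, a word that appears in every document outranks one that appears in only a few, even when their per-document frequencies are similar.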
Experiments show that, compared with a keyword-based cluster-labeling algorithm, the two methods proposed in this thesis improve keyword extraction. This thesis also combines clustering with a multi-document keyword extraction method to build a new clustering search engine, compares it with the commercial search engine Vivisimo, and discusses the respective advantages and disadvantages of the two. Finally, the work of this thesis is summarized and directions for further study of multi-document keyword extraction are presented...
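The TextRank modification mentioned in the abstract, where semantically related words reinforce each other's importance, could look roughly like the sketch below. The abstract does not specify the graph construction or the similarity measure, so everything here is an assumption: edges are weighted by a caller-supplied similarity score, the initial word weights act as a prior, and the name `semantic_textrank` is hypothetical.

```python
def semantic_textrank(prior, similarity, damping=0.85, iterations=50):
    """Sketch of a similarity-weighted TextRank (assumed design).

    prior: dict mapping each candidate word to a positive initial weight.
    similarity: dict mapping (word_a, word_b) pairs to a score in (0, 1].
    Returns a dict mapping each word to its recalculated score.
    """
    total_prior = sum(prior.values())
    if total_prior <= 0:
        raise ValueError("prior weights must sum to a positive value")

    words = sorted(prior)
    base = {w: prior[w] / total_prior for w in words}

    # Build an undirected graph whose edge weights are the similarities.
    edges = {w: {} for w in words}
    for (a, b), score in similarity.items():
        if a in edges and b in edges and a != b and score > 0:
            edges[a][b] = score
            edges[b][a] = score
    out_sum = {w: sum(edges[w].values()) for w in words}

    scores = dict(base)
    for _ in range(iterations):
        updated = {}
        for w in words:
            # Each neighbor passes on its score in proportion to edge weight.
            incoming = sum(
                scores[v] * weight / out_sum[v]
                for v, weight in edges[w].items()
            )
            # Words without neighbors keep only the prior share.
            updated[w] = (1 - damping) * base[w] + damping * incoming
        scores = updated
    return scores


if __name__ == "__main__":
    prior = {"search": 1.0, "retrieval": 1.0, "banana": 1.0}
    similarity = {("search", "retrieval"): 0.9}
    for word, score in sorted(semantic_textrank(prior, similarity).items(),
                              key=lambda item: -item[1]):
        print(f"{word}\t{score:.3f}")
```

In this sketch the prior could come from any word-weighting step, and the similarity scores from any semantic resource; both are left to the caller because the abstract does not name them.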

Keywords/Search Tags:

Keyword Extraction, United Weight, TextRank Model, Multi-document, Clustering

Related items

1	Research And Implementation Of News Keyword Extraction Method Based On Semantic Clustering And Weighted TextRank
2	Research On Keyword Extraction Method Based On Document Topical Structure And Word Graph Iteration
3	Research On The Optimization Of TextRank Keyword Extraction Algorithm And SOM Text Clustering Model
4	Chinese Multi-document Automatic Summarization Extraction Based On The Combination Of LDA And TextRank
5	Automatic Abstract Extraction Based On Keyword And Graph Model
6	Chinese Single Document Abstract Research Based On Doc2Vec And Improved TextRank
7	Research And Application Of K-core-based Graph Decomposition TextRank Keyword Extraction Technology
8	Research And Implementation Of Keyword Extraction For Work Report
9	Complex Text Keyword Mining Method Based On Graph Embedding Model
10	An Automatic Extraction Method For Chinese Article Keywords Based On TextRank And Similarity Of Word Items