
Research On The Keyphrase Extraction And Relevant Technology

Posted on: 2011-11-04    Degree: Master    Type: Thesis
Country: China    Candidate: Z Jiang    Full Text: PDF
GTID: 2178330338979998    Subject: Computer Science and Technology
Abstract/Summary:
Keyphrase extraction is an important technique in text information processing. In the Internet age, the volume of online documents has grown geometrically, and information explosion has become the defining characteristic of the era, making it increasingly difficult to search for and make use of online information. Keywords, which serve as a brief summary of an article, help readers grasp its overall content quickly and save them time. Keywords also play an important role in automatic abstracting, information retrieval, text classification, and text clustering. In practice, only a small fraction of documents are annotated with keywords, and manual annotation is both time-consuming and subjective. Automatic keyword extraction is therefore needed.

A keyword should reflect not only the main content of a document but also what distinguishes that document from others. This paper treats keywords as keyphrases, because a keyword usually consists of more than one word. Keyphrase extraction has been a major research topic in information retrieval. This paper focuses on the following problems:

First, the construction of keyphrase-related resources. We classify the methods for processing structured data and explain how resources can be constructed by processing Internet data into structured form.

Second, keyphrase extraction using document structure. This paper treats keyphrase extraction as a classification task, using SVM to build the classification model and CRF to extract keyphrases. Test results show that this approach improves markedly on previous methods in both precision and recall.

Third, software testing. We used standard software testing methods to test the keyphrase extraction system. This paper collects and classifies popular software testing techniques and methods. All testing experiments were performed with JUnit; by dividing the experiments into five parts and testing them separately, we identified the key factors and aspects that require attention.
Keywords/Search Tags: keyphrase, information extraction, feature selection, term frequency, inverse document frequency
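
The tags above name term frequency and inverse document frequency, which are commonly used as features when keyphrase extraction is framed as classification. The abstract does not give the feature set or candidate-generation rules actually used, so the following is only a minimal sketch under assumptions: candidates are taken to be all word n-grams of up to three words, tokenization is plain whitespace splitting, and the function names `candidates` and `tfidf_scores` are hypothetical. In a pipeline like the one described, scores of this kind would be passed as features to a classifier such as an SVM rather than used as the final ranking.

```python
import math
from collections import Counter


def candidates(tokens, max_len=3):
    """Yield every contiguous n-gram of up to max_len words as a string.

    Assumption: all n-grams are candidates; a real system would filter
    them (for example by part of speech or stop words).
    """
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            yield " ".join(tokens[i:i + n])


def tfidf_scores(docs, max_len=3):
    """Return one {phrase: tf-idf score} dict per document in docs."""
    # Count candidate phrases per document (naive whitespace tokenization).
    per_doc = [Counter(candidates(d.lower().split(), max_len)) for d in docs]

    # Document frequency: number of documents containing each phrase.
    df = Counter()
    for counts in per_doc:
        df.update(counts.keys())

    n_docs = len(docs)
    scores = []
    for counts in per_doc:
        total = sum(counts.values())
        scores.append({
            # tf = relative frequency in this document; idf = log(N / df).
            phrase: (count / total) * math.log(n_docs / df[phrase])
            for phrase, count in counts.items()
        })
    return scores
```

A phrase that occurs in every document receives an inverse document frequency of zero, which matches the abstract's point that a keyphrase should reflect what is specific to a document and not only its content.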