Research On Keyword Extraction And Structured List Data Extraction

Posted on: 2006-05-09
Degree: Master
Type: Thesis
Country: China
Candidate: H Xu
Full Text: PDF
GTID: 2178360212967471
Subject: Computer Science and Technology
Abstract/Summary:
How to obtain information from the Internet intelligently, rapidly, and effectively has become an urgent problem. Keyword extraction and structured list data extraction are both important methods for obtaining information from the Internet rapidly and accurately. The accuracy of keyword extraction affects the representation and extraction of the knowledge embedded in documents. Structured list data extraction algorithms can wrap semi-structured data into structured data. Both problems have become important research topics in information extraction and information retrieval.

The main task of this thesis is to make full use of the structure and content information of documents, to analyze the problems of keyword extraction and structured list data extraction, and to propose effective algorithms for both.

For keyword extraction, classification theory is applied to the design and implementation of the algorithm. A Support Vector Machine is used as the classification model, and two kinds of features, global context features and local context features, are proposed as the classification features. The experimental results show that the proposed algorithm significantly outperforms existing algorithms. The extracted keywords are also used in a document classification experiment, where they significantly increase classification accuracy, which indicates that the extracted keywords are effective.

For structured list data extraction, this thesis proposes a separator selection algorithm based on statistical analysis, and designs and implements a list extraction algorithm based on clustering theory that makes use of both the physical arrangement and the content information of the list. The experimental results show that the proposed clustering-based algorithm extracts structured list data effectively.
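
The idea of selecting a separator by statistical analysis can be illustrated with a minimal sketch. This is not the algorithm of the thesis, whose details the abstract does not give. It assumes that a list arrives as lines of text, and it scores each candidate separator by how many lines contain it and how uniform its per-line count is. The candidate set, the scoring formula, and the names `separator_score`, `select_separator`, and `split_records` are all hypothetical, and the clustering stage that uses layout and content information is not shown.

```python
from statistics import mean, pstdev

# Hypothetical candidate set; the abstract does not list the candidates used.
CANDIDATE_SEPARATORS = [",", ";", "|", "\t", " - "]


def separator_score(lines, sep):
    """Score a separator by its coverage and the regularity of its per-line count.

    Assumed heuristic: a good list separator appears in most lines and
    appears about the same number of times in each of them.
    """
    counts = [line.count(sep) for line in lines]
    avg = mean(counts)
    if avg == 0:
        return 0.0
    # Fraction of lines that contain the separator at least once.
    coverage = sum(1 for c in counts if c > 0) / len(counts)
    # Coefficient of variation is 0 when every line has the same count,
    # so regularity is 1.0 for a perfectly regular separator.
    regularity = 1.0 / (1.0 + pstdev(counts) / avg)
    return coverage * regularity


def select_separator(lines, candidates=CANDIDATE_SEPARATORS):
    """Return the best-scoring candidate, or None if none occurs at all."""
    lines = [line for line in lines if line.strip()]
    if not lines:
        return None
    best = max(candidates, key=lambda sep: separator_score(lines, sep))
    return best if separator_score(lines, best) > 0 else None


def split_records(lines, sep):
    """Split each non-empty line into trimmed fields on the chosen separator."""
    return [[field.strip() for field in line.split(sep)]
            for line in lines if line.strip()]


rows = [
    "Alice | 30 | Paris",
    "Bob | 25 | Rome, Italy",
    "Carol | 41 | Oslo",
]
chosen = select_separator(rows)
records = split_records(rows, chosen)
```

In this example the comma occurs in only one of the three rows, so it scores far below the vertical bar, which occurs exactly twice in every row. A real extractor would still need to group the resulting fields into columns, which is where a clustering step such as the one described in the thesis would come in.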
Keywords/Search Tags:keyword extraction, semi-structured document, structured list, information extraction, machine learning