The Literature Information Retrieval And Matching From The Web

Posted on:2011-09-04

Degree:Master

Type:Thesis

Country:China

Candidate:L C Wang

Full Text:PDF

GTID:2178330332460839

Subject:Management Science and Engineering

Abstract/Summary:

With the development of Internet technology, the Web has become a huge data source containing a wealth of information, and how to use and manage this data source effectively has become a hot topic. In the area of literature information management, few methods exist for extracting literature information automatically and serving it to users through a uniform interface. The development of the research management system of the Institute of Management of Dalian University of Technology has also put forward new requirements for the automatic collection of literature information. Therefore, our research focuses on methods for Web information extraction and for extracting literature information from the Web.

In the area of information extraction from topical Web pages, we propose a method based on the length of HTML nodes. This method identifies the main content of a Web page using features of the page structure, after which the information can be extracted from the page. Compared with traditional methods, it achieves higher precision with lower complexity. Our experiments show that the method works well for extracting information from Web pages.

In the area of automatic literature information extraction, we propose a method based on the HTML tree and templates, which extracts information from literature Web pages by exploiting the high similarity of their structure. We use the structural similarity of Web pages to generate an information extraction template, and the automatically generated template is then used to extract the literature information. The automatic classification of Web pages by similarity is very precise, so information can be extracted precisely from all pages of one kind using the same template. Finally, our experiments confirm the effectiveness of the method.

At the end of this thesis, we apply our method in the development of the research management system to collect teachers' literature information automatically, with good results.
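
The abstract does not spell out the node-length method, so the following is only a minimal sketch of the general idea, not the thesis's actual algorithm. It assumes that the main content sits in the subtree holding the dominant share of the page's text. The names `Node`, `TreeBuilder`, `find_main_content`, and the 0.6 `share` threshold are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Tags whose text is never page content, and tags that have no closing tag.
SKIP_TAGS = {"script", "style"}
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """One element of a simplified HTML tree (hypothetical helper)."""

    def __init__(self, tag, attrs=None, parent=None):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.children = []
        self.own_text = []  # stripped text chunks directly inside this element

    def text_length(self):
        """Total number of text characters in this subtree."""
        own = sum(len(chunk) for chunk in self.own_text)
        return own + sum(child.text_length() for child in self.children)


class TreeBuilder(HTMLParser):
    """Builds a Node tree from HTML, tolerating stray closing tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return  # void elements carry no text and are never closed
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_endtag(self, tag):
        # Walk up to the nearest open element with this tag; ignore if none.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent

    def handle_data(self, data):
        text = data.strip()
        if text and self.current.tag not in SKIP_TAGS:
            self.current.own_text.append(text)


def find_main_content(html, share=0.6):
    """Descend from the root while one child holds at least `share` of the
    current node's text; the node where this stops is taken as main content.
    The threshold is an assumed value, not one taken from the thesis."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    node = builder.root
    while node.children:
        total = node.text_length()
        best = max(node.children, key=lambda child: child.text_length())
        if total == 0 or best.text_length() < share * total:
            break
        node = best
    return node


SAMPLE_HTML = """
<html><body>
<div id="nav"><a>Home</a><a>About</a></div>
<div id="main">
  <p>Web information extraction identifies the main content of a page.</p>
  <p>Long text nodes usually belong to the article body, not menus.</p>
</div>
<div id="footer">Copyright</div>
</body></html>
"""

main_node = find_main_content(SAMPLE_HTML)
print(main_node.tag, main_node.attrs.get("id"))
```

On the sample page, the descent stops at the `div` holding both paragraphs, because neither paragraph alone reaches the assumed 60% share of that container's text. A real extractor would likely also weigh link density and tag types, which this sketch leaves out.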

Keywords/Search Tags:

Web information extraction, HTML tree, Web pages structure, similarity, template generation

Related items

1	Based On The Html Pages Of Web Information Extraction
2	Research And Implementation Of Fit-Template System Based On Mas
3	Tag Tree Template In The Pages Of Critical Information Extraction And Topic Identification
4	Research And Application Of Automatic Data Extraction From Template-generated Web Pages
5	The Research On Text Extraction From Web Pages
6	Research And Application On The Technology Of Web Information Extraction Based On The HTML
7	The Implementation And Application Of Extracting Structured Data From Web Pages
8	Research On Detection Of Similar Web Pages Based On Text Structure Tree
9	Research Of Web Information Extraction Based On Table Structure
10	Research On The Technology Of The Web Employment Information Extraction Based On The HTML