The Literature Information Retrieval And Matching From The Web

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: L C Wang
Full Text: PDF
GTID: 2178330332460839
Subject: Management Science and Engineering
Abstract/Summary:
With the development of Internet technology, the Web has become a huge data source holding a great deal of information, and how to use and manage this data source effectively has become a hot topic. In the area of literature information management, few methods exist for extracting literature information automatically and serving it to users through a uniform interface. The development of the research management system of the Institute of Management of Dalian University of Technology has also created a new requirement for the automatic collection of literature information. Our research therefore focuses on methods for Web information extraction and for extracting literature information from the Web.

For information extraction from topic-specific Web pages, we propose a method based on the length of the HTML nodes. The method uses structural features of a Web page to identify its main content, from which the information can then be extracted. Compared with traditional methods, it achieves higher precision at lower complexity. Our experiments show that the method works well for extracting information from Web pages.

For automatic literature information extraction, we propose a method based on the HTML tree and templates, which exploits the high structural similarity among literature Web pages. We use the structural similarity of Web pages to generate an information extraction template, and the automatically generated template is then used to extract the literature information. Classifying Web pages automatically by their similarity is highly precise, and the information in all pages of one class can be extracted precisely with the same template. Our experiments confirm the effectiveness of the method.

Finally, we applied our method in the development of the research management system to collect teachers' literature information automatically, and we achieved good results.
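
The abstract does not specify how node length is measured, so the following is only a minimal sketch of one plausible reading: treat the element that directly contains the most text as the main content of the page. The names `NodeLengthParser` and `main_content`, the choice of "direct text length" as the score, and the sample page are all assumptions made for illustration and are not taken from the thesis.

```python
from html.parser import HTMLParser

# Text inside these tags is never readable page content.
SKIP_TAGS = {"script", "style"}
# Void elements have no closing tag, so they are never pushed on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class NodeLengthParser(HTMLParser):
    """Record, for every element, the text that sits directly inside it."""

    def __init__(self):
        super().__init__()
        self.nodes = []   # one dict per element: {"tag": str, "text": [str, ...]}
        self.stack = []   # indices into self.nodes for the currently open elements

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        self.nodes.append({"tag": tag, "text": []})
        self.stack.append(len(self.nodes) - 1)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray closing tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.nodes[self.stack[i]]["tag"] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if not text or not self.stack:
            return
        if any(self.nodes[i]["tag"] in SKIP_TAGS for i in self.stack):
            return
        # Credit the text to the innermost open element.
        self.nodes[self.stack[-1]]["text"].append(text)


def main_content(html):
    """Return the text of the element with the longest direct text (assumed heuristic)."""
    parser = NodeLengthParser()
    parser.feed(html)
    parser.close()
    best = max(
        parser.nodes,
        key=lambda node: sum(len(t) for t in node["text"]),
        default=None,
    )
    return " ".join(best["text"]) if best else ""


if __name__ == "__main__":
    sample = (
        "<html><body>"
        "<div><a href='/'>Home</a> <a href='/list'>List</a></div>"
        "<div><p>A long paragraph describing the paper, its authors and its "
        "venue, which dominates the page by text length.</p></div>"
        "</body></html>"
    )
    print(main_content(sample))
```

Scoring by direct text keeps the sketch short, but it ignores text spread over many sibling nodes; a fuller version would probably aggregate lengths over subtrees or compare them against link density.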
Keywords/Search Tags: Web information extraction, HTML tree, Web page structure, similarity, template generation