Research and Implementation of an XML-Based Web Information Extraction System

Posted on: 2012-04-04
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
GTID: 2178330335972224
Subject: Computer Science and Technology
Abstract/Summary:
As computer science and the Internet continue to develop, the web has become essential to work and daily life. Web information resources are growing exponentially, and the web has become a huge repository of information in which it is increasingly difficult to obtain the desired information accurately and efficiently. How to extract useful information from this repository has become a subject of study for many researchers, and web information extraction technology has emerged in response. Building on research into existing web information extraction technology and combining it with standard XML technology, this thesis proposes an XML-based web information extraction technology. The main contributions are as follows:
1. Building on previous technical achievements, existing information extraction techniques are integrated and extended, and an XML-based framework for a Web information extraction system model is designed.
2. The key technologies of information extraction are studied, the extraction process is described, and the extraction rules and a method for generating the extraction configuration file are proposed. On this basis, the main functions of the Web information extraction system are implemented.
3. The extraction results are classified using naive Bayes theory. A Chinese Web text classification system model is designed within the framework of the information extraction system model.
4. The extraction results are XML documents. Based on an analysis of current database storage technology, different methods of storing the extraction results in a database are discussed.
The XML-based Web information extraction system designed here addresses the problem of web information extraction well; experimental results show that the system achieves higher recall and precision.
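The abstract does not give the format of the extraction rules or the configuration file, so the following is only a minimal sketch of the general idea of rule-driven extraction with XML output. It assumes the page is well-formed XHTML and that each rule maps an output field to an ElementTree path expression; the names RULES, extract, and SAMPLE_PAGE, and the sample data, are hypothetical and not taken from the thesis.

```python
import xml.etree.ElementTree as ET

# Hypothetical extraction rules: output field name -> ElementTree path expression.
# The thesis's actual rule and configuration format is not described in the abstract.
RULES = {
    "title": ".//h1",
    "price": ".//span[@class='price']",
}


def extract(page_source: str, rules: dict) -> ET.Element:
    """Apply each rule to a well-formed XHTML page and return one XML record."""
    page = ET.fromstring(page_source)
    record = ET.Element("record")
    for field, path in rules.items():
        node = page.find(path)  # first element matching the rule, or None
        child = ET.SubElement(record, field)
        # Leave the field empty when the rule matches nothing.
        child.text = "".join(node.itertext()).strip() if node is not None else ""
    return record


# Made-up sample page, used only to exercise the sketch.
SAMPLE_PAGE = """<html><body>
<h1>XML in Practice</h1>
<div><span class="price">29.90</span></div>
</body></html>"""

record = extract(SAMPLE_PAGE, RULES)
result_xml = ET.tostring(record, encoding="unicode")
print(result_xml)
```

Keeping the rules in a plain mapping separates what to extract from how the page is traversed, which is roughly the role an extraction configuration file would play. Real web pages are rarely well-formed XML, so a practical system would first need an HTML cleaning or parsing step.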
Keywords/Search Tags: XML, Web information extraction, extraction rules, text classification, XML data storage