XML-based WEB Information Extraction System Research And Implementation

Posted on:2012-04-04

Degree:Master

Type:Thesis

Country:China

Candidate:Y Tian

Full Text:PDF

GTID:2178330335972224

Subject:Computer Science and Technology

Abstract/Summary:

As computer science and the Internet continue to develop, the web has become essential to our work and daily life. Web information resources are growing exponentially, and the web has become a huge repository of information. As a result, obtaining the desired information accurately and efficiently is becoming more and more difficult, and extracting useful information from this repository has become the subject of much research. Web information extraction technology was developed to address this problem.

Based on research into existing web information extraction technology, and combined with standard XML technology, this thesis proposes an XML-based web information extraction technology. The main contributions are as follows:

1. Building on previous technical achievements, existing information extraction techniques are integrated and extended, and an XML-based framework for a web information extraction system model is designed.

2. The key technologies of information extraction are studied and the extraction process is described. Extraction rules and a method for generating the extraction configuration file are proposed. Ultimately, the main functions of the web information extraction system are implemented.

3. The extraction results are classified using Naive Bayesian theory. A Chinese web text classification system model is designed within the framework of the information extraction system model.

4. The extraction results are XML data documents. Based on an analysis of current database storage technology, different methods for storing the extraction results in a database are discussed.

The XML-based web information extraction system designed here can better solve the problem of web information extraction. Experimental results show that the system achieves high recall and precision rates.
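To make the idea of an XML extraction configuration file concrete, here is a minimal sketch in Python. It is not the thesis's actual rule format, which the abstract does not specify. The `<extraction>` and `<rule>` elements, the `field` attribute, the use of regular expressions as rules, and the sample page are all assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical extraction configuration: each <rule> names an output field
# and holds a regular expression whose first group captures the field value.
CONFIG_XML = """
<extraction>
  <rule field="title"><![CDATA[<h1>(.*?)</h1>]]></rule>
  <rule field="price"><![CDATA[<span class="price">(.*?)</span>]]></rule>
</extraction>
"""


def load_rules(config_xml):
    """Parse the XML configuration into (field name, compiled pattern) pairs."""
    root = ET.fromstring(config_xml)
    return [
        (rule.get("field"), re.compile(rule.text.strip(), re.S))
        for rule in root.findall("rule")
    ]


def extract(html, rules):
    """Apply each rule to the page and collect the matches in an XML record."""
    record = ET.Element("record")
    for field, pattern in rules:
        match = pattern.search(html)
        if match:
            ET.SubElement(record, field).text = match.group(1).strip()
    return record


# Sample page, invented for this example.
PAGE = '<html><body><h1>Sample Book</h1><span class="price">42.00</span></body></html>'

result = extract(PAGE, load_rules(CONFIG_XML))
print(ET.tostring(result, encoding="unicode"))
# prints: <record><title>Sample Book</title><price>42.00</price></record>
```

Keeping the rules in a separate XML file means a new site can be supported by editing configuration rather than code, and the extraction result is itself an XML document that can be passed on to classification or storage.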

Keywords/Search Tags:

XML, Web information extraction, extraction rules, Text classification, XML data storage

Related items

1	Design And Implementation Of Web Information Extraction Rules
2	Text Information Extraction Based On Domain Rules And Deep Learning
3	The Research And Implementation Of Web Text Classification That Use Table Information
4	Research On Specialty Knowledge Retrieval Method Based On Web Information Extraction
5	Contributions To Several Key Issues Of Associative Text Classification
6	Study And Design Of Text Information Extraction And Classification System
7	A Research On Automatic WEB Documents Extraction And Classification
8	Research On Language And Key Techniques For Accurate Information Extraction Rules Towards Complex Web
9	Design And Implementation Of Web Information Extraction System SEU-WIE
10	Visual Web Page Information Extraction And Text Feature Word Extraction Technology Research