Web Information Extraction Based On Inductive Study

Posted on:2010-11-05

Degree:Master

Type:Thesis

Country:China

Candidate:C Y Zhang

Full Text:PDF

GTID:2178360275465309

Subject:Computer application technology

Abstract/Summary:

With the rapid development of the Internet, it has become one of the most important knowledge repositories. Extracting and using this knowledge quickly and efficiently therefore has good application prospects and value. The inherent characteristics of the Internet (large volume, semi-structured content, and constant change) make information extraction complex and raise scalability and adaptability problems. The emergence of XML technology, however, provides an opportunity to solve these problems in web information extraction. After analyzing and reviewing XML and information extraction technology, the paper finds that defining extraction rules efficiently is the main difficulty in current web information extraction. To address the existing problems, the paper proposes a solution for web information extraction based on public path study and studies the related technologies in depth. The key problem in information extraction is how to generate accurate, general, and robust extraction rules. The paper uses the strengths of the XSL and XPath standards in locating and preserving data to solve this problem. In the method, inductive learning is carried out automatically through training samples and heuristic processing, and the information blocks that users are interested in are located accurately from the patterns that recur across the samples. From these, XSLT-based extraction rules are generated, and information is then extracted automatically according to the rules. Finally, in connection with a real project in which the author participated, a prototype information extraction system with good human-computer interaction was built in C++ on the Windows platform. Experimental results show that the system can extract information of interest from web pages in the target field, and that it offers a good user experience, scalability, and adaptability.
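
The idea of inducing an extraction rule from recurring paths in labelled samples can be illustrated with a minimal sketch. It is not the thesis's algorithm or its C++ implementation: the function names (induce_path, to_xslt), the sample page, and the generalization heuristic (keep a positional index only when every sample agrees on it) are assumptions made for illustration, and Python is used only for brevity.

```python
import xml.etree.ElementTree as ET


def split_step(step):
    """Split a path step such as 'li[2]' into ('li', 2); 'ul' gives ('ul', None)."""
    if step.endswith("]"):
        tag, _, index = step[:-1].partition("[")
        return tag, int(index)
    return step, None


def induce_path(sample_paths):
    """Generalize the paths of user-labelled nodes into one shared path.

    Hypothetical heuristic: every sample must have the same depth and the
    same tag at each step; an index is kept only if all samples share it.
    """
    split = [[split_step(s) for s in p.split("/")] for p in sample_paths]
    if len({len(p) for p in split}) != 1:
        raise ValueError("samples differ in depth")
    steps = []
    for column in zip(*split):
        tags = {tag for tag, _ in column}
        if len(tags) != 1:
            raise ValueError("samples differ in tag at the same depth")
        indices = {index for _, index in column}
        tag = tags.pop()
        if len(indices) == 1 and None not in indices:
            steps.append(f"{tag}[{indices.pop()}]")  # stable position: keep it
        else:
            steps.append(tag)  # varying position: match every sibling
    return "/".join(steps)


def to_xslt(path):
    """Wrap the induced path in a small XSLT 1.0 rule (assumes an <html> root)."""
    return (
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        '<xsl:template match="/"><items>'
        f'<xsl:for-each select="/html/{path}">'
        '<item><xsl:value-of select="."/></item>'
        "</xsl:for-each></items></xsl:template></xsl:stylesheet>"
    )


# Illustrative, well-formed page; real web pages would first need tidying to XML.
PAGE = """<html><body>
<div><p>nav</p></div>
<div><ul>
<li><span>Alpha</span><em>x</em></li>
<li><span>Beta</span></li>
<li><span>Gamma</span></li>
</ul></div>
</body></html>"""

# Two nodes a user marked as interesting, written as paths below the root.
rule = induce_path([
    "body/div[2]/ul/li[1]/span",
    "body/div[2]/ul/li[3]/span",
])
root = ET.fromstring(PAGE)
titles = [element.text for element in root.findall("./" + rule)]
print(rule)
print(titles)
```

Here the two labelled samples differ only in the position of the li step, so the induced path drops that index and also matches the unlabelled second item. The sketch applies the path directly with ElementTree's limited XPath support; the string returned by to_xslt would need a separate XSLT processor to run.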

Keywords/Search Tags:

Semi-structured, Information Extraction, Inductive Study, Extraction Rule

Related items

1	Semi-structured Web Information Extraction Technology And Its Application
2	Research On Semantic Information Extraction For Semi-structured Documents
3	Research Of Pattern Extraction From Semi-structured Data Based On Rules
4	Research On Keyword Extraction And Structured List Data Extraction
5	The Study Of Semi-supervised Web Data Extraction Rule Induction Based On User Interaction
6	Design And Implementation Of The Core Information Extraction System Of Semi-structured Financial Contract
7	Study On Text Preprocessing And Automatic Rule Learning Technology For Information Extraction
8	Study On Information Autonomous Extraction Technology Of Web Pages
9	Research And Application Of Extraction Method Of Semi-structured Text Information
10	Research On Methods Of Semi-structured Data Implication Rules Extraction