XML-based Web Information Extraction Technology Research

Posted on:2009-10-08

Degree:Master

Type:Thesis

Country:China

Candidate:X B Shi

Full Text:PDF

GTID:2208360245468763

Subject:Computer software and theory

Abstract/Summary:

The rapid development of the Internet has made it an important channel for global information dissemination and sharing. Data on the Web has grown geometrically, and obtaining useful information from it has become increasingly difficult. "Information overload" has become a problem in urgent need of a solution. Ideally, people would be able to query the Web for information in the same way they query a database. How to access and use the useful information on the Web has therefore become a research problem.

The characteristics of the Internet, such as its massive scale, heterogeneous structure, and dynamic change, make web information extraction different from traditional information extraction and bring new challenges. Extraction technology has been steadily enriched as demand has increased, and many information extraction methods have emerged both in China and abroad in recent years. These methods target the problems described above and achieve good results overall, but each has limitations or flaws of varying degree in certain areas. Further research on web information extraction is therefore necessary.

In this thesis, the author uses standard XML technology to address the problem of website information extraction and develops a specialized Cheating Event Information Extraction System (CEIES). The extraction rules are based on standard XSLT, whose power and flexibility make it possible to write simple, robust, and general rules. First, the target HTML page is fetched and converted into an XHTML file with an XML parser. The data query capabilities of XML are then used to query the XML library. DOM trees are used to store the rules in the rule base. Based on the usage of the key verb, as expressed by Case Grammar, partial information from each sentence is extracted and represented as Knowledge Graphs. Through the join of Knowledge Graphs, the partial information is integrated. Finally, the extracted items of information are stored in the CEIES database.
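
The rule-driven step of the pipeline described in the abstract (query a well-formed XHTML page with rules held in a rule base) can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis writes its rules in XSLT, while this sketch substitutes the limited XPath subset of Python's standard `xml.etree.ElementTree` module so that it runs without third-party libraries. The sample page, the `RULES` dictionary, the field names, and the `extract` function are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

# XHTML elements live in this namespace; "x" is an arbitrary prefix used in the rules.
XHTML_NS = {"x": "http://www.w3.org/1999/xhtml"}

# Hypothetical, already well-formed XHTML page (the output of the HTML-to-XHTML step).
SAMPLE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="event">
      <h2>Exam fraud reported</h2>
      <span class="date">2008-03-01</span>
      <p class="summary">A student was caught using a hidden earpiece.</p>
    </div>
    <div class="event">
      <h2>Forged certificate found</h2>
      <span class="date">2008-04-12</span>
      <p class="summary">An applicant submitted a forged diploma.</p>
    </div>
  </body>
</html>"""

# Hypothetical rule base: one path locating each record, plus one path per field,
# evaluated relative to the record node.
RULES = {
    "record": ".//x:div[@class='event']",
    "fields": {
        "title": "x:h2",
        "date": "x:span[@class='date']",
        "summary": "x:p[@class='summary']",
    },
}


def extract(xhtml, rules):
    """Apply the rules to an XHTML string and return one dict per matched record."""
    root = ET.fromstring(xhtml)
    records = []
    for node in root.findall(rules["record"], XHTML_NS):
        record = {}
        for name, path in rules["fields"].items():
            # A field missing from the page yields an empty string rather than an error.
            text = node.findtext(path, default="", namespaces=XHTML_NS)
            record[name] = text.strip()
        records.append(record)
    return records


if __name__ == "__main__":
    for item in extract(SAMPLE_XHTML, RULES):
        print(item)
```

Keeping the paths in a data structure rather than in code mirrors the idea of a rule base: supporting a new site layout means adding or editing rules, not changing the extraction routine. The sketch does not cover the later steps in the abstract (Case Grammar analysis, Knowledge Graph construction and joining, and storage in the database).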

Keywords/Search Tags:

Information Extraction, Natural Language Comprehension, XML, DOM trees, Knowledge Graphs

Related items

1	Competitive Intelligence Mining Based On Knowledge Graph For Enterprises In Social Media Environment
2	Narrative Information Extraction with Non-Linear Natural Language Processing Pipeline
3	Design And Implementation Of Knowledge Extraction Algorithm Based On Natural Language Processing
4	Research And Application Of Information Extraction And Knowledge Discovery Based On Professional Literature
5	Design And Implementation Of Knowledge Extraction System For Overlapping Relations In Complex Semantic Context
6	Automatic Knowledge Extraction From The Chinese Natural Language Web Documents And Knowledge Consolidation
7	Research On Methods Of Machine Reading Comprehension Of Unstructured Chinese News Text
8	Research On Reading Comprehension Of Pre-Training Language Model Fused With Knowledge
9	Research On Clustering Optimization Based On Knowledge Graph Technology
10	Application Of Ontology Semantics-comprehension On Natural Language In Anti-spam