Research On Web Information Extraction Technology Based On Concept Trees

Posted on:2011-06-02

Degree:Master

Type:Thesis

Country:China

Candidate:W Gu

Full Text:PDF

GTID:2208330332973027

Subject:Computer software and theory

Abstract/Summary:

With the rapid development of the Internet and communications, people now live in a world of data. Every day a large amount of news is published and republished on the Internet, and a great deal of information is uploaded and downloaded. Information on the Internet is as boundless as the sea, and it keeps growing at a remarkable rate. Users increasingly need technology and tools that help them find the information they need quickly, and they expect such tools to be accurate, efficient, and to some degree intelligent. Web information extraction technology has therefore become a focus of attention.
At present, a variety of Web information extraction techniques and systems have been developed, and the field has made important achievements, but the existing systems differ from one another and have shortcomings. According to the principle they use, Web information extraction methods can be divided into six categories, for example wrapper-based extraction, extraction based on HTML structure, and extraction based on natural language processing. When building extraction templates, some systems can only produce single-slot extraction rules, which makes the results narrow and unsatisfactory. Others support multi-slot extraction rules but require the rules to be written by hand by specialists who are very familiar with the content to be extracted, which makes them relatively complicated to implement. Still other systems place strict requirements on the extraction target and apply only to text of a particular type or content, which weakens their applicability.
To address these problems, this thesis proposes an approach based on concept extension for building information extraction templates and concept-based extraction rules.
The goal of this thesis is to study an effective learning algorithm that generates extraction rules automatically, so that even non-specialists can guide the generation of extraction rules and extract the information they need from web pages with a similar structure. The thesis adopts a semantics-based concept extension mechanism and, with a suitable amount of manual configuration, improves the stability and effectiveness of the system. The concept extension mechanism is used throughout the extraction process: in building extraction templates, in filtering the information in the extracted text, in the information mapping mechanism, and finally in the query mechanism of the resulting text database. Processing the text to be extracted is also very important in information extraction, and this thesis studies three topics in this area: named entity recognition, anaphora resolution, and temporal information processing.
Compared with earlier techniques, the concept-based Web information extraction technology studied in this thesis shows clear improvements. First, it builds multiple templates, which broadens the scope of the extraction results. Second, it applies the concept mechanism to the extraction rules, which gives the system some characteristics of artificial intelligence and makes information extraction more user-friendly.
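The abstract does not give the thesis's data structures or algorithm, so the following is only a minimal sketch of the general idea of concept extension applied to a multi-slot template. It assumes that each slot is bound to a node of a concept tree and that a slot accepts any term belonging to that node or its descendants. The names Concept, slot_pattern and extract, and all sample data, are hypothetical and are not taken from the thesis.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A node in a hypothetical concept tree: a name, surface terms, child concepts."""
    name: str
    terms: list[str] = field(default_factory=list)
    children: list[Concept] = field(default_factory=list)

    def expand(self) -> list[str]:
        """Return the terms of this concept and of all its descendants."""
        found = list(self.terms)
        for child in self.children:
            found.extend(child.expand())
        return found


def slot_pattern(concept: Concept) -> str:
    """Build a regex alternation from the expanded concept, longest term first."""
    terms = sorted(concept.expand(), key=len, reverse=True)
    return "(?:" + "|".join(re.escape(t) for t in terms) + ")"


def extract(template: dict[str, Concept], text: str) -> dict[str, list[str]]:
    """Fill every slot of a multi-slot template with the concept terms found in text."""
    result = {}
    for slot, concept in template.items():
        pattern = re.compile(r"\b" + slot_pattern(concept) + r"\b", re.IGNORECASE)
        result[slot] = pattern.findall(text)
    return result


# Illustrative data only (an assumption, not from the thesis).
location = Concept("location", children=[
    Concept("city", terms=["Beijing", "Shanghai"]),
    Concept("province", terms=["Sichuan"]),
])
event = Concept("event", terms=["conference"], children=[
    Concept("exhibition", terms=["expo", "trade fair"]),
])
template = {"where": location, "what": event}

print(extract(template, "The trade fair opens in Shanghai next week."))
# {'where': ['Shanghai'], 'what': ['trade fair']}
```

Binding a slot to a concept node rather than to a fixed word list means that adding a term to the tree widens every template that uses that node, which is one plausible reading of how concept extension could broaden the extraction results.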

Keywords/Search Tags:

Information extraction, concept tree, semantic template, extraction rule

Related items

1	Tag Tree Template In The Pages Of Critical Information Extraction And Topic Identification
2	Research And Implementation On The Method Of Chinese Domain Concept And Relation Extraction Based On Semantic Graph
3	Research On Web Product Indicator Extraction Based On Ontology
4	Research And Application Of Automatic Data Extraction From Template-generated Web Pages
5	Method Of Opinion Target Extraction Combining Rule Template And Recommended Systematic Method Research
6	Research On System Of Multi-field Information Extraction Based On Semantic Role And Concept Graphs
7	Domain independent semantic concept extraction using corpus linguistics, statistics and artificial intelligence techniques
8	Research On Web Information Extraction Techniques
9	Chinese Information Extraction And The Method Of Summarization Generating Based On HowNet Semantic
10	Research On Chinese Named Entity Semantic Relation Extraction Based On Dependency Tree