
Research on Concept Tree-Based Web Information Extraction Technology

Posted on: 2011-06-02   Degree: Master   Type: Thesis
Country: China   Candidate: W Gu
GTID: 2208330332973027   Subject: Computer software and theory
Abstract/Summary:
Due to the rapid development of the Internet and communications, people today live in a world of data. Every day a large volume of news is published and republished on the Internet, and a great deal of information is uploaded and downloaded. Information on the Internet is as boundless as the sea, and it is growing at an astonishing rate. People increasingly need technologies and tools that help them find the information they need quickly, ideally with high accuracy, high efficiency, and some artificial-intelligence capabilities. As a result, Web information extraction technology has become a focus of attention.

At present, a variety of Web information extraction techniques and systems have been developed, and the field has made important achievements, but these systems differ from one another and each has shortcomings. By their underlying principle, Web information extraction methods can be divided into six kinds, for example wrapper-based extraction, extraction based on HTML structure, and extraction based on natural language processing. When building extraction templates, some systems can only produce single-slot extraction rules, which makes the results narrow and unsatisfactory. Others support multi-slot extraction rules but require them to be written manually by specialists who are very familiar with the content to be extracted, which makes them relatively complicated to implement. Still others are very restrictive about what they can extract and apply only to text of a particular type or content, which weakens the general applicability of information extraction.

To address these problems, this thesis proposes a concept-extension approach to building information extraction templates and concept-based extraction rules.
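As a rough illustration of how concept extension can widen a multi-slot template, the following sketch assumes a concept tree stored as a dictionary from each concept to its narrower concepts, and compiles a template such as `{item} sold in {place}` into a regular expression whose slots accept any term under the slot's concept. The tree contents, the `{slot}` template syntax, and the names `expand`, `compile_template`, and `extract` are hypothetical; the thesis does not specify its data structures or rule format.

```python
import re
from typing import Dict, List

# Hypothetical concept tree: each concept maps to its narrower concepts.
# Leaves are omitted; expand() treats unknown keys as having no children.
CONCEPT_TREE: Dict[str, List[str]] = {
    "vehicle": ["car", "truck"],
    "car": ["sedan", "SUV"],
    "city": ["Beijing", "Shanghai"],
}


def expand(concept: str) -> List[str]:
    """Return the concept followed by all of its descendants (depth-first)."""
    terms = [concept]
    for child in CONCEPT_TREE.get(concept, []):
        terms.extend(expand(child))
    return terms


def compile_template(template: str, slots: Dict[str, str]) -> "re.Pattern[str]":
    """Compile a multi-slot template such as '{item} sold in {place}' to a regex.

    Each {slot} becomes a named group that accepts any term under the slot's
    concept, so one template covers many surface forms.
    """
    # re.split with a capturing group alternates literal text and slot names:
    # '{item} sold in {place}' -> ['', 'item', ' sold in ', 'place', '']
    parts = re.split(r"\{(\w+)\}", template)
    pattern = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)  # literal text between slots
        else:
            # Longest terms first, because regex alternation takes the first
            # alternative that matches rather than the longest one.
            terms = sorted(expand(slots[part]), key=len, reverse=True)
            pattern += f"(?P<{part}>" + "|".join(map(re.escape, terms)) + ")"
    return re.compile(pattern)


def extract(text: str, template: str, slots: Dict[str, str]) -> List[Dict[str, str]]:
    """Fill every slot of the template for each match found in the text."""
    rule = compile_template(template, slots)
    return [m.groupdict() for m in rule.finditer(text)]


if __name__ == "__main__":
    text = "A sedan sold in Shanghai and a truck sold in Beijing."
    print(extract(text, "{item} sold in {place}", {"item": "vehicle", "place": "city"}))
    # [{'item': 'sedan', 'place': 'Shanghai'}, {'item': 'truck', 'place': 'Beijing'}]
```

A single template written against the broad concepts `vehicle` and `city` thus matches records that mention only narrower terms. The sketch does not add word boundaries, so a short term could match inside a longer word; a fuller version would need boundary rules suited to the text's language.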
The goal of this research is to study an effective learning algorithm that generates extraction rules automatically, so that even non-specialists can smoothly guide rule generation and extract the information they need from web pages with similar structure. The thesis relies on a semantic concept extension mechanism and, with appropriate manual configuration, improves the system's stability and effectiveness. The concept extension mechanism is used throughout the extraction process: in building extraction templates, in filtering and mapping the information in the extracted text, and finally in the query mechanism over the resulting text database. Processing of the text being extracted is also very important, and the thesis studies three topics in this area: named entity recognition, anaphora resolution, and time information processing.

Compared with earlier techniques, the concept-based Web information extraction technique studied in this thesis shows clear improvements. First, it builds multiple templates, which widens the scope of the extraction results. Second, it applies the concept mechanism in the extraction rules, which gives the system a degree of artificial intelligence and makes information extraction more user-friendly.
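The abstract states that concept extension also reaches the query stage over the final text database. A minimal sketch of that idea, assuming extracted records are plain dictionaries and the concept tree is stored as child-to-parent links, answers a query for a broad concept with every record whose slot value lies beneath it. The tree, the record layout, and the functions `is_a` and `query` are illustrative assumptions, not the thesis's implementation.

```python
from typing import Dict, List, Optional

# Hypothetical concept tree stored as child -> parent links (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "vehicle": None,
    "car": "vehicle",
    "sedan": "car",
    "SUV": "car",
    "truck": "vehicle",
}


def is_a(term: str, concept: str) -> bool:
    """True if term equals concept or lies beneath it in the concept tree."""
    node: Optional[str] = term
    while node is not None:
        if node == concept:
            return True
        node = PARENT.get(node)  # terms missing from the tree end the walk
    return False


def query(records: List[Dict[str, str]], slot: str, concept: str) -> List[Dict[str, str]]:
    """Return the records whose value in `slot` falls under `concept`."""
    return [r for r in records if slot in r and is_a(r[slot], concept)]


if __name__ == "__main__":
    records = [
        {"item": "sedan", "place": "Shanghai"},
        {"item": "truck", "place": "Beijing"},
        {"item": "bicycle", "place": "Beijing"},
    ]
    print(query(records, "item", "car"))      # only the sedan record
    print(query(records, "item", "vehicle"))  # the sedan and truck records
```

Walking up parent links keeps each membership check proportional to the depth of the tree, and it lets a user query with a broad concept without listing every narrower term that the extraction step may have stored.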
Keywords/Search Tags: Information extraction, concept tree, semantic template, extraction rule