Research On Web Information Extraction Technology Based On Ontology Of Petroleum Domain

Posted on: 2016-05-11  Degree: Master  Type: Thesis
Country: China  Candidate: Y J Li
GTID: 2308330461481245  Subject: Software Engineering
Abstract/Summary:
As oilfields develop from digital oilfields toward intelligent oilfields, their information sources are becoming increasingly diverse. Users need to manage structured oilfield data, and they also need to obtain information from documents such as Web pages, research reports, and published literature. However, the enterprise search engines currently used in oilfields cannot extract accurate information (for example, well numbers and well locations) from these documents directly, automatically, and efficiently, because the semantics and patterns of that information are not explicit. This thesis therefore analyzes the problem and builds a Web information extraction system based on a petroleum ontology. The work is intended to lay a foundation for future automatic report generation and knowledge reasoning in oilfields, and it has practical value.
This thesis studies current ontology-based information extraction techniques, proposes a framework model for Web information extraction based on the petroleum ontology, and designs and implements a prototype system on top of this model. The main contributions are as follows:
1. Pronouns in text make the results of information extraction uncertain, so this thesis proposes two anaphora resolution methods. For explicit pronouns, it proposes a method that combines rules and statistics: features are first filtered with custom rules, and a classifier built with the C4.5 decision tree algorithm then decides whether a pronoun and a candidate antecedent are in an anaphoric relationship. For hidden (zero) pronouns, it proposes a three-step model in which each step uses its own algorithm to complete the resolution.
2. Information extracted from oilfield text is often inaccurate and semantically unclear, so this thesis presents an information extraction method based on the petroleum ontology. Extraction rules are built from an analysis of the ontology and tailored to the attribute features and sentence patterns of oilfield text; entities and relationships are then extracted with attribute rules and triple rules.
Finally, the thesis designs and implements a Web information extraction platform based on the petroleum ontology and evaluates its performance on petroleum-domain Web pages. The results meet the expected goals and demonstrate the feasibility of the proposed techniques and methods, which may serve as a practical reference.
Keywords/Search Tags: Ontology, Anaphora Resolution, Rule, Information Extraction
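The rule-filtering step for explicit pronouns might look roughly like the sketch below. The abstract does not list the custom rules, so the three constraints used here (the antecedent is a preceding non-pronoun mention, lies within a two-sentence window, and agrees in number) are illustrative assumptions, as are the `Mention` class, `MAX_SENTENCE_DISTANCE`, and the toy mentions. Pairs that survive the rules are turned into categorical features for the statistical stage.

```python
from dataclasses import dataclass

@dataclass
class Mention:
    text: str
    sentence: int          # index of the sentence containing the mention
    position: int          # token offset in the document
    number: str            # "sg" or "pl"
    is_pronoun: bool = False

MAX_SENTENCE_DISTANCE = 2  # hypothetical search window, in sentences

def passes_rules(pronoun, cand):
    """Hypothetical hard constraints applied before the statistical classifier."""
    if cand.is_pronoun or cand.position >= pronoun.position:
        return False  # antecedent must be a preceding non-pronoun mention
    if pronoun.sentence - cand.sentence > MAX_SENTENCE_DISTANCE:
        return False  # outside the search window
    return cand.number == pronoun.number  # number agreement

def pair_features(pronoun, cand):
    """Categorical features handed to the decision-tree stage (illustrative)."""
    dist = pronoun.sentence - cand.sentence
    return {"sent_dist": str(dist), "same_sentence": "yes" if dist == 0 else "no"}

def candidate_pairs(mentions):
    """Yield (pronoun, candidate, features) for every pair that survives the rules."""
    for p in mentions:
        if p.is_pronoun:
            for c in mentions:
                if passes_rules(p, c):
                    yield p.text, c.text, pair_features(p, c)

# Toy input (hypothetical): "Well X-12 is near the oil wells of Block A. It ..."
mentions = [
    Mention("Well X-12", 0, 0, "sg"),
    Mention("the oil wells", 0, 4, "pl"),
    Mention("Block A", 0, 8, "sg"),
    Mention("It", 1, 11, "sg", is_pronoun=True),
]
pairs = list(candidate_pairs(mentions))
print(pairs)  # the plural "the oil wells" is removed by the number-agreement rule
```

Keeping hard linguistic constraints in rules shrinks the candidate set, so the statistical classifier only has to rank plausible antecedents.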
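For the statistical stage, the core idea of C4.5 can be sketched as follows: at each node it chooses the attribute with the highest gain ratio, that is, information gain divided by split information. This is a simplified illustration. It handles only categorical features and omits the pruning, continuous-attribute thresholds, and missing-value handling of full C4.5. The feature names `sent_dist` and `cand_role` and the five labelled pronoun-antecedent pairs are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def split_by(rows, labels, attr):
    """Group the labels by the value each row takes for `attr`."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(row[attr], []).append(y)
    return groups

def gain_ratio(rows, labels, attr):
    n = len(labels)
    groups = split_by(rows, labels, attr)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain / split_info if split_info > 0 else 0.0

def build_tree(rows, labels, attrs):
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attrs:
        return majority  # pure node or no attributes left: make a leaf
    best = max(attrs, key=lambda a: gain_ratio(rows, labels, a))
    if gain_ratio(rows, labels, best) <= 0:
        return majority  # no attribute improves the split
    branches = {}
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        branches[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx],
                                     [a for a in attrs if a != best])
    return {"attr": best, "branches": branches, "default": majority}

def classify(tree, row):
    """Walk the tree; unseen attribute values fall back to the node's majority label."""
    while isinstance(tree, dict):
        tree = tree["branches"].get(row[tree["attr"]], tree["default"])
    return tree

# Hypothetical training pairs: is the candidate the pronoun's antecedent?
rows = [
    {"sent_dist": "0", "cand_role": "subject"},
    {"sent_dist": "1", "cand_role": "subject"},
    {"sent_dist": "0", "cand_role": "object"},
    {"sent_dist": "1", "cand_role": "object"},
    {"sent_dist": "2", "cand_role": "subject"},
]
labels = ["coref", "coref", "not", "not", "not"]
tree = build_tree(rows, labels, ["sent_dist", "cand_role"])
print(tree["attr"], classify(tree, {"sent_dist": "1", "cand_role": "subject"}))
```

On this toy data the root split is `cand_role`. Dividing by split information is what distinguishes C4.5 from ID3, which ranks attributes by raw information gain and therefore tends to favour attributes with many distinct values.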
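The ontology-driven extraction step might be sketched as follows. Here the rules are generated from ontology entries rather than written one by one: attribute rules produce (entity, attribute, value) triples, and relation rules produce (entity, relation, entity) triples. The ontology fragment (`Well`, `Oilfield`, `location`, `depth`, `belongsTo`), the English surface phrases, and the entity regexes are all hypothetical. A real system would derive them from the actual petroleum ontology and the sentence patterns found in the target pages.

```python
import re

# Hypothetical entity recognisers for two ontology classes.
ENTITY_PATTERNS = {
    "Well": r"[Ww]ell [A-Z][A-Z0-9-]*",
    "Oilfield": r"[A-Z][a-z]+ [Oo]ilfield",
}
# Hypothetical attribute labels: (class, attribute) -> surface phrases.
ATTRIBUTE_LABELS = {
    ("Well", "location"): ["is located in", "is located at"],
    ("Well", "depth"): ["was drilled to a depth of", "has a depth of"],
}
# Hypothetical relation triggers: (subject class, predicate, object class) -> phrases.
RELATION_TRIGGERS = {
    ("Well", "belongsTo", "Oilfield"): ["belongs to", "is part of"],
}

def alternation(phrases):
    # Longest phrases first so a shorter phrase cannot shadow a longer one.
    return "|".join(re.escape(p) for p in sorted(phrases, key=len, reverse=True))

def build_rules():
    """Compile one regex per attribute and per relation defined in the ontology."""
    rules = []
    for (cls, attr), labels in ATTRIBUTE_LABELS.items():
        pattern = rf"(?P<subj>{ENTITY_PATTERNS[cls]}) (?:{alternation(labels)}) (?P<obj>[^.,;]+)"
        rules.append((attr, re.compile(pattern)))
    for (subj_cls, pred, obj_cls), triggers in RELATION_TRIGGERS.items():
        pattern = (rf"(?P<subj>{ENTITY_PATTERNS[subj_cls]}) (?:{alternation(triggers)}) "
                   rf"(?P<obj>{ENTITY_PATTERNS[obj_cls]})")
        rules.append((pred, re.compile(pattern)))
    return rules

def extract_triples(text, rules):
    triples = []
    for predicate, regex in rules:
        for m in regex.finditer(text):
            triples.append((m.group("subj"), predicate, m.group("obj").strip()))
    return triples

# Toy input (hypothetical sentences).
text = ("Well X-12 is located in Block A. Well X-12 was drilled to a depth of 2500 m. "
        "Well X-12 belongs to Example Oilfield.")
triples = extract_triples(text, build_rules())
for t in triples:
    print(t)
```

Generating rules from the ontology means that adding a new attribute or relation to the ontology extends the extractor without changing the matching code.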