A Research On Chinese Information Extraction Based On Construction Of Domain Ontology

Posted on:2017-10-01

Degree:Master

Type:Thesis

Country:China

Candidate:S S Huang

Full Text:PDF

GTID:2348330485972938

Subject:Information Science

Abstract/Summary:


The rapid growth and digitization of information have made knowledge easier to share and information easier to acquire, but they have also made it harder for people to obtain the knowledge they actually need. In this context, information extraction has developed extensively as a technology for recognizing specific target information within information of various forms. Chinese information extraction faces additional obstacles because of the unique characteristics of the Chinese language and its pervasive polysemy. Biodiversity is the foundation of biology and ecology. The species, as the natural unit closest to individual organisms, can be regarded as the basis of biological diversity. The sheer number of plant species makes them one of the most important subjects in the field of biological species. However, most plant species descriptions exist as free text. This makes knowledge about plant species difficult to identify and use, and may even hinder the future development of the field. To make the approach as widely applicable as possible, this study designs an information extraction scheme. In this scheme, the ontology serves as the supporting knowledge base and layered analysis serves as the parsing pattern. Guided by an analysis of the text structure, rules extract information from description texts, and together these steps form a complete information extraction framework. The scheme also provides a framework for building domain ontologies. This framework constructs a new ontology by reusing existing ontologies under a top-level ontology, and thereby makes full use of existing domain knowledge. Plant species were chosen as the practice domain in order to advance knowledge in this field.
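The layered, rule-based analysis described above can be illustrated with a minimal Python sketch. For illustration only, it makes three assumptions. First, a description is split into sentences on "。" and then into clauses on "；". Second, a clause that begins with an organ term from the ontology describes that organ. Third, the rest of the clause, split on "，", gives that organ's feature values. The `ORGAN_TERMS` table and this rule are hypothetical, not the thesis's actual rule set.

```python
# Hypothetical organ terms; in the thesis such terms would come from the
# domain ontology. Longer terms are tried first so that "叶柄" (petiole)
# is not mistaken for "叶" (leaf).
ORGAN_TERMS = {"叶柄": "petiole", "叶": "leaf", "花": "flower", "果": "fruit"}


def split_layers(text):
    """Layer 1: sentences split on '。'; layer 2: clauses split on '；'."""
    sentences = [s for s in text.split("。") if s.strip()]
    return [[c.strip() for c in s.split("；") if c.strip()] for s in sentences]


def extract(text):
    """Assign each clause that starts with a known organ term to that organ,
    splitting the rest of the clause on '，' into feature values."""
    terms = sorted(ORGAN_TERMS, key=len, reverse=True)  # stable, longest first
    record = {}
    for clauses in split_layers(text):
        for clause in clauses:
            for term in terms:
                if clause.startswith(term):
                    rest = clause[len(term):]
                    values = [v.strip() for v in rest.split("，") if v.strip()]
                    record.setdefault(ORGAN_TERMS[term], []).extend(values)
                    break
    return record


print(extract("叶卵形，全缘；叶柄长约2厘米。花白色，5数。"))
# → {'leaf': ['卵形', '全缘'], 'petiole': ['长约2厘米'], 'flower': ['白色', '5数']}
```

The output is a nested record keyed by organ, a simple stand-in for the structured knowledge representation the scheme aims at.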
The information extraction scheme consists of the following four tasks.
(1) Construction of the domain ontology: construct the Chinese plant species diversity ontology by reusing an existing ontology (PO) under the top-level ontology (BFO). This includes designing and carrying out the whole process of building a new ontology by reusing an existing ontology under a top-level ontology.
(2) Generation of the text set: analyze the structure of Web pages based on the DOM tree to obtain the original text blocks. Then filter out the target descriptions by computing text similarity, which yields the set of texts to be extracted.
(3) Formation of the domain dictionary and tagging set: form the domain dictionary by analyzing the domain ontology with Jena functions. Then construct the tagging set by combining the dictionary with the features of the text and the requirements of feature extraction.
(4) Tagging and extraction: annotate the content through word segmentation. On this basis, analyze the content layer by layer according to the text structure, determine its semantic structure, and finally represent the knowledge in structured form.
In the practical application to Chinese plant species, the Chinese Plant Species Diversity Ontology was first constructed by ontology reuse to support the extraction task. Structured information was then extracted from plant species description texts by ontology-based rules. Over four groups of experimental data, the average precision reached 0.89, the average recall 0.88, and the average F-measure 0.88.
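Task (2) can be sketched in Python as follows, under several assumptions not found in the source. Python's standard `html.parser` walks the tag structure event by event rather than building a full DOM tree. The block-level tag set and the 0.2 threshold are arbitrary choices. Text similarity is measured here as cosine similarity over character bigrams, since the source does not say which similarity measure the thesis uses.

```python
import math
from collections import Counter
from html.parser import HTMLParser


class TextBlockParser(HTMLParser):
    """Collects the text under block-level tags as separate text blocks,
    ignoring <script> and <style> content."""

    BLOCK_TAGS = {"p", "div", "td", "li"}
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._buffer = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._buffer.append(data)

    def flush(self):
        """Close the current block, keeping it only if it has visible text."""
        text = "".join(self._buffer).strip()
        if text:
            self.blocks.append(text)
        self._buffer = []


def char_bigrams(text):
    """Character-bigram counts, a common unit for Chinese text similarity."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def cosine(a, b):
    """Cosine similarity of two bigram count vectors (missing keys count 0)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def description_blocks(html, reference, threshold=0.2):
    """Return the text blocks whose similarity to a reference description
    reaches the threshold."""
    parser = TextBlockParser()
    parser.feed(html)
    parser.close()
    parser.flush()  # keep any trailing text outside a closed block
    ref = char_bigrams(reference)
    return [b for b in parser.blocks if cosine(char_bigrams(b), ref) >= threshold]


page = ("<html><body><div>首页 | 登录</div>"
        "<p>叶卵形，全缘；叶柄长约2厘米。</p><p>花白色，5数。</p>"
        "<script>var x = 1;</script></body></html>")
print(description_blocks(page, "叶卵形，花白色，果实球形。"))
# → ['叶卵形，全缘；叶柄长约2厘米。', '花白色，5数。']
```

The navigation block shares no bigrams with the reference and is dropped. The two description blocks score about 0.23 and 0.35.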
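For task (3), the thesis builds the dictionary through Jena, but the source gives no details of that step. The sketch below only illustrates the idea in Python with the standard `xml.etree.ElementTree` module. It reads the Chinese `rdfs:label` of each `owl:Class` in an RDF/XML fragment and maps the label to the class IRI. The fragment and its `example.org` IRIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical fragment of a plant ontology serialized as RDF/XML.
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:Class rdf:about="http://example.org/cpsd#Leaf">
    <rdfs:label xml:lang="en">leaf</rdfs:label>
    <rdfs:label xml:lang="zh">叶</rdfs:label>
  </owl:Class>
  <owl:Class rdf:about="http://example.org/cpsd#Flower">
    <rdfs:label xml:lang="zh">花</rdfs:label>
  </owl:Class>
</rdf:RDF>"""


def build_dictionary(rdf_xml, lang="zh"):
    """Map each class label in the given language to its class IRI."""
    root = ET.fromstring(rdf_xml)
    dictionary = {}
    for cls in root.iter(f"{{{OWL}}}Class"):
        iri = cls.get(f"{{{RDF}}}about")
        for label in cls.findall(f"{{{RDFS}}}label"):
            if label.get(XML_LANG) == lang and label.text:
                dictionary[label.text.strip()] = iri
    return dictionary


print(build_dictionary(SAMPLE))
# → {'叶': 'http://example.org/cpsd#Leaf', '花': 'http://example.org/cpsd#Flower'}
```

This only handles classes serialized as `owl:Class` elements. Classes written as `rdf:Description` with an `rdf:type` would be missed, which is one reason a full RDF toolkit such as Jena is the more robust choice.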
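For the segmentation step in task (4), the source does not name a segmenter. Forward maximum matching against the domain dictionary is a simple, well-known stand-in, shown here together with a lookup that attaches a tag to each token. The tags (`ORGAN`, `SHAPE`, `COLOR`, and `O` for untagged tokens) form a hypothetical tagging set.

```python
def fmm_segment(text, dictionary):
    """Forward maximum matching: at each position take the longest
    dictionary word, falling back to a single character."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def tag(tokens, dictionary, default="O"):
    """Attach the dictionary tag to each token, or `default` if absent."""
    return [(t, dictionary.get(t, default)) for t in tokens]


# Hypothetical domain dictionary mapping words to tags.
DICTIONARY = {"叶柄": "ORGAN", "叶": "ORGAN", "花": "ORGAN",
              "卵形": "SHAPE", "白色": "COLOR"}

tokens = fmm_segment("叶卵形，叶柄短，花白色", DICTIONARY)
print(tag(tokens, DICTIONARY))
# → [('叶', 'ORGAN'), ('卵形', 'SHAPE'), ('，', 'O'), ('叶柄', 'ORGAN'),
#    ('短', 'O'), ('，', 'O'), ('花', 'ORGAN'), ('白色', 'COLOR')]
```

Greedy longest matching keeps "叶柄" as one token instead of splitting it into "叶" and "柄". This is the main reason to segment with the domain dictionary rather than with a general-purpose vocabulary alone.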
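The reported figures are presumably the standard extraction metrics, although the source does not define them. Here TP is the number of correctly extracted items, FP the number of incorrectly extracted items, and FN the number of target items that were missed. With P = 0.89 and R = 0.88, the balanced F-measure is about 0.885 (0.88 to two decimals), which is consistent with the reported value. Note that an average of per-group F-measures need not equal the F-measure of the averaged P and R.

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}

F = \frac{2 \times 0.89 \times 0.88}{0.89 + 0.88} = \frac{1.5664}{1.77} \approx 0.885
```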
Comparative experiments further verified that the information extraction framework is applicable to different species. The study makes the following contributions: (1) an innovative information extraction method; (2) the construction of the Chinese Plant Species Diversity Ontology; (3) better overall practical performance than comparable studies in China.

Keywords/Search Tags:

Information Extraction, Domain Ontology, Plant Species Diversity Description, Text Processing


Related items

1	Construction And Implementation Of Domain Ontology Based On Plain Text
2	Study On Plant Species Recognition Based On Leaf Characteristics And Realization Of Recognition System
3	Information Extraction And Analysis Based On Plant Ontology
4	Theoretical Studies On Ontology-Based Information System Modeling
5	Adaptive Web Information Extraction Method Research Based On Ontology
6	Research On Domain Ontology Representation, Reasoning And Integration For The Semantic Web And The Applications
7	Ontology-Based Structured Information Extraction From Web Pages
8	Domain Ontology-based Web Information Extraction Technology
9	Study Of Knowledge Modeling For Plant Community Growth Simulation
10	Research On Ontology-Based Web Information Extraction Technology