Design And Implementation Of The Bio-pesticide Information Extraction Based On Ontology

Posted on: 2015-03-20  Degree: Master  Type: Thesis
Country: China  Candidate: Y Y An  Full Text: PDF
GTID: 2308330473950645  Subject: Computer application technology
Abstract/Summary:
With the growth of the Internet, people are confronted with vast amounts of information, yet they usually need only a small part of it. How can the needed information be obtained from unstructured sources and formatted effectively? Information extraction technology arose to answer this question: its purpose is to extract formatted text from unformatted information. In recent years the concept of ontology has become increasingly popular because of its advantages. For example, an ontology lets different people share a common understanding of a knowledge structure, promotes the reuse of domain knowledge, and supports the analysis of that knowledge. This thesis introduces ontology to assist information extraction. Guided by the concept hierarchy described in the ontology, it mainly extracts named instances and the relationships between instances.

First, the thesis introduces the concepts related to information extraction, the development of ontology, and the measures used to evaluate information extraction systems. It also introduces the definition of ontology, its constituent elements, the classification of ontologies, and the relationship between ontology and information extraction.

Second, the thesis presents the design of the bio-pesticide ontology, following the general ontology design process. It shows the ontology class hierarchy and part of the bio-pesticide ontology OWL file. Many instances were added to the ontology by referring to an open-source database, and the resulting ontology was recognized by relevant experts in the field.

Third, based on the requirements for extracting bio-pesticide information, the thesis designs an ontology-based bio-pesticide information extraction system. The system first downloads the web pages to be processed and, through preprocessing and DOM tree parsing, obtains the text content of each page. It then preprocesses the text with word segmentation, tagging, and similar steps. The bio-pesticide ontology is stored persistently, and a user dictionary is generated from it by an ontology parser. The thesis also proposes the MatchRule algorithm, which is based on ontology triples, and designs a hierarchical rule base. Combining the bio-pesticide ontology with the rule base, the system uses regular expressions to extract information about instances and the relationships between instances.

Finally, the system was tested extensively. The experimental results show that the bio-pesticide information extraction system achieves good results.
Keywords/Search Tags:Information extraction, Bio-pesticides, Ontology, Triples
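
The abstract does not give the details of the MatchRule algorithm or the rule base, so the following is only a minimal sketch of the general idea it describes: instance names taken from an ontology are combined with per-property rules into regular expressions that yield (subject, predicate, object) triples. The class names, instance lists, the `controls` property, and its trigger phrases are all hypothetical, and the sketch uses English text for readability.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical ontology fragment: class name -> known instance names.
# In the thesis these would come from the bio-pesticide OWL file.
INSTANCES = {
    "Biopesticide": ["Bacillus thuringiensis", "Beauveria bassiana"],
    "Pest": ["cabbage worm", "aphids"],
}

# Hypothetical rule base: property -> (domain class, trigger regex, range class).
RULES = {
    "controls": ("Biopesticide", r"(?:controls|is effective against)", "Pest"),
}


def class_pattern(cls: str) -> str:
    """Alternation over the instance names of one class, longest name first."""
    names = sorted(INSTANCES[cls], key=len, reverse=True)
    return "|".join(re.escape(name) for name in names)


def build_rule(prop: str) -> re.Pattern:
    """Compile one property rule into a regex with named subject/object groups."""
    domain, trigger, range_ = RULES[prop]
    return re.compile(
        rf"(?P<s>{class_pattern(domain)})\s+{trigger}\s+(?P<o>{class_pattern(range_)})",
        re.IGNORECASE,
    )


def extract(text: str) -> list:
    """Apply every rule to the text and collect the matched triples."""
    triples = []
    for prop in RULES:
        for match in build_rule(prop).finditer(text):
            triples.append(Triple(match.group("s"), prop, match.group("o")))
    return triples


if __name__ == "__main__":
    sample = (
        "Bacillus thuringiensis controls cabbage worm. "
        "Beauveria bassiana is effective against aphids."
    )
    for triple in extract(sample):
        print(triple)
```

Driving the patterns from the instance lists means that adding an instance to the ontology extends what the rules can match without editing any regular expression by hand, which is one plausible reason to pair an ontology with a rule base.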