Research On Semi-supervised Entity Semantic Relation Extraction

Posted on: 2016-04-23
Degree: Master
Type: Thesis
Country: China
Candidate: H Shi
GTID: 2308330479995445
Subject: Computer application technology
Abstract/Summary:
With the spread of computers and the rapid growth of the Internet, an ever larger share of data, information, and knowledge appears as electronic text, and the volume of that data is growing explosively. At the same time, people find it increasingly difficult to locate the knowledge they need in this mass of data. The causes are "information overload" and the limitations of search engine technology: existing search engines are essentially based on keyword matching, so they cannot understand the text and return the knowledge the user actually needs. For this reason, information extraction technology has attracted growing attention from researchers. The main task of information extraction is to extract structured or semi-structured data from unstructured electronic text and to store and present it in a structured form for querying and further use. Named entity semantic relation extraction is an important part of information extraction. Building on named entity recognition, it analyzes the text in depth in order to return the semantic relationships between entities. It can therefore compensate for the shortcomings of keyword-based search engines, and it plays an important role in automatic question answering, machine translation, and semantic annotation. Methods for entity semantic relation extraction fall into knowledge-based methods and machine learning methods. Because knowledge-based methods require a large amount of work by domain experts and port poorly to new domains, more and more researchers have turned to machine learning methods, which can be further divided into supervised, semi-supervised, and unsupervised methods.
Because semi-supervised machine learning requires only a small amount of human intervention and offers good practicability and portability, it has been widely used. This thesis therefore studies a semi-supervised machine learning method based on bootstrapping. Previous studies have three shortcomings: the pattern is represented as a quintuple <left, tag1, middle, tag2, right>, which ignores the role of keywords in expressing the semantic relationship between named entities; only lexical information is taken into account, and no semantic information is used when extracting semantic relationships; and feature weights are calculated with the two entities as the core, without considering the role of the keyword. To address these problems, this thesis proposes the following. (1) A new pattern representation method, which adds keyword information, semantic information, word order, and other information to the previous pattern. The motivation is threefold. The semantic relationship between entities can be triggered by verbs, nouns, and keywords in the context, so the keyword is the core of a relation pattern. Chinese text contains many synonyms, so adding word-level semantic information makes relation extraction more effective. The relative word order of keywords and entities reflects, to a certain extent, the framework and structure of a relation description pattern, so word order information helps in similarity calculation and clustering of relation description patterns. (2) Based on the new pattern, a new relation extraction method, which comprises the concept of semantic distance, improved feature weight calculation, pattern similarity calculation, pattern acquisition, pattern clustering, pattern abstraction, and relation extraction.
Calculating the semantic distance gives the relative distance and feature weight calculations a degree of semantic expressiveness. The new pattern representation, with its added semantic information and word order, makes patterns match better and makes pattern acquisition, clustering, and generalization more accurate, which improves the performance of semantic relation extraction. (3) Finally, this thesis designs a model system and carries out experiments on it. As a verification example, the "wife", "daughter", and "girlfriend" relationships are extracted on the model system. The experimental results verify the effectiveness of the proposed method: its average accuracy is 6.5% higher than that of the traditional method.
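The abstract does not give the exact pattern format or the similarity formula, so the following Python sketch is only an illustration of the idea in (1) and (2): a quintuple pattern extended with a keyword, its position relative to the entities, and a synonym-aware comparison. The class names, the `SYNONYMS` table, the `keyword_weight` parameter, and the weighting scheme are all assumptions made for this example, not the thesis's actual definitions.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class QuintuplePattern:
    """The earlier representation: <left, tag1, middle, tag2, right>."""
    left: tuple
    tag1: str
    middle: tuple
    tag2: str
    right: tuple


@dataclass(frozen=True)
class ExtendedPattern:
    """Hypothetical extension: quintuple plus keyword and word order."""
    base: QuintuplePattern
    keyword: str
    keyword_slot: str  # "left", "middle", or "right" of the entity pair


# Toy synonym table standing in for a real semantic lexicon (assumption).
SYNONYMS = {"wed": "marry", "married": "marry", "spouse": "wife"}


def concept(word: str) -> str:
    """Map a word to a shared concept so that synonyms compare equal."""
    return SYNONYMS.get(word, word)


def overlap(a: tuple, b: tuple) -> float:
    """Jaccard overlap of two contexts after synonym normalization."""
    sa, sb = {concept(w) for w in a}, {concept(w) for w in b}
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def similarity(p: ExtendedPattern, q: ExtendedPattern,
               keyword_weight: float = 0.5) -> float:
    """Illustrative similarity: the keyword dominates, then order and context."""
    if (p.base.tag1, p.base.tag2) != (q.base.tag1, q.base.tag2):
        return 0.0  # different entity types cannot express the same relation
    kw = 1.0 if concept(p.keyword) == concept(q.keyword) else 0.0
    order = 1.0 if p.keyword_slot == q.keyword_slot else 0.0
    ctx = (overlap(p.base.left, q.base.left)
           + overlap(p.base.middle, q.base.middle)
           + overlap(p.base.right, q.base.right)) / 3
    return keyword_weight * kw + (1 - keyword_weight) * (0.5 * order + 0.5 * ctx)
```

In a bootstrapping loop, a score of this kind could decide whether a newly acquired pattern joins an existing cluster. Giving the keyword the largest weight reflects the abstract's claim that the keyword is the core of a relation pattern; the specific value 0.5 is arbitrary.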
Keywords: Information Extraction, Entity Semantic Relation, Bootstrapping Technology, Relation Description Pattern, Pattern Matching