
Research On Chinese Named Entity Relation Extraction Based On Lexical Semantic

Posted on: 2017-05-06  Degree: Master  Type: Thesis
Country: China  Candidate: Q Xu
GTID: 2308330503457627  Subject: Computer technology
Abstract/Summary:
Since the birth of the Internet, the number of users has grown steadily, and the amount of information available online has grown explosively as a result. This information is of great value, but most of it is unstructured or semi-structured text. Using it effectively requires Information Extraction, an active research topic in Natural Language Processing. Named entity relation extraction is an important Information Extraction task whose goal is to have a computer extract the relations between entities automatically. Relation extraction matters in many areas, such as the construction of domain ontologies and knowledge graphs, question-answering systems, and information retrieval.

Of the four approaches to Chinese relation extraction, we choose the tree-kernel-based machine learning method as our research direction; the key to this method is building effective features. Existing features rarely draw on semantic dictionaries such as "Tongyici Cilin" and "HowNet", even though the semantic information they contain is of great value for relation extraction.

This paper presents a method for calculating lexical semantic similarity based on "Tongyici Cilin" and proposes a lexical semantic similarity tree feature built on that method. A "Tongyici Cilin" code is divided into five layers, and the more layers two codes share from left to right, the more semantically similar the corresponding words are. A tree kernel function computes similarity by counting identical sub-trees: the higher the count, the higher the similarity. Combining these two observations, we propose a "Tongyici Cilin" code tree feature that places the five layers of the code in five nodes of a tree structure.
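The five-layer code comparison and the code tree can be illustrated with a minimal Python sketch. It is not the thesis implementation. It assumes codes in the extended "Tongyici Cilin" style, such as "Aa01A01=", whose five layers are 1, 1, 2, 1 and 2 characters wide and are followed by a one-character marker. The function names, the `depth` parameter, and the choice to arrange the five nodes as a single chain are hypothetical; the abstract does not specify the tree shape.

```python
from typing import List

# Assumption: layer widths of an extended Cilin code such as "Aa01A01=".
# The trailing marker character ("=", "#" or "@") is ignored here.
LAYER_WIDTHS = (1, 1, 2, 1, 2)


def split_layers(code: str) -> List[str]:
    """Split a Cilin code into its five layer strings."""
    layers = []
    pos = 0
    for width in LAYER_WIDTHS:
        layers.append(code[pos:pos + width])
        pos += width
    return layers


def shared_layers(code_a: str, code_b: str) -> int:
    """Count how many leading layers two codes share, left to right."""
    count = 0
    for a, b in zip(split_layers(code_a), split_layers(code_b)):
        if a != b:
            break
        count += 1
    return count


def code_tree(code: str, depth: int = 5) -> str:
    """Render the first `depth` layers as a bracketed chain tree.

    Assumption: each layer becomes the single child of the layer above it.
    """
    tree = ""
    for layer in reversed(split_layers(code)[:depth]):
        tree = f"({layer} {tree})" if tree else f"({layer})"
    return tree


if __name__ == "__main__":
    print(shared_layers("Aa01A01=", "Aa01B02="))  # number of shared layers
    print(code_tree("Aa01A01="))                  # bracketed five-node tree
```

With a chain-shaped tree, two words whose codes share more leading layers also share more identical sub-tree fragments from the root down, which is what a sub-tree-counting kernel rewards.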
To explore which layer's semantic information is most suitable for relation extraction, we also propose "Tongyici Cilin" code trees for each individual level.

The semantic information in "HowNet" is contained in the DEF item of its commonsense knowledge base. We present a feature named the "HowNet" semantic tree, which is transformed from the DEF item. To reduce the number of nodes in the feature, we propose two simplified "HowNet" semantic trees: a three-layer semantic tree and a semantic tree without dynamic roles.

The experimental results support the following conclusions. Among the features based on "Tongyici Cilin", the full "Tongyici Cilin" code tree performs best. Among the features based on "HowNet", the full "HowNet" semantic tree performs best. A feature combining the "Tongyici Cilin" code tree and the "HowNet" semantic tree performs very well: the TF values for relation type and subtype extraction are 86.6 and 93.3, respectively. The combined feature can be generated without annotated data, so it has great application value for future open-domain relation extraction.
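The two simplified "HowNet" semantic trees can be pictured with the following Python sketch. It is illustrative only, not the thesis implementation: the `Node` class, the function names, and the toy tree are hypothetical. The sketch assumes that "three-layer" means keeping the top three layers of the tree, and that "without dynamic roles" means deleting role nodes and re-attaching their children to the nearest remaining ancestor. The abstract does not confirm either reading.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    label: str
    is_role: bool = False  # True for dynamic-role nodes such as "agent"
    children: List["Node"] = field(default_factory=list)


def truncate(node: Node, max_depth: int) -> Node:
    """Keep only the top `max_depth` layers (the root is layer 1)."""
    if max_depth <= 1:
        return Node(node.label, node.is_role)
    kept = [truncate(child, max_depth - 1) for child in node.children]
    return Node(node.label, node.is_role, kept)


def drop_roles(node: Node) -> Node:
    """Remove role nodes, lifting their children to the nearest kept ancestor."""
    kept: List[Node] = []
    for child in node.children:
        cleaned = drop_roles(child)
        if cleaned.is_role:
            kept.extend(cleaned.children)
        else:
            kept.append(cleaned)
    return Node(node.label, node.is_role, kept)


def count_nodes(node: Node) -> int:
    """Count all nodes in the tree."""
    return 1 + sum(count_nodes(child) for child in node.children)


def to_brackets(node: Node) -> str:
    """Render the tree in bracketed form."""
    if not node.children:
        return f"({node.label})"
    inner = " ".join(to_brackets(child) for child in node.children)
    return f"({node.label} {inner})"


# Toy tree loosely modeled on a HowNet-style definition; labels are illustrative.
full = Node("human", children=[
    Node("HostOf", True, [Node("Occupation")]),
    Node("domain", True, [Node("medical")]),
    Node("doctor", False, [Node("agent", True, [Node("~")])]),
])

if __name__ == "__main__":
    for tree in (full, truncate(full, 3), drop_roles(full)):
        print(count_nodes(tree), to_brackets(tree))
```

Both simplifications shrink the tree, which reduces the number of sub-tree fragments a tree kernel has to compare.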
Keywords/Search Tags: relation extraction, tree kernel, machine learning, Tongyici Cilin, HowNet