
Chinese Entity Relation Discovery For Bigcilin

Posted on: 2017-05-27    Degree: Master    Type: Thesis
Country: China    Candidate: S Liu    Full Text: PDF
GTID: 2308330503987201    Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet, the volume of data generated online has grown sharply, and extracting valuable information from this mass of data accurately and quickly has become a major research problem. Information Extraction arose in this context. Its main purpose is to extract factual information about specified entities, relations, times, and other facts from natural language text, that is, to convert unstructured text into structured or semi-structured form. Entity relation extraction is one of the sub-tasks of information extraction. Traditional entity relation extraction requires a pre-defined system of relation categories and then determines the semantic relation category of an entity pair from the entities and their context. However, it is very difficult to define a complete system of entity relation categories, which motivates open entity relation extraction, where a relation indicator word drawn from the text expresses the relation between the entities.

Building on the large entity semantic category hierarchy in Bigcilin, we first use a character-based model to learn word embeddings and then learn hypernym–hyponym relation representations, which yields good results for entity hypernym discovery. This alleviates the problem of representing out-of-vocabulary words, because the character-based model can produce an embedding for almost any word, including unseen ones. We then cluster the hypernym–hyponym relation representations derived from the embeddings of hypernym and hyponym words in the training corpus and learn a mapping matrix for each cluster; finally, we identify hypernyms using these mapping matrices. Even on a dataset containing many words unseen during embedding training, we still obtain nearly 80% precision in the entity hypernym discovery experiments, and the high-quality results can be imported into Bigcilin.

For open entity relation extraction, we use Long Short-Term Memory (LSTM) networks to learn syntactic dependency path information between entities in open-domain sentences. Before relation extraction, we analyze the characteristics of various RNNs and combine their advantages by applying a Bi-LSTM-CRF to open-domain entity boundary identification, achieving an F1 score of 78.92%. We then use a shortest dependency path LSTM (SDP-LSTM) to extract entity relations, with two separate sets of parameters learning the shortest dependency paths on the entity 1 and entity 2 sides, and apply strategies to handle the various forms of candidate relation dependency paths; the open entity relation extraction experiments achieve good results.

Drawing on the rich entity library and semantic categories in Bigcilin, we obtain candidate hypernym pairs that may hold relations from the entity relation triples extracted from encyclopedia infoboxes, and we propose the notion of hypernym generalization degree to filter out candidate pairs whose generalization degree is low. By forming pairwise combinations of entities under the candidate hypernym pairs, we collect candidate relation indicators from search engine results and discriminate among them. Experimental results show that the discriminating features we propose for candidate entity relations are effective.
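As a rough illustration of the cluster-and-project idea for hypernym discovery described above, the following is a minimal Python sketch, not the thesis code: it assumes word embeddings have already been learned (for example by a character-based model), and the embed lookup, the candidate vocabulary, and the cluster count are hypothetical placeholders.

# Sketch: cluster hypernym-hyponym offset vectors, learn one mapping matrix
# per cluster, and score candidate hypernyms by projection residual.
import numpy as np
from sklearn.cluster import KMeans

def learn_mapping_matrices(pairs, embed, n_clusters=10):
    """pairs: list of (hyponym, hypernym) training word pairs."""
    X = np.array([embed(hypo) for hypo, _ in pairs])    # hyponym vectors
    Y = np.array([embed(hyper) for _, hyper in pairs])  # hypernym vectors
    offsets = Y - X                                      # relation representations
    km = KMeans(n_clusters=n_clusters, n_init=10).fit(offsets)
    matrices = []
    for c in range(n_clusters):
        idx = km.labels_ == c
        # Least-squares mapping M_c so that X[idx] @ M_c approximates Y[idx]
        M_c, *_ = np.linalg.lstsq(X[idx], Y[idx], rcond=None)
        matrices.append(M_c)
    return km, matrices

def predict_hypernym(word, candidates, embed, km, matrices):
    """Return the candidate hypernym with the smallest projection residual."""
    x = embed(word)
    best, best_score = None, -np.inf
    for cand in candidates:
        y = embed(cand)
        c = km.predict((y - x).reshape(1, -1))[0]        # nearest relation cluster
        score = -np.linalg.norm(x @ matrices[c] - y)     # projection residual
        if score > best_score:
            best, best_score = cand, score
    return best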
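For the entity boundary identification step, a minimal Bi-LSTM tagger sketch in PyTorch is shown below; it only illustrates the bidirectional encoder with per-token BIO tag scores, and the CRF transition layer used in the thesis is omitted for brevity. The dimensions and tag set are assumptions, not values from the thesis.

# Sketch: Bi-LSTM encoder with per-token tag scores for BIO entity boundary tagging.
import torch
import torch.nn as nn

class BiLSTMTagger(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden_dim=200, num_tags=3):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim // 2, batch_first=True,
                            bidirectional=True)
        self.out = nn.Linear(hidden_dim, num_tags)   # scores for B / I / O tags

    def forward(self, token_ids):                    # token_ids: (batch, seq_len)
        h, _ = self.lstm(self.embed(token_ids))      # (batch, seq_len, hidden_dim)
        return self.out(h)                           # per-token tag scores

# Training would minimize cross-entropy (or a CRF negative log-likelihood) over
# gold BIO tags; contiguous B/I spans in the predicted sequence give entity boundaries.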
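The SDP-LSTM component with two parameter sets can likewise be sketched as follows, under the assumption that the shortest dependency path has already been split at the common ancestor and that embedded sub-path word vectors are supplied as inputs; the class and argument names are illustrative only.

# Sketch: two LSTMs with separate parameters encode the entity-1 and entity-2 sides
# of the shortest dependency path; their final states feed a relation classifier.
import torch
import torch.nn as nn

class SDPLSTMRelationScorer(nn.Module):
    def __init__(self, embed_dim=100, hidden_dim=100, num_relations=2):
        super().__init__()
        self.lstm_e1 = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # entity-1 side
        self.lstm_e2 = nn.LSTM(embed_dim, hidden_dim, batch_first=True)  # entity-2 side
        self.classifier = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, path_e1, path_e2):
        # path_e1 / path_e2: (batch, len, embed_dim) word vectors along each sub-path,
        # running from the entity up to the common ancestor in the dependency tree.
        _, (h1, _) = self.lstm_e1(path_e1)
        _, (h2, _) = self.lstm_e2(path_e2)
        features = torch.cat([h1[-1], h2[-1]], dim=-1)   # final hidden states
        return self.classifier(features)                 # relation / indicator scores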
Keywords/Search Tags: Relation Extraction, Bigcilin, Deep Learning, Open-domain, Relation Indicator, Hypernym