
Entity-relationship Extraction Research Fields And Semantic Tags

Posted on: 2014-03-08
Degree: Master
Type: Thesis
Country: China
Candidate: J Zhao
GTID: 2268330401973364
Subject: Computer application technology
Abstract/Summary:
Entity relation extraction is a necessary part of information extraction and underlies follow-up work such as event detection and ontology knowledge construction, so improving the accuracy of extraction is important. The core issue of entity relation extraction is to measure the similarity between pairs of entities and to classify the types of entity relations correctly. At the same time, the problems of manual intervention and the difficulty of annotating corpora also need to be addressed.

First, taking into account the shorter sentences of the tourism domain, this thesis proposes a new composite kernel method for entity relation extraction. A convolution tree kernel is used to calculate the similarity between pairs of entities. To reduce the impact of syntactic parser performance on the tree kernel function, the tree kernel is combined with a word sequence kernel, giving a composite kernel that captures the syntactic, part-of-speech, and word sequence information of entity relation instances. In tests on a tourism-domain corpus, this method shows some improvement over traditional relation extraction methods based on feature vectors and on the convolution tree kernel.

Second, in view of the characteristics of the tourism domain, this thesis adopts a weakly supervised entity relation extraction method based on entropy minimization. The method first extracts characteristic words, using the idea of scalar clustering over a small set of stratified labeled instances, and builds an initial classifier with the maximum entropy machine learning algorithm. The initial classifier, which already has a certain accuracy, is then used to classify the unlabeled instances, and the instances with the minimum information entropy are added to the training corpus so that the training set keeps growing. Finally, this iterative process is repeated until the performance of the classifier stabilizes, which yields the final entity relation extraction classifier for the specific domain. Experiments on the tourism-domain corpus show that this method not only reduces the dependence of entity relation extraction on manual intervention but also effectively improves extraction performance, with an F-measure of up to 63.69%.

Third, this thesis adopts a method based on conditional random fields (CRF) and rules for extracting the semantic labels of entity relations in the tourism domain. Borrowing the classification idea used in named entity recognition, the method treats the semantic items that reflect entity relations as semantic labels to be tagged in the contextual information and identifies them with a CRF. The semantic labels are then assigned to the associated entities according to the relative positions of the two entities and the semantic label, together with rules. Experimental results on the tourism-domain corpus show that this method reaches an F-measure of 73.68%, indicating that it is feasible and effective for extracting the semantic labels of entity relations.

Overall, the experimental results indicate that the above methods can improve the performance of relation extraction and semantic tag extraction, laying a fairly good foundation for further research.
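
The entropy-minimization loop described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the names `entropy`, `self_train`, `train`, `batch_size`, and `max_rounds` are hypothetical, the classifier is passed in as an arbitrary training function rather than an actual maximum entropy model, and a fixed round limit stands in for the "until performance stabilizes" stopping criterion.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Instance = str
Label = str
Labeled = Tuple[Instance, Label]
# Assumption: a "model" is any function mapping an instance to a
# {label: probability} distribution (e.g. a maximum entropy classifier).
Model = Callable[[Instance], Dict[Label, float]]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in bits) of a predicted label distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def self_train(
    train: Callable[[List[Labeled]], Model],
    seed: List[Labeled],
    unlabeled: List[Instance],
    batch_size: int = 1,
    max_rounds: int = 10,
) -> Tuple[Model, List[Labeled]]:
    """Grow the training set with the lowest-entropy predictions each round."""
    labeled = list(seed)
    pool = list(unlabeled)
    model = train(labeled)  # initial classifier built from the seed instances
    for _ in range(max_rounds):  # stand-in for "until performance stabilizes"
        if not pool:
            break
        # Rank unlabeled instances by the entropy of their predicted labels.
        ranked = sorted(pool, key=lambda x: entropy(list(model(x).values())))
        # Add the most confident (minimum-entropy) instances with their
        # predicted label, then retrain on the enlarged training set.
        for x in ranked[:batch_size]:
            dist = model(x)
            labeled.append((x, max(dist, key=dist.get)))
        pool = ranked[batch_size:]
        model = train(labeled)
    return model, labeled
```

Passing the trainer in as a function keeps the selection logic independent of the classifier, so any model that outputs label probabilities can be plugged in.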
Keywords/Search Tags: Information Extraction, Entity Relation Extraction, Kernel Function, Entropy Minimization, Semantic Label