
Research on Automatic Extraction of Domain Entity Relations

Posted on: 2012-12-19 | Degree: Master | Type: Thesis
Country: China | Candidate: C Y Lei | Full Text: PDF
GTID: 2218330368480902 | Subject: Computer application technology
Abstract/Summary:
Entity relation extraction is currently an active research topic in information retrieval. It refers to automatically identifying the various latent semantic relations between entities in unstructured text from a specific domain, and it has wide applications in information retrieval and question-answering systems. At present, entity relation extraction mainly relies on supervised machine learning over labeled data, which performs well when adequate training data are available; to reduce the workload of manually labeling training data, semi-supervised entity relation extraction has attracted increasing attention. This paper first studies entity relation extraction with supervised machine learning, drawing on statistical and machine learning methods, and proposes two supervised methods for entity relation extraction in the tourism domain. The first is domain entity relation extraction based on maximum entropy and self-expansion. The second is domain entity relation extraction that combines binary classification with reasoning.
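A maximum entropy classifier for relation extraction is a log-linear (multinomial logistic regression) model over indicator features of an entity pair. The sketch below is a hedged illustration under stated assumptions, not the thesis's implementation: the feature templates (entity-type pair, a near/far distance bucket, and the words between the two entities), the relation labels, and the toy training examples are hypothetical, and the weights are trained by plain stochastic gradient ascent rather than the GIS or L-BFGS optimizers usually used for maxent.

```python
import math
from collections import defaultdict


def extract_features(e1_type, e2_type, tokens_between):
    """Hypothetical feature templates: entity-type pair, distance bucket, in-between words."""
    feats = [f"types={e1_type}|{e2_type}"]
    feats.append("dist=near" if len(tokens_between) <= 2 else "dist=far")
    feats.extend(f"between={w}" for w in tokens_between)
    return feats


class MaxEnt:
    """Maximum entropy classifier: multinomial logistic regression over indicator features."""

    def __init__(self, labels):
        self.labels = list(labels)
        self.w = defaultdict(float)  # one weight per (feature, label) pair

    def predict_proba(self, feats):
        scores = [sum(self.w[(f, y)] for f in feats) for y in self.labels]
        m = max(scores)  # subtract the max score for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        return {y: e / z for y, e in zip(self.labels, exps)}

    def predict(self, feats):
        probs = self.predict_proba(feats)
        return max(probs, key=probs.get)

    def fit(self, data, epochs=100, lr=0.5, l2=0.01):
        """Stochastic gradient ascent on the conditional log-likelihood,
        with a small L2 shrinkage applied at each update."""
        for _ in range(epochs):
            for feats, gold in data:
                probs = self.predict_proba(feats)
                for y in self.labels:
                    # gradient = observed minus expected indicator for label y
                    grad = (1.0 if y == gold else 0.0) - probs[y]
                    for f in feats:
                        key = (f, y)
                        self.w[key] += lr * (grad - l2 * self.w[key])
        return self


# Toy, hand-made training pairs (hypothetical, not from the Yunnan corpus).
train = [
    (extract_features("SCENIC_SPOT", "CITY", ["located", "in"]), "located_in"),
    (extract_features("SCENIC_SPOT", "CITY", ["is", "in"]), "located_in"),
    (extract_features("CITY", "FOOD", ["famous", "for"]), "specialty_of"),
    (extract_features("CITY", "FOOD", ["known", "for", "its"]), "specialty_of"),
    (extract_features("PERSON", "CITY", ["visited", "the", "old", "town", "of"]), "none"),
]
model = MaxEnt(["located_in", "specialty_of", "none"]).fit(train)
print(model.predict(extract_features("SCENIC_SPOT", "CITY", ["lies", "in"])))
```

Each feature receives a separate weight per relation label, which is the standard maxent parameterization. A binary variant of the kind compared against multi-class classification could be obtained by training one such model per relation with the two labels {relation, other}; how the thesis combines its binary classifiers with reasoning is not shown here.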
In studying these two supervised methods, this paper first examined which features influence relation extraction performance. The results show that adding, in combination, entity type features, entity-pair distance features, and features for the words between the entity pair effectively improves performance. Second, taking into account the characteristics of the domain and building on the work of related researchers, it compared binary classification with multi-class classification; the results show that binary classification yields stronger prediction. Both methods, however, require a large amount of labeled training data, and labeling is time-consuming and labor-intensive. How, then, can relation extraction be carried out with less labeled training data? To address this, a third method was explored: entropy-based semi-supervised entity relation extraction, which applies semi-supervised machine learning and self-expands a small-scale training set to extract entity relations. This third method raises three key questions. The first is the selection of the initial training set.
That is, for entity relation extraction, a small number of labeled seed instances are selected as training data for the machine learning algorithm, from which an initial classifier is obtained. The second is how to expand the training data automatically: which strategy or criterion selects new seed instances of high reliability to add to the training data. The third is when to stop the self-expansion of the training corpus, which calls for an in-depth study of how the training data are expanded and when iteration should terminate. This paper argues that once the extraction work above is essentially complete, the extracted entity relations should be given fine-grained semantic tags; from a practical standpoint, the experiments therefore adopt a machine learning algorithm based on conditional random fields (CRF) to explore how to obtain semantic tags for entity relations. In summary, the thesis explores entity relation extraction in the Yunnan tourism domain, including two supervised approaches (domain entity relation extraction based on maximum entropy and self-expansion, and domain entity relation extraction with reasoning based on binary classification), an experiment on semi-supervised entity relation extraction, and the assignment of fine-grained semantic tags to the extracted relations. The experimental data are 1,000 texts from a Yunnan tourism corpus. The results show that, with supervised machine learning, feature selection affects relation extraction performance, and with the same feature set the binary classifier predicts better than the multi-class classifier. When only a small amount of training data is labeled, using information entropy to iteratively expand the training data effectively improves semi-supervised entity relation extraction. Overall, a performance gap remains between supervised and semi-supervised relation extraction.
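The entropy-based self-expansion loop described above (seed selection, confidence-based expansion, and a termination test) can be sketched as follows. This is a hedged sketch under stated assumptions, not the thesis's algorithm: a small add-one-smoothed naive Bayes classifier stands in for the maximum entropy model so the block stays self-contained, and the entropy threshold `max_entropy`, the per-round cap `per_round`, the round limit `max_rounds`, and the toy data are all hypothetical. A lower entropy of the predicted label distribution is treated as higher reliability, so those instances are promoted to new seeds first.

```python
import math
from collections import Counter, defaultdict


def entropy(probs):
    """Shannon entropy, in bits, of a label distribution given as {label: prob}."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


class NaiveBayes:
    """Stand-in probabilistic classifier over tuples of string features
    (swapped in for brevity; the thesis uses maximum entropy)."""

    def fit(self, data):
        self.label_counts = Counter(y for _, y in data)
        self.feat_counts = defaultdict(Counter)
        self.vocab = set()
        for feats, y in data:
            self.feat_counts[y].update(feats)
            self.vocab.update(feats)
        return self

    def predict_proba(self, feats):
        total = sum(self.label_counts.values())
        v = len(self.vocab) + 1  # +1 reserves mass for unseen features
        logp = {}
        for y, count in self.label_counts.items():
            denom = sum(self.feat_counts[y].values()) + v
            logp[y] = math.log(count / total) + sum(
                math.log((self.feat_counts[y][f] + 1) / denom) for f in feats)
        m = max(logp.values())
        z = sum(math.exp(s - m) for s in logp.values())
        return {y: math.exp(s - m) / z for y, s in logp.items()}


def self_train(seeds, unlabeled, max_entropy=0.8, per_round=2, max_rounds=10):
    """Entropy-based self-expansion: each round, retrain on the labeled set and
    move the lowest-entropy (most confident) predictions into it."""
    labeled, pool = list(seeds), list(unlabeled)
    for _ in range(max_rounds):
        model = NaiveBayes().fit(labeled)
        scored = []
        for feats in pool:
            probs = model.predict_proba(feats)
            scored.append((entropy(probs), feats, max(probs, key=probs.get)))
        confident = sorted((s for s in scored if s[0] <= max_entropy), key=lambda s: s[0])
        if not confident:
            break  # stop: no remaining instance is reliable enough
        for _, feats, label in confident[:per_round]:
            labeled.append((feats, label))
            pool.remove(feats)
    return NaiveBayes().fit(labeled), labeled, pool


# Hypothetical toy seeds and unlabeled instances (feature tuples).
seeds = [
    (("types=SPOT|CITY", "between=in"), "located_in"),
    (("types=CITY|FOOD", "between=for"), "specialty_of"),
]
unlabeled = [
    ("types=SPOT|CITY", "between=in", "between=located"),
    ("types=CITY|FOOD", "between=for", "between=famous"),
    ("types=SPOT|CITY", "between=for"),  # ambiguous: left unlabeled
]
model, labeled, pool = self_train(seeds, unlabeled)
print(len(labeled), pool)
```

Because entropy is measured in bits, its maximum is log2 of the number of relation labels (1 bit for the two labels here), so a threshold such as 0.8 would need retuning for a larger label set.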
Keywords/Search Tags: Domain Entity Relation, maximum entropy, binary classifier, Information Maxent, semantic label