
Comparative Study Of Automatic Entity Relation Extraction

Posted on: 2011-01-22
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Ning
Full Text: PDF
GTID: 2178330338979944
Subject: Computer Science and Technology
Abstract/Summary:
With the development of computer and network technology, a large amount of information has become available in the form of electronic documents, and increasing attention is being paid to extracting useful information from these texts. Information extraction technology has therefore become prevalent, and relation extraction is one of its important subtasks. Specific facts in a text are represented as entities, and the task of judging the relationships between these entities is defined as entity relation extraction. Entity relation extraction plays an important role in constructing ontologies and refining information retrieval technology. This thesis focuses on several issues in entity relation extraction.

First, in addition to traditional named entities, domain-specific terms that carry important semantic relations are extracted. Because evaluation data for domain-specific terms vary and such terms are difficult for humans to judge, a variety of popular statistical methods for automatic Chinese domain-specific term extraction are compared and analyzed. Both an objective evaluation method based on a professional computer dictionary and a subjective evaluation method based on human judgment are adopted. A comprehensive comparison is performed using several evaluation measures, including precision, recall and F-measure. Moreover, this thesis proposes a domain-specific term extraction method based on the weights of a linear support vector machine. The experimental results show that this method extracts domain-specific terms effectively.

Secondly, a unified corpus is used to compare supervised, semi-supervised and unsupervised feature-based entity relation extraction, in order to meet the requirements of different applications.

Previous studies of supervised entity relation extraction did not consider the effect of features on the no-relation case between two entities. This thesis therefore compares the effects of general features on both real relationships and no-relation: the words around an entity, the type and subtype of an entity, the location of the two entities, the dependency parse of the center words, and the content of an entity. In addition, a novel feature, the location information of a characteristic word, is proposed for relation extraction.

For semi-supervised entity relation extraction, various comparison experiments are carried out with the Bootstrapping method, using different entity features and seed set sizes. The performance of semi-supervised and supervised entity relation extraction is also compared under the same conditions. The experimental results suggest that semi-supervised entity relation extraction can improve the precision of entity relation extraction.

Most researchers use data clustering methods for unsupervised entity relation extraction. This thesis focuses on the effect of clustering algorithms and combination strategies on entity relation extraction. Three clustering algorithms (K-means, Self-Organizing Map (SOM) and Affinity Propagation) and two combination strategies (DCM and Cosine) are compared and analyzed. In our experiments, the Affinity Propagation algorithm achieves the best precision, and the SOM algorithm performs best in actual running time.
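The evaluation measures named in the abstract (precision, recall and F-measure) can be illustrated with a minimal sketch. It assumes the balanced F-measure (F1) and scores a set of extracted terms against a gold set, such as entries from a reference dictionary. The function name and the toy term lists are hypothetical and are not taken from the thesis.

```python
def precision_recall_f1(extracted, gold):
    """Return (precision, recall, F1) for extracted terms against a gold set.

    Assumption: F-measure means the balanced F1 score; the thesis abstract
    does not state which weighting it uses.
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # terms that are both extracted and correct
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical toy data for illustration only.
gold_terms = {"operating system", "compiler", "cache", "kernel"}
extracted_terms = {"compiler", "cache", "keyboard shortcut"}

p, r, f = precision_recall_f1(extracted_terms, gold_terms)
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

The same function would apply to relation extraction if each item were a (entity, entity, relation type) tuple instead of a term string.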
Keywords/Search Tags: entity relation extraction, domain-term extraction, Bootstrapping, clustering, DCM-combination