Research On Semantic Relation Extraction Between Named Entities

Posted on: 2010-06-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: L H Qian
Full Text: PDF
GTID: 1118360278978094
Subject: Computer application technology
Abstract/Summary:
Semantic relation extraction is an important subtask of information extraction and an active topic in natural language processing (NLP). With the rapid development of the Internet and the explosive growth of the information on it, extracting useful structured information from free text is of great significance. Meanwhile, as NLP and machine learning technologies mature, extracting useful information, and even some kinds of knowledge, from large amounts of text has become possible.

Recent years have seen much progress in information extraction. However, the performance of semantic relation extraction has hovered around an unsatisfactory 70%. Moreover, extracting semantic relations between named entities with mainstream machine learning paradigms usually requires a large-scale annotated corpus, which costs a great deal of human effort, so a considerable gap still exists between academic research and practical applications. This may be due in part to the complex nature of the task, as well as to its heavy dependence on a particular application domain. This dissertation explores new methods and strategies to make semantic relation extraction more practical for real-world applications, with a focus on alleviating its dependence on large-scale annotated corpora.

This dissertation carries out extensive research on the key techniques of semantic relation extraction, with efforts and goals in the following areas:

1. Feature-based semantic relation extraction methods, with a focus on generating surface features and structural features from free text and its syntactic representation. The contributions of these features to relation extraction are systematically analyzed, providing directions for further research.

2. Tree kernel-based semantic relation extraction approaches, with a focus on how to properly express the structural representation of relation instances. This is done via a dynamic relation tree structure, which is generated on the principle of constituent dependency. The dynamic relation tree, constructed from the corresponding full parse tree according to dependency rules, covers the critical information for relation instances while excluding noisy information. Experimental results show that the dynamic relation tree structure significantly improves the performance of semantic relation extraction, particularly in recall.

3. The effect of entity semantic information on semantic relation extraction. First, a novel entity semantic tree structure, a structural representation of entity semantic information, is proposed. It is then combined with syntactic structural information, i.e. the dynamic relation tree structure, to form a unified syntactic and entity semantic tree structure. Experiments indicate that this tree structure effectively captures both structural information and entity semantic information, leading to significant improvements in the performance of tree kernel-based relation extraction.

4. Weakly supervised learning in semantic relation extraction. By applying stratified sampling theory from statistics to weakly supervised learning, an initial seed selection strategy based on stratified sampling is presented. Experiments show that initial relation instances obtained in this way are more representative than those selected otherwise, and that better bootstrapping performance is therefore achieved. The stratified sampling strategy can also be applied successfully to training set expansion during the bootstrapping process.

5. Semantic relation extraction based on label propagation algorithms. By bootstrapping a set of weighted support vectors, the problem of the excessive computational resources needed by traditional label propagation algorithms is alleviated. First, weighted support vectors, which can effectively capture the manifold structure inherent in both labelled and unlabelled instances, are bootstrapped using support vector machines (SVM) through a co-training procedure. These critical instances are then fed into label propagation algorithms as labelled instances. Experimental results show that both the performance and the efficiency of label propagation algorithms can be remarkably improved via bootstrapped support vectors.

The major contributions of this dissertation lie in its intensive study of statistical machine learning methods for semantic relation extraction: 1) the proposal of the unified syntactic and semantic tree structure in tree kernel-based semantic relation extraction; 2) the application of stratified sampling theory from statistics to initial seed selection in weakly supervised semantic relation extraction; 3) label propagation-based semantic relation extraction via bootstrapped support vectors. Experiments show that this research not only significantly improves the performance of semantic relation extraction, but also alleviates its dependence on large-scale annotated corpora, and thus offers useful reference value for future research in information extraction.
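
To make the stratified-sampling seed selection of item 4 more concrete, the following is a minimal Python sketch, not the dissertation's actual procedure. The abstract does not say how the strata are defined or how seeds are allocated among them; stratifying by the entity-type pair of each candidate instance and allocating seeds proportionally to stratum size are both assumptions made here for illustration, and the function name `stratified_seed_selection` and the toy instance pool are hypothetical.

```python
import random
from collections import defaultdict


def stratified_seed_selection(instances, stratum_of, n_seeds, rng=None):
    """Select roughly n_seeds instances with proportional allocation per stratum.

    instances  : list of candidate relation instances (any hashable objects)
    stratum_of : function mapping an instance to its stratum key
                 (ASSUMPTION: the abstract does not specify the strata)
    n_seeds    : target number of seeds; rounding may shift the total slightly
    """
    rng = rng or random.Random(0)  # fixed seed for reproducibility

    # Group the candidate pool into strata.
    strata = defaultdict(list)
    for inst in instances:
        strata[stratum_of(inst)].append(inst)

    total = len(instances)
    seeds = []
    for key in sorted(strata):
        members = strata[key]
        # Proportional allocation, with at least one seed per stratum.
        quota = max(1, round(n_seeds * len(members) / total))
        quota = min(quota, len(members))
        # Simple random sampling without replacement inside the stratum.
        seeds.extend(rng.sample(members, quota))
    return seeds


# Hypothetical pool: (type of entity 1, type of entity 2, instance id).
pool = (
    [("PER", "ORG", i) for i in range(60)]
    + [("PER", "GPE", i) for i in range(30)]
    + [("ORG", "GPE", i) for i in range(10)]
)
seeds = stratified_seed_selection(pool, lambda inst: inst[:2], n_seeds=10)
```

With this toy pool the seed set mirrors the 60/30/10 split of the strata, which is the sense in which stratified seeds are more representative than a single simple random draw that may miss a small stratum entirely.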
Keywords/Search Tags: Natural Language Processing, Information Extraction, Semantic Relation Extraction between Named Entities, Statistical Machine Learning, Weakly Supervised Learning