Entity Relation Extraction For Open Domain Text

Posted on: 2017-01-08
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Y Guo
GTID: 1108330488984774
Subject: Education Technology
Abstract/Summary:
With the advent of big data, the volume of data to be processed has grown sharply. Most of it is unstructured natural-language text, and the boundaries between domains are becoming increasingly blurred, so meaningful and valuable information is scattered across huge amounts of text, which prevents people from obtaining it directly and efficiently. Entity relation extraction is an important task in information extraction: its goal is to identify the semantic relations between entities in unstructured or semi-structured text, drawing on knowledge from linguistics, statistics, computer science, information science, and other fields, so that people can grasp the meaning of text quickly. Relation extraction from open-domain text faces several difficulties and challenges, but it provides strong support for event extraction, information retrieval, machine translation, automatic question answering, and other applications, which makes it a worthwhile research topic.
Based on the characteristics of open-domain text and on existing research in relation extraction, this thesis studies relation extraction for open-domain text from two angles, relational feature selection and extraction methods, and uses the results to construct a knowledge graph for the college basic computer course. The main contributions fall into the following four aspects:
(1) Entity relation extraction based on syntactic and semantic features. News text is a common kind of open-domain text. Existing research usually focuses on applying kernel functions and their combinations, so relational features have received less attention; moreover, this research tends to depend on external semantic knowledge bases and gives little consideration to the linguistic features within sentences. This thesis proposes a relation extraction method based on syntactic and semantic features.
In relational feature selection, the method adds dependency parsing, semantic role labeling, and several relative-position features to the basic features, thereby mining and expanding the range of relational features. For machine learning, it builds on SVM, introduces a training model based on feature space transformation, and adopts mature algorithms to optimize the training procedure. Experiments using part of The People’s Daily as the corpus demonstrate the effectiveness of the method.
(2) Entity relation extraction based on weakly supervised machine learning. Encyclopedia text is another kind of open-domain text; weakly supervised relation extraction can be implemented from its own content, which reduces manual intervention and improves efficiency. Previous research mostly used encyclopedias to extract entity attributes and had flaws in selecting target relation types. This thesis therefore proposes a weakly supervised relation extraction method. When building the relational knowledge base, it no longer relies solely on infobox data but reprocesses those data in several ways to improve the quality of the relation tuples, and then uses the tuples to annotate relations in unlabeled data. For selecting target relation types, it proposes a new method based on frequency difference density, which randomly selects relation types within certain ranges according to the distribution density of each relation type, improving the coverage and rigor of relation type selection. In addition, it reuses the relational feature selection, feature vector optimization, and relation classifier training model presented in aspect (1). Experiments on part of Baidu Encyclopedia achieve good results.
(3) Entity relation extraction based on dictionaries and rules.
The goal of this research is to extract relations involving specific terminology from journal text. When rules are used to extract relations, dictionaries can help improve performance, but both rules and dictionaries are usually handcrafted, so efficiency is low. This thesis designs a new structure for the relational word dictionary and uses a weakly supervised method to find dictionary entries automatically. It also presents a method for learning relational rules automatically based on the pattern-matching principle. Moreover, the method takes into account relation instances without explicit relational words, and provides rule pruning methods to improve rule quality. The approach is evaluated on protein-protein interaction extraction using a corpus of journal papers and abstracts, and the results demonstrate its feasibility.
(4) Construction of a knowledge graph for the college basic computer course. Such a knowledge graph can provide rich knowledge support for teaching reform and innovation in learning methods, and can help raise the level of informatization in education. Using several textbooks for the college basic computer course as the corpus, the key components of this work include a cross-language entity recognition method, the relation extraction methods above, schema design, relational knowledge representation, knowledge updating, and conflict resolution. Finally, a visualization system for the knowledge graph is developed.
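The feature-enrichment idea in aspect (1) can be illustrated with a small sketch. The abstract does not list the thesis's feature templates, so everything below is an assumption: the input is a tokenized sentence, two entity spans given as (start, end) token indices, and a dependency parse given as a list of head indices (-1 marks the root). The feature names are hypothetical, semantic role labels are left out, and the SVM feature space transformation is not reproduced.

```python
from collections import deque


def dependency_path(heads, start, end):
    """Shortest path between two tokens in a dependency tree.

    heads[i] is the index of token i's head, or -1 for the root.
    The tree is treated as undirected; returns a list of token indices,
    or an empty list if the tokens are not connected.
    """
    # Build an undirected adjacency list from the head pointers.
    adj = {i: [] for i in range(len(heads))}
    for child, head in enumerate(heads):
        if head >= 0:
            adj[child].append(head)
            adj[head].append(child)
    # Breadth-first search from start, remembering each node's predecessor.
    prev = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == end:
            break
        for nxt in adj[node]:
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    if end not in prev:
        return []
    # Walk back from end to start to recover the path.
    path, node = [], end
    while node is not None:
        path.append(node)
        node = prev[node]
    return path[::-1]


def relation_features(tokens, heads, e1, e2):
    """Hypothetical feature templates for the entity pair (e1, e2).

    e1 and e2 are non-overlapping (start, end) token spans, end exclusive.
    Assumption: the last token of each span serves as its head word.
    """
    h1, h2 = e1[1] - 1, e2[1] - 1
    left, right = sorted([e1, e2])
    between = tokens[left[1]:right[0]]
    feats = {
        "e1_head=" + tokens[h1]: 1,
        "e2_head=" + tokens[h2]: 1,
        "e1_before_e2": int(e1[0] < e2[0]),  # relative position feature
        "token_distance": len(between),
    }
    # Bag of words between the two entities.
    for w in between:
        feats["between=" + w] = 1
    # Dependency features: path length and the words on the path.
    path = dependency_path(heads, h1, h2)
    feats["dep_path_len"] = max(len(path) - 1, 0)
    for i in path[1:-1]:
        feats["dep_path_word=" + tokens[i]] = 1
    return feats
```

The resulting feature dictionaries could then be vectorized, for example with scikit-learn's `DictVectorizer`, and passed to a linear SVM such as `LinearSVC`; that pipeline is one plausible choice, not necessarily the one used in the thesis.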
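Aspect (2) annotates unlabeled encyclopedia text with relation tuples taken from infoboxes. A common way to do this, often called distant supervision, is to label a sentence with relation r whenever it mentions both entities of a known tuple (e1, r, e2). The sketch below shows only that general heuristic under stated assumptions: plain substring matching stands in for entity recognition, the data are toy examples, and neither the thesis's re-processing of infobox tuples nor its frequency-difference-density selection of relation types is reproduced.

```python
from collections import defaultdict


def build_kb(tuples):
    """Index (e1, relation, e2) tuples by entity pair."""
    kb = defaultdict(set)
    for e1, rel, e2 in tuples:
        kb[(e1, e2)].add(rel)
    return kb


def weak_label(sentences, kb):
    """Label every sentence that mentions both entities of a known pair.

    Returns (sentence, e1, e2, relation) training instances.
    Assumption: entity mentions are found by plain substring search,
    a deliberate simplification of real entity recognition.
    """
    instances = []
    for sent in sentences:
        for (e1, e2), rels in kb.items():
            if e1 in sent and e2 in sent:
                for rel in sorted(rels):
                    instances.append((sent, e1, e2, rel))
    return instances
```

This heuristic is noisy, since a sentence can mention both entities without expressing the relation, so its output is best treated as weak training data for a classifier like the one reused from aspect (1).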
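Aspect (3) combines a relational word dictionary with automatically learned rules. The abstract does not give the dictionary structure or the rule language, so the following is only a hedged sketch of one simple pattern-matching style: a rule is the token sequence between two protein mentions, with dictionary words generalized to their relation class, and rules are pruned by support and precision on labeled examples. The dictionary entries, the `PROT` placeholder, and both thresholds are hypothetical.

```python
from collections import Counter

# Hypothetical relational word dictionary: surface word -> relation class.
REL_DICT = {"binds": "BIND", "interacts": "INTERACT", "phosphorylates": "MODIFY"}


def pattern(tokens, i, j):
    """Generalize the tokens between protein mentions at positions i < j.

    Dictionary words become their relation class; other words are kept
    (lowercased), so patterns without relational words are also possible.
    """
    middle = [REL_DICT.get(w.lower(), w.lower()) for w in tokens[i + 1:j]]
    return ("PROT",) + tuple(middle) + ("PROT",)


def learn_rules(examples, min_precision=0.5, min_support=2):
    """Learn rules from labeled (tokens, i, j, is_interaction) examples.

    Pruning: a pattern is kept only if it occurs at least min_support
    times and at least min_precision of its occurrences are positive.
    """
    pos, total = Counter(), Counter()
    for tokens, i, j, label in examples:
        p = pattern(tokens, i, j)
        total[p] += 1
        pos[p] += int(label)
    return {p for p in total
            if total[p] >= min_support and pos[p] / total[p] >= min_precision}


def extract(tokens, i, j, rules):
    """Predict an interaction if the pair's pattern matches a learned rule."""
    return pattern(tokens, i, j) in rules
```

Exact-match patterns keep the sketch short; a fuller rule learner would also generalize over gaps and word order, which this example does not attempt.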
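Aspect (4) lists knowledge updating and conflict resolution among the components of the knowledge graph but does not describe them. Purely as an illustration of one common policy, and not the thesis's schema or method, the sketch below stores triples with confidence scores and, for relations assumed to be single-valued, keeps whichever conflicting triple has the higher confidence. The class, relation names, and scores are hypothetical.

```python
class KnowledgeGraph:
    """Minimal triple store with confidence-based conflict resolution.

    Hypothetical sketch: relations listed in `functional` may hold only one
    object per subject; a conflicting update replaces the stored triple only
    if its confidence is strictly higher.
    """

    def __init__(self, functional=()):
        self.functional = set(functional)
        self.facts = {}  # (subject, relation, object) -> confidence

    def add(self, s, r, o, conf):
        """Insert or update a triple; return False if the update is rejected."""
        if r in self.functional:
            # Look for a different object already stored for (s, r).
            for (s2, r2, o2), c2 in list(self.facts.items()):
                if s2 == s and r2 == r and o2 != o:
                    if c2 >= conf:
                        return False  # keep the existing, more trusted fact
                    del self.facts[(s2, r2, o2)]
        key = (s, r, o)
        self.facts[key] = max(conf, self.facts.get(key, 0.0))
        return True

    def objects(self, s, r):
        """Return the sorted objects stored for subject s and relation r."""
        return sorted(o for (s2, r2, o) in self.facts if s2 == s and r2 == r)
```

Non-functional relations simply accumulate objects, so the policy only affects relations declared single-valued.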
Keywords/Search Tags: Open domain text, Entity recognition, Entity relation extraction, Knowledge graph