The Research On Entity Relation Extraction Based On Distant Learning

Posted on: 2018-06-06; Degree: Master; Type: Thesis
Country: China; Candidate: Z X Bao
GTID: 2348330518496944; Subject: Electronics and Communications Engineering
Abstract/Summary:
With the rapid development of information technology in the Web 2.0 era, the Internet, as the main carrier of information dissemination, has driven explosive growth in digital data, which contains a large amount of information that people care about. The central task of information extraction (IE), namely how to process large-scale unstructured data and turn it into structured information quickly and efficiently, has become a research hotspot. Entity relation extraction (ERE) is a key branch of IE. It both advances IE theory and has broad application value in practical engineering. Current ERE methods rely mainly on supervised or semi-supervised learning, which suffer from the high cost of acquiring training data and from poor generalization. Distant supervision (DS, rendered as "distant learning" in the title) alleviates these problems to a certain extent. This thesis studies ERE based on DS. The main contents are as follows.

First, we propose a DS-based ERE paradigm that integrates word vectors with a classification analysis of trigger words. We first extract features at different levels, then build conjunctive features using rich feature theory, and finally correct the relation predicted for each entity pair by the extraction system according to the most representative trigger words of each relation. Experimental results show that introducing word vectors together with the classification analysis of trigger words improves the overall prediction performance of the extraction system, that the improvement is independent of the number of entity occurrences, and that performance improves by 20.3% and 18.7% at the 150-entity-pair and 500-entity-pair levels, respectively.

Second, by analyzing the strict assumption underlying DS and summarizing earlier research, we propose a hierarchical topic model of sub-sequence mapping with a filtering mechanism. In the training step, the multi-layer topic model and sub-sequence mapping alleviate the sparsity and long-tail effect caused by word sequences with low incidence. In addition, the relation labels predicted by the generative model are filtered by a set of error markers to reduce the number of mislabeled samples. Experimental results demonstrate that the model effectively reduces the amount of mislabeled training data. Compared with the model integrating word vectors and the classification analysis of trigger words, this approach improves accuracy by 9.72% and remains stable when the number of entity pairs is large.

The main contributions and innovations of this thesis are as follows. The proposed algorithm integrating word vectors and trigger-word classification improves not only the prediction accuracy for high-frequency entities but also the overall accuracy of the whole system. The hierarchical topic model of sub-sequence mapping with a filtering mechanism effectively reduces errors and noisy data, thereby improving the performance of the extraction system.
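
The DS labeling assumption and the idea of filtering labels with error markers can be illustrated with a minimal Python sketch. This is not the thesis's implementation: the toy knowledge base `KB`, the `ERROR_MARKERS` sets, the example sentences, and the function names `distant_label` and `filter_by_markers` are all hypothetical, and the filter here is a simple keyword check rather than the generative topic model described in the abstract.

```python
# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Hypothetical error markers: cue words suggesting that a sentence
# mentions the entity pair without expressing the KB relation.
ERROR_MARKERS = {
    "born_in": {"visited", "vacationed"},
    "founder_of": {"sued", "criticized"},
}


def distant_label(sentences, kb):
    """Apply the DS assumption: any sentence containing both entities
    of a KB pair is labeled with that pair's relation."""
    labeled = []
    for sent in sentences:
        for (head, tail), rel in kb.items():
            if head in sent and tail in sent:
                labeled.append((sent, head, tail, rel))
    return labeled


def filter_by_markers(labeled, markers):
    """Drop labeled samples whose sentence contains an error marker
    for the assigned relation (a crude stand-in for noise filtering)."""
    kept = []
    for sent, head, tail, rel in labeled:
        tokens = set(sent.lower().split())
        if tokens & markers.get(rel, set()):
            continue  # likely a mislabeled sample, so skip it
        kept.append((sent, head, tail, rel))
    return kept


sentences = [
    "Barack Obama was born in Hawaii in 1961",
    "Barack Obama visited Hawaii last summer",
    "Steve Jobs founded Apple in 1976",
    "Hawaii is an island state",
]

labeled = distant_label(sentences, KB)
kept = filter_by_markers(labeled, ERROR_MARKERS)

for sent, head, tail, rel in kept:
    print(f"{rel}({head}, {tail}) <- {sent}")
```

In this toy run, the second sentence mentions both entities but does not express `born_in`; the DS assumption labels it anyway, and the marker filter removes it. This is the kind of mislabeling that the strict DS assumption produces and that the filtering mechanism aims to reduce.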
Keywords/Search Tags:entity relation extraction, distant learning, classification of trigger words, generative model