
Research on CNN-Based Relation Extraction

Posted on: 2019-11-18    Degree: Master    Type: Thesis
Country: China    Candidate: J J Zhang    Full Text: PDF
GTID: 2428330548979784    Subject: Computer Science and Technology
Abstract/Summary:
The internet holds an extremely rich body of data, but most of it is stored in unstructured form and cannot be used effectively. Relation extraction transforms unstructured natural language into structured information, which supports higher-level tasks such as knowledge base construction, semantic reasoning and multi-round dialogue. Building on earlier steps such as entity recognition and entity linking, relation extraction identifies the semantic relations expressed in natural-language text so that structured information can be obtained. This thesis proposes a convolutional neural network model based on data priors. The main contributions are as follows:

1. Comprehensive use of a variety of training data. Starting from a small structured knowledge base and a large-scale unstructured corpus, the model uses distant supervision to generate training samples automatically. However, distant supervision involves no manual labeling, and the data it generates often contains a lot of noise. In this thesis, a pre-trained model and rule-based filtering are used to obtain filtered samples. Together with a small number of manually annotated samples, the model therefore uses three kinds of training data.

2. Building multiple networks according to data source. Manually annotated samples are the most accurate, but their number is limited. The distantly supervised samples contain a lot of noise. The filtered samples are moderate in number but carry a certain bias. This thesis constructs a separate network for each source of training samples and combines them with posterior regularization to obtain the posterior probability of the fused data.

3. Using multi-instance learning to fuse the networks and reduce noise. For the networks built on different data sources, the model uses multi-instance learning to group samples from different sources into bags and updates parameters at the bag level. In this way, learning is biased toward the manually annotated and filtered samples, and the noise from distantly supervised samples is reduced. The accuracy and robustness of the model are significantly improved.

The convolutional neural network based on data priors addresses the problems of data sources and noise. The paper describing this algorithm was published at SIGIR 2017, a top conference in the field of information retrieval, and has attracted wide attention from researchers. Experimental results show that the proposed algorithm achieves the best results on the TAC-KBP dataset, improving accuracy by more than 8% over the best existing methods. The algorithm has been applied in tasks such as knowledge base construction, the China Engineering Science and Technology Knowledge Center and multi-round dialogue, which further illustrates its effectiveness.
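
The distant-supervision and bag-level steps summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation. The toy knowledge base `KB`, the helper names, the `SOURCE_WEIGHT` values and the "pick the highest-scoring instance per bag" rule are all assumptions made for illustration. The selection rule is a common multi-instance heuristic swapped in here, not the posterior-regularized fusion the thesis describes.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {("Barack Obama", "Hawaii"): "born_in"}

# Assumed trust weights per data source (illustrative values only).
SOURCE_WEIGHT = {"manual": 1.0, "filtered": 0.7, "distant": 0.3}


def distant_label(sentences, kb):
    """Distant supervision: any sentence mentioning both entities of a
    KB fact is labeled with that fact's relation, with no human check."""
    samples = []
    for text in sentences:
        for (head, tail), relation in kb.items():
            if head in text and tail in text:
                samples.append({
                    "text": text,
                    "pair": (head, tail),
                    "relation": relation,
                    "source": "distant",
                })
    return samples


def build_bags(samples):
    """Group samples that share an entity pair and relation into one bag."""
    bags = defaultdict(list)
    for sample in samples:
        bags[(sample["pair"], sample["relation"])].append(sample)
    return dict(bags)


def select_instance(bag, score_fn):
    """Pick the instance with the highest source-weighted score.
    A bag-level update would then use only this instance, which biases
    learning toward the more trusted sources."""
    return max(bag, key=lambda s: SOURCE_WEIGHT[s["source"]] * score_fn(s))
```

In this sketch, the sentence "Barack Obama visited Hawaii last year." would also be labeled `born_in`, which is the kind of noise the abstract refers to. Here `score_fn` stands in for a model's confidence in the bag's relation. Weighting by source before selecting the representative instance is one simple way to favor manually annotated and filtered samples over distantly supervised ones.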
Keywords/Search Tags: Relation Extraction, Convolutional Neural Network, Information Retrieval, Posterior Regularization