Research And Application Of Distant Supervision Relation Extraction Based On Deep Learning

Posted on: 2024-09-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Xu
Full Text: PDF
GTID: 2568307127454194
Subject: Computer Science and Technology
Abstract/Summary:
Relation extraction is a fundamental task in natural language processing and one of the most important subtasks of information extraction. It aims to identify the semantic relations between marked entities in text and to output them in a structured form that supports downstream tasks such as knowledge graphs and question answering. With the development of deep learning, supervised methods have been applied to relation extraction with satisfactory results, but they depend heavily on large-scale, high-quality manually annotated datasets, which are costly in labor and time. By aligning a knowledge base with a corpus, distant supervision allows large-scale data to be annotated automatically, which has strongly supported research on relation extraction. However, the distant supervision assumption is so strong that it introduces mislabeling and a long-tail phenomenon, so distant supervision relation extraction still faces many challenges. Based on deep learning, this thesis focuses on the noise problem and the long-tail problem. The main research work and innovations of this thesis are as follows:

(1) Distant supervision relation extraction methods based on the multi-instance learning framework mostly predict at the bag level and denoise effectively, but they perform unsatisfactorily on sentence-level prediction. This thesis proposes a novel distant supervision relation extraction method in which the model is trained at the sentence level via negative learning and selective positive learning, which avoids over-fitting to noisy data and speeds up convergence on clean data. To transform noisy instances into useful data, the data are reconstructed by noise filtering and re-labeling according to label confidence, which refines the quality of the distantly supervised data and further enhances model performance. Experimental results on the NYT dataset show that the F1 scores of the proposed method on the dev set, the test set, and the noise-annotated test set exceed those of all compared methods, which verifies its ability to reduce noise and to extract relations at the sentence level.

(2) To address the long-tail problem, a sentence-level relation extraction model enhanced with prototypes and entity types is proposed. Following the idea of prototypes, the model applies an attention mechanism over prototype embeddings to enhance sentence representations, and it also introduces entity type information. Instances of different relations are thereby implicitly connected through prototypes and entity types, which helps enhance the representations of instances of long-tail relations. On this basis, a constraint graph is used to explicitly model the potential dependencies between relations and is encoded by graph convolutional networks to promote the propagation of information between relation nodes, which alleviates the lack of data for long-tail relations. Comparative and ablation experiments on the NYT and Re-TACRED datasets show that the proposed model improves the extraction of long-tail relations under different types and proportions of noise.

(3) Due to the complexity of Chinese and the lack of Chinese datasets, research on Chinese relation extraction is still inadequate and leaves much room for development. To provide a reference for research in this field and to verify the feasibility of the proposed methods on Chinese text, an open system for Chinese relation extraction is designed and implemented. The system contains three main functional modules: relation extraction, data annotation, and knowledge graph query and visualization. Under the distant supervision assumption, the system annotates datasets automatically by aligning knowledge graphs and then trains models on the annotated data. Test results indicate that the system has some practical value.

In summary, this thesis proposes solutions to the noise problem and the long-tail problem in distant supervision relation extraction, which effectively alleviate the impact of noisy data on sentence-level relation extraction models and improve model performance on long-tail relations. Furthermore, an open system for Chinese relation extraction is designed and implemented based on the proposed methods and algorithms.
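
The automatic annotation that the abstract attributes to distant supervision (aligning a knowledge base with a corpus) can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Triple` type, the `distant_label` function, the substring-based entity matching, and the toy data are all assumptions made for illustration only. The sketch also shows where mislabeling comes from, since any sentence that mentions both entities receives the knowledge base relation whether or not it expresses it.

```python
from typing import List, NamedTuple, Tuple


class Triple(NamedTuple):
    """A knowledge base fact (hypothetical structure for this sketch)."""
    head: str
    relation: str
    tail: str


def distant_label(sentences: List[str], kb: List[Triple]) -> List[Tuple[str, str, str, str]]:
    """Label a sentence with a KB relation whenever it mentions both entities.

    This is the distant supervision assumption in its simplest form;
    entity matching here is plain substring search (an assumption).
    """
    labeled = []
    for sent in sentences:
        for fact in kb:
            if fact.head in sent and fact.tail in sent:
                labeled.append((sent, fact.head, fact.tail, fact.relation))
    return labeled


# Toy data, invented for illustration.
KB = [Triple("Barack Obama", "place_of_birth", "Honolulu")]
SENTENCES = [
    "Barack Obama was born in Honolulu.",        # expresses the relation
    "Barack Obama visited Honolulu last week.",  # does not, but is still labeled
    "Honolulu is the capital of Hawaii.",        # only one entity: left unlabeled
]

LABELED = distant_label(SENTENCES, KB)
for row in LABELED:
    print(row)
```

In this toy run the second sentence is labeled `place_of_birth` even though it only describes a visit, which is the kind of noisy instance that sentence-level denoising is meant to handle.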
Keywords/Search Tags: Distant Supervision Relation Extraction, Noise Reduction, Long-Tail Problem, Attention Mechanism, Knowledge Graph