Distantly Supervised Relation Extraction Method And Its Application Based On Deep Learning

Posted on: 2022-10-01
Degree: Master
Type: Thesis
Country: China
Candidate: S Z Yang
GTID: 2518306569981759
Subject: Software engineering
Abstract/Summary:
Relation extraction is a fundamental task in natural language processing (NLP) that aims to obtain the triples expressed in a sentence. However, traditional supervised methods rely on manually annotated datasets, which are costly in labor and time. To extract relations automatically, researchers have proposed distant supervision, which uses existing information to construct relation extraction datasets at reduced cost. Although distant supervision makes relation extraction more convenient, its vanilla hypothesis brings two major problems: wrong labeling and the long-tail phenomenon. This thesis therefore focuses on alleviating the negative impact of wrong labeling on distantly supervised relation extraction while weakening the impact of the long-tail phenomenon on the model. The work is reflected mainly in the following four aspects:

(1) To alleviate the impact of wrong labeling and the long-tail problem, this thesis proposes, based on the At-Least-One hypothesis, a distantly supervised relation extraction architecture that fuses data augmentation with superbag representation. It comprises a data augmentation method oriented to long-tail relations and a superbag-level distantly supervised relation extraction model based on deep clustering.

(2) To address the long-tail problem, this thesis analyzes the dataset and the distribution of its instances, and proposes a data augmentation method oriented to long-tail relations. The method alleviates the class imbalance of the dataset and thereby improves performance on long-tail relations.

(3) To address the wrong-labeling problem, this thesis analyzes the limitations of existing methods and proposes a method based on deep clustering whose training unit is the superbag. It improves on the classical methods whose training unit is the bag. Because the added deep clustering module reduces the influence of noise, the performance and robustness of the model are improved.

(4) An intelligent annotation system based on distant supervision is designed and implemented. The system integrates the distantly supervised relation extraction model proposed in this thesis so that data can be pre-labeled before manual labeling. With this module, the path of each labeling operation is shortened and the work efficiency of the annotators is improved.
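
As a rough illustration of the distant supervision setting described above (not the architecture proposed in the thesis), the following minimal sketch labels sentences by matching them against a knowledge base and groups them into bags by entity pair. The knowledge base, sentences, and the function name `build_bags` are hypothetical and chosen only for this example; entity matching is done by naive substring search.

```python
from collections import defaultdict

# Hypothetical toy knowledge base: (head entity, tail entity) -> relation.
KB = {
    ("Barack Obama", "Hawaii"): "born_in",
    ("Steve Jobs", "Apple"): "founder_of",
}

# Hypothetical unlabeled corpus.
SENTENCES = [
    "Barack Obama was born in Hawaii in 1961.",
    "Barack Obama visited Hawaii last winter.",  # mentions the pair, not the relation
    "Steve Jobs co-founded Apple in 1976.",
    "Tim Cook leads Apple today.",               # only one entity of any KB pair
]


def build_bags(kb, sentences):
    """Assign each sentence the KB relation of every entity pair it mentions.

    Sentences sharing the same (head, tail, relation) key form one bag,
    which is the training unit of classical bag-level methods.
    """
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            # Naive substring matching stands in for real entity linking.
            if head in sentence and tail in sentence:
                bags[(head, tail, relation)].append(sentence)
    return dict(bags)


bags = build_bags(KB, SENTENCES)
for (head, tail, relation), instances in bags.items():
    print(f"{head} / {tail} / {relation}: {len(instances)} instance(s)")
```

In this toy corpus the second sentence is placed in the `born_in` bag even though it does not express that relation, which is the wrong-labeling problem. The At-Least-One hypothesis only assumes that at least one sentence in each bag expresses the labeled relation.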
Keywords/Search Tags:Distant Supervision, Wrong Labeling, Long Tail, Data Augmentation, Deep Cluster