Research And Implementation Of Uyghur Personal Relation Extraction Based On Distant Supervision

Posted on: 2022-07-14, Degree: Master, Type: Thesis
Country: China, Candidate: X Y Guo, Full Text: PDF
GTID: 2505306542455454, Subject: Master of Engineering
Abstract/Summary:
With the rapid growth of the Internet, retrieving valuable information from massive amounts of data is no easy task, and information extraction technology has developed rapidly in response. Relation extraction is an important subtask of information extraction with significant research value and broad application prospects, and in recent years it has become a hot research topic in natural language processing. Research on personal relationship extraction in Chinese and English has matured and achieved good results. For Uyghur, however, the work started late and resources are scarce, so personal relationship extraction needs further research. In this paper, we use the distant supervision method to construct a Uyghur data set. To reduce the influence of noisy data on personal relationship extraction, we use a Double-layer Two-dimensional Structured Self-Attention (DTSSA) model to improve extraction performance. Finally, a personal relationship search system is constructed. The main work and innovations of this paper are as follows:

(1) To address the lack of an annotated data set for Uyghur personal relationship extraction, this paper adopts distant supervision, aligning free text with a knowledge base to generate an annotated data set. First, free texts in Uyghur are collected from Tianshan.com, People's Daily and other websites. Second, a large amount of Wikipedia entry data is crawled and organized into relational triples that form a knowledge base, which serves as the source of supervision. Then, the free text and the knowledge base are matched and aligned to obtain the corresponding relationship labels, so that a rich Uyghur annotated data set is generated automatically for the subsequent personal relationship extraction experiments.

(2) To address the heavy noise introduced by distant supervision and the incomplete feature selection, a DTSSA model based on a double-layer self-attention mechanism is proposed. The model uses multi-instance learning and combines BiLSTM with a two-layer self-attention mechanism to extract relations. The word-level and sentence-level attention mechanisms are extended from the traditional one-dimensional vector representation to a two-dimensional structured matrix representation. On the one hand, BiLSTM can better learn bidirectional contextual semantic features. On the other hand, the improved two-dimensional word-level self-attention mechanism can focus on more aspects of a sentence instance, and the two-dimensional sentence-level self-attention mechanism can better select effective instances and reduce the impact of noisy instances. Experiments on the Uyghur personal relationship extraction annotated data set show that the DTSSA model improves P@N accuracy and F1 score, which demonstrates the effectiveness of the model.

(3) Based on the above theoretical research, a Uyghur personal relationship search system based on the B/S (browser/server) architecture was designed and implemented. The system visualizes the relationships between persons as a graph, and the addition and modification functions it opens to users further enrich the existing knowledge base, thereby improving the quality of the data set.
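
The alignment step described in (1) can be illustrated with a minimal Python sketch. It is not the thesis implementation: the triples, the English example sentences, and the function names `build_index` and `label_sentences` are hypothetical, and plain substring matching stands in for whatever entity matching the author actually used on Uyghur text. Sentences that mention both entities of a knowledge-base triple are grouped into a bag labelled with that triple's relation, which is the usual input to multi-instance learning.

```python
from collections import defaultdict

# Hypothetical knowledge base of (head, relation, tail) triples.
KB_TRIPLES = [
    ("Alice", "spouse", "Bob"),
    ("Alice", "parent", "Carol"),
]


def build_index(triples):
    """Map each (head, tail) entity pair to the relations the KB records for it."""
    index = defaultdict(set)
    for head, relation, tail in triples:
        index[(head, tail)].add(relation)
    return index


def label_sentences(sentences, triples):
    """Group sentences into bags keyed by (head, relation, tail).

    A sentence joins a bag whenever it mentions both entities of a triple.
    Substring matching is an assumption made to keep the sketch short.
    """
    index = build_index(triples)
    bags = defaultdict(list)
    for sentence in sentences:
        for (head, tail), relations in index.items():
            if head in sentence and tail in sentence:
                for relation in sorted(relations):
                    bags[(head, relation, tail)].append(sentence)
    return dict(bags)


# Hypothetical free text standing in for the crawled news sentences.
sentences = [
    "Alice married Bob in 2010.",
    "Alice and Bob attended the same conference.",
    "Carol is the daughter of Alice.",
    "Dave lives in Urumqi.",
]

bags = label_sentences(sentences, KB_TRIPLES)
for triple, bag in bags.items():
    print(triple, len(bag))
```

In this toy run the second sentence lands in the "spouse" bag even though it does not express that relation. This is the kind of label noise that the sentence-level attention described in (2) is meant to down-weight.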
Keywords/Search Tags: Personal relationship extraction, Deep learning, Distant supervision, Attention mechanism