Research On Person Relationship Recognition In News Text Based On Remote Supervision

Posted on: 2022-02-23
Degree: Master
Type: Thesis
Country: China
Candidate: Q Ran
GTID: 2518306524952529
Subject: Software engineering
Abstract/Summary:
In the Internet era, information of all kinds is growing explosively. Retrieving the information users need quickly and accurately from this volume has become an urgent need, and information extraction emerged in response. Relation extraction is a subtask of information extraction, and personal relationship extraction is one of its specific research directions. News texts, one of the main sources from which people obtain information, are mostly scattered across the Internet in unstructured or semi-structured form and may contain a large number of person entities and personal relationships. How to accurately extract the relationships between person entities from massive news texts has therefore become a major research topic. Personal relationships obtained from Internet news texts can be used to construct personal relationship graphs and knowledge bases; in intelligence collection tasks they support analysis of the relationship network of specific people, and in public opinion tasks they can help monitor public opinion.

Training a model with supervised relation extraction requires a large manually labeled data set, and manual labeling is slow and costly. Distant supervision addresses this by labeling data automatically, but it introduces noise and data sparsity. With these two problems in mind, this thesis studies personal relationship recognition based on distant supervision. The research content is as follows.

First, a Chinese personal relationship recognition method based on Tong Yi Ci Ci Lin and rules. The method computes the cosine distance between the word vector of each relationship trigger word in Tong Yi Ci Ci Lin (a word that can represent a personal relationship) and the embedding vectors of all words in the distant supervision training set. Words with a smaller cosine distance, and therefore greater relevance to the trigger word, are selected to expand the synonyms in the corresponding word cluster of Tong Yi Ci Ci Lin. Sentence-pattern rules specific to Chinese personal relationships are then combined with the idea of multi-instance learning to identify a single personal relationship, and finally multi-relation prediction is performed over all sentences in a bag to obtain the final recognition result.

Second, a Chinese personal relationship recognition method based on BERT-BiLSTM. Building on the method above, the distantly supervised personal relationship data set is denoised. Specifically, false negative sentences in the training set are found and used to expand the positive samples, and false positive sentences are then removed from the training set. The denoised positive samples are augmented through Chinese-Japanese translation. The denoised and augmented data is fed into the BERT-BiLSTM model, which is combined with the focal loss function to adjust the model's attention to positive and negative samples, yielding improved personal relationship prediction results.

Third, a prototype system built on the preceding work that identifies and displays personal relationships in distantly supervised news text. The system can process news corpora containing person entity pairs that are either supplied by users or crawled from news websites via distant supervision. The corpus is preprocessed based on the existing work and fed into the model, which identifies the relationships between people; the resulting relationships are then displayed to users as text and graphs.

Experiments on personal relationship recognition were designed and conducted on a public Chinese distant supervision personal relationship extraction data set, and the prototype system was built according to the proposed method. The experiments demonstrate the feasibility and accuracy of the distant supervision personal relationship recognition method proposed in this thesis.
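
The trigger-word expansion step described in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the function names, the `max_distance` and `top_k` parameters, and the toy three-dimensional vectors (with English words standing in for Chinese vocabulary) are all assumptions made for illustration. The sketch only shows the general idea of adding words to a synonym cluster when their embedding lies within a small cosine distance of a trigger word.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def expand_trigger_cluster(trigger, cluster, embeddings,
                           max_distance=0.2, top_k=5):
    """Return the cluster plus the closest corpus words to the trigger word.

    max_distance and top_k are hypothetical tuning knobs, not values
    taken from the thesis.
    """
    trigger_vec = embeddings[trigger]
    candidates = []
    for word, vec in embeddings.items():
        if word == trigger or word in cluster:
            continue  # skip words that are already in the cluster
        dist = cosine_distance(trigger_vec, vec)
        if dist <= max_distance:
            candidates.append((dist, word))
    candidates.sort()  # smallest distance (most relevant) first
    return set(cluster) | {word for _, word in candidates[:top_k]}


# Toy embeddings invented for this example (assumption, not real data).
embeddings = {
    "father": [0.90, 0.10, 0.00],
    "dad": [0.88, 0.15, 0.02],
    "papa": [0.80, 0.30, 0.10],
    "colleague": [0.10, 0.90, 0.20],
    "stadium": [0.00, 0.10, 0.95],
}

expanded = expand_trigger_cluster("father", {"father"}, embeddings)
print(sorted(expanded))
```

With these toy vectors, only the words whose direction is close to that of the trigger word pass the threshold; unrelated words stay outside the cluster. In practice the threshold would need tuning against the actual embedding space.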
Keywords/Search Tags: News text, Distant supervision, Noise, Personal relationship, Relation recognition