Research on Chinese Character Relation Extraction Based on Deep Learning

Posted on: 2022-06-02
Degree: Master
Type: Thesis
Country: China
Candidate: M J Zhang
GTID: 2518306494480644
Subject: Applied Statistics
Abstract/Summary:
With the continuous innovation of science and technology, information on the Internet is growing at an unprecedented rate, and humanity has formally entered the era of "Big Data". At present, information on the Internet exists mainly in the form of text, and faced with such a large volume of text, people usually cannot obtain valuable information quickly. Information extraction technology can solve this problem: it transforms unstructured text into structured information and integrates it in a unified form. Relation extraction is the core task of information extraction. Through relation extraction, entities are identified in the text and the semantic relationships between them are extracted, which can support upper-level applications such as search, question answering, and reasoning.

In recent years, character relation extraction has become one of the research hotspots in the field. This task restricts the entity type to characters (that is, persons): it recognizes character entities in the text, establishes the semantic relationships between them, and then builds a character relationship network, which can be used for applications such as character relationship display, interpersonal relationship mining, and social network analysis. Some scholars have already studied Chinese character relation extraction, but existing work still has two deficiencies. First, public labeled datasets for Chinese character relation extraction are lacking, and it is difficult to obtain a large labeled Chinese corpus. Distant supervision is a common way to build large labeled datasets automatically. Although this method solves the annotation problem, it also introduces a large amount of noisy data, which degrades the performance of character relation extraction. Second, current Chinese character relation extraction mostly uses traditional machine learning methods, which makes the models over-reliant on feature engineering. A large number of features must be designed manually, and the useful features differ from one scenario to another, which limits the generality of Chinese character relation extraction models.

To address these shortcomings of existing methods, this thesis proposes corresponding solutions. Its main contributions are as follows.

First, because public labeled datasets for Chinese character relation extraction are lacking, this thesis uses an existing knowledge base and an Internet corpus to automatically construct a rich labeled dataset for Chinese character relation extraction through distant supervision.

Second, because current Chinese character relation extraction mostly relies on traditional machine learning methods and feature engineering, this thesis adopts three deep learning models, CNN, PCNN, and Att-BiLSTM, which extract features from the text automatically. No features need to be designed by hand, which improves the versatility of the models.

Finally, a two-stage optimization method is proposed to deal with the noisy training data produced by distant supervision. In the training stage, this thesis proposes a noise filtering method based on a reading comprehension model: the reading comprehension model identifies noisy samples, which are then removed, and three improved Chinese character relation extraction models are trained on the denoised dataset. In the prediction stage, this thesis proposes an ensemble learning method based on multi-classifier voting, which combines the outputs of the three deep learning models to obtain better predictions.
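The distant supervision step in the first contribution can be illustrated with a minimal Python sketch, assuming the simplest form of the heuristic: whenever a knowledge base records a relation between two people, every sentence that mentions both of them is labeled with that relation. The knowledge-base entries, sentences, relation names, and the `distant_label` function below are hypothetical toy examples, not the thesis's actual knowledge base, corpus, or matching procedure.

```python
from typing import Dict, List, Tuple

# Hypothetical knowledge-base facts: (head person, tail person) -> relation label.
KB: Dict[Tuple[str, str], str] = {
    ("张三", "李四"): "spouse",
    ("张三", "张小明"): "parent",
}

# Hypothetical unlabeled corpus sentences.
SENTENCES: List[str] = [
    "张三和李四于2010年结婚。",  # "Zhang San and Li Si married in 2010." (expresses the relation)
    "张三在会议上批评了李四。",  # "Zhang San criticized Li Si at the meeting." (noisy: not about marriage)
    "张小明今年上小学。",        # "Zhang Xiaoming starts primary school this year." (one person only)
]


def distant_label(
    sentences: List[str], kb: Dict[Tuple[str, str], str]
) -> List[Tuple[str, str, str, str]]:
    """Label each sentence that mentions both people of a KB pair with that pair's relation.

    Mentions are found by plain substring matching, an assumption made for
    brevity; a real pipeline would typically use a named-entity recognizer.
    """
    labeled = []
    for sentence in sentences:
        for (head, tail), relation in kb.items():
            if head in sentence and tail in sentence:
                labeled.append((sentence, head, tail, relation))
    return labeled


labeled = distant_label(SENTENCES, KB)
# labeled holds two (sentence, head, tail, "spouse") tuples, including the noisy second sentence.
```

The second sentence shows where the noise comes from: it mentions both people but says nothing about marriage, yet it still receives the label `spouse`. Noisy samples of this kind are what the reading comprehension filter in the training stage is meant to remove.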
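Of the three models, PCNN differs from a plain CNN mainly in its pooling layer: rather than taking one maximum over the whole sentence, it splits the convolution output into three segments at the two entity positions and max-pools each segment separately, so the pooled features keep some information about where they occur relative to the two people. The sketch below shows only that pooling step in plain Python. The convolution layer, embeddings, and the nonlinearity usually applied after pooling are omitted, and the `piecewise_max_pool` name, the toy feature map, and the segment-boundary convention are assumptions rather than details taken from the thesis.

```python
from typing import List


def piecewise_max_pool(conv_out: List[List[float]], e1: int, e2: int) -> List[float]:
    """Piecewise max pooling as used in PCNN.

    conv_out[t][k] is the response of filter k at token position t.
    e1 < e2 are the token positions of the two person entities.
    The sentence is cut into three segments, [0, e1], [e1 + 1, e2] and
    [e2 + 1, end); assigning each entity token to the segment on its left
    is an assumed convention. Each segment is max-pooled per filter, giving
    3 * num_filters features instead of num_filters for ordinary max pooling.
    """
    if not 0 <= e1 < e2 < len(conv_out) - 1:
        raise ValueError("need 0 <= e1 < e2 < seq_len - 1 so every segment is non-empty")
    segments = [conv_out[: e1 + 1], conv_out[e1 + 1 : e2 + 1], conv_out[e2 + 1 :]]
    pooled: List[float] = []
    for segment in segments:
        # zip(*segment) transposes rows (tokens) into columns (filters).
        pooled.extend(max(column) for column in zip(*segment))
    return pooled


# Toy feature map: 6 token positions x 2 filters; entities at positions 1 and 3.
feature_map = [[1, 0], [3, -1], [2, 5], [0, 4], [6, 2], [1, 1]]
pooled = piecewise_max_pool(feature_map, e1=1, e2=3)
# pooled == [3, 0, 2, 5, 6, 2]; ordinary max pooling would give only [6, 5].
```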
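The multi-classifier voting in the prediction stage can be sketched as hard majority voting over the labels predicted by the CNN, PCNN, and Att-BiLSTM models. The abstract does not state whether votes are weighted, whether class probabilities are combined, or how ties are broken, so this sketch assumes unweighted label voting with ties going to the first-listed model. The `majority_vote` function and the toy predictions are hypothetical.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-model label sequences by unweighted majority voting.

    predictions[m][i] is model m's predicted relation for sample i.
    Ties go to the label of the earliest model in `predictions` (an assumption),
    because Counter.most_common lists equal counts in first-encountered order.
    """
    if not predictions:
        raise ValueError("need at least one model's predictions")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")
    # zip(*predictions) yields, for each sample, the tuple of labels from all models.
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*predictions)]


# Hypothetical predictions for four samples from the three models.
cnn = ["spouse", "parent", "none", "friend"]
pcnn = ["spouse", "none", "none", "colleague"]
att_bilstm = ["parent", "parent", "friend", "teacher"]

ensemble = majority_vote([cnn, pcnn, att_bilstm])
# ensemble == ["spouse", "parent", "none", "friend"]; the last sample is a three-way tie.
```

With three models, a tie can only occur when all three disagree, as for the last sample above, so the tie-breaking rule matters mainly in that case.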
Keywords/Search Tags: relation extraction, distant supervision, deep learning, reading comprehension model, multi-classifier voting