
Research On Chinese Relation Extraction For Complex Text Structure

Posted on: 2022-02-06    Degree: Master    Type: Thesis
Country: China    Candidate: Z R He    Full Text: PDF
GTID: 2518306554471354    Subject: Software engineering

Abstract/Summary:
Information extraction is an important branch of natural language processing; its purpose is to extract structured data from unstructured or semi-structured text. One of its most critical sub-tasks is relation extraction. Deep-learning-based relation extraction methods fall into two categories: pipeline extraction and joint extraction. Traditional methods, however, perform poorly on text with complex structure; they often cannot handle overlapping relations or the noisy information produced during extraction. This thesis studies Chinese entity relation extraction for complex text structures and proposes two optimization schemes, one for the traditional pipeline method and one for the joint learning framework. The main research and innovation contents are as follows:

(1) A pipeline extraction method based on LSTM and similarity calculation is designed. When the pipeline method is used for entity recognition and relation classification, the association between entities and relations is split apart, and in complex texts with overlapping relations the extraction results can be strongly affected by noise. An LSTM trained on labeled corpus data can extract specific entity objects more accurately, and combining it with a joint entity-relation tagging strategy keeps the extraction from becoming overly pipelined. In the experiments, a neural network model first performs named entity recognition, and sentence-level attention is then applied on top of a traditional LSTM for relation classification. Dependency grammar is introduced to extract structured entity relations and enrich the semantic features, and the classification weights of the relations are adjusted according to a similarity calculation. Verification shows that the F1 score of this optimized method is 2.76% higher than that of the basic LSTM model on Chinese datasets, and it obtains the highest score across the different datasets. The experimental results show that the method can reduce the influence of noise in the text and achieves a good optimization effect.

(2) A joint relation extraction method based on dilated convolution and character-word mixed embedding is designed. Following the idea of sequence-to-sequence decoding, the main strategy of the model is to predict the object entity directly from the subject entity. First, the characters and words of the input text are encoded separately, the resulting character and word vectors are mixed and embedded, and position information is added to the input sequence. The feature vectors are then fed into a dilated convolutional neural network for iterative training. The decoding of Chinese characters is optimized with a half-pointer, half-tagging structure: the subject entity is used to predict the object entities corresponding to each relation, and a self-attention mechanism is added to reduce the impact of noisy information. The experimental results show that the F1 score of the model on Chinese datasets is 1.88% higher than that of the control model, its precision on the public datasets reaches 87.6%, and its recall is excellent across different datasets. The results show that this joint extraction method not only simplifies the extraction process and solves the relation overlap problem, but also exhibits better robustness and generality on Chinese corpora containing multiple relations.
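To make scheme (1) concrete, the following is a minimal sketch of an LSTM relation classifier with sentence-level attention, written in PyTorch. It is not the thesis implementation: the class name, layer sizes, the max-pooled sentence representation, and the learned attention query are illustrative assumptions, and the dependency-grammar features and similarity-based weight adjustment described above are omitted.

```python
# Illustrative sketch only: a BiLSTM relation classifier with sentence-level
# attention over a bag of sentences that mention the same entity pair.
import torch
import torch.nn as nn


class AttentionLSTMRelationClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_relations):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Bidirectional LSTM encodes each sentence of the bag.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                            bidirectional=True)
        # Sentence-level attention: score each sentence against a learned
        # query so that noisy sentences contribute less to the decision.
        self.attn_query = nn.Parameter(torch.randn(2 * hidden_dim))
        self.classifier = nn.Linear(2 * hidden_dim, num_relations)

    def forward(self, token_ids):
        # token_ids: (num_sentences, seq_len), one bag per entity pair.
        embedded = self.embedding(token_ids)
        outputs, _ = self.lstm(embedded)             # (S, L, 2H)
        sent_repr = outputs.max(dim=1).values        # (S, 2H) sentence vectors
        scores = sent_repr @ self.attn_query         # (S,) attention scores
        weights = torch.softmax(scores, dim=0)       # sentence-level weights
        bag_repr = (weights.unsqueeze(1) * sent_repr).sum(dim=0)  # (2H,)
        return self.classifier(bag_repr)             # relation logits
```

In this sketch, down-weighting noisy sentences through the softmax attention weights plays the noise-reduction role that the thesis attributes to sentence-level attention.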
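For scheme (2), the sketch below illustrates the general shape of a dilated-convolution encoder over character-word mixed embeddings with a half-pointer, half-tagging head that predicts object start and end positions for each relation, conditioned on the subject span. Again this is an assumption-laden illustration, not the thesis code: the positional encoding and self-attention layer are omitted, and all names and dimensions are hypothetical.

```python
# Illustrative sketch only: dilated 1-D convolutions plus a half-pointer,
# half-tagging object prediction head conditioned on the subject entity.
import torch
import torch.nn as nn


class DilatedJointExtractor(nn.Module):
    def __init__(self, char_vocab, word_vocab, dim, num_relations):
        super().__init__()
        self.char_embed = nn.Embedding(char_vocab, dim)
        self.word_embed = nn.Embedding(word_vocab, dim)
        # Stacked convolutions with growing dilation widen the receptive
        # field without pooling, so token positions are preserved.
        self.convs = nn.ModuleList([
            nn.Conv1d(dim, dim, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4)
        ])
        # Half-pointer, half-tagging head: per-relation sigmoid scores that
        # mark the start and end characters of the object entity.
        self.obj_start = nn.Linear(dim, num_relations)
        self.obj_end = nn.Linear(dim, num_relations)

    def forward(self, char_ids, word_ids, subj_span):
        # Character-word mixed embedding: each character is summed with the
        # embedding of the word it belongs to (word_ids aligned per character).
        x = self.char_embed(char_ids) + self.word_embed(word_ids)  # (B, L, D)
        h = x.transpose(1, 2)                                      # (B, D, L)
        for conv in self.convs:
            h = torch.relu(conv(h)) + h           # residual dilated conv block
        h = h.transpose(1, 2)                     # (B, L, D)
        # Condition every position on the subject entity representation.
        s, e = subj_span
        subj_repr = h[:, s:e + 1].mean(dim=1, keepdim=True)        # (B, 1, D)
        h = h + subj_repr
        start_scores = torch.sigmoid(self.obj_start(h))            # (B, L, R)
        end_scores = torch.sigmoid(self.obj_end(h))                # (B, L, R)
        return start_scores, end_scores
```

Because every relation gets its own start/end pointer scores, one subject can yield objects under several relations at once, which is how this style of decoding handles overlapping relations.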
Keywords/Search Tags: Relation extraction, Deep learning, Long short-term memory network, Dilated convolution, Attention mechanism