
Research On Chinese Information Extraction Based On Deep Learning

Posted on: 2020-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: M Jiang
Full Text: PDF
GTID: 2428330596973172
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of information extraction technology driven by advances in computer science, named entity recognition and entity relationship extraction have become its main tasks. Named entity recognition extracts important entities such as person names, place names, and organization names from unstructured text; entity relationship extraction identifies the semantic relationships between entities mentioned in the text. As fundamental tasks, they are the foundation of higher-level applications such as machine translation and machine question answering, and therefore have far-reaching significance and influence. The main research results of the thesis are as follows:

(1) Traditional machine learning methods need a large number of features to ensure accuracy; their feature templates depend heavily on manual effort and expert knowledge, and they have difficulty capturing long-distance dependencies. To address these problems, a BERT-BiLSTM-CRF deep learning network architecture was constructed. The BERT model extracts text features from the corpus, which serve as the input of the BiLSTM network. The BiLSTM overcomes the difficulty of capturing long-distance dependencies and enables effective prediction of tag sequences, and the CRF learns label transition probabilities and constraints during training, preventing illegal label sequences in the final prediction (a code sketch of this architecture follows the abstract). The experimental results showed that the BERT-BiLSTM-CRF model achieved good recognition performance; compared with traditional models, accuracy, recall, and F-score improved markedly.

(2) A BiGRU neural network model combining an attention mechanism with dependency syntax analysis was proposed. Dependency syntax analysis is used to obtain the structural features of Chinese; the latent information in Chinese sentences is fully exploited and fed into the bidirectional GRU model, which effectively alleviates long-term forgetting. The GRU has a simpler structure than the long short-term memory network, with fewer parameters, which reduces the risk of overfitting. The attention mechanism assigns weights to the different features in a sentence, selectively focusing on the feature information that improves recognition performance and reducing the adverse effects of noise in the data (a sketch of an attention-based BiGRU is given below). The experimental results showed that the attention mechanism fused with syntactic analysis performed well.

(3) A joint extraction method for entities and their relations based on BiLSTM was proposed. Entity recognition and relation extraction are performed simultaneously, which avoids the error accumulation of a sequential pipeline. Chinese part-of-speech features were added to the Chinese word vectors, since nouns and auxiliary words clearly indicate the grammatical and semantic relationships between chunks. Building on the advantages of the tandem (pipelined) extraction model, relation extraction based on joint extraction was proposed: an end-to-end BiLSTM framework transforms the Chinese entity relationship extraction task into a Chinese sequence annotation task (an illustration of the joint tagging idea is given below). The experimental results showed that the method was more accurate than traditional pipeline-based Chinese processing methods.
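The following is a minimal sketch of the BERT-BiLSTM-CRF tagger described in point (1), written in PyTorch with the HuggingFace Transformers and pytorch-crf packages. The hidden size, pretrained checkpoint, and tag-set size are illustrative assumptions, not values taken from the thesis.

    # Minimal BERT-BiLSTM-CRF sketch (assumed hyperparameters and checkpoint).
    import torch
    import torch.nn as nn
    from transformers import BertModel
    from torchcrf import CRF

    class BertBiLstmCrf(nn.Module):
        def __init__(self, num_tags, lstm_hidden=256, bert_name="bert-base-chinese"):
            super().__init__()
            self.bert = BertModel.from_pretrained(bert_name)      # contextual character features
            self.bilstm = nn.LSTM(self.bert.config.hidden_size, lstm_hidden,
                                  batch_first=True, bidirectional=True)
            self.emit = nn.Linear(2 * lstm_hidden, num_tags)      # per-token emission scores
            self.crf = CRF(num_tags, batch_first=True)            # learns tag-transition constraints

        def forward(self, input_ids, attention_mask, tags=None):
            feats = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            feats, _ = self.bilstm(feats)
            emissions = self.emit(feats)
            mask = attention_mask.bool()
            if tags is not None:
                # training: negative log-likelihood of the gold tag sequence
                return -self.crf(emissions, tags, mask=mask, reduction="mean")
            # inference: Viterbi decoding of the best legal tag sequence
            return self.crf.decode(emissions, mask=mask)

During training the forward pass returns the CRF loss; at inference it returns the decoded tag sequence, from which entity spans are read off.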
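Point (2) can be illustrated with a bidirectional GRU encoder followed by a sentence-level attention layer. The sketch below is a simplified assumption of such a model: the dependency-syntax features described in the thesis are omitted, and the input is taken to be pre-embedded token vectors.

    # Simplified attention-based BiGRU relation classifier (illustrative only).
    import torch
    import torch.nn as nn

    class AttBiGru(nn.Module):
        def __init__(self, emb_dim, hidden, num_relations):
            super().__init__()
            self.bigru = nn.GRU(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.att_vec = nn.Parameter(torch.randn(2 * hidden))   # learned attention query
            self.cls = nn.Linear(2 * hidden, num_relations)

        def forward(self, token_embs, mask):
            h, _ = self.bigru(token_embs)                  # (batch, seq, 2*hidden)
            scores = h @ self.att_vec                      # unnormalized attention scores
            scores = scores.masked_fill(~mask, float("-inf"))
            alpha = torch.softmax(scores, dim=-1)          # weights over tokens
            sent = (alpha.unsqueeze(-1) * h).sum(dim=1)    # weighted sentence representation
            return self.cls(sent)                          # relation logits

The attention weights play the role described in the abstract: tokens that carry relation-relevant information receive larger weights, while noisy tokens are down-weighted.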
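Point (3) turns relation extraction into sequence annotation by packing the entity boundary, the relation type, and the entity's role into a single tag. The abstract does not give the exact labeling scheme, so the sketch below follows the common "B/I-<relation>-<role>" convention as an assumption; the function and example sentence are hypothetical.

    # Illustrative joint-tagging scheme: one label per token encodes boundary,
    # relation type, and role (1 = head entity, 2 = tail entity).
    def joint_tags(tokens, head_span, tail_span, relation):
        """Return one joint label per token for a single (head, relation, tail) triple."""
        tags = ["O"] * len(tokens)
        for (start, end), role in ((head_span, "1"), (tail_span, "2")):
            tags[start] = f"B-{relation}-{role}"
            for i in range(start + 1, end):
                tags[i] = f"I-{relation}-{role}"
        return tags

    # Hypothetical example sentence with a Founder relation:
    print(joint_tags(["马云", "创办", "了", "阿里巴巴"], (0, 1), (3, 4), "Founder"))
    # ['B-Founder-1', 'O', 'O', 'B-Founder-2']

An end-to-end BiLSTM tagger trained on such labels predicts entities and their relation in one pass, which is what removes the error accumulation of a pipelined system.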
Keywords/Search Tags:Deep learning, Named entity recognition, Entity relationship extraction, LSTM