
Research And Implementation Of Chinese Information Extraction Based On Deep Learning

Posted on: 2022-04-13 | Degree: Master | Type: Thesis
Country: China | Candidate: X H Yin
GTID: 2518306605489374 | Subject: Master of Engineering
Abstract/Summary:
With the rapid development of the Internet, people are living in an era of information explosion. Traditional retrieval systems are increasingly unable to meet the need for fine-grained retrieval of key information, and how to effectively identify and use these massive volumes of text has become a research hotspot. Knowledge graphs offer a breakthrough: they integrate knowledge as structured information and provide strong support for upper-level applications such as intelligent question answering. Information extraction, a key technology for building knowledge graphs, has received widespread attention, because the accuracy of extraction directly affects the performance of these upper-level applications. A good information extraction model therefore has far-reaching significance and important research value. Information extraction is usually divided into three tasks: entity, relation, and event extraction. After summarizing and comparing various information extraction methods, this paper focuses on the entity and relation tasks, using deep learning as the main approach and feature rules as a supplement.
Named entity recognition is a basic task in natural language processing. Motivated by the structural complexity of Chinese characters, this paper proposes a multi-feature named entity recognition model (WCP-RNN). In the text encoding part, a composite feature that fuses word vectors, character vectors, and part-of-speech features is used. The word vectors are trained with the traditional Skip-gram model and are compared with randomly initialized vectors in later experiments, showing that pre-trained vectors better represent the semantics of the text. Moreover, deep learning has gradually demonstrated its advantages in natural language processing, and most current research methods are based on it. Therefore, to verify the effectiveness of the multiple features, the downstream model of this paper is a bidirectional recurrent network with an attention layer. Ablation experiments analyze the contribution of each feature to the model and the weighting effect of the attention mechanism, and finally confirm the validity of the multi-feature text representation.
As an extension of the named entity task, relation extraction is intrinsically connected to entity recognition. Building on the entity research, this paper improves the model structure, adds entity-relation information at the data preprocessing and annotation stage, and proposes a joint entity-relation extraction model (WCPD-CNN-RNN). To make full use of the information contained in Chinese sentences, this paper incorporates dependency syntax features into the word feature representation to produce more accurate semantic encodings. Since a CNN can extract local features while an RNN is biased when processing long text sequences, this paper uses dilated convolution to capture more distant information. The local features extracted by the CNN then supplement the attention-weighted RNN encoding, and both are fed into the decoding layer to complete the sequence labeling task. At the end of the paper, a comparative experiment adjusts the weight of the entity tags to reduce the error caused by non-entity tags during training. Experiments show that adding dependency syntax features has a positive effect on the WCPD-CNN-RNN model, and that including the unbiased CNN improves relation classification performance.
Finally, this paper applies information extraction to the domain of trending Internet information, designs and implements a Chinese information extraction system for this domain, and uses the two proposed models to extract entities and relations from natural text.
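The abstract states that the word vectors are pre-trained with Skip-gram. As a rough illustration, the sketch below shows only the data side of that step: turning a segmented sentence into (center, context) pairs, which the Skip-gram objective then learns to predict. The window size and the function name `skipgram_pairs` are assumptions, and the training loop itself (full softmax or negative sampling) is omitted; in practice a library such as gensim's `Word2Vec` with `sg=1` handles both.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Generate (center, context) training pairs for Skip-gram.

    Every token within `window` positions of the center word becomes one
    context word the model is trained to predict from the center word.
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # the center word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


if __name__ == "__main__":
    # A pre-segmented Chinese sentence (word segmentation is assumed, not shown).
    sentence = ["我们", "研究", "信息", "抽取"]
    for pair in skipgram_pairs(sentence, window=1):
        print(pair)
```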
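The WCP-RNN encoder fuses word, character, and part-of-speech features, and WCPD-CNN-RNN adds dependency syntax. A minimal sketch of such fusion by concatenation follows. Everything in it is an assumption for illustration: averaging character vectors, one-hot POS and dependency encodings, zero vectors for unknown words, the helper names, and the toy values (labels such as `SBV`, `VOB`, `HED` follow the style of some Chinese dependency parsers). The thesis may instead use learned POS embeddings or a character-level network.

```python
from typing import Dict, List, Optional


def one_hot(index: int, size: int) -> List[float]:
    """Return a one-hot vector of length `size` with 1.0 at `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def char_feature(word: str, char_vecs: Dict[str, List[float]], dim: int) -> List[float]:
    """Average the vectors of the word's known characters (assumed pooling)."""
    known = [char_vecs[c] for c in word if c in char_vecs]
    if not known:
        return [0.0] * dim
    return [sum(col) / len(known) for col in zip(*known)]


def fuse_features(
    word: str,
    pos: str,
    word_vecs: Dict[str, List[float]],
    char_vecs: Dict[str, List[float]],
    pos_tags: List[str],
    word_dim: int,
    char_dim: int,
    dep: Optional[str] = None,
    dep_labels: Optional[List[str]] = None,
) -> List[float]:
    """Concatenate word, character and POS features (WCP-style); append a
    one-hot dependency label when `dep` is given (WCPD-style)."""
    w = word_vecs.get(word, [0.0] * word_dim)  # zero vector for unknown words
    c = char_feature(word, char_vecs, char_dim)
    p = one_hot(pos_tags.index(pos), len(pos_tags))
    feats = w + c + p
    if dep is not None and dep_labels is not None:
        feats = feats + one_hot(dep_labels.index(dep), len(dep_labels))
    return feats


# Toy lookup tables (hypothetical values, not trained vectors).
word_vecs = {"信息": [0.1, 0.2, 0.3]}
char_vecs = {"信": [1.0, 0.0], "息": [0.0, 1.0]}
pos_tags = ["n", "v"]
dep_labels = ["SBV", "VOB", "HED"]

wcp = fuse_features("信息", "n", word_vecs, char_vecs, pos_tags, 3, 2)
wcpd = fuse_features("信息", "n", word_vecs, char_vecs, pos_tags, 3, 2, "VOB", dep_labels)
```

Concatenation keeps each feature group in its own slice of the vector, which is what makes the ablation experiments described above straightforward: dropping a feature means dropping its slice.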
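The downstream model applies an attention layer to the outputs of a bidirectional RNN. The abstract does not give the scoring function, so the sketch below assumes plain dot-product self-attention: each token's hidden state (taken to be the concatenation of forward and backward states, computed elsewhere) is re-expressed as a softmax-weighted sum over all hidden states in the sentence. One vector per token is kept, as sequence labeling requires. Function names and toy values are hypothetical.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax (subtract the max before exponentiating)."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def self_attention(hidden: List[List[float]]) -> List[List[float]]:
    """For each position t, weight every hidden state h_j by
    softmax_j(h_t . h_j) and sum, giving one context vector per token."""
    dim = len(hidden[0])
    contexts = []
    for h_t in hidden:
        weights = softmax([dot(h_t, h_j) for h_j in hidden])
        contexts.append(
            [sum(w * h_j[d] for w, h_j in zip(weights, hidden)) for d in range(dim)]
        )
    return contexts


# Toy BiRNN outputs: each row stands for [forward state ; backward state].
hidden = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 0.0]]
contexts = self_attention(hidden)
```

Learned or scaled scoring functions are common alternatives; the dot product is used here only because it needs no trained parameters.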
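Dilated convolution lets the CNN branch see farther without extra parameters: spacing the kernel taps `dilation` positions apart widens the window while the kernel size stays fixed. The single-channel sketch below is a simplification (a real layer convolves multi-dimensional token vectors with many filters), and the names and zero "same" padding are assumptions. It also computes the receptive field of stacked stride-1 layers, which grows with the sum of the dilations.

```python
from typing import List


def dilated_conv1d(x: List[float], kernel: List[float], dilation: int) -> List[float]:
    """1-D dilated convolution (cross-correlation, as in most deep-learning
    libraries) with zero padding so the output length equals the input length."""
    span = (len(kernel) - 1) * dilation  # distance covered by the kernel taps
    left = span // 2                     # implicit zero padding on the left
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - left + i * dilation
            if 0 <= j < len(x):          # positions outside the input count as 0
                acc += w * x[j]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of stacked stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Each output sums the inputs two positions to either side plus itself.
y = dilated_conv1d([1, 2, 3, 4, 5], [1, 1, 1], dilation=2)
```

With kernel size 3 and dilations 1, 2, 4, three layers cover 15 tokens, whereas three undilated layers would cover only 7.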
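The joint model folds relation information into the annotation itself, so a single sequence-labeling pass yields both entities and relations. The abstract does not spell out the tag set; the sketch assumes a scheme in the style of Zheng et al. (2017), where each tag combines a BIES position, a relation type, and a role (1 for the first entity, 2 for the second). The sentence, spans, helper names, and the relation name `Graduate` are hypothetical.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # [start, end) token indices


def span_tags(start: int, end: int, label: str) -> List[str]:
    """BIES tags for one entity span; single-token spans get S-."""
    n = end - start
    if n == 1:
        return [f"S-{label}"]
    return [f"B-{label}"] + [f"I-{label}"] * (n - 2) + [f"E-{label}"]


def joint_tags(length: int, triples: List[Tuple[Span, str, Span]]) -> List[str]:
    """Encode (first entity, relation, second entity) triples as one tag per
    token: role 1 marks the first entity, role 2 the second, 'O' the rest."""
    tags = ["O"] * length
    for first, relation, second in triples:
        for (start, end), role in ((first, 1), (second, 2)):
            tags[start:end] = span_tags(start, end, f"{relation}-{role}")
    return tags


# Hypothetical character-level example: 张三毕业于清华大学
# ("Zhang San graduated from Tsinghua University").
sentence = list("张三毕业于清华大学")
tags = joint_tags(len(sentence), [((0, 2), "Graduate", (5, 9))])
```

Because each token carries only one tag, an entity that takes part in several relations cannot be fully encoded this way; a later triple simply overwrites an earlier one.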
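The closing experiment re-weights entity tags so that the many `O` (non-entity) tokens do not dominate training. The abstract does not describe the decoder or the exact weighting, so this is a minimal sketch under an assumed token-level weighted negative log-likelihood over softmax outputs: tokens whose gold tag is not `O` are multiplied by a hypothetical `entity_weight`, and the loss is averaged over tokens.

```python
import math
from typing import List, Sequence


def weighted_nll(
    probs: List[Sequence[float]],
    gold: List[int],
    tags: List[str],
    entity_weight: float = 2.0,
) -> float:
    """Token-averaged negative log-likelihood in which tokens whose gold tag
    is not 'O' are multiplied by `entity_weight` (hypothetical default)."""
    total = 0.0
    for p, g in zip(probs, gold):
        weight = 1.0 if tags[g] == "O" else entity_weight
        total += -weight * math.log(p[g])
    return total / len(gold)


tags = ["O", "B-PER", "E-PER"]
# Predicted tag distributions for three tokens (toy numbers).
probs = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.3, 0.3, 0.4]]
gold = [0, 1, 2]
plain = weighted_nll(probs, gold, tags, entity_weight=1.0)
boosted = weighted_nll(probs, gold, tags, entity_weight=3.0)
```

Raising `entity_weight` increases the penalty for misclassifying entity tokens relative to `O` tokens, which pushes the model away from the trivial strategy of tagging everything as non-entity.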
Keywords/Search Tags: Information Extraction, Multi-Feature, Neural Network, Named Entity, Joint Extraction