Research On Chinese Information Extraction Algorithm Based On Deep Learning

Posted on:2021-02-12

Degree:Master

Type:Thesis

Country:China

Candidate:J X Liang

Full Text:PDF

GTID:2370330611999326

Subject:Computer technology

Abstract/Summary:


With the development of the information age, a large amount of information exists on the Internet in the form of text. This textual knowledge is usually stored in web pages in unstructured form, and conventional rule-based extraction methods cannot extract it well. How to extract key information from text automatically has therefore become an urgent problem for the industry. The main purpose of information extraction is to extract structured information from unstructured natural language text accurately, quickly and efficiently, and to save it in a preset format for subsequent use. Approaches to triple extraction include rule-based methods, machine learning methods and deep learning methods. Compared with the earlier approaches, deep learning-based methods have great advantages in modeling.

Among deep learning methods, the pipeline method and the joint learning method have difficulty determining the direction of entity pairs and matching them. The hierarchical binary labeling method extracts entity pairs effectively, but it suffers from error propagation because it involves too many steps. To avoid multi-stage prediction, this thesis designs and implements a one-stage model built on a directed graph structure. The model uses the adjacency matrix of a directed graph to express the positions of entity pairs and the pointing relationship between entity words. The thesis also designs several models for constructing the adjacency matrix of the directed graph; of these, the attention model based on a bilinear matrix can effectively use its attention matrix to construct the adjacency matrix.

Building on the hierarchical binary labeling model, the thesis explores how well different span extraction models extract the features of entity words. Among these, the endpoint vector mixing method improves on the original method, and simple feature engineering further enhances the ability of the hierarchical binary labeling model to extract triples. Following the idea of the hierarchical binary labeling model, the thesis further subdivides the structure of triple extraction and designs and implements a three-stage model. The research focus of this model is the classification performance of the relation classifier for different entities. Several relation classification models were tested; among them, the CNN model classifies slightly better than LSTM and the other models.

The combination of the directed-graph-based BERT model and the bilinear matrix attention model designed and implemented in this thesis achieves an F1 score of 0.807, and the proposed three-stage model based on hierarchical binary labeling achieves a score of 0.778. Both are significant improvements over the two-stage model's score of 0.697.
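The one-stage idea described in the abstract, scoring every ordered pair of tokens with a bilinear form and reading the thresholded scores as the adjacency matrix of a directed graph, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the toy two-dimensional token vectors, the weight matrix `W`, the sigmoid threshold of 0.5, and the convention that an edge points from subject to object are all assumptions made for illustration. In the thesis the token vectors would come from BERT and `W` would be learned.

```python
import math


def bilinear_scores(H, W):
    """Return S with S[i][j] = H[i]^T W H[j] for token vectors H (n x d) and W (d x d)."""
    n, d = len(H), len(W)
    # HW[i][k] = sum over m of H[i][m] * W[m][k]
    HW = [[sum(H[i][m] * W[m][k] for m in range(d)) for k in range(d)]
          for i in range(n)]
    return [[sum(HW[i][k] * H[j][k] for k in range(d)) for j in range(n)]
            for i in range(n)]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def adjacency_from_scores(S, threshold=0.5):
    """Threshold sigmoid(score) to get a 0/1 adjacency matrix (threshold is an assumption)."""
    return [[1 if sigmoid(s) > threshold else 0 for s in row] for row in S]


def decode_edges(A, tokens):
    """List directed edges (source token, target token) where A[i][j] == 1."""
    return [(tokens[i], tokens[j])
            for i, row in enumerate(A)
            for j, a in enumerate(row) if a]


# Toy example: hand-picked vectors and weights, not learned values.
tokens = ["Beijing", "capital", "China"]
H = [[1.0, 0.0],   # "Beijing"
     [0.0, 0.0],   # "capital" (carries no entity signal in this toy setup)
     [0.0, 1.0]]   # "China"
W = [[-5.0, 5.0],  # asymmetric, so score(i, j) differs from score(j, i)
     [-5.0, -5.0]]

A = adjacency_from_scores(bilinear_scores(H, W))
print(A)                        # [[0, 0, 1], [0, 0, 0], [0, 0, 0]]
print(decode_edges(A, tokens))  # [('Beijing', 'China')]
```

Because `W` is not symmetric, the score for (Beijing, China) differs from the score for (China, Beijing), which is what lets a single matrix carry both the position of an entity pair and the direction of the relation between its members. A model handling several relation types would presumably need one such matrix per relation, but the abstract does not specify this.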

Keywords/Search Tags:

information extraction, deep learning, triple


Related items

1	Application Research In Glacier Information Extraction Based On U-NET Model Of Deep Learning
2	The Research Of Information Extraction From VHR Image Based On Deep Learning Model With Attention Mechanism
3	Remote Sensing Information On Extraction And Feature Analysis Based On Deep Neural Network
4	Research On Water Information Extraction Method Of Multi-source Remote Sensing Based On Deep Learning And Its Application
5	Research On The Method Of Urban High-resolution Image Feature Extraction Based On Deep Learning
6	Research On Green Space Information Extraction And Application Based On Deep Learning Algorithms
7	Research On Deep Learning Extraction Method Of Building Structure Type Based On Multi-source Information
8	Deep Learning-based Prediction Of DNA-binding Proteins
9	Prediction Of Protein Serine ADP-ribosylation Modification Sites Based On Deep Learning
10	Research On Target Time Information Analysis Method Based On Deep Learning