Research On Chinese Personal Relation Extraction Based On Convolutional Neural Network

Posted on: 2019-01-30  Degree: Master  Type: Thesis
Country: China  Candidate: L P Jia  Full Text: PDF
GTID: 2428330548486553  Subject: Engineering
Abstract/Summary:
In an era where data is king, a great deal of natural-language text has accumulated on the Internet. Much valuable knowledge can be extracted from it and applied in fields such as knowledge question answering, commercial recommendation, and advertising systems. Text mining is currently studied in many ways, but there are relatively few studies on personal relation extraction. The more advanced approaches use machine learning, yet their features still have to be selected manually. In view of this, and working from a large amount of Chinese free text on the Internet, this thesis proposes a method for extracting Chinese personal relations based on convolutional neural networks. The specific work includes the following aspects:
(1) This thesis first studies the whole text preprocessing process, focusing on two classical word segmentation models, the Hidden Markov Model and Conditional Random Fields, and successfully applies both to Chinese word segmentation. Based on an analysis of the experimental results, the model better suited to this data set is selected and used to complete word segmentation and POS tagging.
(2) This thesis studies two kinds of word vector representation. It first examines one-hot encoding and summarizes the shortcomings of the resulting word vectors, then studies word vectors based on distributed representation, focusing on the Word2vec model. The two architectures of the model are trained with different training methods, and experiments show that the generated word vectors carry the semantic information of the original words. The trained model is then used to convert the data set in this thesis into word vectors.
(3) This thesis recasts personal relation extraction as a text classification task and proposes a method that uses a Convolutional Neural Network (CNN) to extract features from word vectors and classify personal relations. The method extracts five common kinds of personal relations from social data about people on the Internet, with an accuracy of 92.87%. Recall exceeds 85% for three of the relation types; for the other two types recall is not ideal, and the thesis analyzes the reasons. Finally, the results show that the proposed method can be applied to the personal relation extraction task in the project.
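The classification step described in (3) can be illustrated with a minimal sketch of a CNN forward pass over word vectors. This is not the thesis's model: the abstract does not give the architecture, filter widths, embedding size, or the names of the five relation types, so the dimensions, the placeholder labels in RELATIONS, and the random (untrained) weights below are all assumptions chosen only to show the data flow from a sentence of word vectors, through convolution and max-over-time pooling, to a probability for each relation class.

```python
import math
import random

# Placeholder labels: the abstract says there are five relation types but does not name them.
RELATIONS = ["class_0", "class_1", "class_2", "class_3", "class_4"]

def conv1d_max_pool(sentence, filt, bias):
    """Slide one filter over consecutive word windows, apply ReLU, then max-pool over time."""
    width = len(filt)
    best = 0.0  # ReLU outputs are never below 0
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        score = bias + sum(
            w * x
            for filt_row, word_vec in zip(filt, window)
            for w, x in zip(filt_row, word_vec)
        )
        best = max(best, max(0.0, score))
    return best

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify(sentence, filters, biases, out_weights, out_bias):
    """One pooled feature per filter, then a linear layer and softmax over relation classes."""
    features = [conv1d_max_pool(sentence, f, b) for f, b in zip(filters, biases)]
    logits = [
        ob + sum(w * feat for w, feat in zip(row, features))
        for row, ob in zip(out_weights, out_bias)
    ]
    return softmax(logits)

# Illustrative sizes only (assumed, not taken from the thesis).
rng = random.Random(0)
EMBED_DIM, WIDTH, N_FILTERS, N_WORDS = 4, 3, 6, 7

def rand_matrix(rows, cols):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

sentence = rand_matrix(N_WORDS, EMBED_DIM)  # stands in for Word2vec vectors of a segmented sentence
filters = [rand_matrix(WIDTH, EMBED_DIM) for _ in range(N_FILTERS)]
biases = [0.0] * N_FILTERS
out_weights = rand_matrix(len(RELATIONS), N_FILTERS)
out_bias = [0.0] * len(RELATIONS)

probs = classify(sentence, filters, biases, out_weights, out_bias)
predicted = RELATIONS[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

In a real system the weights would be learned from labeled sentences and the input would be the Word2vec vectors produced in step (2); with random weights the predicted class here carries no meaning.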
Keywords/Search Tags:relation extraction, word segmentation, word vector, Word2vec, CNN