
Research On Key Technologies Of Text Feature Representation Based On Neural Network

Posted on: 2020-08-23
Degree: Doctor
Type: Dissertation
Country: China
Candidate: X Han
Full Text: PDF
GTID: 1368330572472301
Subject: Software engineering
Abstract/Summary:
With the advent of the Internet era, textual data has grown explosively. Faced with massive text data, one of the central problems of natural language research is how to extract meaningful information from it. In recent years, with the development of deep learning, neural networks have achieved strong performance in image, speech, and other practical tasks. In natural language processing, however, owing to the complexity and abstraction of language, enabling computers to understand human language has remained difficult to break through. Text representation is the basic input of most natural language processing tasks: it transforms natural language into a computer-processable form, preserves the corresponding semantic information, and can be applied to specific practical tasks. This dissertation focuses on text feature representation based on neural network models. For text units of different levels and granularities, we propose a variety of neural-network-based text feature representation methods. The main research contents are as follows.

Firstly, for traditional Chinese characters, this dissertation proposes an ancient Chinese character representation below the character level, based on character glyph features. By learning the glyph features of pictographs, the model extracts semantic information that enriches the meaning of the character vector. This research extracts the radical information of traditional Chinese characters and uses the continuous bag-of-words (CBOW) model to generate the corresponding character vectors from context. In addition, we treat traditional Chinese characters as images and use a convolutional autoencoder to learn their glyph features, thereby enriching the feature dimensions of the basic character vectors. In the experiments, we apply the model to sentence boundary recognition: ancient Chinese articles have no punctuation, so we use these character vectors as input to find the boundaries of sentences. Applied to a large number of articles, the model achieves good results.

Secondly, for the word formation of English words, this dissertation proposes a word representation based on English characters. This representation can learn the ordering of letters, as well as patterns of letter case and special characters within words. The extracted features are merged with common word vectors to enhance the feature dimensions of the word representation. Based on a convolutional neural network, the model takes the letters of a word as input features, learns the relationships between letters, and concatenates the result with traditional English word vectors. In the experimental part, two typical word-level sequence tagging problems in natural language processing, named entity recognition and part-of-speech tagging, are selected for the experiments. The experimental results demonstrate the validity and robustness of the proposed model.

Finally, for coarser-grained text units, a sentence vector representation based on the attention mechanism is proposed. A general sentence vector representation built on an encoder-classifier structure is introduced, in which a classifier replaces the decoder in order to reduce computation. An attention mechanism is added to strengthen the dependencies between words within a sentence and to enrich the sentence meaning. In addition, a sentence-pair vector representation based on a convolutional neural network is proposed, where the attention mechanism is likewise used to strengthen the semantic relationship between sentences and improve classification. In the experimental part, sentence relatedness calculation and sentence classification experiments are carried out for the general sentence feature representation; the results demonstrate the generality of the model. Sentence relatedness experiments on the sentence-pair vectors further improve the accuracy of the model.

Based on various kinds of neural networks, this dissertation proposes several text feature representation methods at multiple levels of granularity, provides corresponding feature extraction for the characters, words, and sentences in a text, and carries out relevant experiments. The experiments show that good results are achieved in text annotation and in text categorization in general data scenarios.
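The first contribution's idea of enriching a character vector with sub-character (radical) features can be sketched as follows. This is a minimal illustration only: the radical inventory, embedding dimension, and the random stand-in for CBOW-trained embeddings are all invented for the sketch, not taken from the dissertation.

```python
import numpy as np

# Toy radical inventory and character-to-radical table (invented).
RADICALS = ["水", "木", "火", "言"]
CHAR_RADICAL = {"江": "水", "林": "木", "燈": "火", "語": "言"}

EMB_DIM = 8
rng = np.random.default_rng(0)
# Stand-in for CBOW-trained context embeddings (one vector per character).
char_emb = {ch: rng.standard_normal(EMB_DIM) for ch in CHAR_RADICAL}

def radical_onehot(ch):
    """One-hot feature marking the character's radical."""
    vec = np.zeros(len(RADICALS))
    vec[RADICALS.index(CHAR_RADICAL[ch])] = 1.0
    return vec

def enriched_vector(ch):
    """Concatenate the context embedding with the radical feature,
    yielding a representation richer than the context embedding alone."""
    return np.concatenate([char_emb[ch], radical_onehot(ch)])
```

In the full model the glyph features would come from a convolutional autoencoder over character images rather than a one-hot table; the concatenation step, however, is the same.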
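The attention mechanism used in the sentence-level contribution can be sketched as attention-weighted pooling over word vectors. All dimensions and the scoring vector below are invented for illustration, and the classifier that sits on top of the sentence vector is omitted.

```python
import numpy as np

def attention_pool(word_vecs, score_vec):
    """Score each word vector against a (learned) scoring vector,
    softmax the scores, and return the weighted sum as the sentence
    vector, together with the attention weights."""
    scores = word_vecs @ score_vec
    weights = np.exp(scores - scores.max())   # stable softmax
    weights = weights / weights.sum()
    return weights @ word_vecs, weights

rng = np.random.default_rng(1)
words = rng.standard_normal((4, 3))   # 4 words, 3-dim embeddings (toy)
query = rng.standard_normal(3)        # stand-in for a trained scoring vector
sent_vec, alphas = attention_pool(words, query)
```

Words with higher scores contribute more to the sentence vector, which is how the mechanism strengthens the dependence between a sentence's meaning and its most informative words.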
Keywords/Search Tags:neural network, text representation, attention mechanism, word vector