
Research On Semantic Feature Based Text Classification Algorithm

Posted on: 2017-12-15    Degree: Master    Type: Thesis
Country: China    Candidate: B Yuan    Full Text: PDF
GTID: 2348330518495545    Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of information technology and the Internet, our society has been influenced deeply and extensively. Massive volumes of electronic text are created and spread through the Internet as a result of the explosion of websites, social networking services (SNS), and e-commerce. Text is one of the most important media on the Internet, so it is both necessary and valuable to study techniques for filtering, organizing, managing, and mining web text. As a frontier topic in the field of information processing, automatic text classification, or automatic text categorization (TC), enables us to manage massive text collections effectively and to locate the information we are interested in quickly. TC techniques have been widely applied in information retrieval (IR), news classification, e-mail classification, and public opinion analysis. TC therefore has bright application prospects and high research significance.

The vector space model (VSM) and topic models are the most popular methods for modeling text. Both are bag-of-words models, which ignore word order and word context. However, meaning often changes when word order differs, and words take different meanings in different contexts. Since the class label of a text depends on its semantic meaning, the semantic information ignored by these two models is important for text classification. To overcome this weakness of the VSM and topic models, this thesis applies deep learning techniques to mine the semantic information within text. One advantage of deep learning is that it can learn abstract (semantic) features through deep architectures. The deep learning techniques used in this thesis include word embedding, recurrent neural networks, and convolutional neural networks.

The main contributions of this thesis are as follows. First, this thesis proposes a negative-sample-based recurrent neural network language model (Neg-RNNLM) to train word embeddings. After a detailed analysis of the problems of current word embedding methods, this thesis improves the recurrent neural network language model (RNNLM): Neg-RNNLM is more efficient than RNNLM, and the quality of its word embeddings is better. Second, a combined text and knowledge base model is proposed to train word embeddings. Knowledge bases such as WordNet contain many useful and accurate semantic relations, and the combined model exploits the semantic relations in WordNet to obtain more accurate word embeddings. Third, this thesis compares three different methods for modeling document features based on word embeddings: Paragraph Vector, CNN, and the RNN recurrent layer vector. The RNN recurrent layer vector method has not been studied in previous work. Experimental results show that CNN performs best; combining CNN with word embeddings trained by the Neg-RNNLM-graph model achieves state-of-the-art text classification results on two benchmark datasets.
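The abstract does not spell out the exact Neg-RNNLM objective, but negative sampling generally replaces a full softmax over the vocabulary with a binary discrimination between the true context word and a few sampled noise words. As an illustration only, here is a minimal pure-Python sketch of a standard negative-sampling loss; the function name and the toy vectors are hypothetical, not from the thesis.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def neg_sampling_loss(target_vec, context_vec, negative_vecs):
    """Negative-sampling objective: push the target embedding toward the
    true context word and away from k sampled noise words, so no full
    softmax over the vocabulary is needed."""
    dot = sum(t * c for t, c in zip(target_vec, context_vec))
    loss = -math.log(sigmoid(dot))          # reward similarity to the true context
    for neg in negative_vecs:
        dot_neg = sum(t * n for t, n in zip(target_vec, neg))
        loss -= math.log(sigmoid(-dot_neg))  # penalize similarity to noise words
    return loss
```

With one noise word, a target aligned with its context yields a lower loss than a misaligned one, which is the pressure that shapes the embeddings during training.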
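The CNN document-feature method the abstract favors typically slides filters over the word-embedding sequence and keeps the maximum response per filter (max-over-time pooling), giving a fixed-length document vector regardless of text length. The sketch below is a toy single-filter version in pure Python; the filter weights and embeddings are made up for illustration and do not come from the thesis.

```python
def conv_max_pool(embeddings, filt):
    """Slide one window filter over a sequence of word embeddings and
    return the max response over all positions (max-over-time pooling).
    `filt` is a list of `width` weight vectors, one per window slot."""
    width = len(filt)        # filter width, in words
    dim = len(filt[0])       # embedding dimension
    scores = []
    for i in range(len(embeddings) - width + 1):
        s = sum(filt[j][d] * embeddings[i + j][d]
                for j in range(width) for d in range(dim))
        scores.append(s)
    return max(scores)       # one scalar feature per filter
```

In a real classifier many such filters of several widths are applied, and their pooled maxima are concatenated into the document feature vector fed to the output layer.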
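The "RNN recurrent layer vector" method is not detailed in the abstract; one common reading is that the final hidden state of a recurrent layer run over the word sequence serves as the document vector. The following is a minimal Elman-style sketch under that assumption, with hypothetical weight matrices; it is not the thesis's actual architecture.

```python
import math

def rnn_document_vector(embeddings, W_in, W_rec):
    """Run a simple recurrent layer over the word embeddings and return
    the final hidden state as a fixed-length document feature vector.
    W_in: hidden_dim x input_dim, W_rec: hidden_dim x hidden_dim."""
    h = [0.0] * len(W_rec)                    # initial hidden state
    for x in embeddings:                      # one step per word
        h = [math.tanh(sum(W_in[i][d] * x[d] for d in range(len(x)))
                       + sum(W_rec[i][j] * h[j] for j in range(len(h))))
             for i in range(len(h))]
    return h
```

Unlike max-over-time pooling, this summary is order-sensitive, which is exactly the word-order information the bag-of-words models discussed above discard.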
Keywords/Search Tags:text classification, semantic feature, word embedding, recurrent neural network, convolutional neural network