Research On The Classification Of Chinese Short-texts Based On Neural Networks

Posted on:2019-04-17

Degree:Master

Type:Thesis

Country:China

Candidate:W X Ding

Full Text:PDF

GTID:2417330563993064

Subject:Applied Statistics

Abstract/Summary:

In recent years, with the large-scale application of artificial intelligence in many fields, the use of deep learning has also been increasing. The limited content of text data, especially short texts, and the nonstandard use of language make fast and accurate short-text classification a problem worth studying. After reviewing the current state and the specific workflow of text classification, this paper draws on the relevant theory and applies the convolutional neural network (CNN), a model popular in image processing, to short-text classification. On this basis, the model is then optimized and improved, and the corresponding conclusions are drawn.

First, the paper reviews the research status of text classification and convolutional neural networks in China and abroad, and analyzes the influence of text length on text classification. The text classification process and the principles of the algorithms used in the paper are described in full. Second, news texts are collected and their headlines extracted as the research data for the short-text classification problem, and the data are preprocessed. The data are divided into a corpus set, a training set, and a test set; the Jieba word segmentation tool is used to segment the text into a word vector corpus and the training and test data sets. Low-frequency words, punctuation, and numbers are then removed, and Word2Vec is used to train fixed word vectors that serve as the text feature information. A convolutional neural network model for short-text classification is established and its parameters are tuned. The experimental results show that the model is effective.

Finally, the paper proposes a short-text expansion method based on word vector similarity to improve the model structure. After the new text is obtained, the original text and the new text are each fed into a separately trained instance of the original model to obtain low-dimensional vector representations. These representations are then concatenated, the class is predicted by a fully connected layer and the softmax function, and the result is compared with the original model. The results show that the improved model is effective and that its accuracy is 2.36% higher than that of the original model.
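
The expansion and fusion steps described in the abstract can be illustrated with a minimal sketch. The abstract does not say exactly how the new text is built, so this sketch assumes each token is replaced by its nearest neighbour under cosine similarity. The toy vectors, the function names (`expand_text`, `classify`), and the hand-written weights are all hypothetical stand-ins for the trained Word2Vec embeddings and CNN outputs, not the thesis implementation.

```python
import math

# Hypothetical toy vectors standing in for trained Word2Vec embeddings.
TOY_VECTORS = {
    "股市": [0.9, 0.1, 0.0],
    "股票": [0.8, 0.2, 0.1],
    "上涨": [0.2, 0.1, 0.9],
    "上升": [0.3, 0.1, 0.8],
    "足球": [0.0, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, vectors):
    """Return the vocabulary word closest to `word`, or None if it is unknown."""
    if word not in vectors:
        return None
    scored = [
        (cosine(vectors[word], vec), other)
        for other, vec in vectors.items()
        if other != word
    ]
    return max(scored)[1] if scored else None


def expand_text(tokens, vectors):
    """Assumed expansion rule: swap each token for its nearest neighbour.

    Tokens without a vector are kept unchanged.
    """
    return [most_similar(tok, vectors) or tok for tok in tokens]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(original_features, expanded_features, weights, bias):
    """Concatenate two feature vectors, apply one dense layer, then softmax.

    The feature vectors stand in for the low-dimensional outputs of the two
    separately trained models; `weights` has one row per class.
    """
    joined = original_features + expanded_features  # list concatenation
    scores = [
        sum(w * x for w, x in zip(row, joined)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(scores)


# Tokens as a segmenter such as Jieba might produce for a headline.
expanded = expand_text(["股市", "上涨"], TOY_VECTORS)

# Hypothetical 2-dimensional features from each model and a 3-class dense layer.
probs = classify(
    [1.0, 0.0],
    [0.0, 1.0],
    weights=[[1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 0, 0]],
    bias=[0.0, 0.0, 0.0],
)
```

In a real pipeline the vectors would come from a trained Word2Vec model, and the two feature vectors from the convolutional networks applied to the original and expanded texts. Only the concatenation and softmax head shown here follow directly from the abstract.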

Keywords/Search Tags:

Text classification, CNN, Short-texts, Word2Vec, Word vector

Related items

1	A Comparative Study Of Multiple Classification Methods Based On Long Texts Of News
2	Research On The Course Recommendation Based On Word2Vec And TF-IDF
3	Short Text Topic Mining Of Hotel Comments Based On Emotional Classification
4	Research Of Text Representation Method Based On Co-occurrence Analysis
5	Research On Classroom Language Behavior Recognition Based On Text Classification
6	Research On Chinese Text Classification Based On Convolution Neural Network
7	Chinese Text Classification Based On Statistical Method
8	A Text Classification Based On The Recurrent Neural Networks
9	The Method Of Selecting Local Feature Words And Its Application In Text Classification
10	Research Of SVM Kernel Functions In Text Classification