
Research On The Classification Of Chinese Short-texts Based On Neural Networks

Posted on: 2019-04-17  Degree: Master  Type: Thesis
Country: China  Candidate: W X Ding
GTID: 2417330563993064  Subject: Applied Statistics
Abstract/Summary:
In recent years, with the large-scale application of artificial intelligence in many fields, deep learning has come into increasingly wide use. Text data, and short-text data in particular, carry limited information and often use language in non-standard ways, which makes fast and accurate short-text classification a problem worth studying. After a detailed review of the current state and the typical workflow of text classification, and drawing on the relevant theory, this thesis applies the convolutional neural network (CNN), a model popular in image processing, to short-text classification. The model is then optimized and improved on this basis, and the corresponding conclusions are drawn.

First, the thesis reviews research on text classification and convolutional neural networks in China and abroad, and analyzes how text length affects text classification. It describes in full the text classification process and the principles of the algorithms used.

Second, news articles are collected and their headlines are extracted as research data for the short-text classification problem, and the data are preprocessed. The data are divided into a corpus set, a training set, and a test set. The Jieba word segmentation tool is used to segment the text of the word-vector corpus and of the training and test sets. Low-frequency words, punctuation, and numbers are then removed, and Word2Vec is used to train fixed word vectors that serve as the text features. A convolutional neural network model for short-text classification is built and its parameters are tuned; the experimental results show that the model is effective.

Finally, the thesis proposes a short-text expansion method based on word vector similarity in order to improve the model structure. Once the expanded text is obtained, the original text and the expanded text are each fed into the original model, trained separately for each input, to obtain low-dimensional vector representations. These representations are concatenated, and the class is predicted by a fully connected layer followed by a softmax function. Compared with the original model, the improved model proves effective, with accuracy 2.36% higher.
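The abstract's preprocessing step (drop punctuation, numbers, and low-frequency words after segmentation) can be illustrated with a minimal sketch. It assumes the headlines have already been segmented into token lists, for example with `jieba.lcut`; the helper names `is_noise` and `clean_corpus` and the `min_count` threshold are hypothetical, since the abstract does not state the cutoff used in the thesis.

```python
import unicodedata
from collections import Counter


def is_noise(token: str) -> bool:
    """True if every character is whitespace, a digit, punctuation (P*) or a symbol (S*)."""
    return all(
        ch.isspace() or ch.isdigit() or unicodedata.category(ch)[0] in ("P", "S")
        for ch in token
    )


def clean_corpus(docs, min_count=2):
    """Remove punctuation/number tokens, then words seen fewer than min_count times.

    docs: list of token lists (assumed output of a Chinese word segmenter).
    min_count: hypothetical frequency threshold, not taken from the thesis.
    """
    stripped = [[t for t in doc if t.strip() and not is_noise(t)] for doc in docs]
    freq = Counter(t for doc in stripped for t in doc)
    return [[t for t in doc if freq[t] >= min_count] for doc in stripped]


if __name__ == "__main__":
    # Toy headlines, hand-segmented for illustration only.
    docs = [
        ["央行", "宣布", "降息", "，", "0.25", "个", "百分点"],
        ["央行", "降息", "提振", "股市", "！"],
        ["球队", "夺冠", "2019", "赛季"],
    ]
    print(clean_corpus(docs, min_count=2))
```

Counting frequencies after stripping punctuation and numbers means those tokens never influence the vocabulary cutoff; on a real corpus the threshold would be chosen on the corpus set before training Word2Vec.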
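The abstract does not give the CNN architecture, so the following is only a sketch of a common text-CNN forward pass (filters of several widths slide over the sequence of word vectors, each followed by ReLU and max-over-time pooling, then a fully connected layer and softmax). The filter widths, dimensions, and random weights are hypothetical stand-ins for trained parameters and Word2Vec vectors; training is not shown.

```python
import math
import random


def conv_maxpool(sentence, filt, bias=0.0):
    """Slide one filter over a sentence (list of word vectors), apply ReLU,
    and keep the largest response (max-over-time pooling)."""
    width = len(filt)
    best = 0.0  # ReLU output is never negative; also returned if the sentence is shorter than the filter
    for start in range(len(sentence) - width + 1):
        window = sentence[start:start + width]
        s = bias + sum(
            w * x
            for w_row, x_row in zip(filt, window)
            for w, x in zip(w_row, x_row)
        )
        best = max(best, s)  # max over positions of ReLU(s)
    return best


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def text_cnn(sentence, filters, W, b):
    """filters: list of (filter, bias) pairs; W, b: fully connected layer to class logits."""
    features = [conv_maxpool(sentence, f, fb) for f, fb in filters]
    logits = [sum(w * h for w, h in zip(row, features)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    dim, n_classes = 4, 3  # hypothetical sizes
    sentence = random_matrix(6, dim, rng)  # six words; stand-ins for Word2Vec vectors
    # Two filters each of widths 2, 3 and 4 (hypothetical choice).
    filters = [(random_matrix(width, dim, rng), 0.0) for width in (2, 3, 4) for _ in range(2)]
    W = random_matrix(n_classes, len(filters), rng)
    b = [0.0] * n_classes
    print(text_cnn(sentence, filters, W, b))
```

Max-over-time pooling turns headlines of any length into a fixed-size feature vector (one value per filter), which is what lets a single fully connected layer handle variable-length short texts.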
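The abstract names a short-text expansion method based on word vector similarity but not its exact rule. As one possible reading, the sketch below builds the new text from each word's nearest neighbour by cosine similarity; the "top-n neighbours of every in-vocabulary word" rule, the helper names, and the toy three-dimensional vectors are all assumptions for illustration, not the thesis's method.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def most_similar(word, vectors, topn=1):
    """Return the topn vocabulary words closest to `word` (excluding itself)."""
    scored = [(other, cosine(vectors[word], vec)) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, _ in scored[:topn]]


def expand_text(tokens, vectors, topn=1):
    """Hypothetical expansion rule: the new text is the nearest neighbours of
    every in-vocabulary token; out-of-vocabulary tokens are skipped."""
    expanded = []
    for t in tokens:
        if t in vectors:
            expanded.extend(most_similar(t, vectors, topn))
    return expanded


if __name__ == "__main__":
    # Toy vectors standing in for trained Word2Vec embeddings.
    vectors = {
        "降息": [1.0, 0.1, 0.0],
        "加息": [0.9, 0.2, 0.0],
        "央行": [0.2, 1.0, 0.0],
        "银行": [0.3, 0.9, 0.1],
        "足球": [0.0, 0.0, 1.0],
    }
    print(expand_text(["央行", "降息", "消息"], vectors))
```

With a trained gensim model the neighbour lookup would normally come from the model's own similarity query instead of this brute-force scan, which is only practical for a toy vocabulary.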
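The final step of the improved model, as the abstract describes it, concatenates the low-dimensional representations of the original and expanded text and classifies them with a fully connected layer and softmax. A minimal sketch of just that fusion step, assuming the two encoders have already produced their vectors; the function name and the weights `W`, `b` are hypothetical placeholders for trained parameters.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    total = sum(e)
    return [v / total for v in e]


def fuse_and_classify(h_orig, h_expanded, W, b):
    """Concatenate the two representations, then apply a fully connected layer and softmax.

    h_orig, h_expanded: vectors from the encoders for the original and expanded text.
    W: one weight row per class, each of length len(h_orig) + len(h_expanded).
    """
    h = list(h_orig) + list(h_expanded)  # concatenation ("splicing")
    if any(len(row) != len(h) for row in W):
        raise ValueError("each row of W must match the concatenated dimension")
    logits = [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]
    return softmax(logits)


if __name__ == "__main__":
    probs = fuse_and_classify([1.0, 0.0], [0.0, 1.0], [[2, 0, 0, 0], [0, 0, 0, 0]], [0.0, 0.0])
    predicted = max(range(len(probs)), key=probs.__getitem__)
    print(probs, predicted)
```

Concatenation keeps both views intact and lets the fully connected layer learn how much weight to give the expanded text, rather than fixing that balance by averaging.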
Keywords/Search Tags: Text classification, CNN, Short-texts, Word2Vec, Word vector