
Research On Feature Expansion And Classification Of Short Text Based On Topic Model And Deep Learning

Posted on: 2019-06-13
Degree: Master
Type: Thesis
Country: China
Candidate: P Zhao
GTID: 2438330545490706
Subject: Software engineering
Abstract/Summary:
Text classification helps people discover valuable information hidden in a collection of texts. Many previous studies have achieved excellent results on traditional text classification tasks. With the development of new social media, a large number of short texts have appeared on the Internet. Short texts typically include microblog posts, reviews, comments and so on, and usually consist of 10-100 words. Many classification methods that achieve excellent results on long texts struggle to achieve satisfactory results on short texts, because data features directly affect the performance of machine learning algorithms and short texts provide few features to learn from. Based on the above, research on short texts usually consists of three parts: feature extension, improving the quality of word vector representations, and improving classification performance.

The main tasks of this paper are:
1) Existing short text extension methods are analyzed.
2) Existing word vector representation algorithms are analyzed and improved.
3) Existing feature selection algorithms are analyzed and used for feature extraction in short texts.
4) A CNN (Convolutional Neural Network) is used to classify short texts.

The primary innovations of this paper are:
1) Based on the TNG (Topical N-Gram) algorithm, a novel short text extension method is proposed, and its advantages and disadvantages are analyzed.
2) The learning scheme of the TWE model is improved.
3) Based on the MCFS algorithm, a topic merging strategy is presented that reduces the difficulty of preprocessing and protects the original features of short texts.

This paper presents a framework for short text classification comprising word embedding, feature engineering, and a CNN-based classification system. Finally, on an open short text classification dataset, the proposed framework is compared with various baselines, and the experimental results validate the effectiveness of the method.
Keywords/Search Tags: Topic model, Word embedding, Feature selection, CNN, Short text classification
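
The abstract names a CNN as the classification stage but gives no architectural details. As a rough illustration only, the following pure-Python sketch shows the forward pass of a generic convolutional text classifier: word vectors are looked up, filters of several widths slide over the word sequence, each filter's strongest response is kept (max-over-time pooling), and a softmax layer produces class probabilities. The vocabulary, dimensions, filter widths, and random untrained weights are all hypothetical and are not taken from the thesis.

```python
import math
import random


def conv1d_max_pool(embedded, kernel, bias):
    """Slide one filter over the word sequence and keep its largest ReLU activation."""
    width = len(kernel)  # number of consecutive words the filter covers
    best = float("-inf")
    for start in range(len(embedded) - width + 1):
        total = bias
        for offset in range(width):
            word_vec = embedded[start + offset]
            total += sum(w * x for w, x in zip(kernel[offset], word_vec))
        best = max(best, max(0.0, total))  # ReLU, then max over positions
    return best


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]


def classify(tokens, embeddings, filters, out_weights, out_bias, min_len):
    """Forward pass: embed tokens, convolve, pool, then apply a softmax layer."""
    dim = len(next(iter(embeddings.values())))
    zero = [0.0] * dim
    # Unknown words map to a zero vector (an assumption made for this sketch).
    vectors = [embeddings.get(t, zero) for t in tokens]
    # Pad very short inputs so the widest filter fits at least once.
    while len(vectors) < min_len:
        vectors.append(zero)
    features = [conv1d_max_pool(vectors, k, b) for k, b in filters]
    scores = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(out_weights, out_bias)
    ]
    return softmax(scores)


# Hypothetical toy setup with random, untrained parameters.
rng = random.Random(0)
DIM, NUM_CLASSES = 4, 3
vocab = ["great", "phone", "bad", "battery", "love"]
embeddings = {w: [rng.uniform(-1, 1) for _ in range(DIM)] for w in vocab}

filters = []
for width in (2, 3):  # two filters for each window width
    for _ in range(2):
        kernel = [[rng.uniform(-1, 1) for _ in range(DIM)] for _ in range(width)]
        filters.append((kernel, 0.0))

out_weights = [
    [rng.uniform(-1, 1) for _ in range(len(filters))] for _ in range(NUM_CLASSES)
]
out_bias = [0.0] * NUM_CLASSES

probs = classify(["love", "phone"], embeddings, filters, out_weights, out_bias, min_len=3)
print(probs)  # one probability per class
```

Because the weights are random, the printed probabilities carry no meaning; in a real system the embeddings, filters, and output layer would be learned from labeled short texts, and the embeddings could come from a pretrained word vector model.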