Feature Extension Method for Short-text Classification Based on LDA

Posted on: 2018-03-12, Degree: Master, Type: Thesis
Country: China, Candidate: M Zhang, Full Text: PDF
GTID: 2348330515968008, Subject: Software engineering
Abstract/Summary:
With the advent of the information era, people spend more and more time online, and content distribution platforms and social networking sites have grown rapidly in recent years. Analyzing public opinion on the network and organizing network news both require text to be classified according to given requirements, and short text in particular. Short-text classification, however, differs from long-text classification. One approach is to extend the features of a short text first and then classify it by its keywords. Following this approach, this paper presents a method that combines LDA-based topic words with a feature-word classification weight.

This paper first examines the vector space model, the traditional representation for long text, which suits long texts with many keywords. For short texts with few keywords, the feature-vector space becomes too sparse, so the vector space model cannot be used directly to represent short text. Drawing on the state of research in China and abroad, this paper studies the theoretical basis of the LDA model, uses the model to obtain the topic-word distribution, infers the topic of each sample with the LDA model, and analyzes the correlation between a topic's words and the short text. It argues that using the topic words of the LDA model directly to expand the features of short text has limitations.

Based on the characteristics of the LDA model, and in view of this limitation, this paper proposes an evaluation standard that reflects the classification information between classes, the dispersion within a class, and the incomplete classification of feature words within a class. It also proposes a self-selection mechanism for candidate words when expanding features with LDA keywords. To verify the effectiveness of the method, this paper uses ICTCLAS (the Chinese Academy of Sciences word segmentation tool) and LIBSVM to build a short-text classification platform, and compares the proposed feature extension method with the traditional LDA-based feature extension method for short-text classification. The experiments show that classification performance improves to some extent after features are extended with the proposed method.
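The feature extension idea described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the topic-word table `TOPIC_WORDS` is a hand-written stand-in for a trained LDA model, the function `class_weight` is a hypothetical concentration score standing in for the unspecified evaluation standard, and the names `dominant_topic`, `extend_short_text`, `k`, and `min_weight` are all assumptions made for illustration.

```python
# Hypothetical topic-word distributions p(word | topic); a real system would
# obtain these from a trained LDA model instead of hard-coding them.
TOPIC_WORDS = {
    0: {"match": 0.30, "team": 0.25, "goal": 0.20, "league": 0.15, "coach": 0.10},
    1: {"stock": 0.30, "market": 0.25, "bank": 0.20, "fund": 0.20, "goal": 0.05},
}

# Tiny labeled corpus (already segmented into tokens), also hypothetical.
CLASS_DOCS = {
    "sports": [["team", "match", "win"], ["goal", "league", "team"], ["coach", "team"]],
    "finance": [["stock", "market"], ["bank", "fund", "goal"], ["market", "fund"]],
}


def dominant_topic(tokens, topic_words):
    """Pick the topic whose word distribution gives the tokens the most mass.

    This is a crude stand-in for LDA inference on a short text.
    """
    return max(
        topic_words,
        key=lambda t: sum(topic_words[t].get(w, 0.0) for w in tokens),
    )


def class_weight(word, class_docs):
    """Assumed score: share of the word's per-class document frequency that
    falls in its strongest class. 1.0 means the word occurs in one class only;
    1/len(class_docs) means it is spread evenly over all classes."""
    rates = [
        sum(word in doc for doc in docs) / len(docs)
        for docs in class_docs.values()
    ]
    total = sum(rates)
    return max(rates) / total if total > 0 else 0.0


def extend_short_text(tokens, topic_words, class_docs, k=3, min_weight=0.6):
    """Append up to k topic words that are new to the text and whose
    class weight passes the threshold (the candidate self-selection step)."""
    topic = dominant_topic(tokens, topic_words)
    ranked = sorted(topic_words[topic].items(), key=lambda kv: -kv[1])
    extended = list(tokens)
    for word, _prob in ranked:
        if len(extended) - len(tokens) >= k:
            break
        if word in extended:
            continue  # already a feature of the short text
        if class_weight(word, class_docs) >= min_weight:
            extended.append(word)
    return extended


extended = extend_short_text(["team", "win"], TOPIC_WORDS, CLASS_DOCS)
print(extended)  # ['team', 'win', 'match', 'league', 'coach']
```

In this toy data the word "goal" has high probability in topic 0 but appears equally often in both classes, so its weight is 0.5 and it is filtered out. This mirrors the abstract's point that topic words alone are a limited basis for extension. The extended token list would then be vectorized and passed to a classifier such as LIBSVM.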
Keywords/Search Tags: Topic Model, Feature Extension, Short-text Classification