Short Text Classification Method Based On Ensemble Learning

Posted on: 2021-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: Z W Song
GTID: 2428330614958450
Subject: Computer technology
Abstract/Summary:
With the rapid development of the Internet and social media, the way people learn and live is constantly changing. The rise of platforms such as Weibo, Twitter, BBS, and SNS has produced a large volume of short text data, such as news headlines, online chats, and product reviews. These data cover a wide range of subjects, carry a high information content, and provide a key source of information for organizations such as companies, governments, and scientific research institutions. How to manage and classify short text data effectively has therefore become a focus of current research. Because short text is brief, has sparse features, and is irregular in form, traditional long text classification methods cannot achieve good classification results on it. In response to this problem, this thesis studies short text feature expansion and combines it with ensemble learning to improve the performance and generalization ability of short text classification. The research in this thesis includes the following:

1. To address the sparsity of short text features, this thesis proposes a short text feature expansion method based on the LDA topic model. First, an LDA topic model is trained on a large document set, and the model is used to predict the document-topic and topic-word probability distributions of each short text; then the topics with higher probability values are selected, and the higher-probability words of those topics are added to the short text. Because the topics in a traditional LDA topic model are highly similar to one another, this thesis trains a weighted LDA topic model to reduce the similarity between topics and to make the words to be added more distinct. Finally, experiments are designed to verify the feasibility and effectiveness of the method. The experimental results show that short text achieves better classification results after feature expansion with this method.

2. To address the low accuracy and unstable classification performance of a single classification algorithm, this thesis draws on the idea of ensemble learning and proposes a short text classification model based on it. First, multiple base classifiers are trained on the short text training set after feature expansion; then a classifier selection method based on multiple diversity measures is proposed, which combines pairwise and non-pairwise diversity measures to select the most diverse set of classifiers to take part in the final short text classification. Finally, experiments are designed to verify the feasibility and effectiveness of the method. The experimental results show that the short text classification model proposed in this thesis has high classification performance and generalization ability.

3. The short text classification model proposed in this thesis is applied in the design and implementation of a prototype short text classification system. The system collects, classifies, and statistically analyzes the short text data generated by news sites, microblogs, and other platforms, and displays the classification results to users as Web pages.
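The expansion step in point 1 can be illustrated with a minimal sketch. It assumes the document-topic and topic-word distributions have already been inferred by a trained LDA model; the toy numbers, the function name `expand_short_text`, and the `top_topics` / `top_words` cut-offs are hypothetical and are not taken from the thesis. The topic weighting used in the thesis is not reproduced here.

```python
from typing import Dict, List, Sequence


def expand_short_text(
    tokens: Sequence[str],
    doc_topic: Sequence[float],
    topic_word: Sequence[Dict[str, float]],
    top_topics: int = 2,
    top_words: int = 3,
) -> List[str]:
    """Append high-probability words from the short text's dominant topics.

    doc_topic[k]   : probability of topic k for this short text (assumed given)
    topic_word[k]  : word -> probability mapping for topic k (assumed given)
    """
    # Rank topic indices by their probability for this text, highest first.
    ranked_topics = sorted(
        range(len(doc_topic)), key=lambda k: doc_topic[k], reverse=True
    )
    expanded = list(tokens)
    seen = set(tokens)
    for k in ranked_topics[:top_topics]:
        # Rank this topic's words by probability, highest first.
        ranked_words = sorted(topic_word[k], key=topic_word[k].get, reverse=True)
        for word in ranked_words[:top_words]:
            # Words already in the text are skipped, so fewer than
            # top_words words may be added for a topic.
            if word not in seen:
                expanded.append(word)
                seen.add(word)
    return expanded


# Toy distributions standing in for the output of a trained LDA model.
topic_word = [
    {"match": 0.4, "goal": 0.3, "team": 0.2, "coach": 0.1},  # topic 0
    {"stock": 0.5, "market": 0.3, "bank": 0.2},              # topic 1
    {"film": 0.6, "actor": 0.4},                             # topic 2
]
doc_topic = [0.7, 0.2, 0.1]  # inferred topic mixture for one short text

expanded = expand_short_text(
    ["team", "wins", "final"], doc_topic, topic_word, top_topics=1, top_words=3
)
print(expanded)  # ['team', 'wins', 'final', 'match', 'goal']
```

In practice the two distributions would come from an LDA implementation trained on the large document set; the expanded token list is then what the base classifiers are trained on.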
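The classifier selection idea in point 2 can be sketched in a similar way. The abstract does not say which diversity measures are combined or how, so this example makes its own assumptions: the pairwise measure is the disagreement measure, the non-pairwise measure is the entropy measure from the ensemble-diversity literature, the two are averaged with equal weights, and subsets are searched exhaustively. All function names and the toy data are hypothetical.

```python
from itertools import combinations
from math import ceil
from typing import Sequence, Tuple

# correct[i][j] is 1 if base classifier i labels validation sample j
# correctly, and 0 otherwise.
Oracle = Sequence[Sequence[int]]


def disagreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Pairwise measure: fraction of samples where exactly one is correct."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def mean_disagreement(correct: Oracle, members: Sequence[int]) -> float:
    """Average disagreement over all pairs of classifiers in the subset."""
    pairs = list(combinations(members, 2))
    return sum(disagreement(correct[i], correct[j]) for i, j in pairs) / len(pairs)


def entropy_measure(correct: Oracle, members: Sequence[int]) -> float:
    """Non-pairwise measure: 0 when all members agree, 1 at an even split."""
    size = len(members)
    norm = size - ceil(size / 2)  # largest possible minority size
    n_samples = len(correct[members[0]])
    total = 0.0
    for j in range(n_samples):
        n_right = sum(correct[i][j] for i in members)
        total += min(n_right, size - n_right) / norm
    return total / n_samples


def select_most_diverse(correct: Oracle, k: int) -> Tuple[Tuple[int, ...], float]:
    """Return the size-k subset with the highest combined diversity score."""
    if k < 2 or k > len(correct):
        raise ValueError("k must be between 2 and the number of classifiers")
    best_subset: Tuple[int, ...] = ()
    best_score = -1.0
    for subset in combinations(range(len(correct)), k):
        # Equal weighting of the two measures is an assumption of this sketch.
        score = 0.5 * mean_disagreement(correct, subset) + 0.5 * entropy_measure(
            correct, subset
        )
        if score > best_score:  # ties keep the first subset found
            best_subset, best_score = subset, score
    return best_subset, best_score


# Toy correctness vectors for four base classifiers on six validation samples.
correct = [
    [1, 1, 1, 1, 0, 0],  # classifier 0
    [1, 1, 1, 1, 0, 0],  # classifier 1 (identical to classifier 0)
    [0, 0, 1, 1, 1, 1],  # classifier 2
    [1, 1, 0, 0, 1, 1],  # classifier 3
]
best_subset, best_score = select_most_diverse(correct, k=3)
print(best_subset, round(best_score, 4))  # (0, 2, 3) 0.8333
```

The redundant pair (classifiers 0 and 1) never appears together in the selected subset, which is the behaviour a diversity-based selection step is meant to produce. Exhaustive search is only practical for a small pool of base classifiers; a greedy or heuristic search would be needed for larger pools.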
Keywords/Search Tags: short text classification, LDA topic model, feature extension, ensemble learning, classifier selection method