Research On Short Text Classification Algorithms For Microblog

Posted on: 2020-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Q Zeng
GTID: 2428330596975565
Subject: Engineering
Abstract/Summary:
With the rise of social media, applications such as microblogs have developed rapidly. According to official Weibo statistics, by 2018 the platform had more than 160 million daily active users and daily visits on the order of 10 billion. Extracting important information from this data, and delivering the content users want to see quickly and accurately, has therefore become a top priority. Using microblog news data, this thesis studies Chinese short text classification from two angles. The first is short text classification based on word vectors, using K-Nearest Neighbor (KNN), FastText, and Convolutional Neural Network (CNN) models, with Word2vec as the basis of the word-vector construction stage. The second is short text classification based on feature extension, with models built on support vector machines (SVM) and KNN. For feature extension, topic models, a knowledge base, and word vectors are used to expand the features, and the weighting of words is also studied.

(1) For word-vector-based classification of microblogs, a word-vector text representation model based on word importance (TFIWF-WES) is proposed, together with a similarity-based KNN algorithm (CS-KNN) that improves on the standard KNN model. A CNN model is also used to classify short microblog texts and is compared with traditional machine learning algorithms.

(2) For feature-extension classification of short microblog texts, a feature extension model based on the interaction of semantics and similarity (SSE-BOW) is proposed. It is compared with the baseline model and with short text classification models of different granularity.

(3) Finally, the two lines of study are compared using precision (P), recall (R), and F1 score (F1). The SSE-BOW model scores 69.3%, 69.1%, and 69.0% on these metrics, improvements of 4.5%, 5.7%, and 5.3% over the BOW model. The TFIWF-WES model scores 68.8%, 68.4%, and 68.4%, improvements of 2.7%, 2.4%, and 2.5% over the D-WES model.
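The abstract does not spell out how TFIWF-WES or CS-KNN work. The sketch below shows one plausible reading of the general idea under stated assumptions. Each microblog is represented as the TF-IWF-weighted average of its word vectors, and a KNN classifier votes using cosine similarity. Several parts of it are assumptions and do not come from the thesis. The TF-IWF formula is a commonly used variant. Weighted averaging, cosine similarity, and similarity-weighted voting are illustrative choices. The tiny 2-D vectors stand in for Word2vec embeddings. All function names are hypothetical. This is not the author's actual TFIWF-WES or CS-KNN implementation.

```python
import math
from collections import Counter, defaultdict


def tf_iwf_weights(doc, corpus):
    """TF-IWF weight of each distinct token in `doc` (assumed common variant).

    TF  = occurrences of the token in doc / number of tokens in doc
    IWF = log(total tokens in corpus / occurrences of the token in corpus)
    """
    corpus_counts = Counter(tok for d in corpus for tok in d)
    total = sum(corpus_counts.values())
    weights = {}
    for tok, n in Counter(doc).items():
        tf = n / len(doc)
        # Tokens never seen in the corpus get weight 0 instead of dividing by zero.
        iwf = math.log(total / corpus_counts[tok]) if corpus_counts[tok] else 0.0
        weights[tok] = tf * iwf
    return weights


def doc_vector(doc, corpus, embeddings, dim):
    """Hypothetical text representation: TF-IWF-weighted mean of word vectors."""
    vec = [0.0] * dim
    total_w = 0.0
    for tok, w in tf_iwf_weights(doc, corpus).items():
        if tok in embeddings:  # skip out-of-vocabulary tokens
            vec = [v + w * e for v, e in zip(vec, embeddings[tok])]
            total_w += w
    return [v / total_w for v in vec] if total_w > 0 else vec


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """KNN over cosine similarity; each of the k neighbours votes with its similarity."""
    neighbours = sorted(
        zip((cosine(query, v) for v in train_vecs), train_labels),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in neighbours:
        votes[label] += sim
    return max(votes, key=votes.get)


# Toy tokenised microblogs; 2-D vectors stand in for Word2vec embeddings.
corpus = [
    ["match", "team", "today"],
    ["team", "match", "today"],
    ["stock", "market", "today"],
    ["market", "stock", "today"],
]
labels = ["sports", "sports", "finance", "finance"]
embeddings = {
    "match": [1.0, 0.0], "team": [0.9, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9],
    "today": [0.5, 0.5],
}
train_vecs = [doc_vector(d, corpus, embeddings, 2) for d in corpus]

print(knn_predict(doc_vector(["team", "match"], corpus, embeddings, 2), train_vecs, labels))  # → sports
print(knn_predict(doc_vector(["stock", "market"], corpus, embeddings, 2), train_vecs, labels))  # → finance
```

The token "today" appears in every toy document, so its IWF is log(12/4) rather than log(12/2). It therefore pulls the text vectors less than the topical words do. This illustrates why a word-importance weighting can sharpen short-text representations compared with a plain average of word vectors.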
Keywords/Search Tags: machine learning, deep learning, feature extension, short text categorization, microblog