Research On Short Text Classification Algorithms For Microblog

Posted on:2020-05-24

Degree:Master

Type:Thesis

Country:China

Candidate:Q Zeng

Full Text:PDF

GTID:2428330596975565

Subject:Engineering

Abstract/Summary:

With the rise of social media, applications such as microblogs have developed rapidly. By 2018, according to Weibo's official statistics, the platform had more than 160 million daily active users and daily visits on the order of 10 billion. Extracting important information from this data, and showing users what they want to see more quickly and accurately, has become a top priority.

Based on microblog news data, this thesis studies Chinese short text classification from two aspects. The first is short text classification based on word vectors, for which K-Nearest Neighbor (KNN), FastText, and Convolutional Neural Network (CNN) are selected as models, with Word2vec as the basis of the word vector construction stage. The second is short text classification based on feature extension, for which the models are built on support vector machine (SVM) and KNN. For feature extension, features are expanded using a topic model, a knowledge base, and word vectors, and the weight representation of words is studied.

(1) First, for word-vector-based classification of microblog text, a word vector text generation model based on word importance (TFIWF-WES) is proposed. A similarity-based KNN algorithm (CS-KNN) is proposed to improve the KNN model. A CNN model is used to classify short microblog texts and is compared with traditional machine learning algorithms.

(2) Next, a feature extension model based on the interaction of semantics and similarity (SSE-BOW) is proposed for feature-extension classification of microblog short texts. The model is compared with the basic model and with short text classification models of different granularity.

(3) Finally, the two aspects of the study are compared using precision (P), recall (R), and F1 score (F1) as evaluation metrics. The SSE-BOW model achieves 69.3%, 69.1%, and 69.0% on these metrics, improvements of 4.5%, 5.7%, and 5.3% over the BOW model. The TFIWF-WES model achieves 68.8%, 68.4%, and 68.4%, improvements of 2.7%, 2.4%, and 2.5% over the D-WES model.
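The abstract does not give the formulas behind TFIWF-WES or CS-KNN, so the following is only a generic sketch of the two ideas it names: building a text vector as an importance-weighted average of word vectors, and classifying with a KNN that ranks and votes by cosine similarity. It assumes the common definition of inverse word frequency, IWF(w) = log(total tokens in corpus / occurrences of w). The function names, the toy two-dimensional embeddings, and the similarity-weighted vote are all illustrative assumptions, not the thesis's method.

```python
import math
from collections import Counter


def iwf_weights(corpus):
    """Inverse word frequency per word: log(total tokens / occurrences of the word)."""
    counts = Counter(tok for doc in corpus for tok in doc)
    total = sum(counts.values())
    return {w: math.log(total / c) for w, c in counts.items()}


def text_vector(tokens, embeddings, iwf):
    """TF-IWF weighted average of the word vectors of one tokenized text."""
    tf = Counter(t for t in tokens if t in embeddings and t in iwf)
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    n = sum(tf.values())
    weight_sum = 0.0
    for word, count in tf.items():
        weight = (count / n) * iwf[word]  # term frequency times inverse word frequency
        weight_sum += weight
        for i, x in enumerate(embeddings[word]):
            vec[i] += weight * x
    if weight_sum == 0.0:
        return vec  # no known words, or all weights are zero
    return [x / weight_sum for x in vec]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def knn_predict(query, train_vecs, train_labels, k=3):
    """Pick the k most cosine-similar training vectors and vote, weighted by similarity."""
    scored = [(cosine(query, v), label) for v, label in zip(train_vecs, train_labels)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Toy data: hypothetical 2-D "word vectors" standing in for Word2vec output.
embeddings = {
    "goal": [1.0, 0.0], "match": [0.9, 0.1],
    "stock": [0.0, 1.0], "market": [0.1, 0.9],
    "today": [0.5, 0.5],
}
corpus = [["goal", "match", "today"], ["match", "goal"],
          ["stock", "market", "today"], ["market", "stock"]]
labels = ["sports", "sports", "finance", "finance"]

iwf = iwf_weights(corpus)
train_vecs = [text_vector(doc, embeddings, iwf) for doc in corpus]
prediction = knn_predict(text_vector(["goal", "today"], embeddings, iwf),
                         train_vecs, labels, k=3)
print(prediction)
```

In a real pipeline the embeddings would come from a trained Word2vec model and the texts would first be segmented into Chinese words; both steps are omitted here.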

Keywords/Search Tags:

machine learning, deep learning, feature extension, short text categorization, microblog

Related items

1	Short Text Classification Algorithm Of Deep-learning Based On Feature Extension
2	Research On Short Text Classification Method Based On Feature Extension
3	Research On The Method Of Chinese Text Categorization Based On Machine Learning
4	A Study On Text Categorization Based On Machine Learning
5	Research And Implementation Of Text Classification Based On Depth Learning Theory And SVM Technology
6	Short Text Classification Based On Feature Extension
7	The Research And Application Of Text Categorization Based On Machine Learning
8	Research On Short Text Classification Technology Based On LDA Feature Extension
9	Text Feature Representation And Classification Based On Deep Learning
10	Research On High Performance Chinese Text Classification Based On Machine Learning