Research On Text Labeling Method For Wechat Public Accounts

Posted on:2019-08-11

Degree:Master

Type:Thesis

Country:China

Candidate:L D Deng

Full Text:PDF

GTID:2428330545465540

Subject:Computer technology

Abstract/Summary:

WeChat public accounts are a widely used channel for acquiring information, and their articles cover all walks of life. Labeling these articles sensibly helps users locate articles of interest quickly and supports user behavior analysis and the construction of user portraits, so it has significant application value. At present, however, there is no dedicated research on labeling the texts of WeChat public accounts. This thesis therefore proposes a classification method for WeChat public account texts. Its contributions are as follows.

(1) A topic-word embedding model is proposed. It addresses three weaknesses of traditional text representations: high data dimensionality, the lack of relational information between words, and the inability to distinguish the senses of polysemous words. First, LDA assigns topics to each text. Then each topic, in the form of a pseudo-word, is fed into Skip-gram training together with the words in the context, yielding a vector for each word and a vector for each topic. Finally, the word vectors are concatenated to form the vector of the text.

(2) A combined semi-supervised SVM is used, with an annealing algorithm that selects parameters automatically, to address the large number of parameters that must be set in the training stage of semi-supervised methods. In the initial stage, the implementation does nothing but validate the parameter C of the internal supervised solver on the (usually very limited) labeled set; the other parameter, γ, is kept fixed. In the iterative stage, C* is handled by a standard annealing sequence and is restricted to a small finite set of possible values. As a result, the whole process requires only a few parameters to be set manually.

(3) A clustering method is used to cluster the large volume of unlabeled data, and the unlabeled samples used for training are selected in proportion, to address the deviation between the distributions of labeled and unlabeled data. A semi-supervised classification algorithm that adds unlabeled data at random may not be applicable to the global data. Adding training data selectively enables the classifier to achieve good results even when the sample distribution is not uniform.

(4) A knowledge base of public account categories is established. The types of articles published by the same WeChat public account are relatively fixed, so the source account is a useful reference for labeling. During the training stage of the semi-supervised classifier, the knowledge base helps judge whether an unlabeled sample can join the training set. In the classification stage, the knowledge base helps assess the classification result and determine whether manual labeling is needed.

(5) A method for labeling the category of WeChat public account texts, based on the knowledge base and semi-supervised learning, is proposed.

The experimental results show that the proposed method not only improves the accuracy of article labeling for WeChat public accounts but also reduces the number of manual interventions.
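
The selection step in item (3) can be illustrated with a minimal sketch. It is not the thesis's implementation: the abstract does not name the clustering algorithm or say what the selection is proportional to, so the sketch assumes cluster ids have already been computed by some clustering method and that each cluster contributes samples in proportion to its size. The function name `proportional_sample`, the `budget` and `seed` parameters, and the largest-remainder rounding rule are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


def proportional_sample(cluster_ids, budget, seed=0):
    """Pick `budget` sample indices so that each cluster is represented
    in proportion to its size (assumed rule, largest-remainder rounding)."""
    total = len(cluster_ids)
    if not 0 <= budget <= total:
        raise ValueError("budget must be between 0 and the number of samples")

    # Group sample indices by the cluster they were assigned to.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Ideal fractional share per cluster, rounded down.
    quotas = {cid: budget * len(m) / total for cid, m in members.items()}
    alloc = {cid: int(q) for cid, q in quotas.items()}

    # Give the leftover slots to the clusters with the largest remainders.
    leftover = budget - sum(alloc.values())
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for cid in by_remainder[:leftover]:
        alloc[cid] += 1

    # Draw the allotted number of samples from each cluster without replacement.
    rng = random.Random(seed)
    chosen = []
    for cid, m in members.items():
        chosen.extend(rng.sample(m, alloc[cid]))
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical cluster assignment for 100 unlabeled texts.
    ids = [0] * 60 + [1] * 30 + [2] * 10
    picked = proportional_sample(ids, budget=10)
    print([ids[i] for i in picked])
```

Compared with drawing unlabeled samples uniformly at random, this keeps small clusters represented in the training pool, which is the motivation the abstract gives for selective addition.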

Keywords/Search Tags:

Labeling, Text Classification, Wechat Public Accounts, Topic Word Embeddings, Semi-Supervised

Related items

1	Word Embeddings Towards Text Classification Of Emotion And Topic
2	Research On Text Classification Algorithms Based On Machine Learning
3	Research On Semi-supervised Topic Model For Text Classification
4	Reinforcing The Topic Of Embeddings With Theta Pure Dependence For Text Classification
5	Combining Topic Model And Word Embedding For Short-Text Classification
6	Semi-supervised Learning On Text Data
7	Research On Short Text Classification Of Semi-supervised Pre-training Based On Autoencoders And Word Order Dependencies
8	Research On Eigenvector Mapping Algorithm Based On Multi-label
9	Research On WeChat Public Accounts Relationship Network Based On Text Mining Technology
10	Marketing Effect Differences And Its Influencing Factors Between Brand Public Accounts And Non-brand Public Accounts Of Wechat