Research On Learning Methods Based On Topic Model And Its Application In User Portraits

Posted on: 2020-08-28
Degree: Master
Type: Thesis
Country: China
Candidate: M Y Li
GTID: 2428330575495170
Subject: Computer technology
Abstract/Summary:
In an era when smartphones and mobile networks are increasingly widespread, WeChat has become a mainstream social tool following Weibo and Tencent QQ. Its distinctive WeChat Public Account feature is popular with users and has given rise to tens of thousands of accounts, together with a large number of articles about each account's business content. Analyzing and mining this distinctive and authentic source of informal text is both a hotspot and a challenge in data mining. This thesis uses a topic model algorithm to extract the topics of WeChat Public Account articles and further integrates various text features to construct reader portraits. The goal is to help account operators improve the accuracy of personalized push and, at the same time, to provide data support for monitoring the network environment. The main work covers the following three aspects:

(1) Existing topic model algorithms produce many noise words and cannot handle polysemy. To address this, a topic model algorithm is proposed that effectively integrates the prior information of words with background words. The main idea is to compute prior distribution parameters specific to each topic from the words' prior information, and to distinguish background words from topic words during word sampling.

(2) The proposed topic model retains the conjugate structure of the model, and an efficient Gibbs sampling method is derived for parameter inference. Extensive experiments on real WeChat Public Account articles show that the proposed algorithm yields improvements in model perplexity, topic coherence, model training time, and other measures. To further evaluate its text representation ability, text classification and text clustering experiments are carried out, which further verify its effectiveness.

(3) The topic model algorithm is then applied to user portrait construction, and two application methods are proposed. The first is a user portrait label classification algorithm based on a Stacking classifier fusion strategy. The second is a user interest modeling algorithm based on semantic similarity; it focuses solely on user interest modeling and addresses the limited interpretability of results obtained when a topic model algorithm is applied directly to that task. Experimental results show that both methods improve the accuracy of user portrait construction to some extent compared with other algorithms, and that their results are easier to understand and of greater practical value.
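The abstract describes the proposed model only at a high level. The sketch below is a hypothetical illustration of the two ideas in aspects (1) and (2), not the thesis's actual model. It assumes that each token carries a switch marking it as a background word or a topic word (with a single corpus-wide switch proportion), and that word prior information takes the form of seed-word sets (`seed_words`, a hypothetical input) which add mass to the Dirichlet prior of their topic. Because every prior stays conjugate to the multinomial counts, the parameters can be integrated out and each token's switch and topic sampled jointly by collapsed Gibbs sampling. All names and default hyperparameters (`alpha`, `beta`, `seed_boost`, `gamma`) are assumptions.

```python
import random


def gibbs_background_topic_model(docs, num_topics, vocab_size, seed_words=None,
                                 alpha=0.1, beta=0.01, seed_boost=1.0,
                                 gamma=1.0, iterations=200, rng_seed=0):
    """Collapsed Gibbs sampling for an LDA variant with a background-word switch.

    docs: list of documents, each a list of word ids in range(vocab_size).
    seed_words: hypothetical word prior information, {topic_id: set_of_word_ids}.
    Returns (topic_word_counts, background_word_counts, assignments), where an
    assignment of -1 marks a background word and k >= 0 marks topic k.
    """
    rng = random.Random(rng_seed)
    seed_words = seed_words or {}
    # Topic-specific Dirichlet prior: seed words of topic k get extra mass.
    beta_kw = [[beta + (seed_boost if w in seed_words.get(k, ()) else 0.0)
                for w in range(vocab_size)] for k in range(num_topics)]
    beta_k_sum = [sum(row) for row in beta_kw]

    n_dk = [[0] * num_topics for _ in docs]  # topic-word tokens per (doc, topic)
    n_d = [0] * len(docs)                     # topic-word tokens per doc
    n_kw = [[0] * vocab_size for _ in range(num_topics)]
    n_k = [0] * num_topics
    n_bw = [0] * vocab_size                   # background counts per word
    n_switch = [0, 0]                         # total [background, topic] tokens

    def update(d, w, z, delta):
        """Add (delta=+1) or remove (delta=-1) one token's counts."""
        if z < 0:
            n_bw[w] += delta
            n_switch[0] += delta
        else:
            n_dk[d][z] += delta
            n_d[d] += delta
            n_kw[z][w] += delta
            n_k[z] += delta
            n_switch[1] += delta

    # Random initialisation: background or a uniformly chosen topic.
    assign = []
    for d, doc in enumerate(docs):
        zs = [-1 if rng.random() < 0.5 else rng.randrange(num_topics) for _ in doc]
        for w, z in zip(doc, zs):
            update(d, w, z, +1)
        assign.append(zs)

    options = range(-1, num_topics)  # -1 = background, 0..K-1 = topics
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                update(d, w, assign[d][i], -1)
                # Background: switch predictive * background word predictive.
                weights = [(n_switch[0] + gamma) * (n_bw[w] + beta)
                           / (n_switch[0] + vocab_size * beta)]
                # Topic k: switch predictive * doc-topic * topic-word predictive.
                for k in range(num_topics):
                    weights.append((n_switch[1] + gamma)
                                   * (n_dk[d][k] + alpha) / (n_d[d] + num_topics * alpha)
                                   * (n_kw[k][w] + beta_kw[k][w]) / (n_k[k] + beta_k_sum[k]))
                z = rng.choices(options, weights=weights)[0]
                update(d, w, z, +1)
                assign[d][i] = z
    return n_kw, n_bw, assign


if __name__ == "__main__":
    # Toy corpus of word ids; words {0, 1} and {3, 4} are hypothetical seed words.
    docs = [[0, 1, 2, 5], [0, 1, 5], [3, 4, 5], [3, 4, 2, 5]]
    n_kw, n_bw, _ = gibbs_background_topic_model(
        docs, num_topics=2, vocab_size=6, seed_words={0: {0, 1}, 1: {3, 4}})
    print("topic-word counts:", n_kw)
    print("background counts:", n_bw)
```

Sampling the switch and the topic as one categorical draw over K + 1 options keeps each step exact under the collapsed conditional; a real implementation would add per-document switch proportions or vectorized counts if the thesis's model calls for them.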
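The abstract reports results in perplexity and topic coherence without defining them. For reference, the standard definitions are given below. The per-token likelihood is written for a hypothetical mixture of a background distribution $\phi_B$ and topics $\phi_k$, with an assumed background probability $\lambda$ and document-topic proportions $\theta_d$; UMass coherence is shown as one common choice, since the abstract does not say which coherence variant the thesis used.

```latex
\mathrm{Perplexity}(D_{\text{test}})
  = \exp\!\left( - \frac{\sum_{d=1}^{M} \sum_{n=1}^{N_d} \log p(w_{d,n})}{\sum_{d=1}^{M} N_d} \right),
\qquad
p(w_{d,n}) = \lambda\, \phi_{B, w_{d,n}} + (1 - \lambda) \sum_{k=1}^{K} \theta_{d,k}\, \phi_{k, w_{d,n}}

C_{\text{UMass}}(t)
  = \sum_{m=2}^{T} \sum_{l=1}^{m-1} \log \frac{D\big(v_m^{(t)}, v_l^{(t)}\big) + 1}{D\big(v_l^{(t)}\big)}
```

Here $M$ is the number of test documents, $N_d$ the number of tokens in document $d$, $v_1^{(t)}, \dots, v_T^{(t)}$ the top $T$ words of topic $t$, $D(v)$ the number of documents containing $v$, and $D(v, v')$ the number containing both. Lower perplexity and higher coherence are better.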
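For aspect (3), the abstract states only that interest modeling is based on semantic similarity and is meant to make topic-model output easier to interpret. One plausible reading, sketched below as an assumption rather than the thesis's algorithm, maps each topic onto a fixed, human-readable set of interest labels by cosine similarity between word embeddings, so that a user's topic proportions become scores over named interests. The label set, the embeddings, and every function name here are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def topic_vector(top_words, embeddings):
    """Probability-weighted mean embedding of a topic's top words [(word, prob)]."""
    dim = len(next(iter(embeddings.values())))
    vec = [0.0] * dim
    total = 0.0
    for word, prob in top_words:
        if word in embeddings:  # words without an embedding are skipped
            for i, x in enumerate(embeddings[word]):
                vec[i] += prob * x
            total += prob
    return [x / total for x in vec] if total else vec


def user_interest_profile(user_theta, topics_top_words, label_vectors, embeddings):
    """Score each interest label for one user, highest score first.

    user_theta: the user's topic proportions (e.g. averaged over articles read).
    topics_top_words: per-topic list of (word, prob) pairs.
    label_vectors: {label: embedding} for a hypothetical interest-label set.
    """
    topic_vecs = [topic_vector(tw, embeddings) for tw in topics_top_words]
    scores = {}
    for label, lvec in label_vectors.items():
        # Negative similarities are clipped to 0 so they cannot cancel interest.
        scores[label] = sum(theta * max(cosine(tvec, lvec), 0.0)
                            for theta, tvec in zip(user_theta, topic_vecs))
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


if __name__ == "__main__":
    # Toy 2-d embeddings and labels, purely illustrative.
    embeddings = {"stock": [1.0, 0.0], "fund": [0.9, 0.1],
                  "match": [0.0, 1.0], "goal": [0.1, 0.9]}
    labels = {"finance": [1.0, 0.0], "sports": [0.0, 1.0]}
    topics = [[("stock", 0.6), ("fund", 0.4)], [("match", 0.5), ("goal", 0.5)]]
    print(user_interest_profile([0.8, 0.2], topics, labels, embeddings))
```

In the toy demo, a user whose proportions lean towards the first topic is ranked highest on the "finance" label. The design choice is that the output is expressed in named labels rather than anonymous topic ids, which is the kind of interpretability gain the abstract describes.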
Keywords/Search Tags: Text Mining, Topic Model, Prior Information, Background Words, Gibbs Sampling, User Portrait