
Analysis And Research Of Microblog User Gender Classification

Posted on: 2019-05-25    Degree: Master    Type: Thesis
Country: China    Candidate: Q Y Sun
GTID: 2348330542955570    Subject: Communication and Information System
Abstract/Summary:
From e-mail to blogs, Facebook, Twitter and other websites, social networks have developed beyond what anyone imagined. They now play an important role in daily life and have an immeasurable influence on how people obtain information, think and live. Social networks have become a window through which people access information, present themselves and carry out marketing. Determining the gender of microblog users therefore has strong practical value in fields such as personalized recommendation and targeted marketing. The research in this thesis covers the following two aspects.

The first study infers a microblog user's gender from the user's original microblog posts. To address the sparseness of short texts, this thesis proposes a new method that combines the word2vec model and the LDA model to extend text features. First, a word2vec model was trained on a Chinese Wikipedia dataset. Then a large corpus of original microblog posts, after its text features were extended with the word2vec model, was used to train an LDA topic model. The combined model was used to extend the text features of both the training samples and the test samples; the training samples were then used to train an SVM classifier, and the test samples were used to measure its accuracy. The experimental results show that combining the word2vec model with the LDA topic model to extend text features effectively reduces text sparsity and improves classification accuracy.

The second study infers a microblog user's gender from three views: the user's original microblog posts, the user's tags and the user's nickname. Because the number of microblog users is very large and personal profile information is not always truthful, labeling users' gender is difficult. In this thesis, three different views were constructed through analysis, and six classifiers were built with entropy-based query-by-bagging. A small number of labeled samples and a large number of unlabeled samples were used to train the classifiers iteratively. In each iteration, the unlabeled samples with the largest vote entropy were labeled manually and added to the training set, and the samples on which the classifiers' votes agreed were also added to the training set with the agreed label. Tests on real user data show that the accuracy of the improved tri-training classifier is 1.3% higher than that of the original tri-training algorithm and 7.1% higher than that of the single-view supervised algorithm.
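The abstract states the selection rule only in words. The following is a minimal Python sketch of one iteration's selection step, assuming that each of the six committee classifiers outputs a hard gender label for every unlabeled user and using the standard vote-entropy definition VE(x) = -Σ_y (V(y)/C) · log(V(y)/C), where V(y) is the number of votes for label y and C is the committee size. The function names `vote_entropy` and `select_samples` and the sample ids are hypothetical, and this is not the thesis's own implementation.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def vote_entropy(votes: Sequence[str]) -> float:
    """Vote entropy of one sample: -sum_y (V(y)/C) * log(V(y)/C)."""
    committee_size = len(votes)
    entropy = 0.0
    for count in Counter(votes).values():
        p = count / committee_size  # fraction of the committee voting for this label
        entropy -= p * math.log(p)
    return entropy


def select_samples(
    committee_votes: Dict[str, List[str]],
) -> Tuple[str, Dict[str, str]]:
    """Hypothetical selection step for one training iteration.

    committee_votes maps an unlabeled sample id to the labels predicted by
    each committee member (assumed here: six classifiers over three views).
    Returns the id with the largest vote entropy, to be labeled by hand,
    and the samples on which every member agreed, paired with that label.
    """
    # max() returns the first sample reaching the highest vote entropy.
    query_id = max(committee_votes, key=lambda sid: vote_entropy(committee_votes[sid]))
    # Unanimous samples are pseudo-labeled with the shared vote.
    unanimous = {
        sid: votes[0]
        for sid, votes in committee_votes.items()
        if len(set(votes)) == 1 and sid != query_id
    }
    return query_id, unanimous


votes = {
    "u1": ["F", "F", "F", "F", "F", "F"],  # unanimous: pseudo-labeled "F"
    "u2": ["M", "F", "M", "F", "M", "F"],  # evenly split: highest entropy (ln 2)
    "u3": ["M", "M", "M", "M", "M", "F"],  # mostly agreed: lower entropy
}
print(select_samples(votes))  # ('u2', {'u1': 'F'})
```

In this sketch the most contested sample goes to a human annotator, while only samples with a unanimous committee vote are added automatically. This keeps the cost of manual labeling low and limits the noise introduced by pseudo-labels.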
Keywords/Search Tags: word2vec, LDA, tri-training, multi-view learning, gender recognition