
Research And Application Of Gender Classification Of Microblog Users

Posted on: 2020-02-04; Degree: Master; Type: Thesis
Country: China; Candidate: Y Cao
GTID: 2428330575965445; Subject: Engineering
Abstract/Summary:
With the rapid development of Internet technology, various social networking platforms have become widespread. Microblog is one of the most popular social networking platforms in China and has a very large user base. Its users generate a huge amount of data that carries rich information and value, which has attracted researchers from many fields to mine and study it. Gender, as one of a user's basic attributes, underlies much other research on users. Correctly predicting the gender of microblog users matters for personalized recommendation, targeted marketing, and online user-behavior analysis, so research on microblog user gender has attracted growing attention. However, to protect their privacy, many people leave the gender field blank or enter a false gender when registering on Microblog, so research based on user-registered information can produce erroneous results. Predicting users' gender from the information they generate on the Microblog platform therefore has great research value. The research in this thesis consists of three parts.

(1) A user gender classification method based on microblog text. Because a single microblog post cannot accurately reflect a user's gender, the proposed method combines the posts a user has published over a period of time. To reduce the influence of the text's dimensionality and noise on the classification result, the LDA topic model is used to extract topic features from the microblog text, and the random forest algorithm is then used to build the classifier. The experimental results show that combining the LDA topic model with the random forest algorithm effectively extracts the gender information hidden in microblog text and improves the accuracy of gender classification.

(2) A user gender classification method based on multi-type data. Because a single type of data cannot accurately predict the gender of inactive microblog users, the proposed method builds separate base classifiers on microblog text data, microblog user-name data, and microblog source data, and then combines these classifiers with a Bayesian method. The method makes full use of the connections between different types of data and predicts user gender more accurately. The experimental results show that using multi-type data improves user gender prediction, and that the method also achieves high accuracy for inactive microblog users.

(3) A personalized microblog recommendation algorithm based on user characteristics. Traditional microblog recommendation systems mostly use users' text data and ignore user attribute data. The proposed algorithm first extracts features from the microblog text and combines them with the user's gender and age as auxiliary attributes to vectorize the user information. The XGBoost classification algorithm is used to extract the user's preference list, cosine similarity is used to build the user's list of similar users, and finally the microblog content published by similar users is recommended to the user. The experimental results show that the proposed method achieves good recommendation performance.
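Part (1) merges each user's posts over a period and extracts LDA topic features. The abstract does not say which LDA implementation or hyperparameters the thesis uses, so the following is only a sketch under stated assumptions: posts arrive as hypothetical `(user_id, timestamp, tokens)` tuples whose text is already word-segmented, each user's posts inside a time window are concatenated into one document, and topic proportions are estimated with a plain collapsed Gibbs sampler. The helper names `aggregate_posts` and `lda_gibbs`, the values of `alpha`, `beta`, and `iterations`, and the toy posts are all illustrative choices.

```python
import random
from collections import defaultdict


def aggregate_posts(posts, start, end):
    """Concatenate each user's tokenized posts whose timestamp lies in [start, end).

    posts: iterable of hypothetical (user_id, timestamp, tokens) tuples, where
    tokens is a list of already word-segmented terms.
    """
    docs = defaultdict(list)
    for user, ts, tokens in posts:
        if start <= ts < end:
            docs[user].extend(tokens)
    return dict(docs)


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Estimate per-document topic proportions with collapsed Gibbs sampling."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]       # n_dk
    topic_word = [[0] * V for _ in range(num_topics)]  # n_kw
    topic_total = [0] * num_topics                     # n_k
    z = []                                             # topic of every token

    # Random initial assignment of every token to a topic.
    for d, doc in enumerate(docs):
        z.append([])
        for w in doc:
            k = rng.randrange(num_topics)
            z[d].append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # p(z = t | all other assignments), up to a normalizing constant.
                weights = [
                    (doc_topic[d][t] + alpha) * (topic_word[t][v] + beta)
                    / (topic_total[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # theta_dk = (n_dk + alpha) / (n_d + K * alpha); each row sums to 1.
    return [
        [(doc_topic[d][t] + alpha) / (len(doc) + num_topics * alpha)
         for t in range(num_topics)]
        for d, doc in enumerate(docs)
    ]


posts = [
    ("u1", 1, ["dress", "makeup", "shopping"]),
    ("u1", 5, ["makeup", "skincare"]),
    ("u2", 2, ["football", "game", "match"]),
    ("u2", 12, ["football"]),  # outside the [0, 10) window, so it is ignored
]
docs_by_user = aggregate_posts(posts, start=0, end=10)
users = sorted(docs_by_user)
theta = lda_gibbs([docs_by_user[u] for u in users], num_topics=2, iterations=50)
```

Each row of `theta` is one user's topic distribution, a low-dimensional vector of the kind part (1) hands to the classifier instead of the raw, noisy bag of words.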
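In part (1), the topic features are then classified with a random forest. The abstract gives no details of the forest the thesis trained (in practice a library class such as scikit-learn's `RandomForestClassifier` would typically be used), so the stdlib-only sketch below just illustrates the technique: it bags shallow Gini-split trees on bootstrap samples and draws a random subset of features at every node, the two ingredients that distinguish a random forest from a single decision tree. The class and function names, `n_trees`, `max_depth`, and the toy topic-proportion data are hypothetical.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(X, y, features):
    """Lowest weighted-Gini (score, feature, threshold) over the given features."""
    best = None
    for f in features:
        values = sorted({row[f] for row in X})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr)
    return best


def build_tree(X, y, depth, n_sub, rng):
    """Grow a tree: leaves are labels, internal nodes are (feature, thr, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth == 0 or len(set(y)) == 1:
        return majority
    features = rng.sample(range(len(X[0])), n_sub)  # random feature subset per node
    split = best_split(X, y, features)
    if split is None:  # the sampled features are constant on this node
        return majority
    _, f, thr = split
    li = [i for i, row in enumerate(X) if row[f] <= thr]
    ri = [i for i, row in enumerate(X) if row[f] > thr]
    return (f, thr,
            build_tree([X[i] for i in li], [y[i] for i in li], depth - 1, n_sub, rng),
            build_tree([X[i] for i in ri], [y[i] for i in ri], depth - 1, n_sub, rng))


def predict_tree(node, x):
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if x[f] <= thr else right
    return node


class RandomForest:
    """Bagged trees with a random feature subset at every split; majority vote."""

    def __init__(self, n_trees=25, max_depth=3, n_sub=None, seed=0):
        self.n_trees, self.max_depth, self.n_sub, self.seed = n_trees, max_depth, n_sub, seed
        self.trees = []

    def fit(self, X, y):
        rng = random.Random(self.seed)
        n_sub = self.n_sub or max(1, int(math.sqrt(len(X[0]))))
        self.trees = []
        for _ in range(self.n_trees):
            idx = [rng.randrange(len(X)) for _ in range(len(X))]  # bootstrap sample
            self.trees.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                         self.max_depth, n_sub, rng))
        return self

    def predict(self, x):
        votes = Counter(predict_tree(tree, x) for tree in self.trees)
        return votes.most_common(1)[0][0]


# Toy per-user topic proportions (e.g. rows of an LDA theta matrix) and labels.
X = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.2, 0.8], [0.1, 0.9], [0.15, 0.85]]
y = ["female"] * 3 + ["male"] * 3
forest = RandomForest(n_trees=25, max_depth=3, seed=0).fit(X, y)
```

Keeping trees shallow and voting across many bootstrap replicas is what makes the forest less sensitive to noisy individual users than a single deep tree would be.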
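Part (2) says only that the text, user-name, and source base classifiers are combined with a Bayesian method. One common reading, assumed here rather than taken from the thesis, is a naive-Bayes-style fusion: if the data types are conditionally independent given gender g, then P(g | x_1, ..., x_k) is proportional to P(g) multiplied by the product over i of P(g | x_i) / P(g). The function name `combine_posteriors` and all probabilities below are hypothetical.

```python
import math


def combine_posteriors(posteriors, prior):
    """Fuse per-source posteriors P(g | x_i), assuming the sources are
    conditionally independent given gender:
        P(g | x_1..x_k) is proportional to P(g) * prod_i P(g | x_i) / P(g).
    All probabilities must be strictly positive (e.g. smoothed)."""
    log_scores = {}
    for g, p_g in prior.items():
        score = math.log(p_g)
        for post in posteriors:
            score += math.log(post[g]) - math.log(p_g)
        log_scores[g] = score
    # Normalize in log space for numerical stability.
    top = max(log_scores.values())
    unnorm = {g: math.exp(s - top) for g, s in log_scores.items()}
    total = sum(unnorm.values())
    return {g: v / total for g, v in unnorm.items()}


prior = {"female": 0.5, "male": 0.5}
text_post = {"female": 0.6, "male": 0.4}    # hypothetical text classifier output
name_post = {"female": 0.7, "male": 0.3}    # hypothetical user-name classifier output
source_post = {"female": 0.5, "male": 0.5}  # hypothetical source classifier output

fused = combine_posteriors([text_post, name_post, source_post], prior)  # female = 7/9
# An inactive user with no usable text: fuse only the remaining sources.
fused_inactive = combine_posteriors([name_post, source_post], prior)    # female = 0.7
```

Because a source with no data can simply be left out of `posteriors`, this form still produces a prediction for a user who has posted little, which is the situation part (2) targets; whether the thesis handles missing sources this way is not stated in the abstract.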
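Part (3) vectorizes each user from text features plus gender and age, ranks other users by cosine similarity, and recommends what the most similar users published. The sketch below covers only the similarity and recommendation steps. The per-topic preference scores, which the thesis obtains with XGBoost, are assumed to be given; the vector layout (preference scores, one-hot gender, age divided by a hypothetical `max_age`), the helper names, and the toy users and posts are illustrative assumptions.

```python
import math


def user_vector(topic_prefs, gender, age, max_age=100):
    """Hypothetical layout: topic preference scores + one-hot gender + scaled age."""
    gender_onehot = [1.0, 0.0] if gender == "female" else [0.0, 1.0]
    return list(topic_prefs) + gender_onehot + [age / max_age]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similar_users(target, vectors, k):
    """Return the k users most similar to target, highest cosine first."""
    scored = [(cosine(vectors[target], vec), user)
              for user, vec in vectors.items() if user != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by user id
    return [user for _, user in scored[:k]]


def recommend(target, vectors, posts_by_user, k=2, seen=frozenset()):
    """Recommend unseen posts published by the target's k most similar users."""
    recs = []
    for user in similar_users(target, vectors, k):
        for post in posts_by_user.get(user, []):
            if post not in seen and post not in recs:
                recs.append(post)
    return recs


vectors = {
    "u1": user_vector([0.9, 0.1], "female", 25),
    "u2": user_vector([0.8, 0.2], "female", 28),
    "u3": user_vector([0.1, 0.9], "male", 30),
}
posts_by_user = {"u1": ["p1a"], "u2": ["p2a", "p2b"], "u3": ["p3a"]}
print(recommend("u1", vectors, posts_by_user, k=1))  # ['p2a', 'p2b']
```

Appending gender and age to the text features means two users with similar topic interests but different attributes end up slightly less similar, which is one simple way to let the auxiliary attributes influence the neighbor list.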
Keywords/Search Tags: gender classification, random forest, LSTM, personalized recommendation of microblog