User Attributes Inference Based On Reviews On Social Media

Posted on: 2018-11-06
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liu
GTID: 2348330512484590
Subject: Software engineering
Abstract/Summary:
Social media platforms are online media that provide services such as commenting, voting, feedback, and sharing. Examples include the news website Phoenix, the e-commerce websites Amazon and Taobao, and the film review website Douban. User comments are public and readily available, and group opinion serves as a reference for other customers who want to buy products or services. Understanding user comments and inferring user attributes from them can help enterprises, organizations, and governments improve service quality, personalized recommendation, marketing, and so on. Inferring attributes by analyzing users' online behavior therefore has important practical value. However, most social media users are anonymous, and their review data are fragmented, noisy, and unbalanced. In addition, the distribution of user attributes is seriously imbalanced. These problems make inferring user attributes challenging.

To address the unbalanced, fragmented, and noisy nature of user review behavior data, we take the information about the items a user commented on, together with the context, as supplements that assist in modeling user behavior. In addition, according to the characteristics of user comments, we adopt a hierarchical text mining method that processes user reviews from a global perspective and reveals the semantic relationships among words. We also use a distributed representation model that learns the distributed representation of a word from its context information. This model preserves word order at the sentence level and avoids the curse of dimensionality, so that the latent semantic features of user comments can be mined in depth.

After modeling, the feature dimension is high and the fragmented information has low value. We adopt information gain to measure the importance of features and, on that basis, improve two representative probabilistic feature selection methods, the Probability Wrapped Features Selection algorithm and the Heuristic Probability Feature Selection algorithm, so that important features are retained while trivial features are selected only with low probability. These methods also reduce the search space and improve the convergence rate and the learning results.

The distribution of user attributes is imbalanced. To address this challenge, we propose an algorithm that focuses on learning from the samples that make up a small proportion of the data. The method integrates multiple classifiers, each of which uses a feature selection algorithm based on feature importance to improve its learning efficiency. These strategies make instances from the small-proportion classes easier to identify, which effectively improves the accuracy of user attribute classification.

Several real datasets are used to validate our methods on attribute inference from several aspects, including behavior models, feature selection methods, the influence of parameters, and the degree of imbalance in the user attribute data. The experimental results show that our methods outperform the related algorithms.
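
The abstract names information gain as the measure of feature importance but gives no formula. As a rough illustration only, the following Python sketch computes the standard information gain of a discrete feature with respect to class labels and ranks features by it. The function names (`entropy`, `information_gain`, `rank_features`) and the toy data are assumptions made for this example. It does not reproduce the thesis's Probability Wrapped Features Selection or Heuristic Probability Feature Selection algorithms.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their entropy conditioned on the feature."""
    total = len(labels)
    # Group the labels by the value the feature takes for each sample.
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return (name, gain) pairs sorted from most to least informative.

    `features` is a hypothetical mapping from feature name to a list of
    per-sample values, aligned with `labels`.
    """
    gains = [(name, information_gain(vals, labels)) for name, vals in features.items()]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data: a user attribute label and two binary word-presence features.
    labels = ["a", "a", "b", "b"]
    features = {
        "word_x": [1, 1, 0, 0],  # perfectly separates the two classes
        "word_y": [1, 0, 1, 0],  # carries no information about the class
    }
    for name, gain in rank_features(features, labels):
        print(name, round(gain, 3))
```

A probabilistic selection scheme of the kind the abstract describes could use such scores as sampling weights, so that low-gain features are chosen only rarely. That connection is an interpretation of the abstract, not a detail it confirms.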
Keywords/Search Tags:social media, attribute inference, semantic analysis, user behavior, probabilistic feature selection