
User Attribute Recognition On Microblog

Posted on: 2016-05-21
Degree: Master
Type: Thesis
Country: China
Candidate: Y X Xue
GTID: 2308330464452140
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid growth of social media, automatic analysis of useful information in social networks has become an important research topic in communities such as Natural Language Processing and Social Media Analysis. User attribute recognition on Microblog is a foundational task that aims to determine the attributes of Microblog users (e.g., gender, age) from the data they generate. Accurate recognition of user attributes underpins many real-life applications, such as intelligent marketing, personalized prediction, and sentiment analysis. The key contributions of our research are summarized as follows:

First, for the human versus non-human attribute of Microblog users, we propose a classification method that uses both user names and messages (sent by the users) to recognize whether a user is human or non-human. Two classifiers are trained, one on each kind of text, and are then combined with Bayes rule so that the prediction draws on classification knowledge from both user names and messages. Experimental results demonstrate the effectiveness of the proposed approach and show that the combination outperforms the individual classifiers trained on only user names or only messages.

Second, for the gender attribute of Microblog users, we propose a semi-supervised method for gender classification that leverages interactive knowledge and unlabeled data. Conventional approaches to gender classification rely heavily on large amounts of labeled data, which are normally hard and expensive to obtain. As a social medium, Microblog provides multiple platforms for user interaction, so it contains not only non-interactive text written by the user (i.e., messages) but also interactive text written by others (i.e., comments). We propose a co-training approach to gender classification that uses the non-interactive and interactive texts, i.e., the message and comment texts, as two different views in order to incorporate unlabeled data. Experimental results on a large Microblog data set demonstrate the value of leveraging interactive knowledge and the effectiveness of the proposed co-training approach.

Finally, for the age attribute of Microblog users, we propose a semi-supervised method for age regression based on co-training. The main idea is to use textual and social features as two separate views in co-training to automatically annotate the unlabeled data. Moreover, we propose a query-by-committee (QBC) method to address confidence estimation, a critical challenge in age regression. Empirical evaluation on both balanced and imbalanced data demonstrates the effectiveness of the proposed approach to semi-supervised age regression.
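The abstract states that the user-name and message classifiers are combined with Bayes rule but does not give the exact formula. One common form, shown here only as a minimal sketch, assumes the two views are conditionally independent given the class, so the combined posterior is proportional to the product of the two posteriors divided by the class prior. The function name `combine_posteriors`, the class labels, and all probabilities below are hypothetical illustrations, not values from the thesis.

```python
from typing import Dict


def combine_posteriors(
    p_name: Dict[str, float],
    p_msg: Dict[str, float],
    prior: Dict[str, float],
) -> Dict[str, float]:
    """Combine two per-view posteriors into one posterior.

    Assumption (not stated in the abstract): the user-name view and the
    message view are conditionally independent given the class, so
    P(c | name, msg) is proportional to P(c | name) * P(c | msg) / P(c).
    """
    scores = {c: p_name[c] * p_msg[c] / prior[c] for c in prior}
    total = sum(scores.values())
    # Normalize so the combined scores sum to one.
    return {c: s / total for c, s in scores.items()}


# Hypothetical outputs of the two classifiers for a single user.
prior = {"human": 0.5, "non_human": 0.5}
p_name = {"human": 0.6, "non_human": 0.4}   # classifier trained on user names
p_msg = {"human": 0.9, "non_human": 0.1}    # classifier trained on messages

combined = combine_posteriors(p_name, p_msg, prior)
predicted = max(combined, key=combined.get)
```

In this toy case both views lean toward the same class, so the combined posterior is more confident than either view alone, which is the intuition behind combining the classifiers.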
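For the age-regression contribution, the abstract says a query-by-committee (QBC) method handles confidence estimation but does not say how the committee is built or how agreement is measured. The sketch below assumes one plausible reading: several regressors predict an age for each unlabeled user, the spread of their predictions (population standard deviation) serves as an inverse confidence score, and the examples with the lowest disagreement are pseudo-labeled with the committee mean. The function `select_confident` and the prediction values are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def select_confident(
    committee_preds: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Pick the k unlabeled examples the committee agrees on most.

    committee_preds[i] holds every committee member's age prediction for
    unlabeled example i. Disagreement is measured as the population
    standard deviation (an assumption; the abstract does not specify it).
    Returns (example_index, pseudo_label) pairs, most confident first.
    """
    scored = [(pstdev(p), i, mean(p)) for i, p in enumerate(committee_preds)]
    scored.sort()  # lowest disagreement first; ties broken by index
    return [(i, label) for _, i, label in scored[:k]]


# Hypothetical predictions from a three-member committee.
preds = [
    [24.0, 25.0, 26.0],   # mild disagreement
    [18.0, 35.0, 50.0],   # strong disagreement, should be skipped
    [31.0, 31.0, 31.0],   # full agreement
]

picked = select_confident(preds, k=2)
```

In a co-training loop, the selected pairs would be added to the labeled set used to retrain the regressor on the other view, while high-disagreement examples stay unlabeled for later rounds.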
Keywords/Search Tags: Microblog analysis, gender classification, age recognition, semi-supervised learning