User Gender Identification In Micro-blog

Posted on: 2017-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: J J Wang
GTID: 2308330488961926
Subject: Computer Science and Technology
Abstract/Summary:
Automatic analysis of social network data is an important research issue in several research communities, such as natural language processing and social network analysis. Gender classification in Micro-blog is a fundamental task that aims to determine whether a user is male or female by analyzing user-generated content. Although some researchers have worked on gender classification, research on Chinese gender classification is still lacking. Therefore, we first propose an ensemble approach to gender classification in Chinese Micro-blog. Then, inspired by the interaction mechanisms among Micro-blog users, we define a novel task named interactive gender inference, which aims to use interactive text to identify the genders of the two interacting users. Finally, we propose a joint inference approach that improves the performance of interactive gender classification and also helps improve the performance of individual gender classification. In detail, our study includes the following three aspects:

First, for gender classification in Chinese Micro-blog, we propose a classification method that uses user names or messages (sent by the users) to distinguish male from female users. Different types of features (e.g., character and word features) are investigated for the classification. Then, on the basis of the two classifiers trained on user names and messages, Bayes rule is employed to combine them, so that the prediction draws on classification knowledge from both the user names and the messages.
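
The abstract does not give the exact combination formula, so the following is only a minimal sketch of one common way to combine two classifiers with Bayes rule. It assumes that user names and messages are conditionally independent given the gender, in which case P(c | name, message) is proportional to P(c | name) * P(c | message) / P(c). The function name `combine_posteriors` and the probability values are hypothetical and chosen for illustration only.

```python
def combine_posteriors(p_name, p_msg, prior):
    """Combine the posteriors of two classifiers with Bayes rule.

    Assumes the two feature views (user name, messages) are conditionally
    independent given the class, so that
        P(c | name, msg)  is proportional to  P(c | name) * P(c | msg) / P(c).
    All arguments are dicts mapping a class label to a probability.
    """
    scores = {c: p_name[c] * p_msg[c] / prior[c] for c in prior}
    total = sum(scores.values())
    # Normalize so the combined scores form a probability distribution.
    return {c: s / total for c, s in scores.items()}


# Hypothetical classifier outputs for a single user.
p_name = {"male": 0.6, "female": 0.4}   # classifier trained on user names
p_msg = {"male": 0.3, "female": 0.7}    # classifier trained on messages
prior = {"male": 0.5, "female": 0.5}    # assumed class prior

combined = combine_posteriors(p_name, p_msg, prior)
predicted = max(combined, key=combined.get)
```

In this made-up example the two classifiers disagree, and the combined posterior follows the more confident one (the message classifier).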
Experimental results demonstrate that the proposed approach performs well on gender classification, and that the combination method outperforms the individual classifiers trained with only user names or only messages.

Second, it is worth highlighting that the large number of users in social media are not isolated but correlated with each other. In this scenario, a user-generated text is normally shared by several users rather than by a single user. In view of this, we define a novel task named interactive gender inference, which aims to use interactive text to identify the genders of the two interacting users. To address this task, we propose a two-stage approach that incorporates the dependency among interactive samples sharing identical users. Specifically, we first apply a standard four-category classification algorithm to obtain a preliminary result, and then propose a global optimization algorithm to achieve better performance. Evaluation demonstrates the effectiveness of the proposed approach to interactive gender inference.

Third, the label of one instance in interactive gender classification may be correlated not only with the labels of other instances in the same task, but also with the labels of instances in the task of individual gender classification. Therefore, we propose a joint inference approach that incorporates label correlations among the instances. Specifically, an Integer Linear Programming (ILP) approach is proposed to achieve global optimization with various kinds of intra-task and extra-task constraints. Empirical studies demonstrate the effectiveness of the proposed ILP-based approach for both interactive gender classification and individual gender classification.
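
The abstract does not spell out the ILP formulation, so the sketch below only illustrates the general idea of joint inference on a toy example. It is not the thesis's method: it replaces the ILP solver with exhaustive search, which is feasible only for a handful of users, and it encodes a single assumed constraint, namely that each user has the same gender in every instance in which the user appears. The function `joint_inference`, the user identifiers, and all probabilities are hypothetical.

```python
import itertools
import math

GENDERS = ("M", "F")


def joint_inference(users, individual_probs, interactions):
    """Choose one gender per user so that the total log-probability of all
    individual and interactive predictions is maximal.

    individual_probs: {user: {"M": p, "F": p}} from an individual classifier.
    interactions: list of (user_a, user_b, probs), where probs maps a gender
        pair such as ("M", "F") to the four-category classifier's probability.
    Exhaustive search stands in for an ILP solver here (toy sizes only).
    """
    def log_p(p):
        return math.log(max(p, 1e-12))  # guard against zero probabilities

    best_score, best_assignment = -math.inf, None
    for labels in itertools.product(GENDERS, repeat=len(users)):
        # One label per user: this enforces consistency across instances.
        assignment = dict(zip(users, labels))
        score = sum(log_p(individual_probs[u][assignment[u]]) for u in users)
        score += sum(
            log_p(probs[(assignment[a], assignment[b])])
            for a, b, probs in interactions
        )
        if score > best_score:
            best_score, best_assignment = score, assignment
    return best_assignment


users = ["u1", "u2", "u3"]
individual_probs = {
    "u1": {"M": 0.9, "F": 0.1},
    "u2": {"M": 0.45, "F": 0.55},
    "u3": {"M": 0.5, "F": 0.5},
}
interactions = [
    ("u1", "u2", {("M", "M"): 0.1, ("M", "F"): 0.6, ("F", "M"): 0.1, ("F", "F"): 0.2}),
    ("u2", "u3", {("M", "M"): 0.45, ("M", "F"): 0.05, ("F", "M"): 0.4, ("F", "F"): 0.1}),
    ("u1", "u3", {("M", "M"): 0.6, ("M", "F"): 0.2, ("F", "M"): 0.1, ("F", "F"): 0.1}),
]

result = joint_inference(users, individual_probs, interactions)
```

In this toy data, the four-category classifier taken alone labels u2 as female in the first interaction and as male in the second. The joint search resolves the conflict by using the evidence from all instances at once, which is the kind of label correlation the joint inference approach is meant to exploit.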
Keywords/Search Tags: Gender Classification, Text Classification, Social Media, Integer Linear Programming