Research On Gender Discrimination Of Micro-Blog Users Based On Micro-Blog Data

Posted on: 2016-08-18
Degree: Master
Type: Thesis
Country: China
Candidate: J H An
GTID: 2308330464472624
Subject: Computer application technology
Abstract/Summary:
With the rapid development of the Internet, more and more people enjoy the convenience it brings. Micro-blog, a social application built on the Internet, has become an important tool for commenting on and discussing social issues because of its speed and sociability. Because of its huge user base, fast propagation and group effect, advertising media, public opinion supervision departments and similar organizations urgently need Micro-blog content analysis to obtain useful information. In addition, how to derive users' behavior models and other information from Micro-blog content and user data has become an important research area for large social media companies such as Twitter, Facebook, Tencent and Sina Weibo. If user features such as gender and age could be predicted effectively from Micro-blog content, this would be of great value in the applications mentioned above.

Up to now, a large body of research has addressed information mining from Micro-blog, focusing on hot topic detection, sentiment analysis, identification of core opinion leaders, mining in social media, and so on. However, less research has focused on user attribute data, such as classification by gender or age.

This paper focuses on one of these user features: it studies the gender discrimination of Micro-blog users with some simple methods, by analyzing Micro-blog content and user data. The main contributions and innovations of this paper are as follows:

First, taking Tencent Weibo as an example, this paper studies the open platform API and the features of Micro-blog content, and puts forward an algorithm for automatically downloading users' personal profiles and a huge amount of Micro-blog content.
When analyzing Weibo content, the author finds that it contains many interactions between the host and other users, which involve the names of those users. This paper therefore puts forward an algorithm for automatically detecting user names. Using the detected user names and the open platform API, the author designed the algorithm for automatically downloading a huge amount of Weibo content and users' personal profiles, and built a corpus of Micro-blog content and user profiles.

Second, based on statistical analysis of Micro-blog content and user profiles, this paper puts forward an algorithm to discriminate the gender of users based on the nickname and on the verbs in the Micro-blog content, together with a method for extracting feature words. Through analysis of the huge amount of Micro-blog content and personal profiles, the author found that the nicknames of the majority of users are similar to Chinese names. Because of this high similarity, and because Chinese names have a strong gender distinction, the author proposed an algorithm to discriminate the gender of users based on the words in the nickname. After word segmentation of the Micro-blog content, the author analyzed the frequency of verbs for the two genders and found that verbs also have a strong gender distinction; on this basis the author designed an algorithm to discriminate gender based on verbs, and designed a standard method for extracting feature words. Experiments and comparative analysis show that the proposed methods achieve relatively good accuracy on the test data after proper adjustment and feature extraction.
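
The two ideas summarized above (detecting user names inside post text, and scoring gender from nickname words or verbs) can be illustrated with a small Python sketch. This is not the thesis's algorithm, whose details the abstract does not give. The `@name` mention pattern, the function names, the smoothed log-odds scoring and the toy training data are all assumptions made for illustration only.

```python
import math
import re
from collections import Counter

# Assumption: a mention looks like "@name", where the name consists of word
# characters or hyphens. The real Tencent Weibo mention format may differ.
MENTION_RE = re.compile(r"@([\w-]+)")


def extract_mentions(post):
    """Return the user names mentioned in a post, in order, without repeats."""
    seen = []
    for name in MENTION_RE.findall(post):
        if name not in seen:
            seen.append(name)
    return seen


def train_log_odds(samples, smoothing=1.0):
    """Learn one weight per token from (tokens, gender) pairs.

    Tokens may be nickname characters or verbs from segmented posts;
    gender is "F" or "M". Each weight is log P(token|F) - log P(token|M),
    estimated with additive smoothing (an illustrative choice).
    """
    counts = {"F": Counter(), "M": Counter()}
    for tokens, gender in samples:
        counts[gender].update(tokens)
    vocab = set(counts["F"]) | set(counts["M"])
    totals = {g: sum(c.values()) + smoothing * len(vocab)
              for g, c in counts.items()}
    return {
        t: math.log((counts["F"][t] + smoothing) / totals["F"])
        - math.log((counts["M"][t] + smoothing) / totals["M"])
        for t in vocab
    }


def predict_gender(tokens, weights):
    """Sum the token weights; a positive total means "F", a negative one "M".

    Returns None when the evidence is balanced or no token is known.
    """
    score = sum(weights.get(t, 0.0) for t in tokens)
    if score > 0:
        return "F"
    if score < 0:
        return "M"
    return None


# Toy, hand-made training data (hypothetical, not from the thesis corpus).
toy_samples = [
    (["丽", "娜"], "F"),
    (["芳"], "F"),
    (["强", "伟"], "M"),
    (["军"], "M"),
]
weights = train_log_odds(toy_samples)
```

In this sketch the same scorer serves both feature types: it can be trained once on nickname characters and once on verbs, and the two scores combined or compared. Keeping only the tokens with the largest absolute weights is one simple way to approximate the feature word extraction step.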
Keywords/Search Tags: Micro-blog, gender discrimination, machine learning, feature selection