
Clustering-Based Zhihu User Categorization

Posted on: 2020-05-07 Degree: Master Type: Thesis
Country: China Candidate: X Y Ji
GTID: 2428330578452412 Subject: Computer Science and Technology
Abstract/Summary:
Question-and-Answer (Q&A) social networks are an emerging type of social network. In recent years they have attracted scholarly attention because of their professional, knowledge-oriented character. Zhihu is the largest Chinese Q&A social network, so it has naturally drawn the interest of many researchers. Users of traditional social networks connect largely through social ties. Most Zhihu users, by contrast, follow one another because of shared interests or topics, and they tend to follow creators of high-quality content. Created content therefore matters more than social relationships in shaping the Zhihu social network. Categorizing Zhihu users helps Zhihu operators in several ways. It lets them position users accurately, tell expert users apart from ordinary users, improve content push and content-provider recommendation, and promote paid-knowledge activities.

Given these characteristics of the Zhihu network, we categorize users from two perspectives: (1) by behavioral features, and (2) by area of interest. Behavioral features include a user's content-creation features, browsing features, and influence, among others. Among the creation features, evaluating the quality of user-created content is a particular difficulty. Quality is usually judged from other users' reactions, but this judgment is often skewed by how much attention the topic of the content receives. User-created content is also one of the most important bases for categorizing users by area of interest. Text-based categorization methods have two weaknesses. Topic models usually handle short texts poorly. Earlier text clustering methods usually assign each user to exactly one cluster. User-generated content, however, often covers more than one topic, so assigning each user to a single cluster does not reflect reality.

The main contributions of this thesis are as follows:
(1) We crawl a large amount of Zhihu data with a web crawler and build a new, comprehensive, large-scale Zhihu dataset, which lays the foundation for our research.
(2) We analyze the characteristics of Zhihu user behavior data. Based on behavioral features, we then categorize users into multiple types and analyze how each type tends to provide and consume paid knowledge. We also propose an answer quality assessment method that reduces the influence of topic differences on answer quality scores. We use it to extract one of the behavioral features used for categorization.
(3) We construct User-Keyword Importance Vectors from users' answer content and apply clustering methods to them. We then identify the fields of interest of the users in each cluster from that cluster's characteristic keywords. Based on the clustering results, we assign each user multi-level labels within a given text similarity threshold. We evaluated the categorization results against labeled data. The proposed methods outperform the LDA model and the Author-Topic Model, and the secondary labels achieve high accuracy when the parameter value is small.
(4) We design and implement a prototype Zhihu user analysis system with three functions. First, it crawls a target Zhihu user's data online, analyzes it, and visualizes the results. Second, it computes the user's field-of-interest labels from their answer content. Third, it recommends users to follow, based on the text similarity between the answers the target user has voted for and the answers of users already in the database.
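The abstract leaves several details of contribution (3) open. It does not say how the User-Keyword Importance Vectors are weighted, which clustering algorithm is used, or how the similarity threshold yields secondary labels. The Python sketch below shows one plausible reading, not the thesis's actual method, and rests on four assumptions. First, each user's answers form one already-segmented document, weighted with smoothed TF-IDF. Second, users are grouped with spherical k-means (cosine similarity). Third, a cluster's top-weighted keywords hint at its field of interest. Fourth, a user's secondary labels are the clusters whose similarity falls within a hypothetical margin `delta` of the best match. All function names are hypothetical.

```python
import math
from collections import Counter


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def normalize(vec):
    """Scale to unit length so that a dot product equals cosine similarity."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm else list(vec)


def user_keyword_vectors(user_tokens):
    """Assumed TF-IDF weighting with one 'document' per user (all answer tokens).

    user_tokens maps user id -> list of already-segmented keywords.
    Returns (vocab, {user id: unit-length keyword-importance vector}).
    """
    n = len(user_tokens)
    df = Counter()
    for tokens in user_tokens.values():
        df.update(set(tokens))
    vocab = sorted(df)
    # Smoothed IDF: a word used by every user still gets a positive weight.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    vectors = {}
    for user, tokens in user_tokens.items():
        tf = Counter(tokens)
        total = max(len(tokens), 1)
        vectors[user] = normalize([tf[w] / total * idf[w] for w in vocab])
    return vocab, vectors


def spherical_kmeans(vectors, k, iters=20):
    """k-means on unit vectors using cosine similarity (an assumed choice)."""
    # Deterministic farthest-point initialisation: start from the first
    # vector, then repeatedly add the vector least similar to any centroid.
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = min(vectors, key=lambda v: max(dot(v, c) for c in centroids))
        centroids.append(list(far))
    assign = []
    for _ in range(iters):
        new_assign = [max(range(k), key=lambda c: dot(v, centroids[c]))
                      for v in vectors]
        if new_assign == assign:  # converged
            break
        assign = new_assign
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return centroids, assign


def top_keywords(centroid, vocab, n=3):
    """Highest-weighted keywords of a centroid: a hint at its field of interest."""
    ranked = sorted(range(len(vocab)), key=lambda i: -centroid[i])
    return [vocab[i] for i in ranked[:n]]


def multi_level_labels(vec, centroids, delta):
    """Primary label = most similar cluster; secondary labels = other clusters
    whose cosine similarity is within `delta` of the best (assumed rule)."""
    sims = [dot(vec, c) for c in centroids]
    order = sorted(range(len(sims)), key=lambda c: -sims[c])
    best = sims[order[0]]
    return [c for c in order if sims[c] >= best - delta]
```

As a usage example, take four users, two writing about programming and two about investing. Here `spherical_kmeans(..., k=2)` separates the two groups. `multi_level_labels` with `delta=0.0` returns only the primary cluster, while a larger `delta` also admits secondary labels. A small margin keeps the label set conservative, which is one way to read the abstract's remark that secondary labels are most accurate when the parameter is small.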
Keywords/Search Tags: Zhihu, Question-and-Answer Social Network, User Categorization, Clustering, User Analysis System