Discovering Users' Interests By Combining User Generated Contents And Behavioral Logs

Posted on: 2019-03-29
Degree: Master
Type: Thesis
Country: China
Candidate: D L Zhang
GTID: 2348330542991161
Subject: Computer Science and Technology
Abstract/Summary:
In recent years, online technology communities of all kinds have become important platforms where technology enthusiasts and practitioners communicate, consult one another, and share techniques. However, the explosive growth of information has brought the problem of information overload, which poses a huge challenge both for users who need to acquire information and for community operators who need to publish it. It is therefore of great significance to identify users' expertise and interests from the mass of user-generated content and behavioral logs, which contributes to more accurate user profiling. It also helps community operators provide precise recommendations and personalized services, thereby increasing user stickiness and community activity.

At present, mainstream methods for discovering users' expertise or interests consider only the documents a user has published, from the perspective of a content producer, and ignore the documents the user has read, commented on, or collected (bookmarked), from the perspective of a content consumer. To address this problem, this thesis analyzes the inherent patterns of content production and consumption in online technology communities and proposes a novel Author-Reader-Topic (ART) model that discovers users' expertise and interests simultaneously by combining user-generated content and behavioral logs.

First, the massive raw data, comprising user-generated content and behavioral logs, are preprocessed. For user-generated content, noise in the blog documents, including code blocks, HTML tags, and URLs, is filtered by a semi-supervised method according to its distribution. Word segmentation and stop-word removal are then performed with a word segmentation tool combined with proprietary dictionaries for the Information Technology domain. Finally, non-technical documents are filtered out according to the proportion of technical vocabulary they contain. For behavioral logs, the thesis parses the different types of logs, associates the user (reader) behind each behavior with the corresponding document, and forms normalized log records.

Second, the thesis proposes to discover users' expertise and interests by combining user-generated content and behavioral logs. Since a user in a community is both a producer (author) and a consumer (reader) of content, the proposed ART topic model captures users' expertise and interests simultaneously. The model effectively links the authors and readers of each document, which improves the quality of topic clustering and yields more accurate author-topic and reader-topic distributions, and thus helps discover users' expertise and interests more reliably.

Finally, a series of comparative experiments and analyses is conducted on a real data set collected from the CSDN technology community. The experimental results show that the ART model effectively discovers users' expertise and interests and clearly outperforms existing related methods. In addition, analysis of the expertise and interests found by the model verifies the hypothesis that users' expertise in the community is relatively concentrated, while their interests are relatively dispersed.
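The content-preprocessing steps described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the thesis's implementation: the semi-supervised, distribution-based noise filter is replaced here by plain regular expressions, tokens are assumed to come from a segmenter already loaded with an IT dictionary (for example jieba, whose `jieba.load_userdict` accepts a custom dictionary file), and the helper names `strip_noise`, `remove_stopwords`, `tech_ratio`, `is_technical` and the 0.2 threshold are hypothetical.

```python
import re

# Hypothetical regex stand-ins for the thesis's semi-supervised noise filter.
CODE_BLOCK = re.compile(r"<pre\b.*?</pre>|```.*?```", re.S | re.I)
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+")


def strip_noise(html: str) -> str:
    """Remove code blocks, then HTML tags, then URLs, and collapse whitespace."""
    text = CODE_BLOCK.sub(" ", html)  # code blocks first: they may contain tags
    text = HTML_TAG.sub(" ", text)    # tags before URLs, so href values go with their tag
    text = URL.sub(" ", text)
    return " ".join(text.split())


def remove_stopwords(tokens, stopwords):
    """Drop tokens found in the stop-word list."""
    return [t for t in tokens if t not in stopwords]


def tech_ratio(tokens, tech_vocab) -> float:
    """Fraction of tokens that belong to the technical vocabulary."""
    if not tokens:
        return 0.0
    return sum(t in tech_vocab for t in tokens) / len(tokens)


def is_technical(tokens, tech_vocab, threshold: float = 0.2) -> bool:
    # threshold=0.2 is an illustrative assumption, not a value from the thesis
    return tech_ratio(tokens, tech_vocab) >= threshold
```

For example, `strip_noise('<p>Use <code>HashMap</code> in Java</p><pre>int x = 1;</pre> see https://example.com')` returns `"Use HashMap in Java see"`: the `<pre>` block is dropped as code, the remaining tags are stripped, and the URL is removed. Stripping tags before URLs keeps a URL inside an `href` attribute from swallowing the link text that follows it.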
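The behavioral-log normalization can be sketched as a parser that maps each raw log type to one common (reader, document, action) record and then groups readers by document, which is the author-reader link a topic model needs. The abstract does not describe CSDN's log schema, so the three raw line formats, the action names, and the helpers `BehaviorRecord`, `parse_line`, `normalize` and `readers_by_doc` below are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, Optional, Set, Tuple


@dataclass(frozen=True)
class BehaviorRecord:
    reader: str
    doc: str
    action: str  # "read", "comment" or "collect"


# Hypothetical raw formats, one per log type:
#   read:    "<timestamp>\t<user>\t<doc>"
#   comment: "<user>,<doc>,<comment text, may contain commas>"
#   collect: "<user>|<doc>"
def parse_line(log_type: str, line: str) -> Optional[BehaviorRecord]:
    """Parse one raw line into a normalized record, or None if malformed."""
    line = line.rstrip("\n")
    if log_type == "read":
        parts = line.split("\t")
        if len(parts) != 3:
            return None
        user, doc = parts[1], parts[2]
    elif log_type == "comment":
        parts = line.split(",", 2)  # the comment text itself may contain commas
        if len(parts) < 2:
            return None
        user, doc = parts[0], parts[1]
    elif log_type == "collect":
        parts = line.split("|")
        if len(parts) != 2:
            return None
        user, doc = parts
    else:
        return None
    user, doc = user.strip(), doc.strip()
    if not user or not doc:
        return None
    return BehaviorRecord(user, doc, log_type)


def normalize(logs: Iterable[Tuple[str, str]], known_docs: Set[str]) -> Iterator[BehaviorRecord]:
    """Yield records whose document survived content preprocessing."""
    for log_type, line in logs:
        rec = parse_line(log_type, line)
        if rec is not None and rec.doc in known_docs:
            yield rec


def readers_by_doc(records: Iterable[BehaviorRecord]) -> Dict[str, Set[str]]:
    """Link each document to the set of users who interacted with it."""
    result = defaultdict(set)
    for rec in records:
        result[rec.doc].add(rec.reader)
    return dict(result)
```

Records pointing at documents that were filtered out as non-technical are dropped in `normalize`, so the reader sets stay aligned with the cleaned corpus.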
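To illustrate how a topic model can link a document's author and readers, the sketch below is a collapsed Gibbs sampler in the style of the Author-Topic model. It is explicitly not the thesis's ART model, whose generative process the abstract does not give. It is a hypothetical simplification in which every word of a document is attributed to one identity drawn uniformly from the document's author and its readers, with author and reader roles kept as separate identities, so that `("author", u)` yields an expertise distribution and `("reader", u)` an interest distribution. The function name `sample_art_like` and the values of `alpha`, `beta` and `iters` are illustrative assumptions.

```python
import random
from collections import defaultdict


def sample_art_like(docs, n_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for an Author-Topic-style model with reader identities.

    docs: list of (author, readers, tokens). Each token is attributed to one
    identity drawn uniformly from ("author", author) and ("reader", r) for r in readers.
    Returns (theta, phi): theta[(role, user)] is that identity's topic distribution,
    phi[k] maps each word to its probability under topic k.
    """
    rng = random.Random(seed)
    K = n_topics
    vocab = sorted({w for _, _, toks in docs for w in toks})
    V = len(vocab)
    n_xk = defaultdict(lambda: [0] * K)          # identity -> per-topic token counts
    n_x = defaultdict(int)                       # identity -> total tokens
    n_kw = [defaultdict(int) for _ in range(K)]  # topic -> word counts
    n_k = [0] * K                                # topic -> total tokens

    def update(x, z, w, delta):
        n_xk[x][z] += delta
        n_x[x] += delta
        n_kw[z][w] += delta
        n_k[z] += delta

    # Candidate identities per document: its author plus every reader.
    cands = [[("author", a)] + [("reader", r) for r in sorted(rs)] for a, rs, _ in docs]

    # Random initial (identity, topic) assignment for every token.
    assign = []
    for d, (_, _, toks) in enumerate(docs):
        row = []
        for w in toks:
            x, z = rng.choice(cands[d]), rng.randrange(K)
            update(x, z, w, +1)
            row.append((x, z))
        assign.append(row)

    for _ in range(iters):
        for d, (_, _, toks) in enumerate(docs):
            for i, w in enumerate(toks):
                x, z = assign[d][i]
                update(x, z, w, -1)  # exclude the current token from the counts
                pairs, weights = [], []
                for c in cands[d]:
                    for k in range(K):
                        p_topic = (n_xk[c][k] + alpha) / (n_x[c] + K * alpha)
                        p_word = (n_kw[k][w] + beta) / (n_k[k] + V * beta)
                        pairs.append((c, k))
                        weights.append(p_topic * p_word)
                # Jointly resample the identity and topic for this token.
                x, z = rng.choices(pairs, weights=weights)[0]
                update(x, z, w, +1)
                assign[d][i] = (x, z)

    theta = {x: [(n_xk[x][k] + alpha) / (n_x[x] + K * alpha) for k in range(K)] for x in n_xk}
    phi = [{w: (n_kw[k][w] + beta) / (n_k[k] + V * beta) for w in vocab} for k in range(K)]
    return theta, phi
```

Because a reader identity is shared across every document that user read, commented on, or collected, its counts pool evidence from all of those documents, while the same user's author identity only collects evidence from what they wrote.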
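The concentration hypothesis can be checked, for instance, by comparing the Shannon entropy of users' expertise (author) and interest (reader) topic distributions: lower entropy means the probability mass sits on fewer topics, and the maximum for K topics is log2 K bits. Entropy is an assumed measure here; the abstract does not say which measure the thesis used. The helpers `entropy` and `compare_concentration` are hypothetical and take a mapping from (role, user) pairs to topic distributions.

```python
import math
from statistics import mean


def entropy(dist):
    """Shannon entropy in bits; zero-probability topics contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def compare_concentration(theta):
    """Mean entropy per role, where theta maps (role, user) -> topic distribution."""
    by_role = {}
    for (role, _user), dist in theta.items():
        by_role.setdefault(role, []).append(entropy(dist))
    return {role: mean(vals) for role, vals in by_role.items()}
```

For `theta = {("author", "u1"): [0.9, 0.05, 0.05], ("reader", "u1"): [0.4, 0.3, 0.3]}`, the author entropy is about 0.57 bits and the reader entropy about 1.57 bits (close to the 3-topic maximum of about 1.58), the pattern the hypothesis predicts: concentrated expertise, dispersed interests.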
Keywords/Search Tags: Online technology community, User profiling, User interest, User expertise