
User Interest Modeling In Large Scale Social Media Based On Spark Framework

Posted on: 2018-03-01 | Degree: Master | Type: Thesis
Country: China | Candidate: R F Yang | Full Text: PDF
GTID: 2348330512482988 | Subject: Computer software and theory
Abstract/Summary:
Mining the intrinsic interests of social media users is an active research problem in the era of big data. User interests have important theoretical and practical significance for personalized advertising, security intelligence, network public opinion analysis and other applications. The historical information a user leaves in social media indicates, to some extent, that user's implicit interests, so user interest modeling in social media has attracted the attention of researchers. This thesis makes an in-depth study of user interest modeling in social media, taking microblogging as the representative platform, and puts forward a new user interest model. The method is also suitable for other short-text social media data, such as WeChat and Twitter. The research addresses both interest representation and model construction:

(1) Based on the content and publication time of posts, a user's interests are modeled along three dimensions: a topic model, a category model and an interest keyword model.

For the interest topic model, we improve the bag-of-words model according to the characteristics of short social media text. We use the Word2vec technique to construct a semantic representation model among features, and we use the order of features within a sentence to construct a sequence graph model. On this basis, together with a time factor, we propose a time-based user interest topic model to extract the topics a user pays attention to. The experimental results show that the FM, AA and F of our method are increased by 200.40%, 46.50% and 80.05%, respectively, compared with the recent method FSC-LDA.

For the interest category model, building on the traditional TF-IDF algorithm, we propose a user interest category model based on multinomial naive Bayes, which considers lexical items, part of speech, word length and text normalization. The experimental results show that the new algorithm effectively improves the F1 value of microblog short-text classification and therefore constructs a better user interest category model.

For the interest keyword model, based on the semantic relationships between interest words, we propose a three-layer model (TLM) that uses a time window and a forgetting function so that the model can be updated incrementally. The experimental results show that the top-5 and top-10 Hit Rate of TLM are increased by 10.70% and 18.65%, respectively, compared with the recent method TBIMM. TLM also filters noise words better and tracks interest drift.

(2) Based on these three dimensions, we propose a hybrid user interest model with a hierarchical structure. The model describes a user's interests at multiple granularities and along different dimensions, and therefore more comprehensively. In addition, experiments on Spark parallelization of each dimension show that the model can process massive volumes of short social media text quickly.
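
The abstract does not give the form of the forgetting function used in the interest keyword model. As a rough illustration only, the sketch below assumes an exponential decay with a configurable half-life, so that each mention of a keyword contributes less the older it is. The function names, the `half_life_days` parameter and the sample data are all hypothetical and are not taken from the thesis.

```python
import math
from collections import defaultdict


def decayed_keyword_weights(events, now, half_life_days=30.0):
    """Score keywords with an (assumed) exponential forgetting function.

    events: iterable of (keyword, day) pairs, where day is a timestamp in days.
    now: the current time, in the same unit.
    A mention that is half_life_days old counts half as much as a fresh one.
    """
    decay_rate = math.log(2) / half_life_days
    weights = defaultdict(float)
    for keyword, day in events:
        age = max(0.0, now - day)  # clamp so future-dated posts are not boosted
        weights[keyword] += math.exp(-decay_rate * age)
    return dict(weights)


def top_k(weights, k):
    """Return the k highest-weighted keywords (ties broken alphabetically)."""
    return sorted(weights, key=lambda w: (-weights[w], w))[:k]


# Hypothetical posting history: an old interest and two recent ones.
events = [
    ("football", 0), ("football", 5),
    ("spark", 55), ("spark", 60),
    ("movie", 58),
]
weights = decayed_keyword_weights(events, now=60, half_life_days=30.0)
print(top_k(weights, 2))
```

With this sample history the two recent interests outrank "football", even though "football" was mentioned as often as "spark". A forgetting function is intended to capture this kind of interest drift. A shorter half-life forgets old interests faster; a longer one keeps them stable.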
Keywords/Search Tags: social media, hybrid interest model, topic model, category model, interest keyword model