
Research on User Interest and Community Detection Algorithms Based on Hadoop in Microblog

Posted on: 2016-10-28
Degree: Master
Type: Thesis
Country: China
Candidate: J J Li
Full Text: PDF
GTID: 2348330476455753
Subject: Computer application technology
Abstract/Summary:
In recent years, with the development of Web 2.0, microblogging has grown rapidly. Research on social networks focuses mainly on two aspects: interest detection and community detection. Existing interest detection algorithms, however, rely on only one side of the picture, either user data or user behavior, and existing theoretical models are used mainly to handle noise. In particular, existing modeling algorithms do not take into account the social tagging features and user interactions of microblogs. For community detection, a microblog network has both a complex topological structure and node content attributes, which distinguishes it from networks with a single kind of property. Considering only one of the two does not yield an ideal community detection result. In addition, most existing community detection methods are based on node communities and cannot recognize overlapping communities in the network well.

In view of these problems, the work of this thesis covers the following two aspects:

1. Existing interest detection algorithms for social networks rely on only one of user data or user behavior and ignore the characteristics of social tagging systems. To address this, the thesis uses the relationships among microblog tags, microblog content, and user behavior to propose a user tag extraction algorithm for microblogs based on semantic vectors and PageRank. The algorithm first solves the tag cold-start problem. Second, it extends the semantics of tags, builds a user semantic model, and designs a diversified recommendation algorithm. Finally, considering the influence of user interaction behavior on interest detection, the thesis designs an objective function, based on the PageRank algorithm, for calculating tag weights.

2. Existing community detection algorithms mostly focus on either network structure or node content, and so cannot handle community partitioning in microblog networks. Building on link community detection, the thesis proposes a community detection algorithm based on both network topology and node content. The algorithm first builds a directed, unweighted microblog network from the social relations among users, then reconstructs it as a weighted network that takes full account of the original directed edges and the node content attributes. When partitioning the microblog community structure, the thesis uses a link-based community detection method, which treats the links in the network as the objects to cluster, in order to overcome the conflict over overlapping nodes that arises in node-based community detection. Finally, the dividing density is introduced as the evaluation standard for communities.

Lastly, the thesis conducts experiments on the proposed algorithms. First, to determine the values of the relevant parameters, several comparison experiments are run to find the parameter values at which the algorithms perform best. Next, modeling experiments compare the proposed tag extraction algorithm with a collaborative filtering algorithm and a TF-IDF keyword extraction algorithm on the same data set; the accuracy, recall, and F-value of the proposed algorithm improve significantly. Then, to verify the feasibility of the proposed community detection algorithm, the thesis analyzes the community partitioning process and runs comparison experiments on networks of different scales. The results show that the proposed algorithm achieves the highest accuracy regardless of network size. Finally, the algorithms are run in a Hadoop environment, and the results show that their efficiency and scalability improve markedly there.
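
The abstract does not give the formula behind its "dividing density" evaluation standard. Assuming it corresponds to the partition density commonly used with link communities, D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)), where M is the total number of links and m_c and n_c are the links and nodes of community c, a minimal Python sketch is shown here. The function names, the input layout, and the toy network are illustrative assumptions, not the thesis's implementation. The sketch also shows how clustering links rather than nodes lets one node belong to several communities.

```python
from collections import defaultdict


def partition_density(link_communities):
    """Partition density of a link clustering.

    link_communities: dict mapping a community id to a list of (u, v) links.
    This input layout is an assumption made for illustration.
    """
    total_links = sum(len(edges) for edges in link_communities.values())
    if total_links == 0:
        return 0.0
    weighted_sum = 0.0
    for edges in link_communities.values():
        m_c = len(edges)                             # links in the community
        n_c = len({n for e in edges for n in e})     # nodes touched by those links
        if n_c <= 2:
            continue  # a single-link community contributes 0 by convention
        weighted_sum += m_c * (m_c - (n_c - 1)) / ((n_c - 2) * (n_c - 1))
    return 2.0 * weighted_sum / total_links


def overlapping_nodes(link_communities):
    """Nodes whose links fall into more than one link community."""
    membership = defaultdict(set)
    for cid, edges in link_communities.items():
        for u, v in edges:
            membership[u].add(cid)
            membership[v].add(cid)
    return {n: cids for n, cids in membership.items() if len(cids) > 1}


# Toy network: two triangles that share the node "c".
demo = {
    "A": [("a", "b"), ("b", "c"), ("a", "c")],
    "B": [("c", "d"), ("d", "e"), ("c", "e")],
}
density = partition_density(demo)
shared = overlapping_nodes(demo)
```

In this toy case each triangle is a clique, so the density reaches its maximum of 1, and node "c" is reported as belonging to both communities; a tree-like community would contribute 0.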
Keywords/Search Tags:Microblog, Interest detection, Tag, Community detection, Hadoop