The Application Research Of Virtual Communities Discovering Based On Distributed Spectral Clustering

Posted on: 2017-08-10  Degree: Master  Type: Thesis
Country: China  Candidate: D H Xu  Full Text: PDF
GTID: 2348330503467204  Subject: Computer science and technology, computer technology
Abstract/Summary:
Because users differ in age, occupation, interests, and other attributes, social networks exhibit distinct community structures, and community discovery is the foundation and core of research on those structures. Community discovery not only helps analyze the relationships among different user groups, uncover regularities hidden within communities, and track hot topics in the network, but also plays an important role in friend recommendation and precision marketing. Spectral clustering is one of the major community discovery methods; it is based on graph theory and applies to social networks that can be abstracted as user graphs. Because of its high time complexity, traditional spectral clustering is applied to networks with relatively few nodes, whereas social networks have huge numbers of users, which poses a great challenge for the traditional algorithm. To address this problem, this thesis applies the widely used distributed computing framework Hadoop to large-scale community discovery and designs a user similarity model suited to the characteristics of social network data, which substantially improves the efficiency of traditional spectral clustering. To address the inability of spectral clustering to determine the number of communities on its own, the thesis proposes different algorithms for two scenarios. When the community structure is obvious, it uses a community-number discovery algorithm based on PageRank, which parallelizes well and can therefore use parallel computing on large-scale data to find the number of communities more efficiently. When the community structure is implicit, it proposes a spectral clustering algorithm based on modularity optimization to find the number of communities.
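The idea of choosing the number of communities by a modularity criterion can be sketched as follows. The abstract does not give the formula or the procedure the thesis uses, so this is only an illustration under stated assumptions: it uses the standard Newman modularity Q = sum over communities c of (L_c / m - (d_c / 2m)^2), where m is the number of edges, L_c the number of edges inside c, and d_c the total degree of c. The candidate partitions are hard-coded here; in practice they would come from running spectral clustering with different cluster counts. The names `modularity`, `best_partition`, and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity of an undirected, unweighted graph.

    edges  -- list of (u, v) node-index pairs, each edge listed once
    labels -- labels[i] is the community id of node i
    """
    m = len(edges)
    intra = defaultdict(int)   # edges with both endpoints in community c
    degree = defaultdict(int)  # total degree of the nodes in community c
    for u, v in edges:
        degree[labels[u]] += 1
        degree[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def best_partition(edges, candidates):
    """Return (k, labels) for the candidate with the highest modularity."""
    return max(candidates.items(), key=lambda kv: modularity(edges, kv[1]))


# Toy graph (assumption): two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]

# Hypothetical outputs of a clustering step run with k = 1, 2, 3.
candidates = {
    1: [0, 0, 0, 0, 0, 0],
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 1, 1, 2, 2],
}

best_k, best_labels = best_partition(edges, candidates)
print(best_k, round(modularity(edges, best_labels), 4))
```

On this toy graph the two-community split scores highest, which matches the visible structure of two triangles. Scoring each candidate is independent of the others, so this step would also be easy to distribute.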
This thesis uses a microblogging social network in wide public use for experimental verification. Microblog users have many attributes, such as microblog content, follows (attention), fans, interactions, and personal information. The thesis integrates four types of user attribute information to build a more reasonable user similarity model. For the specific setting of spectral clustering applied to social networks, it proposes a series of Hadoop optimization strategies, such as storing intermediate results in HBase, controlling the block size, and using Uber mode.
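A user similarity model that combines several attribute types might look like the sketch below. The abstract does not say which four attribute types are integrated or how they are weighted, so everything specific here is an assumption made for illustration: the four chosen attributes (content keywords, follows, fans, interaction partners), the use of Jaccard similarity for each, the equal weights, and the names `jaccard`, `user_similarity`, and `WEIGHTS`.

```python
def jaccard(a, b):
    """Jaccard similarity of two collections, treated as sets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0  # convention chosen here: two empty sets share nothing
    return len(a & b) / len(a | b)


# Hypothetical equal weights over four assumed attribute types.
WEIGHTS = {
    "content": 0.25,       # keywords extracted from posts
    "follows": 0.25,       # accounts the user follows
    "fans": 0.25,          # accounts following the user
    "interactions": 0.25,  # accounts the user replies to or reposts
}


def user_similarity(u, v, weights=WEIGHTS):
    """Weighted sum of per-attribute Jaccard similarities, in [0, 1]."""
    return sum(w * jaccard(u[attr], v[attr]) for attr, w in weights.items())


# Made-up example users.
alice = {
    "content": {"hadoop", "clustering", "graphs"},
    "follows": {"u1", "u2", "u3"},
    "fans": {"u4", "u5"},
    "interactions": {"u2"},
}
bob = {
    "content": {"hadoop", "graphs", "travel"},
    "follows": {"u2", "u3", "u6"},
    "fans": {"u5"},
    "interactions": {"u2", "u7"},
}

print(user_similarity(alice, bob))
```

The pairwise scores would form the similarity (affinity) matrix that spectral clustering takes as input. Because each pair is scored independently, the computation fits a map-style distributed job.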
Keywords/Search Tags:Distributed, Spectral clustering, Virtual communities discovering