
Research On The Algorithm Of Community Discovery And Key User Mining Based On Big Data

Posted on: 2020-02-13  Degree: Master  Type: Thesis
Country: China  Candidate: C Wang  Full Text: PDF
GTID: 2428330602452144  Subject: Information Science
Abstract/Summary:
In recent years, with the rapid development of network technology and the widespread adoption of the mobile Internet, more and more people communicate with others through social networks. Driven by rapidly growing demand for knowledge, online knowledge communities have flourished and become the main platform on which many users create and exchange knowledge. As the number of community users grows, so does the volume of data generated by user sharing and interaction. Social network analysis tasks such as community discovery and user identification therefore face large data volumes and high network complexity, which seriously restricts the development of large-scale social network analysis. At the same time, the convenience of social networks has greatly shortened the time needed to generate and disseminate information, which makes it much harder to monitor and guide the dissemination of community information. How to identify key users in a community, and thereby track the flow of knowledge within it, has thus become an urgent problem.

This thesis addresses the difficulty that traditional methods have in handling large-scale network data. It combines Python and Spark, uses the GraphX framework to improve and parallelize traditional algorithms, designs a model framework, and exploits the advantages of a cluster to process massive amounts of data. The main work is as follows.

First, a community discovery algorithm based on node importance is proposed. To begin with, considering the propagation ability of different nodes and their degree of influence on target nodes, the PageRank algorithm is used to compute the importance of each node. Next, because some nodes in a node's neighbor set are only loosely linked or interact little with one another, a node affinity measure based on the number of common neighbors is proposed, and this affinity is used to filter the neighbors of the target node. Finally, the label selection strategy is improved by using node importance, which avoids the label oscillation ("shock") phenomenon.

Second, a key user identification model and method are constructed. Taking CSDN community users as the research object, the thesis analyzes user interaction behavior and the features of published text. Building on the k-core decomposition method, it accounts for differences in the influence contributed by neighboring user nodes, defines the potential influence of a network edge and an edge influence factor, considers the user's dual identity as knowledge contributor and knowledge disseminator, and constructs a key user mining model and ranking method. In addition, the related algorithms are parallelized with Python and Spark big data processing technology to improve computational efficiency.

Finally, the research results are summarized and directions for future research are proposed.
Keywords/Search Tags:knowledge community, big data, community discovery, key user mining
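
The three steps of the community discovery algorithm described in the abstract (PageRank importance, common-neighbor filtering, importance-guided label selection) can be illustrated with a small single-machine sketch. This is not the thesis's Spark/GraphX implementation, whose details the abstract does not give. The function names, the `min_common` threshold, the update order (most important nodes first), and the tie-breaking rule are all assumptions made for illustration.

```python
from collections import defaultdict


def pagerank(adj, damping=0.85, iters=50):
    """Power-iteration PageRank on an undirected graph {node: set(neighbors)}.

    Simplification: rank held by isolated nodes is not redistributed.
    """
    n = len(adj)
    rank = {u: 1.0 / n for u in adj}
    for _ in range(iters):
        new = {u: (1.0 - damping) / n for u in adj}
        for u, nbrs in adj.items():
            if not nbrs:
                continue
            share = damping * rank[u] / len(nbrs)
            for v in nbrs:
                new[v] += share
        rank = new
    return rank


def common_neighbors(adj, u, v):
    """Affinity of u and v, taken here as their number of shared neighbors."""
    return len(adj[u] & adj[v])


def detect_communities(adj, min_common=1, max_rounds=20):
    """Label propagation weighted by node importance (illustrative only).

    min_common is a hypothetical threshold: a neighbor is kept only if it
    shares at least that many neighbors with the target node.
    """
    rank = pagerank(adj)
    # Step 2: drop loosely connected neighbors.
    close = {
        u: {v for v in adj[u] if common_neighbors(adj, u, v) >= min_common}
        for u in adj
    }
    labels = {u: u for u in adj}  # every node starts in its own community
    # Assumption: update important nodes first, one at a time, so that
    # labels settle instead of flipping back and forth.
    order = sorted(adj, key=lambda u: (-rank[u], u))
    for _ in range(max_rounds):
        changed = False
        for u in order:
            if not close[u]:
                continue
            # Step 3: each neighbor votes for its label with its importance.
            score = defaultdict(float)
            for v in close[u]:
                score[labels[v]] += rank[v]
            # Highest total importance wins; ties go to the smallest label.
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[u]:
                labels[u] = best
                changed = True
        if not changed:
            break
    return labels
```

On a toy graph made of two 4-node cliques joined by a single bridge edge, the bridge endpoints share no neighbors, so the filter removes that edge and each clique settles on one label. A distributed version would need a different update schedule, since this sketch relies on sequential, in-place label updates.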