Research And Application Of Community Discovery And Key Figure Mining Algorithm Based On Spark

Posted on: 2019-09-12
Degree: Master
Type: Thesis
Country: China
Candidate: L L Xue
GTID: 2428330545990142
Subject: Computer technology
Abstract/Summary:
With the continued development of the Internet and the steady improvement of the big data policy environment and technology, social platforms such as Weibo, WeChat, and Zhihu have grown rapidly and formed a huge social network. Communicating with friends, sharing knowledge and insights, and acquiring knowledge through social platforms have become part of people's daily lives, and social data has grown explosively. The Internet has entered an era concerned with how to carry out big data storage, processing and analysis, parallel computing, and value mining and application.

In a social network, if each person is regarded as a node in a graph and each link relationship between people as an edge, the network contains a large number of subgraphs. These subgraphs are the community structure. People within the same community have similar attributes, such as common interests and hobbies, and are closely connected. Through in-depth study of the community structure in social networks, we can mine valuable information hidden in the communities and make corresponding predictions. Influential hotspots in a social network are called key figures. Real-time mining of key figures can reveal current social hotspots, concurrent portal traffic, and other information.

The era of big data has brought challenges to the mining of community structure and key figures. The rapid development of the Internet, applications, and communication technologies has generated large volumes of unstructured data from many sources, and such data is difficult to store and to process in real time. Traditional community discovery and key figure mining algorithms that run on a single machine can no longer meet the requirements of the big data era. Against this background, this thesis studies the Spark large-scale parallel data processing framework, the Spark parallelization of the PageRank algorithm, the Spark parallelization of the Louvain algorithm, the PageRank and Louvain algorithms in a stand-alone environment, and the visualization of the results. The main research contents and contributions are as follows:

(1) Research on the user influence and edge weight model in social networks. This thesis proposes a model for computing user influence and edge weights based on user characteristics and link relationships. The model complements key figure mining algorithms and lays a foundation for Louvain community discovery on weighted graphs.

(2) Research on the Spark big data parallel computing framework. This part covers three areas: 1) building the big data computing environment and tuning the configuration parameters of Hadoop, Spark, and YARN; 2) loading and preprocessing the data to be studied; 3) an in-depth study of MapReduce, the principles of Spark RDDs, the principles and core operators of GraphX, and Spark-based parallel programming in Scala and Python, laying the foundation for the subsequent parallelization of the algorithms.

(3) Research on the parallelization of the PageRank algorithm. This part covers two areas: 1) proposing the idea of the "co-chain edge" and implementing the PageRank algorithm and its optimization in a stand-alone environment; 2) implementing the parallelization of the PageRank algorithm on the Spark platform.

(4) Research on the parallelization of the Louvain algorithm. This part covers three areas: 1) implementing the Louvain algorithm and its optimization in a stand-alone environment; 2) implementing the parallelization of the Louvain algorithm on Spark; 3) studying the influence of weighted graphs on Louvain community discovery. When using the Louvain algorithm for community discovery, most researchers assign every edge a weight of 1, without considering how the actual edge weights and edge directions affect the results. To address this, the thesis applies the edge weight model described above.

(5) Visualization and analysis of the community discovery and key figure mining results. The mining results are visualized and analyzed mainly with tools such as Gephi and D3.
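
To make the two quantities at the core of this work concrete, the following is a minimal single-machine sketch in Python. It is not the thesis's implementation and does not use Spark. It shows a weighted PageRank by power iteration (the score used to rank candidate key figures) and the weighted modularity Q that the Louvain algorithm tries to maximize. The function names, the `(source, target, weight)` edge format, the damping factor of 0.85, and the iteration count are all assumptions chosen for illustration. The modularity function treats the edges as undirected and assumes there are no self-loops.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iterations=50):
    """Weighted PageRank by power iteration over (src, dst, weight) edges.

    Illustrative sketch only: damping and iterations are assumed defaults.
    """
    nodes = sorted({n for s, d, _ in edges for n in (s, d)})
    n_nodes = len(nodes)
    out_weight = defaultdict(float)  # total outgoing weight per node
    for s, _, w in edges:
        out_weight[s] += w
    rank = {n: 1.0 / n_nodes for n in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-links is spread evenly over all nodes.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0)
        base = (1 - damping) / n_nodes + damping * dangling / n_nodes
        new_rank = {n: base for n in nodes}
        # Each node passes its rank along out-edges in proportion to weight.
        for s, d, w in edges:
            new_rank[d] += damping * rank[s] * w / out_weight[s]
        rank = new_rank
    return rank


def modularity(edges, community):
    """Weighted modularity Q of an undirected graph.

    edges: (u, v, weight) triples, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    total = sum(w for _, _, w in edges)  # m, the total edge weight
    internal = defaultdict(float)  # edge weight inside each community
    degree = defaultdict(float)  # summed weighted degree per community
    for u, v, w in edges:
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(
        internal[c] / total - (degree[c] / (2 * total)) ** 2 for c in degree
    )
```

Calling `modularity` on the same partition with unit weights and then with non-uniform weights is a simple way to see how the choice of edge weights changes the score that Louvain optimizes.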
Keywords/Search Tags: Social Networking, Community Discovery, Louvain, Key Figure, PageRank, Spark