Design And Implementation Of The Bipartite-network Community Discovery System In Long Time Scale

Posted on: 2021-06-03
Degree: Master
Type: Thesis
Country: China
Candidate: Y Liang
GTID: 2518306308467194
Subject: Computer technology
Abstract/Summary:
Bipartite networks are common in everyday life. Examples include user-domain networks and the source IP-domain networks found in cloud services. Bipartite networks contain community structures, and analyzing a network at the level of communities is more efficient than node-based analysis. Discovering community structure can also reveal potential threats in anonymous networks, so it is of great significance for the management and analysis of bipartite networks. With the advent of big data, however, traditional community discovery methods do not scale to massive log data collected over long time scales, which makes community discovery on long-term bipartite network data considerably harder. This thesis therefore takes bipartite networks in the big-data setting as its research object and designs and implements a bipartite-network community discovery system for long time scales. The system receives and processes high-speed, massive logs in real time, aggregates communication relations from high-speed, massive user online logs to support relational queries, and then performs community discovery on the weighted bipartite network built from the aggregated data. The specific work is as follows.

First, the system builds a Logstash cluster based on the ELK framework to receive and preprocess high-speed, massive real-time user online logs. Data streams are identified using only the three most common log fields: source IP, destination domain name, and timestamp. Kafka serves as the distributed message queue, MongoDB as the underlying database, and Spark Streaming as the distributed computing engine that consumes data from Kafka. Data are aggregated several times: in memory, before storage, and after storage. This realizes the aggregation of communication relations under high-speed real-time log input.

Next, the system performs bipartite-network community discovery on the offline aggregated user logs, using HDFS as the underlying storage and MapReduce as the distributed computing framework. It carries out single-mode projection of the aggregated data, parallel community discovery, and community calibration of domain-name nodes, and thereby discovers stable community structures in the bipartite network over a long time scale.

Finally, to support fast queries of aggregated relations, the system uses Elasticsearch from the ELK framework to index the data in MongoDB, and provides fast queries of the aggregated relations of individual elements as well as a visual display of community discovery results.

Tests show that the system can aggregate data at large scale, which greatly reduces the actual number of queries and speeds up queries of aggregated relations, so that aggregated relational data can be queried within seconds. The system also reduces the number of concurrent insertions into the database, which increases the reliability of data transmission and improves quality of service. Community discovery based on offline data can also uncover the community structure of the bipartite network more effectively.
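The pipeline described above (aggregating log triples into a weighted bipartite graph, single-mode projection, community discovery, and domain-name calibration) can be illustrated with a minimal single-machine Python sketch. It is not the system's Spark Streaming or MapReduce implementation, and several choices are assumptions because the abstract does not name them: the one-day aggregation window, the min-weight projection onto source IPs, and weighted label propagation as the community algorithm. The function names `aggregate`, `project`, `label_propagation`, and `calibrate_domains` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

DAY = 86400  # hypothetical aggregation window: one day, in seconds


def aggregate(logs, window=DAY):
    """Collapse (src_ip, domain, timestamp) records into weighted edges.

    Assumption: a pair counts at most once per window, so the edge weight is
    the number of windows in which the source IP contacted the domain.
    """
    seen = set()
    weights = defaultdict(int)
    for ip, domain, ts in logs:
        key = (ip, domain, int(ts // window))
        if key not in seen:
            seen.add(key)
            weights[(ip, domain)] += 1
    return dict(weights)


def project(edges):
    """Single-mode projection onto source IPs (weighting is an assumption):
    two IPs are linked if they share a domain, and the link weight sums
    min(w1, w2) over all shared domains."""
    by_domain = defaultdict(dict)
    for (ip, domain), w in edges.items():
        by_domain[domain][ip] = w
    proj = defaultdict(float)
    for ips in by_domain.values():
        for a, b in combinations(sorted(ips), 2):
            proj[(a, b)] += min(ips[a], ips[b])
    return dict(proj)


def label_propagation(proj, max_iter=20):
    """Weighted label propagation (a stand-in community algorithm), made
    deterministic by visiting nodes in sorted order and breaking ties toward
    the smallest label."""
    adj = defaultdict(dict)
    for (a, b), w in proj.items():
        adj[a][b] = w
        adj[b][a] = w
    labels = {n: n for n in adj}
    for _ in range(max_iter):
        changed = False
        for node in sorted(adj):
            score = defaultdict(float)
            for nbr, w in adj[node].items():
                score[labels[nbr]] += w
            best = min(score, key=lambda lab: (-score[lab], lab))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break
    return labels


def calibrate_domains(edges, labels):
    """Assign each domain to the IP community that sends it the most weight."""
    score = defaultdict(lambda: defaultdict(float))
    for (ip, domain), w in edges.items():
        if ip in labels:
            score[domain][labels[ip]] += w
    return {d: min(s, key=lambda lab: (-s[lab], lab)) for d, s in score.items()}


# Hypothetical toy log as (source IP, domain, day index). The repeated
# 10.0.0.1/a.com record on day 0 is collapsed by aggregate(), and 10.0.0.3
# has one weak link into the x.com/y.com group.
raw = [
    ("10.0.0.1", "a.com", 0), ("10.0.0.1", "a.com", 0), ("10.0.0.1", "a.com", 1),
    ("10.0.0.1", "a.com", 2), ("10.0.0.2", "a.com", 0), ("10.0.0.2", "a.com", 1),
    ("10.0.0.3", "a.com", 0), ("10.0.0.1", "b.com", 0), ("10.0.0.2", "b.com", 1),
    ("10.0.1.1", "x.com", 0), ("10.0.1.1", "x.com", 1), ("10.0.1.1", "x.com", 2),
    ("10.0.1.2", "x.com", 0), ("10.0.1.2", "x.com", 1), ("10.0.1.1", "y.com", 0),
    ("10.0.1.2", "y.com", 0), ("10.0.0.3", "x.com", 5),
]
logs = [(ip, dom, day * DAY + 60) for ip, dom, day in raw]

edges = aggregate(logs)
labels = label_propagation(project(edges))
domain_comm = calibrate_domains(edges, labels)
print(labels)
print(domain_comm)
```

On this toy log, 10.0.0.1, 10.0.0.2 and 10.0.0.3 receive one label and 10.0.1.1 and 10.0.1.2 another; a.com and b.com are calibrated to the first community and x.com and y.com to the second, even though 10.0.0.3 also contacted x.com once. The projection is keyed by domain because each domain's IP list can be expanded into pairs independently, which is what makes this step straightforward to parallelize.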
Keywords/Search Tags: ELK, Kafka, Spark Streaming, Hadoop, bipartite network community discovery