Font Size: a A A

Research On Parallel Algorithm For Query On Complex User Relationship And Mining Frequent Communication User Relationship In Communication Nextwork

Posted on:2018-08-06Degree:MasterType:Thesis
Country:ChinaCandidate:P Y ZhuFull Text:PDF
GTID:2348330518491126Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
With the rapid development of mobile communication technology and Internet,mobile communication equipment has become a portable tool for most people. These devices communicate with each other to form the communication network. The communication network reveals interpersonal relationship among the users and has important research value. In contrast with traditional data types, complexity of the communication network is reflected by: Firstly, huge amount of communication network data - data records are often in the order of millions or even tens of millions; Secondly,the network itself is complex, in this case, the size of relationships between users in the network is the number of the users squared. Query and mining is an important part of communication network research, which can be widely used in commodity (service)recommendation, user behavior analysis and social structure analysis. In this paper, we use parallel computing framework Spark to study query on communication relationship and mining frequent communication relationship. The main innovative contributions of the thesis are summarized as follows:1. A parallel isomorphic sub-graph matching algorithm PISM is proposed. The algorithm uses the Spark programming model to split the graph into non-overlapping adjacency lists. All homogeneous matching and connection operations are done on the basis of adjacency list, so that homogeneous sub-graph matching tasks can be efficiently distributed, suitable for analysis of large-scale graph data research. The correctness and validity of the parallel isomorphic sub-graph matching algorithm are verified by theorems and experiments.2. A parallel algorithm for complex queries on massive communication data is proposed. The algorithm uses Spark to convert the communication data and query requirements into communication graph and query graph with variable constraints. The algorithm is divided into three stages: preprocessing of communication graph, matching of communication graph and iterative connection. The large-scale communication graph is divided into a number of non-overlapping sub-graphs and then evenly distributed to the various computing nodes. Since the size of the query map is much smaller, query map can be distributed in the various computing nodes, after which calculation tasks can be completed in various computing nodes.3. A parallel algorithm PMFCS is proposed for mining frequent communication sub-graph of mass communication data. The algorithm is based on the Apriori algorithm and sub-graph connect principle. It uses Spark to distribute all the edges to each computing node; 1th-order frequent candidate sub-graphs are counted at each node, then the 1th-order frequent sub-graphs are distributed to each node. PMFCS iteratively connects the (k-1)th-order sub-graph and the 1th-order sub-graph to generate kth-order candidate sub-graphs. Subsequently, the algorithm terminates until the kth-order frequent sub-graph set is empty. The experimental results show that PMFCS can mine the frequent communication sub-graph efficiently and quickly.
Keywords/Search Tags:communication network, query on user communication relationship, frequent communication relationship mining, frequent sub-graph, isomorphic sub-graph, parallel algorithm
PDF Full Text Request
Related items