Research On Social Relationship Of Massive Mail Based On MapReduce Model

Posted on: 2016-06-10  Degree: Master  Type: Thesis
Country: China  Candidate: J N Dai  Full Text: PDF
GTID: 2208330461982896  Subject: Software engineering
Abstract/Summary:
Since the turn of the century, with the rapid development of the Internet, e-mail has become an essential means of everyday communication. As a social network, e-mail reflects people’s social relationships to a certain extent. Studying the topology of the e-mail network can provide a theoretical basis for understanding how information or viruses spread and how people communicate. However, processing the massive volume of e-mail data produced in an era of explosive network growth quickly and effectively, and mining social relationships from it, such as finding the chains that connect users and discovering circles of friends in the network, remains a difficult challenge. The main contents of this thesis are as follows:
(1) Given the complexity of the e-mail network, and building on a thorough study of complex network theory, the thesis starts from real e-mail networks, constructs a directed weighted topology model of the e-mail network, and analyzes its topological properties.
(2) Drawing on the principles of search strategies in complex networks and focusing on the reliability of the search path, an e-mail network search strategy based on the weights of nodes and edges (WNE) is proposed. It aims to find a highly reliable path while taking both search speed and search cost into account.
(3) An improved community partitioning algorithm based on the edge clustering coefficient (ICPECC) is proposed. Taking the social network features of the e-mail network into account, the algorithm first uses the Canopy algorithm to produce an initial rough division of the network, splitting it into a number of more tightly connected subsets.
On this basis, Radicchi’s community partitioning algorithm, which is well suited to social network analysis, is applied to the weakly labeled sets of nodes in the Canopy collections, which further reduces the amount of computation and improves the efficiency of the algorithm. The resulting community partition is better suited to analyzing the "circle" feature of social networks, and the efficiency of the algorithm is improved to some extent. The improved algorithm flow also fits the MapReduce parallel computing model well and has high practical value for processing massive data.
(4) Building on the three aspects above, experiments are carried out on an e-mail corpus provided by a partner organization to analyze the effect of the model and the algorithms and to verify their validity and effectiveness.
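To make two of the ideas in the abstract more concrete, the following is a minimal Python sketch, not the thesis's implementation. It assumes that edge weights are simply message counts per (sender, recipient) pair, simulates the map and reduce steps in a single process rather than on a MapReduce cluster, and then computes the triangle-based edge clustering coefficient used in Radicchi's method, (z + 1) / min(k_u - 1, k_v - 1), on the undirected projection of the graph. The function names and the toy e-mail records are hypothetical, and the Canopy pre-partitioning and the WNE search strategy are not shown.

```python
from collections import defaultdict


def map_email(record):
    """Map step (assumed form): emit ((sender, recipient), 1) per recipient."""
    sender, recipients = record
    for recipient in recipients:
        if recipient != sender:
            yield (sender, recipient), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts per key to obtain directed edge weights."""
    weights = defaultdict(int)
    for key, count in pairs:
        weights[key] += count
    return dict(weights)


def undirected_neighbors(weights):
    """Project the directed edges onto undirected neighbor sets."""
    nbrs = defaultdict(set)
    for u, v in weights:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return nbrs


def edge_clustering_coefficient(nbrs, u, v):
    """Triangle-based coefficient: (z + 1) / min(k_u - 1, k_v - 1)."""
    z = len(nbrs[u] & nbrs[v])  # triangles that contain the edge (u, v)
    denom = min(len(nbrs[u]) - 1, len(nbrs[v]) - 1)
    return float("inf") if denom == 0 else (z + 1) / denom


# Toy data (invented): two triangles, a-b-c and d-e-f, joined by the edge c-d.
emails = [
    ("a", ["b", "c"]),
    ("b", ["c"]),
    ("d", ["e", "f"]),
    ("e", ["f"]),
    ("c", ["d"]),
    ("a", ["b"]),
]

pairs = [kv for record in emails for kv in map_email(record)]
weights = reduce_counts(pairs)
nbrs = undirected_neighbors(weights)
scores = {edge: edge_clustering_coefficient(nbrs, *edge) for edge in weights}
weakest = min(scores, key=scores.get)
print(weakest, scores[weakest])
```

In this toy graph the edge between the two triangles receives the lowest coefficient, which is the kind of edge a Radicchi-style procedure removes first when separating communities.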
Keywords/Search Tags:e-mail network, complex network, node weight, edge weight, search strategy, community partitioning, Radicchi’s algorithm, Canopy algorithm, MapReduce