
The Study on E-mail Communication Entity Relationship Mining and Analysis

Posted on: 2015-05-27
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z F Wu
GTID: 1108330473452066
Subject: Information security
Abstract/Summary:
As network data expands rapidly, the need to mine the entity relationships within it grows, and research on analyzing e-mail social networks has become increasingly active. E-mail is one of the most widely used communication networks. It is clearly social in character, it is used by an enormous number of people, and real-world relationships lie hidden behind its data. Partitioning the social-network structure of e-mail data and discovering likely future links have therefore made social network analysis an important part of mining entity relationships from network data. Such analysis has many commercial applications, such as e-commerce and social recommendation, and can also be widely applied to counter-terrorism, criminal investigation, and similar areas. Community division and link prediction are both active research topics.
When e-mail communication entity relationships are mined from large volumes of data, the efficiency and precision of community division and the recall and precision of link prediction become obstacles to practical application. Building on existing social network analysis algorithms, this dissertation focuses on the precision and efficiency of community structure detection and on the recall and precision of link prediction, in the context of mining communication entity relationships in e-mail networks. The main contributions are as follows:
(1) A new measurement model for community structure detection. When the Modularity method is used to partition communities, the partition is unstable. To address this, the proposed model draws on the idea of information centers: it weights both the relevance degree between nodes and the degree of each node. In this way the model not only identifies cluster centers accurately but also provides a basis for computing the similarity between nodes. In addition, the dissertation proposes a new BSM algorithm. Simulation experiments and experiments on real network data show that the proposed algorithm is more stable and more accurate than the Modularity method, and the results also confirm the effectiveness of the measurement model.
(2) A fast algorithm model for large and complex community structures. This work proceeds in two steps. First, because the Louvain fast algorithm is inefficient in its first iteration, the dissertation applies pruning to obtain an improved algorithm, FLA. Second, because the Louvain algorithm is based on modularity optimization, it tends to converge to a local optimum. To address this weakness, the dissertation builds a new CDDW algorithm on top of FLA by improving the form of the optimization function and incorporating weight information such as node degree and edge weight. Simulation experiments and experiments on real network data show that the proposed model greatly reduces computational cost and also improves the precision of the overall community partition.
(3) A new link prediction algorithm. To address low recall, the dissertation proposes a novel ensemble learning algorithm that treats link prediction as a binary classification problem. Using the error-feedback mechanism of the Boosting framework, it develops a new link prediction model, AdaPred. To further improve precision and recall, the dissertation also proposes a new link prediction algorithm and integrates it into the AdaPred model. Experiments on real data, including a paper collaboration network and e-mail networks, show that AdaPred achieves higher prediction accuracy and recall than the other algorithms compared.
(4) A visualization prototype for analyzing communication entity relationships in e-mail networks. Visualization technology carries social network analysis research into practical application. Taking entity-relationship mining in e-mail networks as its starting point, the dissertation develops a visualization and analysis prototype oriented toward practical use. Its data-analysis capability reaches the international advanced level, and it is general and extensible. The prototype has been evaluated by a third party and accepted under the national 863 Program, with an acceptance rating of excellent.
Overall, applying social network analysis in practice raises a number of challenges. This dissertation studies these challenges and builds a visualization and analysis prototype on the basis of that research. The results provide an efficient and feasible path toward wider adoption of social network analysis technology. Because the analysis techniques adopted here rely on network topology rather than on context information, they are highly extensible and can be widely applied to practical social-network data analysis scenarios.
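Contribution (1) is measured against the Modularity method. For reference, Newman's modularity of a partition of an undirected graph with total edge weight m is Q = (1/2m) * sum_ij [A_ij - k_i*k_j/(2m)] * delta(c_i, c_j), which regroups per community as Q = sum_c [L_c/m - (d_c/2m)^2], where L_c is the weight of the edges inside community c and d_c is the sum of the degrees of its nodes. The Python sketch below computes that per-community form. It is only the baseline score the abstract compares against, not the BSM algorithm or the proposed measurement model, whose details are not given here; the function name and edge-list input format are illustrative assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, weighted graph.

    edges: iterable of (u, v, weight) tuples, each undirected edge listed once,
           no self-loops (illustrative input format).
    community: dict mapping every node to its community label.
    Uses the per-community form Q = sum_c [L_c / m - (d_c / (2m))**2].
    """
    m = 0.0
    inside = defaultdict(float)  # L_c: total weight of edges with both ends in c
    degree = defaultdict(float)  # d_c: total weighted degree of the nodes in c
    for u, v, w in edges:
        m += w
        cu, cv = community[u], community[v]
        degree[cu] += w
        degree[cv] += w
        if cu == cv:
            inside[cu] += w
    if m == 0:
        return 0.0
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Two triangles joined by one bridge edge (2, 3).
edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1), (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 4))  # 0.3571 (= 5/14)
```

Splitting at the bridge gives Q = 2 * (3/7 - 1/4) = 5/14, while placing every node in a single community gives Q = 0, which is why modularity maximization favours the split.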
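Contribution (2) starts from the Louvain fast algorithm, whose first phase repeatedly moves each node into the neighbouring community that most increases modularity. Moving an isolated node i with weighted degree k_i into community C changes modularity by dQ = k_iC/m - Sigma_tot(C)*k_i/(2m^2), where k_iC is the weight of i's edges into C and Sigma_tot(C) is the total degree of C. The sketch below implements that local-moving phase only; Louvain's community-aggregation phase is omitted. The abstract does not say how FLA prunes the first iteration, so the queue used here (after a node moves, only its neighbours outside its new community are revisited) is an assumed, generic pruning rule and not the FLA rule; the function name is hypothetical.

```python
from collections import defaultdict, deque


def louvain_local_moving(adj):
    """Local-moving phase of Louvain with an assumed queue-based pruning rule.

    adj: dict node -> dict neighbour -> weight; undirected (symmetric), no self-loops.
    Returns a dict node -> community label (labels are node ids).
    """
    m = sum(w for nbrs in adj.values() for w in nbrs.values()) / 2.0
    community = {v: v for v in adj}              # every node starts alone
    if m == 0:
        return community
    k = {v: sum(adj[v].values()) for v in adj}   # weighted degree k_i
    tot = dict(k)                                # Sigma_tot of each community
    queue, queued = deque(adj), set(adj)
    while queue:
        i = queue.popleft()
        queued.discard(i)
        old = community[i]
        links = defaultdict(float)               # k_iC: weight from i into community C
        for j, w in adj[i].items():
            links[community[j]] += w
        tot[old] -= k[i]                         # take i out of its community

        def gain(c):
            # Modularity gain of inserting the isolated node i into community c.
            return links[c] / m - tot[c] * k[i] / (2 * m * m)

        best, best_gain = old, gain(old)
        for c in list(links):
            g = gain(c)
            if g > best_gain:                    # strict: ties keep the earlier choice
                best, best_gain = c, g
        tot[best] += k[i]
        community[i] = best
        if best != old:
            # Assumed pruning rule: re-queue only neighbours outside i's new
            # community instead of rescanning every node in the graph.
            for j in adj[i]:
                if community[j] != best and j not in queued:
                    queue.append(j)
                    queued.add(j)
    return community


adj = {0: {1: 1, 2: 1}, 1: {0: 1, 2: 1}, 2: {0: 1, 1: 1, 3: 1},
       3: {2: 1, 4: 1, 5: 1}, 4: {3: 1, 5: 1}, 5: {3: 1, 4: 1}}
print(louvain_local_moving(adj))  # {0: 1, 1: 1, 2: 1, 3: 5, 4: 5, 5: 5}
```

Because every accepted move strictly increases modularity, the queue eventually empties. On the two-triangle example the phase recovers the two triangles as communities.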
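Contribution (3) treats link prediction as binary classification. Each candidate node pair is an example, labelled +1 if a link appears and -1 otherwise, and Boosting's error feedback raises the weight of the pairs that the previous weak learner misclassified. The sketch below shows that general recipe with standard discrete AdaBoost; it is not AdaPred, whose features and base algorithm the abstract does not specify. The four topology-only features (common neighbours, Jaccard, Adamic-Adar, preferential attachment), the decision-stump weak learner, and all function names are illustrative assumptions, chosen because the abstract stresses topology rather than context information.

```python
import math


def pair_features(adj, u, v):
    """Topology-only features for a candidate pair of distinct nodes (u, v).

    adj: dict mapping node -> set of neighbours (undirected graph).
    Returns [common neighbours, Jaccard, Adamic-Adar, preferential attachment].
    """
    nu, nv = adj.get(u, set()), adj.get(v, set())
    common, union = nu & nv, nu | nv
    jaccard = len(common) / len(union) if union else 0.0
    # A shared neighbour of two distinct nodes has degree >= 2, so log(degree) > 0.
    adamic_adar = sum(1.0 / math.log(len(adj[z])) for z in common)
    return [float(len(common)), jaccard, adamic_adar, float(len(nu) * len(nv))]


def train_stump(X, y, w):
    """Best one-feature threshold classifier under sample weights w.

    Returns (weighted error, feature index, threshold, polarity); the stump
    predicts `polarity` when x[feature] >= threshold and -polarity otherwise.
    """
    best = (float("inf"), 0, 0.0, 1)
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            for p in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if (p if xi[f] >= t else -p) != yi)
                if err < best[0]:
                    best = (err, f, t, p)
    return best


def stump_predict(stump, x):
    _, f, t, p = stump
    return p if x[f] >= t else -p


def adaboost(X, y, rounds=10):
    """Discrete AdaBoost; y[i] is +1 (link appears) or -1 (no link)."""
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        if stump[0] >= 0.5:                # no better than chance: stop boosting
            break
        err = max(stump[0], 1e-10)         # guard against log(1/0) for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, stump))
        if stump[0] == 0:                  # perfect split: later rounds would repeat it
            break
        # Error feedback: misclassified pairs gain weight, correct ones lose it.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, xi))
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def predict(model, x):
    """Sign of the alpha-weighted vote of all stumps (+1 = predicted link)."""
    score = sum(alpha * stump_predict(s, x) for alpha, s in model)
    return 1 if score >= 0 else -1


X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [-1, -1, -1, 1, 1, 1]
model = adaboost(X, y, rounds=5)
print([predict(model, x) for x in X])  # [-1, -1, -1, 1, 1, 1]
```

In a link-prediction setting one would compute `pair_features` on an earlier snapshot of the e-mail graph, label each pair by whether the two addresses exchanged mail in a later period, train with `adaboost`, and report precision and recall of `predict` on held-out pairs.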
Keywords/Search Tags: community division, entity-relationship characterization, link prediction, statistical machine learning, data-mining visualization