Research On Crucial Technologies Of Email Communication Network Link Prediction

Posted on: 2014-05-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y Tian
Full Text: PDF
GTID: 2268330401476780
Subject: Computer application technology
Abstract/Summary:
Link mining is a branch of data mining that includes the study of models and methods for link prediction. With the rapid spread of the Internet, email has become an important means of communication, and email communication records provide data for analyzing social network relations. As a result, link prediction in email communication networks remains an active topic in data mining research. Based on the characteristics of email communication networks, this thesis gathers information useful for link prediction from several angles, addresses limitations of existing link prediction methods, and proposes three methods for evolutionary and stable link prediction in email communication networks. Experiments on the Enron email dataset show that the improved methods achieve higher prediction accuracy than the existing link prediction algorithms they are compared with.

The thesis is organized into the following parts:

1. Existing proximity indicators are not suited to evolutionary link prediction in email communication networks. Because an email communication network has an issue-group structure, user nodes are grouped into clusters by email topic. After analyzing what causes evolutionary links to form within groups and between groups, the thesis extends the traditional algorithms and proposes the improved proximity indicators SIGRA and SIGRA for predicting evolutionary links in email communication networks. Experiments show that the improved algorithms predict more accurately than CN and RA.

2. A new strategy is proposed that uses a Bayesian classification framework to predict evolutionary links in email communication networks. First, the improved proximity indicators are used as classification attributes and their likelihood functions are optimized. Then, based on a new definition of node type and the communication probability distribution between nodes of different types, a new classification attribute, IIs, is proposed. To overcome the limitations of the naive Bayesian classifier, node pairs are classified and the improved classification model reuses all attributes to perform link prediction. Compared with the classification model before optimization, the improved model greatly shortens training time and saves memory. Experiments show that the improved classification-based method has a clear advantage in prediction accuracy over the proximity-based prediction algorithms.

3. To capture the overlapping and hierarchical character of the issue-group structure in email communication networks, a new edge evolution model framework, HOSBM, is defined, and an expression for its likelihood estimation function is given. Using a Markov chain Monte Carlo algorithm together with an existing approach to predicting pseudo links, the thesis applies the model to stable link prediction and proposes a stable link prediction method for email communication networks based on link reliability. Experimental results show that the precision and prediction accuracy of the proposed algorithm are significantly higher than those of CN, an existing proximity-based link prediction method.
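For readers unfamiliar with the baselines named above, the following is a minimal sketch of the two standard proximity indices, CN (common neighbors) and RA (resource allocation), in their usual textbook form. It does not reproduce the thesis's improved indicators, which the abstract does not define. The function names and the toy email graph are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Build an undirected adjacency map from (sender, recipient) pairs."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-addressed messages
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbors(adj, x, y):
    """CN score: number of contacts shared by x and y."""
    return len(adj.get(x, set()) & adj.get(y, set()))


def resource_allocation(adj, x, y):
    """RA score: sum of 1/degree(z) over every shared contact z."""
    shared = adj.get(x, set()) & adj.get(y, set())
    return sum(1.0 / len(adj[z]) for z in shared)


# Hypothetical toy network; each pair means the two users exchanged email.
edges = [
    ("alice", "bob"),
    ("alice", "carol"),
    ("bob", "dave"),
    ("carol", "dave"),
    ("carol", "erin"),
]
adj = build_adjacency(edges)

# alice and dave are not yet linked; a higher score suggests a likelier future link.
cn_score = common_neighbors(adj, "alice", "dave")
ra_score = resource_allocation(adj, "alice", "dave")
print(cn_score, round(ra_score, 4))
```

In this sketch RA discounts shared contacts that have many contacts of their own, so a busy hub contributes less evidence than a low-degree intermediary, while CN counts every shared contact equally.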
Keywords/Search Tags: link prediction, proximity, Bayesian classification, email issues group, maximum likelihood estimation, network evolution model