Font Size: a A A

Research On Learning The Structure Of Information Diffusion Network

Posted on:2013-04-16Degree:MasterType:Thesis
Country:ChinaCandidate:J LiFull Text:PDF
GTID:2248330371982780Subject:Computer application technology
Abstract/Summary:PDF Full Text Request
The information stored in the Internet would spread in the network over time, which canform a network structure of information diffusion. This network displayed the path ofinformation diffusion, depicted information diffusion rule. With the Research on this network,wen can forecast the future propagation direction of information, and mark the key nodes inthe netwok. However, the structure of the Internet follows the scale-free model, and can notbe accurately given. At the same time, the node which receives information usually does’tindicate the source of information. For these reasons, observation tracking method can hardlyobtain information diffusion network structure accurately.The study of information diffusion network structure has become a hot topic, researchershave proposed some creative ideas on this problem, and also obtained many valuable results.While, there are still some problems and deficiencies. For example, a lot of papers did notconsider the selective diffusion of information in network, most of these methods can notachieve an ideal prediction on the real network.First, in this article, we give a comparative analysis of two classical propagation modelsin the area of information diffusion, which is the Cascade model and Threshold model. TheCascade model is widely used in the medical field, in view of the similarity betweeninformation diffusion in the network and disease spread in the crowd, this model wasintroduced to the computer field. Threshold model was originated in the field of sociology.Information diffusion was partly in accordance with the modeling principle of Cascade andThreshold model. However, Information diffusion was passive behavior to the informationitself, and was not related with interests of disseminator. So there are certain differencesbetween the information diffusion process and the process dipicted by two classical models.Therefore, our research takes the reasonable factor of the two classical models into account,and put forward the new mixed-propagation model.In this new model,"interestingness measure" of one node on information was introducedbased on the essence of information diffusion problem. While information diffuses in network,there is an "acceptance threshold" and "interest threshold" on one node for information. Wheninterestingness measure is greater than or equal to the interest threshold, the process of information difusion is similar to the one discribed in the Cascade model; Wheninterestingness measure is between the acceptable threshold and interest threshold, there is aninfulence weight from one node to its neighbors, the weight dipict the fact node will affect theinterestingness measure of its neighbor on one piece of information, and it will change overtime; when the interestingness measure of node is lower than the acceptance threshold, theinformation would never spread to this node. By setting the interest threshold and acceptancethreshold, the mixed-propagation model can be transformed into Cacscade model orThreshold model easily.Next, we analyzed the existing learning methods and proposed new algorithm PCpinfbased on optimal approximate reasoning. The principle of optimal approximate reasoningmethod is to find out the highest confidence edge gradually, it can get higher solutionprecision. But the algorithm faces the problem of selecting false edge and more than one edgeto be choosed. Some improvements in PCpinf as follow:1. We Analysed the evidence of inferring one edge to be real with some probability,introduced so-called " Statistical appear ratio" and "mutual information" based on the idea ofstatistics, setted parameter threshold to do pretreatment and classification for potential edgesbefore the start of the algorithm, part of the false edges are deleted, and part of potential falseedges are labeled.2. Combined with the "Max-K-Cover" problem, we discussed choosing strategies whenfacing the problem of many candidate edges, and proposed a simple and effective heuristic tochoose edges.3. Combined with unipolarity of edge and further use of mutual information, whilechoosing the right side, we remove confirmable false-edge at the same time.4. By use of the local and overall monotonicity of marginal gain for one edge, we didpruning on times of updating.5. We improved the "lazy evaluation" of the original algorithm, introduced optimizationprocess called "lazy deleting edge" combined with the deleted edge pruning.To verify the effect of new algorithm, this paper has designed a rigorous experimentalprocesses: First, by defining some statistical indicators, we verifed the rationality ofpretreatment process and collaborative pruning method in the setting of different timetransmission models and network structures. Second, by compared with original algorithmNetinf, we confirmed the fact PCpinf improves the efficiency by40%. In addtion, under theconditions of combination of different network structure and different propagation models, weconfirmed that new algorithm get improvement of about3%in the solution accuracy by comparison of the effect of PCpinf and Netinf. Finally, by changing the size of the set of timesequence data, we verified that the algorithm has good robustness, with only a small amountof time-sequence data, it can predict more than80percent real edge.In this Algorithm, how to set the parameters related with pretreatment and collabrativepruning is always refer to the value of β. analysis and research in-depth on paramters settingis needed in the next step, and we expect to achieve better preprocessing and pruning effect,so as to improve the accuracy of the final solution and efficiency.
Keywords/Search Tags:Information diffusion network, Mixed propagation model, Pretreatment and sort, Collaborative pruning, Approximate reasoning
PDF Full Text Request
Related items