
Pattern Mining And Evolution Analysis Of Complex Networked Data

Posted on: 2018-05-09
Degree: Doctor
Type: Dissertation
Country: China
Candidate: T Wu
Full Text: PDF
GTID: 1310330512488095
Subject: Computer application technology
Abstract/Summary:
In the big data era, the digital world is formed by "quantifying everything". Because data reflects the world objectively and comprehensively, the world can be explored and exploited through data analysis. Networks are an essential character and inherent principle of the real world, in which natural elements, species, and human factors form networked systems by interacting with and influencing one another, so the data resources that reflect the real world are essentially networked data. The real world is complicated and varied, and data collection is unconstrained and ubiquitous; accordingly, networked data is complex and heterogeneous. With the development and popularization of information technology, enterprises and society have produced large amounts of complex networked data that urgently need to be analyzed and utilized. The characteristic properties of this new kind of complex networked data pose great challenges to traditional data processing technology.

To analyze and mine complex networked data, this project explores new data processing models and theories based on the data's characteristic properties and on practical requirements. Complex networked data mainly comprises network structure data, network behavior data, and network content data. To make use of these data, the project develops algorithms and solutions for pattern recognition and evolution analysis from several perspectives, constructs a research paradigm and framework, extracts latent characteristics and underlying principles, and improves the technological system. Specifically, the main contributions and innovations of this dissertation are as follows:

1. 
Comprehensive network structure pattern mining. To solve the problem of structure pattern mining, a label propagation based integrated network structure investigation algorithm (LINSIA) is proposed, which can identify the community structure, hub nodes, and outliers of complex heterogeneous networks. LINSIA recognizes overlapping communities and hubs by allowing nodes to hold multiple labels, recognizes hierarchical communities by constructing a bottom-up super-network structure, and finds outliers while avoiding excessively large communities through novel label selection and label updating mechanisms. Moreover, LINSIA can produce a soft partitioning of the community structure and quantify the degree to which each overlapping node belongs to each relevant community. Extensive experiments demonstrate that LINSIA outperforms state-of-the-art methods and has both practical and theoretical value.

2. Node centrality for optimal network targeted attack. To solve the problem of node centrality ranking, a centrality measure, ECI, is proposed that considers the loop density and degree diversity of the local network topology. ECI degenerates into CI centrality as the loop density and the degree diversity decrease. Comparing ECI with CI and classical centrality measures on both synthetic and real networks, the experimental results suggest that ECI can largely improve on the performance of CI for network disruption. Based on these results, we analyze the correlation between the improvement and the properties of the networks. We find that the performance of ECI is positively correlated with the assortativity coefficient and community modularity, and negatively correlated with the degree inequality of networks, which can serve as guidance for practical applications. Moreover, we propose a power iteration ranking (PIRank) algorithm by integrating mass diffusion and heat conduction into eigenvector centrality. 
Because these physical processes treat influential nodes differently, combining them increases our ability to identify different types of influential nodes. To test the PIRank algorithm, we apply it to the selection of attack targets in the network disruption problem and to the identification of influential spreaders in the influence maximization problem. Extensive experimental results on real-world networks show that a PIRank-guided targeted attack can increase the strength of network disruption.

3. Network evolution prediction based on link prediction and position drift. To solve the problem of network evolution prediction, this project adopts the link prediction paradigm. To estimate the likelihood that a link exists more accurately, an effective and robust similarity index is presented that exploits network structure adaptively. Moreover, most existing link prediction methods do not clearly distinguish future links from missing links. To predict future links, this project treats networks as dynamic systems and develops a similarity updating method, the spatial-temporal position drift model, to simulate the evolutionary dynamics of node similarity. The updated similarities are then used as input for estimating the likelihood of future links. Extensive experiments on real-world networks suggest that the proposed similarity index performs better than baseline methods and that the position drift model performs well for evolution prediction in real-world evolving networks.

4. Information diffusion prediction based on individual spreading estimation. To address the problem of information diffusion prediction, this project develops a novel prediction method, multiscale diffusion prediction (MScaleDP). 
MScaleDP aggregates the microscopic spreading modules of individual nodes using a unidirectional label propagation algorithm for macroscopic diffusion prediction, in which the label selection mechanism corresponds to microscopic spreading decision-making. Through modeling of microscopic spreading behavior, the underlying influential factors and the principal driving mechanisms of the diffusion process are identified. Moreover, we find that the accuracy of spreading behavior estimation does not always increase as the number of features grows. We also find that this accuracy is not strongly correlated with the choice of estimation model when sufficient features are considered. The proposed method is successfully tested on a microblogging network and is a valuable tool for gaining insight into the information diffusion process.

In summary, this dissertation addresses pattern mining and evolution analysis of complex networked data. A series of experiments shows that the identified characteristics and the proposed solutions are accurate, effective, and efficient. They therefore have both theoretical value and broad application potential.
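
As a rough illustration of the label propagation idea that LINSIA (contribution 1) builds on, the following Python sketch runs plain single-label propagation on a toy graph. It is not LINSIA: it has no multiple labels, no hierarchy, and no outlier handling. The function name `label_propagation`, the dict-of-sets graph format, and the largest-label tie-break are assumptions made here to keep the example deterministic.

```python
from collections import Counter


def label_propagation(adj, max_iters=100):
    """Basic asynchronous label propagation (illustrative, not LINSIA).

    adj maps each node to the set of its neighbours (undirected graph).
    Every node starts with its own id as label and repeatedly adopts the
    most frequent label among its neighbours. Ties are broken by taking
    the largest label, an assumption made only for determinism.
    """
    labels = {u: u for u in adj}
    for _ in range(max_iters):
        changed = False
        for u in sorted(adj):  # fixed visiting order, also for determinism
            if not adj[u]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[v] for v in adj[u])
            top = max(counts.values())
            best = max(lab for lab, c in counts.items() if c == top)
            if best != labels[u]:
                labels[u] = best
                changed = True
        if not changed:
            break
    # Group nodes that ended up with the same label into communities.
    communities = {}
    for u, lab in labels.items():
        communities.setdefault(lab, set()).add(u)
    return sorted(communities.values(), key=min)


# Toy graph: two triangles joined by the single bridge edge 2-3.
two_triangles = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4},
}
print(label_propagation(two_triangles))  # [{0, 1, 2}, {3, 4, 5}]
```

The result depends on the tie-break and visiting order; with other choices plain label propagation can merge both triangles into one community, which is the kind of oversized community the label selection and updating mechanisms described in contribution 1 are meant to avoid.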
Keywords/Search Tags: Networked data, Link prediction, Node centrality, Information diffusion, Structural pattern, Evolution analysis
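
The node centrality contribution compares ECI against CI in a network disruption setting. Assuming CI denotes the Collective Influence measure, CI_l(i) = (k_i - 1) * sum of (k_j - 1) over all nodes j at distance exactly l from i, where k is node degree, a minimal Python sketch of a CI-guided targeted attack could look as follows. ECI is not defined in the abstract and is therefore not reproduced; the function names, the dict-of-sets graph format, and the toy graph are illustrative assumptions.

```python
from collections import deque


def collective_influence(adj, node, radius=2):
    """CI of `node`: (k_i - 1) times the sum of (k_j - 1) over all
    nodes j whose shortest-path distance from `node` equals `radius`."""
    dist = {node: 0}
    queue = deque([node])
    frontier_sum = 0
    while queue:
        u = queue.popleft()
        if dist[u] == radius:
            frontier_sum += len(adj[u]) - 1
            continue  # do not expand beyond the frontier
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(adj[node]) - 1) * frontier_sum


def giant_component_size(adj):
    """Number of nodes in the largest connected component."""
    seen, best = set(), 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, size = deque([start]), 0
        while queue:
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        best = max(best, size)
    return best


def targeted_attack(adj, n_remove, radius=2):
    """Remove the highest-CI node n_remove times, recomputing CI after
    each removal. Returns (removed nodes, remaining giant component)."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    removed = []
    for _ in range(n_remove):
        # sorted() makes ties resolve to the smallest node id.
        target = max(sorted(adj),
                     key=lambda u: collective_influence(adj, u, radius))
        for v in adj.pop(target):
            adj[v].discard(target)
        removed.append(target)
    return removed, giant_component_size(adj)


# Toy tree: node 0 links to 1, 2, 3; node 1 links to 4; node 2 links to 5.
toy = {0: {1, 2, 3}, 1: {0, 4}, 2: {0, 5}, 3: {0}, 4: {1}, 5: {2}}
print(collective_influence(toy, 0, radius=1))  # 4
print(targeted_attack(toy, 1, radius=1))       # ([0], 2)
```

Removing node 0 shrinks the largest connected component from 6 nodes to 2, which is the disruption criterion such an attack tries to optimize. Swapping a different scoring function into `targeted_attack` is how alternative centrality measures would be compared under this setup.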