
A Distributed Computing Framework For Large-Scale Information Network Mining

Posted on: 2014-02-24
Degree: Master
Type: Thesis
Country: China
Candidate: Z F Ceng
GTID: 2248330398470888
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, large-scale information networks keep emerging in many fields, such as the World Wide Web, social networks, instant-messaging networks and biological information networks. Formed by the interactions of a large number of individuals, these networks embody underlying patterns and laws. They are of great significance not only for exploring the natural sciences but also for studying human social behavior. Natural information networks, such as bioinformatics networks and neural networks, have become an important channel through which the scientific community discovers new patterns and laws. Large-scale online social networks give sociologists a great opportunity to study human behavior and social development. In the commercial field, mining large-scale information networks plays an important role in companies' business decisions and product promotion.
Analyzing large-scale information networks poses enormous challenges to both academia and industry. First, traditional data-analysis methods cannot be applied to today's large-scale information networks, because the computational complexity of the traditional algorithms is too high at this scale. Second, a single high-performance computer is insufficient to compute over such a large network; distributed computing has become the new trend in data analysis, and it demands new data-storage mechanisms and system designs. In response to these challenges, this thesis focuses on three aspects of large-scale information network analysis: graph partitioning, computing models and analysis methods. For graph partitioning, it presents a parallel graph partitioning algorithm that partitions the graph efficiently in a distributed computing environment, effectively reducing communication overhead and improving the computational performance of the system.
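The thesis's own parallel partitioning algorithm is not described in this summary. To illustrate the objective such algorithms target, here is a minimal sketch of a different, well-known technique swapped in for the purpose: the Linear Deterministic Greedy (LDG) streaming heuristic of Stanton and Kliot (KDD 2012). Each arriving vertex goes to the part holding most of its already-placed neighbors, weighted by how much room that part has left, under a hard capacity of ceil(n/k) vertices per part. The function names `ldg_partition` and `edge_cut`, and the toy graph, are hypothetical. The edge cut serves as a proxy for communication overhead, assuming a vertex-centric system in which every cut edge implies cross-machine messages.

```python
from typing import Dict, Iterable, Set


def ldg_partition(vertices: Iterable[int], adj: Dict[int, Set[int]], k: int) -> Dict[int, int]:
    """Assign each vertex to one of k parts in a single streaming pass (LDG heuristic)."""
    order = list(vertices)
    capacity = -(-len(order) // k)  # ceil(n / k): hard size limit per part
    sizes = [0] * k
    assign: Dict[int, int] = {}
    for v in order:
        best, best_score = -1, -1.0
        for i in range(k):
            if sizes[i] >= capacity:
                continue  # part is full, skip it
            # number of v's neighbours already placed in part i
            local = sum(1 for u in adj.get(v, ()) if assign.get(u) == i)
            # reward locality, linearly penalise parts that are filling up
            score = local * (1.0 - sizes[i] / capacity)
            # strict '>' keeps the first best part; ties go to the lighter part
            if score > best_score or (score == best_score and sizes[i] < sizes[best]):
                best, best_score = i, score
        assign[v] = best
        sizes[best] += 1
    return assign


def edge_cut(adj: Dict[int, Set[int]], assign: Dict[int, int]) -> int:
    """Count undirected edges whose two endpoints lie in different parts."""
    return sum(1 for v, nbrs in adj.items() for u in nbrs
               if v < u and assign[v] != assign[u])


if __name__ == "__main__":
    # Two triangles {0,1,2} and {3,4,5} joined by the bridge edge (2, 3)
    adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
    ldg = ldg_partition(range(6), adj, k=2)
    print(edge_cut(adj, ldg))                           # 1
    print(edge_cut(adj, {v: v % 2 for v in range(6)}))  # 5
```

Streaming the toy graph in the order 0 to 5, LDG keeps each triangle in one part and cuts only the bridge, whereas hash placement by `v % 2` cuts 5 of the 7 edges. A single streaming pass like this is cheap enough to run as data is loaded; it trades some partition quality for not needing the whole graph in memory at once.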
For the computing model, this thesis proposes a multi-way message passing mechanism to improve the efficiency of parallel computing systems for large-scale information network analysis. For the analysis methods, in addition to traditional graph-based methods, it proposes a matrix factorization algorithm framework that integrates different information networks through different regularization terms, providing a general analysis framework. Finally, it presents the design of a parallel analysis platform for large-scale information networks, on which users can analyze large-scale information networks by writing a small amount of code, without dealing with the details of the distributed system.
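The summary says only that the matrix factorization framework integrates different networks through regularization. One common way to do this, assumed here and not taken from the thesis, is graph-Laplacian regularization: factorize an n x m relation matrix R ≈ U V^T while penalizing tr(U^T L U) = 1/2 * sum_ij W_ij ||u_i - u_j||^2, where L = D - W is the Laplacian of an auxiliary network W over the rows (for example, a friendship network over users), so linked rows are pushed toward similar latent factors. The sketch minimizes ||U V^T - R||_F^2 + lam (||U||_F^2 + ||V||_F^2) + alpha tr(U^T L U) by plain full-batch gradient descent in pure Python; the function names, hyperparameters and toy data are all hypothetical.

```python
import random
from typing import List, Tuple

Matrix = List[List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def objective(R: Matrix, W: Matrix, U: Matrix, V: Matrix, lam: float, alpha: float) -> float:
    """||U V^T - R||_F^2 + lam (||U||^2 + ||V||^2) + alpha * tr(U^T L U)."""
    n, m = len(R), len(R[0])
    fit = sum((dot(U[i], V[j]) - R[i][j]) ** 2 for i in range(n) for j in range(m))
    ridge = sum(x * x for row in U + V for x in row)
    # for symmetric W: tr(U^T L U) = 1/2 * sum_{i,l} W[i][l] * ||u_i - u_l||^2
    smooth = 0.5 * sum(W[i][l] * sum((a - b) ** 2 for a, b in zip(U[i], U[l]))
                       for i in range(n) for l in range(n))
    return fit + lam * ridge + alpha * smooth


def graph_regularized_mf(R: Matrix, W: Matrix, k: int = 2, lam: float = 0.01,
                         alpha: float = 0.1, lr: float = 0.01, iters: int = 2000,
                         seed: int = 0) -> Tuple[Matrix, Matrix]:
    """Gradient descent on the objective above (hypothetical, untuned sketch)."""
    rng = random.Random(seed)
    n, m = len(R), len(R[0])
    U = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n)]
    V = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(m)]
    for _ in range(iters):
        # residuals E = U V^T - R
        E = [[dot(U[i], V[j]) - R[i][j] for j in range(m)] for i in range(n)]
        # dU = 2 E V + 2 lam U + 2 alpha L U, with (L U)_i = sum_l W_il (u_i - u_l)
        gU = [[2 * sum(E[i][j] * V[j][f] for j in range(m))
               + 2 * lam * U[i][f]
               + 2 * alpha * sum(W[i][l] * (U[i][f] - U[l][f]) for l in range(n))
               for f in range(k)] for i in range(n)]
        # dV = 2 E^T U + 2 lam V
        gV = [[2 * sum(E[i][j] * U[i][f] for i in range(n)) + 2 * lam * V[j][f]
               for f in range(k)] for j in range(m)]
        U = [[U[i][f] - lr * gU[i][f] for f in range(k)] for i in range(n)]
        V = [[V[j][f] - lr * gV[j][f] for f in range(k)] for j in range(m)]
    return U, V


if __name__ == "__main__":
    R = [[5, 3, 0], [4, 0, 0], [1, 1, 5], [0, 1, 4]]   # toy user x item matrix
    W = [[0, 1, 0, 0], [1, 0, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]  # users 0-1 and 2-3 linked
    U0, V0 = graph_regularized_mf(R, W, iters=0)
    U, V = graph_regularized_mf(R, W)
    print(objective(R, W, U0, V0, 0.01, 0.1), objective(R, W, U, V, 0.01, 0.1))
```

Under this formulation, integrating a further network is just another additive term: an item-item network W' would contribute alpha' tr(V^T L' V), with gradient 2 alpha' L' V added to `gV`.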
Keywords/Search Tags: Information Network, graph partition, multi-way message passing, matrix factorization, router computing model