
Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop

Posted on: 2010-07-01
Degree: Master
Type: Thesis
Country: China
Candidate: Y Chen
Full Text: PDF
GTID: 2178360275473714
Subject: Computer Science and Technology
Abstract/Summary:
Data is the carrier of information. With the development of information technology, data plays an increasingly important role in modern social life. Social network analysis distills useful information from social network data using graph theory, data mining, and related techniques. The data sets involved are usually very large, so strong data-processing capability is required.

The large-scale social network communication data analysis and visualization system is a social network analysis tool built specifically for communication data sets. In this system, hierarchical (level-by-level) expansion of the data requires querying a very large data set, so query efficiency is critical. Traditional relational databases such as Oracle or SQL Server can handle queries with complex conditions, but they fall short when dealing with TB-scale data sets. In addition, the breadth-first search (BFS) algorithm, which depends on traversal, runs very inefficiently on relational databases.

We therefore need to remove the existing bottlenecks in data query and processing. After analyzing existing distributed storage systems and cloud computing platforms, we chose the Hadoop platform for distributed data storage and query.

This thesis focuses on distributed storage and query of communication data on the Hadoop platform. It describes the design of an HBase-based data model for social network communication data. We implement conditional queries and design and optimize the data model, so that clients can access these services from the Hadoop platform. We also design and implement a Map/Reduce algorithm for the communication data set: the Map and Reduce functions process queries in parallel, and the traversal step is placed in the Reduce function so that BFS traversal also runs in parallel. This largely improves the efficiency of data query and hierarchical expansion.

Implementing distributed storage and query of communication data on the Hadoop platform is of considerable significance. Hadoop only needs to be deployed on ordinary, inexpensive PCs, yet it processes data efficiently, which gives the approach high practical value.
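The idea of placing the BFS traversal step in the Reduce function can be illustrated with a small, single-process sketch. This is not the thesis's implementation and does not use the Hadoop API: the record format (caller, callee), the function names, and the in-memory "shuffle" are all assumptions made for illustration. Each call to expand_one_level stands in for one Map/Reduce job, and each reduce call works on one node independently, which is what would allow the expansion to run in parallel on a cluster.

```python
from collections import defaultdict

def map_record(record):
    """Map step (hypothetical): emit a communication record under both
    endpoints, treating the contact as an undirected edge."""
    caller, callee = record
    yield caller, callee
    yield callee, caller

def reduce_node(node, neighbours, frontier, visited):
    """Reduce step (hypothetical): the traversal happens here.
    Only nodes on the current frontier expand to unvisited neighbours."""
    if node not in frontier:
        return
    for neighbour in neighbours:
        if neighbour not in visited:
            yield neighbour

def expand_one_level(records, frontier, visited):
    """Simulate one Map/Reduce job: map, group by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)  # stands in for the shuffle phase
    next_frontier = set()
    for node, neighbours in groups.items():
        next_frontier.update(reduce_node(node, neighbours, frontier, visited))
    return next_frontier

def bfs_levels(records, start, max_depth):
    """Run one simulated job per level, up to max_depth levels."""
    visited = {start}
    frontier = {start}
    levels = [frontier]
    for _ in range(max_depth):
        frontier = expand_one_level(records, frontier, visited)
        if not frontier:
            break
        visited |= frontier
        levels.append(frontier)
    return levels

# Illustrative data only: four made-up contact records.
records = [("A", "B"), ("A", "C"), ("B", "D"), ("D", "E")]
levels = bfs_levels(records, "A", 2)
```

With the made-up records above, a two-level expansion from "A" gives the levels {A}, {B, C}, {D}. In a real deployment the frontier and visited sets would have to be shared between jobs, for example through job configuration or a table; how the thesis handles this is not stated in the abstract.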
Keywords/Search Tags: Hadoop platform, Map/Reduce algorithm, Distributed query, HBase