Design And Implementation Of Distributed Query Algorithm Processing Communication Data Based On Hadoop

Posted on:2010-07-01

Degree:Master

Type:Thesis

Country:China

Candidate:Y Chen

Full Text:PDF

GTID:2178360275473714

Subject:Computer Science and Technology

Abstract/Summary:


Data is the carrier of information. With the development of information technology, data plays an increasingly important role in modern social life. Social network analysis distils useful information from social network data using graph theory, data mining and related techniques. The data sets involved are usually very large, so the system that processes them must be highly capable.

The large-scale social network communication data analysis and visualization system is a social network analysis tool that deals specifically with communication data sets. In this system, hierarchical expansion of the data requires querying an extensive data set, so query efficiency is a key requirement. A traditional relational database such as Oracle or SQL Server can handle queries with complex conditions, but it falls short when dealing with TB-scale data sets. In addition, the BFS algorithm, which relies on traversal operations, runs very inefficiently in a relational database.

In such a case, the existing data query and processing bottlenecks need to be resolved. After analyzing existing distributed storage systems and cloud computing platforms, we chose the Hadoop platform for distributed data storage and query to improve the system.

This paper focuses on the distributed storage and query of communication data on the Hadoop platform. It describes how to design an HBase-based data model for the communication data of a social network. We implement conditional queries and design and optimize the data model, so that clients can access the services from the Hadoop platform. We also design and implement a Map/Reduce algorithm for the communication data set, in which the Map and Reduce functions carry out parallel query processing. During a query, the traversal step is placed in the Reduce function, so that the BFS traversal can also run in parallel. This greatly improves the efficiency of both data query and hierarchical expansion.

The implementation of distributed storage and query of communication data on the Hadoop platform is of considerable significance. The Hadoop platform only needs to be deployed on ordinary, inexpensive PCs, yet it processes data efficiently, which gives it high practical value.
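
The abstract does not give the algorithm itself, so the following is only a minimal sketch of how a layer-by-layer BFS expansion might be phrased as repeated Map/Reduce rounds with the traversal step in the reduce function. It is plain Python that simulates the map, shuffle and reduce phases in memory and does not use the Hadoop or HBase APIs. All names (`map_phase`, `shuffle`, `reduce_phase`, `expand_layers`), the `(caller, callee)` record format, and the choice to treat communication as undirected are assumptions made for illustration, not details taken from the thesis.

```python
from collections import defaultdict


def map_phase(edges, frontier):
    """Emit (key, value) pairs: adjacency facts plus frontier markers."""
    for caller, callee in edges:
        # Assumption: a communication record links both parties (undirected).
        yield caller, ("EDGE", callee)
        yield callee, ("EDGE", caller)
    for node in frontier:
        yield node, ("FRONTIER", None)


def shuffle(pairs):
    """Group values by key, as the framework would do between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Traversal step: a node on the current frontier returns its neighbours."""
    if ("FRONTIER", None) not in values:
        return []
    return [neighbour for tag, neighbour in values if tag == "EDGE"]


def expand_layers(edges, seeds, max_depth):
    """Run one map/shuffle/reduce round per BFS layer, up to max_depth layers."""
    visited = set(seeds)
    frontier = set(seeds)
    layers = [sorted(frontier)]
    for _ in range(max_depth):
        groups = shuffle(map_phase(edges, frontier))
        next_frontier = set()
        # Each key could be reduced on a different worker; here it is a loop.
        for key, values in groups.items():
            next_frontier.update(reduce_phase(key, values))
        next_frontier -= visited
        if not next_frontier:
            break
        visited |= next_frontier
        layers.append(sorted(next_frontier))
        frontier = next_frontier
    return layers


if __name__ == "__main__":
    records = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E"), ("X", "Y")]
    print(expand_layers(records, ["A"], 2))  # [['A'], ['B', 'E'], ['C']]
```

Because every key is reduced independently, the neighbours of all frontier nodes in one layer can be collected at the same time, which is the sense in which the traversal can run in parallel. A real job would read the records from distributed storage rather than from an in-memory list.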

Keywords/Search Tags:

Hadoop platform, Map/Reduce algorithm, Distributed query, Hbase


Related items

1	Implemention Of The Massive Telecom Data Distributed Storage And Query System Based On Hadoop
2	A Research Of Distributed Storage And Parallel Query Of Spatial Data Based On Hadoop Platform
3	Research And Application Of Big Data Migration And Query Based-on Hadoop Platform
4	Reach On Map-Reduce Application Based On Hadoop
5	Reach On Map-reduce Application Based On Hadoop
6	Research And Application Of Map/Reduce Based Distributed Log Analyzer
7	Research And Implementation On A Distributed Service Registry Based On HADOOP Platform
8	Research And Implementation Of Distributed Web Crawler Based On Hadoop
9	Research On Parallel Clustering Algorithm Based On Map-Reduce
10	Research Of Big Data Store Query Technology Based On HBase