The Design And Implementation Of A Distributed Storage And Retrieval System

Posted on:2010-03-24

Degree:Master

Type:Thesis

Country:China

Candidate:Y F Cao

Full Text:PDF

GTID:2178330338482192

Subject:Computer technology

Abstract/Summary:

In the digital era of information explosion, the storage and retrieval of information are becoming both a basic means and an end. The most significant feature of the information age is that it is "data rich, information poor", so information retrieval technology is constantly being updated and improved. Against a background of surging volumes of digital information, low storage prices, rapid network development, and the need to reach useful information, a traditional file system confined to a single device can no longer meet the requirements of storage management. A distributed storage and retrieval system offers high efficiency, stability, and scalability, and is the best way to achieve efficient storage and retrieval.

Distributed parallel programming models differ in many characteristics. We compare the classic OpenMP and MPI models with the more recently popular MapReduce model and find that OpenMP scales poorly and that the MPI programming model is complex. MapReduce is a distributed programming model proposed by Google for large-scale data processing. Its advantages are good scalability, readability, and better automatic parallelism and fault tolerance.

This thesis analyzes the advantages of a distributed storage and retrieval system in efficiency, stability, and scalability, and introduces MapReduce as a simplified distributed programming model.

This thesis describes how to build a MapReduce-based distributed file storage system (DFS), and how to implement a distributed information retrieval (DIR) platform on top of this storage system to provide full-text search.

Experimental comparison shows that, as the volume of data to be processed grows, the distributed file system is far more efficient than stand-alone processing. In addition, the key to effectively improving the efficiency of a parallel computing system is to raise its concurrency as far as the system's hardware permits.
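To make the idea of MapReduce-based full-text search concrete, the following is a minimal single-process sketch of how an inverted index could be built with a map step, a grouping step, and a reduce step. It is not the thesis's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`), the tokenizer, and the sample documents are all assumptions made for illustration, and a real system would run the phases across many machines on top of the distributed file system.

```python
import re
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per distinct term in a document."""
    terms = set(re.findall(r"[a-z0-9]+", text.lower()))
    return [(term, doc_id) for term in terms]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle stage."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce in sequence over a dict of documents."""
    pairs = []
    for doc_id, text in docs.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(t, ids) for t, ids in shuffle(pairs).items())


def search(index, query):
    """Return ids of documents that contain every query term (AND semantics)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical sample documents, used only to exercise the sketch.
docs = {
    "d1": "Distributed file storage",
    "d2": "Distributed information retrieval",
    "d3": "Full-text retrieval on a file system",
}
index = build_index(docs)
```

With this index, `search(index, "file retrieval")` intersects the posting lists for "file" and "retrieval". Because each map call touches only one document and each reduce call only one term, both phases can in principle be spread over many workers, which is the property the abstract attributes to MapReduce.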

Keywords/Search Tags:

Parallel Computing, Distributed File System (DFS), Distributed Information Retrieval (DIR), Mass Data, Mapping Protocol

Related items

1	Research On Mass Remote Sensing Image Data Storage Technology
2	Research On Personal Information Fusion System Based On Distributed File Storage
3	The Design And Implementation Of Parallel Computing Platform Based On MapReduce
4	The Design And Implementation Of A Log Analysis System Based On Distributed Computing Platform
5	Research On Distributed File System Of Supporting GPU Acceleration
6	The Design And Implementation Of Image Retrieval System On Hadoop
7	Based On The Hadoop Mass File Storage System Analysis And Design
8	The Research Of A Remote Sensing Data Organization Model Based On Cloud Computing
9	Design And Realization Of Parallel File IO Based On Hadoop Distributed File System