The Design And Implementation Of A Distributed Storage And Retrieval System

Posted on: 2010-03-24
Degree: Master
Type: Thesis
Country: China
Candidate: Y F Cao
Full Text: PDF
GTID: 2178330338482192
Subject: Computer technology
Abstract/Summary:
In the digital era of information explosion, storing and retrieving information has become a basic need. The most significant feature of the information age is being "data rich, information poor", so information retrieval technology is constantly being updated and improved. With the volume of digital information surging, storage prices falling, and networks developing rapidly, a traditional file system limited to a single device can no longer meet the requirements of storage management or of access to useful information. A distributed storage and retrieval system offers high efficiency, stability, and scalability, and is the best way to achieve efficient storage and retrieval.

Distributed parallel programming models differ in their characteristics. We compare the classic OpenMP and MPI models with the more recently popular MapReduce model and find that OpenMP scales poorly and that MPI programming is complex. MapReduce is a distributed programming model presented by Google for large-scale processing of massive data. Its advantages are good scalability, readability, and better automatic parallelism and fault tolerance.

This thesis analyzes the advantages of distributed storage and retrieval systems in efficiency, stability, and scalability, and introduces MapReduce, a simplified distributed programming model.

This thesis describes how to establish a MapReduce-based distributed file storage system (DFS) and how to implement a distributed information retrieval (DIR) platform on top of this storage system to achieve full-text search.

Experimental comparison shows that the distributed file system is far more efficient than stand-alone processing as the amount of data to be processed increases. In addition, the key to effectively improving the efficiency of a parallel computing system is to increase its concurrency as far as the system's hardware permits.
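
As a rough illustration of how a MapReduce-style job can support full-text search, the following Python sketch builds an inverted index and answers a simple query. It is not the thesis's implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `build_index`, `search`) are hypothetical, tokenization is naive whitespace splitting, and everything runs in a single process, whereas a real deployment would distribute the map and reduce tasks across nodes.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit one (term, doc_id) pair per word in the document."""
    for term in text.lower().split():
        yield term, doc_id


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(term, doc_ids):
    """Reduce step: collapse the doc ids for one term into a sorted posting list."""
    return term, sorted(set(doc_ids))


def build_index(docs):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = [pair for doc_id, text in docs.items()
             for pair in map_phase(doc_id, text)]
    return dict(reduce_phase(term, ids) for term, ids in shuffle(pairs).items())


def search(index, query):
    """Return the ids of documents containing every term in the query."""
    terms = query.lower().split()
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


if __name__ == "__main__":
    # Hypothetical sample documents, used only to exercise the sketch.
    docs = {
        "d1": "distributed file system",
        "d2": "distributed information retrieval",
    }
    index = build_index(docs)
    print(search(index, "distributed retrieval"))
```

Because each map call touches only one document and each reduce call touches only one term, both steps can in principle run concurrently on separate machines, which is the property the abstract attributes to MapReduce.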
Keywords/Search Tags: Parallel Computing, Distributed File System (DFS), Distributed Information Retrieval (DIR), Mass Data, Mapping Protocol