The Research And Improvement Of Distributed File System In Search Engine

Posted on:2011-01-20

Degree:Master

Type:Thesis

Country:China

Candidate:C M Huang

Full Text:PDF

GTID:2178360308463598

Subject:Computer system architecture

Abstract/Summary:

With the rapid growth of information on the web, search engines have become indispensable for finding information online. As search engine technology has continued to progress, many excellent products have emerged, such as Google, Baidu, and Yahoo. With information growing explosively, users depend on search engines more and more, which presents both opportunities and challenges for them. How to improve search engine performance has therefore become a research hotspot.

Search engine performance is affected by many factors, and a search engine itself involves many technologies. This thesis focuses on the distributed file system as a way to improve search engine performance. It studies HDFS, the underlying storage system of the Kapok search engine. Based on a study of HDFS's structure and data organization and its detailed read and write procedures, and with reference to other excellent distributed file systems, we introduce several new mechanisms into HDFS to improve its performance.

First, this thesis discusses the algorithm a data node uses to select storage space. The default HDFS algorithm has some shortcomings: because it has no knowledge of the system environment, the simple round-robin algorithm can cause data imbalance and I/O blocking. This thesis presents a new algorithm that obtains current system status information and uses it to make choices that benefit system performance.

Second, in HDFS a data node processes packets serially: it transfers each packet to the next node and then flushes it to the local disk, which slows down data processing. This thesis presents a parallel data processing mode that uses a queue to buffer packets and a new thread to handle disk operations, which improves the efficiency of data processing.

Finally, this thesis designs several experiments to compare the performance of the original and the improved HDFS. The experimental results show that the improved HDFS performs better in all three tests.
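The contrast between round-robin and status-aware space selection can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: the abstract does not say which status information is collected, so the `Volume` fields (`free_bytes`, `pending_io`), both class names, and the ranking rule are assumptions chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    """Hypothetical view of one storage directory on a data node."""
    name: str
    free_bytes: int   # assumed metric: remaining capacity
    pending_io: int   # assumed metric: outstanding disk requests


class RoundRobinChooser:
    """Simplified baseline: cycle through volumes, skipping full ones."""

    def __init__(self):
        self._next = 0

    def choose(self, volumes, block_size):
        n = len(volumes)
        for _ in range(n):
            vol = volumes[self._next]
            self._next = (self._next + 1) % n
            if vol.free_bytes >= block_size:
                return vol
        raise IOError("no volume has enough free space")


class StatusAwareChooser:
    """Illustrative alternative: consult current load before choosing."""

    def choose(self, volumes, block_size):
        candidates = [v for v in volumes if v.free_bytes >= block_size]
        if not candidates:
            raise IOError("no volume has enough free space")
        # Prefer the least busy disk; break ties by the most free space.
        return min(candidates, key=lambda v: (v.pending_io, -v.free_bytes))
```

With one busy, nearly full disk and two idle ones, the round-robin chooser still sends every third block to the busy disk, while the status-aware chooser avoids it until its load drops.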
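The queue-plus-writer-thread idea for packet processing can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the thesis's implementation (HDFS itself is written in Java): the class name `DataNodeReceiver`, the callbacks `forward` and `write_to_disk`, and the `max_buffered` bound are all hypothetical, and error handling in the writer thread is omitted.

```python
import queue
import threading

_STOP = object()  # sentinel telling the writer thread to exit


class DataNodeReceiver:
    """Forwards packets downstream while a second thread writes them to disk."""

    def __init__(self, forward, write_to_disk, max_buffered=64):
        self._forward = forward        # hypothetical: send packet to next node
        self._write = write_to_disk    # hypothetical: flush packet to local disk
        self._queue = queue.Queue(maxsize=max_buffered)
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def receive(self, packet):
        # The network step stays on the receiving thread.
        self._forward(packet)
        # Hand the packet to the disk thread; blocks only if the buffer is full.
        self._queue.put(packet)

    def close(self):
        # Wait until every buffered packet has been written.
        self._queue.put(_STOP)
        self._writer.join()

    def _drain(self):
        while True:
            packet = self._queue.get()
            if packet is _STOP:
                break
            self._write(packet)
```

In the serial model the receiving thread would call `write_to_disk` itself after forwarding each packet. Here it returns to the network as soon as the packet is queued, and the bounded queue limits how far the disk may fall behind before the receiver is throttled.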

Keywords/Search Tags:

Distributed structure, File System, Parallel Write, Performance

Related items

1	Research On Performance Modeling And Application Of Distributed File System
2	GlusterFS Data Distribution Policy And Performance Optimization Research
3	Distributed access to parallel file systems
4	Journal Design And Implementation Of Cappella Distributed File System
5	The Design And Implement Of A Improvement For Small File Performance In Distributed File System
6	The Design And Implementation Of The Parallel Network File System PNFS
7	Server-oriented Distributed And Parallel File System
8	Design And Realization Of Parallel File Io Based On Hadoop Distributed File System
9	Design And Realization Of Parallel File IO Based On Hadoop Distributed File System
10	Distributed Parallel File System