The Research And Improvement Of Distributed File System In Search Engine

Posted on: 2011-01-20
Degree: Master
Type: Thesis
Country: China
Candidate: C M Huang
Full Text: PDF
GTID: 2178360308463598
Subject: Computer system architecture
Abstract/Summary:
With the rapid growth of information on the web, search engines have become indispensable for finding information online. As search engine technology has progressed, many excellent products have emerged, such as Google, Baidu, and Yahoo. As information grows explosively, users depend on search engines more and more, which brings both opportunities and challenges for search engines. How to improve search engine performance has therefore become a research hotspot. Search engine performance is affected by many factors, and a search engine itself involves many technologies. This paper focuses on the distributed file system as a way to improve search engine performance.

This paper studies HDFS, the underlying storage system of the kapok search engine. By studying its structure and data organization and its detailed read and write procedures, and by referring to other excellent distributed file systems, we introduce some new mechanisms into HDFS to improve performance.

First, this paper discusses the algorithm for selecting storage space on a data node. The default HDFS algorithm has some shortcomings: without any knowledge of the system environment, the simple Round-Robin algorithm may cause data imbalance and I/O blocking. This paper presents a new algorithm that obtains some of the current system status information in order to make choices that benefit system performance.

Second, in HDFS a data node processes packets serially: it transports the data to the next node and then flushes it to the local disk, which slows down data processing. This paper presents a parallel data processing mode that uses a queue to buffer packets and a new thread to handle the disk operations, which improves the efficiency of data processing.

Finally, this paper designs several experiments to compare the performance of the original HDFS and the improved HDFS. The experimental results show that the improved HDFS performs better in all three tests.
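
The space-selection idea can be illustrated with a minimal sketch. The abstract does not say which status information the new algorithm uses, so the `free_bytes` and `pending_io` fields, the `Volume` class, and the scoring rule below are all assumptions chosen for illustration, not the method of the thesis or the HDFS API.

```python
from dataclasses import dataclass
from itertools import cycle


@dataclass
class Volume:
    """Hypothetical view of one storage directory on a data node."""
    name: str
    free_bytes: int   # assumed status field: remaining capacity
    pending_io: int   # assumed status field: queued disk requests


def round_robin(volumes):
    """Baseline: hand out volumes in fixed rotation, ignoring their state."""
    return cycle(volumes)


def choose_by_status(volumes, block_size):
    """Pick the least busy volume that can hold the block.

    Ties on pending I/O are broken in favour of more free space.
    """
    candidates = [v for v in volumes if v.free_bytes >= block_size]
    if not candidates:
        raise RuntimeError("no volume has room for the block")
    return min(candidates, key=lambda v: (v.pending_io, -v.free_bytes))


volumes = [
    Volume("disk0", free_bytes=10 * 2**30, pending_io=8),
    Volume("disk1", free_bytes=200 * 2**30, pending_io=1),
    Volume("disk2", free_bytes=1 * 2**20, pending_io=0),
]
block_size = 64 * 2**20  # 64 MiB, used here only as an example block size

chosen = choose_by_status(volumes, block_size)
print(chosen.name)
```

In this example the rotation would start with the busy `disk0`, while the status-aware choice skips both the busy volume and the nearly full one.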
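
The parallel processing mode described above (a queue that buffers packets plus a separate thread for disk operations) can be sketched as follows. This is a simplified illustration rather than the thesis's implementation: `receive_packets`, `forward`, and `write_to_disk` are hypothetical names, and real data-node concerns such as acknowledgements and error handling are left out.

```python
import queue
import threading


def receive_packets(packets, forward, write_to_disk):
    """Forward each packet downstream while a separate thread writes to disk."""
    buffer = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def disk_writer():
        # Drain the queue in arrival order until the sentinel is seen.
        while True:
            packet = buffer.get()
            if packet is done:
                break
            write_to_disk(packet)

    writer = threading.Thread(target=disk_writer)
    writer.start()
    for packet in packets:
        forward(packet)     # send to the next node in the pipeline
        buffer.put(packet)  # hand off to the disk thread without waiting
    buffer.put(done)
    writer.join()           # wait until every buffered packet is written


forwarded, written = [], []
receive_packets([b"p0", b"p1", b"p2"], forwarded.append, written.append)
print(forwarded == written)
```

Because the receiving loop only enqueues each packet, a slow disk write no longer delays forwarding of the next packet. A single writer thread keeps the on-disk order identical to the arrival order.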
Keywords/Search Tags: Distributed structure, File System, Parallel Write, Performance