
Research On Parallel File System In Search Engine

Posted on: 2007-06-12
Degree: Master
Type: Thesis
Country: China
Candidate: H Z Guo
Full Text: PDF
GTID: 2178360212467025
Subject: Computer Science and Technology
Abstract/Summary:
With the spread of computer applications and the growth of the Internet, people facing oceans of online information find it difficult to locate what they need, and search engines emerged in response. Once a search engine has crawled billions of web pages, file storage and management become key to its further development. Meanwhile, with the rapid development of cluster technology in recent years, distributed parallel file systems, as a core component of clusters, have attracted more and more attention. Using a distributed parallel file system to provide file support is therefore a better solution. However, common parallel file systems are mostly generic and cannot properly meet the special file operation needs of search engines. This thesis aims to implement a parallel file system that supports file storage and management for search engines. First, a prototype parallel file system is chosen after a study of common parallel file system techniques. Then some algorithms are improved to fit the special file operation requirements of search engines.

After comparing several widely used parallel file systems, PVFS2 was chosen as the prototype system. Cross-backup and load-balancing algorithms were studied based on an analysis of PVFS2's features, architecture, and principles in contrast to the Google File System (GFS). The dissertation mainly covers the following aspects:

(1) Choose PVFS2 as the prototype system by comparing several common parallel file systems. PVFS2 offers high performance, good usability, and good support for large files, and it is open source.
(2) Compare PVFS2 with GFS in terms of system architecture, metadata management, and special design for search engines; analyse PVFS2's advantages and disadvantages; and point out which components do not suit search engines.
(3) Implement an interface on top of PVFS2 MPI-IO for search engines. PVFS2 MPI-IO uses ROMIO to access PVFS2, and can provide high...
Keywords/Search Tags:parallel file system, PVFS, GFS, file backup, load balance