
Research And Implementation Of Fast Retrieval Technology For Massive Small Files

Posted on: 2017-07-10  Degree: Master  Type: Thesis
Country: China  Candidate: Y He  Full Text: PDF
GTID: 2428330569998929  Subject: Software engineering
Abstract/Summary:
As data volumes grow, distributed file systems have drawn increasing attention for their support for massive data, high availability, and large-scale concurrent access and processing. At present, master-slave distributed file systems manage stored files through a file-directory structure: users reach a target file quickly through its directory path, using either the built-in command-line interface or a REST (Representational State Transfer) interface. Files, however, often carry attributes along several dimensions. For example, a photo typically has a time taken, a location, and a description. If users need to retrieve and locate photos quickly by time, location, description, or any other dimension of interest, managing massive numbers of photos with a file-directory structure cannot meet that need.

Building on the existing research results of SMDFS2.0 (Small files Distributed File System 2.0), this thesis addresses the problem that a distributed file system offers only a single view of its files. Tags are added to the file metadata, an inverted-index table mapping file features to file index information is created, and a feature inverted-index technique that binds the inverted index to the file metadata is proposed. The idea of the binding is that the feature inverted-index table is built on whichever node the file metadata is distributed to. The table is managed with a skip list, which makes it easy to search for and locate the files a user cares about along multiple dimensions. SMDFS manages and distributes metadata with Index Clusters. As files are created and deleted, index clusters are split, reconstructed, and then redistributed, which in turn affects the feature inverted-index tables. This thesis therefore proposes a dynamic split and reconstruction method for the inverted-index table that redistributes the feature inverted-index table efficiently and preserves the high availability of the system. Based on SMDFS2.0, the thesis implements SMDFS3.0, a multidimensional browsing system with feature inverted-index and file metadata binding and with dynamic split and reconstruction of feature metadata.

The thesis reports three comparative tests: file read and write performance, file feature retrieval, and centralized versus distributed maintenance of the feature index. The experimental results show that SMDFS3.0 has the same file read and write performance as SMDFS2.0. Retrieving the files that correspond to a given time through the time dimension improves by a factor of 231 over SMDFS2.0, and retrieving the files that correspond to a given city through the city dimension improves by a factor of 52. Compared with centralized index management, distributed index management has clear advantages in both maximum file storage and file retrieval performance.
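The binding of a feature inverted index to the node that holds the file metadata, and its split when metadata is redistributed, can be illustrated with a minimal Python sketch. This is not the SMDFS3.0 implementation: the `FeatureIndex` class, its method names, the `search` helper, and the sample tags are all hypothetical, and a sorted key list maintained with `bisect` stands in for the skip list described in the abstract (both keep keys ordered, so exact and range lookups work the same way).

```python
import bisect
from collections import defaultdict


class FeatureIndex:
    """Feature inverted index for the files whose metadata lives on one node.

    Hypothetical sketch: a sorted list of keys stands in for the skip list.
    """

    def __init__(self):
        self.metadata = {}                 # file_id -> {dimension: value}
        self.postings = defaultdict(set)   # (dimension, value) -> file ids
        self.keys = []                     # sorted (dimension, value) keys

    def add_file(self, file_id, tags):
        """Store the file's tags and index the file under each tag."""
        self.metadata[file_id] = dict(tags)
        for key in tags.items():
            if key not in self.postings:
                bisect.insort(self.keys, key)
            self.postings[key].add(file_id)

    def remove_file(self, file_id):
        """Drop the file from metadata and postings; return its tags."""
        tags = self.metadata.pop(file_id)
        for key in tags.items():
            ids = self.postings[key]
            ids.discard(file_id)
            if not ids:
                del self.postings[key]
                self.keys.pop(bisect.bisect_left(self.keys, key))
        return tags

    def lookup(self, dimension, value):
        """Files tagged with exactly this value in this dimension."""
        return set(self.postings.get((dimension, value), ()))

    def lookup_range(self, dimension, low, high):
        """Files whose value in this dimension lies in [low, high]."""
        start = bisect.bisect_left(self.keys, (dimension, low))
        end = bisect.bisect_right(self.keys, (dimension, high))
        result = set()
        for key in self.keys[start:end]:
            result |= self.postings[key]
        return result

    def split(self, moves):
        """Move every file for which moves(file_id) is true to a new index.

        The index entries travel with the metadata, so each node's index
        always covers exactly the files whose metadata it holds.
        """
        other = FeatureIndex()
        for file_id in [f for f in self.metadata if moves(f)]:
            other.add_file(file_id, self.remove_file(file_id))
        return other


def search(nodes, dimension, value):
    """Fan a query out to every node and merge the results."""
    result = set()
    for node in nodes:
        result |= node.lookup(dimension, value)
    return result


# Illustrative data only; the tag values are made up for this example.
node_a = FeatureIndex()
node_a.add_file("f1", {"time": "2017-05-01", "city": "Changsha"})
node_a.add_file("f2", {"time": "2017-05-02", "city": "Beijing"})
node_a.add_file("f3", {"time": "2017-06-10", "city": "Changsha"})
node_b = node_a.split(lambda file_id: file_id > "f1")
print(sorted(search([node_a, node_b], "city", "Changsha")))
```

Because each node indexes only the files whose metadata it holds, a split only has to move the index entries of the files that move, and a query touches each node's local index independently.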
Keywords/Search Tags:Distributed file system, Massive small files, Inverted index, Search, Dynamic split and reconstruction