Research On The Prediction Of File Access Behaviors Based On Tree-KNN

Posted on: 2013-03-18  Degree: Master  Type: Thesis
Country: China  Candidate: J Hu
GTID: 2248330392457807  Subject: Computer system architecture
Abstract/Summary:
The explosive growth of data is rapidly increasing the number of files and the number of storage devices they require. Files come in many types, which makes file management more difficult. Moreover, new storage media with different characteristics are being introduced into storage systems, which makes file classification an important task. One of the most useful inputs to file management is a prediction of future file access behavior, yet existing storage systems have difficulty making such predictions.

In this paper we implemented a file access behavior prediction system that finds the K files whose characteristics are most similar to those of a given file. This functionality can help a storage system predict file access behavior, improve file layout, and allocate cache to files more intelligently.

The main idea behind the system is to combine the static metadata of files with their past access history to build a model that predicts their future access behavior. First, a Decision Partition Tree (DPT) is constructed from file metadata, and a KNN model is built in each leaf node of the tree. This hybrid model is then used to predict a file's upcoming access frequency. The Decision Partition Tree is a highly balanced multi-branch tree that partitions the raw training collection of file metadata. This not only removes noisy data but also reduces the subsequent cost of classification. A newly arriving file is assigned to a sub-collection by the Decision Partition Tree; a heap is then used to find its k most similar files, and these k files determine the file's predicted future access behavior.

The experimental results show that the system predicts future file access frequency accurately: prediction accuracy reaches 90%, and the time overhead is 20 times lower than that of traditional KNN methods.
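The two-stage pipeline described above (partition by metadata, then KNN inside the leaf) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the abstract does not say how the DPT chooses split attributes, how similarity is measured, how balance is maintained, or how noise is removed. Here each tree level simply branches on one categorical metadata attribute in a fixed order, similarity is Euclidean distance over hypothetical numeric features, and the k neighbors take a majority vote over hypothetical access-frequency class labels ("hot"/"cold"). The names `FileRecord`, `PartitionNode`, `find_leaf`, `k_nearest`, and `predict` are all illustrative.

```python
import heapq
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class FileRecord:
    meta: dict        # hypothetical categorical metadata, e.g. {"ext": "log", "owner": "db"}
    features: tuple   # hypothetical numeric features, e.g. (log2 size, directory depth)
    label: str = ""   # hypothetical access-frequency class, e.g. "hot" / "cold"


class PartitionNode:
    """Multi-branch node that splits on one categorical attribute per level (assumed scheme)."""

    def __init__(self, records, attrs, min_leaf):
        self.records = records   # sub-collection of training files under this node
        self.attr = None         # attribute this node branches on (None for a leaf)
        self.children = {}
        if attrs and len(records) > min_leaf:
            self.attr = attrs[0]
            groups = defaultdict(list)
            for r in records:
                groups[r.meta.get(self.attr)].append(r)
            # One child per observed attribute value; remaining attributes go deeper.
            self.children = {
                value: PartitionNode(group, attrs[1:], min_leaf)
                for value, group in groups.items()
            }

    def find_leaf(self, meta):
        """Route a new file down the tree by its metadata."""
        node = self
        while node.attr is not None:
            child = node.children.get(meta.get(node.attr))
            if child is None:  # unseen attribute value: stop at this sub-collection
                break
            node = child
        return node


def k_nearest(records, query, k):
    # heapq.nsmallest keeps a bounded heap of size k, so a scan costs O(n log k).
    return heapq.nsmallest(k, records, key=lambda r: math.dist(r.features, query))


def predict(root, meta, features, k=3):
    leaf = root.find_leaf(meta)
    neighbors = k_nearest(leaf.records, features, k)
    # Majority vote over the neighbors' frequency classes (assumed decision rule).
    return Counter(r.label for r in neighbors).most_common(1)[0][0]


train = [
    FileRecord({"ext": "log", "owner": "db"}, (20.0, 3.0), "hot"),
    FileRecord({"ext": "log", "owner": "db"}, (21.0, 3.0), "hot"),
    FileRecord({"ext": "log", "owner": "db"}, (10.0, 5.0), "cold"),
    FileRecord({"ext": "jpg", "owner": "web"}, (18.0, 2.0), "cold"),
    FileRecord({"ext": "jpg", "owner": "web"}, (19.0, 2.0), "cold"),
    FileRecord({"ext": "jpg", "owner": "web"}, (12.0, 4.0), "hot"),
]
root = PartitionNode(train, attrs=["ext", "owner"], min_leaf=2)
print(predict(root, {"ext": "log", "owner": "db"}, (20.5, 3.0), k=3))  # → hot
```

The point of the tree stage in this sketch is that the heap scan runs only over the leaf's sub-collection rather than the whole training set, which is where a speedup over plain KNN would come from; the actual factor depends on how evenly the real DPT partitions the data.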
Keywords/Search Tags: large-scale storage systems, metadata, file access behavior, decision tree, KNN Classifier