
Research On The Storage Approach For Open Access Paper Resource On Hadoop

Posted on: 2015-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: Z Y Wang
Full Text: PDF
GTID: 2298330422470812
Subject: Computer application technology
Abstract/Summary:
With the rapid growth of Open Access (OA) papers on the Internet, traditional storage methods can no longer meet the demand of storing massive OA resources, and processing and storing these resources efficiently has become an urgent problem. As a popular cloud computing infrastructure, Hadoop provides a distributed file system, HDFS, which offers high fault tolerance, scalability and low-cost storage. However, HDFS does not perform well with massive numbers of small files, which lead to high memory usage and low access efficiency. Based on a comprehensive analysis of the current state of research at home and abroad, and taking advantage of the particular features of OA papers, this paper further investigates the small-file storage problem in HDFS.

Firstly, the Hadoop distributed file system and the MapReduce programming model are introduced, including their overall architecture, working mechanism, and file read and write processes.

Secondly, a distributed eigenvector construction algorithm and a distributed clustering algorithm, both based on MapReduce, are proposed to support the prefetching approach. Eigenvectors are constructed and clustered according to the characteristics of OA papers; the parallel framework of the algorithm on Hadoop is then given, and its implementation is described in detail.

Thirdly, to address the problem of storing massive OA papers in HDFS, an approach to storage, retrieval and prefetching based on the clustering result of the metadata is proposed. The storage of OA papers is followed by a method for distributed construction of the retrieval index based on Lucene, and by a prefetching mechanism based on users' access habits.

Finally, the storage approach proposed in this paper is evaluated experimentally and compared with traditional approaches on different data sets and with different numbers of accesses.
Keywords/Search Tags: Open Access paper, HDFS, MapReduce, Small file storage, Distributed clustering, Prefetching
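
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea it describes: small files that belong to the same cluster are merged into one larger file, and reading one paper prefetches the other papers in its cluster. It runs entirely in memory and does not touch HDFS, MapReduce or Lucene. All names (`pack_by_cluster`, `PrefetchingReader`, `cluster_of`) are hypothetical, and the cluster assignment is assumed to be given rather than computed.

```python
from collections import defaultdict


def pack_by_cluster(papers, cluster_of):
    """Merge small files into one blob per cluster.

    papers:     dict mapping file name -> bytes (the small files)
    cluster_of: dict mapping file name -> cluster id (assumed precomputed)
    Returns (blobs, index), where index maps name -> (cluster id, offset, length).
    """
    blobs = defaultdict(bytearray)
    index = {}
    for name, data in papers.items():
        cid = cluster_of[name]
        # The paper starts at the current end of its cluster's blob.
        index[name] = (cid, len(blobs[cid]), len(data))
        blobs[cid] += data
    return {cid: bytes(blob) for cid, blob in blobs.items()}, index


class PrefetchingReader:
    """Reads papers from merged blobs and caches the rest of the cluster."""

    def __init__(self, blobs, index):
        self.blobs = blobs
        self.index = index
        self.cache = {}
        self.blob_reads = 0  # counts simulated reads of a merged file

    def read(self, name):
        if name in self.cache:
            return self.cache[name]
        cid = self.index[name][0]
        blob = self.blobs[cid]  # stands in for one read of a merged file
        self.blob_reads += 1
        # Prefetch every paper stored in the same cluster blob.
        for other, (other_cid, offset, length) in self.index.items():
            if other_cid == cid:
                self.cache[other] = blob[offset:offset + length]
        return self.cache[name]


papers = {"a.pdf": b"AAA", "b.pdf": b"BB", "c.pdf": b"C"}
cluster_of = {"a.pdf": 0, "b.pdf": 0, "c.pdf": 1}
blobs, index = pack_by_cluster(papers, cluster_of)
reader = PrefetchingReader(blobs, index)
reader.read("a.pdf")  # loads cluster 0 and caches b.pdf as well
reader.read("b.pdf")  # served from the cache, no further blob read
```

In this toy setting the file system would track two merged files instead of three small ones, and the second read is answered from the cache. Whether the thesis uses this exact layout or cache policy is not stated in the abstract.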