OA Paper Storage And Retrieval Strategies Based On Hadoop

Posted on: 2015-06-14
Degree: Master
Type: Thesis
Country: China
Candidate: H Y Gao
Full Text: PDF
GTID: 2298330422971065
Subject: Computer system architecture
Abstract/Summary:
The number of Open Access (OA) journal papers is growing linearly along with internet resources. Given the challenge of storing and accessing massive numbers of OA papers, how to store them effectively and locate the right paper accurately is an urgent question. Hadoop's distributed storage framework is now widely applied in various fields, but HDFS, the Hadoop distributed file system, is weak at storing and managing small files. To take full advantage of this highly fault-tolerant and highly scalable distributed storage system as support for the underlying data, this paper proposes a Hadoop-based storage strategy for OA journal papers, and also studies retrieval and ranking algorithms suited to the particular characteristics of OA journal papers.
Firstly, on the basis of a review of related literature, this paper introduces the background and significance of a mass OA storage architecture on the Hadoop platform, gives an in-depth analysis of Hadoop's distributed storage and computing framework, and describes the current state of small-file strategies based on Hadoop.
Secondly, given the characteristics of the native Hadoop system, existing strategies for merged storage of small files cannot meet the storage requirements of OA journals, so this paper proposes a Hadoop-based small-file merging strategy. Targeting the characteristics of OA journal papers, it applies a B+tree indexing mechanism to paper file storage and establishes the MoB+tree indexing mechanism. This strategy improves file retrieval speed and eases the namespace shortage on the NameNode.
Thirdly, based on the nature of OA journal metadata, and to support ranking in which different label fields carry different weights, this paper puts forward an optimized retrieval algorithm for OA journal papers.
Finally, a Hadoop platform is built to verify the proposed method. The experimental results show that the method effectively improves the reading efficiency of OA journal papers on the Hadoop platform, and that the improved Lucene sorting algorithm effectively improves the scores in the ranking.
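The small-file merging idea summarized in the abstract can be illustrated with a minimal sketch. The abstract does not give the details of the MoB+tree mechanism, so everything here is an assumption for illustration only: the class name `MergedStore` is hypothetical, an in-memory buffer stands in for one large HDFS file, and a sorted key list with binary search stands in for the B+tree index. It is not the thesis's implementation.

```python
import bisect
import io


class MergedStore:
    """Packs many small files into one large blob and keeps a sorted index.

    Hypothetical illustration: the blob stands in for a single large HDFS
    file, and the sorted key list stands in for a B+tree index.
    """

    def __init__(self):
        self._blob = io.BytesIO()  # one merged container instead of many small files
        self._keys = []            # paper IDs, kept in sorted order
        self._entries = []         # (offset, length) pairs, parallel to _keys

    def put(self, paper_id: str, data: bytes) -> None:
        """Append a small file to the blob and record where it lives."""
        pos = bisect.bisect_left(self._keys, paper_id)
        if pos < len(self._keys) and self._keys[pos] == paper_id:
            raise KeyError(f"duplicate paper id: {paper_id}")
        offset = self._blob.seek(0, io.SEEK_END)  # always append at the end
        self._blob.write(data)
        self._keys.insert(pos, paper_id)
        self._entries.insert(pos, (offset, len(data)))

    def get(self, paper_id: str) -> bytes:
        """Locate a paper by binary search, then read only its byte range."""
        pos = bisect.bisect_left(self._keys, paper_id)
        if pos == len(self._keys) or self._keys[pos] != paper_id:
            raise KeyError(paper_id)
        offset, length = self._entries[pos]
        self._blob.seek(offset)
        return self._blob.read(length)


store = MergedStore()
store.put("paper-002", b"second paper body")
store.put("paper-001", b"first paper body")
print(store.get("paper-001"))
```

Because only one container file exists regardless of how many papers are stored, a file system that tracks metadata per file (as the NameNode does) would hold one entry instead of one per paper, which is the motivation the abstract gives for merging.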
Keywords/Search Tags: Hadoop, Paper Storage, File Index, Search Algorithm