Research And Implementation Of Mass Small Files Storage Of Social Network Based On HDFS

Posted on: 2017-07-23
Degree: Master
Type: Thesis
Country: China
Candidate: C M Bi
GTID: 2428330536962634
Subject: Computer technology
Abstract/Summary:
The development of information technology has produced explosive growth in data. Traditional storage approaches cannot cope with this growth, especially when the data consists of massive numbers of small files. Because of the sheer number of small files, metadata management and access performance face huge challenges, and the problem has become a focus of attention. This thesis summarizes the access characteristics of social network data made up of massive small files and studies in depth how to store such files in the Hadoop Distributed File System (HDFS). The research work of this thesis covers the following aspects:
(1) Small files are merged to reduce NameNode memory usage. The index file combines a global index with a local, per-block index.
(2) For merged files that are still smaller than the data block size, a secondary merge scheme is proposed. First, each user's related files are merged into a container; then merged files that remain smaller than the data block size are merged again; finally, the merged files are saved to HDFS.
(3) Traditional static merge schemes are not suited to users' dynamic access patterns, so this thesis proposes dynamic file merging based on access logs. Taking the consistency of merged files into account, it designs a frequent-item mining algorithm based on subset detection and uses it to find related small files and merge them dynamically. The dynamic merge scheme shifts the focus of the research from the file level to the user-access level and provides a theoretical basis for merging small files.
(4) Dynamically merged files can be used to predict a user's next access and guide the prefetching of small files, which improves the prefetch hit rate. Because too many prefetched files accumulate in the cache, a cache replacement algorithm based on a circular singly linked list is proposed to replace file content. The algorithm reduces the cache space occupied.
Experiments show that, for merged related files that are still smaller than a block, the scheme reduces access time. Sequential file read time on the improved HDFS is 88.2% of that on the original HDFS. The NameNode memory occupied by the merged files and the write time are also reduced.
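The abstract does not give the layout of the index, so the following is only a minimal in-memory sketch of the idea in aspects (1) and (2): small files are packed into block-sized containers, a global index maps each file name to its container, and a local index per container records the offset and length. The class name `MergedStore`, the first-fit placement, and the tiny block size are illustrative assumptions, not the thesis's implementation, and no HDFS API is used.

```python
from dataclasses import dataclass, field


@dataclass
class MergedStore:
    """Hypothetical container store; names and policy are assumptions."""
    block_size: int
    containers: list = field(default_factory=list)    # one bytearray per merged file
    global_index: dict = field(default_factory=dict)  # file name -> container id
    local_index: list = field(default_factory=list)   # per container: name -> (offset, length)

    def _new_container(self) -> int:
        self.containers.append(bytearray())
        self.local_index.append({})
        return len(self.containers) - 1

    def put(self, name: str, data: bytes) -> None:
        # First-fit (an assumed policy): reuse the first container with room,
        # so containers that are still below the block size get merged into again.
        for cid, buf in enumerate(self.containers):
            if len(buf) + len(data) <= self.block_size:
                break
        else:
            # No container has room; a file larger than block_size
            # simply gets a container of its own in this sketch.
            cid = self._new_container()
        buf = self.containers[cid]
        self.local_index[cid][name] = (len(buf), len(data))
        buf.extend(data)
        self.global_index[name] = cid

    def get(self, name: str) -> bytes:
        # Two-step lookup: global index finds the container,
        # local index finds the byte range inside it.
        cid = self.global_index[name]
        offset, length = self.local_index[cid][name]
        return bytes(self.containers[cid][offset:offset + length])


store = MergedStore(block_size=10)  # tiny block size, for illustration only
store.put("a.jpg", b"hello")
store.put("b.jpg", b"world!")
store.put("c.jpg", b"xyz")
print(store.get("c.jpg"))
```

With this layout the NameNode would track one entry per container rather than one per small file, which is the memory saving the abstract refers to; how the thesis actually persists the two indexes is not stated.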
Keywords/Search Tags: Social Network, Mass Small Files, HDFS, Dynamic Merge