Image Distribution Technology Based On Similar File Coordination In Data Center Environment

Posted on: 2016-11-27
Degree: Master
Type: Thesis
Country: China
Candidate: L C Wei
GTID: 2348330536467553
Subject: Software engineering
Abstract/Summary:
Top-k similar-file query is a nearest-neighbor search problem: given a specified file, find the k files most similar to it, which requires a file-similarity computation. File similarity is a basic operation that measures how alike two files are, and it is widely used in applications such as web page duplicate checking, article copy detection, and search engines. The traditional linear search method is simple and easy to implement, but its computational complexity is too high. The LSH (locality-sensitive hashing) method effectively reduces the computational complexity, but its query accuracy is not high. To address these problems, this thesis proposes a Top-k similar-file query algorithm based on Bloom filters and parallelizes it on the Spark platform. Experimental results show that the proposed Top-k query method not only has high precision but also good space and time efficiency.

Virtual machine image distribution has long been an important problem in cloud computing research. Rapid distribution of virtual machine images in a cloud environment is significant for speeding up the deployment of the applications that users need and for improving the service quality of the cloud computing platform. P2P-based image distribution is a popular and effective method. However, it ignores the similarity between stored images and cannot exploit the redundancy of data blocks in the network. In this thesis, we modify the BitTorrent source code and propose a new P2P image distribution protocol based on similar-file collaboration. The parallel Bloom-filter-based Top-k similar-file query described above is applied in the similar-file query module of the new protocol to improve query efficiency and accuracy. The new protocol opens a transmission channel between different download groups: a Peer can not only exchange data with other Peers that download the same image, but also obtain the required data from Seeds that downloaded a similar image, so the new protocol achieves cross-image distribution. Experiments show that our method increases the download rate by 10% to 70% compared with the standard BitTorrent protocol, while the overhead on the Tracker server is modest.
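The idea of a Bloom-filter-based Top-k similar-file query can be illustrated with a minimal single-machine sketch. The abstract does not give the chunking scheme, the hash functions, the filter size, or the similarity measure, so everything below is an assumption made for illustration: files are cut into fixed-size chunks, each chunk sets `K_HASHES` bits in an `M_BITS`-bit filter, and similarity is estimated as the ratio of shared set bits to all set bits. The names `bloom_of`, `similarity`, and `top_k` are hypothetical, and the Spark parallelization is not reproduced here.

```python
import hashlib
import heapq
from typing import Dict, List, Tuple

# All three constants are illustrative assumptions, not values from the thesis.
M_BITS = 4096   # Bloom filter size in bits
K_HASHES = 3    # number of hash functions per chunk
CHUNK = 4       # chunk size in bytes (tiny, so the demo data stays readable)


def bloom_of(data: bytes) -> int:
    """Build a Bloom filter over fixed-size chunks, stored as an int bitmask."""
    bits = 0
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        for seed in range(K_HASHES):
            # Derive K_HASHES hash functions by prefixing a seed byte.
            digest = hashlib.sha1(bytes([seed]) + chunk).digest()
            pos = int.from_bytes(digest[:8], "big") % M_BITS
            bits |= 1 << pos
    return bits


def similarity(a: int, b: int) -> float:
    """Jaccard-style estimate: shared set bits divided by all set bits."""
    union = bin(a | b).count("1")
    if union == 0:
        return 1.0  # two empty filters are treated as identical
    return bin(a & b).count("1") / union


def top_k(query: bytes, corpus: Dict[str, bytes], k: int) -> List[Tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    q = bloom_of(query)
    scored = ((similarity(q, bloom_of(data)), name) for name, data in corpus.items())
    return [(name, score) for score, name in heapq.nlargest(k, scored)]


if __name__ == "__main__":
    corpus = {
        "same": b"AAAABBBBCCCCDDDD",   # identical to the query
        "half": b"AAAABBBBXXXXYYYY",   # shares two of four chunks
        "none": b"1111222233334444",   # shares no chunks
    }
    for name, score in top_k(b"AAAABBBBCCCCDDDD", corpus, k=2):
        print(name, round(score, 3))
```

In this sketch the filters are compared with two bitwise operations instead of a chunk-by-chunk file comparison, which is where the space and time savings of a Bloom-filter approach would come from. In a parallel setting, the per-file scoring step is the part that could be distributed across workers.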
Keywords/Search Tags: similarity computing, Top-k query, parallel, image distribution, P2P
Related items