Font Size: a A A

Parallelization Optimization Of Vocabulary Tree Building Algorithm

Posted on:2015-11-25Degree:MasterType:Thesis
Country:ChinaCandidate:J X LiFull Text:PDF
GTID:2308330464963415Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Under the environment of the Big Data Era, the amount of data on the Internet becomes larger and larger, with an increasing growth rate. So the data-centric application emerges as the times require, leading to urgent needs of this kind of applications from users on the Internet. Among these data-centric applications, the rich and expressive application related to multimedia data has become the mainstream. These applications related to multimedia data make the basic support for the data-centric application, the image retrieval algorithm, becomes a new hotspot in research.However, in practice, since the image retrieval algorithm is both computationally intensive and data intensive, its efficiency may not satisfy the actual needs of users when it faces to large amounts of data. In the aspect of computation, the high dimension data and complex Euler distance computation lead to the great amount of computation and take a long time; in the aspect of input / output, large amounts of data will cause great pressures on input / output bandwidth, and even lead to repeating data input because of the memory capacity limitation, resulting to slow computation. As one of the mainstream image retrieval algorithm for feature matching, the Vocabulary Tree also has the problem on efficiency, especially in its training stage for tree building. Nowadays some general image libraries may contain millions of images, while the Vocabulary Tree building algorithm will take about 12 days in training for about 1.01 million images, including the processes of input / output and computation. This unacceptable time overhead urges the research for targeted performance optimization on the Vocabulary Tree building algorithm to satisfied users’ needs.In this paper, we do a detailed analysis on the basic logic and characteristics of Vocabulary Tree building algorithm, find out the its limitations on performance, combine it with the current growing popularity of multi-core processors and cluster environment, and finally implement various parallelized versions of Vocabulary Tree building algorithm. These different kinds of parallelization optimization designs give consideration to both computation intensity and data intensity, and reflect the variety of design ideas from the simple to the complex. These designs include different ways to handle parallel tasks and resources for computational parallelization, and data distribution and formatting for input / output. With considering differences between single machine and cluster environment are also considered, the efficiency of optimized Vocabulary Tree building algorithm on computation and input / output has been improved.The major conclusions in this paper include:● Analyze the Vocabulary Tree building algorithm’s main process: including characteristics on data intensive and computationally intensive, potential parallelization optimization space and major optimization difficulties.● Facing to the computation intensity on algorithm logic:design and implement the static (allocate processor core by estimating), semi-dynamic (dynamic invocate idle processor cores for small nodes), and dynamic (completely dominated by processor cores) computation parallel versions of the Vocabulary Tree building algorithm.● Facing to the data intensity on input / output: design and implement the input / output parallel version of the Vocabulary Tree building algorithm, then facing to larger amounts of data, analysis the limitation of parallelization on single machine, design and implement the cluster data parallel version of the Vocabulary Tree building algorithm under the MPI framework, and analysis the optimization from data formatting and SSD.Experimental results show that the 8-way single machine parallel version can achieve about 36X speedup on computation 36X speedup on input / output, and about 32X speedup in total for the 0.11 million image dataset; and the 4-machine 8-process cluster parallel version can achieve about 60X speedup in total for the larger 0.61 million image dataset, and even achieve about 300X speedup for 1.01 million image dataset, with data formatting and SSD.
Keywords/Search Tags:Image Retrieval, Vocabulary Tree, Algorithm, Performance, Parallelization
PDF Full Text Request
Related items