Data Index Technology Research Based On Parallel Computing Platform

Posted on: 2012-02-19
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B Long
GTID: 1118330335462393
Subject: Computer software and theory
Abstract/Summary:
With the popularity of personal computers and the expansion of computer networks, digital information is growing explosively. The generation, dissemination, collection, and retrieval of information have become basic needs of everyday life. Data indexing is the critical component of an information retrieval system, because it is what allows data to be managed and queried effectively and efficiently. However, as data volumes and query loads grow and energy-efficient computing becomes a requirement, conventional indexing, which raises processing capacity by adding more commodity computers, runs into an unavoidable bottleneck. Meanwhile, high-productivity parallel computing systems have advanced rapidly, driven by high-performance computing applications tied to daily life, such as web search engines and social networks. How to use low-power parallel computing platforms to improve the performance and capacity of information retrieval systems has therefore attracted growing attention from researchers in this field.
As parallel computing platforms develop, building index applications on them has become an inevitable trend. In this dissertation, we apply an integrated parallel research methodology spanning "Architecture - Algorithm - Programming - Application" and study in depth how to improve the performance of data-intensive indexing, how to parallelize a data-intensive index system starting from its serial version, and how to achieve good system cost-efficiency. We overcome research barriers that arise in the transition from the serial era to the parallel era; our methods help refine the system of parallel computing research and strengthen the practical applicability of parallel computing platforms.
Meanwhile, taking high-dimensional data indexing as a typical application on parallel computing platforms, we propose the HKD-tree hybrid index structure and a time series index, which address data indexing problems on such platforms. In addition, focusing on hot topics in data retrieval systems, we study the timeliness (real-time), performance, and power consumption problems in depth. By improving the traditional inverted index and solving the real-time problem for data-intensive systems on parallel computing platforms, we achieve highly efficient index processing. In summary, this dissertation studies data indexing on parallel computing platforms in order to improve index efficiency and parallel performance so that the platforms' computing power is fully utilized. The methods are of practical significance and have broad application prospects. The main contributions and innovations are as follows:
1. HKD-tree hybrid index structure based on a parallel computing platform. The structure combines a KD-tree with Locality-Sensitive Hashing (LSH): the KD-tree forms the upper trunk and LSH structures form the leaves, which lets the index exploit the hierarchical parallel structure of multi-core cluster systems. Compared with traditional index structures, this hybrid index has good parallel efficiency and scalability, and it is well suited to multi-core cluster systems and to high-dimensional data indexing.
2. A real-time updating inverted index based on the KD-60 domestic parallel platform. The scheme combines a main inverted index and an auxiliary inverted index with a content filtering index, supporting real-time querying together with content-filtered indexing so that search runs in real time. We also deployed this index on KD-60, a high-performance green computing platform, achieving a degree of high cost-efficiency.
3. A time series index structure based on a parallel computing platform. We propose a time series index for parallel computing platforms together with a power analysis model for it. Beyond solving the time series indexing problem, the scheme targets both high efficiency and low power consumption rather than performance alone. Experiments show that the time series index on a parallel computing platform solves the performance problem of time series indexing well while keeping power consumption low.
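The HKD-tree layout in item 1, a KD-tree trunk that routes each query to a leaf indexed by LSH, can be illustrated with a minimal serial sketch. This is not the dissertation's implementation: the median split on cycling axes, the p-stable Euclidean hash family h(v) = floor((a·v + b) / w), and the names `HKDNode`, `LSHLeaf`, `num_tables`, `num_hashes`, and `width` are assumptions made for illustration. On a multi-core cluster, each leaf yielded by `leaves()` would presumably be assigned to a node or core; here everything runs in one process.

```python
import math
import random
from collections import defaultdict


class LSHLeaf:
    """Hypothetical leaf: Euclidean (p-stable) LSH over the points routed to this leaf."""

    def __init__(self, points, rng, num_tables=4, num_hashes=3, width=1.0):
        dim = len(points[0])
        self.points = points
        self.width = width
        # Each table concatenates num_hashes hashes h(v) = floor((a.v + b) / width),
        # with a drawn from a Gaussian (2-stable) distribution and b uniform in [0, width).
        self.funcs = [
            [([rng.gauss(0.0, 1.0) for _ in range(dim)], rng.uniform(0.0, width))
             for _ in range(num_hashes)]
            for _ in range(num_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        for idx, p in enumerate(points):
            for funcs, table in zip(self.funcs, self.tables):
                table[self._key(funcs, p)].append(idx)

    def _key(self, funcs, v):
        return tuple(math.floor((sum(x * y for x, y in zip(a, v)) + b) / self.width)
                     for a, b in funcs)

    def candidates(self, q):
        found = set()
        for funcs, table in zip(self.funcs, self.tables):
            found.update(table.get(self._key(funcs, q), []))
        # If no bucket matches, fall back to scanning the whole (small) leaf.
        return [self.points[i] for i in found] or self.points


class HKDNode:
    """Hypothetical KD-tree trunk: median splits on cycling axes down to leaf_size points."""

    def __init__(self, points, leaf_size, rng, depth=0):
        self.leaf = None
        if len(points) <= leaf_size:
            self.leaf = LSHLeaf(points, rng)
            return
        self.axis = depth % len(points[0])
        ordered = sorted(points, key=lambda p: p[self.axis])
        mid = len(ordered) // 2
        self.split = ordered[mid][self.axis]
        self.left = HKDNode(ordered[:mid], leaf_size, rng, depth + 1)
        self.right = HKDNode(ordered[mid:], leaf_size, rng, depth + 1)

    def leaves(self):
        """Yield leaf LSH structures, the natural units to distribute across cores."""
        if self.leaf is not None:
            yield self.leaf
        else:
            yield from self.left.leaves()
            yield from self.right.leaves()

    def query(self, q):
        """Approximate nearest neighbour: descend the trunk, then rank LSH candidates."""
        node = self
        while node.leaf is None:
            node = node.left if q[node.axis] < node.split else node.right
        return min(node.leaf.candidates(q), key=lambda p: math.dist(p, q))


if __name__ == "__main__":
    rng = random.Random(42)
    data = [tuple(rng.random() for _ in range(8)) for _ in range(500)]
    index = HKDNode(data, leaf_size=64, rng=rng)
    print(index.query(data[10]) == data[10])  # True: an indexed point finds itself
```

Because the trunk sends a query to a single leaf, the search is approximate: a true nearest neighbour just across a split boundary can be missed, which is the usual price for confining work to one leaf (and hence one core).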
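The combination of a main index, an auxiliary index, and a content filtering index in item 2 can be sketched under a common design that the abstract does not spell out. In this sketch, new documents go into a small auxiliary index that is searchable immediately. The auxiliary index is merged into the main index once it reaches a threshold, and each query reads both indexes and then drops filtered documents. The class name `RealTimeInvertedIndex`, the `merge_threshold` parameter, and modeling the content filter as a set of blocked terms are hypothetical. Nothing specific to KD-60 is represented.

```python
from collections import defaultdict


class RealTimeInvertedIndex:
    """Hypothetical main + auxiliary inverted index with a content filter."""

    def __init__(self, blocked_terms, merge_threshold=1000):
        self.main = defaultdict(set)        # large index, updated only by merges
        self.aux = defaultdict(set)         # small index absorbing new documents
        self.blocked_terms = set(blocked_terms)
        self.filtered_docs = set()          # content filter: doc ids to suppress
        self.merge_threshold = merge_threshold
        self.aux_docs = 0

    def add_document(self, doc_id, text):
        terms = set(text.lower().split())
        if terms & self.blocked_terms:
            self.filtered_docs.add(doc_id)
        for term in terms:
            self.aux[term].add(doc_id)      # visible to queries immediately
        self.aux_docs += 1
        if self.aux_docs >= self.merge_threshold:
            self.merge()

    def merge(self):
        """Fold the auxiliary postings into the main index and reset it."""
        for term, postings in self.aux.items():
            self.main[term] |= postings
        self.aux.clear()
        self.aux_docs = 0

    def search(self, query):
        """Conjunctive (AND) query over both indexes, minus filtered documents."""
        terms = query.lower().split()
        if not terms:
            return set()
        result = None
        for term in terms:
            postings = self.main.get(term, set()) | self.aux.get(term, set())
            result = postings if result is None else result & postings
        return result - self.filtered_docs


if __name__ == "__main__":
    idx = RealTimeInvertedIndex(blocked_terms={"spam"}, merge_threshold=2)
    idx.add_document(1, "parallel index search")
    idx.add_document(2, "parallel spam offer")   # filtered; also triggers a merge
    idx.add_document(3, "Parallel Index")
    print(sorted(idx.search("parallel index")))  # [1, 3]
```

Keeping fresh documents in a separate small index avoids rewriting large main-index posting lists on every insert; the cost is that every query must consult both structures until the next merge.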
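The abstract does not state the form of the power analysis model in item 3. As an assumed baseline only, and not the dissertation's model, energy-aware evaluation commonly compares configurations by the energy consumed for a fixed workload rather than by runtime alone. Here $Q$ is the number of queries answered in time $T$, $P(t)$ is instantaneous power, and $\bar{P}$ is average power; these symbols are introduced for illustration:

```latex
E = \int_{0}^{T} P(t)\,dt \approx \bar{P}\,T,
\qquad
\eta = \frac{Q}{E} = \frac{Q}{\bar{P}\,T}
```

For instance, a configuration that draws half the average power but runs 1.5 times longer consumes $E' = (\bar{P}/2)(1.5\,T) = 0.75\,\bar{P}\,T$, i.e. 25% less energy despite being slower. This is why judging an index by performance alone can miss the more energy-efficient choice.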
Keywords/Search Tags: Parallel computing platform, data index, integration of research methods, inverted index, high efficiency, low power