Font Size: a A A

Study On Indexing And Query Processing Techniques For High Dimensional Data

Posted on:2010-01-31Degree:DoctorType:Dissertation
Country:ChinaCandidate:J M HuangFull Text:PDF
GTID:1228330371450328Subject:Computer software and theory
Abstract/Summary:PDF Full Text Request
Recently, as the development of the means of data collection, the amount of data that could be used by people has increased dramatically. It has been a hot topic in the academical community that how to manage these data in an efficient way and facilitate the user to find interesting information. Although high dimensional data index and its query evaluation have got attentions and deep research, there are still several problems to be solved:(1) traditional dimension reduction techniques avoid the overlap of index area in a certain extent, but the problems of high maintenance cost and discontented results still exist; (2) recent skyline cardinality tuning algorithm only cannot entirely control the number of query results, which means they cannot control the number of results accurately, and neglect the dealing when the original query results are too few; (3) traditional skyline query techniques are all proposed on certain database, so they cannot be applied directly on uncertain data.This dissertation deeply studies the techniques of high dimensional index and its query processing in detail. Its contributions are summarized as follows:(1) High dimensional index and its similarity search have been researched and a novel dimension reduction techniques and an innovative index structure, called MS-tree, have been proposed. We introduce the conception of active subspaces and inactive subspaces, and by analyzing the characters of high dimensional data research, we propose the conception of false active subspaces. Then a basic idea of solving the problem of accessing active subspaces is given, which is named maximal gap space mapping strategy (MaxGapMapping). Based on this basic idea, we design and implement a novel index structure, MS-tree, and its research algorithm has also been introduced. The experimental results show that MS-tree reduces the accesses to false active subspaces, comparing with other index and filtering techniques, and gets a better performance. (2)δ-skyline query of high dimensionality has also been researched. We proposeδ-skyline which can control the cardinality of skyline results, through the adjustment of special skyline, skyline and general skyline. Based on an improved SFS algorithm, a naiveδ-skyline query algorithm, NDA, is proposed. Another algorithm, NoS, based onδ-skyline cube derived from skyline cube, is also proposed. For the problem of too many repeat data points in NDA and the huge maintenance cost ofδ-skyline cube in NoS, an improvedδ-skyline query algorithm, ICA, is proposed. The experimental results show thatδ-skyline can satisfy the needs of users, which could offer an appropriate number of skyline results, and it also indicates that NDA, NoS and ICA are all accurate forδ-skyline.(3) Fuzzy skyline query based on fuzzy theory is researched. We analyze the characters of skyline points and non-skyline points, propose the conception of dominating degree skyline point and membership degree of non-skyline points respectively. And an efficient fuzzy skyline query algorithm, FSCA, is proposed to solve fuzzy skyline. The experimental results show that FSCA can control the number of skyline results from 0 to any size of collection, which much better helps the users to fill their needs.(4) The threshold skyline query on high-dimensional uncertain data is also discussed. We analyze the semantic characters of threshold skyline query on uncertain data, and then propose a basic algorithm, BPS, based on R-tree; through the analysis of the relationship between tuples and skyline probabilities, an improved BPS algorithm with effective filtering strategy, IPS, is proposed. The experimental results show that the performance of IPS can satisfy the needs of users and it is a useful threshold skyline query algorithm.(5) Uncertain skyline query on high-dimensional uncertain data is research in the article. Through the analysis of uncertain skyline query, we propose a basis algorithm, BUS, to solve uncertain skyline query, using the states space searching. Through the deep analysis of relations of different states, an improved algorithm FUS is proposed adding effective filtering strategy into BUS. The experimental results show that FUS has a better performance than BUS.
Keywords/Search Tags:High dimensional data, index, similarity query, skyline query, uncertainty, possible world
PDF Full Text Request
Related items