Analytical query processing in data intensive applications

Posted on:2006-02-12

Degree:Ph.D

Type:Dissertation

University:University of California, Santa Barbara

Candidate:Feng, Ying

Full Text:PDF

GTID:1458390008456937

Subject:Information Science

Abstract/Summary:

Recent advances in web applications and IT infrastructures have to deal with large amounts of data collected from the Internet and business transactions. Such large scale data sets need more advanced database support for analyzing and processing and querying. Analytical queries are one of the important operations for business analysis and decision-making. These queries involve a large number of data entities and pre-processing is usually used to improve online response time along with efficient query processing algorithms. This dissertation addresses the efficient and scalable online execution of analytical query processing, including aggregate queries and rank queries.; As the widely used pre-computation method for aggregate queries, data cubes are prohibitively expensive in terms of computation time and storage space. A new data structure, range trie, is proposed to capture and utilize data dependency, or data correlations, in data sets. Furthermore, a new cube representation is proposed to compress data cube storage without information loss by taking advantage of the partial order relationships in a data cube. Therefore, the approach effectively reduces both the computation and I/O cost for cube computation. In addition, a hierarchical indexing scheme based on range tries is also suggested to process point aggregation queries and range aggregation queries. For temporal data sets in which data are appended periodically, this dissertation addresses the challenges for range aggregate queries due to the sparsity and incremental property of temporal attributes. A new data structure named PBBT (Perfect Binary Block Tree) is used to ensure logarithmic time complexity for both query processing and data appending.; Moreover, the dissertation introduces a scalable and efficient approach to ranking top-k preference queries. An optimal search algorithm based on multidimensional index structures is investigated and analyzed for performance bottlenecks due to memory consumption. A notion of dominance rank is proposed to pre-process data sets to adaptively reduce data accesses at runtime for a given top-k query. Thus the method ensures high level of scalability and stable performance with bounded memory for increasing data set size.

Keywords/Search Tags:

Query processing, Queries, Data sets, New data structure

Related items

1	The Research Of Key Techniques Of Uncertain TOP-K Query Processing
2	Visual Construction Of Scientific Data Queries And Query Processing Optimization Techniques
3	Efficient structural query processing in XML databases
4	Fast Computation on Processing Data Warehousing Queries on GPU Devices
5	Data analysis and query processing in wireless sensor networks
6	Technology For Answering Queries On Incomplete Data
7	Tqindex:An Effective Index Structure For Processing Temporal Queries On Very Large Scale Of Data
8	Research On Key Technologies Of Fuzzy XML Data Storages And Queries
9	Efficient query processing for data integration
10	Online query processing in Geographic Information Systems