Font Size: a A A

Analytical query processing in data intensive applications

Posted on:2006-02-12Degree:Ph.DType:Dissertation
University:University of California, Santa BarbaraCandidate:Feng, YingFull Text:PDF
GTID:1458390008456937Subject:Information Science
Recent advances in web applications and IT infrastructures have to deal with large amounts of data collected from the Internet and business transactions. Such large scale data sets need more advanced database support for analyzing and processing and querying. Analytical queries are one of the important operations for business analysis and decision-making. These queries involve a large number of data entities and pre-processing is usually used to improve online response time along with efficient query processing algorithms. This dissertation addresses the efficient and scalable online execution of analytical query processing, including aggregate queries and rank queries.; As the widely used pre-computation method for aggregate queries, data cubes are prohibitively expensive in terms of computation time and storage space. A new data structure, range trie, is proposed to capture and utilize data dependency, or data correlations, in data sets. Furthermore, a new cube representation is proposed to compress data cube storage without information loss by taking advantage of the partial order relationships in a data cube. Therefore, the approach effectively reduces both the computation and I/O cost for cube computation. In addition, a hierarchical indexing scheme based on range tries is also suggested to process point aggregation queries and range aggregation queries. For temporal data sets in which data are appended periodically, this dissertation addresses the challenges for range aggregate queries due to the sparsity and incremental property of temporal attributes. A new data structure named PBBT (Perfect Binary Block Tree) is used to ensure logarithmic time complexity for both query processing and data appending.; Moreover, the dissertation introduces a scalable and efficient approach to ranking top-k preference queries. An optimal search algorithm based on multidimensional index structures is investigated and analyzed for performance bottlenecks due to memory consumption. A notion of dominance rank is proposed to pre-process data sets to adaptively reduce data accesses at runtime for a given top-k query. Thus the method ensures high level of scalability and stable performance with bounded memory for increasing data set size.
Keywords/Search Tags:Query processing, Queries, Data sets, New data structure
Related items