
Research On Key Methods Of Efficient Multi-dimensional Online Analytical Processing Query

Posted on: 2013-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Gao
GTID: 1228330377461080
Subject: Information management and information systems
Abstract/Summary:
As one of the key techniques of business intelligence (BI), online analytical processing (OLAP) is an important means of knowledge acquisition and decision support. The growth in data volume and data dimensionality brought about by advancing informatization, together with the high efficiency that decision support demands, has driven research on efficient OLAP queries. For decision support, how to improve the efficiency of multi-dimensional OLAP queries over massive, high-dimensional data sets, and thereby shorten query time, has become an important research topic.

The purpose of this dissertation is to achieve efficient multi-dimensional OLAP queries by studying key approaches for improving OLAP query efficiency. To make full use of data analysis, the dissertation draws on the concept of online analytical mining (OLAM): data mining techniques and statistical analysis approaches are integrated into an OLAP query framework, within which the key techniques are then studied. Because the data cube is the basis of OLAP queries and the way it is constructed directly affects query efficiency, the materialization of the data cube is studied. OLAP approximate query, which trades query accuracy for query time and can therefore markedly improve OLAP query efficiency, forms another main part of this dissertation. Recommending OLAP query dimensions improves query efficiency from yet another perspective: aimed at aiding decision making, it offers users the dimensions most closely related to the query target, so as to shorten query time.
To study the approaches above for improving OLAP query efficiency, the main work of this dissertation is as follows.

(1) The idea of data mining is introduced into research on improving OLAP query efficiency. The Apriori principle is applied to building the data cube, which is the basis of OLAP queries, with user interest proposed as the constraint for choosing frequent data cuboids. Based on frequently used queries, an iceberg data cube construction algorithm is designed, together with a method for incrementally updating the iceberg cube. Through partial materialization, the system can answer OLAP queries without real-time computation. Moreover, because the approach is based on the real query log, it strongly supports the queries users actually request, which further improves OLAP query efficiency while drastically reducing data storage space.

(2) Building a model for OLAP approximate query is another effective way to improve query efficiency. In this research, Copula theory is extended to a new field to build a statistical model on continuous dimensions for range queries. The model is used to extract a data synopsis that stores the relevant samples and parameter information. It drastically reduces data storage space and improves OLAP query efficiency while keeping accuracy acceptable. Several measures are taken to improve the accuracy of the approximate query model. First, to fit the marginal distribution of each dimension more precisely, nonparametric kernel density estimation is used instead of parametric models with known distributions, which extends the model's applicability to most types of data.
Second, taking into account the correlation that exists between dimensions, a Copula is used to fit the joint distribution, with the aim of extracting the dependency structure and fitting the data distribution more accurately. In addition, OLAP operations such as drill-down and roll-up are easy to implement on continuous dimensions without defining dimension levels in advance, which makes the OLAP query procedure flexible.

(3) The high-dimensional data environment is considered. The “C-Vine” pair Copula is adopted, taking the application of Copula to OLAP approximate query a step further. It fits the structure of OLAP data sets and takes into account the differing correlations between the various observation dimensions and the measure dimension, so that the dependency structure is constructed according to the characteristics of the samples. This further improves the accuracy of the query results derived from the model, especially in high-dimensional data environments.

(4) Also from the perspective of introducing data mining into OLAP queries, feature selection is applied to recommending OLAP query dimensions in high-dimensional OLAP data sets. Such data sets contain massive amounts of data, and different dimensions are correlated with one another to different degrees. This degrades OLAP query efficiency and interferes with the efficiency and accuracy of decisions. To remove dimensions that are redundant with respect to the query target, a dimension selection algorithm is designed to support the recommendation of OLAP query dimensions. Based on the class information of the decision attribute, the algorithm removes dimensions unrelated to the decision target and identifies sets of mutually correlated dimensions. The purpose of this work is to provide users with a reference set of query dimensions, so as to improve query and decision efficiency.
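Item (1) combines Apriori-style mining of the query log with iceberg materialization. The abstract gives no pseudocode, so the following is only a minimal Python sketch under stated assumptions: each logged query is reduced to the set of dimensions it groups by, a cuboid's support is the number of logged queries that group by a superset of its dimensions (standing in for the user-interest constraint), and the iceberg condition is COUNT ≥ `min_count`. The names `frequent_cuboids`, `IcebergCube`, `min_support` and `min_count` are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def frequent_cuboids(query_log, min_support):
    """Apriori over a query log. Each logged query is a transaction (the set of
    dimensions it groups by). Assumption: a cuboid is frequent if at least
    min_support logged queries group by a superset of its dimensions."""
    transactions = [frozenset(q) for q in query_log]
    level = {frozenset([d]) for t in transactions for d in t}  # 1-dimension candidates
    frequent, k = {}, 1
    while level:
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        kept = {c: s for c, s in counts.items() if s >= min_support}
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that have exactly k dimensions.
        # Prune step (downward closure): every (k-1)-subset must itself be frequent.
        level = {
            a | b
            for a, b in combinations(kept, 2)
            if len(a | b) == k
            and all(frozenset(s) in kept for s in combinations(a | b, k - 1))
        }
    return frequent


class IcebergCube:
    """Materialises only the frequent cuboids and, inside each one, only the
    cells whose COUNT reaches min_count (the iceberg condition)."""

    def __init__(self, dims, cuboids, min_count):
        self.dims = list(dims)                              # fact-row column names
        self.cuboids = [tuple(sorted(c)) for c in cuboids]  # group-by dimension tuples
        self.min_count = min_count
        self.cells = {}                                     # cuboid -> {cell: (count, sum)}

    def _aggregate(self, facts, cuboid, only=None):
        """COUNT and SUM(measure) per cell of one cuboid, optionally only for cells in `only`."""
        idx = [self.dims.index(d) for d in cuboid]
        agg = defaultdict(lambda: [0, 0.0])
        for row, measure in facts:
            cell = tuple(row[i] for i in idx)
            if only is None or cell in only:
                agg[cell][0] += 1
                agg[cell][1] += measure
        return agg

    def build(self, facts):
        for c in self.cuboids:
            agg = self._aggregate(facts, c)
            self.cells[c] = {k: (n, s) for k, (n, s) in agg.items() if n >= self.min_count}

    def insert(self, old_facts, new_facts):
        """Incremental update: only cells touched by new_facts are revisited.
        Retained cells add the delta directly; pruned cells are re-counted from
        old_facts (the facts before this batch) and kept if they now qualify."""
        for c in self.cuboids:
            delta = self._aggregate(new_facts, c)
            pruned = {k for k in delta if k not in self.cells[c]}
            recount = self._aggregate(old_facts, c, pruned) if pruned else {}
            for k, (dn, ds) in delta.items():
                n, s = self.cells[c][k] if k in self.cells[c] else recount.get(k, (0, 0.0))
                if n + dn >= self.min_count:
                    self.cells[c][k] = (n + dn, s + ds)


# Usage with a toy query log and fact table (both hypothetical).
log = [{"region", "year"}, {"region", "year"}, {"region"},
       {"product", "year"}, {"region", "year", "product"}]
cuboids = frequent_cuboids(log, min_support=3)
facts = [(("north", "tv", 2012), 10.0), (("north", "tv", 2013), 5.0),
         (("south", "pc", 2012), 7.0), (("north", "pc", 2012), 3.0)]
cube = IcebergCube(("region", "product", "year"), cuboids, min_count=2)
cube.build(facts)
cube.insert(facts, [(("south", "tv", 2013), 4.0)])
```

Pruned cells are re-counted from the old facts only when a new batch touches them, which is one simple way to keep the update incremental; the dissertation's own update method may differ.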
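For item (2), a range COUNT over two continuous dimensions can be estimated from a synopsis holding a row sample, kernel-density marginals and one copula parameter. The sketch below is illustrative only: it assumes a Gaussian copula (the abstract does not name the copula family), Gaussian kernels with Silverman's rule-of-thumb bandwidth, and a copula correlation estimated from rank-based normal scores. The bivariate normal CDF is computed with Plackett's identity and Simpson's rule so that only the standard library is needed. `CopulaSynopsis`, `estimate_count` and the other helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev

STD = NormalDist()   # standard normal N(0, 1)
EPS = 1e-12          # keeps copula arguments strictly inside (0, 1)


def silverman_bandwidth(xs):
    """Silverman's rule-of-thumb bandwidth for a Gaussian kernel."""
    return 1.06 * stdev(xs) * len(xs) ** (-1 / 5)


def kde_cdf(xs, h, x):
    """CDF at x of the Gaussian kernel density estimate built on sample xs."""
    return fmean(STD.cdf((x - xi) / h) for xi in xs)


def normal_scores(xs):
    """Map values to N(0, 1) scores through their ranks."""
    n = len(xs)
    scores = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=xs.__getitem__), start=1):
        scores[i] = STD.inv_cdf(rank / (n + 1))
    return scores


def pearson(a, b):
    ma, mb = fmean(a), fmean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


def bvn_cdf(h, k, rho, steps=200):
    """Bivariate standard normal CDF via Plackett's identity
    Phi2(h, k; rho) = Phi(h) Phi(k) + integral_0^rho phi2(h, k; r) dr,
    with the integral evaluated by Simpson's rule (steps must be even)."""
    def phi2(r):
        s = 1.0 - r * r
        return math.exp(-(h * h - 2 * r * h * k + k * k) / (2 * s)) / (2 * math.pi * math.sqrt(s))

    step = rho / steps
    total = phi2(0.0) + phi2(rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * phi2(i * step)
    return STD.cdf(h) * STD.cdf(k) + total * step / 3


def gaussian_copula(u, v, rho):
    """Assumption: Gaussian copula C(u, v) = Phi2(Phi^-1(u), Phi^-1(v); rho)."""
    u, v = min(max(u, EPS), 1 - EPS), min(max(v, EPS), 1 - EPS)
    return bvn_cdf(STD.inv_cdf(u), STD.inv_cdf(v), rho)


class CopulaSynopsis:
    """Synopsis for two continuous dimensions: a row sample (for the KDE
    marginals), the two bandwidths, the copula rho and the table size."""

    def __init__(self, rows, sample_size, seed=0):
        sample = random.Random(seed).sample(rows, sample_size)
        self.n_rows = len(rows)
        self.xs = [r[0] for r in sample]
        self.ys = [r[1] for r in sample]
        self.hx, self.hy = silverman_bandwidth(self.xs), silverman_bandwidth(self.ys)
        rho = pearson(normal_scores(self.xs), normal_scores(self.ys))
        self.rho = max(-0.999, min(0.999, rho))   # avoid the degenerate rho = +/-1

    def estimate_count(self, x_lo, x_hi, y_lo, y_hi):
        """Approximate COUNT(*) WHERE x_lo < x <= x_hi AND y_lo < y <= y_hi."""
        fx_lo, fx_hi = (kde_cdf(self.xs, self.hx, x) for x in (x_lo, x_hi))
        fy_lo, fy_hi = (kde_cdf(self.ys, self.hy, y) for y in (y_lo, y_hi))
        c = lambda u, v: gaussian_copula(u, v, self.rho)
        prob = c(fx_hi, fy_hi) - c(fx_lo, fy_hi) - c(fx_hi, fy_lo) + c(fx_lo, fy_lo)
        return self.n_rows * prob


# Usage: correlated demo data (assumed), a 1000-row synopsis, one range COUNT.
rng = random.Random(42)
rows = []
for _ in range(4000):
    x = rng.gauss(0.0, 1.0)
    rows.append((x, 0.8 * x + 0.6 * rng.gauss(0.0, 1.0)))
synopsis = CopulaSynopsis(rows, sample_size=1000)
estimate = synopsis.estimate_count(0.0, 1.0, 0.0, 1.0)
exact = sum(1 for x, y in rows if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0)
print(f"rho={synopsis.rho:.2f} estimate={estimate:.0f} exact={exact}")
```

Because the query bounds are arbitrary real numbers, drilling down or rolling up on a continuous dimension amounts to calling `estimate_count` with narrower or wider ranges; no dimension hierarchy has to be defined in advance.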
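Item (3) replaces the single multivariate copula with a C-vine of bivariate pair copulas. A minimal sketch of sequential C-vine estimation follows, under assumptions not stated in the abstract: every pair copula is Gaussian, each pair parameter is obtained by inverting Kendall's tau (ρ = sin(πτ/2)), and the tree roots are ordered with the measure first, followed by the dimensions most strongly associated with it, which is one way to reflect the differing correlations between observation dimensions and the measure. `fit_cvine`, `h_gauss` and the demo column names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD = NormalDist()
EPS = 1e-10
RHO_MAX = 0.999  # keeps sqrt(1 - rho^2) away from zero


def kendall_tau(a, b):
    """Kendall's tau-a in O(n^2); adequate for a small sample synopsis."""
    n, s = len(a), 0
    for i in range(n):
        for j in range(i + 1, n):
            da, db = a[i] - a[j], b[i] - b[j]
            s += ((da > 0) - (da < 0)) * ((db > 0) - (db < 0))
    return s / (n * (n - 1) / 2)


def pseudo_obs(xs):
    """Rank-based pseudo-observations in (0, 1)."""
    n = len(xs)
    u = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=xs.__getitem__), start=1):
        u[i] = rank / (n + 1)
    return u


def h_gauss(u, v, rho):
    """h-function of the Gaussian pair copula: conditional CDF of U given V = v."""
    x = STD.inv_cdf(min(max(u, EPS), 1 - EPS))
    y = STD.inv_cdf(min(max(v, EPS), 1 - EPS))
    return STD.cdf((x - rho * y) / math.sqrt(1 - rho * rho))


def fit_cvine(columns, measure):
    """Sequential C-vine fit with Gaussian pair copulas.

    columns: dict name -> list of values; measure: name of the measure column.
    Returns (order, trees): trees[t] maps (order[t], j) to the rho of the pair
    copula for (order[t], j) conditioned on order[:t]."""
    u = {name: pseudo_obs(vals) for name, vals in columns.items()}
    others = [c for c in columns if c != measure]
    strength = {c: abs(kendall_tau(u[measure], u[c])) for c in others}
    # Assumed ordering: measure is the root of tree 1, then the dimensions
    # most strongly associated with it become the roots of later trees.
    order = [measure] + sorted(others, key=strength.get, reverse=True)
    trees = []
    for t, root in enumerate(order[:-1]):
        tree = {}
        for j in order[t + 1:]:
            tau = kendall_tau(u[root], u[j])
            rho = math.sin(math.pi * tau / 2)   # Gaussian family: rho = sin(pi * tau / 2)
            tree[(root, j)] = max(-RHO_MAX, min(RHO_MAX, rho))
        for j in order[t + 1:]:
            # Condition every later variable on this tree's root before the next tree.
            u[j] = [h_gauss(a, b, tree[(root, j)]) for a, b in zip(u[j], u[root])]
        trees.append(tree)
    return order, trees


# Usage with synthetic data (assumed): d_strong and d_weak depend on sales only.
rng = random.Random(7)
cols = {"sales": [], "d_weak": [], "d_noise": [], "d_strong": []}
for _ in range(400):
    m = rng.gauss(0.0, 1.0)
    cols["sales"].append(m)
    cols["d_weak"].append(0.4 * m + math.sqrt(0.84) * rng.gauss(0.0, 1.0))
    cols["d_noise"].append(rng.gauss(0.0, 1.0))
    cols["d_strong"].append(0.9 * m + math.sqrt(0.19) * rng.gauss(0.0, 1.0))
order, trees = fit_cvine(cols, "sales")
print(order)
```

In the demo, `d_strong` and `d_weak` depend on each other only through `sales`, so the conditional pair parameter estimated in the second tree should be close to zero. Other pair-copula families would change only `h_gauss` and the tau-to-parameter inversion.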
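Item (4) removes dimensions unrelated to the decision target and groups mutually correlated ones. The abstract does not specify the selection criterion, so the sketch below swaps in a filter in the style of FCBF (Fast Correlation-Based Filter) using symmetric uncertainty on categorical dimensions; it illustrates the idea and is not the dissertation's algorithm. `recommend_dimensions`, `min_relevance` and the toy table are hypothetical.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (bits) of a discrete sequence."""
    n = len(values)
    return -sum(c / n * math.log2(c / n) for c in Counter(values).values())


def symmetric_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)), in [0, 1]."""
    ha, hb = entropy(a), entropy(b)
    if ha + hb == 0:
        return 0.0
    mutual_info = ha + hb - entropy(list(zip(a, b)))
    return 2 * mutual_info / (ha + hb)


def recommend_dimensions(table, target, min_relevance=0.05):
    """FCBF-style filter (assumed stand-in for the dissertation's algorithm).

    table: dict dimension -> list of categorical values; target: class labels
    of the decision attribute. Returns (recommended, groups), where groups maps
    each recommended dimension to the correlated dimensions it stands in for."""
    relevance = {d: symmetric_uncertainty(v, target) for d, v in table.items()}
    # Step 1: drop dimensions unrelated to the decision target.
    ranked = sorted((d for d in table if relevance[d] >= min_relevance),
                    key=relevance.get, reverse=True)
    recommended, groups = [], {}
    for d in ranked:
        # Step 2: d is redundant if an already-chosen dimension is at least as
        # strongly related to d as the target is (the FCBF redundancy rule).
        for kept in recommended:
            if symmetric_uncertainty(table[kept], table[d]) >= relevance[d]:
                groups[kept].append(d)
                break
        else:
            recommended.append(d)
            groups[d] = []
    return recommended, groups


# Usage with a toy decision table (hypothetical data).
purchased = [1, 1, 1, 1, 0, 0, 0, 0]
table = {
    "region":      ["n", "n", "n", "s", "s", "s", "s", "n"],
    "region_code": [10, 10, 10, 20, 20, 20, 20, 10],   # relabelled copy of region
    "channel":     ["web", "web", "shop", "web", "shop", "shop", "web", "shop"],
    "weekday":     ["mon", "tue", "mon", "tue", "mon", "tue", "mon", "tue"],
}
recommended, groups = recommend_dimensions(table, purchased)
print(recommended, groups)
```

Here `weekday` carries no information about `purchased` and is dropped, while `region_code` is a relabelled copy of `region`, so only one of the two is recommended and the other is listed in its group.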
Keywords/Search Tags: OLAP Query Efficiency, Iceberg Data Cube, OLAP Approximate Query, Copula, Kernel Density Estimation, OLAP Query Recommendation, Variable Selection