Font Size: a A A

Research Of Selectivity Estimation Algorithm For String Predicates Based On Modiifed PST

Posted on:2014-04-26Degree:MasterType:Thesis
Country:ChinaCandidate:Q X ZhangFull Text:PDF
GTID:2268330425975868Subject:Software engineering
Abstract/Summary:PDF Full Text Request
XML’s advantages of self-describing, verifiability and scalability make it quickly beaccepted by the Computer World and widely used, as it gradually becomes the importantstandard of data representation and data exchange, the complexity and the scale of XML datahave greatly increased, this has proposed a new requirement on the management of massXML data. The most effective and straight method is using database technique to manageXML data. Query processing is the eternal theme of database, The performance of the XMLquery optimization algorithm influences the performance of XML query directly. In the queryoptimization algorithm that is based on cost estimation, the selectivity of predicate is oneindispensable factor when estimating query cost. Generally, the selectivity of predicate isobtained through querying the statistical information of the values of the XML documentelements, but because the storage space is limited, the statistical information of the values ofthe XML document elements is not complete, so it is required to estimate the selectivity ofpredicate through specific estimation strategy.State-of-art XML document string type value statistical information techniques mainlybased on pruned count-suffix tree (PST) which is used as the outline statistical informationstore structure of string type value. The selectivity of string predicate is estimated throughspecific selectivity estimation algorithm, but these estimation algorithm based on PST oftensuffer severe underestimating and overestimating problems, thus their average relative errorsare not good.To improve the estimation accuracy of string predicate selectivity estimation algorithmbased on PST, we analyze the main causes of the underestimating and overestimatingproblems of existing algorithms, KVI and MO, then propose a novel restricted prunedcount-suffix tree (RPST) and a new pruning strategy. Based on these, we present the EKVIalgorithm and the EMO algorithm which are extended from the KVI algorithm and the MOalgorithm respectively,and then describe the application of RPST on multiple string booleanquery predicate estimation algorithm. The experiments compare the EKVI algorithm and theEMO algorithm with the traditional KVI algorithm and the MO algorithm, and the results show that the average relative errors of our selectivity estimation algorithms are significantlybetter than the traditional selectivity estimation algorithms. The EMO algorithm is the bestamong these algorithms from the overall view.
Keywords/Search Tags:Query optimization, Selectivity estimation, String predicate, PST
PDF Full Text Request
Related items