
Data Cube Implementation Of Dimension Frequent Itemset

Posted on: 2019-02-28; Degree: Master; Type: Thesis
Country: China; Candidate: T Guan; Full Text: PDF
GTID: 2348330566464634; Subject: Engineering·Computer Technology
Abstract/Summary:
Purpose: High-dimensional data sets with many hierarchy levels cause the curse of dimensionality when data cubes are constructed. To address the high number of dimensions, this thesis proposes finding the key dimensions and building and storing data cubes only for those dimensions. To address the many hierarchy levels, it proposes hierarchical coding of the hierarchy keywords: the fine-grained hierarchical data sets are stored, and coarse-grained data sets are aggregated from them.
Design/methodology/approach: An improved Apriori algorithm is used to find the key dimensions and obtain the frequent itemsets of dimensions. An improved fragmentation method materializes the data in fragments that combine hierarchical and non-hierarchical dimensions, and retains an association index for the data across fragments.
Findings: First, building an inverted table while searching for key dimensions reduces the I/O overhead of scanning and counting the database when candidate itemsets are generated, and expressing the transaction lists in the inverted table in binary reduces storage overhead. Second, the key dimensions are divided into fragments, the data cube is fully materialized within each fragment, and the sub-cubes are reduced by combining one hierarchical dimension with multiple non-hierarchical dimensions. In addition, hierarchical coding of the hierarchy keywords reduces the storage pressure of the data.
Research limitations/implications: (1) Finding the frequent itemsets of dimensions requires determining thresholds and incurs additional time and storage overhead. (2) Queries on dimensions that are not included in the data cube must fall back to the original data warehouse.
Practical implications: Building data cubes only for the key dimensions reduces the use of some invalid data, improves the efficiency of data cube construction, and reduces storage pressure.
Originality/value: Building data cubes of the key dimensions reduces the use of some invalid data and improves the efficiency of data cube construction.
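The inverted table with binary transaction lists mentioned in the findings can be illustrated with a minimal sketch. It is not the thesis's implementation, only a generic Apriori variant under stated assumptions: each "transaction" is taken to be the set of dimensions touched by one query in a hypothetical workload log, each dimension's transaction list is stored as a Python integer used as a bitmap, and the names `build_inverted_table` and `frequent_dimension_itemsets` are invented for this example.

```python
from itertools import combinations


def popcount(bits):
    """Number of set bits, i.e. number of transactions in the list."""
    return bin(bits).count("1")


def build_inverted_table(transactions):
    """One scan of the data: map each dimension to a bitmap in which
    bit i is set when transaction i contains that dimension."""
    table = {}
    for tid, dims in enumerate(transactions):
        for dim in dims:
            table[dim] = table.get(dim, 0) | (1 << tid)
    return table


def frequent_dimension_itemsets(transactions, min_support):
    """Apriori over the inverted table. Support of a candidate is the
    popcount of the AND of its parents' bitmaps, so the database is
    not rescanned for each level. min_support is an assumed absolute
    count threshold."""
    table = build_inverted_table(transactions)
    current = {
        frozenset([dim]): bits
        for dim, bits in table.items()
        if popcount(bits) >= min_support
    }
    result = {}
    k = 1
    while current:
        for itemset, bits in current.items():
            result[itemset] = popcount(bits)
        next_level = {}
        for (a, bits_a), (b, bits_b) in combinations(current.items(), 2):
            candidate = a | b
            if len(candidate) != k + 1 or candidate in next_level:
                continue
            # Apriori pruning: every k-subset must itself be frequent.
            if any(candidate - {d} not in current for d in candidate):
                continue
            bits = bits_a & bits_b
            if popcount(bits) >= min_support:
                next_level[candidate] = bits
        current = next_level
        k += 1
    return result
```

Storing each transaction list as a bitmap means that intersecting two lists is a single bitwise AND, which is one plausible reading of how a binary representation reduces both counting cost and storage; the thesis may encode the lists differently.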
Keywords/Search Tags: Data mining, Data cube, Apriori algorithm, Fragmented data cube, Dimension-frequent items