Data Cube Implementation Of Dimension Frequent Itemset

Posted on:2019-02-28

Degree:Master

Type:Thesis

Country:China

Candidate:T Guan

GTID:2348330566464634

Subject:Engineering·Computer Technology

Abstract/Summary:

Purpose: In current practice, building data cubes over high-dimensional data sets with many hierarchy levels runs into the curse of dimensionality. To handle the large number of dimensions, the thesis proposes identifying the key dimensions and building and storing data cubes only for them. To handle the many hierarchy levels, it proposes hierarchically encoding the hierarchical keywords and storing only the fine-grained hierarchical data; coarse-grained data are then aggregated from the fine-grained data.

Design/methodology/approach: An improved Apriori algorithm finds the key dimensions by mining the frequent itemsets of dimensions. An improved fragmentation method materializes the data in fragments that combine hierarchical and non-hierarchical dimensions, and keeps an association index that links data across fragments.

Findings: First, an inverted list is built while searching for key dimensions, which reduces the I/O overhead of scanning the database and counting support when candidate itemsets are generated; the transaction lists in the inverted list are stored in binary form, which reduces storage overhead. Second, the key dimensions are partitioned into fragments, the data cube of each fragment is fully materialized, and the number of sub-cubes is reduced by combining one hierarchical dimension with several non-hierarchical dimensions in each fragment. In addition, the hierarchical keywords are encoded hierarchically to reduce the storage load.

Research limitations/implications: (1) Mining the frequent itemsets of dimensions requires choosing a support threshold and adds time and storage overhead. (2) Queries on dimensions that are not included in the data cube must still be answered from the original data warehouse.

Practical implications: Building data cubes only for the key dimensions avoids processing some invalid data, improves the efficiency of data cube construction, and reduces storage pressure.

Originality/value: Constructing data cubes for the key dimensions reduces the use of invalid data and improves the efficiency of data cube construction.
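The key-dimension search in the Findings can be illustrated with a minimal Apriori sketch that keeps, for each dimension, its transaction list as a bit vector (a Python int), so the support of a candidate is a popcount of an AND instead of a rescan of the database. This is a sketch under stated assumptions, not the thesis's algorithm: the abstract does not say what a transaction is (here, hypothetically, the set of dimensions referenced by one analysis query), and the names `build_inverted_bitsets`, `support` and `apriori_bitset` are illustrative only.

```python
from functools import reduce
from itertools import combinations
from operator import and_


def build_inverted_bitsets(transactions):
    """Inverted list: dimension -> int whose bit `tid` is set iff transaction `tid` contains it."""
    inverted = {}
    for tid, dims in enumerate(transactions):
        for dim in dims:
            inverted[dim] = inverted.get(dim, 0) | (1 << tid)
    return inverted


def support(itemset, inverted):
    """Support = number of set bits after AND-ing the bit vectors (no database rescan)."""
    return bin(reduce(and_, (inverted[d] for d in itemset))).count("1")


def apriori_bitset(transactions, min_support):
    """Return {frozenset_of_dimensions: support} for every frequent dimension itemset."""
    inverted = build_inverted_bitsets(transactions)
    frequent = {}
    # Level 1: frequent single dimensions.
    level = []
    for dim in inverted:
        s = support([dim], inverted)
        if s >= min_support:
            itemset = frozenset([dim])
            frequent[itemset] = s
            level.append(itemset)
    k = 2
    while level:
        prev = set(level)
        candidates = set()
        # Join step: merge two frequent (k-1)-itemsets into a k-itemset.
        for a, b in combinations(level, 2):
            union = a | b
            # Prune step: every (k-1)-subset must itself be frequent.
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        level = []
        for cand in candidates:
            s = support(cand, inverted)
            if s >= min_support:
                frequent[cand] = s
                level.append(cand)
        k += 1
    return frequent


# Hypothetical query log: each entry is the set of dimensions one query used.
log = [
    {"time", "region", "product"},
    {"time", "region"},
    {"time", "product"},
    {"time", "region", "product"},
    {"customer"},
]
freq = apriori_bitset(log, min_support=2)
# {"time", "region", "product"} is frequent with support 2; "customer" (support 1) is dropped.
```

Storing each transaction list as one integer means a candidate's support costs a few word-level ANDs, which is the kind of I/O and storage saving the abstract attributes to the binary inverted list.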
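The abstract states that hierarchical keywords are encoded so that only fine-grained data need to be stored and coarse-grained data can be aggregated from them. One way to get that property, assumed here rather than taken from the thesis, is a fixed-width code in which each hierarchy level occupies a fixed number of bits: the code of any ancestor is then a right shift of the finest-level code. `HierarchyEncoder`, `aggregate`, the bit widths and the sample locations are all hypothetical.

```python
from collections import defaultdict


class HierarchyEncoder:
    """Hypothetical fixed-width hierarchical coding.

    Level i gets bits_per_level[i] bits; a member's code is its parent's code
    shifted left plus its ordinal among its siblings, so every ancestor's code
    is a right shift of the finest-level code.
    """

    def __init__(self, bits_per_level):
        self.bits = list(bits_per_level)
        # children[level][parent_code] -> {member_name: ordinal}
        self.children = [{} for _ in self.bits]

    def encode(self, path):
        if len(path) != len(self.bits):
            raise ValueError("path must name one member per level")
        code = 0
        for level, name in enumerate(path):
            siblings = self.children[level].setdefault(code, {})
            if name not in siblings:
                if len(siblings) >= 1 << self.bits[level]:
                    raise ValueError(f"level {level} has too many members under one parent")
                siblings[name] = len(siblings)
            code = (code << self.bits[level]) | siblings[name]
        return code

    def roll_up(self, code, level):
        """Truncate a finest-level code to its ancestor's code at `level` (0 = top)."""
        return code >> sum(self.bits[level + 1:])


def aggregate(fine_rows, encoder, level):
    """Derive a coarse-grained data set by summing fine-grained (code, measure) rows."""
    totals = defaultdict(int)
    for code, measure in fine_rows:
        totals[encoder.roll_up(code, level)] += measure
    return dict(totals)


enc = HierarchyEncoder([4, 6, 8])  # e.g. country / province / district
rows = [
    (enc.encode(("China", "Beijing", "Haidian")), 10),
    (enc.encode(("China", "Beijing", "Chaoyang")), 5),
    (enc.encode(("China", "Shanghai", "Pudong")), 7),
]
print(aggregate(rows, enc, 1))  # {0: 15, 1: 7}  (Beijing, Shanghai)
print(aggregate(rows, enc, 0))  # {0: 22}        (China)
```

Only the fine-grained rows are stored; each roll-up is an integer shift followed by a group-by, so no coarse-level copies of the data are kept.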
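The fragment-based materialization in the Findings can be sketched in the spirit of the shell-fragment approach: each fragment fully materializes all group-bys over its own dimensions, and every cell keeps the set of row ids it covers, which serves as the index linking data across fragments. The abstract does not give the thesis's exact fragment grouping or index format, so the pairing of one hierarchical dimension with a non-hierarchical one, the names `materialize_fragment` and `point_query`, and the sample rows are hypothetical.

```python
from itertools import combinations


def materialize_fragment(rows, dims):
    """Fully materialize one fragment: for every non-empty subset of `dims`,
    map each cell (tuple of member values) to the set of row ids it covers.
    The row-id sets double as the index that links cells across fragments."""
    cube = {}
    for r in range(1, len(dims) + 1):
        for group in combinations(dims, r):
            cells = {}
            for tid, row in enumerate(rows):
                key = tuple(row[d] for d in group)
                cells.setdefault(key, set()).add(tid)
            cube[group] = cells
    return cube


def point_query(fragments, rows, measure, conditions):
    """Sum `measure` over rows matching `conditions` (dimension -> value)
    by intersecting the row-id sets looked up in each fragment."""
    covered = {d for dims, _ in fragments for d in dims}
    missing = set(conditions) - covered
    if missing:
        # Dimensions outside every fragment must go back to the warehouse.
        raise KeyError(f"not materialized: {sorted(missing)}")
    tids = set(range(len(rows)))
    for dims, cube in fragments:
        group = tuple(d for d in dims if d in conditions)
        if group:
            key = tuple(conditions[d] for d in group)
            tids &= cube[group].get(key, set())
    return sum(rows[t][measure] for t in tids)


rows = [
    {"city": "Haidian", "product": "A", "month": "2018-01", "channel": "web", "sales": 10},
    {"city": "Haidian", "product": "B", "month": "2018-01", "channel": "store", "sales": 5},
    {"city": "Pudong", "product": "A", "month": "2018-02", "channel": "web", "sales": 7},
    {"city": "Pudong", "product": "A", "month": "2018-01", "channel": "store", "sales": 3},
]
# Hypothetical grouping: each fragment pairs one hierarchical dimension
# (city, month) with a non-hierarchical one (product, channel).
fragments = [
    (dims, materialize_fragment(rows, dims))
    for dims in (("city", "product"), ("month", "channel"))
]
print(point_query(fragments, rows, "sales", {"product": "A", "channel": "web"}))  # 17
```

Each fragment stores 2^d - 1 cuboids for its d dimensions instead of one cuboid per subset of all dimensions; a query that spans fragments pays for a set intersection instead. The `KeyError` branch mirrors limitation (2): dimensions outside the materialized fragments cannot be answered from the cube.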

Keywords/Search Tags:

Data mining, Data cube, Apriori algorithm, Fragmented data cube, Dimension-frequent items

Related items

1	OLAP Algorithm Research Based On Dimension Hierarchy For Data Cube
2	Techniques Research For Data Cube Compression
3	Multidimensional Data Model For Mining And Analysis Based On Multiple Structure Data Cube
4	Research Of OLAP And Data Mining Technology Based On Water Supply Data Cube Of Quantity And Charge
5	The Online Mining Of Data Cube Gradient
6	Study And Implementation On Frequent Closed Cube Mining Algorithm Of Three Dimensional Microarray Data Sets
7	Research On Data Mining Algorithms Based On Association Rules
8	Research On Count-based Algorithm For Mining Frequent Items Over Data Stream
9	Medical-Information-Based Data Mining Research
10	Association Rule Mining Technique Based On Data Cube