
Relational Database Compression Based on Tuples-Clustering

Posted on: 2008-12-17
Degree: Master
Type: Thesis
Country: China
Candidate: X J Zhang
GTID: 2178360215957241
Subject: Computer software and theory
Abstract/Summary:
Database compression is an important branch of data compression. Classical database compression methods compress the database as a stream, taking into account neither how the redundancy is distributed nor how the compressed data can be stored in an orderly way. In this thesis, based on an analysis of classical database compression methods, a database compression method based on tuple clustering is presented and studied. The main work and results are as follows:

Firstly, a database compression mechanism based on tuple clustering is established, and a database compression system is designed. The system divides compression into two separate phases: the tuples in the database are first grouped according to their redundancy, and then compressed.

Secondly, considering the actual data found in databases, we improve and optimize the initial parameters of the K-means algorithm to make it suitable for clustering database tuples. We propose and design a clustering cost function, present an algorithm that optimizes the value of k, and improve the algorithm that creates the initial center tuples, so that K-means clustering started from these initial center tuples is suitable for database tuple compression.

Finally, based on the tuple clustering, a group-center reference mode is proposed, and the clustering results are linked by reference relations. We then design a tuple-level difference compression algorithm to compress the tuples in the database. Using the reference relations, the compressed data are stored as a reference tree, some operations on the reference tree are defined, and the reference relations and some information about the database are retained to support decompression.
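The two-phase idea described in the abstract (group tuples by redundancy, then store each tuple as a difference from its group center) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the thesis's actual algorithm: the function names (`mismatch`, `fit_centers`, `diff_encode`, and so on) are hypothetical, distance is taken to be the number of differing attributes, and centers are updated with the per-column most frequent value, a k-modes-style stand-in for the thesis's improved K-means. The clustering cost function, the optimization of k, and the reference tree are not reproduced here.

```python
from collections import Counter

def mismatch(a, b):
    """Assumed distance: number of attribute positions where two tuples differ."""
    return sum(x != y for x, y in zip(a, b))

def nearest(row, centers):
    """Index of the center with the fewest differing attributes."""
    return min(range(len(centers)), key=lambda j: mismatch(row, centers[j]))

def fit_centers(rows, initial_centers, rounds=5):
    """K-means-style loop: assign rows to centers, then recompute each
    center as the per-column most frequent value of its group."""
    centers = list(initial_centers)
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for row in rows:
            groups[nearest(row, centers)].append(row)
        centers = [
            tuple(Counter(col).most_common(1)[0][0] for col in zip(*group))
            if group else centers[j]  # keep an empty group's old center
            for j, group in enumerate(groups)
        ]
    return centers

def diff_encode(center, row):
    """Keep only (position, value) pairs where the row differs from its center."""
    return [(i, v) for i, (c, v) in enumerate(zip(center, row)) if c != v]

def diff_decode(center, delta):
    """Rebuild a row by patching the center with the stored differences."""
    row = list(center)
    for i, v in delta:
        row[i] = v
    return tuple(row)

def compress(rows, centers):
    """Store each row as (center index, difference list)."""
    encoded = []
    for row in rows:
        j = nearest(row, centers)
        encoded.append((j, diff_encode(centers[j], row)))
    return encoded

def decompress(centers, encoded):
    return [diff_decode(centers[j], delta) for j, delta in encoded]

rows = [
    ("Beijing", "CS", "M"),
    ("Beijing", "CS", "F"),
    ("Shanghai", "EE", "M"),
    ("Shanghai", "EE", "M"),
    ("Beijing", "CS", "M"),
]
centers = fit_centers(rows, initial_centers=[rows[0], rows[2]])
encoded = compress(rows, centers)
restored = decompress(centers, encoded)
```

In this sketch a tuple identical to its center is stored as an empty difference list, so the more redundant a group is, the less is stored per tuple. The centers must be kept alongside the encoded rows, which loosely mirrors the abstract's point that reference information is retained to support decompression.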
Keywords/Search Tags: Database compression, Tuples clustering, Difference compression, Reference tree