
Mining Algorithms Based on Compressed Data

Posted on: 2009-09-16 | Degree: Master | Type: Thesis
Country: China | Candidate: H T Ma | Full Text: PDF
GTID: 2208360245460191 | Subject: Computer software and theory
Abstract/Summary:
Data compression technology can improve storage efficiency and increase database capacity. Data mining analyzes and processes large volumes of data and helps people obtain useful, conclusive information or knowledge. It has become one of the most advanced and active research topics in information-based decision-making. Although data mining has been studied extensively in recent years, little research has addressed mining compressed data. Research on data mining algorithms for compressed data is therefore important in both theory and practice. This thesis studies data mining techniques for compressed data, covering association rule mining, classification, and clustering algorithms. First, we present a compression algorithm called H_ItCompress. It carefully considers how representative records are selected and allows one compressed record to correspond to more than one representative record, so it achieves a better compression ratio than other compression algorithms. Second, we propose an association rule mining algorithm called C_SPARMing and a classification algorithm called CMSA_CBA for compressed data. Both algorithms apply to data compressed by representative-row-based algorithms such as H_ItCompress. Both also operate directly on the compressed data without first decompressing it, which makes them efficient and scalable. Third, we propose a clustering algorithm called CCMD_P for multidimensional data sets compressed by a mapping-complete compression algorithm. CCMD_P operates directly on compressed data and combines partitioning and hierarchical clustering methods. As a result, it can find clusters of any kind while achieving higher efficiency and better scalability. Finally, we implement a prototype system for mining compressed data.
Keywords/Search Tags: Data Compression, Data Mining, Association Rule, Classification, Clustering
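The abstract describes representative-row compression and mining directly on compressed data only at a high level. The Python sketch below illustrates the general idea under loudly stated assumptions. Rows are compressed against a given set of representative rows. Each compressed record may reference several representative rows that cover disjoint attribute subsets, and unmatched values are stored as outliers. Itemset support is then counted from the compressed records without rebuilding full rows. The encoding and the names `CompressedRecord`, `compress`, `value_at`, and `support` are hypothetical. This is not the thesis's H_ItCompress or C_SPARMing, and it does not reproduce the thesis's strategy for selecting representative rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical encoding (an assumption, not the thesis's format): each entry in
# `refs` means "take these attribute positions from representative row rep_id".
# The attribute sets are disjoint, and any attribute left uncovered is an outlier.
@dataclass
class CompressedRecord:
    refs: list[tuple[int, frozenset[int]]]
    outliers: dict[int, object] = field(default_factory=dict)


def compress(rows: list[tuple], reps: list[tuple]) -> list[CompressedRecord]:
    """Greedily cover each row with representative rows (illustrative only)."""
    out = []
    for row in rows:
        uncovered = set(range(len(row)))
        refs = []
        while uncovered:
            # Choose the representative row matching the most uncovered attributes.
            best_id, best_attrs = None, frozenset()
            for rid, rep in enumerate(reps):
                attrs = frozenset(a for a in uncovered if rep[a] == row[a])
                if len(attrs) > len(best_attrs):
                    best_id, best_attrs = rid, attrs
            if best_id is None:
                break  # no representative row matches any remaining attribute
            refs.append((best_id, best_attrs))
            uncovered -= best_attrs
        # Whatever is still uncovered is stored explicitly as an outlier value.
        out.append(CompressedRecord(refs, {a: row[a] for a in uncovered}))
    return out


def value_at(rec: CompressedRecord, reps: list[tuple], attr: int) -> object:
    """Read one attribute value from a compressed record."""
    if attr in rec.outliers:
        return rec.outliers[attr]
    for rid, attrs in rec.refs:
        if attr in attrs:
            return reps[rid][attr]
    raise KeyError(attr)


def support(compressed: list[CompressedRecord], reps: list[tuple], itemset) -> int:
    """Count records containing every (attr, value) item in `itemset`.

    Only the attributes named in the itemset are looked up, so full rows
    are never reconstructed.
    """
    return sum(
        all(value_at(rec, reps, a) == v for a, v in itemset)
        for rec in compressed
    )


if __name__ == "__main__":
    reps = [("a", "x", 1), ("b", "y", 2)]
    rows = [("a", "x", 2), ("b", "x", 1), ("c", "y", 2)]
    comp = compress(rows, reps)
    print(support(comp, reps, [(1, "x")]))          # → 2
    print(support(comp, reps, [(1, "x"), (2, 1)]))  # → 1
```

In this toy run, the first row is covered by two representative rows (attributes 0 and 1 from the first, attribute 2 from the second). This is one possible reading of "one compressed record corresponding to more than one representative record." The greedy cover is a simple stand-in chosen for clarity. A real algorithm would also choose the representative rows themselves to optimize the compression ratio.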