Font Size: a A A

Research And Application On Matrix Compression Algorithms For Linear Model

Posted on:2020-05-09Degree:MasterType:Thesis
Country:ChinaCandidate:M X BaoFull Text:PDF
GTID:2370330602452530Subject:Computer Science and Technology
Abstract/Summary:PDF Full Text Request
The explosion of data volume is a double-edged sword in the information time.Large-scale data matrices contain a large amount of information and also cause computer storage pressure and bandwidth pressure.As a result,larger and larger data brings a big challenge to the data storage and the efficiency of machine learning models.Thus,efficient compression of large-scale data matrices and compressed matrix operations tailored to the compression scheme are key technologies to address the challenge.Researchers found that large-scale machine learning data matrices have great compression potential.The general compression software and the sparse matrix compression algorithm supporting the SpMV(Sparse Matrix Vector Multiplication)can not solve the problem well.Therefore,it is necessary to study the matrix compression algorithm supporting the compressed matrix operations.This paper proposes a lossless matrix compression algorithm named COMC(Column-oriented Matrix Compression)that supports random access matrix rows and columns.The COMC uses a column-based compression framework to compressed each column of matrix in two layers.The first layer uses dictionary encoding,the idea is map the high frequent elements in the column into smaller integers to achieve the compression of the logical layer,and the output matrix called the dictionary code matrix.The second layer uses integer encoding,the bit layer encoding the dictionary code matrix,and the output matrix called compressed matrix and some auxiliary information.The COMC uses a column-based matrix compression framework and two layers of compression structures per column to compress the matrix,make full use of the general characteristics of the large-scale data matrices for machine learning models,and further effectively balancing the compression ratio and decompression efficiency.This paper proposes random access matrix row and column algorithms,like AccessR(k)and AccessC(k),supporting the compressed matrix operation,and applies the SIMD(Single Instruction Multiple Data)instructions in the access algorithms to further accelerate the algorithms.It has been found that the main computational quantities in the optimization process of machine learning models such as linear models are concentrated on matrix operations such as a?X,v~T?X and X?v etc.Therefore,this paper implements the efficient compressed matrix operations on the COMC compression matrix,like a?C,v~T?C and C?v etc.And this paper further implements the compressed simple linear regression model and the compressed logistic regression model with the compressed matrix operation as the main computational quantities.The compression matrix operation,like a?C,v~T?C and C?v etc.by randomly accessing the matrix rows and columns,completes the compressed matrix operation on the COMC compressed matrix without completely decoding the entire matrix.The compressed simple linear regression model and the compressed logistic regression model use the COMC compression matrix C as the input sample,and the optimization process of the model parameters is implemented by the compressed matrix operation.In this paper,We compared COMC algorithm,general compression software and sparse matrix compression algorithm supporting SpMV.The performance of the above algorithms are evaluated and analyzed in terms of compression ratio,decompression efficiency,compressed matrix operation efficiency and compressed linear model optimization efficiency.The experiment results show that the COMC algorithm performs well considering the compression ratio and decompression efficiency,and it is more flexible in the compressed matrix operation because it supports efficient random access rows and columns of the matrix.Furthermore,from the two aspects of compressed matrix operation performance and compressed linear model performance,COMC algorithm can better balance compression ratio and compressed matrix operation efficiency and compressed linear model optimization efficiency.Therefore,COMC algorithm can be used as a better compression scheme for the compressed linear model.
Keywords/Search Tags:Lossless compression, Dictionary encoding, Integer encoding, Random access, Compressed matrix operation, Compressed linear model
PDF Full Text Request
Related items