
Low-rank Matrix Approximation Algorithm And Its Applications

Posted on: 2014-01-07    Degree: Doctor    Type: Dissertation
Country: China    Candidate: G Z Yuan    Full Text: PDF
GTID: 1228330401960174    Subject: Computer application technology

Abstract/Summary:
With the growing popularity of personal computers and the rapid development of the Internet, it has become easier and easier for people to collect and store data, and a large amount of data has accumulated in all fields of scientific research and social life over the last decade. How to analyze these data, as well as how to manage them effectively, has therefore become a common concern in both computer science and applied mathematics. Many machine learning and data management tasks can be expressed in matrix form; examples include kernel learning, metric learning and differentially private data analysis. However, real-world applications often involve millions or even billions of records or samples, and the space and time complexity of current matrix-based data analysis techniques scales roughly quadratically with the size of the problem, which makes many applications quickly intractable as the problem grows large. How to approximate the target matrix and make data analysis techniques more accurate and scalable is therefore a hot topic in the machine learning and data management communities. Inspired by recent work on sparse and low-rank matrix approximation techniques such as Support Vector Machines (SVMs), Compressed Sensing (CS) and Non-Negative Matrix Factorization (NMF), researchers have proposed a variety of matrix-based data analysis approaches.

In this thesis, we discuss low-rank matrix approximation algorithms and their applications in machine learning and data management. Generally speaking, the contributions of this thesis are three-fold.

1) We propose an efficient algorithm for the low-rank quadratic semidefinite programming problem. Low-rank matrix approximation is an attractive model for large-scale machine learning problems, because it not only reduces memory and runtime complexity, but also provides a natural way to regularize parameters while preserving learning accuracy. We address a special class of nonconvex quadratic matrix optimization problems that require a low-rank positive semidefinite solution. Despite their non-convexity, we exploit the structure of these problems to derive an efficient solver that converges to their local optima. Furthermore, we show that the proposed solution dramatically enhances the efficiency and scalability of a variety of concrete problems of significant interest to the machine learning community, including the Top-k Eigenvalue Problem, Distance Learning and Kernel Learning. Extensive experiments on UCI benchmarks demonstrate the effectiveness and efficiency of the proposed method. (A toy sketch of the underlying low-rank factorization strategy is given after the list of contributions.)
2) We propose a bilateral greedy optimization algorithm for large-scale low-rank semidefinite programming. Many machine learning tasks (e.g. metric and manifold learning problems) can be formulated as convex semidefinite programs. To enable these tasks at large scale, scalability and computational efficiency are desirable properties for a practical semidefinite programming algorithm. We propose a new bilateral greedy optimization approach (denoted BILGO) to solve general semidefinite programs on large-scale datasets. Compared to existing methods, BILGO employs a bilateral search strategy during each optimization iteration. In such an iteration, the current semidefinite matrix solution is updated as a bilateral linear combination of the previous solution and a suitable rank-1 matrix, which can be efficiently computed from the leading eigenvector of the descent direction at that iteration. By optimizing the coefficients of the bilateral combination, BILGO reduces the cost function in every iteration, thus guaranteeing convergence to a global optimum. For an ε-accurate solution, the number of BILGO iterations needed for convergence is O(1/ε). The proposed algorithm thus successfully combines the efficiency of conventional rank-1 update algorithms with the effectiveness of gradient descent. Last but not least, BILGO can easily be extended to handle low-rank constraints. The low-rank optimization techniques consistently improve on the greedy baseline implementation, yielding a robust and fast semidefinite programming algorithm. Our experimental analysis on large real datasets clearly demonstrates the superiority of BILGO over state-of-the-art methods. (A toy version of the bilateral rank-1 update is sketched below.)

3) We propose an efficient and accurate algorithm for optimizing batch linear counting queries under differential privacy. Differential privacy is a promising privacy-preserving paradigm for statistical query processing over sensitive data. It works by injecting random noise into each query result, such that it is provably hard for an adversary to infer the presence or absence of any individual record from the published noisy results. The main objective in differentially private query processing is to maximize the accuracy of the query results while satisfying the privacy guarantees. Previous work, notably the matrix mechanism, has suggested that with an appropriate strategy, processing a batch of correlated queries as a whole achieves considerably higher accuracy than answering them individually. However, to our knowledge there is currently no practical solution for finding such a strategy for an arbitrary query batch; existing methods either return strategies of poor quality (often worse than naive methods) or require prohibitively expensive computations even for moderately large domains. Motivated by this, we propose the Low-Rank Mechanism (LRM), the first practical differentially private technique for answering batch queries with high accuracy, based on a low-rank approximation of the workload matrix. We prove that the accuracy provided by LRM is close to the theoretical lower bound for any mechanism answering a batch of queries under differential privacy. Extensive experiments using real data demonstrate that LRM consistently outperforms state-of-the-art query processing solutions under differential privacy, by large margins. (A matrix-mechanism-style sketch of the low-rank idea appears below.)
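To make the low-rank positive semidefinite idea in contribution 1 concrete, here is a minimal Python/NumPy sketch, not the thesis's solver: it restricts the variable to X = Y Yᵀ, so X is automatically positive semidefinite with rank at most k, and runs plain gradient descent on the resulting nonconvex factored objective. The nearest-low-rank-PSD-matrix problem is used as a toy stand-in for the quadratic objectives treated in the thesis, and the function name, step size and iteration count are illustrative assumptions.

    import numpy as np

    def lowrank_psd_fit(C, k, lr=0.01, iters=500, seed=0):
        # Toy instance of the low-rank PSD strategy (illustrative only):
        # parameterize X = Y @ Y.T so X is PSD with rank <= k, then run
        # gradient descent on the nonconvex factored objective
        # 0.5 * ||Y Y^T - C||_F^2 for symmetric C, converging to a
        # local optimum of the factored problem.
        rng = np.random.default_rng(seed)
        n = C.shape[0]
        Y = 0.1 * rng.standard_normal((n, k))
        for _ in range(iters):
            R = Y @ Y.T - C          # residual; gradient w.r.t. Y is 2*R@Y
            Y -= lr * 2.0 * R @ Y
        return Y @ Y.T               # low-rank PSD approximation of C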
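The bilateral update of contribution 2 can be illustrated on a toy convex objective. The sketch below is a simplification under stated assumptions, not the BILGO algorithm itself: it minimizes 0.5·||X − C||²_F over PSD matrices, where each iteration takes the leading eigenvector of the descent direction −∇f(X), forms a rank-1 matrix from it, and finds the two nonnegative bilateral coefficients exactly by nonnegative least squares.

    import numpy as np
    from scipy.optimize import nnls

    def bilgo_toy(C, iters=50):
        # Toy bilateral greedy iteration for min_{X PSD} 0.5*||X - C||_F^2
        # (a stand-in for the general objectives BILGO handles).
        n = C.shape[0]
        X = np.zeros((n, n))
        for _ in range(iters):
            G = X - C                      # gradient of the toy objective
            w, V = np.linalg.eigh(-G)      # eigen-decompose the descent direction
            v = V[:, -1]                   # leading eigenvector
            R = np.outer(v, v)             # rank-1 candidate, PSD
            # bilateral step X <- alpha*X + beta*R with alpha, beta >= 0;
            # for this quadratic objective the coefficients solve a
            # 2-variable nonnegative least-squares problem exactly
            A = np.column_stack([X.ravel(), R.ravel()])
            (alpha, beta), _ = nnls(A, C.ravel())
            X = alpha * X + beta * R       # stays PSD: nonneg. combination
        return X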
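Finally, the low-rank idea behind LRM in contribution 3 can be sketched in matrix-mechanism style. In the toy code below the workload factorization comes from a truncated SVD purely for illustration; the actual LRM optimizes the factorization B, L to minimize expected error, and the function name and parameters are assumptions.

    import numpy as np

    def lrm_style_answer(W, x, eps, r, rng=None):
        # Answer the batch of linear counting queries W @ x under
        # eps-differential privacy via a low-rank factorization W ~= B @ L.
        # B and L come from a truncated SVD here for illustration only;
        # LRM itself optimizes this factorization.
        rng = np.random.default_rng() if rng is None else rng
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        B = U[:, :r] * s[:r]               # m x r
        L = Vt[:r, :]                      # r x n
        # L1 sensitivity of x -> L @ x: adding/removing one record changes
        # one entry of x by 1, so the max column L1-norm of L bounds it
        delta = np.abs(L).sum(axis=0).max()
        noisy = L @ x + rng.laplace(scale=delta / eps, size=r)
        return B @ noisy                   # noisy answers to all m queries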
Keywords/Search Tags: Low-Rank, Metric Learning, Kernel Learning, Sparse Eigenvalue Decomposition, Differential Privacy, Semi-definite Programming, Largest Eigenvalue