Accelerated Multi-task Online Learning Algorithm For Big Data Analytics

Posted on: 2016-06-02
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Z J Li
Full Text: PDF
GTID: 1318330482957964
Subject: Computer software and theory
Abstract/Summary:
The advent of big data has brought a wide array of applications that require real-time processing of massive, high-velocity data. Mining big data streams across a wide range of real-world applications is becoming increasingly important. Conventional batch machine learning techniques suffer from many limitations when applied to big data analytics tasks. Online learning, which operates in a stream computing mode, is a promising tool for learning from data streams. A large number of online learning algorithms have been proposed so far. They fall into three categories: 1) online learning for linear models; 2) kernel-based online learning for nonlinear models; 3) non-traditional online learning methods. The first two categories make up the classical online learning methods, while the non-traditional methods are more recent and promise to tackle the emerging challenges of mining big data in real-world applications. In this dissertation, we first introduce the motivation and background of big data analytics and give an overview of online learning algorithms and their key problems. We then investigate in detail the family of non-traditional online learning methods and their typical applications in big data analytics. The main contributions of the dissertation are summarized as follows.

(1) A multi-task accelerated online learning algorithm is proposed and applied to large-scale collaborative filtering to obtain the user matrix and the item matrix. In traditional online learning algorithms, the object being learned is a weight vector, and the convergence rate is only O(1/√T) up to the T-th iteration.
To tackle this problem, we propose a novel multi-task accelerated online learning algorithm in which multiple related tasks are learned simultaneously by exploiting information shared across tasks, an approach that has demonstrated advantages over models learned on individual tasks. Moreover, an improved micro-batch acceleration technique is adopted, which attains the optimal convergence rate O(1/T²). The proposed multi-task accelerated online learning algorithm can solve large-scale collaborative filtering problems more effectively.

(2) An accelerated online learning algorithm for group LASSO is proposed; the group LASSO model is widely applied in biological information analysis. Batch-mode group LASSO algorithms suffer from inefficiency and poor scalability. To tackle this problem, we develop a novel accelerated online learning algorithm to solve the sparse group LASSO model, which achieves sparsity at both the group level and the individual feature level. We provide a rigorous theoretical analysis of the accelerated online algorithm. Moreover, we derive a closed-form solution for updating the weight vector based on the average of previous gradients, which yields worst-case time complexity and memory cost per iteration both on the order of O(d), so the proposed online algorithm is efficient and scalable. Experimental results on synthetic and real-world datasets demonstrate the merits of the proposed online algorithm for sparse group LASSO.

(3) An online multiple kernel learning algorithm based on nonlinear group LASSO is proposed, and its convergence rate and worst-case mistake bound are analyzed. For big data with diverse data sources and complicated models, online optimal kernel learning often suffers from many limitations when applied to big data analytics tasks.
To tackle this problem, we formulate a closed-form solution for optimizing the kernel weights, which yields the nonlinear group LASSO model of multiple kernel learning. The stochastic gradient descent method is applied to solve the equivalent model. In the algorithm, truncation and speedup techniques are applied to effectively solve the online kernel expansion problem. A detailed theoretical analysis of the algorithm's convergence rate and mistake bound is provided. Moreover, imbalanced online learning with kernels is investigated, which arises in various real-world applications, such as detecting abnormal behaviors in surveillance systems, fraud in credit card transactions, and click behaviors in online ads/news.

(4) A method combining increment of diversity with quadratic discriminant analysis is proposed and used to predict gene splice sites in real time. Traditional batch-mode methods are clearly unsuited to analytics on huge volumes of gene sequencing data. Moreover, comparing the compositional features of two sequences and comparing base dependencies at adjacent or non-adjacent positions of two sequences are necessary for biological information analysis. To tackle this problem, motivated by the characteristics of base composition and base correlation in adjacent sequence segments, we propose the method of increment of diversity combined with quadratic discriminant analysis to study the dependence structure of splice sites and to predict gene splice sites in online learning mode, which has good potential for wider adoption and application in gene sequencing big data analytics.
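
As an illustration of the kind of closed-form, O(d)-per-iteration update described in contribution (2), the following minimal sketch shows a dual-averaging style step for sparse group LASSO. It is an assumption about the general form of such an update, not the dissertation's actual algorithm: the function names, the parameters lam1 (feature-level penalty), lam2 (group-level penalty), and gamma (step-size constant), and the sqrt(t)/gamma scaling are all hypothetical choices made for this example. The step soft-thresholds the averaged gradient elementwise, then shrinks each group by its norm.

```python
import math


def soft_threshold(x, lam):
    """Scalar soft-thresholding: move x toward zero by lam, clipping at zero."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0


def dual_averaging_step(grad_avg, groups, t, lam1, lam2, gamma):
    """One closed-form weight update from the running average of gradients.

    grad_avg: list of floats, average of all gradients seen so far (length d)
    groups:   list of index lists partitioning range(d) (hypothetical layout)
    t:        current iteration count (t >= 1)
    lam1:     feature-level (L1) penalty, an assumed parameter
    lam2:     group-level (L2 norm) penalty, an assumed parameter
    gamma:    positive step-size constant, an assumed parameter
    """
    w = [0.0] * len(grad_avg)
    scale = math.sqrt(t) / gamma
    for idx in groups:
        # Feature-level sparsity: soft-threshold each coordinate of the group.
        c = [soft_threshold(grad_avg[i], lam1) for i in idx]
        norm = math.sqrt(sum(v * v for v in c))
        # Group-level sparsity: the whole group stays zero unless its
        # thresholded norm exceeds lam2.
        if norm > lam2:
            shrink = scale * (1.0 - lam2 / norm)
            for i, v in zip(idx, c):
                w[i] = -shrink * v
    return w


if __name__ == "__main__":
    grad_avg = [3.0, 4.0, 0.1, -0.2]
    groups = [[0, 1], [2, 3]]
    print(dual_averaging_step(grad_avg, groups, t=4, lam1=0.0, lam2=1.0, gamma=1.0))
```

Each coordinate is visited once per call, so time and memory per iteration are linear in d, and only the running gradient average has to be stored between iterations.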
Keywords/Search Tags: big data streaming, online learning, accelerated, multi-task, group LASSO, increment of diversity