
Research on an Improved Parameter Server for Genome-Wide Problems

Posted on: 2017-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Yuan
Full Text: PDF
GTID: 2180330488464360
Subject: Software engineering
Abstract/Summary:
With the declining cost of high-throughput sequencing, research on genome-wide data has begun to develop rapidly. Because of the surge in data scale, earlier approaches based on traditional statistical analysis suffer from heavy workloads, low efficiency, and related problems. Large-scale machine learning for genome-wide data has therefore become an important direction for research and development. Many organizations have tried to apply general-purpose distributed computing frameworks such as Hadoop and Spark to this problem, but the results have not been satisfactory, mainly because those frameworks are not well suited to genome-wide machine learning. This thesis therefore presents a distributed computing architecture based on the parameter server to address genome-wide machine learning problems.

The parameter server has emerged over the past two years as an abstraction for distributed machine learning frameworks and is now applied in large advertising systems and artificial intelligence systems. The concept was first proposed by Alex Smola in 2010 in the design of a parallel LDA framework, and it drew wide industry attention as the solution behind Google Brain in 2012. The core of the architecture is to separate model parameter storage and updating into independent components and to use an asynchronous mechanism to raise processing capacity. This design efficiently addresses the inefficient iteration caused by parameter non-homogeneity when solving large-scale machine learning problems, and it greatly reduces the resources wasted on communication, coordination, and waiting. At the same time, this optimization allows model-solving efficiency to grow linearly as machines are added, offering a new way to approach genome-wide machine learning.

This thesis first analyzes systematically the computational difficulties of genome-wide machine learning problems, and then summarizes and discusses the abstractions and applicability of existing mainstream distributed computing frameworks. Targeting the efficiency of genome-wide machine learning, it adapts the FTRL algorithm to improve the traditional parameter server architecture and develops an improved parameter server model called GW-PS. GW-PS prevents over-fitting and promotes sparsity in the learned model, and thus adapts better to genome-wide data. On this basis, the thesis also improves the traditional convolutional neural network structure according to the practical needs of gene sequence specificity recognition, and compares model training efficiency in detail on GW-PS and on the Spark architecture. The experiments show that, for genome-wide machine learning problems, GW-PS outperforms the traditional Spark architecture in both efficiency and performance, and they demonstrate the feasibility of the parameter server, a recent technology, for bioinformatics problems.
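Two ideas in the abstract are easier to see in code: a server component that owns parameter storage and updating, and a sparsity-inducing FTRL-Proximal update applied when workers push gradients to it. The sketch below is a minimal, single-process illustration of that pattern under stated assumptions, not the thesis's GW-PS implementation; the class and parameter names (ParameterServer, FTRLProximal, alpha, beta, l1, l2) and the toy data are hypothetical, and a real system would shard keys across machines and make push/pull asynchronous over the network.

```python
import numpy as np


class FTRLProximal:
    """Per-coordinate FTRL-Proximal update (McMahan et al., 2013): the kind of
    sparsity-inducing, regularized update the thesis adapts for GW-PS.
    Hyperparameters are illustrative defaults, not values from the thesis."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)  # accumulated adjusted gradients
        self.n = np.zeros(dim)  # accumulated squared gradients

    def weights(self):
        # Closed-form per-coordinate solution; the L1 threshold zeroes out
        # coordinates with little evidence, keeping the model sparse.
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2
        )
        return w

    def apply_gradient(self, grad):
        # Fold a freshly pushed gradient into the accumulated statistics.
        w = self.weights()
        sigma = (np.sqrt(self.n + grad ** 2) - np.sqrt(self.n)) / self.alpha
        self.z += grad - sigma * w
        self.n += grad ** 2


class ParameterServer:
    """Parameter storage and updating as an independent component: workers
    pull() the current weights and push() gradients; the server owns the
    update rule. Real deployments shard keys and serve push/pull over RPC."""

    def __init__(self, dim):
        self.opt = FTRLProximal(dim)

    def pull(self):
        return self.opt.weights()

    def push(self, grad):
        self.opt.apply_gradient(grad)


def logistic_grad(w, x, y):
    """Gradient of the logistic loss for one example with label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x


# Toy worker loop on synthetic, genotype-like features (coded 0/1/2).
rng = np.random.default_rng(0)
server = ParameterServer(dim=20)
for _ in range(200):
    x = rng.integers(0, 3, size=20).astype(float)
    y = float(x[0] + x[1] > 2)           # synthetic label, illustration only
    w = server.pull()                     # worker pulls the current model
    server.push(logistic_grad(w, x, y))   # worker pushes its local gradient
print("non-zero weights:", np.count_nonzero(server.pull()))
```

The point of the sketch is the division of labor described in the abstract: workers only compute gradients, while the server component owns the parameter state and its FTRL update, whose L1 term yields the sparsity the thesis attributes to GW-PS.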
Keywords/Search Tags: Genome-Wide, Parameter Server, Machine Learning, FTRL, Parallel Computing