
Research on an Improved Parameter Server for Genome-Wide Problems

Posted on: 2017-03-11
Degree: Master
Type: Thesis
Country: China
Candidate: S Yuan
Full Text: PDF
GTID: 2180330488464360
Subject: Software engineering
Abstract/Summary:
With the declining cost of high-throughput sequencing, research on genome-wide data has begun to develop rapidly. Because of the surge in data scale, earlier approaches based on traditional statistical analysis suffer from heavy workloads, low efficiency, and related problems. Large-scale machine learning for genome-wide data has therefore become an important direction for research and development. Many organizations have tried to apply general-purpose distributed computing frameworks such as Hadoop and Spark to this problem, but the results have not been satisfactory, mainly because those frameworks are not well suited to genome-wide machine learning. This thesis therefore presents a distributed computing architecture based on the parameter server to address genome-wide machine learning problems.

The parameter server has emerged over the past two years as an abstraction for distributed machine learning frameworks and is now applied in large advertising systems and artificial intelligence systems. The concept was first proposed by Alex Smola in 2010 in the design of a parallel LDA framework, and it drew wide industry attention as the solution behind Google Brain in 2012. The core of the architecture is to separate model parameter storage and updating into independent components and to use an asynchronous mechanism to raise processing capacity. This design efficiently addresses the inefficient iteration caused by parameter non-homogeneity when solving large-scale machine learning problems, and it greatly reduces the resources wasted on communication, coordination, and waiting. At the same time, this optimization allows model-solving efficiency to grow linearly as machines are added, offering a new way to approach genome-wide machine learning.

This thesis first analyzes systematically the computational difficulties of genome-wide machine learning problems, and then summarizes and discusses the abstractions and applicability of existing mainstream distributed computing frameworks. Targeting the efficiency of genome-wide machine learning, it adapts the FTRL algorithm to improve the traditional parameter server architecture and develops an improved parameter server model called GW-PS. GW-PS prevents over-fitting and promotes sparsity in the learned model, and thus adapts better to genome-wide data. On this basis, the thesis also improves the traditional convolutional neural network structure according to the practical needs of gene sequence specificity recognition, and compares model training efficiency in detail on GW-PS and on the Spark architecture. The experiments show that, for genome-wide machine learning problems, GW-PS outperforms the traditional Spark architecture in both efficiency and performance, and they demonstrate the feasibility of the parameter server, a recent technology, for bioinformatics problems.
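Two ideas in the abstract are easier to see in code: a server component that owns parameter storage and updating, and a sparsity-inducing FTRL-Proximal update applied when workers push gradients to it. The sketch below is a minimal, single-process illustration of that pattern under stated assumptions, not the thesis's GW-PS implementation; the class and parameter names (ParameterServer, FTRLProximal, alpha, beta, l1, l2) and the toy data are hypothetical, and a real system would shard keys across machines and make push/pull asynchronous over the network.

```python
import numpy as np


class FTRLProximal:
    """Per-coordinate FTRL-Proximal update (McMahan et al., 2013): the kind of
    sparsity-inducing, regularized update the thesis adapts for GW-PS.
    Hyperparameters are illustrative defaults, not values from the thesis."""

    def __init__(self, dim, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = np.zeros(dim)  # accumulated adjusted gradients
        self.n = np.zeros(dim)  # accumulated squared gradients

    def weights(self):
        # Closed-form per-coordinate solution; the L1 threshold zeroes out
        # coordinates with little evidence, keeping the model sparse.
        w = np.zeros_like(self.z)
        active = np.abs(self.z) > self.l1
        w[active] = -(self.z[active] - np.sign(self.z[active]) * self.l1) / (
            (self.beta + np.sqrt(self.n[active])) / self.alpha + self.l2
        )
        return w

    def apply_gradient(self, grad):
        # Fold a freshly pushed gradient into the accumulated statistics.
        w = self.weights()
        sigma = (np.sqrt(self.n + grad ** 2) - np.sqrt(self.n)) / self.alpha
        self.z += grad - sigma * w
        self.n += grad ** 2


class ParameterServer:
    """Parameter storage and updating as an independent component: workers
    pull() the current weights and push() gradients; the server owns the
    update rule. Real deployments shard keys and serve push/pull over RPC."""

    def __init__(self, dim):
        self.opt = FTRLProximal(dim)

    def pull(self):
        return self.opt.weights()

    def push(self, grad):
        self.opt.apply_gradient(grad)


def logistic_grad(w, x, y):
    """Gradient of the logistic loss for one example with label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-x @ w))
    return (p - y) * x


# Toy worker loop on synthetic, genotype-like features (coded 0/1/2).
rng = np.random.default_rng(0)
server = ParameterServer(dim=20)
for _ in range(200):
    x = rng.integers(0, 3, size=20).astype(float)
    y = float(x[0] + x[1] > 2)           # synthetic label, illustration only
    w = server.pull()                     # worker pulls the current model
    server.push(logistic_grad(w, x, y))   # worker pushes its local gradient
print("non-zero weights:", np.count_nonzero(server.pull()))
```

The point of the sketch is the division of labor described in the abstract: workers only compute gradients, while the server component owns the parameter state and its FTRL update, whose L1 term yields the sparsity the thesis attributes to GW-PS.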
Keywords/Search Tags: Genome-Wide, Parameter Server, Machine Learning, FTRL, Parallel Computing