Research On Reusable Iterative Computing In Big Data Environment

Posted on:2016-07-11

Degree:Master

Type:Thesis

Country:China

Candidate:C P Guo

Full Text:PDF

GTID:2348330512970872

Subject:Software engineering

Abstract/Summary:


With the development of data mining, machine learning, and related areas, iterative computing is becoming more and more widely applied in big data settings. However, there are situations that conventional iterative algorithms handle poorly. Low value density is a typical feature of big data. To obtain a better result, the data scope of an iterative algorithm is often shifted from one data set to another. The inputs of successive runs overlap but are not identical, yet the previous results for the overlapping data cannot be reused. Time variability is another characteristic of big data: when the data changes, the iteration result computed on the original data is no longer applicable. As a consequence, the iterative algorithm has to run over the whole data set again, which wastes a great deal of time and resources. Reusing previous iteration results, so that iteration runs only on the changed part rather than on the whole data set, is therefore both a requirement and a challenge. Meeting it calls for a new iterative computing method adapted to the big data environment.

First, this thesis proposes a reusable iterative computing model that is suitable for the majority of iterative algorithms. Without sacrificing iteration accuracy, the model obtains the new iteration result from the original iteration result and the data variation. The model comprises the original iteration, pruning iteration, incremental iteration, and merged iteration; the original iteration is completed before the reusable iteration starts. Through theoretical validation and experimental analysis, the thesis shows that the reusable iterative computing model is correct and advantageous.

Second, an iterative algorithm is characterized by its iteration variables, which have a great influence on its convergence rate. This thesis proposes an optimized algorithm for selecting the initial value of the iteration variable in order to boost the computing efficiency of iterative algorithms.

Third, because nodes in a heterogeneous cluster differ in computing capacity, they wait for one another during synchronization in a distributed computing environment. This wastes computing resources and severely degrades performance. Because every iterative step includes a synchronization operation, the issue is more critical in iterative computing. From the perspective of load balancing, this thesis presents a load balancing algorithm based on task distribution and task adjustment, which keeps tasks running in parallel across nodes and improves the performance of iterative computing.

Finally, through a large number of experiments, this thesis verifies the correctness and advantages of the proposed model and algorithms. The experimental results show that the proposed reusable model is suitable for many iterative algorithms and improves their performance significantly; that the initial point selection algorithm applies well to matrix operation algorithms and data clustering algorithms, boosting their execution performance; and that the load balancing algorithm is suitable for the MapReduce and Spark frameworks, improving computing performance. The model and optimization approaches for reusable iteration studied in this thesis are significant for the research and practice of iterative computing in the big data environment and can guide optimization in iterative computing frameworks.
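The abstract does not spell out the reusable model or the initial value selection algorithm, so the following is only a minimal sketch of the underlying idea, not the thesis's method. It assumes a Jacobi iteration on a small, strictly diagonally dominant linear system (the matrix `A`, the vectors `b_old` and `b_new`, and the function name `jacobi` are all hypothetical). After a small change to the data, the iteration is restarted from the previous result instead of from zero, and the iteration counts of the two starts are compared.

```python
def jacobi(A, b, x0, tol=1e-10, max_iter=1000):
    """Solve A x = b by Jacobi iteration, starting from x0.

    Returns the approximate solution and the number of iterations used.
    Convergence is assumed because A is strictly diagonally dominant.
    """
    n = len(b)
    x = list(x0)
    for it in range(1, max_iter + 1):
        # Each component is updated from the previous iterate only.
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop when the largest component change falls below the tolerance.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new, it
        x = x_new
    return x, max_iter


# Hypothetical data: a strictly diagonally dominant system.
A = [[4.0, 1.0, 1.0],
     [1.0, 5.0, 2.0],
     [1.0, 2.0, 6.0]]
b_old = [6.0, 8.0, 9.0]    # original data
b_new = [6.0, 8.0, 9.01]   # data after a small change

# Original iteration on the old data.
x_old, _ = jacobi(A, b_old, [0.0, 0.0, 0.0])

# Cold start: recompute on the changed data from scratch.
x_cold, cold_iters = jacobi(A, b_new, [0.0, 0.0, 0.0])

# Warm start: reuse the previous result as the initial value.
x_warm, warm_iters = jacobi(A, b_new, x_old)

print("cold start iterations:", cold_iters)
print("warm start iterations:", warm_iters)
```

Both starts converge to the same solution within the tolerance, but the warm start begins much closer to it and so needs fewer iterations. This only illustrates why reusing earlier results and choosing a good initial value can pay off; the pruning, incremental, and merged iterations named in the abstract are not reproduced here.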

Keywords/Search Tags:

Big Data, Iterative Computing, Iterative Algorithm, Reusable, Load Balancing


Related items

1	Research On Load Balancing In The Construction Of Cloud Computing Data Center
2	Research On CDN Load Balancing Algorithm In Fog Computing
3	Research Of Load Balancing Strategy In Cloud Computing
4	An Intermediate Data Placement Algorithm For Load Balancing In Spark Computing Environment
5	Research And Implementation Of A Load Balancing Technology Based On Data Correlation In Cloud Computing
6	A Load Balancing Algorithm Based On LPM
7	Research On Load Balancing Technology In Edge Computing
8	Research And Application Of Load Balancing Technology In Semantic Switch
9	Research On Load Balancing Strategy Based On SLA Optimization In Cloud Computing
10	The Research Of Load Balancing Policy Based On Grid Computing