| With the development of data mining, machine learning, and related areas, iterative computing is becoming increasingly applicable in the context of big data. However, there are situations in which existing iterative algorithms are not well suited. Low value density is a typical feature of big data. To obtain better results, the data scope of an iterative algorithm is shifted from one data set to another. The inputs of successive computations overlap but are distinct, yet the previous results for the overlapping data cannot be reused. Meanwhile, time variability is another characteristic of big data. When the data changes, the iteration result computed on the original data is no longer applicable. As a consequence, the iterative algorithm has to be run over the whole data set again, which wastes a great deal of time and resources. It is therefore both a requirement and a challenge to reuse iteration results, avoid iterating over the whole data set, and iterate only on the changed part. This would be a new iterative computing method adapted to the big data environment. First, this thesis proposes a reusable iterative computing model that is suitable for the majority of iterative algorithms. Without sacrificing iteration accuracy, the model obtains new iteration results from the original iteration result and the data variation. The reusable iterative computing model comprises the original iteration, pruning iteration, incremental iteration, and merged iteration. The original iteration is finished before the reusable iteration starts. Through theoretical validation and experimental analysis, this thesis shows that the reusable iterative computing model is correct and advantageous. Second, an iterative algorithm is characterized by its iteration variables. The iteration variables have a great influence on the convergence rate of the iterative algorithm. This thesis proposes an optimized algorithm for selecting the initial value 
of the iteration variable to boost the computing efficiency of iterative algorithms. Third, because the computing capability of nodes differs in a heterogeneous cluster, nodes wait for each other during synchronization in a distributed computing environment. This wastes computing resources and severely degrades computing performance. Because a synchronization operation occurs in every iteration step, the issue is more critical in iterative computing. From the perspective of load balancing, this thesis presents a load balancing algorithm based on task distribution and task adjustment, in order to run tasks in parallel across nodes and improve the performance of iterative computing. Finally, through a large number of experiments, this thesis verifies the correctness and advantages of the proposed model and algorithms. The experimental results show that the proposed reusable model is suitable for many iterative algorithms and can improve their performance significantly; that the initial point selection algorithm can be applied well to matrix operation algorithms and data clustering algorithms, boosting their execution performance; and that the load balancing algorithm is suitable for the MapReduce and Spark frameworks, improving computing performance. This thesis studies the model and optimization approaches of reusable iteration, which will be of great significance to the research and practice of iterative computing in the big data environment and can guide optimization in iterative computing frameworks. |