
The Research Of Key Techniques Of Incremental Computing For DAG-Based Framework

Posted on: 2018-04-02 | Degree: Master | Type: Thesis
Country: China | Candidate: J Kan | Full Text: PDF
GTID: 2348330563952428 | Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the information age, the explosive growth of data volume and the improvement of computer performance have led people into the era of big data. Large data sets can only be processed effectively with a distributed computing framework, and frameworks that use a Directed Acyclic Graph (DAG) model to express the logical relationships between jobs are among the most popular big data computing solutions. Because big data sets are usually updated only incrementally, the data held in storage generally grows by increments.

Current DAG computing frameworks still face several challenges when computing over such data sets. First, they lack incremental awareness, so recomputing over the updated data consumes excessive computing resources. Second, they lack support for computation reuse, in particular for identifying and reusing similar computations. Existing work improves either the application algorithm level or the computational framework level. Improvements at the application algorithm level apply only to specific computing pipelines and cannot be made in a user-transparent manner; improvements at the computational framework level place stricter requirements on the data and the computational logic.

To solve these problems, widen the scope of optimization, and keep the optimization transparent to the user, this thesis introduces indirect reuse and operator cropping, and implements a dynamic cache management strategy driven by a Cost Model. The main contributions of this thesis are as follows:

(1) An identification model for direct reuse and indirect reuse in a DAG is established. The properties of reusable operators in DAG computing frameworks are extracted, and direct and indirect reuse are analyzed and defined. The recognition model can identify the reusable part of a DAG.

(2) An incremental computing reuse framework is designed and implemented on top of a DAG computing framework. The thesis proposes a three-step incremental computing and reuse process: DAG node preprocessing, reusable-computation matching, and incremental computation. A processing strategy for the Filter operator in the DAG provides partial matching and splitting of Filter operators. Combined with this mechanism, an FQ-Tree-based matching mechanism for reusable DAG fragments and an incremental computing strategy realize indirect reuse based on the Filter operator.

(3) A cache management mechanism is designed and implemented. An Alluxio-based cache storage strategy spanning multiple storage media is designed. A maintenance strategy and related algorithms for the FQ-Tree cache information are implemented, so that the cache system can provide meta information for operator matching and recognition. A Cost Model that takes use frequency, reuse type, and time correlation into account lets the cache system weigh the benefit of each cache block.

(4) A series of performance evaluation experiments is designed and carried out. Tested with a workload generated at a reasonable mixing ratio, the proposed DAG incremental computing reuse framework reduces the average computation time of computing tasks by 32.49% under the same computing environment and workload conditions.
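
As a rough illustration of the difference between direct and indirect reuse of a Filter operator, the following minimal Python sketch may help. It is not the thesis's implementation: the names `RangeFilter`, `ReuseCache`, and `incremental_filter` are hypothetical, a flat dictionary stands in for the FQ-Tree, and predicates are assumed to be simple half-open ranges on one column. A cached result is reused directly when the predicate is identical, and indirectly when the cached predicate is wider, in which case only a residual filter is applied to the cached rows. Newly appended rows (the increment) are always filtered fresh and merged.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Row = Dict[str, float]


@dataclass(frozen=True)
class RangeFilter:
    """Hypothetical Filter operator: keeps rows with low <= row[column] < high."""
    column: str
    low: float
    high: float

    def contains(self, other: "RangeFilter") -> bool:
        # True when every row accepted by `other` is also accepted by `self`.
        return (self.column == other.column
                and self.low <= other.low
                and other.high <= self.high)

    def apply(self, rows: List[Row]) -> List[Row]:
        return [r for r in rows if self.low <= r[self.column] < self.high]


class ReuseCache:
    """Flat dictionary standing in for the FQ-Tree index (an assumption)."""

    def __init__(self) -> None:
        self._entries: Dict[RangeFilter, List[Row]] = {}

    def put(self, flt: RangeFilter, rows: List[Row]) -> None:
        self._entries[flt] = list(rows)

    def lookup(self, flt: RangeFilter) -> Tuple[str, Optional[List[Row]]]:
        # Direct reuse: an identical Filter has already been computed.
        if flt in self._entries:
            return "direct", self._entries[flt]
        # Indirect reuse: a wider cached Filter covers the requested one,
        # so a residual filter over the cached rows is sufficient.
        for cached, rows in self._entries.items():
            if cached.contains(flt):
                return "indirect", flt.apply(rows)
        return "miss", None


def incremental_filter(cache: ReuseCache, flt: RangeFilter,
                       base_rows: List[Row],
                       delta_rows: List[Row]) -> Tuple[str, List[Row]]:
    """Reuse cached output for the base data; compute only the increment."""
    kind, reused = cache.lookup(flt)
    if reused is None:
        reused = flt.apply(base_rows)  # no reuse possible: full recomputation
    return kind, reused + flt.apply(delta_rows)


if __name__ == "__main__":
    base = [{"x": float(v)} for v in range(10)]
    delta = [{"x": 3.0}, {"x": 12.0}]
    cache = ReuseCache()
    wide = RangeFilter("x", 0, 8)
    cache.put(wide, wide.apply(base))
    print(incremental_filter(cache, RangeFilter("x", 2, 5), base, delta))
```

In this sketch the cached entry covers only the base data, so the cache is never consulted for the increment. A real framework would also have to decide when to refresh or evict entries, which is the role the abstract assigns to the Cost Model.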
Keywords/Search Tags:Distributed Computation, DAG Computation, Incremental Computation, Computation Reuse