
Research On Key Techniques Of Big Data Model Parsing System

Posted on: 2018-02-18  Degree: Master  Type: Thesis
Country: China  Candidate: C Y Tao  Full Text: PDF
GTID: 2348330512988925  Subject: Computer application technology

Abstract:
With the rapid development of mobile devices, communications, and information technology, humanity has entered the age of big data. The volume of data produced across industries has grown from the GB and TB scale all the way to the PB scale, and tools for handling big data have appeared alongside it. Since the open-source big data platform Hadoop came into being, a variety of such tools have emerged, including Hadoop-family tools such as Hive, Pig, and Sqoop, as well as MapReduce-like parallel computing engines such as Spark. In practice, data analysts often use these tools in combination, forming a big data workflow. Because the tools are highly specialized, ordinary users need a good understanding of their inner workings; this high barrier to entry leads to low reusability of the big data workflows developed by professional users. In view of this situation, a big-data-oriented model parsing system is proposed, based on the idea of modeling the big data process.

This thesis first explains the idea of modeling the big data process and describes the language specification of the model metadata. To support an expanding range of big data processing tools, a mechanism is proposed for transforming a big data model's metadata into an Oozie workflow, drawing on the rule engine's separation of logic from code; the specific rules for model parsing are then described. To speed up rule matching, this thesis studies the structure and inner workings of the Rete network in depth. Based on the actual characteristics of the model parsing rules, a constraint-based optimization strategy for Rete network construction is proposed: by sorting constraints by frequency before compilation, a Rete network with a higher level of node sharing can be built.

To meet the need for model reuse, this thesis analyzes the existing approaches to model reuse during model design and execution, along with their shortcomings, and then proposes a parsing method for reusable models. To speed up the parsing of reusable models, which are deployed on HDFS, a localized copy strategy based on a combined weighted factor is proposed: during the file copy process, the DataNode's network distance, load, and HDFS space usage are all taken into account when choosing a DataNode on which to place each block.

Finally, the prototype system is presented. Experiments on the constraint-frequency-based Rete network construction strategy and the combined-weighted-factor localized copy strategy were designed and conducted, and their results confirm the validity of both strategies.
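The constraint-frequency idea can be sketched as follows: constraints that appear in many rules are moved to the front of each rule's condition list, so that rules share a common prefix and the resulting Rete alpha network shares more nodes. This is a minimal illustrative sketch; the rule representation and constraint names are hypothetical, not the thesis's actual data structures.

```python
from collections import Counter

def order_constraints(rules):
    """Sort each rule's constraints by descending global frequency.

    Rules that share their most frequent constraints as a common
    prefix allow the Rete network builder to share more alpha nodes.
    """
    freq = Counter(c for rule in rules for c in rule)
    # Sort by frequency (descending), then lexically for a stable order.
    return [sorted(rule, key=lambda c: (-freq[c], c)) for rule in rules]

# Hypothetical parsing rules, each a list of constraints.
rules = [
    ["type=hive", "has_input", "has_script"],
    ["has_input", "type=pig"],
    ["has_input", "type=sqoop", "has_output"],
]
ordered = order_constraints(rules)
# "has_input" occurs in all three rules, so after sorting it leads
# every rule and the three alpha chains can share their first node.
```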
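The combined-weighted-factor placement choice can likewise be sketched as a scoring function over candidate DataNodes. The weights and the assumption that all three factors are normalized to [0, 1] are illustrative here; the thesis's actual weighting scheme is not reproduced.

```python
def score_datanode(distance, load, space_used,
                   w_dist=0.5, w_load=0.3, w_space=0.2):
    """Combined weighted score for block placement (lower is better).

    distance, load, and space_used are assumed normalized to [0, 1];
    the weights are illustrative placeholders, not tuned values.
    """
    return w_dist * distance + w_load * load + w_space * space_used

def choose_datanode(candidates):
    """Pick the candidate DataNode with the lowest combined score.

    candidates maps node name -> (distance, load, space_used).
    """
    return min(candidates, key=lambda n: score_datanode(*candidates[n]))

nodes = {
    "dn1": (0.2, 0.9, 0.5),   # close to the client but heavily loaded
    "dn2": (0.4, 0.3, 0.4),   # moderate on all three factors
    "dn3": (1.0, 0.1, 0.1),   # idle and empty but far away
}
best = choose_datanode(nodes)  # dn2 balances the three factors best
```

Under these weights, a nearby but overloaded node can lose to a slightly more distant, lightly loaded one, which is the intended effect of combining the factors rather than ranking on network distance alone.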
Keywords: big data workflow, rule engine, Rete network, HDFS Block, Oozie