Research On Baidu Take-out Big Data Refined Scheduling System

Posted on:2019-01-22

Degree:Master

Type:Thesis

Country:China

Candidate:F K Liang

GTID:2428330551957833

Subject:Engineering Management

Abstract/Summary:

With the implementation of China's big data strategy, big data applications are expanding rapidly and will play an important role in upgrading the industrial structure. Industries across China have accumulated rich data resources, and companies have steadily increased their investment in big data. Across the end-to-end big data processing pipeline, however, three problems persist: upstream and downstream collaboration in data production is inefficient, the cost of interpreting delivered data is high, and heterogeneous data exchange requires customized development. How to promote effective collaboration at the engineering level is therefore the focus of this research. Within the big data ecosystem, the scheduling system is the link that connects upstream and downstream stages. Traditional scheduling triggers work at the task level, by time or by task dependency, without paying attention to the content of the data stream. Building on time-based and dependency-based triggering, content-aware scheduling can tighten upstream and downstream collaboration and move beyond purely task-based triggering. The main research content of this thesis is as follows: (1) An open ETL model is constructed: when heterogeneous big data is exchanged, the read and write details of the data engines are abstracted away, so that only the data itself and its upstream and downstream rules need attention. (2) An upstream and downstream data model based on defined rules is proposed: by parsing the syntax of the read engine, it derives a kinship model of the field-level dependencies between upstream and downstream data sets, which supports the interpretation of data dependencies and of the processing link. (3) A refined scheduling strategy is proposed: dependency analysis based on content understanding is carried out at the level of the minimum operating unit, and a unit runs once its conditions are satisfied. An empirical study on Baidu Take-out big data scheduling tasks shows that the proposed scheduling strategy significantly improves application execution and delivery time. This thesis offers a reference for current research on big data scheduling systems, demonstrates through empirical analysis the effectiveness of upstream and downstream collaboration based on field-granularity content understanding, and has both practical reference value and theoretical significance for efficient big data scheduling.
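
The idea of triggering at field granularity rather than task granularity can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the `Unit` and `FieldScheduler` classes, the `publish` method, and all field names are hypothetical, chosen only to show how a minimum operating unit could run as soon as the specific upstream fields it reads are ready.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List, Set


@dataclass(frozen=True)
class Unit:
    """Hypothetical minimum operating unit: one output field plus the fields it reads."""
    name: str
    output: str
    inputs: FrozenSet[str]


class FieldScheduler:
    """Runs a unit once every upstream field it depends on has been published."""

    def __init__(self, units: Iterable[Unit]) -> None:
        self.units: List[Unit] = list(units)
        self.ready: Set[str] = set()   # fields whose content is available
        self.done: List[str] = []      # unit names, in the order they ran

    def publish(self, field_name: str) -> List[str]:
        """Mark a field as ready, then run all units that have become runnable."""
        self.ready.add(field_name)
        progressed = True
        while progressed:
            progressed = False
            for unit in self.units:
                if unit.name not in self.done and unit.inputs <= self.ready:
                    self.done.append(unit.name)   # stand-in for executing the unit
                    self.ready.add(unit.output)   # its output may unlock other units
                    progressed = True
        return list(self.done)


# Illustrative field-level dependencies (all names are assumptions).
units = [
    Unit("avg_delivery", "report.avg_delivery_minutes",
         frozenset({"orders.delivery_minutes"})),
    Unit("revenue", "report.revenue",
         frozenset({"orders.amount", "refunds.amount"})),
    Unit("summary", "dashboard.summary",
         frozenset({"report.avg_delivery_minutes", "report.revenue"})),
]

scheduler = FieldScheduler(units)
first = scheduler.publish("orders.delivery_minutes")   # only avg_delivery can run
second = scheduler.publish("orders.amount")            # revenue still waits on refunds
third = scheduler.publish("refunds.amount")            # revenue runs, then summary
```

In this sketch `avg_delivery` runs before the rest of the `orders` fields arrive, which a task-level dependency on the whole upstream job would not allow.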

Keywords/Search Tags:

big data, refined scheduling, kinship analysis, open ETL, minimum operating unit

Related items

1	Specific Subnets Extraction Algorithm Based On Kinship Network
2	Research On Link Analysis And Recommendation Techniques In Large-scale Open Source Software Resource Task Unit
3	Research On Kinship Recognition Method Based On Face Images
4	Image-based Kinship Analysis And Research
5	Kinship Verification Using Facial Images Based On Local Discrimination
6	Research On Clustering Algorithm Of Streaming Data
7	Research On Facial Kinship Verification Methods For Service Robot
8	Research On Scheduling For Open Real-Time Systems
9	WeChat Kinship Group: Preferences, Etiquette And Taboos In Communication
10	Open Real-time Linux Research And Design