| With the implementation of China's big data strategy, big data applications are expected to grow rapidly and to play an important role in upgrading the industrial structure. Industries across China have accumulated rich data resources, and companies have steadily increased their investment in big data. Across the full big data processing pipeline, however, three problems persist: upstream and downstream collaboration in data production is inefficient, the cost of interpreting delivered data is high, and heterogeneous data exchange requires customized development. How to promote effective collaboration in engineering practice is therefore the focus of this research. Within the big data ecosystem, the scheduling system links the upstream and downstream stages. In addition to timing-based and dependency-based triggering, content-aware scheduling can tighten upstream and downstream collaboration and move beyond traditional task-based triggering, which pays no attention to the content of the data stream. The main contributions of this paper are as follows. (1) An open ETL model is constructed: when heterogeneous big data is exchanged, the read and write details of the data engines are abstracted away, so that only the data and its upstream and downstream rules need to be considered. (2) An upstream and downstream data model based on definitions and rule satisfaction is proposed: by parsing the syntax of the read engine, it derives a lineage (kinship) model of the field-level dependencies between upstream and downstream data sets, which provides data dependencies and an interpretation of the processing link. (3) A refined scheduling strategy is proposed: dependency analysis based on content understanding is carried out at the level of the minimum operating unit, and a unit runs once its conditions are satisfied. An empirical study was conducted on the big data scheduling tasks of Baidu take-out, where the proposed scheduling strategy significantly improved application execution and delivery time. This paper offers a reference for current research on big data scheduling systems, demonstrates through empirical analysis the effectiveness of upstream and downstream collaboration based on field-granularity content understanding, and has both practical reference value and theoretical significance for efficient big data scheduling. |
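
The idea of field-granularity, content-aware triggering can be illustrated with a minimal sketch. This is not the paper's implementation: the `Task` class, the `runnable_tasks` function, and the field names such as `orders.amount` are hypothetical, and the sketch assumes each task declares the upstream fields it reads. A task becomes runnable as soon as all of those fields are available, rather than waiting for an entire upstream task or table to finish.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical scheduling unit that declares the fields it consumes."""
    name: str
    reads: frozenset  # upstream fields, e.g. "orders.amount"


def runnable_tasks(tasks, ready_fields):
    """Return the names of tasks whose input fields are all available."""
    ready = set(ready_fields)
    return [t.name for t in tasks if t.reads <= ready]


# Two hypothetical downstream tasks reading different fields of one table.
tasks = [
    Task("daily_revenue", frozenset({"orders.amount", "orders.date"})),
    Task("delivery_heatmap", frozenset({"orders.address", "orders.date"})),
]

# Only part of the upstream table has been produced so far.
early = runnable_tasks(tasks, {"orders.amount", "orders.date"})

# Later, the remaining field arrives as well.
late = runnable_tasks(
    tasks, {"orders.amount", "orders.date", "orders.address"}
)

print(early)
print(late)
```

Under task-level triggering, both downstream tasks would wait until the whole `orders` table had been produced. With field-level dependencies, `daily_revenue` can start while `orders.address` is still being computed, which is one plausible reading of how a content-aware strategy could shorten delivery time.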