
Research and Application of Outlier Data Detection and Fault Recovery in Stream Networks

Posted on: 2022-10-10    Degree: Master    Type: Thesis
Country: China    Candidate: X Q Zhang    Full Text: PDF
GTID: 2518306347473214    Subject: Computer technology
Abstract/Summary:
With the advent of big data and the rapid development of the Internet of Things, industries place ever higher demands on real-time processing, and a variety of real-time computing systems have emerged to meet them. Batch processing and stream processing are the two main forms of big data processing. Batch processing has a long history and is relatively mature, while stream processing has risen in recent years and now occupies the leading position in the market.

During data generation and processing, network delay, varying transmission speeds, and other factors cause records to arrive out of order. Ensuring that data is processed in order improves the accuracy of the computed results; this is the first problem addressed in this thesis. On the other hand, unexpected failures can disrupt a running stream processing system; how to recover from such faults effectively and keep the system running normally is the second problem.

The current solutions to these two problems are as follows. For out-of-order data, classic big data frameworks such as Storm and Apache Flink allow a fixed delay. This alleviates the disorder to some extent but cannot adapt to complex and changing data streams. Fault recovery techniques for stream processing systems mainly include active backup, passive backup, upstream backup, and checkpointing; checkpointing recovers by rolling data back, which is time-consuming. We put forward new proposals for both aspects.

For the first problem, we propose a sliding time window based on a low watermark with a dynamically allowed delay. The allowed delay changes dynamically with the volume of incoming stream data. In addition, the window distinguishes four different types of data, one of which is stragglers. Experimental results show that our method greatly reduces the rate of dropped data and effectively identifies stragglers.

For the second problem, we distinguish the primary node from non-primary nodes and adopt different recovery strategies. For the primary node we improve active backup: to guarantee exactly-once processing, a unique identifier (UID) is attached to the output of the node upstream of the primary node, and the data produced by the primary node and its backup node carries the corresponding UID-i. Downstream nodes use UID-i to deduplicate the records they receive, so that each record is processed exactly once. For non-primary nodes, we use a state-consistent checkpoint mechanism based on the Chandy-Lamport algorithm, which injects a special marker into the stream to segment the data and trigger checkpoints, achieving consistency at the state level.

The research is validated on an e-commerce platform. The experiments show that our method effectively distinguishes outlier data, which can be used for analyzing abnormal user behavior, among other applications. At the same time, the state-consistent fault recovery scheme that treats the primary node separately also adapts well to e-commerce platforms, and its recovery time is stable and short.
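To illustrate the first proposal, the sketch below shows one way a low-watermark generator with a dynamically allowed delay could look: the delay grows with the recently observed arrival rate, and any element that falls behind the current watermark is flagged as a straggler. This is a minimal sketch only; the class name, the rate-based heuristic, and all parameter values are illustrative assumptions and are not taken from the thesis.

import java.util.ArrayDeque;
import java.util.Deque;

// Minimal sketch of a low watermark with a dynamically adjusted allowed delay.
// The heuristic and parameters are hypothetical, not the thesis implementation.
public class DynamicDelayWatermark {
    private final Deque<Long> recentArrivals = new ArrayDeque<>();
    private final long baseDelayMs;
    private final long maxDelayMs;
    private long maxEventTime = Long.MIN_VALUE;

    public DynamicDelayWatermark(long baseDelayMs, long maxDelayMs) {
        this.baseDelayMs = baseDelayMs;
        this.maxDelayMs = maxDelayMs;
    }

    // Record one element; returns true if it is a straggler (behind the watermark).
    public boolean onElement(long eventTimeMs, long processingTimeMs) {
        boolean straggler = eventTimeMs < currentWatermark();
        maxEventTime = Math.max(maxEventTime, eventTimeMs);
        recentArrivals.addLast(processingTimeMs);
        // Keep only arrivals from the last second to estimate the current rate.
        while (!recentArrivals.isEmpty()
                && processingTimeMs - recentArrivals.peekFirst() > 1_000) {
            recentArrivals.removeFirst();
        }
        return straggler;
    }

    // Allowed delay grows with the observed arrival rate, capped at maxDelayMs.
    private long allowedDelay() {
        long ratePerSecond = recentArrivals.size();
        return Math.min(maxDelayMs, baseDelayMs + ratePerSecond * 10);
    }

    // Low watermark: elements with smaller timestamps are treated as stragglers.
    public long currentWatermark() {
        return maxEventTime == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxEventTime - allowedDelay();
    }

    public static void main(String[] args) {
        DynamicDelayWatermark wm = new DynamicDelayWatermark(200, 5_000);
        long now = System.currentTimeMillis();
        System.out.println("straggler? " + wm.onElement(now - 10_000, now)); // false
        System.out.println("straggler? " + wm.onElement(now, now));          // false
        System.out.println("straggler? " + wm.onElement(now - 60_000, now)); // true
        System.out.println("watermark = " + wm.currentWatermark());
    }
}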
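The exactly-once strategy for the primary node can likewise be illustrated with a small sketch of UID-based deduplication at a downstream node, under the assumption described in the abstract that the primary node and its active backup both forward records tagged with the same UID-i. Class and method names here are hypothetical.

import java.util.HashSet;
import java.util.Set;

// Minimal sketch of downstream deduplication by UID; illustrative only.
public class UidDeduplicator {
    private final Set<String> seenUids = new HashSet<>();

    // Returns true if the record should be processed, false if it is a duplicate.
    public boolean accept(String uid) {
        // The primary node and its backup emit copies carrying the same UID,
        // so the second arrival is dropped to preserve exactly-once semantics.
        return seenUids.add(uid);
    }

    public static void main(String[] args) {
        UidDeduplicator dedup = new UidDeduplicator();
        System.out.println(dedup.accept("UID-1")); // true: first copy, process it
        System.out.println(dedup.accept("UID-1")); // false: copy from the backup, drop it
        System.out.println(dedup.accept("UID-2")); // true
    }
}

A set of already-seen UIDs suffices in this sketch because both copies carry the same identifier; a production system would additionally need to bound or expire this set.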
Keywords/Search Tags:stream processing, out-of-order data, stragglers, fault recovery