Research on the Reliability of θ-Join for Multi-way Data Streams

Posted on: 2021-01-01
Degree: Master
Type: Thesis
Country: China
Candidate: M H Lu
GTID: 2428330623465010
Subject: Computer technology
Abstract/Summary:
With the advent of the Internet of Everything, ever more data is being produced, and it is growing sharply in both scale and dimension (location, preferences, movement trajectories, and so on), so the computing demands placed on big data keep rising. Many big data applications have to process data in real time. Because the data comes from different devices, however, each stream tends to cover only a single aspect, and no single data stream can provide all the information about a user. Obtaining the complete picture requires integrating all the data streams. The join operation is an effective means of connecting different data streams and plays an important role in processing multi-way data streams. If data in one of the streams is lost, the join is seriously affected and accurate services can no longer be provided to users.

The proliferation of data streams requires larger clusters to run these applications on existing computing platforms such as Storm, Spark, and Flink. These platforms can already handle failures and data loss. This thesis proposes a new fault-tolerance mechanism for data streams that further improves recovery efficiency. It proposes a new data recovery algorithm based on erasure codes, which recovers data lost within a window, and it applies the same erasure-code-based fault-tolerance algorithm to the join operation on data streams.

The thesis first introduces the background of big data and the technologies related to data stream joins, then presents different computing models and their advantages and disadvantages in terms of fault tolerance. Finally, it studies how to apply the erasure-code-based data recovery algorithm in a distributed stream processing program: when data streams arrive at the computing cluster, the data is divided into windows and the data in each window is encoded with the erasure code. If data is subsequently lost, the encoded data is decoded with the erasure code to recover the original data.

The main innovation is that, whereas most traditional fault-tolerance methods rely on persisting data to a storage system (such as HDFS), this thesis encodes the data directly once the data stream reaches the application.
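The encode-per-window, decode-on-loss idea described above can be illustrated with a minimal Python sketch. It is not the thesis's algorithm: it assumes the simplest possible erasure code, a single XOR parity block per window, which can repair at most one lost record, whereas the thesis may use a stronger code. The function names (`encode_window`, `recover_window`) and the sample records are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def _frame(record: bytes, width: int) -> bytes:
    # 2-byte length prefix + payload + zero padding, so all blocks share one size.
    return len(record).to_bytes(2, "big") + record.ljust(width, b"\x00")


def _unframe(block: bytes) -> bytes:
    # Read the length prefix back and strip the padding.
    n = int.from_bytes(block[:2], "big")
    return block[2:2 + n]


def encode_window(records: List[bytes]) -> List[bytes]:
    """Return the framed records of one (non-empty) window plus one XOR parity block."""
    width = max(len(r) for r in records)
    blocks = [_frame(r, width) for r in records]
    parity = reduce(xor_bytes, blocks)
    return blocks + [parity]


def recover_window(blocks: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild the original records; a lost block is marked as None.

    Assumption: single parity, so at most one block (data or parity) may be lost.
    """
    missing = [i for i, b in enumerate(blocks) if b is None]
    if len(missing) > 1:
        raise ValueError("single-parity code can repair at most one lost block")
    repaired = list(blocks)
    if missing:
        survivors = [b for b in blocks if b is not None]
        # XOR of all surviving blocks equals the missing block.
        repaired[missing[0]] = reduce(xor_bytes, survivors)
    return [_unframe(b) for b in repaired[:-1]]  # drop the parity block


# Usage: lose one tuple in a window, then recover it before joining.
window = [b"user=1,loc=A", b"user=2,loc=B", b"user=3,loc=C"]
coded = encode_window(window)
coded[1] = None  # simulate the loss of one record
assert recover_window(coded) == window
```

A code that tolerates several losses per window (for example a Reed-Solomon code) would replace the single parity block with multiple redundant blocks, at the cost of more encoding work per window.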
Keywords/Search Tags: data stream, join, fault tolerance, erasure coding