Design And Implementation Of A Distributed And Real Time Video Stream Data Processing Platform Based On Spark

Posted on: 2017-10-04  Degree: Master  Type: Thesis
Country: China  Candidate: C D Yang  Full Text: PDF
GTID: 2348330518495778  Subject: Computer technology
Abstract/Summary:
The past few years have seen a major change in big data, as growing data volumes and processor speeds require most companies to shift their focus to distributed platforms. Today, with the development of the Internet of Things, more and more smart devices are joining the network. A myriad of data sources, from traditional surveillance cameras and network video to all kinds of imaging devices on intelligent terminals, produce large and valuable video data streams. However, it is difficult for most companies to deal with such a huge amount of unstructured data, to say nothing of meeting real-time requirements. This dissertation designs and implements an architecture for distributed, real-time data processing that can provide reliable real-time service while coping with massive video data processing workloads. The main work is as follows. First, I designed the overall framework of the platform and the architecture of its three subsystems: the video stream ETL system, the distributed real-time video stream processing engine, and the distributed file system. Second, I built the video stream ETL system, which covers extracting, transforming, and loading. Given the characteristics of video data, the open-source message broker Kafka is used in this system to handle the loading of distributed data. Third, I implemented a distributed real-time video stream processing engine based on Spark, which provides three solutions for real-time processing. I then applied a face recognition algorithm in real-time processing. The availability of the platform is explored from both a theoretical perspective and a practical perspective. Additionally, the platform allows real-time processing and SQL to be combined, enabling rich new applications. Finally, I implemented a two-tier file system structure, consisting of HDFS at the disk level and Tachyon, a memory-centric distributed storage system, at the memory level. Tachyon sits between the underlying file system and the computation engine. Separating computation from memory management gives the system higher execution efficiency.
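
The data flow described in the abstract (ETL into a message broker, stream processing, then a memory tier in front of a disk tier) can be illustrated with a small standard-library sketch. It is not the thesis implementation: `queue.Queue` stands in for a Kafka topic, a plain loop over fixed-size micro-batches stands in for the Spark-based engine, and the `TwoTierStore` class is a hypothetical stand-in for Tachyon layered over HDFS. The frame format, the `mean` feature, the batch size, and the LRU eviction policy are all assumptions made for illustration.

```python
import queue
from collections import OrderedDict


class TwoTierStore:
    """Hypothetical two-tier store: a small memory tier over a larger disk tier."""

    def __init__(self, memory_capacity):
        self.memory = OrderedDict()  # memory tier, kept in LRU order (assumed policy)
        self.disk = {}               # dict standing in for the disk-level file system
        self.memory_capacity = memory_capacity

    def write(self, key, value):
        self.disk[key] = value       # always persist to the lower tier
        self._cache(key, value)

    def read(self, key):
        if key in self.memory:       # memory hit: refresh recency and return
            self.memory.move_to_end(key)
            return self.memory[key]
        value = self.disk[key]       # memory miss: fall back to the disk tier
        self._cache(key, value)
        return value

    def _cache(self, key, value):
        self.memory[key] = value
        self.memory.move_to_end(key)
        while len(self.memory) > self.memory_capacity:
            self.memory.popitem(last=False)  # evict the least recently used entry


def extract_frames(camera_id, n_frames):
    """Extract: yield fake frame records (4 identical bytes per frame)."""
    for i in range(n_frames):
        yield {"camera": camera_id, "frame": i, "pixels": bytes([i % 256]) * 4}


def transform(record):
    """Transform: attach a simple derived feature, the mean byte value."""
    pixels = record["pixels"]
    return {**record, "mean": sum(pixels) / len(pixels)}


def micro_batches(broker, batch_size):
    """Drain the broker queue and yield records in fixed-size micro-batches."""
    batch = []
    while True:
        try:
            batch.append(broker.get_nowait())
        except queue.Empty:
            break
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:                        # emit the final, possibly smaller, batch
        yield batch


def run_pipeline(n_frames=5, batch_size=2):
    broker = queue.Queue()           # stand-in for a message broker topic
    store = TwoTierStore(memory_capacity=2)
    for record in extract_frames("cam-0", n_frames):
        broker.put(transform(record))  # Load: publish transformed records
    for batch_no, batch in enumerate(micro_batches(broker, batch_size)):
        store.write(f"batch-{batch_no}", [r["mean"] for r in batch])
    return store
```

Calling `run_pipeline()` and then `store.read("batch-0")` exercises the miss path: the oldest batch has already been evicted from the memory tier, so it is fetched from the disk tier and cached again. Keeping the store separate from the processing loop mirrors, in miniature, the separation of computation and memory management that the abstract describes.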
Keywords/Search Tags:distributed, realtime, video streaming, Spark