Design And Implementation Of Multi-dimensional Index-oriented Video On-demand Data Analysis Subsystem

Posted on:2021-02-08

Degree:Master

Type:Thesis

Country:China

Candidate:S S Ma

Full Text:PDF

GTID:2428330632462632

Subject:Computer technology

Abstract/Summary:


With the development of communication technology and the popularity of the Internet, more and more households choose video on demand to watch TV programs. Program producers and advertising companies are also increasingly concerned with users' video-on-demand data, which they use to adjust program structure and advertisement placement. Traditional video-on-demand data collection relies on relatively crude sample statistics, which are neither accurate nor timely. With the development of data science, full-sample analysis using big data technology is gradually becoming the mainstream approach to on-demand data analysis, so it is important to build a highly scalable and flexible on-demand data analysis system. Some existing on-demand data analysis systems rely on a data warehouse engine for data aggregation, and their analysis results are rarely reusable. Open-source data analysis tools such as Kylin offer flexible multidimensional aggregation, but their time performance is weak and they cannot calculate long-term indicators.

After surveying conventional big data analysis techniques against the users' analysis needs, this paper adopts the Spark framework for offline analysis of user records. The back end of the system uses a distributed architecture: Hive stores user records, and an Oracle database stores analysis results. The system is modeled with an object-oriented method that decomposes the core functions into five modules and designs the corresponding interfaces. The paper divides on-demand data analysis into two stages, generating intermediate results that carry multi-dimensional information and then aggregating indicators from them, and designs the input, output, and execution flow of each stage in detail. This design makes result generation more flexible, reduces the resources wasted on repeated calculation, and makes the whole system run more efficiently. The paper also designs and implements a distributed data deduplication algorithm based on the Bloom filter, which computes deduplicated indicators over long time periods with a low memory footprint.

Through detailed requirements analysis and module design, the paper finally implements the video-on-demand data analysis subsystem. A series of experimental tests shows that the system performs excellently in both time and space when analyzing long-term indicators and scales well in indicator analysis, achieving the goals of the system's construction.
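The abstract describes splitting the analysis into a stage that produces intermediate results with multi-dimensional information and a stage that aggregates indicators from them. A minimal sketch of that idea follows, written in plain Python rather than Spark/Hive so it stays self-contained; the record fields (`user_id`, `date`, `region`, `program`, `seconds`), the dimension list, and the indicators (plays, watch seconds, distinct users) are assumptions chosen for illustration, not the thesis's actual schema.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class PlayRecord(NamedTuple):
    # Hypothetical raw on-demand log record; field names are illustrative only.
    user_id: str
    date: str
    region: str
    program: str
    seconds: int


# Assumed analysis dimensions; the intermediate key is these plus user_id.
DIMENSIONS = ("date", "region", "program")


def build_intermediate(records: Iterable[PlayRecord]) -> dict:
    """Stage 1: collapse raw records to one row per (date, region, program, user_id)."""
    inter = defaultdict(lambda: [0, 0])  # key -> [play_count, total_seconds]
    for r in records:
        key = (r.date, r.region, r.program, r.user_id)
        inter[key][0] += 1
        inter[key][1] += r.seconds
    return dict(inter)


def aggregate(intermediate: dict, group_by: tuple) -> dict:
    """Stage 2: roll intermediate rows up to any subset of DIMENSIONS."""
    idx = [DIMENSIONS.index(d) for d in group_by]
    cells = defaultdict(lambda: {"plays": 0, "seconds": 0, "users": set()})
    for key, (plays, seconds) in intermediate.items():
        cell = cells[tuple(key[i] for i in idx)]
        cell["plays"] += plays
        cell["seconds"] += seconds
        cell["users"].add(key[-1])  # user_id is the last element of the key
    # Replace each user set with its size to get the distinct-user indicator.
    return {
        group: {"plays": c["plays"], "seconds": c["seconds"], "users": len(c["users"])}
        for group, c in cells.items()
    }
```

Keeping `user_id` in the intermediate key is what lets stage 2 compute distinct-user counts for any dimension combination; pre-summed counters alone could not be deduplicated afterwards. Each rollup, for example `aggregate(inter, ("program",))` or `aggregate(inter, ())` for a grand total, reads the compact intermediate rows instead of rescanning raw logs, which is the kind of reuse the two-stage design aims for.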
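The abstract does not give the details of its Bloom-filter deduplication algorithm. As an illustration only, the sketch below shows one common way such a scheme can work, assuming the indicator is the number of distinct users over a long period and that user IDs are hash-partitioned so each partition owns a disjoint set of users and its own filter. The names `BloomFilter`, `partition_of`, and `LongPeriodDistinctCounter`, and the sizing parameters, are hypothetical; partitions are simulated in one Python process rather than as Spark tasks.

```python
import hashlib
import math
from typing import Iterable, Iterator


class BloomFilter:
    """Minimal Bloom filter over a bytearray (illustrative helper, not from the thesis)."""

    def __init__(self, expected_items: int, fp_rate: float = 0.01) -> None:
        if expected_items <= 0 or not 0 < fp_rate < 1:
            raise ValueError("expected_items must be > 0 and 0 < fp_rate < 1")
        # Standard sizing: m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions.
        self.m = max(8, math.ceil(-expected_items * math.log(fp_rate) / math.log(2) ** 2))
        self.k = max(1, round(self.m / expected_items * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add_if_absent(self, item: str) -> bool:
        """Set the item's bits; return True if any bit was unset (item definitely new)."""
        is_new = False
        for pos in self._positions(item):
            byte, bit = divmod(pos, 8)
            if not self.bits[byte] & (1 << bit):
                is_new = True
                self.bits[byte] |= 1 << bit
        return is_new


def partition_of(user_id: str, num_partitions: int) -> int:
    # Stable hash: a given user always maps to the same partition, so partitions hold disjoint users.
    digest = hashlib.blake2b(user_id.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big") % num_partitions


class LongPeriodDistinctCounter:
    """Running distinct-user count over many days, one Bloom filter per partition."""

    def __init__(self, num_partitions: int, expected_per_partition: int, fp_rate: float = 0.001) -> None:
        self.filters = [BloomFilter(expected_per_partition, fp_rate) for _ in range(num_partitions)]
        self.total = 0

    def ingest_day(self, user_ids: Iterable[str]) -> int:
        """Process one day's user IDs; return how many had not been seen before."""
        new_users = 0
        for uid in user_ids:
            bloom = self.filters[partition_of(uid, len(self.filters))]
            if bloom.add_if_absent(uid):
                new_users += 1
        self.total += new_users
        return new_users
```

Because a Bloom filter can return false positives but never false negatives, a user already seen is never counted twice, while a genuinely new user is occasionally treated as seen, so the running total can only be slightly low. The `fp_rate` parameter trades that error against memory (about 1.44 * log2(1/p) bits per expected user); memory does not grow with the number of days processed, although the error rate rises if a partition receives more users than it was sized for.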

Keywords/Search Tags:

Big data analysis, Data deduplication, System design


Related items

1	Research Of Data Deduplication In Data Disaster Tolerance Systems
2	HTDRDedu:The Design And Implementation Of A Distributed Backup Data Deduplication System
3	Research On Duplicate Data Detection In Data Deduplication
4	Study On Data Deduplication Technique For Data Backup Systems
5	Research On Efficient Data Deduplication Storage Technology Based On Intelligent Analysis Of Data Stream
6	Design And Analysis Of Intelligent Prefetching Algorithm For Data Deduplication
7	The Design And Implementation Of Data Deduplication With Garbage Data Removal Policy
8	Design And Research On A High-performance Deduplication System
9	Research And System Implementation Of Deduplication Algorithm Based On Distributed Storage
10	Big Data Encryption Algorithm Based On Data Deduplication Technology