
Design And Implementation Of Multi-dimensional Index-oriented Video On-demand Data Analysis Subsystem

Posted on: 2021-02-08
Degree: Master
Type: Thesis
Country: China
Candidate: S S Ma
GTID: 2428330632462632
Subject: Computer technology
Abstract/Summary:
With the development of communication technology and the spread of the Internet, more and more households watch TV programs on demand. Program producers and advertising companies are increasingly interested in users' video-on-demand data, which they use to adjust program structure and advertisement placement. Traditional video-on-demand data collection relies on relatively crude sample statistics, which are neither accurate nor timely. With the development of data science, full-sample analysis based on big data technology is gradually becoming the mainstream method for on-demand data analysis. It is therefore important to build a highly scalable and flexible on-demand data analysis system. Existing on-demand data analysis systems often rely on a data warehouse engine for data aggregation, and their analysis results are hard to reuse. Open-source data analysis tools such as Kylin offer flexible multidimensional data aggregation, but their time performance is poor and they cannot calculate long-term indicators.

After surveying mainstream big data analysis techniques and analyzing user requirements, this paper adopts the Spark framework for offline analysis of user records. The back end of the system uses a distributed architecture: Hive stores the user records, and an Oracle database stores the analysis results. The system uses an object-oriented modeling method to decompose the core functions into five modules and to design the corresponding interfaces. This paper divides on-demand data analysis into two stages: generating intermediate results that carry multi-dimensional information, and aggregating indicators from those intermediate results. The input, output, and execution process of each stage are designed in detail. This design makes result generation more flexible, reduces the resources wasted on repeated calculations, and makes the whole system more efficient. This paper also designs and implements a distributed data deduplication algorithm based on the Bloom filter, which calculates deduplicated indicators over long periods with a low memory footprint.

Through detailed requirements analysis and module design, this paper implements the video-on-demand data analysis subsystem. Experimental tests show that the system performs well in both time and space when analyzing long-term indicators, and that it scales well as indicators are added, which meets the goals set for the system.
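The two-stage split described in the abstract can be illustrated with a small sketch. The abstract does not give the thesis's record schema or its Spark jobs, so everything here is an assumption made for illustration: the record fields, the dimension names, and the functions `build_intermediate` and `aggregate` are hypothetical, and plain Python stands in for Spark. Stage 1 scans the raw records once and keeps one cell per combination of dimension values; stage 2 answers different indicator queries from those cells without rescanning the records.

```python
from collections import defaultdict

# Hypothetical on-demand records: (date, region, channel, user_id, watch_seconds).
RECORDS = [
    ("2021-01-01", "north", "movies", "u1", 1200),
    ("2021-01-01", "north", "movies", "u2", 600),
    ("2021-01-01", "south", "sports", "u1", 300),
    ("2021-01-02", "north", "sports", "u3", 900),
]

# Assumed dimension names, in the order they appear in each record.
DIMENSIONS = ("date", "region", "channel")


def build_intermediate(records):
    """Stage 1: aggregate raw records once at the finest granularity."""
    cells = defaultdict(lambda: {"plays": 0, "seconds": 0})
    for date, region, channel, _user, seconds in records:
        cell = cells[(date, region, channel)]
        cell["plays"] += 1
        cell["seconds"] += seconds
    return dict(cells)


def aggregate(intermediate, group_by):
    """Stage 2: roll the intermediate cells up along the chosen dimensions."""
    idx = [DIMENSIONS.index(name) for name in group_by]
    out = defaultdict(lambda: {"plays": 0, "seconds": 0})
    for key, cell in intermediate.items():
        sub_key = tuple(key[i] for i in idx)
        out[sub_key]["plays"] += cell["plays"]
        out[sub_key]["seconds"] += cell["seconds"]
    return dict(out)


intermediate = build_intermediate(RECORDS)          # computed once
by_region = aggregate(intermediate, ("region",))    # reused for one query
by_date = aggregate(intermediate, ("date",))        # reused for another
```

Additive indicators such as play counts and watch time roll up this way because partial sums can be summed again. Distinct-user counts cannot, since the same user may appear in several cells, which is why deduplicated indicators need separate treatment.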
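The abstract names a Bloom filter as the basis of the deduplication algorithm but does not describe the algorithm itself. The following is a minimal single-process sketch of the general technique, not the thesis's implementation: the class `BloomFilter`, the function `count_distinct`, the sizing parameters, and the sample user IDs are all assumptions. The filter keeps a fixed-size bit array instead of the full set of user IDs, so memory stays bounded however long the period is; the cost is that a false positive can make the count slightly too low, never too high.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, rare false positives."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        ln2 = math.log(2)
        self.num_bits = max(
            8, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of SHA-256.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        """Insert item; return True if it was definitely not present before."""
        is_new = False
        for pos in self._positions(item):
            byte, mask = pos >> 3, 1 << (pos & 7)
            if not self.bits[byte] & mask:
                is_new = True
                self.bits[byte] |= mask
        return is_new

    def __contains__(self, item):
        return all(self.bits[p >> 3] & (1 << (p & 7)) for p in self._positions(item))

    def merge(self, other):
        """Union with a filter built with identical parameters (bitwise OR)."""
        if (self.num_bits, self.num_hashes) != (other.num_bits, other.num_hashes):
            raise ValueError("filters must share num_bits and num_hashes")
        self.bits = bytearray(a | b for a, b in zip(self.bits, other.bits))


def count_distinct(batches, expected_items):
    """Approximate distinct count over many batches (for example, one per day)."""
    seen = BloomFilter(expected_items)
    count = sum(1 for batch in batches for item in batch if seen.add(item))
    return count, seen


# Hypothetical user IDs for three days; u1 and u2 repeat across days.
days = [["u1", "u2"], ["u2", "u3"], ["u1", "u4"]]
distinct_users, seen = count_distinct(days, expected_items=1000)
```

In a distributed setting, filters built on separate partitions with the same parameters can be combined with `merge`, since the union of two Bloom filters is the bitwise OR of their bit arrays. Summing per-partition counts would still double-count users who appear in more than one partition, so a real job would need to route each user ID to a single partition or count against a shared filter; how the thesis handles this is not stated in the abstract.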
Keywords/Search Tags:Big data analysis, Data deduplication, System design