Design and Implementation of a DMP System Based on Online Video Users' Data

Posted on: 2017-04-07 | Degree: Master | Type: Thesis
Country: China | Candidate: T Zhang | Full Text: PDF
GTID: 2308330485958269 | Subject: Software engineering
Abstract/Summary:
With the spread of personal computers and smartphones and the falling cost of network bandwidth, watching online video has become a common form of entertainment and learning in people's daily lives. Several well-known domestic online video sites receive on the order of one hundred million visits a day, which generates a large amount of user data. How to store this data and use it effectively to support the company's precision advertising, user statistics, data mining, and effect evaluation across different business scenarios is the problem that needs to be solved.

To address it, I implemented a DMP (data management platform) system at Youku, the company where I interned. Technology selection was driven by two main considerations. First, the raw data volume of the DMP is very large, so the required processing capacity is high, but the timeliness requirements for generating this data are low. Second, the DMP must expose a real-time query interface to meet the needs of external business, so operations on the generated result data require strong real-time computing support. Taking both points into account, we adopted the MapReduce framework and the Spark computing framework to cover both offline and real-time computing tasks. During my time at Youku, I was mainly involved in the requirements analysis, design, development, testing, and maintenance of the DMP system.
My work was as follows:
(1) Took part in and completed the system requirements analysis, covering both functional and non-functional requirements.
(2) Took part in and completed the system design, including the overall outline design and the system-level design of function modules such as data preprocessing, data merging, and crowd filtering and projection.
(3) Was responsible for the detailed design and implementation of several modules: the label system, the log analysis module, mining of user channel preferences, mining of users' Top 20 sub-channel preferences, mining of users' advertising keyword preferences, the data cleansing module (implemented as Hive UDF functions), the data merge module, the crowd filter management interface, and the crowd projection interface.
(4) Was responsible for writing test cases for a number of functional modules and carrying out their functional testing.
(5) Was responsible for system maintenance and updates, including writing automation scripts so that each data partition is updated regularly.

The main technologies used in the project were Hadoop's MapReduce framework and Spark SQL; the languages were Java, Hive, and shell scripts. Git was used for version control and Maven for project management. The system has gone live and runs stably. It has been integrated with several of the company's business scenarios and advertising products, and the feedback has been positive.
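As an illustration of the kind of preference mining described in item (3), the following is a minimal plain-Java sketch of a Top-N sub-channel ranking. It is not the thesis's implementation: the abstract does not say how preferences are scored, so ranking by raw view count is an assumption, and the names `SubchannelPreference`, `View`, and `topSubchannels` are hypothetical. In the system described, this logic would presumably run as a MapReduce job or a Hive/Spark SQL query over partitioned logs rather than in memory.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: rank each user's sub-channels by view count (assumed scoring).
final class SubchannelPreference {

    /** One parsed log record: a user watched a video in a sub-channel (assumed schema). */
    static final class View {
        final String userId;
        final String subchannel;

        View(String userId, String subchannel) {
            this.userId = userId;
            this.subchannel = subchannel;
        }
    }

    /**
     * Returns, for each user, up to n sub-channels ordered by view count
     * (descending), with ties broken alphabetically by sub-channel name.
     */
    static Map<String, List<String>> topSubchannels(List<View> views, int n) {
        // Counting step: views per (user, sub-channel), analogous to a map-side key.
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (View v : views) {
            counts.computeIfAbsent(v.userId, k -> new HashMap<>())
                  .merge(v.subchannel, 1, Integer::sum);
        }

        // Ranking step: sort each user's sub-channels and keep the first n.
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> user : counts.entrySet()) {
            List<Map.Entry<String, Integer>> ranked =
                new ArrayList<>(user.getValue().entrySet());
            ranked.sort((a, b) -> {
                int byCount = Integer.compare(b.getValue(), a.getValue());
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            });
            List<String> top = new ArrayList<>();
            for (int i = 0; i < Math.min(n, ranked.size()); i++) {
                top.add(ranked.get(i).getKey());
            }
            result.put(user.getKey(), top);
        }
        return result;
    }
}
```

Calling `SubchannelPreference.topSubchannels(views, 20)` would give the Top 20 list per user under these assumptions. The alphabetical tie-break is only there to make the output deterministic.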
Keywords/Search Tags: data management, MapReduce, ETL processing, URL parsing, string segmentation, user preferences, real-time computing interfaces