The Design And Implementation Of An Online Video Site Data Statistic System

Posted on: 2013-12-31
Degree: Master
Type: Thesis
Country: China
Candidate: J Wu
GTID: 2248330371988152
Subject: Software engineering
Abstract/Summary:
While running a website, operators need to keep abreast of its operating conditions in order to guide marketing strategy. The most effective way to understand the operating status of a site is to analyze several types of data, such as site visits and the number of visitors, which serve as guiding indicators. Online video sites need additional, more customized indicators.

Many third-party statistics agencies and tools are available, such as iResearch, Comscore and GA, which provide the most basic and general data. Executives, however, need customizable indicators that reflect the characteristics of their own site, rather than generalized, detailed data. In addition, third-party statistics are delayed, usually by 1-2 months.

This paper introduces a solution in which the website operators develop the data statistics and analysis system themselves. The technical department of the site uses operation logs as the initial data source. After cleaning, converting and extracting the data according to high-level business requirements, it builds a data warehouse and carries out data analysis and mining on top of it. From the development point of view, the solution uses a star model to construct the data warehouse around the analysis themes, then runs ETL on the logs according to the structure of the data warehouse. During ETL, the system discards redundant information, extracts and calculates the needed data from the logs, and finally loads the results into the Hive data warehouse. After ETL, HQL, the data query and calculation language provided by Hive, is used to complete the data analysis under the control of Oozie, a workflow control tool. The analysis results are then stored in databases and finally presented as reports, graphics, etc.

The system has been widely used within the company and has become the most direct and timely tool for executives. In addition, several systems have been developed on top of the data warehouse built by this solution, such as a recommendation system, a search system and a ranking system, which have brought at least 50 million W per day.
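
The ETL and analysis steps described in the abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation, which runs on Hive, HQL and Oozie. The log format, the field names (date_key, user_key, video_key) and the "play" event filter are all assumptions made for illustration only.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical log format (an assumption, not taken from the thesis):
# "<ISO timestamp>\t<user id>\t<video id>\t<event>"
RAW_LOGS = [
    "2013-05-01T10:00:00\tu1\tv9\tplay",
    "2013-05-01T10:05:00\tu2\tv9\tplay",
    "2013-05-01T11:00:00\tu1\tv9\tplay",
    "garbage line without tabs",
    "2013-05-02T09:00:00\tu1\tv7\tplay",
    "2013-05-02T09:01:00\tu1\tv7\theartbeat",
]


def extract(line):
    """Clean one log line and turn it into a fact row, or None if unusable."""
    parts = line.split("\t")
    if len(parts) != 4:
        return None  # malformed line: drop during cleaning
    ts, user_id, video_id, event = parts
    try:
        day = datetime.strptime(ts, "%Y-%m-%dT%H:%M:%S").date().isoformat()
    except ValueError:
        return None  # unparseable timestamp: drop
    if event != "play":
        return None  # redundant for this analysis theme: drop
    # Star-model style fact row: only keys pointing at the dimensions.
    return {"date_key": day, "user_key": user_id, "video_key": video_id}


def load(lines):
    """Run extraction over all lines and keep only the valid fact rows."""
    return [row for row in map(extract, lines) if row is not None]


def daily_video_stats(fact_rows):
    """Count plays and distinct visitors per (day, video)."""
    plays = defaultdict(int)
    visitors = defaultdict(set)
    for row in fact_rows:
        key = (row["date_key"], row["video_key"])
        plays[key] += 1
        visitors[key].add(row["user_key"])
    return {
        key: {"plays": plays[key], "visitors": len(visitors[key])}
        for key in plays
    }


FACT_PLAYS = load(RAW_LOGS)
STATS = daily_video_stats(FACT_PLAYS)
```

In a Hive-based setup, the aggregation step would more likely be written as a grouped query over the fact table and scheduled by the workflow tool. The in-memory version above only shows the shape of the computation.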
Keywords/Search Tags: Website operations, Data Analysis, Data Warehouse, Hadoop, Hive, Oozie