
The Hadoop-based Statistics Of Mass Data On Huge Website And Its Application

Posted on: 2013-03-04
Degree: Master
Type: Thesis
Country: China
Candidate: J H Wu
GTID: 2248330371488337
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, people make more and more demands of it. These demands differ from person to person and often reflect particular preferences. The website discussed here stores log data on user behavior, and the volume of that data is massive. Processing this mass data is the best way to analyze user behavior characteristics, derive user attributes, and test the results of advertising.

There has been a great deal of research on processing mass data, and several open-source software frameworks have been developed. The most popular is the Hadoop distributed software framework, which lets developers handle mass data efficiently. As a result, many developers have turned their attention to Hadoop.

This user behavior analysis project aims to analyze user behavior characteristics by processing the mass data and mining user behavior from it. In this project, we use Hadoop, the most popular distributed software framework, together with Hive to process the data. The project is divided into the following parts: distinguishing people and classifying them, overall data statistics, advertising data statistics, cookie statistics, brand probe, and whole-website path statistics.

This paper presents the design and implementation of these parts and analyzes some of them. It first introduces the related technology, Hadoop, and then describes our data in some detail. It then explains the function of each part and how the Hadoop framework is used to handle the mass data. Finally, it briefly summarizes the work and points out the shortcomings of the project and the areas in which Hadoop can be optimized.
Keywords/Search Tags: Mass data, User Behavior, Statistics, Hadoop, Hive