
Research Of User Behavior Model Based On Hive Data Warehouse

Posted on: 2016-10-04
Degree: Master
Type: Thesis
Country: China
Candidate: J T Pan
Full Text: PDF
GTID: 2298330467491890
Subject: Signal and Information Processing
Abstract/Summary:
As information technology develops, interaction between people on the Internet is becoming more and more common, and the feedback mechanism of log servers makes it easy to track user behavior. This interaction generates massive amounts of data that contain a great deal of valuable information. Faced with rapidly growing PB-level data, effective collection, storage, distributed computing and data mining are needed to maximize the value of big data.

This thesis grows out of a cooperation project between the laboratory and an Internet company. Building on a data warehouse and the Hadoop computing platform, it studies user behavior with an improved K-means clustering algorithm. The work includes the following:

(1) It introduces the Hadoop distributed system infrastructure and the data warehouse system. With HBase for distributed storage and MapReduce for distributed computing, massive data can be processed and analyzed efficiently. On this basis, the thesis analyzes the existing music data warehouse, including its architecture, ETL processes, subject division and dimension table structures, as well as the music metadata management system.

(2) It introduces and compares various types of cluster analysis algorithms. Considering the data volume and timeliness characteristics of the Hive music data warehouse, the thesis chooses the K-means algorithm and optimizes it in three respects: traffic cleaning, initial selection and outlier removal. Performance is analyzed and the algorithm assessed on a Hadoop cluster using two indicators, average running time and number of iterations. After optimization, algorithm efficiency improves by about 45%, which makes it usable in actual work.

(3) It performs clustering analysis of user behavior through the multidimensional model of the data warehouse, helping analysts obtain more accurate and effective product and user evaluations at the user level. The analysis covers three areas: overall user quality assessment, secondary cluster analysis of active users, and the historical trajectory of user quality. Mining the user clusters in these areas supports product operation decisions, helps the music business grasp changes in its users promptly and comprehensively, and allows more targeted, personalized services for different types of music users, ultimately increasing the music business's profits and market share.
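
The abstract names the K-means optimizations but does not describe them, so the following is only a minimal single-machine sketch of two of them under stated assumptions. It assumes that "initial selection" can be approximated by farthest-point seeding and that "outlier removal" can be approximated by dropping points whose distance to the global mean exceeds the mean distance by more than z standard deviations. Both rules, and the names remove_outliers, init_centers and kmeans, are illustrative and are not taken from the thesis. Traffic cleaning is left out because the abstract gives no details, and the Hadoop/Hive execution layer is not modeled.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))


def remove_outliers(points, z=2.0):
    """Assumed rule: drop points farther from the global mean
    than (mean distance + z * standard deviation of distances)."""
    center = mean(points)
    d = [dist(p, center) for p in points]
    mu = sum(d) / len(d)
    sd = math.sqrt(sum((x - mu) ** 2 for x in d) / len(d))
    return [p for p, x in zip(points, d) if x <= mu + z * sd]


def init_centers(points, k, rng):
    """Assumed rule: farthest-point seeding. The first center is random;
    each later center is the point farthest from all chosen centers."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(dist(p, c) for c in centers))
        )
    return centers


def kmeans(points, k, max_iter=100, seed=0):
    """Plain Lloyd iterations; returns centers, clusters, iteration count."""
    rng = random.Random(seed)
    centers = init_centers(points, k, rng)
    clusters = []
    iteration = 0
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            mean(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # centers unchanged: converged
            break
        centers = new_centers
    return centers, clusters, iteration


# Hypothetical toy data: two tight groups plus one extreme point.
data = [(0, 0), (0, 1), (1, 0), (1, 1),
        (10, 10), (10, 11), (11, 10), (11, 11),
        (100, 100)]
cleaned = remove_outliers(data)
centers, clusters, iterations = kmeans(cleaned, k=2)
```

The returned iteration count corresponds to one of the two indicators the abstract mentions; running time could be measured around the kmeans call with time.perf_counter.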
Keywords/Search Tags: Web log mining, data warehouse, multidimensional music user model, K-means clustering algorithm