
Research And Implementation Of Classification Algorithm Based On User's Rating Behavior

Posted on: 2017-02-19
Degree: Master
Type: Thesis
Country: China
Candidate: Y L Xu
GTID: 2428330596957415
Subject: Computer Science and Technology
Abstract/Summary:
With the development of Internet technology, IPTV (Internet Protocol Television) has flourished, and massive amounts of IPTV user behavior data have been generated. Mining this viewing behavior data helps divide users into distinct groups and yields more targeted and accurate analysis results. This promotes the sound and rapid development of IPTV and has important practical significance and broad application prospects. With the arrival of big data technology and cloud computing, it has become possible to extract knowledge accurately and efficiently from massive IPTV user viewing behavior data. This paper designs a high-performance big data processing platform to improve the computing performance of the classical random forest classification algorithm in a big data environment. It emphasizes the characteristics of IPTV user viewing behavior data and the preprocessing flow, and studies the data from a time perspective. The paper focuses on the following points:
1) To handle the massive IPTV user viewing behavior data, a high-performance parallel big data computing platform is built on Spark, with Hadoop HDFS as the storage layer and Hadoop YARN as the core of resource management.
2) On massive data, discretizing continuous attributes in the classical random forest algorithm is computationally expensive. To address this, the paper proposes a random forest algorithm based on the Fayyad boundary point decision principle (FayyadRandomForest, FRF). This reduces the time complexity of discretizing continuous-valued attributes and improves the efficiency of random forest construction. FRF is then deployed and verified on the big data platform.
3) The characteristics of IPTV user viewing behavior data are analyzed in detail, and preprocessing principles and an implementation process for the data are proposed. The paper then describes how the data behave in a real scenario. Finally, using the big data platform and the improved random forest algorithm (FRF), users' TV viewing frequency is classified from their viewing behavior data.
Keywords/Search Tags:IPTV, User Viewing Behavior, Spark, Fayyad Boundary Point Decision Principle, Random Forest Algorithm
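
The abstract names the Fayyad boundary point decision principle but does not give the FRF algorithm itself. The following is only a minimal Python sketch of the general principle, not the thesis's implementation. When a continuous attribute is split by class entropy, the best threshold always lies between two adjacent values whose class labels differ, so only those boundary points need to be evaluated. The function names, the hour-of-day attribute, and the "low"/"high" labels are hypothetical and chosen for illustration.

```python
from collections import Counter, defaultdict
from math import log2


def boundary_cut_points(values, labels):
    """Return midpoints between adjacent distinct values that form a class boundary."""
    classes_at = defaultdict(set)
    for v, y in zip(values, labels):
        classes_at[v].add(y)
    distinct = sorted(classes_at)
    cuts = []
    for lo, hi in zip(distinct, distinct[1:]):
        # Skip the gap only when both neighbours are pure and share one class.
        if len(classes_at[lo] | classes_at[hi]) > 1:
            cuts.append((lo + hi) / 2)
    return cuts


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())


def best_split(values, labels):
    """Return (threshold, weighted entropy) of the best boundary cut, or None."""
    n = len(values)
    best = None
    for t in boundary_cut_points(values, labels):
        left = [y for v, y in zip(values, labels) if v <= t]
        right = [y for v, y in zip(values, labels) if v > t]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if best is None or score < best[1]:
            best = (t, score)
    return best


# Hypothetical data: viewing hour of day and a viewing-frequency class.
hours = [1, 2, 3, 4, 5, 6]
freq = ["low", "low", "low", "high", "high", "low"]
candidates = boundary_cut_points(hours, freq)
threshold, score = best_split(hours, freq)
```

In this toy example only two of the five possible midpoints are class boundaries, so the entropy is computed twice instead of five times. The saving grows with the number of distinct attribute values, which is presumably the source of the reduced discretization cost the abstract reports.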