
Research on a Recommendation Algorithm Based on Random Forests and the Boosting Idea

Posted on: 2016-06-15
Degree: Master
Type: Thesis
Country: China
Candidate: Z H Jia
Full Text: PDF
GTID: 2308330464953729
Subject: Computer application technology
Abstract/Summary:
The ultimate goal of a recommendation algorithm is to connect users with merchandise so that consumers can find what they truly need, and producers their target audience, amid "information overload", thereby greatly increasing the effective use of information. The core of personalized recommendation research lies in recommendation ability: how to choose an appropriate algorithm that improves the accuracy of the recommended information, improves the fit between that information and user interests, raises user trust in the algorithm and the element of surprise in its results, and improves algorithmic efficiency. To overcome these bottlenecks, researchers continue to seek newer and more efficient recommendation algorithms.
This study mainly targets big data in e-commerce, where the user-item matrix grows to enormous proportions, explicit feedback between users and items is largely missing, and implicit feedback is abundant. The questions are how to make recommendations from a large amount of sparse data and how to run the chosen core algorithm as efficiently as possible, so that, while still meeting users' needs on the basis of existing personalized recommendation algorithms, a recommendation algorithm with higher efficiency and accuracy can be found. With these objectives, we study fusion algorithms for recommender systems and focus on a proposed fusion of random forests and the boosting idea, using it to recast the recommendation problem.
In building the model's feature set, the considerable effort previously spent on feature extraction and selection is shifted to feature construction: researchers can concentrate on mining features rather than on feature selection, which greatly reduces the feature selection workload.
We first construct features from three aspects: the user, the item, and user-item interactions. In constructing the model features we apply the law of forgetting over time from sociology, using temporal context as an important basis for feature construction, and we continue to examine the effect of the time factor on the experiments through temporal smoothing. To build the feature set, we use MapReduce programs on Hadoop to handle computation over data at the scale of millions of records.
By studying the decision tree model, we analyze the shortcomings of a single tree, which copes only with smaller-scale data, and analyze random forests and other tree-ensemble algorithms together with the boosting idea of fitting residual errors. On this basis the model works as follows. The feature set is used by both the random forest and the boosting algorithm for prediction. Because random forests give unreliable results when features are highly correlated, the random forest is applied to the features with low correlation, and sliding the time-smoothing window yields several result sets. The boosting trees handle the highly correlated feature set, again giving several time-smoothed result sets. A linear regression model then fuses these result sets linearly to produce the final recommendation. Testing shows that the model is not only computationally feasible in a big data environment but also improves accuracy. Local experiments used small-scale Alibaba data from April to July; the algorithm was finally evaluated on the large-scale data of the Alibaba competition and achieved good results.
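The time-forgetting feature described above can be illustrated with a small sketch. The abstract does not give the form of the forgetting law, so the exponential decay, the `half_life_days` parameter, and the function name `decayed_interaction_counts` are all assumptions made for illustration, not the thesis's actual formula.

```python
import math
from collections import defaultdict


def decayed_interaction_counts(events, reference_day, half_life_days=7.0):
    """Sum user-item interactions, down-weighting older ones.

    events: iterable of (user, item, day) tuples, where day is a number.
    The exponential form and the half-life are assumptions; the source
    only says a forgetting law over time is used.
    """
    decay_rate = math.log(2) / half_life_days
    scores = defaultdict(float)
    for user, item, day in events:
        age = reference_day - day
        if age < 0:
            continue  # ignore events later than the reference day
        scores[(user, item)] += math.exp(-decay_rate * age)
    return dict(scores)


# Hypothetical toy log: two interactions for (u1, i1), one for (u2, i3).
events = [("u1", "i1", 0), ("u1", "i1", 7), ("u2", "i3", 14)]
features = decayed_interaction_counts(events, reference_day=14)
```

With a 7-day half-life, an interaction 7 days old counts half as much as one on the reference day. Moving `reference_day` forward gives the kind of sliding, time-smoothed feature sets the abstract mentions.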
The main work of this thesis is as follows:
(1) Existing recommender system theory is studied. Three classes of existing algorithms, namely neighborhood-based, latent factor model, and graph-based recommendation algorithms, are studied systematically, and the advantages and disadvantages of each are analyzed from its underlying principle. In big-data e-commerce scenarios, however, the user-item matrix is too large, the data dimensionality is too high, and explicit and implicit feedback are imbalanced, so these algorithms become too complex to compute. We therefore use machine learning theory to recast the recommendation problem and apply machine learning to large-scale data to implement recommendation.
(2) Algorithm fusion and large-scale distributed algorithms are important research focuses for recommender systems. For the large-scale data of an e-commerce platform, the modeling process applies parallel data processing: MapReduce programs on Hadoop aggregate and sample the data and construct features to obtain the model's feature set. The Hadoop computing interface is provided by the Tmall ODPS platform, and MapReduce is used to implement the algorithm.
(3) Feature extraction has always been tedious work in recommender systems. Drawing on the merits of tree-model algorithms, this thesis turns the recommendation problem into the study of user behavior features, greatly reducing the tedium of feature extraction, and mines user behavior in depth. The feature set is built from user, item, and interaction features and then combined with collaborative filtering and graph algorithms, whose results are treated as an additional group of features for algorithm fusion.
By studying the law of forgetting over time, we obtain the law by which forgetting influences human behavior, and from it the time distribution.
(4) Fusion algorithms. Starting from the fusion of individual trees into tree models, we implement the random forest algorithm to experience the power of fusion. Building on the study of tree models, random forests and boosted regression trees are studied in depth. To address the defect that random forests give unreliable training results on correlated feature sets, we propose a new way of combining random forests with the boosting idea: the feature set used to train the random forest is chosen so that its features are as weakly correlated as possible; its results and the boosting tree results are smoothed over time; the resulting values are fused by linear regression; and the final recommendations are chosen by TopN selection. Validation on local small-scale Alibaba data shows improved results, and when the algorithm was used in the Alibaba big data competition the model also achieved good recommendation performance, finishing 41st among 7274 teams. Leaving the algorithm's running cycle aside, it also shows some improvement over Alibaba's existing recommendation algorithm...
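The final fusion and TopN step can be sketched as follows. This is a minimal illustration only: it assumes two score lists have already been produced by a random forest and a boosting model, fits fusion weights by ordinary least squares without an intercept, and uses hypothetical names (`fit_linear_fusion`, `top_n`) and toy numbers that do not come from the thesis.

```python
def fit_linear_fusion(rf_scores, boost_scores, labels):
    """Least-squares weights (w_rf, w_boost), no intercept.

    Solves the 2x2 normal equations directly. The two-model, no-intercept
    setup is an assumption made to keep the sketch small.
    """
    a = sum(r * r for r in rf_scores)
    b = sum(r * g for r, g in zip(rf_scores, boost_scores))
    d = sum(g * g for g in boost_scores)
    p = sum(r * y for r, y in zip(rf_scores, labels))
    q = sum(g * y for g, y in zip(boost_scores, labels))
    det = a * d - b * b
    if abs(det) < 1e-12:
        raise ValueError("score vectors are collinear; weights are not unique")
    return (p * d - b * q) / det, (a * q - b * p) / det


def top_n(candidates, rf_scores, boost_scores, weights, n):
    """Rank candidates by the fused score and keep the best n."""
    w_rf, w_boost = weights
    fused = [
        (w_rf * r + w_boost * g, c)
        for c, r, g in zip(candidates, rf_scores, boost_scores)
    ]
    fused.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in fused[:n]]


# Hypothetical held-out scores and labels used to fit the fusion weights.
weights = fit_linear_fusion([1.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0])

# Hypothetical candidate items with their two model scores.
recommended = top_n(["a", "b", "c"], [0.2, 0.9, 0.5], [0.8, 0.1, 0.3], weights, n=2)
```

In a fuller pipeline each time-smoothed result set would contribute its own column to the regression; two columns are used here only to keep the normal equations readable.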
Keywords/Search Tags:Personalized recommendation, Random forests, Boosting thoughts, Decision tree, Recommendation system