Study Of LambdaMart Algorithm Based On Spark

Posted on:2018-09-24

Degree:Master

Type:Thesis

Country:China

Candidate:J L Liang

Full Text:PDF

GTID:2348330518495450

Subject:Information and Communication Engineering

Abstract/Summary:


With the rapid development of the Internet, and of the mobile Internet in particular, it has gradually penetrated every aspect of our lives, influencing and changing how we live day to day. However, as the amount of data generated by the Internet has exploded, we need to spend more and more effort to find the information we want. Internet service providers extract the information we most want from this massive volume of data through search engines, recommender systems and similar tools. How to pick out the content a user is interested in from big data is therefore the key to bridging the gap between users and information, and ranking information according to the user's intent is an important solution.
Conventional ranking by relevance and by importance can handle only a small number of simple features and cannot capture the relationships between features when the information is complex. Learning to rank can fully exploit the associations among the many features that represent complex information. This thesis surveys learning-to-rank methods and finds that the LambdaMART modeling method captures the patterns of the ranking scenario more accurately and trains more efficiently; it also studies the overfitting problem in machine learning methods. In addition, it studies distributed systems in depth and finds through experiments that a boosted decision tree model built on Spark is more efficient. Distributed computing on Spark makes it possible to train the LambdaMART model more efficiently, update the model faster, and capture changes in the data more accurately and promptly.
The thesis also proposes a binning algorithm to compress features. Based on the characteristics of LambdaMART, it further proposes new ways to search for the best feature for each tree node and to compute the lambda values using Spark distributed computing. Moreover, it proposes training on a sampled subset of the features and applying a per-feature penalty factor, both to prevent overfitting without reducing the efficiency of model training.
Finally, the algorithm is designed and implemented, and experiments demonstrate the validity and efficiency of the model for solving the ranking problem. This study is of great significance for solving the ranking problem in big data scenarios.
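
To make the "lambda" mentioned in the abstract concrete, the following is a minimal single-machine sketch of the standard LambdaMART lambda gradients for one query, using NDCG as the ranking metric. It is not the thesis's distributed method, which the abstract does not describe in detail. The function name `query_lambdas`, the `sigma` parameter and the gain `2^label - 1` are assumptions chosen for illustration.

```python
import math


def query_lambdas(scores, labels, sigma=1.0):
    """Return (lambdas, weights) for the documents of a single query.

    Hypothetical helper for illustration: scores are current model outputs,
    labels are integer relevance grades for the same documents.
    """
    n = len(scores)

    # 1-based rank of each document under the current scores, highest first.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, i in enumerate(order, start=1):
        rank[i] = pos

    gain = [2.0 ** label - 1.0 for label in labels]
    discount = [1.0 / math.log2(1.0 + rank[i]) for i in range(n)]

    # Ideal DCG: gains sorted in descending order, used to normalise to NDCG.
    ideal = sorted(gain, reverse=True)
    idcg = sum(g / math.log2(2.0 + k) for k, g in enumerate(ideal))

    lambdas = [0.0] * n
    weights = [0.0] * n
    if idcg == 0.0:
        # No relevant documents: no pair carries a preference.
        return lambdas, weights

    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            # |change in NDCG| if documents i and j swapped positions.
            delta = abs((gain[i] - gain[j]) * (discount[i] - discount[j])) / idcg
            lam = sigma * rho * delta
            lambdas[i] += lam   # push the more relevant document up
            lambdas[j] -= lam   # push the less relevant document down
            w = sigma * sigma * rho * (1.0 - rho) * delta
            weights[i] += w     # second-order terms used for leaf values
            weights[j] += w
    return lambdas, weights


# Example: the most relevant document (label 2) currently has the lowest score.
lambdas, weights = query_lambdas([0.2, 1.5, 0.7], [2, 0, 1])
print(lambdas, weights)
```

Each query's lambdas depend only on that query's own documents, so the computation can run independently per query group. This property is what makes the step a natural fit for a partitioned framework such as Spark.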

Keywords/Search Tags:

learning to rank, distributed computing, lambdamart, spark


Related items

1	Research On Distributed Manifold Learning Algorithm Based On Spark
2	A High-Performance Chinese Distributed Computing System (CH-Spark)
3	A System For Distributed MD Data Analysis Based On Spark
4	Optimization And Parallelization Of The GBRT Algorithm For Learning To Rank
5	Research And Implementation Of Distributed Machine Learning Algorithms Orchestration System For Big Data Processing
6	Research And Implementation Of Spark Application Performance Prediction Model Based On Machine Learning
7	Research On Learning To Rank Based On LambdaXGB
8	Design And Implementation Of A Distributed Hybrid Index Structure Based On Spark
9	Research And Implement Of Distributed Deep Learning System Based On Spark
10	Research And Realization Of Clustering Algorithm Based On Spark Platform