
Study Of LambdaMart Algorithm Based On Spark

Posted on: 2018-09-24
Degree: Master
Type: Thesis
Country: China
Candidate: J L Liang
Full Text: PDF
GTID: 2348330518495450
Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the Internet, and of the mobile Internet in particular, it has gradually penetrated every aspect of our lives, influencing and changing how we live. However, as the amount of data generated on the Internet has exploded, finding the information we want costs more and more effort. Internet service providers extract the information users most want from this massive data through search engines, recommendation systems, and similar tools. Selecting the content a user is interested in from big data is therefore the key to bridging the gap between users and information, and ranking information according to the user's intent is an important solution. Conventional ranking by relevance or by importance can handle only a small number of simple features and cannot mine the relationships between features when the information is complex. Learning to rank can fully exploit the associations among the many features that represent complex information.
This thesis explores learning-to-rank methods and finds that the LambdaMART modeling method captures the patterns of ranking scenarios more accurately and trains more efficiently; it also studies the overfitting problem in machine learning methods. In addition, the thesis studies distributed systems in depth and finds through experiments that a boosted decision tree model based on Spark is more efficient. With Spark's distributed computing, the LambdaMART model can be trained more efficiently and updated faster, so that changes in the data can be captured more accurately and promptly. The thesis also proposes a bin algorithm to compress features. Based on the characteristics of LambdaMART, it proposes new ways to search for the best split feature at each tree node and to calculate the lambda values using Spark's distributed computing. Moreover, it proposes training on a sampled subset of features and applying a feature penalty factor to prevent overfitting without reducing the efficiency of model training.
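The abstract does not describe how its bin algorithm works. As a rough illustration of the general idea of compressing features into bins, the following sketch maps raw feature values to small integer bin indices using equal-frequency cut points. The function names `fit_bin_edges` and `to_bin`, the choice of equal-frequency binning, and the sample values are all assumptions made for illustration, not the thesis's method.

```python
from bisect import bisect_right


def fit_bin_edges(values, num_bins):
    """Return ascending cut points taken at equal-frequency ranks.

    At most num_bins - 1 cut points are returned; duplicates are dropped,
    so heavily repeated values can yield fewer bins.
    """
    ordered = sorted(values)
    n = len(ordered)
    edges = []
    for k in range(1, num_bins):
        cut = ordered[min(n - 1, (k * n) // num_bins)]
        if not edges or cut > edges[-1]:
            edges.append(cut)
    return edges


def to_bin(value, edges):
    """Map a raw value to a bin index in the range [0, len(edges)]."""
    return bisect_right(edges, value)


# Hypothetical feature column for eight documents.
values = [0.1, 0.4, 0.35, 0.8, 0.05, 0.9, 0.6, 0.2]
edges = fit_bin_edges(values, num_bins=4)
binned = [to_bin(v, edges) for v in values]
```

Once every feature is stored as a bin index, a tree learner only has to evaluate one candidate split per bin boundary instead of one per distinct raw value, which is the usual motivation for binning in histogram-based boosted trees.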
Finally, the algorithm is designed and implemented, and experiments demonstrate the validity and efficiency of the model for the ranking problem. This study is significant for solving ranking problems in big data scenarios.
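The lambda calculation mentioned in the abstract is not spelled out there. The sketch below follows the commonly published LambdaMART formulation with NDCG as the target metric, computing per-document lambdas and second-order weights for a single query. The function name `query_lambdas`, the `sigma` parameter, and the sample scores and labels are assumptions for illustration. The sign convention used here is that a positive lambda means the document's score should be pushed up.

```python
import math


def query_lambdas(scores, labels, sigma=1.0):
    """Compute LambdaMART-style lambdas and weights for one query."""
    n = len(scores)
    # Current rank position of each document (0 = top) under the model scores.
    order = sorted(range(n), key=lambda i: -scores[i])
    rank = [0] * n
    for pos, i in enumerate(order):
        rank[i] = pos

    gain = [2 ** label - 1 for label in labels]
    disc = [1.0 / math.log2(rank[i] + 2) for i in range(n)]
    ideal = sorted(gain, reverse=True)
    idcg = sum(g / math.log2(p + 2) for p, g in enumerate(ideal))

    lambdas = [0.0] * n
    weights = [0.0] * n
    if idcg == 0:
        return lambdas, weights  # no relevant documents, nothing to learn

    for i in range(n):
        for j in range(n):
            if labels[i] <= labels[j]:
                continue  # only pairs where i is more relevant than j
            rho = 1.0 / (1.0 + math.exp(sigma * (scores[i] - scores[j])))
            # |change in NDCG| if documents i and j swapped rank positions.
            delta = abs((gain[i] - gain[j]) * (disc[i] - disc[j])) / idcg
            lambdas[i] += sigma * rho * delta
            lambdas[j] -= sigma * rho * delta
            w = sigma * sigma * delta * rho * (1.0 - rho)
            weights[i] += w
            weights[j] += w
    return lambdas, weights


# Hypothetical query: the most relevant document currently has the lowest score.
lambdas, weights = query_lambdas(scores=[0.2, 1.5, 0.7], labels=[2, 0, 1])
```

Because each query's lambdas depend only on that query's own documents, this step parallelizes naturally by grouping training rows by query, which is one plausible way to distribute it on Spark; how the thesis actually partitions the work is not stated in the abstract.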
Keywords/Search Tags:learning to rank, distributed computing, lambdamart, spark