
The Design And Implementation Of Ad Feature Extractor System Based On Stream Processing

Posted on: 2016-02-27
Degree: Master
Type: Thesis
Country: China
Candidate: X Zhong
GTID: 2308330461955244
Subject: Software engineering
Abstract/Summary:
With the rapid development of the Internet, the volume of available information has grown explosively. Getting information has become more convenient, and the demands on its timeliness keep rising. Most search engines, such as Google, Bing and Baidu, return structured web results on the results page in response to a query and insert ads into it based on the pay-per-click model. To show the most relevant ads in the best positions on the page, the probability that an ad will be clicked in a given context must be estimated dynamically with machine learning algorithms. The context may include ad weights, user preferences, query history, click history and other information. A major search engine may handle thousands of queries per second, and each results page may contain multiple ads. To process user feedback in a timely manner, a low-latency, scalable and highly reliable processing engine is needed. This paper describes such a real-time streaming ad feature extractor system. The system extracts ad features from the ad show logs and click logs that the search engine generates in real time, so that the machine learning algorithms in the CTR Prediction Model can be trained continuously and the estimated probability that an ad will be clicked can be updated. The search engine uses this real-time data to decide which ads to show to users, which benefits users, advertisers and Baidu alike. The ad feature extractor system is built on Task Manager, Baidu's stream processing framework, combined with related technologies such as HDFS and MapReduce.
In Baidu's Phoenix Nest advertising system, this ad feature extractor system has shortened the feedback of ad show and click data to the CTR Prediction Model to the minute level. The system processes tens of terabytes of log data daily, and the feature data reaches several hundred gigabytes. This paper first introduces the project background, then explains the related technical background and describes the system requirements, overall design and module design. Several key modules of the system are then described in detail. Finally, the paper summarizes the project and discusses the next stage of work.
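The abstract does not give the record layout or the Task Manager API, so the following is only a minimal sketch of the general idea of deriving click-through features from show and click logs. It assumes hypothetical record fields (`show_id`, `ad_id`, `query`, `position`) and a hypothetical function name `extract_features`, and it processes one small in-memory window of records rather than a real stream.

```python
from collections import defaultdict


def extract_features(show_log, click_log):
    """Join show and click records on show_id and aggregate per-feature counts.

    All field names are assumptions made for illustration; they are not
    taken from the thesis or from Baidu's Task Manager framework.
    """
    # Set of impressions that received a click in this window.
    clicked = {record["show_id"] for record in click_log}

    # feature -> [number of shows, number of clicks]
    stats = defaultdict(lambda: [0, 0])
    for show in show_log:
        features = (
            ("ad", show["ad_id"]),
            ("query", show["query"]),
            ("pos", show["position"]),
        )
        for feature in features:
            stats[feature][0] += 1
            if show["show_id"] in clicked:
                stats[feature][1] += 1

    # Empirical click-through rate per feature for this window.
    return {
        feature: {"shows": shows, "clicks": clicks, "ctr": clicks / shows}
        for feature, (shows, clicks) in stats.items()
    }


# One small window of hypothetical log records.
shows = [
    {"show_id": 1, "ad_id": "a1", "query": "shoes", "position": 1},
    {"show_id": 2, "ad_id": "a1", "query": "boots", "position": 2},
    {"show_id": 3, "ad_id": "a2", "query": "shoes", "position": 1},
]
clicks = [{"show_id": 1}]

features = extract_features(shows, clicks)
print(features[("ad", "a1")])
```

In a streaming setting, a function like this would be applied to each short time window of incoming log records, and the resulting counts would be passed on to model training. How the thesis system actually partitions, joins and stores the data is not stated in the abstract.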
Keywords/Search Tags:Machine Learning, Stream Processing, Task Manager, HDFS, MapReduce