The Design And Implementation Of Ad Feature Extractor System Based On Stream Processing

Posted on:2016-02-27

Degree:Master

Type:Thesis

Country:China

Candidate:X Zhong

Full Text:PDF

GTID:2308330461955244

Subject:Software engineering

Abstract/Summary:

With the rapid development of the Internet, the volume of information has grown explosively. Getting information has become more convenient, and the demand for timely information keeps rising. Most search engines, such as Google, Bing, and Baidu, return structured web results on the results page in response to a query and also insert ads based on the pay-per-click model. To show the most relevant ads in the best positions on the page, the probability that an ad is clicked in a given context must be estimated dynamically with machine learning algorithms. The context may include ad weights, user preferences, query history, click history, and other information. A major search engine may handle thousands of queries per second, and each page may contain multiple ads. Processing user feedback in a timely manner therefore requires a low-latency, scalable, and highly reliable processing engine.
This thesis describes such a system: a real-time streaming ad feature extractor. The system extracts ad features from the ad show logs and click logs that the search engine generates in real time. These features are used to continuously train the machine learning algorithms in the CTR Prediction Model, which then updates the probability that an ad will be clicked. The search engine uses this real-time data to decide which ads to show to users, producing a three-way win for users, advertisers, and Baidu. The ad feature extractor system is built on Baidu's stream processing framework, Task Manager, combined with related technologies such as HDFS and MapReduce.
With this system deployed in the Baidu Phoenix Nest advertising system, the delay in feeding ad show and click data back to the CTR Prediction Model was shortened to the minute level. The system processes tens of terabytes of log data per day and produces several hundred gigabytes of feature data. The thesis first introduces the project background, then presents the related technical background and describes the system requirements, overall design, and module design. Several key modules of the system are then described in detail. Finally, the thesis summarizes the project and discusses the next stage of work.
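
The abstract does not give implementation details, so the following is only a minimal sketch of the general idea of turning show and click log events into minute-level features. It does not use Task Manager, HDFS, or MapReduce, and every name in it (`Event`, `MinuteFeatureAggregator`, the `kind`, `ad_id`, and `timestamp` fields) is a hypothetical assumption, not part of the thesis. It counts shows and clicks per ad in one-minute windows and derives an observed click-through rate.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical log record; field names are assumptions for illustration."""
    kind: str       # "show" or "click"
    ad_id: str
    timestamp: int  # seconds since epoch


class MinuteFeatureAggregator:
    """Accumulates show and click counts per (ad_id, minute) window."""

    def __init__(self):
        # Maps (ad_id, minute index) -> [shows, clicks].
        self._counts = defaultdict(lambda: [0, 0])

    def consume(self, event):
        """Folds one event from the stream into its minute window."""
        key = (event.ad_id, event.timestamp // 60)
        if event.kind == "show":
            self._counts[key][0] += 1
        elif event.kind == "click":
            self._counts[key][1] += 1

    def features(self):
        """Returns {(ad_id, minute): (shows, clicks, observed_ctr)}."""
        out = {}
        for key, (shows, clicks) in self._counts.items():
            ctr = clicks / shows if shows else 0.0
            out[key] = (shows, clicks, ctr)
        return out
```

In a real deployment the windows would be flushed downstream as they close rather than held in memory; the sketch keeps everything in one dictionary only to show the aggregation step.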

Keywords/Search Tags:

Machine Learning, Stream Processing, Task Manager, HDFS, MapReduce

Related items

1	Design And Implementation Of Data Splice System Based On Stream Computing
2	Research On Two-stage Task Scheduling Of Distributed Stream Processing System
3	The Design And Implementation Of The D-Stream Stream Processing System Which Supports Dynamic Task Topology And Load Shedding
4	Processing Of Small Files Based On HDFS And Optimization And Improvement Of The Performance For Mapreduce Computing Model
5	Working Principle And Applied Research Of MapReduce
6	Research And Implementing Of Query Task Management In Data Stream Processing System
7	Large-Scale Stream Processing Task Resource Scheduling Method Based On Deep Reinforcement Learning
8	Research Of Online Learning Algorithm Based On Multi-task And Multi-kernel For Stream Data
9	The Design And Implementation Of A Distributed Computing System Based On MapReduce
10	Research On Data Processing Model Based On Machine Learning In Cognitive Computing Systems