Design And Implementation Of Visual Search-based Advertising Information Augmentation System

Posted on: 2014-02-18, Degree: Master, Type: Thesis
Country: China, Candidate: X H Liu, Full Text: PDF
GTID: 2268330392462831, Subject: Software engineering
Abstract/Summary:
The research subject of this thesis comes mainly from the advertising enhancement system engineering project, a cooperation between Microsoft Research Asia and the Sun Yat-sen Intelligent Information Processing and Cloud Computing Laboratory. Building on iSimilar, the distributed image retrieval platform of the laboratory and the China Telecom Guangzhou Research Institute, this thesis reuses the existing core technology and architecture and combines distributed computing, web crawling, image retrieval, and mobile application development technology to propose a solution for the project. It also designs and implements iSearch, an end-to-end online mobile advertising information augmentation system. The system provides advertising registration services to back-end publishers and advertising recognition services to front-end users.

The main work of this thesis includes the following aspects:
(1) Designed and implemented a visual search-based advertising information augmentation system. Its major modules are: ① an advertising registration module, which provides user registration, user login, advertising information upload, personal information management, advertising information management, and so on; ② a visual retrieval module, which provides information retrieval for three channels (movies, books, advertisements); ③ mobile clients, which make the visual search function available on Android and Windows Phone smartphones.
(2) Experimented with and analyzed the open source web crawlers Heritrix and Nutch, and implemented a custom distributed data crawler based on Nutch. This tool is suited to crawling accurate information from web pages that share the same structure.
(3) Encapsulated an HTTP interface, so that clients can easily obtain the system's image retrieval service over the HTTP protocol.

The main contribution of this thesis is to improve some of the prior art relevant to the project:
(1) Proposed an XPath-based template information extraction method that achieves accurate extraction of specified data from web pages. Used in conjunction with the web crawler, this method to a certain extent solves the problem that existing web crawlers cannot capture data accurately.
(2) Proposed a solution for building an incremental index for newly inserted data. This solves the problem that newly inserted data cannot be retrieved in real time because rebuilding the full index is time-consuming.
(3) Used a MySQL database to store annotation information, which solves the problem that iSimilar does not support storing long text well.

In addition, this thesis proposes an innovative application mode for an advertising information augmentation system: an end-to-end visual mobile search platform built on image retrieval technologies and the mobile Internet. People can get more related information about the poster advertisements they are interested in almost anytime and anywhere, which effectively enhances the effectiveness of poster advertising.

The thesis presents a detailed analysis, design, and implementation of the iSearch system, following the software engineering development process. Currently, the functions of every module of the iSearch system have been implemented, and people can use the search function provided by the system through the Android client, the Windows Phone client, and a Web browser. The distributed data crawler has been verified to extract specified information accurately and has completed an accurate crawl of the specified data from "Mtime". The system comprises about 15,000 lines of effective code.
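
As a rough illustration of the XPath-based template extraction idea described above, the Python sketch below maps field names to XPath expressions and applies them to one page. The template, the field names, and the sample page are all hypothetical and are not taken from the thesis. The sketch relies on the limited XPath subset in Python's standard xml.etree.ElementTree, which requires well-formed markup; a real crawler would likely need an HTML-tolerant parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: field name -> XPath expression.
# Only ElementTree's XPath subset is used (descendant search, attribute test).
MOVIE_TEMPLATE = {
    "title": ".//h1[@class='title']",
    "director": ".//span[@class='director']",
    "year": ".//span[@class='year']",
}


def extract(page_source, template):
    """Apply each XPath in the template to one page and collect the matched text."""
    root = ET.fromstring(page_source.strip())
    record = {}
    for field, xpath in template.items():
        node = root.find(xpath)
        # A field whose node is missing or empty is recorded as None.
        if node is not None and node.text:
            record[field] = node.text.strip()
        else:
            record[field] = None
    return record


# Hypothetical, well-formed sample page; it has no "year" element on purpose.
SAMPLE_PAGE = """
<html><body>
  <h1 class="title"> Example Movie </h1>
  <span class="director">Jane Doe</span>
</body></html>
"""

record = extract(SAMPLE_PAGE, MOVIE_TEMPLATE)
print(record)
```

Because the template is plain data, the same extract function can be reused for any set of pages with a shared structure by swapping in a different dictionary of XPath expressions.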
Keywords/Search Tags: Distributed, Crawler, Information extraction method, Mobile search, Incremental Index