Research On Blocking Website Ads Based On Code Analysis And Image Process

Posted on:2019-03-26

Degree:Master

Type:Thesis

Country:China

Candidate:R Wang

Full Text:PDF

GTID:2428330545977962

Subject:Computer Science and Technology

Abstract/Summary:

With the rapid development and increasing popularity of the Internet, web pages have become an important source of information. While providing users with useful information, web pages are also filled with various commercial advertisements (ads). These ads may occupy system resources, interfere with the display of web content, lure users to harmful web pages, degrade the user experience, and ultimately reduce user stickiness. Existing methods of blocking website ads are based on filtering rules, and their core is maintaining a list of filtering rules. The most popular blocking tool at present, Adblock Plus, relies on EasyList, a filtering rule list, and blocks ads through network control and in-page manipulation. Although blocking ads with a filtering rule list can partly alleviate the trouble caused by ads, the list must be continuously maintained according to user feedback, which incurs high time and manpower costs. Moreover, with the emergence of web page randomization techniques, filtering-rule matching fails. In addition, developers may inadvertently use strings that appear in the filtering rule list when defining an element's id or class attribute values, which causes normal content to be blocked.

To avoid the time and labor costs of maintaining a filtering rule list, and to reduce the false positives and false negatives of web ad blocking tools, this thesis first empirically investigates 200 web pages in 4 categories, mining the structure of real ad regions in the page source code and summarizing 4 forms of ad label nodes in ad regions. It then presents a method, based on code analysis and image processing, for recognizing ad regions by their ad labels, and implements a tool, AdClear, to block ads. The main work of the thesis includes:

(1) Analyzing web page code by recursively processing the DOM tree generated from the page's HTML code. While traversing the DOM tree, different processing is performed for different types of nodes. In particular, nodes containing images are sent to the server for identification. To reduce the load on the server, corresponding filtering rules are further proposed to select which nodes are sent to the server for judgment. Based on the structure of ad region code in real web pages, a method is presented to identify the minimum ad region from the ad label.

(2) Exploiting two characteristics of ad labels: the background color changes smoothly, and the boundary between the characters and the background is clear. Information entropy and Canny operator edge detection are used to binarize the image, and HOG features and a CNN are used to extract features from the binarized image. SVM and MLP classification models are then used to classify image text and thereby recognize image ad labels. Finally, different binarization, feature extraction, and classification techniques are combined into three methods for recognizing ad labels in images: Information Entropy + HOG + SVM, Canny Operator + HOG + SVM, and Canny Operator + CNN.

(3) Implementing the tool AdClear and comparing it with Adblock Plus to demonstrate its effectiveness and efficiency. In our experiments, the three image recognition methods are compared, and Canny Operator + HOG + SVM performs best, so it is chosen for the image recognition module. In the experiment on real ad detection, AdClear achieves better results than Adblock Plus, with an accuracy of 99.55% and a recall of 96.52%, compared with an accuracy of 62% and a recall of 92.34% for Adblock Plus.
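The recursive DOM processing described in item (1) can be illustrated with a minimal Python sketch. It is not AdClear's implementation: the label strings in `AD_LABELS`, the 60-pixel width pre-filter for images, and the "climb until the ancestor holds more than the label" rule in `minimal_region` are all assumptions made for illustration, since the abstract does not specify the four label forms, the filtering rules, or the region criterion. The sketch also assumes well-formed HTML and uses only the standard library.

```python
from html.parser import HTMLParser

# Hypothetical label strings; the abstract does not list the real ones.
AD_LABELS = {"ad", "ads", "advertisement", "sponsored"}
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class Node:
    def __init__(self, tag, attrs=None, parent=None, text=""):
        self.tag = tag
        self.attrs = dict(attrs or [])
        self.parent = parent
        self.text = text
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a small DOM-like tree; assumes well-formed HTML."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs, self.current)
        self.current.children.append(node)
        if tag not in VOID_TAGS:  # void elements never get children
            self.current = node

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.current.parent is not None:
            self.current = self.current.parent

    def handle_data(self, data):
        if data.strip():
            self.current.children.append(
                Node("#text", parent=self.current, text=data.strip()))


def find_labels(node, text_labels, image_candidates):
    """Recursively visit the tree, handling each node type differently."""
    if node.tag == "#text":
        if node.text.lower() in AD_LABELS:
            text_labels.append(node)
    elif node.tag == "img":
        # Assumed pre-filter: only small images are plausible label images,
        # so only those would be forwarded to the recognition server.
        width = node.attrs.get("width") or ""
        if width.isdigit() and 0 < int(width) <= 60:
            image_candidates.append(node)
    for child in node.children:
        find_labels(child, text_labels, image_candidates)


def minimal_region(label):
    """Climb to the nearest ancestor that holds more than the label itself."""
    node = label.parent
    while (node.parent is not None and node.parent.tag != "#document"
           and len(node.children) < 2):
        node = node.parent
    return node


def analyze(html):
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    text_labels, image_candidates = [], []
    find_labels(builder.root, text_labels, image_candidates)
    return text_labels, image_candidates


SAMPLE = """
<body>
  <div id="content"><p>News story</p></div>
  <div id="slot"><a href="x"><img src="banner.png" width="300"></a><span>Ad</span></div>
  <div id="slot2"><a href="y"><img src="b.png" width="300"></a><img src="tag.png" width="30"></div>
</body>
"""

text_labels, image_candidates = analyze(SAMPLE)
regions = [minimal_region(n) for n in text_labels + image_candidates]
```

In this sketch a text label is matched directly in the page, while a small image is only collected as a candidate. In the method the abstract describes, that candidate would still have to be confirmed by the server-side image recognition before its enclosing region is treated as an ad.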

Keywords/Search Tags:

ad-blocker, code analysis, image processing, DOM tree

Related items

1	Adaptive blocker rejection continuous-time sigma-delta ADC
2	A Design And Implementation Of Digital Radiographic Fluoroscopy Diagnostic System Image Processing Unit
3	Localize And Decode Bar Code Using Image Processing Method
4	Pyreview:A Python Source Code Analysis Tool Based On Abstract Syntax Tree Differencing Algorithm
5	An Automatic Inspection System Of Cotton And Ramie Fibers In Cross-sectional View Based On Image Processing
6	The Study And Application Of The Tree-Ring Micro Density Based On The Virtual Instrument And Image Processing Technology
7	Design Of A Resilient To Out-of-band Blocker Wideband Low Noise Amplifier
8	Design Of The Two-dimensional Bar Code Identification System Based On Image Processing Technology And Handheld Mobile Platform
9	Research Of QR Code Recognition Systerm Based On Image Processing
10	The Study Of Mobilephone Signal Blocker's Electromagnetic Field And The Development Of Evaluation System