
Research On The Key Technology Of Web Page Noise Recognition And Removal Based On Vision

Posted on: 2018-01-19
Degree: Master
Type: Thesis
Country: China
Candidate: T N Zhao
Full Text: PDF
GTID: 2348330542990941
Subject: Engineering
Abstract/Summary:
In the 21st century, mankind has entered an information-rich era. The continuous development of the Internet has made it one of the most important channels of information transmission and the most extensive source of information. Much of the content on a web page, however, is irrelevant to the page's subject matter; this content is commonly referred to as web page noise. Noise usually surrounds the main content and obscures the topic of the page. It distracts users while they browse, and it causes the browser to load a large amount of irrelevant content, which increases loading time. These problems have motivated research into web page noise removal. Noise removal technology aims to take a page with a chaotic structure, redundant content, a disorderly layout, and unrelated information, and turn it into a clearly structured page with the useless information removed. How to improve the recognition and removal of web page noise, so that the main content of a page becomes clearer, has therefore become a focus of attention and an urgent problem in Web mining.

This thesis first introduces the value and significance of research on web page noise removal, an important aspect of Web information mining, and describes the advantages and disadvantages of existing web page noise recognition and removal techniques. It then proposes a new page segmentation model, the DIV_DOM model, which logically divides the entire page. Based on this model, the thesis studies a web page noise removal algorithm that sets out a criterion for noise and uses it to identify and remove noisy data blocks. To keep the page visually unchanged for the user while noise is removed, the thesis also proposes a visually imperceptible web page noise filtering algorithm based on finding similar data blocks: within the DIV_DOM model, the algorithm looks for similar sibling nodes so that visual invariance is preserved when noisy data blocks are removed. The experimental results presented at the end of the thesis show that the approach has a wide range of applicability.
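The abstract does not define the DIV_DOM model, the noise criterion, or the similarity measure in detail, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a DIV-based segmentation can be approximated by a tree containing only `<div>` blocks, that link density (the share of a block's text that sits inside `<a>` elements) can stand in for the noise criterion, and that two sibling blocks are "similar" when they contain the same sequence of direct tags. The names `DivNode`, `DivTreeBuilder`, `build_div_tree`, `find_noise`, and `similar_siblings`, and the threshold of 0.5, are hypothetical and do not come from the thesis.

```python
from html.parser import HTMLParser


class DivNode:
    """One <div> block in a simplified DIV-only tree (hypothetical structure)."""

    def __init__(self, parent=None, attrs=None):
        self.parent = parent
        self.attrs = attrs or {}
        self.children = []   # nested <div> blocks
        self.tags = []       # non-div tags opened directly inside this block
        self.text_len = 0    # text characters in this block and its descendants
        self.link_len = 0    # the subset of those characters inside <a> elements


class DivTreeBuilder(HTMLParser):
    """Builds the DIV-only tree; text in <script>/<style> is not filtered here."""

    def __init__(self):
        super().__init__()
        self.root = DivNode()
        self.current = self.root
        self.anchor_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            node = DivNode(self.current, dict(attrs))
            self.current.children.append(node)
            self.current = node
        else:
            self.current.tags.append(tag)
            if tag == "a":
                self.anchor_depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.current.parent is not None:
            self.current = self.current.parent
        elif tag == "a" and self.anchor_depth > 0:
            self.anchor_depth -= 1

    def handle_data(self, data):
        n = len(data.strip())
        node = self.current
        # Credit the text to the enclosing block and every ancestor block.
        while n and node is not None:
            node.text_len += n
            if self.anchor_depth:
                node.link_len += n
            node = node.parent


def build_div_tree(html):
    builder = DivTreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def find_noise(node, threshold=0.5):
    """Assumed criterion: a block is noise if its link density >= threshold."""
    noisy = []
    for child in node.children:
        if child.text_len and child.link_len / child.text_len >= threshold:
            noisy.append(child)          # do not descend into a noisy block
        else:
            noisy.extend(find_noise(child, threshold))
    return noisy


def similar_siblings(node):
    """Assumed similarity: same direct tag sequence and same number of sub-blocks."""
    if node.parent is None:
        return []
    return [s for s in node.parent.children
            if s is not node
            and s.tags == node.tags
            and len(s.children) == len(node.children)]
```

Calling `find_noise(build_div_tree(html))` returns the outermost blocks that meet the assumed criterion, and `similar_siblings(block)` lists structurally matching neighbours of a block. How the thesis actually uses similar sibling nodes to preserve the visual appearance is not stated in the abstract, so that step is left out of the sketch.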
Keywords/Search Tags:Web information mining, Web page noise, noise recognition, noise removal