
Research On Related Image Extraction Based On Page Structure On Web

Posted on: 2011-09-04
Degree: Master
Type: Thesis
Country: China
Candidate: J S Li
GTID: 2248330395958063
Subject: Computer software and theory
Abstract/Summary:
With the rapid growth of the Internet and of users' needs, many online shopping services, such as Amazon and Dangdang, have appeared on the web. They offer a wide variety of inexpensive merchandise without the limits of store floor space or business hours, so customers can buy whatever they want without going to a store. To help customers understand the goods they want more clearly, both the text and the images of the goods are needed. However, because of the commercial nature of the deep web, a result page often shows only one or two pictures of an item. To meet customers' needs, two problems must be solved: first, extracting the images of the merchandise; second, extracting related images from the surface web and returning them to the users.

To let customers compare and choose the same merchandise across different websites, this paper designs and implements an extractor that extracts images from deep web result pages. By exploiting the characteristics of images on deep web result pages and the paths of images in the DOM tree, the extractor can locate images accurately. For deep web sites whose result pages contain no images, with the images appearing only on the detailed record pages, the system follows the links into the detailed pages, locates the data record block using the attributes of the record, and then extracts the images with a breadth-first-search-like method combined with a threshold.

To obtain more related information about the goods, we propose an algorithm named VITS (Visual and Text-based Image Search). To meet users' needs, it first finds related websites on the surface web and then uses the textual and visual information of their pages to extract images.
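The record-page step of the deep web extractor (locate the data record block by its attributes, then walk it breadth-first and keep images that pass a threshold) can be illustrated with a minimal Python sketch. It is not the thesis's implementation, and it rests on stated assumptions: the record block is assumed to be identified by a CSS class token, and the threshold is assumed to be a minimum declared pixel area (the `width` × `height` attributes of each `<img>`). The names `TreeBuilder`, `find_record_block`, `bfs_extract_images` and `min_area`, and the sample page, are hypothetical.

```python
from collections import deque
from html.parser import HTMLParser

# HTML void elements never get a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class Node:
    """Minimal DOM node: tag name, attribute dict, ordered children."""

    def __init__(self, tag, attrs):
        self.tag = tag
        self.attrs = dict(attrs)
        self.children = []


class TreeBuilder(HTMLParser):
    """Builds a simple DOM tree; tolerates some unclosed tags."""

    def __init__(self):
        super().__init__()
        self.root = Node("#root", [])
        self.stack = [self.root]

    def handle_starttag(self, tag, attrs):
        node = Node(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <img ... />: attach but do not push.
        self.stack[-1].children.append(Node(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag (index 0 is the root).
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break


def find_record_block(root, attr, value):
    """Hypothetical locator: first node in BFS order whose `attr` contains `value`."""
    queue = deque([root])
    while queue:
        node = queue.popleft()
        if value in (node.attrs.get(attr) or "").split():
            return node
        queue.extend(node.children)
    return None


def bfs_extract_images(block, min_area=10000):
    """Walk `block` breadth-first and keep the src of every <img> whose declared
    width * height is at least `min_area` (hypothetical threshold)."""
    images = []
    queue = deque([block])
    while queue:
        node = queue.popleft()
        if node.tag == "img" and node.attrs.get("src"):
            try:
                area = int(node.attrs.get("width") or 0) * int(node.attrs.get("height") or 0)
            except ValueError:
                area = 0  # sizes like "100%" cannot be compared, so they fail the threshold
            if area >= min_area:
                images.append(node.attrs["src"])
        queue.extend(node.children)
    return images


# Hypothetical detailed record page: a site logo, a rating icon, two product photos.
page = """
<html><body>
  <div class="nav"><img src="logo.png" width="120" height="40"></div>
  <div class="item-detail">
    <img src="star.gif" width="16" height="16">
    <div class="gallery">
      <img src="book-front.jpg" width="300" height="400">
      <img src="book-back.jpg" width="300" height="400"/>
    </div>
  </div>
</body></html>
"""

builder = TreeBuilder()
builder.feed(page)
block = find_record_block(builder.root, "class", "item-detail")
print(bfs_extract_images(block))  # ['book-front.jpg', 'book-back.jpg']
```

Restricting the search to the record block keeps the site logo out of the result, the area threshold drops small icons such as the rating star, and breadth-first order returns images nearer the top of the block first. Real pages often size images through CSS rather than attributes, which this sketch ignores.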
This extraction subsystem builds on the first subsystem and addresses customers' deeper needs, so its cost is higher. Experiments show that the proposed method extracts images from deep web result pages effectively and accurately, and that the system also achieves high accuracy when extracting related images from the surface web.
Keywords/Search Tags: online shopping, image extraction, Deep Web, Surface Web, Extractor