Research On Visual Summarizations Of Web Pages For Search

Posted on: 2013-01-26
Degree: Doctor
Type: Dissertation
Country: China
Candidate: B X Jiao
Full Text: PDF
GTID: 1228330377451756
Subject: Signal and Information Processing
Abstract/Summary:
With the rapid development of the Internet, search engines have become the major means by which users seek information. Among all of users' needs, accuracy and speed are the most important. However, the accuracy of current search engines cannot fully satisfy users, so it becomes essential that users can quickly find the needed information with current search technologies.

Web pages contain visual content such as images, animations and videos. A picture is worth a thousand words. Information search would become much more efficient if this visual information could be shown in the search result page, since it is easier for users to get a quick understanding by seeing an image than by reading text. Visual content that may help users search is called a visual summarization. Among visual summarizations, the image is the basic component of animations and videos, so we discuss the key technologies for using images as visual summarizations.

For a specific web page, the images within that page, called "internal images", are generally reliable as visual summarizations. For these images, we proposed a dominance model to measure how well they represent the web page. The more dominant an internal image is, the more appropriate it is as a visual summarization. However, many web pages have no dominant internal images, so we proposed a scheme to obtain images relevant to the target web page from the Internet, called "external images". In addition, we compared these two kinds of natural-image-based visual summarizations with synthesized images, such as thumbnails. Based on the comparisons, we further proposed an algorithm to select the best visual summarization from the internal and external images. The main contents and contributions of this dissertation are as follows:

1. Proposed a dominance model for internal images. Since web pages also contain advertisement images and decoration images, we proposed an algorithm to measure the dominance of internal images based on feature extraction and machine learning. Image features were extracted on four levels, and the LambdaMART algorithm, which is based on boosted trees and optimized for NDCG, was applied in our system to build the dominance model.

2. Proposed algorithms to obtain external images and measure their relevance to the target web page. Relevant external images were obtained from the Internet through key phrase extraction and image search, and their relevance was then calculated using the textual and visual information of these images. Our system can find relevant external images, with high precision, for almost half of the web pages that have no dominant internal images.

3. Performed comparisons between internal images, external images, thumbnails and visual snippets. With a human-labeled data set, we analyzed the characteristics of the web pages that were well represented by each kind of visual summarization. For example, internal images with high dominance scores are reliable as visual summarizations, and thumbnails are good visual summarizations for web pages with small page sizes or with dominant images or logos from well-known sites in the snapshot area. In addition, we conducted user studies to compare the visual summarizations in web page understanding and re-finding tasks.

4. Proposed an algorithm to jointly select the best visual summarization from the internal images and external images. To combine the respective advantages of internal and external images, we proposed a clustering-based algorithm to select the best visual summarization. This algorithm uses relevance and dominance as prior information and captures typicality through the affinity propagation clustering algorithm. The best exemplar produced by the clustering algorithm was selected as the best visual summarization. Experimental results show that our algorithm achieves an NDCG@1 of about 0.6. Our user study also indicated that the images selected by our algorithm were useful as visual summarizations of web pages.
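To make the reported NDCG@1 figure concrete, here is a minimal sketch of how NDCG@k can be computed for one web page's ranked candidate images. It rests on assumptions, since the abstract does not state which NDCG variant or label scale was used: the gain 2^rel - 1, the log2 position discount, the graded labels in the example, and the function names dcg_at_k and ndcg_at_k are all illustrative rather than taken from the dissertation.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k ranked items.

    Assumed variant: gain 2^rel - 1, discount log2(position + 1),
    with positions counted from 1.
    """
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG@k divided by the DCG@k of the ideal (label-sorted) ordering."""
    ideal_dcg = dcg_at_k(sorted(relevances, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant candidate at all for this page
    return dcg_at_k(relevances, k) / ideal_dcg


# Hypothetical human labels (0 = bad, 1 = fair, 2 = good) for the
# candidate images of three pages, listed in the order the system ranked them.
pages = {
    "page_a": [2, 1, 0],  # best image ranked first
    "page_b": [1, 2, 0],  # a fair image ranked above the best one
    "page_c": [0, 0, 0],  # no usable candidate
}

scores = {name: ndcg_at_k(labels, 1) for name, labels in pages.items()}
mean_ndcg_at_1 = sum(scores.values()) / len(scores)
print(scores, mean_ndcg_at_1)
```

At k = 1 the position discount equals 1, so under these assumptions the score for a page reduces to the gain of the selected image divided by the gain of the best available image; averaging over pages gives a corpus-level figure of the kind quoted in the abstract.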
Keywords/Search Tags: visual summarization, internal image, external image, comparison of visual summarizations, best visual summarization