Text-Based Web Image Search Engine

Posted on: 2008-01-09
Degree: Master
Type: Thesis
Country: China
Candidate: T Xie
GTID: 2208360212999855
Subject: Computer system architecture
Abstract/Summary:
Internet technology has improved greatly in recent years and has brought a lifestyle very different from before. The key factors in the success of the World Wide Web are its large size and the lack of centralized control over its contents. Both are also the main sources of difficulty in locating information. Search engine technology emerged as a new form of information retrieval and has developed rapidly. With Web 2.0, Web content includes not only text but also a large amount of multimedia information (images, video, audio, etc.), which makes Web pages rich and colorful. At the same time, users want to search for other kinds of content, such as images. Because text-based techniques have been very successful, this thesis focuses on a text-based Web image retrieval system. We analyze several search engine technologies, propose a design for a Web image search engine that relies mainly on text-based techniques, and implement an elementary instance of it.

This thesis first introduces the background of Web image search and analyzes some popular Web image search engines. It then introduces the search engine technologies that form the basis of the thesis, including typical system architecture, Web crawling, information extraction, indexing, and result ranking. Chapter 3 introduces our traversal spider, WIRE. Chapter 4 carefully analyzes the structure of the HTML components relevant to images, including the "img" tag, the "a" tag, the image URL, the Web page title, the anchor text, the "meta" tag, the "table" tag, and the text surrounding the "img" tag, and summarizes nine extraction patterns for fetching information relevant to images. The chapter also examines some concrete extraction methods. We then propose some heuristic methods for filtering out useless images and, from statistics over a large number of HTML documents, derive some latent rules that are important for analyzing the importance of images.

At the end of the thesis, we propose a detailed architecture design for a text-based Web image search engine and implement it. We introduce the overall structure of the system and the relations among its components, and describe the function and implementation of some components in detail. Finally, we give a simple evaluation of search effectiveness and performance.
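
As a rough illustration of the kind of text-based extraction described above, the following minimal Python sketch collects a few of the listed fields for each image (page title, "alt" text, anchor text of an enclosing "a" tag, and words taken from the image URL) and applies a simple size-based filter. It is not the thesis's nine extraction patterns or its actual filtering rules, which the abstract does not specify. The names `ImageTextExtractor`, `url_keywords`, and `looks_useless`, and the thresholds `min_side` and `max_ratio`, are assumptions made for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import posixpath
import re


class ImageTextExtractor(HTMLParser):
    """Collects text fields associated with each <img> in one HTML page."""

    def __init__(self):
        super().__init__()
        self.page_title = ""
        self.images = []         # one dict per <img> that has a src
        self._in_title = False
        self._in_link = False
        self._link_text = []     # text seen inside the currently open <a>
        self._link_images = []   # images seen inside the currently open <a>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._in_link = True
            self._link_text, self._link_images = [], []
        elif tag == "img" and attrs.get("src"):
            record = {
                "src": attrs["src"],
                "alt": attrs.get("alt") or "",
                "width": attrs.get("width"),
                "height": attrs.get("height"),
                "anchor_text": "",
            }
            self.images.append(record)
            if self._in_link:
                self._link_images.append(record)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._in_link:
            # Attach the link's text to every image inside that link.
            text = " ".join(self._link_text)
            for record in self._link_images:
                record["anchor_text"] = text
            self._in_link = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._in_title:
            self.page_title = (self.page_title + " " + text).strip()
        elif self._in_link:
            self._link_text.append(text)


def url_keywords(src):
    """Splits the image file name (without extension) into lowercase words."""
    path = urlparse(src).path
    stem = posixpath.splitext(posixpath.basename(path))[0]
    return [w.lower() for w in re.split(r"[^A-Za-z0-9]+", stem) if w]


def looks_useless(record, min_side=32, max_ratio=5.0):
    """Assumed heuristic: drop tiny or extremely elongated images
    (spacers, bullets, rules) based on declared width and height."""
    try:
        w, h = int(record["width"]), int(record["height"])
    except (TypeError, ValueError):
        return False             # size not declared as integers: keep it
    if w <= 0 or h <= 0 or min(w, h) < min_side:
        return True
    return max(w, h) / min(w, h) > max_ratio
```

To use the sketch, create an `ImageTextExtractor`, pass a page's HTML to `feed()`, and read `page_title` and `images`. A real system would also need the "meta" and "table" handling and the surrounding-text extraction that the abstract mentions, which are omitted here.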
Keywords/Search Tags: text-based, Web image search engine, HTML, extraction pattern, extraction method