Based On The Text Web Image Search Engine With

Posted on:2008-01-09

Degree:Master

Type:Thesis

Country:China

Candidate:T Xie

Full Text:PDF

GTID:2208360212999855

Subject:Computer system architecture

Abstract/Summary:

Internet technology has improved greatly in recent years and has brought a lifestyle very different from before. The key factors in the success of the World Wide Web are its large size and the lack of centralized control over its contents. Both are also the most important sources of problems in locating information. Search engine technology, a new information retrieval technology, emerged in response and has developed rapidly. With Web 2.0, Web content includes not only text but also a large amount of multimedia information (images, video, audio, etc.), which has made Web pages rich and colorful. At the same time, users want to search for other kinds of content, for example images. Because text-based retrieval technology is already highly successful, this thesis focuses on a text-based Web image retrieval system. We analyze several search engine technologies, propose a scheme for designing a Web image search engine that relies mainly on text-based technology, and implement an elementary instance.

This thesis first introduces the background of Web image search and analyzes some popular Web image search engines. We then introduce the search engine technologies that form the basis of this thesis, including typical system architecture, Web crawling, information extraction, indexing, and result ranking. Chapter 3 introduces our traversal spider, WIRE. In Chapter 4, we carefully analyze the structure of HTML components, including the "img" tag, the "a" tag, the image URL, the Web page title, the anchor text, the "meta" tag, the "table" tag, and the text surrounding the "img" tag, and summarize nine extraction patterns for fetching information relevant to images. We also study some concrete extraction methods in this chapter. We then propose some heuristic methods to filter out useless images and, from statistics over a large number of HTML documents, derive some latent rules that are important for analyzing the importance of images.

At the end of this thesis, we propose a detailed architecture design for a text-based Web image search engine and implement it. We describe the overall structure of the system and the relations among its components, and detail the function and implementation of some components. Finally, we give a simple evaluation of search effectiveness and performance.
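
As a rough illustration of the kind of text-based extraction the abstract describes, the following minimal Python sketch collects a few textual cues for each image in a page: the "alt" attribute, words from the image URL, the link target of an enclosing "a" tag, and the page title. It then applies a simple size-based filter. It is not the thesis's implementation and does not reproduce its nine extraction patterns. The names `ImageRecord`, `ImageContextParser`, `extract_images`, and `looks_useful`, as well as the size and aspect-ratio thresholds, are assumptions made for this example.

```python
import re
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List, Optional
from urllib.parse import urljoin, urlparse


@dataclass
class ImageRecord:
    """Textual cues gathered for one image (hypothetical structure)."""
    url: str
    alt: str = ""
    link_target: str = ""          # href of the enclosing <a>, if any
    page_title: str = ""
    url_tokens: List[str] = field(default_factory=list)
    width: Optional[int] = None    # declared in the HTML, not measured
    height: Optional[int] = None


def _to_int(value: Optional[str]) -> Optional[int]:
    """Return a pixel count for values like "120"; None for "50%" or missing."""
    return int(value) if value and value.isdigit() else None


class ImageContextParser(HTMLParser):
    """Collects <img> tags together with the page title and enclosing link."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.title = ""
        self.images: List[ImageRecord] = []
        self._in_title = False
        self._href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            self._href = attr.get("href")
        elif tag == "img" and attr.get("src"):
            url = urljoin(self.base_url, attr["src"])
            # Words in the file name often describe the image content.
            filename = urlparse(url).path.rsplit("/", 1)[-1]
            stem = filename.rsplit(".", 1)[0]
            tokens = [t.lower() for t in re.findall(r"[A-Za-z]+", stem)]
            self.images.append(ImageRecord(
                url=url,
                alt=(attr.get("alt") or "").strip(),
                link_target=urljoin(self.base_url, self._href) if self._href else "",
                url_tokens=tokens,
                width=_to_int(attr.get("width")),
                height=_to_int(attr.get("height")),
            ))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a":
            self._href = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_images(html: str, base_url: str) -> List[ImageRecord]:
    """Parse one page and return a record per image, tagged with the page title."""
    parser = ImageContextParser(base_url)
    parser.feed(html)
    for record in parser.images:
        record.page_title = parser.title.strip()
    return parser.images


def looks_useful(record: ImageRecord, min_side: int = 32, max_ratio: float = 5.0) -> bool:
    """Heuristic filter (assumed thresholds): drop tiny or banner-shaped images."""
    if record.width is None or record.height is None:
        return True  # no declared size, so this heuristic cannot judge
    if min(record.width, record.height) < min_side:
        return False
    ratio = max(record.width, record.height) / min(record.width, record.height)
    return ratio <= max_ratio
```

Calling `extract_images(html, page_url)` and keeping only the records for which `looks_useful` returns true gives a list of candidate images with text that could be fed to an indexer. A fuller system would also use surrounding text, "meta" tags, and table structure, as the abstract notes.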

Keywords/Search Tags:

text-based, Web image search engine, HTML, extraction pattern, extraction method

Related items

1	Design And Implementation Of Text Information Extracting Modules Of Html Web Pages Based On DOM
2	Design And Implementation Of An Image Search Engine Based On Information System
3	Research On Methods Of Extracting Image Semantics In WWW
4	Study On The Tag-based Analysis Technique Of Extracting The Body Of The Page
5	The Study And Implementation Of Search Engine
6	A Method Of Extracting The Graphic-Text Abstract Of Webpage Based On OWL
7	Research On Extracting Attack To Hidden Message In Image
8	The Method Of Extracting Hypertension Lesions From The Image Based On Non-Fluorescence Fundus Angiography
9	Contour Extracting From Digital Bandshaped Light Image
10	Research Of Conversion From HTML Web Based On Contect Personalization