The Key Technology Research Of Web Image Crawler

Posted on:2011-04-13

Degree:Master

Type:Thesis

Country:China

Candidate:Y Q Wang

Full Text:PDF

GTID:2178360332457625

Subject:Computer application technology

Abstract/Summary:

Searching for images on the Internet with content-based image retrieval is an important and challenging research topic. A web image searcher (crawler) supplies a content-based image search engine with a continuous stream of image data, so it has a direct bearing on the quality of service the engine offers its users. Building on the independently developed content-based image search engine system V1.0, this thesis introduces multi-threading and develops a multi-threaded web image searcher. It also proposes an I/O buffering scheme for that searcher. Several common search strategies are analyzed and compared in depth, the search strategy of a web-oriented image searcher is studied, and a new search strategy suited to a multi-threaded web image searcher is explored. Finally, a multi-threaded web image searcher subsystem is developed and combined with the image retrieval subsystem to form V2.0 of the content-based image search engine.

A disk I/O buffering method for the multi-threaded web image searcher is proposed. Frequent disk I/O operations degrade the performance of a multi-threaded web image crawler. The proposed method comprises double-queue buffering for collecting URLs and a cyclic buffer pool for image storage and URL storage. Double-queue buffering is applied to the queue of URLs waiting to be processed: while one queue provides URLs to all threads, the other reads in new URLs. Because these two operations run concurrently, new URLs can be supplied to every thread without interruption. One cyclic buffer pool is used for image storage and another for URL storage; both pools work on the same principle.
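The double-queue idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation, whose details the abstract does not give. The class name `DoubleQueueBuffer` and the `load_batch` callable (a stand-in for reading the next batch of URLs from disk) are assumptions made for illustration only.

```python
import threading
from collections import deque


class DoubleQueueBuffer:
    """Two URL queues: one serves worker threads while the other is refilled."""

    def __init__(self, load_batch):
        # load_batch is a hypothetical callable standing in for a disk read;
        # it returns the next list of URLs, or an empty list when none remain.
        self._load_batch = load_batch
        self._active = deque(load_batch())
        self._standby = deque()
        self._lock = threading.Lock()
        self._refill = None
        self._start_refill()

    def _start_refill(self):
        # Fill the standby queue in the background while workers drain
        # the active queue.
        def fill():
            self._standby = deque(self._load_batch())

        self._refill = threading.Thread(target=fill)
        self._refill.start()

    def get(self):
        """Return the next URL, or None once every batch has been consumed."""
        with self._lock:
            if not self._active:
                self._refill.join()  # wait for the pending read to finish
                self._active, self._standby = self._standby, deque()
                if not self._active:
                    return None
                self._start_refill()
            return self._active.popleft()
```

Worker threads would call `get()` in a loop and stop when it returns `None`. The swap happens only when the active queue runs dry, so a worker waits on disk only if the background read has not yet finished.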
Experimental results show that the performance of the multi-threaded web image crawler system improves markedly when these disk I/O buffering methods are applied.

A depth-based breadth-first search for the web image searcher is proposed. The positions of images within web sites are counted and analyzed, and the experimental results show that more high-quality images are found at deep levels of a web site than at shallow levels. Based on a study of the breadth-first and depth-first search used by traditional searchers, the depth-based breadth-first search is proposed. To build a web image searcher around this strategy, two ways of detecting duplicate page URLs are proposed: a DR-BTree and database storage of page URLs. The proposed search strategy is combined with an image filtering method to filter the downloaded images. A comparison of experimental results shows that the proposed strategy downloads 3.6 times and 2.7 times as many images, respectively, as the two traditional search strategies, which indicates that it is well suited to a multi-threaded web image searcher.

On the basis of the above research, a multi-threaded web image searcher subsystem is designed and developed; it is an important part of V2.0 of the content-based image search engine. Multi-threading, disk I/O buffering, and depth-based breadth-first search are incorporated into this subsystem. It increases the speed of image downloading and supplies a large number of images to the content-based image search engine, achieving the intended goal.
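The abstract does not specify how depth enters the proposed search strategy, so the following Python sketch shows only the generic ingredients it mentions: a breadth-first traversal that tracks page depth, skips duplicate URLs, and counts images per depth level. The function name `crawl_by_depth`, the callables `get_links` and `get_images`, and the `max_depth` cutoff are assumptions for illustration. A plain `set` stands in for the DR-BTree or database used for duplicate detection.

```python
from collections import Counter, deque


def crawl_by_depth(start, get_links, get_images, max_depth):
    """Breadth-first traversal that records depth and counts images per depth."""
    seen = {start}  # plain set standing in for the duplicate-URL store
    queue = deque([(start, 0)])
    images_per_depth = Counter()
    while queue:
        url, depth = queue.popleft()
        images_per_depth[depth] += len(get_images(url))
        if depth == max_depth:
            continue  # do not expand links below the cutoff
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return images_per_depth
```

In a real crawler `get_links` and `get_images` would fetch and parse pages. For a quick check they can be dictionary lookups over a toy site map, which makes the per-depth counts easy to inspect.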

Keywords/Search Tags:

Web image searcher, Multi-thread technology, Disk I/O buffer, Depth-based breadth first search, Content-based image search engine

Related items

1	Study On Key Security Technology Of Content-based Web Image Search Engine
2	Storage And Indexing Technology Research And Implement On Image Search Engine
3	Research Of Key Technology Of Content-based Image Search Engine
4	Study On Web Image Search Engine Based On MPEG-7
5	Study On Content-based Image Meta-search Engine Technology
6	Image Retrieval Algorithms For Content-based Web Image Search Engines
7	Design And Implementation Of Text And Content Based Image Search Engine
8	The Design Of Web Search Engine And Realization Of Multi-format Information Search
9	A Similar Image Search Engine Based On Millions Of Images And Distributed Computing
10	Study On The Algorithms For Content-based Image Reverse Search Engine