
The Key Technology Research Of Web Image Crawler

Posted on: 2011-04-13
Degree: Master
Type: Thesis
Country: China
Candidate: Y Q Wang
Full Text: PDF
GTID: 2178360332457625
Subject: Computer application technology
Abstract/Summary:
Searching for images on the internet with content-based image retrieval is an important and challenging research topic. A web image crawler supplies a content-based image search engine with a continuous stream of image data, and so matters for the quality of service that the engine offers its users. Building on the independently developed content-based image search engine system V1.0, this thesis introduces multi-thread technology and develops a multi-thread web image crawler. It proposes a disk I/O buffering scheme for the crawler, analyzes and compares several common search strategies in depth, and explores a new search strategy suited to a multi-thread web image crawler. Finally, a multi-thread web image crawler subsystem is developed and combined with the image retrieval subsystem to form V2.0 of the content-based image search engine.

A disk I/O buffering method for the multi-thread web image crawler is proposed. Frequent disk I/O operations degrade the crawler's performance. The proposed method combines double-queue buffering for collecting URLs with a cycle buffer pool for image storage and for URL storage. Double-queue buffering is applied to the queue of URLs waiting to be processed: while one queue provides URLs to all threads, the other reads in new URLs. Because these two operations run concurrently, new URLs can be supplied to each thread continuously. A cycle buffer pool is used for image storage and another for URL storage; both pools work on the same principle.
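The double-queue idea can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the class name `DoubleQueueBuffer` and the `load_batch` callable (standing in for one bulk read of pending URLs from disk) are hypothetical names introduced here, and the sketch assumes that an empty batch means no URLs remain.

```python
import threading
from collections import deque


class DoubleQueueBuffer:
    """Two URL queues: one serves worker threads while the other is refilled."""

    def __init__(self, load_batch):
        # load_batch is a hypothetical callable that returns a list of URLs,
        # e.g. one bulk disk read; an empty list is taken to mean "no more".
        self._load_batch = load_batch
        self._lock = threading.Lock()
        self._active = deque(load_batch())   # queue currently serving workers
        self._standby = deque()              # queue being refilled
        self._refill = None
        self._start_refill()

    def _start_refill(self):
        # Read the next batch into the standby queue in the background,
        # so the read overlaps with workers draining the active queue.
        def fill():
            self._standby.extend(self._load_batch())

        self._refill = threading.Thread(target=fill, daemon=True)
        self._refill.start()

    def get(self):
        """Return the next URL, or None once the source is exhausted."""
        with self._lock:
            if not self._active:
                self._refill.join()  # wait for any refill still in progress
                # Swap roles: the refilled queue now serves the workers.
                self._active, self._standby = self._standby, self._active
                if not self._active:
                    return None
                self._start_refill()
            return self._active.popleft()
```

Each worker thread would call `get()` in a loop until it returns `None`. The lock keeps the swap atomic, and at most one refill runs at a time, so the loader itself does not need to be thread-safe.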
Experimental results show that the performance of the multi-thread web image crawler improves markedly when these disk I/O buffering methods are applied.

A depth-based breadth-first search for the web image crawler is proposed. The positions of images within web sites are counted and analyzed, and the experimental results show that the deeper levels of a web site contain more high-quality images than the shallow levels. Depth-based breadth-first search is derived from a study of the breadth-first and depth-first search used by traditional crawlers. To build a crawler around this strategy, two ways of detecting duplicate page URLs are proposed: the DR-BTree and database storage of page URLs. The proposed search strategy is combined with an image filtering method to filter the downloaded images. In comparison experiments, the proposed strategy downloads 3.6 times and 2.7 times as many images, respectively, as the two traditional search strategies, which indicates that it is well suited to a multi-thread web image crawler.

On the basis of the above research, a multi-thread web image crawler subsystem is designed and developed as an important part of V2.0 of the content-based image search engine. The subsystem incorporates multi-thread technology, disk I/O buffering, and depth-based breadth-first search. It increases the speed of image downloading and supplies a large number of images to the content-based image search engine, achieving the intended goal.
Keywords/Search Tags:Web image searcher, Multi-thread technology, Disk I/O buffer, Depth-based breadth first search, Content-based image search engine