Distributed Image Search Engine Design And Implementation

Posted on: 2011-10-03
Degree: Master
Type: Thesis
Country: China
Candidate: H F Zhan
Full Text: PDF
GTID: 2178330338989802
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid expansion of the internet and the growing popularity of imaging devices, image resources on the internet are increasing quickly and carry a huge amount of information. To extract and use this information efficiently, this thesis analyzes existing image search technology, selects high-accuracy text-based image search and the Hadoop distributed platform as its research focus, and constructs a distributed image search engine. The completed work includes:

(1) Proposing an image-focused crawler based on page authority and text completeness, derived by combining an analysis of existing web resource collection technology with the actual demands of an image search engine. This crawler gives collection priority to high-authority, image-intensive websites while still taking collection coverage into account, so that it collects efficiently per unit of time.

(2) Proposing, through a study of traditional text classification and information extraction technology, an improved text keyword extraction algorithm that builds on TF-IDF text classification and adds sentence component recognition and a weighting factor for the importance of the text's position in the page. This algorithm can accurately extract, from a large amount of text, the statements appropriate for describing the page's image. The Lucene open source development kit was also used to build a customized inverted text index database and a search interface for image search.

(3) Studying the application of the Hadoop distributed platform in order to keep providing fast search service to users on large-scale data, and using Hadoop's Map/Reduce distributed programming model as the distributed computing tool to design an image search engine system that integrates distributed data collection, indexing, and search, building on the single-node data collection and index generation techniques above.

(4) Implementing this distributed image search engine with the Eclipse programming tool and testing its performance. The test results showed that this distributed image search engine had better performance in data collection capacity, large-scale index construction, and search response speed.
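The position-weighted TF-IDF idea in item (2) can be illustrated with a minimal sketch. It is not the thesis's algorithm: the abstract gives no formulas, so the position labels and weights (`POSITION_WEIGHTS`), the smoothed IDF variant, and all function names below are assumptions chosen for illustration, and sentence component recognition is left out entirely.

```python
import math
import re
from collections import Counter

# Hypothetical weights for where a text fragment sits relative to the image.
# These values are illustrative assumptions, not taken from the thesis.
POSITION_WEIGHTS = {"alt": 3.0, "caption": 2.5, "title": 2.0, "body": 1.0}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def inverse_document_frequency(corpus):
    """Compute a smoothed IDF for every term in a corpus of token lists."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for tokens in corpus:
        doc_freq.update(set(tokens))  # count each term once per document
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def score_keywords(fragments, idf, default_idf):
    """Rank terms of one page by position-weighted TF multiplied by IDF.

    fragments: list of (position_label, text) pairs taken from the page.
    default_idf: IDF assigned to terms that never occur in the corpus.
    """
    weighted_tf = Counter()
    for position, text in fragments:
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokenize(text):
            weighted_tf[token] += weight
    total = sum(weighted_tf.values()) or 1.0
    scores = {
        term: (count / total) * idf.get(term, default_idf)
        for term, count in weighted_tf.items()
    }
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    corpus = [
        tokenize("the cat sat on the mat"),
        tokenize("the dog chased the cat"),
        tokenize("the bird sang"),
    ]
    idf = inverse_document_frequency(corpus)
    unseen_idf = math.log(1 + len(corpus)) + 1.0
    page = [("alt", "red panda"), ("body", "the the the photo")]
    for term, score in score_keywords(page, idf, unseen_idf)[:3]:
        print(term, round(score, 3))
```

In this sketch a term in the image's `alt` text counts three times as much as the same term in body text, so a common word repeated in the body does not outrank rarer words attached directly to the image. A real system would need to tune these weights against labeled data.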
Keywords/Search Tags: image search, focused crawler, text extraction, Hadoop, Lucene