Research On Text-Based Web Image Retrieval

Posted on: 2008-09-21
Degree: Master
Type: Thesis
Country: China
Candidate: L X Zheng
Full Text: PDF
GTID: 2178360215471054
Subject: Computer software and theory
Abstract/Summary:
With the continual development of Web-based information technology, images have become an important information resource in Web information exchange, and their volume keeps growing. Given the huge number of Web images, how to retrieve them effectively, taking into account how Web users recognize images and their retrieval habits, is an important research problem in information retrieval.

This thesis starts from an analysis of images and their semantic features and of the image recognition and retrieval habits of Web users. By studying a variety of image retrieval methods and considering images together with their specific Web context, it analyzes the feasibility and effectiveness of text-based Web image retrieval with current technologies. The thesis studies Web image retrieval methods in depth with a focus on improving retrieval quality, and it emphasizes the collection of image resources and the preprocessing involved in analyzing the text related to images. The main research work comprises:

(1) For collecting various types of Web image resources, a configurable Web Robot architecture was studied and implemented, based on an analysis of the core mechanism of Web robots, and was confirmed by experiments. Through flexible configuration, the target, range, and efficiency of Web resource collection can be controlled. By reserving appropriate development interfaces, various collection tools can be integrated seamlessly, which reduces the development cost of such tools and improves operating efficiency.

(2) For increasingly complicated HTML pages, a page topic division algorithm based on tree path differences in HTML was proposed. It is supported by the heuristic rules established by the VIPS algorithm and was verified by experiments. By dividing a complicated page into topics, the algorithm overcomes the "topic shifting" phenomenon that arises when indexing and searching treat the whole page as a single topic, and thus improves the retrieval quality of Web images.

(3) For the phenomenon that the same topic is used frequently within one Website, a Website noise filtering algorithm was proposed that combines the tree-path-difference topic division algorithm with the Hub value of the HITS algorithm, and it was verified by experiments. The algorithm efficiently recognizes topics repeated within the same Website. Filtering the repeated topics out of HTML pages reduces noise, which helps users obtain useful information, helps collect useful image resources and their related text, and thus enhances the precision of text-based Web image retrieval.
Keywords/Search Tags:Web Retrieval, Image Retrieval, Web Robot, Topic Division, Noise Filtering
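
The page topic division idea in item (2) of the abstract can be illustrated with a minimal sketch. The abstract does not give the thesis's actual definition of tree path difference, its threshold, or its VIPS-derived heuristic rules, so everything below is an assumption made for illustration: the distance measure (number of path steps outside the common prefix), the threshold value, the names `LeafCollector`, `path_difference`, and `divide_into_blocks`, and the restriction to well-formed HTML.

```python
from html.parser import HTMLParser

# Tags without a closing tag; only a small assumed subset is listed here.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class LeafCollector(HTMLParser):
    """Collects text and image leaves together with their path from the root.

    Assumes well-formed HTML in which every non-void tag is closed.
    """

    def __init__(self):
        super().__init__()
        self.path = []            # e.g. ["html[0]", "body[0]", "div[1]"]
        self.child_counts = [0]   # children seen so far at each open depth
        self.leaves = []          # list of (path tuple, content string)

    def handle_starttag(self, tag, attrs):
        # Label each element with its position among its siblings.
        step = f"{tag}[{self.child_counts[-1]}]"
        self.child_counts[-1] += 1
        if tag in VOID_TAGS:
            if tag == "img":
                src = dict(attrs).get("src") or ""
                self.leaves.append((tuple(self.path + [step]), f"<img {src}>"))
            return
        self.path.append(step)
        self.child_counts.append(0)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.path:
            return
        self.path.pop()
        self.child_counts.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.leaves.append((tuple(self.path), text))


def path_difference(a, b):
    """Assumed measure: path steps of a and b that lie outside their common prefix."""
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    return len(a) + len(b) - 2 * common


def divide_into_blocks(html, threshold=2):
    """Starts a new block whenever consecutive leaves are farther apart than threshold."""
    parser = LeafCollector()
    parser.feed(html)
    parser.close()
    blocks = []
    previous = None
    for path, content in parser.leaves:
        if previous is None or path_difference(previous, path) > threshold:
            blocks.append([])
        blocks[-1].append(content)
        previous = path
    return blocks


# Hypothetical two-topic page used only for demonstration.
SAMPLE_PAGE = (
    "<html><body>"
    '<div><h2>Cats</h2><p>A cat photo</p><img src="cat.jpg"></div>'
    '<div><h2>Cars</h2><p>A car photo</p><img src="car.jpg"></div>'
    "</body></html>"
)
```

Calling `divide_into_blocks(SAMPLE_PAGE)` groups each image with the heading and caption from its own `div`, so the text associated with `cat.jpg` does not include "Cars". This is the effect the abstract attributes to topic division: an image is indexed with the text of its own block instead of the whole page.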