
Design And Implementation Of Web Images Classification System Based On Content And Text-aid

Posted on: 2014-10-30
Degree: Master
Type: Thesis
Country: China
Candidate: X H Chen
GTID: 2268330422463229
Subject: Communication and Information System
Abstract/Summary:
With the arrival of the mobile Internet era, people can upload multimedia resources such as voice, images, and video to the Internet from any handheld mobile device. As a result, multimedia information on the Internet is growing explosively, and content-based image classification and retrieval techniques for managing and querying web images have attracted increasing attention. This paper studies web image classification techniques in depth and develops a prototype web image classification system, called Tiny Panda, that combines image visual features with the related text of web pages. First, to describe image information, this paper presents an SVM-based image classification algorithm that combines SURF with global features. The algorithm extracts the set of SURF feature vectors, uses LSH to embed this set into a histogram vector, and then extracts other global features such as color. It then classifies each of the two features separately with an SVM and integrates the two classification results by decision-level fusion to obtain the final classification result. The algorithm combines the advantages of the two kinds of features: it avoids the curse of dimensionality caused by combining local and global features directly, and it avoids the high time complexity of searching for matching key points among a large number of SURF key points. Then, while building the web image classification system, this paper designs and implements a keyword-based web image spider.
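The SURF/LSH branch and the decision-level fusion step described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the thesis's implementation: the abstract does not say which LSH family is used, so random-hyperplane (sign) hashing is assumed, and decision-level fusion is assumed to be a weighted sum of per-class probabilities with a hypothetical weight `alpha`. SVM training is omitted; the probability dictionaries stand in for the outputs of the two SVMs (for example, Platt-scaled scores). All function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence

# Minimal sketch (assumptions): random-hyperplane LSH and weighted-sum fusion.
# Names and parameters are hypothetical, not taken from the thesis.


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Draw num_bits random Gaussian hyperplanes through the origin."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def lsh_bucket(vec: Sequence[float], planes: List[List[float]]) -> int:
    """Hash one descriptor: bit i is set if vec lies on the positive side of plane i."""
    code = 0
    for i, plane in enumerate(planes):
        if sum(p * v for p, v in zip(plane, vec)) >= 0.0:
            code |= 1 << i
    return code


def lsh_histogram(descriptors: Sequence[Sequence[float]],
                  planes: List[List[float]]) -> List[float]:
    """Embed a variable-size descriptor set into a fixed-length, L1-normalised histogram."""
    hist = [0.0] * (1 << len(planes))  # one bin per possible hash code
    for d in descriptors:
        hist[lsh_bucket(d, planes)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


def fuse_decisions(p_local: Dict[str, float], p_global: Dict[str, float],
                   alpha: float = 0.5) -> str:
    """Decision-level fusion: weighted sum of two classifiers' class probabilities."""
    classes = set(p_local) | set(p_global)
    scores = {c: alpha * p_local.get(c, 0.0) + (1.0 - alpha) * p_global.get(c, 0.0)
              for c in classes}
    return max(scores, key=scores.get)


# Usage with synthetic data: 50 random 64-D vectors stand in for SURF descriptors.
planes = make_hyperplanes(num_bits=4, dim=64)
rng = random.Random(1)
surf_set = [[rng.uniform(-1.0, 1.0) for _ in range(64)] for _ in range(50)]
h = lsh_histogram(surf_set, planes)
print(len(h))             # 16 bins (2**4)
print(round(sum(h), 6))   # 1.0
# Stand-ins for SVM probability outputs on the SURF histogram and the color feature.
print(fuse_decisions({"panda": 0.7, "tiger": 0.3},
                     {"panda": 0.4, "tiger": 0.6}))  # panda (0.55 vs 0.45)
```

Because each descriptor is hashed independently in O(k·d) time for k hyperplanes and d dimensions, this kind of embedding needs no pairwise key point matching, and the histogram length 2^k stays fixed however many key points an image has. That fixed-length vector can then be classified by its own SVM, separately from the global color feature, which is what keeps the two feature spaces from being concatenated.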
During crawling, the spider computes the relevance of each web page to a preliminarily defined, limited set of keywords and recognizes the page's text region. This achieves page-level image screening and the acquisition of both images and their text information, and thus overcomes the difficulty of classifying pages that contain many images. Finally, web image classification uses an algorithm that fuses image features with auxiliary text. Through an artificial decision algorithm, the probabilities obtained from visual feature extraction and SVM decision fusion are combined with the weights of the corresponding page-text categories, which are calculated from the keywords related to the image; this improves the accuracy of web image classification.

To test and verify the proposed algorithms, a web image classification system, Tiny Panda, was developed. It includes a query module, a visual feature extraction module, a web image topic crawler module, and a module for fusing visual features with text information, among others. The system classified 13,719 web images fetched from the Internet. With a single visual feature, the average precision is only 74.6%; after fusing the color feature with the SURF feature, it improves to 82.7%; and with the help of text information, it improves further to 91.5%. These results show that the proposed visual feature fusion and text-aided classification are effective. Based on research into content-based web image classification and retrieval, this work is an exploratory attempt toward practical and commercial applications, and the proposed algorithm has some theoretical significance and application value.
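The keyword-based page screening and the text-aided fusion step can be illustrated with a similarly hedged sketch. The keyword lists, the relevance score (fraction of seed keywords present on the page), the keyword-hit category weighting, and the fixed weight `beta` are all assumptions for illustration; the abstract does not give the thesis's exact formulas or the details of its decision algorithm. Keywords are assumed to be single lowercase words, and every function name is hypothetical.

```python
import re
from typing import Dict, List

# Minimal sketch (assumptions): keyword-fraction page relevance, keyword-hit
# category weights, and a fixed-weight linear fusion. All names are hypothetical.


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def page_relevance(text: str, keywords: List[str]) -> float:
    """Fraction of the crawler's seed keywords that occur in the page text."""
    if not keywords:
        return 0.0
    tokens = set(tokenize(text))
    return sum(1 for k in keywords if k.lower() in tokens) / len(keywords)


def text_category_weights(text: str,
                          category_keywords: Dict[str, List[str]]) -> Dict[str, float]:
    """Count keyword hits per category in the text near an image and normalise them."""
    tokens = tokenize(text)
    hits = {c: sum(tokens.count(k.lower()) for k in kws)
            for c, kws in category_keywords.items()}
    total = sum(hits.values())
    # With no hits every weight is 0, so the visual probabilities alone decide.
    return {c: (n / total if total else 0.0) for c, n in hits.items()}


def fuse_image_text(p_visual: Dict[str, float], w_text: Dict[str, float],
                    beta: float = 0.7) -> str:
    """Combine visual class probabilities with text category weights; return the best class."""
    classes = set(p_visual) | set(w_text)
    scores = {c: beta * p_visual.get(c, 0.0) + (1.0 - beta) * w_text.get(c, 0.0)
              for c in classes}
    return max(scores, key=scores.get)


# Usage with made-up keyword lists and probabilities.
category_keywords = {"panda": ["panda", "bamboo"], "bear": ["grizzly", "polar"]}
page_text = "A giant panda eats bamboo at the zoo"
print(page_relevance(page_text, ["panda", "bamboo", "zoo", "wildlife"]))  # 0.75
w = text_category_weights(page_text, category_keywords)
print(w)                                   # {'panda': 1.0, 'bear': 0.0}
p_visual = {"panda": 0.45, "bear": 0.55}   # visual evidence alone favours "bear"
print(fuse_image_text(p_visual, w))        # panda (about 0.615 vs 0.385)
```

A crawler built on this idea would keep only pages whose relevance exceeds some threshold. The fixed `beta` keeps the visual classifier dominant while letting page text break close calls, as in the example where text overturns a 0.45/0.55 visual decision; such a weight would presumably need tuning on held-out data.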
Keywords/Search Tags: SURF, LSH, SVM, Decision Level Fusion, Web Image Classification, Image Theme Crawler