
Cannabis Webpage Filtering Based On Multi-Modal Fusion

Posted on: 2013-02-28
Degree: Master
Type: Thesis
Country: China
Candidate: Y J Wang
GTID: 2348330503471636
Subject: Pattern Recognition and Intelligent Systems
Abstract/Summary:
Image and text category classification is an active and important research area in pattern recognition, and cannabis webpage filtering can be treated as a special case of it. Image and text category classification matters for information retrieval and management; similarly, cannabis image and text filtering is important for the proper development of Web culture. The two problems are closely related: category classification is the basis of cannabis image and text filtering, and filtering in turn serves as a useful reference for category classification. This thesis therefore focuses on image category classification, the fusion of text and image information, and their application to cannabis webpage filtering. The main contributions are summarized as follows.

First, we propose a Multi-Modal Multiple-Instance Learning (MMMIL) approach that combines text and image information for cannabis webpage recognition. Its technical contributions are two-fold. (1) The text associated with images is used to build a pre-classifier, which pre-selects pseudo-positive training bags from new webpages to update the multi-modal classifier; this can be seen as a pseudo active learning process. (2) We design an efficient instance selection technique that uses text information to speed up training without compromising performance. Experiments on a dataset containing over 40,000 images from more than 4,000 webpages demonstrate the effectiveness and efficiency of the proposed approach.

Second, we study a method for fusing multi-modal, multi-level information. Two classifiers are used: the first is a multi-modal multiple-instance image classifier, and the second is a main-text classifier.
Dempster-Shafer (D-S) evidence theory is then used to fuse the results of the two classifiers. Used alone, neither the text classifier nor the image classifier achieves a very high recall rate, which shows that the correct classification rate for cannabis webpages is limited when a single modality is used. However, when the two classifiers are fused, the recall rate improves greatly, while the correct rate changes little compared with the main-text classifier alone.

Experiments show that our approach takes into account practicality and speed as well as classifier performance, and that it meets the special requirements of cannabis webpage classification.
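The fusion step can be illustrated with Dempster's rule of combination over a two-hypothesis frame. The following is only a minimal sketch and not the thesis's implementation: the frame labels, the helper `classifier_to_mass`, the reliability-based discounting, and all numeric values are assumptions made for illustration.

```python
from itertools import product

# Frame of discernment (labels are illustrative assumptions).
CANNABIS = frozenset({"cannabis"})
NORMAL = frozenset({"normal"})
THETA = CANNABIS | NORMAL  # "don't know": mass not committed to either class


def classifier_to_mass(p_cannabis, reliability):
    """Turn a classifier score into a mass function (hypothetical helper).

    Assumption: the score is discounted by a reliability in [0, 1], and the
    remaining mass is assigned to the whole frame THETA.
    """
    return {
        CANNABIS: reliability * p_cannabis,
        NORMAL: reliability * (1.0 - p_cannabis),
        THETA: 1.0 - reliability,
    }


def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule of combination."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        intersection = a & b
        if intersection:
            combined[intersection] = combined.get(intersection, 0.0) + wa * wb
        else:
            # Mass assigned to contradictory hypotheses.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalise by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


# Made-up scores for one webpage: a main-text classifier and an image classifier.
text_mass = classifier_to_mass(p_cannabis=0.6, reliability=0.8)
image_mass = classifier_to_mass(p_cannabis=0.7, reliability=0.6)
fused = dempster_combine(text_mass, image_mass)
is_cannabis = fused[CANNABIS] > fused[NORMAL]
```

In this sketch, two classifiers that each lean only weakly toward the cannabis class reinforce one another after combination, which is consistent with the improvement in recall described above. How the thesis actually derives mass functions from its classifiers is not stated in the abstract.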
Keywords/Search Tags: Cannabis Webpage Recognition, MIL, Multi-Modal