Research On Classification And Retrieval Techniques For Multi-Modal Data

Posted on: 2019-01-17  Degree: Doctor  Type: Dissertation
Country: China  Candidate: Z Li  Full Text: PDF
GTID: 1318330545972300  Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of the Internet, cloud computing, and mobile electronic devices, the volume of electronic data in various fields is rising sharply: the total has grown from the PB era into the ZB era, and most of it (around 85%) is multi-modal, e.g., text, images, and video. These data usually contain rich and valuable information, yet it remains a challenging problem to make computers automatically understand data of different modalities and mine the hidden relationships among them. Classification and retrieval of multi-modal data have therefore received remarkable attention from researchers in recent years. However, several critical multi-modal problems remain insufficiently solved, namely efficient data computing, automatic content understanding for complex images, and accurate cross-modal retrieval. Focusing on two common kinds of data, text and images, this thesis engages three key problems: textual feature selection, single- and multi-label image classification with deep neural networks, and cross-modal retrieval. The main contributions of this work comprise the following three parts.

(1) We propose a parallel text feature selection approach to address large-scale dimensionality reduction for text classification. Specifically, we employ mutual information based on Rényi entropy to measure the correlation between feature and class variables, and leverage the maximum-mutual-information criterion to choose combinations of feature variables. We use the MapReduce model to implement parallel feature selection over large-scale text data. In addition, this thesis proposes a measurement based on information loss, which quantifies the information gain obtained by adding a feature.

(2) We propose an end-to-end deep convolutional neural network (CNN) for complex multi-label image classification. In particular, two CNNs, an Object-CNN and a Scene-CNN, are embedded in one unified framework, so that the scene cues of images can be leveraged to improve classification. In addition, we propose a multiple cross-entropy loss for better optimizing the proposed framework. Extensive experiments on Pascal VOC 2007 and MS COCO validate the effectiveness of our approach.

(3) We propose a simple and effective deep learning approach for cross-modal retrieval between images and texts. In particular, two independent deep neural networks project the heterogeneous image and text feature representations into an isomorphic semantic space, in which each dimension corresponds to a specific high-level semantic concept. Experimental results on NUS-WIDE and Pascal VOC 2007 demonstrate the effectiveness of the proposed approach.

Experimental analysis shows that the methods proposed in this thesis are effective in solving multi-modal data classification and retrieval problems, and that they have good application prospects in both scientific research and real-world systems.
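The maximum-mutual-information selection described in contribution (1) can be sketched minimally as follows. This is a serial, simplified stand-in for the thesis's parallel MapReduce version, and it uses ordinary Shannon mutual information rather than the Rényi-entropy variant; the function names and the greedy top-k ranking are illustrative assumptions, not the thesis's exact algorithm.

```python
import numpy as np

def mutual_information(feature, labels):
    """Shannon mutual information I(X;Y) between one discrete feature
    column and the class labels, in nats."""
    mi = 0.0
    for x in np.unique(feature):
        px = np.mean(feature == x)
        for y in np.unique(labels):
            py = np.mean(labels == y)
            pxy = np.mean((feature == x) & (labels == y))
            if pxy > 0:
                mi += pxy * np.log(pxy / (px * py))
    return mi

def select_features(X, y, k):
    """Score every feature column by its mutual information with the
    labels and return the indices of the k most informative ones."""
    scores = [mutual_information(X[:, j], y) for j in range(X.shape[1])]
    return sorted(np.argsort(scores)[::-1][:k].tolist())
```

In a MapReduce setting, the per-feature scoring loop is the natural map step (each mapper scores a shard of the vocabulary) and the top-k ranking is the reduce step, which is what makes this criterion easy to parallelize over large text corpora.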
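The shared semantic space of contribution (3) can likewise be sketched in a few lines. Random linear projections stand in here for the two trained deep networks; the feature dimensions (4096 for images, 300 for texts), the 10-concept space, and all function names are illustrative assumptions. The point is the mechanism: both modalities are mapped into one space whose dimensions act as concept scores, after which retrieval is plain similarity ranking.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the two learned networks: in the thesis each is a deep
# model trained so that every output dimension scores one high-level
# semantic concept. Here they are untrained random projections.
W_img = rng.standard_normal((4096, 10))  # image features -> 10 concepts
W_txt = rng.standard_normal((300, 10))   # text features  -> 10 concepts

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def project(feats, W):
    """Map modality-specific features into the shared semantic space."""
    return softmax(feats @ W)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query_sem, gallery_sem):
    """Rank gallery items by cosine similarity to the query, best first."""
    sims = [cosine(query_sem, g) for g in gallery_sem]
    return np.argsort(sims)[::-1]
```

Because both projections land in the same concept space, an image query can be ranked directly against a gallery of projected texts (and vice versa), which is what makes the retrieval cross-modal.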
Keywords/Search Tags: multi-modal data, feature selection, image retrieval, cross-modal retrieval, deep learning, distributed computing