
Application Of Improved Canonical Correlation Analysis In Feature Fusion

Posted on: 2018-10-02    Degree: Master    Type: Thesis
Country: China    Candidate: C X Li    Full Text: PDF
GTID: 2348330542490749    Subject: Information and Communication Engineering
Abstract/Summary:
With the rapid development of the multimedia industry, research on audio analysis, image analysis and video processing is attracting increasing attention, and how to better describe a sound, image or video has become a hot issue in the field of computer vision. Images and videos are typically characterized by a variety of local descriptors, such as Scale-Invariant Feature Transform (SIFT), Motion Boundary Histogram (MBH), Histogram of Oriented Gradients (HOG) and Histogram of Optical Flow (HOF) features. Each local feature describes certain aspects of an object's properties, but a single feature can hardly portray a more complex picture or video. Researchers therefore combine different features to improve the characterization of an image or video; however, the widely used fusion methods do not take into account the relationship between the multi-modal features in a video. In this paper, we propose several fusion algorithms for multi-modal video features under the framework of Vector of Locally Aggregated Descriptors (VLAD) optimization. The idea of linear discriminant analysis is integrated into the theory of canonical correlation analysis in an attempt to improve the fusion effect of Canonical Correlation Analysis, and a mixture of probabilistic canonical correlation analysis is applied to VLAD-level fusion of the multi-modal video features in order to obtain a better fusion effect.

Firstly, we introduce the selection of local features in the video, including static and dynamic visual characteristics as well as audio features. The Histogram of Oriented Gradients feature is selected as the static feature; the Histogram of Optical Flow feature and the Motion Boundary Histogram feature are selected as the dynamic features; and the Mel Frequency Cepstrum Coefficient (MFCC) feature is selected as the audio feature. The optimized Vector of Locally Aggregated Descriptors (VLAD) representation is then computed for each local feature, and the multi-modal feature fusion is carried out under the optimized VLAD framework.

Secondly, in order to verify the Fisher Discriminant Canonical Correlation Analysis fusion algorithm proposed in this paper, we compare different fusion methods on the UCF101 and CCV datasets. The fusion methods include descriptor-level concatenation (D-level), Canonical Correlation Analysis, Kernel Canonical Correlation Analysis, Fisher Discriminant Canonical Correlation Analysis and Mixture of Probabilistic Canonical Correlation Analysis. UCF101 is chosen because it contains the largest number of categories among comparable datasets and has repeatedly appeared as a benchmark in CVPR papers, although its videos contain no audio data; the CCV database is chosen because most of its videos do have audio data.

Finally, the classification accuracy of the different feature fusion algorithms on the two datasets is compared and analyzed, and the fusion methods presented in this paper are summarized. Some remaining problems are discussed, directions for further improving the fusion effect are introduced, and the range of applications is briefly described.
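To make the VLAD-then-CCA fusion pipeline concrete, the sketch below shows the general idea using standard tools: ordinary k-means VLAD encoding, scikit-learn's plain CCA and a linear SVM. It is illustrative only; the thesis's optimized VLAD representation, Fisher Discriminant CCA and Mixture of Probabilistic CCA variants are not reproduced, and random toy arrays stand in for real HOG and MFCC descriptors.

```python
# Minimal sketch (not the thesis implementation): per-video local descriptors
# are aggregated into VLAD vectors per modality, the two modalities are fused
# with plain CCA, and the concatenated projections feed a linear classifier.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.cross_decomposition import CCA
from sklearn.svm import LinearSVC


def vlad_encode(local_descriptors, kmeans):
    """Aggregate local descriptors (n x d) into one normalized VLAD vector."""
    centers = kmeans.cluster_centers_                      # (k, d)
    assign = kmeans.predict(local_descriptors)             # nearest center per descriptor
    k, d = centers.shape
    vlad = np.zeros((k, d))
    for i in range(k):
        members = local_descriptors[assign == i]
        if len(members):
            vlad[i] = (members - centers[i]).sum(axis=0)   # residuals to the center
    vlad = vlad.ravel()
    vlad = np.sign(vlad) * np.sqrt(np.abs(vlad))           # power normalization
    return vlad / (np.linalg.norm(vlad) + 1e-12)           # L2 normalization


# Toy data standing in for per-video HOG (visual) and MFCC (audio) descriptors.
rng = np.random.default_rng(0)
n_videos, k = 40, 4
hog_videos = [rng.normal(size=(200, 16)) for _ in range(n_videos)]
mfcc_videos = [rng.normal(size=(150, 13)) for _ in range(n_videos)]
labels = rng.integers(0, 2, size=n_videos)

# One codebook per modality, learned on all local descriptors of that modality.
hog_km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(hog_videos))
mfcc_km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(np.vstack(mfcc_videos))

X_hog = np.array([vlad_encode(v, hog_km) for v in hog_videos])     # (n_videos, k*16)
X_mfcc = np.array([vlad_encode(v, mfcc_km) for v in mfcc_videos])  # (n_videos, k*13)

# CCA-level fusion: project both VLAD representations onto maximally
# correlated directions, then concatenate the projections.
cca = CCA(n_components=5)
Z_hog, Z_mfcc = cca.fit_transform(X_hog, X_mfcc)
fused = np.hstack([Z_hog, Z_mfcc])

clf = LinearSVC().fit(fused, labels)
print("training accuracy on toy data:", clf.score(fused, labels))
```

The same skeleton extends to more modalities by fusing pairs (or using a multi-set CCA variant) and by replacing the plain CCA step with a discriminant or probabilistic variant as studied in the thesis.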
Keywords/Search Tags:Feature Descriptor, VLAD Optimized Representation, Multi-modal Feature, Feature Fusion, Canonical Correlation Analysis, Video Classification