
Video Target Detection And Tracking Based On Multimodal Data

Posted on: 2020-02-12
Degree: Master
Type: Thesis
Country: China
Candidate: J R Tong
Full Text: PDF
GTID: 2428330578464132
Subject: Computer Science and Technology
Abstract/Summary:
With the rapid development of information technology, the digital data produced and collected by sensors have become increasingly complicated, exhibiting polymorphic, multi-source, and multi-descriptive characteristics. Such data are often collectively referred to as multi-modal data. The existence of large amounts of multi-modal data confronts traditional single-modality analysis methods with many new challenges, so research on multi-modal analysis methods and on mining the potential common information in multi-modal data has become a new research hotspot of great theoretical significance and application value. This dissertation focuses on target detection algorithms based on multi-modal data and achieves the following results.

Firstly, the general Faster R-CNN-based multi-modal pedestrian detection algorithm is an effective method for addressing the poor performance of visible-light pedestrian detection models in some complex scenes, but it has limited capability for detecting multi-scale targets. We therefore construct a feature pyramid with feature maps at multiple scales using a feature generation network (a residual neural network), introduce it into Faster R-CNN-based multi-modal pedestrian detection, and thus propose a multi-modal pedestrian detection framework based on the feature pyramid network. Experimental results on the standard benchmark show that the proposed method can effectively detect multi-scale targets in multi-modal images.

Then, to improve our proposed multi-modal pedestrian detection algorithm based on the feature pyramid network, we further design four different multi-modal fusion architectures (feature pyramid fusion-cascade, feature pyramid fusion-max, feature pyramid fusion-sum, and score fusion), which integrate visible and infrared thermal modal information in different ways at different stages. The effectiveness and limitations of these four fusion architectures are investigated in depth by testing them on the standard multi-modal pedestrian detection dataset; the optimal fusion architecture is sum fusion.

Finally, according to the difference in feature strength between the visible-light and infrared thermal modalities, we propose a fusion architecture combining max fusion and sum fusion and also apply it to our proposed feature-pyramid-based multi-modal pedestrian detection algorithm. Considering the different sensitivity of each modality to light, we design a light intensity estimation network to generate a light intensity weight, and introduce a weighted fusion architecture based on this weight into the algorithm. Further, addressing the difference in the scale characteristics of the features extracted from each modality, we propose a multi-modal pedestrian detection algorithm based on weighted fusion with adaptive feature-scale membership parameters. The effectiveness and limitations of the three improved methods are compared on the standard multi-modal pedestrian detection dataset.

To summarize, this dissertation introduces the feature pyramid network into the multi-modal target detection framework, proposes a multi-modal pedestrian detection algorithm based on the feature pyramid network, and investigates its optimal fusion architecture. Based on the differences in feature strength, light intensity, and scale characteristics of data from different modalities, we propose three further fusion architectures, which are shown to produce excellent results on the standard multi-modal pedestrian detection dataset.
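The fusion architectures above all combine visible and infrared-thermal feature maps at some stage of the network. A minimal sketch of the core element-wise operations, assuming toy 1-D lists stand in for feature-map activations (the function names, data, and scalar weight `w` are illustrative assumptions, not the thesis implementation):

```python
def max_fusion(visible, thermal):
    """Keep the stronger response of the two modalities at each position."""
    return [max(v, t) for v, t in zip(visible, thermal)]

def sum_fusion(visible, thermal):
    """Accumulate the responses of both modalities at each position."""
    return [v + t for v, t in zip(visible, thermal)]

def cascade_fusion(visible, thermal):
    """Concatenate the two feature maps (channel-wise stacking)."""
    return visible + thermal

def illumination_weighted_fusion(visible, thermal, w):
    """Blend the modalities with a weight w in [0, 1]; in the dissertation
    such a weight would come from a light-intensity estimation network."""
    return [w * v + (1.0 - w) * t for v, t in zip(visible, thermal)]

# Toy 1-D "feature maps" standing in for pyramid-level activations
vis = [0.2, 0.9, 0.1]
ir = [0.7, 0.3, 0.4]

print(max_fusion(vis, ir))  # [0.7, 0.9, 0.4]
print(sum_fusion(vis, ir))
print(cascade_fusion(vis, ir))
print(illumination_weighted_fusion(vis, ir, 0.8))
```

In a real detector these operations would act on multi-channel tensors at each pyramid level; the sketch only shows how the four schemes differ in how and where the two modalities are combined.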
Keywords/Search Tags: multimodal data, feature pyramid, pedestrian detection, feature fusion, feature scale