
Object Detection And Human Object Interaction Detection Algorithms Based On The Fusion Of Semantic Information

Posted on: 2020-09-06
Degree: Master
Type: Thesis
Country: China
Candidate: T C Wang
Full Text: PDF
GTID: 2518306518464904
Subject: Information and Communication Engineering
Abstract/Summary:
Object detection is one of the fundamental problems in computer vision, with numerous real-world applications in autonomous driving and video surveillance. Current research on object detection focuses mainly on two directions. One is to improve detection accuracy by deploying deeper neural networks. The other is to improve detection speed by designing lightweight networks, which tends to come at the cost of accuracy. State-of-the-art single-stage detectors can meet the demands of real-time detection, but their accuracy remains a weakness compared with two-stage detectors. Meanwhile, object detection alone is not enough for visual perception in autonomous driving and video surveillance, which also requires tasks with a deeper semantic understanding of the scene. Human-object interaction (HOI) detection is one such visual relationship detection task: given an input image, it must localize the humans and objects and predict the relationships between them.

To break through the accuracy bottleneck of single-stage object detectors, this thesis revisits the classical image pyramid network and designs a novel image pyramid block, called the efficient featurized image pyramid block (EFIP). The proposed EFIP supplies single-stage detectors with complementary information that enhances the discriminative power of their convolutional features. Its structure is simple and it adds very few parameters, so it improves the detection accuracy of single-stage detectors while preserving their high detection speed. To further merge high-level and low-level information, a forward feature fusion module (FFM) and a backward feature fusion module (BFM) are developed. The FFM and BFM use down-sampling and up-sampling operations, respectively, to merge deep and shallow features and thereby achieve higher detection accuracy. LRNet, built on the proposed EFIP, FFM and BFM, achieves state-of-the-art detection accuracy while keeping high detection speed.

To further improve HOI detection accuracy, this thesis draws on the large-kernel module commonly used in object detection and designs an efficient module, called the context aggregation block (CAB), to capture the context information around instances. To further merge local and global semantics, a local encoding module and a context-based attention module are used to encode instance features that contain global context information and to highlight the important regions of the global feature using local semantic information. CSNet is built on the proposed context aggregation block, local encoding module and context-based attention module, and achieves state-of-the-art detection accuracy on some HOI datasets.
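
The abstract states only that the FFM merges features using down-sampling and the BFM using up-sampling. The following is a minimal sketch of that general idea, not the thesis implementation. The 2x2 average pooling, nearest-neighbour up-sampling, element-wise sum, single-channel maps, and all function names (`downsample2x`, `upsample2x`, `forward_fuse`, `backward_fuse`) are assumptions made purely for illustration.

```python
# Illustrative sketch only: single-channel feature maps as nested lists.
# The pooling, up-sampling, and summation operators are assumptions;
# the abstract does not specify how FFM and BFM are actually built.

def downsample2x(fmap):
    """Halve the resolution with 2x2 average pooling (assumed operator)."""
    h, w = len(fmap), len(fmap[0])
    return [
        [
            (fmap[i][j] + fmap[i][j + 1] + fmap[i + 1][j] + fmap[i + 1][j + 1]) / 4.0
            for j in range(0, w - 1, 2)
        ]
        for i in range(0, h - 1, 2)
    ]

def upsample2x(fmap):
    """Double the resolution by repeating each value (nearest neighbour)."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out

def add_maps(a, b):
    """Element-wise sum of two maps with identical shapes."""
    assert len(a) == len(b) and len(a[0]) == len(b[0]), "shape mismatch"
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def forward_fuse(shallow, deep):
    """Forward direction: bring the shallow map down to the deep resolution."""
    return add_maps(downsample2x(shallow), deep)

def backward_fuse(shallow, deep):
    """Backward direction: bring the deep map up to the shallow resolution."""
    return add_maps(shallow, upsample2x(deep))

# Toy inputs: a 4x4 "shallow" map and a 2x2 "deep" map.
shallow = [[1.0, 2.0, 3.0, 4.0],
           [5.0, 6.0, 7.0, 8.0],
           [9.0, 10.0, 11.0, 12.0],
           [13.0, 14.0, 15.0, 16.0]]
deep = [[10.0, 20.0],
        [30.0, 40.0]]

fwd = forward_fuse(shallow, deep)   # 2x2 result at the deep resolution
bwd = backward_fuse(shallow, deep)  # 4x4 result at the shallow resolution
```

In this toy setting the forward path yields a map at the coarse resolution and the backward path yields one at the fine resolution, which mirrors the two fusion directions the abstract describes. A real detector would operate on multi-channel tensors and would likely use learned convolutions around the resampling steps.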
Keywords/Search Tags:Deep learning, Object detection, Convolutional neural network, Semantic information fusion, Human object interaction