Perception Of Image Content Based On Feature Unmixing And Semantic Enhancement

Posted on: 2021-04-28
Degree: Doctor
Type: Dissertation
Country: China
Candidate: Y Z Li
GTID: 1488306548474004
Subject: Information and Communication Engineering
Abstract/Summary:
Object detection and semantic segmentation provide complementary object-level and pixel-level perception of image content, and together they form the core of image content perception technology. They play a central role in applications such as intelligent driving, video surveillance, biometric recognition, intelligent medical diagnosis, and human-machine interaction, where they determine the overall performance of the system. Although semantic segmentation and object detection based on deep neural networks have made breakthroughs in recent years, a large gap remains between their performance and human perception, and they cannot yet meet the demands of practical application systems for high precision and efficiency. The study of object detection and semantic segmentation therefore has important research significance and application value.

To improve the accuracy and efficiency of object detection and semantic segmentation, this thesis analyzes and identifies several key problems in the hierarchical feature representation of deep convolutional neural networks. The main contributions and novelty of the proposed methods are as follows.

(1) For multi-scale prediction in object detection, the feature scale-confusion phenomenon is identified for the first time. It causes missed detections of small objects and false positive detections of parts of large objects. To solve these problems, a multi-level feature unmixing method is proposed that reconfigures the pyramid features into scale-aware features, enhancing the feature representation of each pyramid layer for objects of the corresponding scale and improving detection precision. With the proposed method, a single-shot object detector called NETNet is constructed and evaluated on the COCO benchmark dataset. NETNet achieves an optimal trade-off between real-time detection speed and detection accuracy.

(2) For the efficiency of binocular 3D object detection, existing methods mostly use two backbone networks to extract features from a pair of binocular images, which limits detection efficiency. This thesis proposes a shared-network method. By sharing the feature extraction computation between the two binocular images, the computational cost is significantly reduced and detection efficiency improves. To solve the feature confusion caused by the shared features, a multi-level feature unmixing strategy is proposed. With a self-feature unmixing module and a guided-feature unmixing module, the left-view and right-view features can be effectively unmixed from the shared features. As a result, the proposed method reduces the computational complexity by 50% while maintaining detection precision, which demonstrates its effectiveness and efficiency for 3D object detection.

(3) For multi-level feature fusion in semantic segmentation, the semantic gap phenomenon is analyzed. To improve feature fusion, a multi-level semantic enhancement algorithm for semantic segmentation is proposed. Feature semantic enhancement modules and a boundary attention module are introduced to bridge the semantic gap, yielding robust multi-level feature fusion. Based on these strategies, a semantic enhanced network called SeENet is constructed, which achieves top performance on both traffic driving scene datasets and daily scene parsing datasets.
Keywords/Search Tags: Perception of Image Content, Feature Unmixing, Semantic Enhancement, Object Detection, Semantic Segmentation