Research On Semantic Understanding Algorithms Based On Visual Imagery

Posted on: 2021-05-29
Degree: Doctor
Type: Dissertation
Country: China
Candidate: J Fang
Full Text: PDF
GTID: 1488306455963119
Subject: Signal and Information Processing
Abstract/Summary:
Semantic understanding based on visual imagery is an important research topic in image processing and analysis. Mining useful and effective information from large volumes of image and video data, so as to better serve important national needs such as military reconnaissance, public security, modern medical treatment, and smart cities, has become an urgent problem in this field. Owing to its great theoretical value and broad practical applications, semantic understanding based on visual imagery has attracted wide attention and made significant progress. However, the following problems remain: 1) degradation factors are inadequately mined; 2) the representation of discriminative image information is incomplete; 3) the data distribution is insufficiently exploited; and 4) the description of image structure information is not compact. To address these four problems, the main contributions of this dissertation are summarized as follows.

(1) Multi-task learning mechanism for image motion deblurring. The proposed method treats image motion deblurring as an image transformation task from the “motion” domain to the “clear” domain. A data-driven strategy is used to learn the transformation between the two domains, which alleviates the over-reliance on prior knowledge in traditional kernel-estimation-based methods. In addition, the method observes that the same motion kernel causes different degradations when applied to images with different texture complexities: the more complex the image, the more severely it is affected. Based on this observation, the proposed method introduces the texture complexity information of the image into the motion deblurring framework to guide parameter optimization, and enhances the network's ability to reconstruct complex regions through a weighted attention mask strategy.

(2) Robust space-frequency joint representation learning mechanism for remote sensing image scene classification. The proposed method unifies spatial-domain and frequency-domain information in a single framework that exploits the advantages of both. Specifically, the method uses a convolutional neural network and a multiscale band-pass filtering network to extract spatial discriminative information and frequency statistical information from the image, respectively, and uses a one-dimensional cyclic convolution strategy to fuse the two kinds of information effectively. Because frequency-domain statistical information is introduced, the proposed method can, to a certain extent, address the misclassification of high-resolution remote sensing images caused by their varying shooting angles. In addition, by encoding the interactions among different local regions of the image, the proposed method strengthens the semantic discrimination ability of the spatial feature descriptors and thus improves recognition performance on complex scenes.

(3) Explicit and latent difficult sample learning mechanism for salient object detection. The proposed method divides difficult samples into explicit and latent ones, and designs specific network architectures and optimization strategies according to their different characteristics to improve model performance. Specifically, explicit difficult samples are pixels located in the edge and near-edge regions of an object; they are difficult to detect because their appearance features resemble those of surrounding pixels while their semantic labels differ. Latent difficult samples are pixels for which the prediction in the previous iteration differs greatly from the ground truth. The proposed method increases its sensitivity to both types of difficult samples by increasing their penalty factors. In addition, using information about object size and shape structure, the proposed method adopts a multi-scale soft attention strategy to improve detection accuracy.

(4) Spatial structure preserving feature pyramid network for semantic image segmentation. The proposed method adopts a feature pyramid model to effectively integrate the shallow, detailed texture features and the deep, discriminative semantic features of deep neural networks, so as to ensure the accuracy and structure of the segmentation results. To avoid the over-fitting caused by insufficient training samples, the proposed method adopts a transfer learning strategy, extracting image features at different semantic levels with a neural network pretrained on a large-scale image recognition dataset. In addition, based on the similarities among different local regions of the input image and the relevance among the corresponding blocks of the mask predicted in the previous iteration, the proposed method designs a spatial structure preserving loss term, which propagates the interactions among different pixels of the input image to the predicted mask and thus avoids the spatial dispersion problem to a certain extent.
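
As an illustration of the penalty-factor idea in contribution (3), the following is a minimal sketch of a per-pixel weighted loss. The abstract does not give the actual formulation, so everything here is an assumption made for illustration: the function names (`edge_band`, `hard_sample_weights`, `weighted_bce`), the multiplicative combination of penalties, the neighbourhood radius, the 0.5 error threshold, and the choice of binary cross-entropy as the base loss.

```python
import math


def edge_band(mask, radius=1):
    """Flag pixels whose square neighbourhood contains a differing label.

    This approximates the "edge and near-edge" region of the object
    (explicit difficult samples). `radius` is an assumed parameter.
    """
    h, w = len(mask), len(mask[0])
    band = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and mask[ny][nx] != mask[y][x]:
                        band[y][x] = True
    return band


def hard_sample_weights(mask, prev_pred, edge_penalty=2.0,
                        latent_penalty=2.0, error_threshold=0.5, radius=1):
    """Build a per-pixel weight map from both kinds of difficult samples.

    Explicit: pixels inside the edge band of the ground-truth mask.
    Latent: pixels whose previous-iteration prediction is far from the label.
    Multiplying the two penalties is an assumption, not the source's rule.
    """
    band = edge_band(mask, radius)
    weights = []
    for y, row in enumerate(mask):
        weight_row = []
        for x, label in enumerate(row):
            weight = 1.0
            if band[y][x]:
                weight *= edge_penalty
            if abs(prev_pred[y][x] - label) > error_threshold:
                weight *= latent_penalty
            weight_row.append(weight)
        weights.append(weight_row)
    return weights


def weighted_bce(mask, pred, weights, eps=1e-7):
    """Weighted binary cross-entropy, normalised by the sum of weights."""
    total, norm = 0.0, 0.0
    for y, row in enumerate(mask):
        for x, label in enumerate(row):
            p = min(max(pred[y][x], eps), 1.0 - eps)  # clamp to avoid log(0)
            loss = -(label * math.log(p) + (1 - label) * math.log(1.0 - p))
            total += weights[y][x] * loss
            norm += weights[y][x]
    return total / norm


if __name__ == "__main__":
    mask = [[0, 0, 0, 1, 1, 1]]                  # toy one-row ground truth
    prev_pred = [[0.1, 0.9, 0.1, 0.9, 0.9, 0.2]]  # previous-iteration output
    weights = hard_sample_weights(mask, prev_pred)
    print(weights)  # [[1.0, 2.0, 2.0, 2.0, 1.0, 2.0]]
    print(weighted_bce(mask, prev_pred, weights))
```

In this toy case the two pixels next to the label boundary and the two badly predicted pixels each receive twice the weight of the easy pixels, so their errors dominate the loss. A real implementation would more likely compute the same weight map with tensor operations inside the training loop.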
Keywords/Search Tags: Neural network, Multi-task learning, Space-frequency joint representation, Difficult sample learning, Structure preserving feature pyramid network, Discriminative information