Research Of Image Semantic Extraction Methods Based On Deep Learning

Posted on: 2024-04-21  Degree: Master  Type: Thesis
Country: China  Candidate: Z Q Mei  Full Text: PDF
GTID: 2568307052996199  Subject: Electronic information
Abstract/Summary:
The application of computer vision relies on understanding and extracting image semantics, that is, the meaning of the image content. In terms of granularity, image semantic extraction methods fall into three categories: image retrieval, target detection, and semantic segmentation, which extract semantics for the target of interest at the image level, target level, and pixel level, respectively. This thesis focuses on deep learning-based image semantic extraction methods and proceeds step by step, in order of increasing precision of the semantic analysis task. For image retrieval, where multi-target classification lacks focus on local key information, the LF-ResNet model is proposed, which effectively improves feature extraction. For target detection, where target scales differ widely and samples are imbalanced, the YOLOv5-BiFPN model is proposed, which enhances the model's localization and multi-scale information extraction abilities. For semantic segmentation, where pixel information is missing because the edges of the processed images are blurred or occluded, the TF-ChangeNet model is proposed, which effectively improves feature fusion and information inference. In addition, by combining LF-ResNet with the YOLOv5-BiFPN and TF-ChangeNet models, this thesis extracts image semantics at different granularities, with image retrieval complementing target detection and semantic segmentation. The details are as follows.

(1) Image retrieval method research based on LF-ResNet: In practical application scenarios such as commodity retrieval and surface retrieval, the objects to be retrieved span many categories with small differences between classes, and image quality is affected by factors such as lighting and occlusion, so existing classification and retrieval networks cannot achieve the desired results. This thesis proposes the LF-ResNet model, which uses a feature pyramid structure to fuse the features from each stage of ResNet and adds the CBAM attention module so that the attention mechanism focuses on local key information. The model is validated on a commodity outbound image dataset from the logistics field, where it reaches an accuracy of 80.5%, outperforming current strong image retrieval methods. This thesis also introduces a large-scale vector retrieval method that builds an index library and combines it with an inverted-index retrieval algorithm, which greatly accelerates multi-image retrieval in practical applications and improves speed by 31% over the general image retrieval process. The proposed image retrieval system can serve as an important auxiliary component for other fine-grained image semantic extraction methods.

(2) Object detection method research based on YOLOv5-BiFPN: Models such as Faster R-CNN and YOLO have achieved good results in detecting natural objects. However, in images without obvious texture features, such as document-like images and printed images, the target objects are diverse and the number of samples per class varies greatly, so sample imbalance easily arises and poses a challenge to existing models. To address this problem, this thesis proposes the YOLOv5-BiFPN model, which introduces the EIoU bounding-box loss function; by improving bounding-box regression accuracy, it helps offset the large differences in sample counts across classes. Objects of different kinds, or of the same kind, can also differ excessively in scale and spacing. This thesis therefore improves the feature pyramid structure of the YOLOv5 network and uses BiFPN to extract multi-scale information, so that the model copes better with size differences between objects. Finally, the effectiveness of the model is verified on a dataset of courseware image resources from the Sequoia online teaching platform, where the mAP of target detection reaches 83%, 5 percentage points higher than the YOLOv5 model. Based on the YOLOv5-BiFPN and LF-ResNet models, the detected image objects are further retrieved and classified, and an OCR method is integrated to recognize the image text, providing an important data resource for building the course resource library of the online teaching platform.

(3) Semantic segmentation method research based on TF-ChangeNet: Semantic segmentation is an image semantic extraction method that is fine-grained down to the pixel level. If the target of interest has too few samples, sample imbalance degrades model performance. To filter for images that contain the desired semantic information, this thesis uses the LF-ResNet model in a pre-processing stage before segmentation to reject images that do not contain the target objects. Segmentation requires inferring each pixel from its surrounding information and therefore demands high image quality. In practical applications, however, such as processing satellite remote sensing images, commodity transportation images, and medical images, problems such as occlusion, blurred edges, and too-small targets often occur and leave the image information incomplete. This thesis establishes the semantic segmentation model TF-ChangeNet, which uses a feature pyramid structure for feature fusion to obtain multi-scale information and infer the pixel information of lost and occluded regions. Change detection, which analyzes the changes of pixels across temporal images, is an important application of semantic segmentation. To infer change information, this thesis incorporates a twin (Siamese) network structure into the model for change inference on bi-temporal images. The validity of the model is tested on a self-built Landsat8 satellite image dataset, for which the MSIC (maximum spectral index composite) mosaic synthesis method is proposed to produce the bi-temporal images. The Dice coefficient reaches 72.8%, about 2 points higher than other commonly used models.

In summary, this thesis studies three image semantic extraction methods at different granularity levels: image retrieval, object detection, and semantic segmentation. Innovative models are proposed for the challenges encountered in image-level, target-level, and pixel-level semantic extraction, and experiments on the datasets constructed in the thesis verify the versatility and effectiveness of the methods. To explore how the methods complement each other, this thesis also combines them, making an important exploration toward solving problems in practical application scenarios.
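
The abstract names the EIoU bounding-box loss and the Dice coefficient without giving their formulas. The following is a minimal sketch of both in plain Python, assuming the commonly cited definitions: Dice = 2|A ∩ B| / (|A| + |B|) on binary masks, and EIoU = 1 - IoU plus three penalties (center distance, width difference, height difference), each normalized by the smallest box enclosing both boxes. The function names, the `(x1, y1, x2, y2)` box layout, and the flattened-mask input are assumptions made for illustration; the thesis's actual implementation may differ.

```python
from typing import Sequence, Tuple

# Hypothetical box layout for this sketch: (x1, y1, x2, y2) with x1 < x2, y1 < y2.
Box = Tuple[float, float, float, float]


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice = 2|A ∩ B| / (|A| + |B|) for two flattened binary (0/1) masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Two empty masks agree perfectly, so report 1.0 instead of dividing by zero.
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


def eiou_loss(pred: Box, gt: Box) -> float:
    """EIoU loss for one predicted box and one ground-truth box."""
    px1, py1, px2, py2 = pred
    gx1, gy1, gx2, gy2 = gt
    pw, ph = px2 - px1, py2 - py1
    gw, gh = gx2 - gx1, gy2 - gy1

    # Intersection over union; the overlap is clamped to zero for disjoint boxes.
    iw = max(0.0, min(px2, gx2) - max(px1, gx1))
    ih = max(0.0, min(py2, gy2) - max(py1, gy1))
    inter = iw * ih
    iou = inter / (pw * ph + gw * gh - inter)

    # Width and height of the smallest box enclosing both boxes.
    cw = max(px2, gx2) - min(px1, gx1)
    ch = max(py2, gy2) - min(py1, gy1)

    # Squared distance between box centers, normalized by the squared
    # diagonal of the enclosing box.
    dx = (px1 + px2) / 2.0 - (gx1 + gx2) / 2.0
    dy = (py1 + py2) / 2.0 - (gy1 + gy2) / 2.0
    center_term = (dx * dx + dy * dy) / (cw * cw + ch * ch)

    # Width and height differences, each normalized by the enclosing box side.
    width_term = (pw - gw) ** 2 / (cw * cw)
    height_term = (ph - gh) ** 2 / (ch * ch)

    return 1.0 - iou + center_term + width_term + height_term


if __name__ == "__main__":
    print(dice_coefficient([1, 1, 0, 0], [1, 0, 0, 0]))
    print(eiou_loss((0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0)))
```

In this form, a perfect box prediction gives a loss of 0, and the separate width and height terms penalize a wrongly sized box even when its center is correct, which is the property usually cited as the reason for preferring EIoU when object scales vary widely.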
Keywords/Search Tags:Image Semantic, Object Detection, Image Retrieval, Semantic Segmentation, Change Detection