Research On Semantic Segmentation Methods For Field Grape Images Based On Deep Learning

Posted on: 2024-09-23
Degree: Master
Type: Thesis
Country: China
Candidate: D J Gong
GTID: 2543307127999319
Subject: Electronic information
Abstract/Summary:
In recent years, with the rapid development of China's grape industry, picking has become an increasingly critical link in the industry chain. However, picking still relies mainly on manual labor, which is inefficient, time-consuming, and labor-intensive, and cannot keep pace with the industry's rapid growth. Intelligent picking technology has therefore become key to the efficient development of the grape industry: it locates grapes and analyzes their status through image recognition, enabling accurate and effective picking. Real grape-growing scenes, however, involve complicating factors such as clustered fruit and irregular shapes, which make it difficult for existing vision models to recognize object categories and recover precise boundary information. At the same time, to improve recognition performance, existing models often build complex networks by stacking convolutional layers and similar means, which makes real-time requirements hard to meet. To address these challenges, this study took field grapes as the research subject and used semantic segmentation, that is, pixel-level classification, to optimize and improve the encoder-decoder network. The objective was to raise grape recognition accuracy while still meeting real-time requirements. The specific research content is as follows:

(1) From the perspective of lightweight model design, this paper proposed a real-time semantic segmentation model based on the Channel Feature Pyramid (CFP) module. The model used the CFP module for feature extraction; the module extracted multiscale features and context information from grape images through 1×3 and 3×1 dilated convolutions with skip connections, while reducing the number of model parameters. Next, the model used dual attention for upsampling, emphasizing global context information and local spatial information to improve the performance of shallow models on semantic segmentation tasks. Then, by combining pooling layers and convolutional layers, the model preserved information during downsampling. The model achieved a mean intersection-over-union (Mean IoU) of 79.3% on the field grape test set and ran at 64.86 frames per second on a single GTX 2080 Ti GPU with an input image resolution of 512×512. The model thus combined good segmentation accuracy with excellent real-time performance.

(2) To address the weak recognition of grapes and stems caused by insufficient receptive fields during feature extraction in Convolutional Neural Networks (CNN), this paper proposed a semantic segmentation model based on the Transformer. The model replaced the encoder of UNet with a Transformer, using multi-head self-attention to obtain global context information and establish long-range dependencies among multiple targets. Furthermore, a Feature Enhancement Module (FEM) was designed to capture context information at multiple scales, compensating for the Transformer's lack of local detail perception. The model then used an Adaptive Fusion Module (AFM) to perform upsampling in the decoding stage. The AFM learned weight coefficients at each spatial position to automatically adjust the fusion ratio of the input feature maps, effectively combining high-level semantic information and low-level spatial detail at different positions. Lastly, transfer learning was employed to improve training efficiency. Experimental results showed that the model achieved a Mean IoU of 83.93% on the grape test set, a segmentation accuracy that met the requirements of grape segmentation and recognition in field environments.

(3) To combine the advantages of the two models above and address the difficulty of balancing real-time performance and accuracy, this paper constructed a dual-path semantic segmentation model. The model ran a CNN and a Transformer in parallel, capturing low-level spatial details and global dependencies with a shallower network. The BiFusion module fused features from the two paths, integrating the encoded features from the CNN and the Transformer through spatial attention and channel attention mechanisms. During training, a multi-supervised approach was adopted to better supervise and balance the feature extraction capabilities of the two paths. The experiments showed that the model achieved a Mean IoU of 84.16% on the grape dataset at a real-time processing speed of 43.23 frames per second.

In summary, this paper addressed the image recognition problem in intelligent grape picking and explored semantic segmentation models based on CNN, Transformer, and dual-path feature extraction. Comparative experiments were conducted against other advanced algorithms. The results showed that the proposed dual-path feature extraction model achieved high segmentation accuracy and real-time efficiency in grape recognition tasks. This study provides an important theoretical foundation for the development of vision systems for grape picking robots and is expected to promote efficient and accurate automation of the grape industry's picking process.
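
For readers unfamiliar with the accuracy metric quoted in the abstract, the following is a minimal sketch of how mean intersection-over-union is commonly computed for a segmentation map. It is not the thesis's evaluation code. The function name `mean_iou`, the three-class labelling (0 = background, 1 = grape, 2 = stem), and the toy 2×3 label maps are all assumptions made for illustration. Classes that appear in neither the prediction nor the ground truth are skipped, which is one common convention; the thesis may use a different one.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over classes present in pred or target.

    pred and target are equally sized 2D lists of integer class labels.
    """
    inter = [0] * num_classes  # pixels where prediction and truth agree on class c
    union = [0] * num_classes  # pixels labelled c in the prediction or the truth
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                # A mismatched pixel enlarges the union of both classes involved.
                union[p] += 1
                union[t] += 1
    # Skip classes that occur in neither map (assumed convention).
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0


# Hypothetical labels: 0 = background, 1 = grape, 2 = stem.
pred = [[0, 1, 1],
        [0, 2, 2]]
target = [[0, 1, 2],
          [0, 2, 2]]
score = mean_iou(pred, target, num_classes=3)
print(f"Mean IoU: {score:.4f}")
```

In this toy case the per-class IoU values are 1, 1/2, and 2/3, so the mean is 13/18. A percentage such as those reported in the abstract is this value multiplied by 100.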
Keywords/Search Tags:Grape harvesting, Semantic segmentation, Channel feature pyramid, Convolutional neural networks, Transformer