| The demand for ore sorting in contemporary mining and related fields continues to grow, prompting the exploration of modern technology to improve sorting efficiency. In recent years, advances in neural networks and deep learning have led to the widespread application of object detection. To improve productivity, this thesis applies object detection to the material sorting process on a vibrating platform, achieving detection and tracking of materials conveyed by vibration. Based on an analysis of the scene requirements, this thesis builds a corresponding experimental platform using a direct vibration feeder. After analyzing the operating principles of the data acquisition devices, experimental data are collected, an image and video dataset is constructed, and the dataset is analyzed statistically. In the constructed dataset, the targets are very small, occupying on average approximately 0.09% of the image pixels. Because the number of targets varies significantly from image to image, the test set is divided into three groups by the number of targets per image, in order to investigate the performance of the object detection network under different target densities. Given the close proximity of targets in the experimental scene, an anchor-based Faster R-CNN object detection network is chosen as the basic framework for detecting targets in static images. To reinforce useful information and suppress irrelevant information during training, this thesis analyzes the characteristics of common attention mechanisms and proposes an improved attention mechanism. To achieve higher detection accuracy while minimizing the number of parameters in the object detection network, the improved attention mechanism is introduced into the RepVGGBlock, which has a reparameterization structure, to form the basic convolution module. Because the targets in this thesis are small, a corresponding feature extraction network is built from this basic convolution module. Experimental results demonstrate that the detection accuracy of the feature extraction network constructed in this thesis is 12.46% to 18.51% higher than that of some common feature extraction networks, and that the total number of network parameters in the inference phase is reduced by 7.472 M to 58.896 M. Compared with common attention mechanisms, the improved attention mechanism achieves a 2.45% increase in object detection accuracy while adding only approximately 0.001 M parameters. For object detection in videos, this thesis introduces the improved basic convolution module into the feature extraction network of YOLOv5s, improving detection accuracy by 0.34% without a significant increase in the number of network parameters. For object detection and tracking in videos, this thesis combines the DeepSORT object tracking algorithm with YOLOv5s and reconstructs the image classification network in DeepSORT according to the characteristics of the scene. Experimental results demonstrate that the improved object tracking algorithm achieves a 0.1% to 5.2% improvement in multi-object tracking accuracy and a decrease of 0.02 to 0.23 in the inaccuracy of predicted bounding boxes. |
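
The two dataset statistics mentioned in the abstract, average pixel occupancy of a target and the grouping of test images by target count, can be illustrated with a minimal Python sketch. It is not the thesis's code. The `Box` class, all function names, the group labels, and in particular the thresholds `sparse_max=10` and `medium_max=30` are hypothetical, since the abstract does not state where the three groups are split.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (hypothetical helper)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def area(self) -> float:
        width = max(0.0, self.x_max - self.x_min)
        height = max(0.0, self.y_max - self.y_min)
        return width * height


def pixel_occupancy(box: Box, image_width: int, image_height: int) -> float:
    """Fraction of the image area covered by a single box."""
    return box.area() / (image_width * image_height)


def mean_pixel_occupancy(images) -> float:
    """Average occupancy over all boxes.

    `images` is a list of (width, height, boxes) tuples.
    """
    ratios = [
        pixel_occupancy(box, width, height)
        for width, height, boxes in images
        for box in boxes
    ]
    return sum(ratios) / len(ratios) if ratios else 0.0


def density_group(num_targets: int, sparse_max: int = 10, medium_max: int = 30) -> str:
    """Assign an image to one of three density groups.

    The thresholds are assumptions for illustration only.
    """
    if num_targets <= sparse_max:
        return "sparse"
    if num_targets <= medium_max:
        return "medium"
    return "dense"


def split_by_density(images, **thresholds) -> dict:
    """Map each group name to the indices of the images that fall into it."""
    groups = defaultdict(list)
    for index, (_, _, boxes) in enumerate(images):
        groups[density_group(len(boxes), **thresholds)].append(index)
    return dict(groups)


if __name__ == "__main__":
    # A 30 x 30 px target in a 1000 x 1000 px image covers 0.09% of the pixels.
    target = Box(0, 0, 30, 30)
    print(f"{pixel_occupancy(target, 1000, 1000):.4%}")
    images = [(1000, 1000, [target] * n) for n in (2, 20, 40)]
    print(split_by_density(images))
```

For scale, a target of about 30 x 30 pixels in a 1000 x 1000 pixel image corresponds to the 0.09% occupancy reported above. The image size used here is only an example and is not taken from the thesis.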