With the continuous development of deep learning, computer vision has been extensively researched and applied to the detection of dangerous driving behaviors. Using a mobile phone while driving is one such behavior, and real-time algorithms for detecting it are important for reducing traffic accidents caused by dangerous driving. However, complex lighting conditions, varying degrees of occlusion, different camera angles, and indistinct features in complex scenes make detecting drivers' phone use a difficult task. Moreover, current mainstream algorithms have low accuracy and slow speed when detecting small objects, which makes them difficult to port to resource-limited embedded devices. To address these challenges, this thesis proposes object detection network models based on pruning algorithms, attention mechanisms, and feature fusion, and optimizes the algorithms for deployment on embedded devices. The main work of this thesis is as follows.

(1) To address the shortage of existing datasets, we build two datasets: hand_phone, which contains drivers making phone calls, and hand_phoner, which contains people making phone calls in general scenes. The data mainly comes from in-vehicle monitoring terminals installed on operating vehicles, and the drivers' hands and mobile phones are annotated by class with the LabelImg tool. These datasets aim to enrich the publicly available data on phone calls.

(2) To address the poor recognition of small objects in complex scenes, we design YOLO-PAI, a novel object detection network for detecting drivers making handheld phone calls. First, an attention mechanism module is introduced to design the SRBlock structure. Second, the original 3×3 convolution is replaced by the Inception V3 structure to reduce the number of model parameters and the computational cost. Third, a new low-level feature extraction branch is added to broaden the detection range and enhance the feature extraction capability. Experimental results show that YOLO-PAI achieves 94% detection accuracy at 45 FPS on the hand_phone dataset. This is 1.44% more accurate and 21 FPS faster than YOLOv4. YOLO-PAI also outperforms other popular networks in both accuracy and speed on the public MAFA and VOC datasets.

(3) To extend the recognition scenario, we build on the above study to design YOLO-RESDAI, a novel object detection network for detecting handheld phone calls by people in general scenes, including driving. First, ResNet50 replaces the backbone network of YOLO-PAI to extract deeper semantic information about hands and phones. Second, features of different depths are fused to improve detection performance, which deepens the network while keeping its complexity under control. Third, the PDCN convolutional structure is designed. It splits the input tensor into two halves, processing the upper half with a standard convolution and the lower half with a deformable convolution. Because the deformable convolution's sampling locations follow the shape and size of the object more closely, this design makes the model more robust, reduces computational cost, and improves accuracy. Finally, the semantic information extracted by the ResNet50 backbone is fed into the YOLO-PAI neck network for feature fusion. Experimental results show that YOLO-RESDAI achieves an mAP of 93.01% at 40 FPS on the hand_phoner dataset. This is 1.66% higher in mAP and 16 FPS faster than YOLOv4. Its detection performance on the KITTI and COCO datasets is also better than that of other popular networks.

(4) To study the deployment of object detection networks on the NVIDIA Jetson TX2 embedded device, we run practical tests under different lighting and occlusion conditions. First, the trained model files are optimized and compiled with TensorRT to improve the algorithms' runtime efficiency on the Jetson TX2. Second, the optimized network models and the corresponding inference code are deployed and run on the Jetson TX2 development board. The parameters are then tuned and the algorithms and hardware are optimized to further improve detection speed and accuracy and meet the requirements of practical applications. Finally, experimental results show that YOLO-PAI and YOLO-RESDAI reach detection accuracies of 93.98% and 92.85%, respectively, on the Jetson TX2, both higher than those of other popular networks. In terms of speed, YOLO-PAI is the fastest at up to 30 FPS, followed by YOLO-RESDAI. Both are more than 15 FPS faster than the other two popular networks.
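The hands and phones in the hand_phone and hand_phoner datasets are annotated with LabelImg. LabelImg can save Pascal VOC XML files, whereas YOLO-family detectors are usually trained on normalized `class cx cy w h` text labels. The following is a minimal conversion sketch under stated assumptions. The class list `["hand", "phone"]` and its order are hypothetical, since the thesis does not give the exact label names, and the helper function names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Hypothetical class list: the thesis labels hands and phones, but the
# exact class names and their index order are assumptions.
CLASSES = ["hand", "phone"]


def voc_box_to_yolo(xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert absolute corner coordinates to YOLO (cx, cy, w, h) in [0, 1]."""
    cx = (xmin + xmax) / 2 / img_w
    cy = (ymin + ymax) / 2 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return cx, cy, w, h


def voc_xml_to_yolo_lines(xml_text, classes=CLASSES):
    """Turn one Pascal VOC annotation (as saved by LabelImg) into YOLO label lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)
    lines = []
    for obj in root.iter("object"):
        name = obj.find("name").text
        if name not in classes:
            continue  # skip labels outside the assumed class list
        box = obj.find("bndbox")
        coords = [float(box.find(tag).text) for tag in ("xmin", "ymin", "xmax", "ymax")]
        cx, cy, w, h = voc_box_to_yolo(*coords, img_w, img_h)
        lines.append(f"{classes.index(name)} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines
```

For example, a `phone` box from (100, 120) to (180, 240) in a 640×480 image becomes the line `1 0.218750 0.375000 0.125000 0.250000`. Normalizing by image size keeps labels valid if frames are later resized to the network's input resolution.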
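The abstract attributes part of YOLO-PAI's parameter savings to replacing the original 3×3 convolution with the Inception V3 structure. The thesis does not specify which Inception V3 module is used. The sketch below assumes the asymmetric factorization introduced in Inception V3, where a 3×3 convolution is replaced by a 1×3 convolution followed by a 3×1 convolution. It counts the weights to show where the saving comes from. Bias terms are ignored by default, and the function names are illustrative.

```python
def conv_params(k_h, k_w, c_in, c_out, bias=False):
    """Number of weights (plus optional biases) in one 2-D convolution layer."""
    return k_h * k_w * c_in * c_out + (c_out if bias else 0)


def standard_3x3(c_in, c_out):
    """Parameters of a plain 3x3 convolution."""
    return conv_params(3, 3, c_in, c_out)


def factorized_3x3(c_in, c_out, c_mid=None):
    """Parameters of an assumed Inception V3-style 1x3 -> 3x1 factorization.

    c_mid is the channel count between the two asymmetric convolutions;
    it defaults to c_out.
    """
    c_mid = c_out if c_mid is None else c_mid
    return conv_params(1, 3, c_in, c_mid) + conv_params(3, 1, c_mid, c_out)


if __name__ == "__main__":
    c = 256
    std = standard_3x3(c, c)
    fac = factorized_3x3(c, c)
    print(std, fac, f"{1 - fac / std:.1%}")  # prints: 589824 393216 33.3%
```

With equal input, intermediate, and output channels, the factorized pair needs 6C² weights instead of 9C², a one-third reduction. With "same" padding, the multiply-accumulate count per output position shrinks by the same ratio. The two-layer pair keeps the 3×3 receptive field.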
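Accuracy in the abstract is reported as detection accuracy or mAP on hand_phone, hand_phoner, and several public datasets, but the evaluation protocol is not stated. The sketch below assumes one common protocol: all-point interpolated AP per class at an IoU threshold of 0.5, as used in PASCAL VOC from 2010 onward. mAP is then the mean over classes. It is illustrative and not necessarily the thesis's exact evaluation code.

```python
from typing import Dict, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(detections: Sequence[Tuple[str, float, Box]],
                      ground_truths: Dict[str, List[Box]],
                      iou_thr: float = 0.5) -> float:
    """All-point interpolated AP for a single class.

    detections: (image_id, confidence, box) triples for this class.
    ground_truths: image_id -> list of ground-truth boxes for this class.
    """
    n_gt = sum(len(boxes) for boxes in ground_truths.values())
    if n_gt == 0:
        return 0.0
    matched = {img: [False] * len(boxes) for img, boxes in ground_truths.items()}
    precisions, recalls = [], []
    tp = fp = 0
    # Visit detections from most to least confident.
    for img, _, box in sorted(detections, key=lambda d: d[1], reverse=True):
        best, best_j = 0.0, -1
        for j, gt in enumerate(ground_truths.get(img, [])):
            overlap = iou(box, gt)
            if overlap > best:
                best, best_j = overlap, j
        # True positive only if it overlaps a not-yet-matched ground truth
        # by at least iou_thr; duplicate hits count as false positives.
        if best_j >= 0 and best >= iou_thr and not matched[img][best_j]:
            matched[img][best_j] = True
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt)
    # Pad the curve, then make precision non-increasing from right to left.
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Area under the interpolated precision-recall curve.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def mean_average_precision(per_class_ap: Dict[str, float]) -> float:
    """mAP: the unweighted mean of per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)
```

For a two-class hand/phone detector, `average_precision` would be run once per class and the two results averaged with `mean_average_precision`. COCO-style mAP additionally averages over IoU thresholds from 0.5 to 0.95, so numbers under the two protocols are not directly comparable.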
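Speeds in the abstract are reported in FPS, both on a workstation and on the Jetson TX2 after TensorRT optimization, but the measurement procedure is not described. The following is a minimal sketch of one common way to measure throughput: time a run of frames after a few warm-up calls. `infer` stands for any hypothetical callable that runs the detector on one frame. On a GPU it should return only after results are ready (for example, after synchronizing the device), otherwise asynchronous execution makes the FPS look higher than it really is.

```python
import time
from typing import Callable, Sequence


def measure_fps(infer: Callable[[object], object],
                frames: Sequence[object],
                warmup: int = 10) -> float:
    """Average frames per second of `infer` over `frames`.

    The first `warmup` calls are excluded so that one-time costs
    (engine initialization, memory allocation, caching) do not skew
    the result.
    """
    if len(frames) <= warmup:
        raise ValueError("need more frames than warm-up iterations")
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        infer(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```

Whether pre-processing (resizing, normalization) and post-processing (non-maximum suppression) are included inside `infer` changes the result considerably, so the same choice should be used for every network being compared.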