With the continued development and deployment of autonomous driving technology, 3D point cloud data from LiDAR sensors has become indispensable to the field. Compared with conventional image data, 3D point clouds capture the environment more directly and faithfully and carry richer geometric information. Meanwhile, advances in 3D LiDAR hardware and in data processing capability have made it feasible to acquire and process large-scale point clouds. Effective scene recognition from 3D point clouds has therefore become a research hotspot in autonomous driving, and this thesis studies deep learning-based methods for the task.

Existing 3D point cloud scene recognition algorithms face two main problems. First, mutual interference, overlap, and occlusion between the elements of a scene introduce heavy noise and redundancy into the point cloud data. Second, relatively small objects such as streetlights and trees yield low-density point clouds, so they are easily ignored or misclassified; yet real scenes usually contain many such objects, and missing or mislabeling them can significantly affect autonomous driving decisions.

To address the first problem, the high noise and redundancy of 3D point cloud data, this thesis proposes a 3D point cloud scene recognition network based on self-attention and NetVLAD (Vector of Locally Aggregated Descriptors network) modules. The network first preprocesses the point cloud to reduce the number of points involved in the subsequent self-attention computation and to obtain point clusters with rich neighborhood information, filtering out redundant noise. The self-attention module then adapts to distinct but highly similar scene features. Finally, to obtain a descriptor with strong discriminability and representational ability, the NetVLAD module maps the high-dimensional local feature vectors to a fixed-length vector, yielding a global descriptor for the matching step of scene recognition.

To address the second problem, the low point density of relatively small objects such as streetlights and trees, this thesis proposes a 3D point cloud scene recognition network based on multi-level feature fusion. The network first sparsely quantizes the input point cloud to obtain sparsely quantized data for subsequent processing. A feature pyramid module then extracts features at different levels, with an ECA (Efficient Channel Attention) module added to the lateral connections of the pyramid to improve robustness by adaptively weighting the channel feature maps at each scale. Finally, the features concatenated across pyramid levels are filtered by a fully connected layer in the post-processing module, and generalized-mean pooling produces the final global descriptor vector.

Both network models are implemented in a deep learning framework and evaluated on publicly available scene datasets. The experimental results show that the proposed self-attention and NetVLAD based model filters redundant noise while retaining rich scene information, achieving higher scene recognition accuracy than existing methods. The proposed multi-level feature fusion model captures features at different levels, reduces the loss of information on relatively small objects such as streetlights and trees, and improves both recognition accuracy and robustness, giving it a competitive advantage over existing 3D point cloud scene recognition methods.
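As an illustration of the aggregation step described above, the following NumPy sketch shows how a NetVLAD-style layer maps a variable number of high-dimensional local descriptors to a fixed-length global descriptor. This is a simplified, non-learned sketch: in the actual network the cluster centres and the assignment sharpness `alpha` are trainable parameters, whereas here they are hypothetical inputs.

```python
import numpy as np

def netvlad_aggregate(features, centers, alpha=1.0):
    """NetVLAD-style aggregation (illustrative sketch, not the trained model).

    features: (N, D) local descriptors extracted from one point cloud
    centers:  (K, D) cluster centres (learned in the real network)
    returns:  (K*D,) fixed-length, L2-normalised global descriptor
    """
    # Soft-assign each descriptor to each cluster (softmax over
    # negative squared distances, sharpened by alpha).
    sim = -alpha * np.sum((features[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    sim -= sim.max(axis=1, keepdims=True)          # numerical stability
    assign = np.exp(sim)
    assign /= assign.sum(axis=1, keepdims=True)    # (N, K)

    # Aggregate assignment-weighted residuals per cluster.
    residuals = features[:, None, :] - centers[None, :, :]   # (N, K, D)
    vlad = np.sum(assign[:, :, None] * residuals, axis=0)    # (K, D)

    # Intra-normalise each cluster row, then flatten and L2-normalise,
    # so descriptors of scenes with different point counts are comparable.
    vlad /= np.linalg.norm(vlad, axis=1, keepdims=True) + 1e-12
    vlad = vlad.ravel()
    return vlad / (np.linalg.norm(vlad) + 1e-12)
```

Because the output length is K*D regardless of the number of input points, two scenes can be compared by a simple distance between their descriptors, which is what the matching step requires.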
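The channel-attention and pooling steps of the multi-level fusion model can be sketched in the same spirit. In this simplified NumPy version, a fixed averaging kernel stands in for the 1-D convolution that the real ECA module learns, and the pooling exponent `p` of generalized-mean pooling would likewise be a learned or tuned parameter.

```python
import numpy as np

def eca_weights(feat, k=3):
    """ECA-style channel gating (sketch). feat: (C, N) channels x points.

    Global average pooling produces one descriptor per channel; a 1-D
    convolution across channels (learned in the real ECA module, a fixed
    averaging kernel here) captures local cross-channel interaction, and
    a sigmoid turns the result into per-channel weights in (0, 1).
    """
    y = feat.mean(axis=1)                         # global average pooling -> (C,)
    pad = k // 2
    yp = np.pad(y, pad, mode='edge')              # keep output length == C
    kernel = np.ones(k) / k                       # placeholder for the learned kernel
    conv = np.convolve(yp, kernel, mode='valid')  # (C,)
    return 1.0 / (1.0 + np.exp(-conv))            # sigmoid gate

def gem_pool(feat, p=3.0, eps=1e-6):
    """Generalized-mean pooling over points: p=1 gives average pooling,
    large p approaches max pooling."""
    return (np.clip(feat, eps, None) ** p).mean(axis=1) ** (1.0 / p)
```

A usage pattern consistent with the description above would be to reweight the channel maps of each pyramid scale, `feat * eca_weights(feat)[:, None]`, before concatenation, and apply `gem_pool` after the fully connected filtering to produce the final global descriptor.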