Artificial intelligence is developing rapidly. Driven by the growing demand for three-dimensional (3D) object recognition in fields such as robotics and autonomous driving, new 3D object recognition methods emerge constantly. With the advantages of simplicity, ease of use, and efficiency, multi-view methods stand out and have become one of the best-performing approaches to 3D object recognition. Among them, methods based on a grouping mechanism achieve good performance by considering both the connections and the differences between multiple views. However, the existing grouping-based multi-view 3D object recognition methods suffer from flawed grouping-module design, lack interpretability, and cannot group the views reasonably. They also fail to make full use of the differences between views, ignoring the important role these differences play. In addition, the grouping mechanism adds modules and makes the network structure more complex, which hinders deployment in real-world applications. To address these problems, this paper proposes three new multi-view 3D object recognition methods that exploit the properties of the L2 norm, the Sigmoid activation function, the ReLU activation function, and the e-exponential function, together with vision Transformers and the intrinsic properties of multiple views. The proposed methods improve the accuracy of multi-view 3D object recognition from the perspectives of the grouping module, feature extraction, feature fusion, and network simplification.

(1) Double Weighted Convolutional Neural Network (DWCNN) for multi-view 3D object recognition. Existing grouping-based multi-view 3D object recognition methods have defects in the design of the grouping module, which is unreasonable and lacks interpretability. To solve this problem, this paper proposes an L2-Sigmoid (L2-S) grouping module that exploits the properties of the L2 norm and the Sigmoid activation function to group the views more reasonably and to overcome the grouping module's lack of interpretability. Combining the L2-S grouping module with a newly designed double weighted fusion module yields a new method, the DWCNN network, which groups the views more reasonably through the L2-S grouping module and achieves good performance.
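The exact formulation of the L2-S grouping module is not reproduced here; the PyTorch-style sketch below is only a hypothetical illustration of the general idea, in which each view feature is scored by its L2 norm, the score is bounded with a Sigmoid, and views are assigned to groups by discretizing that score. The function name, the norm scaling, and the binning rule are assumptions made for illustration.

import torch

def l2_sigmoid_grouping(view_feats: torch.Tensor, num_groups: int = 4):
    # view_feats: (V, D), one feature vector per rendered view.
    # Score each view by its L2 norm, scaled by sqrt(D) to avoid saturation,
    # then squash the score into (0, 1) with a Sigmoid. (The scaling is an assumption.)
    scores = torch.sigmoid(view_feats.norm(p=2, dim=1) / view_feats.shape[1] ** 0.5)
    # Discretize the bounded score into num_groups equal-width bins.
    groups = torch.clamp((scores * num_groups).long(), max=num_groups - 1)
    return scores, groups

# Example: 12 views, each described by a 512-dimensional CNN feature.
scores, groups = l2_sigmoid_grouping(torch.randn(12, 512))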
(2) Double E-grouping Swin Transformer (DEST) for multi-view 3D object recognition. To overcome the drawbacks of using convolutional neural networks (CNNs) for feature extraction, the Swin Transformer, a popular vision Transformer, is adopted as the backbone network for multi-view 3D object recognition. However, because it lacks a feature fusion module, a vision Transformer cannot be applied directly to multi-view 3D object recognition, so a new grouping fusion module is designed. The L2-ReLU (LR) grouping module, built on the properties of the L2 norm and the ReLU activation function, is an upgraded version of the L2-S grouping module. The newly designed double E fusion module, based on the e-exponential function, fully emphasizes the importance of the differences between views in the multi-view feature fusion stage, addressing the failure of existing methods to exploit these differences (a rough illustration of this idea is sketched at the end of this section). The combination of the LR grouping module and the double E fusion module, called the LR-DE grouping fusion module, effectively resolves the problem that the Swin Transformer cannot be applied directly to multi-view 3D object recognition. Based on the LR-DE grouping fusion module, a new method, the DEST network, is proposed. The LR-DE grouping fusion module enables the powerful general-purpose Swin Transformer backbone to be used for multi-view 3D object recognition and gives the DEST network state-of-the-art performance.

(3) E-fusion Convolutional Neural Network (EFCNN) for multi-view 3D object recognition. To simplify multi-view 3D object recognition, and guided by the two most intuitive properties of multiple views, namely their connections and their differences, the method is designed to contain only two essential modules, a Bi-directional Long Short-Term Memory (Bi-LSTM) module and an E-fusion module, yielding the EFCNN network. The spectral theorem is invoked to prove the importance of the differences between multiple views. The two retained modules make full use of the connections and differences between views, so the network achieves the best performance on multi-view 3D object recognition tasks while being the simplest among existing methods with good performance.

(4) In the experimental part, to show the effectiveness of the proposed multi-view 3D object recognition methods, they are compared with classical, advanced, and the latest methods. Extensive experiments show that the proposed methods achieve state-of-the-art performance on the mainstream public datasets ModelNet40 and ModelNet10. The proposed EFCNN network reaches a classification accuracy of 99.35%, an instance accuracy of 99.34%, and an mAP of 99.18%. In addition, to demonstrate the effectiveness of the designed modules, extensive ablation experiments are reported in the experimental sections of all methods. Finally, the per-class performance on the datasets is used to analyze the factors that prevent further improvement of multi-view 3D object recognition methods.
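The double E fusion and E-fusion modules are described above only at a high level. The following PyTorch-style sketch is a hypothetical illustration of how an e-exponential weighting can emphasize inter-view differences during fusion: each view is weighted by the softmax-normalized exponential of its deviation from the mean view feature before pooling. The function name and the specific weighting rule are assumptions, not the actual formulation of these modules.

import torch

def e_fusion(view_feats: torch.Tensor) -> torch.Tensor:
    # view_feats: (V, D) per-view descriptors.
    # Deviation of each view from the mean view descriptor.
    diff = (view_feats - view_feats.mean(dim=0, keepdim=True)).norm(p=2, dim=1)
    # Exponential weights normalized to sum to one (a softmax over deviations),
    # so views that differ more from the mean contribute more to the fused result.
    weights = torch.softmax(diff, dim=0)
    # Weighted sum over views -> a single shape-level descriptor of size (D,).
    return (weights.unsqueeze(1) * view_feats).sum(dim=0)

# Example: fuse 12 per-view 512-dimensional features into one shape descriptor.
fused = e_fusion(torch.randn(12, 512))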