| In semiconductor packaging, visual recognition and localization is the core technology for precise chip pick-up and placement, and high-quality chip images are a prerequisite for accurate recognition and localization. Accurate chip recognition and precise localization are therefore in high demand and have substantial application value. The visual recognition and localization methods currently used on chip production lines have limited application scenarios, place high requirements on equipment hardware and the operating environment, adapt poorly to low-quality, multi-directional, and multi-scale chip images, and face many challenges in complex environments. This thesis therefore addresses the following problems: noise introduced during image acquisition and transmission, image blur caused by dynamic imaging, the low efficiency of manually labelling large-scale, high-quality data, and the reduced recognition rate and localization accuracy caused by weak characters and multi-directional, multi-scale chips. To solve these problems, systematic research is carried out around three aspects, "enhancing image quality, improving labelling efficiency, and improving localization accuracy", using deep neural networks to improve the anti-interference capability, compatibility, and recognition and localization capabilities of the chip vision system. The research contents of the thesis are as follows: (1) For the denoising of chip images in static scenes, an unsupervised blind denoising method based on noise extraction and modelling and a fast iterative training strategy is proposed. The method consists of three stages: noise extraction, noise modelling, and noise removal. First, a smooth noise block extraction method based on the two-dimensional discrete wavelet transform is proposed to extract noise information accurately; then, a multi-generator, multi-discriminator model is proposed for noise modelling. Finally, an unsupervised blind denoising method based on malleable convolution and a fast iterative training strategy is proposed, which uses
malleable convolution to reduce the computation of the deep neural network and improve denoising speed, while the iterative training strategy improves the denoising effect of the model. Experiments show that the method can effectively remove noise interference from chip images and improve the recognition and localization performance and anti-interference capability of the vision system in static scenes. (2) For the deblurring of chip images in dynamic scenes, an end-to-end blind deblurring method with multiple inputs and outputs based on a lightweight residual fast Fourier transform module is proposed. First, a coarse-to-fine network structure with a multi-input, multi-output mechanism is designed to alleviate the problems of high computational complexity and long latency; then, a lightweight module based on the residual fast Fourier transform is proposed to fully extract the global, spatial-domain, and frequency-domain difference information between images, which improves the deblurring performance of the network. Experiments show that this method can effectively compensate for degraded chip image quality and improve the recognition and localization performance and anti-interference capability of the vision system in dynamic scenes. (3) For the automatic labelling of chip images, an efficient and accurate automatic labelling method based on weakly supervised learning and bounding box correction is proposed. First, a weakly supervised object detection algorithm based on a cross-image co-attention mechanism is proposed to achieve complete automatic labelling of character content; then, a bounding box correction algorithm based on sub-pixel edge detection is proposed to achieve accurate automatic labelling of chip locations. Experiments show that this method labels chips automatically, reducing the labelling cost of the vision system and significantly improving labelling efficiency. (4) For the recognition and localization of chip images in multi-directional
scenarios, an end-to-end recognition and localization method for multi-oriented chips based on an improved Oriented R-CNN is proposed. First, an image token pyramid transformer backbone network is proposed to enhance the discriminative features of weak and small characters; then, an oriented-object cascade detection head is proposed to progressively improve the accuracy of chip position prediction. Experiments show that this method achieves lower false and missed detection rates and higher detection accuracy, with strong character recognition and chip localization capability. Finally, based on the four algorithms proposed in this thesis, a corresponding chip visual recognition and localization system is constructed, the human-computer interface software is designed and developed, and application tests are carried out on a chip mounter. The results show that the vision system meets the technical requirements and achieves accurate recognition and precise localization of chips in both static and dynamic scenarios. |