Effective scene perception under adverse imaging conditions, such as those caused by severe weather, is essential for the safety and stability of autonomous driving. This study focuses on visual-image-based scene segmentation under adverse imaging conditions. Its core methodology applies transfer learning, combining domain adaptation, test-time adaptation, and knowledge distillation, to bridge the gap between limited training datasets and real adverse-weather scenarios, enabling scene segmentation models to make efficient and robust predictions. Existing domain-adaptive semantic segmentation methods face several challenges: the value of training data is unclear, adverse imaging factors are difficult to filter out, effective fine-tuning methods tailored to application scenarios are lacking, and real-time deployment is hard to achieve. All of these severely hinder the practical deployment of the related technologies. To address these issues, this study reorganizes the pipeline of semantic segmentation models under adverse imaging conditions, from training to deployment, and conducts in-depth research by integrating recent techniques such as large-model pre-training. The main contributions and innovations are as follows:

(1) To resolve the unclear value of training data, a training architecture, PDGD, based on progressive domain-gap decoupling is designed. The value of three types of candidate intermediate-domain data is analyzed at two levels to determine whether training remains effective after the domain gap is decoupled, and the appropriate intermediate-domain data is selected to partition the original domain gap, successfully reducing the difficulty of cross-domain training. Specifically, PDGD fixes the source-domain and target-domain data, reserves a slot for the intermediate domain, and indirectly compares the value of the various intermediate-domain candidates through the final performance of the model tested on the
target domain. PDGD also uses a mean-variance value to compute a model-dependent "subjective domain gap" index. This concept differs from the "objective domain gap", which depends only on the domains themselves, and can be used to measure a model's generalization across different domains. Each subjective domain gap is associated with one model and two domains: the lower its value, the more easily the current model overcomes the difference between the two domains and produces robust predictions, i.e., the stronger the model's generalization. The PDGD architecture was used to improve commonly used domain-adaptive methods, and experiments were conducted on the two mainstream adverse-imaging-condition semantic segmentation benchmarks, ACDC and Foggy Zurich. The results prove that real clear-scene images carrying the style characteristics of the target domain are the most valuable in helping the model overcome the domain gap. The proposed PDGD framework effectively mitigates the prediction artifacts caused by inter-domain style changes in existing semantic segmentation models.

(2) To address the difficulty of removing the influence of adverse imaging conditions, an unsupervised domain-adaptive semantic segmentation method, SDAT-Former++, based on an enhanced teacher is proposed. Building on the best intermediate-domain data selected in the previous study, this work extracts context information and style features via masked learning and adds an implicit adverse-imaging-factor filtering module based on adversarial training. To integrate the knowledge of multiple steps effectively and avoid mutual interference between gradients, the progressive reference learning is refined into a cyclic training procedure consisting of several local iterations. In each local iteration, knowledge from different sources and in different forms is integrated and finally summarized into the
teacher network to guide its weight update. The enhanced teacher model then produces better pseudo-labels on target-domain images with poor imaging conditions, guiding the training of the student model. The proposed method is evaluated on the public ACDC dataset, the Foggy Zurich dataset, and the more difficult Foggy Driving / Foggy Driving Dense datasets. The results show that the student model trained by SDAT-Former++ is robust and achieves more accurate segmentation of difficult scenes such as dense fog and night.

(3) To address the difficulty of lifelong learning in application scenarios with offline training methods, a test-time adaptation scheme, USP, is constructed by introducing entropy self-prompting. USP uses a small amount of unlabeled real-scene data to build an uncertainty prompt that guides the model to attend to the features expressing the domain gap, and completes iterative updates of model parameters at test time. As a result, the model supports lifelong learning and improved adaptability in real-world scenarios. To alleviate the catastrophic forgetting caused by long-term autonomous learning, USP also incorporates the pre-trained Segment Anything Model (SAM) to design a correction-guidance scheme that integrates structural prior knowledge: semantic fusion and a spatial-structure consistency loss function constrain new predictions to the correct spatial structure. USP is first validated on the mainstream Cityscapes-to-ACDC benchmark, and further experiments are carried out on the independently collected Zhengzhou and KaiFengNight datasets. The results prove that USP effectively guides the model to focus on newly input knowledge at test time while, with the help of SAM, maintaining its judgment of spatial structure.

(4) On the basis of the above
work, and aiming at the difficulty of deploying the trained model on hardware terminals, a cross-architecture feature-projection knowledge distillation method, FP-KD, which takes both context and detail encoding into account, is studied. The segmentation model obtained in the preceding training serves as the teacher, and its knowledge is transferred into the real-time two-branch semantic segmentation convolutional network BiSeNet; the ONNX intermediate format and the TensorRT inference library are then used to complete deployment on GPU hardware. FP-KD targets the special architecture of two-branch real-time semantic segmentation models: the context-feature branch and the detail-feature branch are handled separately, and the context features and detail features are aligned respectively. Specifically, FP-KD uses two feature-projection modules to project the context and detail features of the teacher and student models into a feature space of the same dimension, and then performs adversarial training with a feature discriminator to avoid degenerating into an identity mapping. Experimental results show that this method effectively completes the knowledge transfer, allowing the student model to preserve the teacher's behavior in terms of context and detail as much as possible; the performance change after deployment remains within a reasonable range, and efficient real-time inference is achieved in real scenes.
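The cyclic teacher update described in contribution (2) follows the common self-training pattern in which knowledge is "summarized into the teacher" by an exponential moving average (EMA) of student weights. The thesis does not specify its exact update rule, so the following is a minimal numpy sketch under that EMA assumption; the parameter dictionary and momentum value are illustrative:

```python
import numpy as np

def ema_update(teacher, student, alpha=0.999):
    """Update teacher parameters as an exponential moving average of the student.

    Summarizing each local iteration's knowledge into the teacher this way
    keeps pseudo-labels stable while the student absorbs new gradients.
    """
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}

# Toy example: one weight tensor, five local iterations of a training cycle.
teacher = {"w": np.zeros(3)}
student = {"w": np.ones(3)}
for _ in range(5):
    teacher = ema_update(teacher, student, alpha=0.9)
print(teacher["w"])  # drifts from 0 toward the student's weights
```

With momentum 0.9 the teacher converges geometrically toward the student, which is why a high momentum (e.g. 0.999) is typically used to keep the teacher a slowly varying ensemble of past students.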
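The entropy self-prompting idea in contribution (3) can be illustrated with a small sketch: per-pixel predictive entropy computed from the softmax output serves as an uncertainty map, and the most uncertain pixels form a prompt highlighting regions that express the domain gap. The function names and the quantile thresholding rule below are illustrative assumptions, not the thesis implementation:

```python
import numpy as np

def pixel_entropy(logits):
    """Per-pixel predictive entropy from raw logits of shape (C, H, W)."""
    # Numerically stable softmax over the class axis.
    z = logits - logits.max(axis=0, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=0, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=0)  # shape (H, W)

def uncertainty_prompt(logits, quantile=0.8):
    """Binary mask of the most uncertain pixels, used as a self-prompt."""
    h = pixel_entropy(logits)
    return h >= np.quantile(h, quantile)

rng = np.random.default_rng(0)
logits = rng.normal(size=(19, 8, 8))  # 19 classes, as in Cityscapes-style labels
mask = uncertainty_prompt(logits)
print(mask.shape, int(mask.sum()))
```

At test time such a mask could weight the adaptation loss toward uncertain regions; the actual USP scheme additionally fuses SAM's structural prior, which this sketch omits.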
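The shared-space projection at the heart of FP-KD in contribution (4) can be sketched as follows. Teacher and student branches typically have different channel widths, so each side gets its own projection into a common dimension before alignment. The channel sizes and the MSE surrogate below are illustrative assumptions; the thesis instead trains the projections adversarially against a feature discriminator:

```python
import numpy as np

rng = np.random.default_rng(42)

def project(feat, W):
    """Linear feature projection into a shared space: (N, C) @ (C, D) -> (N, D)."""
    return feat @ W

# Hypothetical widths: teacher context features (256-ch) vs. student (64-ch),
# both projected into a shared 128-dimensional space.
t_ctx = rng.normal(size=(16, 256))
s_ctx = rng.normal(size=(16, 64))
W_t = rng.normal(size=(256, 128))
W_s = rng.normal(size=(64, 128))

def alignment_loss(t, s):
    """Mean squared distance between projected teacher and student features."""
    return float(np.mean((t - s) ** 2))

loss = alignment_loss(project(t_ctx, W_t), project(s_ctx, W_s))
print(loss > 0.0)
```

The adversarial discriminator mentioned in the text replaces this plain MSE: it pushes both projections toward a common distribution, which prevents the projection modules from collapsing into a trivial identity-like mapping.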