One of the difficulties of virtual try-on technology is that it must recover the corresponding three-dimensional information from a single image alone, without reconstructing a three-dimensional human model or a three-dimensional garment. To address this problem, this paper proposes an image-based virtual try-on network named T3D-VTON. The network takes a user image and a target garment as input and outputs a 3D virtual try-on model of the user. It consists of three modules: a garment deformation module, a depth estimation module, and a three-dimensional fusion module.

The garment deformation module adopts an encoder-decoder structure. To strengthen the encoder's ability to extract features from the user image and the target garment, a convolutional block attention module is introduced, so that the network attends to the key features in the image and suppresses the influence of unrelated features. First, the main human-body features extracted from the user image by the encoder improve the precision of the segmentation map generated by the network. Second, these features guide the TPS deformation of the garment so that it better matches the geometric characteristics of the body region it replaces.

The depth estimation module adopts a multi-scale encoder-decoder structure that mixes ResNet and Transformer. The encoder first extracts features of the input image with ResNet, retaining more detail information, and then extracts the semantic and spatial information contained in those features through a Transformer structure, which helps the decoder predict depth values more precisely. The network also adopts a multi-scale fusion structure that combines shallow and deep features.

The three-dimensional fusion module uses a U-Net structure to combine two-dimensional and three-dimensional information and produce the three-dimensional virtual try-on human model. The validity of T3D-VTON is verified by
the qualitative and quantitative comparison with the baseline network. Quantitatively, for the two-dimensional try-on results, the structural similarity (SSIM) increases by 0.0157 and the peak signal-to-noise ratio (PSNR) increases by 0.1132 over the baseline network, indicating higher image-generation quality and less information loss. For human-model accuracy, the absolute relative error decreases by 0.037 and the squared relative error decreases by 0.014 compared with the baseline network, indicating that the generated three-dimensional human model is more accurate and its depth values agree more closely with the given ground truth.

The qualitative results show that the deformed garment fits the corresponding region of the target body more closely. When dealing with complex textures, the network better preserves the logos and fabric materials of the clothing. For human-model generation, the three-dimensional model produced by the network presents clearer contour edges, effectively eliminating adhesion between the arms and the abdomen, and likewise between adjacent knees. These results indicate the effectiveness of the convolutional block attention module and the Transformer introduced in T3D-VTON: when dealing with complex textures, the network effectively controls the deformation of the garment and its fusion with the target person, preserving garment texture details such as logos and text while producing sharper edges and superior shape generation.

On this basis, considering the problems of body classification and garment-size selection in virtual try-on, a body-size classification algorithm is designed based on the three-dimensional human model, and an appropriate clothing size is recommended for
users according to the predicted size category. First, a neural network that can extract the shape characteristics of a three-dimensional human model is needed; by comparing the classification accuracy of PointNet and DGCNN, DGCNN is selected as the feature extractor. Second, DGCNN extracts features from the SPRING dataset, yielding feature vectors that represent body shape as the sample set required for K-means clustering. On this basis, the cluster centers serve as standard body-size templates: the user's human model is assigned a size category by similarity comparison with the templates, the corresponding clothing size is recommended, and the user obtains clothing suited to their individual body shape, as shown in the virtual try-on results.
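The convolutional block attention module used in the garment deformation module follows, in its usual formulation, a channel-attention step followed by a spatial-attention step. The following is a minimal NumPy sketch of that general mechanism only, not the paper's implementation; all weights here are random placeholders, and the MLP reduction ratio and 7x7 spatial kernel are conventional choices assumed for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

def conv2d(x, kernel):
    """'Same'-padded cross-correlation of x (Cin, H, W) with kernel (Cin, kh, kw) -> (H, W)."""
    cin, kh, kw = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (ph, ph), (pw, pw)))
    h, w = x.shape[1:]
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(xp[:, i:i + kh, j:j + kw] * kernel)
    return out

def cbam(x, w1, w2, k_spatial):
    """Apply channel attention, then spatial attention, to a feature map x of shape (C, H, W)."""
    # Channel attention: shared 2-layer MLP on average- and max-pooled channel descriptors.
    avg, mx = x.mean(axis=(1, 2)), x.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)
    ca = sigmoid(mlp(avg) + mlp(mx))                    # (C,), values in (0, 1)
    x = x * ca[:, None, None]
    # Spatial attention: convolution over channel-wise average and max maps.
    sa_in = np.stack([x.mean(axis=0), x.max(axis=0)])   # (2, H, W)
    sa = sigmoid(conv2d(sa_in, k_spatial))              # (H, W)
    return x * sa[None]

# Illustration with random weights: C = 8 channels, MLP reduction ratio 2.
x = rng.standard_normal((8, 5, 5))
w1, w2 = rng.standard_normal((4, 8)), rng.standard_normal((8, 4))
k = rng.standard_normal((2, 7, 7))
y = cbam(x, w1, w2, k)   # same shape as x, rescaled per-channel and per-pixel
```

Because both attention maps lie in (0, 1), the module can only attenuate features, which is how it suppresses unrelated regions while keeping the salient ones.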
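The TPS (thin-plate spline) deformation that warps the target garment interpolates a smooth mapping from control points on the flat garment to corresponding points on the body region. A minimal NumPy sketch of the standard TPS fit-and-warp follows; this is the classical closed-form interpolation, not the paper's learned-parameter version, and the control points are illustrative:

```python
import numpy as np

def tps_kernel(r2):
    """Radial basis U(r) = r^2 log(r^2), with U(0) defined as 0."""
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def tps_fit(src, dst):
    """Solve the TPS interpolation system mapping control points src -> dst (both (n, 2))."""
    n = len(src)
    K = tps_kernel(np.sum((src[:, None] - src[None]) ** 2, axis=2))
    P = np.hstack([np.ones((n, 1)), src])
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    b = np.zeros((n + 3, 2))
    b[:n] = dst
    return np.linalg.solve(A, b)   # (n + 3, 2): n warp weights plus an affine part

def tps_warp(pts, src, coef):
    """Apply a fitted TPS to arbitrary query points pts of shape (m, 2)."""
    U = tps_kernel(np.sum((pts[:, None] - src[None]) ** 2, axis=2))  # (m, n)
    P = np.hstack([np.ones((len(pts), 1)), pts])
    return U @ coef[:len(src)] + P @ coef[len(src):]

# Illustration: pin the square's corners and nudge its center to the right.
src = np.array([[0., 0.], [1., 0.], [0., 1.], [1., 1.], [0.5, 0.5]])
dst = src.copy()
dst[4] = [0.6, 0.5]
coef = tps_fit(src, dst)
```

By construction the fitted spline passes through every control point exactly, and between control points it bends as little as possible, which is why TPS warps keep garment texture smooth while matching the target body region.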
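The quantitative metrics reported above have standard definitions. For reference, PSNR (used for 2D try-on image quality) and the absolute and squared relative depth errors (used for 3D accuracy) can be computed as follows; these are the conventional formulas, with illustrative variable names, and SSIM is omitted here because its windowed definition is longer:

```python
import numpy as np

def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio between two images with values in [0, max_val]."""
    mse = np.mean((pred - gt) ** 2)
    return float(10.0 * np.log10(max_val ** 2 / mse))

def abs_rel(pred_depth, gt_depth):
    """Absolute relative depth error: mean(|d - d*| / d*)."""
    return float(np.mean(np.abs(pred_depth - gt_depth) / gt_depth))

def sq_rel(pred_depth, gt_depth):
    """Squared relative depth error: mean((d - d*)^2 / d*)."""
    return float(np.mean((pred_depth - gt_depth) ** 2 / gt_depth))
```

Higher PSNR means less information lost in generation, while lower Abs Rel and Sq Rel mean the predicted depth map agrees more closely with the ground truth, which is the sense in which the reported 0.1132 PSNR gain and 0.037/0.014 error reductions are improvements.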
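The size-recommendation pipeline, shape features clustered with K-means and a new user matched to the nearest cluster center, can be sketched end to end. The features below are synthetic 2-D stand-ins for DGCNN shape descriptors, the size names are hypothetical, and a deterministic farthest-point initialization is used for reproducibility:

```python
import numpy as np

def kmeans(feats, k, iters=50):
    """Lloyd's K-means on row-vector features, with deterministic farthest-point initialization."""
    centers = [feats[0]]
    for _ in range(k - 1):
        d = np.min(np.linalg.norm(feats[:, None] - np.array(centers)[None], axis=2), axis=1)
        centers.append(feats[int(d.argmax())])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(feats[:, None] - centers[None], axis=2).argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = feats[labels == c].mean(axis=0)
    return centers, labels

def recommend_size(user_feat, centers, size_names):
    """Match a user's shape feature to the nearest cluster center (standard size template)."""
    idx = int(np.linalg.norm(centers - user_feat, axis=1).argmin())
    return size_names[idx]

# Illustration: three well-separated synthetic shape clusters become three size templates.
offsets = np.array([[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1]])
feats = np.concatenate([np.array(c) + offsets for c in ([0., 0.], [5., 0.], [10., 0.])])
centers, _ = kmeans(feats, 3)
order = centers[:, 0].argsort()   # order templates by a size proxy before naming them
size = recommend_size(np.array([9.8, 0.0]), centers[order], ["S", "M", "L"])
```

In the paper's pipeline the clustering runs once offline over SPRING features, so at try-on time the user's model only needs one feature extraction and a nearest-template lookup.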