Chinese painting has a long history and a deep cultural background. Chinese painting pattern elements are widely used in the artistic design of daily necessities, especially in combining ceramic patterns with Chinese paintings. The purpose of this study is to establish a dual-modality model relating Chinese text and Chinese painting patterns based on deep learning, to generate Chinese painting patterns, and to apply them to the personalized customization of patterns on everyday ceramic products. The results of this paper are applied to a platform that generates Chinese painting patterns from Chinese text, providing users with patterns quickly and reducing the difficulty and time cost of obtaining them.

This paper first establishes a multimodal dataset of Chinese painting patterns and text descriptions. Crawler technology is used to collect and classify Chinese painting images; Canny edge detection and edge-density calculation are used to automatically locate and crop targets in long images; and the image dataset is then filtered and cleaned. Keyword labels were set for all 45,776 Chinese painting images, covering the painting category, painting techniques, main content, and content descriptions. 13,311 traditional Chinese paintings were described with hand-written text labels, and formatted fill-in text descriptions were used for the remaining 32,465 images. Finally, the dataset is divided into 36,624 training images, 4,576 validation images, and 4,576 test images.

VQGAN is used as the discrete representation model for Chinese painting patterns. To address the problems that faces and other fine details cannot be represented and that freehand color blocks are not uniform when reconstructing Chinese painting patterns, RDVQGAN and MRFVQGAN are proposed. To enhance the model's perception of image details and enlarge its perceptual range, a perception module, the Residual-in-Residual Dense Block or the Multi-branched Receptive Field Residual Group respectively, is added to the model's generator. The FID of the RDVQGAN model is 25.73, 3% higher (worse) than that of the original model, while its IS is 18.82, an increase of 21.2%; only the IS improved over the original model. The FID of MRFVQGAN is 14.37, a decrease of 42.05%, and its IS is 19.76, 27.2% higher than the original model.

In this study, Chinese text is encoded as a sequence using Jieba word segmentation and BERT, and a Transformer model establishes the semantic relationship between the text sequence and the image token sequence, achieving sequence-to-sequence generation of Chinese painting images from text. This paper optimizes the model structure and introduces the large-scale Chinese pre-trained RoBERTa to fine-tune the text encoder, improving the weak generalization ability caused by the small amount of text in the dataset. After fine-tuning, the R score (the text-image correlation index of generated images) increases by 32.8%-52.9%. The best model, Transformer-MRFVQGAN, achieves an FID of 41.53, an IS of 19.62, and an R score of 0.87.

Finally, this paper incorporates the CLIP pre-trained model to align text embeddings with image embeddings, improves the GPT structure, combines the CLIP cosine-similarity loss with the cross-entropy loss of GPT-generated image tokens, and masks random positions of the training image sequence to enhance the model's generation and generalization ability. The visual quality of the images generated by these models improves significantly: the FID decreases by 49.4% on average compared with the Transformer-VQGAN models, the IS increases by 19.0% on average, and the R score increases by 20.16% on average. The best model, CLIP (fine-tuned)-GPT-MRFVQGAN, produces images with an FID of 19.27, an IS of 22.07, and an R score of 0.91.
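The edge-density cropping step for long images can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: a simple gradient-magnitude edge map stands in for OpenCV's Canny detector, and the window height `win_h` and threshold `thresh` are hypothetical parameters.

```python
import numpy as np

def edge_map(gray: np.ndarray, thresh: float = 30.0) -> np.ndarray:
    """Binary edge map from gradient magnitude (stand-in for cv2.Canny)."""
    gy, gx = np.gradient(gray.astype(float))
    return (np.hypot(gx, gy) > thresh).astype(np.uint8)

def crop_densest_window(gray: np.ndarray, win_h: int) -> np.ndarray:
    """Crop the horizontal band of height win_h with the highest edge density."""
    edges = edge_map(gray)
    row_counts = edges.sum(axis=1)                  # edge pixels per row
    csum = np.concatenate(([0], np.cumsum(row_counts)))
    densities = csum[win_h:] - csum[:-win_h]        # edge count per sliding window
    top = int(np.argmax(densities))                 # window with the densest edges
    return gray[top:top + win_h, :]
```

The idea is that the painted subject of a long scroll image concentrates edges, while blank margins contribute few, so the densest window approximates the target region to clip.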
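The combined training objective of the CLIP-GPT stage, a cosine-similarity alignment loss plus the cross-entropy of the generated image-token sequence, together with random-position masking of the image tokens, can be sketched as below. The embeddings, logits, weight `lam`, and masking rate `p` are illustrative stand-ins, not the paper's actual tensors or hyperparameters.

```python
import numpy as np

def cosine_similarity_loss(text_emb: np.ndarray, img_emb: np.ndarray) -> float:
    """1 - cos(text, image): small when the two embeddings are aligned."""
    cos = (text_emb @ img_emb) / (np.linalg.norm(text_emb) * np.linalg.norm(img_emb))
    return float(1.0 - cos)

def token_cross_entropy(logits: np.ndarray, targets: np.ndarray) -> float:
    """Mean cross-entropy of predicted image-token logits vs. target tokens."""
    shifted = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())

def combined_loss(text_emb, img_emb, logits, targets, lam: float = 0.5) -> float:
    """Weighted sum of CLIP-style alignment loss and GPT token cross-entropy."""
    return lam * cosine_similarity_loss(text_emb, img_emb) \
        + token_cross_entropy(logits, targets)

def mask_random_positions(tokens: np.ndarray, mask_id: int, p: float, rng) -> np.ndarray:
    """Replace a random fraction p of image tokens with mask_id (augmentation)."""
    out = tokens.copy()
    out[rng.random(tokens.shape) < p] = mask_id
    return out
```

The alignment term pulls the generated image's embedding toward the text embedding, while the cross-entropy term supervises the token sequence itself; masking random positions during training forces the model to reconstruct corrupted sequences, which is what the abstract credits for the improved generalization.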