| To date, more than 70% of recombinant therapeutic drugs on the market are produced in CHO cells. A CHO cell medium customized to the characteristics of the cell line and product can improve production and ensure quality. However, most enterprises use general-purpose commercial media, which struggle to meet optimization demands and also carry supply uncertainty and relatively high prices. Therefore, as independent research capabilities continue to grow, customized media have become the current trend. Because the composition of CHO cell medium is complex, statistical experimental design, which saves manpower and material resources, is widely used in CHO cell medium development. However, traditional design of experiments (DOE) relies on researchers to narrow the focus and has difficulty processing high-dimensional data, so the development cycle varies with the researcher, which limits efficiency within enterprises. Therefore, to solve the tail-end problem in medium development, it is necessary to use high-dimensional algorithms and to integrate and optimize each step, establishing a more general CHO cell medium development process that provides guidance and suggestions for medium development and production. To guide the establishment of this process, this study first analyzed the data characteristics. The medium bank was then selected by maximum Euclidean distance, and the mixture experiment was designed using a simplex lattice. Compared with the traditional PCA method, the linear correlation of the experimental design was reduced from 0.60 to 0.50 and the design space was expanded from 1.30 to 1.80, which improved data utilization. Nonparametric local regression (LOESS) was then used to analyze the relationship between medium components and titers. The results on cell line A showed that the LOESS-based analysis method increased the accuracy from 20% to 60% compared with PLS, which is widely used in commercial software, and was more helpful for understanding the relationship between 
medium components and titers, thereby guiding the establishment of the subsequent prediction model. Because the output showed a non-unimodal distribution, a multi-modal GMM and GBDT were combined to establish the prediction model, and it was confirmed that adding the LOESS analysis algorithm improved the prediction accuracy of the model. Then, the optimization efficiency of GA and Uncertainty Estimates was compared on the basis of the prediction model. The results on cell line A showed that the mAb titer obtained with the medium developed by GA was 5% higher than the optimal titer of the mixture experiment, while that obtained with Uncertainty Estimates was 24% higher, so Uncertainty Estimates was chosen as the algorithm for generating optimized formulas. In addition, compared with PLS as the prediction model, the average prediction accuracy of the GMM-GBDT model increased by 14%, reaching 87%. Based on the yield prediction model, the LIME algorithm was introduced to analyze formula characteristics; it was found that B301/FM3F was the most stable medium combination for cell line A, whereas S204B/S204F was more likely to cause yield fluctuation. The key medium components with a large influence on yield were obtained by further analysis of local CPP. Hypothesis testing verified that fluctuations of CPP within ±10% had a significant effect on yield and that the stability of B301/FM3F and S204B/S204F differed significantly. On this basis, the component control range for an acceptable range of yield fluctuation can be evaluated, providing guidance for production stability control. Finally, the medium development method established above was applied to cell line B, which expresses IgG1 antibodies. Four sets of medium combinations were obtained, and the mAb titer increased from 2.74 g/L to 4.36 g/L, 59% higher than the optimal titer of the mixture experiment. The accuracy of the GMM-GBDT prediction model reached 84%. Based on the prediction model, it was suggested that BM31D/FM12D was the most 
stable medium combination, with a yield fluctuation range that was 64% of that of the BM38D/FM12D combination. The effects of the formula stability and CPP analysis were verified by hypothesis testing on the single-factor experimental results for local CPP. In this study, a more general CHO cell culture medium development method was established by integrating statistical methods; it can deal with non-linear datasets in high-dimensional settings (number of variables >> number of observations) and reduces the professional requirements placed on users. As a powerful auxiliary tool for traditional DOE software, this method can provide guidance for CHO cell medium production and pursue high yield while ensuring stability. |
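
The design step described in the abstract (selecting a medium bank by maximum Euclidean distance, then laying out a simplex-lattice mixture experiment) can be illustrated with a minimal sketch. The abstract does not give the exact selection rule, so the greedy farthest-point criterion below is an assumption, and the medium names `M1` to `M4` and their two-component compositions are hypothetical placeholders rather than data from the study.

```python
from itertools import product
from math import dist


def simplex_lattice(q, m):
    """Return the {q, m} simplex-lattice design: all blends of q media whose
    proportions are multiples of 1/m and sum to 1."""
    return [tuple(k / m for k in combo)
            for combo in product(range(m + 1), repeat=q)
            if sum(combo) == m]


def select_bank(media, k):
    """Greedy farthest-point selection (an assumed criterion): start from the
    most distant pair, then repeatedly add the medium whose nearest
    already-chosen neighbour is farthest away in composition space."""
    names = list(media)
    if not 2 <= k <= len(names):
        raise ValueError("k must be between 2 and the number of media")
    # Seed with the pair of media separated by the largest Euclidean distance.
    pairs = ((a, b) for i, a in enumerate(names) for b in names[i + 1:])
    chosen = list(max(pairs, key=lambda p: dist(media[p[0]], media[p[1]])))
    while len(chosen) < k:
        rest = [n for n in names if n not in chosen]
        chosen.append(max(
            rest, key=lambda n: min(dist(media[n], media[c]) for c in chosen)))
    return chosen


# Hypothetical media described by two component concentrations (made-up values).
media = {"M1": (0.0, 0.0), "M2": (1.0, 0.0), "M3": (0.1, 0.1), "M4": (0.4, 2.0)}

bank = select_bank(media, 3)
design = simplex_lattice(len(bank), 2)  # blends of the selected media
for blend in design:
    print(dict(zip(bank, blend)))
```

Each row of `design` is one mixture run: the proportions in which the selected media are blended. A {q, m} lattice contains C(q + m - 1, m) runs, so the run count grows quickly with the size of the bank, which is one reason to keep the bank small and well spread.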