The prediction of photovoltaic (PV) power generation is an important task in the field of clean energy. Accurate predictions can reduce the risks that the randomness and intermittency of PV generation pose to the economic operation of power enterprises. To achieve high-precision PV power prediction, prediction models incorporating artificial intelligence have been widely adopted. Data preprocessing models are an important part of PV power prediction and have also produced research results. However, PV data have a large sample size, high dimensionality and a long time span, so existing data preprocessing research still has shortcomings, such as overly simple data classification, inaccurate feature point selection and reliance on a single noise removal method. To address these problems, this paper proposes a combined PV forecasting data preprocessing model based on clustering and similar day theory, which preprocesses PV generation data to improve the accuracy of short-term PV power forecasting models. First, this paper comprehensively and systematically reviews the state of research on data preprocessing models for short-term PV power prediction and analyzes the advantages and disadvantages of various preprocessing methods. Next, the OPTICS clustering algorithm and the improved similar day theory are introduced in detail. Then, based on the generation characteristics of PV power and the generation data of a PV power station in Shanghai, the characteristics of the various factors affecting PV generation are determined. For classifying PV generation data, the paper proposes constructing similar day selection from a Gaussian fitting function and a similar day coefficient. For removing data noise, the density-based OPTICS clustering algorithm is used to identify noise from the perspective of data association. Finally, the proposed combined model is applied to the historical generation data and meteorological data of a PV power station, and different PV power prediction models are compared. The results show that the proposed method not only effectively improves the prediction accuracy of PV power models but is also widely applicable to different PV power prediction models.
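The abstract does not give the exact Gaussian fitting procedure or the formula for the similar day coefficient, so the following is only a minimal sketch under stated assumptions. It summarizes each day's PV power curve by Gaussian parameters (amplitude, peak time, width) obtained by moment matching, which is a simple stand-in for a least-squares Gaussian fit. It then ranks historical days with a hypothetical coefficient, 1 / (1 + weighted normalized Euclidean distance). The functions `gaussian_params`, `similar_day_coefficient` and `select_similar_days`, along with the scales and weights, are illustrative names and choices, not the paper's.

```python
import math
from typing import Dict, List, Sequence, Tuple


def gaussian_params(times: Sequence[float], power: Sequence[float]) -> Tuple[float, float, float]:
    """Moment-matching Gaussian fit of one day's PV power curve (illustrative).

    Returns (amplitude, mu, sigma) such that
    power(t) ~= amplitude * exp(-(t - mu)**2 / (2 * sigma**2)).
    Assumes uniformly spaced samples and some positive output.
    """
    total = sum(power)
    if total <= 0:
        raise ValueError("daily power curve must contain positive output")
    dt = times[1] - times[0]
    # Treat the normalized curve as a distribution over time of day.
    mu = sum(t * p for t, p in zip(times, power)) / total
    var = sum((t - mu) ** 2 * p for t, p in zip(times, power)) / total
    sigma = math.sqrt(var)
    # Area under a Gaussian is amplitude * sigma * sqrt(2*pi).
    amplitude = total * dt / (sigma * math.sqrt(2 * math.pi))
    return amplitude, mu, sigma


def similar_day_coefficient(a: Sequence[float], b: Sequence[float],
                            scales: Sequence[float], weights: Sequence[float]) -> float:
    """Hypothetical similar day coefficient in (0, 1]; 1 means identical features."""
    d2 = sum(w * ((x - y) / s) ** 2 for x, y, s, w in zip(a, b, scales, weights))
    return 1.0 / (1.0 + math.sqrt(d2))


def select_similar_days(target: Sequence[float], history: Dict[str, Sequence[float]],
                        scales: Sequence[float], weights: Sequence[float], k: int) -> List[str]:
    """Return the k historical days whose feature vectors are most similar to target."""
    scored = sorted(
        ((similar_day_coefficient(target, feats, scales, weights), day)
         for day, feats in history.items()),
        reverse=True,
    )
    return [day for _, day in scored[:k]]
```

In practice, the feature vector could be extended with meteorological quantities, such as daily mean temperature or cloud cover, each with its own scale and weight. Normalizing by a scale keeps features with large units, like amplitude in kW, from dominating the distance.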
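Density-based noise detection with OPTICS can be illustrated with a small pure-Python sketch. It computes the OPTICS ordering, core distances and reachability distances, then applies the DBSCAN-style extraction from the original OPTICS formulation with a threshold `eps_prime` ≤ `eps`. Points that are neither reachable within `eps_prime` nor core points at that radius are labeled noise (`-1`). This is not the paper's implementation. The choice of features (for example, normalized irradiance and power per record) and the values of `eps`, `min_pts` and `eps_prime` are assumptions for illustration, and `optics` and `flag_noise` are hypothetical helper names.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def optics(points: Sequence[Point], eps: float, min_pts: int) -> Tuple[List[int], List[float], List[float]]:
    """O(n^2) OPTICS: returns (ordering, reachability, core distance)."""
    n = len(points)

    def dist(i: int, j: int) -> float:
        return math.dist(points[i], points[j])

    # Each eps-neighborhood includes the point itself, so min_pts counts it too.
    neighbors = [[j for j in range(n) if dist(i, j) <= eps] for i in range(n)]
    core = []
    for i in range(n):
        if len(neighbors[i]) >= min_pts:
            core.append(sorted(dist(i, j) for j in neighbors[i])[min_pts - 1])
        else:
            core.append(math.inf)  # not a core point
    reach = [math.inf] * n
    processed = [False] * n
    order: List[int] = []

    def update(p: int, heap: list) -> None:
        # Lower the reachability of unprocessed neighbors of core point p.
        for o in neighbors[p]:
            if not processed[o]:
                r = max(core[p], dist(p, o))
                if r < reach[o]:
                    reach[o] = r
                    heapq.heappush(heap, (r, o))

    for p in range(n):
        if processed[p]:
            continue
        processed[p] = True
        order.append(p)
        if core[p] < math.inf:
            heap: list = []
            update(p, heap)
            while heap:
                _, q = heapq.heappop(heap)
                if processed[q]:
                    continue  # stale heap entry (lazy deletion)
                processed[q] = True
                order.append(q)
                if core[q] < math.inf:
                    update(q, heap)
    return order, reach, core


def flag_noise(points: Sequence[Point], eps: float, min_pts: int, eps_prime: float) -> List[int]:
    """DBSCAN-style extraction from the OPTICS ordering; label -1 marks noise."""
    order, reach, core = optics(points, eps, min_pts)
    labels = [-1] * len(points)
    cluster = -1
    for p in order:
        if reach[p] > eps_prime:
            if core[p] <= eps_prime:
                cluster += 1  # p starts a new cluster
                labels[p] = cluster
            # otherwise p stays labeled as noise
        else:
            labels[p] = cluster
    return labels
```

Because OPTICS stores the full reachability ordering, noise can be re-extracted for several `eps_prime` values without re-running the clustering. This is one reason to prefer it over a single DBSCAN pass when the right density threshold is unknown.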