| As a clean energy source, wind energy is expected to become a new point of economic growth relative to traditional energy. After more than a decade of development, China’s wind power market continues to expand, but the volatility and uncertainty of wind as a natural resource make it challenging to integrate wind power into the grid. Wind power development has also generated large amounts of data. Because of damage to or maintenance of recording equipment, data entry errors, network intrusion attacks and similar causes, abnormal records are widespread in wind power data and data quality is uneven. Forecasting error in wind power forecasting comes mainly from two sources: error introduced at the data input stage and error introduced at the model forecasting stage. This thesis approaches wind power forecasting from the perspective of reducing error at the data input stage. The main work is as follows: (1) First, the data are explored: the influence of wind speed, temperature, deflection angle and other factors on wind power in the dataset is analyzed; missing values in the dataset are identified and handled; and abnormal values in the different variables are identified and repaired. Compared with other variables such as wind speed, the power variable is found to contain a large number of anomalies. (2) The Isolation Forest algorithm is used to identify anomalies in wind power. A scatter plot of wind speed against wind power shows that many samples deviate from the expected power distribution; after identification with Isolation Forest, most of the anomalous points are found on the right side of the S-shaped distribution and few on the left. (3) The LightGBM algorithm is used to repair the abnormal wind power data. A LightGBM model is built with wind speed as the input variable, wind power as the target variable, and the normal samples as the training set. The abnormal samples are then used as the test set and their power is predicted by the model. Comparing the predicted power with the recorded values shows that the repaired wind power lies closer to the S-shaped distribution, so data quality is indeed improved. (4) Finally, an LSTM is trained on the repaired dataset to predict wind power over the next 48 hours, a total of 288 prediction time points (one every 10 minutes). Compared with the actual values, the predictions are broadly consistent with the actual trend in power, and accuracy is better during periods when wind power changes gently. Compared with forecasts made without anomaly repair, the results show that if the data are not repaired the predicted power is consistently too low: when actual power is low the predictions differ little from the actual values, but when actual power is high the predictions are completely inconsistent with them. This work shows that identifying and repairing anomalous data with Isolation Forest and LightGBM improves data quality and plays an important role in improving the accuracy of wind power forecasting. |