Monday, 5 June 2023

Application of machine learning algorithms to energy forecasting

 

Modern life requires a reliable and secure power supply. Essential and critical services, including but not limited to healthcare, financial systems, telecommunication, emergency response, navigation, and transportation, depend on energy systems that guarantee continuity of supply. At the same time, governments across the globe are advocating migration to clean energy systems from their ‘unclean’ counterparts, which rely on fossil fuels and emit carbon dioxide (CO2), raising atmospheric CO2 levels and warming the planet. As the energy and environmental crises become more serious, Distributed Generations (DGs), as the main forms of Renewable Energy Sources (RESs), have attracted much attention in issues related to energy management and the sustainability of power systems. Distributed energy resources (DERs), which include distributed generation (DG), distributed storage, and adjustable load, are a key component in microgrid operations.

A Microgrid is a small-scale, self-controllable power system that clusters DERs and loads within clearly defined electrical boundaries and can function in grid-connected or island mode. Microgrids can be clustered at distribution level to improve the economics and reliability of small DGs such as microturbines and wind turbines, as well as DGs with power electronic (PE) interfaces such as photovoltaic (PV) arrays and fuel cells. However, the output of most of these renewable resources fluctuates with weather conditions and time of day, so the majority of DERs cannot guarantee a continuous and steady amount of power generation. In addition to this generation variability, electricity demand in a microgrid can be unpredictable, hence the need for complex energy management and sustainability frameworks for the Microgrid.

The energy management and sustainability frameworks of Microgrids involve active monitoring and resource scheduling of energy assets to ensure they operate at peak efficiency and with minimal energy waste. Traditional resource scheduling models built for large-scale power systems cannot be applied directly to microgrids because of their special characteristics. These include, but are not limited to, a considerable share of non-dispatchable renewable energy resources, connection to the main grid as backup generation or load for the microgrid, and the microgrid's islanding capability, which may be used for economic or reliability purposes. Thus, forecasting is needed, together with close tracking of the microgrid load by its generation at all times, to achieve economic and reliable operation of the system.

Load forecasting is classified by its time horizon. Long-term load forecasting (LTLF) usually covers forecasting horizons of one to ten years, and sometimes up to several decades. Medium-term load forecasting (MTLF) encompasses a horizon of several months up to two years into the future. When a Microgrid load is forecast over a horizon of a few seconds, minutes, hours, or even a few weeks, it is termed Short-Term Load Forecasting (STLF). STLF for a microgrid can involve the application of Machine Learning (ML) algorithms. Machine Learning is a branch of Artificial Intelligence. It is a form of predictive analytics, or predictive modeling, in which the computer uses algorithms that receive and analyze input data to predict output values within an acceptable range. ML algorithms have shown strong performance in time series forecasting and hence can be used to forecast power using weather parameters as model inputs.

ML prediction involves four distinct stages. The first is the acquisition of historical input and output data. The second is preprocessing the collected data into a suitable format before it is used to train the prediction model. The third is training, which develops the model by selecting appropriate parameters for it. Parameter types depend on the algorithm used for the regression, and parameter selection is affected by the size of the training data, the choice of input variables, and the performance indicators. The final stage is testing, in which test data is fed to the trained model to evaluate its prediction performance.
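
The four stages can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the pipeline used in the research: the data is synthetic (a made-up linear link between irradiance and PV power), the model is a one-variable least-squares line, and all function names are hypothetical.

```python
import math
import random


def acquire(n=200, seed=0):
    """Stage 1: stand-in for historical data; returns (irradiance, power) pairs."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        irradiance = rng.uniform(0.0, 1000.0)  # W/m^2, hypothetical input
        power = 0.004 * irradiance + 0.2 + rng.gauss(0.0, 0.05)  # kW, synthetic
        rows.append((irradiance, power))
    return rows


def fit_scaler(rows):
    """Stage 2a: learn min-max bounds from the training inputs only."""
    xs = [x for x, _ in rows]
    return min(xs), max(xs)


def preprocess(rows, bounds):
    """Stage 2b: scale inputs with the bounds learned on the training set."""
    lo, hi = bounds
    return [((x - lo) / (hi - lo), y) for x, y in rows]


def train(rows):
    """Stage 3: fit y = a + b * x by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def evaluate(model, rows):
    """Stage 4: root mean square error on held-out data."""
    a, b = model
    squared = [(y - (a + b * x)) ** 2 for x, y in rows]
    return math.sqrt(sum(squared) / len(squared))


raw = acquire()
split = int(0.8 * len(raw))            # 80 % for training, 20 % for testing
bounds = fit_scaler(raw[:split])
train_rows = preprocess(raw[:split], bounds)
test_rows = preprocess(raw[split:], bounds)

model = train(train_rows)
rmse = evaluate(model, test_rows)
print(f"intercept={model[0]:.3f}, slope={model[1]:.3f}, test RMSE={rmse:.3f} kW")
```

The scaler bounds are learned from the training portion only, so that no information from the test data leaks into the training stage.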

Several models can be used to apply ML algorithms to energy forecasting. Models found in the literature include, but are not limited to, Artificial Neural Networks (ANNs), Decision Trees (DTs), Support Vector Machines (SVMs), Regression Analysis, Bayesian Networks, Gaussian Processes, and Genetic Algorithms. The main focus of this research is to conduct STLF and Short-Term Generation Forecasting (STGF) for a Solar Photovoltaic (PV) Microgrid. The research uses two independent Machine Learning algorithms, namely Support Vector Machines (SVM) and Enhanced Decision Tree Regression (EDTR). The results identify the more efficient of the two methods based on generalization ability (stability), accuracy, and computational cost. Onyekachukwu Ezeagbai focused on SVM, while Christopher Beza handled EDTR. Both SVM and EDTR models can perform regression analysis.
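
A comparison of this kind, on accuracy and computational cost, might look like the sketch below. It assumes scikit-learn and NumPy are installed and uses synthetic data. Note the substitutions: scikit-learn's `SVR` stands in for the SVM model, and a plain `DecisionTreeRegressor` stands in for EDTR, whose enhancements are not described here. The hyperparameter values are arbitrary choices for illustration.

```python
import time

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in data: irradiance (W/m^2) and temperature (deg C) -> PV power (kW).
rng = np.random.default_rng(0)
irradiance = rng.uniform(0.0, 1000.0, size=500)
temperature = rng.uniform(5.0, 40.0, size=500)
X = np.column_stack([irradiance, temperature])
y = 0.004 * irradiance - 0.01 * (temperature - 25.0) + rng.normal(0.0, 0.05, size=500)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# SVR is sensitive to feature scale, so it is wrapped with a standardizer.
models = {
    "SVR": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
    "DecisionTree": DecisionTreeRegressor(max_depth=6, random_state=0),
}

results = {}
for name, model in models.items():
    start = time.perf_counter()
    model.fit(X_train, y_train)
    fit_seconds = time.perf_counter() - start          # computational cost
    predictions = model.predict(X_test)
    rmse = float(np.sqrt(mean_squared_error(y_test, predictions)))  # accuracy
    results[name] = {"rmse": rmse, "fit_seconds": fit_seconds}

for name, r in results.items():
    print(f"{name}: RMSE={r['rmse']:.3f} kW, fit time={r['fit_seconds']:.4f} s")
```

Generalization ability could be probed in the same loop by repeating the split with different `random_state` values and looking at how much the RMSE varies.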

Regression is a technique for analyzing how a change in one or more variables affects another variable, and it is used in a variety of science and engineering disciplines. Simple linear regression explains the relationship between a single dependent variable and one independent variable. Multiple regression, on the other hand, involves a single dependent variable and two or more independent variables. A variable that influences another is called a predictor variable (generally denoted by X), and the variable that is influenced is called the response variable (generally denoted by Y). This research deals with multiple regression, since the datasets contain multiple predictor variables and a single response variable.
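
For reference, a linear multiple regression with a single response Y and p predictors is commonly written as below. The coefficient symbols β and the error term ε are standard notation introduced here for illustration; SVM and decision tree regressors fit more flexible functions of the same predictors.

```latex
Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon
```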

The method adopted in this research to determine the relationship between variables is to find the “best fit” regression equation for a dataset of predictor variables and a response variable. The best fit is the equation that lies closest to the data points overall, that is, the one with the smallest total squared vertical distance from them. Generalizing the whole dataset into a single equation leaves an error (residual) at each data point. These errors are squared so that positive and negative deviations do not cancel, then averaged, and the square root of the average is taken; the result is called the Root Mean Square Error (RMSE). The main objective of regression is to minimize the RMSE. The equation with the minimum RMSE is declared the regression equation for the dataset. Feeding future predictor values into this equation yields a prediction of the dependent variable.
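
As a small illustration of selecting the equation with the minimum RMSE, the sketch below scores a few candidate linear equations against a toy dataset with two predictors. The data points and candidate coefficients are invented for the example; in practice the coefficients are found by a fitting algorithm rather than chosen from a hand-written list.

```python
import math

# Toy dataset: each row is (x1, x2); y is the observed response.
X = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
y = [9.1, 7.9, 19.0, 18.1]


def predict(coeffs, row):
    """Evaluate y_hat = b0 + b1 * x1 + b2 * x2 for one row."""
    b0, b1, b2 = coeffs
    x1, x2 = row
    return b0 + b1 * x1 + b2 * x2


def rmse(coeffs, rows, targets):
    """Square each error, average the squares, then take the square root."""
    squared = [(t - predict(coeffs, r)) ** 2 for r, t in zip(rows, targets)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical candidate equations, given as (b0, b1, b2).
candidates = [(0.0, 2.0, 3.0), (1.0, 2.0, 2.0), (1.0, 2.0, 3.0)]

scores = {c: rmse(c, X, y) for c in candidates}
best = min(scores, key=scores.get)  # equation with the minimum RMSE

for c, s in scores.items():
    print(f"coefficients={c}: RMSE={s:.4f}")
print("selected equation:", best)

# Predicting the response for a new set of predictor values.
print("prediction for (5.0, 5.0):", predict(best, (5.0, 5.0)))
```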