Solar Radiation Prediction Using Deep Gaussian Process Algorithm

Introduction

Background

Solar radiation is the heat energy emitted by the sun and is crucial for life on Earth. However, not all of this energy reaches the Earth's surface. Approximately 50.1% of the solar energy reaching Earth is used to heat the Earth's surface, evaporate water, and support plant photosynthesis (Budiwati et al., 2014). A small portion, about 4%, is reflected back into the atmosphere, while 26% is scattered into the atmosphere by clouds and atmospheric particles, and 19% is absorbed by gases, particles, and clouds. The solar energy absorbed by the Earth's surface is converted into heat and re-emitted as longwave radiation. A small part of this longwave radiation escapes to outer space, while the rest is trapped by greenhouse gases in the atmosphere. The balance between incoming solar radiation and outgoing radiation is therefore central to the greenhouse effect, which influences Earth's surface temperature. The warming associated with this radiation balance is a major environmental challenge (Yanti et al., 2019).

Solar radiation plays a central role in maintaining the radiation balance on the Earth's surface, regulating the hydrological cycle, supporting plant photosynthesis, and influencing extreme weather and climate conditions (Budyko, 1969; Islam et al., 2009). Accurate solar radiation prediction is therefore crucial, both for the solar power industry and for climate research. This makes solar radiation prediction an important subject of study for this thesis (Huang et al., 2021).

Machine learning models are used to uncover relationships in pattern recognition and classification, especially where there is no clear mapping between input and output data, as in data mining and prediction tasks. Commonly used supervised learning methods include linear regression, nonlinear regression, Artificial Neural Networks (ANN), Support Vector Machines (SVM), and K-Nearest Neighbors (KNN). In addition, there are unsupervised learning methods and ensemble methods (Voyant et al., 2017).

In previous literature, single and hybrid prediction models have been developed to estimate hourly solar radiation. Initially, methods such as Multi-Layer Perceptron (MLP), Autoregressive Moving Average (ARMA), and persistence models were used. These models were subsequently combined using Bayesian rules, resulting in a 14% improvement in estimation (Voyant et al., 2014). Support Vector Regression (SVR) based on the Polynomial Basis Function (PBF) and the Radial Basis Function (RBF) was also used to estimate daily solar radiation. Evaluated with several statistical indicators on a 1460-day solar radiation dataset, the PBF-based SVR provided superior prediction performance (Mohammadi et al., 2015).

In 2017, a study titled "Prediction Of Solar Radiation Based On Machine Learning Methods" developed solar radiation prediction models using Linear Regression and Gaussian Process Regression. The results indicated that the Gaussian Process Regression method had a lower Mean Squared Error (MSE) than the Linear Regression method. For the best model and parameters, the Mean Absolute Error (MAE) was 0.016620, the MSE was 0.000514, and the Root Mean Squared Error (RMSE) was 0.022674 (Karasu et al., 2017).

In 2021, a study titled "Deep Neural Networks for Predicting Solar Radiation at Hail Region, Saudi Arabia" forecast one-day-ahead solar radiation in the Hail region, Saudi Arabia, using various Deep Neural Network (DNN) based models. The results showed that simple structures such as LSTM or Bi-LSTM were capable of providing good forecasting accuracy, with a correlation of 96% (Boubaker et al., 2021).

In 2020, a study titled "Evaluating the Potential of Gaussian Process Regression for Solar Radiation Forecasting: A Case Study" aimed to demonstrate the potential of Gaussian Process Regression in interpolating and predicting Global Horizontal Irradiance (GHI) data. The results showed that solar radiation data could be interpolated and predicted quite well using multi-input, single-output Gaussian Process Regression with a periodic × rational quadratic kernel on weather data taken at one-hour intervals (Lubbe et al., 2020).

This research predicts solar radiation using the Deep Gaussian Process method.

Problem Formulation

In efforts to solve prediction problems, artificial intelligence algorithms have become a primary choice due to their ability to identify complex patterns from data. For meteorological data, which is often characterized by non-linear dynamics and high uncertainty, an approach is needed that is not only accurate in modeling patterns but also capable of explicitly measuring prediction uncertainty.

Deep Gaussian Process (Deep GP) is an extension of Gaussian Process that integrates a multi-layered structure, enabling it to capture more complex non-linear relationships in data. This probabilistic approach allows the model to provide uncertainty estimates, which are important in analysis and decision-making.

Based on these considerations, this research aims to implement a Deep Gaussian Process model for predicting meteorological data, focusing on the model's ability to model stable data dynamics and effectively measure prediction uncertainty.

Research Objectives

The objective of this research is the exploration and implementation of the Deep Gaussian Process (DGP) method in predicting solar radiation with prediction ranges of 7 days, 14 days, 21 days, and 28 days ahead.

Scope of Research

To ensure this research is focused and beneficial, problem limitations are necessary. In this case, the problem limitations are:

The data used is from the research dataset "Development of software for estimating clear sky solar radiation in Indonesia," by Himsar Ambarita.
This research predicts solar radiation for timeframes of 7 days, 14 days, 21 days, and 28 days ahead.
The algorithm used is Deep Gaussian Process (DGP).

Research Benefits

Benefits that can be obtained from this research are as follows:

Testing the effectiveness of Deep Gaussian Process (Deep GP) in predicting meteorological data.
Providing insights into how Deep GP can handle uncertainty in meteorological data prediction.
Serving as a reference for future research in applying Deep GP for time series prediction.
Based on the research results, the Deep GP model can be utilized in various fields that require predictions under high uncertainty, such as meteorology and renewable energy, including solar power plants.

Research Methodology

The research stages undertaken are as follows:

Literature Study

Literature study was conducted by collecting references from previous research related to solar radiation prediction and the use of the Deep Gaussian Process (Deep GP) algorithm. Research references were taken from journals, articles, theses, and other relevant sources.

Problem Analysis

After conducting the literature study, problems that arose were identified, both related to the topic of solar radiation prediction and issues in the application of algorithms in previous research. The identification results formed the basis for formulating the problem to be solved through this research, particularly in modeling meteorological data with Deep GP.

System Design

Based on the literature study and problem analysis, a system design was developed that includes the general architecture of the prediction model. This system design contains the data processing flow, Deep GP modeling stages, and prediction result evaluation steps.

Data Preparation

This stage involves:

Data Collection: Collection of weather and solar radiation data taken from the dataset "Development of Software for Estimating Clear Sky Solar Radiation in Indonesia" (Ambarita, 2017). The data includes wind speed, air temperature, air humidity, dew point, and solar radiation for 5 years (2018–2022) in Medan City, Indonesia.
Data Integration: Combining data from daily tables and monthly documents into a single dataset with uniform format and column names.
Data Cleaning: Correcting or removing incomplete or invalid data, considering there were days when sensors did not operate optimally.
Feature Selection: Selecting features that have a significant relationship with the prediction target (solar radiation).
Data Transformation: Using StandardScaler from the scikit-learn library to standardize numerical features to zero mean and unit variance, so that each feature contributes on a comparable scale and model training converges faster.
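
The Data Transformation step can be illustrated with a minimal sketch. It reproduces in NumPy what scikit-learn's StandardScaler computes by default (per-feature mean and population standard deviation), and it assumes that the statistics are fitted on the training rows only and then reused for the test rows. The feature values and function names below are illustrative assumptions, not the data or code used in this research.

```python
import numpy as np

def fit_standardizer(X_train):
    """Return the per-feature mean and population standard deviation (ddof=0)."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std[std == 0.0] = 1.0  # guard: leave constant features unscaled
    return mean, std

def standardize(X, mean, std):
    """Scale features to zero mean and unit variance using fitted statistics."""
    return (X - mean) / std

# Hypothetical rows: [air temperature (deg C), relative humidity (%)]
X_train = np.array([[26.0, 80.0], [30.0, 70.0], [28.0, 90.0], [32.0, 60.0]])
X_test = np.array([[29.0, 75.0]])

mean, std = fit_standardizer(X_train)
X_train_scaled = standardize(X_train, mean, std)
X_test_scaled = standardize(X_test, mean, std)  # reuses training statistics
```

Reusing the training statistics for the test rows keeps information from the test period out of the fitted scaler.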

Implementation

The prediction model implementation was carried out using the Deep Gaussian Process (Deep GP) approach with the scikit-learn library. The steps performed include:

Model Development:
- Developing or adapting the Deep GP model by utilizing available modules in scikit-learn.
- Designing a layered structure in the model to capture complex non-linear relationships in meteorological data.
Model Training:
- Training the model using the processed dataset, with data transformation using StandardScaler.
- Optimizing model parameters to minimize prediction errors.
Model Evaluation:
- Measuring model performance using evaluation metrics such as Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and Coefficient of Determination (R²).
User Interface Development:
- Creating a Streamlit-based interface so users can easily perform training, testing, and prediction processes without having to run code directly.
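
The evaluation metrics named in the Model Evaluation step can be computed as in the following minimal sketch. The function name and the sample values are illustrative assumptions, and the sketch assumes that the observed values are not all identical (otherwise R² is undefined).

```python
import math

def regression_metrics(y_true, y_pred):
    """Compute MSE, RMSE, MAE, and R^2 for paired observations and predictions."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": sum(abs(e) for e in errors) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical solar radiation values in W/m^2
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
metrics = regression_metrics(observed, predicted)
```

Because RMSE is the square root of MSE, it is expressed in the same unit as the target variable, which makes it easier to interpret alongside MAE.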

Report Compilation

The final stage of the research is the compilation of a report documenting the entire research process, the methodology applied, the results, and the analysis obtained. This report is structured systematically and comprehensively as evaluation material and a reference for future research.

Writing Systematics

The writing structure in this research consists of five chapters, described as follows:

Chapter 1: Introduction
This chapter discusses the research background, problem formulation, objectives to be achieved, research limitations, expected benefits, methodology used, and writing systematics as a guide in reading this report.
Chapter 2: Theoretical Basis
This chapter presents theories that support the research, including concepts related to the methods used, the problems studied, and relevance to the type of data analyzed.
Chapter 3: Analysis and Design
This chapter explains the data analysis process and system design, which includes the prediction model architecture, data processing stages, and user interface design developed in this research.
Chapter 4: Implementation and Testing
This chapter describes the application of the model designed in the previous chapter and reviews the test results conducted on the system. Analysis of model performance is also presented in this section.
Chapter 5: Conclusion and Suggestions
The final chapter presents the conclusions from the research conducted and provides recommendations for further development in the future.

Theoretical Basis

Solar Radiation

Solar radiation consists of electromagnetic waves generated by nuclear fusion processes in the sun's core. On average, about 1,367 W/m² of solar energy reaches the top of the Earth's atmosphere in the form of shortwave radiation (wavelengths up to around 4.0 μm), although only a portion of this energy reaches the land and ocean surfaces. Solar radiation entering the Earth's atmosphere undergoes various processes, such as absorption and reflection by atmospheric particles, as well as absorption by the Earth's surface (Kafka & Miller, 2019). The total shortwave radiation reaching a horizontal plane at the Earth's surface is often referred to as global radiation or global horizontal radiation. This global radiation consists of two components: direct radiation and diffuse radiation (Sianturi, 2021).

Central Role of Solar Energy

Solar energy plays a central role as the primary energy source in various processes occurring on Earth. The contribution of solar energy helps facilitate diverse physical and biological processes on the planet. Radiation, in this context, refers to the process of heat energy propagation in the form of electromagnetic waves without an intermediary substance. Solar energy reaches the Earth's surface through radiation (emission) because there is a vacuum between the Earth and the Sun, meaning no intermediary medium is involved. Electromagnetic waves are a form of wave that propagates at very high speeds and does not require an intermediary medium, consisting of electric field and magnetic field components. In this propagation process, a large amount of energy emitted by the Sun reaches Earth, and this energy is then absorbed by our planet. This energy absorption, in turn, contributes to the increase in temperature on the Earth's surface (Pawitra Teguh Dharma Priatam et al., 2021).

Solar Radiation Prediction

Solar energy is naturally variable, so the power generated from solar power plants also fluctuates significantly, influenced by solar radiation intensity, ambient temperature, and other atmospheric conditions (Behera et al., 2018). Therefore, it is very important to predict solar radiation by utilizing easily measurable climate parameters, such as humidity, air temperature, wind speed, and cloud cover (Ağbulut et al., 2021).

Machine Learning

Machine Learning (ML) technology refers to machines that have the ability to learn independently without human assistance. Based on disciplines such as data mining, mathematics, and statistics, machines can learn by analyzing data without needing to be reprogrammed or instructed (Takdirillah, 2020).

The use of machine learning is one of the main components in the domain of artificial intelligence, which has now become a primary choice in facing various challenges and problems that require solutions (Roihan et al., 2019).

In this regard, machine learning (ML) can acquire current data on its own and learn from both current and previously acquired data to perform specific tasks. The tasks ML can perform vary greatly depending on what it learns. There are several techniques in machine learning, but broadly, ML has two basic learning techniques: supervised and unsupervised (Takdirillah, 2020).

Supervised Learning

Supervised Learning is one of the main paradigms in machine learning where algorithms are used to develop mathematical models from existing data. In Supervised Learning, these algorithms are guided by predefined examples, where the input and desired output are provided. Using existing training data, the algorithm compares its predictions with the desired results, and from this comparison, the model is updated to become more accurate. Two main tasks in Supervised Learning are classification, where the algorithm groups data into appropriate categories, and regression, where the algorithm predicts numerical values. Examples of supervised learning algorithms include k-Nearest Neighbors (k-NN), Naïve Bayes, decision trees, and various other algorithms. This approach is often used in various applications where historical data is used to predict future events (Sharma, 2020).

Unsupervised Learning

Unsupervised Learning is another paradigm in machine learning where algorithms are used to develop models that only include inputs without requiring predefined outputs. In this type of learning, unlabeled data is used, and tagged outputs are absent. Examples of algorithms used in this learning include association rules and K-means (Sharma, 2020).

Gaussian Process Regression

Gaussian Process (GP) is an important concept in statistics and machine learning with broad applications in modeling data distributions in continuous temporal or spatial domains. GP is a statistical distribution where each input observation is associated with a random variable that follows a multivariate Gaussian distribution (Rasmussen, 2004; Seeger, 2004). In theoretical terms, GP is described as the joint distribution of all input random variables, which can be mathematically represented by Equation (1):

f(θ) ~ GP(μ(θ), k(θ,θ*)) (1)

Where μ(θ) is the mean function and k(θ,θ*) is the covariance function between input variables. Equation (2) defines the covariance between observations, which is an essential element of GP (Rasmussen, 2004):

k(θ,θ*) = E[(f(θ) - μ(θ))(f(θ*) - μ(θ*))] (2)

GP is used to model data distributions on continuous domains. To apply it in a machine learning context, GP bases predictions on given training data. Equation (3) describes the joint distribution of observations in the training data, with μ = [μ(θ₁), ..., μ(θₙ)]ᵀ and K as the n × n covariance matrix. The elements of the covariance matrix K are defined by Equation (4):

[f(θ₁), ..., f(θₙ)]ᵀ ~ N(μ, K) (3)

Kᵢⱼ = k(θᵢ, θⱼ) (4)

The kernel function is a key component in GP analysis. Some classic kernel functions used in GP include (Damianou & Lawrence, 2013; Lawrence, n.d.):

(a) Constant: k_C(x,y) = C
(b) Linear: k_L(x,y) = xᵀy
(c) Gaussian noise: k_GN(x,y) = σ²δ_xy
(d) SE: k_SE(x,y) = a²exp( - ||x - y||² / (2λ²) ), i.e., squared exponential.
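
The kernels listed above and the covariance matrix of Equation (4) can be sketched as follows. The function names, default hyperparameters, and sample inputs are illustrative assumptions. The Gaussian noise kernel is simplified here by treating two inputs as the same point when their values are equal.

```python
import numpy as np

def k_constant(x, y, c=1.0):
    """Constant kernel: k_C(x, y) = C."""
    return c

def k_linear(x, y):
    """Linear kernel: k_L(x, y) = x^T y."""
    return float(np.dot(x, y))

def k_noise(x, y, sigma=1.0):
    """Gaussian noise kernel: sigma^2 if x and y are the same point, else 0."""
    return sigma ** 2 if np.array_equal(x, y) else 0.0

def k_se(x, y, a=1.0, lam=1.0):
    """Squared exponential kernel: a^2 * exp(-||x - y||^2 / (2 * lam^2))."""
    diff = np.asarray(x) - np.asarray(y)
    return a ** 2 * np.exp(-np.dot(diff, diff) / (2.0 * lam ** 2))

def gram_matrix(kernel, inputs):
    """Build the n x n matrix with entries K[i, j] = kernel(inputs[i], inputs[j])."""
    n = len(inputs)
    K = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            K[i, j] = kernel(inputs[i], inputs[j])
    return K

# Three hypothetical one-dimensional inputs
inputs = np.array([[0.0], [1.0], [2.0]])
K = gram_matrix(k_se, inputs)
```

With the squared exponential kernel, entries of K shrink as the distance between two inputs grows, which encodes the assumption that nearby inputs have similar function values.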

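In the context of prediction, GP allows us to predict the distribution of new data based on the distribution of known data. Equation (5) describes how GP predicts the distribution for new input observations O = [o₁, ..., oₘ]ᵀ using the joint distribution with the known data, where Φ denotes the training inputs, y the corresponding noisy observations, f* the function values at O, and δₙ² the noise variance: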

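\[
\begin{bmatrix} y \\ f_* \end{bmatrix}
\sim \mathcal{N}\left(
\begin{bmatrix} \mu \\ \mu_* \end{bmatrix},
\begin{bmatrix}
K(\Phi,\Phi) + \delta_n^2 I & K(O,\Phi)^{\top} \\
K(O,\Phi) & K(O,O)
\end{bmatrix}
\right) \tag{5}
\]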

Gaussian Process allows for the modeling of complex data distributions and can be used to describe temporal changes in random variables defined between prediction points (Chen et al., 2020).
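
Conditioning the joint distribution in Equation (5) on the observed y gives the standard GP predictive mean K(O,Φ)[K(Φ,Φ) + δₙ²I]⁻¹y and predictive covariance K(O,O) − K(O,Φ)[K(Φ,Φ) + δₙ²I]⁻¹K(O,Φ)ᵀ when the mean function is zero. The sketch below implements this under stated assumptions: a zero mean function, a squared exponential kernel, and synthetic sine-wave data. None of the names or values come from this research.

```python
import numpy as np

def se_kernel(A, B, a=1.0, lam=1.0):
    """Squared exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return a ** 2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * lam ** 2))

def gp_predict(Phi, y, O, noise_var=1e-2):
    """Predictive mean and covariance at O for a zero-mean GP (assumption)."""
    K = se_kernel(Phi, Phi) + noise_var * np.eye(len(Phi))  # K(Phi,Phi) + noise
    K_s = se_kernel(O, Phi)                                 # K(O,Phi)
    K_ss = se_kernel(O, O)                                  # K(O,O)
    L = np.linalg.cholesky(K)                               # stable alternative to inverting K
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    cov = K_ss - v.T @ v
    return mean, cov

# Synthetic training data (illustrative only)
Phi = np.linspace(0.0, 6.0, 20).reshape(-1, 1)
y = np.sin(Phi).ravel()
O = np.array([[1.5], [3.0], [10.0]])  # the last point lies far from the training data

mean, cov = gp_predict(Phi, y, O)
std = np.sqrt(np.clip(np.diag(cov), 0.0, None))  # predictive standard deviation
```

The predictive standard deviation is small near the training inputs and grows for the point far from them, which is the uncertainty estimate that motivates the use of GP-based models in this research.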

Deep Gaussian Process

Deep Gaussian Process (DGP) is an extension of Gaussian Process that has multiple layers of Gaussian Processes, similar to deep neural networks (Deep Learning).

DGP forms a hierarchical relationship with several layers of hidden functions:

h₁ ~ GP(μ₁, k₁)
h₂ | h₁ ~ GP(μ₂, k₂)
...
y | h_{L-1} ~ GP(μ_L, k_L)

where each layer has its own GP distribution. The output function from the GP in the previous layer becomes the input for the GP in the next layer.

The main differences compared to standard GP are:

DGP can capture more complex and non-linear relationships.
DGP allows latent features to be learned, as in deep learning.
DGP is more difficult to optimize because exact inference has no analytical solution, so approximate inference methods are needed.
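
The layered structure described above can be illustrated by drawing a sample from a DGP prior, where the output of each GP layer becomes the input of the next. This is only a sketch of the generative hierarchy and not an inference or training procedure. It assumes zero mean functions, squared exponential kernels in every layer, and one-dimensional hidden layers, and all names are hypothetical.

```python
import numpy as np

def se_kernel(A, B, a=1.0, lam=1.0):
    """Squared exponential kernel matrix between the rows of A and the rows of B."""
    sq = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return a ** 2 * np.exp(-np.maximum(sq, 0.0) / (2.0 * lam ** 2))

def sample_gp_layer(inputs, rng, jitter=1e-6):
    """Draw one function sample from a zero-mean GP evaluated at the given inputs."""
    K = se_kernel(inputs, inputs) + jitter * np.eye(len(inputs))  # jitter for stability
    L = np.linalg.cholesky(K)
    return (L @ rng.standard_normal(len(inputs))).reshape(-1, 1)

def sample_deep_gp_prior(x, num_layers, rng):
    """Compose GP layers: h1 = f1(x), h2 = f2(h1), ..., y = fL(h_{L-1})."""
    h = x
    for _ in range(num_layers):
        h = sample_gp_layer(h, rng)
    return h.ravel()

rng = np.random.default_rng(0)
x = np.linspace(-3.0, 3.0, 50).reshape(-1, 1)
y_sample = sample_deep_gp_prior(x, num_layers=2, rng=rng)
```

Because each layer warps the inputs of the next, samples from the composed prior can change smoothly in some regions and abruptly in others, which a single GP with one stationary kernel cannot express.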

Comparison with Other Methods

In the context of solar radiation prediction, various methods have been developed, ranging from traditional statistical models to advanced machine learning techniques. Statistical models like Autoregressive Integrated Moving Average (ARIMA) and physics-based models are often used as a baseline, but their ability to handle non-linear and complex data may be limited (Voyant et al., 2014). On the other hand, machine learning methods such as Support Vector Regression (SVR), Random Forest, and Artificial Neural Networks (ANN) have shown success in capturing intricate patterns in solar radiation data (Solano et al., 2022).

Deep Gaussian Process (DGP) offers a different approach by combining the flexibility of Gaussian Process models with the hierarchical feature representation capabilities of deep learning. Unlike conventional deep learning models, which are often black boxes, DGP provides uncertainty estimates in its predictions, which is crucial for applications requiring an understanding of risk and prediction reliability (Rasmussen & Williams, 2006). Although RNNs and LSTMs (Hochreiter & Schmidhuber, 1997; Lipton et al., 2015) are also effective for time series data, DGP handles uncertainty naturally and has a strong probabilistic interpretation. Research by Najibi et al. (2020) shows that Gaussian Process Regression (GPR), which is the foundation of DGP, can provide accurate short-term predictions for solar power. Because DGP can model more complex relationships through layers of GPs, it is expected to provide better, or at least complementary, performance relative to other methods under certain conditions.

Previous Research

Research related to solar radiation prediction has been extensively conducted. In 2019, Yanti and her team conducted a study titled "Solar Radiation Prediction Using Elman Recurrent Neural Network Method." This research highlighted the important role of solar radiation in various aspects of life, including energy, agriculture, and health. They used an artificial intelligence technique, Elman Recurrent Neural Network (ERNN), to predict solar radiation based on data such as sunshine duration, air temperature, rainfall, and air humidity from BMKG. The results showed high accuracy up to 96.33% with appropriate parameter settings, such as a learning rate of 0.1, 500 epochs, and a minimum error of 0.0001. This study can help in understanding the impact of solar radiation on humans and the environment (Yanti et al., 2019).

Huang and his team in 2021 conducted research titled "Solar Radiation Prediction Using Different Machine Learning Algorithms and Implications for Extreme Climate Events." In this study, they built 12 different machine learning models to predict and compare daily and monthly solar radiation values. Additionally, they developed a "stacking" model that uses the best of these algorithms to predict solar radiation. The results showed that meteorological factors such as sunshine duration, ground surface temperature, and visibility play a very important role in the machine learning models. Trend analysis between extreme ground surface temperatures and solar radiation amounts demonstrated the importance of solar radiation in complex extreme climate events. Several models, namely gradient boosting regression tree (GBRT), extreme gradient boosting (XGBoost), Gaussian process regression (GPR), and random forest, showed better prediction capability (lower error) for daily and monthly solar radiation. The "stacking" model built from GBRT, XGBoost, GPR, and random forest provided better results than single models in predicting daily solar radiation, but did not offer an advantage over the XGBoost model in predicting monthly solar radiation. The study therefore concluded that the "stacking" model and the XGBoost model are the best models for predicting solar radiation (Huang et al., 2021).

In 2023, Kim and his team conducted research titled "Solar Radiation Forecasting Based on the Hybrid CNN-CatBoost Model." In this study, solar radiation was predicted by utilizing extra-atmospheric solar radiation and three weather variables: air temperature, relative humidity, and total cloud volume. The research compared the performance of single models in machine learning and deep learning. In the single-model comparison, boosting techniques such as extreme gradient boosting and categorical boosting (CatBoost) represented machine learning, while the recurrent neural network (RNN) family, such as long short-term memory and gated recurrent units, represented deep learning. CatBoost (previously considered the best model) was then compared with a Convolutional Neural Network (CNN), and a hybrid CNN-CatBoost prediction method was introduced that combines CatBoost from machine learning with CNN from deep learning in order to achieve the best prediction performance. The research also tested prediction accuracy after adding wind speed and rainfall to the hybrid model. The results showed that the model considering wind speed and rainfall achieved higher accuracy in almost all locations, except for three of the 18 analyzed locations: Gangneung, Suwon, and Cheongju (Kim et al., 2023).

In the study titled "Prediction of daily global solar radiation and air temperature using six machine learning algorithms; a case of 27 European countries," conducted by Nematchoua and his team, daily global solar radiation was predicted using six different machine learning algorithms: Linear Model (LM), Decision Tree (DT), Support Vector Machine (SVM), Deep Learning (DL), Random Forest (RF), and Gradient Boosted Trees (GBT). This research covered 27 cities in 27 European countries with different solar radiation distribution patterns. The results showed that SVM was the best model among the six used, followed by DL, LM, GBT, RF, and DT. Furthermore, this study also forecasted air temperature and global solar radiation in these cities for the years 2050 and 2100 using the three latest scenarios from the Intergovernmental Panel on Climate Change (IPCC). The findings of this research provide important insights into understanding solar resources in these regions (Nematchoua et al., 2022).

In 2022, Solano and his team conducted research titled "Solar Radiation Forecasting Using Machine Learning and Ensemble Feature Selection." The study starts from the premise that accurate solar radiation forecasting is essential for safely operating power systems, especially with the significant increase in photovoltaic power plant usage. It compares the performance of various machine learning algorithms in solar radiation forecasting using endogenous and exogenous inputs. Additionally, the authors proposed an ensemble feature selection method that chooses not only the relevant input parameters but also the past observation values of these parameters.

The machine learning algorithms used in this research include Support Vector Regression (SVR), Extreme Gradient Boosting (XGBT), Categorical Boosting (CatBoost), and Voting-Average (VOA), which combines SVR, XGBT, and CatBoost. The proposed ensemble feature selection method uses Pearson coefficients, random forest, mutual information, and the relief method. To evaluate prediction accuracy, they used a real database from Salvador, Brazil, and considered various prediction time horizons, such as 1 hour, 2 hours, and 3 hours ahead.

Numerical results showed that the proposed ensemble feature selection approach could improve forecasting accuracy, and that the Voting-Average (VOA) model performed better than other algorithms across all prediction time horizons. This research provides important insights into developing reliable solar radiation forecasting using machine learning and ensemble feature selection methods (Solano et al., 2022).

Novi Yanti et al., in 2019, conducted research to predict solar radiation using the Elman Recurrent Neural Network (ERNN) method with data from the Meteorology, Climatology, and Geophysics Agency (BMKG) Pekanbaru. This study used historical data for five years (January 2013 – December 2017) which included sunshine duration, air temperature, rainfall, and air humidity as input variables. Test results showed that the ERNN method was able to provide the best accuracy of 96.531% in a 90% : 10% training and testing data split scenario with optimal parameters: learning rate 0.2, 500 epochs, and minimum error 0.0001. These findings indicate that ERNN can be used as a reliable prediction tool in modeling solar radiation (Novi Yanti et al., 2019).

Muhammad Rezza et al., in 2024, conducted research to predict solar energy using the Long Short-Term Memory (LSTM) method based on artificial neural networks. This study used data from West Kalimantan, collected over 57 days at 1-2 second intervals, resulting in 4,294,273 data points related to voltage, current, and solar panel output power. The LSTM model used consisted of two LSTM layers, each with 50 nodes, and was tested in two training scenarios with 1 and 10 epochs. Test results showed that the model with 10 epochs had the best prediction performance, with a Mean Square Error (MSE) of 0.04444, Root Mean Square Error (RMSE) of 0.00456, Mean Absolute Error (MAE) of 0.06753, and R-squared (R²) of 0.99961. These findings indicate that the LSTM method is highly accurate in predicting solar energy production and can be implemented for optimizing solar energy systems (Muhammad Rezza et al., 2024).

Huang et al., in 2021, conducted research to predict solar radiation using various machine learning algorithms, as well as analyzing its implications for extreme climate events. This study used 12 machine learning models, including Gradient Boosting Regression Tree (GBRT), Extreme Gradient Boosting (XGBoost), Gaussian Process Regression (GPR), and Random Forest, with daily meteorological datasets from Ganzhou, China (1980–2016). The research results showed that the XGBoost model and Stacking Model provided the best accuracy in daily solar radiation prediction, with an R² of 0.944 and RMSE of 1.131 MJ/m² for XGBoost. However, for monthly predictions, the Stacking model did not offer a significant advantage compared to XGBoost. This study emphasizes the importance of selecting meteorological variables in building accurate solar radiation prediction models (Huang et al., 2021).

Simanjuntak and Wibowo (2023) conducted research to predict solar radiation intensity in Jayapura City using Artificial Neural Networks (ANN) with the Backpropagation algorithm. The data used included air temperature, humidity, sunshine duration, and rainfall, obtained from the Dok II Jayapura Meteorological Station for the period 2009–2019. The ANN model was tested with several network architectures with varying numbers of neurons in 1 hidden layer to find the best combination based on Root Mean Square Error (RMSE) and correlation coefficient.

The research results showed that the best architecture had an RMSE of 1.970 kWh/m², with the average monthly solar radiation in Jayapura City reaching 4.5 kWh/m², indicating great potential for utilizing solar energy as an alternative energy source. This study confirms that the ANN method with backpropagation can be effectively used in estimating solar radiation and supports the development of renewable energy in Eastern Indonesia (Simanjuntak & Wibowo, 2023).

Research Differences

This research has several significant differences compared to previous studies in the field of solar radiation prediction. First, this research uses the Deep Gaussian Process (DGP) algorithm, which combines the strengths of Gaussian Process and Deep Neural Networks to capture complex non-linear relationships in meteorological data. The DGP algorithm allows the model to provide explicit uncertainty estimates, which are crucial in practical applications such as energy planning.

Unlike research using algorithms such as Elman Recurrent Neural Network (ERNN) (Yanti et al., 2019) or LSTM (Muhammad Rezza et al., 2024), DGP has a layered structure that allows the model to capture hierarchical features in the data. Additionally, this research uses the "Development of Software for Estimating Clear Sky Solar Radiation in Indonesia" dataset collected in Medan City, Indonesia, which has a tropical climate different from other regions focused on in previous research.

This research also systematically explores the use of different training data subsets for each prediction period of 7 days, 14 days, 21 days, and 28 days ahead. Our findings show that using smaller data subsets can sometimes lead to better performance, and this is a unique contribution of our research. Furthermore, this research pays special attention to uncertainty estimation in solar radiation prediction, allowing us to measure and visualize this uncertainty.

Thus, this research makes a significant contribution to the field of solar radiation prediction by using the DGP algorithm, relevant datasets, and innovative methodology.

System Analysis and Design

Data Used

This research utilizes numerical data sourced from the dataset titled "Development of Software for Estimating Clear Sky Solar Radiation in Indonesia," compiled by Himsar Ambarita in 2017. The data was collected using a HOBO Micro Station logger, equipped with temperature/humidity sensors, solar radiation sensors, and wind speed sensors. Data collection was carried out on the roof of the Mechanical Engineering S2 Building, University of Sumatera Utara.

The analyzed dataset consists of 60 CSV files, covering the data collection period from January 2018 to December 2022. Details of the data quantity are presented in Table 3.1 below.

Table 3.1 Details of Data Used

Year	Period	Number of Months
2018	January – December	12 months
2019	January – December	12 months
2020	January – December	12 months
2021	January – December	12 months
2022	January – December	12 months
Total		60 months

Each file contains data recorded by three types of sensors: wind speed sensor, solar radiation sensor, and air temperature/humidity sensor (Ambarita, 2017). The measurement ranges for each sensor are as follows:

Wind speed: 0 – 76 m/s
Solar radiation: 0 – 1280 W/m²
Air temperature (temperature/humidity sensor): -40°C to 75°C

However, in this study, only temperature, air humidity, and solar radiation data were used.

General System Architecture

The general architecture of this research is shown in Figure 3.1. The solar radiation prediction process is carried out through several main stages, starting from data processing to model evaluation. Each stage plays an important role in ensuring that the built model can produce accurate and reliable predictions.

The first stage is Data Collection. The data used in this research was collected in separate files, then combined into a single file to be more easily used in the model training and testing process. This data merging was done to ensure consistency and ease of access, so that analysis can be performed more efficiently.

After the data is collected, the next stage is Data Pre-processing. At this stage, the features and the target used for model training are selected. The data is also normalized so that all features share a comparable scale, which lets the model learn more effectively. Without normalization, features with larger numeric ranges would tend to receive greater weight.

Next, the processed data is divided into two parts through the Data Splitting stage. The data is divided into training data and testing data to ensure that the model can be tested with data it has never seen before. In this research, the division is based on varying time periods, namely 7 days, 14 days, 21 days, and 28 days. This division aims to evaluate how the model performs in various prediction scenarios with different time coverages.

The next stage is Model Training, where the Deep Gaussian Process (Deep GP) model is used to learn patterns in the data. Deep GP consists of several layers of Gaussian Process (GP), which allows the model to capture complex patterns that might be difficult for a single GP model to learn. With this layered structure, the model is expected to be more flexible in understanding non-linear relationships between variables in the data.

After the model is trained, the next stage is Data Testing. The trained model is tested using testing data to evaluate its performance in predicting solar radiation. This testing is important to see if the model can generalize well to new data and does not experience overfitting or underfitting.

The final stage is Model Evaluation. The prediction results are compared with actual data using several evaluation metrics, namely:

Mean Squared Error (MSE): Measures the average squared error between predictions and actual values.
Root Mean Squared Error (RMSE): The square root of MSE, providing an error measure in the same units as the original data.
Mean Absolute Error (MAE): Calculates the average absolute value of the difference between predictions and actual values.
Coefficient of Determination (R²): Measures the extent to which the model can explain the variability in the data.
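The four metrics listed above can be computed with scikit-learn's metric functions. The following is a minimal sketch; the helper name `evaluate` and the sample values are illustrative assumptions, not the code or data used in this research.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE and R² for one set of predictions."""
    mse = mean_squared_error(y_true, y_pred)
    return {
        "MSE": float(mse),
        "RMSE": float(np.sqrt(mse)),  # same units as the target (W/m²)
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        "R2": float(r2_score(y_true, y_pred)),
    }


# Illustrative values only (W/m²); not taken from the dataset.
y_true = np.array([200.0, 180.0, 220.0, 150.0])
y_pred = np.array([190.0, 185.0, 210.0, 165.0])
metrics = evaluate(y_true, y_pred)
print(metrics)
```

RMSE is derived from MSE here rather than computed separately, which keeps the two values consistent by construction.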

Data Collection

Data collection is the initial stage, in which weather and solar radiation data are gathered. The data was taken from the research dataset "Development of Software for Estimating Clear Sky Solar Radiation in Indonesia" by Himsar Ambarita. It includes wind speed, air temperature, air humidity, dew point, and solar radiation, captured during 2012 and then from 2017 to 2022 in Medan City, Indonesia, using a measuring station with wind sensors, a radiation sensor (pyranometer), and temperature and humidity sensors.

Data Pre-processing

Before data can be effectively utilized for modeling, a comprehensive series of pre-processing stages is performed to ensure data quality and relevance. These stages include Data Cleaning, Feature Extraction, Feature Selection, and Feature Scaling.

1. Data Cleaning

Data cleaning is a crucial step in eliminating inaccuracies and inconsistencies that could potentially disrupt the model training process. Initially, the Wind Speed and Wind Gust Speed columns were identified as containing entirely zero values, so they were removed from the dataset as they did not provide relevant information.

Figure: Solar radiation distribution with humidity

Figure: Solar radiation distribution with dew point

Figure: Solar radiation distribution with temperature

Figure: Distribution of solar radiation with humidity, dew point, temperature, and their correlation matrix.

Next, a correlation analysis was performed between the Solar Radiation variable and other variables, namely Temperature, Relative Humidity, and Dew Point. Visualization of the solar radiation data distribution against these variables along with their correlation matrix is shown in the figures above. The initial correlation matrix shows the linear relationships between variables before outlier handling.
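A Pearson correlation matrix of this kind can be obtained with pandas. The sketch below is illustrative only: the column names are assumptions, and the values are the first five daily rows of Table 3.2 (rounded), not the hourly data on which the figures are based.

```python
import pandas as pd

# Rounded rows from Table 3.2; column names are assumed for illustration.
df = pd.DataFrame({
    "temp": [29.12, 28.41, 27.80, 27.98, 28.40],
    "rh": [82.12, 84.97, 87.31, 85.60, 83.93],
    "solar_radiation": [229.19, 179.45, 171.23, 158.59, 179.82],
})

# Pairwise Pearson correlation between all numeric columns.
corr = df.corr(method="pearson")

# Correlation of each variable with the target, strongest positive first.
print(corr["solar_radiation"].sort_values(ascending=False))
```

With only five rows the coefficients are not meaningful in themselves; the point is the shape of the computation, which is the same on the full dataset.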

Figure: Solar radiation distribution with humidity after outlier removal

Figure: Solar radiation distribution with dew point after outlier removal

Figure: Solar radiation distribution with temperature after outlier removal

Figure: Correlation matrix after outlier removal

Figure: Distribution of solar radiation with humidity, dew point, temperature, and their correlation matrix after removing outliers.

After the initial identification, the outlier removal process was carried out to minimize the influence of extreme values that could distort analysis and modeling. The figures above present the data distribution and correlation matrix again after the outlier removal process.

Although Dew Point shows a fairly strong correlation with Solar Radiation (0.66 after outlier removal), this feature was not carried forward to the feature extraction and selection stages. A likely reason is data quality: according to reports from other studies using similar datasets (Ambarita, 2017), the Dew Point feature has a significant percentage of missing data (16.1%).

Figure: Average solar radiation per year after data cleaning

Figure: Average solar radiation per month after data cleaning

Figure: Average solar radiation per year and month after data cleaning.

Figure: Average temperature per year after data cleaning

Figure: Average temperature per month after data cleaning

Figure: Average temperature per year and month after data cleaning.

Figure: Average relative humidity per year after data cleaning

Figure: Average relative humidity per month after data cleaning

Figure: Average relative humidity per year and month after data cleaning.

As a final step in data cleaning, temporal data aggregation was performed. Data originally with hourly frequency was converted to daily frequency data by calculating the average value. This aggregation aims to simplify the data and make it more suitable for long-term trend analysis or modeling with daily or longer prediction targets. A summary of solar radiation, temperature, and relative humidity data after the cleaning and aggregation process per month and per year can be seen in Figures 3.4, 3.5, and 3.6 respectively (represented by the image sets above).
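The hourly-to-daily aggregation described above can be sketched with pandas `resample`. The data here is synthetic and the column names are assumptions; only the averaging step mirrors the text.

```python
import numpy as np
import pandas as pd

# Synthetic hourly records for two days (48 rows); not real measurements.
index = pd.date_range("2018-01-01", periods=48, freq=pd.Timedelta(hours=1))
one_day_radiation = np.concatenate([np.zeros(6), np.full(12, 400.0), np.zeros(6)])
hourly = pd.DataFrame(
    {
        "temp": np.linspace(26.0, 30.0, 48),
        "rh": np.linspace(90.0, 80.0, 48),
        "solar_radiation": np.tile(one_day_radiation, 2),
    },
    index=index,
)

# Collapse each calendar day into a single row holding the mean of every column.
daily = hourly.resample("D").mean()
print(daily)
```

Because night-time hours are included in the mean, the daily solar radiation value is well below the daytime peak, which is consistent with the magnitudes shown in Table 3.2.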

Figure: Solar radiation distribution with humidity after aggregation

Figure: Solar radiation distribution with temperature after aggregation

Figure: Correlation matrix after aggregation

Figure: Distribution of solar radiation with humidity, and temperature along with their correlation matrix after temporal data aggregation.

2. Feature Extraction

Feature extraction was performed to extract additional information from the 'date' column. In the initial dataset, date information was only available in a single 'date' column, so it needed to be split into three new features: 'year', 'month', and 'day'.

This year, month, and day information can help the model understand seasonal and daily patterns in solar radiation data. For example, solar radiation tends to be higher in dry season months and lower in rainy season months.

Figure 3.8 Pseudocode for Feature Extraction process

Table 3.2 Data before feature extraction

date	temp	rh	solar_radiation
2018-01-01	29.1158618705036	82.12014388489209	229.18805755395684
2018-01-02	28.409761768901568	84.97289586305278	179.4469329529244
2018-01-03	27.802230534351143	87.31068702290077	171.2303816793893
2018-01-04	27.978782978723405	85.59858156028369	158.58709219858156
2018-01-05	28.3998547008547	83.92507122507122	179.82094017094016

Table 3.3 Data after feature extraction

date	year	month	day	temp	rh	solar_radiation
2018-01-01	2018	1	1	29.1158618705036	82.12014388489209	229.18805755395684
2018-01-02	2018	1	2	28.409761768901568	84.97289586305278	179.4469329529244
2018-01-03	2018	1	3	27.802230534351143	87.31068702290077	171.2303816793893
2018-01-04	2018	1	4	27.978782978723405	85.59858156028369	158.58709219858156
2018-01-05	2018	1	5	28.3998547008547	83.92507122507122	179.82094017094016

After feature extraction, the data has additional features that can help the model understand the temporal patterns of the data.
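As a minimal sketch of this step, the transformation from Table 3.2 to Table 3.3 can be written with the pandas `.dt` accessor. The rows below are the first three rows of Table 3.2, rounded; the code is an illustration and not necessarily the implementation shown in Figure 3.8.

```python
import pandas as pd

df = pd.DataFrame({
    "date": ["2018-01-01", "2018-01-02", "2018-01-03"],
    "temp": [29.1159, 28.4098, 27.8022],
    "rh": [82.1201, 84.9729, 87.3107],
    "solar_radiation": [229.1881, 179.4469, 171.2304],
})

# Parse the text dates, then derive the three calendar features.
df["date"] = pd.to_datetime(df["date"])
df["year"] = df["date"].dt.year
df["month"] = df["date"].dt.month
df["day"] = df["date"].dt.day

# Reorder the columns to match the layout of Table 3.3.
df = df[["date", "year", "month", "day", "temp", "rh", "solar_radiation"]]
print(df)
```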

3. Feature Selection

At this stage, features to be used as input (X) and target (y) were selected.

Features (X): year, month, day, temp, rh
Target (y): solar_radiation

These features were chosen because they are considered to have a significant relationship with solar radiation based on literature studies and initial data analysis. Pearson correlation analysis was performed to measure the strength and direction of the linear relationship between each feature and the target. The analysis results showed that temperature (temp) and relative humidity (rh) have a significant correlation with solar radiation. In addition, temporal features (year, month, day) were also selected to capture seasonal and daily patterns.

Figure 3.9 Pseudocode for Feature Selection Process

Table 3.4 Feature data (X)

year	month	day	temp	rh
2018	1	1	29.1158618705036	82.12014388489209
2018	1	2	28.409761768901568	84.97289586305278
2018	1	3	27.802230534351143	87.31068702290077
2018	1	4	27.978782978723405	85.59858156028369
2018	1	5	28.3998547008547	83.92507122507122

Table 3.5 Target data (y)

solar_radiation
229.18805755395684
179.4469329529244
171.2303816793893
158.58709219858156
179.82094017094016

This feature selection aims to filter variables considered relevant in the model learning process.
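In pandas, the selection of X and y reduces to column indexing. The sketch below uses the column names from Tables 3.4 and 3.5 with rounded sample rows; the constant names `FEATURES` and `TARGET` are assumptions made for illustration.

```python
import pandas as pd

FEATURES = ["year", "month", "day", "temp", "rh"]
TARGET = "solar_radiation"

# Rounded sample rows in the layout of Table 3.3 (date column omitted).
df = pd.DataFrame({
    "year": [2018, 2018, 2018],
    "month": [1, 1, 1],
    "day": [1, 2, 3],
    "temp": [29.1159, 28.4098, 27.8022],
    "rh": [82.1201, 84.9729, 87.3107],
    "solar_radiation": [229.1881, 179.4469, 171.2304],
})

X = df[FEATURES]  # model inputs, shape (n_samples, 5)
y = df[TARGET]    # prediction target, shape (n_samples,)
print(X.shape, y.shape)
```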

4. Feature Scaling

To ensure data has a uniform scale, normalization was performed using StandardScaler. StandardScaler works by transforming data into a distribution with a mean (μ = 0) and standard deviation (σ = 1). This process is important so that each feature contributes evenly in the model training process and to avoid numerical issues.

Normalization formula:

X' = (X - μ) / σ

Where:

X' is the value after normalization
X is the original value
μ is the mean of that feature
σ is the standard deviation of that feature

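StandardScaler was chosen because it does not restrict data to a fixed range as MinMaxScaler does. Because it relies on the mean and standard deviation, it is still sensitive to extreme values, so it benefits from the outlier removal performed during data cleaning.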

Figure 3.10 Pseudocode for Feature Scaling Process

Table 3.6 Data before normalization

year	month	day	temp	rh
2018	1	1	29.1158618705036	82.12014388489209
2018	1	2	28.409761768901568	84.97289586305278
2018	1	3	27.802230534351143	87.31068702290077
2018	1	4	27.978782978723405	85.59858156028369
2018	1	5	28.3998547008547	83.92507122507122

Table 3.7 Data after normalization

month_normalized	day_normalized	temp_normalized	rh_normalized
-1.12	-1.22	1.03	-0.98
-1.12	-1.00	0.67	-0.45
-1.12	-0.78	0.31	0.12
-1.12	-0.56	0.42	-0.22
-1.12	-0.34	0.66	-0.67
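The normalization formula can be checked against scikit-learn's `StandardScaler` directly. This sketch uses only the temp and rh columns of Table 3.6 (rounded), so its output will not match Table 3.7, which is presumably scaled with statistics from the full dataset.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: temp, rh (rounded values from Table 3.6).
X = np.array([
    [29.12, 82.12],
    [28.41, 84.97],
    [27.80, 87.31],
    [27.98, 85.60],
    [28.40, 83.93],
])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Same result by hand: X' = (X - mean) / std, using the population std (ddof=0).
X_manual = (X - X.mean(axis=0)) / X.std(axis=0)

print(X_scaled.round(2))
print(np.allclose(X_scaled, X_manual))
```

A common practice, assumed here rather than stated in the source, is to fit the scaler on the training data only and reuse it with `transform` on the test data, so that no test statistics leak into training.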

Data Splitting

The data is divided into two sets: training data and testing data. Training data is used to fit the model, while testing data is used to evaluate the model's performance after training.

The following tables show data for the Data Splitting process based on prediction periods of 7, 14, 21, and 28 days.

Table 3.8 Last 7 Days Data

Date	Year	Month	Day	Temperature (°C)	Humidity (%)	Solar Radiation (W/m²)
2022-12-04	2022	12	4	30.90	75.39	244.76
2022-12-05	2022	12	5	31.41	74.89	250.71
2022-12-06	2022	12	6	28.78	82.44	185.29
2022-12-07	2022	12	7	28.89	85.31	132.95
2022-12-08	2022	12	8	32.44	73.35	268.86
2022-12-09	2022	12	9	31.06	72.19	254.18
2022-12-10	2022	12	10	29.04	81.41	175.97

Table 3.9 Last 14 Days Data (Excerpt)

Date	Year	Month	Day	Temperature (°C)	Humidity (%)	Solar Radiation (W/m²)
2022-11-27	2022	11	27	30.18	76.39	195.49
...	...	...	...	...	...	...
2022-12-10	2022	12	10	29.04	81.41	175.97

Similar tables (Table 3.10 for 21 days, Table 3.11 for 28 days) detail the data used for those respective prediction horizons.
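Judging from Tables 3.8 and 3.9, the test set for each horizon is the last N days of the series. A minimal sketch of such a chronological split follows; the function name `split_last_days` and the synthetic frame are assumptions for illustration.

```python
import pandas as pd


def split_last_days(df, horizon_days):
    """Hold out the final `horizon_days` rows (by date) as the test set."""
    ordered = df.sort_values("date").reset_index(drop=True)
    return ordered.iloc[:-horizon_days], ordered.iloc[-horizon_days:]


# Synthetic daily frame ending on 2022-12-10, the last date in Table 3.8.
dates = pd.date_range("2022-11-01", "2022-12-10", freq="D")
df = pd.DataFrame({"date": dates, "solar_radiation": range(len(dates))})

train_7, test_7 = split_last_days(df, 7)
train_14, test_14 = split_last_days(df, 14)
print(test_7["date"].iloc[0].date(), test_14["date"].iloc[0].date())
```

The split is not shuffled: every training row precedes every test row, which avoids evaluating the model on days that lie inside the training period.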

Model Training

In this research, a Deep Gaussian Process (DGP) algorithm with two layers was implemented using the scikit-learn library in Python. The first layer uses a Radial Basis Function (RBF) kernel combined with a WhiteKernel to handle noise in the data. This kernel was chosen for its ability to model non-linear relationships and capture local variations in the data (Rasmussen & Williams, 2006). The second layer also uses a combination of RBF and WhiteKernel.

The model was trained on a subset of the training data, whose size (subset_percent) is set by the user, for computational efficiency. The subset was selected randomly, with a seed that is likewise derived from a user-specified percentage of the training data size. Kernel hyperparameters in each layer were optimized by the `fit` method of `GaussianProcessRegressor`, with `n_restarts_optimizer` set to 10 so that the optimizer is restarted from 10 additional random initial values.
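Random subset selection with a fixed seed can be sketched as follows. The function `sample_subset` is hypothetical, and the seed is passed in directly; how the source derives the seed from a percentage is not reproduced here.

```python
import numpy as np


def sample_subset(X, y, subset_percent, seed):
    """Draw a reproducible random subset of the training rows without replacement."""
    n = len(X)
    size = max(1, int(n * subset_percent / 100))
    rng = np.random.default_rng(seed)
    idx = rng.choice(n, size=size, replace=False)
    return X[idx], y[idx]


# Synthetic training arrays: 100 rows, 2 features.
X = np.arange(200, dtype=float).reshape(100, 2)
y = np.arange(100, dtype=float)

X_sub, y_sub = sample_subset(X, y, subset_percent=50, seed=42)
X_again, y_again = sample_subset(X, y, subset_percent=50, seed=42)
print(X_sub.shape, np.array_equal(X_sub, X_again))
```

Sampling without replacement keeps each selected row unique, and reusing the same seed returns the same rows, which is what makes an experiment repeatable.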

The input for the first layer consists of normalized features: air temperature (temp), relative humidity (rh), year, month, and day. The output from the first layer then becomes the input for the second layer, which predicts the normalized solar radiation value.
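One way to read this description is as two `GaussianProcessRegressor` models fitted in sequence, with the second taking the first model's predictive mean as its input. The sketch below follows that reading on synthetic data. It is an assumption-laden illustration: the target used to fit the first layer is not stated in the text (the final target is assumed here), and this greedy stacking is simpler than a jointly trained, variational Deep GP.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel


def make_gp(seed):
    # RBF models smooth non-linear structure; WhiteKernel absorbs observation noise.
    kernel = RBF(length_scale=1.0) + WhiteKernel(noise_level=1.0)
    return GaussianProcessRegressor(
        kernel=kernel, n_restarts_optimizer=10, random_state=seed
    )


# Synthetic stand-in for the five normalized features and the normalized target.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(40, 5))
y_train = np.sin(X_train[:, 0]) + 0.5 * X_train[:, 1] + 0.1 * rng.normal(size=40)
X_test = rng.normal(size=(7, 5))

# Layer 1: features -> intermediate representation (assumed: fitted on the target).
layer1 = make_gp(seed=0).fit(X_train, y_train)
h_train = layer1.predict(X_train).reshape(-1, 1)

# Layer 2: layer-1 output -> normalized solar radiation.
layer2 = make_gp(seed=1).fit(h_train, y_train)

h_test = layer1.predict(X_test).reshape(-1, 1)
mean, std = layer2.predict(h_test, return_std=True)

# Approximate 95% interval from the second layer's predictive standard deviation.
lower, upper = mean - 1.96 * std, mean + 1.96 * std
print(np.column_stack([lower, mean, upper]).round(3))
```

Note that the interval reflects only the second layer's uncertainty; the uncertainty of the first layer is not propagated in this simplified arrangement.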

Figure 3.12 Pseudocode for Training Deep Gaussian Process

Evaluation Metrics

The performance of the prediction model will be evaluated using several common metrics to measure the accuracy of time series predictions, namely Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and R-squared (R²). MAE measures the average magnitude of prediction errors, MSE measures the average of the squares of the errors, RMSE is the square root of MSE (giving more weight to large errors), and R² measures the proportion of variance in the dependent variable that is predictable from the independent variables. These metrics were chosen to provide a comprehensive overview of the model's performance in terms of error magnitude and how well the model can explain variations in solar radiation data.

Figure 3.13 Pseudocode for Model Testing Process

System Interface Design

Page Layout Design

The system interface was created using Streamlit with the following features:

Configuration Sidebar:
- Input path for CSV file.
- Choice of prediction time frame.
- Percentage of training data subset used.
- Seed percentage for experiment reproducibility.
Main Page:
- Data Exploration Visualization: Scatter plots, bar charts, and correlation heatmaps.
- Prediction Results: Model evaluation table and graph comparing predictions with observations.
- Confidence Interval (Uncertainty Estimation) from the DGP model.

Main Page Layout with Data Exploration — Figure 3.14 Main Page Layout Design with Data Exploration Display

Main Page Layout with Prediction Plot — Figure 3.15 Main Page Layout Design with Prediction Results as a Plot

Implementation and Testing

System Implementation

In the system implementation phase, solar radiation was predicted from meteorological data using the Deep Gaussian Process (Deep GP) algorithm, implemented in Python. The system is a web-based application whose user interface is built with Streamlit.

Hardware and Software

The development of the solar radiation prediction system utilized the following hardware and software:

CPU: Intel i5-8250U (8) @ 3.40GHz
RAM: 20 GB DDR4
SSD Capacity: 1 TB
Operating System: Ubuntu 24.04.2 LTS x86_64

The system implementation uses Python 3.12.3 and several supporting libraries:

Streamlit v.1.42.2: Used to build an interactive web-based user interface (UI), allowing users to interact with the model without writing code.
Pandas v.2.2.3: Crucial for data manipulation, such as reading CSV files, cleaning data, and performing data transformations.
Numpy v.1.26.4: Used for efficient numerical operations, especially multidimensional arrays that form the basis for many machine learning algorithms.
Matplotlib v.3.10.0 & Seaborn v.0.13.2: Both used for data visualization, aiding in initial data exploration and presentation of prediction results.
Scikit-learn v.1.6.0: The main library for implementing machine learning algorithms, including Gaussian Process Regression, which is the foundation of the DGP model.

Interface Design Implementation

After designing the interface in Chapter 3, the implementation of this design was created using Streamlit. The system has one main page consisting of several sections for data exploration, model training, model testing, and result prediction.

Streamlit Interface: The program code integrates an interactive interface that allows users to input configurations, select prediction time frames, and view analysis results in real-time.
Data Visualization: Functions are available to display data exploration plots and prediction result plots, so users can understand model performance through graphs.
Users can select configuration parameters via the sidebar.
After starting training, the system will display data exploration results and model predictions.
Visualization of prediction results is displayed in graphical form and evaluation tables.

1. Main Page View

Figure 4.1 Main Page View with Configuration Sidebar

The main page view is the first screen that appears when the system is run. This page contains the research title and author data displayed at the top of the screen. In the Configuration Sidebar, users can set parameters before running the model, such as CSV file path, prediction time frame, percentage of training data subset, and seed for the random number generator, which aims to ensure that experimental results can be reproduced if run again with the same parameters. If the user checks the "Show Data Exploration" option, the system will display scatter plots, bar charts, and correlation heatmaps.

2. Training, Testing, and Model Prediction Process View

Figure 4.2 Main Page View with Data Exploration

This section presents various data visualizations that help users understand dataset characteristics before the model is trained. Visualizations displayed include scatter plots to see relationships between variables, bar charts for category or frequency comparisons, and correlation heatmaps to show the strength and direction of linear relationships between features in the data.

Figure 4.3 Training, Testing, and Model Prediction Process View

This section illustrates the system workflow after the user presses the "Start Training" button on the interface. Once the training parameters are set, the system sequentially trains the DGP model, tests it on the test data, and generates solar radiation predictions. The view shows the status of these stages and the initial output from the model.

This section displays quantitative results of the performance evaluation of the trained and tested model. Common evaluation metrics in regression tasks, such as Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Coefficient of Determination (R²), and actual vs. predicted results are presented in a table. This view allows users to objectively assess how well the model performs predictions.

Figure 4.5 Overall Evaluation Metric View

This section presents a summary or comparison of evaluation metrics overall.

Model Implementation

In finding the configuration that yields the best performance during the Deep Gaussian Process model training, a series of experiments were conducted by varying the percentage of training data used (Subset %) for each prediction time frame. Other parameters, such as the Gaussian Process kernel, were predetermined. The initial phase of model implementation began with experiments on a 7-day prediction time frame. In this experiment, all available training data (100%) was used as a starting point. Detailed results of model implementation with various percentages of training data for a 7-day time frame can be seen in Table 4.1.

Table 4.1 Model Implementation with Specific Training Data Percentages (7-Day Prediction Horizon)

Subset %	MAE	MSE	RMSE	R²
100	20.3740676189	490.3277650887	22.1433458422	0.7817402278
90	20.7441293977	516.499499064	22.7266253338	0.7700903945
80	17.2883621789	356.0250097317	18.868625009	0.841522461
70	18.2687860141	403.5015923139	20.0873490614	0.8203891929
60	19.2658155999	442.3380064583	21.0318331692	0.8031019261
50	15.7973788756	309.1258878421	17.5819762212	0.8623986837
40	18.5722098249	417.0784585244	20.4224988315	0.8143457176
30	22.5119940028	560.7525028077	23.6802133185	0.7503920393
20	20.8719164255	490.2806455994	22.1422818517	0.7817612021
10	20.9267773856	523.3341099672	22.8764968902	0.7670481018

Table 4.1 shows that, for the 7-day prediction horizon, the effect of the amount of training data is not monotonic. For example, the model trained with 50% of the training data (experiment 6) produced lower MAE and RMSE values than the model trained with 100% of the data (experiment 1). This indicates that more training data does not always guarantee better results, and that selecting the right data subset can yield better performance. Among the 7-day experiments, experiment 6 achieved the lowest MAE (15.797), the lowest RMSE (17.582), and the highest R² (0.862).

Next, similar experiments were conducted for different prediction horizons, namely 14, 21, and 28 days. The results of model implementation with various percentages of training data for a 14-day prediction horizon can be seen in Table 4.2.

Table 4.2 Model Implementation with Specific Training Data Percentages (14-Day Prediction Horizon)

Subset %	MAE	MSE	RMSE	R²
100	21.7682322406	608.3807347818	24.6653752208	0.7590739365
90	21.8740741953	615.0764504158	24.8007348765	0.7564223529
80	21.9442939114	618.781133712	24.875311731	0.7549552539
70	20.603306228	548.3188743537	23.4162096496	0.7828591532
60	21.1337376044	574.9566894547	23.9782545123	0.7723102591
50	21.5085441815	587.8229840301	24.2450610234	0.7672150523
40	19.459548559	503.2258266614	22.4326954836	0.8007165406
30	18.5356369481	496.0601889337	22.2724086918	0.803554219
20	19.5146666943	504.7279774639	22.466151817	0.8001216709
10	13.2591272953	366.9922594081	19.1570420318	0.8546666662

For the 14-day prediction horizon, it is again observed that model performance does not always improve with increased training data. Experiment 10, with only 10% training data, actually produced the lowest MAE and RMSE values (13.259 and 19.157, respectively) and the highest R² value of 0.855.

Next, the model implementation results for a 21-day prediction horizon can be seen in Table 4.3.

Table 4.3 Model Implementation with Specific Training Data Percentages (21-Day Prediction Horizon)

Subset %	MAE	MSE	RMSE	R²
100	18.149703494	449.886055544	21.210517569	0.8079876293
90	17.6883748537	428.469055315	20.6994940836	0.8171284527
80	18.6160984343	472.5142084116	21.7373919413	0.7983298833
70	17.4233796538	420.6943985702	20.5108361256	0.8204466935
60	17.5434286778	427.2764388299	20.6706661438	0.8176374641
50	17.7426378353	421.2788893726	20.5250795217	0.8201972316
40	18.4307034773	485.6431547323	22.0373127838	0.7927264197
30	18.5148058531	464.1947050179	21.5451782313	0.8018806659
20	17.1712843655	435.1352499431	20.8598957318	0.814283306
10	18.1221865366	466.7914779	21.6053576203	0.8007723574

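For the 21-day prediction horizon, experiment 9 with 20% training data yielded the lowest MAE (17.171), with an RMSE of 20.860 and an R² of 0.814. Experiment 4 with 70% training data yielded the lowest RMSE (20.511) and the highest R² (0.820), so the preferred configuration for this horizon depends on the metric.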

Finally, the model implementation results for a 28-day prediction horizon can be seen in Table 4.4.

Table 4.4 Model Implementation with Specific Training Data Percentages (28-Day Prediction Horizon)

Subset %	MAE	MSE	RMSE	R²
100	18.7476149739	464.0980294914	21.5429345608	0.8184982475
90	17.8997474617	422.1469733015	20.5462155469	0.834904674
80	18.0183887276	423.6813126562	20.5835204146	0.834304617
70	19.8156205085	520.3167812733	22.8104533334	0.79651194
60	18.3425949489	439.2484928265	20.9582559586	0.8282165272
50	17.8109691867	416.5011520704	20.4083598574	0.8371126697
40	19.9786003267	518.9554426149	22.7805935527	0.797044339
30	19.091775831	465.9119797594	21.5849943192	0.8177888388
20	19.7791303417	514.5990557034	22.6847758575	0.7987480564
10	17.0992747436	386.1201795866	19.6499409563	0.8489942106

For the 28-day prediction horizon, experiment 10 with only 10% training data again showed very good performance with the lowest MAE of 17.099, the lowest RMSE of 19.650, and the highest R² value of 0.849.

After conducting a series of experiments with various percentages of training data for each prediction horizon, it can be observed that no single percentage of training data consistently provides the best results for all prediction horizons. Model performance is highly dependent on the desired prediction horizon.

Based on these experimental results, some interesting configurations for different prediction horizons are:

7-Day Prediction Horizon: 50% data subset
14-Day Prediction Horizon: 10% data subset
21-Day Prediction Horizon: 20% data subset (lowest MAE; the 70% subset has the lowest RMSE and highest R²)
28-Day Prediction Horizon: 10% data subset

These configurations show that using smaller data subsets can sometimes lead to better performance compared to using all training data, possibly due to noise or less relevant patterns in the overall dataset for a specific prediction horizon.
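Choosing a configuration from a results table amounts to finding the row that minimizes an error metric or maximizes R². The sketch below does this for the 21-day horizon using values from Table 4.3 rounded to three or four decimals; the DataFrame layout and column names are assumptions.

```python
import pandas as pd

# Rounded values from Table 4.3 (21-day prediction horizon).
results = pd.DataFrame({
    "subset_percent": [100, 90, 80, 70, 60, 50, 40, 30, 20, 10],
    "MAE": [18.150, 17.688, 18.616, 17.423, 17.543,
            17.743, 18.431, 18.515, 17.171, 18.122],
    "RMSE": [21.211, 20.699, 21.737, 20.511, 20.671,
             20.525, 22.037, 21.545, 20.860, 21.605],
    "R2": [0.8080, 0.8171, 0.7983, 0.8204, 0.8176,
           0.8202, 0.7927, 0.8019, 0.8143, 0.8008],
}).set_index("subset_percent")

# Best subset percentage under each criterion.
best = {
    "MAE": results["MAE"].idxmin(),
    "RMSE": results["RMSE"].idxmin(),
    "R2": results["R2"].idxmax(),
}
print(best)
```

For this horizon the criteria do not agree on a single subset size, so the metric used for selection should be stated alongside the chosen configuration.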

Evaluation and Test Results

Evaluating prediction performance is an essential stage in validating the Deep Gaussian Process (DGP) model for solar radiation forecasting. In this study, model performance was quantitatively measured using a standard set of evaluation metrics, including Mean Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE), and the coefficient of determination (R²). The selection of these metrics is based on their ability to provide a comprehensive analysis of various aspects of prediction error. MAE represents the average magnitude of prediction errors, while MSE and RMSE provide higher sensitivity to larger deviations. The coefficient of determination (R²) indicates the proportion of variance in the observed data that can be explained by the model, thus reflecting the model's goodness-of-fit.

Evaluation and Test Results for 7-Day Prediction Horizon

Based on Table 4.1, the model configuration trained with a 50% proportion of training data showed optimal performance for solar radiation forecasting with a 7-day prediction horizon. This optimality criterion is based on relatively low MAE and RMSE values and a high R² value.

Figure: Comparison of Model Prediction Results with Observation Data (50% Training Data, 7-Day Prediction Horizon)

The figure above compares the solar radiation predicted by the Deep Gaussian Process model with the actual observed values for a 7-day prediction horizon. It displays the results from the model trained using 50% of the total training data, the configuration identified as providing optimal performance for the 7-day horizon based on evaluation metric analysis (Table 4.1). The graph contains two data series: one representing the actual observation values (observation curve) and another representing the values predicted by the model (prediction curve). Visual analysis shows how closely the prediction curve follows the observation curve. The correspondence between the two curves indicates the model's ability to capture patterns and trends in solar radiation data for the 7-day prediction period, while deviations between them highlight prediction errors. For example, on day 4 the model overestimates, which appears as a point on the prediction curve above the observation curve on that day.

Figure: Posterior Samples Distribution of Model Parameters (50% Training Data, 7-Day Prediction Horizon)

In the context of Bayesian modeling like Gaussian Process, model parameters are not represented as single values but as probability distributions (posterior distributions) that reflect uncertainty regarding those parameter values after observing the training data. Figure 4.7 displays histograms or density plots for each key parameter in the DGP model (e.g., kernel parameters like characteristic length, amplitude, or noise). The shape, central location, and width of these distributions provide important information about the model parameters. For instance, the peak of the distribution indicates the most likely parameter value (mode or mean of the posterior distribution), while the width of the distribution reflects uncertainty: a narrower distribution indicates higher certainty about the parameter value, whereas a wider distribution indicates greater uncertainty.

Figure: Histogram of Model Residuals (50% Training Data, 7-Day Prediction Horizon)

The histogram of residuals shows that the center of the residual distribution is around 0, indicating no significant systematic bias (the model does not consistently overestimate or underestimate). The spread of residuals ranges from -30 to +20 W/m², providing an idea of the magnitude of prediction errors. The shape of the histogram generally resembles a normal distribution. The highest frequency of residuals is around 0.

Figure: Scatter Plot of Predicted Values vs. Observation Values (50% Training Data, 7-Day Prediction Horizon)

In this scatter plot, each point represents a pair of values, where the horizontal axis (x) shows the actual observation values and the vertical axis (y) shows the model's predicted values. A diagonal line (y=x) is also often drawn on this plot. If the model could predict values perfectly, all points on the scatter plot would fall exactly on this diagonal line. Therefore, the spread of points relative to the diagonal line is a primary visual indicator of model accuracy. Points scattered above the diagonal line indicate overestimation, while points below the line indicate underestimation. This plot is very useful for identifying whether the model tends to make larger errors in certain value ranges or if prediction errors are evenly distributed across the entire range of solar radiation values. The tighter the spread of points around the diagonal line, the better the model's prediction performance.

Table 4.5 Comparison of Observed and Predicted Solar Radiation Values for 7 Days (Model Trained with 50% Training Data)

Day	Actual Value (W/m²)	Predicted Value (W/m²)
1	244.7611	226.3653
2	250.7115	234.9828
3	185.2926	173.3746
4	132.9539	164.0118
5	268.8598	254.3508
6	254.1825	238.3339
7	175.9745	179.098

Quantitative evaluation shows an MAE of 15.797 W/m², RMSE of 17.582 W/m², and R² of 0.862. Based on Table 4.5, the largest error occurs on day 4, where the model predicted 164.0118 W/m² against an observed value of 132.9539 W/m², an overestimate of about 31.1 W/m². Conversely, on day 1 the model underestimated, predicting 226.3653 W/m² against an observation of 244.7611 W/m² (about 18.4 W/m² too low). Overall, however, the differences between predicted and observed values are relatively small.
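As a check on how these metrics relate to the tabulated values, the sketch below recomputes MAE, RMSE, and R² from Table 4.5. It assumes the standard definitions (residual as observed minus predicted, R² as one minus the ratio of residual to total sum of squares); the function name `evaluate` is hypothetical and is not taken from the implementation used in this research.

```python
import math

# Observed and predicted values copied from Table 4.5 (W/m²).
observed = [244.7611, 250.7115, 185.2926, 132.9539, 268.8598, 254.1825, 175.9745]
predicted = [226.3653, 234.9828, 173.3746, 164.0118, 254.3508, 238.3339, 179.0980]


def evaluate(y_true, y_pred):
    """Return MAE, RMSE and R² for two equally long sequences."""
    residuals = [t - p for t, p in zip(y_true, y_pred)]  # observed minus predicted
    n = len(residuals)
    mae = sum(abs(r) for r in residuals) / n
    ss_res = sum(r * r for r in residuals)
    rmse = math.sqrt(ss_res / n)
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return mae, rmse, r2


mae, rmse, r2 = evaluate(observed, predicted)
print(f"MAE = {mae:.3f} W/m², RMSE = {rmse:.3f} W/m², R² = {r2:.3f}")
```

Under these definitions the tabulated values reproduce the reported MAE of 15.797 W/m², RMSE of 17.582 W/m², and R² of 0.862.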

Evaluation and Test Results for 14-Day Prediction Horizon

Based on Table 4.2, the model configuration trained with a 10% proportion of training data showed optimal performance for solar radiation forecasting with a 14-day prediction horizon.

Figure: Comparison of Model Prediction Results with Observation Data (10% Training Data, 14-Day Prediction Horizon)

This graph specifically shows the results from the model trained with only 10% of the training data, a configuration found to be optimal for the 14-day prediction horizon based on evaluation metrics. Similar to Figure 4.6, this graph displays the prediction curve and observation curve over a 14-day period. Visual analysis of this graph allows assessment of the model's ability to capture temporal patterns and predict solar radiation values two weeks ahead. Although the prediction curve generally shows an ability to follow data trends, increased variability or disparity between predictions and observations begins to appear compared to the shorter prediction horizon (7 days). This indicates that the challenge of predicting solar radiation increases with the prediction horizon, resulting in prediction errors that tend to be larger or more fluctuating in medium-term predictions.

Figure: Posterior Samples Distribution of Model Parameters (10% Training Data, 14-Day Prediction Horizon)

This visualization is similar to Figure 4.7 but specific to the 14-day prediction and 10% training data configuration. Comparing Figure 4.11 with Figure 4.7 can provide insights into how different amounts of training data (50% vs 10%) and different prediction horizons (7 days vs 14 days) affect model parameter inference. Changes in the shape, central location, or width of parameter distributions can indicate how data influences the model's confidence in parameter values, or how model parameters adapt to data characteristics relevant for longer-term predictions.

Figure: Histogram of Model Residuals (10% Training Data, 14-Day Prediction Horizon)

In Figure 4.12, the center of the residual distribution is still around zero, although a slight shift towards positive values is observed. With residuals defined as observed minus predicted values, this suggests that the model is still relatively unbiased overall, with a small tendency to underestimate in some cases. The spread of residuals is wider than for the 7-day prediction (Figure 4.8), covering roughly -50 to +20 W/m². This increased spread indicates that prediction errors are more varied and can be larger at the 14-day prediction horizon. The histogram resembles a normal distribution, but with thicker tails and a slight skew to the left. Because large negative residuals correspond to predictions above the observed value, this left tail reflects a few large overestimation errors, such as day 1 in Table 4.6.

Figure: Scatter Plot of Predicted Values vs. Observation Values (10% Training Data, 14-Day Prediction Horizon)

Figure 4.13 is a scatter plot showing the relationship between the model's predicted values and actual observed values for a 14-day prediction horizon, using a model trained with 10% training data. This plot serves as a direct visualization of model accuracy at each data point in the 14-day test set. As in Figure 4.9, points falling close to the diagonal line (y=x) indicate accurate predictions, while deviations from the diagonal line indicate prediction errors. By comparing Figure 4.13 with Figure 4.9, we can visually see if the spread of points becomes wider or less centered around the diagonal line for the longer prediction horizon. An increased spread or clearer deviation patterns in Figure 4.13 compared to Figure 4.9 would confirm the increased prediction challenge at a longer horizon.

Table: Comparison of Observed and Predicted Solar Radiation Values for 14 Days (Model Trained with 10% Training Data)

Day	Actual Value (W/m²)	Predicted Value (W/m²)
1	175.262	230.3499
2	255.7469	252.9514
3	111.015	135.4258
4	192.8626	196.6387
5	258.9128	253.7216
6	177.8482	185.1278
7	260.4985	247.9016
8	244.7611	232.1152
9	250.7115	242.7451
10	185.2926	168.0485
11	132.9539	158.8185
12	268.8598	265.3196
13	254.1825	247.5481
14	175.9745	175.3805

Quantitative evaluation shows an MAE of 13.259 W/m², RMSE of 19.157 W/m², and R² of 0.855. Based on Table 4.6, on day 1, the model significantly overestimates with a prediction of 230.3499 W/m² compared to the observation of 175.262 W/m². However, on day 14, the prediction is very accurate with a value of 175.3805 W/m² compared to the observation of 175.9745 W/m².
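The gap between MAE (13.259 W/m²) and RMSE (19.157 W/m²) at this horizon can be traced to individual days. The following sketch, using the values of Table 4.6 and residuals defined as observed minus predicted, finds the day with the largest absolute error and the share of the total squared error it accounts for. Variable names such as `share_of_squared_error` are illustrative only.

```python
import math

# Observed and predicted values copied from Table 4.6 (W/m²).
observed = [175.2620, 255.7469, 111.0150, 192.8626, 258.9128, 177.8482, 260.4985,
            244.7611, 250.7115, 185.2926, 132.9539, 268.8598, 254.1825, 175.9745]
predicted = [230.3499, 252.9514, 135.4258, 196.6387, 253.7216, 185.1278, 247.9016,
             232.1152, 242.7451, 168.0485, 158.8185, 265.3196, 247.5481, 175.3805]

residuals = [o - p for o, p in zip(observed, predicted)]  # observed minus predicted
squared = [r * r for r in residuals]

# Day (1-based) with the largest absolute error.
worst_index = max(range(len(residuals)), key=lambda i: abs(residuals[i]))
worst_day = worst_index + 1
share_of_squared_error = squared[worst_index] / sum(squared)

mae = sum(abs(r) for r in residuals) / len(residuals)
rmse = math.sqrt(sum(squared) / len(squared))

print(f"MAE = {mae:.3f} W/m², RMSE = {rmse:.3f} W/m²")
print(f"Largest error: day {worst_day}, residual {residuals[worst_index]:.2f} W/m²")
print(f"Share of total squared error: {share_of_squared_error:.0%}")
```

Day 1 alone accounts for roughly 59% of the total squared error. Because RMSE squares each residual, a single large miss of this kind raises RMSE far more than MAE, which explains why the 14-day horizon has the lowest MAE of all horizons but not the lowest RMSE.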

Evaluation and Test Results for 21-Day Prediction Horizon

Based on Table 4.3, the model configuration with a 20% proportion of training data showed relatively good results for solar radiation forecasting with a 21-day prediction horizon.

Figure: Comparison of Model Prediction Results with Observation Data (20% Training Data, 21-Day Prediction Horizon)

This graph presents a visual comparison between the model's prediction curve and the actual observation curve for the 21-day prediction horizon, for which this configuration yields relatively good results. Comparing the two curves over the 21-day period allows a qualitative assessment of the model's ability to capture longer-term patterns and trends. The disparity between the two curves tends to be larger than at the shorter prediction horizons (7 and 14 days). This reflects the increased difficulty of predicting solar radiation three weeks ahead, where accumulated uncertainty and the complexity of long-term data patterns become more prominent, and it appears in the graph as larger fluctuations or more persistent deviations between predictions and observations.

Figure: Posterior Samples Distribution of Model Parameters (20% Training Data, 21-Day Prediction Horizon)

This visualization is similar to Figures 4.7 and 4.11 but specific to the 21-day prediction and 20% training data configuration. Changes in the distribution shape (peak location, width, and symmetry) compared to shorter prediction horizons can provide insights into how prediction uncertainty increases with the horizon, and how the model adjusts its parameters to model data patterns on a larger time scale.

Figure: Histogram of Model Residuals (20% Training Data, 21-Day Prediction Horizon)

In Figure 4.16, the center of the residual distribution is still around zero, with the peak shifted slightly towards positive values (small underestimates), even though the mean residual in Table 4.7 is negative (about -6.5 W/m²) because of a few large overestimates. The residuals in Table 4.7 span about -53 to +15 W/m², a range similar to the 14-day case, but more of them lie far from zero, which is consistent with the higher RMSE at the 21-day prediction horizon. The histogram resembles a normal distribution, but with thicker tails and a slight skew to the left. Since residuals are observed minus predicted values, this left tail corresponds to a few large overestimation errors (for example, days 8, 10, and 18 in Table 4.7).

Figure: Scatter Plot of Predicted Values vs. Observation Values (20% Training Data, 21-Day Prediction Horizon)

By comparing Figure 4.17 with scatter plots for shorter horizons (Figures 4.9 and 4.13), we can observe how the spread of points changes as the prediction horizon increases. Increased deviation of points from the diagonal line (y=x) and the potential appearance of more complex distribution patterns in Figure 4.17 would reflect increased prediction errors and modeling challenges at the 21-day horizon. This plot can help identify if the model struggles to predict specific ranges of solar radiation values or if prediction errors are more randomly distributed.

Table: Comparison of Observed and Predicted Solar Radiation Values for 21 Days (Model Trained with 20% Training Data)

Day	Actual Value (W/m²)	Predicted Value (W/m²)
1	95.7571	121.8377
2	195.4927	219.436
3	190.3087	212.879
4	215.4701	223.3723
5	201.5295	186.0622
6	171.9968	163.1573
7	220.1208	245.9689
8	175.262	228.6363
9	255.7469	246.6441
10	111.015	146.0759
11	192.8626	197.3162
12	258.9128	250.9305
13	177.8482	191.6163
14	260.4985	245.8707
15	244.7611	232.657
16	250.7115	240.8184
17	185.2926	174.492
18	132.9539	165.228
19	268.8598	257.2191
20	254.1825	242.8554
21	175.9745	179.5105

Quantitative evaluation shows an MAE of 17.171 W/m², RMSE of 20.860 W/m², and R² of 0.814. Based on Table 4.7, on day 1, the model significantly overestimates with a prediction of 121.8377 W/m² compared to the observation of 95.7571 W/m². However, on day 21, the prediction is quite accurate with a value of 179.5105 W/m² compared to the observation of 175.9745 W/m².
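The direction of the bias and the asymmetry of the errors at this horizon can be summarized directly from Table 4.7. The sketch below is a minimal illustration that assumes residuals are observed minus predicted values and uses the population (biased) skewness formula. It is not the procedure used to produce the residual histogram.

```python
import statistics

# Observed and predicted values copied from Table 4.7 (W/m²).
observed = [95.7571, 195.4927, 190.3087, 215.4701, 201.5295, 171.9968, 220.1208,
            175.2620, 255.7469, 111.0150, 192.8626, 258.9128, 177.8482, 260.4985,
            244.7611, 250.7115, 185.2926, 132.9539, 268.8598, 254.1825, 175.9745]
predicted = [121.8377, 219.4360, 212.8790, 223.3723, 186.0622, 163.1573, 245.9689,
             228.6363, 246.6441, 146.0759, 197.3162, 250.9305, 191.6163, 245.8707,
             232.6570, 240.8184, 174.4920, 165.2280, 257.2191, 242.8554, 179.5105]

residuals = [o - p for o, p in zip(observed, predicted)]  # observed minus predicted

mean_res = statistics.fmean(residuals)
median_res = statistics.median(residuals)
std_res = statistics.pstdev(residuals)
# Population skewness: mean of the cubed standardized residuals.
skewness = sum(((r - mean_res) / std_res) ** 3 for r in residuals) / len(residuals)

n_under = sum(r > 0 for r in residuals)  # prediction below observation
n_over = sum(r < 0 for r in residuals)   # prediction above observation

print(f"mean residual = {mean_res:.2f} W/m², median = {median_res:.2f} W/m²")
print(f"underestimates: {n_under}, overestimates: {n_over}")
print(f"skewness = {skewness:.2f} (negative means a longer left tail)")
```

A negative mean residual together with negative skewness indicates that the largest errors at this horizon are overestimates, since a negative residual means the prediction exceeded the observation.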

Evaluation and Test Results for 28-Day Prediction Horizon

Based on Table 4.4, the model configuration with a 10% proportion of training data again showed very good performance for solar radiation forecasting with a 28-day prediction horizon.

Figure: Comparison of Model Prediction Results with Observation Data (10% Training Data, 28-Day Prediction Horizon)

As with other comparison graphs, this image displays the prediction curve and observation curve over a 28-day period, allowing a qualitative assessment of the model's ability to capture data patterns and predict solar radiation values one month ahead. Visually, the difference between predictions and observations may become more apparent compared to shorter prediction horizons. This is a manifestation of accumulated uncertainty and data variability at longer prediction horizons, which inherently makes prediction more challenging. Larger fluctuations and potentially more frequent deviations between the prediction and observation curves will be clearly visible in this graph, indicating the model's limitations in capturing all data dynamics on a longer time scale.

Figure: Posterior Samples Distribution of Model Parameters (10% Training Data, 28-Day Prediction Horizon)

Comparison with parameter distributions at shorter prediction horizons can show how increasing prediction uncertainty is reflected in the model's parameter distributions. Wider distributions or different patterns in Figure 4.19 may indicate that the model is less confident about parameter values on a larger time scale due to complexity or limitations of the training data in capturing all long-term variability.

Figure: Histogram of Model Residuals (10% Training Data, 28-Day Prediction Horizon)

In Figure 4.20, the residual distribution is centered close to zero, and the residuals in Table 4.8 average about +1.7 W/m², so there is no strong overall tendency to overestimate or underestimate. The residuals in Table 4.8 span about -49 to +33 W/m², the widest range of all prediction horizons. This shows that individual prediction errors vary most when predicting solar radiation one month ahead, although the RMSE (19.650 W/m²) remains slightly below that of the 21-day horizon (20.860 W/m²). The histogram resembles a normal distribution, but with a less distinct peak, thicker tails, and a slight skew to the left. With residuals defined as observed minus predicted values, the left tail corresponds to a few large overestimation errors (for example, day 15 in Table 4.8), and the error distribution is not entirely symmetrical in long-term predictions.

Figure: Scatter Plot of Predicted Values vs. Observation Values (10% Training Data, 28-Day Prediction Horizon)

Increased deviation of points from the diagonal line, a wider spread, or the appearance of distribution patterns showing a clearer bias in Figure 4.21 compared to scatter plots for shorter prediction horizons (Figures 4.9, 4.13, and 4.17) would reflect increased difficulty and decreased prediction accuracy at the 28-day horizon. This plot is very helpful in identifying ranges of solar radiation values where the model performs poorly or if prediction errors have a specific pattern on a monthly time scale.

Table: Comparison of Observed and Predicted Solar Radiation Values for 28 Days (Model Trained with 10% Training Data)

Day	Actual Value (W/m²)	Predicted Value (W/m²)
1	235.0359	217.7777
2	244.2844	218.093
3	223.4444	190.9036
4	241.6173	257.2112
5	97.307	116.3376
6	144.8075	158.7722
7	142.423	138.2108
8	95.7571	105.0462
9	195.4927	212.985
10	190.3087	207.1194
11	215.4701	217.1153
12	201.5295	179.1233
13	171.9968	154.4916
14	220.1208	241.4238
15	175.262	224.6222
16	255.7469	241.7781
17	111.015	134.7343
18	192.8626	193.0718
19	258.9128	243.8138
20	177.8482	182.8369
21	260.4985	238.573
22	244.7611	226.0482
23	250.7115	234.3608
24	185.2926	165.4055
25	132.9539	155.5004
26	268.8598	251.1839
27	254.1825	238.8563
28	175.9745	172.2085

Quantitative evaluation shows an MAE of 17.099 W/m², RMSE of 19.650 W/m², and R² of 0.849. Based on Table 4.8, on day 3, the model underestimates with a prediction of 190.9036 W/m² compared to the observation of 223.4444 W/m². However, on day 4, the model overestimates with a prediction of 257.2112 W/m² compared to the observation of 241.6173 W/m².

Discussion

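The test results of the Deep Gaussian Process model show that solar radiation prediction performance is influenced by both the prediction horizon and the amount of training data used. In general, model performance tends to decrease as the prediction horizon increases: RMSE rises from 17.582 W/m² at 7 days to 19.650 W/m² at 28 days, and R² falls from 0.862 to 0.849. The trend is not strictly monotonic, however. The 14-day horizon has the lowest MAE (13.259 W/m²), and the 21-day horizon has the weakest results overall (RMSE of 20.860 W/m², R² of 0.814). A loss of accuracy at longer horizons is a common characteristic of time series forecasting, where long-term predictions tend to be more difficult and uncertain.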
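The relationship between horizon and accuracy can be checked against the metrics reported in this chapter. The sketch below collects the reported values for the selected configuration at each horizon and tests whether each metric changes monotonically. The dictionary layout and the helper `is_monotonic` are hypothetical conveniences, not part of the experimental code.

```python
# Metrics reported in this chapter for the selected configuration at each horizon.
reported = {
    7:  {"train_fraction": 0.50, "mae": 15.797, "rmse": 17.582, "r2": 0.862},
    14: {"train_fraction": 0.10, "mae": 13.259, "rmse": 19.157, "r2": 0.855},
    21: {"train_fraction": 0.20, "mae": 17.171, "rmse": 20.860, "r2": 0.814},
    28: {"train_fraction": 0.10, "mae": 17.099, "rmse": 19.650, "r2": 0.849},
}


def is_monotonic(values, increasing):
    """True if the sequence never moves against the expected direction."""
    pairs = list(zip(values, values[1:]))
    if increasing:
        return all(a <= b for a, b in pairs)
    return all(a >= b for a, b in pairs)


horizons = sorted(reported)
trend = {}
# Errors are expected to grow with the horizon, R² is expected to shrink.
for metric, increasing in (("mae", True), ("rmse", True), ("r2", False)):
    series = [reported[h][metric] for h in horizons]
    trend[metric] = is_monotonic(series, increasing)
    print(f"{metric}: {series} -> monotonic: {trend[metric]}")

# Direct comparison of the shortest and longest horizons.
worse_at_28 = (reported[28]["mae"] > reported[7]["mae"]
               and reported[28]["rmse"] > reported[7]["rmse"]
               and reported[28]["r2"] < reported[7]["r2"])
print(f"28-day worse than 7-day on all three metrics: {worse_at_28}")
```

None of the three metrics is monotonic across the four horizons, yet the 28-day horizon is worse than the 7-day horizon on all three, which supports describing the degradation as a general tendency.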

Another interesting finding is that the proportion of training data that yields the best performance depends on the prediction horizon. The 7-day prediction horizon performed best with 50% of the training data, the 14-day and 28-day horizons were optimal with 10%, and the 21-day horizon gave good results with 20%. Furthermore, using a smaller subset of the training data sometimes resulted in better performance than using all of it. This could be due to several factors, such as:

Noise in data: Large datasets may contain noise or less relevant information that can disrupt the model training process.
Patterns specific to certain prediction horizons: Certain data subsets might be better at capturing temporal patterns relevant to a specific prediction horizon.
Overfitting: Using all training data might cause the model to become too complex and overfit the training data, thus reducing its performance on test data.

These results underscore the importance of experimenting with various proportions of training data to find the optimal configuration for each prediction horizon.
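One way such an experiment over training proportions could be set up is sketched below. It assumes that each subset keeps the most recent portion of the chronologically ordered training series, which is a common choice for time series but is an assumption here, since this section does not state how the subsets were drawn. The function `take_recent_fraction` and the placeholder series are hypothetical.

```python
def take_recent_fraction(series, fraction):
    """Keep the most recent `fraction` of a chronologically ordered series."""
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    n_keep = max(1, round(len(series) * fraction))
    return series[-n_keep:]


# Placeholder training series: day indices 0..999 standing in for daily values.
training_series = list(range(1000))

for fraction in (0.10, 0.20, 0.50, 1.00):
    subset = take_recent_fraction(training_series, fraction)
    print(f"{fraction:.0%} of training data -> {len(subset)} days, "
          f"starting at index {subset[0]}")
```

In a full experiment, each subset would be used to fit the model and the resulting test metrics would be compared per prediction horizon.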

For future research, it is recommended to:

Further explore the reasons why certain data subsets yield better performance.
Perform hyperparameter tuning on the Gaussian Process kernel for each prediction horizon and proportion of training data.
Compare the DGP model's performance with other time series models like ARIMA or other machine learning models like Random Forest or Support Vector Regression.
Analyze the statistical significance of performance differences between configurations.
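As an illustration of the last recommendation, the sketch below applies an exact paired sign-flip permutation test to two configurations. It relies on the observation that the seven test days of Table 4.5 are the same days as days 8 to 14 of Table 4.6, so both configurations can be compared on identical observations. The choice of test, the variable names, and the use of absolute errors are assumptions made for this example. The test also treats the daily errors as exchangeable, which may not hold for autocorrelated time series.

```python
from itertools import product

# Observations shared by Table 4.5 (days 1-7) and Table 4.6 (days 8-14), in W/m².
observed = [244.7611, 250.7115, 185.2926, 132.9539, 268.8598, 254.1825, 175.9745]
# Predictions of the 50% / 7-day configuration (Table 4.5).
pred_a = [226.3653, 234.9828, 173.3746, 164.0118, 254.3508, 238.3339, 179.0980]
# Predictions of the 10% / 14-day configuration for the same days (Table 4.6).
pred_b = [232.1152, 242.7451, 168.0485, 158.8185, 265.3196, 247.5481, 175.3805]

abs_err_a = [abs(o - p) for o, p in zip(observed, pred_a)]
abs_err_b = [abs(o - p) for o, p in zip(observed, pred_b)]
diffs = [a - b for a, b in zip(abs_err_a, abs_err_b)]
observed_stat = sum(diffs) / len(diffs)  # mean difference in absolute error

# Under the null hypothesis the sign of each paired difference is arbitrary,
# so all 2^n sign assignments are enumerated (exact test, n = 7).
count = 0
total = 0
for signs in product((1.0, -1.0), repeat=len(diffs)):
    stat = sum(s * d for s, d in zip(signs, diffs)) / len(diffs)
    total += 1
    if abs(stat) >= abs(observed_stat) - 1e-12:
        count += 1
p_value = count / total

print(f"mean |error| difference (A - B) = {observed_stat:.3f} W/m²")
print(f"two-sided p-value = {p_value:.4f} over {total} sign assignments")
```

On these seven days the two-sided p-value is 0.0625, so the difference of about 5.2 W/m² in mean absolute error is not significant at the 5% level. With so few paired days, firm conclusions would require a longer common test period.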

Overall, this research shows that the Deep Gaussian Process model is a promising approach and has good potential for solar radiation prediction. However, selecting the right amount of training data, through careful experimentation, is a crucial factor in achieving optimal performance and maximizing the model's predictive capabilities.

Conclusion and Suggestions

Conclusion

The conclusions that can be drawn from the research on solar radiation prediction using Deep Gaussian Process are as follows:

The Deep Gaussian Process method can be used to predict solar radiation using only historical solar radiation data. Model performance varies depending on the prediction horizon and the amount of training data used, with evaluation metrics such as MAE, MSE, RMSE, and R² indicating the model's ability to capture data patterns.
Using different percentages of training data subsets significantly impacts model performance for each prediction horizon. Using all training data does not always yield the best results, and selecting the right data subset can improve prediction accuracy.
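Based on the test results, solar radiation prediction accuracy generally tends to decrease as the prediction horizon increases from 7 days to 28 days (R² of 0.862 versus 0.849), although the decrease is not strictly monotonic: the lowest R² (0.814) occurs at the 21-day horizon. This indicates challenges in making long-term predictions.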
Experiments show that the optimal percentage of training data differs for each prediction horizon. For example, for a 7-day prediction horizon, 50% training data provides good results, while for 14 and 28-day prediction horizons, only 10% training data yields the best performance in some metrics.

Suggestions

Suggestions that can be considered for future research are as follows:

Use of More and Varied Data: Using longer historical solar radiation data and from various geographical locations can help improve model reliability and generalization. Also consider incorporating data from other relevant sources, such as weather data or satellite data.
Exploration of Additional Features: Adding other potentially relevant features, such as information about seasons, cloud conditions, or other meteorological variables (e.g., temperature, humidity), can help improve prediction accuracy.
External Validation: Validating the model using a completely independent dataset from a different time period or location will provide a better picture of the model's generalization ability.
More In-depth Hyperparameter Tuning: Further exploration of Deep Gaussian Process model hyperparameters, including kernel parameters, number of layers, and optimization parameters, can help find more optimal configurations for each prediction horizon.
Comparison with Other Models: Comparing the performance of the Deep Gaussian Process model with other solar radiation prediction models, such as statistical models (e.g., ARIMA), or machine learning models (e.g., Random Forest, Neural Networks), can provide more comprehensive insights into the advantages and disadvantages of each approach.
Model Interpretation Analysis: Although Deep Gaussian Process is known for its ability to capture uncertainty, further research into model interpretation and how the model makes predictions can provide a better understanding of the factors affecting solar radiation.
Development of Adaptive Models: Developing models that can adaptively adjust to changes in solar radiation patterns over time can improve long-term prediction performance.

References

Ağbulut, Ü., Gürel, A. E., & Biçen, Y. (2021). Prediction of daily global solar radiation using different machine learning algorithms: Evaluation and comparison. Renewable and Sustainable Energy Reviews, 135, 110114. https://doi.org/10.1016/j.rser.2020.110114
Ambarita, H. (2017). Development of software for estimating clear sky solar radiation in Indonesia. Journal of Physics: Conference Series, 801(1). https://doi.org/10.1088/1742-6596/801/1/012093
Behera, M. K., Majumder, I., & Nayak, N. (2018). Solar photovoltaic power forecasting using optimized modified extreme learning machine technique. Engineering Science and Technology, an International Journal, 21, 428–438. https://doi.org/10.1016/j.jestch.2018.04.013
Boubaker, S., Benghanem, M., Mellit, A., Lefza, A., Kahouli, O., & Kolsi, L. (2021). Deep Neural Networks for Predicting Solar Radiation at Hail Region, Saudi Arabia. IEEE Access, 9, 36719–36729. https://doi.org/10.1109/ACCESS.2021.3062205
Budiwati, T., Hamdi, S., & Dyah Aries Tanti. (2014). EFEK GAS SO2 DAN KELEMBAPAN UDARA TERHADAP INSOLASI DAN TEMPERATUR DI BANDUNG [EFFECT OF SO2 GAS AND HUMIDITY TO INSOLATION AND TEMPERATURE IN BANDUNG]. Jurnal Sains Dirgantara, 11, 109–120.
Budyko, M. I. (1969). The effect of solar radiation variations on the climate of the Earth. Tellus A: Dynamic Meteorology and Oceanography, 21(5), 611. https://doi.org/10.3402/tellusa.v21i5.10109
Chen, Z., Guo, D., & Lin, Y. (2020). A deep gaussian process-based flight trajectory prediction approach and its application on conflict detection. Algorithms, 13(11), 1–19. https://doi.org/10.3390/a13110293
Damianou, A. C., & Lawrence, N. D. (2013). Deep Gaussian Processes.
Hochreiter, S., & Schmidhuber, J. (1997). Long Short-Term Memory. Neural Computation, 9(8), 1735–1780.
Huang, L., Kang, J., Wan, M., Fang, L., Zhang, C., & Zeng, Z. (2021). Solar Radiation Prediction Using Different Machine Learning Algorithms and Implications for Extreme Climate Events. Frontiers in Earth Science, 9. https://doi.org/10.3389/feart.2021.596860
Islam, M. D., Kubo, I., Ohadi, M., & Alili, A. A. (2009). Measurement of solar energy radiation in Abu Dhabi, UAE. Applied Energy, 86(4), 511–515. https://doi.org/10.1016/j.apenergy.2008.07.012
Kafka, J. L., & Miller, M. A. (2019). A climatology of solar irradiance and its controls across the United States: Implications for solar panel orientation. Renewable Energy, 135, 897–907. https://doi.org/10.1016/j.renene.2018.12.057
Karasu, S., Altan, A., Sarac, Z., & Hacioglu, R. (2017). PREDICTION OF SOLAR RADIATION BASED ON MACHINE LEARNING METHODS. www.dergipark.gov.tr/jcs
Kim, H., Park, S., Park, H. J., Son, H. G., & Kim, S. (2023). Solar Radiation Forecasting Based on the Hybrid CNN-CatBoost Model. IEEE Access, 11, 13492–13500. https://doi.org/10.1109/ACCESS.2023.3243252
Lawrence, N. D. (n.d.). Learning for Larger Datasets with the Gaussian Process Latent Variable Model.
Lipton, Z. C., Berkowitz, J., & Elkan, C. (2015). A Critical Review of Recurrent Neural Networks for Sequence Learning. ArXiv Preprint ArXiv:1506.00019.
Lubbe, F., Maritz, J., & Harms, T. (2020). Evaluating the potential of gaussian process regression for solar radiation forecasting: A case study. Energies, 13(20). https://doi.org/10.3390/en13205509
Mohammadi, K., Shamshirband, S., Anisi, M. H., Amjad Alam, K., & Petković, D. (2015). Support vector regression based prediction of global solar radiation on a horizontal surface. Energy Conversion and Management, 91, 433–441. https://doi.org/10.1016/J.ENCONMAN.2014.12.015
Muhammad Rezza, et al. (2024). Prediksi Energi Matahari Menggunakan Metode Long Short-Term Memory (LSTM) Berbasis Jaringan Syaraf Tiruan. Jurnal Teknik Elektro, 15(2), 45-52.
Najibi, F., Apostolopoulou, D., & Alonso, E. (2020). Gaussian Process Regression for Probabilistic Short-term Solar Output Forecast. ArXiv Preprint ArXiv:2002.10878.
Nematchoua, M. K., Orosa, J. A., & Afaifia, M. (2022). Prediction of daily global solar radiation and air temperature using six machine learning algorithms; a case of 27 European countries. Ecological Informatics, 69. https://doi.org/10.1016/j.ecoinf.2022.101643
Novi Yanti, et al. (2019). Prediksi Radiasi Matahari Menggunakan Metode Elman Recurrent Neural Network (ERNN) dengan Data dari Badan Meteorologi, Klimatologi, dan Geofisika (BMKG) Pekanbaru. Jurnal Informatika, 8(3), 112-120.
Pawitra Teguh Dharma Priatam, P., Fitra Zambak, M., & Harahap, P. (2021). Analisa Radiasi Sinar Matahari Terhadap Panel Surya 50 WP. 4(1), 48–54. https://doi.org/10.30596/rele.v4i1.7825
Rasmussen, C. E. (2004). Gaussian Processes in Machine Learning. In O. Bousquet, U. von Luxburg, & G. Rätsch (Eds.), Lecture Notes in Artificial Intelligence (LNAI) (Vol. 3176, pp. 63–71). Springer-Verlag. https://doi.org/10.1007/978-3-540-28650-9_4
Rasmussen, C. E., & Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. The MIT Press. http://www.GaussianProcess.org/gpml
Roihan, A., Abas Sunarya, P., & Rafika, A. S. (2019). Pemanfaatan Machine Learning dalam Berbagai Bidang: Review paper. IJCIT (Indonesian Journal on Computer and Information Technology), 5(1).
Seeger, M. (2004). Gaussian Processes for Machine Learning. International Journal of Neural Systems, 14(2), 69–106. https://doi.org/10.1142/S0129065704001899
Sharma, R. (2020). Study of Supervised Learning and Unsupervised Learning. International Journal for Research in Applied Science and Engineering Technology, 8(6), 588–593. https://doi.org/10.22214/ijraset.2020.6095
Sianturi, Y. (2021). Pengukuran dan Analisa Data Radiasi Matahari di Stasiun Klimatologi Muaro Jambi. Megasains, 12(1), 40–47. https://doi.org/10.46824/megasains.v12i1.45
Simanjuntak, R. A., & Wibowo, A. S. (2023). Prediksi Intensitas Radiasi Matahari di Kota Jayapura Menggunakan Jaringan Syaraf Tiruan dengan Algoritma Backpropagation. Jurnal Sains dan Teknologi, 12(1), 1-10.
Solano, E. S., Dehghanian, P., & Affonso, C. M. (2022). Solar Radiation Forecasting Using Machine Learning and Ensemble Feature Selection. Energies, 15(19). https://doi.org/10.3390/en15197049
Takdirillah, R. (2020). Apa itu Machine Learning? Beserta Pengertian dan Cara Kerjanya. Dicoding Blog. https://www.dicoding.com/blog/machine-learning-adalah/
Voyant, C., Darras, C., Muselli, M., Paoli, C., Nivet, M. L., & Poggi, P. (2014). Bayesian rules and stochastic models for high accuracy prediction of solar radiation. Applied Energy, 114, 218–226. https://doi.org/10.1016/J.APENERGY.2013.09.051
Voyant, C., Notton, G., Kalogirou, S., Nivet, M.-L., Paoli, C., Motte, F., & Fouilloy, A. (2017). Machine learning methods for solar radiation forecasting: A review. Renewable Energy, 105, 569–582. https://doi.org/10.1016/j.renene.2016.12.095
Yanti, N., Pandu Cynthia, E., Vitriani, Y., & Azmi, G. (2019). Prediksi Radiasi Matahari Dengan Penerapan Metode Elman Recurrent Neural Network (Vol. 12).