--- title: "Exogenous Variables" output: rmarkdown::html_vignette: toc: true toc_depth: 2 vignette: > %\VignetteIndexEntry{Exogenous Variables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r setup, include=FALSE} library(httptest2) .mockPaths("../tests/mocks") start_vignette(dir = "../tests/mocks") original_options <- options("NIXTLA_API_KEY"="dummy_api_key", digits=7) knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.width = 7, fig.height = 4 ) ``` ```{r} library(nixtlar) ``` ## 1. Exogenous variables Exogenous variables are external factors that provide additional information about the behavior of the target variable in time series forecasting. These variables, which are correlated with the target, can significantly improve predictions. Examples of exogenous variables include weather data, economic indicators, holiday markers, and promotional sales. `TimeGPT` allows you to include exogenous variables when generating a forecast. This vignette will show you how to include them. It assumes you have already set up your API key. If you haven't done this, please read the [Get Started](https://nixtla.github.io/nixtlar/articles/get-started.html) vignette first. ## 2. Load data For this vignette, we will use the electricity consumption dataset with exogenous variables included in `nixtlar`. This dataset contains hourly prices from five different electricity markets, along with two exogenous variables related to the prices and binary variables indicating the day of the week. ```{r} df_exo_vars <- nixtlar::electricity_exo_vars head(df_exo_vars) ``` There are two types of exogenous variables: **historic** and **future**. - **Historic Exogenous Variables**: They should be included directly in the input dataset `df`. - **Future Exogenous Variables**: They must be included in the `X_df` parameter. To specify which variables should be treated as historic, use the `hist_exog_list` parameter. This parameter is available in both the `forecast` and `cross_validation` functions. - If `df` contains exogenous variables but they are not found in `X_df` nor declared in `hist_exog_list`, they will be ignored. - If exogenous variables were declared as historic but found in `X_df`, then they will be considered as historic. In the next section, we will explore different cases for forecasting with exogenous variables. ## 3a. Forecasting electricity prices using historic and future exogenous variables If both historic and future values of all exogenous variables are available, include the historic exogenous variables in `df` and the future exogenous variables in `X_df`. ```{r} future_exo_vars <- nixtlar::electricity_future_exo_vars head(future_exo_vars) fcst_exo_vars <- nixtla_client_forecast( df_exo_vars, h = 24, X_df = future_exo_vars ) head(fcst_exo_vars) ``` ## 3b. Forecasting electricity prices using only historic exogenous variables If future values of the exogenous variables are not available, you can still generate forecasts using only their historical values. In this case, simply include them in `df` and declare them in `hist_exog_list`. ```{r} fcst_exo_vars <- nixtla_client_forecast( df_exo_vars, h = 24, hist_exog_list = c("Exogenous1", "Exogenous2", "day_0", "day_1", "day_2", "day_3", "day_4", "day_5", "day_6") ) head(fcst_exo_vars) ``` Note that if you don't declare the exogenous variables in `hist_exog_list`, they will be ignored. If we hadn't declared them above, the output would be the same as the TimeGPT forecast using only the target variable `y`. **Important:** If you include historical exogenous variables without explicitly defining their future values, you are implicitly assuming that their historical patterns will continue into the future. Whenever possible, it is recommended to use future exogenous variables to make these assumptions explicit. ## 3c. Forecasting future exogenous variables When future exogenous variables are not available, an alternative approach is to forecast them separately using TimeGPT. First, generate forecasts for the exogenous variables and then pass the predicted values in `X_df` for the main forecast. ## 3d. Forecasting electricity prices using both future and historic exogenous variables In some cases, only a subset of future exogenous variables is available. For example, if future values of `Exogenous1` and `Exogenous2` are unknown, add them to `hist_exog_list`. ```{r} future_exo_vars <- future_exo_vars |> dplyr::select(-dplyr::all_of(c("Exogenous1", "Exogenous2"))) fcst_exo_vars <- nixtla_client_forecast( df_exo_vars, h = 24, X_df = future_exo_vars, hist_exog_list = c("Exogenous1", "Exogenous2") ) head(fcst_exo_vars) ``` ## 4. Plot TimeGPT forecast `nixtlar` includes a function to plot the historical data and any output from `nixtla_client_forecast`, `nixtla_client_historic`, `nixtla_client_anomaly_detection` and `nixtla_client_cross_validation`. If you have long series, you can use `max_insample_length` to only plot the last N historical values (the forecast will always be plotted in full). ```{r} nixtla_client_plot(df_exo_vars, fcst_exo_vars, max_insample_length = 500) ``` ```{r, include=FALSE} options(original_options) end_vignette() ```