1 Introduction

Increasing variability in crop yields is a major concern for global food security (Araujo-Enciso et al. 2017; Lesk et al. 2016). High yield variability translates into volatile farm incomes and may limit investment in intermediate inputs, such as mineral fertilizer, as the inputs would be lost in the case of production shortfall (Hurley et al. 2018). Hence, increasing yield variability may reduce land-use intensity, with negative consequences for crop yields even in years with good weather conditions (Swinnen et al. 2017).

Climate change has already contributed to higher crop yield variability (Döring and Reckling 2018; Hawkins et al. 2013; Iizumi and Ramankutty 2016). Rising yield variability elevates the risk of synchronized production failure in important global breadbaskets (Gaupp et al. 2020; Mehrabi and Ramankutty 2019), as evidenced for example by the simultaneous shortfall of global maize production in several key producing regions in 1983 (Anderson et al. 2019). Synchronized production failures can cause price spikes for major staple crops, with particularly distressing consequences for poorer consumers in developing countries who rely on imports (Gilbert and Morgan 2010). For example, poor wheat harvests led to the implementation of export bans in Russia, Ukraine, and Kazakhstan, which contributed to spikes in wheat prices in 2008 (Headey 2011). Examining the causes of yield variability in the major production centers is therefore important.

Economic and political shocks (Wright 2011), insect pest outbreaks (Oerke 2006), fungal diseases such as rust (Singh et al. 2008), and farmer decisions regarding soil management, choice of cultivars, and fertilizer or pesticide applications can all affect crop yields (Gregory et al. 2009; Schierhorn et al. 2014). However, annual variations in climatic and weather conditions contribute the most to crop yield variability at the global level (Frieler et al. 2017; Ray et al. 2015). Efforts to reduce variability in crop yields, such as through plant breeding (Mühleisen et al. 2014) and agronomic management (Hatfield et al. 2018; Smith et al. 2007), critically depend on a better understanding of the impact of long-term climatic trends and rapid-onset of weather extremes on yield variability (Webber et al. 2018). Yet, the compound effects of climatic trends and weather extremes on crop yields are not well understood to date.

Long-term climatic mean variables, such as temperature, growing degree days (GDD), and precipitation, constitute the biophysical boundary conditions for crop growth potentials of a region and have been associated with crop yields in statistical models (Roberts et al. 2012). However, the impacts of long-term climatic mean variables on crop yields (e.g., during specific months or the entire growing season) are distinct from the effects of one-off weather extremes (Rowhani et al. 2011; Vogel et al. 2019). Extreme low or high temperatures or abnormally low or high rainfall lead to stunted crop growth and yield declines below the location-specific yield potential, even under optimal crop management and when seasonal temperature and water requirements are met (Barlow et al. 2015; Tigchelaar et al. 2018).

Weather extremes, such as excessive precipitation, extreme frost, or extreme heat, are rare, often unprecedented events that are quantified using percentiles (e.g., above the 95th percentile) or absolute values above a threshold, such as a temperature threshold (Tebaldi et al. 2006). Empirical insights suggest that extreme weather events exert growing damage to global crop production (Asseng et al. 2011; Rowhani et al. 2011; Urban et al. 2015; Zampieri et al. 2017).

The impacts of weather extremes on crop yields depend on when they occur during plant growth. For example, short episodes of temperatures higher than 22 °C during the early reproductive stage are known to cause a reduction in grain number and grain yield of wheat (Farooq et al. 2011). Later in the growing season, temperatures above 32 °C during anthesis and 34 °C during grain filling can be detrimental to grain weight (Farooq et al. 2011), particularly if they occur as a heatwave, i.e., a period of consecutive extremely hot days (Mazdiyasni and AghaKouchak 2015). Unfortunately, only a few notable examples have controlled for the effects of extreme temperature on yields during different plant developmental stages (Innes et al. 2015; Peichl et al. 2018; Rowhani et al. 2011), partly because data on the timing of phenological stages are lacking. As a result, knowledge about the impacts of heat events on yields remains limited (Iizumi and Ramankutty 2016; Liu et al. 2014).

Ukraine is an important global breadbasket, where the impacts of extreme weather events on crop yields remain elusive. Ukraine was among the six largest exporters of wheat between 2015 and 2019 (FAOSTAT 2021) and had the second largest variability in wheat yield between 2010 and 2018 globally (Fig. S1), with important repercussions for world grain markets and global food security (Araujo-Enciso et al. 2017). Ukrainian agriculture, particularly on the southeastern steppe where most production is concentrated (Fileccia et al. 2014; Müller et al. 2016), is increasingly exposed to the effects of climate change, suggesting that climate-induced yield variability will continue to increase (Iizumi and Ramankutty 2016). Unfortunately, assessments on the impact of long-term climatic patterns and weather extremes on crop yields in Ukraine are lacking to date.

The main objective of this paper is to assess the impacts of long-term climatic means and extreme weather events on wheat yields in Ukraine. We address the following two research questions: First, how much of the variability in wheat yields from 1985 to 2018 can be explained by climatic mean variables and how much by variables that capture weather extremes? Second, how large are the effects of climatic means and weather extremes on yields during specific developmental stages of the wheat plants? To address these questions, we used random forests, a machine learning algorithm based on an ensemble of decision trees, to associate winter wheat yields with daily weather measurements from meteorological stations across Ukraine.

2 Materials and methods

2.1 Study area

Ukraine is divided into 24 provinces and the Autonomous Republic of Crimea (Fig. 1a), which was annexed by the Russian Federation in 2014. Wheat is almost entirely cultivated under rainfed conditions in Ukraine, and winter wheat accounted for 96% of the total wheat and 44% of the cereal area on average between 1992 and 2018 (UKRSTAT 2019). Ukraine’s land mass has already experienced a 2.1 °C mean temperature increase between 1985 and 2018 (own calculation based on data from Harris et al. (2020)). In particular, the vast fertile black soil areas in the Southeast are increasingly affected by more intense droughts (Fig. S2). Ukraine’s continental climate with hot summers and cold winters is characterized by high interannual climatic variability (Müller et al. 2016).

Fig. 1
figure 1

Study area divided into the Northwest and Southeast of Ukraine and with the location of the 190 meteorological stations from the Global Historical Climatology Network (GHCN) (Menne et al. 2012) (A); Developmental stages of winter wheat in Ukraine (B) (USDA 2021). See Table S1 for length of the developmental stages in the Northwest and Southeast

Our dependent variable was winter wheat yield reported by the State Statistics Service of Ukraine (UKRSTAT 2019) at the provincial level for every year from 1985 to 2018 (yield data are only available until 2013 for Crimea). We analyzed models for the entire country, as well as for the Northwest and Southeast of Ukraine separately to account for the distinct climatic and soil conditions in the two regions (Müller et al. 2016). The Northwest consists mainly of mixed forest and forest steppe; it has sufficient precipitation and highly favorable temperature conditions for wheat growth in most years. Fertile black (chernozem) soils characterize the Pontic steppe in the Southeast, a semi-arid region that was formerly dominated by natural grasslands, which has largely been converted to arable land during the 20th century (Wesche et al. 2016). On average, the amount of sown area of wheat has been approximately 20% higher in the Southeast than in the Northwest (UKRSTAT 2019).

2.2 Developmental stages of winter wheat

In Ukraine, winter wheat is sown by the end of August and in the first weeks of September (Fig. 1b). After an initial vegetative phase, plant growth ceases with the onset of winter in early November. During winter, young wheat plants are insulated from frost by a snow layer, and plant growth resumes in the first weeks of March (USDA 2021). This second vegetative phase includes the development of the double ridge and terminal spikelet, typically in the last two weeks of April (Eriksson and Magnusson 2015). Anthesis is towards the end of May, followed by the grain filling phase in June and the first weeks of July (Becker-Reshef et al. 2010; Eriksson and Magnusson 2015; Morgun et al. 2018; USDA 2021). The harvesting period in Ukraine typically stretches from July to early August. The onset of planting, harvesting, and the length of the developmental stages of winter wheat differ by several days between the Northwest and Southeast of Ukraine and we accounted for these differences in our models (Table S1).

2.3 Climatic data

We used daily measurements of minimum, maximum, and average temperature (TMIN, TMAX, and TAVG, respectively), precipitation (P) and snow cover from 1985 to 2018 from the 190 meteorological stations within Ukraine that are available from the Global Historical Climatology Network (GHCN) - Daily dataset (Fig. 1a, Menne et al. (2012)). We excluded days with implausible measurements, such as excessive TMAX (> 45 °C) or TMIN (< -40 °C), and erroneous datapoints, such as when TAVG or TMIN exceeded TMAX or when TMIN exceeded TAVG. For days without data for TAVG, we obtained TAVG by averaging TMIN and TMAX. For days where TMIN or TMAX was missing but TAVG was available, we calculated TMIN or TMAX by adding to TAVG the mean difference between TAVG and TMIN or TMAX on the respective day of the year over all years. We approximated hourly, within-day temperatures by using a sinusoidal distribution through daily TMIN and TMAX (D’Agostino and Schlenker 2016). When snow cover was missing with TMAX below 2 °C since the last snow cover occurrence, we assumed that snow cover had stayed constant. When snow cover was missing but precipitation had been recorded with TMAX below 2 °C, we added the additional precipitation to the existing snow cover, assuming a 1:10 snow ratio (i.e., 10 mm of precipitation yield 100 mm of snow cover), which constitutes the lower bound of estimates for this ratio (Ware et al. 2006). This suffices our purposes because we are mainly interested in whether a protective snow cover was present; the precise thickness of the snow layer is less important here. We then averaged for each day the measurements of TMIN, TMAX, TAVG, P, and snow cover of all stations located within a province to match the spatial scale of the yield data.

We examined both long-term climatic mean variables (‘climatic means’ hereafter) and variables that capture weather extremes (‘weather extremes’ hereafter, Table S2). Our climatic means include TMIN, TMAX, TAVG, P, growing degree days (GDD), and the standardized precipitation evapotranspiration index (SPEI, Vicente-Serrano et al. (2010)). GDD is a measure of heat energy received by the crop for a specified time period (e.g., from sowing to harvesting; McMaster and Wilhelm (1997)). We calculated GDD for all days when TAVG was above 0°C and TMAX was below the respective threshold for extreme heat. The SPEI captures the availability of soil-water for crops in relation to the statistical long-term distribution of the local water balance and allows monitoring of drought development. We used the SPEI estimates from SPEIbase v.2.6 (Vicente-Serrano et al. 2010).

We used the daily station measurements from GHCN to calculate eight different weather extremes, of which three relate to extreme heat during the day (extreme degree days - EDD, heat waves with precipitation - HWwithP, and heat waves without precipitation - HWnoP). These three extreme heat variables were only calculated for days during which TMAX exceeded a certain heat threshold, and for such days, we did not calculate GDD. We used two different approaches to calculate the heat threshold: In the percentile threshold approach, the heat threshold is the 90th percentile of daily maximum temperatures from 1985 to 2018 for the given province and day of the year. In the fixed threshold approach, we used province-invariant heat thresholds that are constant for each day within the same developmental stage and are reported in Farooq et al. (2011): 30 °C for the vegetative phase I, 21.4 °C for the vegetative phase II, 32 °C for the reproductive phase, and 34.3 °C for the grain filling phase. The calculation of the remaining weather extreme variables did not differ between the percentile and fixed threshold approach. EDD measures the accumulated hourly temperatures above the heat threshold and has been shown to be important for crop yields (Auffhammer and Schlenker 2014; Carleton and Hsiang 2016). HWwithP and HWnoP are the accumulated hourly temperatures for days when TMAX exceeds the heat threshold for at least three consecutive days. TNW capture waves of exceptionally high daily minimum temperatures. Heavy precipitation (HP) captures unusually high daily precipitation, which can lead to water logging and can reduce yields by inhibiting root growth (Malik et al. 2002), favoring the proliferation of fungal diseases, and by posing mechanical difficulties during harvest (Mäkinen et al. 2018). To assess frost, we differentiated between frost events with snow cover (FwithSC) and without snow cover (FnoSC) because severe frost without an isolating snow cover can lead to leaf chlorosis and hence to yield loss (Harkness et al. 2020; Kolář et al. 2014). For the frost isolation effect to be exerted, we assumed that there must be at least 10 mm of snow. Temperatures below zero in late winter or early spring, particularly after plant growth has already resumed, may cause medium to severe yield loss by damaging the plant’s florets and stem (Frederiks et al. 2015; Zheng et al. 2015). To account for frost events in spring, we defined a late frost variable (LF) that considers previous daily average temperatures (Table S2).

We used season-long variables for the period from planting to harvesting, while intraseasonal variables were specific for the duration of the five developmental stages of wheat (Fig. 1b). We hypothesize that intraseasonal variables have higher explanatory power than season-long variables because wheat has different optimal temperature requirements and maximum temperature thresholds during each developmental stage (Farooq et al. 2011). We calculated season-long and intraseasonal temperature variables by averaging daily TMIN, TMAX, and TAVG; and we constructed season-long and intraseasonal P, GDD, and weather extremes by summing the daily values for the entire growing season and the specific plant developmental stages. For the SPEI, we chose the time scale of the period of interest and associated, for example, SPEI-2 in October with vegetative phase I (September and October) and SPEI-11 in July to cover the entire growing season since September.

2.4 Random forest models

Random forests (RFs) are a nonparametric machine learning algorithm that use an ensemble of classification or regression trees (Breiman 2001). RF have been used to quantify the effect of climate and weather on crop yield and have achieved higher explanatory power than traditional regression-based approaches (Feng et al. 2018; Hoffman et al. 2020; Jeong et al. 2016; Vogel et al. 2019). To remove the effects of long-term technological and climatic trends on yields, we detrended the yield data using a second-order polynomial regression against time prior to running the RF models (Lu et al. 2017). We assessed the explained variance (R2) with three combinations of climatic variables (both climatic means and weather extremes; climatic means only; weather extremes only), with season-long and intraseasonal variables, and for three different spatial extents (Country wide, Northwest, Southeast) (Fig. S3). We included province dummy factor variables, which account for unobserved regional differences in cropping practices, mechanization, and input intensity across the provinces.

We ran each model 100 times; in each model run, we randomly assigned 80% of the observations to the training data and calculated R² values with the remaining 20%. We performed Wilcoxon signed-rank tests (Woolson 2007) to assess whether mean R² values significantly differed between models with or without the province dummy and for models based on either the percentile or fixed threshold approach.

A linear relationship between two predictor variables does not affect predictions in RF (Cutler et al. 2007) but collinearity can obscure variable influences because the importance value is split between the two (Breiman 2001). For example, high temperatures are often the result of, and hence collinear with, soil moisture stress (Trenberth and Shea 2005). We identified the model configurations with the highest R² and assessed collinearity among the predictor variables by calculating the variance inflation factor (VIF; (Alin 2010)). We iteratively deleted the variable with the highest VIF until all remaining variables had a VIF below 10. This resulted in a dataset with 39 climatic mean and weather extreme variables and the province dummy (Fig. S4). For each of the 100 model runs and each of the three regions, we recorded the percent increase in mean squared error (%IncMSE) as a measure of variable importance. We also calculated the partial dependence of the response variable on each predictor variable for 50 equally spaced points from the minimum to maximum value of the respective predictor variable. We compared the predictions of the final RF models with the predictions of linear mixed-effect (LME) models, in which we used the province dummy as a random effect. We also assessed the performance of the RF and LME models in terms of root mean squared error (RMSE), Nash-Sutcliffe model efficiency (NSE), and Willmots’ index of agreement (Willmott’s d). RMSE and Willmott’s d are two measures of deviation between observed and predicted values of the response variable (Willmott 1981). The NSE evaluates the predictive power by comparing it with the observed mean and supports out-of-sample cross-validation (Nash and Sutcliffe 1970). Again, we used Wilcoxon signed-rank tests to assess whether model performance significantly differed between the RF and LME models.

We grew 500 trees in each RF configuration and chose the default setting of one-third of the predictor variables (i.e., the hyperparameter mtry=13) as candidates at each split in a regression tree. We found that the variable importance ranks of our most important predictors were largely unaffected by the number of candidate variables available at each split (see results). We set the minimum size of terminal nodes to five and kept the maximum size of terminal nodes unrestricted. Finally, we used partial dependence plots to reveal the functional relationship between the marginal effects of climatic means and weather extremes on predicted wheat yields. We performed all statistical analyses in the R environment (R-Development-Core-Team 2017).

3 Results

3.1 Climatic and weather variables alone explain 49–58% of the yield variability

The RF models with both intraseasonal climatic means and weather extremes clearly outperformed the models with only season-long variables (Fig. 2). The Wilcoxon signed-rank tests showed almost no significant differences in R² values between the models based on the percentile and fixed threshold approach (Fig. S5). We here focus on the percentile threshold approach and provide the results for the fixed threshold approach in the supplement (Figs. S6, S7, S8). The predictive ability of the models was substantially lower without the province dummy (Fig. 2). The RF models without the province dummy reveal the sole influence of climatic variables on wheat yields and captured 54% (country-wide), 58% (Northwest), and 49% (Southeast) of the wheat yield variability when based on the percentile threshold approach and both climatic means as well as weather extremes were included. When the province dummy and hence unobserved non-climate factors were included, these models captured 59% (country-wide), 63% (Northwest) and 52% (Southeast) of the variability (Fig. 2).

Fig. 2
figure 2

Explained variability (R²) from random forest out-of-sample predictions with different sets of climatic mean and weather extreme predictor variables, for season-long and intraseasonal variables, and for three different regions (country-wide, Northwest, Southeast), based on the percentile threshold approach. The symbols for the group comparison indicate that the statistical significance of the Wilcoxon signed-rank test differed between models with and without the province dummy (not significant (ns): p > 0.05; *: p <= 0.05; **: p <= 0.01; ***: p <= 0.001; and ****: p <= 0.0001)

3.2 Climatic means have more explanatory power than weather extremes but both are important

We assessed the contribution of weather extremes to the variability in wheat yields with intraseasonal models based on the percentile threshold approach and with the province dummy. Climatic means alone captured 58% (country-wide), 62% (Northwest), and 53% (Southeast) of the yield variability (Fig. 2). A comparison with the model with all climate variables implies that adding weather extremes to the models with climatic means increased the explained variability at low level (Fig. 2). The small difference between the R² values suggests that much of the impacts of weather extremes are already captured by the climatic means. Indeed, weather extremes accounted for a mean yield variability of 36% (country-wide), 40% (Northwest), and 36% (Southeast) (Fig. 2), implying that weather extremes alone already have sizeable effects on yield variability. The predictions mirror well the annual variability in observed yields in most years but often failed to predict the full amplitude in the observed yields, e.g., in 2003 in the Northwest (Fig. S9). The validation measures except Willmott’s d show that the RF models outperformed the LME models in all three regions, but the differences were small (Fig. S10).

3.3 Distinct influences of climate variables in different stages of plant growth

We used the final RF models, consisting of 39 climate mean and weather extreme variables and the province dummy, to systematically assess variable importance for the different plant developmental stages and regional levels. We found distinct impacts of climatic means and weather extremes on wheat yields in the different developmental stages. Temperature-related climatic means (TMIN, TMAX, and GDD) were important predictors during winter dormancy, vegetative phase II, and reproductive phase in all three regions (Fig. 3). SPEI affected yields across most plant developmental stages in all three regions. Precipitation (P) was either excluded due to multicollinearity with the SPEI (Fig. S11) or had low variable importance during winter dormancy and grain filling.

In general, weather extremes had a smaller influence than climatic means. However, some weather extremes substantially influenced yields in specific plant developmental stages, such as extreme degree days (EDD) in the vegetative phase I (country-wide and Northwest), heat waves with precipitation (HWwithP) and tropical night waves (TNW) in the reproductive phase, and frost with snow cover (FwithSC) during winter dormancy. Late frost (LF) had no impact on yields in all models.

In general, climatic means played a smaller role in determining yields across most plant developmental stages in the Southeast than in the Northwest and country-wide (Fig. 3). However, the SPEI in the reproductive phase, HWwithP, TNW, and frost events without snow cover (FnoSC) had particularly strong effects in the Southeast. The province dummy was the most important variable in the Northwest and at the country level (mean %IncMSE of 27 and 34, respectively), suggesting that unobserved factors, such as input intensity, were more important in these regions than in the Southeast (mean %IncMSE = 10). The importance of the predictor variables was largely unaffected by the hyperparameter mtry, i.e. the number of candidate variables at each split when growing the random forest (Fig. S12).

3.4 Functional relationships between climate variables and predicted wheat yields

For all three regions, TMIN below -3 °C during winter dormancy and below 3 °C during vegetative phase II was associated with lower yields (Fig. 4). In all three regions, yields increased when TMIN was higher than 3 °C in the vegetative phase II, or when TMAX was higher than 12 °C during the vegetative phase II. In the Southeast, TMAX reduced yields when TMAX rose above 22 °C in the reproductive phase and above 28 °C during the grain filling phase.

Fig. 3
figure 3

Variable importance in specific developmental stages of winter wheat expressed as percent increase in mean squared error (%IncMSE) of the climatic means and weather extremes based on the percentile threshold approach and with the province dummy. Higher %IncMSE signals higher variable importance. See Table S2 for abbreviations and definitions of the different climate mean and weather extreme variables. Empty cells indicate variables that were removed due to multicollinearity or that were not defined (Fig. S4)

Fig. 4
figure 4

Partial dependence of predicted yield on climatic means for the three regions and five developmental stages based on the percentile threshold approach. The x-axis shows the mean (TMIN, TMAX, SPEI) or accumulated (P, GDD) variable-specific values for the respective plant developmental stage; the y-axis shows the yields associated with these values. The shaded area around the partial dependence represents the standard deviations of the results from the 100 model runs. See Table S2 for abbreviations and definitions of the different climate mean and weather extreme variables. Empty panels indicate variables that were removed due to multicollinearity or that were not defined (Fig. S4)

Yields were sensitive to GDD during the reproductive phase in the Northwest and country-wide model, with an increase in yields linked to GDD above 400 °C. Both the partial dependence plots and variable importance suggest that GDD had little impact on predicted wheat yields across all developmental stages in the Southeast. Unusually dry periods with a SPEI below -1 during the reproductive and grain filling phases reduced yields, particularly in the Southeast. Yields in the Northwest were negatively associated with SPEI and precipitation (P, Fig. 4) during the grain filling phase, indicating that excessive water supply by the end of the growing season decreased yields in this region.

We also observed specific impacts of temperature extremes on wheat yields across regions and plant developmental stages (Fig. 5). In general, yields were more sensitive to extreme temperatures in the Southeast and are negatively associated with EDD during vegetative phase I and grain filling. More heat wave events with precipitation (HWwithP) during the reproductive phase were associated with a decrease in yields in all three regions (Fig. 5). In the Southeast, yields were negatively associated with HWwithP during grain filling and with heat waves without precipitation (HWnoP) in the vegetative phase I, although with low variable importance (Fig. 3). The negative effect of HWwithP on yields was only present with the percentile but not with the fixed threshold approach (Fig. S8). During the reproductive phase and in all three regions, yields decreased when TNW increased. Frost during winter dormancy reduced yields in all three region (Fig. 5), but the yields were sensitive to the isolation effect of snow cover only in the Southeast (Figs. 3 and 5). Finally, late frost (LF) had no impact on wheat yields in Ukraine.

Fig. 5
figure 5

Partial dependence of predicted yield on weather extremes for the three regions and five plant developmental stages based on the percentile threshold approach. The x-axis shows the accumulated variable-specific values for the respective developmental stage, and the y-axis shows the yield associated with these values. The shaded area around the partial dependence represents the standard deviations of the results from the 100 model runs. See Table S2 for abbreviations and definitions of the different climate mean and weather extreme variables. Empty panels indicate variables that were removed due to multicollinearity or that were not defined (Fig. S4)

4 Discussion

We presented the first empirical study on the climatic determinants of wheat yield variability in Ukraine from 1985 to 2018. We used machine learning to associate all available daily data from meteorological stations within Ukraine with observed yields of winter wheat. Our analysis accounts for climatic means and for weather extremes in different stages of plant growth. The results suggest that climatic means alone explained 53–62% of the annual yield variation; weather extremes were important in specific regions and growth stages of the wheat plants.

In a global linear regression analysis, Ray et al. (2015) estimated that climatic variability explains 24% of the variability in wheat yields in Ukraine when accounting for country-level and season-long climatic variables from 1979 to 2008. A global assessment, also using random forests, found that climatic means and weather extremes explained less than half of the variance in yield anomalies for maize and spring wheat (Vogel et al. 2019). In 17 European countries, random forests revealed that season-specific temperature and precipitation variables explained 43% of historical wheat yields (Beillouin et al. 2020). Monthly weather variables explained 50 to 80% of the spatial variation of the yield of four major crops in extreme years in Germany (Webber et al. 2020). Random forest revealed that climatic variation together with regional dummies explained 93% of maize yields in the United States (Leng and Hall 2020). The province dummies in our country-wide model with both climatic means and weather extremes improved the explained variability by 5%. The validation measures (except Willmott’s d) suggest that RF models outperform linear mixed-effects (LME) models, in line with other studies that concluded that RF models are effective for crop yield predictions from regional to global scales (Feng et al. 2018; Jeong et al. 2016; Leng and Hall 2020).

At first glance, weather extremes had a limited influence on wheat yields in Ukraine. The models with intraseasonal climatic means and weather extremes showed similar performance as the models with only climatic means, which suggests that a substantial share of the impact of weather extremes on yields has been captured by climatic means. For example, a heat wave during a certain plant developmental stage will elevate the mean temperatures observed in this stage (Vogel et al. 2019). We showed that the random forest models that were trained only with the weather extremes explained about half of the yield variability. This suggests that the separation of climatic means from weather extremes is challenging in statistical models because these variables are often collinear, which complicates disentangling their individual effects on yields.

The influence of extreme weather events on yields is equivocal. In a global assessment, the exclusion of weather extremes from random forest models reduced the explained yield variability by 43% for maize and by 18% for spring wheat (Vogel et al. 2019), which suggests a low sensitivity of wheat to temperature and precipitation extremes. Similarly, a process-based crop model revealed that heat stress and drought had little influence on wheat yields in Europe (Webber et al. 2018), but heat stress explained a considerable amount of variation in wheat yields in eastern Germany (Webber et al. 2020). For eastern Australia, a region with similarly high yield variability as in Ukraine, rainfall extremes accounted for 41–67% of the wheat yield variation (Feng et al. 2018). Our study corroborates the large influence of weather extremes on yields in a major global breadbasket.

Intraseasonal climatic variables are increasingly accounted for in the analysis of relations between climate and crop yields (Harkness et al. 2020; Lu et al. 2017; Lüttger and Feike 2018). Intraseasonal variables outperformed a model with season-long variables by up to 115% in the United States (Ortiz-Bobea et al. 2019). To our knowledge, Hoffman et al. (2020) and Beillouin et al. (2020) are the only studies that applied RF modeling for crop-specific developmental stages. Our work underlines that using intraseasonal climatic variables for distinct crop developmental stages improves the understanding of the relationship between weather and yield during the developmental stages and hence confirms previous studies (Butler and Huybers 2015; Ortiz-Bobea et al. 2019). Better knowledge on intraseasonal effects of climate on yields can inform adaptation strategies of farmers and crop breeders.

Wheat yield is sensitive to heat during the reproductive and grain filling phases (Innes et al. 2015; Liu et al. 2014; Lobell et al. 2012; Lüttger and Feike 2018). We found that heat waves (HWwithP) negatively affect wheat yields during the reproductive phase, particularly in the Southeast. As EDD and HWwithP were highly correlated (Fig. S11), EDD was removed from the models of the reproductive phase. Exposure of wheat to short episodes of heat during the reproductive phase causes male and female sterility and triggers damage to pollen tube growth and fertilization, resulting in lower grain number and grain yield (Farooq et al. 2011; Innes et al. 2015; Sehgal et al. 2018). It has been shown that high temperatures in May reduced wheat yields in the Southeast of Ukraine (Morgounov et al. 2013). During the grain filling phase, extreme heat (TMAX) also reduced yield in our models, particularly in the Southeast. Heat stress during the grain filling period increases the rate and reduces the duration of grain filling, which ultimately reduces grain yield (Barlow et al. 2015; Farooq et al. 2011). A meta-study identified a temperature threshold of 34 °C for the grain filling period, above which grain weight and hence yield are reduced (Farooq et al. 2011). Heat stress during the grain filling phase also negatively affects grain protein contents and hence grain size (Farooq et al. 2011).

We found that spells of warm nights, so-called tropical night waves (TNW), negatively affected wheat yields during the reproductive phase. Tropical night waves occurred both in the Northwest and Southeast (Fig. S13). The mechanisms underlying yield decreases in response to TNW are complex and not yet fully understood. Arguably, warm nights downregulate photosynthesis-dependent processes and trigger losses in the amount of carbon directed toward plant and seed growth; both mechanisms result in yield declines (Sadok and Jagadish 2020). TNW also decease the duration of grain filling with a concomitant reduction in grain yield (Farooq et al. 2011). Fortunately, advances in breeding can guard against yield decreases from tropical night waves, particularly for European cultivars (Sadok and Jagadish 2020).

Our assessment revealed that the percentile and fixed threshold approach to assess the impacts of extreme temperatures achieved similar R² values. However, models using the percentile threshold had clearer functional relationships with wheat yields than models using the threshold approach. For example, the negative effect of HWwithP on yield was only prevalent in the percentile approach (Fig. 5 and Fig. S8). In addition, the temperature measurements from weather stations are not ideal for defining the heat variable because the measurements do not match the temperature in the field. In Germany, maximum crop canopy temperatures and weather station temperatures can deviate by up to 7 °C (Siebert et al. 2014). As we lack information about how much crop canopy and weather station temperatures differ, we were unable to control for this difference. In the absence of reliable crop canopy temperatures, we recommend testing climatic variables that are based on percentile thresholds.

Frost stress during winter dormancy was an important yield-limiting factor in all three regions. Yields were sensitive to the isolation effect of snow cover only in the Southeast. Wheat transitions through a process of cold acclimation toward ‘hardened’ wheat plants, which adapts the plants to low temperatures in winter (Barlow et al. 2015). However, the combination of severe frost and the absence of isolated snow cover can lead to leaf chlorosis and hence to yield loss (Harkness et al. 2020; Kolář et al. 2014). For example, the severe frost without protective snow cover that hit the southeastern Ukrainian steppe in December 2002 led to a drastic decrease in wheat yields in 2003 (Fig. S9). Provinces in the Northwest, where a sufficiently thick snow layer protected the plants, were not severely affected. This underscores the protective effect of snow cover against frost, which has received little attention in the literature to date. Our approach to consider the protective effect of snow cover during frost events can help to better account for yield damages from frost.

Wheat yields in Ukraine seem insensitive to low precipitation across all plant developmental stages, in line with similar studies elsewhere (Ortiz-Bobea et al. 2019; Petersen 2019). However, the SPEI was an important predictor of wheat yields in several developmental stages, suggesting that the SPEI better reflects water availability for plants than the precipitation variables. The SPEI captures evaporative demands due to high temperatures and hence is a powerful indicator of water stress (Ortiz-Bobea et al. 2019; Potopová et al. 2015; Starks et al. 2019). We also tested the standardized precipitation index (SPI), an indicator that has performed very well in eastern Australia (Feng et al. 2018), but the SPEI had higher predictive power than the SPI, and both were correlated with each other (results not shown). The small effect of precipitation in the statistical models could also have been due to the high correlation between water stress and heat (Ortiz-Bobea et al. 2019), but we did not account for these potentially confounding effects.

The example of the SPEI shows that statistical models for different regions can reveal important additional insights, which a spatially aggregated model cannot. The partial dependence plots suggest that the SPEI impacts wheat yields at the country level during grain filling, but the effects have opposite signs in the Northwest and Southeast (Fig. 4). We recommend assessing climate-yield relationships separately for regions with substantial biophysical differences.

Our models detected water stress as a yield-limiting factor during the grain filling and reproductive phase in the Southeast. Limited water availability in the early stages of the reproductive phase, particularly during booting, heading, and flowering, accelerates leaf senescence and limits photosynthetic activity, hence reducing grain size and grain number (Ji et al. 2010; le Roux et al. 2020). Severe drought during the reproductive and grain filling phases reduced yields of winter wheat in the Czech Republic (Potopová et al. 2015). Water limitation has not compromised yields in the Northwest of Ukraine, where sufficient precipitation is typically available, including in drier years. However, excessive water reduced yields during the grain filling phase and this arguably delayed harvesting in the Northwest. Very wet soils at the end of the growing season also lowered yields in the US (Li et al. 2019; Ortiz-Bobea et al. 2019). Northern Ukraine is frequently affected by severe waterlogging (FAO and ITPS 2015), which can critically affect the grain number per plant and thus reduce yields (Marti et al. 2015).

5 Conclusions

We have used machine learning for assessing the complex and nonlinear relationships between climate and crop yields. We found that climatic means and weather extremes captured 53-62% and 36-40%, respectively, of the wheat yield variability in Ukraine. This suggests that much of the impact of extreme weather is captured by the climatic means. Heat waves and tropical night waves during the reproductive phase, frost, and droughts during grain filling and the reproductive phase were the key factors that compromised wheat yields. The high predictive power of extreme weather events in Ukraine calls for intensifying adaptation measures toward improving the resilience of agriculture against climatic extremes, as such events will likely occur more frequently in the future. The increasingly challenging climatic conditions in the region have already contributed to the abandonment of many fertile lands, particularly on the southeastern steppe (Ostapchuk et al. 2021). In climatically favorable years, Ukraine is one of the most important grain-exporting nations in the world. However, in climatically unfavorable years, yields collapse with important repercussions for the global grain market. A better understanding of the causes of the high yield variability in wheat production and of pathways to stabilizing wheat supply is therefore of high international interest and constitutes the backbone for targeting adaptation strategies.