Introduction

The global population is at an increasing trend, and it is estimated that the world population would reach 9 billion by the year 2050 (Dong and Xiao 2016). In the last half-century, there had been an increasing trend in overall agricultural production (Blomqvist et al. 2020) and it is expected to be maintained in the similar manner. However, with finite resources such as water and land, any unplanned overexploitation is highly challenging to cope with the increasing demands for water and food in the future. This imposes serious problems on food security leading to a multitude of socio-economic problems. It is very evident in the recent COVID-19 pandemic periods from interconnected disruptions in the food supply chain across the world (World Bank, 2020). Besides several factors, climate-related disasters like floods and droughts caused by extreme events further add pressure on the agricultural systems (IPCC, 2019). To obtain the exact knowledge of what type of crop, where and when it is being grown, improved monitoring of agricultural production is highly required. It is, therefore, important to monitor agricultural activities on a global, local, and regional scale by developing a modelling framework that inevitably integrates climate change adaptation strategies with a crop forecasting system (Chen et al., 2017; Ghamghami & Beiranvand, 2022; Ghose et al., 2021; Zhang & Zhang, 2016). Crop yield monitoring and the related statistics of crop production form a core part of the information required in formulating a well-defined agricultural plan and policy design (You & Wood, 2006; Khan et al., 2010; Shirsath et al., 2020). Through these processes, a detailed map of various parameters of agricultural production can be developed. These maps form the basis for better managing the overall agricultural systems.

Several different crop yield and crop-related data sources majorly derived from censuses and satellites have been developed on a global scale by numerous researchers (Iizumi et al., 2014; Klein Goldewijk et al., 2017; Kim et al., 2021; Grogan et al., 2022). In some countries, such as India, Brazil, data on crop productivity statistics is available through institutional mechanism such as with state department repositories or at national scales like Food and Agriculture Organization and United States Department of Agriculture (You & Wood, 2006; Ibragimov et al., 2020; Shirsath et al., 2020). However, the major challenge lies in the nature of these historical data and the global scale data on crop productivity (Shirsath et al., 2020). These global or national scale datasets do not explicitly reveal profound information about the geographic distribution of production and land use pattern within the nation boundary. Furthermore, these data sets from higher levels of administrative units do not reveal much about the diversity and spatial patterns (You & Wood, 2006; Khan et al., 2010; Madhukar et al., 2020; Shirsath et al., 2020; Yu et al., 2020). Hence, it is the data from the lower strata of the institutional hierarchy such as at the district or at the sub-national levels or at the scale of a watershed that convey significant information about crop yield and productivity (You & Wood, 2006; Shirsath et al., 2020). Furthermore, while several attempts have been made to compile the agriculture production data at a national and sub-national scale, there still exists an enormous gap in the finer resolution crop yield, crop coverage, availability of longer duration data, and information on several geographic locations (You & Wood, 2006).

Crop yield estimation methods can be broadly classified into four; (i) direct crop cuts, (ii) physical based crop modelling, (iii) statistical models, and (iv) remote sensing methods (Sapkota et al., 2016). Remote sensing methods are more popular as it has potential to map crops and to estimate crop yields at various spatial and temporal scales depending on the image characteristics of the specific remote sensed data products. Many studies have reported the use of these images singly or combinedly to map different crops and crop yield (Yoshikawa & Shiozawa, 2006; Li et al., 2012). The advancement in the optical sensors lies in having high temporal resolution (daily revisits) images such as Moderate Resolution Imaging Spectroradiometer (MODIS), Advanced Very High-Resolution Radiometer (AVHRR), Système Pour l’Observation de la Terre (SPOT).

The remote sensing derived indices are more often used to relate the crop yield and biomass as they have a strong functional relationship with accumulated biomass and yield. Widely used remote sensing derived vegetation and biophysical indices include Normalised Difference Vegetation Index (NDVI), Leaf Area Index (LAI), Green Chlorophyl Vegetation Index (GCVI), Enhanced Vegetation Index (EVI), Gross Primary Productivity (GPP), Triangular Vegetation Index (TVI), Green Normalised Vegetation Index (GNDVI), Simple Ratio (SR), Soil Adjusted Vegetation Index (SAVI), Normalised Difference Red Edge Index (NDREI) (Venancio et al., 2019; Peroni Venancio et al., 2020). The daily scale NDVI data which capture strong periodic characteristics of crop behaviour will be highly useful to closely monitor the relative changes in the vegetation information for establishing long term trend of crop yields, land cover characteristics, vegetation phenology (Gu et al., 2008; Murthy et al., 2007). To a large extent, NDVI not only helps to prepare mapping of crops, but also helps in yield forecasting. However, the scope of this study is not predicting or forecasting crop yield and biomass using some of these remote sensing derived vegetation indices rather it is using these indices to disaggregate or downscale coarse resolution global crop yield datasets to a fine resolution for more detailed and informed decision making in the agricultural sector at local scales.

Downscaling or disaggregation terminology used in this manuscript goes alternatively for the same meaning. Statistical downscaling is the simplest process where a statistical relationship between dependent variable and independent predictor variables can be used to downscale a coarse crop yield data to a finer scale data (Shirsath et al., 2020). The predictor variables used in downscaling coarse resolution crop yield datasets include mainly EVI, NDVI, LAI, and GCVI indices (Lobell et al., 2015; Azzari et al., 2017; Lobell and Azzari, 2017). The study of Shirsath et al., (2020) highlighted the importance of accepting downscaled data from administrative units to have a better utility and relevance. They also presented a simple approach of downscaling district level crop yield data to a local scale using EVI index. Yu et al., (2020) also introduced the model Spatial Production Allocation Model, SPAM2010 which disaggregates crop statistics data such as the yield or harvested area by administrative unit levels, crop type and farming system from coarser to finer scales. Lobell and Azzari (2017) disaggregated soybean yield for 2000—2015 in the Midwestern United States at 30 m resolution using GCVI index. We have identified two main potential limitations of the existing disaggregation methods. One is that the identified biophysical or vegetation indices to disaggregate a coarse resolution crop yield to a local scale not always constant across different season. Secondly, different means of identifying anchor point values on the crop yield and vegetation index for the disaggregation functions poses different degree of bias on the disaggregated yield values. These two research gaps have been addressed in the present study by considering multiple vegetation and biophysical indices for specific season across multiple years rather considering a single index in the disaggregation functions. Secondly, identifying an optimal threshold anchor points based on minimum, maximum, and median percentile values to properly disaggregate global crop yields within the expected or observed crop yield range.

Crop water productivity denotes crop production per unit volume of water applied. Achieving higher yield with less water consumption is the best practice specifically in water scarce regions. Satellite remote sensing techniques have also emerged as another promising field of research in crop productivity, irrigation-water use, and crop water use efficiency (Cazcarro et al., 2019; Johnson & Humphreys, 2021; Safi et al., 2022; Zhou et al., 2021). This significant information assists in approaching resource concerns in agriculture with a sustainable perspective. In this study, we have also derived the crop water productivity maps at a same spatial resolution of crop productivity maps using the disaggregated crop yield and MODIS based evapotranspiration (ET) products.

Based on the above discussions, we aim to develop a fine resolution crop productivity and crop water productivity database at approximately 500 m grid scale using threshold-based disaggregation method in south and southeast Asia region (SSEA). The specific objectives of this study are (i) to downscale coarse resolution global crop yield using threshold-based disaggregation functions, (ii) to assess ET based crop water productivity with downscaled crop yield and MODIS ET data products and (iii) to statistically analyse crop yield, et and crop water productivity variations with respect to unique climate zones in SSEA region.

We also used a reference district level rice yield dataset from International Crops Research Institute for the Semi-Arid Tropics (ICRISAT), India for evaluating the performance of different disaggregation functions with different biophysical and vegetation indices.

Methods

Study Area

Mainland countries in the SSEA region significantly contribute in the world’s food production. The South-East Asia is popularly known as rice-bowl of the world contributing up to 40% of global export (Yuan et al., 2021). Major 16 rice producing mainland countries of SSEA include Vietnam, Thailand, Sri Lanka, Philippines, Pakistan, Nepal, Myanmar, Malaysia, Laos, Indonesia, India, Cambodia, Brunei, Bhutan, Bangladesh, and Afghanistan. The study region spread across 60.5° E to 141° E Longitude and 38.47° N to—10.92° S Latitude. A wide variation in the climate, topography, and culture in SSEA brings unique cropping systems in the region. Global Food Security Support Analysis Data (GFSAD) based 1 km crop mask dataset developed by NASA for SSEA region is shown in Fig. 1a. This particular image was developed from different satellite image products such as Landsat, AVHRR, SPOT, MODIS along with climatic datasets and other supporting ancillary country level statistics (Teluguntla et. al., 2010). Major irrigated area according to this crop extent map is located in the northern and central parts of India, north-eastern part of Thailand and central part of Myanmar while majority of the rainfed areas are located in the east coast and western parts of India and widely across Cambodia, Vietnam, Malaysia, Indonesia, and Philippines. Major crops grown in the SSEA region include Rice, Wheat, Corn, Sugarcane, and Cassava. Different climate zones in the SSEA as per Köppen-Geiger climate classification is also shown in Fig. 1b. The Köppen-Geiger climate classification is based on seasonal precipitation and temperature of the zone or region. The main climate zones in the Köppen-Geiger climate classified map are Tropical, Arid, Temperate, Continental and Polar (Beck et al., 2018). The subclasses of the Köppen-Geiger climate zones are further classified based on temperature and dryness conditions of the region. However, we have considered only the main classes of the climate zones for the statistical analysis in this study (Fig. 1b).

Fig. 1
figure 1

Study area with a Crop mask for major crops and b Climate zones in the south and southeast Asia region

The crop productivity is mainly depending on the climate, soil, topography and crop management of the region. The predominant soil type in SSEA is clay loam. Laterite soils are widespread in some parts of Myanmar, Thailand, and Vietnam. In general, most of the land covered by fertile soils than most of the tropical regions. Soil erosion also less severe when compared to other tropical regions (Frederick & Leinbach, 2020). The climate in SSEA can be classified broadly as monsoonal climate and an average rainfall is more than 1500 mm with wide variation in spatial extent over the study region. Five major river systems, Irrawaddy, Salween, Chao Phraya, Mekong, and Red Rivers, supply water for different sectors including irrigation (Frederick & Leinbach, 2020).

Data

Global Crop Yield Datasets

The global dataset of historical yield (GDHY version 1.2 and version 1.3) from 2000 to 2015 at 0.5° (~ 50 km) gridded data was used in the present study. The GDHY data was accessed from Data Publisher for Earth and Environmental Science, PANGAEA which provides authenticated open-source geospatial datasets for Earth and Environmental research (Lizumi 2019; Lizumi and Sakai 2020). This database can be accessed online at https://doi.pangaea.de/10.1594/PANGAEA.909132. The GDHY crop yield database includes major crops such as Rice, Maize, Wheat, and Soybean with different seasons. However, in this study, we have considered only major season rice crop datasets for downscaling.

Satellite Data Products

Normalised Difference Vegetation Index (NDVI) and Enhanced Vegetation Index (EVI): MODIS vegetation data products version 6.1 (MOD13A1) for NDVI and EVI layers were accessed directly from Google Earth Engine platform. Spatial and temporal resolutions of this MOD16A1 products is 500 m and 16-day composites, respectively (Didan, 2015). This product can be directly accessed online at https://lpdaac.usgs.gov/products/mod13a1v061/. A script in GEE was developed to filter and export NDVI and EVI layers during a major rice growing season (i.e., June 15 through October 15) for the years 2000–2015.

Leaf Area Index (LAI): MODIS combined LAI and Photosynthetically Active Radiation (PAR) data products (MOD15A2H) were filtered for the same period using GEE scripts. The spatial and temporal resolutions of this data products is 500 m and 8-day composites, respectively (Myneni et al., 2015). This product can be directly accessed online at https://lpdaac.usgs.gov/products/mod15a2hv006/.

Gross Primary Product (GPP): MODIS cumulative GPP product (MOD17A2H) was accessed for the same season during 2000–2015 (Running et al., 2015). This particular dataset is available at 500 m spatial resolution and at 8-day composite images. This product can be directly accessed online at https://lpdaac.usgs.gov/products/mod17a2hv006/.

Evapotranspiration (ET): MODIS ET data products (MOD16A2) was accessed for crop water productivity calculation over rice fields during the same season. This product is available at 500 m spatial resolution with 8-day composites temporal aggregation (Running et al., 2017). This data can be directly accessed online at https://lpdaac.usgs.gov/products/mod16a2v006/. Cumulative ET for the primary rice season was calculated by aggregating all 8-day composite images over the season (i.e., June 15 – October 15) for the years 2000—2015.

ICRISAT Rice Yield Data

A district level actual yield data for rice crop was accessed from ICRISAT, India web portal for validating disaggregated global rice yield data products. The ICRISAT district level rice yield data was accessed for the same primary Kharif season for the comparison and validation purpose. This data was accessed directly from ICRISAT India web poral (http://data.icrisat.org/dld/src/Spatialmap.html). All district level rice yield data was downloaded in the CSV file format and later the CSV based rice yield data was linked with corresponding India districts polygon shapefile layer. The shapefile layers of rice yield information were further converted to raster grids for ease of comparison during the validation process.

Downscaling Global Rice Yield Data

Coarse resolution global rice yield data products often downscaled using vegetation related indicators such as NDVI and EVI and biomass related indicator such as LAI and GPP (Dong et al., 2016; Gilardelli et al., 2019). In this study, we first analysed the correlation functions of different biophysical and vegetation indices with crop yield. The best correlated and least biased index was then used to downscale coarse rice yield data to local scales for the specific season (Fig. 2). As rice yield considered in this study was the accumulated grain yield at the time of harvest in the given season, NDVI, EVI, LAI and GPP were also transformed in such a way that these indices represent the specific season in the given year. The specific season rice yields equivalent NDVI, EVI, LAI were obtained by calculating relative increase which was obtained by subtracting maximum values from the minimum values of the respective indices during the same season. However, for GPP index, the specific season rice yields equivalent GPP was obtained by summing all 8-day composite GPP images over that season of the year. The downscaled rice yield at 500 m scale and MODIS ET data were used to calculate the cop water productivity maps as shown in Fig. 2.

Fig. 2
figure 2

Methodology for downscaling crop yield and crop water productivity mapping

Threshold Based Disaggregation Methods

The primary rice season GDHY datasets were only used in this study to downscale coarse gridded data to a local scale. The GDHY primary season rice yield datasets were at 50 km scale while MODIS derived data products such as NDVI, EVI, LAI and GPP were at 500 m scale. Both data products were filtered for the primary season which is June 15—October 15 during 2000–2015. The GDHY rice yield data was in t/ha units. The MODIS based GPP was in kg C/m2 while NDVI, EVI, and LAI were representing vegetation indices with no specific units. The disaggregation methods proposed in the study are explained as follows,

Step 1: Global rice yield raster grids of the year 2000 were polygonised and each polygon values were extracted to a list using GEE scripts.

Step 2: Extracted rice yield values were sorted in ascending order from which 5th (Ymin) and 95th (Ymax) percentile rice yield values were identified for minimum–maximum (min–max) disaggregation approach. Similarly, 5th (Ymin) and 50th (Ymedian) percentile rice yield values were identified for minimum-median (min-med) disaggregation approach.

Step 3: MODIS based NDVI, EVI, LAI and GPP data were aggregated to 50 km scale as the GDHY rice yield data was at 50 km scale for consistency in the spatial sampling.

Step 4: A binary mask layer was created based on the min–max and min-med threshold rice yield values. The binary mask created based on rice yield minimum, maximum and median threshold values were used calculate the respective values from aggregated NDVI, EVI, LAI and GPP layers.

Step 5: The min–max and min-med disaggregation functions with threshold anchor points for Rice Yield-NDVI scatter space conceptualisation diagram is explained in Fig. 3 as follows,

Fig. 3
figure 3

Disaggregation approaches adopted in the present study to downscale rice yield datasets

Step 6: Identified two anchor points in the rice yield versus NDVI /EVI/LAI/GPP scatter space for min–max (Eq. 1) and min-med disaggregation (Eq. 2) approaches were used to downscale the coarse resolution rice yield data into local scale or original MODIS resolution using the following functions, respectively,

$$Y_{down} = \left[ {\frac{{Y_{\max } - Y_{\min } }}{{NDV{\text{I}}_{max} - NDV{\text{I}}_{min} }}} \right]NDVI$$
(1)
$$Y_{down} = \left[ {\frac{{Y_{{{\text{m}} edian}} - Y_{\min } }}{{NDV{\text{I}}_{median} - NDV{\text{I}}_{min} }}} \right]NDVI$$
(2)

Step 7: Comparison of the disaggregated yields at 500 m scale with district level ICRISAT-India rice yield datasets were analysed and the best correlated indices (i.e., NDVI or EVI or LAI or GPP) from the best disaggregation approach (i.e., min-median or min–max) was selected based on the lowest root mean square error (RMSE) and mean absolute error (MAE) values for downscaling coarse rice yield datasets for the year 2000.

Step 8: Step 1 through Step 7 was repeated for subsequent years (i.e., 2001 -2015) and the best correlated variable from best disaggregation functions was identified to downscale the rice yield datasets from 50 km grids to 500 m grids.

Validation with ICRISAT-India Rice Yield Datasets

The Kharif season (i.e., June—October) district-wise rice yield data from ICRISAT-India during 2000–2015 was used for validating disaggregation methods adopted in the present study. The district wise kharif season ICRISAT-India rice yield data in the excel database was linked with India's district level administrative shapefile using Q-GIS software (QGIS Development Team, 2009). The shapefile with rice yield information from 2000–2015 were imported into the GEE Assets background from which these shapefiles were accessed into the GEE scripts for further processing. These shapefiles of ICRISAT rice yield datasets were converted to raster grids at the district scale. Thirty-six random districts were selected to extract the corresponding average downscaled rice yield values. The comparison and validation of the downscaled rice yields with ICRISAT-India rice yields were assessed with RMSE and MAE as follows,

$$RMSE = \sqrt {\frac{1}{n}\sum\limits_{i = 1}^{n} {\left[ {S_{i} - O_{i} } \right]}^{2} }$$
(3)
$$MAE = \frac{1}{n}\sum\nolimits_{i = 1}^{n} {\left| {O_{i} - S_{i} } \right|}$$
(4)

where \(S_{i}\) is the downscaled rice yield value [t/ha]; \(O_{i}\) is the ICRISAT-India observed rice yield [t/ha]; \(\overline{O}\) is the mean value of the observed rice yield values; \(\overline{S}\) is the mean value of the downscaled rice yield estimates; \(n\) is the total number of observations.

Crop Water Productivity Mapping

Crop water productivity (CWP) was estimated with crop productivity (kg/ha) and volume of water consumed by the crops (m3/ha) as per Eq. (5). To calculate crop water productivity, we used the rice crop productivity maps generated at 500 m grid-scale from the proposed downscaling approach in the numerator and MODIS ET data products in the denominator of Eq. (5). The MODIS ET data products provide 8-day composites ET layers. Therefore, these 8-day composites were cumulated over the primary rice season to calculate a seasonal ET at each 500 m grid cell every year from 2000 to 2015. However, the MODIS-ET data products were in ‘mm’ of water units. Therefore, the ‘mm’ of water units was converted to m3/ha units with a conversion factor of 10 for the convenience of CWP calculation. On the other hand, the downscaled rice crop yield raster grids were converted from ‘t/ha’ unit to ‘kg/ha’. The crop water productivity maps were generated at 500 m resolutions with crop productivity and MODIS ET derived seasonal ET maps with the following equation,

$$CWP\,[kg/m^{3} ] = \frac{{{\text{Crop}}\,{\text{productivity [}}kg/ha{]}}}{{{\text{Volume}}\,{\text{of}}\,{\text{water}}\,{\text{consumed}}\,{\text{or}}\,{\text{ET}}\,[m^{3} /ha]}}$$
(5)

Statistical Analysis of Rice Yield, ET and CWP

Spatially distributed rice yield, ET and CWP maps were analysed statistically with respect to unique climate zones in SSEA region. The four main climatic zones map was used to extract the respective rice yield, ET and CWP values from the all the raster layers. Two different statistical tests were conducted to analyse these parameters for their mean and variance variation with respect to the climatic zones. ANOVA test was used to analyse whether the mean values of rice yield across different climate zones are statistically different or not. Similarly, a paired t-test was used to analyse whether the paired climate zones mean are different from each other or not. These two statistical tests, ANOVA and paired t-test were repeated for ET and CWP variables as well.

Results

Selection of Least Biased Variable and Disaggregation Method

The calculated minimum, maximum, and median threshold values based on 5th, 95th, and 50th percentile rice yield and associated NDVI, EVI, LAI, and GPP values are shown in supplementary Tables S1, S2, S3, and S4, respectively. Based on these threshold values, the coarse gridded rice yield images were disaggregated to 500 m raster grids using disaggregated functions as given in the Eqs. (1) and (2). The downscaled rice yield based on NDVI, EVI, LAI, and GPP were compared with ICRISAT-India rice yield at 36 randomly selected districts. The comparison of downscaled rice yields with ICRISAT rice yields for both min–max and min-med downscaling methods are shown in Fig. 4. Figure 4 clearly shows min-med method of downscaling was comparatively better than min–max downscaling method as the estimated rice yield values by earlier method show comparatively lower overestimation with respect to ICRISAT rice yield values across all the years. On the other hand, among all the indices, GPP was showing relatively lower overestimation across all the years. The calculated RMSE and MAE for all these indices among these two downscaling methods also confirm that min-med approach with GPP index were consistently better across all the years. The least RMSE value of 1.04 t/ha was obtained by GPP based min-med approach during 2007 while the least MAE value of 0.81 t/ha was also obtained during the same year 2007. LAI was the next best performing index after GPP while NDVI an EVI indices were highly overestimating the rice yield values consistently across all the years (Tables 1 and 2).

Fig. 4
figure 4

Comparison of min-med and min–max downscaling results for GPP, LAI, NDVI and EVI variables during a 2000–2003, b 2004–2007, c) 2008–2011, and d 2012–2015

Table 1 RMSE values of different indices used in the min-med and min–max disaggregation functions
Table 2 MAE values of different indices used in the min-med and min–max disaggregation functions

Downscaled Rice Yield for the Year 2015

For brevity, coarse gridded rice yield map, downscaled rice yield map using GPP index, and the reference ICRISAT-India rice yield map for 2015 are shown in Fig. 5. Furthermore, we have also shown the enlarged images at three random sites in the study area for a detailed comparison among coarse rice yield, downscaled rice yield and ICRISAT-India rice yield values (Fig. 5). The coarse rice yields were varying from 0.08 to 7 t/ha from Pakistan to Indonesia. Three selected random sites from India, Thailand and Indonesia show a wide range of rice yield due to unique variation in the climate, water availability, and field management practices. A random site at south-eastern part of India shows a wide variation with 2–6 t/ha in the rice yield within the selected region while at other two random sites in the north-eastern part of Thailand and southern part of Indonesia show a relatively lower variation in the rice yield with 4–6 t/ha and 2–2.5 t/ha, respectively within the selected sites (Fig. 5a).

Fig. 5
figure 5

Comparison of spatially distributed primary season a global gridded rice yield, b downscaled rice yield by GPP based min-med method, and c ICRISAT-India rice yield for the year 2015

Figure 5b shows the downscaled rice yield using GPP variable. The downscaled rice yield map captures a similar spatial pattern of rice yield as that of ICRISAT-India rice yield and Global gridded rice yield layers at all 3 random sites. This indicates that the disaggregation approach adopted in this study is reasonable and effective in downscaling rice yield datasets. The comparison of the coarse gridded yield with the downscaled rice yields at three random sites show that the range of rice yield values were also similar at all the sites. Thus, the downscaled rice yield maps preserve both the spatial pattern and the range of rice yield values similar to global gridded rice yield and ICRISAT rice yield values. Since ICRISAT-India rice yield was only available to India, the comparison of the downscaled yields was done at the district level yield values over India (Fig. 5c). The yield variation in the ICRISAT-India map was ranging from 1 to 6 t/ha. The districts from southern Indian states such as Tamil Nadu, Andhra Pradesh, Telangana and Northeast Indian states such as Punjab were recorded with the highest rice yields up to 5–6 t/ha while other states were showing relatively lower yields (Fig. 5c). This is mainly because of intensive irrigation practices in these regions increase the rice yield comparatively higher than any other rainfed based rice farming system in India. Groundwater is the source of irrigation in most of these districts where the yield is relatively high. The comparison of downscaled rice yield and the ICRISAT-India rice yield at the random site in India show a strong similarity which indicates that the performance of the min-med downscaling approach was effective and appropriate at most parts of India, Thailand and other SSEA countries (Fig. 5c).

Statistical Analysis of Downscaled Rice Yield

Downscaled seasonal rice yield maps for all the years from 2000 to 2015 are given in the supplementary materials (Fig. S1). The four major climate zones (i.e., tropical, arid, temperate, cold) with 36 subregions polygon layer was used to extract the rice yield values. The extracted rice yield values from four major climate zones show a distinct variation from each other (Fig. 6). The tropical zone shows the highest variation in the rice yield values which varies from 1.5 to 10 t/ha. Cold and temperature zones were showing relatively similar range of rice yield values with temperature zone being slightly higher yield values than cold zone. The arid zone shows the yield variation from 0.5 to 3 t/ha which is the lowest yield range among other climate zones. This is very true that the arid region lacks significant rainfall and moisture thus affects the crop yield (Fig. 6). However, the ANOVA analysis show that the mean rice yield values across these 4 climate zones are significantly different with a p-value of 2e-16 (Table 3). The paired t-test p-value matrix show that the mean rice yield values of cold-temperate zones are not significantly different from each other while the other paired zones such as arid-cold, arid-temperate, arid-tropical, cold-tropical, and temperate-tropical zones mean rice yield values were significantly different from each other (Table 4).

Fig. 6
figure 6

Downscaled rice yield variation among climate zones

Table 3 ANOVA analysis for rice yield across climatic zones
Table 4 Pairwise t-test p-value matrix for rice yield among climatic zones

Evapotranspiration Variation and Statistical Analysis

MODIS ET for the year 2015 is shown in Fig. 7. The seasonal ET maps for all the years from 2001 to 2015 are given in supplementary materials section (Fig. S2). The MODIS-ET data products was not available for the primary season of the year 2000. Therefore, the ET mapping was done from 2001 through 2015. The ET values were highly varying from Pakistan to Indonesia with a range of < 300 m3/ha to more than 1500 m3/ha, respectively. Specifically, the highest ET values were observed in central Thailand, northeast of India, southern Indonesia and northern Philippines with a range of 1500–4000 m3/ha. A relatively lower ET values in the range of 300–500 m3/ha were observed in the central part of Myanmar, western Thailand and Pakistan (Fig. 7a). The extracted ET values from four major climate zones are shown in Fig. 7b. Tropical zone show a huge variations in the seasonal ET rates which ranges from 500 to 4000 m3/ha while the lowest ET rates were observed in the arid climate which range from 100 to 1000 m3/ha. The arid zone in SSEA include mainly western part of India and central and southern part of Pakistan. The lower ET rates in these regions due to lower moisture availability as these zones comes under great Indian desert or Thar desert. The temperate zone ET rates were ranging from 100 to 2000 m3/ha while cold zone show an ET range of 1000 to 2000 m3/ha. The ET variation across different climate zones were different with different mean values. The ANOVA test for mean ET values across different climate zones in SSEA region show that these mean values were significantly different (Table 5). However, the pairwise t-test statistics show that arid-cold, arid-tropical, arid-temperate and temperate-tropical mean ET values were statistically different at 100% confidence level. The cold-tropical mean ET values were significant only at 99% confidence level while cold-temperate mean ET values were not statistically significant (Table 6).

Fig. 7
figure 7

a Spatial variation of ET over the study area during 2015, and b ET variation with respect to climate zones

Table 5 ANOVA analysis for ET across climatic zones
Table 6 Pairwise T-test p-value matrix for ET among climatic zones

Crop Water Productivity Variation and Statistical Analysis

ET based crop water productivity map of 2015 is shown in Fig. 8a. The seasonal CWP maps for all the years from 2001 to 2015 are given in the supplementary materials (Figure S3). A distinct variation on CWP was observed spatially across SSEA region. Similarly, seasonal variation across all the years in the CWP values was also observed. This indicates the water availability, water use, and crop production varies with season and region. From Fig. 8a, the highest CWP values were observed in the northern part of Laos, central part of Myanmar, southern part of Indonesia with range of 5–7 kg/m3 while the lowest CWP values were observed in the north-eastern part of Thailand and southern India with less than 3 kg/m3. A wide range of CWP values with 1–12.5 kg/m3 from tropical zone indicate that this zone represents a mixture of crop water use and crop production characteristics across multiple countries that comes under this tropical zone such as Sri Lanka, India, Myanmar, Laos, Thailand, Indonesia and Philippines (Fig. 8b). It was also observed that arid and temperate zone CWP values were in the same range while cold zone CWP values were relatively lower than all other zones. The ANOVA based statistical analysis show that the mean CWP values across different zones in SSEA were significantly different (Table 7). A pairwise t-test comparison matrix show that arid-temperate zones mean CWP values were not significantly different. The arid-cold zones based mean CWP values were significant at 99% confidence level while other combinations such as arid-tropical, cold-temperate, cold-tropical, temperate-tropical zones mean values were statistically significant at 100% confidence level (Table 8).

Fig. 8
figure 8

a Spatial variation of CWP over the study area during 2015, and b CWP variation with respect to climate zones

Table 7 ANOVA analysis for crop water productivity across climatic zones
Table 8 Pairwise t-test p-value matrix for crop water productivity among climatic zones

Discussion

Global crop yield data products at coarser spatial scales are, although, useful to understand a large-scale dynamic of climate on the crop productivity, the applicability of these datasets for the understanding of crop variability at local scale is limited due to heterogenous nature of the field conditions. In this study, we have evaluated the functional relationship and effectiveness of different biophysical and vegetation indices to downscale global gridded crop yield data to a relatively finer spatial unit. The basis of selecting NDVI, EVI, LAI and GPP variables for disaggregating coarse crop yield products to a finer resolution was mainly due to their seasonal pattern strongly correlates with crop growth, biomass and yield (Gilardelli et al., 2019; Kang et al., 2016; Li et al., 2014; Machwitz et al., 2014). Nevertheless, the level of dependency of each variable with crop yield on a particular season might differ. Therefore, a season specific correlation analysis of each variable on crop yield was adopted in the present study to identify the least biased variable for effective downscaling of coarse gridded datasets on a year-by-year basis. The accuracy of downscaled results also depends on the anchor points or threshold values to transform coarse gridded data to finer grids. Therefore, we also investigated two different methods of fixing anchor points based on min–max and min-med threshold values from yield and biophysical and vegetation indices scatter space.

The GPP indices were strongly correlated with crop yield as they were relatively less biased when compared to LAI, NDVI and EVI. However, the relative bias between GPP and LAI indices were relatively lesser. These results strongly support the results from a similar study by (Jaafar & Ahmad, 2015) where they conclude that the variable GPP was strongly correlated with summer crop yields in the semiarid and arid agrosystems. The threshold values used in the disaggregation approach could lead to systematic and higher bias in the results if a proper functional relationship with representative values was not considered. This has been explained in this study through demonstrating two different downscaling methods and corresponding model performances. Identifying the anchor point threshold values for vegetation indices and crop yield datasets play a significant role in the effective downscaling process. The min–max downscaling approach used 5th and 95th percentile values of the vegetation indices and crop yield values while the min-med downscaling approach used 5th and 50th percentile values. The min-med approach was effective in the downscaling of rice yield datasets when compared to min–max approach. The min–max approach used the extreme values of global rice yield data and associated biophysical and vegetation indices to transform the coarse yield to local scale which overestimates the actual field level yield values. This is because of higher extreme (95 percentile) global yield values often not matched with local scale extreme values due to a systematic error in the conceptualisation and modelling process. However, the median (50 percentile) global yield values were reasonably matched with local scale median values thus the min-med approach effectively transforms the coarse yield to local yield.

Performance assessment of minimum-median based downscaling model across all the years from 2000 through 2015 show that the MAE values were within 0.81–1 t/ha for most of the years while the maximum MAE never exceeded beyond 1.21 t/ha. This indicates that the proposed downscaling method maximum error with respect to the total range of actual observed yield (i.e., 0–7 t/ha) values not exceeded 15–20%. These results also strongly support from similar studies in such a way that the average error with respect to the actual range of values was within 15–20% (Folberth et al., 2019; Gaso et al., 2019; Gilardelli et al., 2019). The crop water productivity of rice crop was mapped at 500 m grid scale with downscaled rice yield datasets and MODIS-ET data sets. The equivalent ‘mm’ depth of ET units was transformed to ‘m3/ha’ in such a way that the crop water productivity was estimated in ‘kg/m3’ units. The time series crop water productivity maps show a unique trend in spatial and temporal domains in each year from 2000 through 2015. The mapped crop water productivity results were closely aligned with the results from (Morita, 2021) where Maize, Rice, Wheat, and Sugarcane crops crop water productivity was estimated in the south Asia and sub-Saharan Africa from 1980 through 2012.

The ANOVA based statistical analysis show that the downscaled rice yield, ET and CWP across different climate zones were statistically significant (Tables 3, 5, 7). Although these values were significantly different across different climate zones in SSEA region, specific climate zones mean values were not statistically different from other climate zones from a pairwise t-test analysis (Tables 4, 6, 8). The temperate-cold zones mean yield values were not statistically different. Similarly, temperate-cold and arid-temperate climate zones mean values were not statistically different from other combination zones for ET and CWP variables, respectively. These comparison analyses help to identify specific zones that are more productive and efficient in crop water use for the crop production systems.

Conclusions

Crop productivity and crop water productivity information at local field scales is crucial for calibrating crop models, forecasting crop yields, and informed decision making in crop management. This study explored multiple variables and multi model approach to identify an appropriate downscaling method to transform global scale yield values to a local scale at 500 m. It was found that the min-med downscaling approach was more effective than min–max approach. The min-med downscaling approach performance was satisfactory with MAE range of 0.85–1.2 t/ha which lies in the error range of 15–20% with respect to actual range of reference rice yield (0–7 t/ha) datasets. It was found that the GPP variable was strongly correlated with crop yield response when compared to LAI, NDVI and EVI indices.

A distinct variation in the crop productivity and crop water productivity was observed across SSEA region from year to year. In general, tropical zone rice yield and crop water productivity values were highly varying when compared to temperate, arid and cold climate zones. For example, the tropical zone rice yield and crop water productivity values were highly varying in the range of 1–10 t/ha and 1–12.5 t/ha, respectively. The temperate and cold zones rice yield values were at par each other while arid and temperate crop water productivity values were at par each other.

The performance evaluation of the proposed downscaling methods was successfully verified with ICRISAT based district level rice yield datasets. However, the extensive evaluation with field-based rice yield datasets across different climatic zones would be highly recommended for robust validation. Moreover, in this study we have only downscaled the primary season rice yield for SSEA region. But a similar approach can be adopted to downscale multiple seasons and multiple crops coarse resolution global gridded datasets with the proposed downscaling algorithms.