Summary

We combined (i) ambient temperatures and air pollution concentrations with (ii) mortality registers in each region to estimate the location-specific empirical relation between the environment and human health. These so-called “epidemiological associations” thus quantify, for each location, the actual risk of death at any given temperature or air pollution concentration based on real data. Every day, we download and process the new set of updated temperature forecasts for the next 15 days, and the new set of updated air pollution forecasts for the next 4 days, and use the above epidemiological associations to transform them into predictions of temperature related mortality and air pollution related mortality, respectively.

These predictions are grouped into 5 warning categories: a baseline warning state (“none”), when the risk of death is at its minimum, and 4 categories of heat, cold or air pollution warnings (“low”, “moderate”, “high” and “extreme”), corresponding to increasing levels of risk of death. On the one hand, the risk of death due to ambient temperatures generally reaches its minimum twice every year: at the beginning and end of the summer season. In general, the risk of death increases with increasing temperatures in summer (“heat warnings”), and with decreasing temperatures in autumn, winter and spring (“cold warnings”). On the other hand, the risk of death due to air pollution generally reaches its minimum once per year: in winter for ozone (O3), and in summer for nitrogen dioxide (NO2) and particulate matter with an average aerodynamic diameter of up to 2.5 (PM2.5) or 10 (PM10) micrometres. In general, the risk of death increases with air pollution concentrations, with the highest concentrations of O3 in summer, and the highest concentrations of NO2, PM2.5 and PM10 in winter (“air pollution warnings”).

A key aspect of the system is that, in each location, we estimated separate epidemiological associations for each sex and age group. These sex-specific and age-specific epidemiological associations were estimated exclusively with mortality records of the respective sex and age group, and therefore they quantify, for each location, the actual risk of death of the population subgroup at any given temperature or air pollution concentration based on real data. This means that we issue independent health warnings for each population subgroup based on the corresponding sex-specific and age-specific epidemiological association. In general, the risk of death from heat is higher in women than in men, and consequently, heat warnings in summer are expected to be more frequent and of higher category for women than for men. Similarly, the risk of death from heat, cold and air pollution increases with age, and therefore, heat, cold and air pollution warnings are expected to be more frequent and of higher category for the elderly all year round.

We analysed here and here how far in advance we can reliably forecast temperatures and their associated health effects. Ambient temperatures and temperature related mortality risks and health emergencies can generally be forecast with some degree of confidence up to two weeks in advance. We must however strongly emphasise that the reliability of these forecasts and warnings decreases as we predict more distant dates, with very high reliability a few days ahead only. We generally recommend being cautious with temperature forecasts and associated health warnings issued more than 7 days in advance.


Author contributions

Joan Ballester Claramunt: original idea, conceptualisation, overall methodology design, project funding, team creation/coordination, temperature/mortality epidemiological modelling, website descriptions, supervision of all steps.
Mireia Beas-Moix: project management, mortality data acquisition, website licensing.
Nadia Beltrán-Barrón: mortality data processing, website creation/design.
Zhao-yue Chen: air pollution/mortality epidemiological modelling.
Raúl Fernando Méndez Turrubiates: temperature data processing, temperature population weighting.
Fabien Peyrusse: temperature data processing, temperature population weighting.
Marcos Quijal-Zamorano: temperature/mortality epidemiological modelling, temperature/mortality predictability assessment, temperature bias-correction.


Ballester J, Beas-Moix M, Beltrán-Barrón N, Chen ZY, Méndez Turrubiates RF, Peyrusse F, Quijal-Zamorano M. Forecaster.Health. Available at https://forecaster.health/ (2024).

Mortality records

We used the spatiotemporally-homogeneous daily regional mortality database of the project EARLY-ADAPT. As of September 2024, the database contains over 164 million counts of deaths from 654 contiguous NUTS regions in 32 European countries, representing their entire urban and rural population of over 541 million people.


Temperature observations and forecasts

Every day, we obtain the most recent available hourly gridded (0.1° x 0.1°) 2-meter temperatures from the ERA5-Land reanalysis, here considered as a proxy for observations. We also obtain gridded (0.25° x 0.25°) 2-meter temperature forecasts issued at 00 UTC from ECMWF. Forecasts include 51 ensemble members with data every 3 hours at hourly lead times 0 to 144 (i.e. days 1 to 6), and every 6 hours from hourly lead times 144 to 354 (i.e. days 7 to 15). We compute the daily regional temperature observations and forecasts by weighting gridded temperatures with gridded population data for year 2018 from GISCO.
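The population weighting step can be illustrated with a minimal sketch. It assumes that the grid cells belonging to one region have already been extracted, and that temperature and population are given on the same cells; the function name and the sample values are hypothetical and are not taken from the operational system.

```python
import numpy as np

def population_weighted_mean(values, population):
    """Population-weighted mean of gridded values over the cells of one region.

    values, population: arrays of equal shape, one entry per grid cell.
    """
    values = np.asarray(values, dtype=float)
    population = np.asarray(population, dtype=float)
    if values.shape != population.shape:
        raise ValueError("values and population must have the same shape")
    total = population.sum()
    if total <= 0:
        raise ValueError("region has no population")
    return float((values * population).sum() / total)

# Hypothetical region with three grid cells: an uninhabited cold cell
# and two inhabited warmer cells.
temps = np.array([10.0, 20.0, 30.0])    # degrees Celsius
pop = np.array([0.0, 1000.0, 3000.0])   # inhabitants per cell
regional_temp = population_weighted_mean(temps, pop)
print(regional_temp)
```

Uninhabited cells receive zero weight, so the regional value reflects the temperature to which the population is actually exposed rather than the plain area average.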

Then, we post-process the ensemble of temperature forecasts to bias-correct them against the temperature observations used in the epidemiological models. We apply a bias-correction method considering the most recent N = 30 pairs of observations and forecasts with respect to each forecast start date (BC-30). Thus, for any given region \(r\), observation date or forecast start date \(s\), and forecast lead time \(l\) (expressed in days), we calculate the correction \(c\) of the forecast ensemble members as

$$ c(r,s,l)=\frac{1}{N} \sum_{n=1}^{N}\left[o(r,s-n)-f(r,s-n-l+1,l)\right] , $$

where \(o(r,s-n)\) and \(f(r,s-n-l+1,l)\) are the pairs of temperature observations and ensemble mean forecasts for all cases in the training dataset, respectively. In each pair, the forecast started on date \(s-n-l+1\) with lead time \(l\) is valid on the observation date \(s-n\). We then add this correction individually to each of the forecast ensemble members to obtain the ensemble of bias-corrected temperature forecasts.
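As an illustration of the BC-30 correction for a single region, the following sketch computes \(c\) from arrays of past observations and ensemble mean forecasts. The array layout (days as integer indices, lead times stored in columns) and the synthetic data with a constant cold bias of 1.5 °C are assumptions made only for this example.

```python
import numpy as np

def bc30_correction(obs, fc_mean, s, lead, n_pairs=30):
    """Mean difference between observations and ensemble-mean forecasts.

    obs[t]            : observed regional temperature on day t
    fc_mean[t0, l - 1]: ensemble-mean forecast started on day t0 at lead l
                        (lead in days, starting at 1)
    s                 : forecast start date (day index)
    """
    if s - n_pairs - lead + 1 < 0:
        raise ValueError("not enough past forecasts for this start date and lead")
    diffs = []
    for n in range(1, n_pairs + 1):
        valid_day = s - n              # date of the observation o(r, s-n)
        start_day = s - n - lead + 1   # start date of the matching forecast
        diffs.append(obs[valid_day] - fc_mean[start_day, lead - 1])
    return float(np.mean(diffs))

# Synthetic example: forecasts equal the observations minus 1.5 degrees.
n_days, max_lead = 100, 15
days = np.arange(n_days)
obs = 15.0 + 10.0 * np.sin(2 * np.pi * days / 365.0)
fc_mean = np.full((n_days, max_lead), np.nan)
for t0 in range(n_days):
    for l in range(1, max_lead + 1):
        valid = t0 + l - 1
        if valid < n_days:
            fc_mean[t0, l - 1] = obs[valid] - 1.5

c = bc30_correction(obs, fc_mean, s=80, lead=5)

# The same correction is added to every ensemble member of the new forecast.
members = np.array([18.0, 19.0, 20.5])
corrected = members + c
```

In this synthetic case the correction recovers the imposed bias, so each member is shifted upwards by 1.5 °C while the ensemble spread is left unchanged.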


We used a time-series quasi-Poisson regression model in each region to derive estimates of region-specific temperature-lag-mortality risks with data from the period 2000-2019, following the methodology described here and here. The model equation is:

$$ \log(E(mort))=intercept+S(\textit{time, 8 df per year})+dow+cb, $$

where \(mort\) denotes the daily time series of mortality counts; \(E\) corresponds to its expected value; \(S\) is a natural cubic spline of time with 8 degrees of freedom per year to adjust for the seasonal and longer-term trends; \(dow\) corresponds to a categorical variable to control for the day of the week; and \(cb\) is the cross-basis function produced by a distributed lag non-linear model combining the exposure-response and lag-response associations. The exposure-response association was modelled with a natural cubic spline, with three internal knots placed at the 10th, 75th and 90th percentiles of the observed distribution of daily regional temperatures. The lag-response association was modelled with a natural cubic spline, with an intercept and three internal knots placed at equally-spaced intervals in the logarithmic scale, with lags ranging between 0 and 21 days. We then performed a multivariate multilevel meta-analysis, modelling dependencies of regions within countries through structured random effects, and including the location-specific temperature average and interquartile range as meta-predictors. The fitted meta-analytical model was used to derive the best linear unbiased predictions of the cumulative temperature-mortality association in each region, from which we estimated the regional minimum mortality temperature.

Every day, we transform the temperature observations and bias-corrected temperature forecasts into temperature related mortality (TRM) estimates and predictions, respectively. TRM represents the fraction of deaths attributable to non-optimal temperatures, calculated as

$$ TRM(d)=1-\frac{1}{RR(T(d))}, $$

where \(RR(T(d))\) is the relative risk at temperature \(T(d)\) of a given observed or forecast date \(d\). The relative risk is computed from the respective regional cumulative temperature-mortality association, centred at its minimum mortality temperature. We created five warning categories, for temperature related mortality values smaller than 5% (“none”), between 5% and 10% (“low”), between 10% and 15% (“moderate”), between 15% and 20% (“high”) and higher than 20% (“extreme”). “Cold” and “heat” warnings correspond to days with temperatures colder or warmer than the respective regional minimum mortality temperature, respectively.
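The conversion from relative risk to warning category can be sketched as follows. The treatment of values falling exactly on a threshold (assigned here to the higher category) is an assumption of this example, since the text only gives the ranges; the function names and sample values are hypothetical.

```python
def attributable_fraction(rr):
    """Fraction of deaths attributable to the exposure: 1 - 1/RR."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return 1.0 - 1.0 / rr

# Upper bounds of the first four categories, as fractions of deaths.
TRM_THRESHOLDS = [(0.05, "none"), (0.10, "low"), (0.15, "moderate"), (0.20, "high")]

def temperature_warning(rr, temperature, mmt):
    """Return (TRM, warning level, warning type) for one day and region.

    rr          : relative risk at the day's temperature, centred at the MMT
    temperature : regional temperature of the day
    mmt         : regional minimum mortality temperature
    """
    trm = attributable_fraction(rr)
    level = "extreme"
    for upper, name in TRM_THRESHOLDS:
        if trm < upper:  # assumption: boundary values go to the higher category
            level = name
            break
    if level == "none":
        kind = None
    else:
        kind = "heat" if temperature > mmt else "cold"
    return trm, level, kind

# Hypothetical hot day: RR = 1.2 at 33 degrees, with an MMT of 20 degrees.
trm, level, kind = temperature_warning(1.2, 33.0, 20.0)
print(round(trm, 4), level, kind)  # 0.1667 high heat
```

A relative risk of 1.2 corresponds to an attributable fraction of about 16.7%, which falls between 15% and 20% and therefore yields a “high” heat warning.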


Air pollution observations and forecasts

As described here and here, we used quantile machine learning models to estimate daily concentrations of particulate matter with an average aerodynamic diameter of up to 2.5 (PM2.5) or 10 (PM10) micrometres, nitrogen dioxide (NO2) and the maximum daily 8-hour average of ozone (O3) at a 0.1° x 0.1° spatial resolution across Europe. The models were trained on ground-monitoring data and multiple spatiotemporal predictors, including satellite retrievals, land-use and meteorological and atmospheric reanalysis variables. These estimates were here considered as a proxy for observations, and used to fit the epidemiological models of air pollution, together with data of 2-meter temperature and relative humidity from the ERA5-Land reanalysis (see next section).

Every day, we obtain gridded (0.1° x 0.1°) surface air pollution forecasts of PM2.5, PM10, NO2 and O3 issued at 00 UTC from CAMS, with data every hour at hourly lead times 0 to 96 (i.e. days 1 to 4). We use the median value of the available 11-member ensemble, here considered as the most reliable and stable forecasts.

We compute the daily regional air pollution observations and forecasts by weighting gridded air pollution concentrations with gridded population data for year 2018 from GISCO.


We used a time-series quasi-Poisson regression model in each region to derive estimates of region-specific air pollution-lag-mortality risks with data from the period 2003-2019. The model equation is:

$$ \log(E(mort)) = intercept + S(\textit{time, 8 df per year}) + S(\textit{temperature, lags 0-3 days, 6 df}) + $$ $$ S(\textit{relative humidity, lag 0 days, 3 df}) + dow + cb(\textit{air pollutant}), $$

where \(mort\) denotes the daily time series of mortality counts; \(E\) corresponds to its expected value; \(S\) is a natural cubic spline; \(dow\) corresponds to a categorical variable to control for the day of the week; and \(cb\) is the cross-basis function produced by a distributed lag non-linear model combining the exposure-response and lag-response associations. The exposure-response association was modelled with a linear term, and the lag-response association with integer values, ranging from lag 0 to a maximum lag of 2 days for PM2.5, PM10 and NO2, and 3 days for O3. The model was controlled for a natural cubic spline of time with 8 degrees of freedom per year to adjust for the seasonal and longer-term trends, a natural cubic spline of temperature averaged over lags 0-3 days with 6 degrees of freedom, and a natural cubic spline of relative humidity at lag 0 with 3 degrees of freedom. We then performed a multivariate multilevel meta-analysis, modelling dependencies of regions within countries through structured random effects, and including the following location-specific meta-predictors: (i) average and (ii) interquartile range of the corresponding air pollutant, (iii) temperature average, (iv) relative humidity average, (v) rate of elderly residents (65 years or older) and (vi) natural logarithm of gross domestic product per capita. The fitted meta-analytical model was used to derive the best linear unbiased predictions of the cumulative air pollution-mortality association in each region.

Every day, we transform the air pollution observations and forecasts into air pollution related mortality (APRM) estimates and predictions, respectively. APRM represents the fraction of deaths attributable to air pollution, calculated as

$$ APRM(d) = 1-\frac{1}{RR(AP(d))}, $$

where \(RR(AP(d))\) is the relative risk at air pollution concentration \(AP(d)\) of a given observed or forecast date \(d\). The relative risk is computed from the respective regional cumulative air pollution-mortality association, centred by subtracting a reference level of 0 µg/m3 for PM2.5, PM10 and NO2, and 70 µg/m3 for O3. We created five warning categories, for air pollution related mortality values smaller than 1% (“none”), between 1% and 2% (“low”), between 2% and 3% (“moderate”), between 3% and 4% (“high”) and higher than 4% (“extreme”).
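Because the exposure-response association is linear, the relative risk can be written as \(RR = \exp(\beta \, (AP - AP_{ref}))\), where \(\beta\) is the cumulative log relative risk per µg/m3. The sketch below applies this form together with the warning thresholds. The value of \(\beta\) is purely hypothetical, the function names are illustrative, and assigning boundary values to the higher category is an assumption of this example.

```python
import math

# Reference levels used for centring, in µg/m3 (from the text above).
REFERENCE = {"PM2.5": 0.0, "PM10": 0.0, "NO2": 0.0, "O3": 70.0}

# Upper bounds of the first four categories, as fractions of deaths.
APRM_THRESHOLDS = [(0.01, "none"), (0.02, "low"), (0.03, "moderate"), (0.04, "high")]

def aprm(pollutant, concentration, beta):
    """Air pollution related mortality fraction, 1 - 1/RR.

    beta: cumulative log relative risk per µg/m3 (hypothetical in this example).
    """
    rr = math.exp(beta * (concentration - REFERENCE[pollutant]))
    return 1.0 - 1.0 / rr

def air_pollution_warning(fraction):
    """Map an APRM fraction to its warning category."""
    for upper, name in APRM_THRESHOLDS:
        if fraction < upper:  # assumption: boundary values go to the higher category
            return name
    return "extreme"

# Hypothetical PM2.5 episode: 40 µg/m3 with beta = 0.0008 per µg/m3.
pm25_fraction = aprm("PM2.5", 40.0, beta=0.0008)
pm25_level = air_pollution_warning(pm25_fraction)
print(round(pm25_fraction, 4), pm25_level)  # 0.0315 high
```

With these illustrative numbers, about 3.1% of deaths would be attributed to PM2.5, which falls in the “high” category.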