Verification of Simulations
We regularly conduct extensive verifications of our own and other simulation models by comparing them with measurement and observation data. In this way, we ensure that our services deliver top-quality (and continuously improving) weather data, both historical and forecast. meteoblue was the first commercial weather service to regularly publish verification data on its website, which it has done since 2010, and it also publishes daily local accuracy updates.
Why do we publish our verifications?
- We are transparent: Weather is not "chaos", and our customers should know what they receive.
- We deliver quality: Our accuracy is so high that it is worth showing.
- We are realistic: You should know what to expect from a forecast - and what not to expect.
- We are competitive: If someone believes we are not good enough - show us how to do better.
What does meteoblue simulation quality mean? This page shows the results of some of the most important studies.
Weather forecasts have improved tremendously over the last decades
Numerical weather forecast models have been continuously improved over the last decades. Around 1980, the 24-hour forecast of air temperature reached an accuracy of around 70%. By 2018, the accuracy of the 24-hour forecast had increased to around 90%, and today's 72-hour forecast is as good as the 24-hour forecast was 40 years ago. In numerical weather forecast models, the 500 hPa geopotential height is simulated with even higher accuracy than the 2 m air temperature. The evolution of model accuracy over time can be seen in the following figure:
Evolution of the forecast skill [%] of the 500hPa geopotential height from 1980 - 2013 (Source: ECMWF).
Three main factors are responsible for the increasing model accuracy during the last 40 years:
- The initial conditions of the numerical weather forecast model are estimated significantly better than 40 years ago. New meteorological measurement techniques (e.g. satellite observations) and more accurate measurements are responsible for this improvement.
- Finer horizontal (and vertical) resolution of the numerical weather forecast models due to more computational power.
- Better sub-grid parametrisations in the numerical models than 40 years ago.
The accuracy of a weather simulation model significantly depends on the chosen meteorological variable. Meteorological variables like 2m air temperature, surface pressure or the 500hPa geopotential height are typically calculated with high accuracy, whereas other variables (e.g. precipitation, wind gusts, etc.) have a lower accuracy, typically caused by small-scale spatial variations, which are not resolved in weather models.
meteoblue verification for historical and forecast data
In the following, we show the meteoblue model accuracy for different meteorological variables and the model skill of meteoblue multimodels, MOS, reanalysis models and stand-alone ("raw") numerical weather forecast models.
The verification of numerical weather forecast models is highly relevant for all stakeholders, as it shows whether the models have higher skill than simple climatological forecasts or persistence forecasts ("the weather tomorrow is the same as today").
Four meteorological variables (air temperature, wind speed, precipitation and dewpoint temperature) were verified at more than 10'000 meteorological stations worldwide for the year 2017 by analysing the accuracy of several raw ("stand-alone") weather forecast models, satellite observations and reanalysis models. Additionally, the accuracy of different multimodel approaches was tested and compared against the raw models and a 24-hour forecast from model output statistics (MOS).
We distinguish between historical data sets and forecast data sets, based on the availability of the model data.
The following table shows the mean absolute error (MAE) determined for each method and variable at 10'000 weather stations worldwide for the year 2017. The MAE is computed on an hourly basis (in K for temperatures and m/s for wind speed), except for annual precipitation, where it is computed on yearly sums (in mm).
Comparison of the mean absolute error (MAE) for four different meteorological parameters for more than 10'000 weather stations worldwide. The analysis was conducted based on measurements recorded in 2017:
|Data set|Model approach|Air temperature|Wind speed|Annual precipitation|Dewpoint temperature|
|---|---|---|---|---|---|
|Forecast|meteoblue Learning Multimodel|1.2 K|-|170 mm|-|
|Forecast|Weather forecast models|1.7 - 2.2 K|1.5 - 1.7 m/s|220 - 230 mm|1.9 - 2.4 K|
|History|Real-time updates (NEMS30)|2.1 K|1.7 m/s|220 mm|2.2 K|
|History|Reanalysis model|1.5 K|1.5 m/s|120 - 180 mm|1.6 K|
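A minimal sketch of how MAE values like those in the table can be computed, assuming paired model and station values are already available. For hourly variables, the MAE is the mean of the absolute hourly differences; for annual precipitation, we assume each station's values are first summed over the year and the MAE is then taken across stations. The function names and data layout are illustrative assumptions, not meteoblue's implementation.

```python
from typing import Dict, Sequence


def mae(model: Sequence[float], obs: Sequence[float]) -> float:
    """Mean absolute error of paired model/observation values."""
    if len(model) != len(obs) or not model:
        raise ValueError("model and obs must be non-empty and of equal length")
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(model)


def annual_sum_mae(model_by_station: Dict[str, Sequence[float]],
                   obs_by_station: Dict[str, Sequence[float]]) -> float:
    """MAE of annual precipitation sums (assumed procedure): sum each
    station's values over the year first, then average the absolute
    differences across stations."""
    stations = sorted(model_by_station)
    model_sums = [sum(model_by_station[s]) for s in stations]
    obs_sums = [sum(obs_by_station[s]) for s in stations]
    return mae(model_sums, obs_sums)


# Hypothetical hourly temperatures [°C] at one station
print(mae([10.0, 12.0, 14.0], [11.0, 12.0, 12.0]))
```

Averaging annual sums rather than hourly values means that compensating errors within the year cancel out, which is why the annual precipitation MAE is not directly comparable to an hourly precipitation error.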
From an accuracy perspective, we recommend the following sources for spatial (worldwide) weather data:
|Variable|History (highest accuracy)|History (real-time updates)|Forecast|
|---|---|---|---|
|Air temperature|ERA5|NEMS local, NEMS30|meteoblue Learning Multimodel (mLM)|
|Wind speed|ERA5|NEMS local, NEMS30|meteoblue MOS and model mix|
|Precipitation (daily events)|ERA5 (all precipitation events); CMORPH (heavy precipitation events)|NEMS local, NEMS30|meteoblue Learning Multimodel (mLM)|
|Precipitation (annual sums)|historical meteoblue model mix|historical meteoblue model mix|meteoblue Learning Multimodel (mLM)|
|Dewpoint temperature|ERA5|NEMS local, NEMS30|meteoblue MOS and model mix|
For historical analysis, reanalysis models offer the highest accuracy, but they are only available with a time lag of 2-5 days (CMORPH) to 2-3 months (ERA5), and do not yet cover 20 years. For applications that require real-time updates, consistency, a time span of more than 30 years and multiple variables, NEMS30 is currently the only available solution.
The 2 m air temperature is best simulated by the meteoblue Learning Multimodel (mLM), with an MAE of 1.2 K. The MOS air temperature forecast achieves the same accuracy as the reanalysis model ERA5 (MAE = 1.5 K), which is recommended for historical data sets. The ‘stand-alone’ (RAW) global weather forecast models reach MAE values between 1.7 and 2.2 K. Hence, the 6-day forecast of the meteoblue multimodel is as good as the 1-day forecast of a ‘stand-alone’ (RAW) numerical weather forecast model.
For the forecast 10 m wind speed, the MAE is 1.5 – 1.7 m/s for ‘stand-alone’ weather forecast models; for historical data, the reanalysis model ERA5 reaches 1.5 m/s. With MOS, the error could be reduced to 1.2 m/s.
meteoblue calculates radiation for the land and sea surface and for atmospheric layers, both as incoming direct and indirect sunlight and as radiation reflected by clouds or the surface. The meteoblue simulations of global surface radiation are consistent across continents and reach a monthly mean absolute error of 1-15% at 95% of all locations.
The skill for daily precipitation events decreases with increasing precipitation intensity. Numerical weather forecast models are the best source for detecting small precipitation events. For heavy precipitation events, satellite observations have higher skill than numerical weather forecast models. Mixing two (or more) models did not increase the skill for daily precipitation events.
For historical data, annual precipitation sums are best calculated using satellite observations from CHIRPS2, which are bias-corrected with the same measurement data set that was used for verification in this study. The accuracy of CHIRPS2 in regions without measurement stations is therefore expected to be significantly lower and is, to a certain extent, unknown.
The accuracy for the dewpoint temperature is slightly lower than for the air temperature. MAE values are between 1.9 and 2.4 K for numerical weather forecast models and 1.6 K for a reanalysis model. The accuracy of model simulations with MOS is in a similar range to that of the reanalysis model.
Global Patterns of Predictability
The following section gives a summary of a study on global patterns of predictability. This study was conducted by meteoblue to give an overview of the ability of weather models to simulate local temperature and wind speed. The definition of predictability allows users to estimate in advance the precision of a weather simulation for a given place or region.
Combined predictability for temperature, wind and dewpoint
- A multivariate predictability index was derived from the MOS forecast (model output statistics), which corrects for bias and other local effects.
- The hourly absolute errors of the MOS simulations for temperature, wind speed and dewpoint temperature are scaled linearly between 0 and 1.
- The absolute errors for temperature, wind speed and dewpoint temperature are weighted equally. The scaling bounds for each variable are set such that the 10% of values with the largest errors receive an index of 0 and the 10% with the smallest errors receive an index of 1.
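The scaling described above can be sketched as follows. This is a minimal illustration, not meteoblue's code: it assumes the three error series are aligned (same stations and hours), uses linearly interpolated percentiles as the scaling bounds, and clips values outside the bounds to 0 or 1. The names `scale_errors` and `predictability_index` are hypothetical.

```python
from typing import List, Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile with linear interpolation between ranks (q in [0, 100])."""
    s = sorted(values)
    if not s:
        raise ValueError("values must not be empty")
    pos = (len(s) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(s) - 1)
    return s[lower] + (s[upper] - s[lower]) * (pos - lower)


def scale_errors(errors: Sequence[float], tail: float = 10.0) -> List[float]:
    """Map absolute errors linearly to [0, 1]: errors at or below the `tail`
    percentile get 1, errors at or above the (100 - tail) percentile get 0."""
    best = percentile(errors, tail)
    worst = percentile(errors, 100.0 - tail)
    if worst <= best:
        raise ValueError("errors have no spread between the scaling bounds")
    return [min(1.0, max(0.0, (worst - e) / (worst - best))) for e in errors]


def predictability_index(temp_err: Sequence[float],
                         wind_err: Sequence[float],
                         dew_err: Sequence[float]) -> List[float]:
    """Equal-weight mean of the three scaled error series."""
    scaled = [scale_errors(v) for v in (temp_err, wind_err, dew_err)]
    return [sum(triple) / 3.0 for triple in zip(*scaled)]


errors = [float(e) for e in range(11)]  # hypothetical absolute errors 0..10
print(scale_errors(errors))
```

Clipping the tails keeps a few extreme errors from compressing the index of all other points into a narrow range, which is presumably why percentile bounds are used instead of the minimum and maximum.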
The predictability index gives a good overview of the ability of numerical models to predict weather variables (independent of local effects).
Predictability (expected quality of simulation) is very variable across the globe:
- The UK, northern France, Germany and the tropics are the easiest to predict!
- For temperature, continental climates are more difficult to predict than maritime climates.
- Wind speed predictability is lowest at coastlines.
- Precipitation was not analysed in this study; it is nevertheless most difficult to predict in the tropics.
The predictability index allows a clear differentiation between regions:
- Regional indices can be assigned to define the expected simulation accuracy.
NMMb can be used to generate high-precision weather simulations that are slightly more precise than existing global models (GFS). It additionally offers global hourly resolution, which helps to better define phenomena such as predictability.
The concept of predictability can be used in economic models to calculate regional probabilities of forecast accuracy. Possible applications include:
- Forecasts: setting warning thresholds (e.g. for predicting the chance of frost)
- History and archive: calculating the economic impact of weather-based measures (e.g. the frequency of occurrence of certain temperatures, wind speeds, etc.).
NMMb hourly global simulations are now available from 1994 to 2014, and can be used to reproduce weather conditions in any part of the world with a known accuracy.
Sensitivity to the forecast hours
Model performance typically decreases with increasing forecast lead time. With the meteoblue Learning Multimodel (mLM), the MAE of the 2 m air temperature is about 1.2 K for the 24-hour forecast and about 2.0 K for the 6-day forecast. This implies that the 6-day forecast of the mLM is as good as the 24-hour forecast of stand-alone numerical weather forecast models (MAE = 1.7 - 2.2 K).
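The comparison between lead times can be made explicit with a small helper. It is a sketch under two assumptions that the source does not state: MAE grows monotonically with lead time, and it varies linearly between the quoted points (1.2 K at 24 h and 2.0 K at 144 h for the mLM). The helper `equivalent_lead_time` is hypothetical.

```python
from typing import Sequence


def equivalent_lead_time(lead_hours: Sequence[float],
                         mae_k: Sequence[float],
                         reference_mae: float) -> float:
    """Longest lead time [h] at which the linearly interpolated MAE
    does not exceed `reference_mae` (assumes MAE increases with lead time)."""
    if len(lead_hours) != len(mae_k) or len(lead_hours) < 2:
        raise ValueError("need at least two (lead time, MAE) pairs")
    if reference_mae < mae_k[0]:
        raise ValueError("reference is better than the shortest lead time")
    for i in range(len(lead_hours) - 1):
        h0, h1 = lead_hours[i], lead_hours[i + 1]
        e0, e1 = mae_k[i], mae_k[i + 1]
        if e0 <= reference_mae <= e1:
            if e1 == e0:
                return float(h1)
            return h0 + (reference_mae - e0) / (e1 - e0) * (h1 - h0)
    # Reference not reached within the given range: cap at the last lead time
    return float(lead_hours[-1])


# mLM values quoted above: 1.2 K at 24 h, 2.0 K at 144 h (6 days)
for raw_24h in (1.7, 2.0, 2.2):
    print(raw_24h, round(equivalent_lead_time([24, 144], [1.2, 2.0], raw_24h), 1))
```

References above the last quoted MAE are capped at 144 h, since the text gives no mLM values beyond 6 days.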
MAE [K] as a function of the forecast hours for the mLM for single analysis days and the average (black). The 24h forecast error for MOS (blue) and the raw models (red) is additionally shown.
A quick introduction to the meteoblue learning multimodel can be downloaded here: mlm_leaflet.pdf
24 hour forecast of historical and forecast data
Our results show that the meteoblue Learning Multimodel (mLM), which is used in the operational forecast, performs significantly better than MOS simulations and the historical reanalysis model ERA5 (MAE = 1.2 K vs. MAE = 1.5 K). Stand-alone numerical weather forecast models (e.g. NEMS, GFS) are significantly less accurate than MOS and, in particular, the mLM.
More than 90% of all meteorological stations have an MAE below 2 K with the meteoblue Learning Multimodel (mLM). This share drops to 85% with the reanalysis model ERA5 and to 50% (36%) with the stand-alone numerical weather forecast model NEMS (GFS).
Continental and high-elevation regions are typically simulated less accurately than maritime and low-elevation regions. Errors in Europe and North America are typically lower than in the Southern Hemisphere. Air temperatures are typically simulated less accurately in Northern Hemisphere winter than in summer.
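Station shares like those above follow from computing one MAE per station and counting the stations below the threshold. A minimal sketch with hypothetical station data; we assume "better than 2 K" means an MAE strictly below 2 K.

```python
from typing import Dict, Sequence, Tuple


def station_mae(model: Sequence[float], obs: Sequence[float]) -> float:
    """MAE of one station's paired hourly values."""
    return sum(abs(m - o) for m, o in zip(model, obs)) / len(obs)


def share_below(pairs: Dict[str, Tuple[Sequence[float], Sequence[float]]],
                threshold: float = 2.0) -> float:
    """Fraction of stations whose MAE is strictly below `threshold`."""
    if not pairs:
        raise ValueError("no stations")
    maes = [station_mae(model, obs) for model, obs in pairs.values()]
    return sum(1 for v in maes if v < threshold) / len(maes)


# Hypothetical stations: (model, observed) hourly temperatures [°C]
stations = {
    "A": ([10.0, 11.0], [10.5, 11.5]),  # MAE 0.5 K
    "B": ([5.0, 6.0], [8.0, 9.0]),      # MAE 3.0 K
    "C": ([0.0, 1.0], [1.0, 2.5]),      # MAE 1.25 K
    "D": ([20.0, 22.0], [21.9, 24.0]),  # MAE about 1.95 K
}
print(f"{share_below(stations):.0%}")
```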
Model performance of the 24h forecast of the mLM (top panel), the reanalysis model ERA5 (bottom left) and the numerical weather forecast model GFS (bottom right) for September – October 2018.
NEMS Global 25km - verification of temperature
Temperature simulations with MOS:
- Corrects most errors
- 92% of stations with MAE (Mean Absolute Error) < 2.0°C
- Improvement vs. RAW = 0.8°C
- 85% of all hourly errors < 2.0°C
Used in all meteoblue forecasts.
NEMS Global - verification of temperature
meteoblue predicts more than 70% of all temperatures with less than 2°C difference from measured temperature - 3 days (72 hours) in advance. For 12 hours ahead, more than 80% of all temperature forecasts are less than 2°C different from measurement - on an hourly basis. The RMSE (Root Mean Square Error) of the hourly forecast is less than 2.5°C for up to 3 days forecasts, and around 2°C for 1 day forecast (these data are valid for Europe and North America). What does that mean?
Personally: If you assume that you can only feel temperature differences larger than 2°C, then at least two thirds of all meteoblue hourly temperature forecasts are already correct 3 days in advance! This means that when you look at a meteoblue forecast for the next 72 hours, you will experience the temperature meteoblue forecasts during at least 48 of those hours (2/3 of 72 hours).
Technically: If you take temperature measurements made every 3 hours at a weather station, interpolate them to hourly values and compare these with the actual hourly measurements, the RMSE will be 1.5 - 2.0°C. The meteoblue forecast error is 2.2°C on average. This means that the temperature forecast is as good as a measurement every 6 hours.
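The "technical" comparison can be reproduced on any hourly series: keep every third value, fill the hours in between by interpolation and compute the RMSE against the full record. The sketch assumes linear interpolation (the text does not name the method) and uses a synthetic series, so its output is not meant to reproduce the 1.5 - 2.0°C quoted above, which requires real station data. Function names are hypothetical.

```python
import math
from typing import List, Sequence


def interpolate_from_every_nth(hourly: Sequence[float], step: int = 3) -> List[float]:
    """Keep every `step`-th value and fill the hours in between by linear
    interpolation. Values after the last kept hour are left unchanged."""
    out = list(hourly)
    kept = list(range(0, len(hourly), step))
    for a, b in zip(kept, kept[1:]):
        for h in range(a + 1, b):
            w = (h - a) / (b - a)
            out[h] = hourly[a] * (1 - w) + hourly[b] * w
    return out


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean square error of two equally long series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Synthetic 49-hour series (hours 0..48, so the last hour is a kept value):
# a diurnal cycle plus an irregular 1.5 K bump every 7 hours.
hourly = [10 + 5 * math.sin(2 * math.pi * h / 24) + (1.5 if h % 7 == 0 else 0.0)
          for h in range(49)]
print(round(rmse(hourly, interpolate_from_every_nth(hourly)), 2))
```

Smooth changes are recovered almost perfectly by interpolation; the error comes from short, irregular fluctuations between the kept measurements, which is also what limits an hourly forecast.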
Example of use
In the following example the probability of a frost event is shown.
Hypothetical example: You are a farmer and worried that frost (temperatures below 0°C) will damage your crop. The model error is within +/- 2°C in 85% of cases, so there is a 15% risk that the actual temperature deviates by more than 2°C. Since only deviations towards colder temperatures matter to you, the risk is half of that: 7.5%. More precisely, at a forecast temperature of 2°C there is a 7.5% probability that the temperature falls below 0°C. If the critical temperature for your crop is -1°C, the risk is even lower, at 3.75%, and for a critical temperature of -2°C it is 1.875%.
The same reasoning applies to a model error of +/- 1°C in 85% of cases: the probability decreases, and the risk that the temperature falls below 0°C is only 3.75%.
If the forecast is only within the +/- 2°C error band in 60% of cases, the risk is as follows: at a forecast temperature of 2°C, the probability that the temperature falls below 0°C is 20%, and 10% for falling below -1°C.
All these considerations assume that you are estimating the risk of temperatures below 0°C at a forecast temperature of 2°C. If these calculations are made for a different forecast, e.g. 1°C, the risk changes.
The following figure visualises these relationships.
Visualisation of the risk that the temperature drops below 0°C.
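The rule of thumb used in this example can be written as a short function. Note that it encodes the example's simplified reasoning (half of the out-of-band errors are too cold, and the risk halves for each additional degree), not a fitted error distribution; the function name and parameters are hypothetical.

```python
def frost_risk(forecast_c: float, threshold_c: float,
               error_band_c: float = 2.0, coverage: float = 0.85) -> float:
    """Probability that the temperature falls below `threshold_c`, following
    the rule of thumb in the example above:
    - a share (1 - coverage) of errors is larger than +/- error_band_c,
      and half of those are too cold;
    - the risk halves for every further degree between the threshold
      and the edge of the error band.
    Only defined when the threshold lies at or beyond the error band."""
    excess = (forecast_c - threshold_c) - error_band_c
    if excess < 0:
        raise ValueError("threshold lies inside the error band; rule not defined")
    return (1.0 - coverage) / 2.0 * 0.5 ** excess


for threshold in (0.0, -1.0, -2.0):
    print(f"below {threshold:+.0f} °C: {frost_risk(2.0, threshold):.3%}")
```

`frost_risk(2.0, 0.0)` gives 7.5%, `frost_risk(2.0, -1.0)` gives 3.75%, `frost_risk(2.0, 0.0, error_band_c=1.0)` gives 3.75% and `frost_risk(2.0, 0.0, coverage=0.6)` gives 20%, matching the figures in the text.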
Introduction of the multimodel approach for radiation in 2017
A meteoblue radiation verification study from 2017 found the following results for hourly radiation simulations:
- Bias < 10% for all sources
- The hourly MAE* lies between 5 and 30% for multimodel simulations (standard for forecast and history simulations)
- The hourly MAE* lies between 5 and 20% for intraday forecasts and satellite observations
- These findings are consistent over all continents
*MAE: Mean absolute error in percent of the mean observation value. For further information, please consult the full document of the 2017 radiation verification study.
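The relative MAE defined in the footnote, and the relative bias, can be computed as below. A minimal sketch: whether night-time hours (with zero radiation) are excluded before averaging is not stated in the source, so the example simply uses daylight values; all names and numbers are hypothetical.

```python
from typing import Dict, Sequence


def relative_scores(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """MAE and bias (model minus observation) in percent of the mean
    observed value."""
    n = len(obs)
    if n == 0 or len(model) != n:
        raise ValueError("need non-empty paired series")
    mean_obs = sum(obs) / n
    if mean_obs == 0:
        raise ValueError("mean observation is zero")
    mae = sum(abs(m - o) for m, o in zip(model, obs)) / n
    bias = sum(m - o for m, o in zip(model, obs)) / n
    return {"mae_pct": 100.0 * mae / mean_obs, "bias_pct": 100.0 * bias / mean_obs}


# Hypothetical hourly global radiation [W/m²], daylight hours only
obs = [200.0, 400.0, 600.0]
model = [220.0, 380.0, 660.0]
print(relative_scores(model, obs))
```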
The multimodel technique is used for all meteoblue radiation simulations.
Multimodel radiation validation 2017
Improvements in 2018
Like other variables, our radiation forecasts and historical data are continuously being improved. It is also important to us to show the data quality of different sources transparently.
We therefore carried out another study together with the Fraunhofer Institute for Energy Economics and Energy System Technology in Kassel, Germany. It confirmed the potential of multimodel approaches for radiation simulation and allowed us to compare the multimodel forecast with other sources, such as the widely recognised IFS model of ECMWF. The results were presented at the Fachtagung Energiemeteorologie in Goslar, Germany, and at the EU-PVSEC in Brussels, Belgium (the abstract document of the study for the conference is available here).
Based on the outcomes of the 2017 and 2018 studies, meteoblue was able to further improve the radiation forecast and to outperform the IFS model of ECMWF, as shown in the graphic below. The improved multimodel forecast was launched in September 2018.
Multimodel radiation validation 2018
Wind speed
The operational meteoblue multimodel mix for the 10 m wind speed is of similar quality to the ERA5 historical reanalysis. Numerical weather forecast models have significantly larger MAE values than ERA5, MOS or the meteoblue multimodel mix. With the meteoblue multimodel mix, the MAE is below 2 m/s at 89% of the meteorological stations. With the reanalysis model ERA5, this is the case at 87% of the stations, followed by GFS (80%) and NEMS (80%).
Mountainous and continental regions typically show the largest model errors. Model quality is best in Europe and North America, whereas the wind speed accuracy is typically worst close to the equator, in Africa and in Australia.
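The scores in the wind speed table below can be computed from paired hourly model and observation values, as sketched here. We assume MBE is defined as model minus observation, so a positive value means the model overestimates wind speed. The stddev column is not reproduced, because the source does not say whether it refers to the model, the observations or the errors. The function name is hypothetical.

```python
import math
from typing import Dict, Sequence


def wind_scores(model: Sequence[float], obs: Sequence[float]) -> Dict[str, float]:
    """MAE, MBE (model minus observation), RMSE and Pearson correlation
    of paired hourly values."""
    n = len(obs)
    if n < 2 or len(model) != n:
        raise ValueError("need at least two paired values")
    diffs = [m - o for m, o in zip(model, obs)]
    mean_m, mean_o = sum(model) / n, sum(obs) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(model, obs))
    var_m = sum((m - mean_m) ** 2 for m in model)
    var_o = sum((o - mean_o) ** 2 for o in obs)
    pearson = cov / math.sqrt(var_m * var_o) if var_m > 0 and var_o > 0 else float("nan")
    return {
        "MAE": sum(abs(d) for d in diffs) / n,
        "MBE": sum(diffs) / n,
        "RMSE": math.sqrt(sum(d * d for d in diffs) / n),
        "Pearson": pearson,
    }


# Hypothetical hourly 10 m wind speeds [m/s]
print(wind_scores([2.0, 4.0, 6.0], [1.0, 3.0, 5.0]))
```

A constant offset, as in the example, shows up in MAE, MBE and RMSE but leaves the Pearson correlation at 1, which is why both kinds of scores are reported.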
Mean absolute error (MAE), mean bias error (MBE), root mean square error (RMSE), standard deviation (stddev) [m/s] and Pearson correlation coefficient of the 10 m wind speed for the numerical weather forecast models GFS and NEMS, the meteoblue multimodel mix and the historical reanalysis model ERA5.
|Model|MAE [m/s]|MBE [m/s]|RMSE [m/s]|stddev [m/s]|Pearson correlation|
|---|---|---|---|---|---|
|meteoblue multimodel mix|1.48|0.13|1.94|1.66|0.67|
|ERA5|1.49|0.03|1.93|1.62|0.66|
|GFS|1.69|0.24|2.20|1.87|0.58|
|NEMS|1.67|-0.05|2.20|1.85|0.60|
MAE [m/s] of the meteoblue multimodel mix 10 m wind speed used in the operational weather forecast. Verification is based on all hourly data of the year 2017.
MAE [m/s] of the reanalysis model ERA5 (not available as forecast) used for long-term historical analysis. Verification is based on all hourly data of the year 2017.
MAE [m/s] of wind speed forecasts for ‘stand-alone’ model output as computed by the numerical weather forecast model GFS. Verification is based on all hourly data of the year 2017.
MAE [m/s] of the 10 m wind speed from the numerical weather forecast model NEMS. Verification is based on all hourly data of the year 2017.
Daily precipitation events
For historical data, the performance of ERA5 and the meteoblue Learning Multimodel (mLM) is significantly better than that of the satellite observation CHIRPS2. Satellite observations typically have higher skill than numerical weather forecast models for heavy precipitation and close to the equator.
Probability of detection (POD), false alarm rate (FAR) and Heidke skill score (HSS) for three daily precipitation events (> 1 mm, > 10 mm, > 50 mm) for the historical reanalysis model ERA5, the numerical weather forecast model GFS, the satellite observation CHIRPS2 and the meteoblue multimodel.
|Model|POD (> 1 mm)|FAR (> 1 mm)|HSS (> 1 mm)|POD (> 10 mm)|FAR (> 10 mm)|HSS (> 10 mm)|POD (> 50 mm)|FAR (> 50 mm)|HSS (> 50 mm)|
|---|---|---|---|---|---|---|---|---|---|
|meteoblue Learning Multimodel (mLM)|0.70|0.49|0.47|0.48|0.64|0.36|0.09|0.73|0.14|
Heidke Skill Score (HSS) for precipitation events of > 1mm/day for the reanalysis model ERA5 (not available as forecast) used for long term historical analysis. Verification is based on all daily data of the year 2017.
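POD, FAR and HSS are computed from a 2x2 contingency table of forecast versus observed events (hits, false alarms, misses, correct negatives). The sketch assumes FAR denotes the false alarm ratio, false alarms / (hits + false alarms); the false alarm rate, false alarms / (false alarms + correct negatives), is a different score, and the source does not specify which one is used. The function `event_scores` is hypothetical.

```python
from typing import Dict, Sequence


def event_scores(model_mm: Sequence[float], obs_mm: Sequence[float],
                 threshold_mm: float) -> Dict[str, float]:
    """POD, FAR (false alarm ratio) and Heidke skill score for the event
    'daily precipitation > threshold_mm'."""
    a = b = c = d = 0  # hits, false alarms, misses, correct negatives
    for m, o in zip(model_mm, obs_mm):
        forecast_event, observed_event = m > threshold_mm, o > threshold_mm
        if forecast_event and observed_event:
            a += 1
        elif forecast_event:
            b += 1
        elif observed_event:
            c += 1
        else:
            d += 1
    pod = a / (a + c) if a + c else float("nan")
    far = b / (a + b) if a + b else float("nan")
    denom = (a + c) * (c + d) + (a + b) * (b + d)
    hss = 2 * (a * d - b * c) / denom if denom else float("nan")
    return {"POD": pod, "FAR": far, "HSS": hss}


# Hypothetical daily sums [mm] at one station
print(event_scores([5.0, 0.0, 12.0, 3.0, 0.0, 20.0],
                   [4.0, 2.0, 0.0, 0.0, 0.0, 15.0], threshold_mm=1.0))
```

An HSS of 1 means a perfect forecast and 0 means no skill beyond random chance, which is why the HSS drops sharply for rare, heavy events even when some of them are detected.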
Annual precipitation sums
The meteoblue multimodel mix is significantly better than the historical reanalysis model ERA5 for annual precipitation sums. We recommend the meteoblue Learning Multimodel (mLM) for the operational forecast, because it satisfactorily reproduces both daily precipitation events and annual precipitation sums. Note that the measured precipitation sums in Romania are inaccurate, which results in unreliable values of the mean percentage error (MPE) in this region.
MPE [%] for the historical reanalysis model ERA5. Verification is based on all daily data of the year 2017.
MPE [%] for the meteoblue Learning Multimodel (mLM) used in operational forecasting. Verification is based on all daily data of the year 2017.
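The per-station MPE shown in these maps can be sketched as the percentage deviation of the modelled annual sum from the measured one. This also shows why inaccurate measurements, as in Romania, make the MPE unreliable: the measured sum is the denominator. Defining the error per station on annual sums (rather than averaging daily percentage errors) is our assumption; names and values are hypothetical.

```python
from typing import Dict


def station_mpe(model_annual_mm: Dict[str, float],
                obs_annual_mm: Dict[str, float]) -> Dict[str, float]:
    """Percentage error of the annual precipitation sum per station:
    100 * (model - observed) / observed. Stations with zero observed
    precipitation (or no model value) are skipped."""
    return {
        s: 100.0 * (model_annual_mm[s] - obs) / obs
        for s, obs in obs_annual_mm.items()
        if obs > 0 and s in model_annual_mm
    }


# Hypothetical annual sums [mm]
print(station_mpe({"A": 900.0, "B": 450.0}, {"A": 1000.0, "B": 500.0, "C": 0.0}))
```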
Dewpoint temperature
For the dewpoint temperature, the meteoblue multimodel mix used in the operational forecast (MAE = 1.8 K) reaches a similar accuracy to the ERA5 historical reanalysis model (MAE = 1.6 K). The spatial distribution of MAE values for the meteoblue multimodel mix, ERA5 and the reference numerical weather forecast model GFS is shown below. The best accuracy was found in Europe and North America and in low-elevation, maritime regions.
An MAE below 2 K was found at 81% of the meteorological stations with the historical reanalysis model ERA5. With the meteoblue multimodel mix, the MAE is below 2 K at 70% of the stations, followed by NEMS (53%) and GFS (43%).
MAE [K] of the reanalysis model ERA5. Verification is based on all hourly data of the year 2017.
MAE [K] of the meteoblue multimodel mix used in operational forecast. Verification is based on all hourly data of the year 2017.
MAE [K] of the numerical weather forecast model GFS. Verification is based on all hourly data of the year 2017.
The following verification studies of meteoblue simulations are available.
Verification study for air temperature, wind speed, precipitation and dewpoint temperature (2018)
Comprehensive study of four meteorological parameters (air temperature, wind speed, precipitation, dewpoint temperature) for historical and forecast data at more than 10'000 meteorological stations worldwide. It gives an overview of the model performance of meteoblue multimodels, MOS, reanalysis models and 'stand-alone' numerical weather forecast models.
A quick introduction to the meteoblue Learning Multimodel can also be downloaded below.
NEMS validation study (2015)
Scientific publication dealing with the verification of RAW and MOS forecasts for air temperature, wind speed and the dewpoint temperature by using 9000 weather stations worldwide.
NEMS validation study (2013)
Operational system validation of global weather simulation with hourly intervals. It includes verification of temperature and wind speed of RAW and MOS forecast for >9500 weather stations and was presented at the Meteorological World Expo in Brussels on 16 October 2013:
NMM validation study (2011)
Operational system validation of high resolution weather simulation with hourly intervals. It includes verification of temperature and wind speed of RAW and MOS forecast for >1100 weather stations and was published in the Journal of Applied Meteorology and Climatology in August 2011:
Solar multi-model validation
Operational system validation of solar radiation & power forecast for day-ahead forecast, intraday nowcast and historical time series of NEMS simulations or satellite derived radiation: