Taking climate model evaluation to the next level

Earth system models are complex and represent a large number of processes, resulting in a persistent spread across climate projections for a given future scenario. Owing to different model performances against observations and the lack of independence among models, there is now evidence that giving equal weight to each available model projection is suboptimal. This Perspective discusses newly developed tools that facilitate a more rapid and comprehensive evaluation of model simulations with observations, process-based emergent constraints that are a promising way to focus evaluation on the observations most relevant to climate projections, and advanced methods for model weighting. These approaches are needed to distil the most credible information on regional climate changes, impacts, and risks for stakeholders and policy-makers. Earth system models project likely future climates; however, evaluation of their output is challenging. This Perspective discusses new evaluation approaches, considering both simulations and observations, to ensure credible information for decision-making.


Introduction
The Intergovernmental Panel on Climate Change (IPCC) Fifth Assessment Report (AR5) concluded that the warming of the climate system is unequivocal and human influence on the climate system is clear 1. Observed increases of greenhouse gases have contributed significantly to warming of the atmosphere and ocean, sea-ice decline and sea-level rise. The size and rapidity of these changes are concerning. Human-caused climate change is already affecting many aspects of societies and ecosystems. These impacts will become more visible and more serious in the twenty-first century. It should, therefore, be an international priority to improve our understanding of the climate system, and to reduce current uncertainties in projections of future change. This will rely on information from theory, observations, and Earth system model (ESM) simulations that are coordinated as part of the World Climate Research Programme (WCRP) Coupled Model Intercomparison Project (CMIP; refs. 2-5). CMIP is now in its sixth phase (CMIP6) 5 and is confronted with a number of new challenges. Compared to CMIP5, an increased number of institutions participate in CMIP6, many with multiple model versions. The latest generation of climate models features increases in spatial resolution, improvements in physical parameterizations (in the representation of clouds, for example) and inclusion of additional Earth system processes (such as nutrient limitations on the terrestrial carbon cycle) and components (such as ice sheets). These additional processes are needed to represent key feedbacks that affect climate change, but are also likely to increase the spread of climate projections across the multimodel ensemble. This escalates the need for innovative and comprehensive model evaluation approaches.
CMIP provides the basis for multimodel evaluation and has, over the years, revealed a variety of systematic differences between models and observations, with many persisting from one model generation to the next 6,7. An important issue that remains to be fully addressed is the extent to which model errors affect the quality of climate projections and subsequent impact assessments 8. Traditionally, many climate projections are shown as multimodel averages in the peer-reviewed literature and IPCC reports, with the spread across models presented as a measure of projection uncertainty 9. There is now emerging evidence that weighting based on model performance may improve projections for specific applications 10,11,12. A further complication in devising model weighting approaches is that many CMIP models share components, or are variants of another model in the ensemble, and hence are not truly independent 12,13,14,15,16. This has the potential to bias the multimodel results in ways that are only beginning to be explored. The lack of independence challenges the notion of a 'model democracy', in which each model is weighted equally 17.
The growing number and complexity of models, the expanding suite of outputs they produce, the multitude of downstream applications and the growing availability of observational datasets drive a need for more routine and systematic evaluation, utilizing a comprehensive set of existing model performance metrics and diagnostics. Newly developed CMIP evaluation tools 18,19 will ultimately enhance our ability to identify model errors, to investigate their causes and to quantify and potentially reduce projection uncertainties.
In this Perspective, we summarize key advances since AR5 and key scientific opportunities for improving climate model analyses that will be assessed in the AR6. Our focus is on gaps in the understanding of systematic errors, the development of CMIP model evaluation tools, emergent constraints and weighting methods. We also address the need for more user- and policy-oriented model evaluation at the regional scale required for impact studies. Finally, we discuss how the scientific community might provide more robust climate model information and more tightly constrained model projections.

From model errors to understanding processes
Comparing model results to observations provides insight into the quality of model simulations and the way in which various processes are represented. Comparisons with observations can reveal shortcomings in individual models and systematic errors in a large multimodel ensemble 7,20. An example of a systematic error is the excessive simulated band of precipitation in the tropical Pacific south of the Equator, a feature not present in observations. Taken together with the usually correctly simulated climatological intertropical convergence zone (ITCZ) precipitation maximum that stretches across the tropical Pacific north of the Equator, this systematic splitting of tropical Pacific rainfall into two discrete branches is commonly referred to as the double ITCZ. Other examples of systematic errors include a dry Amazon bias, a warm bias in the eastern parts of tropical ocean basins, differences in the magnitude and frequency of El Niño and La Niña events, biases in sea surface temperatures (SSTs) in the Southern Ocean, a warm and dry bias of land surfaces during summer, and differences in the position of the Southern Hemisphere atmospheric jet 7.
One major challenge is that it is often not possible to attribute a specific cause to a specific systematic error. For example, it has been suggested that the systematic warm bias in the upwelling zones off the west coasts of each continent (see Fig. 1) is associated with biases in the representation of stratocumulus clouds 21 and boundary layer convection 22 in these regions. However, other studies suggest that the root cause of this warm bias is the representation of ocean upwelling and its forcing from surface winds 23. An additional complication is that a regional difference between the simulation and observations may be a consequence of errors that occur far from the region in question, and are manifested via teleconnections. Certain regional SST biases, for example, are related to biases in other ocean basins and to aspects of the large-scale ocean overturning circulation 24. In some cases, although the link between a particular bias and some physical process may seem robust, the specific cause of the bias, as well as its remedy, may remain elusive.
But there are also compelling examples of how a multimodel analysis of a particular systematic bias can lead to a clearer understanding of underlying causes. One systematic bias revealed in the evaluation of CMIP5 models was the apparent difference between the observed and modelled global mean surface temperature increase in the early twenty-first century 7. These differences motivated a range of targeted analyses exploring model performance, internal variability, external forcing and observational uncertainty 25. Although the magnitude of the slowdown differs slightly depending on which global observational dataset is analysed, this focused effort revealed that the observed slowdown was due to a combination of factors, chiefly involving internally generated decadal-timescale variability in the tropical Pacific 26,27 and the missing effects of a series of moderate volcanic eruptions 28. Averaging the time series across a collection of coupled model simulations strongly reduces the effects of internally generated variability, more clearly revealing the underlying externally forced response. There is, therefore, a mismatch between the precise observed sequence of variability and the smooth evolution of temperature in the multimodel mean. Models initialized with observations from the years immediately before the early twenty-first century slowdown were able to capture aspects of the observed change in warming rate after 2000 29. These results highlight the importance of using different simulation frameworks (for example, coupled simulations and decadal predictions initialized with observations) to understand the causes of differences between modelled and observed climate changes. Stronger observed warming since 2014, which is replicated in initialized model predictions for the period after 2014 30,31, adds to the evidence that the weaker warming before 2014 had a large contribution from internal climate variability.
A related question is the extent to which observational uncertainties and inhomogeneities 32 are hampering model evaluation. Just as efforts to improve models continue, there is a parallel effort to improve observationally based datasets. Even for a very basic climate quantity such as temperature, this involves refined corrections for biases and incomplete global coverage in the raw surface observations 33 and corrections for biases in satellite retrievals 34. Observations are also critically important in model tuning 35,36, which should be clearly documented and taken into account in model evaluation studies. One difficulty in comparing models against observations is posed by inconsistency in the sampling or definition of the quantities compared (for example, model data may be daily averages whereas satellite samples may be for a certain time of day). This inconsistency can be addressed by incorporating simulators of specific instruments into climate models 37.

New CMIP model evaluation tools
The scope of model evaluation has expanded dramatically in recent years. Well-established aspects of model evaluation are now becoming more routine, and results are available more rapidly than for CMIP5, enhancing their value for model analysts and developers 38. A key development for CMIP6 is the availability of the Earth System Model Evaluation Tool (ESMValTool 18; https://www.esmvaltool.org/) and the Coordinated set of Model Evaluation Capabilities (CMEC; https://cmec.llnl.gov), which are both open-source capabilities. ESMValTool includes a large collection of diagnostics and performance metrics for atmospheric, oceanic and terrestrial variables; not only for the mean state, but also for trends, variability, key physical processes and emergent constraints. ESMValTool also has the capability to reproduce figures from several chapters of AR5 and incorporates targeted analysis packages, such as the National Center for Atmospheric Research (NCAR) Climate Variability Diagnostics Package 39. CMEC comprises the PCMDI Metrics Package (PMP 19), the International Land Modeling Benchmarking Project package (ILAMB 40) and the parallel toolkit for extreme climate analysis (TECA 41). CMEC emphasizes a diverse suite of physical and biogeochemical summary statistics gauging the consistency between models and observations across a range of space and timescales.
Both ESMValTool and CMEC have undergone rapid development over the past few years, and are now mature, well-tested tools that provide end-to-end provenance tracking to ensure reproducibility. One goal is to routinely provide evaluation results through the Earth System Grid Federation (ESGF) shortly after new CMIP6 simulations are published. This workflow is depicted in Fig. 2: the tools are run at selected ESGF nodes, utilizing observations available in standard formats or provided by the user 38. The foundations for this significant undertaking are the community-based experimental protocols and conventions of CMIP, including their extension to observations (obs4MIPs 42) and reanalysis.

Emergent constraints on Earth system sensitivities
One of the biggest challenges in ESM evaluation is to identify the performance metrics that are most relevant to climate projections 7. The reliability of models can only be assessed with observations of the past and present. This means that models are assessed against criteria that are not necessarily informative in terms of the quality of model projections of future climate change. The emergent constraint approach attempts to address this problem by identifying robust, physically interpretable relationships between Earth system feedback behaviours on short, well-observed timescales and on timescales that span the twenty-first century and beyond 43,44 (see the figure in Box 1). Emergent constraints use an ensemble of ESMs to define a relationship between a measured aspect of current or past climate and the strength of a simulated Earth system feedback in the future. It is the model ensemble behaviour (rather than the behaviour of a single model) that defines the emergent relationship between the observed variability and the projection of the future climate. When combined with observational data and a measure of observational uncertainty, the model-derived emergent relationship can be converted into an emergent constraint on the Earth system sensitivity in the real world.
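The conversion from an emergent relationship to an emergent constraint can be sketched in a few lines. The example below is a minimal illustration under stated assumptions, not the method of any cited study: it assumes a one-dimensional linear relationship across models, Gaussian observational uncertainty and Gaussian scatter about the fit, and it ignores uncertainty in the fitted slope and intercept. The function name `emergent_constraint` and all input values are hypothetical.

```python
import numpy as np


def emergent_constraint(x_models, y_models, x_obs, x_obs_sigma,
                        n_samples=20000, seed=0):
    """Monte Carlo sketch of a linear emergent constraint.

    x_models: observable quantity simulated by each model (present day)
    y_models: future sensitivity simulated by each model
    x_obs, x_obs_sigma: observed value and its 1-sigma uncertainty
    """
    x = np.asarray(x_models, dtype=float)
    y = np.asarray(y_models, dtype=float)

    # Emergent relationship: ordinary least-squares line across the ensemble.
    slope, intercept = np.polyfit(x, y, 1)

    # Scatter of the models about the line (two fitted parameters).
    resid = y - (slope * x + intercept)
    resid_sigma = np.sqrt(np.sum(resid ** 2) / (x.size - 2))

    # Propagate the observational uncertainty and the scatter about the
    # fit through the relationship (assumption: both are Gaussian).
    rng = np.random.default_rng(seed)
    x_draws = rng.normal(x_obs, x_obs_sigma, n_samples)
    y_draws = slope * x_draws + intercept + rng.normal(0.0, resid_sigma, n_samples)

    return {
        "slope": slope,
        "intercept": intercept,
        "mean": y_draws.mean(),
        "std": y_draws.std(),
        "unconstrained_std": y.std(ddof=1),  # raw ensemble spread
    }


if __name__ == "__main__":
    # Entirely synthetic ensemble, for illustration only.
    rng = np.random.default_rng(1)
    x = np.linspace(0.5, 2.0, 12)
    y = 1.0 + 2.0 * x + rng.normal(0.0, 0.2, x.size)
    result = emergent_constraint(x, y, x_obs=1.2, x_obs_sigma=0.1)
    print(result)
```

Comparing `std` with `unconstrained_std` shows how much the observation narrows the ensemble spread in this idealized setting. A real application would also need a physical justification for the relationship and a treatment of regression uncertainty.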
When AR5 was published, numerous emergent constraints had been identified. Examples include studies of snow-albedo feedback 43, sea-ice 45, tropical precipitation extremes 46, carbon loss from tropical land under warming 47 and the future latitudinal shift of the Southern Hemisphere westerlies 48. Such studies have proliferated since AR5, including constraints on cloud feedbacks and equilibrium climate sensitivity (ECS) 49,50,51,52,53,54,55,56,57, strengthening of the hydrological cycle 58,59, the temperature sensitivity of tropical land carbon storage 60, CO2 fertilization of plant photosynthesis 61, future changes in ocean net primary production 62, permafrost loss 63, changes in natural sources and sinks 64 of CO2 and mid-latitude daily heat extremes 65. The proposed observable constraints involve historical trends 43,45,64, interannual variability 47,55,60, seasonal cycles 43, trends in the seasonal cycle 61 and spatial variability 63. Constraints have been tested against different ensembles and scenarios 43,47,60,66. For example, a relationship between the ECS and the inferred strength of upward mixing in the tropical lower troposphere was used to discount ECS values below 3 °C, as all models with lower ECS had too little mixing (Fig. 3, left) and by implication a decreased positive cloud feedback at low levels 53. This would narrow the range of ECS significantly compared to the 1.5-4.5 °C range assessed by AR5 1. However, other emergent constraint studies for ECS lead to different estimates of this uncertainty range 51,52,53,54,55,56,57, pointing to the need for further research. For carbon cycle feedbacks, an emergent constraint on the impact of increased CO2 on photosynthesis was found based on observed changes in the seasonal cycle of atmospheric CO2, suggesting that doubling of the CO2 concentration in the atmosphere will cause global plant photosynthesis to increase by approximately one third (Fig. 3, right) 61.
Despite the attractiveness of emergent constraints, there are some well-justified concerns. Most importantly, the emergent relationship between the observable and the sensitivity to be constrained is derived from a model ensemble. The emergent constraint may be misleading if the model ensemble has a systematic error (such as the double ITCZ) that affects the emergent relationship, or reflects the simplicity of a parameterization common to many models rather than an intrinsic underlying process. Second, there is a danger of finding spurious relationships between observables and Earth system sensitivities if the high-dimensional outputs available from ESMs are simply data-mined for high correlations 67. The correlations found in a data-mining approach should be restricted to those that have a physical explanation. Finally, we should not expect short-term variability to yield constraints on slow feedbacks that have negligible effects on that variability. For example, interannual variations in sea level are unlikely to provide constraints on century-timescale sea-level rise due to ice-sheet melt. On the other hand, fast processes (such as water vapour and cloud feedbacks) are evident in short-term variability as well as trends, and are therefore much better candidates for emergent constraints that relate variability and sensitivity. Many of the most uncertain feedbacks are fast feedbacks such as these, which are more amenable to the emergent constraint technique.

Weighting multimodel climate projections
Traditionally, CMIP models were treated as independent, equally plausible estimates of future climate. Confidence in projections was inferred from model agreement on the sign and magnitude of future change 9. In the context of multimodel ensemble projections, an increasing number of studies have given greater weight to models that agree better with historical observations of the projected quantity or with relationships between the projected quantity and observable metrics 11,67,68,69,70. However, the majority of weighting studies of certain climate properties such as sea-ice extent in AR5 include only a small set of metrics that are not always clearly related to the projected quantity in question.
An increase in weighted skill scores can be relatively simply achieved in-sample (that is, in the observational period and/or location used to derive weights). However, only a few studies have specifically focused on the likelihood of weighted results providing benefits for the intended application (that is, out-of-sample, typically twenty-first century projections) 10,11,12,14,15,68,71,72. Although we clearly have no observations of future climate, model-as-truth (also termed pseudo-reality in some studies 11,68) and calibration-validation exercises for different time periods of the observations yield valuable information on the potential benefits of different weighting approaches 73. In addition to testing whether projections of a specific variable and metric can be improved through weighting, thorough out-of-sample testing can help guard against other potential issues with weighting. For example, there is no single metric that reliably captures all aspects of model performance for all purposes, even if interest is restricted to a very specific scientific question 74. Out-of-sample testing can tell us whether optimizing in one metric or variable transfers any benefits to other metrics or variables 71. It can also indicate whether internal variability has played a role in any in-sample success of weighting, help to avoid the issue of the same datasets being used to calibrate and weight models, and reveal whether weighting has artificially reduced ensemble spread. A further problem is the risk of systematic errors in observational products producing inappropriately weighted ensembles. Furthermore, weighting schemes have no capacity to account for errors that are shared across an ensemble, an issue that is particularly important in the case of small ensembles.
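A model-as-truth test of the kind described above might be sketched as follows. This is an illustrative outline only: each model in turn is treated as pseudo-observations, the remaining models are weighted by their distance to it in a single historical metric, and the weighted projection is compared with the equally weighted one. The Gaussian weighting kernel, the scalar metric, the width `sigma` and the function name `model_as_truth_test` are assumptions made for this example, not a prescription from the cited studies.

```python
import numpy as np


def model_as_truth_test(hist, future, sigma):
    """Leave-one-out pseudo-reality test for a performance weighting.

    hist:   historical metric for each model (the quantity used for weighting)
    future: projected change for each model (the quantity of interest)
    sigma:  width of the Gaussian weighting kernel (assumed, problem-specific)

    Returns the mean absolute projection error of the weighted and of the
    equally weighted ensemble, averaged over all choices of pseudo-truth.
    """
    hist = np.asarray(hist, dtype=float)
    future = np.asarray(future, dtype=float)
    n = hist.size
    err_weighted = np.empty(n)
    err_equal = np.empty(n)

    for t in range(n):
        others = np.arange(n) != t  # exclude the pseudo-truth itself

        # Weight the remaining models by closeness to the pseudo-truth.
        w = np.exp(-((hist[others] - hist[t]) / sigma) ** 2)
        w /= w.sum()

        # Out-of-sample check: how well is the "true" future recovered?
        err_weighted[t] = abs(np.sum(w * future[others]) - future[t])
        err_equal[t] = abs(future[others].mean() - future[t])

    return err_weighted.mean(), err_equal.mean()


if __name__ == "__main__":
    # Synthetic ensemble in which the future change is related to the
    # historical metric, so that weighting can in principle help.
    rng = np.random.default_rng(0)
    hist = rng.normal(0.0, 1.0, 20)
    future = 2.0 * hist + rng.normal(0.0, 0.5, 20)
    print(model_as_truth_test(hist, future, sigma=0.5))
```

If the weighted error is not clearly smaller than the equally weighted error in such a test, the chosen metric is unlikely to justify weighting for that projection.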
Another relevant issue is model interdependence. Some of the nominally different models in the CMIP archive share individual components or parameterizations, or represent key processes in the same way. This can lead to shared biases that have the potential to compromise the efficacy of performance-based weighting 72 and to create artificially strong emergent constraints 75. Using model error correlation as a measure for interdependence, it was found that the effective number of independent climate models was likely to be significantly lower than the total number of models in the CMIP ensemble 76,77. Several studies have subsequently introduced alternative ways of quantifying and accounting for interdependence 13,14,78,79. Recently, the US National Climate Assessment weighted each member of the CMIP5 archive using both a multivariate skill score for historical climatology and a measure of uniqueness in the archive 12. Figure 4 shows the resulting skill weight versus the independence weight for all CMIP5 models. Skill weights are calculated as multivariate root mean square errors over a North American domain, whereas independence weights are computed using model error bias correlation. No model receives high weights for both skill and independence (see the empty upper-right corner). This suggests that the ensemble has been unintentionally skill weighted by the inclusion of multiple versions of better-performing models. As in the case of efforts to define broadly applicable performance metrics, it is evident that there is no universally accepted definition of model dependence: accounting for model dependence is problem-specific. Weighting exercises that accounted for model dependence were sensitive to almost all aspects of the problem, including the selected metric, variable, analysis period and constraining observational dataset.
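The idea of combining a skill weight with an independence weight can be illustrated with a short sketch. The functional form below (a Gaussian skill term divided by one plus the summed similarity to all other models) is one form that has been used in the weighting literature; it is shown here as an assumption for illustration and is not claimed to be the exact scheme behind Figure 4. The shape parameters `sigma_d` and `sigma_s` and the function name are hypothetical, and their values would need to be chosen for each problem.

```python
import numpy as np


def skill_independence_weights(errors, sigma_d, sigma_s):
    """Combine performance and independence into one weight per model.

    errors:  array (n_models, n_points) of model-minus-observation fields
    sigma_d: length scale for the skill term (assumed)
    sigma_s: length scale for the similarity term (assumed)
    """
    e = np.asarray(errors, dtype=float)

    # Distance of each model to observations: root mean square error.
    d = np.sqrt(np.mean(e ** 2, axis=1))

    # Pairwise inter-model distance; the observations cancel in e_i - e_j.
    diff = e[:, None, :] - e[None, :, :]
    s = np.sqrt(np.mean(diff ** 2, axis=2))

    skill = np.exp(-(d / sigma_d) ** 2)

    # Similarity is 1 for identical models and tends to 0 for distant ones.
    similarity = np.exp(-(s / sigma_s) ** 2)
    np.fill_diagonal(similarity, 0.0)  # do not count a model against itself
    independence = 1.0 / (1.0 + similarity.sum(axis=1))

    weights = skill * independence
    return skill, independence, weights / weights.sum()


if __name__ == "__main__":
    # Two identical models and one distinct model with equal skill:
    # the duplicates end up sharing the weight of a single model.
    errors = np.array([[1.0, -1.0, 1.0, -1.0],
                       [1.0, -1.0, 1.0, -1.0],
                       [-1.0, 1.0, -1.0, 1.0]])
    print(skill_independence_weights(errors, sigma_d=1.0, sigma_s=0.5))
```

Plotting `skill` against `independence` for a real ensemble would give a diagram of the same kind as the one described in the text, although the values depend strongly on the chosen metric and length scales.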

Regional model evaluation for impact and risk assessments
A primary goal of climate research is to identify how climate variability and change affect society and to inform strategies for mitigation and adaptation to climate change. Impact sectors include agriculture, forestry, water resources, infrastructure, energy production, land and marine ecosystems, and human health. To accurately capture many of the significant effects of climate change in sectoral impacts models, high levels of detail regarding the evolving climate state are necessary. The impacts community generally needs rigorous regional-scale evaluation of the seasonal cycles of temperature, precipitation, humidity, wind speed and downwelling solar radiation. Although some sectors are affected by mean climate changes, the most acute impacts are related to extreme events.
Annual mean temperature and precipitation, monsoon timing and intensity, and modes of variability that can alter the probability of extreme events have been evaluated in the CMIP5 ensemble 7. The Expert Team on Climate Change Detection and Indices compiled a set of indices to quantify extreme events 80. Observational estimates of these indices have been used to evaluate CMIP5 models 81, and are now incorporated into the ESMValTool for evaluation of CMIP6 models. The overall model performance was mixed in capturing the observed behaviour of these extreme event metrics. However, such comparisons remain difficult to interpret because of substantial uncertainties and data gaps in many of the observational datasets, and due to limited availability of CMIP5 model output at sub-monthly and daily frequencies.
Stakeholder-oriented applications have benefited from improved models, more user-relevant metrics, more robust observational systems, and longer observational records since AR5. Improvements in remote sensing products have enabled the evaluation of interannual and sub-seasonal events 82. Models show continuing improvement in the representation of the diurnal cycle, storm tracks, the effects of blocking on extreme events, the El Niño/Southern Oscillation, tropical cyclones and other circulation features. Awareness of multivariate extremes 83 has begun to emerge. The production and application of dynamically and empirically downscaled model information has advanced 84. Extreme event detection and attribution has made substantial strides, with real-time probabilistic event attribution 85 now feasible. Applications in different impact sectors have also been advanced by hybrid climate forcing datasets combining models and observations 86, and by sector-specific information such as the geographic distribution of nitrogen fertilizer or irrigation applications and land-use patterns for agricultural modelling 87.
Interactions between the climate modelling and applications communities are facilitated by the Vulnerability, Impacts, Adaptation, and Climate Services (VIACS) Advisory Board 88, a CMIP6-Endorsed MIP 5. This Board is an effort to draw a broader array of climate experts, practitioners and information brokers into the CMIP process, and to leverage the community engagements organized under the Global Programme of Research on Climate Change Vulnerability, Impacts and Adaptation (PROVIA) and as part of efforts such as the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP 89). The VIACS Advisory Board has solicited sustained community engagement: (1) on priority experiments; (2) on output variables to inform CMIP6 data requests; and (3) to highlight evaluations that would help to establish model credibility with the VIACS community. Given that it is common practice to adjust model biases with monthly mean information from present-day fields, the evaluation of the seasonal evolution and distributions of monthly climate at regional scales is among the highest priorities for VIACS users 90. ESMs often have less variance than observations at hourly, daily and interannual timescales, which can lead to spurious effects when bias adjustment relies on standardized anomalies. Tropical rainfall biases are particularly problematic.
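One way in which too little model variance can interact with bias adjustment is sketched below. The example compares a simple mean shift with an adjustment based on standardized anomalies, in which model anomalies are rescaled by the ratio of observed to modelled standard deviation. Both functions are generic textbook-style adjustments written for this illustration (the names and numbers are hypothetical), and the interpretation, that the rescaling also amplifies the projected change, is offered as one possible example of the spurious effects mentioned above rather than as the specific mechanism the authors had in mind.

```python
import numpy as np


def adjust_mean(model, model_ref, obs_ref):
    """Shift the model so that its reference-period mean matches observations."""
    return model - model_ref.mean() + obs_ref.mean()


def adjust_standardized(model, model_ref, obs_ref):
    """Adjust using standardized anomalies.

    Model anomalies are divided by the model's reference-period standard
    deviation and multiplied by the observed one, then added to the
    observed mean.
    """
    scale = obs_ref.std(ddof=1) / model_ref.std(ddof=1)
    return obs_ref.mean() + (model - model_ref.mean()) * scale


if __name__ == "__main__":
    # Hypothetical monthly values: the model has half the observed variance
    # in standard-deviation terms and projects a uniform change of +1.
    model_ref = np.array([0.0, 1.0, 2.0])
    obs_ref = np.array([8.0, 10.0, 12.0])
    model_future = model_ref + 1.0

    for adjust in (adjust_mean, adjust_standardized):
        ref_adj = adjust(model_ref, model_ref, obs_ref)
        fut_adj = adjust(model_future, model_ref, obs_ref)
        # Projected change after adjustment
        print(adjust.__name__, fut_adj.mean() - ref_adj.mean())
```

In this synthetic case the mean shift preserves the model's projected change, whereas the standardized-anomaly adjustment multiplies it by the variance-correction factor, so the adjusted signal is no longer the one the model simulated.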

Ways ahead
The CMIP6 experiment design provides an opportunity for sophisticated, consistent characterization of the ensemble and its predecessors 5. Targeted MIPs associated with CMIP6 will accelerate efforts to disentangle internal climate variability from forced responses, and to evaluate which model processes are relevant to a wide range of climate characteristics. Insights into the underlying causes of systematic errors are likely to be gained from idealized experiments (such as aquaplanets 91,92), systematic assessment of the influence of horizontal resolution, analyses of forcing uncertainty and the evaluation of individual model components. The diverse numerical experiments proposed in CMIP6 may help the climate science community to gain a deeper understanding of model behaviour and processes than has been possible in the past. Further diagnostic benefits should accrue from the development of convectively resolving models, dynamic vegetation, three-dimensional ice-sheet models and refined physical parameterizations.
Model development, evaluation and weighting will be facilitated by the ongoing development and deployment of new climate observing systems with continuous quality assessment and independent verification. Rigorous quantification of observational uncertainties is now routine rather than exceptional. Examples include the availability of 'ensembles of observations' for a single observational product, which account for uncertainties associated with different subjective processing choices 93. Challenges remain in propagating these uncertainties to derived quantities such as trends or conditional averages. New measurements and measurements made at higher frequencies will also provide further insights into systematic errors. ARGO floats 94 and new satellite missions are prime examples.
An exciting opportunity is provided by the new CMIP evaluation tools ESMValTool 18 and CMEC 19. Both evaluation packages will be routinely executed whenever new model simulations are contributed to the CMIP6 archive. This allows rapid, quantitative comparisons of model results to a wide range of climate observations 38. Such rapid and comprehensive feedback on model performance should help in addressing the causes of long-standing systematic errors and facilitate a shift towards more process-oriented diagnostics, while ensuring continuity with more 'traditional' diagnostics applied in previous CMIP phases. The hope is that ESMValTool and CMEC will be further enhanced by the CMIP6-Endorsed MIPs and other science teams, leading to widespread adoption by model development teams and the user community. Other promising diagnostic developments on the horizon that should be further advanced include studies that assess responses to perturbations rather than mean climate 95, and the application of innovative data science methods in Earth system science 96 such as neural networks 97, machine learning-based anomaly detection techniques 98, graphical models and causal discovery 99.
Physically robust emergent constraints are a promising concept for understanding and constraining Earth system feedbacks and narrowing uncertainty in future projections. They may ultimately influence model development and observational strategies. In addition to new research on consistency across different emergent constraints and across generations of model ensembles, we anticipate the use of more sophisticated statistical analyses than the one-dimensional linear regressions that have typically been used so far. Higher-dimensional emergent relationships related to more than one observable should yield more robust conclusions and avoid the possibility of contradictory constraints derived from separate one-dimensional relationships. There is a new opportunity to test emergent constraints developed in previous model generations against the outputs from CMIP6 models. Finally, there needs to be a greater focus on developing emergent constraints for regional climate change that are more relevant to impacts than many of the large-scale metrics that are the current focus of emergent constraints 8.
To guard against misleading emergent constraints arising from spurious correlations or from the dependence introduced by a parameterization common to many models (rather than from an intrinsic underlying process), we suggest that the development of emergent constraints should be treated as a form of hypothesis testing. For example, emergent relationships between variability and sensitivity could be derived on the basis of physical theory or simple underlying models. The predicted emergent relationship can then be tested against outputs from full-form ESMs. This approach could also yield an improved theoretical understanding of relationships between variability and sensitivity in the Earth system. Even where the outputs of one generation of models seem to be consistent with the hypothesized emergent relationship, the robustness of the relationship should be tested out of sample against models that were not used to define the relationship. The hypothesis testing approach that we propose would also protect from attempts to artificially tune a model to fit an observational constraint. Where this is carried out unphysically, the tuned model is likely to move away from the theoretical curve (that is, it will fit the x-axis observation but it will no longer be consistent with the y-axis sensitivity).
There is enough evidence now that the continued assumption of model democracy cannot be fully justified in future IPCC assessment reports. It is not yet clear, however, whether all variables of interest can be reliably constrained. Successful skill weighting has thus far been implemented for a limited number of specific applications. In these applications, the target property of interest is constrained by a small number of clearly relevant variables 10,11,12,67,68. Future work for more complex chains of influence will need to consider orthogonal uncertainties and processes. For example, regional precipitation change may be influenced by global-scale warming, large-scale dynamics and microphysical parameterizations. For regional climate projections, a weighting that is based on processes controlling the region of interest and biases in large-scale atmospheric circulation is advocated 8.
In addition, it has been demonstrated that CMIP models are not independent. Most inferences in the literature about model interdependence are derived from error correlation 13,79. This cannot identify the specific model components that are interdependent. Identification of these common components is a difficult task due to the large number of models involved in CMIP and lack of detailed information regarding individual model versions. Further work is required to understand how interdependence can best be assessed. These efforts can proceed in tandem with research to better understand the effects of model construction and genealogy. Comprehensive databases of shared code, parameterizations, model development and tuning practices could help disentangle how models are related, and for what purposes they can be considered independent estimates of change. There is also the potential for better quantification of natural variability from palaeoclimate simulations 100 and enhanced collaboration with the detection and attribution community, whose statistical approaches provide information on whether model responses to changes in external forcings are consistent with observations. Simpler representations of the Earth system in a hierarchy of models will also be useful to improve more complex ESMs.
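The error-correlation diagnostic mentioned above can be sketched as follows. The example is synthetic: two hypothetical models (A and B) are given a common error pattern, as might arise from a shared component, while a third (C) has independent errors. The array shapes, names and noise levels are assumptions for illustration only.

```python
import numpy as np


def error_correlation(sim, obs):
    """Correlation between model error patterns.

    sim: array of shape (n_models, n_points), simulated field per model.
    obs: array of shape (n_points,), the observed field.
    Returns an (n_models, n_models) correlation matrix of the errors.
    """
    errors = sim - obs           # broadcast obs across models
    return np.corrcoef(errors)   # each row is treated as one variable


rng = np.random.default_rng(1)
n_points = 500
obs = rng.normal(size=n_points)
shared_bias = rng.normal(size=n_points)  # error pattern common to A and B

sim = np.stack([
    obs + shared_bias + 0.3 * rng.normal(size=n_points),  # model A
    obs + shared_bias + 0.3 * rng.normal(size=n_points),  # model B
    obs + rng.normal(size=n_points),                      # model C
])

corr = error_correlation(sim, obs)
print(np.round(corr, 2))
```

A high off-diagonal value flags a pair of models whose errors resemble each other, but, as noted above, it does not reveal which shared component is responsible.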
For improved assessments of regional impacts and risks, a key challenge and opportunity will be to derive collective understanding from global and regional climate models, as well as from regional-scale observations. To do so it will be important to bridge the gap between the climate model and impacts communities, and between the different scales at which these communities typically operate. CMIP6 will include weather-resolving global model resolutions (∼25 km or finer) that need to be compared to regional model results and downscaled coarse-resolution simulations. Concerns about bias correction of climate change simulations have been raised, and ways to address these concerns have been proposed 8. Many of the key systematic errors that hampered the reliable simulation of surface variables and extreme events will benefit from increasing spatial resolution 101, variable-resolution grids, improved parameterizations, and advances in bias corrections and downscaling techniques 84. Curated archives and the CMIP evaluation tools will enable participation by a broader diagnostic community, many of whom are not presently capable of advanced interrogation of climate model simulations. The provision of useful climate information and messages for the impacts, risk and climate services communities requires a process that is rooted in sustained engagement with stakeholders that concentrates on areas of particular vulnerability or exposure. The projection of changing hazard metrics and the construction of driving scenarios for impact models across a range of local, regional and national scales should benefit from a process that distils information from many different sources. Such sources include multiple ESMs, statistically and dynamically downscaled models (through the Coordinated Regional Downscaling Experiment, CORDEX, now a CMIP6-Endorsed MIP 84) and bias-adjusted models 8. The publication of AR6 will enhance the focus on regional climate information through a regionally defined Atlas and new chapters on global-to-regional linkages, extreme events, and impact- and risk-relevant climate hazards.
Despite substantial progress in climate modelling over the past few decades, there remains a substantial spread in projections of future climate change. For example, the range of model estimates for ECS to a doubling of CO2 concentrations has not decreased since the 1970s 7, although understanding of the processes that are involved has certainly increased. The need to inform mitigation policy and adaptation remains 102. We believe that there is now an unprecedented opportunity to constrain policy-relevant metrics such as cumulative CO2 emissions consistent with specific temperature targets with observations, and to reduce uncertainties in climate projections, both at global and regional scales. The challenge is to make intelligent use of the petabyte-scale output that will become available from the new CMIP6 project, along with modern observation systems, new model evaluation tools and novel data science techniques. A combination of different process-based emergent constraints together with model-weighting approaches that consider both model performance and interdependence has the potential to yield more robust multimodel information for a wide array of societally and environmentally critical applications.