Modelling process overview

This page gives an overview of different parts of the modelling process, particularly in the context of decisions about model structure and modelling software tool selection.

Here 'model structure' refers to things like how the catchment is broken up into "modelled units" (such as subcatchments, smaller units representing areas of a particular land cover type, or river channel units) and how different units are connected to one another in the model. Structure choices are linked to the choice of 'modelling software tool' because different tools allow for different kinds of structures and process representations. In some cases, several tools may be able to meet one's structural needs and wants; in others, no single tool may meet all the demands.

There are other considerations, such as ease of use, that will contribute to modelling tool selection. There is never enough time and data to build "the perfect model." Compromises will always need to be made. Identifying 'needs' and 'nice-to-haves' for the modelling project early on in the process will assist with appropriate, 'fit-for-purpose' structure selection, tool selection, and model building.

The process of catchment modelling is generally not linear. The steps covered below aren't always completed sequentially: there will often be iteration and circling back to revise earlier decisions.


Defining the modelling project goal(s)

Defining the goals of a modelling project in as much detail as possible, and prioritising among them, allows one to work backwards from the needed and desired model outputs to the model structures required to produce them.

For example: a very broad goal of a modelling project could be to look at the impacts of invasive alien trees on water supply from a catchment. More specific goals in such a project could be to look at the impacts of the invasive alien trees on the 98% assurance yield of a particular water supply reservoir, on the water level of an aquifer in a particular part of the catchment, or on streamflow at particular places within the catchment (withdrawal points, critical habitat points, etc.) during dry periods of certain recurrence intervals. Looking at these more specific goals would help in determining whether model output at a monthly scale would be sufficient or whether shorter time steps are needed, whether groundwater storage volumes or levels are a needed model output, at what spatial scales one would want streamflow outputs from the model, etc. In this example, the more specific goals would also include defining what alternative land cover states would be used as the reference for determining the "impact" of the invasive alien trees. This would influence the model structure and the needed tool capabilities in terms of how different land cover types and properties can be represented.
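
To make one of these specific goals concrete, below is a minimal sketch (in Python, with hypothetical names and synthetic inflows; a real yield analysis would use long observed or stochastically generated inflow sequences and a full system model, including evaporation and environmental releases) of estimating the fixed draft a single reservoir could supply at roughly 98% monthly assurance:

```python
import numpy as np

def monthly_assurance(inflow, draft, capacity):
    """Fraction of months in which a fixed monthly draft is fully met,
    using a simple reservoir mass balance (all volumes in the same
    units, e.g. Mm^3; evaporation and other losses ignored for brevity)."""
    storage, months_met = capacity, 0          # start full (an assumption)
    for q in inflow:
        storage = min(storage + q, capacity)   # add inflow, spill at capacity
        supplied = min(draft, storage)         # supply what is available
        storage -= supplied
        months_met += supplied >= draft
    return months_met / len(inflow)

def yield_at_assurance(inflow, capacity, target=0.98, tol=1e-3):
    """Bisection for the largest fixed draft meeting the target
    assurance over the supplied inflow record."""
    lo, hi = 0.0, float(np.mean(inflow))       # yield cannot exceed mean inflow
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if monthly_assurance(inflow, mid, capacity) >= target:
            lo = mid
        else:
            hi = mid
    return lo

# Synthetic 40-year monthly inflow record, purely for illustration
inflow = np.random.default_rng(1).gamma(shape=2.0, scale=5.0, size=480)
print(yield_at_assurance(inflow, capacity=100.0))
```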

Because of data, time, or other constraints, the detailed project goals may need to be revisited and adjusted. Understanding the potential pathways to the ideal goals can help target future work.


Taking stock of available data & information

The data and information available will influence what is possible in a modelling project and the ability to assess and reduce uncertainties. Hydrometric observational data and catchment biophysical property data are used both as direct model inputs and to constrain structure and parameter-value options through calibration and reality checks (see Conceptual model section). The quality, spatial and temporal resolution, and time period of the data influence how they can be used in modelling and what assumptions need to be made. For most data required for a catchment model, if local observational data are not available, national and global datasets of various kinds (see data sources page) can be used to obtain starting estimates, recognising potential scaling issues.
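
As one small illustration of taking stock of a record before using it, the sketch below (assuming a pandas time series with a DatetimeIndex; the file and column names are hypothetical) summarises the record period, dominant time step, and gap fraction:

```python
import pandas as pd

def stocktake(series: pd.Series) -> dict:
    """Summarise the record period, dominant time step, and missing
    fraction of an observational series with a DatetimeIndex."""
    step = series.index.to_series().diff().mode().iloc[0]
    return {"start": series.index.min(),
            "end": series.index.max(),
            "time_step": step,
            "missing_fraction": float(series.isna().mean())}

# Hypothetical usage with a daily flow record read from a CSV file:
# flows = pd.read_csv("gauge_record.csv", index_col=0, parse_dates=True)["flow"]
# print(stocktake(flows))
```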

The scale and quality of the data and information about the catchment have implications for the level of detail in a model structure. The idealised specific goals of a modelling project may point toward a model with a highly detailed spatial and vertical structure; however, without commensurate data to parameterise or evaluate a model at this level of detail, the structure (and the specific goals) may need to be revisited, acknowledging the uncertainties involved.


Developing 'conceptual' models of catchment processes

Describing and laying out a 'conceptual' or 'perceptual' model (see Terminology) of the catchment(s) being modelled is an important step in deciding on a structure for the numerical model. This will also assist in reality-checks on the numerical model and in recognising what simplifications, implicit representations, and assumptions are involved. A conceptual model is your set of hypotheses about processes in the catchment, based on landscape analyses (e.g. topography, climate, spatial distribution of rainfall, geology, land cover type distribution, etc.), diagnostic patterns in the available hydrometric data (e.g. runoff ratio and its variability, flashy streamflow vs slow or prolonged responses to rainfall events, seasonality, groundwater variability, etc.), and previous studies in the catchment or in comparable settings. Identifying what runoff processes are likely to be dominant in different parts of the landscape, and what flow paths are likely to connect different areas, can guide choices of how to break up the catchment area into modelled units and how these should ideally be connected. It also helps to highlight what is well understood and what is highly uncertain (i.e. where multiple potential explanations exist for an observed pattern and it is not clear which apply in this case).
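
As an illustration of the diagnostic patterns mentioned above, here is a minimal sketch (assuming aligned catchment-average rainfall and streamflow series in consistent depth units; the Richards-Baker index is used as one of several possible flashiness measures):

```python
import numpy as np

def runoff_ratio(rain_mm, flow_mm):
    """Long-term runoff ratio from aligned rainfall and streamflow series,
    both as catchment-average depths (mm per time step)."""
    return np.nansum(flow_mm) / np.nansum(rain_mm)

def rb_flashiness(flow):
    """Richards-Baker flashiness index: sum of absolute step-to-step
    changes in flow divided by total flow; higher means flashier."""
    flow = np.asarray(flow, dtype=float)
    return np.nansum(np.abs(np.diff(flow))) / np.nansum(flow)

# Synthetic daily series, purely for illustration
rng = np.random.default_rng(0)
rain = rng.gamma(0.3, 10.0, size=365)          # mm/day
flow = 0.3 * rain + 1.0                        # crude stand-in for a gauged record
print(runoff_ratio(rain, flow), rb_flashiness(flow))
```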

The conceptual model description should extend to the scenarios being modelled in the project, i.e. how, and by how much, the changes to be modelled are likely to affect different processes and connections at different scales. The numerical modelling is being done to answer these questions in more detail for a site, but existing understanding of the likely mechanisms, directions, and magnitudes of change from relevant field studies can help design and check the modelling.

Using pre-existing numerical modelling software tools constrains the design of the model: the spatial and vertical units, how they can be connected, how processes are calculated, etc. It is reasonable to expect that some aspects of your conceptual model of a catchment cannot be directly represented, within a given tool, in the way you believe they occur in reality (e.g. in the model, can a floodplain area receive subsurface flow from surrounding mountain areas? Can a wetland area be fed by channel overflow? Can vegetation ET be fed by groundwater?). Identifying such gaps is important. There may be ways to represent certain processes implicitly or indirectly within a given model structure. This should be done consciously and be evaluated, especially if the implicitly represented processes are central to the question the model is being used to answer and/or have a large impact on the key model outputs for the project.


Selecting a model structure & a modelling tool

In an ideal situation, one would derive the model structure needs (e.g. separately represented cover types, subsurface layers, etc.; how these are connected at the surface and in the subsurface; model output types, locations, and spatial and temporal scales) from the conceptual model of catchment processes, the available data, and the specific goals of the modelling exercise, and then select a software tool capable of building a model with the desired structure. More often than not, however, we are additionally constrained by time and other practicalities. The selection of a modelling tool can also be informed by our familiarity with a tool and the time available to learn a new one sufficiently well, awareness of what is possible across different tools, the time and effort required to set up and run different tools (see section below), available computing power, and access to tool licenses where applicable.

Even if a variety of tools are practically available to us for a project, it is possible that no single tool will be able to build the structure we hoped for. In this case we can select the best compromise, proceed using multiple tools, and/or program something outside the existing tools (a new model, a linked module, etc.).

Once a tool is selected, it is helpful to draw out and evaluate potential structures for the model (units, connections, etc.) within the options the software tool offers, in light of the project goals, conceptual model, and data. Even within these constraints, there will be decisions to make. Part of this process is deciding on the parameter values, or ranges of values, to apply in the various algorithms the tool uses, by comparing the available information on biophysical properties with the meanings of the parameters. While compromises and uncertainties in structure decisions are unavoidable, they should be documented to inform future improvement. Comparing outcomes across different structures considered potentially viable/realistic is desirable if the project has resources for this level of effort.
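
As a small sketch of documenting parameter-value decisions (the parameter names, ranges, and justifications below are hypothetical and not tied to any particular tool), the chosen bounds and their basis can be kept together and proposed parameter sets checked against them:

```python
# Hypothetical parameter bounds, kept together with their justification
PARAM_BOUNDS = {
    "soil_store_max_mm": (50.0, 400.0),  # from soil depth & texture mapping
    "baseflow_coeff":    (0.001, 0.1),   # from streamflow recession analysis
    "crop_factor_pine":  (0.8, 1.2),     # from regional ET literature
}

def outside_bounds(params: dict) -> list:
    """Return the names of parameters falling outside their documented bounds."""
    return [name for name, value in params.items()
            if not PARAM_BOUNDS[name][0] <= value <= PARAM_BOUNDS[name][1]]

print(outside_bounds({"soil_store_max_mm": 450.0,
                      "baseflow_coeff": 0.02,
                      "crop_factor_pine": 1.0}))   # -> ['soil_store_max_mm']
```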


Building & running the numerical model

Once the model structure (or several to test) within the selected tool has been decided upon, there is often a fair amount of work involved in actually building it in the software tool. This involves getting all the needed data into the tool's required formats and inputting it. Tools vary significantly in how this is done, how user-friendly the process is, and how easy it is to make user errors along the way and to find those errors once made. Modelling tools also vary significantly in the time they take to run, depending on their complexity and computing strategies, and in how their outputs can be accessed, saved, and in some cases analysed internally by the tool. In some situations the practicalities of building and running the model can alone be motivation to further simplify aspects of the structure, provided this does not pose a major hindrance to reaching the prioritised goals. An inter-comparison of aspects of the model-building process (e.g. the user interface) across the focus tools is given here.


Model validation & calibration

After the model has been built and run and outputs have been obtained, the outputs need to be assessed against observational data as far as possible, and against the conceptual model at a minimum. This process is likely to identify aspects of the model in need of improvement, and ideally the project will have time and resources to test appropriate adjustments. There is a rich literature on model realism assessment, performance assessment, and goodness-of-fit statistics. Linked to these are parameter sensitivity analyses, model output uncertainty analyses, and model calibration approaches.
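
Two widely used goodness-of-fit statistics from that literature are the Nash-Sutcliffe efficiency (NSE) and the Kling-Gupta efficiency (KGE). A minimal sketch of both, assuming aligned, gap-free simulated and observed series:

```python
import numpy as np

def nse(sim, obs):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 performs no
    better than the mean of the observations, negative is worse."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(sim, obs):
    """Kling-Gupta efficiency: combines correlation (r), bias ratio
    (beta), and variability ratio (alpha); 1 is a perfect fit."""
    sim, obs = np.asarray(sim, float), np.asarray(obs, float)
    r = np.corrcoef(sim, obs)[0, 1]
    beta = sim.mean() / obs.mean()
    alpha = sim.std() / obs.std()
    return 1.0 - np.sqrt((r - 1) ** 2 + (beta - 1) ** 2 + (alpha - 1) ** 2)
```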

Some form of assessment of the validity of the model outcomes against independent measures of realism is always necessary, regardless of the care taken in the model construction and the quality of the inputs. The approach taken in a project will vary depending on the project goals and resources available. Even if local time-series observational data of streamflow are not available, other information and data sources can be consulted to provide realism checks on different model outputs, e.g. patterns of flow in gauged catchments considered to be similar, local resident accounts of the extent and duration of major flood events, measurements or remote sensing estimates of ET in dominant vegetation types, borehole water level information, etc. If the model is producing outputs that don’t conform to expected patterns based on independent information, the structure and parameterisation, the input data, and/or potentially the conceptual model may need re-evaluation. A first port of call is error-checking the model set-up!

Uncertainty is introduced into modelling at every step: the input data, the data we validate against, and the choice of model structure with its algorithms and parameter values. This does not mean that models won’t provide helpful information about likely catchment processes and how they could change, but we should be realistic in our expectations of accuracy and attempt to get an idea of the scale of the uncertainty. Different modelling applications will require different degrees of certainty for the output to be of practical use. Thresholds of acceptability should be identified, but will take different forms for different projects.

Some modelling software tools have in-built uncertainty analysis, sensitivity analysis, and calibration tools that facilitate testing many potential parameter value sets (information on the focus tools is given here). These are generally designed to target parameter value uncertainty and won't, in themselves, address problems with the input data or validation data, or with the structure in terms of units, connections, and algorithms. They assess the degree to which model outputs vary, or can be brought closer to observational data, by changing the parameter values. There is always some degree of uncertainty in parameter values, given data availability, the scale of measurement vs the scale of modelling, simplifications of physical processes in the model's algorithms, etc. However, bounds can be placed on potentially realistic values for each parameter based on the existing biophysical information and an understanding of the parameter's meaning. If reasonable performance cannot be achieved with parameter sets selected within these bounds, this suggests limitations of the structure in representing the catchment processes and/or problems with the data.
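
Where a tool does not provide this, a simple external version is possible. Below is a minimal sketch of uniform Monte Carlo sampling within documented parameter bounds (the bounds and the `run_model` call are hypothetical placeholders for a specific tool's parameters and run wrapper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical bounds; see the parameter-bounds sketch above
BOUNDS = {"soil_store_max_mm": (50.0, 400.0),
          "baseflow_coeff": (0.001, 0.1)}

def sample_params(n):
    """Draw n parameter sets uniformly within the documented bounds."""
    return [{name: rng.uniform(lo, hi) for name, (lo, hi) in BOUNDS.items()}
            for _ in range(n)]

# Hypothetical usage: run_model() stands in for whatever call builds and
# runs the model with parameter set p and returns the simulated series;
# nse() is the goodness-of-fit function sketched earlier.
# scores = [nse(run_model(p), obs) for p in sample_params(1000)]
```
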

It is possible for multiple different sets of parameter values to produce equivalent levels of performance against the modeller’s criteria for acceptability. This is known as equifinality. This can indicate that there isn’t enough observational data to resolve differences between some parameter set options, given the number of parameters we are uncertain about and the range of values we think each one could realistically take. More data with which to check model performance or model realism (potentially of different kinds), and more information to constrain the range of potential parameter values that are considered reasonable, can assist in reducing equifinality. Equifinality can also have to do with the model structure. For example, one set of parameters may produce better fitting model outputs for dry periods, while another set of parameter values has better outputs for wet periods. The two sets have comparable outcomes for overall combined statistics. This could be because properties or processes of the catchment actually shift more over time than the model structure itself allows for. It takes exploration of the ‘parameter space’ to identify equifinality and not all modelling tools facilitate this. When equifinality is identified, a suite of similarly acceptable parameter sets can be applied and the range of the outputs of this set of models can be presented as the project outcome, rather than relying on the model outputs of a single set of parameter values. Different modelling tools, and structures, make this a simpler or harder task to complete.