Greater transparency is needed on water quality modelling in the UK - written evidence to Environmental Audit Committee
Published:
The following was submitted by myself and Dr Gareth Roberts as written evidence to the Environmental Audit Committee’s non-inquiry session into water quality, published here. We highlight that computational tools being used by regulators and industry to divert billions of pounds of investment into are not transparent or readily accessible. We suggest that making software used to assess water quality open source is a crucial step towards developing trust in modelling results and consequent decisions.
Summary
Computational tools are a crucial component in assessing the impact of water companies and other industries on water quality. They are used for a variety of purposes including assessment of chemical risk to the environment and humans, to justify billions of pounds spent on water quality improvements, and to identify sources of pollutants requiring intervention. With important financial and environmental intervention decisions being made based on output from computational models, we think it is incumbent on software developers and users to make underlying code and results available for scrutiny by the public. We are concerned that software widely used by public bodies and the water industry to assess the impact of pollution on waterways in the UK is not transparent, readily accessible or interoperable, limiting opportunities for independent scrutiny and development. Our view is that making software used to assess and predict water quality open source is a crucial step towards developing trust in modelling results and consequent decisions. Moreover, we suggest that by doing so, a larger pool of experts and motivated communities could contribute to the assessment and development of the toolset required to understand and address the freshwater contamination crisis. Recommendations for how software could be made widely accessible are provided.
Written Evidence
Question being addressed
We are providing evidence to address the question “How can water companies and other industries be more transparent about their impact on water quality?”1.The principal focus of our contribution is on transparency of computational tools widely used by government and industry to understand the source and fate of pollutants in waterways. This submission is particularly concerned with the computational tools used to apportion pollutants to their sources.
Expertise
We are Earth scientists based at Imperial College London and the University of Oxford. During the last fifteen years we have developed expertise in the development of computational tools and their application to a variety of problems in the Earth and environmental sciences. Directly relevant to this call for evidence is our work during a recent project funded by the Natural Environment Research Council2 developing and applying new computational methods to identify and apportion sources of pollutants in river networks from water quality data3
The need for source apportionment modelling and its challenges
Identifying sources of pollutants in waterways is crucial for targeted remediation and for enforcing the “Polluter Pays” Environmental principle4, for instance. However, doing so is challenging for fundamental scientific and logistical reasons. The principle scientific challenge is disentangling contributions of pollutants from diverse sources. All catchments can contain multiple sources of many contaminants. They can be point (e.g., wastewater outflows) and diffuse (e.g., agricultural run-off), be unknown and change in space and time. From a logistical perspective, direct observations (i.e. measurements of contaminant concentrations) are often unavailable—many sources are not or cannot be directly monitored—instead they are often estimated using computational models. For accurate source apportionment, contributions from different sectors (e.g., agriculture, road runoff) across a catchment must be quantified and associated uncertainties estimated. We note that in general, pollution apportionment is challenging, and the methods to do so are of interest to diverse stakeholders. For instance, in their Written Evidence to EAC’s previous Water Quality in Rivers Report5, the National Farmers’ Union explicitly mentioned concerns about how pollution is apportioned, especially in light of data misreporting from the water industry.
In our own work developing computational tools for pollution apportionment6,7,8, we have learned that source apportionment is by no means a straightforward modelling exercise. For instance, there are a variety of modelling choices that must be made, large numbers of model parameters must frequently be included, and there are often few data constraints available for calibration or validation. Consequently, we understand how easy it can be for models to be poorly calibrated, and over- or under-fit observed data. To enable independent assessment of confidence in results we think it is crucial that models are easily accessible and open source.
We note that understanding how chemicals are sourced, move through environments and their ultimate fate remain active areas of scientific research9 . For instance, evolution of chemical watchlists emphasises the need for continued research into the role of chemicals of emerging concern in the environment10 . Models developed to predict concentrations of specific contaminants in specific environments and climatic regimes may need significant revision, or be inappropriate, for predicting the fate of other contaminants. Validation using independent information is often crucial for developing confidence in results.
Popular computational tools: Uses, barriers and examples
Perhaps the most widely used computational tool for assessing the chemical state of waterways in the UK is the Source Apportionment Geographical Information System (SAGIS) model11. SAGIS was developed by UK Water Industry Research (UKWIR), a private research organisation sponsored by the Water Industry. SAGIS is widely used by the water industry as well as public bodies such as the Environment Agency, Natural Resources Wales, Natural England, and the Scottish Environment Protection Agency. A Research Excellence Framework 2021 Impact Case Study12 indicated that that model ‘enabled £4.0 billion investment to reduce chemical pollution entering UK rivers’. Moreover, according to the UKWIR website ‘SAGIS-SIMCAT is now firmly embedded within business-as-usual processes and will serve as the regulatory agencies’ primary catchment planning tool until at least 2027’13.
The SAGIS software, to the best of our knowledge, is not publicly available. To our knowledge, it is proprietary and the documentation required to understand how it works is contained in UKWIR research reports14 that must be purchased. To obtain the model and this documentation a license costing £600 must be obtained from UKWIR, at which point “UKWIR will provide contact details for the Environment Agency to provide you with a link to download the executables”15. To operate the model, further licenses for separate proprietary datasets are required16. Whilst an academic article17 describing broadly how SAGIS works was published in the academic literature over a decade ago, we understand that the model was been updated significantly since and no up-to-date documentation is, to our knowledge, available to the public without purchase.
In our view, financial and other barriers to accessing software, such as SAGIS, has unfortunate consequences. Whilst purchasing a software license for hundreds of pounds may be reasonable for large organisations (e.g., a water company), it is a prohibitive barrier to access for many (e.g., voluntary community groups, the general public). We think that it is unreasonable to expect stakeholders to have trust in outputs from a computational model if documentation describing how it works, and the code, cannot be inspected (without significant financial outlay). We suggest that accessibility and transparency regarding modelling assumptions and in the approaches used to generate results would build confidence and trust in subsequent decisions and recommendations.
Our experience includes one of us recently being asked by a community river group to review output from SAGIS modelling produced by the EA. The group in question were seeking independent expert opinion so that they could better understand implications and uncertainties of the model and interpret the results accordingly. Unfortunately, due to the limited public information regarding how SAGIS works, it was not possible to do so with confidence. Requested documentation was not provided by the EA, and instead they stated that it could be viewed only if a SAGIS license was purchased from UKWIR18. We discuss opportunities, as we see them, to having a more transparent and accessible approach to modelling pollution of waterways in the following paragraphs.
Opportunities from improved transparency and accessibility
Understanding outputs from code developed and used primarily by private organisations (e.g., water companies) and public bodies (e.g., the Environment Agency) would benefit from scrutiny by a broader pool of expert (e.g., academic) and motivated communities. We note that motivated and skilled members of the public have been able to make significant, highly technical, contributions to inform the state of English rivers. For example, Windrush Against Sewage Pollution’s use of machine learning (a form of AI) methods to detect untreated sewage discharges19. We suggest that by offering software up for general scrutiny, errors and shortcomings are more likely to be identified and resolved.
Recommendations
We recommend that all computational tools and models used for water quality assessment by public bodies and industry, for instance SAGIS, are made open source and free to access. These tools could be published according to emerging standards of software transparency (e.g., the FAIR principles: Findable, Accessible, Interoperable, Reusable20). We note that the UK Government’s Central Digital and Data Office already provides a ‘Technology Code of Practice’21, which includes recommendations such as ‘Publish your code and use open source software to improve transparency, flexibility and accountability22’ and ‘Build technology that uses open standards to ensure your technology works and communicates with other technology, and can easily be upgraded and expanded23’. We suggest such principles should be adopted.
An example of good practice that we are aware of is the “SPAtially Referenced Regression On Watershed attributes” (SPARROW) model, produced by the United States Geological Survey (USGS). SPARROW fulfils a very similar purpose to the SAGIS model previously discussed, assessing the impact of various pollutant sources on water quality in rivers. The USGS have made SPARROW available for free in multiple programming languages along with the source code, examples of use, and documentation. We suggest a similar approach ought to be adopted.
Finally, we note that researchers funded by UK Research and Innovation are given guidance on providing open access software and data24. This guidance includes depositing software in open access data portals such as the Environmental Information Data Centre. We suggest that assessment of the impact of industry on water quality would benefit if similar standards were applied. We emphasise here that we are suggesting that source code of software should be open access and easily accessible, not just the results obtained from running the software.
References
1 https://committees.parliament.uk/committee/62/environmental-audit-committee/news/201014/is-thegovernment-doing-enough-to-fix-the-uks-rivers-committee-takes-stock-of-water-quality-progress/
2 https://gotw.nerc.ac.uk/list_full.asp?pcode=NE%2FX010805%2F1&classtype=ENRIs&classification=Pollutio n+and+Waste
3 https://doi.org/10.1016/j.scitotenv.2024.172827
4 https://www.gov.uk/government/publications/environmental-principles-policy-statement/environmentalprinciples-policy-statement
5 https://committees.parliament.uk/writtenevidence/22433/pdf/
6 https://gotw.nerc.ac.uk/list_full.asp?pcode=NE%2FX010805%2F1&classtype=ENRIs&classification=Pollutio n+and+Waste
7 https://doi.org/10.31223/X5708M
8 https://doi.org/10.1016/j.scitotenv.2024.172827
9 https://www.ukri.org/what-we-do/browse-our-areas-of-investment-and-support/understanding-changes-inquality-of-uk-freshwaters/
10 https://watereurope.eu/european-commission-adopts-revised-surface-water-watch-list/
11 https://sagis.ukwir.org/sagis/welcome
12 https://results2021.ref.ac.uk/impact/d61134b8-b7dd-41a8-bf39-ebbb106be0d4?page=1
13 https://sagis.ukwir.org/sagis/welcome
14 https://ukwir.org/eng/topic-catalogue?object=36227da2-ddfc-4597-887a-0f4bb9f710c4
15 https://sagis.ukwir.org/sagis/contact-us
16 https://catchmentbasedapproach.org/learn/source-apportionment-gis-tool/
17 https://doi.org/10.1021/es401793e
18 Personal Communication with Environment Agency Water Quality Advisor - 10th April 2024
19 https://www.nature.com/articles/s41545-021-00108-3
20 https://www.nature.com/articles/s41597-022-01710-x
21 https://www.gov.uk/guidance/the-technology-code-of-practice#full-publication-update-history
22 https://www.gov.uk/guidance/be-open-and-use-open-source
23 https://www.gov.uk/government/collections/open-standards-for-government-data-and-technology
24 https://www.ukri.org/publications/ukri-open-access-policy/uk-research-and-innovation-open-access-policy/