Machine learning models in trusted research environments - Understanding operational risks

Ritchie, Felix; Tilbrook, Amy; Cole, Christian; Jefferson, Emily; Krueger, Susan; Mansouri-Benssassi, Esma; Rogers, Simon; Smith, Jim

doi:10.23889/ijpds.v8i1.2165

Authors

Felix Ritchie Felix.Ritchie@uwe.ac.uk
Professor in Economics

Amy Tilbrook

Christian Cole

Emily Jefferson

Susan Krueger

Esma Mansouri-Benssassi

Simon Rogers

Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence

Abstract

Introduction
Trusted research environments (TREs) provide secure access to very sensitive data for research. All TREs operate manual checks on outputs to ensure there is no residual disclosure risk. Machine learning (ML) models require very large amounts of data; if these data are personal, the TRE is a well-established data management solution. However, ML models present novel disclosure risks, in both type and scale.

Objectives
As part of a series on ML disclosure risk in TREs, this article is intended to introduce TRE managers to the conceptual problems and to the work being done to address them.

Methods
We demonstrate how ML models present a qualitatively different type of disclosure risk compared to traditional statistical outputs. These risks arise from both the nature and the scale of ML modelling.

Results
We show that a large number of issues remain unresolved, although there is progress in many areas. We show where areas of uncertainty remain, as well as the remedial responses available to TREs.

Conclusions
At this stage, disclosure checking of ML models is very much a specialist activity. However, TRE managers need a basic awareness of the potential risks in ML models to enable them to make sensible decisions on using TREs for ML model development.

Journal Article Type	Article
Acceptance Date	Oct 30, 2023
Online Publication Date	Dec 14, 2023
Publication Date	Dec 14, 2023
Deposit Date	Oct 31, 2023
Publicly Available Date	Jan 3, 2024
Journal	International Journal of Population Data Science
Electronic ISSN	2399-4908
Publisher	Swansea University
Peer Reviewed	Peer Reviewed
Volume	8
Issue	1
Article Number	2165
DOI	https://doi.org/10.23889/ijpds.v8i1.2165
Keywords	Artificial intelligence, Confidentiality, Machine Learning, Data Enclave, Trusted Research Environment, Output Checking, Disclosure
Public URL	https://uwe-repository.worktribe.com/output/11404305
PMID	38414545

Files

Machine learning models in trusted research environments – understanding operational risks (1.3 Mb)
PDF

Licence
http://creativecommons.org/licenses/by/4.0/

Publisher Licence URL
http://creativecommons.org/licenses/by/4.0/

Operationalising ‘safe statistics’: The case of linear regression
Preprint / Working Paper

Addressing the human factor in data access: Incentive compatibility, legitimacy and cost-effectiveness in public data resources
Preprint / Working Paper

Resistance to change in government: Risk, inertia and incentives
Preprint / Working Paper

Access to sensitive data: Satisfying objectives rather than constraints (2014)
Journal Article

Evidence-based, context-sensitive, user-centred, risk-managed SDC planning: Designing data access solutions for scientific use (2015)
Presentation / Conference Contribution
