Statistical disclosure controls for machine learning models

Krueger, Susan; Mansouri-Benssassi, Esma; Ritchie, Felix; Smith, Jim

Authors

Susan Krueger

Esma Mansouri-Benssassi

Felix Ritchie Felix.Ritchie@uwe.ac.uk
Professor in Economics

Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence

Abstract

Artificial Intelligence (AI) models are trained on large datasets. Where the training data is sensitive, data holders need to consider two sets of risks: those posed by access to the training data, and those posed by the models that are released. The first problem can be considered solved: there are multiple tested solutions delivering secure access to sensitive data for research purposes. These include robust 'statistical disclosure control' (SDC) procedures for checking the confidentiality risk in outputs released from the secure environment. However, these SDC procedures are designed for statistical outputs, and it is not clear how they apply to AI model specifications created within the secure environment. Similarly, there is a small but growing literature on re-identification and other risks from AI models trained on personal data. However, this literature does not consider the operational circumstances which might limit opportunities for misuse. We bring these two fields together to consider:
• Is there any conceptual risk from releasing AI model specifications from a controlled environment?
• If so, is there any practical risk?
• If so, are there effective controls to minimise that practical risk without excessive cost or damage to the data/models?
We show that there is certainly a theoretical risk, which also seems to have practical validity. There are statistical/technical controls that reduce this risk, as well as operational controls that may be relevant for restricted environments. However, a great deal of uncertainty remains, including fundamental questions such as what exactly is 'disclosive' in machine learning (ML) models.

Presentation Conference Type	Conference Paper (published)
Conference Name	2021 Statistical Data Confidentiality Expert Meeting
Start Date	Dec 1, 2021
End Date	Dec 3, 2021
Acceptance Date	Jun 24, 2021
Online Publication Date	Oct 6, 2021
Publication Date	Oct 6, 2021
Deposit Date	Nov 23, 2021
Public URL	https://uwe-repository.worktribe.com/output/8067227
Publisher URL	https://statswiki.unece.org/display/confid/Work+Session+on+Statistical+Data+Confidentiality+2021

Operationalising ‘safe statistics’: The case of linear regression
Preprint / Working Paper

Addressing the human factor in data access: Incentive compatibility, legitimacy and cost-effectiveness in public data resources
Preprint / Working Paper

Resistance to change in government: Risk, inertia and incentives
Preprint / Working Paper

Access to sensitive data: Satisfying objectives rather than constraints (2014)
Journal Article

Evidence-based, context-sensitive, user-centred, risk-managed SDC planning: Designing data access solutions for scientific use (2015)
Presentation / Conference Contribution