Statistical disclosure controls for machine learning models
Krueger, Susan; Mansouri-Benssassi, Esma; Ritchie, Felix; Smith, Jim
Authors
Susan Krueger
Esma Mansouri-Benssassi
Felix Ritchie Felix.Ritchie@uwe.ac.uk
Professor in Economics
Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence
Abstract
Artificial Intelligence (AI) models are trained on large datasets. Where the training data is sensitive, the data holders need to consider risks posed by access to the training data and risks posed by the models that are released. The first problem can be considered solved: there are multiple tested solutions delivering secure access to sensitive data for research purposes. These include robust 'statistical disclosure control' (SDC) procedures for checking the confidentiality risk in outputs released from the secure environment. However, these SDC procedures are designed for statistical outputs. It is not clear how they relate to AI model specifications created within the secure environment. Similarly, there is a small but growing literature on re-identification and other risks from AI models trained on personal data. However, this literature does not consider the operational circumstances which might limit opportunities for misuse. We bring these two fields together to consider three questions:
• Is there any conceptual risk from releasing AI model specifications from a controlled environment?
• If so, is there any practical risk?
• If so, are there effective controls to minimise that practical risk without excessive cost or damage to the data/models?
We show that there is certainly a theoretical risk, which also seems to have practical validity. There are statistical/technical controls to reduce risk, as well as operational controls which might be relevant for restricted environments. However, there remains a very large degree of uncertainty, including such fundamental questions as what exactly is 'disclosive' in ML models.
Citation
Krueger, S., Mansouri-Benssassi, E., Ritchie, F., & Smith, J. (2021). Statistical disclosure controls for machine learning models
Conference Name | 2021 Statistical Data Confidentiality Expert Meeting |
---|---|
Conference Location | Poznan |
Start Date | Dec 1, 2021 |
End Date | Dec 3, 2021 |
Acceptance Date | Jun 24, 2021 |
Online Publication Date | Oct 6, 2021 |
Publication Date | Oct 6, 2021 |
Deposit Date | Nov 23, 2021 |
Publicly Available Date | Mar 29, 2024 |
Public URL | https://uwe-repository.worktribe.com/output/8067227 |
Publisher URL | https://statswiki.unece.org/display/confid/Work+Session+on+Statistical+Data+Confidentiality+2021 |