Ben Derrick Ben.Derrick@uwe.ac.uk
Senior Lecturer
Elizabeth Green Elizabeth7.Green@uwe.ac.uk
Senior Lecturer in Economics
Felix Ritchie Felix.Ritchie@uwe.ac.uk
Professor in Economics
Paul White Paul.White@uwe.ac.uk
Professor in Applied Statistics
When basic or descriptive summary statistics are reported, the entire sample of observations may be inadvertently disclosed, or members of the sample may be able to work out the responses of others. Three frequently reported sets of univariate summary statistics are considered: the mean and standard deviation; the median and lower and upper quartiles; and the median, minimum and maximum. The methodology assesses how often the full sample of results can be reverse engineered from the summary statistics. The R package uwedragon is recommended for users to assess this risk for a given data set before reporting the mean and standard deviation. It is shown that the disclosure risk is particularly high for small sample sizes on a highly discrete scale. This risk is reduced when alternatives to the mean and standard deviation are reported. An example is given to prompt discussion on appropriate reporting of summary statistics, with attention also given to the box and whiskers plot, which is frequently used to visualise some of the summary statistics. Six variations of the box and whiskers plot are discussed to illustrate the disclosure issues that may arise. It is concluded that the safest summary statistics to report are the three-number summary of the median and the lower and upper quartiles, which can be displayed graphically by the literal ‘boxplot’ with no whiskers.
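The reverse-engineering risk described in the abstract can be illustrated with a brute-force search. The sketch below is not the authors' method and does not use the uwedragon package. It assumes a hypothetical helper, `matching_samples`, that enumerates every possible sample of size n on a discrete scale and keeps those whose rounded mean and sample standard deviation agree with the reported values. The rounding to two decimal places and the 1 to 5 scale are also assumptions made for illustration.

```python
from itertools import combinations_with_replacement
from statistics import mean, stdev


def matching_samples(n, scale, reported_mean, reported_sd, decimals=2):
    """Hypothetical helper: list every unordered sample of size n drawn from
    `scale` whose rounded mean and sample SD equal the reported values."""
    target_mean = round(reported_mean, decimals)
    target_sd = round(reported_sd, decimals)
    matches = []
    # Order is irrelevant for summary statistics, so enumerate multisets only.
    for candidate in combinations_with_replacement(scale, n):
        if (round(mean(candidate), decimals) == target_mean
                and round(stdev(candidate), decimals) == target_sd):
            matches.append(candidate)
    return matches


# Five responses on a 1-5 scale, reported as mean 1.80 and SD 1.79.
found = matching_samples(5, range(1, 6), 1.80, 1.79)
print(found)  # [(1, 1, 1, 1, 5)]
```

In this small case only one sample is consistent with the reported mean and standard deviation, so publishing the two statistics discloses every response. Exhaustive enumeration grows quickly with sample size and scale length, so this approach is only practical for the small, highly discrete settings the abstract identifies as high risk.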
Presentation Conference Type | Conference Paper (published) |
---|---|
Conference Name | Privacy in Statistical Databases |
Start Date | Sep 21, 2022 |
End Date | Sep 23, 2022 |
Acceptance Date | Jun 17, 2022 |
Online Publication Date | Sep 14, 2022 |
Publication Date | 2022-10 |
Deposit Date | Aug 22, 2022 |
Publicly Available Date | Sep 15, 2023 |
Publisher | Springer Verlag |
Volume | 13463 LNCS |
Pages | 119-129 |
Series ISSN | 0302-9743 |
Book Title | Lecture Notes in Computer Science |
Chapter Number | 9 |
ISBN | 9783031139444 |
DOI | https://doi.org/10.1007/978-3-031-13945-1_9 |
Keywords | SDC, Statistics, Disclosure, Control, Summary, Quartile, Boxplot |
Public URL | https://uwe-repository.worktribe.com/output/9752939 |
Publisher URL | https://www.springer.com/gp/computer-science/lncs |
The risk of disclosure when reporting commonly used univariate statistics
Licence
http://www.rioxx.net/licenses/all-rights-reserved
Publisher Licence URL
http://www.rioxx.net/licenses/all-rights-reserved
Copyright Statement
This is the author’s accepted manuscript. The final published version is available here: https://doi.org/10.1007/978-3-031-13945-1_9