
Research Repository


Risk of disclosure when reporting commonly used univariate statistics

Derrick, Ben; Green, Elizabeth; Ritchie, Felix; White, Paul



Authors

Paul White Paul.White@uwe.ac.uk
Professor in Applied Statistics



Abstract

When basic or descriptive summary statistics are reported, the entire sample of observations may be inadvertently disclosed, or members of a sample may be able to work out the responses of others. Three sets of frequently reported univariate summary statistics are considered: the mean and standard deviation; the median and lower and upper quartiles; and the median, minimum and maximum. The methodology assesses how often the full sample can be reverse engineered from the summary statistics. The R package uwedragon is recommended for users to assess this risk for a given data set prior to reporting the mean and standard deviation. It is shown that the disclosure risk is particularly high for small sample sizes on a highly discrete scale. This risk is reduced when alternatives to the mean and standard deviation are reported. An example is given to prompt discussion on the appropriate reporting of summary statistics, with attention also given to the box and whiskers plot, which is frequently used to visualise some of these summary statistics. Six variations of the box and whiskers plot are discussed to illustrate the disclosure issues that may arise. It is concluded that the safest summary statistics to report are the three-number summary of median and lower and upper quartiles, which can be displayed graphically by the literal ‘boxplot’ with no whiskers.
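The idea of reverse engineering a sample from its summary statistics can be illustrated by brute force. The sketch below is a hypothetical illustration, not the uwedragon package or the authors' exact method. It assumes integer responses on a scale 1..k, statistics rounded to 2 decimal places, Python's default ('exclusive') quartile method, and that every distinct sample of size n counts equally. It then reports the share of possible samples that are the only sample consistent with their reported statistics, i.e. samples that would be fully disclosed.

```python
"""Brute-force sketch: how often do summary statistics disclose a whole sample?

Hypothetical illustration, not the uwedragon R package: enumerate every
possible sample of size n on an integer scale 1..k, compute the reported
statistics for each, and count the samples that are the only sample
consistent with their statistics.
"""
from collections import Counter
from itertools import combinations_with_replacement
from statistics import mean, median, quantiles, stdev
from typing import Callable, Sequence, Tuple

# A "summary" maps a sample to the tuple of values that would be reported.
Summary = Callable[[Sequence[int]], Tuple[float, ...]]


def mean_sd(sample: Sequence[int], dp: int = 2) -> Tuple[float, ...]:
    # Mean and sample standard deviation, rounded as they might be published.
    return (round(mean(sample), dp), round(stdev(sample), dp))


def three_number(sample: Sequence[int], dp: int = 2) -> Tuple[float, ...]:
    # Lower quartile, median and upper quartile. Assumes Python's default
    # 'exclusive' quantile method; other quartile definitions give other values.
    q1, q2, q3 = quantiles(sample, n=4)
    return (round(q1, dp), round(q2, dp), round(q3, dp))


def min_median_max(sample: Sequence[int], dp: int = 2) -> Tuple[float, ...]:
    # Minimum, median and maximum (the third set of statistics considered).
    return (min(sample), round(median(sample), dp), max(sample))


def disclosure_rate(n: int, k: int, summary: Summary) -> float:
    """Share of distinct samples that `summary` pins down uniquely.

    Samples are unordered, so each multiset (a sorted tuple) is counted once,
    and every possible sample is weighted equally (an assumption).
    """
    samples = list(combinations_with_replacement(range(1, k + 1), n))
    keys = [summary(s) for s in samples]
    counts = Counter(keys)
    # A sample is fully disclosed if no other sample shares its statistics.
    disclosed = sum(1 for key in keys if counts[key] == 1)
    return disclosed / len(samples)


if __name__ == "__main__":
    for label, stat in [("mean, SD", mean_sd),
                        ("Q1, median, Q3", three_number),
                        ("min, median, max", min_median_max)]:
        print(f"{label:18s} {disclosure_rate(n=5, k=5, summary=stat):.3f}")
```

A rate close to 1 means that most possible data sets of that size and scale would be fully revealed by the reported statistics. Varying n and k gives a rough feel for why small samples on highly discrete scales are most at risk. The number of samples enumerated is C(n + k - 1, n), so exhaustive search like this is only practical for small n and k.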

Citation

Derrick, B., Green, E., Ritchie, F., & White, P. (2022). Risk of disclosure when reporting commonly used univariate statistics. In Lecture Notes in Computer Science (119-129). https://doi.org/10.1007/978-3-031-13945-1_9

Conference Name: Privacy in Statistical Databases
Conference Location: Paris, France
Start Date: Sep 21, 2022
End Date: Sep 23, 2022
Acceptance Date: Jun 17, 2022
Online Publication Date: Sep 14, 2022
Publication Date: 2022-10
Deposit Date: Aug 22, 2022
Publicly Available Date: Sep 15, 2023
Publisher: Springer Verlag
Volume: 13463 LNCS
Pages: 119-129
Series ISSN: 0302-9743
Book Title: Lecture Notes in Computer Science
Chapter Number: 9
ISBN: 9783031139444
DOI: https://doi.org/10.1007/978-3-031-13945-1_9
Keywords: SDC, Statistics, Disclosure, Control, Summary, Quartile, Boxplot
Public URL: https://uwe-repository.worktribe.com/output/9752939
Publisher URL: https://www.springer.com/gp/computer-science/lncs
