Analyzing the disclosure risk of regression coefficients

Ritchie, Felix

Abstract

A major growth area in social science research this century has been access to highly sensitive confidential microdata, often via restricted-access remote facilities. These give researchers almost unlimited freedom to manipulate the data, but statistical results are checked for disclosure risk before they can be published. Effective output-based statistical disclosure control (OSDC) is therefore central to effective use of confidential microdata for research.
Multiple regression is a key analytical tool for researchers, and so knowing whether multiple regression results are ‘safe’ for release is essential for research facilities. This is a relatively unexplored field; guidelines used by almost all restricted-access facilities reference an informal document from 2006, but more recent work suggests that problems may exist.
This paper demonstrates that linear regression coefficients show no substantive disclosure risks in realistic environments, and so should be considered as ‘safe statistics’ in the terminology of this field. Conflicting results in the literature reflect institutional perceptions rather than statistical differences, the confusion of statistical quality with disclosure risk, or the failure to identify the source of risk. The result has important implications for those responsible for providing research access to sensitive data.
The paper explores this result on simple linear regression models; more complex models are shown to be ‘safer’ subsets. Non-linear models pose slightly different problems, but this paper indicates a way such models may be tackled.

Journal Article Type: Article
Publication Date: 2019-08
Journal: Transactions on Data Privacy
Print ISSN: 1888-5063
Peer Reviewed: Peer Reviewed
Volume: 12
Issue: 2
Pages: 145-173
APA6 Citation: Ritchie, F. (2019). Analyzing the disclosure risk of regression coefficients. Transactions on data privacy, 12(2), 145-173
Keywords: privacy, confidentiality, statistical disclosure control, output SDC, principles-based, linear regression, safe statistics
Publisher URL: http://www.tdp.cat/issues16/tdp.a303a18.pdf
Additional Information: This is the accepted manuscript of an item currently in press and due to be published in Transactions on Data Privacy.
