Confidentiality and linked data

Ritchie, Felix; Smith, Jim

Authors

Jim Smith James.Smith@uwe.ac.uk
Professor in Interactive Artificial Intelligence



Contributors

Gentiana Roarson
Editor

Abstract

This chapter considers the confidentiality issues around linked data. It notes that the use and availability of secondary (administrative or social media) data, allied to powerful processing and machine learning techniques, in theory means that re-identification of confidential source data is likely in all types of releases.
In practice, there are barriers. Data linking is a complex and difficult process, and there are many things that could go wrong. However, this is less of a problem for a potential intruder, who is not concerned about re-identifying all data, but just enough to achieve his or her ends; the accuracy of the re-identification may not even be important if there is just a perception of poor confidentiality protection.
More importantly, focusing on the data-centred models misleads us into thinking "what can go wrong?" instead of "what will go wrong?". Aggregate statistics can be attacked to show hidden numbers of observations, but this does not necessarily disclose confidential information; aggregate statistics that could re-identify source data are not typically useful statistics. For the release of microdata, a user-centred perspective allows one to consider a range of non-statistical solutions which are both robust and fairly future-proof.
In summary, linked data does present a strong theoretical challenge to the protection of data, as statistical protection is outgunned by technology and software; but in practice a shift in focus to the evidence-based user-centred view shows that there are many directions for practical data protection to go.

Peer Reviewed
Pages 1-34
Series Title National Statistician's Quality Review
Book Title Privacy and Data Confidentiality Methods – a National Statistician’s Quality Review
APA6 Citation Ritchie, F., & Smith, J. Confidentiality and linked data. In G. Roarson (Ed.), Privacy and Data Confidentiality Methods – a National Statistician’s Quality Review, 1-34. Office for National Statistics
Keywords confidentiality, privacy, linked data, artificial intelligence, data mining, machine learning, data-centred, user-centred
Publisher URL https://gss.civilservice.gov.uk/guidances/quality/nsqr/privacy-and-data-confidentiality-methods-a-national-statisticians-quality-review/