2–6 Dec 2024
University of Applied Sciences of the Grisons (Pulvermühlestrasse 57)
Europe/Zurich timezone
Conference Hashtag: #EDDI2024

Metacurate-ML: Enhanced Data Curation - Automation of Disclosure Control Assessment

3 Dec 2024, 11:25
20m
Aula (University of Applied Sciences of the Grisons (Pulvermühlestrasse 57))

Chur, Switzerland
Regular Presentation Metacurate-ML

Speaker

Deirdre Lungley

Description

Conceptual annotations and provenance can provide contextual information to inform a range of data processing activities. In this workstream we will use the metadata generated in the earlier workstreams (the questions and response domains from the metadata extraction phase, and the concorded variables from the conceptual comparison phase) to identify key variables: variables that, although not sensitive in and of themselves, have the potential to be disclosive when used in combination. We will identify these using state-of-the-art text classification methods, which we will also apply to detect other metadata such as identifiers and weight variables. Rule-based classifiers will then interrogate the variable metadata to determine each variable's classification hierarchy and level; for example, a socio-economic variable may be coded using the ONS NS-SeC classification hierarchy at the 8-class analytic level.
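
As a rough illustration of the rule-based step, here is a minimal sketch that assumes a variable's metadata is available as a name, a label (or question text) and a response domain of codes. The keyword list, the missing-value codes and the names `VariableMetadata`, `Classification` and `classify_hierarchy` are hypothetical, and the category counts assume the 8-, 5- and 3-class analytic versions of NS-SeC. It is not the project's implementation.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Optional

# Hypothetical keywords that signal an NS-SeC-coded variable.
NS_SEC_KEYWORDS = ("ns-sec", "nssec", "socio-economic classification")

# Assumed mapping: number of substantive categories -> analytic level.
NS_SEC_LEVELS = {8: "8-class analytic", 5: "5-class analytic", 3: "3-class analytic"}

# Hypothetical negative codes used for missing / non-substantive responses.
MISSING_CODES: FrozenSet[int] = frozenset({-9, -8, -1})


@dataclass
class VariableMetadata:
    name: str              # variable name, e.g. "nssec8"
    label: str             # variable label or question text
    codes: Dict[int, str]  # response domain: code -> category label


@dataclass
class Classification:
    hierarchy: str
    level: str


def classify_hierarchy(var: VariableMetadata) -> Optional[Classification]:
    """Return the classification hierarchy and level, or None if no rule fires."""
    text = f"{var.name} {var.label}".lower()
    # Rule 1: the name or label must mention NS-SeC.
    if not any(keyword in text for keyword in NS_SEC_KEYWORDS):
        return None
    # Rule 2: the number of substantive (non-missing) categories selects the level.
    n_categories = sum(1 for code in var.codes if code not in MISSING_CODES)
    level = NS_SEC_LEVELS.get(n_categories)
    return Classification("ONS NS-SeC", level) if level else None


nssec = VariableMetadata(
    name="nssec8",
    label="Respondent's NS-SeC (analytic classes)",
    codes={**{i: f"Class {i}" for i in range(1, 9)}, -9: "Refused"},
)
print(classify_hierarchy(nssec))
# Classification(hierarchy='ONS NS-SeC', level='8-class analytic')
```

Keeping the rules in plain lookup tables means further hierarchies could be added as new tables without changing the matching logic.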

This enhanced metadata can then be combined with the data itself to provide an enhanced curation platform, one that allows our data curators to evaluate and mitigate the disclosure risk of a dataset with relative ease. The platform will be powered by metadata and microdata stored using the DDI-CDI schema, drawing on features such as its variable cascade.
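
To make "evaluate and mitigate disclosure risk" concrete, the sketch below uses k-anonymity, a standard statistical disclosure control measure, chosen here purely for illustration; the abstract does not say which risk measures the platform will use. Records that share the same combination of key-variable values form an equivalence class, and records in classes smaller than k are treated as at risk. The function names, the toy records and the age-banding recode are all hypothetical.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def _key(record: Record, key_vars: Sequence[str]) -> Tuple[Any, ...]:
    # The combination of key-variable values an intruder could match on.
    return tuple(record[v] for v in key_vars)


def class_sizes(records: List[Record], key_vars: Sequence[str]) -> Counter:
    # Size of each equivalence class (records sharing the same key values).
    return Counter(_key(r, key_vars) for r in records)


def k_anonymity(records: List[Record], key_vars: Sequence[str]) -> int:
    # The dataset is k-anonymous for k equal to its smallest class size.
    sizes = class_sizes(records, key_vars)
    return min(sizes.values()) if sizes else 0


def at_risk(records: List[Record], key_vars: Sequence[str], k: int = 2) -> List[int]:
    # Indices of records whose equivalence class has fewer than k members.
    sizes = class_sizes(records, key_vars)
    return [i for i, r in enumerate(records) if sizes[_key(r, key_vars)] < k]


def recode(records: List[Record], var: str, fn: Callable[[Any], Any]) -> List[Record]:
    # Global recoding: apply fn to one variable in every record (returns copies).
    return [{**r, var: fn(r[var])} for r in records]


records = [
    {"age": 34, "sex": "F", "region": "North"},
    {"age": 35, "sex": "F", "region": "North"},
    {"age": 37, "sex": "F", "region": "North"},
    {"age": 52, "sex": "M", "region": "South"},
    {"age": 58, "sex": "M", "region": "South"},
    {"age": 81, "sex": "M", "region": "South"},
]
key_vars = ["age", "sex", "region"]  # hypothetically flagged by the classifiers

print(k_anonymity(records, key_vars))   # 1: every record is unique
print(len(at_risk(records, key_vars)))  # 6
banded = recode(records, "age", lambda a: f"{a // 10 * 10}s")
print(at_risk(banded, key_vars))        # [5]: only the 81-year-old remains unique
```

Banding age into decades reduces the at-risk records from six to one; the remaining record could then be handled by a further step such as top-coding.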

Primary authors

Deirdre Lungley; Ivan Evdokimov (University of Essex); Jon Johnson (CLOSER, UCL); Paul Bradshaw (Scottish Centre for Social Research (ScotCen)); Suparna De (University of Surrey)