2–6 Dec 2024
University of Applied Sciences of the Grisons (Pulvermühlestrasse 57)
Europe/Zurich timezone
Conference Hashtag: #EDDI2024

“You are what you eat”: Deploying a method for creating an accurate ML model for variable tagging using ELSST vocabularies

3 Dec 2024, 13:35
20 min
Aula (University of Applied Sciences of the Grisons (Pulvermühlestrasse 57))

Chur, Switzerland

Speakers

Jieun Jeong (Centre for socio-political data, Sciences Po, CNRS)
Lucie MARIE (Centre for socio-political data, Sciences Po, CNRS)

Description

With more than 400 DDI-documented datasets, the catalogue of the Centre for Socio-Political Data (CDSP) contains tens of thousands of variables, mainly quantitative survey data collected through structured questionnaires.
With the ultimate goal of producing accurate and consistent training data for a machine learning model (CamemBERT), CDSP engineers launched a working group to tag variables with keywords from the French version of the European Language Social Science Thesaurus (ELSST).
Drawing on experiments with machine learning for variable-level classification, this paper evaluates how well machines can process and classify large datasets, while emphasising the accuracy and contextual understanding that human experts provide.
The presentation reports on the methodology developed for the human tagging process, which is designed to minimise bias and produce a harmonised classification.

Primary authors

Jieun Jeong (Centre for socio-political data, Sciences Po, CNRS)
Lucie MARIE (Centre for socio-political data, Sciences Po, CNRS)
Mathieu Olivier (Centre for socio-political data, Sciences Po, CNRS)