Description
This session groups four papers on controlled vocabularies (CVs): their current possibilities, practical use cases, and experiments with thematic vocabularies.
While the semantic web has been around for over twenty years, practical and sustainable implementations have been thin on the ground. In 2024, the DDI Controlled Vocabularies and the DDI-CDI ontology were made available, for the first time, as persistently resolvable linked open data. This presentation digs into the underlying cloud infrastructure, the rationale for creating it and...
With more than 400 DDI-documented datasets, the catalogue of the Center for Socio-Political Data (CDSP) contains tens of thousands of variables, mainly quantitative survey data collected from structured questionnaires.
With the ultimate goal of producing accurate and consistent training data for a machine learning model (camemBERT), the CDSP’s engineers launched a working group on variable tagging...
This paper describes the development of the KDK Thesaurus, a CV partly based on ELSST and used for the topical discovery of interview materials. The presentation discusses the workload of such a project, its sustainability, and the future prospects of similar projects, including ONTOLISST, and touches on the economic and ethical considerations of metadata curation at smaller levels of datasets...
The talk introduces the new two-year ONTOLISST project, starting in December 2024 and funded by the first OSCARS Cascading grant call. The project will develop a simplified multilingual ontology (LiSST) to describe social science research data, create a corpus of social science metadata, and investigate whether and how NLP tools can help with (semi)automated (meta)data curation. The aim is to better...