17th European DDI Users Conference, Budapest

(Timezone: Europe/Budapest)
Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
Conveners: Alina Danciu (Sciences Po, Center for Socio-Political Data (CDSP)), Jon Johnson (CLOSER, UCL)
Description

Building and supporting the DDI Community

EDDI is the annual conference for users of DDI, a suite of metadata specifications for the social, economic, and behavioral sciences.

EDDI meets once a year in the first week of December, bringing together users of DDI from archives, studies, official statistics, commercial organisations, government and non-governmental organisations, and more.

Program Committee
    • 08:30–16:00
      An introduction to DDI for small organisations
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      • 08:30
        Registration 30m
      • 09:00
        An introduction to DDI for small organisations 30m

        The purpose of this all-day meeting is to provide an overview for organisations, such as archives and research institutes, that have restricted budgets or limited personnel but would like to provide – or learn more about how to provide – access to data using the DDI standard.

        Speaker: Alina DANCIU (Sciences Po, Center for Socio-Political Data (CDSP))
      • 09:30
        Case Studies from Existing Users 1h

        Running an Archive in Slovenia (ADP)
        Supporting the Development of Archives in Europe (CESSDA)

        Speakers: Alen Vodopijevec (CESSDA ERIC), Maja Dolinar (Slovenian Social Science Data Archives (ADP), Faculty of Social Sciences, University of Ljubljana)
      • 10:30
        Coffee 15m
      • 10:45
        Case Studies from Existing Users (II) 1h 45m

        Establishing a new archive (CROSSDA)
        Supporting Research Visibility (ELTE RDC)

        Speakers: Marijana Glavica (University of Zagreb Faculty of Humanities and Social Sciences), Judit Gárdos
      • 12:30
        Lunch 45m
      • 13:15
        DDI Alliance - supporting an Open Metadata Standard 45m

        Developing and maintaining Metadata Standards
        Supporting Users of DDI

        Speakers: Jared Lyle (ICPSR, University of Michigan), Catherine Yuen (ISER), Chloe Hertrich (CDSP, SciencesPO)
      • 14:00
        Tools and software to support DDI 2h

        NADA Tools
        Dataverse
        Using R at the Romanian Data Archive
        Open Data Format with Stata, R & Python (Multiplatform Distribution powered by DDI-Codebook at low cost)

        Speakers: Julie Lenoir (Progedo), Lucie MARIE, Adrian Dusa (University of Bucharest), Knut Wenzig (DIW Berlin/SOEP)
    • 09:00–16:00
      DDI Alliance Scientific Board
      K11, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
    • 13:00–16:00
      Workshops & Tutorial: Workshop
      K13-14, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      • 13:00
        Colectica: Collaborative Data Documentation, Curation, Team Approvals, and Web Discovery Platform 20m

        This workshop explores the expanded Colectica ecosystem, including the Colectica Datasets application for viewing, improving, and publishing data files, as well as new web-based tools for seamless collaborative editing and publication. Running on Windows, macOS, and Linux, Colectica supports formats like DDI, RDF, Parquet, SPSS, SAS, Stata, R, and CSV, while the web tools enable real-time team collaboration.

        Participants will learn to:

        • View: Inspect, visualize, and analyze datasets to gain quick insights.
        • Curate: Apply automated, human-in-the-loop techniques for cleaning, annotating, and documenting datasets with DDI metadata.
        • Collaborate: Use web-based features to edit datasets together in real time and manage team approval processes.
        • Publish: Discover methods to share, export, convert, and archive data in various formats for easy dissemination and preservation.

        Through hands-on activities, attendees will experience both desktop and web functionalities. This workshop is ideal for researchers, data managers, and professionals seeking to advance their data management and collaboration skills and learn about Colectica’s latest tools.

        Speaker: Jeremy Iverson (Colectica)
    • 17:00–19:00
      Guided city tour: Golden Age – Essential Budapest 2h
      Meeting point: in front of the church "Budapest-Belvárosi Nagyboldogasszony Főplébánia-templom", Budapest 1056, Március 15. tér

      Wander through the Golden Age of Budapest, taking in the highlights of the capital's architecture and a brief history of the Hungarians. An overview of Budapest and how it became the Pearl of the Danube and one of the fastest-developing cities of the 19th–20th centuries.

      The main highlights of the tour:
      - Danube Promenade
      - Chain Bridge
      - St. Stephen's Basilica
      - Liberty Square
      - House of Parliament
      - Heroes' Square.

      Meeting point: in front of the church "Budapest-Belvárosi Nagyboldogasszony Főplébánia-templom", Budapest 1056, Március 15. tér (https://maps.app.goo.gl/BUoam2i2KWA7YvsCA)

    • 19:15–21:45
      Informal dinner & drinks 2h 30m
      Inner City Restaurants area

      At own expense (please register by 24th November). Location: three different inner city restaurants.
      Meeting point:

      For those who also take part in the city tour: at 5 PM in front of the church "Budapest-Belvárosi Nagyboldogasszony Főplébánia-templom", Budapest 1056, Március 15. tér (https://maps.app.goo.gl/BUoam2i2KWA7YvsCA).

      For those who do not take part in the city tour: at 7 PM in the restaurant "Terv Presszó since 1954 – Goulash & Langosh Bar", Budapest 1051, Nádor u. 19 (https://maps.app.goo.gl/FDchLVkABUFRSdw38).

    • 08:45–09:15
      Conference Opening
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Conveners: Alina DANCIU (Sciences Po, Center for Socio-Political Data (CDSP)), Jon Johnson (CLOSER, UCL)
      • 09:05
        Welcome from Host 10m
        Speaker: Zsolt Boda (Director General of ELTE Centre for Social Sciences)
    • 09:15–10:15
      Keynote I
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Alina DANCIU (Sciences Po, Center for Socio-Political Data (CDSP))
      • 09:15
        AI-Ready data for the new research lifecycle 1h

        The research lifecycle is being updated across many scientific fields to account for the adoption of AI in scientific exploration and analysis. To better support these changes, AI Readiness is emerging to define good data practices in combination with AI adoption. AI Readiness thus includes the documentation of data, models, and workflows for their use in AI throughout the research lifecycle, from data preparation and data quality to data access. This talk will introduce efforts toward AI readiness and AI-ready data and metadata, providing research examples from the Computational Social Sciences and Humanities Lab at the Barcelona Supercomputing Center.

        Speaker: Mercè Crosas (Barcelona Supercomputing Center)
    • 10:15–10:45
      Coffee 30m
      1st Floor Gallery, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
    • 10:45–12:00
      DDI Overview
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Hilde Orten (Sikt - Norwegian Agency for Shared Services in Education and Research)
      • 10:45
        A lightning talk session from the Scientific Board 1h 15m

        2025 has been the most productive year yet for the Alliance with the number of initiatives, work groups, and products continuing to expand. The Scientific Board maintains a watchful eye over this hive of activity and this set of lightning talks is intended to give you bite-size insights into the latest developments, from tools to standards and all stops in between. We’ll begin with a whistle-stop tour of the differences between the Executive and Scientific Boards and the Working Groups and the way the Alliance manages its work. It’s an exciting time for Quali aficionados and the recently created Qualitative Data Working Group embeds this discipline into future DDI product roadmaps. The release of DDI-CDI 1.0 early this year, as well as being a significant milestone for the cross-domain applicability of DDI, has engendered a series of compelling AI-related activities related to upscaling and automating curation pipelines. A reinvigorated developer group has been busy enabling data professionals to work more seamlessly with DDI standards and to lower the barrier to entry for using DDI products. The next version of Lifecycle - 4.0 - will be fully model-driven for the first time and this talk will explain why you should care. Interoperability is the buzzword of the moment and we now have a Publishing and Access WG which focuses on making DDI a first-class API citizen. The updated DDI website prompted a re-examination of how we make resources available and easily findable for the community. If the phrase “Information Architecture" makes you want to self-harm, this talk will explain why it should give you a dopamine rush instead. Finally, as part of the Alliance’s strategic commitment to expansion, we wrap up with some lightning talks on launching DDI into the ISO and W3C spotlight and where we’re going next. The question is, are you ready for Total DDI?
        1) The Scientific Board for dummies - Darren Bell
        2) The new Qualitative Data WG – Noemi Cabrera Betancort
        3) DDI-CDI and AI - Slava Tykhonov
        4) Tools: from Nesstar to Nectar - Olof Olsson and Oliver Hopt
        5) Training - Amber Leahey
        6) DDI-L 4.0 - Dan Smith
        7) Publishing and Access WG - Knut Wenzig
        8) Why the Alliance needs an Information Architecture - Darren Bell
        9) DDI Alliance and W3C collaboration – Franck Cotton
        10) DDI as an ISO standard - Wendy Thomas
        11) The DDI Alliance - future vision - Steve McEachern
        12) Q&A

        Speakers: Darren Bell (UK Data Service), Noemi Betancort (RDC Qualiservice, University of Bremen), Slava Tykhonov (CODATA), Olof Olsson (Swedish National Data Service (SND)), Oliver Hopt, Amber Leahey (Scholars Portal), Dan Smith (Colectica), Knut Wenzig (DIW Berlin/SOEP), Franck Cotton (Making Sense), Wendy Thomas, Steven McEachern (UK Data Service, University of Essex)
    • 12:00–13:00
      Lunch 1h
      1st Floor Gallery, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
    • 13:00–14:15
      FAIR Data
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Becky Oldroyd
      • 13:00
        Implementation of FAIR principles in Czech science: the search for metadata 25m

        Czech science is undergoing a significant transformation. What used to be only voluntary sharing of scientific data has, with the arrival of European programs and later also within domestic grant funding schemes, become an obligation—an obligation that is even embedded in national legislation. The state is also making substantial investments in supporting repositories and developing a domestic environment compatible with European and global standards and practices, particularly the FAIR principles.

        However, this entire process is very complex and not without its flaws. Using the example of the adoption of a national metadata standard—where the adoption of DDI was considered but ultimately not implemented—I would like to illustrate, from the perspective of a long-time administrator of a data archive in the field of social sciences (and someone involved in the FAIRification processes of Czech science), the complications that accompany this process and how they are being addressed.

        Speaker: Tomáš Čížek (tomas.cizek@soc.cas.cz)
      • 13:25
        Developing and testing harmonisation workflows for comparative survey data using DDI – a WorldFAIR case study 25m

        DDI Lifecycle and DDI-CDI provide significant capabilities for the integration and harmonisation of content across datasets. As part of the recently completed WorldFAIR project led by CODATA, a team from the Australian Data Archive (ADA) and Sikt led a work package examining ways to improve FAIR practices in the management of harmonised content in cross-national social surveys.

        This work was completed in three stages – a review of comparative survey data management practices at Sikt and ADA; development of a human and machine-actionable workflow for harmonisation of social surveys (the Cross-Cultural Survey Harmonisation workflow – CCSH) that leverages DDI and other standards; and a proof-of-concept test of the CCSH workflows leveraging services available at ADA and Sikt through their respective Colectica registries.

        Overall, the pilot demonstrated that the CCSH workflow forms a viable foundation for standardising and progressively automating the process of survey data harmonisation. However, the pilot also showed that a significant degree of manual human input is still required – more work is thus needed for the workflow to be truly FAIR. We therefore provide recommendations for data managers and the Alliance as to how more integration and automation might be achieved in future.

        Speaker: Steven McEachern (UK Data Service, University of Essex)
      • 13:50
        DDI and Scientific Knowledge Graphs Interoperability Framework - CESSDA’s Plans and work so far in the OSTrails Project 25m

        Interoperability of metadata across research infrastructures remains a major challenge for advancing Open Science. Aligning domain-specific standards with domain-agnostic frameworks is crucial for enabling cross-disciplinary discovery and FAIR data reuse.

        The OSTrails project addresses this by developing the Scientific Knowledge Graphs Interoperability Framework (SKG-IF) - a core model and set of API endpoints for metadata harmonisation. As part of its contribution, CESSDA has mapped its metadata standard, DDI Codebook 2.5, to SKG-IF. This mapping supports exposing CESSDA metadata via SKG-IF endpoints, enabling broader interoperability across systems. API development will continue in 2026, so the current endpoints are preliminary.

        As a Horizon Europe initiative supporting EOSC, OSTrails contributes to building open, reusable, and interoperable data infrastructures. CESSDA’s work helps make social science data more FAIR and integrated into the wider EOSC ecosystem.

        This presentation will showcase CESSDA’s contributions to OSTrails, with a focus on the DDI-to-SKG-IF mapping, current API work, and FAIR benchmarking activities. The latter involved defining community-specific FAIR tests, evaluating digital objects, and providing feedback to improve FAIRness - demonstrating a practical path toward greater interoperability and quality in research data.

        Speakers: Markus Tuominen (Finnish Social Science Data Archive, Tampere University), John Shepherdson (CESSDA ERIC)
    • 13:00–14:15
      METACURATE-ML (I)
      K11-12, ELTE Centre for Social Sciences (Research Documentation Centre)

      The ESRC Future Data Services Program has funded CLOSER, the University of Surrey, the UK Data Service, and ScotCen to progress ways in which information from longitudinal social science surveys can be improved in both quality and throughput, addressing the challenges of understanding and utilising these data.

      This session will cover a demonstration of a prototype pipeline from source PDFs into DDI-Lifecycle metadata, and its transformation into DDI-CDI.

      Convener: Sebők Miklós
      • 13:00
        METACURATE-ML: Taking Metadata Uplift from Research to Prototype 25m

        The presentation will demonstrate a prototype pipeline that takes a PDF questionnaire, generates DDI-Lifecycle metadata, and uses it for classification against a vocabulary. This classification can then be leveraged to narrow down and identify variables deemed sensitive, for further analysis and processing to support curation into a suitable access environment.
        The pipeline is built on an AWS architecture: successive AWS Lambda functions are orchestrated using an AWS Step Functions workflow.
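
        A minimal sketch of this kind of orchestration, assuming hypothetical Lambda function ARNs and state names (this is not the project's actual code):

            """Sketch of a three-stage Step Functions pipeline defined and run with boto3."""
            import json
            import boto3

            PIPELINE = {
                "StartAt": "ExtractQuestionnaire",
                "States": {
                    "ExtractQuestionnaire": {  # PDF -> structured question text
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:extract",
                        "Next": "GenerateDDILifecycle",
                    },
                    "GenerateDDILifecycle": {  # questions -> DDI-Lifecycle XML
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:to-ddi",
                        "Next": "ClassifySensitiveVariables",
                    },
                    "ClassifySensitiveVariables": {  # vocabulary classification, sensitivity flags
                        "Type": "Task",
                        "Resource": "arn:aws:lambda:eu-west-2:123456789012:function:classify",
                        "End": True,
                    },
                },
            }

            sfn = boto3.client("stepfunctions")
            machine = sfn.create_state_machine(
                name="questionnaire-pipeline-sketch",
                definition=json.dumps(PIPELINE),
                roleArn="arn:aws:iam::123456789012:role/sfn-exec",  # placeholder role
            )
            sfn.start_execution(
                stateMachineArn=machine["stateMachineArn"],
                input=json.dumps({"bucket": "questionnaires", "key": "wave1.pdf"}),
            )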

        Speaker: Deirdre Lungley
      • 13:25
        METACURATE-ML: Generalising the extraction of questionnaires to DDI-Lifecycle 25m

        For most large-scale surveys, the canonical source remains – for the past and the foreseeable future – the 'paper' version of the questionnaire as a PDF. These documents contain the essential information on the fielding of the survey: the questions asked, including response options, their ordering, and the filtering logic needed to create DDI-Lifecycle metadata. However, there is no standard layout or formatting in these source documents, which limits the scalability of heuristic programming approaches on OCR'ed text.

        The presentation will outline an approach to creating DDI-Lifecycle from semi-structured text in social science questionnaires using a combination of text-layout large language models (LLMs) and knowledge graph construction approaches, including methods from digital signal processing.
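
        For orientation, a hedged sketch of the target serialisation: one extracted question rendered as a DDI-Lifecycle 3.2 QuestionItem with lxml. The element nesting follows the DDI 3.2 data collection and reusable modules as we understand them; the surrounding instrument structure and identification attributes are omitted.

            from lxml import etree

            DDI_DC = "ddi:datacollection:3_2"   # data collection module namespace
            DDI_R = "ddi:reusable:3_2"          # reusable module namespace

            def question_item(name: str, text: str) -> etree._Element:
                """Build a bare QuestionItem for one extracted question."""
                q = etree.Element(etree.QName(DDI_DC, "QuestionItem"),
                                  nsmap={"d": DDI_DC, "r": DDI_R})
                qname = etree.SubElement(q, etree.QName(DDI_DC, "QuestionItemName"))
                etree.SubElement(qname, etree.QName(DDI_R, "String")).text = name
                qtext = etree.SubElement(q, etree.QName(DDI_DC, "QuestionText"))
                lit = etree.SubElement(qtext, etree.QName(DDI_DC, "LiteralText"))
                etree.SubElement(lit, etree.QName(DDI_DC, "Text")).text = text
                return q

            xml = question_item("smoke_freq", "How often do you smoke?")
            print(etree.tostring(xml, pretty_print=True).decode())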

        Speaker: Chandresh Pravin (University of Surrey)
      • 13:50
        METACURATE-ML: Semantic approaches to data disclosure assessment 25m

        Standard approaches to data disclosure assessment have utilised mathematical approaches such as SDC-Micro and manual assessment and modification of data to meet the ‘disclosure’ metrics.

        The ingest pipeline presented results in Data Privacy Vocabulary (DPV) concepts assigned at variable level and socio-economic classification concepts assigned at category level, held in DDI-CDI. This enhanced metadata, together with a highly performant SDC-Micro implementation at UKDS, enables rapid automated iteration and assessment of disclosure risk, reducing the time and effort required for validation by human assessors. This presentation will not only briefly touch on the Data Product Builder developed at UKDS, which demonstrates this, but also show how these functions, exposed as tools by an MCP server, can be used by our curators to explore dataset disclosure through locally hosted AI Assistant clients.
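
        A minimal sketch of the MCP pattern described, using the official `mcp` Python SDK; the tool name and the stubbed risk check are hypothetical, not the UKDS implementation:

            from mcp.server.fastmcp import FastMCP

            mcp = FastMCP("disclosure-tools")

            @mcp.tool()
            def assess_disclosure_risk(variables: list[str], k_threshold: int = 5) -> dict:
                """Placeholder k-anonymity-style check over the named variables."""
                # A real service would call the SDC implementation here; we stub it.
                return {"variables": variables, "k": k_threshold, "risk": "low"}

            if __name__ == "__main__":
                mcp.run()  # serves the tool over stdio to an MCP-capable AI client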

        Speaker: Deirdre Lungley
    • 14:15–15:45
      Challenges of Interoperability
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4

      The diversity of metadata structures in European social science archives – ranging from differences in granularity and provenance to divergent terminology – is a major obstacle to thematic interoperability. This heterogeneity curtails the FAIRness of social science data and hampers their effective reuse across repositories, languages, and disciplinary borders. The ONTOLISST project (funded by the OSCARS call of the European Union) tackles this problem not by merely cataloguing the difficulties, but through research-driven, AI-supported exploration of concrete solutions. The session will present the project's twofold strategy, showcase early results, and open a round table on the particular difficulties that arise when trying to achieve interoperability in social-science digital repositories.

      Convener: Lucie MARIE
      • 14:15
        Semantic Interoperability in Social Science Repositories 20m

        Semantic interoperability is an especially challenging dimension, as it poses the requirement of unambiguous meanings. In a technical sense, interoperability is about the connectivity of data, which requires the use of compatible data formats and standardized solutions for data transmission. In social science repositories, as opposed to the real-time exchange of data, the emphasis lies on the long-term preservation of data, making it available for secondary analysis. In terms of content, the smooth connection of data is ensured by describing data with abundant documentation and appropriate metadata schemes. DDI standards greatly enable putting this goal into practice. At the same time, while DDI Codebook provides efficient and easy-to-use tools, only the more difficult and time-consuming metadata schemes contained in DDI Lifecycle can generate sufficient metadata for specific but significant use cases in the social sciences, such as the archiving of longitudinal surveys. Drawing on ten semi-structured interviews with repository managers, the presentation explores the various objectives and achievements of large social science repositories that deal with survey data and aim at semantic interoperability. It investigates the significance and stakes of interoperability with respect to the purposes and financing, maintenance and operations, and clients and uses of the institutions.

        Speakers: Judit Gárdos, Róza Vajda (ELTE Centre for Social Sciences), Dr Timea Venczel (ELTE Centre for Social Sciences)
      • 14:35
        Using NLP Methods in Social Sciences - Experience and Opportunities 25m

        In the ONTOLISST project, our contribution was threefold. First, during data processing and the creation of the LiSST thesaurus, we applied clustering and topic modeling to uncover patterns in large datasets, using generative AI for cluster labeling. Second, we validated LiSST by processing extensive sets of social science paper keywords and applying semi-supervised clustering for codebook validation. Finally, we developed automated labeling by fine-tuning XLM-RoBERTa models to classify new survey questions and variables into the established codebook. In our presentation, we will briefly introduce these techniques and share key results, highlighting both our practical experience and the opportunities they open for social science research.
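
        A hedged sketch of that last step – fine-tuning XLM-RoBERTa for question classification with Hugging Face transformers; the labels and training data are placeholders:

            import torch
            from transformers import (AutoModelForSequenceClassification,
                                      AutoTokenizer, Trainer, TrainingArguments)

            tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
            model = AutoModelForSequenceClassification.from_pretrained(
                "xlm-roberta-base", num_labels=3)  # e.g. three codebook topics

            texts = ["How often do you smoke?", "What is your monthly income?"]
            labels = [0, 1]  # placeholder topic codes
            enc = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")

            class QuestionDataset(torch.utils.data.Dataset):
                def __len__(self):
                    return len(labels)
                def __getitem__(self, i):
                    item = {k: v[i] for k, v in enc.items()}
                    item["labels"] = torch.tensor(labels[i])
                    return item

            Trainer(model=model,
                    args=TrainingArguments(output_dir="out", num_train_epochs=1),
                    train_dataset=QuestionDataset()).train()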

        Speaker: Barbara Babolcsay (ELTE Institute of Political Science)
      • 15:00
        Roundtable on some challenges of interoperability: Can we be on the same page? 45m

        Whether because of technical difficulties or mismatched interests, interoperability is somewhat sidelined among the FAIR principles. The stakes obviously differ according to the nature of the projects determining how data are collected. Yet a thrust towards standardization and establishing connections is experienced broadly in today's digital landscape. The discussion explores the challenges of rendering research data interoperable, focusing especially on the creation of appropriate thematic metadata.

        Speakers: André Jernung (Swedish National Data Service (SND)), Benjamin Beuster (Sikt - Norwegian agency for shared services in education and research), Knut Wenzig (DIW Berlin/SOEP), Róza Vajda (ELTE Centre for Social Sciences)
    • 14:15–15:45
      Questionnaires
      K11-12

      Convener: Oliver Hopt
      • 14:15
        PhOrM – Phrase Organizer for Multilingual Documents: Efficient Creation of Communication Materials Using Metadata 30m

        Communication materials are essential for engaging panel participants in survey research, but their creation is often time-consuming and resource-intensive—especially in multilingual contexts. Panel studies require both recurring content and wave-specific updates. The complexity increases significantly when surveys are administered in multiple languages.
        With PhOrM, we propose a solution to reduce this effort in the long term. Since communication materials can be fully described using metadata—similar to metadata-based survey instrument programming—it becomes possible to automate their creation using reusable text components. We have developed an infrastructure that allows centralized documentation of metadata. Texts are broken down into sentence-level components, tagged, and archived (currently in Excel). Our aim is to deconstruct documents from the past ten years into their smallest components. While layout information is not yet included, all textual content is already covered.
        The main advantage lies in the reusability of structured text elements, which significantly reduces time and costs—particularly in translation workflows. In the future, PhOrM will enable automated suggestions and AI-assisted generation of new communication materials. Although currently tailored to our internal needs and not yet DDI-compliant, we plan to expand PhOrM for broader use across different contexts and stakeholders.
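
        An illustrative sketch (not LIfBi's actual tooling) of the component store described above – sentence-level text components tagged and archived in a spreadsheet, here with pandas:

            import pandas as pd

            components = [
                {"component_id": "THX-01", "language": "de", "tags": "thanks;recurring",
                 "text": "Vielen Dank für Ihre Teilnahme an unserer Studie."},
                {"component_id": "WAVE-22", "language": "de", "tags": "wave-specific",
                 "text": "Die nächste Befragung beginnt im März."},
            ]
            df = pd.DataFrame(components)
            df.to_excel("phrase_components.xlsx", index=False)  # the archive layer

            # Reuse: assemble the recurring parts of next wave's letter
            letter = " ".join(df[df["tags"].str.contains("recurring")]["text"])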

        Speaker: Loreen Beier (Leibniz Institute for Educational Trajectories (LIfBi))
      • 14:45
        Using DDI and Colectica to create a European Question Bank 30m

        The CESSDA European Question Bank (EQB) project, supported by CESSDA and participating service providers, aims to build up a rich database of survey questions in multiple languages. The question bank allows questionnaire designers to identify existing fielded survey questions and their translations, and allows researchers to search for and discover questions and data of interest. The source questionnaire metadata is provided using DDI Codebook and Lifecycle.

        Participating service providers pay into the consortium to cover the annual Colectica licence fees, and in return are expected to contribute questionnaires in their national languages. Each partner within the EQB consortium must comply with the requirements, which include establishing an OAI-PMH endpoint which will serve the DDI XML files.
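
        A hedged sketch of harvesting such an endpoint with the `sickle` library; the URL and metadata prefix are illustrative, not EQB's actual configuration:

            from sickle import Sickle

            repo = Sickle("https://data.example.org/oai")  # hypothetical partner endpoint
            for record in repo.ListRecords(metadataPrefix="oai_ddi25"):  # DDI Codebook 2.5
                if record.deleted:
                    continue
                filename = record.header.identifier.replace("/", "_") + ".xml"
                with open(filename, "w", encoding="utf-8") as f:
                    f.write(record.raw)  # the DDI XML payload as served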

        To import the data into Colectica, we developed an importer CLI tool. This tool allows EQB to leverage CESSDA's pre-existing harvesting and validation infrastructure written for the CESSDA Data Catalogue. It also normalises the metadata in a manner compatible with Colectica, and uses Colectica's official CLI utility to perform the import.

        This presentation will address the various DDI-related technical requirements, as well as the associated challenges and solutions.

        Speakers: Mr Brian Kleiner (FORS), Matthew Morris (CESSDA)
      • 15:15
        Translating metadata: Converting a feature complete NEPS-questionnaire into a DDI-compliant format 30m

        The Leibniz Institute for Educational Trajectories (LIfBi) runs an infrastructure for questionnaire metadata, initially used for the National Educational Panel Study (NEPS). With NEPS Starting Cohort 8 (SC8), which started in 2022, this metadata is already available during the field preparation processes. Based on that, we developed software tools that largely automate the questionnaire programming process, eliminating the need to manually program the instruments themselves. However, the NEPS metadata infrastructure is not yet DDI-compliant, which has significantly hampered the provision of these automation tools for questionnaire programming to other researchers, organizations, or studies.
        Using a sample questionnaire that fully covers all previous NEPS-SC8 questionnaire scenarios, we would like to present our ideas for a tool that can translate questionnaire metadata between NEPS- and DDI-compliant metadata formats. Such a translation tool could be adapted relatively easily to non-NEPS metadata formats used by other researchers or organizations, in order to generate a minimalist DDI-compliant questionnaire metadata format without already operating a fully DDI-compliant metadata infrastructure.

        Speaker: Mr Simon Dickopf (Leibniz Institute for Educational Trajectories (LIfBi))
    • 15:45–16:15
      Coffee 30m
      1st Floor Gallery, ELTE Centre for Social Sciences (Research Documentation Centre)

    • 16:15–17:30
      DDI Training
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Catherine Yuen (ISER)
      • 16:15
        Supporting meaningful engagement and implementation of questionnaire metadata: Progress from the DDI Q2 Working Group 25m

        The DDI standard provides a structured framework for documenting questions and questionnaires, enabling users to identify exactly what was asked of respondents to generate the data (i.e. the provenance of variables). The DDI Questions and Questionnaires (Q2) Working Group comprises 15 members representing the international DDI community. Its aim is to provide guidance for creating and meaningfully engaging with question and questionnaire metadata. To achieve this, three subgroups have been established: Guides and Best Practices, Tools, and Real Life Examples.

        • The Guides and Best Practices subgroup is developing practical guidance that extends on existing learning materials to support meaningful engagement with and implementation of DDI question and questionnaire metadata.
        • The Tools subgroup is collating and evaluating questionnaire design and documentation tools and their interoperability with DDI, to help users choose the tool that best meets their needs.
        • The Real Life Examples subgroup is gathering exemplary documentation to be included in the practical guidance.

        This presentation will provide an overview of the Q2 Working Group’s aims, progress, and future plans. We also welcome feedback from the DDI community to ensure our work reflects and supports user needs.

        Speakers: Becky Oldroyd, Romain Tailhurat (Making Sense), Lucie MARIE
      • 16:40
        Q2 Working Group results - Tool support for questionnaire development and conduction 25m

        As part of the Questions and Questionnaires Working Group, we have started to create an overview of tool support for questionnaire development and conduction. The pivot point of this overview is the support for the questionnaire-related metadata elements found in DDI-Lifecycle.
        For this purpose, we started filling a matrix in which one dimension is the list of question information we want to see, and the other dimension is made up of the tools we found. Each tool section will then contain information about the general support of a given element and the possibilities and formats of structured import and export.
        Our presentation will focus on the picture we have so far and on a discussion with the audience about extending the matrix in both dimensions.

        Speaker: Oliver Hopt
      • 17:05
        DDI for beginners: free, bilingual training resources to start making your data FAIR with DDI 25m

        The FAIR (Findable, Accessible, Interoperable, Reusable) principles largely rely on metadata and metadata standards like the Data Documentation Initiative (DDI) for their implementation. More specifically, DDI facilitates the reuse and replicability of data as it provides a comprehensive metadata schema, including information on the data itself, which is important when it comes to assessing data quality and data provenance. DDI is mainly used by a community of people who are familiar with (meta)data management and documentation, and needs to engage more with the less experienced public to increase the uptake of people using the standard and, in turn, create a broader culture of FAIR data practices.

        In France, the two national open science plans and the recent developments they triggered in data sharing practices brought DDI to the attention of many "new" data professionals. Even though many DDI training materials (slide decks, recorded webinars) exist, there is a need for "real" beginner training materials. The existing resources are often considered too abstract, complex, and difficult to understand by first-time DDI users who are not familiar with metadata management in general.

        The French FAIRwDDI project (work package 2) was designed to fill this gap. Specialists from CDSP France and CLOSER UK got together for two week-long sprints and created beginner DDI training materials to showcase at EDDI. The main question they had in mind was: what are the minimum DDI metadata requirements needed to make data reusable? They also wanted to ensure that those interacting with the training materials had a foundational understanding of key metadata and DDI concepts. A designer was also part of the project, helping to make the materials more user-friendly.

        This talk will present the DDI training materials, which will be made available to the community.

        Speakers: Alina Danciu (Sciences Po, Center for Socio-Political Data (CDSP), CNRS), Becky Oldroyd ((CLOSER, UCL)), Jon Johnson (CLOSER, UCL), Lucie Marie (Sciences Po, Center for Socio-Political Data (CDSP), CNRS)
    • 16:15–17:30
      METACURATE-ML (II)
      K11-12, ELTE Centre for Social Sciences (Research Documentation Centre)

      The ESRC Future Data Services Program has funded CLOSER, the University of Surrey, the UK Data Service, and ScotCen to progress ways in which information from longitudinal social science surveys can be improved in both quality and throughput, addressing the challenges of understanding and utilising these data.

      This session will cover the semantic challenges that remain after structural alignment of questions in DDI-Lifecycle and the opportunities that can be leveraged using a common vocabulary such as ELSST.

      Convener: Franck Cotton (Making Sense)
      • 16:15
        METACURATE-ML: Approaches to managing semantic equivalence between questions 25m

        Longitudinal and comparative research relies heavily on repeated measures and the harmonisation of data. DDI-Lifecycle has strong support for this through the variable cascade; however, scaling such activity has proven difficult in practice.

        Social science (and other!) researchers approach the development of questions from a range of perspectives. Even where the response options are (nearly) identical, the phrasing and orchestration of the questions can vary considerably. This places limits on the utility of standard text comparison techniques (e.g. TF-IDF, Bag-of-Words).

        The presentation will outline the strengths and weaknesses of the different approaches taken during the project to address this problem. This includes problem decomposition which breaks the problem into sub-problems to mitigate the insensitivity of unsupervised methods to nuanced question relationships. Additionally, we will cover techniques for fine-tuning generative large language models for concept extraction and injecting the results into a subsequent retrieval model.
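
        A quick illustration of the limitation mentioned above: two paraphrased questions share few tokens, so TF-IDF cosine similarity scores them low despite their semantic equivalence (a toy example with scikit-learn):

            from sklearn.feature_extraction.text import TfidfVectorizer
            from sklearn.metrics.pairwise import cosine_similarity

            q1 = "How often do you drink alcohol?"
            q2 = "On how many days per week do you consume alcoholic beverages?"

            tfidf = TfidfVectorizer().fit_transform([q1, q2])
            # Low score despite both questions measuring the same construct
            print(cosine_similarity(tfidf[0], tfidf[1])[0, 0])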

        Speaker: Justina Li (University of Surrey)
      • 16:40
        METACURATE-ML: Approaches to conceptual equivalence across languages 25m

        Cross-lingual alignment of nuanced sociological concepts can form the basis of comparing cross-national studies in different languages and harmonising longitudinal studies, by leveraging knowledge from social science taxonomies such as ELSST. Aligning sociological concepts is challenging due to cultural context-dependency, linguistic variation, and data scarcity. Traditional approaches for cross-lingual alignment require extensive parallel data in different languages.

        This presentation will outline a method for the multilingual alignment of sociological concepts. The approach posits that word embeddings (numerical vector representations of text) of domain-specific texts can be decomposed into two vectors: a domain knowledge vector that is language-agnostic and should be the same across languages, and a language-specific feature vector that can be learned. The domain knowledge vector is trained primarily on English data structured by the ELSST hierarchy and captures core sociological semantics. The method will be demonstrated on a cross-lingual sociological concept retrieval task across 10 languages.
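
        A toy numpy sketch of this decomposition, assuming the language-specific vector can be estimated as the mean offset between embeddings of translation pairs (all data here is random placeholder):

            import numpy as np

            rng = np.random.default_rng(0)
            en = rng.normal(size=(100, 768))                 # "English" concept embeddings
            fr = en + rng.normal(0.5, 0.1, size=(100, 768))  # "French" = shifted English

            lang_offset = (fr - en).mean(axis=0)  # learned language-specific vector
            fr_aligned = fr - lang_offset         # map back to the shared domain space

            # Alignment shrinks the cross-lingual distance between translation pairs:
            print(np.linalg.norm(fr - en, axis=1).mean(),
                  np.linalg.norm(fr_aligned - en, axis=1).mean())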

        Speaker: Suparna De (University of Surrey)
      • 17:05
        METACURATE-ML: Challenges of aligning a vocabulary with ELSST 25m

        A vocabulary, if it is to be useful, faces three main challenges: coverage, achievability, and utility.
        With over 3,400 topics, ELSST's coverage is enormous, but assigning topics from such a large set is not straightforward at scale, so its utility for discovery, as it stands, is diminished. It does, however, have a significant role to play as an anchor to which vocabularies such as this one can align – vocabularies holding information that both extends and refines the concepts in ELSST. Its multilinguality is a crucial aspect for leveraging interoperability across European languages.

        The CLOSER vocabulary was developed as a mechanism for the discovery of both social and biomedical metadata held in DDI-Lifecycle from UK longitudinal studies. However, it lacked good definitions and included terms specific to particular traditions in biomedical science, psychology, and sociology, which have proved difficult to implement across different scientific disciplines.

        The presentation will outline how the alignment work has sought to address these deficiencies: aligning with ELSST where terms exist, and aggregating them into more useful 'super' topics for ease of discovery and classification, allowing CLOSER terms to be compared to ELSST in a formal way using SKOS. In addition, the clarification of definitions includes 'what it is not', to guide human annotators and provide additional information to improve ML classification models.
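
        A hedged sketch of what such a formal SKOS alignment could look like with rdflib; the topic URIs are hypothetical placeholders, not the published identifiers:

            from rdflib import Graph, Namespace
            from rdflib.namespace import SKOS

            CLOSER = Namespace("https://example.org/closer/topic/")  # hypothetical
            ELSST = Namespace("https://example.org/elsst/concept/")  # hypothetical

            g = Graph()
            g.bind("skos", SKOS)
            # One-to-one alignment where an ELSST term exists
            g.add((CLOSER["smoking"], SKOS.exactMatch, ELSST["smoking"]))
            # A broader 'super' topic aggregating finer CLOSER terms
            g.add((CLOSER["sleep"], SKOS.broadMatch, ELSST["health-behaviour"]))
            print(g.serialize(format="turtle"))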

        Speaker: Jon Johnson (CLOSER, UCL)
    • 18:30–22:00
      Conference Dinner
      Hotel Astoria

    • 08:45–09:00
      Welcome
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
    • 09:00–10:00
      Keynote II
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Jon Johnson (CLOSER, UCL)
      • 09:00
        The role of research data management in the academic ecosystem 1h

        Over the past 15 years, data management has received increasing attention among researchers and research policy agencies alike. Beyond the loss of trust triggered by questionable research practices (QRPs), the rapid expansion of data sources and new analytical methods has further underscored the need for professional data stewardship. While the field has developed significantly and matured institutionally, many of the systemic issues that made rigorous data management essential persist.
        I argue that a key reason lies in the incentive structures that have increasingly encouraged QRPs in recent years. This lecture proposes a reimagined academic ecosystem which could decrease the incentives for QRPs. In such a system, data management would no longer be treated as a support service. Instead, the scholarly publications about the creation and curation of accessible datasets would constitute one of the independent and mandatory academic outputs related to the research process.

        Speaker: Béla Janky (Dept. of Sociology and Communication, Budapest University of Technology and Economics (BME) & Computational Social Science Research Group, Centre for Social Sciences, Budapest)
    • 10:00–10:30
      Coffee 30m
      1st Floor Gallery, ELTE Centre for Social Sciences (Research Documentation Centre)

    • 10:30–12:00
      DDI Specifications
      K11-12, ELTE Centre for Social Sciences (Research Documentation Centre)

      Convener: Wendy Thomas
      • 10:30
        DDI Adoption Metrics - Insights from a KonsortSWD Short Project 30m

        DDI metadata standards are widely used to describe tabular data in depth, including their columns/variables. According to re3data.org, a global registry of research data repositories, they are the most prevalent standards with these capabilities. Additionally, re3data.org identifies OAI-PMH as the most widely adopted protocol for harvesting such metadata, with many endpoints listed there. Building on this, Wenzig and Han (2024, https://doi.org/10.29173/iq1116) recently analyzed over 250,000 data records in the DDI Codebook format from various sources. Their methods can be adapted for continuous monitoring of metadata usage and availability (see the sketch after the list below). Such an approach could help new users identify where and how other institutions implement DDI standards, offering concrete starting points for further research and community engagement. A project funded by KonsortSWD – NFDI4Society is advancing these efforts with the following key deliverables:

        • Development of metrics on DDI standard adoption, including the number
          of institutions using DDI, number of institutions providing DDI
          metadata through standardized methods, and statistics about the
          availability of records in the different DDI standards (DDI-Codebook,
          DDI-Lifecycle, DDI-CDI).
        • Publication of datasets containing raw data for these metrics from
          each survey crawl.
        • An experimental dashboard displaying these metrics alongside links to
          individual metadata records and repositories.

        This talk will provide an insight into the project's results, which serve two immediate goals:

        1. Visualizing the adoption of DDI metadata standards.
        2. Encouraging institutions that have not yet made their metadata
          accessible to do so, enhancing their visibility within the community.
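
        A hedged sketch of the monitoring approach referenced above – counting DDI records per metadata prefix on an OAI-PMH endpoint with `sickle`; the endpoint and prefix names are illustrative:

            from sickle import Sickle

            ENDPOINTS = ["https://data.example.org/oai"]  # in practice harvested from re3data.org
            DDI_PREFIXES = {"oai_ddi25", "ddi_lifecycle", "ddi_cdi"}  # illustrative names

            counts = {}
            for url in ENDPOINTS:
                repo = Sickle(url)
                offered = {fmt.metadataPrefix for fmt in repo.ListMetadataFormats()}
                for prefix in offered & DDI_PREFIXES:
                    # ListIdentifiers is cheaper than ListRecords for pure counting
                    ids = repo.ListIdentifiers(metadataPrefix=prefix, ignore_deleted=True)
                    counts[(url, prefix)] = sum(1 for _ in ids)
            print(counts)
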
        Speaker: Knut Wenzig (DIW Berlin/SOEP)
      • 11:00
        Adaptive Reference Metadata: Leveraging DCTAP and SSSOM for DDI Profiles and Mappings 30m

        In the context of managing statistical data, organizations encounter a range of metadata standards, including variants of the Data Documentation Initiative (DDI) such as Codebook, Lifecycle, and Cross-Domain Integration (CDI), as well as Dublin Core, DCAT, DataCite, SIMS, and bespoke schemas. Colectica has implemented a software framework that employs Dublin Core Tabular Application Profiles (DCTAP) and the Simple Standard for Sharing Ontology Mappings (SSSOM) to address these challenges.

        This presentation explores how DCTAP facilitates the development of user or organization tailored reference metadata profiles, which specify mandatory and optional elements, data types, and controlled vocabularies. These profiles can be applied to entities such as datasets or variables. Furthermore, SSSOM supports the alignment of equivalent URI-based terms across multiple DCTAP-derived profiles, promoting consistent terminology reuse, enabling subsequent transformation into diverse output formats and metadata standards. This terminology alignment allows the creation of a minimal editor interface and supports exporting the stored metadata in a variety of formats and standards.
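
        For illustration, a minimal SSSOM mapping set linking equivalent terms across two profiles, read with the Python csv module; the column names follow the SSSOM specification, while the prefixes and term identifiers are placeholders:

            import csv
            import io

            SSSOM_TSV = (
                "subject_id\tpredicate_id\tobject_id\tmapping_justification\n"
                "profileA:title\tskos:exactMatch\tdcterms:title\tsemapv:ManualMappingCuration\n"
                "profileA:variableLabel\tskos:exactMatch\tprofileB:label\tsemapv:ManualMappingCuration\n"
            )

            mappings = {row["subject_id"]: row["object_id"]
                        for row in csv.DictReader(io.StringIO(SSSOM_TSV), delimiter="\t")}
            print(mappings["profileA:title"])  # -> dcterms:title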

        By reconciling established DDI metadata standards with institutional requirements, this approach enhances profile creation and sharing, leading to better data interoperability, efficiency, and scalability, and offering a practical framework for metadata-driven software development. This presentation will demonstrate real-world applications and future potential for standardized yet flexible metadata profiles.

        Speaker: Dan Smith (Colectica)
      • 11:30
        Using DDI as the metadata language for research data 30m

        At GESIS – Leibniz Institute for the Social Sciences, we have used DDI from the beginning, mainly for data archiving. Recently, for more stages in the research data lifecycle, the DDI-Codebook and DDI-Lifecycle standards have become the language for documenting and re-using survey questionnaires and datasets for the social sciences. DDI is also being considered for documenting data sources beyond traditional surveys, such as digital behavioural data and web tracking data.
        The data archive is currently in the process of migrating documentation from applications developed in-house to the Colectica software components. After introducing the process for data documentation and publication with Colectica for new studies, older studies were migrated to the new software. Several tools accomplish different functions, e.g. study-level documentation, PID registration, data file management, and question and variable documentation. During the change of workflows, older and newer tools are used in parallel, which results in the need for clear interfaces.
        DDI has become the language used to define the metadata needs and to specify the integration of these tools. Metadata production is more complicated and challenging during the transition, but will be more stable and reliable in the future. This presentation will show the current state of work and highlight challenges and improvements for managing metadata across different lifecycle steps at GESIS.

        Speaker: Wolfgang Zenk-Möltgen (GESIS - Leibniz Institute for the Social Sciences)
    • 10:30–12:00
      Health Data
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Jon Johnson (CLOSER, UCL)
      • 10:30
        Exploring Metadata Commonalities Across Restricted Health Data Sources in Canada 20m

        Metadata is not an inherent characteristic of restricted data, which limits its ability to be found and used. To better understand discoverability and accessibility of restricted data, this study reviewed restricted health data sources to determine how they describe their datasets and access procedures, what descriptive commonalities exist across data sources, and to what extent the commonalities we found can be accommodated within existing metadata schemas. This project includes analysis from three datasets: the first dataset compiles dataset metadata commonalities that were identified from 48 Canadian restricted health data sources. The second dataset compiles request access and access process metadata commonalities extracted from the same 48 data sources. The third dataset maps metadata commonalities of the first dataset to existing metadata standards including DataCite, DDI-Lifecycle and Codebook, DCAT, and DATS. This mapping exercise was completed to determine whether metadata used by restricted data sources aligned with existing standards for research data for improved discovery in the larger FAIR scientific data ecosystem. Read full article (https://doi.org/10.7191/jeslib.907)

        Speaker: Amber Leahey (University of Toronto)
      • 10:50
        FAIR in practice: Building a Metadata System for the Constances Cohort with DDI-Lifecycle 20m

        Following the international movement towards open science, the Constances cohort – known for providing the research community with high-quality health and medical data – started an ambitious project in 2025 to build a metadata system that leverages the FAIR principles and supports the creation, update, and dissemination of FAIR data and metadata. DDI-Lifecycle has been chosen as the core modeling framework to realize this vision.

        The presentation will highlight the strategic principles guiding the project: activate the metadata whenever and wherever possible, keep users' needs at the center of the solution, and promote openness (both in terms of open science and open source).

        We will also explore key challenges in building this system:

        • Representing longitudinal studies with DDI-Lifecycle, and the limits of variable-to-concept mapping;
        • Translating an existing business model into DDI-Lifecycle, and the implications of implementation choices;
        • Encouraging adoption of DDI and transferring knowledge to Constances' data managers and stakeholders.

        Finally, we will provide an overview of the project’s midterm roadmap and milestones, including the future use of complementary standards such as VTL or DCAT to enhance the documentation across all phases of the Constances’ data lifecycle.

        Speakers: Romain Tailhurat (Making Sense), Sofiane Kab (Inserm)
      • 11:10
        Free the concepts 15m

        Variables and concepts are at the center of the DDI Lifecycle model. Variables are key to describing the data in its different states. Concepts help users understand the meaning of the data and its links to information in the outside world; concepts are also pivotal for implementing the FAIR principles, specifically the knowledge representation and vocabulary principles for interoperability.

        Nevertheless, the practical implementation in Lifecycle of the variable/concept duo is hampered by a strict zero-to-one relation: a variable may have a reference to at most one concept! In our experience, this is a very hard limit that we are trying to lift.

        This presentation will:

        • describe a case study: when documenting variables from the data
          collection of a health cohort, we find that most of these variable
          labels refer to several concepts that need to be represented;
        • list possible solutions with the standard as it is - by leveraging
          existing elements or attributes like IncludesConceptReference - or
          using other standards (like SKOS or I-ADOPT)
        • and also consider evolutions of the standard, including of course
          relaxing the 0 to 1 constraint.

        Finally, we will discuss how to continue this conversation with DDI users and maintainers.

        Speaker: Romain Tailhurat (Making Sense)
      • 11:25
        Challenges in documenting health research data using the DDI standard 15m

        Documenting data across the wide spectrum of health research disciplines requires dealing with a diversity of nomenclatures, study designs, and reporting standards. The "France Recherche en Santé Humaine" (FReSH) catalog addresses this challenge by providing access to descriptions of individual data from scientific studies in France – covering cohort studies, clinical trials, health-related social science surveys, and registries – raising significant complexities in terms of metadata harmonization and interoperability.

        After identifying ongoing initiatives and engaging with stakeholders, FReSH adopted the DDI Codebook standard as a foundation for describing variables and study methodologies. The FReSH team engaged the research community through a series of workshops, which made it possible to collect user needs and to identify discrepancies between the structure and terminologies defined in DDI and the specific needs of health research.

        The presentation will focus on the challenges encountered in balancing domain-specific needs with international standards, and on the implications for the harmonization and reuse of metadata. Rather than presenting a finalized solution, the goal is to open a space for discussion within the community about applying DDI in health research and identifying possible paths forward for interoperability and reusability.

        Speaker: Alessandro Morichetta (INSERM)
    • 12:00–12:45
      Lunch 45m
      1st Floor Gallery, ELTE Centre for Social Sciences (Research Documentation Centre)

    • 12:45–13:30
      Posters
      Lecture Hall, ELTE Centre for Social Sciences (Research Documentation Centre), 1097 Budapest, Tóth Kálmán utca 4
      Convener: Judit Gárdos
      • 12:45
        A user-informed approach to metadata training: Insights from a qualitative research project. 45m

        Metadata plays a key role throughout the data lifecycle, enabling researchers to discover, understand, and reuse their own and others' data. Despite its importance, metadata is rarely covered in university courses, and there is little formal training on the topic, so engagement with metadata standards such as DDI becomes challenging. As a result, the utility of metadata and its role in supporting FAIR (Findable, Accessible, Interoperable, Reusable) data practices is not fully realised in the research community.

        Existing metadata training typically reflects what metadata experts think the audience’s needs and knowledge gaps are. To better align with the audience’s requirements, CLOSER - the UK’s interdisciplinary partnership of leading social and biomedical longitudinal population studies - is developing metadata training directly informed by users. CLOSER is conducting qualitative research via interviews with UCL master’s and PhD students to explore early career researchers’ understanding of metadata, their experiences using and/or creating it, and their awareness of FAIR and metadata’s role in FAIR.

        This poster will share preliminary findings from the interviews and discuss how these insights are directly informing the development of CLOSER’s metadata training, ensuring it is effectively tailored to and resonates with its intended audience.

        Speaker: Claudia Alioto (CLOSER, UCL)
      • 12:45
        Developing a sustainable training resource for understanding metadata. 45m

        For a number of years, CLOSER has been organising and delivering metadata training events related to, or specifically on, DDI. The major challenge has been a “knowledge gap” on what metadata actually is and why it is important to managing and sharing data, especially with the advent of FAIR, which is itself not well understood.

        Over the last year, CLOSER has been developing a training resource (openly available on GitHub), which seeks to address these issues. The training breaks subjects into Introductory and Foundational modules, which can act as base knowledge for a wide range of audiences - post-graduate researchers, junior data managers and others - to understand the need for, and lay the basis for, engaging with DDI and other metadata standards.

        Speakers: Kate Reed (CLOSER, UCL), Sarah White (CLOSER, UCL)
      • 12:45
        Improving metadata consistency in longitudinal research through controlled vocabulary realignment. 45m

        Controlled vocabularies are essential for enabling FAIR (Findable, Accessible, Interoperable, Reusable) data. CLOSER Discovery – the UK’s most comprehensive research tool for longitudinal population studies (LPS) – contains question-and variable-level metadata for 13 leading UK LPS, held in DDI-Lifecycle. Each question and variable in CLOSER Discovery is mapped to a topic from CLOSER’s two-level controlled vocabulary, initially developed using a combination of relevant Medical Subject Headings (MeSH) and European Language Social Sciences Thesaurus (ELSST) terms. Level one contains broad high-level topics such as Health behaviour, and level two contains more granular sub-topics such as Sleep and Smoking. Each topic has a name, code, and short description.

        CLOSER is currently refining its controlled vocabulary in alignment with ELSST. This involves mapping each topic to one or more ELSST terms, providing comprehensive definitions and examples, removing duplicate topics, restructuring topic levels, and creating new topics where needed. These improvements aim to support more accurate and consistent topic mappings within and between studies, improving the findability and reusability of questions and variables in CLOSER Discovery.

        This poster will outline CLOSER’s controlled vocabulary realignment process, highlight key changes to the controlled vocabulary, and showcase our new guidance enabling users to map questions and variables to topics with greater accuracy and consistency.

        Speaker: Becky Oldroyd
      • 12:45
        Mastering the DataCite API with a pure HTML/JS tool 45m

        For more than a decade, GESIS has offered DOI registration for datasets at DataCite through the da|ra portal, and several DDI-driven data portals have used da|ra to assign persistent identifiers to their content.
        By the end of this year, this technical service will be discontinued; GESIS will continue to offer access to DataCite and consulting on the use of DataCite Fabrica and its API as a replacement.
        Nevertheless, institutions that now have to change their DOI registration processes may miss the convenience features the technical da|ra implementation offered. To fill this gap, we present a simple tool for easy, form-based interaction with the DataCite API. The tool has no technical dependencies beyond a current web browser and is available as open source.
        The simplicity of the code makes it a kind of best-practice guide for integrating DOI registration into the DDI tooling of data portals. This also includes the mapping of DDI content to DataCite metadata fields.
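
        The tool itself is pure HTML/JS, but the underlying API call is the same from any language. By way of illustration, a minimal Python sketch of creating a draft DOI via the DataCite REST API; the repository ID, password, and DOI prefix below are placeholders:

            # Minimal sketch of a DataCite REST API call (draft DOI creation).
            # REPOSITORY_ID, PASSWORD, and the 10.xxxx prefix are placeholders.
            import requests

            payload = {
                "data": {
                    "type": "dois",
                    "attributes": {
                        "prefix": "10.xxxx",  # your repository's DOI prefix
                        "titles": [{"title": "Example Survey Dataset"}],
                        "creators": [{"name": "Example, Author"}],
                        "publisher": "Example Data Archive",
                        "publicationYear": 2025,
                        "types": {"resourceTypeGeneral": "Dataset"},
                    },
                }
            }

            resp = requests.post(
                "https://api.datacite.org/dois",  # use api.test.datacite.org for testing
                json=payload,
                headers={"Content-Type": "application/vnd.api+json"},
                auth=("REPOSITORY_ID", "PASSWORD"),
            )
            resp.raise_for_status()
            print(resp.json()["data"]["id"])  # the newly minted draft DOI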

        Speaker: Oliver Hopt
      • 12:45
        Recent Advances in Applying Artificial Intelligence to Metadata Creation in the Social Science Japan Data Archive: Extracting Metadata from Social Surveys Using Large Language Models 45m

        To improve the effectiveness of metadata production in Japanese for social surveys, the Social Science Japan Data Archive (SSJDA) at the University of Tokyo's Institute of Social Science has built a metadata extraction method using OpenAI's API. Since 1998, SSJDA has manually created metadata, and since 2021 has followed the DDI-Codebook standard, but the increasing number of data deposits now requires more efficient methods.
        At last year’s EDDI poster session, we reported that Japanese metadata could be generated with a certain level of accuracy by designing OpenAI prompts using Python.
        We have made two key updates since then. First, we developed a new GUI application based on the previous CLI version, improving accessibility for users. Second, we now offer two options: a cloud-based and a local LLM. Although the local LLM does not yet match the quality of the cloud-based version, it has produced metadata highly consistent with expert-created metadata, while significantly improving processing time and cost efficiency compared to manual creation.
        SSJDA will continue to enhance development using more advanced LLMs and evaluate their integration into the metadata creation workflow. In the future, we plan to release the source code of the developed application on GitHub and consider distributing it to SSJDA depositors.
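
        By way of illustration of the general approach (not SSJDA's actual prompts or code, which have not yet been released), a minimal Python sketch of prompt-based metadata extraction with the OpenAI API; the model name and the target field list are assumptions:

            # Illustrative sketch of LLM-based metadata extraction; the prompt,
            # model name, and target fields are assumptions, not SSJDA's code.
            import json
            from openai import OpenAI

            client = OpenAI()  # reads OPENAI_API_KEY from the environment

            survey_text = open("survey_overview.txt", encoding="utf-8").read()
            prompt = (
                "Extract the following metadata from the survey description "
                "below and return JSON with the keys: title, abstract, "
                "universe, collection_mode, collection_dates.\n\n" + survey_text
            )

            response = client.chat.completions.create(
                model="gpt-4o",  # assumed; a local LLM could be swapped in here
                messages=[{"role": "user", "content": prompt}],
                response_format={"type": "json_object"},
            )
            metadata = json.loads(response.choices[0].message.content)
            print(metadata["title"])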

        Speakers: Koichi Iriyama (Institute of Social Science, The University of Tokyo), Rantaro Nasu (Institute of Social Science, The University of Tokyo)
      • 12:45
        ResiMETA: Building an Open-Access FAIR Database for Trajectory-Based Resilience Research 45m

        Psychological resilience research is rapidly expanding, but diverse concepts, heterogeneous study designs, inconsistent reporting, and missing metadata standards limit data reuse and robust evidence synthesis. To address these challenges, we introduce ResiMETA, a continuously updated open-access database for trajectory-based resilience research that systematizes evidence from longitudinal studies on responses to stressor exposure and psychosocial resilience factors. The database compiles aggregated study-level data and metadata, including population characteristics, study designs, mental health outcomes, modeled trajectories, and resilience factors. For data harmonization, we developed categorization schemes for key variables and an ordinal rating scheme synthesizing evidence from heterogeneous statistical models. Data extraction and organization follow a customized metadata scheme informed by existing frameworks. ResiMETA is documented according to FAIR principles and openly accessible via the Open Science Framework (OSF: https://osf.io/xcwtk/). Currently, ResiMETA includes 344 primary studies with ongoing updates. We will present the ResiMETA metadata architecture and demonstrate its potential to enhance evidence synthesis and foster collaboration in resilience research. Although not yet aligned with DDI standards, ResiMETA offers opportunities for future integration, particularly in standardizing psychological variables using controlled vocabularies. We invite engagement with the DDI community to explore how metadata standards can advance evidence synthesis in psychology and related disciplines.

        Speaker: Svenja Mrugalla (Leibniz Institute for Resilience Research (LIR))
      • 12:45
        The Role of Metadata in CATI for a Public Opinion Survey 45m

        In 2017, the people of Bangladesh faced many problems due to electricity shortages. The overall objective of this survey was to gather opinions on electricity supply, electricity consumption levels, satisfaction with electricity provision, perceptions of quality, and electricity-saving strategies in Bangladesh. The ‘Opinion Survey on Power Supply’ was successfully completed: thirty thousand sampled Bangladeshis participated, and the digitalization of data collection through CATI was a first for Bangladesh.
        A new statistical metadata repository, called Metabase (metadata + database), is being set up and relies on two containers. One, dedicated to the questionnaires, variables and their codification, is a Colectica Repository; its information is stored in the international DDI format alongside Nesstar Publisher. The other hosts all related metadata, described using the more appropriate GSBPM model. A more innovative idea was to link the two repositories together so that metadata can be used throughout the process, from needs analysis to statistical results and assessment, in order to develop metadata-driven processes.
        The results suggest that using mobile phones for short and frequent surveys can generate high-quality data faster and more cheaply on a per-survey basis than traditional methods, and can be a valuable complement to more expensive conventional surveys.

        Speaker: Mr Aman Ullah Aman (Labcom Technology)
    • 13:30 15:00
      DDI Use Cases Lecture Hall

      Lecture Hall

      ELTE Centre for Social Sciences (Research Documentation Centre)

      1097 Budapest, Tóth Kálmán utca 4
      • 13:30
        Updating DDI Use Cases Together: A Hands-On Session 1h 15m

        As part of the ongoing DDI website refresh, we’ve introduced a new Use Cases section to showcase practical applications of DDI -- including:

        • Creating a codebook (in various products)
        • Documenting data
        • Building a Data Catalog

        The section currently reuses existing content that is now outdated and in need of revision.

        This interactive session invites attendees to help update and improve these use cases using a standardized template. Participants will work collaboratively to edit content in real time and will also have the opportunity to volunteer for follow-up work after the conference to finalize the updates. Whether you're experienced with DDI or just getting started, your input will help ensure these examples remain relevant, accurate, and useful to the community.

        Speakers: Jared Lyle (ICPSR, University of Michigan), Wendy Thomas
    • 13:30 15:00
      Data Quality K11-12

      K11-12

      ELTE Centre for Social Sciences (Research Documentation Centre)

      Convener: Maja Dolinar (Slovenian Social Science Data Archives (ADP), Faculty of Social Sciences, University of Ljubljana)
      • 13:30
        ADP 2.0: Modernizing Technical Infrastructure with Dataverse and e-Storage applications 30m

        In 2025, the Slovenian Social Science Data Archives (ADP) introduced two new applications into regular use: Dataverse as its research data repository and the e-Storage app as its digital preservation system.

        Dataverse replaces the legacy Nesstar platform and streamlines ingest services, simplifying the deposit and dissemination of research data. Features, including an integrated deposit agreement and expanded metadata fields with controlled vocabularies, enhance usability and compliance. ADP also introduced a self-deposit model, offering a lighter curation option for datasets with lower quality or limited reuse potential.

        The e-Storage app supports professional digital preservation practices by ensuring systematic provenance tracking, authenticity checks, traceability, and version control. API integration between Dataverse and e-Storage, currently under development, will enable automated transfer of both data and metadata, further optimizing ingest and publication workflows. Both Dataverse and the e-Storage app are built around DDI standards, ensuring consistent metadata structure and interoperability across ADP's data services.

        A redesigned ADP website complements these services, offering improved access to resources, training materials, updated content, and a refreshed visual identity. Metadata are openly available for harvesting via the OAI-PMH protocol, including in the DDI Codebook format, supporting discoverability.
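
        By way of illustration, harvesting such records takes only a few lines; the endpoint URL and the metadataPrefix value below are placeholders, since both vary by installation:

            # Minimal OAI-PMH harvesting sketch; the endpoint and metadataPrefix
            # are placeholders -- check the repository's ListMetadataFormats.
            import requests
            import xml.etree.ElementTree as ET

            OAI_ENDPOINT = "https://www.example.org/oai"  # placeholder
            resp = requests.get(OAI_ENDPOINT, params={
                "verb": "ListRecords", "metadataPrefix": "ddi"})
            resp.raise_for_status()

            ns = {"oai": "http://www.openarchives.org/OAI/2.0/"}
            root = ET.fromstring(resp.content)
            for record in root.iterfind(".//oai:record", ns):
                print(record.findtext(".//oai:identifier", namespaces=ns))

        A full harvester would also follow the resumptionToken returned for large result sets.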

        This presentation will outline ADP's modernization path, highlight the challenges of migration and integration, and demonstrate how ADP is building sustainable, user-friendly, and future-oriented data services that meet the evolving needs of research data management.

        Speaker: Gregor Zibert (UL-FDV-ADP)
      • 14:00
        Data anomalies and their impact on EU income poverty indicators: Evidence from Hungarian EU-SILC data 30m

        Poverty indicators such as the at-risk-of-poverty rate and the relative poverty gap are widely used across Europe, but they are only as reliable as the microdata behind them. In our analysis of EU-SILC data from 2005–2023, Hungary stands out with striking anomalies. In several survey waves, unusually many incomes cluster exactly at the poverty threshold—for instance, in 2020, 14% of single-person households reported the same annual income of €3,996. Since these values often fall just above the cut-off, official poverty rates may underestimate the true scale of poverty.

        We also observe implausible patterns in gross and disposable income, with net income exceeding gross income in up to 29% of Hungarian records. As external researchers, we cannot evaluate internal procedures at the statistical office, but our findings—based solely on public microdata—underline the importance of transparent metadata and documentation.

        We would welcome ideas and discussion illustrating how DDI-based research data management could help identify such data quality anomalies prior to data release.

        Speakers: Annamária Tátrai (ELTE Eötvös Loránd Tudományegyetem), András Gábos (TÁRKI Social Research Institute)
      • 14:30
        Forecasting Instrument Timings (FIT): Using Metadata to Forecast Questionnaire Durations 30m

        Knowing how long a survey takes to complete is important for respondents, researchers, and survey practitioners alike. It is important for respondents because their time is a valuable and limited resource; for researchers and survey practitioners because research has shown how instrument duration is linked to response rates and respondent burden (Edwards et al., 2009; Eslick & Howell, 2001), while also being a significant contributor to the final costs of survey fieldwork. However, predicting survey duration before fieldwork remains a challenge.
        Traditionally, survey duration is estimated through pre-testing, which is both time-consuming and costly. Research shows that item characteristics — such as length (Couper & Kreuter, 2013), type (open vs. closed), number of response options (Yan & Tourangeau, 2008), or position (DeCastellarnau, 2018; Olson et al., 2020) — influence completion time. For the National Educational Panel Study (NEPS), this information is retrievable from a metadata infrastructure, though one that is not yet DDI-compliant.
        We focus initially on self-administered surveys, comparing our metadata-based duration estimates with actual field processing times from surveys already conducted. We present initial results, assess the accuracy of these estimates, and discuss their potential extension to interviewer-based survey modes, aiming to provide a tool for metadata-based estimations of questionnaire durations.
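
        To make the approach concrete, a toy Python sketch of a per-item duration model built from such item metadata; the features and coefficients below are invented for illustration and are not NEPS results:

            # Toy sketch of forecasting questionnaire duration from item
            # metadata; features and coefficients are invented for illustration.
            items = [
                # (question_length_words, is_open_ended, n_response_options)
                (12, False, 5),
                (30, True, 0),
                (8, False, 2),
            ]

            def item_seconds(length, open_ended, n_options):
                """Very rough per-item completion time in seconds."""
                base = 2.0               # fixed orientation cost per item
                reading = 0.35 * length  # per-word reading time
                answering = 15.0 if open_ended else 1.2 * n_options
                return base + reading + answering

            total = sum(item_seconds(*item) for item in items)
            print(f"Estimated duration: {total / 60:.1f} minutes")

        In practice, such coefficients would be estimated from observed paradata, e.g. via regression on past field processing times.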

        Speaker: Mr Simon Dickopf (Leibniz Institute for Educational Trajectories (LIfBi))
    • 15:00 15:15
      Coffee 15m 1st Floor Gallery

      1st Floor Gallery

      ELTE Centre for Social Sciences (Research Documentation Centre)

    • 15:15 16:45
      Data Sharing Lecture Hall

      Lecture Hall

      ELTE Centre for Social Sciences (Research Documentation Centre)

      1097 Budapest, Tóth Kálmán utca 4
      Convener: Jared Lyle (ICPSR, University of Michigan)
      • 15:15
        Project BLU – Building Linking Using 30m

        The rapid development of digital infrastructures demands new approaches to managing born-digital data and enabling knowledge valorisation. Rather than constructing another digital Library of Alexandria, the SODHA team is developing Project BLU, a framework for building a distributed information network. Drawing inspiration from Web3 principles—decentralisation, interoperability, and user-driven discovery—the project explores methods for metadata exchange between archives.
        This project engages with established metadata standards like the Data Documentation Initiative (DDI), exploring how they can support or be extended to enable interoperability. Instead of centralising data, Project BLU enables a shared, federated catalogue based on metadata-exchange accessible from any entry point. Researchers will navigate this network to discover and link data across disciplines, reducing access barriers and opening new avenues for metadata specialists. These include promoting and, where needed, upgrading DDI standards, linking interdisciplinary data, and bridging vocabularies across scientific fields.
        The interconnected nature of the system reinforces adherence to FAIR principles, improves access to both open and restricted data, and enhances knowledge reuse at national and international levels (e.g. linking CESSDA and DDI partners). This presentation outlines the conceptual foundations of Project BLU and its vision for transforming metadata into dynamic, actionable links across the research landscape.

        Speaker: István Gyimes (State Archives of Belgium/Vrije Universiteit Brussel)
      • 15:45
        Easing the use of DDI by developing a set of tools: the case of the French National Institute for Demographic Studies 30m

        As a prominent producer of demographic data, the French Institute for Demographic Studies (INED) needs to take account of the latest developments in the open science landscape and FAIR principles. Following the implementation of rigorous standards in data dissemination, such as the DDI (Data Documentation Initiative) standard, real-world scenarios have highlighted the considerable effort required to translate non-standardised documentation into normative frameworks, thereby ensuring that data and metadata adhere to FAIR principles.

        In response, INED is developing a suite of tools designed to facilitate the work of researchers, engineers, and data officers. This endeavour began with the creation of a metadata gathering platform, built using cutting-edge web technologies, which enables the direct generation of DDI-compliant metadata from survey teams. Future projects will build upon this foundation, including the development of a question database and web-based survey tools, with the ultimate goal of creating a unified, all-in-one survey instrument.

        Speakers: Lucas Bourcier (Ined), Thibaud Ritzenthaler
      • 16:15
        Using GEO-coded Metadata for Historical Data Governance and AI Applications 30m

        This paper examines the evolution and strategic importance of Geographic Entity Object (GEO) codes in Bangladesh, underscoring their role in census and survey operations and in (meta)data automation, with a focus on DDI tools. GEO codes are hierarchical identifiers representing every administrative unit from divisions down to villages and Enumeration Areas (EAs), and they serve as the cornerstone of national statistics. Introduced by the Bangladesh Bureau of Statistics (BBS) in 1978 through Mouza lists and maps, the system has since become indispensable for managing censuses and socio-economic surveys. Today, GEO-coded metadata are increasingly shared as linked open data aligned with GSBPM and CSPA standards. A village code looks like, e.g., “30-94-63-905-203-009”.
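
        As an illustrative sketch (not an official BBS specification), such a hierarchical code can be split into its administrative levels; the level names below are an assumption based on the division-to-village hierarchy and the keywords of this abstract:

            # Illustrative GEO code parser; the level names are assumed,
            # not taken from an official BBS specification.
            LEVELS = ["division", "district", "upazila", "union", "mouza", "village"]

            def parse_geo_code(code: str) -> dict:
                parts = code.split("-")
                if len(parts) != len(LEVELS):
                    raise ValueError(f"expected {len(LEVELS)} segments, got {len(parts)}")
                return dict(zip(LEVELS, parts))

            print(parse_geo_code("30-94-63-905-203-009"))
            # {'division': '30', 'district': '94', 'upazila': '63', ...}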

        At BBS, we have converted 165 legacy datasets, all mapped according to FAIR principles for researchers using our GEO codes and DDI-SDTL. Documentation for 159 legacy datasets has been prepared using DDI standard metadata, DDI-Codebook, and Nesstar Publisher 3.x for cataloguing.
        The DDI XML format played a significant role in the digitization and e-book creation of 2,391 census/survey publications of BBS. With each edition, GEO codes expand their scope toward AI-ready metadata, supporting advanced analytics and improved DDI metadata documentation of area-specific data.
        The paper concludes that scientific GEO-coding underpins inclusive development, accurate governance, AI-driven policymaking, and integrated smart administration.

        Keywords:
        GEO-Code, Census, Metadata, Upazila, Mouza.

        Speakers: Mr Uzzal Sarker (RK Software (Bangladesh) Ltd), Mr Chandra Shekhar Roy (Bangladesh Bureau of Statistics)
    • 15:15 16:45
      Software K11-12

      K11-12

      ELTE Centre for Social Sciences (Research Documentation Centre)

      Convener: Ingo Barkow (FHGR)
      • 15:15
        CDISC to DDI-Codebook - The REDCap use case 25m

        Through the use case of DAN (Divers Alert Network), we explore the possibility of documenting studies conducted through REDCap in DDI-C. REDCap (https://project-redcap.org/) is a widespread web-based tool for building and managing online surveys (eCRFs) and databases in the health research community. To date, REDCap is used by almost 8,000 institutions, with 2.4M projects and 3.8M users.

        For this presentation, we will describe the implementation of a direct conversion of REDCap's study description files to DDI-Codebook, in order to publish study dataset descriptions in a NADA data catalog.
        To achieve this, we use the CDISC ODM standard for describing health-related research eCRFs. REDCap, like other eCRF solutions, can export an XML study description in the CDISC ODM standard. Through an XSLT conversion to DDI-Codebook, we can import a REDCap study directly into a NADA catalog.
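
        As a sketch of the mechanics, the conversion can be driven from Python with lxml; the file names and the stylesheet below are placeholders, since the XSLT itself is the substance of this work:

            # Sketch of applying an ODM-to-DDI-Codebook XSLT with lxml;
            # file names are placeholders, the stylesheet is hypothetical.
            from lxml import etree

            odm_doc = etree.parse("redcap_export.odm.xml")       # CDISC ODM export
            stylesheet = etree.parse("odm_to_ddi_codebook.xsl")  # hypothetical XSLT
            transform = etree.XSLT(stylesheet)

            ddi_doc = transform(odm_doc)
            with open("study.ddi.xml", "wb") as f:
                f.write(etree.tostring(ddi_doc, pretty_print=True,
                                       xml_declaration=True, encoding="UTF-8"))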

        Although it is a standard, CDISC ODM allows vendors to add their own attributes and extend the standard. We will compare how much information is present in REDCap's implementation of CDISC ODM relative to the native CDISC ODM standard. Would a generic CDISC ODM to DDI-Codebook XSLT converter provide a deep enough metadata description?

        Speaker: Vincent BENOIT (Divers Alert Network (DAN))
      • 15:40
        Nectar Publisher – progress towards a functional open source dataset documentation tool 20m

        Members of the DDI Developers group are creating a simple data documentation tool for basic dataset documentation. The goal is to develop a lightweight, client-side–only web application that supports common tabular formats. The initial concept for this tool was developed during the DDI Hackathons and resulted in a prototype with basic functionality. We have recently integrated DDIwR to provide better support for a wider range of formats while still running entirely client-side in the browser. This short presentation will report on the current status of the project and provide usage examples.

        https://github.com/ddi-developers/nectar-publisher

        Speaker: Olof Olsson (Swedish National Data Service (SND))
      • 16:00
        Advancing Statistical Data Management: the Colectica Data Engine, DDI, and their Applications 25m

        The Colectica Data Engine is a versatile, platform-agnostic solution designed for handling statistical data, enabling the development of applications for curation, discovery, visualization, and analysis. It powers tools like Colectica Datasets and online platforms. It is accessible via desktop, command line, web interfaces, and APIs.

        Key features include intelligent file conversions from proprietary formats, leveraging Apache Parquet and Apache Arrow for efficient handling of multiple languages, missing values, date mappings, and embedded DDI metadata such as value labels and documentation.

        The engine also supports calculations of summary statistics, frequencies, crosstabs, charts, correlations, and regressions, with outputs in formats like DDI Codebook, DDI Lifecycle, DDI CDI, HTML, images, and JSON.

        This presentation explores the engine's architecture, practical use cases, and its potential to streamline data workflows, fostering interoperability and advanced analytics in research, government, and industry settings. By bridging diverse data formats, standards, and tools, the Colectica Data Engine promises to enhance data accessibility and reliability for users worldwide.

        Speaker: Jeremy Iverson (Colectica)
      • 16:25
        CDIF service for Nectar Publisher 20m

        The experimental integration of the Nectar Publisher platform with the Cross-Domain Interoperability Framework (CDIF) service enables visual exploration of variables within any dataset. At the core of this development is a knowledge graph layer built on the DDI Cross Domain Integration (DDI-CDI) standard. By mapping DDI-CDI metadata into richly interconnected graph structures, we can reveal semantic relationships between dataset metadata, variables, and research outputs. This creates a variable cascade, where changes in one element can be semantically linked and traced across related entities in both machine-readable and human-interpretable forms. Nectar Publisher serves as a "human-in-the-loop" tool, allowing experts to correct or confirm the semantic conversion.
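
        A minimal sketch of this mapping idea using rdflib; the namespace URI and property names below are illustrative placeholders, not the official DDI-CDI vocabulary:

            # Sketch of mapping variable metadata into an RDF graph; the
            # namespace and property names are placeholders, not DDI-CDI terms.
            from rdflib import Graph, Literal, Namespace, URIRef
            from rdflib.namespace import RDF, RDFS

            CDI = Namespace("http://example.org/ddi-cdi/")  # placeholder

            g = Graph()
            var = URIRef("http://example.org/dataset/1/variable/income")
            g.add((var, RDF.type, CDI.InstanceVariable))
            g.add((var, RDFS.label, Literal("Household net income")))
            g.add((var, CDI.unitOfMeasure, Literal("EUR")))

            # a semantic link between related variables across datasets
            other = URIRef("http://example.org/dataset/2/variable/hh_income")
            g.add((var, CDI.correspondsTo, other))

            print(g.serialize(format="turtle"))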

        Speaker: Slava Tykhonov (CODATA)
    • 16:45 17:15
      Plenary Lecture Hall

      Lecture Hall

      ELTE Centre for Social Sciences (Research Documentation Centre)

      1097 Budapest, Tóth Kálmán utca 4
    • 09:00 16:00
      Hackathon (I) K13-14

      K13-14

      ELTE Centre for Social Sciences (Research Documentation Centre)

      The topics of the Hackathon will be announced at: https://github.com/ddi-developers. To keep up to date, join the DDI Developers Group at: https://groups.google.com/g/ddi-developers

    • 09:00 17:00
      Side Meetings Lecture Hall

      Lecture Hall

      ELTE Centre for Social Sciences (Research Documentation Centre)

      1097 Budapest, Tóth Kálmán utca 4

      Please indicate on the registration form if you wish to attend so that we can plan rooms.

      • 09:00
        DDI Questionnaire Group Meeting 3h 30m B 1.37

        B 1.37

        ELTE Centre for Social Sciences (Research Documentation Centre)

        Half-day face-to-face meeting of the DDI Questionnaire Group

        Speakers: Becky Oldroyd, Romain Tailhurat (Making Sense)
      • 09:00
        Metadata Publication and Access Working Group – Side Meeting 2h K12

        K12

        ELTE Centre for Social Sciences (Research Documentation Centre)

        Discussion of the work plan (half day), max. 10 persons.

        Speaker: Knut Wenzig (DIW Berlin/SOEP)
      • 09:00
        Technical Committee face-to-face meeting 7h K11

        K11

        ELTE Centre for Social Sciences (Research Documentation Centre)

        Specific goals of the face-to-face meeting:

        • Initiate expanded coordination with the Developers Group to
          develop a section on the DDI website focused on Implementation
          Best Practices
        • Identify and develop a plan for expanding technical support for
          users
        • Configure git repositories (review and modify as needed)
        • Expand and improve access to various DDI tools (reporting,
          access, git-based): external management of the Tools list
          (possibly others); GitHub for collection and management; CSV or
          other output from GitHub to populate updates to the HubSpot DB
        • Searchable/filterable listing on HubSpot
        • Implementation exercise

        Speaker: Wendy Thomas
      • 13:00
        Marketing Group Side Meeting 3h K12

        K12

        ELTE Centre for Social Sciences (Research Documentation Centre)

        DDI Marketing Group meeting to review progress to date and to finalise the use cases.

        Speakers: Barry Radler (University of Wisconsin-Madison), Jon Johnson (CLOSER, UCL)
    • 09:00 16:00
      Hackathon (II) K13-14

      K13-14

      ELTE Centre for Social Sciences (Research Documentation Centre)

      The topics of the Hackathon will be announced at: https://github.com/ddi-developers. To keep up to date, join the DDI Developers Group at: https://groups.google.com/g/ddi-developers