The purpose of this all-day meeting is to provide an overview for organisations, such as archives and research institutes, that have restricted budgets or limited personnel but would like to provide access to data using the DDI standard, or to learn more about how to do so.
Running an Archive in Slovenia (ADP)
Supporting the Development of Archives in Europe (CESSDA)
Establishing a new archive (CROSSDA)
Supporting Research Visibility (ELTE RDC)
This workshop explores the expanded Colectica ecosystem, including the Colectica Datasets application for viewing, improving, and publishing data files, as well as new web-based tools for seamless collaborative editing and publication. Running on Windows, macOS, and Linux, Colectica supports formats like DDI, RDF, Parquet, SPSS, SAS, Stata, R, and CSV, while the web tools enable real-time team...
Developing and maintaining Metadata Standards
Supporting Users of DDI
NADA Tools
Dataverse
Using R at the Romanian Data Archive
Open Data Format with Stata, R & Python (Multiplatform Distribution powered by DDI-Codebook at low cost)
The research lifecycle is being updated across many scientific fields to account
for the adoption of AI in scientific exploration and analysis. To better
support these changes, AI Readiness is emerging to define good data practices
in combination with AI adoption. AI Readiness, thus, includes the documentation
of data, models, and workflows for their use in AI throughout the...
2025 has been the most productive year yet for the Alliance with the number of initiatives, work groups, and products continuing to expand. The Scientific Board maintains a watchful eye over this hive of activity and this set of lightning talks is intended to give you bite-size insights into the latest developments, from tools to standards and all stops in between. We’ll begin with a...
Czech science is undergoing a significant transformation. What used to be only voluntary sharing of scientific data has, with the arrival of European programs and later also within domestic grant funding schemes, become an obligation—an obligation that is even embedded in national legislation. The state is also making substantial investments in supporting repositories and developing a domestic...
DDI Lifecycle and DDI-CDI provide significant capabilities for the integration and harmonisation of content across datasets. As part of the recently completed WorldFAIR project led by CODATA, a team from the Australian Data Archive (ADA) and Sikt led a work package to examine ways of improving FAIR practices in the management of harmonised content in cross-national social surveys. ...
For most large-scale surveys, the canonical source has been, and for the foreseeable future will remain, the ‘paper’ version of the questionnaire as a PDF. It contains the essential information needed to create DDI-Lifecycle metadata: the fielding of the survey, the questions asked (including response options), their ordering, and the logic associated with filtering. However, there is no standard layout and...
Interoperability of metadata across research infrastructures remains a major challenge for advancing Open Science. Aligning domain-specific standards with domain-agnostic frameworks is crucial for enabling cross-disciplinary discovery and FAIR data reuse.
The OSTrails project addresses this by developing the Scientific Knowledge Graphs Interoperability Framework (SKG-IF) - a core model and...
Communication materials are essential for engaging panel participants in survey research, but their creation is often time-consuming and resource-intensive—especially in multilingual contexts. Panel studies require both recurring content and wave-specific updates. The complexity increases significantly when surveys are administered in multiple languages.
With PhOrM, we propose a solution to...
Semantic interoperability is an especially challenging dimension because it requires that meanings be unambiguous. In a technical sense, interoperability is about the connectivity of data, which requires the use of compatible data formats and standardized solutions in data transmission. In social science repositories, as opposed to the real-time exchange of data, the emphasis lies in...
In the Ontolisst project, our contribution was threefold. First, during data processing and the creation of the LiSST thesaurus, we applied clustering and topic modeling to uncover patterns in large datasets, using generative AI for cluster labeling. Second, we validated LiSST by processing extensive sets of social science paper keywords and applying semi-supervised clustering for codebook...
The CESSDA European Question Bank (EQB) project, supported by CESSDA and participating service providers, aims to build up a rich database of survey questions in multiple languages. The question bank allows questionnaire designers to identify existing fielded survey questions and their translations, and allows researchers to search for and discover questions and data of interest. The source...
Whether because of technical difficulties or mismatched interests, interoperability is somewhat sidelined among FAIR principles. The stakes are obviously different according to the nature of projects determining the ways of data collection. Yet a thrust towards standardization and establishing connections is experienced broadly in today’s digital landscape. The discussion explores the...
The Leibniz Institute for Educational Trajectories (LIfBi) runs an infrastructure for questionnaire metadata initially used for the National Educational Panel Study (NEPS). With NEPS Starting Cohort 8 (SC8), which started in 2022, this metadata is already available during the field preparation processes. Based on that we developed software tools that largely automate the questionnaire...
Longitudinal and comparative research relies heavily on repeated measures and harmonisation of data. DDI-Lifecycle has strong support for this through the variable cascade; however, scaling such activity has proven difficult in practice.
Social science (and other!) researchers approach the development of questions from a range of perspectives, even where the response options are...
The DDI standard provides a structured framework for documenting questions and questionnaires, enabling users to identify exactly what was asked of respondents to generate the data (i.e. the provenance of variables). The DDI Questions and Questionnaires (Q2) Working Group comprises 15 members representing the international DDI community. Its aim is to provide guidance for creating and...
Cross-lingual alignment of nuanced sociological concepts can form the basis of comparing cross-national studies in different languages and harmonising longitudinal studies, by leveraging knowledge from social science taxonomies such as ELSST. Aligning sociological concepts is challenging due to cultural context-dependency, linguistic variation, and data scarcity. Traditional approaches for...
The FAIR (Findable, Accessible, Interoperable, Reusable) principles largely rely on metadata and metadata standards like the Data Documentation Initiative (DDI) for their implementation. More specifically, DDI facilitates the reuse and replicability of data as it provides a comprehensive metadata schema, including information on the data itself, which is important when it comes to assessing...
A vocabulary, if it is to be useful, faces three main challenges: coverage, achievability and utility.
With over 3,400 topics, its coverage is enormous, but assigning topics from such a large set is not straightforward at scale, and its utility for discovery, as is, is thus diminished. It does however have a significant role to play as an anchor to which vocabularies could align...
Over the past 15 years, data management has received increasing attention among researchers and research policy agencies alike. Beyond the loss of trust triggered by questionable research practices (QRPs), the rapid expansion of data sources and new analytical methods has further underscored the need for professional data stewardship. While the field has developed significantly and matured...
DDI metadata standards are widely used to describe especially tabular data in depth, including their columns/variables. According to re3data.org, a global registry of research data repositories, they are the most prevalent standards with these capabilities. Additionally, re3data.org identifies OAI-PMH as the most widely adopted protocol for harvesting such metadata, with many endpoints...
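As a rough illustration of the harvesting pattern mentioned above, the following Python sketch builds an OAI-PMH `ListRecords` request and reads record identifiers and the `resumptionToken` from a response. The endpoint URL, the `oai_ddi25` metadata prefix, and the sample response are assumptions for illustration only; actual endpoints advertise their supported prefixes via the `ListMetadataFormats` verb.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Namespace defined by the OAI-PMH 2.0 protocol.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix=None, resumption_token=None):
    """Build a ListRecords request URL.

    Per OAI-PMH, a follow-up request carries only the resumptionToken,
    not the metadataPrefix.
    """
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return f"{base_url}?{urlencode(params)}"


def parse_list_records(xml_text):
    """Return (record identifiers, next resumption token or None)."""
    root = ET.fromstring(xml_text)
    identifiers = [
        el.text
        for el in root.findall(".//oai:record/oai:header/oai:identifier", OAI_NS)
    ]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return identifiers, token


# Hypothetical, heavily trimmed response; real records also carry
# datestamps and a <metadata> element holding the DDI document.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:study-1</identifier></header></record>
    <record><header><identifier>oai:example.org:study-2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai", metadata_prefix="oai_ddi25"))
    print(parse_list_records(SAMPLE))
```

A harvester would repeat the request with the returned token until no token is present.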
Metadata is not an inherent characteristic of restricted data, which limits its ability to be found and used. To better understand discoverability and accessibility of restricted data, this study reviewed restricted health data sources to determine how they describe their datasets and access procedures, what descriptive commonalities exist across data sources, and to what extent the...
Following the international movement towards open science, the Constances cohort, known to provide the research community with high-quality health and medical data, started an ambitious project in 2025 to build a metadata system leveraging the FAIR principles and supporting the creation, update and dissemination of FAIR data and metadata. DDI-Lifecycle has been chosen as the core modeling...
Variables and concepts are at the center of the DDI Lifecycle model. Variables are key to describing the data in its different states. Concepts help in understanding the meaning of the data and its links to information in the outside world; concepts are also pivotal for implementing FAIR principles, specifically the knowledge representation and vocabulary principles for...
Challenges in documenting health research data using the DDI standard
Documenting data across the wide spectrum of health research disciplines requires
dealing with a diversity of nomenclatures, study designs, and reporting standards. The
“France Recherche en Santé Humaine” (FReSH) catalog addresses this challenge by
providing access to descriptions of individual data from scientific...
At GESIS - Leibniz Institute for the Social Sciences, we have used DDI from the beginning, mainly for data archiving. Recently, for more stages in the research data lifecycle, the DDI-Codebook and DDI-Lifecycle standards have become the language for documenting and re-using survey questionnaires and datasets for the social sciences. DDI is also being considered for documenting other data...
Metadata plays a key role throughout the data lifecycle, enabling researchers to discover, understand, and reuse their own and others’ data. Despite its importance, metadata is rarely included in university courses, and there is little formal training on the topic, so engagement with metadata standards such as DDI becomes challenging. As such, the utility of metadata and its role in...
For a number of years, CLOSER has been organising and delivering metadata training events related to, or specifically on, DDI. The major challenge has been a “knowledge gap” on what metadata actually is and why it is important to managing and sharing data, especially with the advent of FAIR, which is itself not well understood.
Over the last year, CLOSER has been developing a training...
To improve the effectiveness of metadata production in Japanese for social surveys, the Social Science Japan Data Archive (SSJDA) at the University of Tokyo's Institute of Social Science has built a metadata extraction method using OpenAI's API. Since 1998, SSJDA has manually created metadata, and since 2021 has followed the DDI-Codebook standard, but the increasing number of data deposits now...
Psychological resilience research is rapidly expanding, but diverse concepts, heterogeneous study designs, inconsistent reporting, and missing metadata standards limit data reuse and robust evidence synthesis. To address these challenges, we introduce ResiMETA, a continuously updated open-access database for trajectory-based resilience research that systematizes evidence from longitudinal...
In 2017, the people of Bangladesh were facing many problems due to electricity shortages. The overall objective of this survey was to gather opinions on electricity supply, multiple electricity consumption levels, satisfaction with electricity provision, perceptions of quality and electricity-saving strategies in Bangladesh. The ‘Opinion survey on Power supply’ has been successfully completed. Thirty...
In 2025, the Slovenian Social Science Data Archives (ADP) introduced two new applications into regular use: Dataverse as its research data repository and the e-Storage app as its digital preservation system.
Dataverse replaces the legacy Nesstar platform and streamlines ingest services, simplifying the deposit and dissemination of research data. Features, including an integrated deposit...
As part of the ongoing DDI website refresh, we’ve introduced a new Use Cases section to showcase practical applications of DDI -- including:
- Creating a codebook (in various products)
- Documenting data
- Building a Data Catalog
We are reusing existing content that is now outdated and in need of revision.
This interactive session invites attendees to help update and improve...
Poverty indicators such as the at-risk-of-poverty rate and the relative poverty gap are widely used across Europe, but they are only as reliable as the microdata behind them. In our analysis of EU-SILC data from 2005–2023, Hungary stands out with striking anomalies. In several survey waves, unusually many incomes cluster exactly at the poverty threshold—for instance, in 2020, 14% of...
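To make the two indicators concrete, here is a minimal Python sketch, assuming the common convention of a threshold at 60% of the median income and using unweighted toy values. It is not the authors' method: real EU-SILC estimates use equivalised disposable income and survey weights, which are omitted here, and the `share_near_threshold` helper with its tolerance is a hypothetical way to flag the kind of clustering described above.

```python
from statistics import median


def poverty_indicators(incomes, threshold_share=0.6):
    """Return (threshold, at-risk-of-poverty rate, relative median poverty gap).

    Unweighted simplification: the threshold is a share of the median income,
    the rate is the fraction of incomes strictly below it, and the gap is the
    distance between the threshold and the median income of those below it,
    expressed as a fraction of the threshold.
    """
    threshold = threshold_share * median(incomes)
    poor = [x for x in incomes if x < threshold]
    rate = len(poor) / len(incomes)
    gap = (threshold - median(poor)) / threshold if poor else 0.0
    return threshold, rate, gap


def share_near_threshold(incomes, threshold, tolerance=0.0):
    """Fraction of incomes within `tolerance` of the threshold (hypothetical check)."""
    near = [x for x in incomes if abs(x - threshold) <= tolerance]
    return len(near) / len(incomes)


if __name__ == "__main__":
    toy_incomes = [5, 10, 20, 20, 30]  # illustrative values, not survey data
    threshold, rate, gap = poverty_indicators(toy_incomes)
    print(threshold, rate, gap)
    print(share_near_threshold(toy_incomes, threshold, tolerance=2.0))
```

Because both indicators depend on the threshold, a pile-up of incomes at or just around it can shift the rate sharply with small changes in the data.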
Knowing how long a survey takes to complete is important for respondents, researchers, and survey practitioners alike. It is important for respondents because their time is a valuable and limited resource; for researchers and survey practitioners because research has shown how instrument duration is linked to response rates and respondent burden (Edwards et al., 2009; Eslick & Howell, 2001),...
Through the use case of DAN (Divers Alert Network) we explore the possibility of documenting in DDI-C studies conducted through REDCap. REDCap (https://project-redcap.org/) is a widespread web-based tool for building and managing online surveys (eCRF) and databases in the health research community. To date, REDCap is used by almost 8000 institutions for 2.4M projects and 3.8M users.
For...
The rapid development of digital infrastructures demands new approaches to managing born-digital data and enabling knowledge valorisation. Rather than constructing another digital Library of Alexandria, the SODHA-team is developing Project BLU, a framework for building a distributed information network. Drawing inspiration from Web3 principles—decentralisation, interoperability, and...
Members of the DDI Developers group are creating a simple data documentation tool for basic dataset documentation. The goal is to develop a lightweight, client-side–only web application that supports common tabular formats. The initial concept for this tool was developed during the DDI Hackathons and resulted in a prototype with basic functionality. We have recently integrated DDIwR to provide...
As a prominent producer of demographic data, the French Institute for Demographic Studies (INED) needs to take account of the latest developments in the open science landscape and the FAIR principles. Following the implementation of rigorous standards, such as the DDI (Data Documentation Initiative) norm, in data dissemination, real-world scenarios have highlighted the considerable effort required to...
The Colectica Data Engine is a versatile, platform-agnostic solution designed for handling statistical data, enabling the development of applications for curation, discovery, visualization, and analysis. It powers tools like Colectica Datasets and online platforms. It is accessible via desktop, command line, web interfaces, and APIs.
Key features include intelligent file conversions from...
This paper examines the evolution and strategic importance of Geographic Entity Object (GEO) codes in Bangladesh, underscoring their role in census and survey operations and in (meta)data automation, with a focus on DDI tools. GEO codes are hierarchical identifiers representing every administrative unit from divisions to villages and Enumeration Areas (EAs) and serve as the cornerstone of...
Half-day face-to-face meeting of the DDI Questionnaire Group
Discussion of the work plan (half day), max. 10 persons.
DDI Marketing Group meeting to review progress to date, and the finalisation of use cases.
The diversity of metadata structures in European social science archives – ranging from differences in granularity and provenance to divergent terminology – is a major obstacle to thematic interoperability. This heterogeneity curtails the FAIRness of social science data and hampers their effective reuse across repositories, languages and disciplinary borders. The ONTOLISST project (funded by...
The DDI Alliance Scientific Board would like to organise a full day work meeting in the margins of EDDI 2025.
The goal of the meeting is to review where we are with the activities specified in the current Scientific Work Plan, work on related tasks, and start to prepare topics for 2027-2029, all in line with the Strategic Plan.
We are 9 members on the board altogether, so would need a...