Do you document questionnaires with DDI-Lifecycle? The DDI Questions and Questionnaire Working Group invites you to participate in a workshop to discuss how the DDI Alliance can make use of the standard easier for you.
The session will include discussions on what you wish DDI-Lifecycle could do, and what aspects are most challenging when documenting questionnaires. The outputs will be used...
Location: Building A: Room A3.08
Producing, harmonising and updating metadata are tasks metadata curators have to do regularly in order to make sure catalogues and their contents stay up to standard.
Drawing from the French Institute for Demographic Studies (INED) experience, this workshop aims to share new procedures implemented in the last two years in order to support the...
This workshop introduces Colectica Datasets, a powerful new application designed for viewing, improving, and publishing data files. Running on both Windows and macOS, Colectica Datasets supports a wide range of dataset formats, including Parquet, SPSS, SAS, Stata, and CSV. Participants will learn to:
- View: Learn how to inspect, visualize, and analyze datasets to quickly understand the...
The FAIR principles—Findability, Accessibility, Interoperability, and Reusability—are essential for maximizing the value of resources in today's data-driven world. Metadata serves as the cornerstone in achieving these principles, providing structure and meaning to data.
This keynote will trace the evolution of metadata, from its origins in ancient civilizations using clay tablets to its...
We extend the results of our work on pre-trained language models with recent developments in text-layout models and zero-shot techniques. Since relying solely on textual information makes it difficult to accurately classify and extract metadata, a combination of textual content and visual logic that incorporates vision transformers with optimisation techniques will be explored.
This will...
This presentation will introduce the latest longitudinal study to use DDI Lifecycle and Colectica software. The Wisconsin Longitudinal Study (WLS) is a long-term study of a random sample of 10,317 men and women who graduated from Wisconsin state (U.S.) high schools in 1957. Since then, there have been six rounds of survey data collected from the original respondents and the sample has been...
Questions from the CLOSER DDI-Lifecycle repository will be used to assist in training a model that is capable of using questions and response domains from the metadata extraction workstream to create conceptually equivalent items from which data variables can be concorded. Approaches such as fine-tuned large language model (LLM)-based relevance scores model and vector retrieval-LLM reordering...
High quality metadata is a prerequisite for any data producer in contemporary research. The [Generations and Gender Programme][1] has a strong interest in this matter. The GGP is a cross-national (Europe and beyond) longitudinal survey providing data on a variety of topics including partnerships, fertility, work-life balance, transition to adulthood and later life. The documentation of the...
Beyond documentation, machine-actionable survey metadata offer a wide range of possibilities for more efficient and less error-prone survey management.
We want to illustrate how the German National Educational Panel Study (NEPS) made use of metadata throughout the survey life cycle in its newly recruited Starting Cohort 8, a panel sample of 5th graders which started in 2022. With this new...
Conceptual annotations and provenance can provide contextual information to inform a range of data processing activities. In this workstream we will be utilising the metadata generated in the earlier workstreams – the questions and response domains from the metadata extraction phase and the concorded variables from the conceptual comparison phase – to identify key variables, those that...
While the semantic web has been around for over twenty years, practical and sustainable implementations have been thin on the ground. During 2024, DDI Controlled Vocabularies and the DDI-CDI ontology have been made available, for the first time, as persistently resolvable linked open data. This presentation digs into the underlying cloud infrastructure, the rationale for creating it and...
PROGEDO’s data repository, Quetelet-Progedo-Diffusion (https://data.progedo.fr/), holds more than 1,600 datasets produced by the social sciences and humanities community in France. In spring 2024, PROGEDO upgraded this repository to offer a single entry point for discovering and requesting access to the available datasets.
This upgrade inevitably brought to the forefront the rich and...
Launched in 2020, the [LIFEOBS project][1] spans seven major French national surveys covering all life stages, including three integrated into European research infrastructures: the Generations and Gender Programme (GGP2020), SHARE, and GUIDE-EuroCohort. This project is conducted by [key French institutions][2], including but not limited to INED and PROGEDO. To improve international visibility...
With more than 400 DDI-documented datasets, the catalogue of the Center for Socio-Political Data (CDSP) counts tens of thousands of variables - mainly quantitative survey data collected from structured questionnaires.
With the ultimate goal of producing accurate and consistent training material for a machine learning model (CamemBERT), the CDSP’s engineers launched a working group for variable tagging...
One objective of the COORDINATE project is to provide improved access to studies related to child and youth wellbeing. This has been achieved by utilizing CESSDA's software and metadata provided by CESSDA Service Providers in DDI-C and DDI-L formats.
In this presentation, the various components that enable the portal's functionality will be covered. This includes harvesting DDI metadata...
This paper covers the development of the KDK Thesaurus, a CV partly based on ELSST, used for the topical discovery of interview materials. The presentation discusses the workload for such a project, its sustainability and the future perspectives of similar projects, including ONTOLISST, and touches on the economic and ethical considerations of metadata curation on smaller levels of datasets...
The talk introduces the new 2-year ONTOLISST project starting in December 2024, funded by the first OSCARS Cascading grant call. The project will develop a simplified multilingual ontology (LiSST) to describe social science research data, create a corpus of social science metadata, and research whether and how NLP tools can help with (semi)automated (meta)data curation. The aim is to better...
At GESIS, we plan to collect more digital behavioral data, e.g. social media data and web tracking data. Data sources are currently X/Twitter and tracking data collected by GESIS. The GESIS Web Tracking software works via a browser plugin on desktop devices. To document these data sources for archiving, additional new information is needed beyond the usual survey metadata.
Challenges are...
As the availability of DDI metadata at the variable level is quite low, three German research data centers (DIW Berlin/SOEP, LIfBi, DZHW) collaborated in a KonsortSWD-financed project to make progress in this area.
All partners have a metadata system in place that is based on partly multilingual structured metadata at the variable level, including, for example, variable labels, categories,...
As the DDI community continues to grow and its user base expands, the demand for sharing and accessing metadata in this format is steadily increasing. This presentation shares our experiences in collecting DDI codebook metadata records using the OAI-PMH protocol from diverse repositories. We will demonstrate the tools and processes employed to access and extract DDI metadata elements and the...
Social research increasingly includes media formats like audio and video, which are often poorly documented and inaccessible. While archives handle traditional survey data well, media files are mostly limited to minimally annotated zip files due to the complexity of proper documentation. Recent advancements in AI, including the Whisper model, along with the use of Pydantic models and...
Microdata provides tremendous value in socioeconomic analysis. However, these data may not be easily discoverable when metadata are not as rich, structured, and optimized as they could be. In the case of microdata, an issue is the semantic discoverability of information contained in the variable-level metadata (the data dictionary). This paper presents an unsupervised framework that leverages...
DDI Lifecycle and DDI-CDI provide significant capabilities for the integration and harmonisation of content across datasets. As part of the recently completed WorldFAIR project led by CODATA, a team from the Australian Data Archive (ADA) and Sikt led a work package to examine ways to improve FAIR practices in the management of harmonised content in cross-national social...
Since DDI is a suite of standards and other semantic products, the idea of creating an ISO standard for them is a challenge. Do we pick one of the DDI products and create a standard for that? Which one? Why? Will the revision schedule for the product selected interfere with the creation and maintenance of the standard? Will the Alliance be able to provide the resources for an ambitious...
In early 2025, SND and eight other Swedish research infrastructures will launch a new joint data portal targeting researchers interested in finding Swedish research data and support pages for research data management. Although considered, DDI was not selected as a format to be used to harvest metadata from the participating research data repositories. This presentation will focus on how we work...
Many stakeholders in science policy have demonstrated a strong commitment to open science over the past decade. However, this commitment contrasts with the reality, where a comprehensive and effective open science framework is still lacking. This talk focuses on political and practical challenges faced by open science in Europe, and how they are (not) met in Switzerland.
While the benefits...
The National Archive of Computerized Data on Aging (NACDA) began working with DDI-Lifecycle in 2018. Since then, NACDA has made efforts to document in DDI-L some of our most established and frequently-used longitudinal data collections and display them on a Colectica Portal. In this presentation, I will discuss how the system of topical groups and subgroups we use to organize the conceptual...
This is a full session featuring three presentations and a panel discussion. The goal is to present the capabilities of DDI-CDI as an overview, to describe those organisations and efforts which are already using the standard and planning to do so in the near term, and to show the active tools development which is taking place.
- DDI-CDI: A General Introduction - describes the features...
INSEE, the French national statistical office, has launched an overhaul of its dissemination processes in order to improve efficiency and services to producers. Part of this project involves developing a data dissemination platform where data can be discovered and consistency between data and metadata verified. In terms of metadata, this boils down to describing the structure of the data and...
DDI standards offer an extremely valuable solution to metadata management throughout the whole data acquisition, processing and dissemination phases. However, when documenting variables in repeated contexts - one of the most fundamental entities of any data process - finding the ideal way of using DDI can be challenging. There are several contexts in which repeated variables are created...
This is an informal group for developers of software implementations based on DDI. The group gets together periodically to discuss their implementations of the DDI specification. Join us at our poster to discuss developing software with DDI or if you are interested in the DDI Hackathons.
https://github.com/ddi-developers/
The DDI-CDI Converter Prototype is a Python-based web application designed to convert proprietary statistical files from Stata and SPSS into the open DDI-CDI format. This tool addresses the growing need for data interoperability and sharing by transforming closed data formats into a standardized, machine-readable structure. By converting both data and metadata from Stata and SPSS, including...
Earlier this year, the DDI Alliance Executive Board and the Scientific Board announced a new DDI Strategic Plan, 2024-2027 and the complementary new DDI Scientific Work Plan, 2024-2026. These documents represent the culmination of collaborative efforts and thoughtful input from our community, and they are now ready for implementation and use.
This poster will highlight the focal points of...
[CLOSER][1] is the interdisciplinary partnership of leading social and biomedical longitudinal population studies (LPS), the UK Data Service and The British Library. One of our areas of focus is training and capacity building. We currently offer free, online educational resources on our [Learning Hub][2] on a range of LPS-related topics, including [Understanding metadata][3], which provides a...
As more and more data producers and research institutions are interested in documenting their data, following the often mandatory Data Management Plan, they find themselves in need of ready-to-use software tools that are capable of creating and modifying a Codebook, in the vein of Nesstar Publisher.
Until such software appears, it is possible to fully populate a DDI Codebook using the...
Earlier this year, the DDI Alliance Executive Board and Scientific Board launched a new DDI Strategic Plan, 2024-2027 and the complementary new DDI Scientific Work Plan, 2024-2026. These plans reflect the collective vision and contributions of our community and are poised to drive meaningful change.
This Birds of a Feather session led by leaders of the Executive and Scientific Boards...
Founded in 1997, the Association of Religion Data Archives strives to democratize access to the best data on religion. The ARDA includes American and international data collections and a host of free resources. Data included in the ARDA are submitted by the foremost religion scholars and research centers in the world. This poster will introduce three major initiatives that are building upon and...
Do you currently document questionnaires with DDI-Lifecycle, or do you plan to? The DDI Questions and Questionnaire Working Group invites you to participate in this Birds of a Feather session to discuss how the DDI Alliance can make use of the standard easier for you.
We will discuss what you wish DDI-Lifecycle could do, and what aspects are most challenging when documenting questionnaires. The...
This presentation will explore the strategic implementation and evolution of the DDI 3.3 standard at Statistics Canada, focusing on how it has enhanced metadata management practices within the organization. Under the guidance of StatCan's Information Governance Committee, DDI 3.3 has been mandated as a standard for microdata description, integrated into the agency’s policy suite alongside...
In 2022 the Technical Committee provided a roadmap for its work during the period 2023-2027. We are now about one third of the way through the initial roadmap and it's time to see what we have accomplished, what needs adjustment, and what lies ahead in the next 18-24 months.
This roadmap reflects the long-term priorities of the Technical Committee and identifies the specific tasks that are needed...
This presentation will showcase public, in-production use cases of Colectica software, highlighting its role across various stages of the data lifecycle. Colectica, built on the Data Documentation Initiative (DDI) standard, serves as a powerful tool for data documentation and management. Specific use cases will include projects that leverage the software and the DDI standard in the following...
In mid-2022, Statistics Spain (INE) decided to design and set the strategic guidelines for a new technological infrastructure that would provide our statistical office with the necessary hardware and software to automate, standardize, and modernize the entire statistical production process, encompassing improvements in terms of storage, virtualization, ingestion, analysis, processing, and...
The Metadata Editor is an open-source web application developed by the World Bank, fully compliant with the DDI Codebook standard. It supports study, data file, and variable-level elements from DDI, and for variable-level metadata, it allows importing and exporting data and data dictionaries from various versions of SPSS and Stata. The editor features a robust templating system that helps...
Using the top 50 “smart” city governments identified by Eden Strategy Institute and ONG&ONG Pte Ltd. (2018), we first establish a data analytic and statistical scoring framework to assess and investigate their existing open data policies, and review their respective performance in releasing environmental and air quality attributes and information to the public. The framework considers data availability,...
The CESSDA Data Catalogue (CDC) has long supported metadata about Social Science studies in DDI Codebook 1.2.2 and 2.5 forms. Historically, however, metadata in DDI Lifecycle formats has not been supported. This was a pain point for CESSDA’s service providers who work with this format.
The CESSDA Metadata Office has created a mapping from DDI 3 metadata to CDC UI elements, which was the basis...
The ModernStats models, including GSBPM and GSIM, alongside reference architectures like CSPA and CSDA, provide a robust framework for understanding statistical production, guiding business decisions, and designing reusable software components. However, bridging the gap between these conceptual models and their implementation standards—particularly DDI 3.3 and SDMX—has become a focal point of...
Since 2020, the French Institute for Demographic Studies (INED) has undertaken a significant transformation in its data dissemination and metadata production processes. Following a thorough evaluation of alternatives to the Nesstar software, the implementation of the NADA Microdata Cataloging Tool, and the metadata update and migration process, we are now two years into using NADA. We devised...
This presentation explores the newly implemented RDF (Resource Description Framework) support in DDI Lifecycle Version 4 and its significant impact on enhancing metadata interoperability. With the growing use of linked data and semantic web technologies, RDF offers a standardized method for representing and exchanging metadata, enabling smoother integration across diverse metadata models and...
This talk will not grieve over the end of software projects. Instead, we will give some insight into the reasons for switching from a homegrown development to buying a commercial solution, Colectica in this case.
Besides the DDI-FlatDB we also discontinued the development of our questionnaire editor, our publishing pipeline and some other tooling around DDI metadata. Therefore...
In the DDI Developers group we have been discussing the need for a simple data documentation tool to do basic documentation of a dataset. The goal is to develop a simple client-side-only application to document datasets, with the possibility of building integrations with repositories to load data and metadata. During 2024, the first basic concept of this tool was developed during the DDI...
The DDI Training Working Group (TG) expects to have several members attending the EDDI 2024 conference in person. With this in mind, we propose having an in-person side workshop after the EDDI conference. We plan to engage attendees on the topics of DDI training standards, requirements, and resources. Attendees are expected to be a part of the DDI Alliance, with some knowledge and/or interest...
The new 2-year ONTOLISST project starts in December 2024. The project idea is a result of discussions at the EDDI 2023 conference. ONTOLISST will develop a simplified multilingual ontology and research whether and how NLP tools can help with (semi)automated (meta)data curation. The work will build on social scientific metadata in DDI format in different languages and from various sources. The...
The DDI Marketing Group has recently reformed and will meet to work on developing the Marketing Strategy, messaging and the development of Case Studies.
This meeting will carry forward the work from the Dagstuhl Workshop in October. Topics will include syntax representation of DDI-CDI, tools, documentation, and the integration of the non-quantitative model. Participants who are not already members of the DDI-CDI WG (or the non-quantitative subgroup) are welcome, especially members of the DDI Developer's Group. The meeting will provide an...
In Buenos Aires, the data needed to import, export and correctly exchange records of academic years and subjects passed in other South American countries, so that they can be validated in the Argentine Republic, are not harmonized.
For this reason, administrative processes have been accelerated in order not to stagnate the development of individuals from various Latin communities. Bolivians, Peruvians,...