2–6 Dec 2024
University of Applied Sciences of the Grisons (Pulvermühlestrasse 57)
Europe/Zurich timezone
Conference Hashtag: #EDDI2024

Enhancing metadata interoperability: The journey to DDI Lifecycle implementation in the CESSDA Data Catalogue

4 Dec 2024, 15:30
20m
A1.02 (University of Applied Sciences of the Grisons (Pulvermühlestrasse 57))

Regular Presentation Software 2

Speaker

Matthew Morris (CESSDA)

Description

The CESSDA Data Catalogue (CDC) has long supported metadata about social science studies in DDI Codebook versions 1.2.2 and 2.5. Historically, however, metadata in DDI Lifecycle formats has not been supported, which was a pain point for CESSDA’s service providers who work with these formats.

The CESSDA Metadata Office (MO) created a mapping from DDI 3 metadata to CDC UI elements, which formed the basis for this work.

For the CDC to ingest Lifecycle metadata, CESSDA MO has implemented a parser. Parsing Lifecycle is significantly more complex than parsing Codebook. Simple techniques such as reading linearly through the document are not sufficient, because DDI 3 elements can reference other parts of the document, and these references need to be resolved.
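To illustrate why a single linear read falls short, here is a minimal sketch of reference resolution in two passes. It is not the CESSDA parser: the XML is a simplified stand-in, and the element names (`CreatorReference`, `Organization`), the `id`/`ref` attributes, and the helper functions are all hypothetical rather than real DDI 3 constructs.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML. Element and attribute names are
# illustrative only and are NOT real DDI 3 names.
SAMPLE = """
<Study>
  <Title>Example Study</Title>
  <CreatorReference ref="org-1"/>
  <Organization id="org-1">
    <Name>Example Archive</Name>
  </Organization>
</Study>
"""


def build_id_index(root):
    """First pass: map every 'id' attribute value to its element."""
    return {el.get("id"): el for el in root.iter() if el.get("id") is not None}


def resolve(element, index):
    """Second pass: follow a 'ref' attribute to its target element.

    Returns the element itself when it carries no reference, and None
    when the reference points at an id that is not in the index.
    """
    ref = element.get("ref")
    if ref is None:
        return element
    return index.get(ref)


root = ET.fromstring(SAMPLE)
index = build_id_index(root)

# The reference appears before its target in the document, so the
# target is only known once the whole document has been indexed.
creator = resolve(root.find("CreatorReference"), index)
creator_name = creator.findtext("Name") if creator is not None else None
print(creator_name)
```

Because the reference precedes its target here, a reader that handles elements strictly in document order would not yet know what `org-1` is when it first meets it. Indexing first and resolving afterwards avoids that.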

To implement this, the behaviour of the parser needed to be defined programmatically at the level of individual XPaths. This differs from how the parser worked previously and required significant rewrites to introduce the necessary flexibility. Lambdas were used extensively to associate XPaths with parsing behaviour.
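One way to picture the pairing of XPaths with lambdas is a lookup table, sketched below. This is an assumption about the general shape of such a design, not the actual CDC code: the field names, the XPaths, and the `PARSERS` table are hypothetical, and the sample XML is not real DDI.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical XML (not real DDI).
SAMPLE = """
<Study>
  <Title lang="en">Example Study</Title>
  <Keyword>health</Keyword>
  <Keyword>education</Keyword>
</Study>
"""

# Each output field is paired with an XPath and a lambda that turns the
# matched elements into a value. Field names and XPaths are illustrative.
PARSERS = {
    "title": ("./Title", lambda els: els[0].text if els else None),
    "title_lang": ("./Title", lambda els: els[0].get("lang") if els else None),
    "keywords": ("./Keyword", lambda els: [e.text for e in els]),
}


def parse(root):
    """Apply every (XPath, lambda) pair to the document root."""
    return {
        field: handler(root.findall(xpath))
        for field, (xpath, handler) in PARSERS.items()
    }


record = parse(ET.fromstring(SAMPLE))
print(record)
```

With this shape, supporting a new field or a different schema version means adding or swapping table entries rather than changing the parsing loop.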

Next steps will look at performance optimisations and at how memory usage could be reduced when parsing large XML source files. Could this be accomplished with a streaming XML parser?
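As a rough illustration of what streaming could look like, the sketch below uses the standard-library `xml.etree.ElementTree.iterparse` to handle one record at a time. The `Studies`/`Study`/`Title` structure and the `iter_titles` helper are hypothetical, and this is not a proposal for the CDC implementation.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: a small stand-in for a large file of many records.
SAMPLE = b"<Studies>" + b"".join(
    b"<Study><Title>Study %d</Title></Study>" % i for i in range(3)
) + b"</Studies>"


def iter_titles(source):
    """Yield one title per Study as soon as its end tag has been read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Study":
            yield elem.findtext("Title")
            # Drop the finished record's children so they can be freed.
            elem.clear()


titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

A caveat for Lifecycle specifically: since elements can reference other parts of the document, a streaming approach would presumably still need an index of referenced elements or a second pass, which limits how much memory it can save.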
