Speakers
Description
The diversity of metadata structures in European social science archives – ranging from differences in granularity and provenance to divergent terminology – is a major obstacle to thematic interoperability. This heterogeneity curtails the FAIRness of social science data and hampers their effective reuse across repositories, languages and disciplinary borders. The ONTOLISST project (funded by the OSCARS call of the European Union) tackles this problem not merely by cataloguing the difficulties, but through research-driven, AI-supported exploration of concrete solutions. The session will present the project’s twofold strategy, showcase early results, and open a roundtable on the particular difficulties that arise when trying to achieve interoperability in social science digital repositories.
Presentation 1: Semantic Interoperability in Social Science Repositories
Presenters: Judit Gárdos, Róza Vajda, Timea Venczel
Semantic interoperability is an especially challenging dimension of interoperability because it requires that meanings be unambiguous. In a technical sense, interoperability is about the connectivity of data, which requires compatible data formats and standardized solutions for data transmission. In social science repositories, the emphasis lies not on the real-time exchange of data but on their long-term preservation, which makes them available for secondary analysis. In terms of content, data can be connected smoothly only if they are described with abundant documentation and appropriate metadata schemes. DDI standards go a long way toward putting this goal into practice. However, while DDI Codebook provides efficient and easy-to-use tools, only the more complex and time-consuming metadata schemes of DDI Lifecycle yield sufficient metadata for specific but significant use cases in the social sciences, such as the archiving of longitudinal surveys. Drawing on ten semi-structured interviews with repository managers, the presentation explores the objectives and achievements of large social science repositories that handle survey data and aim at semantic interoperability. It investigates the significance and stakes of interoperability with respect to the institutions' purposes and financing, maintenance and operations, and clients and uses.
Presentation 2: Using NLP Methods in Social Sciences – Experience and Opportunities
Presenter: Barbara Babolcsay
In the ONTOLISST project, our contribution was threefold. First, during data processing and the creation of the LiSST thesaurus, we applied clustering and topic modeling to uncover patterns in large datasets, using generative AI for cluster labeling. Second, we validated LiSST by processing extensive sets of social science paper keywords and applying semi-supervised clustering for codebook validation. Finally, we developed automated labeling by fine-tuning XLM-RoBERTa models to classify new survey questions and variables into the established codebook. In our presentation, we will briefly introduce these techniques and share key results, highlighting both our practical experience and the opportunities they open for social science research.
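As a rough illustration of the last step (assigning new survey questions to an established codebook), the sketch below swaps the fine-tuned transformer models for a much simpler bag-of-words nearest-centroid classifier. It is not the project's method: the topic labels, example questions and function names are all hypothetical and are not taken from LiSST.

```python
import math
from collections import Counter

# Hypothetical codebook: topic label -> a few already-labelled survey questions.
# These labels and questions are invented for illustration only.
CODEBOOK_EXAMPLES = {
    "health": [
        "How would you rate your general health?",
        "How often did you visit a doctor last year?",
    ],
    "employment": [
        "What is your current employment status?",
        "How many hours do you work per week?",
    ],
}


def tokenize(text):
    """Lowercase the text and strip basic punctuation from each word."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return [w for w in words if w]


def bag_of_words(texts):
    """Sum word counts over a list of texts (an unnormalised centroid)."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# One centroid per codebook topic, built from the labelled examples.
CENTROIDS = {label: bag_of_words(texts) for label, texts in CODEBOOK_EXAMPLES.items()}


def classify(question):
    """Return the codebook topic whose centroid is most similar to the question."""
    vector = bag_of_words([question])
    return max(CENTROIDS, key=lambda label: cosine(vector, CENTROIDS[label]))


if __name__ == "__main__":
    print(classify("Did you visit a doctor recently?"))
    print(classify("What is your employment status?"))
```

A word-count model like this only matches surface vocabulary within one language. A multilingual transformer is fine-tuned for exactly the cases where that breaks down, such as paraphrased or translated questions.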
Roundtable on some challenges of interoperability: Can we be on the same page?
Discussants: André Jernung, SND; Benjamin Beuster, Sikt; Knut Wenzig, DIW
Chair: Róza Vajda
Whether because of technical difficulties or mismatched interests, interoperability is somewhat sidelined among the FAIR principles. The stakes obviously differ according to the nature of the projects that determine how data are collected. Yet a push towards standardization and establishing connections is widely felt in today’s digital landscape. The discussion explores the challenges of rendering research data interoperable, focusing especially on the creation of appropriate thematic metadata.