Description
Metacurate is an ESRC-funded project between CLOSER, the University of Surrey, UKDS and ScotCen.
We will extend the results of our work on pre-trained language models with recent developments in text-layout models and zero-shot techniques. Because relying solely on textual information makes it difficult to classify and extract metadata accurately, we will explore a combination of textual content and visual logic that incorporates vision transformers with optimisation techniques.
This will...
Questions from the CLOSER DDI-Lifecycle repository will be used to help train a model that takes the questions and response domains from the metadata extraction workstream and creates conceptually equivalent items from which data variables can be concorded. Approaches such as a fine-tuned large language model (LLM)-based relevance score model and vector retrieval with LLM reordering...
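As a rough illustration of the retrieve-then-reorder idea named above, the following minimal sketch shortlists candidate questions by vector similarity and then reorders the shortlist with a second scorer. It is not the project's implementation: the bag-of-words vectors stand in for learned embeddings, `placeholder_relevance` is a hypothetical stand-in for an LLM relevance score, and the example questions are invented strings rather than items from the CLOSER repository.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, questions, k=3):
    """Stage 1: rank all questions by vector similarity and keep the top k.

    Assumption: term counts stand in for learned embeddings.
    """
    q_vec = Counter(tokenize(query))
    scored = [(cosine(q_vec, Counter(tokenize(q))), q) for q in questions]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [q for _, q in scored[:k]]


def placeholder_relevance(query, candidate):
    """Hypothetical stand-in for an LLM relevance score: Jaccard token overlap."""
    a, b = set(tokenize(query)), set(tokenize(candidate))
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(query, candidates, score_fn=placeholder_relevance):
    """Stage 2: reorder the retrieved candidates by the relevance scorer."""
    return sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)


# Invented example questions, for illustration only.
questions = [
    "How many cigarettes do you smoke per day?",
    "Do you smoke cigarettes at all nowadays?",
    "What is your current employment status?",
    "How often do you drink alcohol?",
]
query = "Do you smoke cigarettes nowadays?"
shortlist = retrieve(query, questions, k=2)
ranked = rerank(query, shortlist)
```

In a real pipeline the `score_fn` argument is where a fine-tuned relevance model would plug in; keeping it as a parameter means the retrieval stage does not need to change when the scorer does.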
Conceptual annotations and provenance can provide contextual information to inform a range of data processing activities. In this workstream we will use the metadata generated in the earlier workstreams – the questions and response domains from the metadata extraction phase and the concorded variables from the conceptual comparison phase – to identify key variables, those that...