Digital collections

Field of action

The growing availability of data and texts in digital form makes it possible to address research questions that were previously out of reach and to create software solutions for new areas of application (e.g. speech comprehension, navigation in physical space, intelligent web searches). Such developments depend on well-maintained data collections and carefully curated text corpora that have the right subject content, are of appropriate quality, are up to date and have transparently documented origins. Modern analytical methods such as Machine Learning (ML), Text and Data Mining (TDM) and Artificial Intelligence (AI) often carry considerable hopes for business development and financial returns, but their benefit depends on the availability of suitable data collections and text corpora to which they can be applied. Data collections and text corpora are not just the basis for research but also an end product of publicly funded scientific work and of commercial efforts, and they are equally important to research and business.

Priorities 2018 – 2022

It is essential to promote access to the digital and digitized cultural heritage held by libraries, archives and museums (such as primary literature, archive material, objects and artefacts), to digital research data (including high-quality data for AI training purposes) and to usage and operational data created outside the research process but nevertheless of interest to research. For scientific evaluation, it is also important that citation indices and reference data are freely available and usable, and that data collections are available for alternative metrics (altmetrics). In addition, progress is to be made in distributing data in accordance with the FAIR data principles: Findability, Accessibility, Interoperability and Reusability.
From a methodological perspective, particular emphasis is placed on harnessing the opportunities of Text and Data Mining (TDM). Legal questions, e.g. those relating to TDM, are handled in the field of action “Legislation for science in the digital age”.

Working group


  • Chair: Thomas Stäcker
  • Deputy chair: Jana Hoffmann

Name                      Nominated by
Alexander Geyken          German Research Foundation (Deutsche Forschungsgemeinschaft, DFG)
Patrick Sahle             German Research Foundation (Deutsche Forschungsgemeinschaft, DFG)
Andrea Wuchner            Fraunhofer-Gesellschaft
Christian Langenbach      Helmholtz Association
Nina Weisweiler           Helmholtz Association
Phillip Cimiano           German Rectors' Conference (Hochschulrektorenkonferenz, HRK)
Gerhard Heyer             German Rectors' Conference (Hochschulrektorenkonferenz, HRK)
Thomas Stäcker            German Rectors' Conference (Hochschulrektorenkonferenz, HRK)
Peer Trilcke              German Rectors' Conference (Hochschulrektorenkonferenz, HRK)
Jana Hoffmann             Leibniz Association
Reiner Mauer              Leibniz Association
Torsten Roeder            German Academy of Sciences Leopoldina
Margit Palzenberger       Max Planck Society
Friederike Kleinfercher   Max Planck Society
Thomas Zastrow            Max Planck Society
Olaf Hering               Bibliotheken der Ressortforschungseinrichtungen des Bundes (BRB) (Guest)