Martina Stockhause

Journal Production Guidance for Software and Data Citations

Shelley Stall

and 26 more

December 31, 2022

Software and data citation are emerging best practices in scholarly communication. This article provides structured guidance to the academic publishing community on how to implement software and data citation in publishing workflows. These best practices support the verifiability and reproducibility of scientific results, the sharing and reuse of valuable data and software tools, and attribution to the creators of the software and data. While data citation is increasingly well-established, software citation is rapidly maturing. With the current intensive use of software, including specialized tools and models for scientific research problems, the research community has begun to recognize that software, as a key research result and resource, requires the same level of transparency, accessibility, and disclosure as data. Software and data that support scientific results should be preserved and shared in scientific repositories for discovery, transparency, and use by other researchers. These goals can be supported by citing these products in the Reference Section of papers and effectively associating them with the software and data preserved in scientific repositories. Publishers need to mark up these references in a specific way to enable downstream processes, specifically those that enable automated attribution. Academic publishers wishing to stay current with best practices in the field are encouraged to follow the guidance provided here.

AGU_ComplexCitation_IN43A-06

Martina Stockhause

April 22, 2024

AGU_ComplexCitation_IN43A-06

Martina Stockhause

and 5 more

April 22, 2024

Within the climate modeling community, the complex citation issue has been discussed for a decade in the context of research traceability and data citation/data impact. Traceability requires fine-grained information on individual datasets, whereas meaningful data impact analysis relies on data citations of large collections of data belonging to an individual model run or to a model experiment. To date, it is not possible to achieve both goals with one technical solution. Suggestions for combinations of DOIs on data collections and user-defined PID collections of data subsets across several DOIs have not been taken up (see Stockhause et al., 2013). The IPCC FAIR Guidelines introduced in the Sixth Assessment Report (AR6) aimed to enhance the transparency of the AR6 and its outcomes by documenting the figure creation process (Pirani et al., 2022). Many figures are based on large numbers of datasets hosted in various repositories. Citing every dataset in the captions is not feasible. User-defined data collections utilizing data provenance records could be included in a caption, but lack the information about the authors and funders of the individual objects required for data citation and data impact analysis. The exchange on complex citation difficulties intensified at AGU 2020 within the Community of Practice and led to the establishment of the RDA Complex Citation Working Group (WG). The WG brings all stakeholders together. It aims to provide recommendations for citing a large number of existing objects in a way that allows credit to be properly assigned to individual objects.

CMIP6 Citation Service - Review and Perspectives

Martina Stockhause

December 16, 2021

Data publication with DOI assignment has become common practice. The Citation Service for Coupled Model Intercomparison Project Phase 6 (CMIP6) was requested by the scientific panel WGCM, which is part of the World Climate Research Programme (WCRP), to enable references to the CMIP6 data in the upcoming Sixth IPCC Assessment Report (AR6). Expectations for data citations include aspects of data usage metrics to receive credit and reproducibility of published research findings. These two are difficult to combine, because meaningful data usage metrics require information on large data collections such as experiment data, while reproducibility relies on individual datasets. In addition, data usage metrics rely on data users citing the data in the reference list of a scholarly publication and on the publisher providing data references in the articles’ metadata. Neither is a given. Organizations like the Coalition for Publishing Data in the Earth and Space Sciences (COPDESS) play an important role in the required ongoing community change. The Reliquaries approach, together with PID Graph implementations being developed in the Data Citation Community of Practice subgroup, shows promise for combining reproducibility and credit expectations. The contribution will present survey results on CMIP6 participants’ expectations, discuss gaps in the current citation system, and investigate how new ideas can help to close these gaps. Reference: Stockhause, M. and Lautenschlager, M., 2017. CMIP6 Data Citation of Evolving Data. Data Science Journal, 16, p.30. DOI: http://doi.org/10.5334/dsj-2017-030

IPCC Sixth Assessment approaches towards FAIR data and an enhanced data reuse

Martina Stockhause

and 12 more

November 17, 2020

The Intergovernmental Panel on Climate Change (IPCC) is currently preparing its Sixth Assessment Report (AR6). Its authors assess peer-reviewed scientific literature and recent climate datasets to inform policy-makers about the current state of the science regarding climate change and its impacts, as well as adaptation and mitigation options. For AR6, efforts are underway to make its main results FAIR and preserve them in the TRUSTworthy repositories of the IPCC Data Distribution Centre (DDC), jointly managed by CEDA, DKRZ, and CIESIN. The AR6 FAIR initiative was kickstarted by the IPCC DDC and Working Group I (WGI) [Stockhause et al., 2019], then adopted by IPCC TG-Data (Task Group on Data Support for Climate Change Assessments) shortly after its creation. All three WGs have adopted the FAIR data guidelines. IPCC assessments are large and diverse in terms of the scientists involved as well as the scientific objects included. Challenges for digital data curation are related to the scale and diversity of papers, reports, and datasets, the variety of software, and the scientists' differing familiarity with these technical aspects. The following priority areas for improved data stewardship were selected based on the aims to enhance the traceability of AR6 key findings and their reusability: preserve figure datasets in the DDC; preserve analysis software; preserve main input datasets in the DDC; assemble datasets and provenance information on the figure creation from IPCC authors; and interlink datasets to the IPCC report. Datasets are transferred to the DDC at the end of AR6. The DDC partners are responsible for preserving the data for future reuse by different stakeholders and under a variety of current and future scientific and policy-related questions. As the role of the DDC expands within the IPCC, new partners are sought. The TRUST principles provide a framework for the communication of DDC tasks to different stakeholders, e.g. to countries interested in hosting a DDC. The presentation will give an overview of the IPCC AR6 approaches towards FAIR data maintained in TRUSTworthy repositories, their challenges, their approach to meeting these challenges, and open questions, e.g. the integration of digital data into the IPCC Error Protocol, targeted within TG-Data.

AGU data citation community of practice - Credit for creators of data within collecti...

Justin Buck

and 11 more

January 03, 2022

A gap in community practice on data citation emerged during the AGU Fall Meeting 2020 Data FAIR Town Hall, “Why Is Citing Data Still Hard?”, which had the goal of addressing the use case of citing a large number of datasets such that credit for individual datasets is assigned properly. The discussion included the concept of a “Data Collection” and the infrastructure and guidance still needed to fully implement the capability, so that it is easier for researchers to use and to receive credit when their data are cited in this manner. Such collections of data may contain thousands to millions of elements, with a citation needing to include subsets of elements potentially from multiple collections. Such citations will be crucial to enable reproducible research and credit to data and digital object creators. To address this gap, the data citation community of practice formed, including members from data centres, research journals, informatics research communities, and data citation infrastructure. The community has the goal of recommending an approach that leverages existing infrastructure and is realistic for researchers to use and for each stakeholder to implement. To achieve data citation of these subsets of large data collections, the concept of a “reliquary” is introduced. In this context the reliquary is a container of persistent identifiers (PIDs) or references defining the objects used in a research study. This can include any number of elements. The reliquary can then be cited as a single entity in academic publications. The reliquary concept will enable data citation use cases such as the citation of elements within a data collection that are formed from numerous underlying datasets that have their own PIDs, unambiguous citation of data used in IPCC Assessment Reports, and citing the subsets of collections of research data that contain millions of elements. The discussions over the course of 2021 have developed a theoretical concept; at the time of writing, formal use cases and initial applications are being defined. The recommendation developed by this effort will be available for review and comment by communities such as ESIP and RDA. All are welcome.
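
The reliquary idea described in this abstract can be pictured as a simple data structure. The following is only a minimal sketch under stated assumptions, not the community's recommendation or any existing service: the class names `Member` and `Reliquary`, their fields, and all identifiers in the example are hypothetical and chosen for illustration. It shows a container of PIDs that carries its own identifier, so it can be cited once, while credit can still be resolved for each member object.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Member:
    """One object referenced by the reliquary (fields are illustrative)."""
    pid: str                    # persistent identifier of the object
    creators: tuple[str, ...]   # people or groups to credit for it


@dataclass
class Reliquary:
    """Hypothetical container of PIDs that is cited as a single entity."""
    pid: str                                    # identifier of the container
    members: list[Member] = field(default_factory=list)

    def add(self, member: Member) -> None:
        # Ignore a PID that is already in the container.
        if all(m.pid != member.pid for m in self.members):
            self.members.append(member)

    def credit(self) -> dict[str, list[str]]:
        """Map each creator to the member PIDs they should be credited for."""
        result: dict[str, list[str]] = {}
        for m in self.members:
            for creator in m.creators:
                result.setdefault(creator, []).append(m.pid)
        return result


# Example with placeholder identifiers (not real PIDs).
reliquary = Reliquary(pid="pid:example/reliquary-1")
reliquary.add(Member("pid:example/dataset-a", ("Group A",)))
reliquary.add(Member("pid:example/dataset-b", ("Group A", "Group B")))
reliquary.add(Member("pid:example/dataset-a", ("Group A",)))  # duplicate, ignored
credit_by_creator = reliquary.credit()
```

A publication would cite only `reliquary.pid`; a downstream service could then use something like `credit()` to attribute that one citation to the creators of the individual datasets.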