Towards Long-term and Archivable Reproducibility

Mohammad Akhlaghi; Raul Infante-Sainz; Boudewijn Roukema; Mohammadreza Khellat; David Valls-Gabaud; Roberto Baena Galle
Bibliographical reference

Computing in Science & Engineering

Advertised on:
4/2021
Description
Analysis pipelines commonly use high-level technologies that are popular when created, but are unlikely to be readable, executable, or sustainable in the long term. A set of criteria is introduced to address this problem: completeness (no execution requirement beyond a minimal Unix-like operating system, no administrator privileges, no network connection, and storage primarily in plain text); modular design; minimal complexity; scalability; verifiable inputs and outputs; version control; linking analysis with narrative; and free and open source software. As a proof of concept, we introduce “Maneage” (Managing data lineage), which enables cheap archiving, provenance extraction, and peer verification, and which has been tested in several research publications. We show that longevity is a realistic requirement that does not sacrifice immediate or short-term reproducibility. We then discuss the caveats, with proposed solutions, and conclude with the benefits for the various stakeholders. This paper is itself written with Maneage (project commit 925091e).
Type