Is ‘Reproducibility’ For All?
Sven Ulpts and Jesper Wiborg Schneider
Talk about reproducibility and replication seems to be everywhere lately, driven by concerns and crisis claims mainly in some behavioral and biomedical sciences (Fidler & Wilcox, 2021). But it has not remained mere talk. A reform movement has emerged with the goal of ‘improving’ the credibility of science, with demands for ‘reproducibility’ norms and policies at its core (see e.g., Penders et al., 2019). Yet two unresolved and potentially precarious problems are associated with widespread ‘reproducibility’ norms and policies: first, a general conceptual mess surrounding the ‘reproducibility’ terminology; second, the issue of narrowly conceived but universal policies in the face of epistemic diversity (see e.g., Guttinger, 2020; Penders et al., 2019). As part of TIER2 we try to mitigate these problems by providing a framework that enables navigation through the conceptual mess and allows the appropriateness of ‘reproducibility’ to be assessed on a case-by-case basis.
Henceforth we will use ‘reproducibility’ as a placeholder for all the “re-terms”, such as replication, replicability, reproducibility, reproduction, repeatability, repetition, reanalysis, reinterpretation, conceptual replication, direct replication, (in)exact replication, constructive replication, literal replication, method reproducibility, result reproducibility, inferential reproducibility, computational reproducibility, hypothetical replicability, and many more.
There is not just one kind of ‘reproducibility’; instead, there is huge variation in practices and functions (Peterson & Panofsky, 2021). Practices can be distinguished into forms of redoing and kinds of enabling. Proposed functions include, for instance, error reduction, generalizability, different kinds of validity, reliability, and robustness. But there is no agreement on which practices fulfill what function, nor on which functions are actually achievable at all. Furthermore, the ‘reproducibility’ terminology is clouded in conceptual confusion across disciplines, approaches, and time, with the same terms referring to different practices and functions, and different terms capturing the same practices and functions (see e.g., Barba, 2018; Matarese, 2022). As Renee Borges put it: “There is […] confusion in the definition of the terms themselves, although everyone believes that they know what is being said.” (Borges, 2022, p. 1). It is therefore practically meaningless to speak of the appropriateness or even importance of ‘reproducibility’ without specifying what is actually meant. Prior to any assessment of the relevance or feasibility of ‘reproducibility’, we need to specify what we are talking about (Tuval-Mashiach, 2021). This means specifying what is to be redone, or which practices are expected to enable potential redoing (e.g., transparent or rich reporting; sharing of materials, data, code, expertise, etc.), as well as stating the intended function (e.g., reliability).
Research is not one activity, nor is there one scientific method. Research comprises various activities with diverse purposes and goals. This diversity is crucial because it also implies a diversity of research characteristics that affect the relevance and feasibility of ‘reproducibility’ (Guttinger, 2020; Penders et al., 2019). The problem of epistemic injustice emerges when ‘reproducibility’ norms or policies are applied to types of research for which ‘reproducibility’ is either irrelevant or unfeasible. When gatekeepers such as funders and publishers demand ‘reproducibility’ in cases where it is inappropriate or foreign, they at best place an improper burden on researchers, who must conform to standards alien to their research, and at worst exclude them and their type of research from funding and publication space (Guttinger, 2020; Penders et al., 2019).
In psychology, Malich and Rehmann-Sutter (2022) referred to this as the homogenizing tendency of metascience, based on a narrow (positivistic) image of what proper science should be.
Relevance and feasibility of ‘reproducibility’
We propose knowledge production modes (KPMs) as a framework to capture the aspects of epistemic diversity that influence the relevance and feasibility of ‘reproducibility’. To assess appropriateness, we distinguish between research characteristics that influence the relevance of ‘reproducibility’ and aspects that affect its feasibility. This distinction is necessary because a kind of ‘reproducibility’ may be relevant from an epistemic standpoint yet infeasible from a practical perspective. Moreover, relevance must be distinguished from feasibility because reforming the scientific ecosystem addresses only the feasibility of ‘reproducibility’, not its relevance.
To assess the relevance of ‘reproducibility’, we focus on epistemology, systems of justification, and research goal. Epistemologies form the philosophical foundations that provide the value system for evaluating and justifying knowledge claims; they can be seen as the logic of knowledge production (Carter & Little, 2007). In actual research, however, one rarely finds clear-cut epistemologies. We therefore need something that represents the logic-in-use. These logics-in-use are what we call systems of justification: the combinations of practices and values used during the conduct of research to establish its quality and trustworthiness and thereby justify the knowledge claims made. The final aspect to consider is the research goal. Is the research, for example, conducted in the name of knowledge accumulation, or is it part of a pursuit of profit in a competitive marketplace (Guzzo et al., 2022)?
To assess the feasibility, we focus on the nature of the subject of investigation, the resource dependence and availability of the research setup, and theoretical as well as methodological uncertainty. What can be expected regarding ‘reproducibility’ depends, for instance, on whether the subject of investigation is indifferent or interactive, stable and uniform or context-dependent and time-sensitive. The nature of the subject of investigation is crucial because it also influences the other feasibility criteria. Each practice has a specific combination of necessary resources, and the availability and cost of those resources influence the feasibility of ‘reproducibility’. Lastly, feasibility depends on the degree of uncertainty attached to an investigation (Peterson & Panofsky, 2021). We distinguish between theoretical and methodological uncertainty. Theoretical uncertainty is determined by how well the subject is understood, how much the investigation is guided by that theoretical understanding, and to what degree that understanding is shared between the originator of a study and the one ‘reproducing’ it. Similarly, methodological uncertainty concerns how well the methods and methodology of an original study are understood and how much that understanding is shared between the original researcher and the ‘reproducer’.
Is ‘reproducibility’ for all?
The question of whether ‘reproducibility’ is for all cannot be answered in general. It has to be answered on a case-by-case basis, through an assessment of the relevance and feasibility of the specific kind of ‘reproducibility’ in question.
References
Barba, L. A. (2018). Terminologies for Reproducible Research. https://doi.org/10.48550/ARXIV.1802.03311
Borges, R. M. (2022). Reproducibility and replicability in science: A Sisyphean task. Journal of Biosciences, 47(1), 15. https://doi.org/10.1007/s12038-022-00259-6
Carter, S. M., & Little, M. (2007). Justifying Knowledge, Justifying Method, Taking Action: Epistemologies, Methodologies, and Methods in Qualitative Research. Qualitative Health Research, 17(10), 1316–1328. https://doi.org/10.1177/1049732307306927
Fidler, F., & Wilcox, J. (2021). Reproducibility of Scientific Results. In E. N. Zalta (Ed.), The Stanford Encyclopedia of Philosophy (Summer 2021). Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/sum2021/entries/scientific-reproducibility/
Guttinger, S. (2020). The limits of replicability. European Journal for Philosophy of Science, 10(2), 10. https://doi.org/10.1007/s13194-019-0269-1
Guzzo, R. A., Schneider, B., & Nalbantian, H. R. (2022). Open science, closed doors: The perils and potential of open science for research in practice. Industrial and Organizational Psychology, 15(4), 495–515. https://doi.org/10.1017/iop.2022.61
Malich, L., & Rehmann-Sutter, C. (2022). Metascience Is Not Enough – A Plea for Psychological Humanities in the Wake of the Replication Crisis. Review of General Psychology, 26(2), 261–273. https://doi.org/10.1177/10892680221083876
Matarese, V. (2022). Kinds of Replicability: Different Terms and Different Functions. Axiomathes, 32(S2), 647–670. https://doi.org/10.1007/s10516-021-09610-2
Penders, B., Holbrook, J. B., & de Rijcke, S. (2019). Rinse and Repeat: Understanding the Value of Replication across Different Ways of Knowing. Publications, 7(3), 52. https://doi.org/10.3390/publications7030052
Peterson, D., & Panofsky, A. (2021). Self-correction in science: The diagnostic and integrative motives for replication. Social Studies of Science, 51(4), 583–605. https://doi.org/10.1177/03063127211005551
Tuval-Mashiach, R. (2021). Is replication relevant for qualitative research? Qualitative Psychology, 8(3), 365–377. https://doi.org/10.1037/qup0000217