Responsible sharing and reuse of data and other research outputs
Learning Objectives
- Understand the factors influencing scientists' willingness to openly share and use data and other research outputs.
- Build awareness about responsible storing, sharing and use of data and other research outputs.
Introduction
Open sharing of natural sciences data and other research outputs has many advantages. It promotes transparency and reproducibility of research and enhances the credibility of research results. Shared data is a valuable resource for the research community, reducing the need for duplicative efforts and facilitating the exploration of new research questions. It also allows for the aggregation of multiple datasets, enabling more comprehensive meta-analyses and systematic reviews. Additionally, shared datasets provide valuable resources for educational purposes, such as allowing students to use real-world data to enhance their analytical and methodological skills. Despite these benefits, scientists are not always willing to openly share research data and other research outputs.
To prepare data for open sharing, it is important to be aware of different degrees of openness, different levels of data sensitivity and different categories of data. Berkowitz and Delacour (2022) provide the following definitions for different degrees of openness of research data access: "fully open, with no barriers to access at all, embargoed access, which means that external users cannot access datasets until the end of the embargo, restricted access, with some barriers to access that external users can overcome under certain conditions, closed access, meaning totally closed access".
Data sensitivity levels play a crucial role in determining the extent to which data can be freely accessed and shared. The main categories of sensitive data (or special categories of data, according to GDPR) include race, ethnic origin, health data, genetic data, certain biometric data, information about sex life or sexual orientation, political opinions, religious beliefs, philosophical beliefs, and trade union membership. The more sensitive the dataset, the more restricted the access should be.
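The relationship between sensitivity and access can be illustrated with a minimal sketch. It encodes the four degrees of openness and the special categories listed above, and then applies one deliberately simple rule: a dataset containing any special category is suggested for restricted access, otherwise for fully open access. The function name `suggest_access_level` and the rule itself are assumptions made for illustration only; real decisions depend on ethical and legal review and on the policies of the institution and repository.

```python
from enum import Enum


class AccessLevel(Enum):
    """Degrees of openness as described by Berkowitz and Delacour (2022)."""
    OPEN = "fully open"
    EMBARGOED = "embargoed access"
    RESTRICTED = "restricted access"
    CLOSED = "closed access"


# Special categories of data named in the text above (lower-cased labels).
SPECIAL_CATEGORIES = frozenset({
    "race",
    "ethnic origin",
    "health data",
    "genetic data",
    "biometric data",
    "sex life or sexual orientation",
    "political opinions",
    "religious beliefs",
    "philosophical beliefs",
    "trade union membership",
})


def suggest_access_level(categories_present):
    """Suggest a starting access level for a dataset.

    Hypothetical rule (an assumption, not a standard): any special
    category present means RESTRICTED, otherwise OPEN.
    """
    present = {label.strip().lower() for label in categories_present}
    if present & SPECIAL_CATEGORIES:
        return AccessLevel.RESTRICTED
    return AccessLevel.OPEN


if __name__ == "__main__":
    print(suggest_access_level(["soil temperature", "rainfall"]).value)
    print(suggest_access_level(["Health data", "age group"]).value)
```

A sketch like this only gives a starting point for discussion. Embargoed or closed access would normally be chosen by the researchers and their institution, not derived automatically.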
The terms hot, warm and cold data are used to describe different categories of data (Pernet et al., 2023). ‘Hot data’ refers to datasets currently in use or being generated in real time. These datasets are actively undergoing analysis or are part of ongoing experiments. Hot data is likely to be the most recent and actively researched information, reflecting the current state of scientific research in the field. ‘Warm data’ describes data that is actively used and frequently accessed. This type of data signifies information that is currently integral to ongoing research projects. ‘Cold data’ refers to datasets that are archived or currently less actively used. It is information that was collected in the past and may be accessed infrequently, but that remains valuable for historical or reference purposes.
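As a rough illustration of these categories, the sketch below labels a dataset as hot, warm or cold. The text defines the categories qualitatively, so everything quantitative here is an assumption: the function name `classify_data_temperature`, the `still_being_generated` flag, and the 365-day window separating warm from cold are hypothetical choices, not values taken from Pernet et al. (2023).

```python
from datetime import date


def classify_data_temperature(last_accessed, today,
                              still_being_generated=False,
                              warm_window_days=365):
    """Label a dataset as 'hot', 'warm' or 'cold'.

    Assumed rule for illustration only:
    - 'hot' if the data is still being generated or collected,
    - 'warm' if it was accessed within `warm_window_days`,
    - 'cold' otherwise.
    """
    if still_being_generated:
        return "hot"
    days_idle = (today - last_accessed).days
    return "warm" if days_idle <= warm_window_days else "cold"


if __name__ == "__main__":
    today = date(2024, 6, 1)
    print(classify_data_temperature(date(2024, 5, 30), today,
                                    still_being_generated=True))
    print(classify_data_temperature(date(2024, 1, 1), today))
    print(classify_data_temperature(date(2020, 1, 1), today))
```

A research group could replace the assumed window with whatever threshold matches its own storage and archiving policy.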
There are many reasons why scientists may be unwilling to share data and other research outputs openly. Key factors include concerns about potential violations of intellectual property rights, fear that others will use the data and resources without proper attribution or for commercial purposes, and the competitive nature of scientific practice. Researchers might worry that shared data could be exploited by competitors to publish findings more quickly or gain advantages in securing grants. Concerns about data quality and accuracy, and about errors being identified by others, also play a role. These concerns should be weighed against the principle of openness to find the right balance between protecting the interests of individual scientists and research groups and promoting openness.
The current academic reward system, emphasizing traditional metrics like publications and citations, also discourages researchers from investing in open sharing of data and other research outputs. Cultural and disciplinary differences affect openness to collaboration and sharing. Addressing these concerns requires a shift in scientific culture, the development of supportive policies, and the establishment of incentives that recognize and reward open scientific practices.
Legitimate reasons for not sharing data include ethical and legal considerations, such as the privacy of research participants. Additionally, the lack of standardized practices for data sharing in health and life sciences, including formats and licensing, can be a barrier.
Klein et al. suggest a decision flowchart outlining important considerations when sharing data and other research outputs (Klein et al., 2018):
Decision flowchart outlining important considerations when sharing research products. Source: Klein, O. et al. https://doi.org/10.1525/collabra.158, CC BY 4.0
ROSiE General Guidelines on Responsible Open Science detail the responsibilities of different stakeholders to ensure responsible sharing and reuse of research outputs:
4.1.1. As much as reasonably possible, researchers and Research Performing Organisations should ensure open access to the entire research lifecycle, which includes, as the ECoC states, publications, data, metadata, protocols, code, software, images, artefacts, and other research materials and methods.
4.1.2. Contracts with Research Funding Organisations and other entities should include equitable agreements about access to and dissemination of research results.
4.1.3. Research Performing Organisations and repositories should ensure appropriate infrastructures to allow the proper conservation and management of all research results generated in the research lifecycle, including those unpublished, ensuring their protection and adequate access to them for a reasonable time.
4.1.4. Researchers and Research Performing Organisations should ensure that the research lifecycle, including interim evaluation results, are documented in a detailed, accurate, and clear manner in accordance with the guidelines specific to the subject of study. All information and resources produced throughout the research lifecycle, including those that have not yet been published, should be responsibly managed and conserved by the research institutions and the researchers.
4.1.5. Researchers should ensure that sources are verifiable, and that open data practices are responsible, to allow the research to be examined and, when relevant, reproduced. The methods used and the respective steps of the entire research lifecycle should be clear.
4.1.6. Researchers should always provide references when reusing research data, materials, software, and tools.
Before proceeding to the next task, please read "Open Data, Software and Code Guidelines" developed by Open Research Europe: https://open-research-europe.ec.europa.eu/for-authors/data-guidelines.
References
- Klein, O. et al. (2018). A practical guide for transparency in psychological science. Collabra: Psychology, 4(1), 20. https://doi.org/10.1525/collabra.158
- Berkowitz, H., & Delacour, H. (2022). Opening Research Data: What Does It Mean for Social Sciences? M@n@gement, 25(4), 1-15. https://doi.org/10.37725/mgmt.v25.9123
- Pernet, C., Svarer, C., Blair, R., Van Horn, J. D., & Poldrack, R. A. (2023). On the long-term archiving of research data. Neuroinformatics, 21(2), 243-246. https://doi.org/10.1007/s12021-023-09621-x
- Gomes, D. G., Pottier, P., Crystal-Ornelas, R., Hudgins, E. J., Foroughirad, V., Sánchez-Reyes, L. L., ... & Gaynor, K. M. (2022). Why don't we share data and code? Perceived barriers and benefits to public archiving practices. Proceedings of the Royal Society B, 289(1987), 20221113. https://doi.org/10.1098/rspb.2022.1113
- Data sharing and the future of science. (2018). Nature Communications, 9, 2817. https://doi.org/10.1038/s41467-018-05227-z
- McAllister, J. W. (2012). Climate science controversies and the demand for access to empirical data. Philosophy of Science, 79(5), 871-880. https://doi.org/10.1086/667871
- Zuiderwijk, A., Shinde, R., & Jeng, W. (2020). What drives and inhibits researchers to share and use open research data? A systematic literature review to analyze factors influencing open research data adoption. PLOS ONE, 15(9), e0239283. https://doi.org/10.1371/journal.pone.0239283