Quality of research outputs and data sets

Learning Objectives

  • Understand the importance of the quality of data sets and research outputs and their responsible use in open science.

Introduction

The collection of research data is arguably one of the most challenging aspects of open science practice because it is highly vulnerable to misconduct (Hofmann, 2022). Misconduct related to data collection can be particularly costly to science and society, especially when the data are shared openly for reuse and re-analysis. It is therefore crucial that researchers and citizen scientists share an understanding of data collection standards. These standards ensure that data collected at different institutions and by different researchers are compatible and interoperable, so that datasets can be integrated. This allows other scientists to reuse, compare, and re-analyze the data and to reproduce research findings. Adherence to data collection standards also contributes to the long-term accessibility of research data: proper documentation and standardized formats make it easier for future researchers to understand, use, and build upon the data, thereby preserving the scientific record. Furthermore, standardized data allow for higher-quality meta-analyses and syntheses of research findings. When multiple studies follow similar data collection standards, their data can be combined and analyzed together, providing a more comprehensive understanding of a particular research question.

References

  1. Hofmann, B. (2022). Open science knowledge production: Addressing epistemological challenges and ethical implications. Publications, 10(3), 24. https://doi.org/10.3390/publications10030024

Ensuring dataset quality for open sharing in the natural sciences involves implementing practices and standards that maintain the reliability of research data that are made openly available. It is important to assess and document the quality of datasets, including aspects such as completeness, accuracy, provenance, and timeliness. The focus on dataset quality covers not only the state of the data but also metadata, documentation, software, procedures, processes, workflows, and infrastructure throughout the dataset's lifecycle (Peng et al., 2022).
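As an illustration only, the following minimal Python sketch shows one way that some of these quality aspects could be recorded alongside a dataset. The function name `quality_report`, the field names, and the sample records are hypothetical and are not taken from Peng et al. (2022). Accuracy is deliberately left out, because assessing it requires reference values that depend on the scientific domain.

```python
from datetime import date


def quality_report(records, required_fields, source, collected_on, today):
    """Summarise completeness, provenance, and timeliness for a list of dict records.

    All names here are hypothetical; this is a sketch, not a standard.
    """
    total_cells = len(records) * len(required_fields)
    # A cell counts as filled when the field is present and not None or empty.
    filled = sum(
        1
        for record in records
        for field in required_fields
        if record.get(field) not in (None, "")
    )
    return {
        "n_records": len(records),
        # Completeness: share of required cells that contain a value.
        "completeness": filled / total_cells if total_cells else 0.0,
        # Provenance: a free-text note on where the data came from.
        "provenance": source,
        # Timeliness: age of the data in days on the reporting date.
        "age_days": (today - collected_on).days,
    }


# Hypothetical sample data with one missing measurement.
records = [
    {"site": "A", "temperature_c": 12.5},
    {"site": "B", "temperature_c": None},
]

report = quality_report(
    records,
    required_fields=["site", "temperature_c"],
    source="hypothetical field survey",
    collected_on=date(2024, 1, 1),
    today=date(2024, 1, 31),
)
print(report)
```

A summary of this kind could be stored with the dataset's documentation, so that people reusing the data can judge whether it suits their purpose.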

Challenges in ensuring and evaluating dataset quality include (1) the multidimensionality of quality attributes, which may vary based on context; (2) inconsistency in defining, measuring, and capturing quality attributes; and (3) a paradigm shift in the user community, from domain-literate individuals to diverse stakeholders, including those with limited scientific backgrounds (Peng et al., 2022). These aspects are further explained in the diagram:

Dataset quality aspects. Source: Peng et al., Data Science Journal, https://doi.org/10.5334/dsj-2022-008, CC BY 4.0.

Additional challenges for data quality may arise from the involvement of citizen scientists, who are generally valuable contributors to the scientific process. Varying levels of training among citizen scientists can lead to inconsistencies in how data are collected. Moreover, incomplete data, a limited understanding of the scientific context, or conflicts of interest among citizen scientists can affect the reliability of the datasets. Despite these challenges, citizen science offers unique opportunities for large-scale data collection and public engagement in scientific research. Addressing the challenges involves careful project design, training of citizen scientists, effective communication, and the incorporation of quality control measures to enhance the reliability of the collected data.
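One simple form that such a quality control measure might take is an automated plausibility check on submitted observations. The Python sketch below is an assumption-laden example rather than an established procedure: the function `flag_observations`, the `water_ph` field, and the sample submissions are all hypothetical. Flagged records are kept with a reason instead of being deleted, so that a project member can review them.

```python
def flag_observations(observations, field, low, high):
    """Split observations into accepted and flagged lists.

    A record is flagged when the value is missing or outside [low, high].
    Hypothetical helper for illustration only.
    """
    accepted, flagged = [], []
    for obs in observations:
        value = obs.get(field)
        if value is None:
            flagged.append({**obs, "qc_reason": "missing value"})
        elif not (low <= value <= high):
            flagged.append({**obs, "qc_reason": "out of range"})
        else:
            accepted.append(obs)
    return accepted, flagged


# Hypothetical submissions; the second looks like a misplaced decimal point.
observations = [
    {"observer": "volunteer_1", "water_ph": 7.2},
    {"observer": "volunteer_2", "water_ph": 71.0},
    {"observer": "volunteer_3", "water_ph": None},
]

# pH is conventionally reported on a 0 to 14 scale.
accepted, flagged = flag_observations(observations, "water_ph", low=0.0, high=14.0)
print(len(accepted), "accepted;", len(flagged), "flagged for review")
```

Checks like this only catch obvious errors. They complement, rather than replace, training and clear collection protocols.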

Before moving to the next task, please read Section 1.7, "Publishing data for good citation" (pp. 29-32), in Publications Office of the European Union, Jessop, P., Data citation – A guide to best practice, Publications Office of the European Union, 2022, https://data.europa.eu/doi/10.2830/59387

References

  1. Herodotou, C., Scanlon, E., & Sharples, M. (2021). Methods of promoting learning and data quality in citizen and community Science. Frontiers in Climate, 53. https://doi.org/10.3389/fclim.2021.614567
  2. Peng, G., Lacagnina, C., Downs, R. R., Ganske, A., Ramapriyan, H. K., Ivánová, I., ... & Moroni, D. F. (2022). Global Community Guidelines for Documenting, Sharing, and Reusing Quality Information of Individual Digital Datasets. Data Science Journal. https://doi.org/10.5334/dsj-2022-008