Quality of research outputs and data sets

Learning Objectives

  • Understand the importance of the quality of data sets and research outputs and their responsible use in open science.

Introduction

The collection of research data is arguably one of the most challenging aspects of open science practice because it is highly vulnerable to misconduct (Hofmann, 2022). Misconduct related to data collection can be particularly costly to science and society, especially when data is shared open access for reuse and re-analysis. Therefore, it is crucial to ensure that both researchers and citizen scientists share an understanding of data collection standards. These standards ensure that data collected at different institutions and by various researchers are compatible and interoperable, facilitating the integration of datasets. This allows for meaningful reuse, comparisons, re-analysis, and the reproduction of research findings by other scientists. Adherence to data collection standards also contributes to the long-term accessibility of research data. Proper documentation and standardized formats make it easier for future researchers to understand, use, and build upon the data, thereby preserving the scientific record.

References

  1. Hofmann, B. (2022). Open science knowledge production: Addressing epistemological challenges and ethical implications. Publications, 10(3), 24. https://doi.org/10.3390/publications10030024

Citizen science projects collect and share diverse types of data. As Balázs et al. point out: "Some projects are solely quantitative data projects, while others are solely qualitative. Mixed-method citizen science projects also exist which include both quantitative and qualitative data collection, generation, and manipulation." (Balázs et al., 2021) Because of this variety, among other reasons, data quality in citizen science faces challenges that can affect the reliability and usability of the collected information. For example, an analysis of data collected by the iNaturalist project revealed several kinds of bias, including bias towards certain taxa (such as birds, plants, and mammals). There is also some evidence of spatial sampling bias: about 58% of all threatened species observations in iNaturalist come from the U.S., Canada, Mexico, Russia and New Zealand (Soroye et al., 2022).

Balázs et al. point out the two main aspects of data quality in citizen science: reliability and validity. Reliability refers to the stability and consistency of data over time. In the context of citizen science, reliable data means that results can be replicated consistently (Balázs et al., 2021). For example, in a project tracking water quality in a river, if different citizen scientists using the same measurement tools consistently report similar results for the same water samples, the data is deemed reliable. Validity refers to the extent to which the data accurately represents what it is supposed to measure or describe. For example, in a citizen science project on weather monitoring, if citizen scientists consistently report all relevant weather parameters (temperature, humidity, precipitation), the data is valid because it provides a comprehensive view of weather conditions.
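The water quality example can be made concrete with a small check on how closely repeated readings agree. The following is a minimal sketch, not a method taken from Balázs et al.: it assumes that agreement is summarised by the coefficient of variation (sample standard deviation divided by the mean), and the 5% threshold, the function names, and the dissolved oxygen readings are all hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(readings):
    """Return the sample standard deviation divided by the mean."""
    return stdev(readings) / mean(readings)


def is_reliable(readings_by_sample, max_cv=0.05):
    """Return True if every sample's readings agree within max_cv.

    max_cv is an assumed, project-specific threshold, not a standard value.
    """
    return all(
        coefficient_of_variation(readings) <= max_cv
        for readings in readings_by_sample.values()
    )


# Hypothetical dissolved oxygen readings (mg/L): each water sample was
# measured by three different volunteers using the same type of instrument.
oxygen_readings = {
    "sample_A": [8.1, 8.2, 8.0],
    "sample_B": [6.8, 6.9, 6.8],
}

consistent = is_reliable(oxygen_readings)
```

A real project would choose the agreement measure and the threshold to suit the quantity being measured. The coefficient of variation is only meaningful for ratio-scale quantities with a mean well away from zero.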

Data contextualization refers to the practice of providing essential context and information surrounding a dataset, enabling a better understanding of how the data was generated, its purpose, and its quality. It includes metadata, attribution, and curation details that situate the data within its broader context (Balázs et al., 2021). For example, in a climate monitoring citizen science project, metadata could include details about the creation of the data set, contributors, methodology, instruments used, calibration procedures, and the temporal and spatial resolution of the data. Metadata enhances the understanding and usability of the data.
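As an illustration, the metadata items listed above could be stored as a simple record and checked for completeness before a dataset is shared. This is a sketch under stated assumptions: the field names, the example values, and the helper missing_metadata are hypothetical and do not follow any particular metadata standard.

```python
# Field names are assumptions that mirror the items listed in the text.
REQUIRED_FIELDS = (
    "created",
    "contributors",
    "methodology",
    "instruments",
    "calibration",
    "temporal_resolution",
    "spatial_resolution",
)


def missing_metadata(record):
    """Return the required fields that are absent or empty in the record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Hypothetical record for a climate monitoring project.
record = {
    "created": "2024-05-01",
    "contributors": ["volunteer_017", "volunteer_042"],
    "methodology": "Hourly manual readings at fixed stations",
    "instruments": ["digital thermometer", "rain gauge"],
    "calibration": "",  # left empty on purpose to show a detected gap
    "temporal_resolution": "1 hour",
    "spatial_resolution": "one station per 5 km",
}

gaps = missing_metadata(record)
```

Here gaps contains only "calibration", which flags that the calibration procedure still needs to be documented before the data set is published.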

Four aspects of data accuracy in citizen science. Balázs B. et al. https://doi.org/10.1007/978-3-030-58278-4_8, CC BY 4.0

References

  1. Balázs, B., Mooney, P., Nováková, E., Bastin, L., & Jokar Arsanjani, J. (2021). Data Quality in Citizen Science. In: The Science of Citizen Science. Springer. https://doi.org/10.1007/978-3-030-58278-4_8
  2. Soroye, P. et al. (2022). The risks and rewards of community science for threatened species monitoring. Conservation Science and Practice, 4(9), e12788. https://doi.org/10.1111/csp2.12788
  3. Herodotou, C., Scanlon, E., & Sharples, M. (2021). Methods of promoting learning and data quality in citizen and Community Science. Frontiers in Climate, 53. https://doi.org/10.3389/fclim.2021.614567