
Quality of research outputs and data sets

Learning Objectives

  • Understand the importance of the quality of data sets and research outputs and their responsible use in open science.

Introduction

The collection of research data is arguably one of the most challenging aspects of open science practice because it is highly vulnerable to misconduct (Hofmann, 2022). Misconduct in data collection can be particularly costly to science and society, especially when the data are shared openly for reuse and re-analysis. It is therefore crucial that researchers and citizen scientists share a common understanding of data collection standards. These standards ensure that data collected at different institutions and by different researchers are compatible and interoperable, which makes it possible to integrate datasets. This in turn allows for meaningful reuse, comparison, re-analysis, and reproduction of research findings by other scientists.

Adherence to data collection standards also supports the long-term accessibility of research data. Proper documentation and standardized formats make it easier for future researchers to understand, use, and build upon the data, thereby preserving the scientific record. Furthermore, standardized data allow for higher-quality meta-analyses and research syntheses. When multiple studies follow similar data collection standards, their data can be combined and analyzed across studies, providing a more comprehensive understanding of a particular research question.

References

  1. Hofmann, B. (2022). Open science knowledge production: Addressing epistemological challenges and ethical implications. Publications, 10(3), 24. https://doi.org/10.3390/publications10030024

The value of an open social sciences dataset is closely tied to the underlying quality of its data (Sadiq & Indulska, 2017). Ensuring data and dataset quality in the social sciences poses several challenges, including the dynamic nature of social processes, the heterogeneity of data, the potential for bias in data collection and analysis, and the lack of standardized guidelines for preparing datasets in social science research. For example, social processes and constructs evolve and are shaped by the unpredictability of human actions, which requires ongoing adaptation of data collection methods and the collection of new data. Researchers must navigate these challenges to keep their open data relevant for social science research.

Ensuring dataset quality for open sharing in social sciences involves implementing practices and standards to maintain the reliability of research data that is made openly available. It is important to assess and document the quality of datasets, including aspects such as completeness, accuracy, provenance, and timeliness. The focus on dataset quality covers not only the state of the data but also metadata, documentation, software, procedures, processes, workflows, and infrastructure throughout the dataset's lifecycle (Peng et al., 2022).
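Completeness is the easiest of these quality aspects to check by hand. As a minimal sketch, assuming a small tabular dataset stored as a list of records in which `None` marks a missing response, the share of non-missing values per variable could be computed as follows. The records and the `completeness` function are hypothetical illustrations, not a procedure prescribed by Peng et al. (2022); real projects would typically rely on their own data management tools.

```python
from typing import Any

# Hypothetical survey records; None marks a missing response.
records: list[dict[str, Any]] = [
    {"age": 34, "income": 52000, "region": "North"},
    {"age": None, "income": 48000, "region": "South"},
    {"age": 29, "income": None, "region": "West"},
    {"age": 41, "income": 61000, "region": "East"},
]


def completeness(rows: list[dict[str, Any]]) -> dict[str, float]:
    """Return the share of non-missing values per variable (1.0 = fully complete)."""
    # Collect every variable name that appears in any record.
    variables = sorted({key for row in rows for key in row})
    report: dict[str, float] = {}
    for var in variables:
        # A value counts as present only if the key exists and is not None.
        present = sum(1 for row in rows if row.get(var) is not None)
        report[var] = present / len(rows)
    return report


if __name__ == "__main__":
    for var, share in completeness(records).items():
        print(f"{var}: {share:.0%} complete")
```

A report like this can be stored alongside the dataset's metadata, so that reusers can see at a glance which variables have gaps before they re-analyse the data.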

Challenges in ensuring and evaluating dataset quality include (1) the multidimensionality of quality attributes, which may vary based on context; (2) inconsistency in defining, measuring, and capturing quality attributes; and (3) a paradigm shift in the user community, from domain-literate individuals to diverse stakeholders, including those with limited scientific backgrounds (Peng et al., 2022). These aspects are further explained in the diagram:

Dataset quality aspects. Source: Peng et al. (2022), Data Science Journal, https://doi.org/10.5334/dsj-2022-008, CC BY 4.0.

Additional challenges for data quality may arise from the involvement of citizen scientists, even though they are generally valuable contributors to the scientific process. Varying levels of training among citizen scientists can lead to inconsistencies in how data collection standards are applied. Issues such as incomplete data, limited understanding of the scientific context, or conflicts of interest among citizen scientists can also reduce the reliability of datasets. At the same time, citizen science offers unique opportunities for large-scale data collection and public engagement in scientific research. Addressing these challenges requires careful project design, training of citizen scientists, effective communication, and quality control measures that enhance the reliability of the collected data.

Before moving to the next task, read the article by Towse, A. S., Ellis, D. A., & Towse, J. N. (2021). Making data meaningful: Guidelines for good quality open data. The Journal of Social Psychology, 161(4), 395–402. https://doi.org/10.1080/00224545.2021.1938811

References

  1. Field, S. M., van Ravenzwaaij, D., Pittelkow, M. M., Hoek, J. M., & Derksen, M. (2021). Qualitative Open Science–Pain Points and Perspectives. https://osf.io/e3cq4/download
  2. Peng, G., Lacagnina, C., Downs, R. R., Ganske, A., Ramapriyan, H. K., Ivánová, I., ... & Moroni, D. F. (2022). Global Community Guidelines for Documenting, Sharing, and Reusing Quality Information of Individual Digital Datasets. Data Science Journal. https://doi.org/10.5334/dsj-2022-008
  3. Riesch, H., & Potter, C. (2014). Citizen science as seen by scientists: Methodological, epistemological and ethical dimensions. Public Understanding of Science, 23(1), 107–120. https://doi.org/10.1177/09636625134973
  4. Sadiq, S., & Indulska, M. (2017). Open data: Quality over quantity. International Journal of Information Management, 37(3), 150–154. https://doi.org/10.1016/j.ijinfomgt.2017.01.003