"We benefit from the publication of our own and third-party data sets"

Prof. Hanna Gaspard heads the “Educational Data Science” working group at the Institute for School Development Research (IFS) at the Faculty of Education, Psychology and Educational Research. Together with research associate Dr. Elisabeth Graf, she is part of the interdisciplinary research area “FAIR - From Prediction to Agile Interventions in the Social Sciences”, in which innovative approaches from statistics and data science are used and further developed to optimize prediction and intervention models in empirical educational, rehabilitation and social research. Both report on how to successfully publish their own research data and reuse third-party data while taking data protection into account.
You have already published various materials, such as instruments and analysis code, via the Open Science Framework (OSF), thereby making them available to other researchers. What added value do you see in publishing such research data on the OSF?
Hanna Gaspard: It makes it more transparent how certain research questions were investigated in individual studies, which increases the reproducibility of research. In addition, journals and reviewers are increasingly demanding that instruments and analyses be made available. In our experience, materials often have to be requested individually, and by that point the researchers are sometimes no longer reachable. These hurdles can be avoided if materials are made available directly.
Elisabeth Graf: Such details don't always fit into the manuscript. Analysis code in particular is difficult to include in the supplemental material, so information about the studies gets lost. Instruments are more often available, but it makes a difference whether only the items or the entire questionnaire is provided, because the instructions also contain further information about the study. Researchers themselves benefit from publication because their own instruments can be reused more quickly and easily.
In educational research and related fields, there are growing calls to reuse data sets and so tap into the potential of secondary analyses. What has been your experience with this?
Hanna Gaspard: I have used secondary data and have also made data I collected available for reuse, in particular data from a large intervention study, via the research data center at the Institute for Quality Development in Education (IQB) at Humboldt University in Berlin. This center specializes in providing large data sets from educational research. In educational research, large samples are often necessary to draw meaningful conclusions at all. This makes data collection very expensive and time-consuming, for the participating schools as well. For this reason, making the data available for other research questions is desirable and saves resources. FAIR also plans to publish the data it collects.
Studies frequently also collect a variety of constructs that allow additional questions to be examined. Reuse also makes it possible to check whether similar results can be found in other data sets or whether findings generalize across studies. It is important that data collection is well documented and prepared from the outset. The effort involved in documentation should be taken into account as early as the grant application stage.
How can raw data be found and processed?
Elisabeth Graf: It is helpful to know where to look for data. For example, some repositories offer a variable search that lets you filter for data sets containing the constructs you are interested in. One example is the LIfBi variable search for data from the National Educational Panel Study (NEPS), which I worked with as part of a FAIR project. In addition, available raw data make it possible to conduct not only meta-analyses based on aggregated effect sizes but also meta-analyses with "individual participant data" (IPD): here, the raw data from several studies are synthesized directly so that analyses can be run on a larger database.
Data protection regulations influence the publication and reuse of research data, for example with regard to “sensitive data”. What does this mean for your own data handling?
Elisabeth Graf: If you know that more detailed information exists but cannot be published for data protection reasons, one option is to collaborate with the authors. For example, I reused data that I needed at a very fine level of detail. I first wrote my analysis scripts using simulated data. The authors then ran these scripts on the original data, and I received the aggregated results, which we then analyzed together. This allowed me to draw on the detailed data and results without ever having access to the data themselves.
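The script-sharing workflow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scripts used in the project: the record layout (school, condition, score), the functions `simulate_data` and `analysis`, and the minimum cell size of 10 are all hypothetical. The key idea is that the analysis function is developed on simulated data with the same structure as the original and returns only aggregates, so the data holders can run it unchanged and pass back nothing but the summary.

```python
import random
import statistics
from collections import defaultdict

# Hypothetical record layout: (school_id, condition, score).
# In practice the structure would follow the data holders' codebook.

def simulate_data(n_schools=20, students_per_school=30, seed=1):
    """Generate fake records with the same structure as the (unseen) original data."""
    rng = random.Random(seed)
    records = []
    for school in range(n_schools):
        condition = "treatment" if school % 2 == 0 else "control"
        for _ in range(students_per_school):
            score = rng.gauss(50 + (3 if condition == "treatment" else 0), 10)
            records.append((school, condition, score))
    return records

def analysis(records, min_cell_size=10):
    """Return only aggregated results; groups smaller than min_cell_size are suppressed."""
    scores = defaultdict(list)
    for _school, condition, score in records:
        scores[condition].append(score)
    results = {}
    for condition, values in sorted(scores.items()):
        if len(values) < min_cell_size:
            results[condition] = None  # too few cases to report safely
            continue
        results[condition] = {
            "n": len(values),
            "mean": round(statistics.mean(values), 2),
            "sd": round(statistics.stdev(values), 2),
        }
    return results

# Step 1 (reusing researcher): develop and test the script on simulated data.
print(analysis(simulate_data()))
# Step 2 (data holders): run the same, unchanged `analysis` function on the
# original records and send back only the returned dictionary.
```

Suppressing small groups is one possible safeguard among several; which aggregates may leave the data holders' hands would be agreed with them in advance.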
Hanna Gaspard: You should think about publishing the data before you collect it, for example with regard to obtaining participants' consent to publication. I advise consulting the university's data protection officer and, if necessary, the ethics committee at an early stage. Sensitive data may have to be stored separately, and not all research data can necessarily be made available for subsequent use. Qualitative data are often not easy to anonymize. This applies not only to video data but also to text data, where it is not always possible to rule out that they contain information that could identify individuals. Specialist research data centers such as the IQB have the relevant expertise. It was very helpful to have them check once more whether the data we collected could actually be shared in this form. In some cases, the level of detail of individual variables was reduced by aggregation so that it should no longer be possible to trace values back to individual persons. Overall, I recommend seeking advice and expertise from others.
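Reducing the level of detail by aggregation often means coarsening variables, for example replacing exact ages with age bands, and then checking that every combination of potentially identifying variables occurs at least k times (the idea behind k-anonymity). The Python sketch below illustrates that general technique under assumed conditions; the variables (age, school type), the band width of 5 years, and the threshold k = 5 are hypothetical choices, not the procedure the IQB actually applied.

```python
from collections import Counter

def age_band(age, width=5):
    """Map an exact age to a band such as '15-19'."""
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"

def coarsen(records, width=5):
    """Replace exact ages with bands; the school type is kept unchanged."""
    return [(age_band(age, width), school_type) for age, school_type in records]

def smallest_group(records):
    """Size of the rarest combination of potentially identifying values."""
    return min(Counter(records).values())

def is_k_anonymous(records, k=5):
    """True if every combination of values is shared by at least k records."""
    return smallest_group(records) >= k

# Hypothetical example: with exact ages, several people are unique.
raw = [(15, "Gymnasium"), (16, "Gymnasium"), (17, "Gymnasium"),
       (18, "Gymnasium"), (19, "Gymnasium"),
       (15, "Realschule"), (16, "Realschule"), (16, "Realschule"),
       (17, "Realschule"), (19, "Realschule")]
print(is_k_anonymous(raw))           # False
print(is_k_anonymous(coarsen(raw)))  # True
```

A check like this only covers the variables it is given; free text, video, or rare combinations with other published variables can still identify people, which is one reason expert review by a research data center remains valuable.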
About the interviewees:
- Prof. Gaspard has been Professor of Educational Data Science at the IFS since March 2023 and is a principal investigator in the interdisciplinary research area FAIR.
- Dr. Elisabeth Graf received her doctorate in psychology from the University of Vienna, Austria, in 2023 and has been a research associate at the IFS since January 2024.
Prof. Gaspard and Dr. Graf are portrayed as Data Champions because they make research data available in a way that makes the entire research process traceable without compromising the interests and rights of the study participants.