I am working on a mathematical model to help explain some experimental results and generate new hypotheses. Unfortunately, I have neither the resources nor the interest to gather experimental data, and in this particular sub-field it is uncommon to publish a model without showing its usefulness on some experimental data.

There are several existing experimental findings that can be explained by my model. However, they are presented in other modeling papers, and the raw data are not available with the papers or on the authors' websites. The papers present only partially analyzed data (for instance, they show results averaged over participants, but not individual participants' results; or sometimes they give only the results of statistical tests).

I want to contact the authors for their raw data and have three related questions:

  1. What is the protocol for emailing authors to ask for their raw data? Is this common?
  2. Will the researchers expect to be invited on board as co-authors? Or is a citation of their papers, plus an acknowledgement of the form "AK would like to thank X, Y, Z for providing their raw data", sufficient?
  3. If my model (without fits to specific data) is in a pre-print state, should I send the pre-print to the authors I contact? What if the pre-print points out weaknesses in their approach to modeling similar problems?

1 Answer

I am speaking from the biological sciences, but this probably applies elsewhere. In theory, the moment an analysis is published, the data associated with it should also be public or available for:

  1. Other researchers who want to use the data.
  2. Other researchers who want to evaluate (i.e. repeat) the experiments and verify the initial findings.

So, in principle, if the publication does not link to the public data, you could contact the journal and complain. Of course it depends on the data, but DNA sequencing or protein analysis data are usually available. There might be legal or other limitations for patient, medical, or other types of data.

This is the formal way.

There are some exceptions. Sometimes the data are public but no publication is out yet: policy requires (or the group chooses) that the data be released to the public while the publication is still in preparation. In this case you cannot simply use the data; you have to contact the PI to see whether your analysis conflicts with theirs. Every institution has different guidelines.

Back to the initial issue: in reality, you might meet resistance when asking for the raw data of a published work. As I said, this shouldn't be the case if the data are not sensitive, because anyone should be able to evaluate and validate the analysis.

You have two options. The first is to check publicly available data (depending on your field; I can suggest some sources in the biological sciences that I'm familiar with, if needed) and test your model on them. That would be easier for you, as you would avoid awkward situations.

The second is to contact the authors and propose a collaboration in which you apply a different type of analysis (your model) to their data. They would probably be happy to collaborate with you, and they might even offer insights that improve your analysis.

To the initial questions:

  1. I would say that it's common. It's even more common to go directly to the public repositories and use data that are already published. Proposing a collaboration might be the safer approach, both for the results and for maintaining a good relationship and future collaborations.

  2. They might expect to be invited. It depends on the terms of the collaboration. In any case, this should be agreed upon at the beginning, so that you avoid frustration at the time of publication (and after you have already spent time working on the data).

  3. I'm not familiar with this situation, but if you go for a collaboration, it makes sense to explain your method at the first meeting. If you just need the data, or you take the data from a public repository, you don't have to send anything.