
I tested a hypothesis using some observational data, but did not find support for the hypothesis (the difference was not significant at the p < 0.05 level). I subsequently realised that the inclusion of a certain subset of individuals was dubious, since there was some doubt over their measurements. I filtered out those individuals (on objective criteria), and lo and behold, I now find evidence to support my original hypothesis.

I am acutely aware that, had the original hypothesis test returned a significant result, I probably would have found justification to support the inclusion of those doubtful measurements (or perhaps never even stopped to think about removing them). Although this wasn't my deliberate intention, what I have done seems a lot like I have employed my “researcher degrees of freedom” to find a version of the analysis that supports my original hypothesis.

I realise that what I should have done was to plan out my analysis more carefully in advance, and decide whether or not to include the doubtful measurements before carrying out any analysis. But I can’t change what has already happened, so my question is: what do I do with my data/analysis now? I can see a number of options, varying in how sensible they are, but none of them seems ideal:

  1. Continue to use my updated analysis and present a clear argument for why those individuals should be excluded (i.e. ignore the researcher-degrees-of-freedom issue).

  2. Continue to use my updated analysis but, in any write-up/publication be fully open about the less-than-ideal path that I took to reach it.

  3. Conduct some kind of multiple-comparisons adjustment to take my multiple different analyses into account (I don’t know if this is even a valid approach in this context; a rough sketch of what this might look like follows this list).

  4. Sigh and throw the analysis in the bin.
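
To make option 3 a little more concrete, here is a rough sketch of what such an adjustment could look like. The p-values are made up, and I am assuming a Holm-Bonferroni correction via Python's statsmodels; whether treating my two analysis variants (with and without the doubtful individuals) as a family of comparisons is even conceptually defensible is exactly the part I'm unsure about.

```python
# Purely illustrative: made-up p-values for the two analysis variants
# (all individuals included vs. doubtful individuals excluded).
from statsmodels.stats.multitest import multipletests

labels = ["all individuals", "doubtful individuals excluded"]
p_values = [0.12, 0.03]  # hypothetical numbers, not my real results

# Holm-Bonferroni adjustment across the analyses that were actually run
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")

for label, p_raw, p_adj, rej in zip(labels, p_values, p_adjusted, reject):
    print(f"{label}: raw p = {p_raw:.3f}, adjusted p = {p_adj:.3f}, reject H0: {rej}")
```

In this toy example the smaller p-value would be compared against 0.05/2 = 0.025, so neither analysis would come out significant, which illustrates why I suspect this route mostly penalises me without resolving the underlying problem.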

How (if at all) can I make good use of my analysis while still adhering to good research practice?

My field is Ecology, in case that makes any difference.

Edit, in response to close vote: I did consider submitting the question to Stats.SE instead, but I felt that this is more an issue of research philosophy than a detailed statistical question. The answers that I am interested in (and indeed have already received) are connected with a general research strategy, and how to present results, rather than details regarding particular statistical methods, and so I felt that it was appropriate for this site. Having received very useful answers and comments already, it won't make too much difference to me personally if it ends up closed (and I can understand the argument for doing so), but I think the material could be useful to others in a similar situation to my own.

1 Answer

dan1111's answer is quite good, and March Ho's answer is also very valuable.

Even though this is a 9-month-old thread, it remains a timeless topic, so I would like to offer a further thought for you and/or any other researchers who stumble upon this question.

You asked whether researcher degrees of freedom invalidate your analysis. "Invalidate" might be a bit strong, but it certainly casts serious doubt on the effect you're investigating. Further complicating matters is that, in certain fields (and particularly in certain segments of certain fields), such practices are basically standard operating procedure.

See the story on p. 1 of Nosek, Spies, & Motyl (2012), doi:10.1177/1745691612459058, for a similar kind of scenario. The answer, as I see it, is that you should really treat your results as exploratory, and maintain a healthy skepticism about them. If you want to increase your certainty (and it sounds like you do), then the best answer is to run replications.

Note the plural here. March Ho advocates an independent replication--that is, a single replication. While this can be highly useful, as in the example in Nosek, Spies, & Motyl (2012; cited above), it can lead to further ambiguity if the replication yields a similar, but weaker, effect. In that case, a single replication may backfire by increasing your uncertainty rather than your certainty! The ideal would be to test the effect multiple times and look at the results in aggregate. Then you will be much better able to 1) determine the direction of the effect at the population level, and 2) estimate the magnitude of the effect at the population level.
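
To make 'look at the results in aggregate' a little more concrete: the standard tool for combining several replications is a meta-analysis. The snippet below is only a minimal sketch with made-up effect estimates and standard errors, using fixed-effect (inverse-variance weighted) pooling; a random-effects model may be more appropriate if your replications differ in design or population.

```python
# Minimal fixed-effect (inverse-variance weighted) meta-analysis sketch.
# The effect estimates and standard errors are made up for illustration.
import numpy as np

effects = np.array([0.42, 0.31, 0.55, 0.12])  # effect estimate from each replication
ses = np.array([0.20, 0.15, 0.25, 0.18])      # standard error of each estimate

weights = 1.0 / ses**2                         # inverse-variance weights
pooled_effect = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

ci_low = pooled_effect - 1.96 * pooled_se
ci_high = pooled_effect + 1.96 * pooled_se
print(f"Pooled effect: {pooled_effect:.3f} (95% CI {ci_low:.3f} to {ci_high:.3f})")
```

The pooled estimate speaks to both points above: its sign indicates the direction of the effect at the population level, and its confidence interval conveys the magnitude and precision of the estimate.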

This approach would take a long time, possibly a lot of money (depending on the nature of your research), and it wades into the murky debate over whether a direct replication or a conceptual replication is more valuable. Despite these limitations, however, the advantage of this method is that you'll end up with a very high degree of certainty about the nature of the effect that you're studying.

If your goal is to increase certainty in your findings, I don't see any viable alternative to conducting multiple replications (both direct and conceptual). If your goal is to generate new findings and you're not terribly concerned about the generalizability of the results, however, then the exploratory method that you described isn't much of a problem.

At least some academic journals seem to prioritize novelty over certainty. But people who care about the integrity of science seem to prioritize certainty over novelty. I think each approach can be valuable, but not to the exclusion of the other.

For maximum transparency while also maintaining publishability, it would probably be best to follow March Ho's answer above, which provides a great template for resolving the issue if further data collection isn't possible.

This is a particularly thorny question! Research methodology and statistical methodology are inextricably linked. I have written a note on best statistical practice that relates to this issue; a preprint of the manuscript, currently under review, is available at http://osf.io/preprints/psyarxiv/hp53k/