Friday, March 2, 2012

The fallacy of our misunderstanding of the scientific method

Ryan Norris via Tom Nudds sent me a link to this web site, and I think all three of us had the same "What the ..." reaction after reading the entry by Irene Pepperberg, titled "The Fallacy of Hypothesis Testing":
"I was trained, as a chemist, to use the classic scientific method: Devise a testable hypothesis, and then design an experiment to see if the hypothesis is correct or not. And I was told that this method is equally valid for the social sciences. I've changed my mind that this is the best way to do science. I have three reasons for this change of mind."
These three reasons are:

  • the importance of observations, without explicitly testing hypotheses
  • testable hypotheses are not interesting
  • the scientific method often leads to proving a hypothesis, not testing it
These three reasons point out some major misunderstandings of the scientific method (see this post, or this post):
  • the context leading up to the hypothesis-prediction-test (the question, background information, etc.) forms an essential part of the scientific method
  • the discussion of results (the information increase that leads to the next cycle) also forms an essential part of the scientific method. This is exactly what she points out "... the exciting part is a series of interrelated questions that arise and expand almost indefinitely".
  • this is a general misunderstanding that Dr. Pepperberg correctly identifies. It is not a reason to dismiss the method, however, but rather a call for better education.
We always tell our students that fully understanding the scientific method, despite its apparent simplicity, is more challenging than it looks. And now we can point them to this article, because this is a successful scientist from one of the best universities in the world (Harvard) who has a very limited idea of the scientific method.


  1. I am continually amazed when I see scientists (often ones whom I greatly respect) defending the old-fashioned "scientific method" outlined by Popper so many decades ago. I often see a kind of equivocation happening in these discussions. To avoid this, it is important to distinguish between "strict" and "loose" senses of the HD method.
    In the strict sense, as outlined by Popper, the HD method involves (1) generating a hypothesis H by any means; (2) deriving predictions from H; (3) testing predictions; and (4) accepting or rejecting H based on the outcome of step 3.
    Now, everyone seems to agree (I think) that this strict model is inadequate. It does not describe science accurately, even when science is going well. Part of the problem is that this model says nothing about how hypotheses should be generated. That was one of the points Pepperberg was emphasizing. This model also offers no useful guidance on when to reject H. For example, is a single "falsifying" experiment sufficient to reject H? Sometimes yes, sometimes no. It depends on a wide range of other factors that have never been formally described.
    Given that (I think) most people recognize that the strict HD model is seriously inadequate, some people appeal to a loose (non-strict) version of the HD method instead. Now, no one has ever managed to formulate how this "method" goes. But one thing is for sure: a whole bunch of implicit, discipline-specific knowledge is required to do good science. Pepperberg mentions how it is important, in her field, to watch an animal before generating hypotheses (something she gets from Nico Tinbergen, by the way). That little nugget of advice won't do much good in, say, quantum physics or, I'm guessing, community ecology. This is just one of many possible heuristics for generating hypotheses that works in one field but perhaps not in many others. Hence, it is not a part of THE scientific method.
    I also gather that there is a lot of discipline specific knowledge in the field of animal behaviour that helps practitioners decide when to retain or reject a hypothesis. Again, advice from this field won't generalize to physics or ecology.
    Where I think people are disagreeing is over the importance of these implicit, discipline-specific rules for scientific decision making. Scientists who cling to "The Scientific Method" (in some form) tend to downplay the importance of these implicit rules. On the other hand, from my perspective, these rules are where all of the action is.
    If you want to understand how a particular science works, I think, you need to understand how it *deviates* from the strict HD method. What distinguishes good ecology from bad is HOW hypotheses are generated and tested, not THAT they are generated and tested. The point is that distinguishing good from bad ecology requires employing discipline specific knowledge and techniques that do not generalize to neuroscience, animal behaviour, quantum physics, etc. Nor is it helpful to know that, no matter what, it is always possible to shoe-horn what these various scientists are doing into something that vaguely resembles the HD method. This exercise in abstraction does nothing to distinguish good science from bad. Or, so I contend.
    Finally, I don't know why so many scientists feel compelled to defend "the" scientific method when, in reality, there isn't one. At least, not if "method" is taken to mean a series of steps that will reliably generate good scientific outcomes in any field.

    Thanks for tolerating my little rant.
    -Stefan Linquist

  2. So many scientists feel compelled to defend "the" scientific method because, otherwise, so much else passes for reliable scientific knowledge that is generated by other ways of knowing that are not, in fact, evidence-based.

    As long as there are those in the field of science who contend that we need not test hypotheses at all to qualify as "doing science", we cannot distinguish better from poorer scientific inference. Rather, distinguishing better from poorer science, with respect to strength of inference, requires THAT hypotheses be generated (and tested); then we can debate the relative merits of HOW they are generated and tested. How can there be discussion about HOW hypotheses are generated and tested without accepting first that they must be?

    Further, Linquist is incorrect in asserting that scientists who promote hypothesis testing are somehow blind disciples of Popper. Perhaps Popper indeed left out how to generate hypotheses. Even if he did, that's hardly a condemnation of "the" method. His contribution, in my opinion, was not even the series of steps, beginning with the generation of hypotheses. I would even contend that the "hypothesis" he's famous for - "all sheep are black" - is actually not one; it's a simple statement of a pattern, more like a statistical hypothesis, or a prediction wanting for a mechanistic explanation, or research hypothesis, behind it.

    Popper's contribution was in promoting falsificationism, namely, that we should be generating (explanatory) hypotheses and attempting always to refute them, such that the remaining ones constitute our best guess (for now) about why things are the way we perceive them. Too many scientists seem to think they are in the business of proving something (as Pepperberg argued too). As Lewis Thomas wrote, this leads to a significant misunderstanding in society at large about what we are really up to and makes science, in an increasingly anti-scientific environment, all that much harder to do. And if we are not to prove anything, then what is it that we are to test but hypotheses?

    Finally, leaping to conclusions about why things are the way they are has to be about one of the easiest of human foibles to commit. Scientists who are satisfied that collecting observations constitutes "doing science" are especially vulnerable to doing just that, and in the process, committing errors of retroductive logic, that is, confusing explanation with speculation. That's how poor hypotheses appear and persist in the scientific literature, until someone realizes that they've never actually been tested. More on the vulnerability of scientific inference to hypothesis-free data collection and retroductive logic can be found here:

    Tom Nudds

  3. "This exercise in abstraction does nothing to distinguish good science from bad. Or, so I contend." Based on the above discussion and his past experience, Stefan Linquist comes up with a testable hypothesis that identifies a causal mechanism (a true hypothesis, according to Tom Nudds) between abstraction and identifying two types of science. Well, it is the lack of a causal mechanism.

    The scientist in me, of course, immediately sees this as a challenge: how can we test this? Just the fact that this is my immediate reaction already shows that I do not completely agree with Stefan. If his hypothesis is correct, I should not even attempt to answer this question, since I am a community ecologist and every attempt that I would make would be "bad" science compared to what a social scientist would come up with. I agree that it would probably be inferior to what a social scientist would come up with, but would it be "bad" science? I have the hubris to think that given my exposure to the scientific method, I would be able to come up with a test that would be considered "good" science.

    But this will quickly evolve into a discussion about what counts as "good" and "bad" science. Let's, for simplicity's sake, define "good" as publishable. One test of Stefan's hypothesis would then be to assign to each PLoS One submission reviewers who are field specialists as well as reviewers who are scientists from outside the field. For an ecology submission, for instance, that would mean both ecologists and social scientists. The first measuring stick would be how often the non-specialists make the same recommendation as the specialists. The advantage of using PLoS One submissions is that reviewers are explicitly told not to evaluate the novelty of the findings, which would test field-specific background information.

    The devil is of course in the details: how many non-specialists and specialists per article, how to pick them, how many submissions, how to perform a more informative analysis of the actual review reports of the two types of reviewers, and whether we need a third control treatment of a non-specialist who is nonetheless from the same field.

    But given that we can work this out, would this be a "good" test of Stefan's hypothesis?
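
    The "first measuring stick" described above could be sketched roughly as follows. This is only a minimal illustration, assuming one specialist and one non-specialist recommendation per submission; the function names and the sample recommendations are hypothetical, and Cohen's kappa is added here as one common way to correct raw agreement for chance, not something the proposal itself specifies.

```python
from collections import Counter


def agreement_rate(specialist, non_specialist):
    """Fraction of submissions on which both reviewer types agree."""
    if len(specialist) != len(non_specialist) or not specialist:
        raise ValueError("need two non-empty lists of equal length")
    matches = sum(a == b for a, b in zip(specialist, non_specialist))
    return matches / len(specialist)


def cohens_kappa(specialist, non_specialist):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(specialist)
    observed = agreement_rate(specialist, non_specialist)
    spec_counts = Counter(specialist)
    non_counts = Counter(non_specialist)
    # Chance agreement: probability that both pick the same category
    # if each reviewer type followed only its own marginal frequencies.
    expected = sum(spec_counts[c] * non_counts[c] for c in spec_counts) / n**2
    if expected == 1.0:
        return 1.0  # both always give the same single recommendation
    return (observed - expected) / (1 - expected)


# Hypothetical recommendations for six submissions (made-up data).
specialists = ["accept", "reject", "accept", "accept", "reject", "reject"]
outsiders = ["accept", "reject", "reject", "accept", "reject", "accept"]

print(agreement_rate(specialists, outsiders))
print(cohens_kappa(specialists, outsiders))
```

    A kappa near zero would mean the outsiders agree with the specialists no more often than chance, which is what Stefan's hypothesis would seem to predict; a value near one would count against it.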

  4. Oh boy. This old topic again!

    There has been plenty of debate about the necessity (or not) of explicit hypothesis testing recently with reference to post-genomic data/technology-driven work. For example, what was the hypothesis being tested when the human genome was sequenced?

    There is an essential role for "discovery science", especially at the forefront of new areas of research. Later, the results of these explorations (if scientifically rigorous) can lead to testable hypotheses about specific aspects of the area. What makes it science is not whether a hypothesis is tested but whether standards of reliability and rigour are upheld. The two human genome papers of 2001 have been cited more than 15,000 times -- pretty clear indication that a) the work was reliable, given technology at that time, and b) these exploratory studies inspired an enormous amount of subsequent research. One might go so far as to argue that the first exploration was more influential than any one subsequent hypothesis testing project. One cannot formulate reasonable hypotheses if there is no background information on what the important issues are in a particular field.

    Anyway, I also point out that efforts to educate young students have, in some cases, moved away from the traditional "scientific method" idea. For example, the NSF-funded "Understanding Science" website out of Berkeley:

    - Ryan

    1. I am sure I do not disagree with the important or essential role of discovery science (see my first point in the original post above), and maybe Tom Nudds would also agree with that. I think we might disagree, though, about what makes the human genome paper "science". You write that it is the reliability and rigour of the study. I would claim that, more importantly, it is science because it can lead to testable hypotheses (and maybe Tom would agree with me here as well; I wonder if Stefan Linquist would agree with me, probably not ;-). So in essence, the 15,000 citations make it science.

      The website you linked to, "How science works", phrases it this way:

      "Scientific testing is at the heart of the process. In science, all ideas are tested with evidence from the natural world, which may take many different forms — from Antarctic ice cores, to particle accelerator experiments, to detailed descriptions of sedimentary rock layers. You can't move through the process of science without examining how that evidence reflects on your ideas about how the world works — even if that means giving up a favorite hypothesis."

      I assume that these 15,000 citations are using the human genome data to test ideas of how the world works, and I think that is what makes the human genome article "science". But it is obvious by now that I am only catching up on these philosophical questions that have been explored for decades, so I really hope that Stefan Linquist will jump in and provide me (us) with a very illuminating explanation of the current consensus of this discussion.

    2. I'd say that this is what Stefan mentioned about equivocating. Hypothesis-testing and hypothesis-generating science are not the same thing. Just because they both have "hypothesis" in them does not make them both part of the standard "scientific method".

      Let's say the mass of gold was not known, and you set out to measure it. You use a well-calibrated instrument, carefully document your procedure, double check for contamination and instrument inaccuracies, measure samples from multiple sources, and get several experimenters to perform the measurements independently, and then you write up your methods and results and subject them to peer review.

      Was this science?

      What was your hypothesis?

      What hypothesis did your measurement of the mass of gold generate?

      What if no one cites your work because no one cares about the mass of gold but everyone cites another paper using the exact same methods to measure the mass of uranium? Is one science and not the other?

    3. My answer to whether one article about the measurement of the mass of gold is "science" and the other one is not will depend on whether the article helps me/us to "examine how that evidence reflects on your ideas about how the world works" (the quote from How Science Works).

      How Science Works explicitly identifies scientific testing as being "at the heart of the process". So if you look at the general flowchart, testing ideas is at the center, and the other components are connected to it.

      There are lots of activities in the three outside bubbles that in themselves we do not necessarily associate with testing ideas as it is explained in the central bubble:
      - exploration and discovery: writing literature would fall in this bubble (making observations, asking questions, sharing data and ideas, finding inspiration, ...)
      - community analysis and feedback: screen writing would fall in this bubble (feedback and peer review, discussion with colleagues, publication, coming up with new ideas)
      - benefits and outcomes: accounting (satisfy curiosity, solve everyday problems, address societal issues, inform policy)

      It "only" becomes "science" if it is connected with the testing-ideas bubble. Similarly with your mass of gold example, or with the human genome data.

      Does a lack of a connection with the central testing-ideas bubble make an activity less valuable? Of course not, we need accountants. But should we call all these activities in themselves "science"? I don't think so.

    4. Testing ideas is not exclusively a matter of formulating and testing individual hypotheses one at a time. Blind data collection can be problematic, but a lot of the time people who do "discovery science" have many ideas in mind as they explore new ground. Later, once more specific questions become possible, smaller-scale individual hypotheses that only tackle one simple idea at a time can be developed and tested. So, just to be clear, I am not talking about just going around and documenting everything one sees. One has to have some ideas in mind, or one will not collect the relevant data using scientifically sound methodology. The human genome, for example, also had a component of annotation of genes, identification of potential gene function, analyses of transposable element composition, comparisons with other animal genome sequences, and so on. You could, I suppose, argue that they had dozens or hundreds of "hypotheses" in mind -- or, more realistically, they had dozens or hundreds of research questions in mind, ideas that they knew they wanted to test. Insisting that the scientists who do large-scale discovery work break everything down into simple, individual hypotheses always strikes me as being like telling a professional mathematician to show every single step of her work.

  5. If the ideas you want to test come from information that is not derived from science, then that creates pretty significant problems for the validity of the tests themselves, does it not?

    There are big differences in reliability of information, and one way that we decide what is good information and what is not is standards of evidence, repeatability, reliability, etc. -- in other words, what I would consider the standards of science.