Monday, November 15, 2010

What can science learn from Google?

That is the closing sentence of this article. The author tries to argue that theory and the scientific method are becoming obsolete in this day and age, using Google as a role model:
But faced with massive data, this approach to science — hypothesize, model, test — is becoming obsolete. [...] There is now a better way. Petabytes allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.
But what is the biggest problem with patterns? You will always find them, and nobody knows this better than Google. The irony of this article is that Google stands for everything the author claims is wrong with "conventional" science. Google tests every new feature it implements, e.g. a new Gmail interface item (see the contributions in the discussion by leggett.org, responsible for the new design), or this user interface design. This is not always completely successful (think Google Wave, duh), but judging from some articles about how Google works, the scientific method is embedded in the core of its business model. This article, for instance, explains their overall strategy for improving their search algorithm:

At any moment, dozens of these changes are going through a well-oiled testing process. Google employs hundreds of people around the world to sit at their home computer and judge results for various queries, marking whether the tweaks return better or worse results than before. But Google also has a larger army of testers — its billions of users, virtually all of whom are unwittingly participating in its constant quality experiments. Every time engineers want to test a tweak, they run the new algorithm on a tiny percentage of random users, letting the rest of the site’s searchers serve as a massive control group. There are so many changes to measure that Google has discarded the traditional scientific nostrum that only one experiment should be conducted at a time. “On most Google queries, you’re actually in multiple control or experimental groups simultaneously,” says search quality engineer Patrick Riley. Then he corrects himself. “Essentially,” he says, “all the queries are involved in some test.” In other words, just about every time you search on Google, you’re a lab rat.
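The procedure described in that quote (a tiny random share of users gets the tweak, everyone else is the control group) can be sketched in a few lines. This is only an illustration under stated assumptions, not Google's actual pipeline: the function names, the 1% treatment share, the baseline success rate and the simulated "lift" are all made up for the example.

```python
import math
import random


def run_ab_test(n_users=100_000, treatment_fraction=0.01,
                base_rate=0.30, lift=0.0, seed=0):
    """Randomly assign simulated users to treatment or control.

    All parameters are hypothetical: base_rate is the chance that a user
    is satisfied with the old algorithm, lift is the extra chance the
    tweak adds. Returns {group: (n_users_in_group, success_rate)}.
    """
    rng = random.Random(seed)
    counts = {"control": [0, 0], "treatment": [0, 0]}  # [users, successes]
    for _ in range(n_users):
        # Random assignment is what makes the rest of the users a valid control.
        group = "treatment" if rng.random() < treatment_fraction else "control"
        rate = base_rate + (lift if group == "treatment" else 0.0)
        counts[group][0] += 1
        counts[group][1] += rng.random() < rate
    return {g: (n, s / n if n else float("nan")) for g, (n, s) in counts.items()}


def z_score(n_c, p_c, n_t, p_t):
    """Pooled two-proportion z statistic for treatment vs. control."""
    pooled = (n_c * p_c + n_t * p_t) / (n_c + n_t)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_t))
    return (p_t - p_c) / se


if __name__ == "__main__":
    result = run_ab_test(lift=0.05)
    (n_c, p_c), (n_t, p_t) = result["control"], result["treatment"]
    print(result, "z =", round(z_score(n_c, p_c, n_t, p_t), 2))
```

The point of the sketch is that this is plain hypothesize, test, compare: the tweak is the hypothesis, and the control group is what tells you whether the difference is real.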

Google's strategy is probably the most successful implementation of the scientific method that does not involve technology at its core.
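To make the earlier point about patterns concrete, here is a small sketch of why "correlation is enough" is risky: it generates variables that are pure noise and counts how many pairs still look strongly correlated. The variable counts, sample size and the 0.5 threshold are arbitrary choices for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)


def count_spurious(n_vars=100, n_obs=20, threshold=0.5, seed=0):
    """Count pairs of independent random variables with |r| > threshold.

    Every variable is independent Gaussian noise, so any "pattern"
    found here is spurious by construction.
    """
    rng = random.Random(seed)
    data = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    pairs = strong = 0
    for i in range(n_vars):
        for j in range(i + 1, n_vars):
            pairs += 1
            if abs(pearson(data[i], data[j])) > threshold:
                strong += 1
    return pairs, strong


if __name__ == "__main__":
    pairs, strong = count_spurious()
    print(f"{strong} of {pairs} pairs of pure noise have |r| > 0.5")
```

The more variables you throw in, the more such pairs turn up, which is exactly why each pattern is a hypothesis that still has to be tested.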

And what about the example that the author uses from biology?

The best practical example of this is the shotgun gene sequencing by J. Craig Venter. Enabled by high-speed sequencers and supercomputers that statistically analyze the data they produce, Venter went from sequencing individual organisms to sequencing entire ecosystems. In 2003, he started sequencing much of the ocean, retracing the voyage of Captain Cook. And in 2005 he started sequencing the air. In the process, he discovered thousands of previously unknown species of bacteria and other life-forms. If the words "discover a new species" call to mind Darwin and drawings of finches, you may be stuck in the old way of doing science. Venter can tell you almost nothing about the species he found. He doesn't know what they look like, how they live, or much of anything else about their morphology. He doesn't even have their entire genome. All he has is a statistical blip — a unique sequence that, being unlike any other sequence in the database, must represent a new species.
So what can biology learn from Google in this instance? First, that identifying these "unique sequences" as potential species is itself based on a hypothesis, because there probably are a staggering number of unique sequences in each genome. Second, that these new species are nothing but a hypothesis that has to be tested. Sadly enough (or luckily enough, depending on your point of view), we biologists do not have 91 million searches per day at our disposal to test these hypotheses, but still have to get in our boat and get wet.

So should science learn the scientific method from Google? It turns out that for some science writers/scientists, the answer is "yes".
