Standards of evidence

Science and the law, and their respective practitioners, may seem to be as different as chalk and cheese, but they are both very much concerned with the evaluation of evidence. Scientists like to think of themselves as dispassionately weighing the objective facts arising from their experiments and observations, and using these to validate existing theories or to propose new ones. However, as any practicing scientist knows, we don’t always apply the same standards when weighing evidence.

For example, my field, environmental microbiology, relies heavily on observations and measurements made on wild organisms, rather than experiments on cultivated ones. The positive side is that there is so much diversity in wild organisms that there’s always something new to discover. However, not having them in cultivation means that whole classes of experiments, such as making knockout mutants to study particular pathways, are simply not possible. If I wanted to demonstrate that a particular microbe I’m studying uses a certain metabolic pathway, I can marshal all sorts of indirect evidence: the presence of key genes in the genome, expression of the corresponding mRNAs, chemical measurements of metabolic compounds unique to that pathway, and so on. With a “lab rat” organism like E. coli, by contrast, I would have a more direct route: knock out the gene and show that the key phenotype is affected, or clone the gene and express it heterologously. If I am working with such a “lab rat” for which genetic manipulation is possible, the indirect evidence that was acceptable before would no longer satisfy most of my peers. They would instead demand the more stringent “gold standard”.

In American legal jargon, the “preponderance of evidence” is the burden of proof required for civil cases, whereas a stricter standard, “beyond a reasonable doubt”, applies to criminal ones. We sometimes hear people saying that scientists have “proven” this or that, but my impression, from biology at least, is that most scientific papers make their arguments from a preponderance of evidence rather than from rigorous proof. In some types of experiments or analyses, it is possible to construct a formal statistical model to evaluate the probabilities. Does the preponderance standard correspond to P > 50%, as some sources suggest? And what is a reasonable doubt? If I am 99% sure, is that reasonable enough? Or is 95% sufficient? Rhetoric is important too. That elusive quality, “relevance”, is conjured up by putting pieces of evidence in the frame of a larger narrative, hinting at some deeper understanding in the works.
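Purely as an illustration (my own toy sketch, with invented numbers, not any formal legal or scientific standard), one can treat these burdens of proof as posterior-probability thresholds and combine independent pieces of indirect evidence with Bayes’ theorem:

```python
def posterior(prior: float, likelihood_ratios: list[float]) -> float:
    """Update a prior probability with independent pieces of evidence,
    each expressed as a likelihood ratio
    P(evidence | hypothesis) / P(evidence | not hypothesis)."""
    odds = prior / (1 - prior)          # convert probability to odds
    for lr in likelihood_ratios:
        odds *= lr                      # Bayes' rule in odds form
    return odds / (1 + odds)            # back to probability

# Start agnostic (prior = 0.5); three independent lines of indirect
# evidence, each only moderately diagnostic (likelihood ratio 3).
p = posterior(0.5, [3, 3, 3])

print(f"posterior = {p:.3f}")                       # about 0.964
print("preponderance (> 0.50):", p > 0.50)          # True
print("beyond reasonable doubt (> 0.99):", p > 0.99)  # False
```

On these made-up numbers, a handful of individually weak observations easily clears the 50% bar but falls short of 99%, which matches my sense that most papers argue from preponderance rather than proof.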

Does this mean that we should be stricter about what results we allow to be published, or that scientists should have to argue like prosecutors in a death penalty case? I don’t think so. A single scientific project, whether on the scale of a PhD thesis or a large-scale collaboration like the Large Hadron Collider, is usually an accretionary process. The pieces of the puzzle come out one at a time, and quite often we slot them together wrongly in the beginning. Ideally, at each step of the way we strive to reduce uncertainty. Artificial rigor would, in the words of the Street-Fighting Mathematician, induce rigor mortis, and be a hindrance to scientific work rather than a help.

