Medical research studies can be divided into two types: observational and experimental. Observational studies simply observe the effect of a variable in a population. They can assess the strength of a relationship, for instance between dietary factors and disease. Are vegetarians less likely to develop cancer? Are patients treated with a new diabetes drug less likely to die of heart attacks than patients treated with older drugs?
Two types of observational studies are cohort and case-control studies. In a cohort study, a set of people with defined characteristics (for instance, patients who have been diagnosed with diabetes) are observed over a period of time to determine whether a variable (such as treatment with a new drug) is associated with better outcomes (such as fewer deaths or hospitalizations). Cohort studies can be prospective or retrospective, the latter being less reliable because they depend on patient reports of past events and are subject to recall bias and faulty memory.
In a case-control study, individuals who have a disease are matched to individuals who don’t have the disease but otherwise appear to be identical. There’s a problem, however: they may not be as identical as they first appear. There is always a risk that confounding factors may have been missed. When matching individuals, no one would think to ask them what brand of spaghetti sauce they’ve been buying, but what if more people in the disease group have switched to a new brand that just happens to contain an herb that is effective in combatting the disease in question? As Donald Rumsfeld pointed out, there are known knowns and known unknowns, but the difficult ones are the unknown unknowns. If we don’t know we don’t know about some confounding factor, there’s no way we can control for it.
Correlation Is Not Causation
Observational studies are useful for finding correlations. They may find that the more meat people eat, the more likely they are to die from a heart attack. But we can’t stress often enough that correlation is not causation. If A is correlated with B, that doesn’t tell us that A causes B. Maybe B causes A. Maybe both A and B are caused by another factor, C. Maybe it’s not a true correlation but a false finding due to errors in the way the study was designed or carried out. Maybe the apparent correlation is a meaningless coincidence. That happens a lot.
There is a whole website dedicated to spurious correlations, a hilarious compilation by Tyler Vigen. The website shows graphs and calculates correlation percentages for an astounding 30,000 examples. If you’ve never seen it, it’s well worth a look. The divorce rate in Maine is correlated with the per capita consumption of margarine. The number of suicides by hanging, strangulation, and suffocation is correlated with the number of lawyers in North Carolina. A widely cited example is the almost perfect correlation between diagnoses of autism and sales of organic foods (r=0.9971, p<0.0001). I don’t think anyone imagines that organic foods cause autism or that autism causes increases in organic food sales. The correlation is obviously a meaningless coincidence. And yet …
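To see how easily two unrelated quantities can produce an impressive correlation coefficient, here is a minimal sketch (my own illustration with invented numbers, not Vigen’s data): two series that merely drift in the same direction over the same years will score a near-perfect Pearson r.

```python
# Minimal illustration with invented numbers (not Vigen's actual data):
# two series that both drift in the same direction over the same years
# show a very high Pearson correlation even though neither has anything
# to do with the other.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
years = np.arange(2000, 2010)

# Hypothetical figures: both simply decline over time with a little noise.
margarine_consumption = 8.0 - 0.3 * (years - 2000) + rng.normal(0, 0.1, years.size)
divorce_rate = 5.0 - 0.1 * (years - 2000) + rng.normal(0, 0.05, years.size)

r, p = stats.pearsonr(margarine_consumption, divorce_rate)
print(f"r = {r:.3f}, p = {p:.4f}")  # high r, tiny p, zero causal meaning
```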
Many journalists who write about medical research are not well versed in science and critical thinking. They persist in reporting correlations from observational studies as evidence of causation. The media are full of alarmist headlines such as “Study Shows Eating X Causes Disease Y” and “If You Want to Avoid Disease Y, Stop Eating X.” In my humble opinion, journalists who misreport correlation as causation should be fired and required to get remedial education.
The Bradford Hill criteria for causation are nine principles that can help establish whether a correlation represents causation. They are:
- Strength (effect size)
- Consistency (reproducibility)
- Specificity
- Temporality (effect occurs after cause)
- Biological gradient (dose-response relationship)
- Plausibility
- Coherence between different kinds of evidence
- Experimental confirmation
- Analogy with other associations
Some authors add a tenth principle: Reversibility (if the cause is deleted, the effect should disappear).
Experimental Studies
Observational studies can produce suggestive correlations but can’t establish causation. For that, we need the other kind of study: experimental studies. In an experimental study, the researchers introduce an intervention and study its effects. If eating X is correlated with Y, does changing the amount of X in the diet result in different outcomes? The gold standard experimental study is the randomized controlled trial (RCT), preferably double-blinded.
Sometimes an experimental study is neither ethical nor feasible. Consider the question of whether smoking causes lung cancer. The ideal way to answer that question would be to randomize a large number of young people and make half of them start smoking and continue smoking for the rest of their lives while preventing the other half from ever smoking, and we would have to follow them over the many years it takes for lung cancer to develop. That can’t be done. Even if it were ethical, there’s no way to effectively control people’s behavior. Slavery is not legal, and even if we could experiment on a captive population of prisoners, there’s no way we could ensure that they wouldn’t find ways to avoid compliance. It’s not as if we could forcibly put lit cigarettes between their lips many times a day and make them inhale. And blinding would be impossible: people who smoke are well aware that they are smoking.
Fortunately, we haven’t needed an experimental study: there is enough other evidence to have definitively established that smoking causes lung cancer. The evidence is strong, consistent, and biologically plausible. Ecological studies, epidemiologic studies, animal studies, and in vitro studies all agree. Cigarettes are known to contain carcinogens. And we know that when people stop smoking, their risk of lung cancer declines rapidly. The Bradford Hill principles are all amply met.
Problems with Randomized Controlled Trials
Randomized controlled trials are the “gold standard” of research, but they’re not always appropriate, because they may miss outcomes that take a long time to develop or that affect only a small minority of people. And they may inspire false confidence. We must remember that even a double-blind, randomized controlled trial can reach the wrong conclusions.
I review a lot of dietary supplements and so-called alternative medicines; sometimes their proponents report positive results from a randomized, controlled trial that may appear to be a gold standard study but isn’t. One of the biggest pitfalls is when they try to do good science on something that has never been shown to exist. I call this Tooth Fairy science: you could study how much money the Tooth Fairy leaves to children in impoverished families compared to well-to-do families; you could tabulate the median amount of money children receive for the first tooth versus the tenth tooth lost. Your study could have all the trappings of science. Your results could be replicable and statistically significant. But your information is meaningless, because there’s no such thing as the Tooth Fairy. You’ve been misinterpreting parental behavior and popular customs and misattributing them to the actions of an imaginary being. Other nonexistent things that I have frequently seen studied are acupoints, acupuncture meridians, craniosacral skull movements and rhythmic fluctuations in the cerebrospinal fluid, Kirlian photography, and the human energy field that therapeutic touch practitioners have deluded themselves into thinking they are detecting and manipulating. There have even been RCTs on homeopathy, which is incredibly silly and not only doesn’t work but couldn’t possibly work as claimed.
Another frequent pitfall is reliance on a faulty research design, often called “pragmatic.” The “A + B versus B” design compares usual care plus an alternative treatment (A + B) to usual care alone (B). If you add anything to the usual care, it is guaranteed that the combination will look better because of expectations, suggestion, the extra attention, and the placebo response. Edzard Ernst has repeatedly criticized that design, for instance in a trial among cancer survivors with chronic musculoskeletal pain, where electroacupuncture plus usual care and auricular acupuncture plus usual care produced greater pain reduction than usual care alone.
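To make the trap concrete, here is a minimal simulation (my own sketch, with invented numbers): the add-on has zero specific effect, yet the “A + B” arm still comes out ahead once a nonspecific boost from expectation and extra attention is added.

```python
# Sketch of the "A + B versus B" pitfall, with invented numbers.
# Both arms get usual care. The add-on has NO specific effect, but the
# add-on arm also receives a nonspecific boost (expectation, attention,
# placebo response), so it reliably looks better than usual care alone.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n = 200                      # patients per arm (hypothetical)
usual_care_effect = 10.0     # mean pain reduction with usual care alone
nonspecific_boost = 3.0      # expectation/attention effect of getting "something extra"
specific_effect = 0.0        # the add-on itself does nothing

control = rng.normal(usual_care_effect, 5.0, n)
addon = rng.normal(usual_care_effect + specific_effect + nonspecific_boost, 5.0, n)

t, p = stats.ttest_ind(addon, control)
print(f"mean difference = {addon.mean() - control.mean():.1f}, p = {p:.4g}")
# A "significant" result, even though the add-on's specific effect is zero.
```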
Blinding can be difficult, but RCTs that omit blinding are suspect. In double-blind studies, neither the patient nor the provider knows whether the patient received the test treatment or the placebo. In triple-blind studies, the people who assess the outcomes are also blinded as to which group the patient was in.
Sometimes it is difficult to find an appropriate placebo that patients can’t distinguish from the real thing. The best way to tell if it’s a good placebo control is what I call an exit poll: after the trial is over, subjects are asked to guess whether they had been in the treatment group or the placebo group. If they can guess better than chance, either the placebo failed to fool them or the information was somehow leaked to the participants.
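Analyzing such an exit poll is straightforward; here is a rough sketch (my own example, with invented counts) of testing whether participants guessed their group assignment better than chance would allow.

```python
# Rough sketch of analyzing an "exit poll" for blinding: did participants
# guess their group assignment better than the 50% expected by chance?
# The counts below are invented for illustration.
from scipy.stats import binomtest

correct_guesses = 70       # hypothetical number who guessed their arm correctly
total_participants = 100

result = binomtest(correct_guesses, total_participants, p=0.5, alternative="greater")
print(f"p = {result.pvalue:.4f}")
# A small p-value suggests the blinding failed: participants could tell
# (or found out) which group they were in more often than chance allows.
```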
Research Pitfalls
John Ioannidis famously argued that most published research findings are false. Research is done by human researchers, who are susceptible to human errors. To list just a few of the many things that can go wrong:
- Technicians may consciously or unconsciously manipulate data to get the results they think their bosses want.
- Data may be fabricated: no experiment was done; the researcher simply made up the numbers.
- There may be calculation errors, or the wrong statistical test may have been used.
- The published protocol may not have been followed correctly.
- Reagents may not have been properly stored.
- Scientific misconduct and fraud occur and are not always detected; 1,000 studies had to be retracted in 2014.
- Experimenter bias may creep in.
- Subjects may comply poorly with the protocol.
- Results may be statistically significant but not clinically significant.
- Test materials may have been contaminated.
- The equipment may not have been properly maintained or calibrated.
- Even when the data are good, the researchers may have drawn the wrong conclusion.
- Multiple endpoints may not have been corrected for (a simple illustration of this pitfall follows this list).
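On that last point, here is a minimal sketch (my own illustration, with invented p-values) of why uncorrected multiple endpoints mislead, and what the simplest fix, a Bonferroni correction, looks like.

```python
# Minimal sketch of the multiple-endpoints problem, with invented p-values.
# If a trial measures 12 endpoints, the chance of at least one "significant"
# result at alpha = 0.05 is substantial even when the treatment does nothing.
# The simplest (and most conservative) fix is a Bonferroni correction.
alpha = 0.05
n_endpoints = 12

# Probability of at least one false positive across independent endpoints:
family_wise_error = 1 - (1 - alpha) ** n_endpoints
print(f"chance of at least one spurious 'hit': {family_wise_error:.0%}")  # about 46%

# Bonferroni: each endpoint must clear alpha / n_endpoints instead of alpha.
p_values = [0.03, 0.20, 0.08, 0.50, 0.004, 0.15, 0.60, 0.02, 0.33, 0.45, 0.07, 0.11]
threshold = alpha / n_endpoints
significant = [p for p in p_values if p < threshold]
print(f"survive correction at p < {threshold:.4f}: {significant}")  # only 0.004
```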
Avoiding Errors
Peer review isn’t perfect, but it can help spot errors. It is usually a mistake to believe a single study that has not been replicated or corroborated by other research groups. Even good preliminary studies are all too often followed by larger, better studies that reverse the original findings. When multiple studies disagree, a systematic review or meta-analysis can be done to help sort out the truth. But if the studies reviewed are not high-quality, it may be a matter of garbage in/garbage out.
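As a rough illustration of how a meta-analysis pools results, here is a minimal fixed-effect, inverse-variance sketch with invented effect sizes (not a real analysis): each study is weighted by its precision, and the pooled estimate is only as trustworthy as the studies that go into it.

```python
# Minimal fixed-effect, inverse-variance meta-analysis sketch.
# Effect sizes and standard errors below are invented for illustration.
import numpy as np

effects = np.array([0.30, 0.10, 0.45, -0.05])    # hypothetical per-study effect sizes
std_errors = np.array([0.15, 0.10, 0.25, 0.12])  # hypothetical standard errors

weights = 1.0 / std_errors**2                    # more precise studies get more weight
pooled_effect = np.sum(weights * effects) / np.sum(weights)
pooled_se = np.sqrt(1.0 / np.sum(weights))

print(f"pooled effect = {pooled_effect:.3f} ± {pooled_se:.3f} (SE)")
# Garbage in / garbage out still applies: pooling low-quality studies
# just yields a more precise estimate of a biased answer.
```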
Science is the best tool we have for understanding reality, but it’s carried out by fallible humans, and results can be false or misinterpreted. Science is difficult and complicated, but don’t despair! Even with all its flaws, it’s still far more reliable than any other way of knowing.