Something Doesn't Add Up

Illustration: Alex Nabaum

Last June, Stanford orthopedic surgeon Eugene Carragee and his editorial team at the Spine Journal announced they had examined data that Medtronic Inc. presented a decade ago to get approval for the spinal bone graft product sold as Infuse.

Not only did the team find that evidence for Infuse's benefits over existing alternatives for most patients was questionable; they also discovered in a broad array of published research that risks of complications (including cancer, male sterility and other serious side effects) appeared to be 10 to 50 times higher than 13 industry-sponsored studies had shown. And they learned that authors of the early studies that found no complications had been paid between $1 million and $23 million annually by the company for consulting, royalties and other compensation. Carragee, MD '82, estimates Medtronic has sold several billion dollars' worth of Infuse for uses both approved and "off label." Medtronic issued a statement saying it believed the product was safe for approved use and gave a $2.5 million grant to Yale University researchers to review the data. Their analysis is expected this year.

The financial influence Carragee's team unearthed makes this case particularly jarring. Yet the phenomenon of flawed research is not new. Medical science studies routinely reverse or cast doubt on previous research that guided physicians' recommendations on everything from which fat we can safely slather on our morning muffin to some of the most invasive and expensive procedures doctors perform on the human body. No wonder many people feel less than confident when facing important health decisions. If it seems that the pace of these contradictory reports is picking up, it's not your imagination.

Prompted by soaring health-care costs and increasingly sophisticated analytical tools, more and more medical treatments are taking their turn under the microscope. One driving force is John P.A. Ioannidis, chief of the Stanford Prevention Research Center. He works with colleagues around the globe to scrutinize treatments that account for huge chunks of the health-care tab but that are, he says, virtually worthless and sometimes harmful. Ioannidis says financial influence is one of several factors that can, deliberately or unintentionally, skew study design and methodology and undermine the validity of published research findings. His extensive publications pointing out these problems are reverberating throughout the scientific community, and they threaten entire medical specialties that have organized themselves around big-ticket, but low-value, interventions.

"Fixing this involves a rethinking of the process that won't happen overnight and is not very cheap," Ioannidis says. "However, if we continue in the same path we will run out of money, as a country, even the whole world . . . there is a cloud of ineffective interventions, or minimally effective interventions that are extremely expensive and therefore not worth it. We need to sort out that mess."

There are many examples of biomedical flip-flops, but Ioannidis says two are particularly illustrative of the challenge he and his colleagues face.

For years, doctors monitoring women prescribed hormone therapy during menopause believed that it protected their patients from heart disease. However, in the mid-1990s a Women's Health Initiative study randomly assigned more than 16,000 women to a controlled trial comparing the common hormone treatment—a combination of estrogen and progestin—to a placebo. Stanford's Marcia L. Stefanick, PhD '82, a professor of medicine, chaired the steering committee and was principal investigator. The trial was supposed to last eight years but was stopped in 2002 after five years, when researchers discovered that the treatment—at the time routinely prescribed to millions of women—not only failed to offer any benefit against heart disease but also appeared to correlate with an increase in both stroke and breast cancer risk.

The other jaw-dropper was the 2007 COURAGE trial examining patients with stable coronary artery disease, or hardening of the arteries. It found that a widely performed procedure called percutaneous coronary intervention, usually involving the insertion of tiny metal scaffolds called stents to prop arteries open, did not reduce incidence of death or heart attacks in these patients.

Those two treatments "cost billions of dollars and supported the existence of entire specialties for many years," Ioannidis and his co-authors wrote in January in the Journal of the American Medical Association. Ioannidis says the data clearly show that patients were subjected to risk with no chance of benefit. While the number of prescriptions for combination hormone therapy dropped 80 percent or more in the years after the WHI study, the number of coronary interventions did not decline nearly as dramatically following the COURAGE trial. "Defenders of these therapies and interventions wrote rebuttals and editorials and fought for their specialties, but the reality was that the best that could be done was to abandon ship," Ioannidis wrote in JAMA.

Those sound like fighting words. Yet Ioannidis, 46, is a soft-spoken academic whose personal style belies the startling conclusions and impact of his work. "The purpose of my research is not necessarily to be challenging or killing sacred cows," he says. "I'm not very interested in showing that one particular research paper is wrong, or you did it wrong and I'm correct. My penchant is to look at the big picture, to look at hundreds of thousands of associations." And while Ioannidis advocates for rigorous review of all existing treatments, his principal aim is to improve research design and remove bias so that ineffective treatments never enter practice to begin with.

Ioannidis's current work stems from his deep love of math and statistics. He was born in New York City to physician parents but raised in Athens, Greece, where he excelled at math from a young age. He attended the University of Athens Medical School, added a PhD in biopathology, and later trained at Harvard and Tufts and joined the National Institutes of Health, where he worked on pivotal HIV research. These days, although he often collaborates on the design of specific studies, what he mostly does is meta-research, or the study of studies. Using powerful number-crunching programs and constantly evolving algorithms, Ioannidis analyzes many trials, each with many patients. He's working to see not so much whether one treatment works or does not work, or whether one association of a specific risk factor with one disease is true or false, but whether factors related to the research process—the number of patients tested, the criteria for including data, statistical errors in an analysis, even fraud or financial incentives—may have compromised the data and conclusions. He burst on the medical establishment radar in 2005 with a paper in PLoS Medicine asserting nothing less than: "Why Most Published Research Findings Are False."

Ioannidis says he began to realize in medical school that a lot of what he was being taught was grounded not in hard data and evenly applied criteria, but rather in the instincts and habits of practitioners. Time and again he would be taught the standard of care for a given diagnosis, and yet be unable to find evidence that it was the best choice—or sometimes even that it worked at all. "Experts were dissociated from the numbers," he observes.

During his early training, Ioannidis says, he grew intrigued with some pioneering work by the late Thomas Chalmers, former dean of the Mount Sinai School of Medicine and a researcher at Harvard and Tufts medical schools. As early as the 1970s, Chalmers began advocating for large-scale, randomized clinical trials. He also pioneered the field of meta-analysis, methods that combine the results of multiple trials, which led, for example, to the more widespread use of clot-busting drugs in treating heart attacks.
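To make the idea of combining trials concrete, here is a minimal sketch of one common pooling approach, fixed-effect inverse-variance weighting. It is an illustration only, not a description of the specific methods Chalmers or Ioannidis used, and the three trial results are invented for the example. Each hypothetical trial is too small to rule out "no effect" on its own, yet the pooled estimate is more precise than any single trial.

```python
import math


def pool_fixed_effect(estimates, std_errors):
    """Combine per-trial effect estimates with inverse-variance weights.

    Trials with smaller standard errors (more precise results) get
    proportionally more weight in the pooled estimate.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se


# Hypothetical log odds ratios (negative = treatment helps) and their
# standard errors from three small trials. These numbers are made up.
effects = [-0.30, -0.10, -0.25]
ses = [0.20, 0.25, 0.15]

pooled, se = pool_fixed_effect(effects, ses)
low, high = pooled - 1.96 * se, pooled + 1.96 * se

# Each single trial's 95% interval includes zero; check whether the pooled one does.
single_trial_crosses_zero = [
    (e - 1.96 * s) < 0 < (e + 1.96 * s) for e, s in zip(effects, ses)
]
print(f"pooled effect {pooled:.3f}, 95% interval ({low:.3f}, {high:.3f})")
print("each single trial inconclusive:", all(single_trial_crosses_zero))
```

A fixed-effect model assumes every trial is measuring the same underlying effect. When trials differ in patients or methods, analysts often prefer a random-effects model instead, which this sketch does not cover.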

Ioannidis laughs when he recalls how, early in his career, doing meta-analysis meant an infuriatingly slow process of going to the library, reading journals, and trying to compare researchers' methods to see if different studies reinforced each other's findings. Today, powerful computer programs and the growing practice of putting raw data from experiments in giant online databases have streamlined the process considerably. He can even do "meta-analyses of meta-analyses" and help large consortia of researchers harmonize the design and methods of their studies in advance so the information is more robust and comparable.

To put Ioannidis's role in perspective, it might be helpful to think of a given research investigator as a governor charged with trying to effectively create a budget, manage infrastructure and run a state. Ioannidis is like an economist for the federal government who evaluates the combined performance of all the states' efforts to grow and thrive, comparing differences, figuring out what works and doesn't, and what the resulting impact is on the national economy.

Right now, for example, he and his colleagues are researching the published results of thousands of different treatments to see if their benefits seem greater for patients in established market economies or in developing countries. If there is a difference, are there clues in the data as to why?

In another project, he is analyzing meta-analyses of expensive antibiotics. "The agenda of those funding the study can be biased, and so you may have a reliable large study or many large studies, but the framing of the question is too narrow and restricted. They compare an expensive antibiotic to a slightly less expensive antibiotic from the same company and find that both work and one works better. But they never compare the expensive antibiotic to penicillin or another inexpensive antibiotic. The studies are well done, but they ask an irrelevant question."

Studies underwritten by drug or medical device companies aren't the only research that risks being biased by financial incentives. Competition for funding from any source can influence researchers to focus on designing a study that is more likely to produce a positive outcome. What does "more likely" mean? Allowing too much leeway in determining which side effects are reported or finding reasons to toss out data that does not support a hypothesis can create skewed, unreliable results. There are also competitive professional pressures to get high-impact results published in top-tier journals, which means important confirmation studies for new findings can have a harder time getting published.

Biases are often unconscious, Ioannidis acknowledges. "Science itself is an unconscious bias. We want to discover things and make a difference. That doesn't mean that will happen. I can waste my whole career on something that doesn't matter." That pressure can inspire even a very good researcher to lose objectivity in ways that impact results. There was nothing nefarious about doctors doing observational studies about women taking hormones and extrapolating that they had less heart disease. The problem, Ioannidis says, is that it's likely that the women who sought and took the hormones during that period tended to be more health-conscious in general and probably made healthier lifestyle choices that led to their better heart health. It wasn't until a large randomized sample was scrutinized that the risk from the hormones themselves became clearer. The accumulation of problems like this led Ioannidis to conclude in the 2005 PLoS paper: "For most designs and settings, it is more likely for a research claim to be false than true."
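The arithmetic behind that conclusion can be sketched in a few lines. The function below follows the positive predictive value formula as it is usually stated from the 2005 paper: the chance that a claimed finding is true depends on the pre-study odds R that a tested relationship is real, the false-positive rate alpha, the false-negative rate beta, and a bias term u (the share of analyses that would not have been findings but get reported as findings anyway). The scenario values are illustrative assumptions, not figures from the article.

```python
def ppv(R, alpha=0.05, beta=0.2, u=0.0):
    """Probability that a claimed research finding is actually true.

    R     -- pre-study odds that a tested relationship is real
    alpha -- false-positive rate (significance threshold)
    beta  -- false-negative rate (1 - statistical power)
    u     -- bias: fraction of would-be non-findings reported as findings
    """
    true_positives = (1 - beta) * R + u * beta * R
    false_positives = alpha + u * (1 - alpha)
    return true_positives / (true_positives + false_positives)


# Illustrative scenarios (assumed values, chosen only to show the pattern).
well_powered_trial = ppv(R=1.0, beta=0.2)           # even odds, 80% power
exploratory_study = ppv(R=0.1, beta=0.8)            # long-shot hypothesis, 20% power
exploratory_with_bias = ppv(R=0.1, beta=0.8, u=0.2)  # same, plus moderate bias

for label, value in [
    ("well-powered trial", well_powered_trial),
    ("underpowered exploratory study", exploratory_study),
    ("exploratory study with bias", exploratory_with_bias),
]:
    verdict = "more likely true" if value > 0.5 else "more likely false"
    print(f"{label}: PPV = {value:.2f} ({verdict})")
```

With no bias, a finding is more likely true than false only when (1 - beta) * R exceeds alpha, which is why small studies chasing unlikely hypotheses fare so poorly under this model.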

The PLoS paper is the most viewed article in the history of Public Library of Science. Robert Tibshirani, a professor of health research and policy, and statistics at Stanford, agrees it was a watershed. "That paper really questioned the paradigm and showed that something is broken," he says. "People were suspecting a lot of positive results were borderline or not even true. John was so thorough in analyzing this. It's not like he took anecdotal stories and made big claims. His team worked for years and did the statistics and groundwork."

Dean of Medicine Philip Pizzo is a big proponent of Ioannidis's work and recruited him in 2010 from his post as chair of hygiene and epidemiology at the University of Ioannina School of Medicine in Greece to run the Stanford Prevention Research Center. "Because of the ever escalating cost of health care, medicine must become more evidence-based, and the foundations of care must be clear and defined criteria. Dr. Ioannidis's work compels the medical community to look more carefully and critically at medical recommendations that have 'assumed truth' but not critically and scientifically defensible foundations," Pizzo says.

Part of the appeal of coming to Stanford, Ioannidis says, was the emphasis on interdisciplinary collaboration. (Another was living in California; his wife, Despina Contopoulos-Ioannidis, is a pediatric infectious disease specialist and clinical associate professor at Stanford. They have one teenage daughter.) Both at SPRC and in Greece, where he maintains a research team at the University of Ioannina, and with scores of collaborators around the world, Ioannidis is training a new generation of research sleuths. In a 2011 study about so-called medical reversals, Ioannidis's colleague Vinay Prasad of Northwestern University analyzed 35 trials published in 2009 in a major medical journal that tested a variety of established clinical practices including prenatal care, cancer screening and surgical interventions. The researchers found that 46 percent of those trials reported results that contradicted current practice, and 3 percent were inconclusive. Something is seriously wrong. And yet, Ioannidis says, "Many standards of care are never tested."

One research area that has undergone a dramatic rethinking and huge improvements in terms of study design and replication thanks to meta-analysis by Ioannidis and others is genomics, the quest to identify the genes involved in various diseases and conditions. "Until five or six years ago, the paradigm was that we had 10,000 papers a year reporting one or more genes someone thought would be important for genetic disease," Ioannidis says. "Researchers would claim they found the gene for schizophrenia or alcohol addiction or whatever, but there was very little emphasis on replicating [their findings]. Whenever we tried to replicate, most of the time it didn't survive. Something like 99 percent of the literature was unreliable." It wasn't fraud or sloppy work, he says, but rather artifacts of the process that yielded false information.

Muin J. Khoury, director of the Office of Public Health Genomics at the Centers for Disease Control and Prevention, has worked with Ioannidis for more than a decade. Khoury says that with the advent of more so-called genome-wide association studies, which tease apart the relationships between and roles of large numbers of genes, and the increased reliance on networks and consortia, confidence in results has risen. To qualify for publication, studies now must show much higher statistical significance than in the early years and researchers must perform large-scale replication efforts. "In addition, we need to fund more translational research in genomics to evaluate interventions and health outcomes," Khoury says. "So many people collaborate with [Ioannidis] now. He's a world-class expert who trains and works with many people to conduct collaborative research and research synthesis."

When it comes to the public's exposure to biomedical research findings, another frustration for Ioannidis is that "there is nobody whose job it is to frame this correctly." Journalists pursue stories about cures and progress—or scandals—but they aren't likely to diligently explain the fine points of clinical trial bias and why a first splashy result may not hold up. Ioannidis believes that mistakes and tough going are at the essence of science. "In science we always start with the possibility that we can be wrong. If we don't start there, we are just dogmatizing."

Much was wrong in the case of Infuse. Stanford's Carragee became worried in 2006 that the original industry-sponsored publications did not accurately reflect the risks of the product. He had noticed an increasing number of reports of complications in the medical literature. A leading spine surgeon's research paper on Infuse had been found to be fraudulent. And several federal investigations were looking into the company's compensation of surgeons and its promotion practices. In the editorial accompanying the team's investigative report, the Spine Journal stated, "It harms patients to have biased and corrupted research published. It harms patients to have unaccountable special interests permeate medical research. It harms patients when poor publication practices become business as usual. Yet harm has been done. And that fact creates a basic moral obligation." After Medtronic announced it would fund future study, Carragee noted that because the early studies did not connect patients' side effects to Infuse, "the voluntary reporting of FDA-recognized adverse events . . . by practicing surgeons was handicapped," and that continues to delay a timely understanding of these connections.

Making sure that good data support interventions will become even more important with the evolution of personalized medicine, in which treatments are uniquely designed for the individual profile of a patient. In theory, personalized medicine could reduce health costs by delivering more effective early treatment. However, without reliable data, personalized medicine will remain an expensive and destructive illusion. That's one reason Ioannidis is helping create a new population studies initiative at the School of Medicine, designed to corral Stanford's diverse resources to better analyze health in the context of entire populations. Ioannidis explains: "It's not possible to provide the best health care for one person without understanding how the health of the larger group is determined."

Without a rigorous, data-driven context, medicine's expensive traditions and hunch-based treatments threaten to bankrupt us. "People say that we shouldn't delay science; people are dying; we should get new treatments out there. I do not feel the pressure to do that until we have solid evidence," Ioannidis asserts. "The resources many procedures draw are enormous." And that leaves insufficient funds for the prevention plans and treatments we know actually work.

Joan O'Connell Hamilton, '83, is a frequent contributor to Stanford based in Menlo Park.