No Good Drug Left Behind: Sensitivity and Specificity in Drug Development Explained

Drug-induced liver injury (DILI), a leading cause of safety-related clinical trial failures and market withdrawals around the world, has been a persistent threat to drug development for decades [1,2]. Animals such as rats, dogs, and monkeys serve as the last line of defense against DILI, catching a drug's toxic effects before it reaches humans. Yet differences between species severely limit these models, and the consequences of this gap are borne out in halted clinical trials and even patient deaths.

Put simply, non-human preclinical models limit the number of life-saving drugs that make it to market, and they have dire consequences when they are wrong.

In preclinical drug development, researchers use a wide range of models, including animals, spheroids, and transwells, to determine which drugs should advance to clinical trials. Whether scientists make the right decision depends largely on the quality of the models they use, and that quality is measured in terms of both sensitivity and specificity.

The Importance of Model Sensitivity

In this context, “sensitivity” describes how often a model correctly identifies a toxic drug candidate as toxic. A model with 100% sensitivity would therefore flag every harmful drug candidate. If the model system lets a toxic drug pass without a strong indication of harm, it supports the false conclusion that the drug is non-toxic, a result known as a “false negative.”
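A minimal Python sketch of the definition: sensitivity is the number of toxic candidates the model flags (true positives) divided by the number of truly toxic candidates (true positives plus false negatives). The function name `sensitivity` and the five-candidate screen are hypothetical illustrations, not data from any study.

```python
from typing import Sequence


def sensitivity(is_toxic: Sequence[bool], flagged_toxic: Sequence[bool]) -> float:
    """Fraction of truly toxic candidates that the model flags as toxic: TP / (TP + FN)."""
    if len(is_toxic) != len(flagged_toxic):
        raise ValueError("label and prediction lists must have the same length")
    # True positives: toxic candidates the model flagged.
    true_positives = sum(1 for truth, flag in zip(is_toxic, flagged_toxic) if truth and flag)
    # False negatives: toxic candidates the model let through.
    false_negatives = sum(1 for truth, flag in zip(is_toxic, flagged_toxic) if truth and not flag)
    toxic_total = true_positives + false_negatives
    if toxic_total == 0:
        raise ValueError("sensitivity is undefined without any toxic candidates")
    return true_positives / toxic_total


# Hypothetical screen of five candidates, four of which are truly toxic.
truth = [True, True, True, True, False]
flags = [True, True, True, False, False]  # the fourth toxic drug slips through: a false negative
print(sensitivity(truth, flags))  # 0.75
```

The non-toxic fifth candidate does not affect the result at all. This is why sensitivity alone says nothing about how a model treats safe drugs.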

False negatives can allow harmful drug candidates to reach human trials. To avoid this, researchers have long sought models that closely approximate the human body. For more than 80 years, animals have filled that need; however, they are far from perfect. Genetic and physiological differences can produce discrepancies in drug response: a drug that appears safe in rats may turn out to be lethal in humans, which is exactly this kind of false negative.

With roughly 90% of drugs that enter clinical trials failing, many because of safety concerns, it is clear that animal models are far from 100% sensitive [3]. Unfortunately, researchers have yet to produce robust data on the sensitivity of animal models, particularly in preclinical toxicology [4]. Perhaps the main reason is the assumption that animals are as good as it gets, a belief that can eclipse the desire for proof. However, given that 30–40% of these clinical failures are due to toxicity, animal studies presumably did not provide sufficient evidence to forecast these drugs' toxicity in humans, and more sensitive models could have helped researchers predict their toxic effects.
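The two figures in this paragraph can be combined into a back-of-the-envelope estimate. Taking both at face value, and assuming the 30–40% share refers to the drugs that fail, the fraction of trial entrants lost to toxicity is:

```latex
P(\text{fail due to toxicity}) = P(\text{fail}) \times P(\text{toxicity} \mid \text{fail})
  \approx 0.90 \times (0.30 \text{ to } 0.40) \approx 0.27 \text{ to } 0.36
```

In other words, roughly 27 to 36 of every 100 drugs entering clinical trials would fail for toxicity-related reasons. Each of those failures is a potential false negative from preclinical testing.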

Researchers measure a model's sensitivity by screening a set of test drug candidates, and that set must be selected carefully. Otherwise, one could, for example, bias the test set toward “easy” drugs that are so toxic that even simple models would flag them. Showing that a new model can do this likely does not demonstrate its ability to catch the difficult drugs that animal testing deems safe. It would be like claiming that a telescope's ability to spot the sun makes it sensitive enough to observe stars: technically true, but irrelevant to real-world challenges. Even so, it is not uncommon for preclinical models to be tested simply with highly toxic drugs that never made it to clinical trials [5,6].
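The following hypothetical Python sketch shows how test-set composition can inflate a reported sensitivity. Every candidate below is assumed to be truly toxic. Each is tagged either “easy” (overtly toxic) or “hard” (passed animal testing but harmful in humans). The `Candidate` class, the tier labels, and the counts are illustrative assumptions, not data from any study.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Candidate:
    name: str            # hypothetical identifier, not a real compound
    difficulty: str      # "easy": overtly toxic; "hard": passed animal testing but toxic in humans
    flagged_toxic: bool  # the model's call


def sensitivity_by_difficulty(candidates: List[Candidate]) -> Dict[str, float]:
    """Sensitivity per difficulty tier; every candidate is assumed to be truly toxic."""
    tiers: Dict[str, List[bool]] = {}
    for c in candidates:
        tiers.setdefault(c.difficulty, []).append(c.flagged_toxic)
    # Within each tier, sensitivity is the share of (toxic) candidates that were flagged.
    return {tier: sum(flags) / len(flags) for tier, flags in tiers.items()}


# Hypothetical screen: eight "easy" toxic drugs and two "hard" ones.
screen = [Candidate(f"easy-{i}", "easy", True) for i in range(8)] + [
    Candidate("hard-0", "hard", False),
    Candidate("hard-1", "hard", True),
]
pooled = sum(c.flagged_toxic for c in screen) / len(screen)
print(pooled)                             # 0.9
print(sensitivity_by_difficulty(screen))  # {'easy': 1.0, 'hard': 0.5}
```

The pooled figure of 90% hides the fact that the model caught only half of the hard drugs, which are the ones that matter most. Reporting sensitivity per tier makes that gap visible.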

While sensitivity is important, it is not enough: models must also correctly identify non-toxic drugs as non-toxic. That is, they must also have high specificity.

The Give and Take of Specificity and Sensitivity

“Specificity” refers to how accurately a model identifies non-toxic drug candidates. A model with 100% specificity would never label a non-toxic candidate as toxic. Importantly, a model can be 100% sensitive without being very specific. For example, an overeager model that calls most candidates “toxic” may catch all toxic candidates (100% sensitivity) but also mislabel many non-toxic candidates as toxic (mediocre specificity).
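A minimal Python sketch, using made-up labels, tallies all four outcomes of a binary toxicity call. It reports specificity, TN / (TN + FP), alongside sensitivity, TP / (TP + FN). The function name and the example calls are illustrative assumptions. The example mimics the overeager model described above, which flags every toxic candidate but also most of the safe ones.

```python
from typing import Sequence, Tuple


def sensitivity_and_specificity(
    is_toxic: Sequence[bool], flagged_toxic: Sequence[bool]
) -> Tuple[float, float]:
    """Return (TP / (TP + FN), TN / (TN + FP)) for binary toxicity calls."""
    if len(is_toxic) != len(flagged_toxic):
        raise ValueError("label and prediction lists must have the same length")
    tp = fn = tn = fp = 0
    for truth, flag in zip(is_toxic, flagged_toxic):
        if truth and flag:
            tp += 1  # toxic candidate correctly flagged
        elif truth:
            fn += 1  # toxic candidate missed (false negative)
        elif flag:
            fp += 1  # non-toxic candidate wrongly flagged (false positive)
        else:
            tn += 1  # non-toxic candidate correctly cleared
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one toxic and one non-toxic candidate")
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical screen: three toxic and four non-toxic candidates.
truth = [True, True, True, False, False, False, False]
# An overeager model that calls every candidate "toxic" except one safe drug.
overeager = [True, True, True, True, True, True, False]
print(sensitivity_and_specificity(truth, overeager))  # (1.0, 0.25)
```

Perfect sensitivity here comes at the price of clearing only one of four safe drugs. The two numbers must be read together.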

Researchers want the most sensitive preclinical toxicology models possible, as higher sensitivity means more successful clinical trials, safer patients, and better economics. However, this cannot come at the cost of low specificity and the risk of failing good drugs. An overly sensitive model with a low bar for calling a drug toxic would catch all toxic drugs, but it might also flag drugs that are actually safe and effective in humans. Good drugs are rare, and considerable effort and investment go into their development. Even one drug that never reaches the clinic can cost a pharmaceutical company billions and leave a patient population without treatment. Models should therefore do their utmost to classify non-toxic compounds as non-toxic, that is, to have 100% specificity.

But how can drug development insist on perfect specificity when no model is perfect? Fortunately, sensitivity and specificity involve a give-and-take that model developers can exploit: one can be traded for the other.

In decision analysis, sensitivity and specificity can be “dialed in” for the model in question. In most cases, this involves setting a threshold when analyzing the model's output. In a recent study published in Communications Medicine, part of Nature Portfolio, Ewart et al. set a threshold of 375 on the quantitative output of the Emulate human Liver-Chip, an advanced three-dimensional culture system that mimics human liver tissue. For hepatic spheroids, an older model system, researchers have set a threshold of 50. In both cases, a drug is flagged as toxic when its output falls below the threshold, so the higher the threshold, the more sensitive and less specific the model tends to be. These thresholds were selected precisely to dial the systems into 100% specificity.
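The threshold mechanism can be sketched in Python under stated assumptions. Each candidate receives a numeric score, and a candidate is called toxic when its score falls below the threshold, so raising the threshold raises sensitivity. The scores and candidate thresholds are invented for illustration and are not taken from Ewart et al. or any spheroid study. Sweeping the threshold shows the trade-off. The helper `most_sensitive_fully_specific` (a hypothetical name) then picks the threshold with the highest sensitivity among those that keep specificity at 100%.

```python
from typing import Optional, Sequence, Tuple


def rates_at_threshold(
    scores: Sequence[float], is_toxic: Sequence[bool], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when candidates scoring below `threshold` are called toxic.

    Assumes at least one toxic and one non-toxic candidate are present.
    """
    flagged = [score < threshold for score in scores]
    pairs = list(zip(flagged, is_toxic))
    tp = sum(1 for f, t in pairs if f and t)          # toxic and flagged
    fn = sum(1 for f, t in pairs if not f and t)      # toxic but missed
    tn = sum(1 for f, t in pairs if not f and not t)  # safe and cleared
    fp = sum(1 for f, t in pairs if f and not t)      # safe but flagged
    return tp / (tp + fn), tn / (tn + fp)


def most_sensitive_fully_specific(
    scores: Sequence[float], is_toxic: Sequence[bool], thresholds: Sequence[float]
) -> Optional[float]:
    """Return the candidate threshold with the highest sensitivity among those giving 100% specificity."""
    best: Optional[float] = None
    best_sensitivity = -1.0
    for threshold in thresholds:
        sens, spec = rates_at_threshold(scores, is_toxic, threshold)
        if spec == 1.0 and sens > best_sensitivity:
            best, best_sensitivity = threshold, sens
    return best


# Hypothetical scores (lower means more concerning): five toxic drugs, then three safe ones.
scores = [5, 20, 60, 150, 900, 500, 1200, 3000]
is_toxic = [True, True, True, True, True, False, False, False]
thresholds = [25, 100, 200, 600, 1000]

for t in thresholds:
    sens, spec = rates_at_threshold(scores, is_toxic, t)
    print(f"threshold {t:>4}: sensitivity {sens:.2f}, specificity {spec:.2f}")

print(most_sensitive_fully_specific(scores, is_toxic, thresholds))  # 200
```

With these made-up numbers, raising the threshold from 200 to 600 starts flagging a safe drug. The fully specific choice is therefore 200, which still misses one toxic drug. This is the same compromise, at toy scale, as choosing a threshold for 100% specificity.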

Ewart et al. found that even while maintaining such strict specificity, the Liver-Chip achieved a staggering 87% sensitivity. In addition to correctly identifying most of the toxic drugs, the Liver-Chip never misidentified a non-toxic drug in the study as toxic. For drug developers, this means that, at least for the drugs in this study, no good drugs, nor the considerable resources poured into their development, would have been wasted. Using models like Organ-Chips that achieve high sensitivity alongside perfect specificity would allow drug developers to deprioritize potentially dangerous drugs without sacrificing good ones. Altogether, this could lead to more productive drug development pipelines, safer drugs progressing to clinical trials, and more patient lives saved.


  1. Craveiro, Nuno Sales, et al. “Drug Withdrawal due to Safety: A Review of the Data Supporting Withdrawal Decision.” Current Drug Safety, vol. 15, no. 1, 3 Feb. 2020, pp. 4–12.
  2. Center for Drug Evaluation and Research. “Drug-Induced Liver Injury: Premarketing Clinical Evaluation.” U.S. Food and Drug Administration, 17 Oct. 2019, ulatory-information/search-fda-guidance-documents/drug-induced-liver-injury-premarketing-clinical-evaluation.
  3. Fogel, David B. “Factors Associated with Clinical Trials That Fail and Opportunities for Improving the Likelihood of Success: A Review.” Contemporary Clinical Trials Communications, vol. 11, Sept. 2018, pp. 156–164, ticles/PMC6092479/.
  4. Bailey, Jarrod, et al. “An Analysis of the Use of Animal Models in Predicting Human Toxicology and Drug Safety.” Alternatives to Laboratory Animals, vol. 42, no. 3, June 2014, pp. 181–199.
  5. Zhou, Yitian, et al. “Comprehensive Evaluation of Organotypic and Microphysiological Liver Models for Prediction of Drug-Induced Liver Injury.” Frontiers in Pharmacology, vol. 10, 24 Sept. 2019. Accessed 22 Nov. 2020.
  6. Bircsak, Kristin M., et al. “A 3D Microfluidic Liver Model for High Throughput Compound Toxicity Screening in the OrganoPlate®.” Toxicology, vol. 450, Feb. 2021, p. 152667.