Data Quality 2.0: The Future of Real-World Evidence

The need for a data-quality framework is clear as real-world data (RWD) becomes increasingly complex and diverse. Since our focus is EHR data quality, though, let’s first take a closer look at RWD derived from electronic health records (EHRs).

Flatiron Health’s RWD is curated from the EHRs of a nationwide network of academic and community cancer clinics. The richest clinical data, such as stage at diagnosis and clinical endpoints, resides in unstructured fields. Extracting that data is challenging and requires a combination of human review (including 2,000 human abstractors at Flatiron), machine learning, and natural language processing.

And quality clearly matters. Regulatory and policy guidance on the use of RWD has grown recently, from the FDA, EMA, NICE, Duke-Margolis Health Policy Center, and others. It’s also clear that quality is not a single concept: it has multiple dimensions, which fall into two categories, relevance and reliability.

Assessing RWD: relevance.

Relevance of the source data has several subdimensions:

  • Availability: Are critical data fields representing exposures, covariates, and outcomes available?
  • Representativeness: Do patients in the dataset represent the population you want to study (e.g., patients on a particular cancer therapy)?
  • Sufficiency: Is the population large enough? Is there enough follow-up time in the data source to demonstrate the expected outcomes (e.g., survival, adverse events)?
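
To make the sufficiency questions concrete, here is a minimal Python sketch that screens a cohort for size and observed follow-up. The Patient fields, the 30.44-day month, and the thresholds are illustrative assumptions, not Flatiron definitions. It summarizes follow-up as the simple median of observed time from an index date to the last documented activity; the reverse Kaplan-Meier method is a common, more formal alternative.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median

DAYS_PER_MONTH = 30.44  # average month length; an illustrative convention


@dataclass
class Patient:
    patient_id: str
    index_date: date          # hypothetical: e.g., start of the therapy being studied
    last_activity_date: date  # hypothetical: last documented visit, lab, or treatment


def follow_up_months(p: Patient) -> float:
    """Observed follow-up from the index date to the last documented activity."""
    return (p.last_activity_date - p.index_date).days / DAYS_PER_MONTH


def is_sufficient(cohort, min_patients, min_median_months):
    """Return (ok, cohort size, median observed follow-up in months)."""
    n = len(cohort)
    if n == 0:
        return False, 0, 0.0
    med = median(follow_up_months(p) for p in cohort)
    ok = n >= min_patients and med >= min_median_months
    return ok, n, med


# Hypothetical three-patient cohort.
cohort = [
    Patient("a", date(2020, 1, 1), date(2021, 1, 1)),
    Patient("b", date(2020, 3, 1), date(2020, 9, 1)),
    Patient("c", date(2019, 6, 1), date(2022, 6, 1)),
]
ok, n, med = is_sufficient(cohort, min_patients=3, min_median_months=12)
print(ok, n, round(med, 1))
```

In practice the minimum cohort size and follow-up would come from the research question, for example the follow-up needed to observe a 12-month survival outcome.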

These subdimensions are traditionally assessed in support of a specific research question or use case. At Flatiron, however, we must think more broadly to ensure our multipurpose datasets capture variables that address the most common and important use cases (e.g., natural history, treatment patterns, safety, and efficacy).

We also consider relevance as we expand our network. Rather than relying only on community clinics that use our EHR software, OncoEMR®, we intentionally partner with academic centers that use other software. This increases the number of patients represented and keeps our network aligned with where cancer patients actually receive care.

Assessing RWD: reliability.

Another dimension of quality is reliability, which has several critical subdimensions:

  • Accuracy: How well does the data measure what it is actually supposed to measure?
  • Completeness: How much of the data is present or absent for the cohort studied?
  • Provenance: What is the origin of a piece of data, and how and why did it get to its present place? This includes a record of transformations from the point of collection to the final database.
  • Timeliness: Is the collected and curated data recent enough that its period of coverage reflects reality? Are documents refreshed in real time?
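
Provenance is the most mechanical of these subdimensions to capture. As a minimal sketch, assuming a hypothetical DataPoint wrapper (not a Flatiron or library API), each transformation appends an entry recording what changed, who or what changed it, and why, so the trail from the point of collection to the final value can be reconstructed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass
class ProvenanceStep:
    step: str          # what happened (hypothetical labels such as "abstract", "harmonize")
    actor: str         # who or what did it: a reviewer role or a pipeline component
    reason: str        # why the transformation was applied
    from_value: Any
    to_value: Any
    timestamp: datetime


@dataclass
class DataPoint:
    name: str
    value: Any
    origin: str                                          # point of collection
    history: List[ProvenanceStep] = field(default_factory=list)

    def transform(self, new_value: Any, step: str, actor: str, reason: str,
                  when: Optional[datetime] = None) -> "DataPoint":
        """Replace the value and append a record of how and why it changed."""
        stamp = when if when is not None else datetime.now(timezone.utc)
        self.history.append(
            ProvenanceStep(step, actor, reason, self.value, new_value, stamp))
        self.value = new_value
        return self

    def lineage(self) -> List[str]:
        """Readable trail from the point of collection to the current value."""
        trail = [f"origin: {self.origin}"]
        trail += [f"{s.step} by {s.actor}: {s.from_value!r} -> {s.to_value!r} ({s.reason})"
                  for s in self.history]
        return trail


# Hypothetical example: a stage value abstracted from a note, then harmonized.
stage = DataPoint("stage", "Stage IIIA per clinic note", "EHR progress note")
stage.transform("IIIA", "abstract", "human abstractor", "structure free-text stage")
stage.transform("III", "harmonize", "pipeline", "group substages for analysis")
print("\n".join(stage.lineage()))
```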

At Flatiron, we have developed processes and infrastructure, built on clear operational definitions, to ensure our data is reliable. Our clinical and scientific experts help establish these processes, whether the data is curated by an ML algorithm or by human abstractors following abstraction guidance.

How does Flatiron ensure accuracy through validation?

We perform validation at multiple levels throughout the data lifecycle, e.g., at the field level at the time of data entry, and at the cohort level. At each level we use different quantitative and statistical approaches, with a range of metrics suited to the approach.

Figure 1:
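
At the field level, one common way to quantify accuracy is to compare a curated binary variable against a reference standard and report sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV). The sketch below is a generic illustration of those metrics; the function name and data are hypothetical, and the post does not specify which metrics Flatiron uses for a given field.

```python
import math


def agreement_metrics(test, reference):
    """Compare a curated binary variable with a reference standard, record by record."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    pairs = [(bool(t), bool(r)) for t, r in zip(test, reference)]
    tp = sum(t and r for t, r in pairs)
    fp = sum(t and not r for t, r in pairs)
    fn = sum(not t and r for t, r in pairs)
    tn = sum(not t and not r for t, r in pairs)

    def ratio(num, den):
        return num / den if den else math.nan  # undefined when the denominator is zero

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
    }


# Hypothetical field, e.g. "biomarker test documented": curated vs. reference standard.
curated = [True, True, False, False, True]
reference = [True, False, False, True, True]
print(agreement_metrics(curated, reference))
```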

Examples of validation approaches we use at Flatiron Health include:

  • External reference standard: For example, we use the National Death Index (NDI) to validate a composite mortality variable, date of death, which is algorithmically derived from the EHR and other sources such as the Social Security Death Index (SSDI) and obituary data. In Figure 2, we examined survival curves that used our mortality variable to define the time-to-event outcome, and found essentially the same curve whether death dates came from the NDI or from our mortality variable.
  • Indirect benchmark: A benchmark derived from the literature or clinical practice. For example, we validated a novel real-world progression variable by correlating it with the literature and related endpoints (Figure 2). Each curve represents a different time-to-event (TTE) analysis, and the expected correlations appear among progression-free survival, overall survival, time to next treatment (TTNT), and time to progression (TTP).
  • Internal reference standard: An approach we typically use when evaluating a novel curation process such as machine learning. For example, in testing our ML algorithms, we used human-abstracted data as our “reference standard.” In Figure 2, you see two survival curves for patients with ROS1-positive non-small cell lung cancer (NSCLC), one from each curation process. The curves closely overlap, demonstrating very similar results with each curation process.

Figure 2:
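
The survival-curve comparisons in Figure 2 rest on time-to-event estimation. A minimal sketch, assuming the standard Kaplan-Meier product-limit estimator (the post does not say which estimator or software produced Figure 2), fits one curve per data source and reports the largest vertical gap between them. The function names and data are hypothetical, and the gap is only a descriptive summary of how closely two curves overlap, not a formal test such as the log-rank test.

```python
def kaplan_meier(times, events):
    """Product-limit estimate: list of (time, S(time)) at each distinct event time.

    events[i] is 1 if patient i had the event (e.g., death) at times[i], 0 if censored.
    Patients censored at an event time are still counted as at risk at that time.
    """
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = removed = 0
        while i < len(pairs) and pairs[i][0] == t:  # group tied times
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed
    return curve


def survival_at(curve, t):
    """Evaluate the right-continuous step function S(t)."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


def max_gap(curve_a, curve_b):
    """Largest vertical distance between two KM curves (a descriptive summary only)."""
    grid = sorted({t for t, _ in curve_a} | {t for t, _ in curve_b})
    return max((abs(survival_at(curve_a, t) - survival_at(curve_b, t)) for t in grid),
               default=0.0)


# Hypothetical months-to-event for the same five patients under two data sources.
source_a = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])
source_b = kaplan_meier([1, 2, 2, 3, 5], [1, 1, 0, 1, 1])
print(source_a, source_b, max_gap(source_a, source_b))
```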

Accuracy: verification checks.

Using clinical knowledge, we also monitor data over time and address discrepancies and outliers through different types of verification checks:

  • Conformance is the compliance of data values with internal relational, formatting, or computational definitions or standards.
  • Plausibility is the believability or truthfulness of data values.
  • Consistency is the stability of a data value within a dataset, across linked datasets, or over time.

For example, clinical experts evaluate the temporal plausibility of a patient’s timeline of diagnosis, treatment sequence, and follow-up to assess whether the data are logically believable.
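
The three check types map naturally onto simple rules. The sketch below illustrates one of each on a single patient record: conformance (dates parse as ISO 8601), temporal plausibility (diagnosis before treatment start, treatment start before last follow-up), and consistency (a field agrees with a linked dataset). The field names and rules are hypothetical; real checks would encode clinical expertise, such as allowed exceptions, that a few lines of code cannot capture.

```python
from datetime import date

# Hypothetical field names; real rules would come from clinical experts.
DATE_FIELDS = ["diagnosis_date", "treatment_start_date", "last_followup_date"]
EXPECTED_ORDER = [  # (earlier, later) pairs expected in a plausible timeline
    ("diagnosis_date", "treatment_start_date"),
    ("treatment_start_date", "last_followup_date"),
]


def parse_date(value):
    """Return a date for an ISO 8601 string such as '2021-03-01', else None."""
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        return None


def check_conformance(record):
    """Conformance: each date field is present and in the expected format."""
    return [f"conformance: {f} is missing or not an ISO 8601 date"
            for f in DATE_FIELDS if parse_date(record.get(f)) is None]


def check_plausibility(record):
    """Temporal plausibility: events occur in a clinically logical order."""
    issues = []
    for earlier, later in EXPECTED_ORDER:
        a, b = parse_date(record.get(earlier)), parse_date(record.get(later))
        if a is not None and b is not None and a > b:
            issues.append(f"plausibility: {earlier} ({a}) is after {later} ({b})")
    return issues


def check_consistency(record, linked, fields):
    """Consistency: the same value appears in a linked dataset."""
    return [f"consistency: {f} is {record.get(f)!r} here but {linked.get(f)!r} in linked data"
            for f in fields if record.get(f) != linked.get(f)]


patient = {"diagnosis_date": "2021-03-01", "treatment_start_date": "2021-02-01",
           "last_followup_date": "2021/06/01", "birth_year": 1950}
for issue in (check_conformance(patient) + check_plausibility(patient)
              + check_consistency(patient, {"birth_year": 1951}, ["birth_year"])):
    print(issue)
```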

Reliability: completeness.

Completeness is a critical complement to accuracy in assessing reliability. It’s not enough for data to be accurate; it must be present first! We recognize, though, that completeness in EHR-based data is unlikely to reach 100%.

Data flows through many channels between the exam room and the final dataset, and each step along the way is a point at which some elements may be lost, mislabeled, or inappropriately transformed. To ensure completeness meets an acceptable level of quality, we put controls and processes in place across multiple levels to monitor it, with thresholds based on clinical expectations.

In addition, integrating sources within or beyond the EHR can improve completeness.
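
A field-level completeness monitor can be as simple as the share of records with a non-missing value, compared with a per-field threshold. In this minimal sketch the function name, field names, and threshold values are made up for illustration; in practice, thresholds are based on clinical expectations as described above.

```python
def completeness_report(records, thresholds):
    """Share of records with a non-missing value per field, and whether it meets the threshold."""
    n = len(records)
    report = {}
    for field_name, minimum in thresholds.items():
        present = sum(1 for r in records if r.get(field_name) not in (None, ""))
        share = present / n if n else 0.0
        report[field_name] = (share, share >= minimum)
    return report


# Hypothetical records and thresholds, for illustration only.
records = [
    {"stage": "III", "smoking_status": None},
    {"stage": "", "smoking_status": "former"},
    {"stage": "IV", "smoking_status": "never"},
    {"stage": "II"},
]
thresholds = {"stage": 0.70, "smoking_status": 0.80}
for name, (share, ok) in completeness_report(records, thresholds).items():
    print(f"{name}: {share:.0%} complete, {'OK' if ok else 'below threshold'}")
```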

In summary.

Understanding the quality of RWD is critical to developing the right analytic approach. But quality cannot be captured by a single number; it has multiple dimensions. At Flatiron, cross-disciplinary expertise applied across the data lifecycle, together with a commitment to data transparency, ensures our data users have the knowledge they need to generate impactful real-world evidence (RWE).


Emily Castellanos

MD, MPH, Senior Medical Director, Flatiron Health