Solving Pharma’s Data Silo Problem: Why a Linkable Data Infrastructure is Good for Biopharma and Good for Patients

Each year, bio­phar­ma com­pa­nies spend bil­lions of dol­lars gen­er­at­ing, buy­ing and an­a­lyz­ing da­ta. But a ba­sic prob­lem con­tin­ues to plague every­one from the small­est biotech start­up to the largest For­tune 100 com­pa­nies: not be­ing able to con­nect the da­ta they’ve amassed at the pa­tient lev­el while pro­tect­ing pa­tient pri­va­cy. For decades, there hasn’t been a so­lu­tion.

COVID-19 has de­mand­ed un­prece­dent­ed speed and in­no­va­tion in drug de­vel­op­ment, and the time is right to re-think what’s pos­si­ble.

Con­nect­ing phar­ma’s siloed da­ta is not on­ly pos­si­ble, it’s al­ready be­ing done. With­in the last two years, com­pa­nies across the in­dus­try have rec­og­nized this is a solv­able tech­nol­o­gy and co­or­di­na­tion prob­lem. Es­tab­lish­ing a link­able da­ta in­fra­struc­ture — in which all of a phar­ma com­pa­ny’s da­ta can be linked at the pa­tient lev­el — is be­com­ing the key­stone of a next-gen­er­a­tion, pa­tient-cen­tric da­ta strat­e­gy.

Phar­ma’s Da­ta Si­lo Prob­lem

A sin­gle, siloed dataset can be use­ful for an­swer­ing on­ly a lim­it­ed set of ques­tions. For ex­am­ple:

  • Clin­i­cal tri­als are used to gath­er da­ta on drug ef­fi­ca­cy and safe­ty.
  • Be­fore launch, to un­der­stand the cur­rent mar­ket and tar­get pa­tient pop­u­la­tion, phar­ma com­pa­nies might pur­chase third-par­ty phar­ma­cy and med­ical claims, lab da­ta and EHR da­ta.
  • Post-launch, phar­ma com­pa­nies may use prod­uct reg­istries or third-par­ty, re­al-world da­ta like claims, EHR, and mor­tal­i­ty da­ta to con­duct HEOR in sup­port of re­im­burse­ment.
  • To im­prove pa­tient ac­cess and ad­her­ence, they might al­so use their pro­pri­etary da­ta, in­clud­ing da­ta from pa­tient sup­port pro­grams and spe­cial­ty phar­ma­cies.

Each question may require its own dataset (or multiple datasets) to answer, and connecting that data at the patient level isn’t straightforward. To protect patient privacy, pharma largely works with de-identified, tokenized data. Tokenization replaces identifying information with random-looking strings of characters, or tokens, that cannot be reversed to reveal the underlying information. Tokens are also consistent: the same identifying information will generate the same token every time, so the process can be used to link de-identified patient records across datasets while protecting patient privacy.
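To make the two properties concrete (tokens are consistent and are not meant to be reversed), here is a minimal sketch that assumes a keyed hash (HMAC-SHA256) over normalized identifiers. It is an illustrative stand-in, not a description of how any particular vendor tokenizes data; the `tokenize` function, the key, and the choice of fields are all hypothetical.

```python
import hashlib
import hmac

# Hypothetical secret key, for illustration only; a real deployment would
# generate and manage keys securely rather than hard-coding them.
SECRET_KEY = b"example-site-key"


def tokenize(first_name: str, last_name: str, dob: str, key: bytes = SECRET_KEY) -> str:
    """Return a deterministic token for a patient's identifying fields."""
    # Normalize so that trivial formatting differences (case, stray spaces)
    # do not change the resulting token.
    normalized = "|".join(part.strip().lower() for part in (first_name, last_name, dob))
    # A keyed one-way hash: the same input and key always give the same
    # token, and the token does not expose the underlying identifiers.
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# The same patient, recorded slightly differently in two source systems.
token_a = tokenize("Jane", "Doe", "1980-01-31")
token_b = tokenize(" jane ", "DOE", "1980-01-31")

# A different patient.
token_c = tokenize("John", "Doe", "1980-01-31")

print(token_a == token_b)  # same identifying information, same token
print(token_a == token_c)  # different patient, different token
```

Because the token depends only on the normalized identifiers and the key, two datasets tokenized this way can be joined on the token column without either one carrying names or birth dates.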

His­tor­i­cal­ly, the util­i­ty of this ap­proach to pri­va­cy-pre­serv­ing da­ta link­age has been lim­it­ed be­cause each dataset that a phar­ma com­pa­ny us­es has been de-iden­ti­fied us­ing a dif­fer­ent to­k­eniza­tion scheme.

With­out us­ing a com­mon to­ken, or “key”, every dataset be­comes its own si­lo. And with­out be­ing able to link it to oth­er pa­tient-lev­el da­ta, much of the dataset’s val­ue re­mains locked away.
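The silo effect can be sketched in a few lines. The example below assumes, purely for illustration, that each "tokenization scheme" is a keyed hash (HMAC-SHA256) with its own key; the vendor names, keys, and patient strings are hypothetical.

```python
import hashlib
import hmac


def tokenize(identifier: str, key: bytes) -> str:
    """Deterministic keyed hash of a normalized identifier (illustrative scheme)."""
    normalized = identifier.strip().lower()
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# The same two (fictional) patients appear in both source datasets.
patients = ["jane doe|1980-01-31", "john roe|1975-06-15"]

# Each hypothetical vendor de-identifies with its own scheme (its own key).
claims_tokens = {tokenize(p, b"claims-vendor-key") for p in patients}
ehr_tokens = {tokenize(p, b"ehr-vendor-key") for p in patients}

# With different schemes, no tokens line up, so nothing can be joined.
silo_overlap = claims_tokens & ehr_tokens

# With one common scheme, every patient lines up across the two datasets.
common_claims = {tokenize(p, b"common-key") for p in patients}
common_ehr = {tokenize(p, b"common-key") for p in patients}
common_overlap = common_claims & common_ehr

print(len(silo_overlap))    # 0
print(len(common_overlap))  # 2
```

The underlying patients are identical in both cases; only the choice of scheme determines whether the records can be connected.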

Let’s say a biopharma company wants to understand why patients are dropping off its rheumatoid arthritis product. The product is dispensed through 6 different specialty pharmacies. The company has also purchased medical claims and EHR data to supplement its analysis.

His­tor­i­cal­ly, each of these datasets would have re­mained in sep­a­rate si­los, with analy­sis lim­it­ed to one dataset at a time. With­out link­ing da­ta at the pa­tient lev­el across all 6 spe­cial­ty phar­ma­cies and the third-par­ty da­ta sources, it’s dif­fi­cult to see treat­ment and ad­her­ence pat­terns, or un­der­stand the un­der­ly­ing clin­i­cal fac­tors for non-ad­her­ence.

The So­lu­tion: a Link­able Da­ta In­fra­struc­ture (LDI)

If the bio­phar­ma com­pa­ny ap­plies the same to­k­eniza­tion scheme to each dataset, all of the records for the same pa­tient can be linked with the same “key”, with­out com­pro­mis­ing pa­tient pri­va­cy or HIPAA com­pli­ance. The re­sult­ing lon­gi­tu­di­nal dataset pro­vides deep­er in­sights on whether a pa­tient has tru­ly dis­con­tin­ued ther­a­py or just switched phar­ma­cies, why they might be non-ad­her­ent, and how out­comes were af­fect­ed.
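As a rough sketch of the "discontinued or just switched pharmacies" question, the example below groups de-identified dispensing records by a shared token. The records, the token values, the `classify` helper, and the 60-day gap used to call a patient discontinued are all assumptions made for illustration, not an industry definition.

```python
from collections import defaultdict
from datetime import date

# Hypothetical de-identified fills: (patient token, pharmacy, fill date).
fills = [
    ("tok_A", "pharmacy_1", date(2020, 1, 5)),
    ("tok_A", "pharmacy_1", date(2020, 2, 4)),
    ("tok_A", "pharmacy_2", date(2020, 3, 6)),
    ("tok_B", "pharmacy_1", date(2020, 1, 10)),
    ("tok_B", "pharmacy_1", date(2020, 2, 9)),
]


def classify(records, as_of, gap_days=60):
    """Label each token based on its fill history in the given records."""
    by_token = defaultdict(list)
    for token, pharmacy, fill_date in records:
        by_token[token].append((fill_date, pharmacy))

    status = {}
    for token, history in by_token.items():
        history.sort()  # chronological order
        last_fill = history[-1][0]
        pharmacies = {pharmacy for _, pharmacy in history}
        if (as_of - last_fill).days > gap_days:
            status[token] = "discontinued"
        elif len(pharmacies) > 1:
            status[token] = "switched pharmacy"
        else:
            status[token] = "on therapy"
    return status


as_of = date(2020, 4, 15)

# Siloed view: only one pharmacy's data is visible.
siloed = classify([r for r in fills if r[1] == "pharmacy_1"], as_of)

# Linked view: all pharmacies share the same token, so histories combine.
linked = classify(fills, as_of)

print(siloed)  # {'tok_A': 'discontinued', 'tok_B': 'discontinued'}
print(linked)  # {'tok_A': 'switched pharmacy', 'tok_B': 'discontinued'}
```

In the siloed view, patient `tok_A` looks like a discontinuation; once the second pharmacy's fills are linked by the same token, the same patient turns out to have simply moved.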

That’s just one ap­pli­ca­tion of a Link­able Da­ta In­fra­struc­ture (LDI).

Com­mer­cial teams can take ad­van­tage of an LDI to en­hance:

  • Brand an­a­lyt­ics & mar­ket­ing strat­e­gy by link­ing spe­cial­ty phar­ma­cy da­ta to third-par­ty re­al-world da­ta;
  • HEOR & mar­ket ac­cess by link­ing tri­al da­ta with third-par­ty re­al-world da­ta; and
  • Commercial targeting & measurement by linking real-world data to improve HCP targeting and measure the effectiveness of promotional spend.

How to Es­tab­lish a Link­able Da­ta In­fra­struc­ture

Establishing an LDI doesn’t require building anything new; it means taking advantage of existing resources. The first step is working with technology that is already standard across the industry.

Datavant, for example, is the most widely used privacy protection and connectivity partner in healthcare, and has built the largest ecosystem of linkable real-world data in the U.S. More than 350 data sources (including all of the top claims, lab, and EHR data providers, plus mortality data, specialty pharmacies, academic medical centers, and social determinants of health data sources) already use Datavant to tokenize their data.

That gives phar­ma the miss­ing “key” to link da­ta across all their sources.

In the open da­ta ecosys­tem Data­vant has built, each par­ty is em­pow­ered to freely work with oth­er da­ta providers and link the da­ta most rel­e­vant to their use case. This means that phar­ma com­pa­nies aren’t lim­it­ed to buy­ing off-the-shelf cuts of siloed da­ta, and can en­hance what they’re al­ready pur­chas­ing from ag­gre­ga­tors.

The Next Fron­tier: Link­ing to Re­al-World Da­ta for Clin­i­cal De­vel­op­ment

One of the biggest re­main­ing da­ta si­los is clin­i­cal tri­al da­ta.

Until recently, the technical challenges of tokenizing and linking trial data without unblinding the study made connecting it to real-world data impossible. Datavant has partnered with companies across the industry (including Janssen, Parexel, Medable, Medidata, and TriNetX) to solve this problem, and pharma companies are now starting to use LDI to dismantle the silos between real-world data and clinical trials.

By link­ing re­al-world da­ta to clin­i­cal tri­al da­ta, phar­ma com­pa­nies can con­duct smarter sub­co­hort analy­sis, use re­al-world da­ta to pas­sive­ly and more ef­fi­cient­ly col­lect da­ta for post-mar­ket­ing stud­ies, and start gath­er­ing ev­i­dence to sup­port re­im­burse­ment much soon­er.

With an LDI, you could link da­ta from a pa­tient’s Phase III tri­al re­sults with the same pa­tient’s re­al-world claims, lab da­ta, and more. You could al­so link your da­ta from their in­ter­ac­tions with your pa­tient sup­port pro­grams and spe­cial­ty phar­ma­cies for a com­pre­hen­sive, lon­gi­tu­di­nal view.

Drug development and commercialization have traditionally relied on large amounts of data. Continued innovation depends not on generating or buying even more data, but on connecting the dots.

More and more com­pa­nies across the in­dus­try are con­nect­ing their da­ta si­los and es­tab­lish­ing a link­able da­ta in­fra­struc­ture for a com­pre­hen­sive, lon­gi­tu­di­nal view of pa­tients. That means bring­ing more life-sav­ing ther­a­pies to mar­ket, get­ting them to the right pa­tients faster, and im­prov­ing pa­tient out­comes for gen­er­a­tions.

About Data­vant:

Data­vant’s mis­sion is to con­nect the world’s health da­ta to im­prove pa­tient out­comes. Data­vant works to re­duce the fric­tion of da­ta shar­ing across the health­care in­dus­try by build­ing tech­nol­o­gy that pro­tects the pri­va­cy of pa­tients, while sup­port­ing the link­age of de-iden­ti­fied pa­tient records across datasets. Data­vant is head­quar­tered in San Fran­cis­co. Learn more about Data­vant at­