
[Video] Putting the AI in R&D — with Badhri Srinivasan, Tony Wood, Rosana Kapeller, Hugo Ceulemans, Saurabh Saha and Shoibal Datta
During BIO this year, I had a chance to moderate a panel among some of the top tech experts in biopharma on their real-world use of artificial intelligence in R&D. There’s been a lot said about the potential of AI, but I wanted to explore more about what some of the larger players are actually doing with this technology today, and how they see it advancing in the future. It was a fascinating exchange, which you can see here. The transcript has been edited for brevity and clarity. — John Carroll
Thanks to PPD Biotech for their sponsorship of this event at #BIO2019.
John Carroll: This panel conversation originated last March when I had an interaction with somebody on Twitter, which was very interesting. It was somebody in AI who made the claim that they were in a position to do something dramatic with the way that pharma companies develop new drugs, particularly at the discovery stage. And the number that he used was more than $400 million for lead identification alone (which includes the cost of capital). This caused a significant response among a variety of people who don’t have more than $400 million to do lead identification.
We exchanged some emails after that. He pointed to a study that Steve Paul did when he was at Eli Lilly; I know some of the guys who were involved. One was Bernard Munos. Bernard and I had an exchange. I said, in this study you said the hard cost of lead identification at Eli Lilly is $146 million. I mean, you could start a heck of a company for $146 million, and that's not just for lead identification. And he came back and we had this exchange: it's a huge process that involves all this money and all this work.
So I’ve been thinking about it ever since then. And what I really wanted to do today was to bring together a group of experts who are grappling with this area, this technology of artificial intelligence and machine learning, because I think everybody’s focused on it in one fashion or another. And sometimes it’s practical and sometimes it’s kind of very futuristic. But everybody up here today is in a very practical situation, of having budgets, of having staffs, of having plans and working everything out so they can apply AI in a real world way right now, as it develops into something else. That is the way I would like to set the stage for this conversation.
With that, I’d like to introduce everybody here today. We have Badhri Srinivasan, who is the head of Global Development Operations at Novartis. Tony Wood, Senior Vice President, Medicinal Science and Technology for GSK. Rosana Kapeller, formerly of Nimbus, currently an entrepreneur-in-residence at GV, or a company that I used to call Google. Hugo Ceulemans, the Scientific Director of Discovery Sciences at Janssen. Saurabh Saha, the Senior Vice President of R&D at Bristol-Myers Squibb. And Shoibal Datta, Vice President, Head of Product and Technology, PPD Biotech.
So Rosana, you’ve been directly involved in this at Nimbus and also currently at GV. When you hear some of the different projections in terms of what you can do with AI today, what’s your reaction? Is this something that is going to take a while to develop? What’s actually happening now?
Rosana Kapeller: I’m very interested in the intersection of machine learning and drug discovery. That’s one of the reasons, actually, why I joined GV: to learn more about it. And I think it’s something that we all have to learn more about, because it’s really going to augment what we do in drug discovery, and also what we do further along in development.
Is AI going to be the panacea that is going to cure all our problems right now? The answer, I think, is no. I think that machine learning at the current stage can be applied to discrete pieces of the drug discovery and development spectrum. But it’s not going to take you from identifying a target to getting into man in three years. It’s not going to be that.
John Carroll: Okay, so everybody here has their own particular take on this. When you hear it, something like $146 million for lead identification, where do you see the opportunities? Where are people going right now? Tony?
Tony Wood: John, the opportunity from our standpoint isn’t the $146 million, it’s the little-better-than-10% success rate in Phase II clinical studies. And so our focus really is on that problem. Publications in genetics, for example, have shown that you can increase your success rate to around 20% if you focus on genetically inspired targets.
John Carroll: That would be 20% for everything going into the clinic.
Tony Wood: That’s right, in clinical studies. So functional genomics, which if you like is the glue that connects genetics to the drug discovery processes you’ve just been talking about, is a place where we can generate data of the right shape to make AI/ML a worthwhile pursuit. And one thing that I really want to get across here is that, for me, it’s not so much about the method. It’s about two things.
It’s about which problem you are trying to solve. Is it an impactful one? In our case, survival in Phase II. And it’s about what data shape you can create to support the use of AI/ML methodology. Fundamentally, AI/ML needs narrow and deep data. And our problem in general, because of the way we’ve conducted experimentation over the last 20 years, is that we’ve produced wide and shallow data. So the focus is: where can we generate data that’s going to really allow this technique to be massively impactful?
Rosana Kapeller: I couldn’t agree more, and I think the shape of the data that you need for machine learning is vastly underestimated.
Badhri Srinivasan: Yes, and I completely agree with both comments. Sometimes it starts off as: okay, we can just apply AI or ML and run off with it. But to both of your points, we have spent so many years collecting the data we have for one purpose: we get it on the CRF, we send it in for registration, and we’re done with it. This is a different kind of application. That’s going to be the hard part. How do we get that data? How do we get it in the shape we want, and what is that data, and what is the problem we’re trying to solve? So I couldn’t agree more. That’s the hard part.
Saurabh Saha: Maybe I can key off of one comment Rosana made. AI can be used in a very discrete way if you ask the right question first. You have to ask the right question; I think Tony mentioned this. First you have to ask the question. Second, you have to have large labeled data sets. And third, you have to have the cross-functional expertise present to be able to do that analysis.
What we’ve learned in immuno-oncology is that one driver may not answer all our questions on whether a patient is responding or not responding. We tend to think in binary as humans, but there are a number of markers where, if you look at each one incrementally and in aggregate, the quantitative blend of those markers may be much more informative in giving us an idea of who’s responding and who’s not. And one of the discrete examples of this is image analysis. So when we do any of our clinical trials now in the immuno-oncology space, we try to get biopsy samples at baseline, while the patient is being treated, and post-progression.
Now, getting that data and looking at histopathology, it’s almost impossible for a pathologist, for a human, to look at those slides and tell, just from the visual landscape, which patient is responding and why they’re responding. But using machine learning, we’re now able to work with some incredible startups in the Cambridge area who can tag cell types. They can look at boundaries of tumor cells versus immune cells versus stromal tissue, feed in tons and tons of these data, these slides, these images from all different types of patients, and tell us: okay, what are the markers that are associated with response? And which aren’t associated with response? That’s a very discrete question that we can ask of machine learning at this time.
Tony Wood: That’s a really fantastic example because you have the case of image analysis being able to generate massive amounts of data and actually, quite frankly, see things that we can’t. To the human eye, you look at these things, they often look exactly the same. So that’s, for me, an area where here the method is doing something that we clearly cannot do with any other approach. And the point about histopathology extends also into cell biology and the opportunity then to use cell phenotype on the back of functional genomics to inform changes associated with genetic variants and what have you.
And so for me, that’s what we’re looking for. We’re looking for the intersection of measurement technologies and data sets which can allow us to see the world of biology in a different way from that which we’ve previously perceived it.
John Carroll: So how long has Janssen been working on AI?
Hugo Ceulemans: We have been at it for a couple of years. I agree with what all the colleagues here said, that the potential impact in development is huge. The challenge with development is also: what is the data available for that? There is an upside to moving a bit earlier, into discovery, where cumulatively a lot of companies have a lot more data available, present, annotated and labeled, which is exactly fit for purpose if you want to go into this AI space. Having images is fantastic, but very often you do need labeled data to make sense of them. So in the discovery space this is somewhat easier, and we do see a lot of interest in putting all these questions to the test right now. Do we have the right types of data? Can we make good on the promise being raised by the AI methods? Can we join forces?
So yesterday we made an announcement that 10 pharma companies are actually going to join forces in a very privacy-preserving way. They are not just throwing the data at each other; rather, in a very privacy-preserving way, they are looking at the cumulative warehouses of those pharma companies to try to see whether we do indeed have the right volume of data, and whether by working together we can prove that these methods make a difference.
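The arrangement Ceulemans describes, in which partners never pool their raw data but still learn a shared model, is broadly the idea behind federated learning. Below is a minimal Python sketch on synthetic data using plain federated gradient averaging; it is an illustrative toy, not the consortium's actual protocol, which layers stronger cryptographic protections on top.

```python
import numpy as np

rng = np.random.default_rng(1)

def local_gradient(w, X, y):
    """One partner's logistic-regression gradient on its private data.
    Only this gradient, never the raw data, leaves the site."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)

# Three hypothetical partners, each holding a private labeled dataset
# generated from the same underlying relationship.
w_true = np.array([1.5, -2.0, 0.5])
partners = []
for _ in range(3):
    X = rng.normal(size=(200, 3))
    y = (X @ w_true + rng.normal(scale=0.3, size=200) > 0).astype(float)
    partners.append((X, y))

# Federated averaging: a coordinator aggregates gradients, not records.
w = np.zeros(3)
for step in range(300):
    grads = [local_gradient(w, X, y) for X, y in partners]
    w -= 0.5 * np.mean(grads, axis=0)

print(w)  # points in roughly the same direction as w_true
```

Real deployments add secure aggregation and differential privacy on top of this pattern, so that even the shared updates reveal as little as possible about any one partner's compounds or patients.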
Saurabh Saha: And Hugo makes a really good point. It’s not just the quality of the data, which is paramount; it’s the actual volume of data. If you have a large sample size and a few dimensions that you’re looking at, classical statistics probably works just fine; that’s what we did in graduate school. But when the number of variables or dimensions far exceeds the amount of data that you actually have, where the sample size is much less than the number of dimensions, that’s when machine learning actually falls apart. That’s when you have great difficulty, and the fact of the matter is, we’re at that stage of infancy. We just don’t have enough data to feed into the algorithms to get a meaningful output.
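Saha's point about sample size versus dimensionality is easy to demonstrate with synthetic data: once features outnumber samples, an ordinary least-squares fit can look perfect on the data it was trained on while generalizing poorly. A small illustrative Python sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def fit_r2(n_samples, n_features):
    """Fit ordinary least squares and report (train R^2, test R^2)."""
    X = rng.normal(size=(n_samples * 2, n_features))
    # The true outcome depends on only 3 of the features, plus noise.
    y = X[:, 0] + X[:, 1] - X[:, 2] + rng.normal(scale=0.5, size=n_samples * 2)
    Xtr, Xte = X[:n_samples], X[n_samples:]
    ytr, yte = y[:n_samples], y[n_samples:]
    coef, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)

    def r2(Xs, ys):
        resid = ys - Xs @ coef
        return 1 - resid.var() / ys.var()

    return r2(Xtr, ytr), r2(Xte, yte)

# Classical regime: many samples, few dimensions; both scores agree.
print(fit_r2(n_samples=500, n_features=10))
# High-dimensional regime: fewer samples than dimensions; the fit is
# perfect on training data and poor on unseen data.
print(fit_r2(n_samples=50, n_features=100))
```

The same pathology, in subtler forms, is what regularization and held-out validation are meant to catch in real machine learning pipelines.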
John Carroll: There’s a variety of different perspectives here on this particular question, and Shoibal, I wanted to have you address one point that I’ve been interested in for a while, which is that if you’re a bigger company, there seem to be a lot more opportunities to take advantage of this. You can invest in it over a period of years. You can see it bear fruit; you can try to increase your percentages from 10% to 20%, which is a big thing Hal Barron likes to talk about. But is this something that’s just reserved for the bigger companies? Or are there other strategies that the smaller companies can get involved in?
Shoibal Datta: No, I don’t think it’s only for the bigger companies. It’s interesting that the conversation started off with the shape, density and topology of data. Right now, the ability to gather large amounts of data is universal; I think it’s not unique to any one company. Fundamentally, I think imaging clearly was the first to break through, with practical examples of how it could be applied, and I think you can follow healthcare: you saw it come into decision support first, and now it’s going to be embraced in the mainstream.
I do think that with the pervasiveness of devices and sensors, the amount of data that will soon be available to analyze is going to be sufficient. The questions, and the ability of ML to approach those kinds of questions, that, I think, is what will be the tipping point. I do believe we are going to get into a new generation of, for example, digital biomarkers or novel endpoints based on these, and they will soon be part of exploratory studies in clinical trials, if they’re not already.
John Carroll: So Rosana, you work with people on the biotech side, on the smaller company side as well. What’s your take on this?
Rosana Kapeller: I do agree with what he’s saying. I think that biotech is actually investing heavily in generating the data, and in creating intersections with machine learning and AI groups to be able to work on that. I totally agree with that, but the point that I actually want to make is about the shape of the data, and the content of the data, that you need for machine learning.
So one of my frustrations, when I talk to my colleagues, is this: “Oh, but there’s so much data in pharma. Why can’t we use all that data?” Especially in chemistry. Okay, you have all this solubility data, permeability data and so on. Why can’t we put it all together and use machine learning to teach us all these different features of molecules? You can’t, because the data has been generated over time, in different experiments, in different assays. There is so much variation in there that the machine cannot interpret it.
John Carroll: And isn’t that why imaging is one of the areas where you’re going to find the first exploitation? Because the information is more pure; it’s made for the machines.
Shoibal Datta: I think the quality of the data in imaging inherently lent itself to these approaches.
Tony Wood: I think that’s critical. These methods are very good at finding discontinuities in data that have nothing to do with the question you’re trying to solve. And when you’re putting a jigsaw puzzle together from pieces that come from different puzzles, which is the sort of situation we face with discovery data, there’s a real problem associated with that.
And computational chemistry: I’m going to let you drag me into computational chemistry for a little while. We’ve been at this for 20 years now, and the methods we already have in place, based on quantum mechanics and other predictive approaches (you know this very well from your history), are already pretty good.
So while I’ve no doubt that AI/ML can solve those problems, the impact it’s going to have relative to the methods we currently have is probably going to be smaller than in areas like the one that [Hal] and I constantly talk about, because there the problem is one we simply can’t solve right now, and one enabled by the new data collection technologies we’ve been talking about, like image analysis. We can’t use any other method to get to the bottom of these massive data sets.
So for us, it’s very much about: what’s the right problem to solve, based on impact? Where are the data sets suitable for solving that problem likely to be created, as a consequence of technologies that are being developed (CRISPR, imaging, et cetera)? And then building our focus around that, while at the same time putting in place a data infrastructure for everything else that will prepare the culture in our organization to put data at the center of everything we do.
Whether that’s AI/ML enabling functional genomics, or more typical routine statistical or calculation methods enabling the prediction of small-molecule or protein structures, that will get you to the savings in the proposition that started your initial interest here.
Saurabh Saha: So I agree with Tony and Rosana. I will say that if you look at the chemical space, one area where we’ve had some success, and I think the field has had some success, is asking specific questions like: is a molecule a hERG channel modulator or not? We had millions of data points internally, and others in the industry have the same. And applying ML to that space and asking that question, we’ve been able to get prediction rates from 70% to 95% sensitivity/specificity. So that’s actually a very, very specific question.
Now, that’s in a two-dimensional chemical space, what molecules look like on paper. But I’d love to ask Rosana, from the Nimbus and Schrödinger days, about three-dimensional space, which I think is much more difficult: trying to predict on paper, from the entropy and the interactions of a molecule, biological properties like affinity, solubility and other things. And Jonathan Montagu wrote a great blog on this on Bruce Booth’s site, which I think everyone should read. I think it’s a fantastic lesson.
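For reference, the sensitivity/specificity figures Saha quotes are standard confusion-matrix summaries: sensitivity is the fraction of true positives a classifier catches, and specificity the fraction of true negatives. A minimal sketch with made-up labels (1 marking a hypothetical hERG blocker):

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute sensitivity (true-positive rate) and specificity
    (true-negative rate) from binary labels and predictions."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

# Toy example: 1 = hERG blocker, 0 = non-blocker (illustrative labels only).
truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(truth, preds)
print(sens, spec)  # 0.75 0.8333333333333334
```

Reporting both numbers matters because the classes are usually imbalanced: a model that labels every compound "safe" would have perfect specificity and zero sensitivity.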
Rosana Kapeller: It is hard, and that’s something we’re working on right now; I think machine learning is going to be able to augment it. The intersection of machine learning and physics-based approaches may be able to solve part of the problem, and we’re doing a lot of work on that. There is something we call multi-parameter optimization because, Tony, you will agree, the hardest thing is to find one molecule that has all the desirable properties you need to make a drug. That’s why it takes so long to go from a hit to a drug that actually works in people. And if we could improve that even by 20% or 30%, that would be huge. I think that’s where we’re putting a lot of our effort.
John Carroll: I think that the economics of R&D is something that obviously everyone here is very involved in as well, trying to figure out if there’s a more efficient way of doing it, to cut down not just the cost, but also the time that goes into this sort of thing. I was talking to Clay Siegall from Seattle Genetics a couple of days ago, and he’s at the point now of having his second drug hit the market 21 years after founding the company.
And these are long timelines that we’re talking about. Alnylam took about 20 years to get its first drug out into the marketplace. At the same time, though, we’re also seeing that these companies are coming along now, not just the top 15, but all the rest of these companies are coming up as well. And everybody’s going to have their own perspective on this. I am curious, though, Shoibal: in terms of the most common pitfalls right now for companies that are beginning to look at AI, where do you see most of the initial mistakes?
Shoibal Datta: I think it goes back to what Tony was saying: what problem am I trying to solve? What data do I have available for it? And what are the right approaches? Because there’s no magical answer; you still have to go through that. I think every company has gone through this: we have 30 years of clinical trial data, and everybody has in some way, shape or form tried to make sense of it. And most have struggled to come up with something useful.
John Carroll: So at a company the size of Novartis, you’ve got a global operation, like several of the folks up here. How do you tackle this from the start? How do you get into AI in R&D for the first time?
Badhri Srinivasan: I actually want to go back to what Shoibal said, and I totally agree. It’s a young science, and suddenly there seems to be a lot of promise, and everybody says throw AI at it, throw ML at it, and suddenly you should have a solution. But what does that mean? What is the problem that we’re trying to solve? Do we know that we have sufficient data to actually address that problem? I think there is a lot of groundwork that we need to do. And so we try to look at it in a disciplined way, rather than acting just because we are the biggest, just because we’ve got money to spend, just because there’s a lot of retrospective data there.
And this is the other point I wanted to pick up from the question you asked Shoibal as well. There is a ton of retrospective data, but it was all collected with a different purpose in mind, over the last 10, 15, 20 years. What AI and ML need now is a very different form of that data. So when we look at it, we ask: do we have that? Should we go collect it? Is it to solve a specific problem? And second, is it one problem, one solution? Or can we scale this approach? Is it something where we can then say, “Okay, now I’ve done this and I can apply it across the board”?
If we take that kind of a disciplined approach, we feel like we’ll start to get somewhere. Not that we’re there now, but at least we’ll start to make progress in the right direction. As opposed to saying, “Here was a fantastic success, but I don’t know what to do after that.”
John Carroll: So, I am curious about where we’re headed with this, which is what you were addressing just now. If you get the data right, if the data starts coming in a uniform and predictable fashion, and you can use it across a variety of areas … that’s the first thing. Is the industry at the point where this is happening now, where the data coming in is pure and fairly reliable? Or is there still more work to be done on that?
Tony Wood: Oh, I think we’re just right at the beginning of that process, quite frankly. There are early indications, which we’ve talked about so far, so I won’t repeat them, where the promise is certainly becoming manifest. And you can look across every aspect, from idea generation through to the launch and marketing of medicines, and find areas where you can see the opportunity for that to happen. So, I think it will become much more prevalent. I think the bigger question is, once you get there, how do you get the culture right? Because this is an exercise in putting the data collection strategy and the analytical strategy at the beginning of the experiment, not at the end, when somebody gets handed a hard disk full of terabytes of data and has to find some sense in it.
So, one thing that I think is important for us to consider is how a multilingual team, let’s call it, is assembled: individuals who can, if you like, bridge the gap between advanced analytics, which comes from people with a very different background, and scientists who understand the nature of the problem they’re trying to solve. So, I see lots of opportunity developing over time. I think we will ultimately get to a point where the biggest issue we face is making sure we get the right culture to have data-driven organizations.
Rosana Kapeller: I’m just going to agree with you in spades. I think that’s the biggest issue right now: the mindset, and the lost-in-translation problem. People don’t speak the same language. And if you don’t have these teams talking to each other, you’re not going to make the best of it.
John Carroll: You’ve talked about this before. Just talk about that for a second, in terms of the lost-in-translation aspect of this.
Rosana Kapeller: Yeah, it’s completely lost in translation. My experience at Nimbus was between the medicinal chemists and the computational chemists. In the beginning, it was very surprising to me: they don’t speak the same language. They don’t even think about how to solve problems in the same way, or how to integrate the different solutions. And the only way we actually made it work at Nimbus was when we started putting them together and making decisions together. They had to co-lead the programs and make decisions together, to be able to speak the same language.
So, I think that is the same thing we’re seeing in machine learning and AI: a lot of folks who come from machine learning don’t know much biology. They don’t understand how you solve problems in biology, and vice versa, so they don’t have a common language. We have to develop that common language.
Hugo Ceulemans: One of the challenges is not just generating the right type of data, the right volume of data, the right quality of data, but then matching those different data sets. And some of those will be small, because of all kinds of restrictions. How do you bridge from a set that annotates pure chemistry, to one that already has a lot more biology, to some clinical aspects? This is where multidisciplinarity comes in, in bridging those things. I think one of the biggest mistakes that is often made is: we generate an awful lot of data in one assay or a few assays (we’re really good at that), and now we throw machine learning at it.
To predict what? To get to grips with what, exactly? And I completely agree: having that multidisciplinarity is absolutely key to bridging all those different aspects.
Saurabh Saha: So, in immuno-oncology, we think that ultimately multiple markers, or composite biomarkers, will lead the way in telling us which combination of drugs will be most effective for a given cancer or patient. And to the point that Rosana and Hugo make, it’s extremely difficult to translate from discovery, through translational, through the clinic. But it’s even more difficult just to figure out how to link genomics data, proteomics data, flow data, T cell activation data and imaging data within the translational space, just to come up with markers that can predict which patients will respond.
Listening to this conversation reminds me of a question Freeman Dyson once raised: does science evolve through ideas or tools? If you look back at the era of the steam engine, the tools of the steam engine preceded our understanding of the laws of thermodynamics. And that’s not terribly different from where we are today with artificial intelligence. We have the tool. But we don’t necessarily understand what all the branch points in the decision tree are, what the probabilities are, and how they’re weighted to come to the recommendation that the model outputs. We really don’t understand this black box.
So, I think people have compared this to genomics 20 years ago, where we had the opposite problem: we had tons of data, but we didn’t really have the tools to effectively analyze it. Now we’re in the reverse situation. So, we’re kind of repeating history here.
Shoibal Datta: So, sorry, but if I could ask a question: do we need to understand the black box? If you look at healthcare and clinical decision support, there is not a fundamental need there to understand what the black box is. But in R&D, the goalposts have shifted as we’ve understood more about the systems we study; there is increasingly a need to be able to explain why something is happening, more than there used to be. What’s the right thing to do here?
Saurabh Saha: Yeah, that’s a great question. I fundamentally believe that coding lacks morality. So, if you’re a physician and you’re handed a piece of paper that says, “This patient should get this set of drugs,” do you really believe there’s quality in the data? Was it poor quality, good quality? Do you have transparency as to how that data came about and how it was analyzed, this black box? And third, has it been biased in any way? One of the challenges with ML is that you’re only really able to predict based on your training sets, the data you’re working with that’s being fed into the model.
And if there is unmeasured or unknown data that is important for, let’s say, predicting a response for a patient, that data is not being picked up. Because it’s unknown, it’s unmeasured; by definition, it hasn’t gone into your training set. So if you’re missing that component and you’re making a recommendation with it left out, I think that’s a fundamental challenge, because you really don’t know how you got there. Maybe it was Google, I don’t know which company it was, where they looked at retinal scans and tried to predict MIs, myocardial infarctions. And what they ended up actually scoring for was the age of the patient, instead of picking up how the retina can predict that.
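The retinal-scan anecdote is a textbook confounder problem: a model can appear to read disease risk from an image when it is really reading age. A toy numerical sketch with hypothetical variables and synthetic data makes the mechanism concrete; here a feature looks predictive purely through its correlation with age, and the apparent signal vanishes once the age effect is removed.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000

# Hypothetical variables: 'age' drives the outcome directly, while the
# imaging feature matters only through its correlation with age.
age = rng.normal(size=n)
image_feature = 0.8 * age + rng.normal(scale=0.6, size=n)
outcome = age + rng.normal(scale=0.5, size=n)

def r2(x, y):
    """R^2 of a no-intercept least-squares fit of y on x (data are zero-mean)."""
    b = np.dot(x, y) / np.dot(x, x)
    resid = y - b * x
    return 1 - resid.var() / y.var()

# Taken alone, the imaging feature looks strongly predictive...
print(r2(image_feature, outcome))

# ...but once the age effect is regressed out of the outcome, the
# apparent signal largely disappears: the model was scoring age.
outcome_resid = outcome - np.dot(outcome, age) / np.dot(age, age) * age
print(r2(image_feature, outcome_resid))
```

Checking whether a feature's predictive power survives adjustment for known confounders is one of the few practical defenses against this failure mode.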
Tony Wood: Let me pick up on the point you made earlier, because there’s a great data integration opportunity. We can now describe the character of cells at multiple levels of phenotypic abstraction, from DNA structure all the way through to an image. And this is part of the problem with the complex molecular biomarker proposition: you need to have some ground truth in order to connect that massive variability to something that you know to be true. And that ground truth, in the context of our strategy, is human genetics. So, if you go from high-confidence variants to deep characterization of cell character, then I think you’re in a place where we can start to resolve some of these problems with regard to the use of molecular biomarkers and patient selection.
Because fundamentally, we’re anchoring the whole exercise on something that we know to be true. And so, that’s where our focus on functional genomics comes from. It’s not just about target identification; if we get that right, it echoes out into a lot of the other processes we’ve been talking about. One additional point around the black box that’s probably worth considering: I’ve never yet encountered a computer that didn’t produce an answer. The issue is not the answer; the issue is the cost of finding out whether the answer is correct or not. And that’s a challenge, for example, in the synthetic area.
What we need to be able to do with these methods is use the fact that they can integrate across multiple different dimensions, and start not just worrying about “can I find an active compound?” but “can I design an active compound that I can synthesize at tonnage scale later on, without having to go through development process optimization?” Then you’re at a place where you’re designing for everything right from the beginning. And that’s when I think you get the huge impact from savings, with regard to really integrating challenges across the time scales.
John Carroll: I would like to shift a little bit away from the real world, what you’re doing today, and the challenges to where you think this is going. Because where we are today is not where this is going. And there are all sorts of different ideas about this. So, where are we going to be four, five years down the road? Where is this headed? Because four or five years in biotech is nothing. I mean, it’s now. So, I’d like to get your ideas. Badhri, what do you think this is going to be for Novartis, reimagining medicine?
Badhri Srinivasan: So, first of all, I think where we are going is not where we are today, and that’s a really nice segue into this. I think where we will be is … the industry is maturing; you’ve heard a lot of people here say we are at a bit of an inflection point. We’re trying to understand the space. These are early days in understanding what to do with this data and how to focus it on a problem.

Where we will be, and hopefully where we’re getting to, is using more and more data to ask: okay, do I understand the disease state better? Before I even start any kind of treatment, do I understand the disease state better? And if I do, can I bring medications that are appropriate to that disease state? That depth of understanding, I think, is where we will be four, five years from now.

If you’re talking about 10 years or so from now, which again is not a long timeframe in the pharma and biotech world, I think we will start to see the shift to actually using that data to ask: how are we shortening the timeframe? How are we improving our probability of success, et cetera? But in that four-, five-year timeframe, I think it’s more about a better understanding of the insights we have, whether that’s patient insights, disease insights, or even molecule and compound insights. I think that’s where we will start.
John Carroll: Tony, I know this is a conversation at GSK.
Tony Wood: Yeah, look, we’re going to be in a place where we have better target identification, along the lines that I’ve described. And as a consequence of that, an ability to better identify patients who are likely to benefit from our medicines.
John Carroll: That’s it?
Tony Wood: That’s our focus.
John Carroll: Rosana?
Rosana Kapeller: Would you say I’m an outspoken person?
John Carroll: Well, I hope so.
Rosana Kapeller: As everyone is saying, we’re very much in the beginning. And we’re still trying to figure out how to put this all together, and what’s going to be the benefit of it. But I think that we right now are facing what Kodak faced with digital cameras. If we don’t embrace it, most of us will have a Kodak moment…
Tony Wood: Mm-hmm (affirmative).
Rosana Kapeller: Data is going to continue to accumulate, we’ll continue to generate more data. We are going to figure out how to get data in the right shape, format, et cetera. We will work on the cultural piece. We will make improvements for patients and all of that. So, I think people should embrace it instead of trying to push it away. That’s my personal opinion.
John Carroll: I think fear is driving this as much as anything; the fear of missing out. Hugo?
Hugo Ceulemans: I think one key aspect of where we'll be going is integrating more. Because the one difference versus consumer is that in consumer there were a few monolithic data owners who had massive access to the data. I think in pharma, in clinical, that is not the case. The stakeholder landscape is way bigger. So, we will need modalities to deploy AI across multiple data owners, across multiple stakeholders, in order to achieve the dream of where to land this. And I think a lot of the newer players will combine a vision of methodology with, "This is complementary data, this is data that big pharma or the clinic does not have. This is a missing piece."
And we bring this together through an AI that can learn across multiple stakeholders, while respecting the IP rights and the business interests, but also the patient rights, the clinical aspects of all these stakeholders. And I think that will be a major challenge, that is very specific to the healthcare space. That in some of the other areas where AI has made early progress, was less prominent, the fragmentation of the space. That’s something we need to conquer.
John Carroll: I saw a study done not too long ago, about a month and a half ago, where they had looked into the success rates of drug development. Which is kind of an obsession in R&D … and one that I share. It was interesting, because they looked at all the different phases, going through the whole preclinical, Phase I, Phase II, Phase III. And they saw a significant increase in Phase III successes. Everything else was pretty much the same, and nothing has really changed all that dramatically. It’s pretty bad. The 10% or whatever that might be, is poor. And I think that most people would agree if you could make that 20%, you’d be celebrated around the world.
I think one of the reasons why late-stage development is working out better is that companies, bigger companies and all companies, are doing better in terms of deciding what they want to take into Phase III. And also, there's been a drive toward more specialization, where you see the companies, particularly the larger companies, being very clear about where they think they can make a difference; where they can come up with the new products that'll have an impact on the market. So, does AI continue to drive specialization? Does it make it more important to understand particular areas of specialization? And will this be a continuation of that process?
I’m curious from Janssen’s perspective, because you do cover quite a lot of territory.
Hugo Ceulemans: You need to integrate across a bigger space. Because those niches will get smaller and smaller. Diseases will be split up in subtypes. You need sufficient data power. But I absolutely do think if you have a very clear vision about, “This is the type of diseases. This is the type of treatments we want to aim for,” this will be enabled by AI and by integration across larger and interconnected data sets.
Tony Wood: Okay, I think whether or not diseases are split into subtypes is an interesting question. And our focus is very much in the early phases, to focus on the biology. And it may be that by focusing on the biology and what we learn there, that we get to think about the constellation of diseases in a very different way. So, for me I think the jury’s still out on that one. We may find that what we do is just group diseases in a different way from the previous approach we’ve taken through diagnosis and presentation.
Saurabh Saha: John, to your point about success rates, I think we may be looking at the same study. I think it was a CMR study published in Nature Reviews Drug Discovery. The success rate from Phase I to launch is about 7%, 8%. From Phase II to launch is about 14%, 15%. And Phase III to launch is about 60%. So, if you look at that data, there's inherently something very interesting about it. It says, "Why are we doing these long, complex Phase I studies if you're really not gaining much benefit and success from Phase I to II, in terms of triaging your portfolio? You're really seeing huge leaps when you go from Phase III to launch."
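The launch probabilities quoted here imply per-phase transition rates that make the point concrete. A rough back-of-the-envelope sketch, taking the midpoints of the stated ranges as the inputs (an assumption for illustration, not figures from the study itself):

```python
# Launch probabilities by phase, as quoted on the panel
# (midpoints of the stated ranges; illustrative figures only).
launch_prob = {"Phase I": 0.075, "Phase II": 0.145, "Phase III": 0.60}

# If P(launch | reached phase X) is known for consecutive phases, the
# implied phase-to-phase transition rate is the ratio of the two.
p1_to_p2 = launch_prob["Phase I"] / launch_prob["Phase II"]
p2_to_p3 = launch_prob["Phase II"] / launch_prob["Phase III"]

print(f"Implied Phase I -> Phase II transition:   {p1_to_p2:.0%}")
print(f"Implied Phase II -> Phase III transition: {p2_to_p3:.0%}")
```

On these figures, surviving Phase I roughly doubles a program's launch odds (7.5% to 14.5%), while surviving Phase II roughly quadruples them (14.5% to 60%), which is the asymmetry behind the argument for richer Phase I readouts.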
So, I think where we as a company, and I think the industry, would benefit from AI, or whatever you want to call it, is increasing the probability of getting that Phase I study to read out much more data. Much richer data, where you have a clear proof of concept. That notion's been around for 20, 30 years. But a real proof of concept, one that tells you, "You know what? Stop the program, or continue and you're going to have a Phase III-like success rate in the Phase II setting." One of the things that is clear, I think, is that science evolves. Or there are revolutions in science based on being able to measure things better.
If we were to measure a patient's tumor, for example, to an infinite degree: know everything that's going on in a patient's tumor, have the tools to be able to do that, and analyze the consequences of the data that we're getting out of that tumor, one could predict, I think, with almost certainty what will ultimately happen to that tumor, and what, through Darwinian evolution, may end up happening based on the fundamental genetics of that cancer. So, when you say four to five years: in the last two years, we've seen a huge difference in the cancer space. Because it's only in the last two years that we've been meaningfully collecting samples at baseline, on treatment, post-progression.
And it’s only in that two-year period that we’ve actually deployed all these now tools to look at androgenicity by sequencing mutations, looking at inflammation, by looking at PD-L1, looking at CD8, and looking at these markers. And analyzing the transcriptome, looking at gene expression in single cell RNA analyses. So, all that’s happened in the last 18 months or two years. Now prospectively in the next four to five years? Think about where that’s going to be. We’re going to have all that data, being able to measure a tumor as precisely as we possibly can. So, there’s going to be great advances in maybe the next two years.
John Carroll: I’d like the CRO perspective here, because in a lot of different respects, you can’t specialize. You have to cover a large territory for a variety of different people. But at the same time they’re coming to you and asking about this, so how do you address what … where you’re going to be in four or five years from now?
Shoibal Datta: Sure. To try and distill some of the conversation on this question: the industry as a whole is moving … I think we've gone past the plumbing, the computer infrastructure, the curation and ingestion issues. Everybody's kind of figured that part out. We're beginning to understand what the limitations are, back to the question of data quality, data depth, data density. And we really see digitally enabled trials, and the pervasiveness of clinical-grade consumer devices and sensors, as powering some of that. Our focus is really to be able to run those kinds of studies to the same standards of integrity and quality as we do traditional trials. So we have to prepare for that. And it's not just a technology problem. It's a technology and process problem; it's a multifunctional, cross-disciplinary exercise.
John Carroll: Saurabh, you brought up the idea about what’s going on in cancer right now. Obviously, Bristol Myers has a stake in the cancer field, in I/O in particular. And there’s a lot of money going into that. It seems like those companies that are deeply invested into certain specific areas are going to find it the most productive when it comes to using AI as they build up expertise, and as they continue to build up the data and everything else with it.
I’ve seen some of the statistics in terms of the investment in cancer versus everything else. It’s always way out front. There’s been these massive advances. You’re getting into smaller and smaller niches here, where you’re breaking up cancers into very specific subpopulation groups, and so on. So you can see how AI could make a big difference there. You could see how that would happen.
I’m curious from the rest of everybody’s perspective here, what are the other areas where you’re going to find the greatest productivity? [Daphne Koller] works with Gilead. Gilead knows the liver really well. This is like one of those areas where, “Give me the liver,” and then they’re going to figure it out. So I’m curious, what are your specific areas where you find the greatest opportunities as it relates to the diseases? Which diseases, other than cancer perhaps, are we going to find the greatest advances in? Badhri?
Badhri Srinivasan: So I think oncology has a tradition of leading, that's for sure, as you pointed out as well. But I actually think the applications go far beyond that. I can think of ophthalmology, for example. I can think of respiratory and liver. We don't really understand what's going on in, for example, the immunology and hepatology space. I think the applications go far beyond oncology and apply to other areas.
They are catching up now. It used to be that oncology was always in the lead, and the others were a little bit more "traditional," but I think the other areas are catching up. Some of the image analysis that we see, et cetera, is extending to other areas as well. So I think it's across the board.
Tony Wood: And where you understand the genetics of the disease, and you can take that understanding and point to the cells in which quantitative trait loci are expressed, you have the basic ingredients to follow a functional genomics approach enabled by AI/ML, to address any particular disease with high confidence in the way that I’ve described. So we don’t think about it so much from a point of view, is it cancer, or is it neurodegeneration? But rather, where are the basic ingredients that allow us to connect together these technologies and improve our chances of success?
Saurabh Saha: So I would add, in our case … we're not just a cancer company. We have a number of other disease areas: cardiovascular, fibrosis, immunoscience. Autoimmune disease, I think, is one where we're going to really make some major headway. So as a company, we belong to a consortium that looks at the UK Biobank data that's coming out; it's about half a million participants. And to Tony's point, there's just an incredibly rich amount of genetic data from exome sequencing in that biobank.
Rosana Kapeller: So autoimmune diseases, I think, it’s going to be big.
John Carroll: Hugo?
Hugo Ceulemans: All types of diseases where-
John Carroll: And add on a question. Where are we not going to find progress? I’m kind of curious about that too.
Hugo Ceulemans: And that’s a really tough one to answer. I think the nice thing about immunology, and the nice thing about metabolism is that they come back in so many diseases, and that’s where AI is good at. You bring it into a new area, and it says, “Hey, wait a minute, I recognize this. I’ve seen this before somewhere else.” And I drag that information, and I give you suggestions on where to go. So that’s where it will be helpful.
Now it’s very hard to predict where it will do that. So it may be that in some rare disease, you’ll recognize something you have seen before in one of the more traditional areas that got a lot of exploration. And there’s a lot of serendipity there. So predicting upfront, “Here’s where I am going to recognize those patterns,” is tough. But I think common themes are the things like immunology, like metabolism, that actually play in every disease.
John Carroll: So I did want to turn to the audience here. Alex! How did I know you were going to ask a question? Go. Go.
Alex Zhavoronkov: Right. So there is a lot of hype about AI, and nowadays it's actually quite tough to distinguish who is who, because there are so many areas where it can be applied. So I'll ask a very concrete, very quantitative question. Currently, in your companies, how long does it take to go from target identification to a lead in an animal, and how much does it cost, in your opinion? So those two figures: how long, and how much? And have you already seen the improvement experimentally with AI?
Badhri Srinivasan: So it’s highly varied, so I’m trying to think how to answer your question very specifically, and I don’t believe I can answer your question very specifically. Have we already seen the improvement? I think not yet. I think we are in the space now where we’re starting to explore. So the answer of having seen the improvement, no. We’re in the process, I would say.
Tony Wood: I guess to me that's a sort of "how long is a piece of string" question. There are so many other factors in that process that determine its speed. What's the relative focus? What's your confidence in picking your endpoints and in using animal models at the end of that sequence? What's your confidence that a result in that animal model is meaningful?
There are volumes written, for example, on the differences in quality among animal models. So it can be quick if you're focused on a great target, you have confidence in translation, and you're picking an area where we already know how to execute very effectively. So for me, this is a vehicle to help us make decisions. It is not a panacea that will fix problems AI/ML simply cannot fix.
John Carroll: In most cases, isn’t AI being applied in addition to everything else you’re doing? It’s really an augmentation as opposed to a replacement of anything that may be going on, it seems to me. Anyone else want to weigh in here?
Tony Wood: Mostly, the approach, at least in the early R&D phases, is a hypothesis generator, augmenting something else you're already doing.
But let me come back to the point about imaging, because that is a very clear area. We cannot do it any other way. So there are these unique pieces where it is the only method that can be applied.
Speaker 2: Much has been written about the emergence of China as an AI superpower, both because of heavy investment by the Chinese government at all levels, and perhaps a lower standard of privacy than we have here in the United States. From your perspectives as global companies, do you see a notable rise in China in terms of their strength in AI, relative to the United States? And is the US government doing enough to both invest in AI, and are the privacy standards in the United States an encumbrance, in your opinions?
John Carroll: Is there anyone you’d like to direct that to in particular? Okay. It’s up for grabs. Who wants to grab it? Hugo? It’s yours, Hugo.
Hugo Ceulemans: Oh, it’s mine. So first of all, I’m based in Europe, so making statements about either the US or China, I may not be best positioned. Now one thing, patients are so center to our attention, we cannot make progress while at the same time losing focus on patients. So just waiving patients’ rights, patients’ concerns, that’s who we do it for. So we should not lower the bar on that one. Do we do enough in artificial intelligence in the West, to just include a bit of Europe?
So probably we can do more. Probably we can do better at giving the right incentives. I do think we should also become stronger at working together. A lot of the initiatives in the West have been isolated, fragmented. I think coming together in a biotope will give us the critical mass that will be needed in such a competition. But by no means do I think we should jeopardize patients' rights and privacy rights to accomplish that. I do not think that is the answer.
Speaker 3: Hugo, just to follow up on that question. We have a lot of startups that are interested in AI … Alex, we've actually started an alliance with some of you as well, and I think that goes to my question. How are we going to combine, collaborate, partner? Because a lot of people are saying, "Hey, the organization and type of data that you put in determines the outcome that you're going to have at the end," and that determines how we speed up this R&D.
So how are we going to democratize and speed up the R&D process by combining with nimble startups that are disrupting, that have the right databases and the right code, and that have asked the right questions? Because we all have alliances, and each of us is asking different types of questions. So how are we going to democratize, and how are we going to group together and collaborate? That's my question to each of you.
Tony Wood: I guess from my perspective, it goes back to: it's about the data. And for me, where we need to be focusing is on the technologies that are going to enable the generation of appropriate and relevant data against the backdrop of some labeling, which gives you ground truth. So I'm worried less about the methodology development. I think what we should be much more focused on are some of the problems in data collection that you've heard my fellow panel members talk about: ambulatory molecular biomarker measurements, things like that. I'll stop there. It's about data development and data acquisition technology, not about the analytical method.
Speaker 4: So no mic, but I’m going to speak loud enough. Building off Maria’s question, let’s go one step deeper. Let’s talk about business models. So 130 plus AI in “drug discovery” companies, 15 top pharma companies. How do you adjudicate, evaluate, and then how do you think about business models from a data asset model economics?
Tony Wood: That’s a really good question because it’s very difficult. (Laughter.)
Speaker 4: Hugo asked that to me last week at the London conference. That’s his question.
Tony Wood: Yeah. Let me just put a plea out there. We need some means of assessing these approaches against performance standards, right? This all started with the original ImageNet, where quite frankly there was a global standard that everyone was aiming at. You could get away from this sort of current "collaborate with us, take it in, and you'll find out whether or not our predictions are right when you pay the cost of finding out." That's fundamentally the problem we face.
So how do we create an environment, be it in the context of this secure data philosophy that you described, Hugo, or others, where we simply have a set of ImageNet-like standards that one can create synthetic datasets against, what have you; an impartial means of judging where the best examples are? Where are the investments that are really delivering massive improvements, and where are the many versions of the same flavor? We just don't have that right now, and it makes it very hard.
John Carroll: Okay.
Alex: So I just wanted to add to that. There are a few ImageNet-like projects, for example, generative chemistry. So there are two currently. So people created large data sets, opened them up, labeled them, created models, and created a leaderboard. But what is interesting is that most of the pharma companies, when they’re partnering, they’re not looking at those results. They’re typically … It’s kind of … I think it’s a little bit more of a crony reputation play at this point in time. So people do not really look at the benchmarks.
Tony Wood: I guess I just respond by saying it’s necessary but not sufficient in our analysis.
Matt Clark: Great. Hi, Matt Clark from Elsevier. One aspect of AI that we haven't discussed much is how it changes making decisions. Right? We have all the AIs, we have all the readouts, but we actually have to make decisions, based on this information, that we wouldn't otherwise have made, in order to change the course. For example, on the point about drug success rates at different clinical trial phases: when I was in big pharma, I read hundreds of project reports, and I never saw a drug team kill its own project.
They’d usually go on a couple years before someone outside said, “Hey, we’ve got to stop this. You can’t save it anymore.” So I would say, what is the aspect of thinking in your organizations for, now that you have this data coming in from AI, using it to change how you make decisions, and make different decisions than you made in the past? How are you changing the organization to account for that, as well as just the raw science of I have a new piece of information coming in?
John Carroll: Saurabh?
Saurabh Saha: Yeah, so I think your question doesn't necessarily have to be scoped just to AI. With just regular statistics, a p-value should be enough to kill a project. You just have to have the mindset to be able to pull the trigger and say, "This is done." That is traditionally very difficult for companies to do. One thing that we've adopted is the notion of truth-seeking over progression-seeking: the notion that from when a target is identified, all the way through when the molecule may already be in the clinic, you never stop validating that target-asset pair. You're constantly validating it.
Whether you’re using AI, whether you’re using regular statistics, whatever the means is, external data, competitor data, academic data, whatever it is, you’re constantly evaluating whether the target asset pair that you’re pursuing at year five is still as compelling and exciting as it was five years ago. And if you can measure up to that standard, then I think we’ll have higher success rates.
John Carroll: Okay. Well, I said we’d get out here sharp, about 8:30, so it’s 8:31. I know you guys have a busy day. I want to thank everybody on our panel here. We’re going to hear a lot more about AI. I’d like to particularly thank PPD Biotech for sponsoring today’s conversation, and hope to see you all later on. Thanks for coming.