A simple patient survey revolutionized ALS drug development in the 1990s. Its creator says it may be time for an upgrade
It’s been another tough year for ALS therapies. New trials that sparked patient hopes either fell short of statistical significance or failed to provide enough evidence for the FDA to recommend moving forward without another look. The end result is largely more of the same for patients — little progress in a fatal disease with only two approved treatments.
That frustrating slog has stirred passionate debate within the patient and advocacy communities, largely around things like trial design, compassionate use programs or even what the bar for approval should be. Long overlooked in those discussions, however, is a tool used to evaluate every ALS therapy that’s drawing increasing scrutiny.
At first, in the 1990s, this tool changed the landscape in a big way. A group of ALS researchers led by then-Regeneron exec Jesse Cedarbaum came together to develop a rating scale to measure how quickly the disease progresses, and whether experimental drugs could affect it meaningfully. Known as the ALS Functional Rating Scale, later revised in 1999, this scale has become one of the primary methods biopharmas have used to try to develop ALS drugs over the last 25 to 30 years, and is seen by the FDA as a highly useful tool when evaluating whether or not the experimental drugs actually work.
But there’s been exasperation within the ALS patient community over the scale itself, particularly given the heterogeneous nature of the disease. Because the disease can manifest in a variety of ways, two patients can be measured at exactly the same point of progression while presenting completely different symptoms. There are also complaints over the lack of granularity in some aspects of the ALSFRS-R, with patients and some researchers saying the scale may not accurately portray how quickly some patients’ disease progresses.
And now Cedarbaum, too, says it may be time to produce a more modern version of the scale.
“It’s proved to be very, very robust, which pleases me a lot, but it’s not a perfect scale,” Cedarbaum said. “We know so much more about the disease now, medical care has changed, society has changed. So many things have changed that it may be time for an upgrade.”
Developing the scale
At its core, the ALSFRS-R is a patient survey. It comprises several questions that gauge how far a patient’s disease has progressed in 12 different categories under three main umbrellas: speech, mobility and breathing difficulty. Each category is then scored from 0 to 4, with 0 representing the complete need for assistance and 4 denoting no help needed. In clinical trials, the questions are typically asked once a month.
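The arithmetic behind that description can be sketched in a few lines of Python. This is only an illustration, not the instrument itself: the 12 item names below follow the published questionnaire as it is commonly listed rather than anything quoted in this article, and the function name is made up for the example. With 12 items scored 0 to 4, the total runs from 0 to 48.

```python
# Illustrative sketch only. Item names are an assumption taken from the
# commonly published ALSFRS-R questionnaire, not from this article.
ITEMS = (
    "speech", "salivation", "swallowing",
    "handwriting", "cutting_food", "dressing_hygiene",
    "turning_in_bed", "walking", "climbing_stairs",
    "dyspnea", "orthopnea", "respiratory_insufficiency",
)

MIN_SCORE, MAX_SCORE = 0, 4  # 0 = fully dependent, 4 = no help needed


def total_score(responses: dict) -> int:
    """Sum the 12 item scores after checking each is an integer from 0 to 4."""
    if set(responses) != set(ITEMS):
        raise ValueError("responses must cover exactly the 12 items")
    for item in ITEMS:
        value = responses[item]
        if not isinstance(value, int) or not MIN_SCORE <= value <= MAX_SCORE:
            raise ValueError(f"{item}: score must be an integer from 0 to 4")
    return sum(responses[item] for item in ITEMS)


# A hypothetical respondent with no functional loss scores the maximum.
no_loss = dict.fromkeys(ITEMS, MAX_SCORE)
print(total_score(no_loss))  # 48
```

A single total is what makes the scale easy to use as a trial endpoint: one number per patient per visit.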
When Cedarbaum first started at Regeneron in 1990, there hadn’t been any sponsored ALS trials conducted up to that point. There was a “hodgepodge” of other surveys floating around, asking patients questions about their symptoms and quality of life, but it wasn’t clear how they related to one another.
“They didn’t carry equal weight in summing things up,” Cedarbaum said.
In order to launch an ALS trial for a Regeneron program, Cedarbaum wanted to create something that could measure the course of patients’ diseases and how people are functioning in meaningful ways. He set out to ensure these assessments would be correlated with both survival and muscle strength, the basic physiological symptom of ALS.
Working this out was no easy task, given all the variables associated with the disease, and he assembled a cast of characters that included scientists, clinicians and physical therapists from leading ALS centers across the country to solve this problem. They asked themselves a question: How can we build a single rating scale?
The group settled on two potential models on which to base a new scale. The first was something called the ALS Severity Scale, developed in the late 1980s by researchers at the University of Washington. This scale attempted to evaluate ALS symptoms numerically in four categories: speech, swallowing, lower extremity, and upper extremity abilities. Generally, it had been used in combination with a device measuring a patient’s breathing to try to paint as accurate a picture as possible of an individual’s disease.
The other model proposed was based on a rating scale used at the time in Parkinson’s disease, shorthanded as the UPDRS. This relied more heavily on patient responses than the ALS Severity Scale, asking those with Parkinson’s about all aspects of daily living: walking, for example, or maneuvering in bed and cutting food. These things also proved important to ALS patients to determine how they might be feeling from one day to the next.
Borrowing from other fields is something that happens all the time in neurological diseases, Cedarbaum said. Even though different diseases can impair different parts of the body, basic daily tasks are the same.
“Human beings actually have a very limited physical repertoire, unless you’re Tom Cruise or an Olympic gymnast,” Cedarbaum said. “Most of us do the same basic things day in and day out. That’s what the FDA is interested in when talking about function.”
Out of these efforts came the first iteration of the ALS Functional Rating Scale. The group piloted the scale in a natural history study, putting it through what Cedarbaum said was a “very exhaustive clinimetric evaluation.” Not only did it correlate well with how quickly patients progressed, but the individual domains also correlated reasonably well with muscle strength and across sections like lung function and ability to walk. (Cedarbaum notes he wasn’t listed as an author on this study but said it includes the same team with whom he developed the scale.)
Within a few years, the scale started becoming a standard feature in ALS drug trials, and by 1994 or 1995 the FDA’s deputy head of neurology publicly endorsed the scale, Cedarbaum said. A revision was made soon after to more adequately represent respiratory function, adding three questions to a section that had previously had only one. That final product became the ALSFRS-Revised and it has remained “basically unmodified” since 1999.
The scale became widely used as the primary endpoint in late-stage ALS studies. Closely watched trials from Biogen and Amylyx both rely on it, as does BrainStorm Cell Therapeutics’ widely panned Phase III study. A prominent “platform study” at Mass General, seeking to evaluate a swath of therapies at once, also uses the scale.
Validated, but not perfect
As more and more biopharma companies began conducting ALS drug trials, the ALSFRS-R continued showing correlations between disease progression and the effectiveness of those experimental drugs, Cedarbaum said. Many leading clinical researchers, including Mass General’s Sabrina Paganoni, who led a Phase II trial for a drug being developed by Amylyx, vouch for its effectiveness and hold the scale up as the gold standard of ALS drug development.
“Studies have shown over and over that it’s one of the most scientific ways to detect ALS disease progression in response to treatment,” said Paganoni. “There are some challenges with it, however as of today it is a very helpful measure and one of the best ways to assess clinically meaningful effects on a patient population.”
The scale even became a helpful communication tool at first, Cedarbaum said, with patients able to tell doctors at their appointments how they were feeling on a given day using some sort of guideline. When patients met other patients, they could introduce themselves with their ALSFRS-R score.
A typical conversation might have gone something like, “Hi, I’m so-and-so, my score this week is 36, it’s been stable at that level for three months, this is what my doctor has done,” Cedarbaum said. “Does anyone have ideas when it’s getting worse in this way?”
But the challenges Paganoni alludes to, patients say, add to the already immense burden placed upon ALS patients, who have to deal not only with the disease itself but also with clinical trials and therapies that can be physically strenuous.
Gwen Petersen was just 32 years old when she received her ALS diagnosis, after more than a year of struggling with balance and coordination. She recalled her honeymoon in the Italian Alps where she struggled to navigate hiking the trails and, upon returning home, saw a host of different doctors to try to figure out what was going on. They initially attributed her problems to anxiety and orthostatic hypotension, a chronic, extended version of the drop in blood pressure one experiences when standing up too quickly.
When doctors determined ALS to be the cause in 2018, Petersen immediately got to work on a journey she says many patients and families face after diagnosis — performing an exhaustive internet search of experimental therapies and ALS clinical trials, while sending a flurry of emails and phone calls to participating centers. For Petersen, who now spends her time working with the I Am ALS advocacy group, that journey brought her to BrainStorm and their Phase III study for the cell therapy program known as NurOwn.
While participating in the NurOwn trial — a process that required seven lumbar punctures conducted in specialized clinics with three needing overnight hospital stays — Petersen says she became disillusioned with the scale when she realized how it tries to homogenize a heterogeneous disease. Because the disease can manifest in a variety of ways, two patients can score exactly the same on the ALSFRS-R while presenting completely different symptoms, she said.
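Petersen’s point about identical scores can be shown with two made-up profiles. The sketch below is hypothetical throughout: the item names are an assumption taken from the commonly published questionnaire, and neither profile describes a real patient. One profile loses points on speech and swallowing, the other on mobility, yet both sum to the same total.

```python
# Hypothetical illustration: item names are assumed from the commonly
# published ALSFRS-R questionnaire; both profiles are invented.
ITEMS = (
    "speech", "salivation", "swallowing",
    "handwriting", "cutting_food", "dressing_hygiene",
    "turning_in_bed", "walking", "climbing_stairs",
    "dyspnea", "orthopnea", "respiratory_insufficiency",
)

full_function = dict.fromkeys(ITEMS, 4)

# Profile A: losses concentrated in speech and swallowing.
profile_a = {**full_function, "speech": 1, "salivation": 2, "swallowing": 1}

# Profile B: losses concentrated in mobility.
profile_b = {**full_function, "turning_in_bed": 2, "walking": 1, "climbing_stairs": 1}


def differing_items(a: dict, b: dict) -> list:
    """Return the items, in questionnaire order, on which two profiles differ."""
    return [item for item in ITEMS if a[item] != b[item]]


print(sum(profile_a.values()))  # 40
print(sum(profile_b.values()))  # 40
print(differing_items(profile_a, profile_b))
```

The totals match while half of the 12 items differ, which is the homogenizing effect Petersen describes.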
Petersen further described the test as “painfully subjective,” claiming scores can vary depending on the person asking the survey questions, the kind of day patients are having or the time of day itself.
“If you asked me, ‘How’s your speech?’ normally I would say it’s slow but intelligible,” Petersen said. “If you asked me the same question after a long day of meetings when my voice is fried and I need to repeat myself, well that’s a different answer.”
The scale’s lack of granularity has also bothered Phil Green, another I Am ALS advocate and BrainStorm patient who took part in the Phase III. Green, also diagnosed in 2018, told Endpoints that not having fractional answers for the questions doesn’t give an accurate sense of how quickly someone’s disease can progress. One person can walk for a mile and be completely fine, but a month later become exhausted simply walking outside to their mailbox. Those two cases, Green said, will score the same on the ALSFRS-R ambulatory question.
“I like to use a fruit analogy,” Green said. “If we think ALS is an apple, you may have different varieties of apple that taste differently, may rot or ripen at different rates, have different colors, different sizes. But we’re all considering them to be the same because they’re all apples, and that’s not true.”
Paganoni pushed back on some of these concerns, pointing to studies that suggest ALSFRS-R can reliably show whether a drug slows disease progression. Clinicians who ask the questions undergo rigorous training in order to use the scale the way it was intended, she said. And the lack of available ALS treatments is not a result of the scale’s shortcomings.
“I’m not saying that the ALSFRS-R is perfect and we should never move forward,” Paganoni said. “It would be great to develop more sensitive scales, as well as more scientific biomarkers… there is a lot of work going in this direction, but it’s not that easy. That’s why we don’t have an alternative yet.”
Addressing the concerns
Cedarbaum, too, said he’s aware of these complaints and has been trying to push the ALS Association to convene a conference or meeting to review and update the ALSFRS-R. He noted that the Parkinson’s scale was updated in 2008 based on changes recommended by the Movement Disorder Society with the scale continuing to be revised — most recently in 2019.
“One thing the field could really, really benefit from,” Cedarbaum said, “is coming together and having expert groups of patients and not just physicians but clinicians, other people like OTs, PTs, speech therapists, who work with ALS patients all the time, come together and say, ‘OK, this was a great scale for the ’90s and the first quarter of the 2000s, but we can do better.’”
He stressed that it’s important to find the correct language for questions determining what might exist between a 3 and a 4 on the scale, and floated how a 3.5 could represent patients struggling to eat certain foods but not having progressed to the dropping of a full point. Such questions ideally would be crafted with heavy patient input, which is a more modern way of doing things compared to when the ALSFRS-R was first developed in the 1990s, he said.
“This is the kind of thing that can be learned by talking to patients,” Cedarbaum said. “Let’s go back and find people who were recently diagnosed, or talk to people earlier in their disease, and say, ‘Let’s go back to before your diagnosis, what was going on in that year or two? What were you thinking you were having problems with? What were you telling people?’
“We can talk with spouses or caregivers,” he continued. “‘What was your spouse or loved one telling you wasn’t quite right about what was going on?’ And then we can use that information to frame out maybe new gradations in that early phase of the disease that will help. These are the kinds of things that would need to be considered in any upgrade.”
As the questions are currently constructed, the scale exhibits phenomena known as “floor and ceiling effects,” said Merit Cudkowicz, another Mass General researcher and one of the lead investigators of the BrainStorm study. Simply put, the ALSFRS-R tracks the disease less well in patients with very early-stage or very late-stage disease. But Cudkowicz says in the middle, where most trial participants exist, the scale does show meaningful changes.
Cedarbaum added that there simply aren’t any data from people who are in these stages of the disease — all of the data on the scale comes from clinical trials, he said, which recruit patients with specific inclusion and exclusion criteria.
Cudkowicz acknowledged BrainStorm patients’ complaints around the scale, and said that as someone who helped design the trial, she could take the heat. BrainStorm’s study notably missed its primary goal and the company was directly rebuked recently by the FDA for touting a study that missed every endpoint. Cudkowicz, though, says the flop was likely due more to a trial design failure than to the ALSFRS-R’s shortcomings. Still, the way patients were recruited and placed into a three-month lead-in period before being randomized to treatment or placebo only compounded some of the scale’s faults.
“It was actually designed to pick people who were progressing fast in order to be able to pick up a positive treatment effect in six months,” Cudkowicz said. “But the combination of the lead-in and picking fast progressers meant that by the time people were treated, they were in a much sicker state than would have been optimal to start treatment.”
BrainStorm claimed to see a potentially positive signal among patients who hadn’t yet progressed far with their diseases. Even though the FDA has publicly disavowed the Phase III data, BrainStorm is pursuing an approval in this subpopulation and declined to comment on the NurOwn Phase III trial specifically, given the pending regulatory discussions.
But Cedarbaum noted there’s a distinct, even essential, advantage in distilling a heterogeneous disease into a single number. The scale arose out of the need for drugmakers to be able to precisely determine how their experimental therapies are working. When biopharmas ultimately have to present their findings to regulators, he said, they can only use one number to show statistical significance.
While Cedarbaum recognizes patients’ misgivings over this aspect of the ALSFRS-R, he said prognoses among similarly scoring individuals are relatively comparable, regardless of how the disease manifested or how symptoms are presenting.
“The attempt to homogenize is prized for clinical trials,” Cedarbaum said. “The patients might feel like it puts them in a meat grinder, but that’s done deliberately so statisticians can deal with it sufficiently in communicating with FDA.”
The path forward
Many of the patients’ complaints stem from the fact that there are no consistent biomarkers in ALS, leaving drugmakers to use the ALSFRS-R or some other measurement instead, Green said. One of the most promising biomarkers that could prove to be the missing piece is something called neurofilament. Normally found in nerve cells, it’s a protein that leaks into blood and cerebrospinal fluid when those cells become injured.
Much like cholesterol is elevated in individuals with heart disease, neurofilament levels in the bloodstream and CSF have generally been found to be higher in ALS patients, Cudkowicz explains. Many ALS studies already look at neurofilament levels as secondary measures, but there hasn’t been much work yet in targeting these proteins specifically.
The trial that’s come the closest to showing a relationship is Biogen’s Phase I/II for an experimental drug called tofersen, Cudkowicz said, which showed a clinical benefit in addition to about 50% lower neurofilament levels. But that study was very small with only four people tested on the high dose, she added, and even though it looked like there wasn’t much progression compared to the placebo, it’s still “way too small” for anyone to say.
BrainStorm had done extensive biomarker research before conducting its studies, and it’s one of the things that impressed Petersen and Green before they signed up for the experimental therapies. Cudkowicz noted their trial saw lowered levels of neuroinflammation — something BrainStorm says correlated to a potential clinical benefit in that small group — despite NurOwn’s major whiffs on the primary and secondary endpoints.
In the Phase III NurOwn study, researchers looked at three different kinds of biomarkers, BrainStorm president and CMO Ralph Kern said. The first looked at neuroinflammation by examining levels of cytokines and effector molecules; the second examined neuronal injury and neurodegeneration biomarkers, the umbrella containing neurofilament; and the third explored how repair molecules delivered appropriate cargo to injured cells.
Though each biomarker serves a different purpose, when taken together they paint a “comprehensive picture,” Kern said.
And that biomarker research is also one of the reasons Green feels that the Phase III study shouldn’t be discounted as a failure. If the NurOwn therapy can show a potential benefit for a small group of patients, Green argues, then BrainStorm and the FDA should be doing everything in their power to make it available to that group of patients. In ALS, every little thing helps, he said.
“From an overall systemic perspective, we really need to rethink what a win looks like in ALS clinical trials,” Green said. “It’s time that we have a working meeting with all of the stakeholders to really figure out how to fix the therapeutic development process in ALS, because we’ve had so many failed trials in our disease. But maybe it’s not the therapies that are failing, it’s the process in which they’re being evaluated that’s failing the therapies.”
Not everyone is on board the neurofilament train, however, with the folks at Amylyx expressing doubt that it might be the key researchers are looking for. Finding a clinical biomarker for any neurological disease would amount to a “holy grail,” CEO Joshua Cohen said, but added there’s still a lot more work that needs to be done for something to definitively replace the ALSFRS-R.
It’s worth noting that Amylyx’s program, which hit statistical significance in the ALSFRS-R scale and will be submitted for approval in Europe and Canada by the end of the year, showed no correlation between slowing progression and neurofilament levels — a fact Cohen offered up freely.
But the question for him comes down to how much stock you then put into the measurement tests. If a drug is shown to improve neurofilament levels, or another biomarker, but not to slow disease progression according to the ALSFRS-R, is that something that should be given to patients?
“At the end of the day, if the drug is not impacting people’s daily functions — if it’s impacting the blood biomarker but not the function — I’m not sure that’s optimal,” Cohen said. “I think what would really make a biomarker convincing is one impacting that biomarker would also correspondingly impact that function. And I think to date we just don’t have that.”
Going forward, companies and investigators are continuing to tweak and experiment with ALS trial designs with the aim of including more patient input. There are other methods out there being developed to provide either a more granular rating scale or a more comprehensive look at how drugs correlate to symptoms, including a newer rating survey called the Rasch-Built Overall Amyotrophic Lateral Sclerosis Disability Scale, or ROADS. Petersen and Green both feel ROADS is less subjective than the ALSFRS-R and are trying to push companies to include it in future drug trials.
Clene Nanomedicine, a biotech developing an ALS treatment made from gold nanocrystals, is trying to supplement potentially registrational data from the HEALEY platform study with another trial in Australia using an approach called MUNIX(4). Here, researchers place electrodes on patients’ muscles and compare how they contract with electric stimulation to how patients can move them on their own.
Clene noted that most US companies don’t use MUNIX anymore because it’s much more time-consuming and expensive to administer compared to something like the ALSFRS-R, which is relatively cheap and only takes about 10 to 15 minutes.
“In a way, if you think about HEALEY like this, HEALEY’s looking at their secondary [endpoint] as lung function and handheld dynamometry,” Clene CEO Rob Etherington said. “And that’s effectively akin in a US model to what MUNIX is. So we see some interesting inverse correlations between these two data.”
One potential biomarker that’s piqued Cedarbaum’s interest is MRI of patients’ brains and spinal cords, because it could give researchers a better idea of when neurons start becoming damaged or dying off. It’s a method that’s gained prominence in studies of Alzheimer’s disease, and such cell loss would result in a quantifiable shrinking of the brain in ALS patients, he said.
But the ALSFRS-R likely isn’t going away anytime soon. Cedarbaum feels similarly to Amylyx, saying that he doesn’t believe something like neurofilament or another biomarker will ever fully replace the scale, either in its current form or an updated version. There is too little known about how neurofilament directly correlates with how patients feel, which Cedarbaum puts atop his personal list of priorities.
Though a drug may keep neurofilament down for an extended period of time (Cedarbaum offered a year or two as an example), it’s not yet clear whether ALSFRS-R scores will continue going down. And when all is said and done, Cedarbaum feels the best way to find out whether a drug or therapy is helping patients is by asking them.
“The bottom line is the ALSFRS-R is a good scale, it’s clearly good enough because now we see it can show treatment differences,” Cedarbaum said. “But a lot has happened in 25 years that a new version of the scale could improve on.”