Most R&D leaders do not need another lecture about attrition. They need a clearer view of why apparently solid preclinical packages still fail to travel into early clinical success. The issue is rarely that any single model is wrong. It is more often that good tools are applied out of clinical context, exposures are optimistic, biomarkers do not port into real protocols and busy sites, and governance tolerates ambiguous signals longer than it should.
Recent outcomes make the point. Late last year, Roche reported that its anti-TIGIT antibody tiragolumab failed to improve overall survival in the Phase 3 SKYSCRAPER-01 study in PD-L1-high non-small cell lung cancer when combined with atezolizumab, and subsequent communications through 2025 reinforced how fragile the translational signal proved without sharper patient selection and assay portability. This is not a verdict on the entire TIGIT class. It is a reminder that preclinical evidence must be designed to answer the same clinical question a pivotal program is actually asking. The biology can be elegant and the execution flawless, yet if the selection rule, cutoff logic, and clinical sampling plan were never truly stress-tested upstream, the bridge can still collapse.
Macrophage checkpoint strategies faced their own resets. Earlier in 2024, Gilead discontinued the Phase 3 ENHANCE-3 study of magrolimab in acute myeloid leukemia, and on the same day the FDA placed related magrolimab AML and MDS studies on full clinical hold. Again, this is not an indictment of anti-CD47 as a concept. It is a caution that even well-grounded mechanisms stall when three things are not aligned before first-in-human: patient-relevant biology, human-achievable exposure, and biomarkers that are statistically sound and operationally usable across real clinical sites.
The practical question is what these and other examples teach about the decisions that precede a first dose in humans.
First, biology mismatch. Models often underrepresent pretreated patient histories and resistance mechanisms that define modern lines of therapy. Effects seen in naive systems can vanish once prior therapy scarring, stromal dynamics, or immune context are present. This is not an argument to discard faster models. It is an argument to ask whether bench biology resembles the patient and protocol under consideration. If a Phase 1 plan will enroll heavily pretreated patients on a checkpoint backbone, the evidence that advances a program should be generated in model systems that encode that reality, or it should explicitly acknowledge the gap.
Second, exposure optimism. Many preclinical effects are generated at concentrations that are difficult to reach safely in humans. Teams sometimes treat mouse maximum tolerated doses as permission rather than caution. Early pharmacokinetics and tissue distribution provide guardrails, but only if the planned human dose and schedule are part of the conversation from the start. If the preclinical effect sits above what a first-in-human protocol can plausibly achieve, that is an insight, not a failure. It should prompt a redesign, a backbone change, or a stop.
Third, non-portable biomarkers. Discovery-phase signatures often collapse during transfer to clinical labs. Cutoffs drift. Assay formats change. Site workflows cannot accommodate sampling plans designed in isolation from protocol reality. The scientific signal may be real, yet operationally unusable. Publication bias toward positive translational stories does not help here. The field needs more honest accounts of biomarkers that did not survive site-level realities and why.
There is one shift that improves translation across modalities. Treat preclinical evidence as if it must stand up inside an actual clinical protocol. That mindset is not new, but it is not yet universal. It demands three habits that do not require a procedural manual.
Start with the patient and the protocol, not the platform. Name the intended line of therapy, the likely backbone, the dose window that investigators will test, and the sampling that busy sites can execute. When those anchors are explicit, model choices become less ideological and more about fit. The point is not to prove a model is superior. The point is to show that this model answers this clinical question under these constraints.
Demand exposure plausibility, not just efficacy. Link preclinical effects to human-achievable exposures using early PK and simple modeling. Insist on a plain English verdict. Either the effect sits inside a credible clinical range, or it does not. If it does not, acknowledge it and explain the path to plausibility, for example a schedule change, a delivery strategy, or a combination that raises the free exposure at the target site.
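To make that verdict concrete, the sketch below (in Python, with entirely hypothetical numbers for the dose, PK parameters, protein binding, and the preclinical effective concentration) compares a predicted human free Cmax from a simple one-compartment oral model against the free concentration associated with efficacy at the bench. Nothing here is a validated model; it is only the shape of the comparison a decision forum should see.

```python
# Minimal sketch of an exposure-plausibility check. All values are
# hypothetical placeholders, not data from any real program.

import math

# Hypothetical preclinical anchor: free (unbound) concentration associated
# with efficacy in the model system, in ng/mL.
preclinical_effective_free_conc = 850.0

# Hypothetical predicted human PK (one-compartment, first-order absorption).
dose_mg = 200.0          # planned first-in-human dose
bioavailability = 0.5    # assumed oral bioavailability
vd_l = 70.0              # assumed volume of distribution
ka_per_h = 1.0           # absorption rate constant
ke_per_h = 0.1           # elimination rate constant (t1/2 ~ 6.9 h)
fu = 0.05                # assumed fraction unbound in human plasma

def conc_ng_ml(t_h: float) -> float:
    """Total plasma concentration (ng/mL) at time t for a one-compartment oral model."""
    amount_ng = dose_mg * bioavailability * 1e6          # mg -> ng
    c0 = amount_ng / vd_l / 1000.0                        # ng/L -> ng/mL
    return c0 * (ka_per_h / (ka_per_h - ke_per_h)) * (
        math.exp(-ke_per_h * t_h) - math.exp(-ka_per_h * t_h)
    )

# Predicted Cmax (a coarse time grid is enough for a sketch), then free Cmax.
times = [i * 0.1 for i in range(1, 480)]
cmax_total = max(conc_ng_ml(t) for t in times)
cmax_free = cmax_total * fu

ratio = cmax_free / preclinical_effective_free_conc
print(f"Predicted free Cmax: {cmax_free:.0f} ng/mL")
print(f"Preclinical effective free concentration: {preclinical_effective_free_conc:.0f} ng/mL")
if ratio >= 1.0:
    print(f"Coverage ratio {ratio:.2f}: predicted free exposure covers the preclinical effective level.")
else:
    print(f"Coverage ratio {ratio:.2f}: the preclinical effective level sits above predicted free exposure.")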
Design for operational biomarkers. If a biomarker will guide selection or interpretation, lock the assay format early, define the cutoff logic, and test whether it survives inter-lab variability and site workflows. If it does not, either rework it or remove it from the decision story. Selection rules that cannot be executed reliably at speed will fail the study even when the mechanism is sound.
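One way to pressure-test that requirement before any site is activated is a simple simulation of cutoff stability. The sketch below uses hypothetical numbers throughout (the cutoff, the between-lab bias, the within-lab noise) and asks how often a sample with a given true value would be classified differently across labs. It is an illustration of the question, not a substitute for formal analytical validation.

```python
# Minimal sketch of a cutoff-stability check under inter-lab variability.
# All parameters are hypothetical placeholders.

import random

random.seed(7)

CUTOFF = 50.0          # proposed selection cutoff (hypothetical units)
LAB_BIAS_SD = 4.0      # assumed between-lab systematic shift
WITHIN_LAB_SD = 3.0    # assumed within-lab measurement noise
N_LABS = 10
N_REPLICATES = 20

def classification_flip_rate(true_value: float) -> float:
    """Fraction of simulated measurements that disagree with the reference call."""
    reference_call = true_value >= CUTOFF
    disagreements = 0
    total = 0
    for _ in range(N_LABS):
        lab_bias = random.gauss(0.0, LAB_BIAS_SD)
        for _ in range(N_REPLICATES):
            measured = true_value + lab_bias + random.gauss(0.0, WITHIN_LAB_SD)
            if (measured >= CUTOFF) != reference_call:
                disagreements += 1
            total += 1
    return disagreements / total

# Samples far from the cutoff classify stably; samples near it do not.
for true_value in (30.0, 45.0, 50.0, 55.0, 70.0):
    rate = classification_flip_rate(true_value)
    print(f"true value {true_value:5.1f}: {rate:5.1%} of calls disagree with reference")
```

Samples well away from the cutoff classify consistently; samples near it flip often, which is exactly where a selection rule quietly fails once it leaves the discovery lab.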
The industry has spent a decade arguing over model tribes. That debate has limited value for decision-makers. Cell-line xenografts enable speed and mechanism hypotheses. Patient-derived models and ex vivo systems capture resistance biology and dosing realism. Organoid and co-culture formats can probe heterogeneity and immune interactions. In silico components bound feasibility and inform dose logic. None of these is a universal answer, and all of them can be misused. The portfolio conversation should ask a simpler question: what clinical decision are we trying to inform now, and what evidence, if positive, would change the next step in a trial plan?
Regulators continue to signal interest in methods that better predict human outcomes. In April 2025, the FDA announced a plan to accelerate the use of New Approach Methodologies, including computational models and human-relevant in vitro systems, to supplement or replace certain animal tests where appropriate. In July 2025, the agency’s New Alternative Methods program site outlined how these approaches are being implemented and encouraged in regulatory submissions. The message is not novelty for novelty’s sake. It is fitness for regulatory purpose and better predictivity. That should encourage teams to elevate quality over volume in preclinical plans. In many programs, fewer studies, selected and sequenced against a clinical question, improve clarity more than larger batteries of loosely connected experiments.
A good preclinical story reads like the first chapter of a clinical protocol. It defines the human context plainly. It shows an effect where pretreatment and resistance are represented. It ties that effect to exposure that humans can reach at a feasible dose and schedule. It presents a biomarker that a site can run with stable performance and a clear decision rule. It identifies what would stop the program now, not six months later. Sponsors can disagree about which tools belong in the portfolio. They should not disagree about the standard for what counts as portable evidence.
The negative case is equally instructive. When a team discovers that its effect sits beyond credible human exposure, or that a marker cannot be ported without erasing the signal, stopping is a sign of strength, not weakness. The cost saved is not theoretical. It is a trial avoided, a cohort not exposed, and a budget preserved for programs with a better chance to help patients. In that light, negative results are not reputational hazards. They are demonstrations that governance is working.
It is tempting to overfit lessons to a single program. That would be a mistake. Instead, treat named outcomes as prompts for better questions. From Roche’s November 2024 SKYSCRAPER-01 readout with tiragolumab in PD-L1-high NSCLC, ask whether the patient selection rule and assay portability were locked early enough to survive global execution pressure. From Gilead’s February 2024 ENHANCE-3 discontinuation in AML and the concurrent magrolimab holds, ask whether exposure feasibility and site-level biomarker performance were challenged hard enough when the preclinical case was most compelling. These questions are uncomfortable, and they are the right ones.
Three collective moves would help. First, create more transparency around biomarker portability. Industry consortia and publishers can encourage brief, practical negative reports when assays fail to transfer. Second, improve how we talk about exposure. A simple figure that compares the preclinical effect level to predicted human Cmax or AUC should be standard in early decision forums. Third, reward teams for stopping early. Governance cultures that celebrate early no-go calls will end up with better first-in-human studies and fewer late disappointments.
Oncology has never had more tools, data, and ambition. That richness can become noise if it is not organized around the decisions that matter most. Evidence that survives the clinic is not a flourish. It is a standard. When preclinical work is planned and judged against clinical reality, fewer programs will advance on wishful readings of the data, and more will reach first-in-human with a real chance to show benefit. The examples of tiragolumab and magrolimab do not prove a single doctrine. They prove a point about discipline. Translation is fragile. It demands respect for biology as patients experience it, respect for exposure as humans can tolerate it, and respect for evidence that can be executed in the real world of clinical care. That is the work. It is also the opportunity.