Tomorrow's Dose
Edition 17 - AI at the Ceiling: Specificity, Expertise, and the NHS at Scale

Discover where AI lung cancer detection hits a specificity wall, explore what JAMA found when AI faced expert dermatologists in 1,117 real cases, and learn how the NHS is rolling out AI to 505,000 staff.

June 11, 2026

AI detects lung cancer nodules well, but specificity gaps limit its clinical use.
AI outperforms inexperienced dermatologists but falls short of experts in 1,117 real-world cases.
NHS England deploys Microsoft AI to 505,000 staff — 43 minutes saved per person per day.

Featured follow of the week
Top posts of the week across social
Meet the editor
Want a featured article?

Specialty: Radiology // Sub-Specialty: AI // Body Site: Lung

1. AI detects lung cancer nodules well but shows low specificity

A systematic review and meta-analysis published June 3 in Radiology: Artificial Intelligence evaluated 21 studies and 7,454 lung nodules from CT imaging to assess the real-world generalisability of AI models for classifying pulmonary nodule malignancy. Led by a group in the Netherlands, the analysis found a pooled sensitivity of 88% but a pooled specificity of only 75% when AI models were tested against external datasets. The area under the receiver operating characteristic curve (AUROC) was 0.89 and the diagnostic odds ratio 22.4. Heterogeneity across included algorithms was high (I² > 90%), and 81% of studies involved Asian populations. The authors flagged this as a significant generalisability concern for non-Asian clinical settings. The team concluded that current AI models may support rule-out of malignancy in lung nodules but that ‘moderate specificity limits their use for definitive classification of malignant nodules.’
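
As a rough check on what the pooled figures imply, the sketch below derives likelihood ratios and a diagnostic odds ratio from the pooled sensitivity and specificity alone. The function name is illustrative. A value computed this way from point estimates need not match the reported 22.4 exactly, since a pooled odds ratio is typically estimated from study-level data rather than from the pooled sensitivity and specificity.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) for one operating point."""
    # LR+: how much a positive call raises the odds of malignancy.
    lr_positive = sensitivity / (1 - specificity)
    # LR-: how much a negative call lowers the odds of malignancy.
    lr_negative = (1 - sensitivity) / specificity
    # The diagnostic odds ratio is the ratio of the two.
    return lr_positive, lr_negative, lr_positive / lr_negative


# Pooled external-validation figures quoted in the summary above.
lr_pos, lr_neg, dor = likelihood_ratios(sensitivity=0.88, specificity=0.75)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}, DOR = {dor:.1f}")
```

On these inputs a negative result cuts the odds of malignancy by roughly a factor of six, while a positive result raises them by about 3.5, which is consistent with the authors' rule-out framing.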

Paul’s Thoughts:

The specificity figure is the critical number here: 75% means one in four benign nodules is wrongly flagged as malignant. In a high-volume lung screening programme, where most nodules are benign, that translates directly to unnecessary biopsies, patient anxiety, and cost. That is why the authors frame this as a rule-out tool rather than a rule-in tool. At GMI we have seen similar patterns with other deployed AI tools: sensitivity is routinely excellent in validation studies, but specificity (and therefore positive predictive value in low-prevalence populations) is where real-world clinical utility often falls short. The I² greater than 90% is also concerning: it tells you that performance varies enormously across algorithms, and that pooled numbers mask wide variation between products. With 81% of the included studies involving Asian populations, European departments deploying lung nodule AI tools should be asking vendors hard questions about external validation in European, non-screening populations before assuming their specificity figures hold. The risk is purchasing a tool that performs brilliantly in its training environment and fails in yours.
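
The link between specificity and positive predictive value mentioned above can be made concrete with a short sketch. It applies Bayes' rule to the pooled sensitivity (88%) and specificity (75%). The prevalence values are illustrative assumptions, not figures from the review, and the function name is hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)  # share of positive calls that are malignant
    npv = true_neg / (true_neg + false_neg)  # share of negative calls that are benign
    return ppv, npv


SENSITIVITY = 0.88  # pooled figure from the review
SPECIFICITY = 0.75  # pooled figure from the review

# ASSUMPTION: these prevalences are chosen only to show the trend.
for prevalence in (0.01, 0.05, 0.20, 0.50):
    ppv, npv = predictive_values(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:>4.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The pattern to look for is that NPV stays high while PPV falls sharply as malignancy becomes rarer, so the same tool generates proportionally more false alarms in a screening population than in a referred one.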

Timescale: Acute | 1 Year

Specialty: Dermatology // Sub-Specialty: AI // Body Site: Skin

2. AI outperforms inexperienced dermatologists but cannot match experts in real-world skin cancer diagnosis

A large multi-institutional diagnostic study published June 3 in JAMA Dermatology used the Test of Dermoscopy for International Validation (TODIV) platform (1,117 real-world clinical cases representative of everyday dermatology practice) to compare the diagnostic accuracy of modern AI models with that of clinicians across five experience levels. Expert dermatologists with more than 10 years of experience achieved the highest overall accuracy at 74.2%, significantly outperforming all AI systems. AI models performed better than physicians with less than one year of dermoscopy experience, who achieved 59.1% accuracy, but fell behind clinicians at intermediate and senior experience levels. The authors concluded that expert dermatologists remain the reference standard for skin cancer diagnosis, and that future research should focus on improving AI generalisability and evaluating collaborative human-AI workflows rather than autonomous deployment.

Paul’s Thoughts:

This study is important because it quantifies what many clinical AI practitioners have suspected: the value of AI is not uniform across the clinical workforce. Experience is the dividing line, and on these results it sits low: AI outperformed only the clinicians with less than one year of dermoscopy experience, while intermediate, senior, and expert clinicians all surpassed the tool. That is a deployment insight, not a criticism of the technology. The problem is that most AI procurement decisions are made by senior clinicians who are, by definition, in the group that benefits least. The dataset is substantial: 1,117 cases from an international dermoscopy validation platform is a robust real-world test, not a curated benchmark. The open question is whether AI-assisted workflows can shift the performance curve for the 59%-accuracy group toward the expert range. The answer to that will determine whether AI in dermatology becomes a genuine training tool or simply a safety net for the least experienced. If it is the former, that is a compelling argument for rolling it out in primary care and general practice first, rather than specialist settings.

Timescale: Early | 3 Years

Specialty: All // Sub-Specialty: AI // Body Site: NA

3. NHS England rolls out Microsoft 365 Copilot to 505,000 staff in largest healthcare AI deployment in UK history

NHS England announced on June 7 that it is deploying Microsoft 365 Copilot to 505,000 clinicians and support staff across health and care services. This represents the largest healthcare AI deployment in UK history. The rollout follows a pilot across more than 30,000 NHS workers in 90 organisations, which found that AI-assisted administrative support saved an average of 43 minutes per staff member per day, equating to approximately five weeks of additional capacity per person annually. NHS England estimates that a full rollout could return millions of clinical hours per year to patient-facing care. The deployment is focused on administrative and documentation tasks (drafting correspondence, summarising patient notes, scheduling support) and does not involve autonomous clinical decision-making. The announcement reflects NHS England’s commitment to embedding AI into core digital infrastructure as part of its ten-year plan for productivity improvement.
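
The step from 43 minutes a day to weeks of capacity is simple arithmetic, sketched below. The working-day and weekly-hours figures are assumptions chosen for illustration (the announcement as summarised here does not say which were used), so treat the output as a sanity check rather than a reproduction of the NHS calculation.

```python
def weeks_saved_per_year(minutes_per_day, working_days_per_year, hours_per_week):
    """Convert a daily time saving into working weeks per year."""
    hours_saved = minutes_per_day * working_days_per_year / 60
    return hours_saved / hours_per_week


MINUTES_SAVED_PER_DAY = 43  # pilot figure quoted above
HOURS_PER_WEEK = 37.5       # ASSUMPTION: a standard full-time working week

# ASSUMPTION: 260 counts every weekday; 225 allows for leave and bank holidays.
for working_days in (225, 260):
    weeks = weeks_saved_per_year(MINUTES_SAVED_PER_DAY, working_days, HOURS_PER_WEEK)
    print(f"{working_days} working days -> {weeks:.1f} weeks saved per person")
```

With these assumed inputs, counting every weekday gives a figure close to five weeks, while allowing for leave gives a little over four, so the headline number is sensitive to how working days are counted.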

Paul’s Thoughts:

The 43-minutes-per-day figure is credible, with similar savings being reported in NHS Copilot pilots at individual trusts. However, 43 minutes for a band 3 administrator is a very different proposition from 43 minutes for a consultant radiologist. The aggregate claim of millions of hours saved is compelling at a headline level, but the question that matters for clinical AI advocates is: does this administrative dividend actually free up time for clinical AI implementation, governance, and surveillance work, or does it simply get absorbed into existing workloads? I’ve seen both outcomes in healthcare settings. There is also a structural risk worth naming: administrative AI is politically safe, demonstrably efficient, and relatively easy to deploy at scale. Clinical AI, which carries liability, requires rigorous validation, and challenges clinical authority, is none of those things. The risk is that the NHS uses Copilot as evidence of AI leadership while harder clinical AI decisions are quietly deprioritised. The 505,000-staff rollout is impressive. What matters now is what comes next in the clinical pipeline.

Timescale: Acute | 1 Year

Roxana Daneshjou, MD, PhD

Assistant Professor of Dermatology and Biomedical Data Science, Stanford University School of Medicine | Faculty Affiliate, Stanford HAI

Follow Roxana Daneshjou for some of the most rigorous and widely cited work on AI validation, bias, and equity in dermatology. This week’s edition connects directly to her agenda: the JAMA Dermatology study in Story 2 surfaces exactly the kind of real-world performance gap her validation and equity research has been working to expose and address.

A round-up of some of the best posts we found online this week.

WHO publishes first discussion paper on AI and evidence-informed health policy, flagging risks of bias, epistemic injustice, and regulatory gaps in AI-assisted policymaking.

AI-assisted navigation lifts lung cancer screening uptake from 18% to 42% over five years at a 17-hospital US health system. Presented at ASCO 2026.

Microsoft and Mayo Clinic partner to build AI trained on clinical expertise to help patients understand their diagnoses and support physician decision-making.

Was this email forwarded to you?
Our weekly email brings you the latest health trends and insights, combining top news and opinions into a straightforward, digestible format.

Want an article featured?

Have an insightful link or story about the future of medical health? Reach out below, and we may include it in a future release.
