Calibrating Trust in Diagnostic Copilots
AI-118 · 2024 · Clinical AI

Year: 2024
Duration: 9 months
Cohort Size: 48 radiologists
Headline Result: +41% appropriate reliance
Overview

A leading diagnostic AI vendor wanted to ship a lesion-detection copilot for chest CT, but pilot deployments showed a worrying pattern: radiologists either over-trusted high-confidence predictions or ignored the system altogether. We were brought in to characterise and fix this trust calibration problem.

Challenge

Confidence scores expressed as percentages were systematically misinterpreted. Radiologists treated 85% confidence as 'definitely there' and 60% as 'almost certainly not.' This binarisation defeated the calibration the model team had carefully built.
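To make the gap concrete, the sketch below contrasts what a calibrated score means with the yes/no reading described above. It is illustrative only: the function names and the 0.75 cutoff are assumptions, since the study reports how 85% and 60% were read but not where the mental cut-off fell.

```python
def expected_true_lesions(confidence: float, n_flagged: int) -> float:
    """Under perfect calibration, the number of real lesions expected
    among n_flagged findings that all carry this confidence score."""
    return confidence * n_flagged


def binarised_reading(confidence: float, cutoff: float = 0.75) -> str:
    """Collapse a graded score into a yes/no judgement.
    The cutoff is a hypothetical value chosen for illustration."""
    return "present" if confidence >= cutoff else "absent"


for confidence in (0.85, 0.60):
    real = expected_true_lesions(confidence, 100)
    reading = binarised_reading(confidence)
    print(f"{confidence:.0%}: about {real:.0f} of 100 flagged findings are real, "
          f"but the score is read as '{reading}'")
```

Under calibration, a 60% score still marks a finding that is more likely real than not, which is exactly the information a binarised reading throws away.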

Approach

We ran a within-subject study with 48 board-certified radiologists across two countries. Each read 32 CT scans under three UI conditions: baseline (percentage), redesigned (calibrated visual language + counterfactual examples), and control (no AI). Eye-tracking captured attention to AI cues; verbal protocols captured reasoning.
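Within-subject designs usually vary the order in which participants meet each condition so that learning and fatigue do not favour one UI. The write-up does not say how order was assigned here, so the following is a sketch of one common option (a cyclic Latin square), not the study's actual procedure. The condition labels and the function name are illustrative.

```python
from collections import Counter

# Labels mirror the three UI conditions named in the text.
CONDITIONS = ["baseline", "redesigned", "control"]


def condition_order(participant_index: int) -> list[str]:
    """Rotate the condition list so each condition appears in each
    position equally often across participants (cyclic Latin square)."""
    shift = participant_index % len(CONDITIONS)
    return CONDITIONS[shift:] + CONDITIONS[:shift]


# 48 participants, matching the cohort size reported above.
orders = [condition_order(i) for i in range(48)]
first_condition_counts = Counter(order[0] for order in orders)
print(first_condition_counts)
```

With 48 participants and three conditions, the rotation places each condition first for an equal share of the cohort.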

Key Findings
  • 01 Baseline UI produced 'lazy reliance' on 28% of cases: radiologists agreed with the AI without examining the scan.
  • 02 Redesigned UI cut lazy reliance to 11% (a 61% relative reduction) without reducing the total agreement rate.
  • 03 Appropriate reliance (agreement when AI correct, disagreement when AI wrong) rose by 41%.
  • 04 Counterfactual examples were the single highest-impact design element.
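
The two reliance measures in the findings can be written down directly from their definitions. The sketch below is a minimal illustration: the `Read` record and its `examined_scan` flag are hypothetical (the flag could, for instance, be derived from eye-tracking), and the four sample reads are made up to exercise the functions, not taken from the study data.

```python
from dataclasses import dataclass


@dataclass
class Read:
    ai_correct: bool      # was the AI's prediction right for this case?
    agreed: bool          # did the radiologist agree with the AI?
    examined_scan: bool   # hypothetical flag: did they inspect the scan?


def lazy_reliance_rate(reads: list[Read]) -> float:
    """Share of reads where the radiologist agreed without examining the scan."""
    return sum(r.agreed and not r.examined_scan for r in reads) / len(reads)


def appropriate_reliance_rate(reads: list[Read]) -> float:
    """Share of reads with agreement when the AI was correct
    or disagreement when the AI was wrong."""
    return sum(r.agreed == r.ai_correct for r in reads) / len(reads)


def relative_change(before: float, after: float) -> float:
    """Change relative to the starting value, e.g. -0.5 for a halving."""
    return (after - before) / before


# Made-up reads, for illustration only.
sample = [
    Read(ai_correct=True, agreed=True, examined_scan=True),
    Read(ai_correct=True, agreed=True, examined_scan=False),
    Read(ai_correct=False, agreed=False, examined_scan=True),
    Read(ai_correct=False, agreed=True, examined_scan=False),
]
print(lazy_reliance_rate(sample), appropriate_reliance_rate(sample))

# The reported drop in lazy reliance from 28% to 11% of cases.
print(f"{relative_change(0.28, 0.11):.0%}")
```

Note that a read can count as both lazy and appropriate (agreeing with a correct AI without looking), which is why the two rates are tracked separately.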

CATT Usability Evaluation Research Center

Center for Advanced Technology and Testing — an independent usability research lab for AI, robotics, and medical devices.

Contact
  • segeberg@kmu.ac.kr
  • 053-580-8980
  • CATT · Keimyung University
© 2026 CATT · Center for Advanced Technology and Testing · All rights reserved.
ISO 9241-210 · IEC 62366-1