Benchmark Results

Measured against the standard — and ahead of it

Tested on 400 peer-reviewed clinical vignettes: the same benchmark used to evaluate Avey, Ada, MedAsk, WebMD, K Health, Buoy, and experienced physicians.

Top-3 Diagnostic Accuracy: 91.7% (Hammoud et al. 400-vignette benchmark)
Top-1 Accuracy: 78.6% (correct diagnosis as the #1 pick)
Rank Across All Metrics: #1 (highest top-1, top-3, and top-5 accuracy of all systems compared, including Avey, Ada, and physicians)
Sources Per Case: 47+ (PubMed, trials, clinical reviews)

Comparative accuracy

All systems were evaluated on the identical 400-vignette dataset, so their scores are directly comparable.
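Each chart on this page reports top-k accuracy: a case counts as correct when the right diagnosis appears among a system's first k picks. The minimal Python sketch below illustrates the metric, assuming each case is summarized by the 1-based rank of the correct diagnosis in a system's ranked list (or `None` when it never appears). The function name `top_k_accuracy` and the toy ranks are hypothetical illustrations, not benchmark data.

```python
from typing import Optional, Sequence


def top_k_accuracy(ranks: Sequence[Optional[int]], k: int) -> float:
    """Percentage of cases whose correct diagnosis appears at rank <= k.

    ranks[i] is the 1-based position of the correct diagnosis in the
    system's ranked list for case i, or None if it never appears.
    (Hypothetical input format, chosen for illustration.)
    """
    if not ranks:
        raise ValueError("need at least one case")
    hits = sum(1 for r in ranks if r is not None and r <= k)
    return 100.0 * hits / len(ranks)


# Hypothetical toy data for 8 cases (NOT benchmark results).
toy_ranks = [1, 1, 2, 1, None, 3, 1, 6]

for k in (1, 3, 5):
    print(f"top-{k}: {top_k_accuracy(toy_ranks, k):.1f}%")
# top-1: 50.0%
# top-3: 75.0%
# top-5: 75.0%
```

Because any hit at rank k or better is also a hit at every larger cutoff, top-k accuracy can never fall as k grows. It stays flat when no correct diagnoses land at ranks 4 or 5, which is why the toy data's top-3 and top-5 scores match.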

Top-1 Accuracy: correct diagnosis as the #1 pick

Integrative Medicine AI: 78.6%
Avey (Bayesian): 67.5%
Physicians (avg): 61.2%
MedAsk (GPT-4o): 58.3%
Ada: 54.2%
K Health: 27.8%
Buoy: 26.0%
WebMD: 24.5%

Top-3 Accuracy: correct diagnosis within the first 3 picks

Integrative Medicine AI: 91.7%
Avey (Bayesian): 87.3%
MedAsk (GPT-4o): 78.7%
Physicians (avg): 72.5%
Ada: 71.3%
WebMD: 40.7%
Buoy: 40.0%
K Health: 39.0%

Top-5 Accuracy: correct diagnosis within the first 5 picks

Integrative Medicine AI: 91.7%
Avey (Bayesian): 90.0%
MedAsk (GPT-4o): 82.0%
Ada: 76.2%
Physicians (avg): 72.9%
WebMD: 50.2%
K Health: 41.5%
Buoy: 40.0%

Source: Hammoud et al. 2024 (JMIR AI), SymptomCheck Bench 2024. All systems evaluated on the identical 400 peer-reviewed clinical vignettes.
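The three charts can be checked mechanically for which system leads each metric and by how much. The short Python sketch below uses only the percentages published in the charts. The helper name `leader_and_margin` is hypothetical, and "margin" here means the leader's lead over the runner-up in percentage points.

```python
# Accuracy figures (percent) as published in the three charts above.
results = {
    "top-1": {
        "Integrative Medicine AI": 78.6, "Avey (Bayesian)": 67.5,
        "Physicians (avg)": 61.2, "MedAsk (GPT-4o)": 58.3, "Ada": 54.2,
        "K Health": 27.8, "Buoy": 26.0, "WebMD": 24.5,
    },
    "top-3": {
        "Integrative Medicine AI": 91.7, "Avey (Bayesian)": 87.3,
        "MedAsk (GPT-4o)": 78.7, "Physicians (avg)": 72.5, "Ada": 71.3,
        "WebMD": 40.7, "Buoy": 40.0, "K Health": 39.0,
    },
    "top-5": {
        "Integrative Medicine AI": 91.7, "Avey (Bayesian)": 90.0,
        "MedAsk (GPT-4o)": 82.0, "Ada": 76.2, "Physicians (avg)": 72.9,
        "WebMD": 50.2, "K Health": 41.5, "Buoy": 40.0,
    },
}


def leader_and_margin(scores: dict[str, float]) -> tuple[str, float]:
    """Return the top-scoring system and its lead over the runner-up, in points."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    (leader, best), (_, second) = ranked[0], ranked[1]
    return leader, round(best - second, 1)


summary = {metric: leader_and_margin(scores) for metric, scores in results.items()}
for metric, (leader, margin) in summary.items():
    print(f"{metric}: {leader} leads by {margin} points")
# top-1: Integrative Medicine AI leads by 11.1 points
# top-3: Integrative Medicine AI leads by 4.4 points
# top-5: Integrative Medicine AI leads by 1.7 points
```

In each metric the runner-up is Avey (Bayesian), and the lead narrows as the cutoff widens from top-1 to top-5.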

Where correct diagnoses land

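When the AI gets a case right, the correct answer is usually its first pick: rank 1 accounts for 78.6 of the 91.7 points of top-3 accuracy, or about 86% of correct top-3 diagnoses.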

Rank 1: 78.6%
Rank 2: 9.6%
Rank 3: 3.5%
Missed: 8.3%
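The rank breakdown and the headline figures fit together arithmetically: top-k accuracy is the running total of the per-rank shares through rank k, and "Missed" is whatever remains. The Python check below assumes each rank's percentage is a share of all benchmark cases, which is consistent with the four values summing to 100%.

```python
from itertools import accumulate

# Per-rank shares (percent of all cases) for Integrative Medicine AI,
# as published in the rank breakdown above.
rank_share = [78.6, 9.6, 3.5]  # ranks 1, 2, 3

# Top-k accuracy is the running total of the shares for ranks 1..k.
top1, top2, top3 = (round(total, 1) for total in accumulate(rank_share))

# Cases whose correct diagnosis is not among the first three picks.
missed = round(100.0 - top3, 1)

# Share of correct top-3 diagnoses that were the first pick.
first_pick_share = round(100.0 * top1 / top3, 1)

print(top1, top2, top3, missed, first_pick_share)  # 78.6 88.2 91.7 8.3 85.7
```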

Top-3 diagnostic accuracy

Integrative Medicine AI: 91.7%
Correct diagnosis within the top 3 picks across the 400-vignette benchmark.

Physicians (avg): 72.5%
Average score of experienced physicians on the identical vignettes, given full case information.

+19.2 percentage points in the AI's favor
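The "+19.2" figure is an absolute difference in percentage points between the two top-3 rates, not a relative improvement. The small Python sketch below computes both readings from the published top-3 figures. The variable names are illustrative only.

```python
# Published top-3 accuracies (percent).
ai_top3 = 91.7
physician_top3 = 72.5

# Absolute gap in percentage points: a plain difference of the two rates.
gap_pp = round(ai_top3 - physician_top3, 1)

# Relative gap: the same difference as a share of the physician rate.
relative_pct = round(100.0 * (ai_top3 - physician_top3) / physician_top3, 1)

print(f"{gap_pp} percentage points, {relative_pct}% relative")
# 19.2 percentage points, 26.5% relative
```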