Methodology

How the score is built.

Every intervention gets four 0–10 axes — Evidence, Benefit, Safety, Access — blended into a weighted composite, then mapped to a 14-grade letter scale. Same rubric across every category, so a vitamin sits on the same scale as a prescription drug.

The formula

Composite = 0.40·Ev + 0.25·Bn + 0.25·Sf + 0.10·Ax
(each axis 0–10 → composite 0–10)

Evidence carries the heaviest weight because the brand promise is evidence-first. Benefit and Safety are tied, because a large effect with bad safety isn't a winner, and neither is a safe placebo. Access is the lightest axis but still counts: a $4,000 prescription with no insurance pathway is a different product from the same molecule sold over the counter.
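As a rough illustration, the weighted blend can be sketched in Python. The function name `composite` and the 0–10 range check are assumptions made for this sketch, not the site's actual code.

```python
# Hypothetical sketch of the composite formula; names are illustrative only.
WEIGHTS = {"evidence": 0.40, "benefit": 0.25, "safety": 0.25, "access": 0.10}


def composite(evidence: float, benefit: float, safety: float, access: float) -> float:
    """Blend four 0-10 axis scores into a 0-10 composite."""
    scores = {"evidence": evidence, "benefit": benefit, "safety": safety, "access": access}
    for name, value in scores.items():
        # Assumption: out-of-range axis scores are rejected rather than clamped.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # The weights sum to 1.0, so 0-10 inputs always give a 0-10 composite.
    return sum(WEIGHTS[name] * value for name, value in scores.items())


if __name__ == "__main__":
    c = composite(evidence=8, benefit=7, safety=9, access=6)
    print(f"{c:.2f}")  # 3.2 + 1.75 + 2.25 + 0.6 = 7.80
```

Because the blend is linear, a one-point change in Evidence moves the composite by 0.4, four times as much as a one-point change in Access.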

Evidence (40%)

How much is actually known about this in humans?

Benefit (25%)

If the claim is correct, how big is the effect?

Safety (25%)

Inverted from risk: higher score = safer.

Access (10%)

How hard is it to legally and practically obtain this in the US?

From composite to letter grade

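The 0–10 composite maps onto a 14-grade letter scale. The 13 grades from D- through S each cover a 0.6-point band (13 × 0.6 = 7.8 points, spanning 2.2 to 10), and each band includes its lower bound, so a 9.4 is an S, not an A+. F catches everything below 2.2.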

S ≥ 9.4 Top of class. Strong evidence, real effect, low risk, accessible.
A+ 8.8 – 9.4 Exceptional. Multiple converging high-quality trials with sizeable effect.
A 8.2 – 8.8 Strong overall. Well-evidenced, meaningful effect, manageable risk.
A- 7.6 – 8.2 Strong with one weaker axis (cost, access, or modest effect size).
B+ 7.0 – 7.6 Solid case. Most axes good; one notable trade-off.
B 6.4 – 7.0 Solid. Reasonable evidence, modest effect, some friction.
B- 5.8 – 6.4 Solid lower edge. Defensible but dependent on context.
C+ 5.2 – 5.8 Limited. Mixed evidence, small effect, or noticeable risk.
C 4.6 – 5.2 Limited. One or two real weaknesses.
C- 4.0 – 4.6 Limited lower edge. Hard to justify outside narrow use cases.
D+ 3.4 – 4.0 Weak. Thin data or unfavorable risk/benefit.
D 2.8 – 3.4 Weak. Multiple axes underperform.
D- 2.2 – 2.8 Weak lower edge. Borderline avoid.
F < 2.2 Avoid or defer. Evidence absent, harms outweigh, or both.
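The band table reads as a simple threshold lookup. This sketch assumes each band includes its lower bound, which matches how the S (≥ 9.4) and F (< 2.2) rows are written; `letter_grade` and `GRADE_FLOORS` are hypothetical names.

```python
# Hypothetical lookup from a 0-10 composite to the 14-grade letter scale.
# Lower bounds are copied from the table; each band includes its lower bound.
GRADE_FLOORS = [
    (9.4, "S"),
    (8.8, "A+"),
    (8.2, "A"),
    (7.6, "A-"),
    (7.0, "B+"),
    (6.4, "B"),
    (5.8, "B-"),
    (5.2, "C+"),
    (4.6, "C"),
    (4.0, "C-"),
    (3.4, "D+"),
    (2.8, "D"),
    (2.2, "D-"),
]


def letter_grade(composite: float) -> str:
    """Return the letter grade for a composite score on the 0-10 scale."""
    # Walk from the highest floor down; the first floor the score reaches wins.
    for floor, grade in GRADE_FLOORS:
        if composite >= floor:
            return grade
    return "F"  # everything below 2.2


if __name__ == "__main__":
    for score in (9.5, 7.8, 5.0, 1.0):
        print(score, letter_grade(score))
```

Because the composite is a floating-point sum, a score that should sit exactly on a floor can come out a hair below it; rounding to, say, two decimals before the lookup is one way to guard against that. The page doesn't say how its own boundaries are rounded.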

Toxins use a separate scale

Toxins (heavy metals, pollutants, lifestyle exposures) aren't on a benefit ladder, since there's no upside to maximize. We score them on avoidance priority instead: harm magnitude, evidence of that harm, and exposure prevalence. A higher priority means reducing that exposure deserves more attention.

Priority = 0.40·Magnitude + 0.30·Evidence + 0.30·Prevalence
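The priority blend follows the same pattern as the intervention composite. This sketch assumes Magnitude, Evidence, and Prevalence are each scored 0–10 like the intervention axes, which the tier cut-offs suggest but the page doesn't state; `avoidance_priority` is a hypothetical name.

```python
# Hypothetical sketch of the toxin avoidance-priority formula.
def avoidance_priority(magnitude: float, evidence: float, prevalence: float) -> float:
    """Blend three 0-10 factors into a 0-10 avoidance priority."""
    factors = (("magnitude", magnitude), ("evidence", evidence), ("prevalence", prevalence))
    for name, value in factors:
        # Assumption: each factor uses the same 0-10 range as the intervention axes.
        if not 0 <= value <= 10:
            raise ValueError(f"{name} must be between 0 and 10, got {value}")
    # Magnitude carries the largest weight; Evidence and Prevalence are tied.
    return 0.40 * magnitude + 0.30 * evidence + 0.30 * prevalence


if __name__ == "__main__":
    p = avoidance_priority(magnitude=9, evidence=8, prevalence=7)
    print(f"{p:.2f}")  # 3.6 + 2.4 + 2.1 = 8.10
```

Unlike the intervention composite, a higher number here is worse news: it flags an exposure worth reducing, not an intervention worth taking.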

CRITICAL ≥ 8.5 Population-level harm, strong evidence. Reduce exposure now.
HIGH ≥ 6.5 Significant harm with broad exposure. Worth active mitigation.
MODERATE ≥ 4.0 Real harm in many settings. Worth attention.
LOW ≥ 2.0 Limited or context-dependent harm.
MINIMAL ≥ 0.0 Weak or speculative harm signal.
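The tier cut-offs work the same way as the letter-grade floors. A minimal sketch, with `priority_tier` and `TIER_FLOORS` as hypothetical names and the rejection of negative scores as an assumption:

```python
# Hypothetical mapping from an avoidance-priority score to its tier label.
TIER_FLOORS = [
    (8.5, "CRITICAL"),
    (6.5, "HIGH"),
    (4.0, "MODERATE"),
    (2.0, "LOW"),
    (0.0, "MINIMAL"),
]


def priority_tier(priority: float) -> str:
    """Return the first tier whose floor the priority score reaches."""
    for floor, tier in TIER_FLOORS:
        if priority >= floor:
            return tier
    # Assumption: scores below 0 are treated as invalid, not as a sixth tier.
    raise ValueError(f"priority must be at least 0, got {priority}")


if __name__ == "__main__":
    for score in (9.0, 8.1, 3.0):
        print(score, priority_tier(score))
```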

What the score isn't

It's a running synthesis, not a prescription. Three big caveats:

Who does the scoring

I do. I'm not a clinician. I read the literature obsessively and call scores based on what I read. Every score links to the sources that produced it, so you can disagree with me and see exactly where. I update scores when readers point out studies I missed.

Full context: who I am · disclaimer.