Methodology

How the score is built.

Every intervention gets four 0–10 axes — Evidence, Benefit, Safety, Access — blended into a weighted composite, then mapped to a 14-grade letter scale. Same rubric across every category, so a vitamin sits on the same scale as a prescription drug.

The formula

Composite = 0.40·Ev + 0.25·Bn + 0.25·Sf + 0.10·Ax
(each axis 0–10 → composite 0–10)

Evidence carries the heaviest weight because the brand promise is evidence-first. Benefit and Safety are tied — a large effect with bad safety isn't a winner, and a safe placebo isn't either. Access is the lightest axis but still counts: a $4,000 prescription with no insurance pathway is a different product than the same molecule sold over the counter.
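The composite can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the site's actual scoring code. The `AxisScores` class, the `composite` function and the example scores are all hypothetical. The sketch assumes all four axes are already on the 0–10 rubric scale, with Safety already inverted so that a higher score means safer.

```python
from dataclasses import dataclass, fields

# Weights from the composite formula; they sum to 1.0,
# so 0-10 axis scores produce a 0-10 composite.
WEIGHTS = {"evidence": 0.40, "benefit": 0.25, "safety": 0.25, "access": 0.10}


@dataclass(frozen=True)
class AxisScores:
    evidence: float
    benefit: float
    safety: float  # already inverted from risk: higher = safer
    access: float  # higher = easier to obtain

    def __post_init__(self) -> None:
        # Reject anything outside the 0-10 rubric range.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0.0 <= value <= 10.0:
                raise ValueError(f"{f.name} must be in [0, 10], got {value}")


def composite(scores: AxisScores) -> float:
    """Weighted composite: 0.40*Ev + 0.25*Bn + 0.25*Sf + 0.10*Ax."""
    return sum(w * getattr(scores, name) for name, w in WEIGHTS.items())


if __name__ == "__main__":
    # Hypothetical scores, for illustration only.
    example = AxisScores(evidence=8, benefit=6, safety=9, access=10)
    print(round(composite(example), 2))  # 7.95
```

With these hypothetical scores the composite works out to 3.2 + 1.5 + 2.25 + 1.0 = 7.95. That lands in the A- band (7.6 – 8.2) of the letter scale below. Perfect Access lifts the total by only 1.0, while Evidence contributes 3.2.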

Evidence (40%)

How much is actually known about this in humans?

9–10 — Very strong: Multiple high-quality RCTs, robust meta-analyses, or decades of human outcome data.
7–8 — Strong: At least one large, well-run RCT plus convergent secondary evidence.
5–6 — Moderate: One solid RCT, or several consistent observational studies.
3–4 — Weak / mixed: Small trials, conflicting results, or thin data.
0–2 — Preclinical or none: In-vitro, animal-only, anecdotal, or no usable data.

Benefit (25%)

If the claim is correct, how big is the effect?

9–10 — Very large: Mortality or disease-incidence shifts, or biomarker changes that translate clinically.
7–8 — Large: Reproducible effect sizes that show up in real-world outcomes.
5–6 — Moderate: Real but modest. Often only on surrogate biomarkers.
3–4 — Small: Detectable in trials, hard to feel in life.
0–2 — Negligible: Within noise or null on hard endpoints.

Safety (25%)

Inverted from risk: higher score = safer.

9–10 — Very low risk: No meaningful safety signal at reasonable doses.
7–8 — Low: Generally well-tolerated. Minor side-effects in a minority.
5–6 — Moderate: Real side-effect profile, drug interactions, or monitoring requirements.
3–4 — High: Serious adverse events at therapeutic doses. Clinician-supervised territory.
0–2 — Severe: Potentially fatal, or an unacceptable harm profile.

Access (10%)

How easily can this be obtained, legally and practically, in the US? Higher score = easier access.

9–10: Over-the-counter, freely available, or no purchase needed (a protocol).
5–8: Prescription channel, cash-pay, or off-label use.
3–4: Investigational, research-only, or jurisdiction-restricted.
0–2: Schedule I, illegal, or otherwise legally prohibited.

From composite to letter grade

The 0–10 composite gets mapped to a 14-grade letter scale. Each grade covers a 0.6-point band that includes its lower bound (so a composite of exactly 8.8 is an A+, not an A); F catches everything below 2.2.

S ≥ 9.4 Top of class. Strong evidence, real effect, low risk, accessible.

A+ 8.8 – 9.4 Exceptional. Multiple converging high-quality trials with sizeable effect.

A 8.2 – 8.8 Strong overall. Well-evidenced, meaningful effect, manageable risk.

A- 7.6 – 8.2 Strong with one weaker axis (cost, access, or modest effect size).

B+ 7.0 – 7.6 Solid case. Most axes good; one notable trade-off.

B 6.4 – 7.0 Solid. Reasonable evidence, modest effect, some friction.

B- 5.8 – 6.4 Solid lower edge. Defensible but dependent on context.

C+ 5.2 – 5.8 Limited. Mixed evidence, small effect, or noticeable risk.

C 4.6 – 5.2 Limited. One or two real weaknesses.

C- 4.0 – 4.6 Limited lower edge. Hard to justify outside narrow use cases.

D+ 3.4 – 4.0 Weak. Thin data or unfavourable risk/benefit.

D 2.8 – 3.4 Weak. Multiple axes underperform.

D- 2.2 – 2.8 Weak lower edge. Borderline avoid.

F < 2.2 Avoid or defer. Evidence absent, harms outweigh, or both.
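The mapping above can be sketched as a walk down a list of lower cutoffs. This is a hypothetical illustration, not the site's code, and `letter_grade`, `GRADES` and `CUTOFFS` are illustrative names. It assumes each band includes its lower bound, which is consistent with S ≥ 9.4 and F < 2.2. It also assumes composites are compared after rounding to two decimals, to absorb floating-point drift.

```python
# Grades that have a lower cutoff, lowest first; F sits below all of them.
GRADES = ["D-", "D", "D+", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+", "S"]

# Lower cutoffs 2.2, 2.8, ..., 9.4: one 0.6-point step per grade.
# round(..., 1) removes floating-point drift in the generated cutoffs.
CUTOFFS = [round(2.2 + 0.6 * i, 1) for i in range(len(GRADES))]


def letter_grade(composite: float) -> str:
    """Map a 0-10 composite to one of the 14 letter grades (hypothetical helper)."""
    if not 0.0 <= composite <= 10.0:
        raise ValueError(f"composite must be in [0, 10], got {composite}")
    score = round(composite, 2)  # assumption: compare at two-decimal precision
    # Walk from the top band down; each band includes its lower cutoff.
    for cutoff, grade in zip(reversed(CUTOFFS), reversed(GRADES)):
        if score >= cutoff:
            return grade
    return "F"  # everything below 2.2
```

Generating the cutoffs from a single start point and step keeps the code tied to the "0.6-point band" rule. A hand-typed table would silently drift if one boundary were mistyped.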

Toxins use a separate scale

Toxins (heavy metals, pollutants, lifestyle exposures) aren't on a benefit ladder; there's no upside to maximize. We score them on avoidance priority instead: harm magnitude, evidence of that harm, and exposure prevalence. A higher priority means reducing exposure deserves more attention.

Priority = 0.40·Magnitude + 0.30·Evidence + 0.30·Prevalence

CRITICAL ≥ 8.5 Population-level harm, strong evidence. Reduce exposure now.

HIGH ≥ 6.5 Significant harm with broad exposure. Worth active mitigation.

MODERATE ≥ 4.0 Real harm in many settings. Worth attention.

LOW ≥ 2.0 Limited or context-dependent harm.

MINIMAL ≥ 0.0 Weak or speculative harm signal.
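Here is a minimal sketch of the toxin priority score and its tiers. It assumes each of the three inputs is scored 0–10 like the intervention axes, which the scale implies but does not state outright. The function names and example inputs are hypothetical.

```python
# Weights and tier cutoffs from the toxin scale.
# Assumption: Magnitude, Evidence and Prevalence are each scored 0-10.
TOXIN_WEIGHTS = {"magnitude": 0.40, "evidence": 0.30, "prevalence": 0.30}

# Checked from the top down; each tier includes its cutoff.
TIERS = [(8.5, "CRITICAL"), (6.5, "HIGH"), (4.0, "MODERATE"), (2.0, "LOW"), (0.0, "MINIMAL")]


def priority(magnitude: float, evidence: float, prevalence: float) -> float:
    """Avoidance priority: 0.40*Magnitude + 0.30*Evidence + 0.30*Prevalence."""
    inputs = {"magnitude": magnitude, "evidence": evidence, "prevalence": prevalence}
    for name, value in inputs.items():
        if not 0.0 <= value <= 10.0:
            raise ValueError(f"{name} must be in [0, 10], got {value}")
    return sum(TOXIN_WEIGHTS[name] * value for name, value in inputs.items())


def tier(score: float) -> str:
    """Return the first tier whose cutoff the score meets."""
    rounded = round(score, 2)  # absorbs floating-point drift at exact cutoffs
    for cutoff, label in TIERS:
        if rounded >= cutoff:
            return label
    raise ValueError(f"priority must be non-negative, got {score}")


if __name__ == "__main__":
    # Hypothetical exposure: severe harm, good evidence, moderately common.
    p = priority(magnitude=9, evidence=8, prevalence=7)
    print(round(p, 2), tier(p))  # 8.1 HIGH
```

In this hypothetical case, even a near-maximal harm magnitude gives 3.6 + 2.4 + 2.1 = 8.1. That is HIGH rather than CRITICAL, because the evidence and prevalence terms carry 60% of the weight between them.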

What the score isn't

It's a running synthesis, not a prescription. Three big caveats:

Personalisation dominates. A compound that's top-tier for population-level outcomes can be a terrible fit for you specifically. Your labs, your genetics, your other medications — all of that trumps a ranking.
Dose and context matter more than the grade. Most compounds here can be abused into harm or under-dosed into uselessness. The dosing column is a first pass, not a protocol.
Evidence moves. Grades update as new trials land and as we re-read the old ones. The changelog tracks every movement.

Who does the scoring

I do. I'm not a clinician. I read the literature obsessively and call scores based on what I read. Every score links to the sources that produced it, so you can disagree with me and see exactly where. I update scores when readers point out studies I missed.

Full context: who I am · disclaimer.