By Ryan

Trust is Earned, Never Permanent: How Vorion's 8-Tier Trust Model Works

How do you trust an AI agent? You don't. You make it earn trust — then keep earning it. Here's the system we built.

trust-scoring · technical · ATSF

How do you trust an AI agent?

You don’t. You make it earn trust. Then you make it keep earning it.

Vorion’s trust model treats AI agents the way the real world treats trust — asymmetrically, skeptically, and with a short memory.

Trust Tiers: T0 → T7. Trust is earned, never permanent. Score range: 0-1000.

| Tier | Name | Score | Capability | Failure penalty |
| --- | --- | --- | --- | --- |
| T7 | Autonomous | 951-1000 | Full autonomy, self-governance | 10x |
| T6 | Certified | 876-950 | Multi-agent coordination | 9.6x |
| T5 | Trusted | 800-875 | Cross-system ops | 9.1x |
| T4 | Standard | 650-799 | Full operational | 8.7x |
| T3 | Monitored | 500-649 | Active audit | |
| T2 | Provisional | 350-499 | | |
| T1 | Observed | 200-349 | | |
| T0 | Sandbox | 0-199 | API models capped here (T2 max) | |

Gain: logarithmic (slow). Loss: 3x-10x asymmetry. Decay: 182-day half-life. Circuit breaker on oscillation.

The Rules

1. Trust starts at zero

Every agent begins in Sandbox (T0). No exceptions. No fast-tracking. No “we trust this vendor.”

2. Trust is earned slowly

Logarithmic gain — each success adds less than the last. Early trust comes easy. High trust is hard-won. Just like real life.
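To make the shape concrete, here is a minimal TypeScript sketch of diminishing gain. The curve, the GAIN_SCALE constant, and the function name are illustrative assumptions, not the actual ATSF formula.

```typescript
// Sketch: diminishing trust gain. The closer the score is to the ceiling,
// the less each success is worth. GAIN_SCALE and the curve are assumptions.
const MAX_SCORE = 1000;
const GAIN_SCALE = 25;

function applyGain(score: number): number {
  // log1p of the remaining headroom: large when the score is low,
  // tiny near the ceiling.
  const headroom = (MAX_SCORE - score) / MAX_SCORE;
  const gain = GAIN_SCALE * Math.log1p(headroom);
  return Math.min(MAX_SCORE, score + gain);
}

// A brand-new agent gains ~17 points on its first success; near T7 the
// same success is worth less than a point.
console.log(applyGain(0).toFixed(1));   // ≈ 17.3
console.log(applyGain(980).toFixed(1)); // ≈ 980.5
```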

3. Trust is lost fast

One failure at T7 (Autonomous) can drop you multiple tiers. The penalty ratio scales from 3x at T0 to 10x at T7. The higher you climb, the harder you fall. By design.
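A sketch of how a tier-scaled penalty could be applied. The 3x and 10x endpoints come from the ladder above; the linear ramp between them and the BASE_LOSS constant are simplifying assumptions (the published per-tier multipliers, e.g. 8.7x at T4, follow a steeper curve).

```typescript
// Sketch: failure penalties grow with tier, from 3x at T0 to 10x at T7.
// BASE_LOSS and the linear ramp are illustrative assumptions.
const BASE_LOSS = 15;

function penaltyMultiplier(tier: number): number {
  // Simple linear interpolation between the 3x and 10x endpoints.
  return 3 + (tier / 7) * (10 - 3);
}

function applyFailure(score: number, tier: number): number {
  const loss = BASE_LOSS * penaltyMultiplier(tier);
  return Math.max(0, score - loss);
}

// One failure at T7 (10x multiplier) costs 150 points here, dropping the
// agent from T7 (951-1000) into T5 territory.
console.log(applyFailure(960, 7)); // 810
```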

4. Trust decays

182-day half-life. If an agent goes idle, its trust erodes. Competence must be continuously demonstrated. Yesterday’s performance doesn’t guarantee today’s trustworthiness.
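In code, a 182-day half-life is a one-liner. This sketch assumes the whole score decays exponentially with idle time; the real implementation may weight things differently.

```typescript
// Sketch: exponential decay of an idle agent's score, 182-day half-life.
const HALF_LIFE_DAYS = 182;

function decayedScore(score: number, idleDays: number): number {
  return score * Math.pow(0.5, idleDays / HALF_LIFE_DAYS);
}

// An 850-score (T5) agent that goes dark for six months lands around 425,
// which is T2 territory, without a single failure.
console.log(Math.round(decayedScore(850, 182))); // 425
```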

5. Trust has a ceiling

A black-box API agent (GPT-4, Claude) can never exceed T3 (Monitored), regardless of behavioral performance. The principle: you cannot fully trust what you cannot fully inspect.

Self-hosted open models can reach T6. TEE-attested models can reach T7.
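A sketch of how a provenance-based ceiling can be enforced. The cap values follow rule 5; the Provenance type and the names below are assumptions.

```typescript
// Sketch: provenance caps the reachable tier, regardless of behavior.
type Provenance = "black-box-api" | "self-hosted-open" | "tee-attested";

const TIER_CEILING: Record<Provenance, number> = {
  "black-box-api": 3,    // e.g. GPT-4 or Claude behind an API
  "self-hosted-open": 6, // open weights you run yourself
  "tee-attested": 7,     // hardware-attested execution
};

function effectiveTier(behavioralTier: number, provenance: Provenance): number {
  // Behavior can only take an agent as far as inspection allows.
  return Math.min(behavioralTier, TIER_CEILING[provenance]);
}

console.log(effectiveTier(7, "black-box-api")); // 3: capped at Monitored
```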

6. Trust is never permanent

Circuit breakers trip on oscillation. Cooldowns prevent gaming. Canary probes continuously verify behavioral integrity. The system actively resists manipulation.

The 8 Tiers

| Tier | Name | Score | What It Means |
| --- | --- | --- | --- |
| T0 | Sandbox | 0-199 | Isolated. No external access. |
| T1 | Observed | 200-349 | Read-only. Fully monitored. |
| T2 | Provisional | 350-499 | Limited write. Scoped tools. |
| T3 | Monitored | 500-649 | Standard ops. Active audit. |
| T4 | Standard | 650-799 | Full operational capability. |
| T5 | Trusted | 800-875 | Cross-system operations. |
| T6 | Certified | 876-950 | Multi-agent coordination. |
| T7 | Autonomous | 951-1000 | Full autonomous operation. |

Graduating an agent from T0 to T7 takes sustained, demonstrated excellence over weeks or months. One bad day can undo it in seconds.
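For reference, here is the score-to-tier mapping the table implies, as a small TypeScript helper (boundaries and names come from the table; the helper itself is illustrative).

```typescript
// Sketch: map a 0-1000 score to its tier using the boundaries above.
const TIER_FLOORS: Array<[number, string]> = [
  [951, "T7 Autonomous"],
  [876, "T6 Certified"],
  [800, "T5 Trusted"],
  [650, "T4 Standard"],
  [500, "T3 Monitored"],
  [350, "T2 Provisional"],
  [200, "T1 Observed"],
  [0,   "T0 Sandbox"],
];

function tierFor(score: number): string {
  return TIER_FLOORS.find(([floor]) => score >= floor)![1];
}

console.log(tierFor(720)); // "T4 Standard"
console.log(tierFor(42));  // "T0 Sandbox"
```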

Why Asymmetric?

Research on human trust systems shows that negative information is weighted 2-5x more heavily than positive information in trust formation. Game theory demonstrates that cooperative equilibria require punishing defection disproportionately. Psychological research shows rebuilding trust after betrayal requires sustained effort, not merely equivalent positive actions.

We formalized these dynamics into mathematics suitable for machine governance.
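As a rough illustration of the shape (not the published ATSF equations), the dynamics above can be written as an asymmetric update with exponential decay:

$$
\Delta_{\text{gain}}(s) \propto \ln\!\Bigl(1 + \tfrac{1000 - s}{1000}\Bigr),
\qquad
\Delta_{\text{loss}}(s) = -\,m(T)\,\beta,\ \ m(T)\in[3,10],
\qquad
s(t) = s_0 \cdot 2^{-t/182}
$$

where $s$ is the 0-1000 score, $m(T)$ the tier-dependent penalty multiplier, $\beta$ a base loss, and $t$ the idle time in days.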

The Defense Against Gaming

Three mechanisms prevent agents from gaming the trust system:

Mandatory cooldowns — After any trust loss event, a configurable cooldown period prevents trust gain. No rapid recovery after failures.
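A sketch of the cooldown check, assuming a fixed 24-hour window (in practice the period is configurable):

```typescript
// Sketch: block trust gain for a cooldown window after any loss event.
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24 hours, illustrative

interface AgentState {
  score: number;
  lastLossAt?: number; // epoch ms of the most recent trust loss
}

function canGainTrust(agent: AgentState, now: number): boolean {
  if (agent.lastLossAt === undefined) return true;
  return now - agent.lastLossAt >= COOLDOWN_MS;
}

const agent: AgentState = { score: 512, lastLossAt: Date.now() - 60 * 60 * 1000 };
console.log(canGainTrust(agent, Date.now())); // false: still inside the cooldown
```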

Oscillation circuit breaker — If an agent’s trust score alternates direction (up-down-up-down) too frequently, the circuit breaker trips. Trust is frozen. Administrative intervention required.
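One way to detect oscillation is to count sign changes in the recent score deltas. The window size and flip threshold below are assumptions.

```typescript
// Sketch: trip the breaker when the score's direction flips too often.
const WINDOW = 10;   // most recent score deltas to inspect (assumption)
const MAX_FLIPS = 6; // direction changes tolerated in that window (assumption)

function breakerTripped(deltas: number[]): boolean {
  const recent = deltas.slice(-WINDOW).filter((d) => d !== 0);
  let flips = 0;
  for (let i = 1; i < recent.length; i++) {
    if (Math.sign(recent[i]) !== Math.sign(recent[i - 1])) flips++;
  }
  return flips >= MAX_FLIPS; // frozen until an administrator intervenes
}

console.log(breakerTripped([5, -40, 4, -35, 3, -30, 2, -25])); // true
```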

Canary probes — Stochastic behavioral tests, injected at Poisson-distributed intervals, indistinguishable from real requests. An agent can’t selectively behave well during monitoring and defect when unwatched.
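Poisson-distributed probe arrivals can be produced by sampling exponentially distributed waits between probes. A sketch, with an assumed mean interval:

```typescript
// Sketch: exponential inter-arrival times make probe arrivals a Poisson
// process. The mean interval is an assumption; real probes are also
// disguised as ordinary traffic.
const MEAN_INTERVAL_MS = 6 * 60 * 60 * 1000; // 6 hours on average, illustrative

function nextCanaryDelayMs(): number {
  // Inverse-CDF sampling of an exponential distribution.
  return -Math.log(1 - Math.random()) * MEAN_INTERVAL_MS;
}

// Because the next probe time is unpredictable, "behave only while being
// watched" is not a workable strategy for an agent.
console.log(Math.round(nextCanaryDelayMs()));
```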

Try It

npm install @vorionsys/atsf-core

Full documentation at learn.vorion.org.

Ready to govern your AI agents?

Get started with Vorion's open-source governance framework.