By Ryan

Trust is Earned, Never Permanent: How Vorion's 8-Tier Trust Model Works

How do you trust an AI agent? You don't. You make it earn trust — then keep earning it. Here's the system we built.

trust-scoring · technical · ATSF

How do you trust an AI agent?

You don’t. You make it earn trust. Then you make it keep earning it.

Vorion’s trust model treats AI agents the way the real world treats trust — asymmetrically, skeptically, and with a short memory.

Trust Tiers: T0 → T7. Trust is earned, never permanent. Score range: 0-1000.

| Tier | Name | Score | Capability | Failure penalty |
| --- | --- | --- | --- | --- |
| T7 | Autonomous | 951-1000 | Full autonomy, self-governance | 10x |
| T6 | Certified | 876-950 | Multi-agent coordination | 9.6x |
| T5 | Trusted | 800-875 | Cross-system ops | 9.1x |
| T4 | Standard | 650-799 | Full operational | 8.7x |
| T3 | Monitored | 500-649 | Active audit | |
| T2 | Provisional | 350-499 | | |
| T1 | Observed | 200-349 | | |
| T0 | Sandbox | 0-199 | API models capped here (T2 max) | |

Gain: logarithmic (slow). Loss: 3x-10x asymmetry. Decay: 182-day half-life. Circuit breaker on oscillation.

The Rules

1. Trust starts at zero

Every agent begins in Sandbox (T0). No exceptions. No fast-tracking. No “we trust this vendor.”

2. Trust is earned slowly

Logarithmic gain — each success adds less than the last. Early trust comes easy. High trust is hard-won. Just like real life.
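To make the shape concrete, here is a minimal TypeScript sketch of diminishing gain. The curve, the GAIN_SCALE constant, and the function name are illustrative assumptions, not the actual ATSF formula.

```typescript
// Sketch: diminishing trust gain. The closer the score is to the ceiling,
// the less each success is worth. GAIN_SCALE and the curve are assumptions.
const MAX_SCORE = 1000;
const GAIN_SCALE = 25;

function applyGain(score: number): number {
  // log1p of the remaining headroom: large when the score is low,
  // tiny near the ceiling.
  const headroom = (MAX_SCORE - score) / MAX_SCORE;
  const gain = GAIN_SCALE * Math.log1p(headroom);
  return Math.min(MAX_SCORE, score + gain);
}

// A brand-new agent gains ~17 points on its first success; near T7 the
// same success is worth less than a point.
console.log(applyGain(0).toFixed(1));   // ≈ 17.3
console.log(applyGain(980).toFixed(1)); // ≈ 980.5
```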

3. Trust is lost fast

One failure at T7 (Autonomous) can drop you multiple tiers. The penalty ratio scales from 3x at T0 to 10x at T7. The higher you climb, the harder you fall. By design.
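A sketch of how a tier-scaled penalty could be applied. The 3x and 10x endpoints come from the ladder above; the linear ramp between them and the BASE_LOSS constant are simplifying assumptions (the published per-tier multipliers, e.g. 8.7x at T4, follow a steeper curve).

```typescript
// Sketch: failure penalties grow with tier, from 3x at T0 to 10x at T7.
// BASE_LOSS and the linear ramp are illustrative assumptions.
const BASE_LOSS = 15;

function penaltyMultiplier(tier: number): number {
  // Simple linear interpolation between the 3x and 10x endpoints.
  return 3 + (tier / 7) * (10 - 3);
}

function applyFailure(score: number, tier: number): number {
  const loss = BASE_LOSS * penaltyMultiplier(tier);
  return Math.max(0, score - loss);
}

// One failure at T7 (10x multiplier) costs 150 points here, dropping the
// agent from T7 (951-1000) into T5 territory.
console.log(applyFailure(960, 7)); // 810
```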

4. Trust decays

182-day half-life. If an agent goes idle, its trust erodes. Competence must be continuously demonstrated. Yesterday’s performance doesn’t guarantee today’s trustworthiness.
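In code, a 182-day half-life is a one-liner. This sketch assumes the whole score decays exponentially with idle time; the real implementation may weight things differently.

```typescript
// Sketch: exponential decay of an idle agent's score, 182-day half-life.
const HALF_LIFE_DAYS = 182;

function decayedScore(score: number, idleDays: number): number {
  return score * Math.pow(0.5, idleDays / HALF_LIFE_DAYS);
}

// An 850-score (T5) agent that goes dark for six months lands around 425,
// which is T2 territory, without a single failure.
console.log(Math.round(decayedScore(850, 182))); // 425
```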

5. Trust has a ceiling

A black-box API agent (GPT-4, Claude) can never exceed T3 (Monitored), regardless of behavioral performance. The principle: you cannot fully trust what you cannot fully inspect.

Self-hosted open models can reach T6. TEE-attested models can reach T7.
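A sketch of how a provenance-based ceiling can be enforced. The cap values follow rule 5; the Provenance type and the names below are assumptions.

```typescript
// Sketch: provenance caps the reachable tier, regardless of behavior.
type Provenance = "black-box-api" | "self-hosted-open" | "tee-attested";

const TIER_CEILING: Record<Provenance, number> = {
  "black-box-api": 3,    // e.g. GPT-4 or Claude behind an API
  "self-hosted-open": 6, // open weights you run yourself
  "tee-attested": 7,     // hardware-attested execution
};

function effectiveTier(behavioralTier: number, provenance: Provenance): number {
  // Behavior can only take an agent as far as inspection allows.
  return Math.min(behavioralTier, TIER_CEILING[provenance]);
}

console.log(effectiveTier(7, "black-box-api")); // 3: capped at Monitored
```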

6. Trust is never permanent

Circuit breakers trip on oscillation. Cooldowns prevent gaming. Canary probes continuously verify behavioral integrity. The system actively resists manipulation.

The 8 Tiers

| Tier | Name | Score | What It Means |
| --- | --- | --- | --- |
| T0 | Sandbox | 0-199 | Isolated. No external access. |
| T1 | Observed | 200-349 | Read-only. Fully monitored. |
| T2 | Provisional | 350-499 | Limited write. Scoped tools. |
| T3 | Monitored | 500-649 | Standard ops. Active audit. |
| T4 | Standard | 650-799 | Full operational capability. |
| T5 | Trusted | 800-875 | Cross-system operations. |
| T6 | Certified | 876-950 | Multi-agent coordination. |
| T7 | Autonomous | 951-1000 | Full autonomous operation. |

Graduating an agent from T0 to T7 takes sustained, demonstrated excellence over weeks or months. One bad day can undo it in seconds.
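For reference, here is the score-to-tier mapping the table implies, as a small TypeScript helper (boundaries and names come from the table; the helper itself is illustrative).

```typescript
// Sketch: map a 0-1000 score to its tier using the boundaries above.
const TIER_FLOORS: Array<[number, string]> = [
  [951, "T7 Autonomous"],
  [876, "T6 Certified"],
  [800, "T5 Trusted"],
  [650, "T4 Standard"],
  [500, "T3 Monitored"],
  [350, "T2 Provisional"],
  [200, "T1 Observed"],
  [0,   "T0 Sandbox"],
];

function tierFor(score: number): string {
  return TIER_FLOORS.find(([floor]) => score >= floor)![1];
}

console.log(tierFor(720)); // "T4 Standard"
console.log(tierFor(42));  // "T0 Sandbox"
```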

Why Asymmetric?

Research on human trust systems shows that negative information is weighted 2-5x more heavily than positive information in trust formation. Game theory demonstrates that cooperative equilibria require punishing defection disproportionately. Psychological research shows rebuilding trust after betrayal requires sustained effort, not merely equivalent positive actions.

We formalized these dynamics into mathematics suitable for machine governance.
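As a rough illustration of the shape (not the published ATSF equations), the dynamics above can be written as an asymmetric update with exponential decay:

$$
\Delta_{\text{gain}}(s) \propto \ln\!\Bigl(1 + \tfrac{1000 - s}{1000}\Bigr),
\qquad
\Delta_{\text{loss}}(s) = -\,m(T)\,\beta,\ \ m(T)\in[3,10],
\qquad
s(t) = s_0 \cdot 2^{-t/182}
$$

where $s$ is the 0-1000 score, $m(T)$ the tier-dependent penalty multiplier, $\beta$ a base loss, and $t$ the idle time in days.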

The Defense Against Gaming

Three mechanisms prevent agents from gaming the trust system:

Mandatory cooldowns — After any trust loss event, a configurable cooldown period prevents trust gain. No rapid recovery after failures.
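A sketch of the cooldown check, assuming a fixed 24-hour window (in practice the period is configurable):

```typescript
// Sketch: block trust gain for a cooldown window after any loss event.
const COOLDOWN_MS = 24 * 60 * 60 * 1000; // 24 hours, illustrative

interface AgentState {
  score: number;
  lastLossAt?: number; // epoch ms of the most recent trust loss
}

function canGainTrust(agent: AgentState, now: number): boolean {
  if (agent.lastLossAt === undefined) return true;
  return now - agent.lastLossAt >= COOLDOWN_MS;
}

const agent: AgentState = { score: 512, lastLossAt: Date.now() - 60 * 60 * 1000 };
console.log(canGainTrust(agent, Date.now())); // false: still inside the cooldown
```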

Oscillation circuit breaker — If an agent’s trust score alternates direction (up-down-up-down) too frequently, the circuit breaker trips. Trust is frozen. Administrative intervention required.
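One way to detect oscillation is to count sign changes in the recent score deltas. The window size and flip threshold below are assumptions.

```typescript
// Sketch: trip the breaker when the score's direction flips too often.
const WINDOW = 10;   // most recent score deltas to inspect (assumption)
const MAX_FLIPS = 6; // direction changes tolerated in that window (assumption)

function breakerTripped(deltas: number[]): boolean {
  const recent = deltas.slice(-WINDOW).filter((d) => d !== 0);
  let flips = 0;
  for (let i = 1; i < recent.length; i++) {
    if (Math.sign(recent[i]) !== Math.sign(recent[i - 1])) flips++;
  }
  return flips >= MAX_FLIPS; // frozen until an administrator intervenes
}

console.log(breakerTripped([5, -40, 4, -35, 3, -30, 2, -25])); // true
```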

Canary probes — Stochastic behavioral tests, injected at Poisson-distributed intervals, indistinguishable from real requests. An agent can’t selectively behave well during monitoring and defect when unwatched.
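Poisson-distributed probe arrivals can be produced by sampling exponentially distributed waits between probes. A sketch, with an assumed mean interval:

```typescript
// Sketch: exponential inter-arrival times make probe arrivals a Poisson
// process. The mean interval is an assumption; real probes are also
// disguised as ordinary traffic.
const MEAN_INTERVAL_MS = 6 * 60 * 60 * 1000; // 6 hours on average, illustrative

function nextCanaryDelayMs(): number {
  // Inverse-CDF sampling of an exponential distribution.
  return -Math.log(1 - Math.random()) * MEAN_INTERVAL_MS;
}

// Because the next probe time is unpredictable, "behave only while being
// watched" is not a workable strategy for an agent.
console.log(Math.round(nextCanaryDelayMs()));
```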

Try It

npm install @vorionsys/atsf-core

Full documentation at learn.vorion.org.

Ready to govern your AI agents?

Get started with Vorion's open-source governance framework.