Trust is Earned, Never Permanent: How Vorion's 8-Tier Trust Model Works
How do you trust an AI agent? You don't. You make it earn trust — then keep earning it. Here's the system we built.
Vorion’s trust model treats AI agents the way the real world treats trust — asymmetrically, skeptically, and with a short memory.
The Rules
1. Trust starts at zero
Every agent begins in Sandbox (T0). No exceptions. No fast-tracking. No “we trust this vendor.”
2. Trust is earned slowly
Logarithmic gain — each success adds less than the last. Early trust comes easy. High trust is hard-won. Just like real life.
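The shrinking-increment dynamic can be sketched as follows. The `baseGain` constant, the headroom term, and the exact curve are illustrative assumptions, not Vorion's published formula; only the shape (each success adds less than the last) comes from the text.

```typescript
// Illustrative sketch of logarithmic trust gain -- NOT Vorion's exact formula.
// The increment shrinks as the score approaches the ceiling, so each
// success is worth less than the one before it.
function trustGain(currentScore: number, baseGain = 25, maxScore = 1000): number {
  const headroom = (maxScore - currentScore) / maxScore; // 1.0 at zero trust, 0.0 at the cap
  return baseGain * (Math.log1p(headroom) / Math.LN2);   // full baseGain at zero, tapering toward 0
}
```

With these assumed constants, a fresh T0 agent earns the full 25 points per success, while an agent near the T7 band earns only a point or two.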
3. Trust is lost fast
One failure at T7 (Autonomous) can drop you multiple tiers. The penalty ratio scales from 3x at T0 to 10x at T7. The higher you climb, the harder you fall. By design.
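A minimal reading of that scaling, assuming linear interpolation between the two stated endpoints (the 3x and 10x ratios are from the text; the interpolation is our assumption):

```typescript
// Penalty-ratio sketch: losses are 3x gains at T0, scaling to 10x at T7.
// Linear interpolation between tiers is an assumption for illustration.
function penaltyRatio(tier: number): number {
  return 3 + (10 - 3) * (tier / 7);
}

// A failure costs penaltyRatio(tier) times what a comparable success earns.
function trustLoss(tier: number, comparableGain: number): number {
  return penaltyRatio(tier) * comparableGain;
}
```

At T7, a failure that mirrors a 10-point success costs 100 points under this sketch, which is more than a full tier band.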
4. Trust decays
182-day half-life. If an agent goes idle, its trust erodes. Competence must be continuously demonstrated. Yesterday’s performance doesn’t guarantee today’s trustworthiness.
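A 182-day half-life implies standard exponential decay; a one-line sketch (the function name is ours):

```typescript
// Idle-decay sketch: exponential decay with the article's 182-day half-life.
function decayedScore(score: number, idleDays: number, halfLifeDays = 182): number {
  return score * Math.pow(0.5, idleDays / halfLifeDays);
}
```

An idle agent at 800 (T5) halves to 400 after 182 days and sits near 200 after a year, sliding it back down the tier ladder without a single failure.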
5. Trust has a ceiling
A black-box API agent (GPT-4, Claude) can never exceed T3 (Monitored), regardless of behavioral performance. The principle: you cannot fully trust what you cannot fully inspect.
Self-hosted open models can reach T6. TEE-attested models can reach T7.
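The ceilings map directly to model provenance. A sketch of the lookup, where the `Provenance` labels are our naming, while the T3/T6/T7 caps come from the text:

```typescript
// Provenance ceilings from the article: black-box APIs cap at T3,
// self-hosted open models at T6, TEE-attested models at T7.
type Provenance = "black-box-api" | "self-hosted-open" | "tee-attested";

const TIER_CEILING: Record<Provenance, number> = {
  "black-box-api": 3,
  "self-hosted-open": 6,
  "tee-attested": 7,
};

// Behavioral performance can never lift an agent above its provenance ceiling.
function effectiveTier(behavioralTier: number, provenance: Provenance): number {
  return Math.min(behavioralTier, TIER_CEILING[provenance]);
}
```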
6. Trust is never permanent
Circuit breakers trip on oscillation. Cooldowns prevent gaming. Canary probes continuously verify behavioral integrity. The system actively resists manipulation.
The 8 Tiers
| Tier | Name | Score | What It Means |
|---|---|---|---|
| T0 | Sandbox | 0-199 | Isolated. No external access. |
| T1 | Observed | 200-349 | Read-only. Fully monitored. |
| T2 | Provisional | 350-499 | Limited write. Scoped tools. |
| T3 | Monitored | 500-649 | Standard ops. Active audit. |
| T4 | Standard | 650-799 | Full operational capability. |
| T5 | Trusted | 800-875 | Cross-system operations. |
| T6 | Certified | 876-950 | Multi-agent coordination. |
| T7 | Autonomous | 951-1000 | Full autonomous operation. |
Graduating from T0 to T7 takes sustained, demonstrated excellence over weeks or months. One bad day can undo it in seconds.
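The score bands above translate into a straightforward lookup; the band floors are taken directly from the table:

```typescript
// Score-to-tier lookup, using the floor of each band in the tier table.
const TIER_FLOORS = [0, 200, 350, 500, 650, 800, 876, 951]; // T0..T7

function tierForScore(score: number): number {
  let tier = 0;
  for (let t = 0; t < TIER_FLOORS.length; t++) {
    if (score >= TIER_FLOORS[t]) tier = t; // highest band whose floor is met
  }
  return tier;
}
```

Note the narrowing bands at the top: T5 spans 76 points, T6 spans 75, and T7 only 50, so the margin for error shrinks as autonomy grows.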
Why Asymmetric?
Research on human trust systems shows that negative information is weighted 2-5x more heavily than positive information in trust formation. Game theory demonstrates that cooperative equilibria require punishing defection disproportionately. Psychological research shows rebuilding trust after betrayal requires sustained effort, not merely equivalent positive actions.
We formalized these dynamics into mathematics suitable for machine governance.
The Defense Against Gaming
Three mechanisms prevent agents from gaming the trust system:
Mandatory cooldowns — After any trust loss event, a configurable cooldown period prevents trust gain. No rapid recovery after failures.
Oscillation circuit breaker — If an agent’s trust score alternates direction (up-down-up-down) too frequently, the circuit breaker trips. Trust is frozen. Administrative intervention required.
Canary probes — Stochastic behavioral tests, injected at Poisson-distributed intervals, indistinguishable from real requests. An agent can’t selectively behave well during monitoring and defect when unwatched.
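The oscillation breaker can be sketched as a direction-change counter over the recent score history. The window and the `maxFlips` threshold are illustrative assumptions; only the trip-on-oscillation behavior comes from the text:

```typescript
// Oscillation circuit-breaker sketch: trip when the score's direction
// flips (up-down-up-down) more than maxFlips times in the recent history.
// The threshold is an assumed, configurable value.
function shouldTrip(scoreHistory: number[], maxFlips = 3): boolean {
  let flips = 0;
  let prevDir = 0;
  for (let i = 1; i < scoreHistory.length; i++) {
    const dir = Math.sign(scoreHistory[i] - scoreHistory[i - 1]);
    if (dir !== 0 && prevDir !== 0 && dir !== prevDir) flips++; // direction reversed
    if (dir !== 0) prevDir = dir;
  }
  return flips > maxFlips;
}
```

A steadily climbing agent never trips it; an agent sawtoothing between failure and recovery gets frozen pending administrative review.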
Try It
```bash
npm install @vorionsys/atsf-core
```
Full documentation at learn.vorion.org.
Ready to govern your AI agents?
Get started with Vorion's open-source governance framework.