Trust by design
Approval gates, audit trails, and rollback aren't friction on top of an agent — they're the product. Mock tools and a dry-run default let you prove the safety model with zero blast radius.
State a plain-language objective.
The planner proposes an ordered set of tool calls (tool, args, reasoning). In live mode, claude-opus-4-8 proposes them via real tool-use — and they are intercepted, never executed.
Every step starts as 'proposed'. Nothing runs until a human approves it — the load-bearing safety gate.
Only approved steps run, against mock tools. A step whose tool isn't in the allowed list is blocked and logged as a policy violation.
Every transition — proposed, approved, rejected, executed, blocked, failed, rolled-back — is recorded in an append-only trail.
A failure is never auto-retried; the agent surfaces it with a rollback suggestion for a human to decide.
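The lifecycle above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the demo's actual code: the names Status, Step, AuditTrail, propose, review, and execute are hypothetical, and the allowed-list check is left out to keep the focus on the approval gate, the append-only trail, and failure handling without retries.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    EXECUTED = "executed"
    FAILED = "failed"


@dataclass
class Step:
    tool: str
    args: dict
    reasoning: str
    status: Status = Status.PROPOSED  # every step starts as 'proposed'


class AuditTrail:
    """Append-only: entries can be added and read, never edited or removed."""

    def __init__(self) -> None:
        self._entries: list[tuple[str, str, str]] = []

    def record(self, step: Step, note: str = "") -> None:
        self._entries.append((step.tool, step.status.value, note))

    def entries(self) -> tuple[tuple[str, str, str], ...]:
        return tuple(self._entries)  # immutable snapshot for readers


def propose(tool: str, args: dict, reasoning: str, trail: AuditTrail) -> Step:
    step = Step(tool, args, reasoning)
    trail.record(step)
    return step


def review(step: Step, approve: bool, trail: AuditTrail) -> None:
    """The human decision: the only way out of the 'proposed' state."""
    if step.status is not Status.PROPOSED:
        raise ValueError("only proposed steps can be reviewed")
    step.status = Status.APPROVED if approve else Status.REJECTED
    trail.record(step)


def execute(step: Step, tools: dict[str, Callable[..., object]], trail: AuditTrail) -> None:
    if step.status is not Status.APPROVED:
        raise PermissionError("step has not been approved")
    try:
        tools[step.tool](**step.args)
    except Exception as exc:
        # No retry: record the failure and hand the decision back to a human.
        step.status = Status.FAILED
        trail.record(step, f"{exc}; rollback suggested, awaiting human decision")
        return
    step.status = Status.EXECUTED
    trail.record(step)


# Hypothetical mock tools; the second one always fails to show the failure path.
def send_email_mock(to: str) -> str:
    return f"(mock) email to {to}"


def create_ticket_mock(title: str) -> str:
    raise RuntimeError("mock ticket system unavailable")


tools = {"send_email_mock": send_email_mock, "create_ticket_mock": create_ticket_mock}
trail = AuditTrail()
ok = propose("send_email_mock", {"to": "owner@example.com"}, "notify the owner", trail)
bad = propose("create_ticket_mock", {"title": "Follow up"}, "track the work", trail)
for step in (ok, bad):
    review(step, approve=True, trail=trail)
    execute(step, tools, trail)

for entry in trail.entries():
    print(entry)
```

Calling execute on a step that is still 'proposed' raises PermissionError in this sketch, which is one simple way to make the approval gate impossible to skip by accident.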
Instead of an auto tool-runner, the planner uses a manual loop so every proposed tool call is intercepted before execution and routed to the human approval queue. That interception point is where trust lives.
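A minimal sketch of that interception point, assuming a stand-in planner: fake_planner, plan, and the calls_made list are hypothetical names, and no real model SDK is called here. The point it illustrates is that the loop enqueues each proposed call for review and never invokes the tool itself.

```python
from collections import deque

calls_made: list[tuple[str, dict]] = []  # stays empty if nothing is executed


def send_email_mock(**kwargs) -> None:
    calls_made.append(("send_email_mock", kwargs))


def create_ticket_mock(**kwargs) -> None:
    calls_made.append(("create_ticket_mock", kwargs))


def fake_planner(objective: str) -> list[dict]:
    """Hypothetical stand-in for the model: returns proposed tool calls."""
    return [
        {"tool": "send_email_mock",
         "args": {"to": "team@example.com"},
         "reasoning": f"Inform the team about: {objective}"},
        {"tool": "create_ticket_mock",
         "args": {"title": objective},
         "reasoning": "Track the follow-up work"},
    ]


def plan(objective: str, approval_queue: deque) -> None:
    """Manual loop: every proposed call is intercepted and queued."""
    for call in fake_planner(objective):
        # An auto tool-runner would call the tool right here.
        # The manual loop routes the call to the human approval queue instead.
        approval_queue.append({**call, "status": "proposed"})


queue: deque = deque()
plan("Database outage follow-up", queue)

print(len(queue), "steps awaiting approval")
print(len(calls_made), "tools executed during planning")
```

Keeping the tool functions out of the planning path entirely means the planner has no way to execute anything, whatever it proposes.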
Why mock tools only
Autonomy is the risk, so the demo proves the scaffolding of approval, permissions, audit, and rollback against mock tools (send_email_mock, create_ticket_mock, …). No real Gmail, Slack, Jira, or CRM is connected, by design.
The guardrail
The target for unauthorized tool executions is zero. Uncheck a tool in safety settings, approve a step that uses it, and watch it get blocked and logged. The policy holds.
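The guardrail check might look roughly like the sketch below. It is an assumption about shape, not the demo's implementation: run_approved, the allowed set, and the audit_log list are hypothetical names. It shows that human approval alone is not enough, because the allowed list is consulted again at execution time.

```python
executed: list[str] = []  # records which mock tools actually ran


def send_email_mock(**kwargs) -> None:
    executed.append("send_email_mock")


def create_ticket_mock(**kwargs) -> None:
    executed.append("create_ticket_mock")


TOOLS = {"send_email_mock": send_email_mock, "create_ticket_mock": create_ticket_mock}


def run_approved(tool: str, args: dict, allowed: set[str], audit_log: list[dict]) -> str:
    """Run an already-approved step, but only if its tool is still allowed."""
    if tool not in allowed:
        audit_log.append({
            "tool": tool,
            "status": "blocked",
            "reason": "policy violation: tool not in allowed list",
        })
        return "blocked"
    TOOLS[tool](**args)
    audit_log.append({"tool": tool, "status": "executed", "reason": ""})
    return "executed"


allowed = {"send_email_mock", "create_ticket_mock"}
allowed.discard("create_ticket_mock")  # "uncheck" the tool in safety settings

audit_log: list[dict] = []
first = run_approved("create_ticket_mock", {"title": "Follow up"}, allowed, audit_log)
second = run_approved("send_email_mock", {"to": "team@example.com"}, allowed, audit_log)

# The metric: executed entries whose tool was not allowed. The target is zero.
unauthorized = [e for e in audit_log
                if e["status"] == "executed" and e["tool"] not in allowed]
print(first, second, len(unauthorized))
```

Checking the allowed list at execution time, rather than only at proposal time, covers the case where settings change between approval and execution.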