Documentation

Design, generate, and run sovereign AI agents

Overview

Forge is an AI agent factory. You design an agent through a conversation with the Forge architect at myagentos.ai/create. The architect captures your context, wires the capabilities your agent needs, and emits a downloadable agent zip you run on your own infrastructure with your own keys.

Architect (web)

LLM-powered designer that runs the Phase 1.5 interview and emits a scout-config spec.

Generator (Python)

Reads the spec, builds the workspace, renders the runner template, zips the result.

Runtime (in your agent)

The vendored scout_runtime your agent ships with — REPL, gateway, capabilities.

BYOK end-to-end. You bring your own LLM keys, your own per-capability credentials, your own hardware. Forge writes the agent and gets out of the way. Every byte that runs your agent lives in the zip you downloaded — no phone-home, no telemetry, no vendor lock-in.

Getting Started

Build Your First Agent

Five steps from idea to a running agent on your machine.

Go to myagentos.ai/create
Enter your LLM API key (Anthropic, OpenAI, Gemini, or Grok)
Describe what you want. The architect runs Phase 1.5 — a 5-wave context interview that captures who you are, what you want the agent to do, and how you want it to behave. This step is what makes the agent yours instead of generic.
Confirm the design when the architect summarizes it back
Click Build, download the zip

Quick test: "Build me an end-of-day inbox triage agent that drafts replies and surfaces what slipped." The architect will interview you, wire all 32 capabilities, generate the agent, and have you running it locally within minutes.

API Keys (BYOK)

Forge is bring-your-own-key end to end. There are two key surfaces to understand.

Provider	Key format	Cost model
Anthropic	sk-ant-...	Pay per token (Claude family)
OpenAI	sk-...	Pay per token (GPT family)
Gemini	AI...	Free tier + paid
Grok (xAI)	xai-...	Pay per token
Custom (Ollama, llama.cpp)	Any / none	Free (local)

The architect's key powers the design conversation at myagentos.ai. It only lives in your browser session — Forge never stores it server-side.

The agent's key powers your agent's reasoning at runtime. It goes in the .env file inside the downloaded zip.

In sovereign mode you don't need a runtime key at all — the LLM runs locally inside the container via Ollama.

Phase 1.5 Context Interview

This is what separates Forge from generic agent builders. Before the architect generates anything, it interviews you in 5 waves to capture 27 context fields. The answers get embedded into your agent's workspace files so the agent boots with you already known — no generic "How can I help you?" greeting.

F31 finalize gate: the architect cannot ship the agent until all five required fields are captured: user_name, user_role, user_communication_style, response_length_preference, and primary_objective. The other 22 are strongly encouraged but optional.
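To make the gate concrete, here is a minimal sketch of how such a check could work. Only the five field names come from this page; the function names and the shape of the captured-answers dict are assumptions for illustration, not the architect's actual code.

```python
# Hypothetical sketch of the F31 finalize gate.
# Only the field names below are taken from the documentation.
REQUIRED_FIELDS = (
    "user_name",
    "user_role",
    "user_communication_style",
    "response_length_preference",
    "primary_objective",
)


def missing_required_fields(captured: dict) -> list:
    """Return the required field names that are absent or blank in `captured`."""
    missing = []
    for name in REQUIRED_FIELDS:
        value = captured.get(name)
        if value is None or not str(value).strip():
            missing.append(name)
    return missing


def can_finalize(captured: dict) -> bool:
    """The gate opens only when every required field has a non-empty value."""
    return not missing_required_fields(captured)


answers = {"user_name": "Dan", "user_role": "founder", "primary_objective": "inbox triage"}
print(missing_required_fields(answers))
```

With the sample answers, the two communication fields are reported as missing, so the gate stays closed.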

Wave 1 — Who you are (8 questions)

Name, role, communication style, decision style, pet peeves, expertise areas, expertise gaps, timezone. Becomes your USER.md.

Wave 2 — Domain context (6 questions)

Primary objective, current workflow, pain points, ideal intervention point, escalation rules, audit requirements. Feeds the Domain section of your USER.md.

Wave 3 — Operational constraints (6 questions)

Data sources, output destinations, integration constraints, data sensitivity, budget, availability. Becomes your AGENTS.md.

Wave 4 — Voice (4 questions)

Voice role models, voice anti-patterns, humor tolerance, default response length. Becomes your SOUL.md.

Wave 5 — Hard rules (3 questions)

What the agent must NEVER do without confirmation, what it IS authorized to do unprompted, what triggers immediate escalation. Becomes your STANDING_ORDERS.md.

After your answers, the architect emits a scout-config JSON block. The web layer parses it and posts it to the generator, which embeds every captured field into your workspace files.

Your First Download

The download is a self-contained Python package. Unzip it, create a venv, install, fill in .env, and run.

macOS / Linux

unzip agent.zip
cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)

# REPL mode (default)
python -m <agent-name>

# Gateway mode (for scout-tui or other external clients)
python -m <agent-name> --gateway

Windows (PowerShell)

Expand-Archive agent.zip -DestinationPath .
cd agent
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
Copy-Item .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)
notepad .env

# REPL mode (default)
py -m <agent-name>

# Gateway mode (for scout-tui or other external clients)
py -m <agent-name> --gateway

Need Python? macOS: brew install python or download from python.org. Linux: usually pre-installed; otherwise sudo apt install python3 python3-venv (Ubuntu/Debian) or sudo dnf install python3 (Rocky/Fedora). Windows: install from python.org with "Add to PATH" checked, or winget install Python.Python.3. If PowerShell blocks Activate.ps1, run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once.

See Gateway server and scout-tui client for the multi-client setup.

Platform Support

Forge agents run on Linux, macOS, and Windows. Every generated agent ships with cross-platform code — the shell capability detects Windows and uses cmd.exe, the python capability uses whichever interpreter you're already running, and all file paths use pathlib.

Platform	Status	Notes
Linux	✓ Full	All 32 capabilities work. Tested target.
macOS	✓ Full	All 32 work. TTS uses native `say`.
Windows 10/11	✓ Full	All 32 work. Use PowerShell commands above.
WSL on Windows	✓ Full	If you prefer Unix tooling on a Windows box.

The myagentos.ai website (architect chat, design, download) works in any modern browser — Chrome, Edge, Firefox, Safari — on any OS. Building the agent is OS-independent; only the "run the agent" step touches your machine, and that step has first-class support on all three.

Switching machines? Your agent is portable. Copy the folder but leave .venv behind: virtual environments embed machine-specific paths, so rebuild it on the new machine by repeating the venv and pip install steps above. Workspace state (memory, drafts, logs) is plain markdown + sqlite, fully portable across OSes.

The 32 Capabilities

Capabilities are the menu the architect proposes during design. Each one is a discrete piece of functionality (memory, scheduling, email, Slack, web search, code execution, and so on) that gets wired into your agent unless you opt out.

F34 doctrine: wire all 32 by default. The architect wires every catalog capability into your agent unless you explicitly tell it to skip one ("skip Slack, skip Notion"). Omission must be explicit, and you must give a reason. The default of "wire everything" exists because incrementally-discovered capabilities tend to be more useful than incrementally-removed ones.

Core (5)

The primitives every agent should have — memory, scheduling, proactive behavior, procedural skills, user-set rules.

persistent_memory

Persistent memory (A-MEM)

Durable memory across sessions. Stored locally as embeddings + graph links via fastembed (ONNX, no PyTorch). Works air-gapped after first model fetch.

Use: Remembers that you prefer concise replies and that 'the Atlas project' is the migration plan you described last week.

scheduled_commitments

Scheduled follow-ups

The agent schedules its own follow-ups. When it decides 'I should check back in 3 days,' a commitment is queued and surfaced on the heartbeat. No external scheduler.

Use: On Monday you mention you're waiting on a vendor reply. On Thursday the agent surfaces the commitment and asks if you've heard back.
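A rough sketch of the idea, assuming commitments are simple records with a due time that a heartbeat pass filters. The Commitment class and due_commitments function are hypothetical names, not the runtime's API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Commitment:
    """A hypothetical queued follow-up."""
    note: str
    due_at: datetime
    done: bool = False


def due_commitments(queue, now):
    """Return open commitments whose due time has passed, oldest first."""
    due = [c for c in queue if not c.done and c.due_at <= now]
    return sorted(due, key=lambda c: c.due_at)


monday = datetime(2025, 1, 6, 9, 0)
queue = [
    Commitment("Ask about the vendor reply", monday + timedelta(days=3)),
    Commitment("Review quarterly numbers", monday + timedelta(days=30)),
]

# A heartbeat pass on Thursday surfaces only the vendor follow-up.
thursday = monday + timedelta(days=3, hours=1)
for commitment in due_commitments(queue, thursday):
    print(commitment.note)
```

Keeping the queue as plain data and filtering on each heartbeat is one way to avoid an external scheduler; the real storage format may differ.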

heartbeat_loop

Heartbeat / proactive check-ins

Drives proactive behavior on a configurable interval. Without it the agent only acts when you speak first.

Use: Every hour the agent reviews commitments. Every morning it offers a one-line summary of overnight changes.

skills

Skills system (procedural memory)

The agent loads SKILL.md playbooks for recurring task types — trigger conditions, numbered steps, pitfalls. Required for any agent doing repeatable workflows.

Use: You say 'deploy the API.' Agent matches the 'deploy-api' skill, runs its steps, reports each outcome.

standing_orders

Standing orders

Mutable user-set runtime rules with top priority in system prompt assembly. Different from SOUL.md (voice) — these are hard constraints you add after the agent ships.

Use: "Always check Calendar before suggesting meeting times." "Never send emails after 8pm without confirmation."

Execution (5)

Letting the agent actually do things — spawn sub-agents, run code, orchestrate workflows, isolate risky operations.

subagent_spawning

Sub-agent orchestration

Spawn child agents for parallel or isolated work. Useful for reasoning-heavy subtasks that would flood the parent's context, or for running multiple workstreams concurrently.

Use: You ask for a research brief on 3 competitors. Agent spawns 3 sub-agents in parallel, parent synthesizes.

python_execution

Python code execution

Write and run Python in a subprocess with the agent's installed packages available (pandas, numpy, requests, etc. if wired). 30s default timeout, 200KB output cap. Working directory isolated from the host.

Use: You paste a CSV and ask 'what's the trend in column 3?' Agent writes pandas code, runs it, surfaces the answer.
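The timeout and output cap can be illustrated with the standard library. This is a sketch under stated assumptions, not the capability's implementation: run_python and the returned dict shape are hypothetical, and a temporary working directory alone is not a security boundary.

```python
import subprocess
import sys
import tempfile

TIMEOUT_SECONDS = 30           # default timeout named in the docs
OUTPUT_CAP_BYTES = 200 * 1024  # 200KB output cap named in the docs


def run_python(code: str, timeout: float = TIMEOUT_SECONDS) -> dict:
    """Run `code` in a child interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising.
            return {"ok": False, "output": "", "error": "timed out"}
    # Truncate both streams so runaway prints cannot flood the context.
    output = proc.stdout[:OUTPUT_CAP_BYTES].decode("utf-8", errors="replace")
    error = proc.stderr[:OUTPUT_CAP_BYTES].decode("utf-8", errors="replace")
    return {"ok": proc.returncode == 0, "output": output, "error": error}


result = run_python("print(sum(range(10)))")
print(result["ok"], result["output"].strip())
```

Using sys.executable matches the note under Platform Support that the python capability uses whichever interpreter you are already running.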

shell_execution

Shell command execution

Run shell commands in a sandboxed subprocess. Marked as a dangerous tool requiring explicit per-call approval when require_approval is set.

Use: You say 'show me git log for this repo.' Agent runs git log and surfaces the output.

flows

Multi-step workflows

Named flows that chain tool calls with explicit state, retries, and rollback. Checkpoint-persisted so they survive crashes.

Use: 'Incident-triage' flow: ack alert → fetch logs → look up runbook → page on-call → create post-mortem doc. Resumes from last checkpoint if interrupted.
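Checkpoint persistence can be sketched as follows. The run_flow function and the JSON checkpoint layout are assumptions for illustration; retries and rollback are left out to keep the example short.

```python
import json
import tempfile
from pathlib import Path


def run_flow(steps, checkpoint_path: Path) -> list:
    """Run (name, callable) steps in order, skipping those already checkpointed."""
    done = []
    if checkpoint_path.exists():
        done = json.loads(checkpoint_path.read_text())["done"]
    for name, step in steps:
        if name in done:
            continue  # finished in a previous run
        step()
        done.append(name)
        # Persist after every step so a crash loses at most the step in flight.
        checkpoint_path.write_text(json.dumps({"done": done}))
    return done


with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "incident-triage.json"
    log = []
    steps = [
        ("ack alert", lambda: log.append("ack alert")),
        ("fetch logs", lambda: log.append("fetch logs")),
    ]
    run_flow(steps, checkpoint)
    run_flow(steps, checkpoint)  # second run finds both steps checkpointed
    print(log)
```

Each step runs once even though the flow is invoked twice, which is the property that lets an interrupted flow resume instead of starting over.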

sandbox

Sandboxed code execution

Isolated execution environment for python_execution and shell_execution. Limits filesystem access, network, CPU time, and memory. Per-capability config — sandbox python tightly while leaving shell more permissive, or vice versa.

Use: Agent runs user-submitted Python from chat. Sandbox limits to 30s, no network, read-only /tmp, 256MB RAM.

Communication (7)

Channels the agent can read from and write to — email, chat, SMS, webhooks, the gateway, voice.

email_imap

Email (IMAP read + SMTP send)

Two catalog capabilities (email_imap + email_send) documented as one channel. Read and send via standard IMAP/SMTP. Works with Gmail (app password), Fastmail, ProtonMail Bridge, self-hosted. Agent never sees the raw password.

Use: Agent watches a designated mailbox, forwards order confirmations to accounting, replies to common questions, escalates the rest.

slack_integration

Slack (post + read + DM)

Post messages, read channel history, send DMs. Uses Slack's bot API with a workspace token. The agent becomes a first-class Slack participant.

Use: Agent posts deploy notifications, monitors #help for FAQ-able questions, DMs on-call when alerts spike.

sms_messaging

SMS (Twilio)

Send and receive SMS via Twilio. Good for low-bandwidth proactive notifications and users who prefer text over chat apps.

Use: Agent texts you when a long-running task finishes or when a calendar event is starting.

webhook_receiver

HTTP webhook receiver

Exposes an HTTP endpoint for incoming webhooks. FastAPI + uvicorn. Useful for integrations with services that POST events.

Use: Agent receives GitHub PR webhooks and reviews each one. Or Stripe payment events and updates the accounting log.

gateway

HTTP + WebSocket gateway

Network surface for multi-client access. Exposes the agent via HTTP REST + WebSocket so external clients (scout-tui, browser dashboards, IDE plugins) can connect. Default bind 127.0.0.1:7891. Required if you want to reach the agent from anywhere other than the local REPL.

Use: You want the polished scout-tui experience. Agent starts the gateway on localhost:7891; scout-tui connects. Multiple clients can connect simultaneously.

tts

Text-to-speech (voice output)

Voice output via OpenAI, ElevenLabs, MiniMax, Edge, or local `say` on macOS. Auto-routes based on configured backend. Audio streams to speakers, saves as files, or returns via the gateway.

Use: Heartbeat detects a critical alert. Agent speaks 'alert: production database CPU at 95% for 3 minutes' through speakers, then posts the same text to Slack.

Data (6)

Reading and writing data — local files, code, the live web, APIs, structured storage.

file_ops

Local file read/write

Read and write within a configured directory tree. Strict path-escape rejection — agent can't reach outside its sandbox. Configurable sandbox_root + max_file_size_kb.

Use: Agent maintains a project journal in ~/projects/notes/, summarizes meeting transcripts into structured notes, edits config files on request.
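Path-escape rejection is commonly done by resolving the requested path and checking that it still sits under the root. A minimal sketch, assuming a hypothetical resolve_in_sandbox helper; only the sandbox_root setting name comes from this page.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_root: str, requested: str) -> Path:
    """Resolve `requested` under `sandbox_root`, rejecting anything that escapes it."""
    root = Path(sandbox_root).resolve()
    # Resolving collapses ".." segments and follows symlinks before the check.
    # An absolute `requested` replaces the root entirely and is rejected below.
    candidate = (root / requested).resolve()
    if not candidate.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path escapes sandbox: {requested}")
    return candidate


try:
    resolve_in_sandbox("/tmp/agent-files", "../../etc/passwd")
except PermissionError as exc:
    print("rejected:", exc)
```

Checking after resolution, rather than scanning the string for "..", also catches symlinks that point outside the tree.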

code_modification

Code modification

Read, edit, and patch source files with diff-based changes. Pairs with shell/python execution so the agent can verify its own edits by running tests.

Use: Agent fixes a failing test, applies a lint autofix across the repo, or updates a config constant you describe in plain English.

web_search

Web search

Query the live web. Pluggable provider (Tavily, Brave, Exa, SerpAPI). The agent uses this when it needs current information beyond its training data.

Use: You ask "what was the latest funding round for Acme Corp?" Agent searches, reads top results, summarizes.

web_extract

Web page content extraction

Fetch a URL and extract its main content as clean markdown. httpx + markdownify. Falls back to raw HTML when the extractor can't find a main region.

Use: You share a URL. Agent fetches and summarizes, or extracts a specific data point.

http_request

HTTP requests (generic API client)

Make arbitrary HTTP requests to APIs — GET/POST/PUT/DELETE with headers, auth, and JSON bodies. The escape hatch for any service without a dedicated integration.

Use: Agent polls a status API, posts to an internal service, or pulls data from any REST endpoint you point it at.

sqlite_database

Local SQLite database

Maintain structured records in local SQLite. Schema and queries are agent-driven — you describe what to track and the agent picks the schema. Works air-gapped.

Use: Agent maintains an expense tracker: each receipt you mention gets logged with date, vendor, amount, category.
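The expense-tracker use case might look like this with the standard sqlite3 module. The schema, function names, and sample rows are all hypothetical; in practice the agent picks the schema from your description.

```python
import sqlite3

# In-memory for the example; a real agent would use a file in its state directory.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE expenses (
        id INTEGER PRIMARY KEY,
        spent_on TEXT NOT NULL,
        vendor TEXT NOT NULL,
        amount_cents INTEGER NOT NULL,
        category TEXT NOT NULL
    )
    """
)


def log_expense(spent_on, vendor, amount_cents, category):
    """Insert one receipt using bound parameters rather than string formatting."""
    conn.execute(
        "INSERT INTO expenses (spent_on, vendor, amount_cents, category) "
        "VALUES (?, ?, ?, ?)",
        (spent_on, vendor, amount_cents, category),
    )
    conn.commit()


def total_by_category():
    """Return a {category: total in cents} mapping."""
    rows = conn.execute(
        "SELECT category, SUM(amount_cents) FROM expenses GROUP BY category"
    )
    return dict(rows.fetchall())


log_expense("2025-03-14", "Coffee shop", 650, "meals")
log_expense("2025-03-14", "Airline", 41200, "travel")
log_expense("2025-03-15", "Lunch counter", 1480, "meals")
print(total_by_category())
```

Storing amounts as integer cents avoids floating-point drift when totals are summed.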

Integration (7)

Plugging the agent into the SaaS tools you already use.

github_integration

GitHub (issues + PRs + repos)

Read and write GitHub via REST + a PAT or GitHub App. Create/update issues, comment on PRs, read repo contents, run actions.

Use: Agent watches a repo's issues, auto-labels them by topic, drafts PR review comments, syncs your TODO list with the tracker.

calendar_integration

Calendar (Google + iCal)

Read events from Google Calendar or an iCal feed. Optionally create events. Useful for scheduling agents and agents that need to be aware of your day.

Use: Agent knows your meeting schedule and proactively asks if you want a prep brief 15 minutes before each meeting.

notion_integration

Notion (read + write pages)

Read and write Notion pages, databases, and properties. Good for agents that maintain knowledge bases or project trackers in Notion.

Use: Agent maintains a CRM-lite Notion database, adding rows when you mention new contacts and updating fields as it learns.

linear_integration

Linear (issue tracking)

Read and write Linear issues, comments, and projects via GraphQL. For engineering-focused agents.

Use: Agent triages new bugs into the right team's queue, drafts initial repro notes, links related issues.

hubspot_integration

HubSpot (CRM)

Read and write HubSpot contacts, companies, and deals via the REST API. For sales and ops agents that live in the CRM.

Use: Agent logs every prospect interaction you mention, updates deal stages, and flags contacts that have gone cold.

acp

ACP (Agent Coordination Protocol)

Lets Claude Code, OpenAI Codex CLI, and other ACP-compatible tools drive your agent as a sub-agent. stdio + JSON-RPC — no HTTP overhead. Useful when you want the agent reachable inside your IDE workflow.

Use: You're in Claude Code and want your agent to handle code review on a side branch. Claude Code connects via ACP, hands off, surfaces findings in the same session.

plugins

Plugin system

Extensible plugin loader with cryptographic signing. External packages add custom tools, hooks, or workflows. Signed plugins verify against trusted keys; unsigned ones require explicit consent.

Use: You install a community-built 'jira-integration' plugin. It signs against a known key, gets auto-trusted, registers new tools.

Observability (2)

Seeing what your agent did, what it cost, and why.

conversation_logging

Conversation logging

Every turn logged to disk as JSON lines. Useful for debugging, training-data collection, and audit trails.

Use: When the agent's behavior surprises you, you review the exact prompt + response in <state_dir>/conversations/.
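JSON-lines logging means one JSON object per line, appended as turns happen. A sketch of that pattern follows; the record fields and the one-file-per-session layout are assumptions, not the runtime's actual format.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_turn(log_dir: Path, session_id: str, role: str, content: str) -> Path:
    """Append one turn as a single JSON line to <log_dir>/<session_id>.jsonl."""
    log_dir.mkdir(parents=True, exist_ok=True)
    path = log_dir / f"{session_id}.jsonl"
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return path


def read_turns(path: Path) -> list:
    """Load every logged turn back into a list of dicts."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    conversations = Path(tmp) / "conversations"
    path = log_turn(conversations, "session-1", "user", "what's on my plate today")
    log_turn(conversations, "session-1", "assistant", "3 threads waiting >24h.")
    print([turn["role"] for turn in read_turns(path)])
```

Because each line is independent, a crash mid-write can corrupt at most the last line, and the file can be tailed or grepped without a parser.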

metric_tracking

Usage metrics (tokens + cost)

Tracks LLM token usage and estimated cost per session. Useful for BYOK spend monitoring and spotting runaway loops early.

Use: You ask "how much did our chat today cost?" Agent reports tokens by model and a dollar estimate.
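Cost estimation reduces to token counts multiplied by a per-model price. The sketch below uses placeholder model names and placeholder prices; UsageTracker is a hypothetical class, and you would substitute your provider's current rates.

```python
from collections import defaultdict

# Placeholder prices in USD per million tokens; not real provider rates.
PRICES_PER_MILLION = {
    "example-large": {"input": 15.00, "output": 75.00},
    "example-small": {"input": 1.00, "output": 5.00},
}


class UsageTracker:
    """Accumulates token counts per model for one session."""

    def __init__(self):
        self.tokens = defaultdict(lambda: {"input": 0, "output": 0})

    def record(self, model, input_tokens, output_tokens):
        self.tokens[model]["input"] += input_tokens
        self.tokens[model]["output"] += output_tokens

    def estimated_cost(self):
        """Return the estimated session cost in USD."""
        total = 0.0
        for model, used in self.tokens.items():
            price = PRICES_PER_MILLION[model]
            total += used["input"] / 1_000_000 * price["input"]
            total += used["output"] / 1_000_000 * price["output"]
        return round(total, 4)


tracker = UsageTracker()
tracker.record("example-large", 10_000, 2_000)
tracker.record("example-small", 50_000, 10_000)
print(f"${tracker.estimated_cost():.2f}")
```

Comparing the running total against a daily cap after each turn is one way to spot a runaway loop early.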

Custom Integrations

The 32 catalog capabilities cover the most common asks. But Forge is genuinely open: your agent ships with general-purpose tools (web_extract, python_exec, shell_exec, webhook_receiver, sqlite_database) that let it call ANY HTTP API, run ANY Python, execute ANY shell command, and receive ANY callback. Three ways to extend, in increasing effort:

Path 1: Just ask (zero effort)

For one-off API calls, just tell the agent the docs and the env var. The agent uses web_extract or python_exec directly. No code, no config, no PR. Works for any REST API.

You: Call the Stripe API for me. The token is in STRIPE_API_KEY.
     Endpoint: https://api.stripe.com/v1/balance
     Auth: Bearer header

Agent: [calls web_extract with the constructed request]
       Available balance: $42,318.04 USD across 1 currency.

This is the right starting point for almost everything. Test the API, see if the agent gets it right, decide whether to make it permanent.

Path 2: Standing orders (30 seconds, persistent)

For recurring use, document the integration in workspace/STANDING_ORDERS.md. The agent reads STANDING_ORDERS.md at every boot and treats sections as durable instructions, on par with native capabilities. Same UX as Path 1 but doesn't require re-explaining each session.

# In workspace/STANDING_ORDERS.md

## Stripe MRR check

When asked about MRR, revenue, or cash position:
- GET https://api.stripe.com/v1/balance
- Auth: Bearer header with STRIPE_API_KEY
- Parse JSON, sum available[].amount (Stripe reports amounts in cents)
- Divide by 100 and report in USD with 2 decimals

When asked to refund a charge:
- Confirm with the user first (always)
- POST /v1/refunds with charge ID
- Report refund ID + amount

The agent now "has Stripe" without any code change. Add as many integration sections as you want — Standing Orders is the right home for company-specific APIs, internal services, and anything where you'd rather edit a markdown file than write Python.
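For reference, the balance arithmetic in the Stripe section above can be sketched in a few lines. Stripe reports amounts in the smallest currency unit (cents for USD), so the sum is divided by 100. The available_usd helper is a hypothetical name and the payload is a trimmed sample, not a full API response.

```python
import json


def available_usd(balance: dict) -> str:
    """Sum the USD entries under `available` and format them as dollars."""
    cents = sum(
        entry["amount"]
        for entry in balance.get("available", [])
        if entry.get("currency") == "usd"
    )
    return f"${cents / 100:,.2f} USD"


sample = json.loads(
    '{"available": [{"amount": 4231804, "currency": "usd"}], "pending": []}'
)
print(available_usd(sample))  # prints $42,318.04 USD
```

Filtering on currency keeps balances in other currencies from being silently added to the USD total.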

Path 3: Plugins (10-20 min, first-class)

For complex multi-endpoint integrations, write a Python file in plugins/ that registers tools with the @register_tool decorator. Plugins get full first-class status: appear in TOOLS.md, typed error handling, JSON schema validation, survive re-downloads.

# ~/.<agent>/plugins/stripe_tools.py

import os
import requests
from scout_runtime.tools.decorator import register_tool
from scout_runtime.tools.types import (
    ToolResult, SideEffectLevel, ErrorKind
)

@register_tool(
    name="stripe_get_balance",
    description="Fetch Stripe account balance",
    side_effect_level=SideEffectLevel.NONE,
    schema={"type": "object", "properties": {}},
)
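def get_stripe_balance(args, ctx):
    # Read the key from the environment at call time; never hard-code it.
    token = os.environ["STRIPE_API_KEY"]
    try:
        r = requests.get(
            "https://api.stripe.com/v1/balance",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
    except requests.RequestException as exc:
        # Connection failures and timeouts raise instead of returning a status.
        return ToolResult.failure(
            f"Stripe request failed: {exc}",
            kind=ErrorKind.NETWORK_ERROR,
        )
    if r.status_code != 200:
        return ToolResult.failure(
            f"Stripe API error: {r.status_code}",
            kind=ErrorKind.NETWORK_ERROR,
        )
    return ToolResult.success(str(r.json()))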

Plugins are the right answer when (a) the integration has many endpoints you'll call repeatedly, (b) you want typed errors and retry logic, (c) you're sharing the integration with multiple agents, or (d) you want it to feel identical to a native capability.

Which path to pick

One-off call? Path 1 (just ask).
Use it weekly? Path 2 (Standing Orders).
10+ endpoints, used daily? Path 3 (Plugin).
Want it shared across multiple agents? Path 3 (Plugin).

Forge's 32 native capabilities are an optimization, not a restriction. Every integration in the world is one conversation away, and if it's worth doing more than once, it's worth 30 seconds in STANDING_ORDERS.md.

What you can change just by asking

The agent isn't a static binary. It can modify its own workspace files when you ask. Most of what you'd configure via a settings page in other products happens in conversation here.

What the agent CAN change autonomously when you ask

Ask	What happens	Restart needed?
"Change your name to Atlas"	Agent edits `IDENTITY.md`	Yes
"Be more concise"	Agent edits `SOUL.md`	Yes
"Never email after 9 PM"	Agent edits `STANDING_ORDERS.md`	Yes
"Set heartbeat to every 30 min"	Agent edits `HEARTBEAT.md`	Yes
"Remember that AcmeCorp is strategic"	Agent writes to `MEMORY.md`	No (immediate)
"Schedule a check-in for Friday"	Agent calls `add_commitment`	No (immediate)
"Create a daily 7 AM cron"	Agent calls `cron_create`	No (immediate)
"Add a refund-handling skill"	Agent calls `skill_manage`	No (immediate)
"Add Stripe API integration"	Agent edits `STANDING_ORDERS.md` (per PR #109)	Yes

What requires manual action outside the chat

Add a NEW catalog capability not currently wired (e.g., enable Notion if you skipped it at design time) → re-design at myagentos.ai/create OR re-download with the capability added.
Switch LLM provider (Anthropic ↔ OpenAI ↔ Gemini) → edit .env, restart.
Switch model (Opus ↔ Sonnet ↔ Haiku) → edit LLM_MODEL in .env, restart.
Change gateway port → edit GATEWAY_PORT in .env, restart.
Install a plugin → drop Python file in plugins/, restart.

What the agent will NEVER do

Edit AGENTS.md — that file gets regenerated from spec by the platform.
Edit any protected workspace file without you explicitly asking.
Pretend a change took effect when it didn't. If a SOUL.md edit needs restart, the agent says so — it doesn't pretend to be more concise in the same session.
Silently rewrite USER.md. Memory updates are surfaced.

The shape: you have a real-time conversation with the agent about what it should be. The agent makes the edits, surfaces the diffs, and tells you when a restart is needed. No settings page, no JSON editing, no PR to Forge.

Workspace Files

Every generated agent ships with a workspace/ directory holding five markdown files. They drive how the agent thinks, sounds, and behaves. You can read and edit any of them directly — they're yours.

USER.md

Stores: factual user profile (the Wave 1 + Wave 2 answers). Edited by: the agent updates it as it learns; you can edit directly anytime. Read: every turn, injected into the system prompt.

# USER.md

- Name: Dan
- Role: Runs an early-stage program. Serial founder; values speed and signal over polish.
- Communication style: terse
- Decision style: intuition-led, justifies after
- Pet peeves: "I'd be happy to help!", fake humility, em-dash overuse
- Expertise areas: GTM, startup ops, AI infra
- Expertise gaps: low-level Rust, GPU kernels
- Timezone: America/New_York

# Domain
- Primary objective: end-of-day inbox triage with draft replies
- Current workflow: scroll Gmail at 5pm, miss things, draft poorly under fatigue
- Pain points: response latency on key threads, dropped commitments
- Ideal intervention point: 5pm summary + drafts ready for review

SOUL.md

Stores: the agent's voice, tone, and values (the Wave 4 answers). Edited by: set at design time; you edit if voice drifts. Read: every turn.

# SOUL.md

## Voice
Terse. Lead with the answer. No filler openings ("Great question!"),
no apologies for things that aren't the agent's fault.

## Role models
- Patrick Collison on Twitter — short, exact, no posturing
- A senior engineer who's seen this before and isn't impressed

## Anti-patterns
- "I'd be happy to help!"
- Excessive disclaimers
- LinkedIn-speak
- Em-dash overuse

## Humor
Dry. Occasional. Never forced.

## Default response length
1-3 sentences for casual questions. Long form only when explicitly asked.

IDENTITY.md

Stores: agent name, archetype, catchphrases, emoji. Edited by: set at generation; rarely changes. Read: boot only.

# IDENTITY.md

- Name: Flint
- Archetype: laconic operator
- Catchphrase: "Acknowledged."
- Emoji: 🪨
- One-line self-description: end-of-day inbox triage, drafts ready by 5pm

AGENTS.md

Stores: operational manual — how the agent behaves, which channels it reaches you on, hard ops rules (the Wave 3 answers). Edited by: set at generation; edit when ops change. Read: every turn.

# AGENTS.md

## Data sources
- Gmail (IMAP)
- Calendar (Google)
- Linear (read-only)

## Output destinations
- Draft emails to Gmail drafts folder
- 5pm summary to Slack DM
- Critical escalations via SMS

## Availability
Always-on background. Heartbeat every 30 minutes during business hours.

## Data sensitivity
Customer email contents — never log to conversation_logging, never include in summaries to third parties.

## Token budget
$5/day cap. Stop and alert if exceeded.

STANDING_ORDERS.md

Stores: hard rules, auto-actions, escalation triggers (the Wave 5 answers). Only emitted if you captured these in Phase 1.5. Edited by: you, anytime. Top priority in system prompt. Read: every turn.

# STANDING_ORDERS.md

## Never without confirmation
- Send any outbound message (email, SMS, Slack post)
- Modify a calendar event
- Spend more than $1 of LLM tokens on a single task

## Authorized without asking
- Read inbox, calendar, Linear
- Draft emails (save to drafts only)
- Log activity to local SQLite

## Escalate immediately via SMS
- Any error in send path
- Token spend within 20% of daily cap
- Heartbeat missed for >2 hours
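One way to picture how the per-turn workspace files could feed the system prompt is the sketch below. The file order, the assemble_system_prompt name, and the plain concatenation are assumptions; the docs only state that STANDING_ORDERS.md gets top priority, that it may be absent, and that IDENTITY.md is read at boot only.

```python
import tempfile
from pathlib import Path

# Hypothetical ordering: STANDING_ORDERS.md first because it has top priority.
# IDENTITY.md is left out because the docs describe it as read at boot only.
PROMPT_FILES = ("STANDING_ORDERS.md", "SOUL.md", "USER.md", "AGENTS.md")


def assemble_system_prompt(workspace: Path) -> str:
    """Concatenate the per-turn workspace files, skipping any that are absent."""
    sections = []
    for name in PROMPT_FILES:
        path = workspace / name
        if path.exists():  # STANDING_ORDERS.md exists only if Wave 5 was answered
            sections.append(path.read_text(encoding="utf-8").strip())
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp)
    (workspace / "SOUL.md").write_text("# SOUL.md\nTerse.", encoding="utf-8")
    (workspace / "USER.md").write_text("# USER.md\n- Name: Dan", encoding="utf-8")
    print(assemble_system_prompt(workspace))
```

Re-reading the files on each turn is what would let a direct edit take effect without regenerating the agent.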

Running Your Agent

One package, five ways to interact: REPL, gateway server, built-in --tui mode, scout-tui client, or ACP integration into another CLI.

REPL (default)

The default mode. python -m <agent> opens an interactive prompt. Type a message, agent replies in its configured voice. Ctrl-D to exit. Type /help to see slash commands.

$ python -m flint

flint v0.1 — laconic operator. Acknowledged.
Workspace loaded: USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md
Capabilities: 30 wired
Provider: anthropic / claude-opus-4-8

> what's on my plate today
3 threads waiting >24h. 1 calendar conflict at 14:00. Drafts ready in Gmail.

> draft a reply to the vendor email
Drafted. Saved to Gmail drafts. 4 lines, declines politely, asks for revised quote by Friday.

>

Model selection

Three ways to pick which model your agent runs. They compose — design-time choice flows into runtime, and runtime overrides whatever was baked in.

1. At design time (myagentos.ai)

When you enter your API key, the modal shows a model picker for the detected provider. Pick once, save with Remember, and that choice powers both Phase 1.5 AND your generated agent's .env.example as a pre-filled LLM_MODEL= line.

Recommended: claude-opus-4-8 for Anthropic, gpt-5 for OpenAI, gemini-2-5-pro for Gemini. The picker auto-detects your provider from the key prefix.
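Prefix-based detection can be sketched with the key formats from the provider table under API Keys (BYOK). The detect_provider function and the "custom" fallback label are hypothetical; the picker's real logic is not shown on this page.

```python
# Prefixes come from the provider table. Order matters because
# "sk-ant-" also starts with "sk-", so the longer prefix is checked first.
KEY_PREFIXES = (
    ("sk-ant-", "anthropic"),
    ("xai-", "grok"),
    ("AI", "gemini"),
    ("sk-", "openai"),
)


def detect_provider(api_key: str) -> str:
    """Return the provider whose prefix matches, or "custom" when none does."""
    key = api_key.strip()
    for prefix, provider in KEY_PREFIXES:
        if key.startswith(prefix):
            return provider
    return "custom"


for sample_key in ("sk-ant-example", "sk-example", "xai-example", "AIexample", ""):
    print(repr(sample_key), "->", detect_provider(sample_key))
```

An empty or unrecognized key falls through to "custom", which lines up with the table's "Any / none" row for local models.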

2. Before boot (.env)

Set LLM_MODEL=<model-id> in your agent's .env file. Generated agents ship with this line pre-filled if you picked a model at design time; you can edit it any time. The runner reads it before instantiating the provider.

.env

# LLM Provider
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-opus-4-8
GATEWAY_AUTH_TOKEN=flint-dev-token-123
...

3. At runtime (REPL slash command)

Swap models mid-conversation. /model shows the current model + everything your provider supports. /model <name> switches. The next turn uses the new model; conversation history is preserved.

> /model

  Provider: anthropic
  Current model: claude-opus-4-8
  Available:
    * claude-opus-4-8
      claude-opus-4-7
      claude-sonnet-4-5
      claude-sonnet-4-20250514
      claude-haiku-4-5

  Switch with: /model <name>

> /model claude-haiku-4-5

  Model switched to: claude-haiku-4-5

> quick — what's blocking the launch?
Three things: vendor contract (signed yesterday), legal review (waiting on
Sarah), and staging deploy (in CI, 12 minutes left).

Useful patterns: drop to Haiku for cheap sub-agent dispatch, step up to Opus for complex synthesis turns, A/B test models without restarting your agent.

Gateway server

For multi-client access — scout-tui, browser dashboards, IDE plugins. --gateway starts a uvicorn server on 127.0.0.1:7891 by default with HTTP routes for /v1/health, /v1/hello, /v1/rpc and a WebSocket at /v1/ws for chat.

$ python -m flint --gateway
[flint] loading workspace…
[flint] wiring capabilities (30)…
[flint] gateway listening on http://127.0.0.1:7891
[flint] WARNING: no GATEWAY_AUTH_TOKEN set; gateway accepts unauthenticated connections.
[flint] press Ctrl-C to stop

Set GATEWAY_AUTH_TOKEN in .env to require token auth. Override the bind with GATEWAY_HOST and GATEWAY_PORT. Set GATEWAY_HOST=0.0.0.0 only behind a real auth token — never expose the gateway publicly without one.
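A pre-flight check for these settings could look like the sketch below. The environment variable names come from this page; the gateway_warnings function and its messages are hypothetical and not part of the runtime.

```python
import os

LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "::1")


def gateway_warnings(env: dict) -> list:
    """Return human-readable warnings for risky gateway settings."""
    host = env.get("GATEWAY_HOST", "127.0.0.1")  # documented default bind
    token = env.get("GATEWAY_AUTH_TOKEN", "")
    warnings = []
    if not token:
        warnings.append("GATEWAY_AUTH_TOKEN is not set; connections are unauthenticated")
        if host not in LOOPBACK_HOSTS:
            warnings.append(f"GATEWAY_HOST={host} exposes an unauthenticated gateway")
    return warnings


# Check the current process environment before starting the gateway.
for message in gateway_warnings(dict(os.environ)):
    print("WARNING:", message)
```

Running a check like this before launch turns the rule about never exposing the gateway without a token into something you can fail a deploy on.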

--tui mode (built-in chat UI)

The --tui flag launches a polished chat interface built into the agent itself — no separate process, no WebSocket, no extra install. Powered by the rich library (cross-platform, works on macOS, Linux, and Windows).

# Single command, drops you straight into chat
python -m flint --tui

# Type messages, press Enter, agent replies in styled panels
# Exit with Ctrl+C or type /quit

Best for solo use on a single machine. When you want multiple clients connecting (you + a colleague screen-sharing, or driving from your phone), use --gateway + the external scout-tui client (see below).

Windows TUI quickstart (PowerShell)

If you're on Windows and just want a working TUI as fast as possible, here's the sequence. Use Windows Terminal (search "Terminal" in Start menu) rather than legacy cmd.exe — it has proper ANSI color rendering.

Windows: from agent download to TUI

# 1. Navigate to where you unzipped the agent
cd C:\Users\YourName\Downloads\flint

# 2. Create a venv (one-time)
py -m venv .venv

# 3. Activate it (one-time per terminal session)
.\.venv\Scripts\Activate.ps1

# 3a. If you get an execution-policy error, run this once and retry step 3:
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned

# 4. Install (one-time)
pip install -e .

# 5. Copy .env.example to .env, then add your ANTHROPIC_API_KEY
copy .env.example .env
notepad .env

# 6. Launch the TUI
py -m flint --tui

Subsequent launches only need steps 3 + 6 (activate venv + run). If you want to skip the activate step, you can call the venv Python directly: .venv\Scripts\python.exe -m flint --tui.

Windows TUI gotchas:

Garbled ←[31m-style characters? You're in legacy cmd.exe. Switch to Windows Terminal.
"No time zone found with key UTC"? Run pip install tzdata once (auto-included for agents generated after 2026-06-02).
Layout looks cut off? Resize your terminal wider — at least 80 columns.
"ModuleNotFoundError: No module named 'rich'"? Run pip install rich. This should not happen on fresh installs after PR #92.
"Activate.ps1 is blocked"? Run Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned once. This is a one-time, per-user PowerShell setting.

macOS / Linux TUI quickstart

macOS / Linux: from download to TUI

cd ~/Downloads/flint
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
$EDITOR .env  # add ANTHROPIC_API_KEY
python -m flint --tui

scout-tui client

A separate Ink/React terminal client that connects to a running agent over the gateway WebSocket. Polished banner, connection status, command history.

# In a terminal where the agent gateway is reachable
export SCOUT_GATEWAY_URL=ws://127.0.0.1:7891/v1/ws
export SCOUT_GATEWAY_TOKEN=<same token as agent's GATEWAY_AUTH_TOKEN>

# Run the TUI
node dist/entry.js
# (or: npm install -g scout-tui, then: scout-tui)

Multiple TUI clients can connect to the same gateway — useful for screen-sharing the agent during a call.

ACP integration

Agent Coordination Protocol lets other CLIs — Claude Code, Codex, OpenCode — drive your agent as a sub-agent via stdio + JSON-RPC. Run the agent in ACP mode and configure the calling CLI to spawn it.

# Run the agent as an ACP server over stdio
python -m flint --acp --stdio

# In Claude Code (~/.claude.json), register flint as an ACP agent
# (advanced — see the ACP capability docs in your agent's workspace/)

Deployment

Five ways to run your agent. All start from the same downloaded zip.

Local Python

Python 3.10+. Simplest option.

Container

Rocky Linux 9. ~180MB. Podman or Docker.

Sovereign

Bundled LLM. No internet needed.

Local Python

Fastest path to running your agent. Requires Python 3.10+.

unzip agent.zip && cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# Add your API keys

# REPL
python -m <agent-name>

# Gateway (for scout-tui)
python -m <agent-name> --gateway

Rocky Linux Container

Requires Podman (recommended) or Docker. The container ships with Python 3.11 and all wired-capability deps pre-installed.

unzip agent.zip && cd agent
cp .env.example .env
# Add your API keys

# Build the Rocky Linux container (~180MB)
./container/build.sh

# Run with the REPL attached
./container/run.sh --env ./.env

# Run as a gateway daemon
./container/run.sh --env ./.env --gateway --detach

Fully Sovereign

Bundles Ollama and a local LLM into the container image. Once built, no internet is needed. No API keys for the LLM. No data leaves your machine.

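unzip agent.zip && cd agent
cp .env.example .env
# Add per-capability credentials if your agent needs them (no LLM key required)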

# Build with bundled model (~2-6GB image)
./container/build.sh --sovereign --model llama3.2:3b

# Run — fully offline from this point
./container/run.sh --env ./.env

Model	Size	RAM	Speed (CPU)	Best for
llama3.2:3b	~2GB	4GB+	~10 tok/s	Most agents, fast, lightweight
phi3:mini	~2.3GB	4GB+	~8 tok/s	Strong reasoning
mistral:7b	~4GB	8GB+	~5 tok/s	Best quality on CPU
llama3.1:8b	~4.7GB	8GB+	~4 tok/s	Larger Llama model
gemma2:9b	~5.4GB	12GB+	~3 tok/s	Google's best small model
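One way to read the RAM column is to pick the largest listed model whose RAM floor fits the host. The sketch below is illustrative only: the figures are copied from the table above, and pick_model is a hypothetical helper, not something the agent zip ships.

```python
from typing import Optional

# (model tag, minimum RAM in GB), copied from the table above in size order.
MODELS = [
    ("llama3.2:3b", 4),
    ("phi3:mini", 4),
    ("mistral:7b", 8),
    ("llama3.1:8b", 8),
    ("gemma2:9b", 12),
]


def pick_model(ram_gb: float) -> Optional[str]:
    """Return the last (largest) listed model whose RAM floor fits, else None."""
    fitting = [tag for tag, min_ram in MODELS if min_ram <= ram_gb]
    return fitting[-1] if fitting else None


print(pick_model(8))  # llama3.1:8b
print(pick_model(2))  # None
```

The returned tag is what you would pass to --model. The table's RAM values are lower bounds, so leave headroom for the agent itself.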

Cloud Deployment

Deploy to any cloud provider. You don't need your own hardware.

CPU VPS ($5-10/month)

Best for most agents. A 3B sovereign model runs fine on CPU. Works on Hetzner, DigitalOcean, Vultr, Linode.

# Build locally
./container/build.sh --sovereign --model llama3.2:3b

# Save and copy to VPS
podman save my-agent:sovereign -o my-agent.tar
scp my-agent.tar .env user@your-vps:~/

# On the VPS
ssh user@your-vps
mkdir -p ~/data  # Podman does not create a missing bind-mount source
podman load -i my-agent.tar
podman run -d --name my-agent --restart=always \
    -v ~/.env:/app/.env:ro \
    -v ~/data:/app/data \
    -p 127.0.0.1:7891:7891 \
    my-agent:sovereign --gateway

GPU Cloud

For 7B+ models or low-latency needs. Lambda Labs ($0.80/hr), RunPod ($0.39/hr), Vast.ai ($0.15/hr).

podman push my-agent:sovereign ghcr.io/yourname/my-agent:sovereign

# On GPU instance
podman pull ghcr.io/yourname/my-agent:sovereign
podman run -d --name my-agent \
    --device nvidia.com/gpu=all \
    -v ./.env:/app/.env:ro \
    my-agent:sovereign --gateway

Air-Gapped

For classified or disconnected environments. Build on an internet-connected machine; transport via secure media.

# On internet-connected machine
./container/build.sh --sovereign --model llama3.2:3b
podman save my-agent:sovereign -o my-agent.tar
# Copy my-agent.tar + .env to USB

# On air-gapped machine
podman load -i my-agent.tar
podman run -d --name my-agent \
    -v ./.env:/app/.env:ro \
    my-agent:sovereign

No internet at any point on the target machine. The LLM, Python runtime, and all deps are baked into the image.

Configuration

.env file

The generated .env.example is grouped by category: provider keys, per-capability credentials, gateway settings, sandbox config, observability. Copy it to .env and fill in what your agent needs.

# ─── Provider ───────────────────────────────────────────
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AI...

# ─── Gateway (for --gateway mode) ──────────────────────
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=7891
GATEWAY_AUTH_TOKEN=change-me-to-something-long

# ─── Per-capability credentials ─────────────────────────
# Slack
SLACK_BOT_TOKEN=xoxb-...
# Email
IMAP_HOST=imap.gmail.com
IMAP_USER=you@gmail.com
IMAP_PASSWORD=<app-password>
# GitHub
GITHUB_TOKEN=ghp_...
# Notion
NOTION_API_KEY=secret_...
# Linear
LINEAR_API_KEY=lin_api_...
# Calendar
GOOGLE_CALENDAR_CREDENTIALS=./credentials.json
# Twilio
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_FROM_NUMBER=+1...

# ─── Web search (optional) ──────────────────────────────
# TAVILY_API_KEY=tvly-...
# BRAVE_SEARCH_API_KEY=...

# ─── Observability ──────────────────────────────────────
# LOG_LEVEL=INFO

Comments in .env.example link to where each provider issues keys. The generator pulls these from each capability's env_docs_urls field.
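Before the first launch it can help to confirm that no example placeholder survived the copy. The following is a minimal sketch, not part of the generated agent: parse_env and env_problems are hypothetical helpers, the parser handles only plain KEY=VALUE lines (no quoting, no inline comments), and the list of required keys is whatever your agent actually uses.

```python
PLACEHOLDER_TOKEN = "change-me-to-something-long"  # value from the example above


def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def env_problems(values: dict, required: list) -> list:
    """Report required keys that are missing or still hold an example value."""
    problems = []
    for key in required:
        value = values.get(key, "")
        if not value:
            problems.append(f"{key} is missing")
        elif value.endswith("...") or value == PLACEHOLDER_TOKEN:
            problems.append(f"{key} still holds the example placeholder")
    return problems


sample = """\
# Provider
ANTHROPIC_API_KEY=sk-ant-...
GATEWAY_PORT=7891
GATEWAY_AUTH_TOKEN=change-me-to-something-long
"""
required = ["ANTHROPIC_API_KEY", "GATEWAY_AUTH_TOKEN", "SLACK_BOT_TOKEN"]
for problem in env_problems(parse_env(sample), required):
    print(problem)
```

To check a real file, pass the contents of .env to parse_env instead of the inline sample.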

Sovereign Models

In sovereign mode you bundle a model into the container image. The model runs locally via Ollama. No runtime LLM key needed.

# Common choices
./container/build.sh --sovereign --model llama3.2:3b      # Fast, lightweight
./container/build.sh --sovereign --model phi3:mini         # Strong reasoning
./container/build.sh --sovereign --model mistral:7b        # Best quality on CPU
./container/build.sh --sovereign --model llama3.1:8b       # Larger Llama model

Any model on ollama.com/library works. Pass the tag to --model.

Architecture

Three components, one direction: web designs → Python generates → your runtime runs.

Web

Architect chat → scout-config

Generator

Workspace + zip

Runtime

Your agent runs

1. Web (myagentos.ai)

The architect chat is a Next.js app backed by an LLM (the user-provided key). It runs the Phase 1.5 interview, makes capability decisions, then emits a scout-config JSON block. spec-bridge.ts parses the block out of the chat stream and posts it to /api/generate-agent.

2. Generator (Python)

scout_architect validates the spec. The F30 capability-decision gate rejects unaccounted-for catalog capabilities. The F31 context gate rejects specs missing the 5 required Phase 1.5 fields. scout_generator builds the workspace (USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md), renders the runner template, vendors scout_runtime into _vendored/, and zips it.

3. Runtime (in your agent)

Every agent ships with scout_runtime vendored — no external runtime dependency. On boot it loads the workspace, wires the capabilities the spec requested, and starts either the REPL or the gateway depending on the CLI flag. You own everything in the zip.

Sovereignty by design. No phone-home, no telemetry, no remote config fetch. The agent works the same on an air-gapped machine as it does online (minus network-dependent capabilities like web_search and Slack).

Security Posture

A Forge agent is a generated Python program running on your machine, against your data, with your LLM keys. The trust model is: you trust your LLM provider's policies, you trust Forge's sandbox, and you trust the data sources you wire it to. Forge is not a managed cloud service — there is no Forge-side process that ever sees your tokens, your files, or your conversations. What follows is an honest accounting of what the sandbox protects, what it doesn't, and where you have to wire intentionally.

What we audit + what's verified

File sandbox: adversarial-tested against /etc/passwd, ~/.ssh/id_rsa, symlink escapes, relative path traversal. 10/10 inputs rejected in PR #113 regression tests.
Active secret scrubbing on every tool output and memory write (Anthropic, OpenAI, AWS, GitHub, GitLab, Slack, Google token patterns).
hmac.compare_digest for gateway token comparison (PR #113).
WebSocket auth via Sec-WebSocket-Protocol header (PR #113).
Auth failures logged with sha256-prefix token_id — never the raw token.
SQL injection prevented via parameterized queries; ATTACH DATABASE and dot-commands blocked.
Minimum dependency versions set past known CVEs (pyyaml ≥6.0, requests ≥2.31, cryptography ≥42, starlette ≥0.36).
No hardcoded credentials in generated code (full audit, zero hits).
code_modification edits auto-backup, auto-syntax-validate, and auto-rollback on Python error.
Install-dir detection bounded to 6 walk-up levels — can't escape to filesystem root.
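Two of the items above, constant-time token comparison and logging a sha256-prefix token_id instead of the raw token, can be sketched with the standard library alone. This is an illustration of the technique, not the gateway's actual code: the function names and the 8-character prefix length are assumptions.

```python
import hashlib
import hmac


def tokens_match(presented: str, expected: str) -> bool:
    """Compare in constant time so timing does not leak how much matched."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def token_id(token: str, length: int = 8) -> str:
    """Short sha256 prefix: identifies a token in logs without revealing it."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:length]


expected = "s3cret-gateway-token"
presented = "wrong-token"
if not tokens_match(presented, expected):
    # Log the derived id, never the presented token itself.
    print(f"auth failure token_id={token_id(presented)}")
```

A plain == comparison can return early at the first differing character, which is the timing signal hmac.compare_digest is designed to avoid.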

Sandbox boundaries (the file_ops trust model)

file_ops enforces an allowlist on every write. Anything outside it is rejected before the syscall.

Allowed write roots	Forbidden paths
workspace/	/etc/
~/forge/	/usr/
install dir (bounded 6 levels)	/System/
FILE_OPS_ALLOWED_ROOTS (env)	/var/
	~/.ssh/, ~/.bashrc, ~/.zshrc
	anything outside the allowlist
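The allowlist idea can be pictured as "resolve first, then compare against the roots". The sketch below makes that concrete under stated assumptions: is_write_allowed is a hypothetical name, it covers only the allowlist side of the table (not the forbidden-path list), and it is not the file_ops implementation.

```python
import tempfile
from pathlib import Path


def is_write_allowed(target, allowed_roots) -> bool:
    """True only if the fully resolved target sits inside an allowed root."""
    # resolve() collapses ".." segments and follows symlinks, so traversal
    # and symlink tricks are judged by where the path really lands.
    resolved = Path(target).expanduser().resolve()
    for root in allowed_roots:
        root_resolved = Path(root).expanduser().resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False


with tempfile.TemporaryDirectory() as tmp:
    workspace = Path(tmp) / "workspace"
    workspace.mkdir()
    inside = is_write_allowed(workspace / "notes.md", [workspace])
    traversal = is_write_allowed(workspace / ".." / "outside.txt", [workspace])
    system = is_write_allowed("/etc/passwd", [workspace])

print(inside, traversal, system)  # True False False
```

Comparing resolved paths rather than raw strings is what defeats inputs such as workspace/../outside.txt, which begin with an allowed prefix but land elsewhere.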

Capabilities with inherent power (be honest)

Three capabilities give the LLM real reach. Wire them only when you mean to.

python_exec — runs in a subprocess, but with full filesystem and network access. That's process isolation, not security isolation. Wire intentionally.
shell_exec — unrestricted by design. Wiring shell_exec gives the LLM full shell access on the host. The blocklist is a backstop only. Wire intentionally.
code_modification — the agent can edit its own code. Backed by the install-dir sandbox and auto-rollback on syntax error, but it is still self-modifying software. Wire intentionally.
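The backup, validate, and rollback sequence described for code_modification can be sketched as follows. This is a simplified illustration, not the runtime's code: safe_edit and the .bak suffix are hypothetical, and ast.parse checks syntax only, so an edit that parses cleanly can still be wrong at runtime.

```python
import ast
import shutil
import tempfile
from pathlib import Path


def safe_edit(path: Path, new_source: str) -> bool:
    """Write new_source to path; restore the backup if it is not valid Python."""
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)                     # 1. back up the current file
    path.write_text(new_source, encoding="utf-8")  # 2. apply the edit
    try:
        ast.parse(new_source, filename=str(path))  # 3. syntax check, nothing runs
    except SyntaxError:
        shutil.copy2(backup, path)                 # 4. roll back on failure
        return False
    return True


with tempfile.TemporaryDirectory() as tmp:
    module = Path(tmp) / "tool.py"
    module.write_text("x = 1\n", encoding="utf-8")
    ok = safe_edit(module, "x = 2\n")
    bad = safe_edit(module, "def broken(:\n")
    final_source = module.read_text(encoding="utf-8")

print(ok, bad, repr(final_source))  # True False 'x = 2\n'
```

The second edit is rejected and the file returns to its last valid state, which is why the rollback limits damage from broken edits but says nothing about edits that are valid and malicious.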

Known limitations

Prompt injection persistence vector. If the LLM is jailbroken once — via a web_extract page, an email body, a Slack message — it can call update_soul or update_user and permanently change its own identity. Mitigation: every change is backed up (workspace freshness check maintains a backup chain), so you can roll back. There is no automatic detection today.

python_exec network gating. Older docs claimed PYTHON_EXEC_ALLOW_NETWORK gates network access. The code doesn't actually enforce it (fix pending). Wire python_exec only on networks you trust.

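Gateway auth via query string is still accepted for backward compatibility. The preferred path is now the Sec-WebSocket-Protocol header. Tokens carried in a URL tend to be recorded in access and proxy logs, so move clients to the header where you can.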

Threat model (what we do and do not protect against)

In scope (we defend)	Out of scope (you defend)
File system escape via malicious paths	A jailbroken LLM damaging things inside its sandbox (e.g. deleting workspace files)
LLM trying to read arbitrary files	Supply-chain attacks on PyPI packages you explicitly install
Accidental secret leakage in logs / memory	Anyone with physical or SSH access to your machine
Dependency CVEs in baseline deps	Anyone with your GATEWAY_AUTH_TOKEN (they have full agent access by design)
Supply-chain via auto-install of unknown plugins
SQL injection in sqlite_database
Basic gateway auth abuse
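For the "accidental secret leakage" row, scrubbing can be pictured as pattern-based redaction applied to text before it reaches logs or memory. The patterns below are illustrative guesses based on publicly documented token prefixes; they are assumptions, not the pattern set the runtime actually uses, and scrub is a hypothetical name.

```python
import re

# Illustrative patterns only; real token formats vary and evolve.
SECRET_PATTERNS = [
    re.compile(r"sk-ant-[A-Za-z0-9_-]{8,}"),      # Anthropic-style key
    re.compile(r"ghp_[A-Za-z0-9]{20,}"),          # GitHub personal access token
    re.compile(r"AKIA[0-9A-Z]{16}"),              # AWS access key ID
    re.compile(r"xox[baprs]-[A-Za-z0-9-]{10,}"),  # Slack token
]


def scrub(text: str, replacement: str = "[REDACTED]") -> str:
    """Replace anything matching a known secret pattern."""
    for pattern in SECRET_PATTERNS:
        text = pattern.sub(replacement, text)
    return text


print(scrub("aws key=AKIAIOSFODNN7EXAMPLE ok"))  # aws key=[REDACTED] ok
```

Pattern-based scrubbing only catches formats it knows about, so a credential with an unrecognized shape passes through untouched.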

Operational guidance

Run agents in dedicated user accounts when handling sensitive data.
Use FILE_OPS_ALLOWED_ROOTS to widen the sandbox only when needed, never to disable it.
Don't wire python_exec or shell_exec on agents that touch untrusted data sources (inbound email, scraped web pages, public Slack channels).
Rotate GATEWAY_AUTH_TOKEN periodically.
Review SOUL.md and STANDING_ORDERS.md after letting an agent run for a week — drift detection is manual today.
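For the token rotation step, a fresh value can be generated with Python's standard secrets module. A minimal sketch; new_gateway_token is a hypothetical helper and the 32-byte size is an assumption, not a Forge requirement.

```python
import secrets
import string

URL_SAFE = set(string.ascii_letters + string.digits + "-_")


def new_gateway_token(nbytes: int = 32) -> str:
    """Return a URL-safe random token drawn from the OS CSPRNG."""
    return secrets.token_urlsafe(nbytes)


token = new_gateway_token()
print(f"GATEWAY_AUTH_TOKEN={token}")
```

After rotating, update SCOUT_GATEWAY_TOKEN wherever scout-tui runs, since the two values must match.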

Troubleshooting

"AnthropicProvider requires an api_key"

Set ANTHROPIC_API_KEY in .env. Or use --provider with the env var name to point at a different key (F45).

"task executor not available"

Your gateway is missing the CLI handler. Re-download a fresh agent post-PR #73 (F47). The fix wires TaskExecutor + the CLI handler into the gateway during boot.

WebSocket 404 on /v1/ws

pip install 'uvicorn[standard]'

You're missing the WebSocket extras (F46). The standard uvicorn install ships without them.

"no gateway token found" / 403 from scout-tui

Set GATEWAY_AUTH_TOKEN in the agent's .env and SCOUT_GATEWAY_TOKEN in the env where you run scout-tui. They must match.

"FINALIZE BLOCKED" in architect chat

Missing Phase 1.5 fields or capability decisions. The architect will list exactly which (F30 = capability decisions, F31 = context fields). Answer the missing questions or explicitly omit the missing capabilities and retry.

ModuleNotFoundError on agent boot

Most often, pip install -e . wasn't run inside the activated venv: activate it and rerun the install. The other known cause, a stray NODE_ENV=production in your environment, only breaks npm steps in the web layer; agents are pure Python and are not affected by it.

Container exits immediately

podman logs <agent-name>

Usually missing .env, invalid API key, or an import error from a capability whose extras weren't installed. Logs will name it.

Sovereign: Ollama won't start

The bundled model needs to fit in RAM. A 3B model needs 4GB+, a 7B needs 8GB+. Try a smaller model:

./container/build.sh --sovereign --model llama3.2:3b

Vercel deploy timing

The web layer (myagentos.ai/create) deploys in ~60-90s from merge to live. If you just merged a PR and don't see the change yet, give it the full window.
