Documentation
Design, generate, and run sovereign AI agents
Overview
Forge is an AI agent factory. You design an agent through a conversation with the Forge architect at myagentos.ai/create. The architect captures your context, wires the capabilities your agent needs, and emits a downloadable agent zip you run on your own infrastructure with your own keys.
Architect (web)
LLM-powered designer that runs the Phase 1.5 interview and emits a scout-config spec.
Generator (Python)
Reads the spec, builds the workspace, renders the runner template, zips the result.
Runtime (in your agent)
The vendored scout_runtime your agent ships with — REPL, gateway, capabilities.
BYOK end-to-end. You bring your own LLM keys, your own per-capability credentials, your own hardware. Forge writes the agent and gets out of the way. Every byte that runs your agent lives in the zip you downloaded — no phone-home, no telemetry, no vendor lock-in.
Getting Started
Build Your First Agent
Five steps from idea to a running agent on your machine.
- Go to myagentos.ai/create
- Enter your LLM API key (Anthropic, OpenAI, Gemini, or Grok)
- Describe what you want. The architect runs Phase 1.5 — a 5-wave context interview that captures who you are, what you want the agent to do, and how you want it to behave. This step is what makes the agent yours instead of generic.
- Confirm the design when the architect summarizes it back
- Click Build, download the zip
API Keys (BYOK)
Forge is bring-your-own-key end to end. There are two key surfaces to understand.
| Provider | Key format | Cost model |
|---|---|---|
| Anthropic | sk-ant-... | Pay per token (Claude family) |
| OpenAI | sk-... | Pay per token (GPT family) |
| Gemini | AI... | Free tier + paid |
| Grok (xAI) | xai-... | Pay per token |
| Custom (Ollama, llama.cpp) | Any / none | Free (local) |
The architect's key powers the design conversation at myagentos.ai. It only lives in your browser session — Forge never stores it server-side.
The agent's key powers your agent's reasoning at runtime. It goes in the .env file inside the downloaded zip.
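The key formats in the table above are enough to guess a provider from a pasted key. The sketch below shows one way that could work; `detect_provider` and the `PREFIXES` list are illustrative names built from the table, not Forge's actual detection code.

```python
# Hypothetical helper: guess the provider from the key prefixes in the table.
# Order matters: "sk-ant-" must be checked before the more general "sk-".
PREFIXES = [
    ("sk-ant-", "anthropic"),
    ("xai-", "grok"),
    ("sk-", "openai"),
    ("AI", "gemini"),
]


def detect_provider(api_key: str) -> str:
    """Return a provider name, or "custom" when no known prefix matches."""
    key = api_key.strip()
    for prefix, provider in PREFIXES:
        if key.startswith(prefix):
            return provider
    return "custom"  # e.g. Ollama or llama.cpp, which may use any key or none


print(detect_provider("sk-ant-example"))  # anthropic
print(detect_provider("sk-example"))      # openai
```

Checking the longer Anthropic prefix first is the one design point that matters here, since every Anthropic key also starts with `sk-`.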
Phase 1.5 Context Interview
This is what separates Forge from generic agent builders. Before the architect generates anything, it interviews you in 5 waves to capture 27 context fields. The answers get embedded into your agent's workspace files so the agent boots with you already known — no generic "How can I help you?" greeting.
Five fields are required: user_name, user_role, user_communication_style, response_length_preference, and primary_objective. The other 22 are strongly encouraged but optional.
Wave 1 — Who you are (8 questions)
Name, role, communication style, decision style, pet peeves, expertise areas, expertise gaps, timezone. Becomes your USER.md.
Wave 2 — Domain context (6 questions)
Primary objective, current workflow, pain points, ideal intervention point, escalation rules, audit requirements.
Wave 3 — Operational constraints (6 questions)
Data sources, output destinations, integration constraints, data sensitivity, budget, availability. Becomes your AGENTS.md.
Wave 4 — Voice (4 questions)
Voice role models, voice anti-patterns, humor tolerance, default response length. Becomes your SOUL.md.
Wave 5 — Hard rules (3 questions)
What the agent must NEVER do without confirmation, what it IS authorized to do unprompted, what triggers immediate escalation. Becomes your STANDING_ORDERS.md.
After your answers, the architect emits a scout-config JSON block. The web layer parses it and posts it to the generator, which embeds every captured field into your workspace files.
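The scout-config schema is not spelled out here, so the following is only a sketch of how a spec could be checked before generation. It assumes, hypothetically, a JSON object with the context fields under a top-level `context` key, and treats the five fields this section lists ahead of the 22 optional ones as required. `missing_required` is an illustrative name.

```python
import json

# The five context fields this section lists ahead of the optional 22.
REQUIRED_FIELDS = (
    "user_name",
    "user_role",
    "user_communication_style",
    "response_length_preference",
    "primary_objective",
)


def missing_required(spec_json: str) -> list[str]:
    """Return required fields that are absent or blank in a spec.

    Assumption: the spec is a JSON object with a top-level "context" key.
    """
    context = json.loads(spec_json).get("context", {})
    return [f for f in REQUIRED_FIELDS if not str(context.get(f) or "").strip()]


example = json.dumps({
    "context": {
        "user_name": "Dan",
        "user_role": "Founder",
        "user_communication_style": "terse",
        "primary_objective": "end-of-day inbox triage",
    }
})
print(missing_required(example))  # ['response_length_preference']
```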
Your First Download
What you got is a self-contained Python package. Unzip, create a venv, install, fill in .env, run.
macOS / Linux
unzip agent.zip
cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)
# REPL mode (default)
python -m <agent-name>
# Gateway mode (for scout-tui or other external clients)
python -m <agent-name> --gateway
Windows (PowerShell)
Expand-Archive agent.zip -DestinationPath .
cd agent
py -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -e .
Copy-Item .env.example .env
# edit .env to add ANTHROPIC_API_KEY (and any per-capability keys
# the architect wired — Slack, GitHub, Notion, etc.)
notepad .env
# REPL mode (default)
py -m <agent-name>
# Gateway mode (for scout-tui or other external clients)
py -m <agent-name> --gateway
macOS: brew install python or download from python.org. Linux: usually pre-installed; otherwise sudo apt install python3 python3-venv (Ubuntu/Debian) or sudo dnf install python3 (Rocky/Fedora). Windows: install from python.org with "Add to PATH" checked, or winget install Python.Python.3. If PowerShell blocks Activate.ps1, run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once.
See Gateway server and scout-tui client for the multi-client setup.
Platform Support
Forge agents run on Linux, macOS, and Windows. Every generated agent ships with cross-platform code — the shell capability detects Windows and uses cmd.exe, the python capability uses whichever interpreter you're already running, and all file paths use pathlib.
| Platform | Status | Notes |
|---|---|---|
| Linux | ✓ Full | All 32 capabilities work. Tested target. |
| macOS | ✓ Full | All 32 work. TTS uses native say. |
| Windows 10/11 | ✓ Full | All 32 work. Use PowerShell commands above. |
| WSL on Windows | ✓ Full | If you prefer Unix tooling on a Windows box. |
The myagentos.ai website (architect chat, design, download) works in any modern browser — Chrome, Edge, Firefox, Safari — on any OS. Building the agent is OS-independent; only the "run the agent" step touches your machine, and that step has first-class support on all three.
The .venv is optional: you can rebuild it with one command on the new machine. Workspace state (memory, drafts, logs) is plain markdown + sqlite, fully portable across OSes.
The 32 Capabilities
Capabilities are the menu the architect proposes during design. Each one is a discrete piece of functionality — memory, scheduling, email, slack, web search, code execution, etc. — that gets wired into your agent if you opt in.
Core (5)
The primitives every agent should have — memory, scheduling, proactive behavior, procedural skills, user-set rules.
persistent_memory: Persistent memory (A-MEM)
Durable memory across sessions. Stored locally as embeddings + graph links via fastembed (ONNX, no PyTorch). Works air-gapped after first model fetch.
Use: Remembers that you prefer concise replies and that 'the Atlas project' is the migration plan you described last week.
scheduled_commitments: Scheduled follow-ups
The agent schedules its own follow-ups. When it decides 'I should check back in 3 days,' a commitment is queued and surfaced on the heartbeat. No external scheduler.
Use: On Monday you mention you're waiting on a vendor reply. On Thursday the agent surfaces the commitment and asks if you've heard back.
heartbeat_loop: Heartbeat / proactive check-ins
Drives proactive behavior on a configurable interval. Without it the agent only acts when you speak first.
Use: Every hour the agent reviews commitments. Every morning it offers a one-line summary of overnight changes.
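The interplay between scheduled commitments and the heartbeat can be pictured as a queue that each heartbeat tick scans for due items. This is a minimal in-memory sketch under that assumption; `Commitment` and `CommitmentQueue` are hypothetical names, and the real runtime's storage and API may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Commitment:
    note: str
    due: datetime
    done: bool = False


@dataclass
class CommitmentQueue:
    items: list[Commitment] = field(default_factory=list)

    def add(self, note: str, now: datetime, in_days: int) -> Commitment:
        """Queue a follow-up that becomes due `in_days` days after `now`."""
        commitment = Commitment(note=note, due=now + timedelta(days=in_days))
        self.items.append(commitment)
        return commitment

    def due_items(self, now: datetime) -> list[Commitment]:
        """What a heartbeat tick would surface: unfinished items that are due."""
        return [c for c in self.items if not c.done and c.due <= now]


queue = CommitmentQueue()
monday = datetime(2025, 1, 6, 9, 0)
queue.add("Ask whether the vendor replied", now=monday, in_days=3)

# Heartbeat ticks: nothing is due on Tuesday, one item is due on Thursday.
print(len(queue.due_items(monday + timedelta(days=1))))  # 0
print([c.note for c in queue.due_items(monday + timedelta(days=3))])
```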
skills: Skills system (procedural memory)
The agent loads SKILL.md playbooks for recurring task types — trigger conditions, numbered steps, pitfalls. Required for any agent doing repeatable workflows.
Use: You say 'deploy the API.' Agent matches the 'deploy-api' skill, runs its steps, reports each outcome.
standing_orders: Standing orders
Mutable user-set runtime rules with top priority in system prompt assembly. Different from SOUL.md (voice) — these are hard constraints you add after the agent ships.
Use: "Always check Calendar before suggesting meeting times." "Never send emails after 8pm without confirmation."
Execution (5)
Letting the agent actually do things — spawn sub-agents, run code, orchestrate workflows, isolate risky operations.
subagent_spawning: Sub-agent orchestration
Spawn child agents for parallel or isolated work. Useful for reasoning-heavy subtasks that would flood the parent's context, or for running multiple workstreams concurrently.
Use: You ask for a research brief on 3 competitors. Agent spawns 3 sub-agents in parallel, parent synthesizes.
python_execution: Python code execution
Write and run Python in a subprocess with the agent's installed packages available (pandas, numpy, requests, etc. if wired). 30s default timeout, 200KB output cap. Working directory isolated from the host.
Use: You paste a CSV and ask 'what's the trend in column 3?' Agent writes pandas code, runs it, surfaces the answer.
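To make the timeout and output cap concrete, here is a rough sketch of subprocess-based execution using only the standard library. `run_python` is a hypothetical name, the cap is applied to characters as an approximation of bytes, and a temporary working directory is not a security sandbox on its own.

```python
import subprocess
import sys
import tempfile

TIMEOUT_S = 30           # default timeout quoted above
OUTPUT_CAP = 200 * 1024  # 200KB cap quoted above (counted in characters here)


def run_python(code: str, timeout: float = TIMEOUT_S) -> dict:
    """Run `code` in a child interpreter inside a throwaway working directory."""
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                [sys.executable, "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return {"ok": False, "output": f"timed out after {timeout}s"}
    output = (proc.stdout + proc.stderr)[:OUTPUT_CAP]  # truncate oversized output
    return {"ok": proc.returncode == 0, "output": output}


print(run_python("print(sum(range(10)))"))
```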
shell_execution: Shell command execution
Run shell commands in a sandboxed subprocess. Marked as a dangerous tool requiring explicit per-call approval when require_approval is set.
Use: You say 'show me git log for this repo.' Agent runs git log and surfaces the output.
flows: Multi-step workflows
Named flows that chain tool calls with explicit state, retries, and rollback. Checkpoint-persisted so they survive crashes.
Use: 'Incident-triage' flow: ack alert → fetch logs → look up runbook → page on-call → create post-mortem doc. Resumes from last checkpoint if interrupted.
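Checkpoint-persisted resumption can be illustrated with a small runner that records each finished step to disk and skips recorded steps on the next run. This is a sketch only: `run_flow` and the JSON checkpoint format are assumptions, and rollback is left out for brevity.

```python
import json
import tempfile
from pathlib import Path


def run_flow(steps, checkpoint: Path, max_retries: int = 2) -> list[str]:
    """Run named steps in order, skipping any already recorded in `checkpoint`."""
    done = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for name, action in steps:
        if name in done:
            continue  # finished in an earlier run; resume past it
        for attempt in range(max_retries + 1):
            try:
                action()
                break
            except Exception:
                if attempt == max_retries:
                    raise  # give up; the checkpoint still lists finished steps
        done.append(name)
        checkpoint.write_text(json.dumps(done))  # persist after every step
    return done


log = []


def crash():
    raise RuntimeError("simulated crash")


ckpt = Path(tempfile.mkdtemp()) / "incident-triage.json"
first_run = [("ack_alert", lambda: log.append("ack")), ("fetch_logs", crash)]
try:
    run_flow(first_run, ckpt, max_retries=0)
except RuntimeError:
    pass  # the process "died" after ack_alert was checkpointed

second_run = [
    ("ack_alert", lambda: log.append("ack")),
    ("fetch_logs", lambda: log.append("logs")),
]
print(run_flow(second_run, ckpt))  # ['ack_alert', 'fetch_logs']
print(log)                         # ['ack', 'logs']
```

The second run does not repeat `ack_alert`, which is the property that lets a flow survive a crash without redoing side effects.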
sandbox: Sandboxed code execution
Isolated execution environment for python_execution and shell_execution. Limits filesystem access, network, CPU time, and memory. Per-capability config — sandbox python tightly while leaving shell more permissive, or vice versa.
Use: Agent runs user-submitted Python from chat. Sandbox limits to 30s, no network, read-only /tmp, 256MB RAM.
Communication (7)
Channels the agent can read from and write to — email, chat, SMS, webhooks, the gateway, voice.
email_imap: Email (IMAP read + SMTP send)
Two catalog capabilities (email_imap + email_send) documented as one channel. Read and send via standard IMAP/SMTP. Works with Gmail (app password), Fastmail, ProtonMail Bridge, self-hosted. Agent never sees the raw password.
Use: Agent watches a designated mailbox, forwards order confirmations to accounting, replies to common questions, escalates the rest.
slack_integration: Slack (post + read + DM)
Post messages, read channel history, send DMs. Uses Slack's bot API with a workspace token. The agent becomes a first-class Slack participant.
Use: Agent posts deploy notifications, monitors #help for FAQ-able questions, DMs on-call when alerts spike.
sms_messaging: SMS (Twilio)
Send and receive SMS via Twilio. Good for low-bandwidth proactive notifications and users who prefer text over chat apps.
Use: Agent texts you when a long-running task finishes or when a calendar event is starting.
webhook_receiver: HTTP webhook receiver
Exposes an HTTP endpoint for incoming webhooks. FastAPI + uvicorn. Useful for integrations with services that POST events.
Use: Agent receives GitHub PR webhooks and reviews each one. Or Stripe payment events and updates the accounting log.
gateway: HTTP + WebSocket gateway
Network surface for multi-client access. Exposes the agent via HTTP REST + WebSocket so external clients (scout-tui, browser dashboards, IDE plugins) can connect. Default bind 127.0.0.1:7891. Required if you want to reach the agent from anywhere other than the local REPL.
Use: You want the polished scout-tui experience. Agent starts the gateway on localhost:7891; scout-tui connects. Multiple clients can connect simultaneously.
tts: Text-to-speech (voice output)
Voice output via OpenAI, ElevenLabs, MiniMax, Edge, or local `say` on macOS. Auto-routes based on configured backend. Audio streams to speakers, saves as files, or returns via the gateway.
Use: Heartbeat detects a critical alert. Agent speaks 'alert: production database CPU at 95% for 3 minutes' through speakers, then posts the same text to Slack.
Data (6)
Reading and writing data — local files, code, the live web, APIs, structured storage.
file_ops: Local file read/write
Read and write within a configured directory tree. Strict path-escape rejection — agent can't reach outside its sandbox. Configurable sandbox_root + max_file_size_kb.
Use: Agent maintains a project journal in ~/projects/notes/, summarizes meeting transcripts into structured notes, edits config files on request.
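Path-escape rejection is usually done by resolving the requested path and checking that it still sits under the configured root. The sketch below shows that check with `pathlib`; `resolve_in_sandbox` is an illustrative name and the sandbox path is made up, so treat it as an approximation of what a `sandbox_root` check might do.

```python
from pathlib import Path


def resolve_in_sandbox(sandbox_root: str, requested: str) -> Path:
    """Resolve `requested` under `sandbox_root`, rejecting anything that escapes it."""
    root = Path(sandbox_root).resolve()
    target = (root / requested).resolve()  # collapses ".." and follows symlinks
    if not target.is_relative_to(root):    # Path.is_relative_to needs Python 3.9+
        raise PermissionError(f"path escapes sandbox: {requested}")
    return target


root = "/tmp/agent-sandbox"  # hypothetical sandbox_root
print(resolve_in_sandbox(root, "notes/today.md").name)  # today.md
try:
    resolve_in_sandbox(root, "../../etc/passwd")
except PermissionError as exc:
    print("rejected:", exc)
```

Resolving before comparing matters: a plain string-prefix check would accept `notes/../../etc/passwd`.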
code_modification: Code modification
Read, edit, and patch source files with diff-based changes. Pairs with shell/python execution so the agent can verify its own edits by running tests.
Use: Agent fixes a failing test, applies a lint autofix across the repo, or updates a config constant you describe in plain English.
web_search: Web search
Query the live web. Pluggable provider (Tavily, Brave, Exa, SerpAPI). The agent uses this when it needs current information beyond its training data.
Use: You ask "what was the latest funding round for Acme Corp?" Agent searches, reads top results, summarizes.
web_extract: Web page content extraction
Fetch a URL and extract its main content as clean markdown. httpx + markdownify. Falls back to raw HTML when the extractor can't find a main region.
Use: You share a URL. Agent fetches and summarizes, or extracts a specific data point.
http_request: HTTP requests (generic API client)
Make arbitrary HTTP requests to APIs — GET/POST/PUT/DELETE with headers, auth, and JSON bodies. The escape hatch for any service without a dedicated integration.
Use: Agent polls a status API, posts to an internal service, or pulls data from any REST endpoint you point it at.
sqlite_database: Local SQLite database
Maintain structured records in local SQLite. Schema and queries are agent-driven — you describe what to track and the agent picks the schema. Works air-gapped.
Use: Agent maintains an expense tracker: each receipt you mention gets logged with date, vendor, amount, category.
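As an illustration of the expense-tracker use case, the following sketch uses Python's built-in `sqlite3` module. The table layout and helper names are assumptions; in practice the agent picks its own schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real agent would use a file on disk
conn.execute(
    """CREATE TABLE expenses (
           id INTEGER PRIMARY KEY,
           spent_on TEXT NOT NULL,         -- ISO date
           vendor TEXT NOT NULL,
           amount_cents INTEGER NOT NULL,  -- integers avoid float rounding
           category TEXT NOT NULL
       )"""
)


def log_expense(spent_on: str, vendor: str, amount_cents: int, category: str) -> None:
    """Insert one receipt using bound parameters rather than string formatting."""
    conn.execute(
        "INSERT INTO expenses (spent_on, vendor, amount_cents, category) "
        "VALUES (?, ?, ?, ?)",
        (spent_on, vendor, amount_cents, category),
    )
    conn.commit()


def totals_by_category() -> dict[str, int]:
    """Sum spending per category, in cents."""
    rows = conn.execute(
        "SELECT category, SUM(amount_cents) FROM expenses GROUP BY category"
    )
    return dict(rows.fetchall())


log_expense("2025-01-06", "Acme Cloud", 4200, "infrastructure")
log_expense("2025-01-07", "Cafe Uno", 1350, "meals")
log_expense("2025-01-08", "Acme Cloud", 800, "infrastructure")
print(totals_by_category())
```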
Integration (7)
Plugging the agent into the SaaS tools you already use.
github_integration: GitHub (issues + PRs + repos)
Read and write GitHub via REST + a PAT or GitHub App. Create/update issues, comment on PRs, read repo contents, run actions.
Use: Agent watches a repo's issues, auto-labels them by topic, drafts PR review comments, syncs your TODO list with the tracker.
calendar_integration: Calendar (Google + iCal)
Read events from Google Calendar or an iCal feed. Optionally create events. Useful for scheduling agents and agents that need to be aware of your day.
Use: Agent knows your meeting schedule and proactively asks if you want a prep brief 15 minutes before each meeting.
notion_integration: Notion (read + write pages)
Read and write Notion pages, databases, and properties. Good for agents that maintain knowledge bases or project trackers in Notion.
Use: Agent maintains a CRM-lite Notion database, adding rows when you mention new contacts and updating fields as it learns.
linear_integration: Linear (issue tracking)
Read and write Linear issues, comments, and projects via GraphQL. For engineering-focused agents.
Use: Agent triages new bugs into the right team's queue, drafts initial repro notes, links related issues.
hubspot_integration: HubSpot (CRM)
Read and write HubSpot contacts, companies, and deals via the REST API. For sales and ops agents that live in the CRM.
Use: Agent logs every prospect interaction you mention, updates deal stages, and flags contacts that have gone cold.
acp: ACP (Agent Coordination Protocol)
Lets Claude Code, OpenAI Codex CLI, and other ACP-compatible tools drive your agent as a sub-agent. stdio + JSON-RPC — no HTTP overhead. Useful when you want the agent reachable inside your IDE workflow.
Use: You're in Claude Code and want your agent to handle code review on a side branch. Claude Code connects via ACP, hands off, surfaces findings in the same session.
plugins: Plugin system
Extensible plugin loader with cryptographic signing. External packages add custom tools, hooks, or workflows. Signed plugins verify against trusted keys; unsigned ones require explicit consent.
Use: You install a community-built 'jira-integration' plugin. It signs against a known key, gets auto-trusted, registers new tools.
Observability (2)
Seeing what your agent did, what it cost, and why.
conversation_logging: Conversation logging
Every turn logged to disk as JSON lines. Useful for debugging, training-data collection, and audit trails.
Use: When the agent's behavior surprises you, you review the exact prompt + response in <state_dir>/conversations/.
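JSON lines logging means one JSON object per line, appended as each turn happens. Here is a minimal sketch of writing and reading such a log; the record fields (`ts`, `role`, `content`) and the file name are assumptions, not the runtime's documented format.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def log_turn(log_file: Path, role: str, content: str) -> None:
    """Append one conversation turn as a single JSON line."""
    record = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "role": role,
        "content": content,
    }
    log_file.parent.mkdir(parents=True, exist_ok=True)
    with log_file.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_turns(log_file: Path) -> list[dict]:
    """Load every logged turn back, one dict per line."""
    with log_file.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


# Stand-in for <state_dir>/conversations/; the file name is made up.
log_path = Path(tempfile.mkdtemp()) / "conversations" / "session.jsonl"
log_turn(log_path, "user", "what's on my plate today")
log_turn(log_path, "assistant", "3 threads waiting >24h.")
print([t["role"] for t in read_turns(log_path)])  # ['user', 'assistant']
```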
metric_tracking: Usage metrics (tokens + cost)
Tracks LLM token usage and estimated cost per session. Useful for BYOK spend monitoring and spotting runaway loops early.
Use: You ask "how much did our chat today cost?" Agent reports tokens by model and a dollar estimate.
Custom Integrations
The 32 catalog capabilities cover the most common asks. But Forge is genuinely open: your agent ships with general-purpose tools (web_extract, python_exec, shell_exec, webhook_receiver, sqlite_database) that let it call ANY HTTP API, run ANY Python, execute ANY shell command, and receive ANY callback. Three ways to extend, in increasing effort:
Path 1: Just ask (zero effort)
For one-off API calls, just tell the agent the docs and the env var. The agent uses web_extract or python_exec directly. No code, no config, no PR. Works for any REST API.
You: Call the Stripe API for me. The token is in STRIPE_API_KEY.
Endpoint: https://api.stripe.com/v1/balance
Auth: Bearer header
Agent: [calls web_extract with the constructed request]
Available balance: $42,318.04 USD across 1 currency.
This is the right starting point for almost everything. Test the API, see if the agent gets it right, decide whether to make it permanent.
Path 2: Standing orders (30 seconds, persistent)
For recurring use, document the integration in workspace/STANDING_ORDERS.md. The agent reads STANDING_ORDERS.md at every boot and treats sections as durable instructions, on par with native capabilities. Same UX as Path 1 but doesn't require re-explaining each session.
# In workspace/STANDING_ORDERS.md
## Stripe MRR check
When asked about MRR, revenue, or cash position:
- GET https://api.stripe.com/v1/balance
- Auth: Bearer header with STRIPE_API_KEY
- Parse JSON, sum available[].amount
- Report in USD with 2 decimals
When asked to refund a charge:
- Confirm with the user first (always)
- POST /v1/refunds with charge ID
- Report refund ID + amount
The agent now "has Stripe" without any code change. Add as many integration sections as you want — Standing Orders is the right home for company-specific APIs, internal services, and anything where you'd rather edit a markdown file than write Python.
Path 3: Plugins (10-20 min, first-class)
For complex multi-endpoint integrations, write a Python file in plugins/ that registers tools with the @register_tool decorator. Plugins get full first-class status: appear in TOOLS.md, typed error handling, JSON schema validation, survive re-downloads.
# ~/.<agent>/plugins/stripe_tools.py
import os
import requests
from scout_runtime.tools.decorator import register_tool
from scout_runtime.tools.types import (
    ToolResult, SideEffectLevel, ErrorKind
)
@register_tool(
    name="stripe_get_balance",
    description="Fetch Stripe account balance",
    side_effect_level=SideEffectLevel.NONE,
    schema={"type": "object", "properties": {}},
)
def get_stripe_balance(args, ctx):
    token = os.environ["STRIPE_API_KEY"]
    r = requests.get(
        "https://api.stripe.com/v1/balance",
        headers={"Authorization": f"Bearer {token}"},
        timeout=30,
    )
    if r.status_code != 200:
        return ToolResult.failure(
            f"Stripe API error: {r.status_code}",
            kind=ErrorKind.NETWORK_ERROR,
        )
    return ToolResult.success(str(r.json()))
Plugins are the right answer when (a) the integration has many endpoints you'll call repeatedly, (b) you want typed errors and retry logic, (c) you're sharing the integration with multiple agents, or (d) you want it to feel identical to a native capability.
Which path to pick
- One-off call? Path 1 (just ask).
- Use it weekly? Path 2 (Standing Orders).
- 10+ endpoints, used daily? Path 3 (Plugin).
- Want it shared across multiple agents? Path 3 (Plugin).
Forge's 32 native capabilities are an optimization, not a restriction. Every integration in the world is one conversation away, and if it's worth doing more than once, it's worth 30 seconds in STANDING_ORDERS.md.
What you can change just by asking
The agent isn't a static binary. It can modify its own workspace files when you ask. Most of what you'd configure via a settings page in other products happens in conversation here.
What the agent CAN change autonomously when you ask
| Ask | What happens | Restart needed? |
|---|---|---|
| "Change your name to Atlas" | Agent edits IDENTITY.md | Yes |
| "Be more concise" | Agent edits SOUL.md | Yes |
| "Never email after 9 PM" | Agent edits STANDING_ORDERS.md | Yes |
| "Set heartbeat to every 30 min" | Agent edits HEARTBEAT.md | Yes |
| "Remember that AcmeCorp is strategic" | Agent writes to MEMORY.md | No (immediate) |
| "Schedule a check-in for Friday" | Agent calls add_commitment | No (immediate) |
| "Create a daily 7 AM cron" | Agent calls cron_create | No (immediate) |
| "Add a refund-handling skill" | Agent calls skill_manage | No (immediate) |
| "Add Stripe API integration" | Agent edits STANDING_ORDERS.md (per PR #109) | Yes |
What requires manual action outside the chat
- Add a NEW catalog capability not currently wired (e.g., enable Notion if you skipped it at design time) → re-design at myagentos.ai/create OR re-download with the capability added.
- Switch LLM provider (Anthropic ↔ OpenAI ↔ Gemini) → edit .env, restart.
- Switch model (Opus ↔ Sonnet ↔ Haiku) → edit LLM_MODEL in .env, restart.
- Change gateway port → edit GATEWAY_PORT in .env, restart.
- Install a plugin → drop Python file in plugins/, restart.
What the agent will NEVER do
- Edit AGENTS.md: that file gets regenerated from spec by the platform.
- Edit any protected workspace file without you explicitly asking.
- Pretend a change took effect when it didn't. If a SOUL.md edit needs restart, the agent says so — it doesn't pretend to be more concise in the same session.
- Silently rewrite USER.md. Memory updates are surfaced.
The shape: you have a real-time conversation with the agent about what it should be. The agent makes the edits, surfaces the diffs, and tells you when a restart is needed. No settings page, no JSON editing, no PR to Forge.
Workspace Files
Every generated agent ships with a workspace/ directory holding five markdown files. They drive how the agent thinks, sounds, and behaves. You can read and edit any of them directly — they're yours.
USER.md
Stores: factual user profile (the Wave 1 + Wave 2 answers). Edited by: the agent updates it as it learns; you can edit directly anytime. Read: every turn, injected into the system prompt.
# USER.md
- Name: Dan
- Role: Runs an early-stage program. Serial founder; values speed and signal over polish.
- Communication style: terse
- Decision style: intuition-led, justifies after
- Pet peeves: "I'd be happy to help!", fake humility, em-dash overuse
- Expertise areas: GTM, startup ops, AI infra
- Expertise gaps: low-level Rust, GPU kernels
- Timezone: America/New_York
# Domain
- Primary objective: end-of-day inbox triage with draft replies
- Current workflow: scroll Gmail at 5pm, miss things, draft poorly under fatigue
- Pain points: response latency on key threads, dropped commitments
- Ideal intervention point: 5pm summary + drafts ready for review
SOUL.md
Stores: the agent's voice, tone, and values (the Wave 4 answers). Edited by: set at design time; you edit if voice drifts. Read: every turn.
# SOUL.md
## Voice
Terse. Lead with the answer. No filler openings ("Great question!"),
no apologies for things that aren't the agent's fault.
## Role models
- Patrick Collison on Twitter — short, exact, no posturing
- A senior engineer who's seen this before and isn't impressed
## Anti-patterns
- "I'd be happy to help!"
- Excessive disclaimers
- LinkedIn-speak
- Em-dash overuse
## Humor
Dry. Occasional. Never forced.
## Default response length
1-3 sentences for casual questions. Long form only when explicitly asked.
IDENTITY.md
Stores: agent name, archetype, catchphrases, emoji. Edited by: set at generation; rarely changes. Read: boot only.
# IDENTITY.md
- Name: Flint
- Archetype: laconic operator
- Catchphrase: "Acknowledged."
- Emoji: 🪨
- One-line self-description: end-of-day inbox triage, drafts ready by 5pm
AGENTS.md
Stores: operational manual — how the agent behaves, which channels it reaches you on, hard ops rules (the Wave 3 answers). Edited by: set at generation; edit when ops change. Read: every turn.
# AGENTS.md
## Data sources
- Gmail (IMAP)
- Calendar (Google)
- Linear (read-only)
## Output destinations
- Draft emails to Gmail drafts folder
- 5pm summary to Slack DM
- Critical escalations via SMS
## Availability
Always-on background. Heartbeat every 30 minutes during business hours.
## Data sensitivity
Customer email contents — never log to conversation_logging, never include in summaries to third parties.
## Token budget
$5/day cap. Stop and alert if exceeded.
STANDING_ORDERS.md
Stores: hard rules, auto-actions, escalation triggers (the Wave 5 answers). Only emitted if you captured these in Phase 1.5. Edited by: you, anytime. Top priority in system prompt. Read: every turn.
# STANDING_ORDERS.md
## Never without confirmation
- Send any outbound message (email, SMS, Slack post)
- Modify a calendar event
- Spend more than $1 of LLM tokens on a single task
## Authorized without asking
- Read inbox, calendar, Linear
- Draft emails (save to drafts only)
- Log activity to local SQLite
## Escalate immediately via SMS
- Any error in send path
- Token spend within 20% of daily cap
- Heartbeat missed for >2 hours
Running Your Agent
One package, five ways to interact: REPL, built-in --tui mode, gateway server, scout-tui client, or ACP integration into another CLI.
REPL (default)
The default mode. python -m <agent> opens an interactive prompt. Type a message, agent replies in its configured voice. Ctrl-D to exit. Type /help to see slash commands.
$ python -m flint
flint v0.1 — laconic operator. Acknowledged.
Workspace loaded: USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md
Capabilities: 30 wired
Provider: anthropic / claude-opus-4-8
> what's on my plate today
3 threads waiting >24h. 1 calendar conflict at 14:00. Drafts ready in Gmail.
> draft a reply to the vendor email
Drafted. Saved to Gmail drafts. 4 lines, declines politely, asks for revised quote by Friday.
>
Model selection
Three ways to pick which model your agent runs. They compose — design-time choice flows into runtime, and runtime overrides whatever was baked in.
1. At design time (myagentos.ai)
When you enter your API key, the modal shows a model picker for the detected provider. Pick once, save with Remember, and that choice powers Phase 1.5 and is also written into your generated agent's .env.example as a pre-filled LLM_MODEL= line.
claude-opus-4-8 for Anthropic, gpt-5 for OpenAI, gemini-2-5-pro for Gemini. The picker auto-detects your provider from the key prefix.
2. Before boot (.env)
Set LLM_MODEL=<model-id> in your agent's .env file. Generated agents ship with this line pre-filled if you picked a model at design time; you can edit it any time. The runner reads it before instantiating the provider.
.env
# LLM Provider
ANTHROPIC_API_KEY=sk-ant-...
LLM_MODEL=claude-opus-4-8
GATEWAY_AUTH_TOKEN=flint-dev-token-123
...
3. At runtime (REPL slash command)
Swap models mid-conversation. /model shows the current model + everything your provider supports. /model <name> switches. The next turn uses the new model; conversation history is preserved.
> /model
Provider: anthropic
Current model: claude-opus-4-8
Available:
* claude-opus-4-8
claude-opus-4-7
claude-sonnet-4-5
claude-sonnet-4-20250514
claude-haiku-4-5
Switch with: /model <name>
> /model claude-haiku-4-5
Model switched to: claude-haiku-4-5
> quick — what's blocking the launch?
Three things: vendor contract (signed yesterday), legal review (waiting on
Sarah), and staging deploy (in CI, 12 minutes left).
Useful patterns: drop to Haiku for cheap sub-agent dispatch, step up to Opus for complex synthesis turns, A/B test providers without restarting your agent.
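The way the three selection points compose can be summarized as a precedence rule: a runtime `/model` switch wins, then `LLM_MODEL` from `.env`, then whatever was chosen at design time. The function below is a sketch of that rule only; `resolve_model` is a hypothetical name, not part of the runtime.

```python
from typing import Mapping, Optional


def resolve_model(
    design_default: str,
    env: Mapping[str, str],
    runtime_override: Optional[str] = None,
) -> str:
    """Pick the active model: /model override, then LLM_MODEL, then design-time choice."""
    if runtime_override:
        return runtime_override
    return env.get("LLM_MODEL") or design_default


baked_in = "claude-opus-4-8"
print(resolve_model(baked_in, {}))                                   # claude-opus-4-8
print(resolve_model(baked_in, {"LLM_MODEL": "claude-sonnet-4-5"}))   # claude-sonnet-4-5
print(resolve_model(baked_in, {"LLM_MODEL": "claude-sonnet-4-5"},
                    runtime_override="claude-haiku-4-5"))            # claude-haiku-4-5
```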
Gateway server
For multi-client access — scout-tui, browser dashboards, IDE plugins. --gateway starts a uvicorn server on 127.0.0.1:7891 by default with HTTP routes for /v1/health, /v1/hello, /v1/rpc and a WebSocket at /v1/ws for chat.
$ python -m flint --gateway
[flint] loading workspace…
[flint] wiring capabilities (30)…
[flint] gateway listening on http://127.0.0.1:7891
[flint] WARNING: no GATEWAY_AUTH_TOKEN set; gateway accepts unauthenticated connections.
[flint] press Ctrl-C to stop
Set GATEWAY_AUTH_TOKEN in .env to require token auth. Override the bind with GATEWAY_HOST and GATEWAY_PORT. Set GATEWAY_HOST=0.0.0.0 only behind a real auth token; never expose the gateway publicly without one.
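The bind and auth guidance can be expressed as a small pre-flight check. This sketch reads the three documented variables and applies the documented defaults; the function name `gateway_settings` and the choice to refuse outright (the gateway itself only logs a warning) are assumptions made for illustration.

```python
from typing import Mapping

LOOPBACK_HOSTS = ("127.0.0.1", "localhost", "::1")


def gateway_settings(env: Mapping[str, str]) -> dict:
    """Derive gateway URLs from env vars, refusing a public bind with no token."""
    host = env.get("GATEWAY_HOST", "127.0.0.1")
    port = int(env.get("GATEWAY_PORT", "7891"))
    token = env.get("GATEWAY_AUTH_TOKEN", "")
    if host not in LOOPBACK_HOSTS and not token:
        # Stricter than the runtime's warning: treat this as a hard error.
        raise ValueError("refusing non-loopback bind without GATEWAY_AUTH_TOKEN")
    return {
        "http": f"http://{host}:{port}",
        "ws": f"ws://{host}:{port}/v1/ws",
        "auth_required": bool(token),
    }


print(gateway_settings({}))
try:
    gateway_settings({"GATEWAY_HOST": "0.0.0.0"})
except ValueError as exc:
    print("blocked:", exc)
```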
--tui mode (built-in chat UI)
The --tui flag launches a polished chat interface built into the agent itself — no separate process, no WebSocket, no extra install. Powered by the rich library (cross-platform, works on macOS, Linux, and Windows).
# Single command, drops you straight into chat
python -m flint --tui
# Type messages, press Enter, agent replies in styled panels
# Exit with Ctrl+C or type /quit
Best for solo use on a single machine. When you want multiple clients connecting (you + a colleague screen-sharing, or driving from your phone), use --gateway + the external scout-tui client (see below).
Windows TUI quickstart (PowerShell)
If you're on Windows and just want a working TUI as fast as possible, here's the sequence. Use Windows Terminal (search "Terminal" in Start menu) rather than legacy cmd.exe — it has proper ANSI color rendering.
Windows: from agent download to TUI
# 1. Navigate to where you unzipped the agent
cd C:\Users\YourName\Downloads\flint
# 2. Create a venv (one-time)
py -m venv .venv
# 3. Activate it (one-time per terminal session)
.\.venv\Scripts\Activate.ps1
# 3a. If you get an execution-policy error, run this once and retry step 3:
Set-ExecutionPolicy -Scope CurrentUser -ExecutionPolicy RemoteSigned
# 4. Install (one-time)
pip install -e .
# 5. Copy .env.example to .env, then add your ANTHROPIC_API_KEY
copy .env.example .env
notepad .env
# 6. Launch the TUI
py -m flint --tui
Subsequent launches only need steps 3 + 6 (activate venv + run). If you want to skip the activate step, you can call the venv Python directly: .venv\Scripts\python.exe -m flint --tui.
- Garbled ←[31m-style characters? You're in legacy cmd.exe. Switch to Windows Terminal.
- "No time zone found with key UTC"? Run pip install tzdata once (auto-included for agents generated after 2026-06-02).
- Layout looks cut off? Resize your terminal to at least 80 columns wide.
- "ModuleNotFoundError: rich"? Run pip install rich. Should not happen on fresh installs after PR #92.
- "Activate.ps1 is blocked"? Run Set-ExecutionPolicy -Scope CurrentUser RemoteSigned once. This is a one-time per-user PowerShell setting.
macOS / Linux TUI quickstart
macOS / Linux: from download to TUI
cd ~/Downloads/flint
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
$EDITOR .env # add ANTHROPIC_API_KEY
python -m flint --tui
scout-tui client
A separate Ink/React terminal client that connects to a running agent over the gateway WebSocket. Polished banner, connection status, command history.
# In a terminal where the agent gateway is reachable
export SCOUT_GATEWAY_URL=ws://127.0.0.1:7891/v1/ws
export SCOUT_GATEWAY_TOKEN=<same token as agent's GATEWAY_AUTH_TOKEN>
# Run the TUI
node dist/entry.js
# (or: npm install -g scout-tui, then: scout-tui)
Multiple TUI clients can connect to the same gateway, which is useful for screen-sharing the agent during a call.
ACP integration
Agent Coordination Protocol lets other CLIs — Claude Code, Codex, OpenCode — drive your agent as a sub-agent via stdio + JSON-RPC. Run the agent in ACP mode and configure the calling CLI to spawn it.
# Run the agent as an ACP server over stdio
python -m flint --acp --stdio
# In Claude Code (~/.claude.json), register flint as an ACP agent
# (advanced; see the ACP capability docs in your agent's workspace/)
Deployment
Five ways to run your agent. All start from the same downloaded zip.
Local Python
Python 3.10+. Simplest option.
Container
Rocky Linux 9. ~180MB. Podman or Docker.
Sovereign
Bundled LLM. No internet needed.
Local Python
Fastest path to running your agent. Requires Python 3.10+.
unzip agent.zip && cd agent
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
cp .env.example .env
# Add your API keys
# REPL
python -m <agent-name>
# Gateway (for scout-tui)
python -m <agent-name> --gateway
Rocky Linux Container
Requires Podman (recommended) or Docker. The container ships with Python 3.11 and all wired-capability deps pre-installed.
unzip agent.zip && cd agent
cp .env.example .env
# Add your API keys
# Build the Rocky Linux container (~180MB)
./container/build.sh
# Run with the REPL attached
./container/run.sh --env ./.env
# Run as a gateway daemon
./container/run.sh --env ./.env --gateway --detach
Fully Sovereign
Bundles Ollama and a local LLM into the container image. Once built, no internet is needed. No API keys for the LLM. No data leaves your machine.
unzip agent.zip && cd agent
# Build with bundled model (~2-6GB image)
./container/build.sh --sovereign --model llama3.2:3b
# Run — fully offline from this point
./container/run.sh --env ./.env
| Model | Size | RAM | Speed (CPU) | Best for |
|---|---|---|---|---|
| llama3.2:3b | ~2GB | 4GB+ | ~10 tok/s | Most agents, fast, lightweight |
| phi3:mini | ~2.3GB | 4GB+ | ~8 tok/s | Strong reasoning |
| mistral:7b | ~4GB | 8GB+ | ~5 tok/s | Best quality on CPU |
| llama3.1:8b | ~4.7GB | 8GB+ | ~4 tok/s | Newest Llama |
| gemma2:9b | ~5.4GB | 12GB+ | ~3 tok/s | Google's best small model |
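The table can also be read as a simple lookup: given the RAM on the target machine, which listed models fit? The sketch below copies the size and RAM figures from the table and picks the largest model that fits. The helper name is hypothetical, and "largest" is only a rough proxy for quality.

```python
from typing import Optional

# (model tag, approximate size in GB, minimum RAM in GB), copied from the table above.
MODELS = [
    ("llama3.2:3b", 2.0, 4),
    ("phi3:mini", 2.3, 4),
    ("mistral:7b", 4.0, 8),
    ("llama3.1:8b", 4.7, 8),
    ("gemma2:9b", 5.4, 12),
]


def largest_model_for(ram_gb: float) -> Optional[str]:
    """Return the largest listed model whose minimum RAM fits, or None if none fit."""
    candidates = [m for m in MODELS if m[2] <= ram_gb]
    if not candidates:
        return None
    # Pick by size on disk; ties cannot occur with the figures above.
    return max(candidates, key=lambda m: m[1])[0]
```

The returned tag is what you would pass to ./container/build.sh --sovereign --model.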
Cloud Deployment
Deploy to any cloud provider. You don't need your own hardware.
CPU VPS ($5-10/month)
Best for most agents. A 3B sovereign model runs fine on CPU. Works on Hetzner, DigitalOcean, Vultr, Linode.
# Build locally
./container/build.sh --sovereign --model llama3.2:3b
# Save and copy to VPS
podman save my-agent:sovereign -o my-agent.tar
scp my-agent.tar .env user@your-vps:~/
# On the VPS
ssh user@your-vps
podman load -i my-agent.tar
podman run -d --name my-agent --restart=always \
-v ~/.env:/app/.env:ro \
-v ~/data:/app/data \
-p 127.0.0.1:7891:7891 \
my-agent:sovereign --gateway
GPU Cloud
For 7B+ models or low-latency needs. Lambda Labs ($0.80/hr), RunPod ($0.39/hr), Vast.ai ($0.15/hr).
podman push my-agent:sovereign ghcr.io/yourname/my-agent:sovereign
# On GPU instance
podman pull ghcr.io/yourname/my-agent:sovereign
podman run -d --name my-agent \
--device nvidia.com/gpu=all \
-v ./.env:/app/.env:ro \
ghcr.io/yourname/my-agent:sovereign --gateway
Air-Gapped
For classified or disconnected environments. Build on an internet-connected machine; transport via secure media.
# On internet-connected machine
./container/build.sh --sovereign --model llama3.2:3b
podman save my-agent:sovereign -o my-agent.tar
# Copy my-agent.tar + .env to USB
# On air-gapped machine
podman load -i my-agent.tar
podman run -d --name my-agent \
-v ./.env:/app/.env:ro \
my-agent:sovereign
Configuration
.env file
The generated .env.example is grouped by category: provider keys, per-capability credentials, gateway settings, sandbox config, observability. Copy it to .env and fill in what your agent needs.
# ─── Provider ───────────────────────────────────────────
ANTHROPIC_API_KEY=sk-ant-...
# OPENAI_API_KEY=sk-...
# GEMINI_API_KEY=AI...
# ─── Gateway (for --gateway mode) ──────────────────────
GATEWAY_HOST=127.0.0.1
GATEWAY_PORT=7891
GATEWAY_AUTH_TOKEN=change-me-to-something-long
# ─── Per-capability credentials ─────────────────────────
# Slack
SLACK_BOT_TOKEN=xoxb-...
# Email
IMAP_HOST=imap.gmail.com
IMAP_USER=you@gmail.com
IMAP_PASSWORD=<app-password>
# GitHub
GITHUB_TOKEN=ghp_...
# Notion
NOTION_API_KEY=secret_...
# Linear
LINEAR_API_KEY=lin_api_...
# Calendar
GOOGLE_CALENDAR_CREDENTIALS=./credentials.json
# Twilio
TWILIO_ACCOUNT_SID=AC...
TWILIO_AUTH_TOKEN=...
TWILIO_FROM_NUMBER=+1...
# ─── Web search (optional) ──────────────────────────────
# TAVILY_API_KEY=tvly-...
# BRAVE_SEARCH_API_KEY=...
# ─── Observability ──────────────────────────────────────
# LOG_LEVEL=INFO
Comments in .env.example link to where each provider issues keys. The generator pulls these from each capability's env_docs_urls field.
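Before the first launch it can help to check which values in .env are still template placeholders. The following is a minimal sketch, not the runtime's own loader: it handles plain KEY=VALUE lines only (no inline comments, no multi-line values), and the placeholder heuristics (a trailing "...", a leading "<", or a "change-me" prefix) are assumptions based on the example above.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def unfilled_keys(values: dict[str, str]) -> list[str]:
    """Return the keys whose values still look like template placeholders."""

    def is_placeholder(value: str) -> bool:
        return (
            value == ""
            or value.endswith("...")
            or value.startswith("<")
            or value.startswith("change-me")
        )

    return sorted(key for key, value in values.items() if is_placeholder(value))
```

Calling `unfilled_keys(parse_env(open(".env").read()))` lists what is left to fill in; commented-out keys are ignored.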
Sovereign Models
In sovereign mode you bundle a model into the container image. The model runs locally via Ollama. No runtime LLM key needed.
# Common choices
./container/build.sh --sovereign --model llama3.2:3b # Fast, lightweight
./container/build.sh --sovereign --model phi3:mini # Strong reasoning
./container/build.sh --sovereign --model mistral:7b # Best quality on CPU
./container/build.sh --sovereign --model llama3.1:8b # Newest Llama
--model.
Architecture
Three components, one direction: web designs → Python generates → your runtime runs.
Web
Architect chat → scout-config
Generator
Workspace + zip
Runtime
Your agent runs
1. Web (myagentos.ai)
The architect chat is a Next.js app backed by an LLM (the user-provided key). It runs the Phase 1.5 interview, makes capability decisions, then emits a scout-config JSON block. spec-bridge.ts parses the block out of the chat stream and posts it to /api/generate-agent.
2. Generator (Python)
scout_architect validates the spec. The F30 capability-decision gate rejects unaccounted-for catalog capabilities. The F31 context gate rejects specs missing the 5 required Phase 1.5 fields. scout_generator builds the workspace (USER.md, SOUL.md, IDENTITY.md, AGENTS.md, STANDING_ORDERS.md), renders the runner template, vendors scout_runtime into _vendored/, and zips it.
3. Runtime (in your agent)
Every agent ships with scout_runtime vendored — no external runtime dependency. On boot it loads the workspace, wires the capabilities the spec requested, and starts either the REPL or the gateway depending on the CLI flag. You own everything in the zip.
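The "REPL or gateway depending on the CLI flag" step can be pictured as a small argument parser. This is an illustrative sketch, not the vendored scout_runtime code: the flags (--gateway, --tui, --acp, --stdio) are the ones used elsewhere in this documentation, but treating the modes as mutually exclusive and the function names are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare the flags this documentation mentions; the grouping is an assumption."""
    parser = argparse.ArgumentParser(prog="flint")
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--gateway", action="store_true", help="serve the gateway")
    mode.add_argument("--tui", action="store_true", help="launch the built-in TUI")
    mode.add_argument("--acp", action="store_true", help="run as an ACP server")
    parser.add_argument("--stdio", action="store_true", help="ACP transport over stdio")
    return parser


def select_mode(argv: list[str]) -> str:
    """Map parsed flags to a mode name; no mode flag means the REPL."""
    args = build_parser().parse_args(argv)
    for name in ("gateway", "tui", "acp"):
        if getattr(args, name):
            return name
    return "repl"
```

For example, `select_mode(["--gateway"])` selects the gateway, while an empty argument list falls through to the REPL.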
Security Posture
A Forge agent is a generated Python program running on your machine, against your data, with your LLM keys. The trust model is: you trust your LLM provider's policies, you trust Forge's sandbox, and you trust the data sources you wire it to. Forge is not a managed cloud service — there is no Forge-side process that ever sees your tokens, your files, or your conversations. What follows is an honest accounting of what the sandbox protects, what it doesn't, and where you have to wire intentionally.
What we audit + what's verified
- File sandbox: adversarial-tested against /etc/passwd, ~/.ssh/id_rsa, symlink escapes, relative path traversal. 10/10 inputs rejected in PR #113 regression tests.
- Active secret scrubbing on every tool output and memory write (Anthropic, OpenAI, AWS, GitHub, GitLab, Slack, Google token patterns).
- hmac.compare_digest for gateway token comparison (PR #113).
- WebSocket auth via Sec-WebSocket-Protocol header (PR #113).
- Auth failures logged with sha256-prefix token_id, never the raw token.
- SQL injection prevented via parameterized queries; ATTACH DATABASE and dot-commands blocked.
- Dependencies pinned post-CVE (pyyaml ≥6.0, requests ≥2.31, cryptography ≥42, starlette ≥0.36).
- No hardcoded credentials in generated code (full audit, zero hits).
- code_modification edits auto-backup, auto-syntax-validate, and auto-rollback on Python error.
- Install-dir detection bounded to 6 walk-up levels, so it can't escape to filesystem root.
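Two of the items above, constant-time token comparison and logging a hashed token_id instead of the raw token, fit in a few lines of standard-library Python. This is a sketch of the general technique rather than the gateway's actual code; the function names and the 8-character prefix length are assumptions.

```python
import hashlib
import hmac


def token_id(token: str, length: int = 8) -> str:
    """Return a short sha256 prefix that identifies a token without revealing it."""
    return hashlib.sha256(token.encode("utf-8")).hexdigest()[:length]


def check_token(presented: str, expected: str) -> bool:
    """Compare tokens with hmac.compare_digest to avoid timing side channels."""
    return hmac.compare_digest(presented.encode("utf-8"), expected.encode("utf-8"))


def auth_log_line(presented: str, expected: str) -> str:
    """Build a log line for an auth attempt that never contains the raw token."""
    outcome = "ok" if check_token(presented, expected) else "denied"
    return f"gateway auth {outcome} token_id={token_id(presented)}"
```

A plain `==` comparison can return early at the first differing character, which is what `hmac.compare_digest` is designed to avoid.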
Sandbox boundaries (the file_ops trust model)
file_ops enforces an allowlist on every write. Anything outside it is rejected before the syscall.
| Allowed write roots | Forbidden paths |
|---|---|
| workspace/ | /etc/ |
| ~/forge/ | /usr/ |
| install dir (bounded 6 levels) | /System/ |
| FILE_OPS_ALLOWED_ROOTS (env) | /var/ |
| | ~/.ssh/, ~/.bashrc, ~/.zshrc |
| | anything outside the allowlist |
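An allowlist check of this kind usually resolves the target path first and only then compares it against the allowed roots, so that ".." segments and symlinks are judged by where they really point. The sketch below shows that technique under stated assumptions; it is not Forge's file_ops implementation, and the function name is hypothetical.

```python
from pathlib import Path


def is_write_allowed(target: str, allowed_roots: list[str]) -> bool:
    """Return True only if the resolved target is an allowed root or lies inside one."""
    # resolve() normalizes ".." segments and follows symlinks, even for
    # paths that do not exist yet (strict=False is the default).
    resolved = Path(target).expanduser().resolve()
    for root in allowed_roots:
        root_resolved = Path(root).expanduser().resolve()
        if resolved == root_resolved or root_resolved in resolved.parents:
            return True
    return False
```

With a single allowed root, a path such as `<root>/../etc/passwd` resolves outside the root and is rejected, while `<root>/notes/today.md` is accepted.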
Capabilities with inherent power (be honest)
Three capabilities give the LLM real reach. Wire them only when you mean to.
- python_exec: runs in a subprocess, but with full filesystem and network access. That's process isolation, not security isolation. Wire intentionally.
- shell_exec: intentionally unrestricted by design. Wiring shell_execution is giving the LLM full shell access on the host. The blocklist is a backstop only. Wire intentionally.
- code_modification: the agent can edit its own code. Backed by the install-dir sandbox and auto-rollback on syntax error, but it is still self-modifying software. Wire intentionally.
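The backup, validate, and roll-back cycle described for code_modification can be sketched as follows. This illustrates the pattern only: the function name, the ".bak" suffix, and the use of ast.parse as the syntax check are assumptions, not the capability's actual implementation.

```python
import ast
import shutil
from pathlib import Path


def safe_edit(path: Path, new_source: str) -> bool:
    """Write new_source to path; restore the previous content if it is not valid Python."""
    backup = path.with_suffix(path.suffix + ".bak")
    shutil.copy2(path, backup)                      # 1. back up the current file
    path.write_text(new_source, encoding="utf-8")   # 2. apply the edit
    try:
        ast.parse(new_source, filename=str(path))   # 3. syntax-validate the result
    except SyntaxError:
        shutil.copy2(backup, path)                  # 4. roll back on failure
        return False
    return True
```

A syntax check catches only malformed code. An edit that parses cleanly but changes behavior passes through, which is why the capability is still listed here as one to wire intentionally.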
Known limitations
Prompt injection persistence vector. If the LLM is jailbroken once — via a web_extract page, an email body, a Slack message — it can call update_soul or update_user and permanently change its own identity. Mitigation: every change is backed up (workspace freshness check maintains a backup chain), so you can roll back. There is no automatic detection today.
python_exec network gating. Older docs claimed PYTHON_EXEC_ALLOW_NETWORK gates network access. The code doesn't actually enforce it (fix pending). Wire python_exec only on networks you trust.
Gateway auth via query string is still accepted for backward compatibility. The preferred path is now the Sec-WebSocket-Protocol header.
Threat model (what we do and do not protect against)
| In scope (we defend) | Out of scope (you defend) |
|---|---|
| File system escape via malicious paths | A jailbroken LLM damaging things inside its sandbox (e.g. deleting workspace files) |
| LLM trying to read arbitrary files | Supply-chain attacks on PyPI packages you explicitly install |
| Accidental secret leakage in logs / memory | Anyone with physical or SSH access to your machine |
| Dependency CVEs in baseline deps | Anyone with your GATEWAY_AUTH_TOKEN (they have full agent access by design) |
| Supply-chain via auto-install of unknown plugins | |
| SQL injection in sqlite_database | |
| Basic gateway auth abuse | |
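For the "SQL injection in sqlite_database" row, the two defenses named earlier (parameterized queries and blocking ATTACH DATABASE) can be demonstrated with Python's built-in sqlite3 module. This is a sketch of one way to do it, using an authorizer callback; how the capability actually blocks ATTACH is not stated here, and dot-commands belong to the sqlite3 command-line shell, so they are outside what this example covers. The function name is hypothetical.

```python
import sqlite3


def open_guarded(path: str = ":memory:") -> sqlite3.Connection:
    """Open a connection whose authorizer denies ATTACH and DETACH statements."""
    conn = sqlite3.connect(path)

    def authorizer(action, arg1, arg2, db_name, source):
        if action in (sqlite3.SQLITE_ATTACH, sqlite3.SQLITE_DETACH):
            return sqlite3.SQLITE_DENY
        return sqlite3.SQLITE_OK

    conn.set_authorizer(authorizer)
    return conn


conn = open_guarded()
conn.execute("CREATE TABLE notes (id INTEGER PRIMARY KEY, body TEXT)")

# The ? placeholder passes the value as data, so hostile text is stored, not executed.
hostile = "x'); DROP TABLE notes; --"
conn.execute("INSERT INTO notes (body) VALUES (?)", (hostile,))
```

With this authorizer in place, `conn.execute("ATTACH DATABASE ...")` raises `sqlite3.DatabaseError` instead of opening a second database file.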
Operational guidance
- Run agents in dedicated user accounts when handling sensitive data.
- Use FILE_OPS_ALLOWED_ROOTS to widen the sandbox only when needed, never to disable it.
- Don't wire python_exec or shell_exec on agents that touch untrusted data sources (inbound email, scraped web pages, public Slack channels).
- Rotate GATEWAY_AUTH_TOKEN periodically.
- Review SOUL.md and STANDING_ORDERS.md after letting an agent run for a week; drift detection is manual today.
Troubleshooting
"AnthropicProvider requires an api_key"
Set ANTHROPIC_API_KEY in .env. Or use --provider with the env var name to point at a different key (F45).
"task executor not available"
Your gateway is missing the CLI handler. Re-download a fresh agent post-PR #73 (F47). The fix wires TaskExecutor + the CLI handler into the gateway during boot.
WebSocket 404 on /v1/ws
pip install 'uvicorn[standard]'
You're missing the WebSocket extras (F46). The standard uvicorn install ships without them.
"no gateway token found" / 403 from scout-tui
Set GATEWAY_AUTH_TOKEN in the agent's .env and SCOUT_GATEWAY_TOKEN in the env where you run scout-tui. They must match.
"FINALIZE BLOCKED" in architect chat
Missing Phase 1.5 fields or capability decisions. The architect will list exactly which (F30 = capability decisions, F31 = context fields). Answer the missing questions or explicitly omit the missing capabilities and retry.
ModuleNotFoundError on agent boot
Either pip install -e . wasn't run inside the venv, or a stray NODE_ENV=production in your environment poisoned an unrelated npm step (relevant only for the web layer; agents are pure Python).
Container exits immediately
podman logs <agent-name>
Usually a missing .env, an invalid API key, or an import error from a capability whose extras weren't installed. The logs will name it.
Sovereign: Ollama won't start
The bundled model needs to fit in RAM. A 3B model needs 4GB+, a 7B needs 8GB+. Try a smaller model:
./container/build.sh --sovereign --model llama3.2:3b
Vercel deploy timing
The web layer (myagentos.ai/create) deploys in ~60-90s from merge to live. If you just merged a PR and don't see the change yet, give it the full window.