for engineers

Forge for engineers.

Production-grade agents in 5 minutes, not 2 months.

Autonomous by default. Conversational by nature.

The autonomy model

Forge agents are autonomous by default for routine work: reads, drafts, internal writes, scheduling, computations, web search, file ops, self-notes. They execute without asking. For 10 designated high-risk categories, the agent conversationally confirms with you before executing. The inverse of approval-queue platforms: instead of asking permission for everything, the agent asks permission for nothing routine but checks in before doing anything irreversible.

Categories that trigger conversational confirmation

Category	Example	Why high-risk
email_external	Send email to a customer	Once sent, can't unsend
slack_channel_post	Post to #deals	Team-wide visibility
sms_external	SMS a customer	Intrusive, immediate
hubspot_customer_write	Update deal stage	Affects team reporting
github_pr_create	Open a PR	Notifies reviewers, runs CI
github_push	Push to origin	Triggers CI, deploys
code_modification_self	Edit own runtime	Can break next boot
spend_over_threshold	Sub-agent at $15	Token spend control
strategic_account_action	Touch top-10 account	Critical relationships
quiet_hours_external	Email at 2am	Professionalism

What conversational confirmation looks like

Inline in the TUI. Slack DM and SMS confirmation are available when configured (Phase 2).

agent › About to send email
        to: sarah@acmecorp.com
        subject: Re: pricing follow-up
        category: email_external

        ─── draft ───
        Hi Sarah, following up on Tuesday's call.
        Attached the term sheet you asked for. Let me know
        if Friday 2pm works for the next step.
        ─────────────

        [a]pprove  [e]dit  [s]kip  [t]rust this category
you › a
  sent. logged to ~/forge/audit/confirmations.jsonl

Standing orders override this. Trust the agent with a category? Add it to STANDING_ORDERS.md. Want extra gates? Add them too. The 10 categories are defaults, not a cage.

The 30-second pitch

›30 first-class capabilities, all wired by default. No glue code, no per-tool registration boilerplate.
›96 tools per agent (43 built-in + 53 capability sub-tools), uniformly typed and discoverable.
›325 passing tests, CI-enforced tool/runtime alignment. test_capability_docs_runtime_audit.py fails the build if a capability declares tools that don't register.
›3 deployment modes: Solo Local (--tui), Headless Gateway (--gateway), Bare REPL.
›Self-modifying agents: code_create / code_edit / code_rollback with install-dir sandbox and Python-syntax auto-rollback.
›Cross-platform: macOS, Linux, Windows verified end-to-end (Dan ran macOS, tdemm ran Windows, same agent).

What you get vs what you write

Concrete comparison. Both columns are the same agent surface.

Concern	Building your own	Forge
Tool registration	LangChain: 100+ lines of glue per tool	0 lines. Catalog declares it, generator wires it.
Auth handling	You maintain retry, error, token refresh per service	Standardized across all 32 integrations
Memory	You wire SQLite + embeddings + decay yourself	A-MEM ships in the box (fastembed + graph + decay)
Voice / identity	Ad-hoc prompt engineering, drifts over time	SOUL.md / IDENTITY.md / USER.md prepended to every call
Scheduling	You wire cron + tzdata + retry semantics	scheduled_commitments + cron_create + heartbeat loop
Cross-platform	You debug Windows tzdata, terminal escapes, path sep	Handled (PR #108 ships tzdata; Rich TUI portable)
Security	You grep for /etc/passwd-style bugs at 2am	Sandbox tested 10/10 adversarial (PR #113)

Three lines of code that should sell it

From scout-config to running TUI. No build step.

# 1. Generate the agent from your scout-config
curl -X POST https://forge.example/api/generate-agent \
  -d @scout-config.json -o agent.zip

# 2. Unzip
unzip agent.zip && cd agent

# 3. Run
python -m my_agent --tui

Architecture in 90 seconds

Three components. Each one does one thing.

Architect

Phase 1.5 context interview. 5 waves, 27+ fields. Captures voice, role, hard rules, escalation triggers into a scout-config.

TypeScript + Anthropic

Generator

Reads scout-config, emits a Python package: workspace files, plugins, capability wiring, three deployment entrypoints.

Python

Runtime

scout_runtime — vendored into every zip. Tool registry, sandbox, A-MEM, heartbeat, gateway, sub-agent dispatch.

Python, no SaaS deps

Single source of truth. The Python schema defines every capability and tool. JS bindings are auto-generated via sync-python.js. When schema drifts, CI breaks the build, not production.

What's in the catalog

All 32 capabilities. All wired by default (F34 doctrine, PR #104 enforces).

file_ops

web_search

web_extract

sqlite_database

python_execution

shell_execution

github_integration

calendar_integration

notion_integration

linear_integration

slack_integration

sms_messaging

email_imap

webhook_receiver

conversation_logging

metric_tracking

hubspot_integration

code_modification

heartbeat_loop

scheduled_commitments

persistent_memory

skills

standing_orders

subagent_spawning

flows

sandbox

gateway

tts

acp

plugins

How it's different from LangChain / CrewAI

LangChain / CrewAI	Forge
Library you assemble	Complete agent generated from your spec
100+ lines of glue per integration	0 lines. Catalog wires it.
You maintain abstractions when LLM APIs change	Handled in the platform. Runtime is versioned, vendored.
CrewAI: agent-crew abstractions	32 capabilities + 96 tools + identity + memory shipped in one zip
No standardized voice / persistent identity	SOUL.md / USER.md / IDENTITY.md prepended to every call
You wire memory + scheduling + heartbeat yourself	A-MEM + scheduled_commitments + heartbeat all default

How it's different from rolling your own

You could build it. It takes 2-3 months for equivalent surface. And along the way you'd re-discover everything we already debugged. Sample from the last week:

PR #98 / #99Capability docs / runtime alignment — 17 audit tests fail the build when docs lie about which tools exist
PR #101Workspace freshness check on every boot — refreshes stale system files from package while preserving MEMORY.md / USER.md
PR #102Memory consistency check — warns when MEMORY.md references tools the registry no longer ships
PR #104Auto-wire all 32 capabilities — even bare specs get the full surface
PR #105Server-side auto-wire before validation — catches architect under-wiring
PR #106HubSpot integration (29th capability) + 7 dedicated tests
PR #107Tolerant scout-config parser — recovers from truncated JSON emissions across message boundaries
PR #108Generated agents ship tzdata for Windows scheduling
PR #109Three extensibility paths documented: conversational, standing orders, plugins
PR #111code_modification capability — agents write/edit/rollback their own code, auto-rollback on Python syntax error
PR #112PowerShell TUI quickstart in docs — Windows users were guessing
PR #11310/10 adversarial sandbox tests pass — /etc/passwd, ~/.ssh, symlink escape, relative traversal, install-dir walk-up

53 PRs in one session. Latest commit f582e4d. Every merge had full diff, clear message, critical tests green before --admin.

Get started

1.Visit /create and paste your Anthropic API key.
2.Design through the Phase 1.5 interview. Download the zip.
3.Run python -m <agent> --tui.

Build your agent Docs TUI mode Whitepaper GitHub