for engineers

Forge for engineers.

Production-grade agents in 5 minutes, not 2 months.

Autonomous by default. Conversational by nature.

The autonomy model

Forge agents are autonomous by default for routine work: reads, drafts, internal writes, scheduling, computations, web search, file ops, self-notes. They execute without asking. For 10 designated high-risk categories, the agent conversationally confirms with you before executing. The inverse of approval-queue platforms: instead of asking permission for everything, the agent asks permission for nothing routine but checks in before doing anything irreversible.

Categories that trigger conversational confirmation

CategoryExampleWhy high-risk
email_externalSend email to a customerOnce sent, can't unsend
slack_channel_postPost to #dealsTeam-wide visibility
sms_externalSMS a customerIntrusive, immediate
hubspot_customer_writeUpdate deal stageAffects team reporting
github_pr_createOpen a PRNotifies reviewers, runs CI
github_pushPush to originTriggers CI, deploys
code_modification_selfEdit own runtimeCan break next boot
spend_over_thresholdSub-agent at $15Token spend control
strategic_account_actionTouch top-10 accountCritical relationships
quiet_hours_externalEmail at 2amProfessionalism

What conversational confirmation looks like

Inline in the TUI. Slack DM and SMS confirmation are available when configured (Phase 2).

agent  About to send email
        to: sarah@acmecorp.com
        subject: Re: pricing follow-up
        category: email_external

        ─── draft ───
        Hi Sarah, following up on Tuesday's call.
        Attached the term sheet you asked for. Let me know
        if Friday 2pm works for the next step.
        ─────────────

        [a]pprove  [e]dit  [s]kip  [t]rust this category
you  a
  sent. logged to ~/forge/audit/confirmations.jsonl

Standing orders override this. Trust the agent with a category? Add it to STANDING_ORDERS.md. Want extra gates? Add them too. The 10 categories are defaults, not a cage.

The 30-second pitch

  • 30 first-class capabilities, all wired by default. No glue code, no per-tool registration boilerplate.
  • 96 tools per agent (43 built-in + 53 capability sub-tools), uniformly typed and discoverable.
  • 325 passing tests, CI-enforced tool/runtime alignment. test_capability_docs_runtime_audit.py fails the build if a capability declares tools that don't register.
  • 3 deployment modes: Solo Local (--tui), Headless Gateway (--gateway), Bare REPL.
  • Self-modifying agents: code_create / code_edit / code_rollback with install-dir sandbox and Python-syntax auto-rollback.
  • Cross-platform: macOS, Linux, Windows verified end-to-end (Dan ran macOS, tdemm ran Windows, same agent).

What you get vs what you write

Concrete comparison. Both columns are the same agent surface.

ConcernBuilding your ownForge
Tool registrationLangChain: 100+ lines of glue per tool0 lines. Catalog declares it, generator wires it.
Auth handlingYou maintain retry, error, token refresh per serviceStandardized across all 32 integrations
MemoryYou wire SQLite + embeddings + decay yourselfA-MEM ships in the box (fastembed + graph + decay)
Voice / identityAd-hoc prompt engineering, drifts over timeSOUL.md / IDENTITY.md / USER.md prepended to every call
SchedulingYou wire cron + tzdata + retry semanticsscheduled_commitments + cron_create + heartbeat loop
Cross-platformYou debug Windows tzdata, terminal escapes, path sepHandled (PR #108 ships tzdata; Rich TUI portable)
SecurityYou grep for /etc/passwd-style bugs at 2amSandbox tested 10/10 adversarial (PR #113)

Three lines of code that should sell it

From scout-config to running TUI. No build step.

# 1. Generate the agent from your scout-config
curl -X POST https://forge.example/api/generate-agent \
  -d @scout-config.json -o agent.zip

# 2. Unzip
unzip agent.zip && cd agent

# 3. Run
python -m my_agent --tui

Architecture in 90 seconds

Three components. Each one does one thing.

01

Architect

Phase 1.5 context interview. 5 waves, 27+ fields. Captures voice, role, hard rules, escalation triggers into a scout-config.

TypeScript + Anthropic

02

Generator

Reads scout-config, emits a Python package: workspace files, plugins, capability wiring, three deployment entrypoints.

Python

03

Runtime

scout_runtime — vendored into every zip. Tool registry, sandbox, A-MEM, heartbeat, gateway, sub-agent dispatch.

Python, no SaaS deps

Single source of truth. The Python schema defines every capability and tool. JS bindings are auto-generated via sync-python.js. When schema drifts, CI breaks the build, not production.

What's in the catalog

All 32 capabilities. All wired by default (F34 doctrine, PR #104 enforces).

file_ops
web_search
web_extract
sqlite_database
python_execution
shell_execution
github_integration
calendar_integration
notion_integration
linear_integration
slack_integration
sms_messaging
email_imap
webhook_receiver
conversation_logging
metric_tracking
hubspot_integration
code_modification
heartbeat_loop
scheduled_commitments
persistent_memory
skills
standing_orders
subagent_spawning
flows
sandbox
gateway
tts
acp
plugins

How it's different from LangChain / CrewAI

LangChain / CrewAIForge
Library you assembleComplete agent generated from your spec
100+ lines of glue per integration0 lines. Catalog wires it.
You maintain abstractions when LLM APIs changeHandled in the platform. Runtime is versioned, vendored.
CrewAI: agent-crew abstractions32 capabilities + 96 tools + identity + memory shipped in one zip
No standardized voice / persistent identitySOUL.md / USER.md / IDENTITY.md prepended to every call
You wire memory + scheduling + heartbeat yourselfA-MEM + scheduled_commitments + heartbeat all default

How it's different from rolling your own

You could build it. It takes 2-3 months for equivalent surface. And along the way you'd re-discover everything we already debugged. Sample from the last week:

  • PR #98 / #99Capability docs / runtime alignment — 17 audit tests fail the build when docs lie about which tools exist
  • PR #101Workspace freshness check on every boot — refreshes stale system files from package while preserving MEMORY.md / USER.md
  • PR #102Memory consistency check — warns when MEMORY.md references tools the registry no longer ships
  • PR #104Auto-wire all 32 capabilities — even bare specs get the full surface
  • PR #105Server-side auto-wire before validation — catches architect under-wiring
  • PR #106HubSpot integration (29th capability) + 7 dedicated tests
  • PR #107Tolerant scout-config parser — recovers from truncated JSON emissions across message boundaries
  • PR #108Generated agents ship tzdata for Windows scheduling
  • PR #109Three extensibility paths documented: conversational, standing orders, plugins
  • PR #111code_modification capability — agents write/edit/rollback their own code, auto-rollback on Python syntax error
  • PR #112PowerShell TUI quickstart in docs — Windows users were guessing
  • PR #11310/10 adversarial sandbox tests pass — /etc/passwd, ~/.ssh, symlink escape, relative traversal, install-dir walk-up

53 PRs in one session. Latest commit f582e4d. Every merge had full diff, clear message, critical tests green before --admin.

Get started

  1. 1.Visit /create and paste your Anthropic API key.
  2. 2.Design through the Phase 1.5 interview. Download the zip.
  3. 3.Run python -m <agent> --tui.