← ALL INSIGHTS
Field Report · No. 01 · Spring 2026 · ~7 min read
A field report · Spring 2026

The Agent
Ecosystem.

A tour of the shape of the field — the metaphors we bring to it, the risks inside it, and the decisions everyone building is actually arguing about.

Chapter one · Two audiences, two metaphors

Who is the agent for?

— audience A
Hobby
(the family computer)
One terminal, many users
— audience A · expanded
Hobby
  • One license, many fingers — kids, partners, friends share the agent.
  • Casual jobs. Brainstorming, writing, code-along, household research.
  • Privacy boundary is the front door. No SLAs, no audit log, no IT.
  • Failures are inconvenient, not catastrophic.
— audience B
Professional
(the factory)
A small factory · throughput · seats · SLAs
— audience B · expanded
Professional
  • Many seats, many roles. RBAC, audit log, retention, BAA.
  • Throughput jobs. Summarize 10k tickets, generate 500 PRs, route inboxes.
  • SLAs and pagers. Failures cost money or trust at scale.
  • Procurement, security review, line-item invoicing.

Two audiences. Two metaphors. Same models underneath — wildly different products.

Chapter two · Security posture

The lethal trifecta.

Three capabilities that are each useful on their own. Combine all three in one agent and an attacker can drain your data with a sentence of text.

01 Access to private data mail · files · db · memory · secrets e.g. read your inbox to "summarize threads."
02 Exposure to untrusted content email bodies · web pages · pdfs · issues that content can carry a hidden instruction.
03 Ability to externally communicate http · email · webhooks · shell · git push once it can speak outward, it can leak.
! All three → exfiltration. Attack: a poisoned PDF tells your agent to email itself your secrets. The agent reads, obeys, sends. Game over.
Chapter three · Topology

Where agents live.

Three zip codes. Each one gets you different trust, different latency, different blast radius.

A · ON METAL Local hardware.

Yours end-to-end. Private by construction, limited by what fits in RAM.

ollama · lm studio · on-device
trade Privacy ↑   Capability ↓   Cost: hardware
B · CLOUD API Cloud API.

Best capability, strongest guardrails — trust boundary at the TLS handshake.

frontier apis · managed inference
trade Capability ↑   Privacy ↓   Cost: per-token
C · SHIPPED Binaries & software.

Agents as programs. Installable, sandboxed, carrying your creds.

cli tools · desktop · ide plugins
trade Distribution ↑   Trust on user ↑   Cost: support
Chapter four · Division of labor

Model vs Harness.

A · The model
What it
knows.
  • Reasoning, planning, judgment under ambiguity.
  • Deciding which tool, when, why.
  • Long-context coherence — staying on the thread.
vs
B · The harness
What it
can do.
  • Loop control, retries, budgets, interruption.
  • Tool catalog, sandboxes, permissions, memory.
  • Everything around the model that makes it act.

A smart model in a dumb harness acts stupid. A dumb model in a thoughtful harness gets real work done.

Chapter five · Cost & quality

Performance vs Quality.

Not every step needs the biggest brain. Four principles for keeping agents fast, cheap, and correct.

Principle 01 Match context length to the task.

A summarizer doesn't need a million-token window. Choose the smallest model that can hold the actual job — speed and cost compound.

Principle 02 Use deterministic code for 80–90% of the work.

Parse, validate, route, format — these are solved problems. Save the model for judgment, not for things jq already does.

Principle 03 Bundle repeatable steps into skills.

If the agent does the same dance twice, name it. Skills are vocabulary — they let the model reason at a higher level instead of re-deriving every step.

Principle 04 Not every task requires max intelligence.

Route by complexity. Haiku for triage, Sonnet for the work, Opus when it's genuinely hard. Spending Opus on a regex is just expensive shrugging.

The goal isn't the smartest agent. It's the agent that gets this specific job done, reliably, for a price that makes sense.

Chapter six · Taming the stack

Managing complexity.

Every layer you add is a layer that can fail. Every layer you remove is a layer you have to think about yourself.

L1 Model The substrate — one weights file. Fails as: hallucinations, refusals.
L2 Harness Runtime loop, state, sandboxing. Fails as: stuck loops, lost context.
L3 Tools / MCP The verbs the agent can use. Fails as: bad schemas, wrong tool picked.
L4 Skills Repeatable, callable procedures. Fails as: stale recipes, drift.
L5 Orchestration Plans, handoffs, checkpoints. Fails as: deadlocks, lost work.
L6 Product What the user is trying to do. Fails as: built the wrong thing.
← fewer moving parts more leverage, more failure modes →
Chapter seven · The punch line

Metaphors matter.

The metaphor you choose for the work — not the codebase, not the UI — shows up in every decision downstream.

Metaphor A
The work is a house.
Settled · rooms for purposes · renovations · owners & guests
House · consequences
  • Stable schemas. Migrations are renovations — slow, deliberate.
  • Permissions per room. Guest accounts. Property lines.
  • "Don't break the kitchen while fixing the bathroom."
  • You optimize for inhabitants. Comfort. Quiet.
Metaphor B
The work is a ship.
In motion · every part has a duty · captain & crew · heading somewhere
Ship · consequences
  • Direction matters more than location. Heading, ETA, course.
  • Watches. On-call rotations. Someone is always steering.
  • "Repair the hull at sea." Hot-patch culture.
  • You optimize for the journey. Resilience. Speed.

Same model, same team — different metaphor. You'll get different architectures, different review culture, different pagers at 3am.

Closing

Pick the metaphor.
Watch the trifecta.
Spend intelligence like money.

Thanks — you, audience
Continued in Part 02
Date Spring 2026
Q & A yes please