26 May 2026

"Where Do I Run This?" — A Surprisingly Interesting Answer

"Where Do I Run This?" — A Surprisingly Interesting Answer

Published: 2026-05-15
Tags: claude-code, ai-agents, local-ai, meta


While setting up a large-context benchmark for our llama.cpp series, I asked Claude Code
to prepare a prompt for a sub-agent to run the long benchmark job autonomously. It did,
then added a note at the end:

"To launch: Agent(subagent_type="general-purpose", prompt=open(...).read())"

My immediate question: where do I run that?

The answer reframed something I thought I understood.


It's Not Your Code. It's Claude's.

Agent(...) isn't a Python library you install. It isn't a CLI command. It's a tool
that Claude Code calls internally
— in the same category as Bash, Read, or Write.

When Claude runs Bash("nvidia-smi"), your terminal executes nvidia-smi. When Claude
calls Agent(...), a new AI agent spins up — with its own Bash, its own file access, its
own web search — and works through a task autonomously, just like Claude is working through
this conversation.

The pseudocode Claude wrote was essentially describing its own next action in notation a
programmer would recognise. It was talking about itself.


The Practical Shape of It

The flow looks like this:

You → Claude Code (this chat)
         └─ Agent(prompt="benchmark llama.cpp at 262K context...")
                └─ Sub-agent (no memory of your conversation)
                       ├─ writes bench_large_context.py
                       ├─ runs it (takes ~90 minutes)
                       ├─ reads results
                       └─ writes blog_large_context.md
         └─ "Done — decode speed drops 12% at 262K tokens. Blog post written."
You ← result

The sub-agent gets one thing: the prompt. It has no access to your conversation history.
That's why the prompt file we prepared was so detailed — it had to stand alone as a
complete briefing for someone who just walked into the room.


Why This Matters More Than It Looks

Most AI tooling has a clean boundary: the human decides what to do, the AI executes one
step. What's different here is that Claude can delegate to another Claude — and that
second agent can delegate further, run for an hour, write code, execute it, read the
output, and revise. The human isn't in the loop for each step.

That changes the unit of work. Instead of "ask AI to write a benchmark script," the unit
becomes "ask AI to run the benchmark campaign and deliver results." The script is an
implementation detail.

It also changes what a good prompt looks like. Writing for a sub-agent is closer to
writing a spec for a colleague than writing a prompt for a chatbot. It needs context,
constraints, expected outputs, and failure modes — because there's nobody to ask for
clarification once it starts.


The Meta Moment

The most interesting part of this exchange wasn't the answer. It was the question.

"Where do I run this?" assumes that code is something humans execute. But in a system
where the AI has a shell, a file system, and the ability to spawn other AIs, that
assumption quietly stops being true. The code Claude wrote wasn't for me. It was a note
to itself about what to do next.

We're early enough in this that the boundary between "Claude explaining a thing" and
"Claude doing a thing" isn't always obvious. Paying attention to which side of that line
you're on turns out to be worth it.


This post is part of a series on running large language models locally on consumer
hardware. The benchmark it references — Qwen3.5-35B-A3B at 262K context on 8GB VRAM —
is covered in the companion posts in this series.

No comments:

Post a Comment