AI Engineer World's Fair · 2026

Recursive Coding Agents

Raymond Weitekamp

RAW.works | OpenProse

@raw_works

Motivation

We all want outcomes.

Agents that work on our behalf — reliable co-workers — while we're out on a hike.

The bottleneck is not intelligence. It's reliability. It's trust.

One day — my agents build me a full SaaS app from a single prompt.
The next day — they empty the entire contents of my Solana wallet.

Thesis

Today’s agents are mismanaged geniuses

The intelligence is there.

The missing layer is how we specify, manage, reuse, and verify the work.

Above-the-fold preview of the Turing Post OpenProse article

Turing Post · Raymond Weitekamp · “Stop Babysitting Agents, Start Authoring Outcomes”

Above-the-fold preview of The Mismanaged Geniuses Hypothesis article

Alex Zhang · Zed Li · Omar Khattab · “The Mismanaged Geniuses Hypothesis”

Recursive Language Models

Context itself is the object of computation

Externalize — the full prompt lives in a REPL, not the context window.
Operate — the model writes code to inspect, slice, and transform it.
Recurse — it sub-queries itself over the slices.
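The three steps can be sketched in a few lines of Python. This is a toy under stated assumptions, not the paper's implementation: `toy_llm` is a hypothetical stand-in for a real model call, `halve_on_lines` stands in for slicing code that a real RLM would have the model write, and `limit` is an invented per-call character budget. The point it shows is that no single call ever receives more than `limit` characters, while the full input stays in a variable.

```python
from typing import Callable, List, Optional

LLM = Callable[[str], str]            # hypothetical model call
Slicer = Callable[[str], List[str]]   # stands in for model-written slicing code


def toy_llm(prompt: str) -> str:
    """Fake model: counts ERROR lines, or sums partial counts."""
    task, _, body = prompt.partition("\n")
    if task == "SUM":
        return str(sum(int(x) for x in body.split()))
    return str(sum("ERROR" in line for line in body.splitlines()))


def halve_on_lines(text: str) -> List[str]:
    """Split on a line boundary so no line is cut in half."""
    lines = text.splitlines()
    mid = len(lines) // 2
    return ["\n".join(lines[:mid]), "\n".join(lines[mid:])]


def rlm(context: str, llm: LLM, slicer: Slicer, limit: int = 80,
        depth: int = 0, max_depth: int = 8,
        sizes: Optional[List[int]] = None) -> str:
    sizes = [] if sizes is None else sizes
    # Base case: the slice fits the budget (or the depth cap is hit),
    # so a plain model call handles it.
    if len(context) <= limit or depth >= max_depth:
        sizes.append(len(context))
        return llm("COUNT\n" + context)
    # Operate + recurse: slice the externalized context, sub-query each piece.
    parts = [rlm(piece, llm, slicer, limit, depth + 1, max_depth, sizes)
             for piece in slicer(context)]
    return llm("SUM\n" + " ".join(parts))


# Externalize: the whole log lives in a variable, never in one prompt.
log = "\n".join(f"line {i} ERROR" if i % 7 == 0 else f"line {i} ok"
                for i in range(100))
leaf_sizes: List[int] = []
answer = rlm(log, toy_llm, halve_on_lines, sizes=leaf_sizes)
print(answer, max(leaf_sizes))
```

Because the slicer here is fixed code, this toy would not pass a "model picks decomposition" check on its own; swapping `halve_on_lines` for model-generated code is the part that makes the pattern an RLM.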

Top half of the first page of the Recursive Language Models arXiv paper

arXiv:2512.24601

Root RLM (depth=0)
├── Sub-RLM A (depth=1)
│   ├── LLM A1 (depth=2)
│   └── LLM A2 (depth=2)
└── Sub-RLM B (depth=1)
    ├── LLM B1 (depth=2)
    └── LLM B2 (depth=2)

Recursive Language Models

Code Execution As Reasoning

RLMs can process inputs far beyond the context window (Oolong, RLM paper).
An RLM is itself a powerful memory system (LongMemEval results).
RLMs can achieve SOTA on long reasoning tasks, even with very small models (LongCoT results).

X card for the RAW.works article RLMs are the new reasoning models, including article preview text and social proof

RLMs: Too Hot To Benchmark

Raymond Weitekamp on X questioning whether ARC Prize will verify the Symbolica agent despite its high score

Sumeet Motwani on X announcing LongCoT Open Harness and Restricted Harness leaderboards, with GPT 5.2 RLM SOTA on Open Harness

I personally do not care if my AI programs do their reasoning in latent space or code.
I want results.

The RLM rubric

Lots of things feel close.

Checks: Executable environment · Prompt externalized · Code calls the model · Model picks decomposition · State stays symbolic

Plain long-context call (RAG / reasoning-only)
Coding agents + subagents (including loops)
Hardcoded map-reduce (developer-authored pipeline, e.g. λ-RLM)
Recursive Language Model (passes every check)
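One way to make the rubric checkable is to treat the five columns as boolean fields. The sketch below is an illustration only: the field names are derived from the column headers, and the `candidate` system is hypothetical, since the per-row scores for the other systems are not reproduced here. The only row encoded from the slide is the RLM one, which passes every check.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class RlmRubric:
    executable_environment: bool
    prompt_externalized: bool
    code_calls_the_model: bool
    model_picks_decomposition: bool
    state_stays_symbolic: bool

    def failed(self) -> List[str]:
        """Names of the checks this system does not pass."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def is_rlm(self) -> bool:
        """A system counts as an RLM only if every check passes."""
        return not self.failed()


# From the slide: a Recursive Language Model passes every check.
full_rlm = RlmRubric(True, True, True, True, True)

# Hypothetical system (not a row from the slide): everything except
# letting the model choose the decomposition.
candidate = RlmRubric(True, True, True, False, True)

print(full_rlm.is_rlm(), candidate.failed())
```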

Open the RLM rubric

Towards Recursive Coding Agents

RLM / LLM

Root RLM (depth=0)
├── Sub-RLM A (depth=1)
│   ├── LLM A1 (depth=2)
│   └── LLM A2 (depth=2)
└── Sub-RLM B (depth=1)
    ├── LLM B1 (depth=2)
    └── LLM B2 (depth=2)

Agent / Sub-Agent

Root Agent (depth=0)
├── Sub-Agent A (depth=1)
│   ├── Sub-Agent A1 (depth=2)
│   └── Sub-Agent A2 (depth=2)
└── Sub-Agent B (depth=1)
    ├── Sub-Agent B1 (depth=2)
    └── Sub-Agent B2 (depth=2)
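Both trees share one shape: a node that may spawn children of its own kind until a depth cap is reached. A minimal sketch of that shape follows, assuming a fixed fan-out of two and a cap of two levels purely to reproduce the diagram; in a real recursive agent the fan-out would be chosen at run time. `Node`, `spawn`, and `render` are illustrative names, not part of any tool mentioned here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    tag: str                      # "" for the root, then "A", "A1", ...
    depth: int
    children: List["Node"] = field(default_factory=list)

    @property
    def name(self) -> str:
        return "Root Agent" if not self.tag else f"Sub-Agent {self.tag}"


def spawn(tag: str = "", depth: int = 0,
          max_depth: int = 2, fanout: int = 2) -> Node:
    """Each node spawns children of the same kind until the depth cap."""
    node = Node(tag, depth)
    if depth < max_depth:
        for i in range(fanout):
            # Letters at the first level, digits appended below it.
            child_tag = chr(ord("A") + i) if depth == 0 else f"{tag}{i + 1}"
            node.children.append(spawn(child_tag, depth + 1, max_depth, fanout))
    return node


def render(node: Node, prefix: str = "", is_last: bool = True,
           is_root: bool = True) -> List[str]:
    """Draw the tree with the same box characters as the diagram."""
    label = f"{node.name} (depth={node.depth})"
    if is_root:
        lines, child_prefix = [label], ""
    else:
        lines = [prefix + ("└── " if is_last else "├── ") + label]
        child_prefix = prefix + ("    " if is_last else "│   ")
    for i, child in enumerate(node.children):
        lines += render(child, child_prefix,
                        i == len(node.children) - 1, False)
    return lines


tree_lines = render(spawn())
print("\n".join(tree_lines))
```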

Towards Recursive Coding Agents

Either... Trick question: RLMs are Recursive Coding Agents.

Or... How can we apply the principles of RLMs to coding agents?

My Experiments

Finding ypi

Built on Pi (minimal, extensible). Pi extensions previously could not support recursion, so I forked it. The Y is for the Y combinator.

Wrapper CLI: ypi, a fully recursive Pi agent.
Pi Extension: pi-recursive, which makes any existing Pi config recursive.

rawwerks/rlm-cli

CLI for Recursive Language Models.

Python · 80 stars · 4 forks · Updated Jun 16, 2026


rawwerks/ypi

A recursive coding agent inspired by RLMs.

Shell · 339 stars · 29 forks · MIT · Updated Jun 15, 2026


The RLM ecosystem

Other notable projects

Claude Code

Is Claude Code an RLM?

No

Alex Zhang announcing Recursive Language Models on X, with Gary Basin replying below: "This is effectively Claude code sub-agents right?"

Q4 2025

Yes

Omar Khattab on X: "Claude Code is finally an RLM (oct 2025), congrats to Anthropic", quoting the Claude Code dynamic workflows announcement

Q2 2026

Dynamic workflows made Claude Code recursive.

Claude can write an orchestration script, then run a fleet of subagents. The line is whether the model chooses the decomposition, or the script fixes it ahead of time.

Claude Code blog diagram: Six workflow patterns

Claude Code blog · harness for every task

RLM example · model-chosen split
file-handle-clean.workflow.js
Decomposer reads a corpus handle, writes slice handles, then subagents extract and validate those slices.

not-RLM contrast · script-fixed split
hardcoded-map-reduce.workflow.js
It has handles, subagents, and state, but the windows, fan-out, reducer, and stop rule are fixed in code.
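The difference between the model-chosen split and the script-fixed split can be shown side by side. This is a sketch in Python rather than the workflow scripts themselves, whose contents are not reproduced here. `toy_decomposer` is a hypothetical stand-in for a model call that looks at the corpus before deciding how to slice it, and the corpus entries are invented.

```python
from typing import Callable, Dict, List

Slices = List[List[str]]


def script_fixed_split(corpus: List[str], window: int = 2) -> Slices:
    """not-RLM: the window is fixed in code before any data is seen."""
    return [corpus[i:i + window] for i in range(0, len(corpus), window)]


def model_chosen_split(corpus: List[str],
                       decide: Callable[[List[str]], Slices]) -> Slices:
    """RLM-style: the decomposer inspects the corpus and returns slices.

    The script only validates that nothing was dropped or duplicated.
    """
    slices = decide(corpus)
    flat = [doc for group in slices for doc in group]
    if sorted(flat) != sorted(corpus):
        raise ValueError("decomposer dropped or duplicated documents")
    return slices


def toy_decomposer(corpus: List[str]) -> Slices:
    """Stand-in for model judgment: group documents by topic prefix."""
    groups: Dict[str, List[str]] = {}
    for doc in corpus:
        groups.setdefault(doc.split(":")[0], []).append(doc)
    return list(groups.values())


corpus = ["auth: login", "db: schema", "auth: tokens",
          "db: indexes", "auth: sso"]
fixed = script_fixed_split(corpus)
chosen = model_chosen_split(corpus, toy_decomposer)
print(len(fixed), len(chosen))
```

The fixed split mixes topics in every window, while the chosen split follows the structure of the data. The validation step matters in both directions: letting the model pick the decomposition does not mean trusting it blindly.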

For (almost) any coding agent

A language compiled by the agent, not the computer.

A Markdown spec plus a giant prompt, in logical English. No new syntax to learn.

The key: a declarative contract the agent must satisfy to be “done.” That answers the reliability question.
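The idea of a contract that defines "done" can be sketched as a list of named checks that the harness evaluates, rather than the agent deciding for itself when to stop. This is not OpenProse syntax or its API: `Clause`, `unmet`, `run_until_done`, the in-memory `Workspace`, and `toy_agent` are all hypothetical names for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Workspace = Dict[str, str]   # hypothetical: filename -> contents


@dataclass
class Clause:
    description: str
    check: Callable[[Workspace], bool]


def unmet(contract: List[Clause], ws: Workspace) -> List[str]:
    """Descriptions of every clause the workspace does not yet satisfy."""
    return [c.description for c in contract if not c.check(ws)]


def run_until_done(agent_step: Callable[[Workspace, List[str]], None],
                   contract: List[Clause], ws: Workspace,
                   max_steps: int = 5) -> int:
    """Loop until the contract holds; return the number of steps used."""
    missing: List[str] = []
    for used in range(max_steps + 1):
        missing = unmet(contract, ws)
        if not missing:
            return used
        if used == max_steps:
            break
        agent_step(ws, missing)
    raise RuntimeError(f"contract unmet after {max_steps} steps: {missing}")


contract = [
    Clause("README.md exists", lambda ws: "README.md" in ws),
    Clause("test file contains an assertion",
           lambda ws: "assert" in ws.get("test_app.py", "")),
]


def toy_agent(ws: Workspace, missing: List[str]) -> None:
    """Stand-in for an agent turn: fixes only the first unmet clause."""
    if missing[0] == "README.md exists":
        ws["README.md"] = "# App"
    else:
        ws["test_app.py"] = "assert 1 + 1 == 2"


workspace: Workspace = {}
steps_used = run_until_done(toy_agent, contract, workspace)
print(steps_used, sorted(workspace))
```

The design choice is that the stop condition lives outside the agent. An agent that claims to be finished while a clause is still unmet simply gets another turn, or the run fails loudly once the step budget is spent.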

Any agent with a filesystem and subagents can run it — and behave like an RLM.

See “Stop Babysitting Agents, Start Authoring Outcomes” on Turing Post.

openprose/prose

A new kind of language for a new kind of computer.

TypeScript · 1.5k stars · 121 forks · MIT · Updated Jun 16, 2026


OpenProse explicitly declares subagent work

Here are two OpenProse examples where the model turns an external handle into smaller handles, then verifies the child-work trace.

Recursive decomposition · handle-recursive-reader.prose.md

Starts from an external prompt_handle; root does not read the whole thing.
The model decides terminal vs. nonterminal handle.
Nonterminal handles produce child handles and call the same contract again.

if nonterminal:
  for child in manifest:
    recurse(child.path)
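The same recursion can be written as a runnable Python sketch, assuming handles are file paths and the leaf task is a simple word count. The `is_terminal` callback is a hypothetical stand-in for the model's terminal vs. nonterminal decision (here a size threshold), and the halving split stands in for whatever child handles the model would produce. None of this is OpenProse's own runtime.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def read_handle(path: Path, is_terminal: Callable[[Path], bool],
                trace: List[str], depth: int = 0, max_depth: int = 4) -> int:
    """Count words under a handle, recursing through child handles."""
    trace.append(f"{'  ' * depth}{path.name}")
    if depth >= max_depth or is_terminal(path):
        return len(path.read_text().split())      # terminal: do the leaf work
    # Nonterminal: write child handles, then apply the same contract to each.
    lines = path.read_text().splitlines()
    mid = len(lines) // 2
    manifest: List[Path] = []
    for i, chunk in enumerate((lines[:mid], lines[mid:])):
        child = path.with_name(f"{path.stem}.{i}{path.suffix}")
        child.write_text("\n".join(chunk))
        manifest.append(child)
    return sum(read_handle(child, is_terminal, trace, depth + 1, max_depth)
               for child in manifest)


def demo() -> Tuple[int, List[str]]:
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "prompt.txt"
        root.write_text("\n".join(["alpha beta gamma"] * 8))
        trace: List[str] = []
        # Assumed policy: a handle of 40 bytes or less is small enough to read.
        total = read_handle(root, lambda p: p.stat().st_size <= 40, trace)
        return total, trace


total_words, call_trace = demo()
print(total_words)
print("\n".join(call_trace))
```

The trace doubles as the child-work record: each entry is a handle that some level of the recursion was responsible for, which is what a verifier would check against the final answer.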

Directory handle slicer · directory-handle-slicer.prose.md

Starts from a repo or directory handle, not copied root context.
The model uses search to choose relevant file handles for the question.
Workers inspect only assigned handles; aggregation cites worker evidence.

manifest = model_slice(directory)
for child in manifest:
  worker(child.path only)
validate worker provenance
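A runnable Python version of the slicer, under stated assumptions: `model_slice` uses a plain keyword match as a stand-in for the model's search step, each worker is an ordinary function that reads only its assigned file, and provenance validation means checking that every report names an assigned handle and quotes lines that really occur in it. File names and contents are invented for the demo.

```python
import tempfile
from pathlib import Path
from typing import Dict, List, Tuple

Report = Dict[str, object]


def model_slice(directory: Path, keywords: List[str]) -> List[Path]:
    """Stand-in for model-driven search: keep files mentioning a keyword."""
    return sorted(p for p in directory.rglob("*")
                  if p.is_file()
                  and any(k in p.read_text().lower() for k in keywords))


def worker(handle: Path, keywords: List[str]) -> Report:
    """Inspect only the assigned handle and report the matching lines."""
    hits = [line for line in handle.read_text().splitlines()
            if any(k in line.lower() for k in keywords)]
    return {"handle": handle, "evidence": hits}


def validate(manifest: List[Path], reports: List[Report]) -> bool:
    """Provenance check: assigned handles only, and evidence must exist."""
    assigned = set(manifest)
    for report in reports:
        handle = report["handle"]
        if handle not in assigned:
            return False
        text = handle.read_text()
        if not all(line in text for line in report["evidence"]):
            return False
    return True


def demo() -> Tuple[List[str], bool, bool]:
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "auth.py").write_text("def login():\n    check_token()\n")
        (repo / "db.py").write_text("def connect():\n    open_pool()\n")
        (repo / "notes.md").write_text("token rotation policy\n")
        manifest = model_slice(repo, ["token"])
        reports = [worker(child, ["token"]) for child in manifest]
        ok = validate(manifest, reports)
        # A report about a file that was never assigned must be rejected.
        forged = reports + [{"handle": repo / "db.py", "evidence": []}]
        return [p.name for p in manifest], ok, validate(manifest, forged)


chosen_files, reports_ok, forged_ok = demo()
print(chosen_files, reports_ok, forged_ok)
```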

Applied recursive coding agents

What can you actually do with recursive coding agents?

Recursive Coding Agents FTW

Trust is reliability: the next step is behavioral, not more model intelligence.

A new paradigm of inference-time compute: RLMs are the new reasoning models → recursive coding agents are the new coding agents.

Coding agents can be RLMs: Claude Code dynamic workflows and OpenProse show two concrete paths.

Until Next Time...

Please Recurse Responsibly


Presentation at recursivecodingagents.com | Companion GitHub repo