This is just sub agents, built into Claude. You don’t need 300,000 line tmux abs...

skippyboxedhero · 2026-01-24T19:39:22 1769283562

It isn't sub agents. The gap with existing tooling is that the abstraction is over a task rather than a conversation (due to the issue with third-party apps, Claude Code has been inherently limited to conversations which is why they have been lacking in this area, Claude Code Web was the first move in this direction), and the AI is actually coordinating the work (as opposed to being constantly prompted by the user).

One of the issues that people had which necessitated this feature is that you have a task, you tell Claude to work on it, and Claude has to keep checking back in for various (usually trivial) things. This workflow allows for more effective independent work without context management issues (with subagents there is also the issue of how the progress of the task is communicated; by introducing things like a task board, it is possible to manage this state outside of context). The flow is quite complex and requires a lot of additional context that isn't required with a chat-based flow, but it is a much better way to do things.

The way to think about this pattern - one which many people began concurrently building in the past few months - is an AI which manages other AIs.

vidarh · 2026-01-24T20:50:01 1769287801

It isn't "just" sub agents, but you can achieve most of this just with a few agents that take on generic roles, and a skill or command that just tells claude to orchestrate those agents, and a CLAUDE.md that tells it how to maintain plans and task lists, and how to allow the agents to communicate their progress.

It isn't all that hard to bootstrap. It is, however, something most people don't think about and shouldn't need to have to learn how to cobble together themselves, and I'm sure there will be advantages to getting more sophisticated implementations.

skippyboxedhero · 2026-01-24T21:52:19 1769291539

Right, but the model there is still: you tell the AI what to do. This is: the AI tells other AIs what to do. The context makes a huge difference because it has to be able to run autonomously. It is possible to do this with the SDK, and the workflow is completely different.

It is very difficult to manage task lists in context. Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting. It is possible that they have worked out some way to do this, but when you have tens of tasks, merge conflicts, you are running that prompt over months, etc. At best, it doesn't work. At worst, you are burning a lot of tokens for nothing.

It is hard to bootstrap because this isn't how Claude Code works. If you are just using OpenRouter, it is also not easy because, after setting up tools/rebuilding Claude Code, it is very challenging to set up an environment so the AI can work effectively, errors can be returned, questions returned, etc. Afaik, this is basically what Aider does...it is not easy, it is especially not easy in Claude Code which has a lot of binding choices from the business strategy that Anthropic picked.

vidarh · 2026-01-24T23:43:19 1769298199

> Have you actually tried to do this? i.e. not within a Claude Code chat instance but by one-shot prompting.

You ask if I've tried to do this, and then set constraints that are completely different to what I described.

I have done what I described. Several times for different projects. I have a setup like that running right now in a different window.

> It is hard to bootstrap because this isn't how Claude Code works.

It is how Claude Code works when you give it a number of sub-agents with rules for how to manage files that effectively work like task queues, or skills/mcp servers to interact with communications tools.
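To make "files that effectively work like task queues" concrete, here is a minimal sketch. It is not the commenter's actual setup: the `todo/`, `doing/`, `done/` layout and every function name are assumptions made up for illustration. The one real mechanism it leans on is that a rename within a single filesystem is atomic, so two agents racing for the same task file cannot both win.

```python
import json
import os
from pathlib import Path
from typing import Optional


def init_queue(root: Path) -> None:
    """Create the three state directories (hypothetical layout)."""
    for state in ("todo", "doing", "done"):
        (root / state).mkdir(parents=True, exist_ok=True)


def add_task(root: Path, task_id: str, description: str) -> Path:
    """Write one task as a JSON file into todo/."""
    path = root / "todo" / f"{task_id}.json"
    path.write_text(json.dumps({"id": task_id, "description": description}))
    return path


def claim_task(root: Path, agent: str) -> Optional[dict]:
    """Claim the first available task by renaming it into doing/.

    If another agent renamed the file first, our rename fails with
    FileNotFoundError and we simply try the next file.
    """
    for path in sorted((root / "todo").glob("*.json")):
        target = root / "doing" / f"{agent}--{path.name}"
        try:
            os.rename(path, target)
        except FileNotFoundError:
            continue
        return json.loads(target.read_text())
    return None


def finish_task(root: Path, agent: str, task_id: str, summary: str) -> None:
    """Record a short summary and move the task into done/."""
    src = root / "doing" / f"{agent}--{task_id}.json"
    task = json.loads(src.read_text())
    task["summary"] = summary
    src.write_text(json.dumps(task))
    os.rename(src, root / "done" / f"{task_id}.json")
```

The state lives on disk rather than in any one agent's context, so an agent that loses its context can list `doing/` and `done/` to see where things stand.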

> it is not easy

It is not easy to do in a generic way that works without tweaks for every project and every user. It is reasonably easy to do for specific teams where you can adjust it to the desired workflows.

skippyboxedhero · 2026-01-25T17:41:03 1769362863

I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.

No, it isn't how Claude Code works because Claude Code is designed to work with limited task queues, this is not what this feature is. Again, I would suggest you try to actually build something like this. Why do you think Anthropic are doing this? They just don't understand anything about their product?

No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work. The constraints that I set are the ones that you require to build this...because I have done this. You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.

vidarh · 2026-01-26T09:01:05 1769418065

> I can tell you based on your description that you did not do this. Subagents are completely different and cannot be used in this way.

And yet I have used them exactly in the way I described. That you assume they can't just demonstrates that you haven't tried very hard.

> No, it isn't how Claude Code works because Claude Code is designed to work with limited task queues, this is not what this feature is.

Claude allows your setup to execute arbitrary code that gets injected into context. The entire point is that you don't need to rely on built in capabilities of Claude Code to do any of this.

> No, it doesn't work within that context. Again: sharing context between subagents, single instance running for months...I am not even sure why someone would think this could work.

I know what I described works because I am doing it. You can achieve what I described in a variety of ways: Using skills to tell the agents how to access a shared communications channel. Using MCP servers. Just using CLAUDE.md to describe how to use files as a shared communications channel.
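As a rough illustration of "files as a shared communications channel", here is one possible shape: an append-only inbox file per agent, one JSON object per line. The file naming, the message fields, and the `send`/`read_new` helpers are all hypothetical; in practice a CLAUDE.md would describe whatever convention you pick.

```python
import json
import time
from pathlib import Path
from typing import List, Tuple


def send(inbox_dir: Path, sender: str, recipient: str, body: str) -> None:
    """Append one message to the recipient's inbox (hypothetical format)."""
    inbox_dir.mkdir(parents=True, exist_ok=True)
    message = {"from": sender, "to": recipient, "body": body, "ts": time.time()}
    with open(inbox_dir / f"{recipient}.jsonl", "a", encoding="utf-8") as f:
        f.write(json.dumps(message) + "\n")


def read_new(inbox_dir: Path, recipient: str, offset: int = 0) -> Tuple[List[dict], int]:
    """Return the messages after line `offset`, plus the offset to use next time."""
    path = inbox_dir / f"{recipient}.jsonl"
    if not path.exists():
        return [], offset
    lines = path.read_text(encoding="utf-8").splitlines()
    return [json.loads(line) for line in lines[offset:]], len(lines)
```

Each reader only has to remember a line offset, which is small enough to keep in a plan file instead of in context.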

This is only difficult if you lack imagination.

> You are talking about having some CLAUDE.md files like you have invented the wheel, lol. HN is great.

No, the exact opposite: I'm saying that this isn't hard, that it isn't anything revolutionary or even special. It's pretty basic usage of the existing facilities. There's no invention there.

You're the one trying to imply this is more revolutionary than it is.

ukuina · 2026-01-24T23:33:19 1769297599

It's natural to assume that subagents will scale to the next level of abstraction; as you mentioned, they do not.

The unlock here is tmux-based session management for the teammates, with two-way communication using agent inbox. It works very well.

adastra22 · 2026-01-24T20:53:19 1769287999

> Claude Code has been inherently limited to conversations

How so? I’ve been using “claude -p” for a while now.

But even within an interactive session, an agent call out is non-interactive. It operates entirely autonomously, and then reports back the end result to the top level agent.
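For anyone who hasn't used it, a non-interactive call can be scripted roughly like this. It is a sketch that assumes the `claude` CLI is installed and on your PATH; only the `-p` flag comes from this thread, and the helper names and timeout are my own.

```python
import subprocess
from typing import List


def build_command(prompt: str) -> List[str]:
    """Argument list for a one-shot, non-interactive run: claude -p "<prompt>"."""
    return ["claude", "-p", prompt]


def run_once(prompt: str, timeout: float = 600.0) -> str:
    """Run one prompt to completion and return whatever was printed to stdout."""
    result = subprocess.run(
        build_command(prompt),
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout
```

Passing the prompt as a list element rather than through a shell string avoids quoting problems when the prompt contains quotes or newlines.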

skippyboxedhero · 2026-01-24T21:38:20 1769290700

Because of OAuth. If they gave people API keys then no-one buys their ludicrously priced API product (I assume their strategy is to subsidise their consumer product with the business product).

You can use Claude Code SDK but it requires a token from Claude Code. If you use this token anywhere else, your account gets shut down.

Claude -p still hits Claude Code with all the tools, all the Claude Code wrapping.

tobyjsullivan · 2026-01-24T21:59:25 1769291965

I believe they’re talking about Claude Code’s built-in agents feature which works fine with a Max subscription.

https://code.claude.com/docs/en/sub-agents

Are you talking about the same thing or something else like having Claude start new shell sessions?

skippyboxedhero · 2026-01-25T17:44:42 1769363082

Okay...and continue to work up the levels? Why do you think OAuth might be limiting? Why do you think they started building subagents first? What is the difference between subagents and products like Aider?

If they were able to wrap the API directly, this is relatively easy to implement but they have to do this within Claude Code which is based on giving a prompt/hiding API access. This is obvious if you think carefully about what Claude Code is, what requests it is sending to the API, etc.

adastra22 · 2026-01-26T07:23:13 1769412193

What does OAuth have to do with any of this? I think you are deeply confused.

adastra22 · 2026-01-25T00:39:33 1769301573

That’s not what this subthread is about. They’re talking about the subagent within Claude Code itself.

Btw, you can use the Claude Agent SDK (the renamed Claude Code SDK) with a subscription. I can tell you it works out of the box, and AFAIK it is not a ToS violation.

skippyboxedhero · 2026-01-25T17:48:36 1769363316

Yes, you can build the feature in the OP with the SDK; I have done this. Works well...but this is something completely different to agents.

Subagents and the auth implementation are linked because Anthropic's initial strategy was to have a prompt-based interaction which, because of the progress in model performance, has ended up being limiting as users want to run things without prompting. This is why they developed Claude Code Web (this product is more similar to what this feature will do than subagents are; subagents only look similar if you have a very shallow understanding. The purpose of this change is to abstract away human interaction. I assume that will use subagents, but the context/prompt management is quite different).

mmcclure · 2026-01-25T04:29:22 1769315362

Oh really? I was looking at the Agent SDK for an idea and the docs seemed to imply that wasn't the case.

    Unless previously approved, we do not allow third party developers to offer Claude.ai login or rate limits for their products, including agents built on the Claude Agent SDK. Please use the API key authentication methods described in this document instead.

I didn't dig deeper, but I'd pick it back up for a little personal project if I could just use my current subscription. Does it just use your local CC session out of the box?

adastra22 · 2026-01-25T15:52:57 1769356377

You can’t resell - that’s the third party language. You can build and use for your own purposes. And yes it just picks up your local sessions out of the box.

TeMPOraL · 2026-01-24T22:59:35 1769295575

> If they gave people API keys then no-one buys their ludicrously priced API product

The main driver for those subscriptions is that their monthly cost with Opus 3.7 and up pays itself back in a couple of hours of basic CC use, relative to API prices.

blibble · 2026-01-24T23:03:22 1769295802

can't you just rip the oauth client secret out of the code?

skippyboxedhero · 2026-01-25T17:49:53 1769363393

You can. This is how Opencode worked, but they are clamping down on that approach.

As someone else has mentioned, you can actually use SDK for programmatic access. But that happens within the CC wrapper so it isn't a true API experience i.e. it has CC tools.

stingraycharles · 2026-01-24T17:40:26 1769276426

It’s even less of a feature, Claude Code already has subagents; this new feature just ensures Claude Code actually uses this for implementation.

imho the plans of Claude Code are not detailed enough to pull this off; they’re trying to do it to preserve context, but the level of detail in the plans is not nearly enough for it to be reliable.

ctoth · 2026-01-24T20:18:21 1769285901

I agree with this. Any time I make a plan I have to go back and fill it in, fill it in, what did we miss, tdd, yada yada. And yes, I have all this stuff in CLAUDE.md.

You start to get a sense for what size plan (in kilobytes) corresponds to what level of effort. Verification adds effort, and there's a sort of ... Rocket equation? in that the more infrastructure you put in to handle like ... the logistics of the plan, the less you have for actual plan content, which puts a cap on the size of an actionable plan. If you can hit the sweet spot though... GTFO.

I also like to iterate several times in plan mode with Claude before just handing the whole plan to Codex to melt with a superlaser. Claude is a lot more ... fun/personable to work with? Codex is a force of nature.

Another important thing: now that launching plans clears context, it's good to get out of planning mode early, hit an underspecified bit, go back into planning mode and say something like "As you can see the plan was underspecified, what will the next agent actually need to succeed?" and iterate that way before we actually start making moves. This is made possible by lots of explicit instructions in CLAUDE.md for Claude to tell me what it's planning/thinking before it acts. Suppressing the toolcall reflex and getting actual thought out helps so much.

tobyjsullivan · 2026-01-24T22:01:35 1769292095

It’s moving fast. Just today I noticed Claude Code now ends plans with a reference to the entire prior conversation (as a .jsonl file on disk) with instructions to check that for more details.

Not sure how well it’s working though (my agents haven’t used it yet)

dceddia · 2026-01-24T18:28:19 1769279299

Interesting about the level of detail. I’ve noticed that myself but I haven’t done much to address it yet.

I can imagine some ideas (ask it for more detail, ask it to make a smaller plan and add detail to that) but I’m curious if you have any experience improving those plans.

stingraycharles · 2026-01-24T19:02:18 1769281338

I’m trying to solve this myself by implementing a whole planner workflow at https://github.com/solatis/claude-config

Effectively it tries to resolve all ambiguities by making all decisions explicit: if a decision cannot be traced back to documentation or some other source, the user is asked.

It also tries to capture all “invisible knowledge” by documenting everything, so that all these decisions and business context are captured in the codebase again.

Which - in theory - should make long term coding using LLMs more sane.

The downside is that it takes 30min - 60min to write a plan, but it’s much less likely to make silly choices.

ElFitz · 2026-01-25T08:55:10 1769331310

Have you tried the compound engineering plugin? [^1]

My workflow with it is usually brainstorm -> lfg (planning) -> clear context -> lfg (giving it the produced plan to work on) -> compound if it didn’t on its own.

[^1]: https://github.com/EveryInc/compound-engineering-plugin

stingraycharles · 2026-01-25T10:50:51 1769338251

That’s super interesting, I’ll take a look to see if I can learn something from it, as I’m not familiar with the concept of compound engineering.

Seems like a lot of it aligns with what I’m doing, though.

throwup238 · 2026-01-26T04:55:24 1769403324

> The downside is that it takes 30min - 60min to write a plan

Oof you weren't kidding. I've got your skills running on a particularly difficult problem and it's been running for over three hours (I keep telling it to increase the number of reviews until it's satisfied).

stingraycharles · 2026-01-26T08:30:36 1769416236

Yeah I’m working on some improvements in this area, should make things faster. But yeah I’ve frequently had 1h-2h planning sessions as well, depending upon the complexity of the task.

colelyman · 2026-01-24T19:44:22 1769283862

I have had good success with the plans generated by https://github.com/obra/superpowers and I also really like the Socratic method it uses to create the plans.

vardalab · 2026-01-24T21:07:12 1769288832

I iterate around issues. I have a skill to launch a new tmux window for a worktree with Claude in one pane and Codex in another, with instructions on which issue to work on. Claude has instructions to create a plan, while Codex has instructions to understand the background information necessary for this issue to be worked on. By the time they're both done, I can feed Claude's plan into Codex, and Codex is ready to analyze it. Then Codex feeds the plan back to Claude, and they ping pong like that a couple of times. After several iterations, there's enough refinement that things usually work.

Then Claude clears context and executes the plan. Codex reviews the commit, and it still has all the original context, so it knows what we have been planning and what the research was about the infrastructure. It does a really good job reviewing. Again, they ping pong back and forth a couple of times, and the end product is pretty decent.

Codex's strength is that it really goes in-depth. I usually do this at a high reasoning effort. But Codex has zero EQ or communication skills, so it works really well as a pedantic reviewer. Claude is much more pleasant to interact with. There's just no comparison. That's why I like planning with Claude much more: we can iterate.

I am just a hobbyist though. I do this to run my Ansible/Terraform infrastructure for a good-size homelab with 10 hosts. So we actually touch real hardware a lot and there's always some gotchas to deal with. But together, this is a pretty fun way to work. I like automating stuff, so it really scratches that itch.
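The launch step could look something like the sketch below. It is not the actual skill, just a guess at its shape: the window and branch naming, the prompts, and the assumption that both `claude` and `codex` accept an initial prompt as a positional argument are all mine. It also assumes tmux's default pane numbering (panes 0 and 1) and that it runs from inside an existing tmux session. The function only builds the commands, so you can print them as a dry run before executing anything.

```python
import shlex
from typing import List


def session_commands(issue: int, repo_dir: str) -> List[List[str]]:
    """Build the git/tmux commands for one issue (hypothetical naming scheme)."""
    window = f"issue-{issue}"
    worktree = f"{repo_dir}-{window}"
    plan_prompt = f"Create a plan for issue #{issue}. Do not change code yet."
    research_prompt = f"Read the background needed for issue #{issue}, then wait for a plan to review."
    return [
        # New branch and worktree so the agents work in an isolated checkout.
        ["git", "-C", repo_dir, "worktree", "add", "-b", window, worktree],
        # One tmux window per issue, split into two side-by-side panes.
        ["tmux", "new-window", "-n", window, "-c", worktree],
        ["tmux", "split-window", "-h", "-t", window, "-c", worktree],
        # Type the agent command into each pane and press Enter.
        ["tmux", "send-keys", "-t", f"{window}.0", "claude " + shlex.quote(plan_prompt), "Enter"],
        ["tmux", "send-keys", "-t", f"{window}.1", "codex " + shlex.quote(research_prompt), "Enter"],
    ]


def dry_run(issue: int, repo_dir: str) -> str:
    """Render the commands as shell lines for review."""
    return "\n".join(shlex.join(cmd) for cmd in session_commands(issue, repo_dir))
```

To actually launch, each list can be passed to `subprocess.run(cmd, check=True)` in order.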

chickensong · 2026-01-26T04:57:03 1769403423

> the plans of Claude Code are not detailed enough

You can make a template and tell Claude to make a plan that follows the template.
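If you go the template route, it is cheap to check mechanically that a plan actually follows it before handing it to an implementation agent. A minimal sketch, assuming plans are written in Markdown; the section names below are made-up examples, not anything Claude Code prescribes.

```python
import re
from typing import List

# Hypothetical template: the headings every plan must contain.
REQUIRED_SECTIONS = ["Goal", "Files to change", "Steps", "Verification", "Open questions"]


def missing_sections(plan_markdown: str, required: List[str] = REQUIRED_SECTIONS) -> List[str]:
    """Return the required headings that do not appear in the plan."""
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(r"^#{1,6}[ \t]+(.+?)[ \t]*$", plan_markdown, flags=re.MULTILINE)
    }
    return [section for section in required if section.lower() not in headings]
```

An empty result means every heading is present. It says nothing about whether the content under each heading is detailed enough, which is the harder problem.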

AffableSpatula · 2026-01-24T16:41:58 1769272918

Claude already had subagents. This is a new mode for the main agent to be in (bespoke context oriented to delegation), combined with a team-oriented task system and a mailbox system for subagents to communicate with each other. All integrated into the harness in a way that plugins can't achieve.

theturtletalks · 2026-01-24T19:15:52 1769282152

Wow there goes a lot of harnesses out the window. The main limitation of subagents was they couldn’t communicate back and forth with the main agent. How do we invoke swarm mode in Claude Code?

apsurd · 2026-01-24T18:42:12 1769280132

OT: Your visual on "stacked PRs" instantly made me understand what a stacked PR is. Thank you!

I had read about them before but for whatever reason it never clicked.

Turns out I already work like this, but I use commits as "PRs in the stack" and I constantly try to keep them up to date and ordered by rebasing, which is a pain.

Given my new insight with the way you displayed it, I had a chat with chatGPT and feel good about giving it a try:

    1. 2-3 branches based on a main feature branch
    2. can rebase base branch with same frequency, just don't overdo it, conflicts should be base-isolated.
    3. You're doing it wrong if conflicts cascade deeply and often
    4. Yes merge order matters, but tools can help and generally the isolation is the important piece
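To make the ordering point concrete, here is a naive sketch of "restacking": given which branch each branch is based on, rebase parents before children. The branch names are made up, and this is deliberately simplistic; dedicated stacking tools deal with the harder cases, such as a parent whose commits were rewritten.

```python
from typing import Dict, List


def restack_order(parents: Dict[str, str], trunk: str = "main") -> List[str]:
    """Order branches so each one is rebased only after its parent."""

    def depth(branch: str) -> int:
        steps = 0
        while branch != trunk:
            branch = parents[branch]
            steps += 1
            if steps > len(parents):
                raise ValueError("cycle in branch parents")
        return steps

    return sorted(parents, key=lambda b: (depth(b), b))


def restack_commands(parents: Dict[str, str], trunk: str = "main") -> List[List[str]]:
    """One `git rebase <parent> <branch>` per branch, parents first."""
    return [["git", "rebase", parents[b], b] for b in restack_order(parents, trunk)]
```

With a three-branch stack, the base branch is rebased onto `main` first and the top of the stack last, so a conflict at the bottom is resolved once instead of in every branch above it.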

abhinavg · 2026-01-24T22:26:58 1769293618

If you’re interested in exploring tooling around stacked PRs, I wrote git-spice (https://abhinav.github.io/git-spice/) a while ago. It’s free and open-source, no strings attached.

Griffinsauce · 2026-01-24T19:16:55 1769282215

If you're rebasing a lot, definitely set up rerere (reuse recorded resolution) - it improves things enormously.

Do make sure you know how to reset the cache, in case you did a bad conflict resolution because it will keep biting you. Besides that caveat it's a must.
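For reference, the relevant commands, wrapped as a small helper so they can be scripted. `rerere.enabled` and `git rerere forget` are standard git; the helper names are just for illustration. Note that `git rerere forget <path>` drops the recorded resolution for the conflict you are currently in, while the recordings themselves live under `.git/rr-cache` in the repository.

```python
import subprocess
from typing import List

# Turn rerere on for every repository of the current user.
ENABLE_RERERE = ["git", "config", "--global", "rerere.enabled", "true"]


def forget_command(path: str) -> List[str]:
    """Command that discards the recorded resolution for one conflicted path."""
    return ["git", "rerere", "forget", path]


def run(cmd: List[str], cwd: str = ".") -> str:
    """Execute one of the commands above inside a repository and return its output."""
    return subprocess.run(cmd, cwd=cwd, capture_output=True, text=True, check=True).stdout
```

Typical use is `run(ENABLE_RERERE)` once, and `run(forget_command("src/app.py"), cwd=repo)` in the middle of a rebase when you notice rerere has replayed a bad resolution.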

byproxy · 2026-01-24T18:54:11 1769280851

Isn’t this just “Gitflow”?

https://www.atlassian.com/git/tutorials/comparing-workflows/...

apsurd · 2026-01-24T19:13:12 1769281992

After a quick read it seems like gitflow is intended to model a release cycle. It uses branches to coordinate and log releases.

Stacking is meant to make development of non-trivial features more manageable, and to get them into main more safely and quickly.

it's specific to each developer's workflow and wouldn't necessarily produce artifacts once merged into main (as gitflow seems to intentionally have a stance on)

withinboredom · 2026-01-24T19:08:43 1769281723

Please don’t use git-flow. Every time I see it, it looks like an over-engineer’s wet dream.

seff · 2026-01-24T19:40:39 1769283639

Can you say more as to why? The concept is not complex and in our situation at least provides a lot of benefits.

jdxcode · 2026-01-24T20:35:33 1769286933

I think the guy that created it has even stated he thinks it's a bad idea

withinboredom · 2026-01-24T19:50:10 1769284210

Literally the reason for git’s existence is to make merging diverging histories less complicated. Adding back the complexity misses the point entirely.

mkw5053 · 2026-01-24T17:45:20 1769276720

Yeah, since they introduced (possibly async) subagents, I've had my main claude instance act as a manager overseeing implementation agents, keeping its context clean, and ensuring everything goes to plan in the highest quality way.

AffableSpatula · 2026-01-24T17:47:11 1769276831

yep this is exactly how I use the main agent too, I explicitly instruct it to only ever use background async subagents. Not enough people understand that the claude code harness is event driven now and will wake up whenever these subagent completion events happen.
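As an analogy for the event-driven pattern (this is plain asyncio, not how the Claude Code harness is implemented): the manager starts every job in the background and is only woken when one of them completes. The `subagent` coroutine is a stand-in where real work would go.

```python
import asyncio
from typing import Dict, List


async def subagent(name: str, seconds: float) -> str:
    """Stand-in for a background subagent; the sleep represents its work."""
    await asyncio.sleep(seconds)
    return f"{name} finished"


async def manager(jobs: Dict[str, float]) -> List[str]:
    """Start all jobs at once, then handle results in completion order."""
    tasks = [asyncio.create_task(subagent(name, secs)) for name, secs in jobs.items()]
    reports = []
    for finished in asyncio.as_completed(tasks):
        # The manager does nothing between completions; each result wakes it up.
        reports.append(await finished)
    return reports
```

Calling `asyncio.run(manager({"slow": 0.2, "fast": 0.01}))` returns the fast job's report first, because results are handled as they arrive rather than in launch order.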

bradgessler · 2026-01-24T19:31:18 1769283078

Any recommendations on sandboxing agents? Last time I asked folks recommended docker.

ahstilde · 2026-01-25T02:53:08 1769309588

https://github.com/finbarr/yolobox/

skybrian · 2026-01-24T19:40:26 1769283626

I like running remotely using exe.dev with SyncThing to sync files to my laptop.

I use Shelley (their web-based agent) but they have Claude Code installed too.