We’ve been experimenting with a similar idea but in a browser-native environment — running real containers + a WebSocket terminal + multi-agent workflows. GPT-5.1 (Codex Max especially) seems to handle multi-step refactors a lot more cleanly, and chaining it through CLI agents has been surprisingly reliable.
Curious if anyone else is trying agent orchestration beyond the editor itself?