Hacker News: thejash's comments

(author here) I strongly agree that these systems start to break down once the codebase gets larger (we've seen that with our own projects).

But our reaction to it has been to say "ok, well the best practice in software engineering is to make small, well-isolated components anyway, so what if we did that?"

We've been trying to really break things apart into smaller pieces (and that's even evident in mngr, where much of the code is split out into separate plugins), and have been having a ton of success with it.

I realize that that might not be an option for more brownfield / existing / legacy projects, but when making something new, I've really been enjoying this way of building things.


(author here) I agree that it's super important to understand what software actually does--that's part of the whole reason we made mngr in the first place!

I believe we can use these types of tools to make software more understandable, and mngr is an example of how to do that.

In our case study, we're using AI to increase our test coverage, and if you look at it, I would argue that we're making the software more understandable: instead of just having hundreds of tests, we have a document that describes how the software is supposed to work, the tests are linked to that document, and they're checked to ensure that they conform to it.

That means that anyone--not just the author of the software--is now able to read through the high level tutorial description of how the commands work in order to understand what the program should do!

And as for the tests themselves, we've been able to make nice testing infrastructure--like the transcripts and recordings that were highlighted in the post--to make it even easier for us to verify the behavior of the software.

We also have an incredibly detailed style guide and a set of tests and guidelines to ensure that the entire code base is consistent and high quality. You can drop into any of the code and pretty quickly understand what is happening. And if not, Claude will do an excellent job of describing how any given component works and how it relates to the others.

Finally, mngr itself is designed to be fully transparent while it runs--you can literally attach to the coding agent you're running and see exactly what is happening, and the program writes extensive logs for everything it does (feel free to open a PR if you'd like to see more!)

It's not perfect formal verification, but it does feel like we're making meaningful progress on making it easier to understand software--not harder.


Since Sculptor allows you to use custom docker containers (devcontainers), you can check out the other projects in there.

Then your primary project (from Sculptor's perspective) is simply whatever contains the devcontainer / Dockerfile that you want it to use (to pull in all of those repos).

It's still a little awkward though -- if you do this, be sure to set a custom system prompt explaining this setup to the underlying coding agent!
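As a rough sketch (the repo names and URLs here are made up, not from Sculptor's docs), the primary project can be little more than a Dockerfile that clones the sibling repos:

```shell
# Hypothetical layout: the "primary" project just holds the Dockerfile
# that pulls the other repos into the container for the agent to see.
mkdir -p primary-project
cat > primary-project/Dockerfile <<'EOF'
FROM ubuntu:24.04
RUN apt-get update && apt-get install -y git
# Clone the sibling projects (example URLs) into the workspace
RUN git clone https://example.com/org/repo-a.git /workspace/repo-a \
 && git clone https://example.com/org/repo-b.git /workspace/repo-b
EOF
```

Sculptor would then be pointed at `primary-project`, and the custom system prompt would explain that the real code lives under `/workspace/repo-a` and `/workspace/repo-b`.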

(I'm a founder of Imbue, the company behind Sculptor: https://imbue.com/sculptor/ )


It actually originally worked that way, and you can still mostly kinda use it that way, though CSRF protection makes it obnoxious: run the app once to find the command line that the backend Python process is started with (e.g. via `ps -Fe` on Linux), shut down the app, then run that process yourself. As long as you don't set the `ELECTRON_APP_SECRET` env var, CSRF will be disabled. Use `netstat -pant | grep -i listen` to figure out which port it is listening on.
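Those steps as a shell sketch (Linux; the process name pattern is a guess on my part):

```shell
# 1. With the app running, find the backend Python process's command line:
ps -Fe | grep -i '[s]culptor' || true
# 2. Quit the app, then re-run that command yourself. Leaving
#    ELECTRON_APP_SECRET unset disables the CSRF protection:
unset ELECTRON_APP_SECRET
# 3. Find the port the backend is listening on:
netstat -pant 2>/dev/null | grep -i listen || true
```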

Obviously not the most user friendly or usable, but we found that people often got pretty confused when this was a browser tab instead of a standalone app (it's easy to lose the tab, etc)


sculptor_backend seems to consume a lot of CPU and memory when just idling. Would you consider switching from Electron to Tauri so it uses the native WebView of the OS?


We have some docs here: https://github.com/imbue-ai/sculptor

If you have any specific questions that aren't covered there, please let us know in Discord!


Yup -- Electron app, TypeScript / React on the front end, Python on the backend


I haven't tried it, but it might work if you set the env var yourself (I think you can create a `.env` file at `~/.sculptor/.env` and it will be injected into the environment for the agent)

I'd give it a good 20% chance of working if you set the right environment variables in there :) Feel free to experiment in the "Terminal" tab as well, you can call claude directly from there to confirm if it works.
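An untested sketch of that `.env` approach (the variable name below is a placeholder -- substitute whichever env vars your setup actually needs):

```shell
# Create ~/.sculptor/.env; each VAR=value line should get injected
# into the agent's environment (MY_ENV_VAR is a made-up example).
mkdir -p "$HOME/.sculptor"
cat > "$HOME/.sculptor/.env" <<'EOF'
MY_ENV_VAR=some-value
EOF
cat "$HOME/.sculptor/.env"
```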


Definitely! I'm very excited to add support both for other coding agents and for as many language models (and providers) as we can.

Eventually what we want is for the whole thing to be open -- Sculptor, the coding agent, the underlying language model, etc.


Nope, not based on vibekit, but it looks like a cool project!

Our approach is a bit more custom and deeply integrated with the coding agents (e.g., we understand when the turn has finished and can snapshot the Docker container, allowing rollbacks, etc.)

We do also have a terminal though, so if you really wanted, I suppose you could run any text-based agent in there (although I've never tried that). Maybe we'll add better support for that as a feature someday :)


Replying to each piece:

> What if my application is not dockerized?

Then Claude runs in a container created from our default image, and any code it executes will run in that container as well.

> Can Claude Code execute tests by itself in the context of the container when not paired?

Yup! It can do whatever you tell it. The "pairing" is purely optional -- it's just there in case you want to directly edit the agent's code from your IDE.

> Do they all share the same database?

We support custom Docker containers, so you should be able to configure it however you want (e.g., to have separate databases, or to share a database, depending on what you want)

> Running full containerized applications with many versions of Postgres at the same time sounds very heavy for a dev laptop

Yeah -- it's not quite as bad if you run a single containerized Postgres and they each connect to a different database within that instance, but it's still a good point.
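Roughly what that lighter setup could look like (an illustrative compose fragment, not anything Sculptor ships):

```yaml
# One shared Postgres container; each agent task gets its own database
# inside it instead of its own Postgres instance.
services:
  shared-pg:
    image: postgres:16
    environment:
      POSTGRES_PASSWORD: dev
    ports:
      - "5432:5432"
# Each task then connects with its own DATABASE_URL, e.g.
#   postgres://postgres:dev@localhost:5432/task_a
```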

One of the features on our roadmap (that I'm very excited about) is the ability to use fully remote containers (which definitely gets rid of this "heaviness", though it can get a bit expensive if you're not careful)

> the feature I was most looking forward to is a mobile integration to check the agent status while away from keyboard, from my phone.

That's definitely on the roadmap!

