Hacker News | bhu8's comments

Feels like an issue in their caching: first, non-cached turns are sent properly, but everything from the second turn onward fails.

I am working on (yet another) local app for managing multiple Claude/Codex/Gemini sessions in a game-like environment: https://getviberia.com/

I think something is broken though. I got 20 nematodes in a row. That's around a 1% probability.


Great work but why not use C# instead of GDScript?

LLMs are really good at C# (and tscn files for some reason), so that solves the "LLMs suck at GDScript" problem. Also, C# can be cheaper in terms of token usage (even accounting for not having to load the additional APIs): one agent writes the interfaces, another one fills in the details.

Saying this because I really enjoyed vibecoding a Godot game in C# - and it was REALLY painful to vibecode with GDScript.


Good point, I haven't tried C# yet and will after this comment.

The original reasoning: GDScript is the default path in Godot, nearly all docs and community examples use it, and the engine integration is tighter (signals, exports, scene tree). C# still has some gaps — no web export, no GDExtension bindings.

But you're right that from the LLM side, C# flips the core problem. Strong training data, static typing for better compiler feedback, interfaces for clean architecture. The context window savings from not loading a custom language spec could be significant.

Main thing I'd want to test is whether headless scene building — the core of the pipeline — works as smoothly in C#. Going to experiment with this.


Don't all of these advantages also apply to humans? :)

This always puzzled me about Godot. I like Python as much as the next guy (afaik GDScript is quite a similar language), but for anything with a lot of moving parts, wouldn't you prefer static typing? And even simple games have a lot of moving parts!


Godot exists to be a playground for game dev aspirants, not as an engine for shipping serious games. The Community (tm) likes GDScript because it's "easier" to "get started". They are completely unconcerned with it being harder to finish.


Slay the Spire 2 was shipped using Godot. I've found it easier to develop on than Unity. This is an outdated understanding imo.


I love Slay the Spire 2, it is a very good game, but it honestly doesn't look or feel technically impressive.


Not all serious games need to be technically impressive.


I am not convinced that that matters. Great games have been made with Godot (Cruelty Squad) and GameMaker (Sexy Hiking), or with no engine at all (Minecraft, Cave Story).


Great games have been made with probably any tool you can think of. That doesn't mean the tool is good, or that you should choose to start making a serious game with it.


I do not agree with your unsupported claim. For example, I would bet no good games have been programmed in Haskell. As far as I am aware, no great games have been made with the Unity or Unreal engines.


Oh, I thought we were having a genuine discussion. My bad.


I was substantiating my claims and you were not.


GDScript has static type hints now; it's still a bit basic, but it's continually getting better.


Yeah, people groan about GDScript, but the performance-critical code in the engine is written in C++. Since they added static typing, GDScript is perfectly adequate as a scripting language.


For the longest time, the answer to this was that features would randomly not be supported in C#.

But it's gotten much better.


As someone who's been working on a game with Claude Code in a more human-in-the-loop, iterative fashion, I have to say that OP's claim that "LLMs barely know GDScript" does not match my current experience at all, even though it seems to have matched yours. Maybe it was true a while ago in both cases; how long ago was your "vibecode" attempt? I've gotten completely fine GDScript, and even perfectly functional (if placeholder-quality) TSCN files, from Opus 4.5, 4.6, and Sonnet 4.6, with very little issue and no special tricks: just a CLAUDE.md file, the project itself, and going through plan mode before each change. I did start from a playable project with a fair amount of hand-written scaffolding already in place, and I have no idea if that makes a difference. Every once in a while there's some confusion and I get something appropriate for Godot 3 instead of Godot 4, but never Python, despite the similarities of the language.


I agree, Claude does a great job of synthesizing Godot docs, and even of writing unit and integration tests with GUT. I have not tried any e2e testing but I’m not convinced that you couldn’t get a good result there, too, depending on the kind of game.


Ah thanks, I see. This was 8-9 months ago.

I was starting from scratch and mainly relying on Opus/Sonnet 4.

I also kept running into the Godot 3 vs 4 issue before adding specific guidance about this to CLAUDE.md.


I don't think the web export works with C# currently.

I'd be happy to find out I'm wrong.


I think it worked in the previous version.

The way Unity solves this is with a proprietary compiler (IL2CPP): they translate the C# into C++, and then compile that into WebAssembly.

Whereas others (incl. Godot) need to ship the .NET runtime in the browser. (A VM in a VM.)

It makes me sad that Unity doesn't open source that. That would be amazing.


C# to Wasm - should be two weeks of LLM work :)


I have been automating Unity headlessly via C# editor scripts written by GPT5.4. The competence level is amazing. It can effectively do everything the GUI can do, and it gets the script right on the first try ~80% of the time. I've never seen it fail given enough retries w/ feedback from stdio.


This is amazing. I checked some games and the blunders make me think that the LLMs are not really great at forecasting what happens if they play X on Y.

Can you actually introduce that into the decision making? That is, you would:

1. Have the LLM come up with N many potential actions

2. Run XMage in parallel and provide the outcome for each different action

3. Revert XMage to the original state

4. Provide the LLM with the different outcomes and have it choose the action/outcome pair rather than just the action

This would actually help them analyze the counterfactual outcomes more effectively and should prevent 99% of the blunders

If you happen to be token rich, you could even do this in an MCTS manner and have them think really deeply.
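The loop above can be sketched as follows. All three callbacks are hypothetical stand-ins, not anything XMage actually exposes: `propose_actions` would ask the LLM for candidates, `simulate` would run an engine instance forward from a copied game state, and `choose` would ask the LLM to pick among action/outcome pairs.

```python
def pick_action(state, propose_actions, simulate, choose, n=4):
    """Steps 1-4 above: propose n actions, simulate each on a copy of
    the state (so the original never needs reverting), then let the
    model choose among (action, outcome) pairs instead of bare actions."""
    candidates = propose_actions(state, n)
    outcomes = [simulate(state.copy(), action) for action in candidates]
    return choose(state, list(zip(candidates, outcomes)))
```

The MCTS variant mentioned above would recurse on the simulated outcomes instead of stopping after one ply.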


Very nice! I am working on something very similar at www.viberia.net

My take was that it’s easier to trace who is doing what (and what the agent hierarchy looks like) when agents’ locations are fixed.


IMHO, jumping from Level 2 to Level 5 is a matter of:

- Better structured codebases - we need hierarchical codebases with minimal depth, maximal orthogonality and reasonable width. Think microservices.

- Better documentation - most code documentations are not built to handle updates. We need a proper graph structure with few sources of truth that get propagated downstream. Again, some optimal sort of hierarchy is crucial here.

At this point, I really don't think that we necessarily need better agents.

Set up your codebase optimally, spin up 5-10 instances of gpt-5-codex-high for each issue/feature/refactor (pick the best according to some criteria), and your life will go smoothly.
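A minimal sketch of that best-of-N idea, assuming a `run_agent` callable that drives one agent instance on the issue and a `score` function encoding the "some criteria" part (both names are mine, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(issue, run_agent, score, n=5):
    """Launch n independent agent attempts at the same issue and keep
    the attempt that scores highest (e.g. tests passing, diff size)."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        attempts = list(pool.map(run_agent, [issue] * n))
    return max(attempts, key=score)
```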


> Think microservices.

Microservices should already be a last resort, for when you've either: a) hit technical scale that necessitates them, or b) hit organizational complexity that necessitates them.

Opting to introduce them sooner will almost certainly increase the complexity of your codebase prematurely (already a hallmark of LLM development).

> Better documentation

If this means reasoning as to why decisions are made then yes. If this means explaining the code then no - code is the best documentation. English is nowhere near as good at describing how to interface with computers.

Given how long gpt-5-codex has been out, there's no way you've followed these practices for long enough to consider them definitive (2 years at the least, likely much longer).


> Opting to introduce them sooner will almost certainly increase the complexity of your codebase prematurely

Agreed, but how else are you going to scale mostly AI written code? Relying mostly on AI agents gives you that organizational complexity.

> Given how long gpt codex 5 has been out, there’s no way you’ve followed these practices for a reasonable enough time to consider them definitive

Yeah, fair. Codex has been out for less than 2 weeks at this point. I was relying on gpt-5 in August and opus before that.


I understand why you went with microservices; people do that even when not using LLMs, because it looks more organized.

But in my experience a microservice architecture is orders of magnitude more complex to build and understand than a monolith.

If you, with the help of an LLM, struggle to keep a monolith organized, I am positive you will find it even harder to build microservices.

Good luck in your journey, I hope you learn a ton!


Noted. Thanks!


Can you show something you have built with that workflow?


Not yet unfortunately, but I'm in the process of building one.

This was my journey: I vibe-coded an Electron app and ended up with a terrible monolithic architecture and mostly badly written code. Then I took the app's architecture docs, spent a lot of my time shouting "MAKE THIS ARCHITECTURE MORE ORTHOGONAL, SOLID, KISS, DRY" at gpt-5-pro, and ended up with a 1,500+ line monster doc.

I'm now turning this into a Tauri app and following the new architecture to a T. I would say that it has a pretty clean structure with multiple microservices.

Now, new features are gated based on the architecture doc, so I'm always maintaining a single source of truth that serves as the main context for any new discussions/features. Also, each microservice has its own README file(s) which are updated with each code change.


I vibe coded an invoice generator by first vibe coding a "template" command line tool as a bash script that substitutes {{words}} in a LibreOffice Writer document (those are just zipped XML files, so you can unpack them to a temp directory and substitute raw text without XML awareness), and in the end it calls LibreOffice's CLI to convert it to PDF. I also asked the AI to generate a documentation text file, so that the next AI conversation could use the command as a black box.

The vibe coded main invoice generator script then does the calendar calculations to figure out the pay cycle and examines existing invoices in the invoice directory to determine the next invoice number (the invoice number is in the file name, so it doesn't need to open the files). When it is done with the calculations, it uses the template command to generate the final invoice.
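The template trick is roughly this (a sketch under my own assumptions, not the commenter's actual script; the `fill_template`/`to_pdf` names are mine, and the exact `libreoffice` binary name and flags vary by install):

```python
import subprocess
import zipfile

def fill_template(template, output, values):
    """Rewrite {{key}} markers in content.xml, copying everything else.
    .odt files are zip archives, so plain-text substitution works
    without any XML parsing."""
    with zipfile.ZipFile(template) as src, \
         zipfile.ZipFile(output, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                for key, value in values.items():
                    text = text.replace("{{%s}}" % key, value)
                data = text.encode("utf-8")
            dst.writestr(item, data)

def to_pdf(path, outdir="."):
    # Headless conversion via LibreOffice's CLI, as described above.
    subprocess.run(["libreoffice", "--headless", "--convert-to", "pdf",
                    "--outdir", outdir, path], check=True)
```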

This is a very small example, but I do think that clearly defined modules/microservices/libraries are a good way to only put the relevant work context into the limited context window.

It also happens to be more human-friendly, I think?


I "vibe coded" a Gateway/Proxy server that did a lot of request enrichment and proprietary authz stuff that was previously in AWS services. The goal was to save money by having a couple high-performance servers instead of relying on cloud-native stuff.

I put "vibe coded" is in quotes because the code was heavily reviewed after the process, I helped when the agent got stuck (I know pedants will complain but ), and this was definitely not my first rodeo in this domain and I just wanted to see how far an agent could go.

In the end it needed a few modifications before going into prod, but to be fair, it was actually fine!

One thing I vibe coded 100%, and barely looked at the code until the end, was a macOS menubar app that shows some company stats. I wanted it in Swift but WITHOUT Xcode. It was super helpful in that regard.


Of course not.


I've been using Claude on two codebases: one with good layering and clean examples, the other not so much. I get better output from the LLM with good context, clean examples, and documentation. Not surprising that clarity in code benefits both humans and machines.


I think there will be a couple of benefits of using agents soon. It should result in a more consistent codebase, which will make patterns easier to see and work with, and also mean less reinventing the wheel. Migrations should be way faster, both within and across teams, so a lot less struggling with maintaining two ways of doing something for years, which again leads to simpler and more consistent code. Finally, the increased speed should lead to more serializability of feature additions, so fewer problems trying to coordinate changes happening in parallel: conflicts, redundancies, etc.

I imagine over time we'll restructure the way we work to take advantage of these opportunities and get a self-reinforcing productivity boost that makes things much simpler, though agents aren't quite capable enough for that breakthrough yet.


I'm Schmidhuber-neutral, but the word on the street is that he is a major asshole and sometimes impossible to work with. His research might be more solid than the Turing award winners', but his personality truly held him back.


I have been thinking about the exact same problem for a while and was literally hours away from publishing a blogpost on the subject.

+100 on the footnote:

> agents or workflows?

Workflows. Workflows, all the way.

The agents can start using these workflows once they are actually ready to execute stuff with high precision. And by then we will have figured out how to create effective, accurate and easily diagnosable workflows, so people will stop complaining about "I want to know what's going on inside the black box".
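To make the distinction concrete: a workflow in this sense is just a fixed, inspectable sequence of steps, any of which may wrap a model call. A minimal sketch, with the step functions and logging shape as my own placeholders:

```python
def run_workflow(steps, payload, log=print):
    """Run named steps in a fixed order, logging every intermediate
    result - the opposite of an opaque agent loop."""
    for name, step in steps:
        payload = step(payload)
        log("%s -> %r" % (name, payload))
    return payload
```

A `steps` list might look like `[("draft", draft_with_llm), ("lint", run_linter), ("review", review_with_llm)]`: the model fills in the steps, but the control flow stays deterministic and diagnosable.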


I've been building workflows with "AI" capability inserted where appropriate since 2016. Mostly customer service chatbots.

99.9% of real world enterprise AI use cases today are for workflows not agents.

However, "agents" are being pushed because the industry needs a next big thing to keep the investment funding flowing in.

The problem is that even the best reasoning models available today don't have the actual reasoning and planning capability needed to build truly autonomous agents. They might in a year. Or they might not.


Agreed, I started crafting workflows last week. Still not impressed with how well the current crop of models follows instructions.

And are there any guidelines on how to manage workflows for a project or set of projects? I’m just keeping them in plain text and including them in conversations ad hoc.


I'm sold. How do I play?


Wait 3 months for me to finish.

So far it's basically just a Django server. You're responsible for self-hosting (although I imagine I'll put up a demo server), and you can define your own cards.

You can play the game by literally just using curl calls if you want to be hardcore about it.

I *might* make a nice closed source game client one day with a bunch of cool effects, but that's not my focus right now.

