Hacker News | querez's comments

> The project owner strongly emphasize the no LLM dependency, in a world of AI slope this is so refreshing.

The project owner is talking about LLVM, a compiler toolkit, not an LLM.


They also said "hand written", implying that no LLMs whirred, slopped and moonwalked all over the project.


I mean... I'm one of the staunchest skeptics of LLMs as agents, but they're amazing as supercharged autocomplete and I don't see anything wrong with them in that role. There's a position between hand-written and slopped that's Pareto-optimal.


When I want to autocomplete my code with IP scraped from github with all licensing removed, nothing beats an LLM.


They can take away our jobs, but by god they cannot take away our autism!


It's actually quite easy to spot if LLMs were used or not.

A very small total number of commits, plus AI-like documentation and code comments.

But even if LLMs were used, the overall project does feel steered by a human, given some decisions like not using bloated build systems. If this actually works then that's great.


Since when is squashing noisy commits an AI activity instead of good manners?


The first commit was 17k lines. So this was either developed without using version control or at least without using this gh repo. Either way I have to say certain sections do feel like they would have been prime targets for having an LLM write them. You could do all of this by hand in 2026, but you wouldn't have to. In fact it would probably take forever to do this by hand as a single dev. But then again there are people who spend 2000 hours building a cpu in minecraft, so why not. The result speaks for itself.


> The first commit was 17k lines. So this was either developed without using version control or at least without using this gh repo.

Most of my free-time projects are developed either by me shooting the shit with code on disk for a couple of months until it's in a working state, at which point I make one first commit. Alternatively, I commit a bunch iteratively, but before making it public I fold it all into one commit, which becomes the init. 20K lines in the initial commit is not that uncommon; it depends a lot on the type of project, though.

I'm sure I'm not alone in these sorts of workflows.


Can you explain the philosophy behind this? Why do this; what is the advantage? Genuinely asking, as I'm not a programmer by profession. I commit often irrespective of the state of the code (it may not even compile). I understand a git commit as a snapshot. I don't expect each commit to be a pristine, working version.

Lots of people in this thread have argued for squashing, but I don't see why one would do that for a personal project. In large-scale open-source or corporate projects I can imagine they would like to have clean commit histories, but why for a personal project?


I do that because there's no point in anyone seeing the pre-release versions of my projects. They're a random mess that changed the architecture 3 times. Looking at that would not give anyone useful information about the actual app. It doesn't even give me any information. It's just useless noise, so it's less confusing if it's not public.


I don't care about anyone seeing or not seeing my unfinished hobby projects, I just immediately push to GitHub as another form of backup.


I don't care about backing up unfinished hobby projects, I just write/test until arbitrarily sharing, or if I'm completely honest, potentially abandoning it. I may not 'git init' for months, let alone make any commits or push to any remotes.

Reasoning: skip SCM 'cost' by not making commits I'd squash and ignore, anyway. The project lifetime and iteration loop are both short enough that I don't need history, bisection, or redundancy. Yet.

Point being... priorities vary. Not to make a judgement here, I just don't think the number of commits makes for a very good LLM purity test.


I literally keep this in my bash history so i can press up once and hit enter to commit: `git add *; git commit -m "changes"; git push origin main;`

I use git as backup and commit like every half an hour... but I make sure to give a proper commit message once a certain milestone has been reached.

I'm also with the author on squashing all these commits into a new commit and pushing it in one go as the init commit before going public.


You should push to a private working branch, and frequently. But when merging your changes to a central branch you should squash all the intermediate commits and provide just one commit with the asked-for change.

Enshrining "end of day" commits, "oh, that didn't work" mistakes, etc. is not only demoralizing for the developer(s), but it makes tracing changes all but impossible.


Yeah, I don't care about that for my tiny hobby projects which are used by no one. XD


> I don't expect each commit to be pristine, working version.

I guess this is the difference: I expect a commit to represent a somewhat working version, at least once it's upstream; locally it doesn't matter that much.

> Why do this, what is the advantage?

Cleaner, I suppose. It doesn't make sense to have 10 commits where 9 are broken and half-finished and the 10th is the only one that works; I'd rather have one larger commit.

> they would like to have clean commit histories but why for a personal project?

Not sure why it'd matter whether it's personal, open source, corporate or anything else; I want my git log clean so I can do `git log --oneline` and actually understand what I'm seeing. If there are 4-5 commits with "WIP almost working" between each proper commit, that's too much noise for me, personally.

But this isn't something I'm dictating that everyone follow; it's just my personal preference, after all.


> If there are 4-5 commits with "WIP almost working" between each proper commit, that's too much noise for me, personally.

Yep, no excuse for this, feature branches exist for this very reason. wip commits -> git rebase -i master -> profit
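To make that workflow concrete, here is a hedged, self-contained sketch. `git rebase -i master` opens an editor, so for a runnable demo this uses the non-interactive equivalent (`git reset --soft`), which produces the same squashed result; the repo, file name, and commit messages below are all made up:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"

# A toy repo with a master branch and a feature branch full of WIP commits.
git init -q
git config user.email dev@example.com
git config user.name dev
echo base > file.txt
git add file.txt && git commit -qm "initial"
git branch -M master
git checkout -qb feature
for msg in "WIP almost working" "WIP still broken" "WIP done"; do
  echo "$msg" >> file.txt
  git commit -qam "$msg"
done

# Interactive route: git rebase -i master, then mark commits as "squash".
# Non-interactive equivalent: move the branch pointer back and re-commit.
git reset -q --soft master
git commit -qm "Add the feature"

git log --oneline master..feature   # a single clean commit
```

The working tree is untouched by `reset --soft`; only the branch history collapses, which is exactly what an interactive rebase with all-but-one commit squashed would give you.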


Fair enough, thanks for the clarification. Personally, I think everything before a versioned release (even something like 0.1) can be messy. But from your point I can see that a cleaner history has advantages.

Further, I guess if the author is expecting contributions to the code in the future, it might be more "professional" for the commits to be only the ones that are relevant.

My own projects, I consider, are just for my own learning and understanding, so I never cared about this, but I do see the point now.

Regardless, I think it still remains a reasonable sign of someone doing one-shot agent-driven code generation.


One point I missed, which might be the most important, since I don't care about looking "professional," only about how useful and usable something is: if you have commits that leave the codebase in a broken state, then `git bisect` becomes essentially useless (or very cumbersome to use), which makes it tricky to track down regressions unless you'd like to go back to hunting them down manually.
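As a concrete illustration of the bisect point, here is a hedged, self-contained sketch: a toy repo where a regression lands in the 4th of five commits, and `git bisect run` finds it automatically because every commit is in a testable state (the file name and the "bug" marker are made up):

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

# Five commits; the regression (the line "bug") sneaks in at commit 4.
for i in 1 2 3 4 5; do
  echo "change $i" >> app.txt
  if [ "$i" -eq 4 ]; then echo "bug" >> app.txt; fi
  git add app.txt
  git commit -qm "commit $i"
  if [ "$i" -eq 4 ]; then bad_commit=$(git rev-parse HEAD); fi
done

# Bisect between the first (good) and current (bad) commit. The test
# command exits non-zero on broken revisions, so bisect does the search.
first=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$first" >/dev/null
git bisect run sh -c '! grep -q bug app.txt' >/dev/null 2>&1

# refs/bisect/bad now points at the first bad commit (commit 4).
git rev-parse refs/bisect/bad
```

If some of the intermediate commits didn't even compile, the test command would fail on them for the wrong reason and you'd be reduced to `git bisect skip` and guesswork.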

> Regardless, I think it still remains a reasonable sign of someone doing one-shot agent-driven code generation.

Yeah, why change your perception in the face of new evidence? :)


I see the point.

Regarding changing the perception, I think you did not understand the underlying distrust. I will try to use your examples.

It's a moderate-size project. There are two scenarios: the author used git/some VCS, or they did not. If they did not, that's quite weird, but maybe fine. If they did use git, then perhaps they squashed commits. But at a certain point those commits did exist. Let's assume all of them were pristine. It's 16K loc, so there must have been a decent number of these pristine commits that were squashed. What was the harm in leaving them?

So these commits must have been a mix of clean commits and broken commits. But we have seen this author likes to squash commits. Hmm, so why didn't they do it before, and only towards the end?

Yes, I have been introduced to a new perspective, but the world does not work on "if X, then not Y" principles. And this is a case where the two things being discussed are not mutually exclusive, as you are assuming. But I appreciate this conversation because I learnt the importance and advantages of keeping a clean commit history, and I will take that into account next time before reaching the conclusion that something is just another one-shot LLM-generated project. Nevertheless, I will always consider the latter a reasonable possibility.

I hope the nuance is clear.


> I guess this is the difference, I expect the commit to represent a somewhat working version,

On a solo project I do the opposite: I make sure there is an error where I stopped last. Typically I put in a call to the function that is needed next, so I get a linker error.

Six months later, when I go back to the project, that linker error tells me all I need to know about what comes next.


How does that work out if you want to use `git bisect` to find regressions or similar things?


I don't do bisects on each individual branch. I'll bisect on master instead and find the offending merge.

From that point bisect is not needed.


Or the first thousand commits were squashed. The first public commit tells nothing about how this was developed. If I were to publish something I had worked on alone for a long time, I would definitely squash all the early commits into a single one, just to be sure I don't accidentally leak something I don't want to leak.


> Leak what?

For example, when the commits were made. I would not like to share publicly, for the whole world, when I have worked on some project of mine. The commits themselves, or the commit messages, could also contain something you don't want to share.

At least I approach stuff differently depending if I am sharing it with whole world, with myself or with people who I trust.

Scrubbing git history when going from private to public should be seen as totally normal.


Hmm I can see that. Some people are like that. I sometimes swear in my commit messages.

For me it's quite funny to sometimes read my older commit messages. To each their own.

But my opinion on this is the same as with other things that have become tell-tale signs of AI-generated content: if you find it offensive that your work gets labelled as AI generated, it's better to change the approach that triggers the label.


Leak what?


If you have, for example, a personal API key or credentials that you are using for testing, you throw them in a config file or hard-code them at some point. Then you remove them. If you don't clean your git history, those secrets are now exposed.
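One way to do that cleanup, sketched under the assumption that you just want a single fresh public commit: create an orphan branch holding only the current tree, and publish that instead of the old history. The file names and the key below are made up:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q
git config user.email dev@example.com
git config user.name dev

# Private history: a secret gets committed, then removed.
echo "API_KEY=hunter2" > config.ini
git add config.ini && git commit -qm "add config with test key"
git rm -q config.ini
echo "int main(void){return 0;}" > main.c
git add main.c && git commit -qm "drop key, add code"

# The key is gone from the tree but still sits in the history:
git log -p | grep -c hunter2

# Start an orphan branch whose single commit has no parents, i.e. no history.
git checkout -q --orphan public
git add -A
git commit -qm "init"

git rev-list --count HEAD   # → 1: nothing to leak but the current tree
```

Note that the secret still lives in the old local branch and reflog until you delete them (or the whole private clone); you'd push only `public` to the new remote.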


Timestamps


Hello, not the poster, but I am BarraCUDA's author. I didn't use git for this. This is just one of a dozen compiler projects sitting in my folder, hence the one large initial commit. I was only posting on GitHub to get feedback from r/compilers and friends I knew.

The original test implementation, for instance, was written in OCaml before I landed on C being better for me.


A lot of people don't use git, and just chuck stuff in there willy-nilly when they want to share it.

People are too keen to say something was produced with an LLM if they feel it's something they cannot readily produce themselves.


I would be very concerned about someone working on a 16k loc codebase without a VCS.


Can you prove that this is what happened?


This type of project is the perfect project for an LLM: LLVM and CUDA work as harnesses, easy to compare against.


What do you mean by harnesses?


agentic ai harness for harness (ai)


Says the clawdbot


It's quite amusing the one time I did not make an anti-AI comment, I got called a clanker myself.

I'm glad the mood here is shifting towards the right side.


That's not how I remember it. Excitement for Python strongly predated ML and data science. I remember Python being the cool new language in 1997 when I was still in high school. Python 1.4 was already out, and O'Reilly had already published books on it. Python was known as this almost pseudocode-like language that used indentation for blocking. MIT was considering switching to it for its introductory classes. It was definitely already hyped back then -- which led to U Toronto picking it for its first ML projects that eventually everyone adopted when deep learning got started.


It was popular as a teaching language when it started out, alongside BASIC or Pascal. When the Web took off, it was one of a few languages that took off for scripting simple backends, alongside PHP, JS and Ruby.

But the real explosion happened with ML.


I agree with the person you're replying to. Python was definitely already a thing before ML. The way I remember it, it started taking off as a nice scripting language that was more user-friendly than Perl, the king of scripting languages at the time. The popularity gain accelerated with the proliferation of web frameworks, with Django trailing the then immensely popular Ruby on Rails, and Flask capturing the micro-framework enthusiast crowd. At the same time, the perceived ease of use and availability of numeric libraries established Python in scientific circles. By the time ML started breaking into the mainstream, Python was already one of the most popular programming languages.


As I remember it there was a time when Ruby and Python were the two big up-and-coming scripting languages while Perl was in decline.


That is correct. I "came of age" in 2010-11 during the Web 2.0 era of web apps beginning to eat the world. Ruby was just starting to come down from its peak as the new hotness, and Python thanks to Django + Google's support/advocacy was becoming the new Next Big Thing for the web and seemed like a no-brainer to learn as my main tool back at the time.

At the time Java was the mature but boring "enterprise" alternative to both, but also beginning its decline in web mindshare as Ruby/Python (then JavaScript/Node) were seen as solving much of the verbosity/complexity associated with Java.

There was a lot of worry that the Python 2->3 controversy would hurt its adoption, but that concern arose while Python was in a position of strength and growing fast.

Python's latter day positioning as the ML/scientific computing language of choice came as its position in the web was being gobbled up by JavaScript by the day and was by then well on the downswing for web, for a variety of technical/aesthetic reasons but also just simply no longer being "cool" vs. a Node/NoSQL stack.


Sure, but the point was that its use for web backends came years after it was invented, and it's an area in which it never ruled the roost. ML is where it has gained massive traction outside SW dev.


I think you're underestimating what AI can do in the coding space. It is an extreme paradigm shift. It's not like "we wrote C, but now we switch to C++, so now we think in objects and templates". It's closer to the shift from assembly to a higher-level language. Your goal is still the same, but suddenly you're working at a completely new level of abstraction, where a lot of the manual work that used to be your main concern is automated away.

If you've never tried Claude Code, give it a try. It's very easy to get into. And you'll soon see how powerful it is.


> But suddenly you're working in a completely newer level of abstraction where a lot of the manual work that used to be your main concern is suddenly automated away.

It's remarkable that people who think like this don't have the foresight to see that this technology is not a higher level of abstraction, but a replacement of human intellect. You may be working with it today, but whatever you're doing will eventually be done better by the same technology. This is just a transition period.

Assuming, of course, that the people producing these tools can actually deliver what they're selling, which is very much uncertain. It doesn't change their end goal, however. Nor the fact that working with this new "abstraction" is the most mind numbing activity a person can do.


I agree with this. At a higher level of abstraction, you’re still doing the fundamental problem solving. Low-level machine language or high-level Java, C++ or even Python, the fundamental algorithm design is still entirely done by the programmer. With LLMs that's no longer true: unless the user directs how each line, or at least each function, is written, you can often just describe the problem and the model solves it most of the way, if not entirely. Only for really long and complex tasks do the better models require hand-holding, and they are improving on that end rapidly.

That’s not a higher level of abstraction; it’s having someone do the work for you while you do less and less of the thinking as well. Someone might resist that urge and consistently guide the model closely, but that’s probably not what the collective range of SWEs who use these models are doing. The ease of using these models and our natural reluctance to take on mental stress make it likely that eventually everyone lets LLMs do most or all of the thinking for them. If things really go in that direction and spread, I foresee a collective dumbing-down of the general population.


> No Gemini was not "entirely trained on TPUs". They did hundreds of experiments on GPUs to get to the final training run done entirely on TPUs. GCP literally has millions of GPUs and you bet your ass that the gemini team has access to them and uses them daily.

You are wrong: Gemini was definitely trained entirely on TPUs. Your point that you need to count the failed experiments, too, is correct, but you seem to have misconceptions about how DeepMind operates and what infra it possesses. DeepMind (like nearly all Google-internal stuff) runs on Borg, an internal cloud system which is completely separate (and different) from GCP. DeepMind does not have access to any meaningful GCP resources, and Borg barely has any GPUs. At the time I left DeepMind, the amount of TPU compute available was probably 1000x to 10000x larger than the amount of GPU compute. You would never even think of seriously using GPUs for neural-net training: it's too limited (in terms of available compute), expensive (in terms of internal resource-allocation units), and frankly less well supported by internal tooling than TPU. Even for small, short experiments, you would always use TPUs.


Using TPUs has the same opportunity cost as GPUs. Just because they built something doesn't mean it's cheaper. If it were, they could rent it out cheaper and save the billions of dollars they're paying Nvidia.

A big segment of the market just uses GPU/TPU to train LLMs, so they don't exactly need flexibility if some tool is well supported.


I assume TPU TCO is significantly cheaper than GPU TCO. At the same time, I also assume that market demand for GPUs is higher than TPUs (external tooling is just more suited to GPU -- e.g. I'm not sure what the Pytorch-on-TPU story is these days, but I'd be astounded if it's on par with their GPU support). So moving all your internal teams to TPUs means that all the GPUs can be allocated to GCP.


It just doesn't make sense. If you make significantly more money renting TPUs, why not rent them out cheaper to shift customers over (and save the billions you are giving to Nvidia)? TPUs right now aren't significantly cheaper for external customers.

Again, I am talking about LLM training/inference, which, if I were to guess, is more than half of the current workload, and for which the switching cost is close to 0.


At least on blessed teams we used GPUs when we were allowed, else CPUs. TPUs were basically banned in YT since they were reserved for higher-priority purposes. Gemini was almost certainly trained with them, but I guarantee an ungodly amount of compute has gone into training neural nets with CPUs and GPUs.


I grew up on the Internet at that time, and it's certainly not how I type. So you might want to be more specific about which sites or subcultures you think this style is representative of?


I’m certainly no authority but i tend to write the same way for casual communication, came from the 90s era BBS days. It was (and still is) common on irc nets too. Autocorrect fixes up some of it, but sometimes i just have ideas i’m trying to dump out of my head and the shift key isn’t helping that go faster. Emails at work get more attention, but bullshittin with friends on the PC? No need.

I’ll code switch depending on the venue, on HN i mostly Serious Post so my post history might demonstrate more care for the language than somewhere i consider more casual.


The sentence is constructed weirdly, but it's meant to say that fever is "killing off unknown foreign bodies".


Pain being a way to let you know that something is damaged is close to true--close enough not to quibble with. But fever is not a way to let you know that foreign bodies are being killed off--that's his claim, and it's wrong.


> But fever is not a way to let you know that foreign bodies are being killed off--that's his claim, and it's wrong.

querez's point is that the sentence is meant to be parsed as:

> pain and fever which are the bodies way of <<letting you know something is damaged>> and <<killing off unknown foreign bodies>> respectively

So the claim is that fever is the body's way of killing off unknown foreign bodies, not the body's way of letting you know something is killing off unknown foreign bodies.


The fact that one can ferret out what someone perhaps meant to say from what they did say doesn't change the fact that what they did say was wrong, and can be rightly criticized for being wrong. Someone else responded to such a criticism by writing "He said that" -- but that is false.

And I would make the point that these two things are not analogous, so they shouldn't be mentioned together in any case. The response to the misstatement that started this subthread was "I think what they're saying here is that you're not just suppressing a symptom, you're suppressing a sickness fighting mechanism", which is exactly right, along with a subsequent statement "Fever isn't just a symptom. It's a defense mechanism. The idea is that use of antipyretic drugs may make the infection worse" and which the misstatement completely muddies. It's weird how some people who didn't even make the misleading misstatement are so desperately trying to defend it for no good reason, while others are rationally pointing out how the statement is off the mark. Even with the edit, the statement serves no purpose, mixing up symptoms like pain that guide us psychologically with autonomic immune system responses.

I will say no more about this dead horse.


> The fact that one can ferret out what someone perhaps meant to say from what they did say doesn't change the fact that what they did say was wrong, and can be rightly criticized for being wrong.

kruffalon said something ambiguous, with the intended interpretation (of that ambiguous part) being true and a secondary unintended interpretation being false - both interpretations grammatically valid.

querez tried to clarify the misinterpretation being made, but it looked as though their point was missed so I made it more explicit.

> It's weird how some people who didn't even make the misleading misstatement are so desperately trying to defend it for no good reason

I just saw some confusion so chimed in to try to clear it up. Only motive is that I feel like it's useful in a discussion for people to know what others mean, rather than arguing against a phantom point caused by miscommunication.


Correct!

I will edit my previous comment to make it more clear.

(Edit: Nvm, I'm not able to edit that comment any more, I've still not understood the edit-function here)


Arguably both Go and Python also have great stdlibs. The only advantage that JVM and .NET have is a default GUI package. Which is fair, but keeps getting less and less relevant as people rely more on web UIs.


.NET has Blazor for web UI.

I'm not asking you to judge whether you like it; I'm just saying that you can totally make a professional web UI within the .NET stdlib.


Respectfully disagree. The Python and Go stdlibs do not even play in the same league. I had to help someone with datetime¹ handling in Python a while back. The stdlib is so poor you have to reach for a third-party lib for even the most basic of tasks².

Don't take my word for it; take a dive. You wouldn't be the first to have to adjust their view.

For example, this section is just about the built-in web framework asp.net: https://learn.microsoft.com/en-us/aspnet/core

______

1. This might be a poor example, as .NET has NodaTime and the JVM has Joda-Time as third-party libs for when one has really strict needs. Still, the built-in DateTime constructs offer way more than what Python had to offer.

2. Don't get me started on the ORM side of things. I know you don't have to use one, but if you do, it had better do a great job. And I wouldn't bat an eye if the ORM weren't in the standard library, but boy was I disappointed in Python's ecosystem. EF Core comes batteries-included and is so much better it isn't fun anymore.


You helped someone with Python, and what evidence do you have justifying your claims about alleged Go stdlib narrowness?


Online documentation: https://pkg.go.dev/std

Let me know if I look at the wrong place.


"developers can prototype, fine-tune, and inference [AI models]"...

shouldn't it be infer?


It should be "run inference on" in my opinion, and would best be shortened to just "prototype, fine-tune, and run".

I'd argue that "inference" has taken on a somewhat distinct new meaning in an LLM context (loosely: running actual tokens through the model), and deviating from that term to the verb form would make the sentence less clear to me.


No. It's quite common for technical slang to deviate from general vocabulary.

Cf. "compute" is a verb for normal people, but for techies it is also "hardware resources used to compute things".


I don't think "inference" as a verb has become technical slang. At least not in my bubble.


Perhaps "infer from"? I was also taken aback by how they just decided to make "inference" a verb. A decent writer would have rewritten the sentence to make it work, similar to how a software implementation sometimes just doesn't work out. But apparently that's too much to ask of Nvidia marketing.

Funnily enough, things like this show that a human probably was involved in the writing. I doubt an LLM would have produced that. I've often thought about how future generations are going to signal that they are human; maybe the way will be human language changing much more rapidly than it has, maybe even mid-sentence.


I don’t think either of those are right…


I think what is distinct in this proposal is that there are n 1:n channels.


Why? I fail to see how using Chromium as a basis for other apps has an impact on who has the power to innovate in the browser space.


Because then the bugs we find in your app contribute back to Chrome rather than Firefox. Over time, Chrome becomes a faster and more efficient browser, which makes it harder to convince users to switch. Big-picture thing.

