They are planning to release a Mythos-class model (from the initial announcement), but they won't until they can trust their safeguards + the software ecosystem has been sufficiently patched.
Both. There's the risk of them instructing a user on how to produce a known formulation (the Anarchist Cookbook solution, as you say), which is irritating but not that problematic.
The bigger issue is that they are potentially capable of devising novel formulations that cause harm, and of guiding someone through the process. That is, consider a world in which someone with malicious intent has access to a model as capable at chemistry / biology as Mythos is at offensive cybersecurity.
This is obviously limited by the fact that the models don't operate in the physical world, but there's plenty of written material out there.
"Smart people have economic opportunities that align them away from being evil"
For some definition of evil, some of the time, OK. But as economic opportunities compound (look at the behavior of the ultra-rich), there seems to be at least a strong correlation in the other direction, if not full-on "root of all evil" causation.
Sure, but that’s not “slaughter a stadium of people with drones” evil or “poison the water supply” evil or “take out unprotected electrical substations” evil.
So much infrastructure is very soft because the evil people aren’t smart enough to conceive of or conduct an attack.
That’s not quite true. Take a look at all the billionaires destroying society. Being evil is the surest way to get rich. In fact, it’s the only way to amass that level of capital: there’s no ethical billionaire.
Good. This is how we will force the world to reckon with the isolated, the disgruntled, and the "lone wolf" terrorist. Real "sigma males" actually exist, and when they decide "society has to pay", we are all worse off for it. If Ted Kaczynski (the quintessential example of a real, actual sigma) had been in his prime and operating right now, he'd have mail-bombed NeurIPS and ICLR already. I'm not cool with being in crowds of AI professionals right now, for physical-security reasons, given the extreme anti-AI sentiment from nearly everyone outside the valley: https://jonready.com/blog/posts/everyone-in-seattle-hates-ai...
The front page is currently home to the announcement of Qwen 3.6 35B, which has comparable performance to the flagship coding models of a few months ago, and can be run at home by those with a gaming computer or MBP from the last five years. It is happening, but there will always be some lag.
Yes, but every time the capabilities, security, accuracy, or any other quality of LLMs is challenged, the default answer is that we'll essentially have AGI in a quarter or two. It's very tiring to try to argue with people about current quality, when the argument is always to wait and/or pay for a super expensive model.
right on. I certainly empathize with your frustrations about "AGI". but rest assured, I'm firmly in the camp of "not in my lifetime", and even further in the camp of "not without at least 3 more massive breakthroughs about things we currently do not understand at all". so sorry if it sounded like I was asking "what about when local llms get SUPER GOOD", or something. that's not at all what I meant. All I was asking was: "Claude Code can currently be pointed at a directory and then be chatted with about what it needs to do in that directory to make a full code project. That ability is already available on local machines through a ton of convoluted setup, but it's almost certainly going to be a packaged solution within a year (and possibly within the next few months/weeks/days). So when that packaged solution arrives and the choices are 'use the llm for scaffolding, which takes 3 hours of unattended time' or 'build the scaffolding myself, which takes 6 hours of deep-focus time', what will still be objectionable about choosing the former?"
and, to be clear, it's an earnest question. like I've said elsewhere, I have concerns about over-reliance on the tech, but once it all moves local, a lot of those concerns become fairly trivial. so I'm curious if other people have concerns that remain pressing and practical.
ETA: I'm aware that Claude wouldn't take 3 hours to do this while using its massive warehouses of GPUs. I'm estimating what I think is a reasonable time for a single-GPU device to produce something workable.
That's not what the grandparent poster was saying, but sure. They have been steadily improving across those metrics, as Opus 4.6 / 4.7 / Mythos demonstrate. They're certainly not perfect, and I understand your fatigue (it is certainly fatiguing to follow, even if interested!), but each new release pushes it that bit further, and the improvements percolate downwards to the cheaper models.
1. No one with good vision would give a single feature two names. It’s dumb. Here is our pager feature. Cool, how do I access it? Oh, you set the ui.paginate option, of course!! (Sketch below.)
2. It’s almost like we have some established ways to denote arguments that are pretty popular… ‘jj init --git’, for example? By using ‘jj git init’, I would expect all of the git-compatible commands to be ‘jj git xxx’, because that is a reasonable expectation.
This is the problem with the voodoo: these obscure nonsense commands only make sense once you are accustomed to them. There’s no reasonable expectation that you could just figure them out on your own. Go on vacation, come back, and be surprised when you’ve forgotten the voodoo. Not to mention that every tool has to have its own unique voodoo.
Almost like the professional world has figured out that "made by software engineers, for software engineers" will never be popular. And then engineers don’t understand why you might want a tool to be intuitive and popular.
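For reference, the two-names problem from point 1, going from memory on jj's config keys (so treat this as a sketch, not gospel):

    # the feature is announced as the "pager"...
    jj config set --user ui.pager less        # ...one setting is named "pager"
    jj config set --user ui.paginate never    # ...but the on/off toggle is "paginate"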
You're right that, looking solely at `init`, a flag could make sense to choose the backend.
The bigger picture here, though: `jj git` is the subcommand that prefixes all commands that are git-specific rather than backend-agnostic. There is also `jj git clone`, `jj git fetch`, `jj git push`, etc.
For a different backend, say Google's Piper backend, there's `jj piper <whatever>`.
This means that backend-specific features aren't polluting the interface of more general features.
The on-disk repository compatibility is automatic. But if you're trying to fetch something via a specific protocol, you use the command for the protocol you want to use.
There is no extra step between `git push` and `jj git push`, they're both one step.
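To make the "no extra step" point concrete, here's the round trip side by side (example URL, assuming a stock jj install with the default git backend):

    jj git clone https://example.com/repo.git   # cf. git clone
    cd repo
    jj git fetch                                # cf. git fetch
    jj git push                                 # cf. git push

Each git-facing operation is one command; the `jj git` prefix just marks it as protocol-specific rather than adding a layer.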
I meant the extra step being: why would I bother with jj if I’m having to specify git inside of jj?
The issue is pretty obvious to me. Git is the standard, and that likely won’t change for some time. So if jj makes my git life better, awesome. But if it’s just a wrapper and I need to know all the git voodoo plus jj voodoo on top, I don’t quite get it.
While I agree with you, their system did not start privatised, and the Shinkansen predates privatisation by some time. I don't have the evidence to justify this, but I suspect that you need national buy-in - both financial and political - to start an HSR build-out, which could then potentially be privatised at a later stage.
Adding to the chorus: if you need to apply a solution like this, it's probably time to walk away from the platform. (Well, the right time to walk away would have been years ago, but...)
All remotely popular online public spaces are completely infiltrated by bots/propagandists/trolls/morons/etc. If you could successfully filter that type of content out you'd end up with a much larger pool of valid/authentic content to access than if you abandoned the space altogether and switched to some very obscure/niche space that's yet to be manipulated.
And when you are not there, you are not there. We are way too obsessed with missing things, be it a popular figure or someone we know in person. The reality is that it's actually not too bad to miss things; most information still gets through, especially the important kind. You might even be spared a lot of the crap, which gets filtered out before it reaches you.
I am happy on my personal Mastodon instance and occasional visits to HN. You might be too if you allow yourself to be.
The problem is that your definition of "crap" is probably a bit different from others'. Everyone probably has a slightly different definition. Also, your feed is probably mostly stuff that was posted on X first and replicated over somehow. The network effect is real.
That being said, there are clearly multiple automated influence operations active on X at any given time. If Elon wants X to stick around, it would be in his interest to put a stop to those. The default feed is full of posts from those bots; that's a big problem X needs to fix, too.
> Also, your feed is probably mostly stuff that was posted on X first and replicated over somehow.
Possibly. But if it reaches me anyway, then there clearly was no need for me to be there. And if more people realize that, maybe the discussion can move away from that place.
> The problem is that your definition of "crap" is probably a bit different from others.
I was talking about everyone's personal definition of crap. If something doesn't have enough velocity to leave the sphere, it might be relevant only to a small community, or just not relevant enough to discuss. Or something else entirely.
My argument stands. It is okay to not be part of every discussion. A lot of people think that they must be on X to stay in touch and be informed. I am not there and I am informed enough and in touch with all the people I want. If you can't be bothered to make an account outside X then we don't need to talk.
We have a solution like this for HN, but people don't use it: it's the "hide" button, right next to the "flag" button. Yet when users see content they don't like, instead of just hiding it for themselves, they often choose to flag it so that others are blocked from seeing it too.
I'd welcome per-user curation tools like OP's which don't affect the content for the rest of us.
HN is my top candidate for a solution like this, too. Because there's a ton of high quality content here, increasingly buried beneath a small number of sentiments and topics I don't care to see rehashed constantly.
I'd like to see it, too, but for the opposite[1] reason: Others can use this curation (which only affects their own view of HN) instead of flagging (which affects my view and everyone else's too).
I use the flag functionality as per the guidelines:
> Off-Topic: Most stories about politics, or crime, or sports, or celebrities, unless they're evidence of some interesting new phenomenon. If they'd cover it on TV news, it's probably off-topic.
> If a story is spam or off-topic, flag it. Don't feed egregious comments by replying; flag them instead. If you flag, please don't also comment that you did.
Flagging is a way to shape what types of content take up the finite amount of attention available on HN. If everyone used it (only) in the way the guidelines ask, the front page would look very different on any given day.
You need to curate your algorithm. Took me 10 years before I started blocking aggressively and now my feed is amazing with 90% bangers. Twitter is by far the best product in this space. Every other platform is 2+ weeks behind. Twitter is where the news breaks.
I had a well curated feed too (even used word filters) and yet I felt compelled to pack up and walk away. It was simply not enough.
The negative effect the various drivel had on me was nonlinear. Even if 99% of posts were fine, if that 1% was seriously upsetting, it just ruined the whole thing.
Er, what? We've had open models that can outperform ChatGPT 3.5 for several years now, and they can run entirely on your phone these days. There is no metric by which 3.5 has not been exceeded.
Not in the creative writing I care about. I've been looking for years and trying new models practically every month, including closed, hosted models. None of them approach the quality of the logs I have from that original release.