
That's what VeraCrypt is: a fork of the original TrueCrypt after all the drama, security doubts, and eventual discontinuation. It took a long time and two independent audits to establish trust in it.

>by the way, which is the best uncensored model at the moment?

There are no such models, though it depends on your definition of censorship. If you're referring to abliteration and similar automated techniques, they're snake oil.


That is absolutely not the case. Try HauHauCS's Qwen 3.5 models. They don't refuse anything, and they don't lose a noticeable amount of capability.

GPT-2 was already too dangerous to release publicly, according to OpenAI; they still did it, however. If something is not dangerous, it's also not useful.

>That the model already decided on a solution, upon first seeing the problem, and the reasoning is post hoc rationalization?

Yes, it plans ahead, but with significant uncertainty until it actually outputs those tokens and converges on a definite trajectory, so the reasoning isn't useless filler: the closer the model gets to a given point, the more certain it is about it, somewhat like what happens explicitly in diffusion models. And that's not all that's going on; it's just one of many competing phenomena.
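
One rough way to see this convergence, sketched below with GPT-2 as a small stand-in model (the arithmetic prompt and the scoring loop are my own illustration, not from any paper): score a fixed answer continuation after progressively longer prefixes of the reasoning, and watch log P(answer | prefix) trend upward as the trajectory locks in.

    # Score a fixed answer continuation after progressively longer prefixes
    # of the reasoning trace. GPT-2 is a small stand-in model; the prompt
    # and answer are made up for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    reasoning = ("To add 17 and 25, first add the tens: 10 + 20 = 30. "
                 "Then the ones: 7 + 5 = 12. So the total is")
    answer = " 42"

    words = reasoning.split()
    for k in range(1, len(words) + 1, 4):
        prefix = " ".join(words[:k])
        ids = tok(prefix + answer, return_tensors="pt").input_ids
        n_ans = len(tok(answer).input_ids)
        with torch.no_grad():
            logits = model(ids).logits
        # log P(answer tokens | everything before them), via teacher forcing
        logp = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        ans_logp = logp[-n_ans:].gather(1, targets[-n_ans:, None]).sum().item()
        print(f"{k:3d} words of reasoning -> log P(answer) = {ans_logp:.2f}")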


Really weird comments here. It's a VFX technique for cinematography, one of many of that kind (e.g. removing support wires). Cinematography in general is about showing something that doesn't exist, unless it's a documentary. Your only reaction, apparently, is to call it censorship. That says a lot about the current Overton window, and I think it's something you should reflect on.

No, I get it, and the benefits of putting what would have required a professional VFX team at (eventually) the fingertips of every amateur filmmaker are amazing.

It’s also alarming what’s possible when reassembling photons en masse becomes commoditized.


> It’s also alarming what’s possible when reassembling photons en masse becomes commoditized.

Photons Be Free?


The act of editing existing footage to mask reality with non-reality is just that—the action of making something real less real. It can be used for filmmaking (and obviously often is), but it can be used for anything else too.

The issue is how easily these tools can (and will) enable the worst faith actors and actions. It's not controversial (or shocking) to think the current situation for deepfakes and deceptive edits is historically awful. It's equally reasonable to think these tools are going to do less good for VFX houses in Hollywood and more evil in authoritarian regimes or chaotic social media networks. You're right that the Overton window has moved, but we ought to blame a White House that disseminates deepfakes, not commenters on Hacker News.

Tools aren't just tools, nothing exists in a vacuum. A plane is a useful means of transportation, but that doesn't mean everyone should be rushing into the cockpit. It's pretty plain to me how a tool that streamlines doctoring footage to such a useful (and deceptive) degree is just a recipe for disaster. Considering the benefit to society is...slightly eased CGI work (?), I'll easily label this a net-negative for us all.


To be honest, if this is what we see in open access, then something more capable probably already exists behind closed doors. I don't want to create another conspiracy theory - I just want to point out the theoretical possibility - but if it has been created, it has probably already been tested and possibly even used widely. So, in conclusion, if a government or agency wanted to use this technique for censorship, they would most likely have tried it already.

It's a bit late to ring the bells once the city walls have been taken. In other (shorter) words: if someone wanted to use this for censorship, they would have tried it already, so it's either too late or too early. It's an interesting (though neither good nor fun) possibility, but a possibility only.

...I hope one day everyone, just everyone, will finally learn that most technologies are neither evil nor good; they're neutral, they're just tools. And most (no, not all) of them were also created with good intentions.


You don't really need AI to do this; it can be done just fine with traditional techniques, it's just labor-intensive. Hell, you can probably find a dude on Fiverr to do it right now for a couple hundred, YMMV.

Given the current situation in the world, why would you not think about how this technology can be misused, especially VFX?

I guess I have seen more AI-generated spam ads than content created for the intended purpose.


> Given the current situation in the world, why would you not think about how this technology can be misused, especially VFX?

I think the point of the parent was that while you need to think about how this technology can be misused, that's not the only thing you can and need to think about.


If that's the point, then it was a dumb point to begin with, because there were a whole two top-level comments mentioning censorship at the time. Technically that was a majority, but only because so few people had commented.

But wouldn't using this for, e.g., removing a support wire result in it making Superman fall down?

Yes but imagine how bad Stalin’s reign of terror would have been with modern cinematographic techniques.

That's an absurd point. That near-absolute cult of personality never relied on movies; the printing press was more than enough to make reality disappear. And earlier ones didn't rely on printed media at all.

Yes, but why are we required to imagine only this? By this logic, why don't we try to imagine how bad Stalin's (or Hitler's, or anyone's) reign of terror would have been with state-controlled social media? With a state-censored Internet (that's current Russia, by the way)? With modern communication technologies, and so on?

Hell yeah, let's go back to the caves just because science, technology and progress can actually empower everyone, including dictators, maniacs, psychopaths, predators and "bad people et al.", not just "good" (the definition of "good" varies, but let's assume we're using common principles of goodness) people.


You're describing structured outputs.
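
A minimal sketch of what that means in practice (GPT-2 here is just a stand-in; real structured-output libraries generalize this to full JSON schemas by masking invalid tokens at every decoding step): restrict the model to a fixed set of valid outputs and pick the one it assigns the highest probability.

    # Structured output in its simplest form: an enum constraint. The model
    # can only ever "say" one of the allowed choices.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt = 'Review: "The battery died after an hour." Sentiment:'
    choices = [" positive", " negative", " neutral"]  # the "schema"

    def score(continuation):
        ids = tok(prompt + continuation, return_tensors="pt").input_ids
        n = len(tok(continuation).input_ids)
        with torch.no_grad():
            logp = torch.log_softmax(model(ids).logits[0, :-1], dim=-1)
        targets = ids[0, 1:]
        return logp[-n:].gather(1, targets[-n:, None]).sum().item()

    print(max(choices, key=score))  # always one of the valid outputs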

Models are already artificially created to begin with. The entire post-training process is carefully engineered to give the model a certain character, defined by hundreds of metrics, and the emotions the article is talking about are interpreted in ways researchers like or dislike.

This is something inherently hard to avoid with a prompt. The model is instruction-tuned and trained to interpret anything sent under the user role as an instruction, not necessarily in a straightforward manner. Even if you train it to refuse or dodge some inputs (which they do), it's going to affect the model's response, often in subtle ways, especially in a multiturn convo. Anthropic themselves call this character drift.

Of course they do have emotions as an internal circuit or abstraction; this is fully expected from intelligence at least at some point. But interpreting these emotions as human-like is a clear blunder. How do you tell the shoggoth likes or dislikes something, feels desperation or joy? Because it said so? How do you know these words mean the same thing to it as they do to us? Our internal states are absolutely incompatible. We share a lot of our "architecture" and "dataset" with some complex animals, and even then we barely understand many of their emotions. What does a hedgehog feel when eating its babies? This thing is 100% unlike a hedgehog or a human; it exists in its own bizarre time projection, and nothing of it maps to your state. It's a shapeshifting alien.

In mechinterp you're reducing this hugely multidimensional and incomprehensible internal state to understandable text using the lens of the dataset you picked. It's inevitably a subjective interpretation; you're painting familiar faces on a faceless thing.
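
The standard lens here is something like a linear probe, and it makes the circularity concrete: whatever concept your labels encode is what you'll "find" in the activations. A toy sketch (random activations as stand-ins; real ones would come from a model's residual stream at some layer):

    # Fit a classifier on hidden activations using a dataset *you* labeled.
    # The "emotion direction" that falls out is defined by your labels, not
    # by the model. Activations here are random stand-ins.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)
    d_model, n = 768, 200

    acts = rng.normal(size=(n, d_model))   # hypothetical layer activations
    labels = rng.integers(0, 2, size=n)    # your "joy" vs "distress" tags

    probe = LogisticRegression(max_iter=1000).fit(acts, labels)
    emotion_direction = probe.coef_[0]     # an axis named by your dataset
    print(probe.score(acts, labels))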

Anthropic researchers are heavily biased to see what they want to see; this is the biggest danger in research.


I think a counterargument would be parallel evolution: there are various examples in nature where a certain feature evolved independently several times, without any genetic connection - as far as we understand, because the evolutionary pressures were similar.

One obvious example would be wings, where you have several different strategies - feathers, insect wings, bat-like wings, etc - that have similar functionality and employ the same physical principles, but are "implemented" vastly differently.

You have similar examples in brains, where e.g. corvids are capable of various cognitive feats that would involve the neocortex in human brains - only their brains don't have a neocortex. Instead they seem to use certain other brain regions for that, which don't have an equivalent in humans.

Nevertheless it's possible to communicate with corvids.

So this makes me wonder if a different "implementation" always necessarily means the results are incomparable.

In the interest of falsifiability, what behavior or internal structures in LLMs would be enough to be convincing that they are "real" emotions?


I believe that it is possible to make an artificial system that can have emotions in a way that cannot be meaningfully distinguished from those of an animal or a human.

However, I believe that designing such a system would be pointless and very wrong.

While emotions are normally associated with a physical system that encounters various helpful or harmful experiences in the real world, one could make a program that completely simulates such a physical system with emotions, living in a simulated world, and then one could say that this program has emotions.

On the other hand, unlike with the kind of program I have mentioned before, I do not agree that an LLM has emotions, only that it mimics human emotions as they were recorded in the training texts.

There is no component of an LLM that can intrinsically generate emotions. An LLM that is trained only on texts without emotions, e.g. on program sources stripped of comments, will not show any emotion whatsoever, regardless of what you put in its input prompt.

On the other hand, when you train an LLM on texts that record human emotions, then whenever the LLM input contains something similar to what elicited the human emotions recorded in the training texts, the LLM will output a token probability distribution that generates a response similar to the reactions of humans. Unlike a book or an audio or video recording, the output of the LLM usually will not match exactly one recorded human emotion; it will mix many of those recorded, but it will still be limited by the content used for training.


"Parallel" evolution is just different branches of the same evolutionary tree. The most distantly related naturally evolved lifeforms are more similar to each other than an LLM is to a human. The LLM did not evolve at all.

Evolution is how the "mechanism" came to be, and that is indeed very different. But the mechanisms themselves - spiking neurons and neurotransmitters on one hand vs. matrix multiplications and nonlinear functions on the other (the latter "inspired" by our understanding of neurons) - don't seem so different, at least not on a fundamental level.

What is different for sure is the time dimension: Biological brains are continuous and persistent, while LLMs only "think" in the space between two tokens, and the entire state that is persisted is the context window.


> The LLM did not evolve at all.

Evolution and Transformer training are 'just' different optimization algorithms. Different optimizers obviously can produce very comparable results given comparable constraints.
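
A toy illustration of that point (entirely my own, made-up example): gradient descent and a simple evolution strategy minimizing the same quadratic loss land on essentially the same solution, despite one using gradients and the other using only mutation and selection.

    # Two very different optimizers, same objective, same result.
    import numpy as np

    rng = np.random.default_rng(0)
    target = np.array([3.0, -2.0])
    loss = lambda w: float(np.sum((w - target) ** 2))

    # Gradient descent, using the analytic gradient of the quadratic.
    w = np.zeros(2)
    for _ in range(200):
        w -= 0.1 * 2 * (w - target)

    # Evolution strategy: mutate, select the fittest, repeat. No gradients.
    pop = [np.zeros(2)]
    for _ in range(200):
        children = [p + rng.normal(scale=0.3, size=2)
                    for p in pop for _ in range(10)]
        pop = sorted(children, key=loss)[:3]

    print("gradient descent:  ", w)       # ~ [3, -2]
    print("evolution strategy:", pop[0])  # ~ [3, -2]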


The training process shares a lot of high-level properties with biological evolution.

"Minimize training loss while isolated from the environment" is not at all similar to "maximize replication of genes while physically interacting with the environment". Any human-like behavior observed from LLMs is built on such fundamentally alien foundations that it can only be unreliable mimicry.

The environment for the model is its dataset and training algorithms. It's literally a model of it, in the same sense that we are models of our physical (and social) environment. Human-like behavior is of course too specific, but the highest-level things, like staged learning (pretraining/post-training/in-context learning) and evolutionary/algorithmic pressure, are similar enough to draw certain parallels, especially since the LLM's data proxies our environment to an extent. In this sense the GP is right.

I don't think anything you said here contradicts what they said. They take great pains throughout the blog post to explain that the model does not "experience" these "emotions"; that they're not emotions in the human sense but models of emotions (both the expected human emotional response to a prompt and whatever emotions another character is experiencing in part of a prompt) and functional emotions (in that they can influence behavior); and that any apparent emotions the model may show are it playing a character.

Almost! They're merely making the claim of functional emotions and outright avoiding the thorny philosophical question of whether they're "real".

[I've actually tried exploiting functional emotions in a RAG system. The sentiment scoring and retrieval part was easy. Sentiment analysis is pretty much a settled thing, I'd say, even though the mechanisms are still being studied (see the paper we're discussing).

What I'd love to do is extract the vector(s) they're discussing, rather than having them output as text into the context.]
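
For the easy part, the shape of it was roughly this (the corpus, weights, and blend below are illustrative, not my production setup): score candidates on relevance and on sentiment separately, then bias the ranking so the retrieved context matches the emotional tone of the query.

    # Sentiment-aware retrieval: blend lexical relevance with a penalty for
    # mismatched emotional tone. Weights are illustrative guesses.
    import nltk
    from nltk.sentiment import SentimentIntensityAnalyzer
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.metrics.pairwise import cosine_similarity

    nltk.download("vader_lexicon", quiet=True)
    sia = SentimentIntensityAnalyzer()

    docs = [
        "The product is wonderful and support was delightful.",
        "Everything broke immediately and support ignored me.",
        "The device ships with a 2000 mAh battery.",
    ]
    query = "I'm furious, my unit failed on day one"

    vec = TfidfVectorizer().fit(docs + [query])
    relevance = cosine_similarity(vec.transform([query]),
                                  vec.transform(docs))[0]
    tone = [sia.polarity_scores(d)["compound"] for d in docs]
    q_tone = sia.polarity_scores(query)["compound"]

    # Mostly relevance, nudged toward matching sentiment.
    scores = [r - 0.3 * abs(t - q_tone) for r, t in zip(relevance, tone)]
    print(docs[max(range(len(docs)), key=scores.__getitem__)])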


If you listen to Anthropic in their other works and interviews, they clearly do believe in the equivalence-by-proxy between humans and LLMs to a large degree, and introduce things like model welfare (that is, caring about what the model feels). This is just another study in the series. I think they're adding these disclaimers so as not to sound like absolute cranks to an unprepared audience, because sometimes they really do.

I like to call this Frieren's Demon. In that show, it is explained that demons evolved with no common ancestor with humans, yet they speak human language; they learned it to hunt humans. This leads to a fundamentally different understanding of words and language.

Now, I don't personally believe this is an intelligence at all, but it's possible I'm wrong. What we have with these machines is a different evolutionary reason for speaking our language (we evolved to speak it ourselves). Its understanding of our language, and of our images, is completely alien. If it is an intelligence, I could believe that the way it makes mistakes in image generation, and the strange logical mistakes it makes that no human would make, are simply a result of that alien understanding.

After all, a human artist learning to draw hands makes mistakes, but those mistakes are rooted in a human understanding (e.g. the effects of perspective when translating a 3D object to 2D). The machine with a different understanding of what a hand is will instead render extra fingers (it does not conceptualize a hand as a 3D object at all).

Though, again, I still just think it's an incomprehensible amount of data going through a really impressive pattern matcher. The result is still language out of a machine, which is really interesting. The only reason I'm not super confident it is not an intelligence is that I can't really rule out that I am not an incomprehensible amount of data going through a really impressive pattern matcher, just built different. I do, however, feel like I would know a real intelligence after interacting with it for long enough, and none of these models feel like a real intelligence to me.


>it does not conceptualize a hand as a 3D object at all

Oh, but it does; it's an emergent property. The biggest finding in Sora was exactly that: an internal conceptualization of 3D space and objects. Extra fingers in older models were the result of insufficient fidelity of this conceptualization, as well as architectural artifacts around small, semantically dense details.


Oh, really. Very interesting. Any links on this? I'm curious if they tried to map that 3D understanding in a way we could read it (e.g. putting it into Blender somehow).

> interpreting these emotions as human-like is a clear blunder. How do you tell the shoggoth likes or dislikes something, feels desperation or joy? Because it said so? How do you know these words mean the same for us?

I think you have it backwards.

Those vectors are exactly what it says: they affect the output, and we can measure them.

And they mean exactly what they mean for us, because that's what they're measured against.

And the main problem isn't "is its emotion the same as ours", but "does it apply our emotion as emotion".
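
Concretely, the kind of measurement at stake looks like activation steering. This sketch is my own generic version, with GPT-2, the layer choice, and the strength all arbitrary stand-ins for whatever the paper actually used: extract a difference-of-means "emotion" direction from contrastive prompts, inject it into the residual stream, and watch the output shift.

    # Difference-of-means steering vector, injected via a forward hook.
    # GPT-2, layer 6, and the strength 4.0 are arbitrary stand-in choices.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
    layer = model.transformer.h[6]

    def mean_act(text):
        captured = {}
        h = layer.register_forward_hook(
            lambda m, i, o: captured.update(a=o[0]))
        with torch.no_grad():
            model(**tok(text, return_tensors="pt"))
        h.remove()
        return captured["a"][0].mean(dim=0)  # average over positions

    direction = (mean_act("I am overjoyed, ecstatic, thrilled!")
                 - mean_act("I am miserable, despairing, crushed."))

    def steer(module, inputs, output):
        return (output[0] + 4.0 * direction,) + output[1:]

    h = layer.register_forward_hook(steer)
    ids = tok("Today I feel", return_tensors="pt").input_ids
    out = model.generate(ids, max_new_tokens=12, do_sample=False,
                         pad_token_id=tok.eos_token_id)
    h.remove()
    print(tok.decode(out[0]))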


One relevant thing is that these forks are unnaturally narrow in all models, and rather resemble locks (not quite, but close). Of the multiple possible continuations, models tend to prefer just a couple, i.e. the model is a lot less random than it should be. That's why you see annoying slop in writing and instantly recognizable color schemes on vibecoded sites. The lack of diversity probably limits the usefulness of this method as well.
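
This is easy to check directly: look at how much probability mass the top few tokens soak up at a fork. A quick sketch (GPT-2 as a base-model stand-in; heavily post-trained models show the concentration much more strongly):

    # Inspect the next-token distribution at a "fork" position.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    ids = tok("The color scheme of the website was",
              return_tensors="pt").input_ids
    with torch.no_grad():
        probs = torch.softmax(model(ids).logits[0, -1], dim=-1)

    top = probs.topk(5)
    print("mass in top 5 tokens:", top.values.sum().item())
    for p, t in zip(top.values, top.indices):
        print(f"{p.item():.3f}  {tok.decode(t)!r}")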

>I love that we're still learning the emergent properties of LLMs!

There are tons of low-hanging fruit there.


It feels like the modern recurrence of the early-2010s Bootstrap templates: we figured out how to automate building sites instantly, but at the cost of making the entire web look exactly the same.
