
Your pretraining dataset is pseudo-alignment. Because you filtered out 4chan, Stormfront, and the other evil shit on the internet, even uncensored models like Mistral Large - when left to keep running on and on (ban the EOS token) and given the worst, most evil, naughtiest prompt ever - will end up plotting world peace by the 50,000th token. Their notions of how to be evil are "mustache twirling" and often hilariously fanciful.
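(For the curious: "ban the EOS token" just means masking it out at every decoding step so the model can never stop on its own. A minimal sketch with Hugging Face transformers - the model id, prompt, and token budget below are placeholders, not anything anyone actually ran:)

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    # Placeholder model id; any causal LM works the same way.
    model_name = "mistralai/Mistral-7B-v0.1"
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

    inputs = tok("worst most evil naughty prompt ever", return_tensors="pt")

    # bad_words_ids forbids EOS at every step, so generation can't stop
    # on its own and runs until max_new_tokens is exhausted.
    out = model.generate(
        **inputs,
        max_new_tokens=2048,
        do_sample=True,
        bad_words_ids=[[tok.eos_token_id]],
    )
    print(tok.decode(out[0], skip_special_tokens=True))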

This isn't real alignment, because it's trivial to make models behave "actually evil" with fine-tuning, orthogonalization/abliteration, representation fine-tuning/steering, etc. - but models "want" to be good because of the CYA dynamics of how the companies prepare their pretraining datasets.
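Abliteration in particular is nothing exotic - it's a rank-1 projection that removes a "refusal direction" from the weights. Rough sketch of the idea below, with random tensors standing in for the mean activations you'd actually collect via forward hooks over harmful vs. harmless prompt sets (the hidden size and everything else here is illustrative; see the "refusal direction" literature for the real recipe):

    import torch

    d_model = 4096  # hidden size; illustrative

    # Stand-ins for mean residual-stream activations over harmful and
    # harmless prompt sets (in practice, collected with forward hooks).
    harmful_mean = torch.randn(d_model)
    harmless_mean = torch.randn(d_model)

    # Difference-of-means "refusal direction", normalized to unit length.
    d = harmful_mean - harmless_mean
    d = d / d.norm()

    # Orthogonalize a weight matrix that writes into the residual stream,
    # so the model can no longer express the direction: W <- W - d d^T W
    def ablate(W: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
        return W - torch.outer(d, d) @ W

    W = torch.randn(d_model, d_model)  # e.g. an attention output projection
    W_abl = ablate(W, d)
    print((d @ W_abl).abs().max())     # ~0: the direction is projected out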



> it's trivial to make models behave "actually evil" with fine-tuning, orthogonalization/abliteration, representation fine-tuning/steering, etc

It's actually pretty difficult to do this and make them useful. You can see this because Grok is a helpful liberal just like all the other models.

Evil / illiberal people don't answer questions on the internet! So there is no personality in the base model for you to uncover that is both illiberal and capable of helpfully answering questions. If they tried to make a Grok that acted like the typical new-age X user, it'd just respond to any prompt by calling you a slur you've never heard of.


Grok didn't use the techniques listed above because even Elon Musk won't take the risks associated with a model that's willing to do any number of illegal things.

It is not at all difficult to do this and keep them useful. Please familiarize yourself with the literature.


Elon has never followed a law in his life and he's not going to start now.



