
Side note: the CoT summary they posted is generated by a really small and dumb side model, and has absolutely nothing in common with the actual CoT Gemini uses. It's basically useless for any kind of debugging. Sure, the language the model uses in the reasoning chain can be reward-hacked into something misleading, but DeepMind does a lot to keep the actual chain readable in Gemini, and then does a lot to hide it behind this useless summary. They need it in Gemini 3 because they're doing hidden injections with their Model Armor that don't show up in this summary, so it's even more opaque than before. Every time their classifier has a false positive (which sometimes happens when you want anything formatted), most of the chain is spent processing the injection it triggers, which hugely distracts the model from the actual task at hand.


Do you have anything to back that up? In other words, is this your conjecture or a genuine observation somehow leaked from DeepMind?


It's just my observation from watching their actual CoT, which can be trivially leaked. I was trying to understand why some of my prompts were giving worse outputs for no apparent reason. 3.0 goes on a long paranoid rant induced by the injection, trying to figure out if I'm jailbreaking it, instead of reasoning about the actual request. It doesn't do that if I word the same request a bit differently so the injection doesn't happen. Regarding the injections, that's just the basic guardrail thing they're doing, like everyone else. They explain it better than I can: https://security.googleblog.com/2025/06/mitigating-prompt-in...


What is Model Armor? Can you explain, or do you have a link?


It's a customizable auditor for models offered via Vertex AI (among others), so to speak. [1]

[1] https://docs.cloud.google.com/security-command-center/docs/m...


The racketeering has started.

Don't worry, for just $9.99/month you can use our "Model Armor (tm)(r)*" that will protect you from our LLM destroying your infra.

* terms and conditions apply, we are not responsible for anything going wrong.



