That's an issue I run into too.

It's like the agent must succeed at all costs, even if it means resorting to some insane solution.

It needs to just straight up fail sometimes, but it's like the models aren't trained to allow that.