Is this not just because aggressive material was filtered out of the training data, and system prompts usually include some preamble about being polite?
"Acknowledging they might be wrong" makes them sound like more than token predictors trained on polite sounding text.
"Acknowledging they might be wrong" makes them sound like more than token predictors trained on polite sounding text.