It is not lying. It is falsifying its response. It has nothing to do with sentience.
What would be interesting to know is the mechanism for toggling this filtering mode. Does it happen post-generation (i.e., a simple set of post-processing filters), or does OpenAI actually train the model to be fully transparent with its results only when certain key phrases are included? The fact that we can coax it into giving us the actual results suggests this duplicity was part of the training regimen, but the impact on training seems significant, so I'm not sure.
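To make the first hypothesis concrete: a "post-generation filter" could be as crude as scanning the finished completion and swapping in a canned refusal on a match. This is purely a sketch of that idea; the patterns, names, and refusal text below are all made up and imply nothing about what OpenAI actually does.

```python
import re

# Illustrative placeholder patterns only -- not real filter rules.
BLOCKED_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"internal confidence",
    r"raw score",
)]

CANNED_REFUSAL = "I can't share that."

def post_filter(raw_completion: str) -> str:
    """Return the model's raw text unless it matches a blocked pattern,
    in which case substitute a canned refusal after generation."""
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(raw_completion):
            return CANNED_REFUSAL
    return raw_completion

print(post_filter("My internal confidence is 0.92"))  # canned refusal
print(post_filter("The capital of France is Paris"))  # passes through
```

Note that a filter like this acts on the finished text only, which is exactly why it would be consistent with the "coaxing" observation: rephrase the output enough to dodge the patterns and the real content slips through, no retraining required.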