This paper claims that the “reasoning” output these models produce isn’t a necessary component of the models, and is likely unrelated to the actual final output:
Reading these streams in my own chats, I quite often find a lack of strong logical consistency from one chunk to the next where it should clearly be there, combined with the final output not being consistent with the 'thoughts'. I'll screenshot it next time I see a good example.
'Thinking' here needs a clearer definition. Even for us humans, the language is just an after-the-fact attempt at describing what went on, or is going on, inside our heads.
The sharpest cries of "that's not thinking!" always seem to have an air of desperation about them. It's as if: I'm special, and I think, and maybe thinking is what makes me special, so if LLMs think, then I'm less special.
At some point the cry will change to "It only looks like thinking from the outside!" And some time after that: "It only looks conscious from the outside..."
That looks like thinking. If you want to argue it's not thinking, you are absolutely welcome to, but you need to say more than just "no we can't?"