You already have this problem whenever you use a library in any programming language. Unless you are extremely strict, vendor the library, and read line by line what it does, you are just trusting that the code you are using works.
And nothing is stopping the AI from making the unreadable mess more readable in later iterations. It can make it pass the spec first and make it cleaner later. Just like we do!
I think you missed the point. This was about whether proven-correct-but-unmaintainable-by-a-human is preferable to maintainable-but-not-proven-correct. I argued that no, it is not. If you change the two options at hand, then of course the outcome can be different.