I didn't know about this problem before, but now I'm incredibly curious. I love problems where you find a trade-off between two seemingly unrelated things: here, how going from O(sqrt(n)) to O(log n) memory increases the code complexity by an order of magnitude. If you take the code for an O(1) or O(log n) memory sorting algorithm, I wonder if you can identify a subroutine that, if memoized, would give you back an O(sqrt(n)) memory sorting algorithm you already know.
I mentioned that in the first comment. Fast in-place stable sorts usually have a step where they must freely rearrange some things (blocks, medians, etc.) and remember their original order, to preserve stability in case some of them compare equal. It seems no one can squeeze the number of these things below O(sqrt(n)). If you have O(sqrt(n)) extra space, you can store the permutation there. If you don't, you use part of the array itself as temporary bit storage, by changing the relative positions of elements. That's where the complexity comes from.
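To make the "array as bit storage" part concrete, here's a minimal sketch of how that trick can work, assuming you have pairs of distinct elements to play with (the function names are my own, not from any particular sort implementation): each pair of distinct elements stores one bit via its relative order, and decoding just compares them again.

```python
def encode_bits(a, bits):
    """Store bits[i] in the pair (a[2i], a[2i+1]) by ordering it.

    Assumes a[2i] != a[2i+1] for each pair; ascending order encodes
    a 0 bit, descending order encodes a 1 bit. Only swaps within
    pairs, so the multiset of elements is unchanged.
    """
    for i, bit in enumerate(bits):
        lo, hi = 2 * i, 2 * i + 1
        if (a[lo] > a[hi]) != bool(bit):
            a[lo], a[hi] = a[hi], a[lo]


def decode_bits(a, k):
    """Read back k bits from the first 2k elements by comparing pairs."""
    return [1 if a[2 * i] > a[2 * i + 1] else 0 for i in range(k)]


a = [10, 20, 30, 40, 50, 60, 70, 80]
encode_bits(a, [1, 0, 1, 1])
print(decode_bits(a, 4))  # [1, 0, 1, 1]
```

This is why stability is the hard part: if the elements in a pair were equal, swapping them to encode a bit would silently reorder equal keys, which is exactly what a stable sort must not do. Real block-merge sorts first gather enough distinct elements to use as this kind of scratch storage.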