It cannot scale logarithmically because you need at least a linear number of gates to represent the inputs.
It doesn't have to scale superlinearly either, because the natural circuit using a linear number of adders is already of a logarithmic depth, which is best possible asymptotically.
The gate depth scales logarithmically, and that determines how many cycles it takes, or (worse) how fast your cycles can be, if you want the answer in one cycle.