The designers argued that a million-line software (the OS + basic applications) was easier to debug/develop without having TCO everywhere. It makes stack traces useless, unless one thinks of clever ways to keep tail calls recorded, which makes it complex/tricky. The basic machine architecture is a stack machine with compact and nice instructions. Stack traces were useful then. The compiler also was not very sophisticated when it comes to optimizations.
I've long thought the right thing for Common Lisp would be to provide a way to declare sets of functions such that any tail call from one member of the set to another would be TCO'd; all other calls would push stack. This lets you write sets of mutually tail-recursive routines without forcing TCO on the entire system.
It's an entirely reasonable tradeoff, which I understand. But not having TCO is a strong negative from the language perspective, IMHO. Not an impossibly bad one, but it's really nice.