Other than refcounting (which is not a part of the Python language spec - it even specifically says that conforming code shouldn't rely on it), what other semantics did you have in mind?
Guarantee or not, it constrains whether something is usable as a drop-in replacement interpreter, especially if people can't tell which programs will break, and doubly so if the breakage is a subtle data corruption race that doesn't show up in tests.