It does use the U-Net to denoise the VAE-compressed image:
"The dithering of the palettized latents has introduced noise, which distorts the decoded result. But since Stable Diffusion is based on de-noising of latents, we can use the U-Net to remove the noise introduced by the dithering."
The included Colab doesn't have line numbers, but you can see the code doing it:
# Use Stable Diffusion U-Net to de-noise the dithered latents
latents = denoise(latents)
denoised_img = to_img(latents)
display(denoised_img)
del latents
print('VAE decoding of de-noised dithered 8-bit latents')
print('size: {}b = {}kB'.format(sd_bytes, sd_bytes/1024.0))
print_metrics(gt_img, denoised_img)
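
To make the quoted point concrete, here is a minimal sketch of why dithering shows up as noise in the latents. It is not the Colab's code: the `quantize` helper, the random-noise dither, the uniform 256-level palette, and the Gaussian stand-in for the VAE latents are all assumptions for illustration. The Colab's own palettization and dithering method may differ.

```python
import numpy as np

def quantize(x, levels, lo, hi, dither=False, rng=None):
    """Uniformly quantize x in [lo, hi] to `levels` values (hypothetical helper).

    With dither=True, uniform random noise of +/- half a step is added
    before rounding, which trades banding for noise.
    """
    step = (hi - lo) / (levels - 1)
    scaled = (x - lo) / step
    if dither:
        scaled = scaled + rng.uniform(-0.5, 0.5, size=x.shape)
    idx = np.clip(np.round(scaled), 0, levels - 1).astype(np.uint8)
    return idx, lo + idx * step

rng = np.random.default_rng(0)
# Stand-in for VAE latents (assumption: 4 channels, 64x64, roughly Gaussian).
latents = rng.normal(size=(4, 64, 64)).astype(np.float32)
lo, hi = latents.min(), latents.max()

idx_plain, rec_plain = quantize(latents, 256, lo, hi)
idx_dith, rec_dith = quantize(latents, 256, lo, hi, dither=True, rng=rng)

mse_plain = float(np.mean((latents - rec_plain) ** 2))
mse_dith = float(np.mean((latents - rec_dith) ** 2))
print('MSE without dither:', mse_plain)
print('MSE with dither:   ', mse_dith)
```

The dithered reconstruction has the larger per-element error. That extra error is the noise the article says the U-Net is then used to remove.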