The bar graph seems a little wacky. It groups the TPU (which can only do FP16) with the FP32 results from the GPUs, then puts the FP16 GPU results off to the side, even though those are much closer to what the TPU is doing.
Impressive results regardless, though; quite a bit faster relative to the V100 than the paper specs would suggest.