You're completely ignoring that there are several distinct market segments that want hardware to do AI/ML. Matrix multiplication is not something you can implement in hardware just once.

NVIDIA's biggest weakness right now is that none of their GPUs are appropriate for any system with a lower power budget than a gaming laptop. There's a whole ecosystem of NPUs in phone and laptop SoCs targeting different tradeoffs in size, cost, and power than any of NVIDIA's offerings. These accelerators represent the biggest threat NVIDIA's CUDA monopoly has ever faced. The only response NVIDIA has at the moment is to start working with MediaTek to build laptop chips with NVIDIA GPU IP and start competing against pretty much the entire PC ecosystem.

At the same time, the various low-power NPU architectures each have different limitations owing to their diverse histories, and approximately none of the ones currently shipping were designed from the beginning with LLMs in mind. On the timescale of hardware design cycles, AI is still a moving target.

So far, every laptop or phone SoC that has shipped with both an NPU and a GPU has demonstrated that there are some AI workloads where the NPU offers drastically better power efficiency. A small-enough NVIDIA GPU IP block on a laptop or phone SoC probably won't break that trend.

In the datacenter space, there are also tradeoffs that mean you can't make a one-size-fits-all chip that's optimal for both training and inference.

In the face of all the above complexity, whether a GPU-like architecture retains any actual graphics-specific hardware is a silly question. NVIDIA and AMD have both demonstrated that they can easily delete that stuff from their architectures to get more TFLOPS for general compute workloads using the same amount of silicon.