My TF or PyTorch code usually uses custom CUDA extensions, so sadly it will never be plug and play on AMD GPUs for me. I hate that scientific computing and ML settled on an (admittedly superior) proprietary API.
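For context, a typical extension looks something like this (a hypothetical sketch, not my actual code; my_op and scale are made-up names). Because it's compiled by nvcc against the CUDA runtime, it only runs on Nvidia hardware:

    // my_op.cu (hypothetical): a raw CUDA kernel plus a PyTorch binding.
    #include <torch/extension.h>

    __global__ void scale_kernel(const float* in, float* out, float s, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) out[i] = s * in[i];
    }

    torch::Tensor scale(torch::Tensor x, float s) {
        TORCH_CHECK(x.is_cuda(), "x must be a CUDA tensor");
        auto out = torch::empty_like(x);
        int n = x.numel();
        // triple-chevron launch syntax: CUDA-only, handled by nvcc
        scale_kernel<<<(n + 255) / 256, 256>>>(
            x.data_ptr<float>(), out.data_ptr<float>(), s, n);
        return out;
    }

    PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
        m.def("scale", &scale, "scale a tensor on the GPU (CUDA only)");
    }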
Google is working on integrating CUDA support directly into TensorFlow, with the end goal of letting you use any GPU for your computation. Basically a leapfrog over the inherently closed system that Nvidia has built.
I’m going to be glad when that finally happens. It’s not healthy for machine learning to be so dependent on a single provider like that.
By the time Google v. Nvidia hits the highest courts, those will be stacked with AI overlords. I wonder what they'd think about being stuck in a proprietary API. :D
IIRC ROCm (in the form of HIP) defines a new C/C++ API that maps to either AMD intrinsics or CUDA, depending on a compile-time flag.
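Something like this (a minimal sketch; saxpy and the sizes are made up, and I'm assuming current HIP, where the platform selection shows up as __HIP_PLATFORM_AMD__ vs. __HIP_PLATFORM_NVIDIA__):

    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    // y = a*x + y, one element per thread
    __global__ void saxpy(int n, float a, const float* x, float* y) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) y[i] = a * x[i] + y[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> hx(n, 1.0f), hy(n, 2.0f);
        float *x, *y;
        hipMalloc(&x, n * sizeof(float));
        hipMalloc(&y, n * sizeof(float));
        hipMemcpy(x, hx.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(y, hy.data(), n * sizeof(float), hipMemcpyHostToDevice);
        // the same launch compiles to a CUDA launch when targeting
        // Nvidia and to a ROCm launch when targeting AMD; the source
        // doesn't change, only the compile-time platform flag does.
        hipLaunchKernelGGL(saxpy, dim3((n + 255) / 256), dim3(256), 0, 0,
                           n, 2.0f, x, y);
        hipMemcpy(hy.data(), y, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("y[0] = %f\n", hy[0]);  // expect 4.0
        hipFree(x);
        hipFree(y);
    }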
It required converting your CUDA source code to HIP code, though there was a code translation tool (hipify) to help you with that.
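The translation is mostly mechanical renaming of the runtime calls, something like this (illustrative snippet, not real project code):

    // CUDA original:
    cudaMalloc(&x, n * sizeof(float));
    cudaMemcpy(x, h_x, n * sizeof(float), cudaMemcpyHostToDevice);

    // after hipify:
    hipMalloc(&x, n * sizeof(float));
    hipMemcpy(x, h_x, n * sizeof(float), hipMemcpyHostToDevice);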
To be honest, I don't really understand what ROCm even stands for. AMD has redefined its GPU compute platform so many times that it's easy to lose track.
Yeah, I'm sure stuff like this can work without a code rewrite, but my guess is that it's far from plug and play. I couldn't use it to run my model on an AMD card tomorrow without some effort.