The list below shows some of the more important optimizations for GPUs... A few of them have not been upstreamed due to lack of a customizable target-independent optimization pipeline.
So the LLVM version of gpucc will be incomplete? Will there be a release of the original stand-alone gpucc?
Yes, it is currently incomplete, but I'd say at least 80% of the optimizations are upstreamed already. Also, folks in the LLVM community are actively working on that. For example, Justin Lebar recently landed http://reviews.llvm.org/D18626, which added the speculative-execution pass to -O3.
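For anyone unfamiliar with that pass, here's a rough before/after sketch of the kind of transformation it performs (my own illustrative example, not taken from the patch; the function names are made up):

```cuda
// Hypothetical sketch of speculative execution. Cheap, side-effect-free
// instructions are hoisted out of a conditional block.
__device__ float before(float a, float b, float c, bool cond, float x) {
    if (cond)
        x = a * b + c;   // computed only on the taken path
    return x;
}

__device__ float after(float a, float b, float c, bool cond, float x) {
    float t = a * b + c; // speculated: cheap and safe to compute always
    if (cond)
        x = t;           // the branch is now trivially if-convertible
    return x;
}
```

On GPUs this matters more than on CPUs, since flattening small divergent branches like this can reduce warp divergence and enable further if-conversion downstream.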
Regarding performance, one thing worth noting is that missing one optimization does not necessarily cause significant slowdown on the benchmarks you care about. For example, the memory-space alias analysis only noticeably affects one benchmark in the Rodinia benchmark suite.
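To give a sense of why memory-space alias analysis helps at all, here's a hypothetical CUDA sketch (names and kernel are my own, purely illustrative): once the compiler knows one pointer refers to shared memory and another to global memory, it can prove the two never alias and keep values in registers instead of reloading them.

```cuda
// Hypothetical example: `tile` lives in shared memory and `out` in
// global memory, so stores through `out` can never clobber `tile`.
// Without memory-space alias analysis, the compiler must conservatively
// assume they might alias and reload tile[i] from memory.
__global__ void scale(const float *in, float *out, float factor) {
    __shared__ float tile[256];
    int i = threadIdx.x;
    tile[i] = in[i] * factor;  // store to shared memory
    out[i]  = tile[i];         // store to global memory
    out[i] += tile[i];         // with memory-space AA, tile[i] can stay
                               // in a register for this second use
}
```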
Regarding your second question, the short answer is no. The Clang/LLVM version uses a different architecture from the internal version (as described in http://wujingyue.com/docs/gpucc-talk.pdf). It offers better functionality, compiles faster, and is much easier to maintain and improve going forward. Upstreaming the internal version would cost even more effort than making all the optimizations work with the new architecture.
In fact, I think almost everything other than the memory-space alias analysis and a few pass-tuning tweaks is in at the moment. I know the former will be difficult to land, and I suspect the latter may be as well.
I don't have a lot of benchmarks at the moment, so I can't say how important they are. And it of course depends on what you're doing.
Clang/LLVM's CUDA implementation shares most of its backend with gpucc, but the front-end is entirely new. It works for TensorFlow, Eigen, and Thrust, though I suspect that if you try hard enough you'll find something nvcc accepts that we can't compile. At the moment we're pretty focused on making it work well for TensorFlow.
Thanks for the clarification! It's always a pleasure to get a direct response from the first author on something as awesome as this.
I'm definitely subscribing to the llvm-dev list in case any discussion on this continues there. There are also the llvm-commits, clang-dev, and clang-commits lists, but llvm-dev kinda seems like the right place for this.
Gpucc in LLVM is definitely a breath of fresh air for all of us nvcc users. Getting to see some compiler internals for CUDA feels like Christmas. A big thanks from me for all the upstreaming effort!