Forget about Xilinx marketing. I am curious: what did you find causes a design to fall outside the '20%' use-case situation? Are you talking about asynchronous clocks? Feedback? IO configurations? Or what? Why do you say HLS should not be based on C++? Is this related to concurrency or something else?
I am not necessarily disagreeing with you. I would say that C++ has the good/bad quality that it is possible to express the same thing 20 different ways. Also, the behavioral simulation is vastly faster. Given that a design realistically has to be done in a limited amount of time, there is an advantage to being able to iterate rapidly and make many structural changes to optimize a large design for both area and clock frequency.
Only trivial designs or very specific blocks would be hand placed. I want the compiler to do register re-balancing and other optimizations, the same way that almost no one could beat the performance of a modern C compiler by typing in machine code. Vivado is definitely not there yet, but it should be.
There are many examples: a PCI Express bus, an application-optimized DDR controller, a full TCP/IP stack, a caching/prefetch system, or any advanced processor with feedback... these kinds of systems require precise control.
It can be done, but then all the advantages of HLS are gone. The code ends up filled with a ton of pragmas that make it unreadable and a lot longer than the VHDL or SV equivalent.
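To give a rough idea of what the pragma clutter looks like, here is a minimal sketch of a 4-tap FIR filter written in the Vivado/Vitis HLS style. The function name `fir`, the tap count `kTaps`, and the particular choice of directives are assumptions made up for illustration, not taken from any real design. An ordinary C++ compiler ignores the `#pragma HLS` lines, which is also why the behavioral simulation runs as plain software.

```cpp
#include <cstddef>

constexpr std::size_t kTaps = 4;  // hypothetical tap count, for illustration only

// Hypothetical 4-tap FIR filter. The arithmetic is a handful of lines;
// the rest is directives telling the HLS tool what hardware to build.
int fir(int sample, const int coeff[kTaps]) {
#pragma HLS INTERFACE ap_ctrl_none port=return
#pragma HLS PIPELINE II=1
    static int shift_reg[kTaps] = {0};  // delay line, kept between calls
#pragma HLS ARRAY_PARTITION variable=shift_reg complete dim=1

    // Shift the delay line by one position.
    for (std::size_t i = kTaps - 1; i > 0; --i) {
#pragma HLS UNROLL
        shift_reg[i] = shift_reg[i - 1];
    }
    shift_reg[0] = sample;

    // Multiply-accumulate over all taps.
    int acc = 0;
    for (std::size_t i = 0; i < kTaps; ++i) {
#pragma HLS UNROLL
        acc += shift_reg[i] * coeff[i];
    }
    return acc;
}
```

Even in a block this small, nearly a third of the body is directives, and a design like the ones listed above would need far more of them to pin down interfaces, latencies, and memory layout.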
Register rebalancing (other companies call it retiming) is a very old technique. You can do it with SV and VHDL: just add delays and the synthesizer will know what to do. Vivado has caught up with the solutions from Altera, but there are better (more expensive) synthesizers that easily beat both; they have supported this feature for at least 15 years.
Uh, yes, that is all true (though the processor-with-feedback bit is maybe debatable). I agree with all this, and yet my arguments for why C++ HLS is a good thing remain the same.