> For the usual situations where the compiler would use a single conditional move on any other ISA it now needs multiple instructions
Only if you need the full properties of cmove. In many cases it just generates a single Zicond instruction.
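To make that concrete, here is a rough C model of the two Zicond instructions (`czero.eqz` and `czero.nez`) and a few select patterns built from them. The helper names (`select_or_zero`, `select_full`, `cond_add`) are made up for illustration, and the instruction counts in the comments are what I'd expect a compiler to emit, not something measured.

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1 */
static inline uint64_t czero_eqz(uint64_t rs1, uint64_t rs2) {
    return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1 */
static inline uint64_t czero_nez(uint64_t rs1, uint64_t rs2) {
    return rs2 != 0 ? 0 : rs1;
}

/* cond ? a : 0
 * One Zicond instruction: czero.eqz rd, a, cond */
static inline uint64_t select_or_zero(uint64_t cond, uint64_t a) {
    return czero_eqz(a, cond);
}

/* cond ? a + b : a
 * Expected: czero.eqz t, b, cond ; add rd, a, t
 * The czero takes the place of the single cmov another ISA would use. */
static inline uint64_t cond_add(uint64_t cond, uint64_t a, uint64_t b) {
    return a + czero_eqz(b, cond);
}

/* cond ? a : b (the "full" conditional move)
 * Expected: czero.eqz t0, a, cond ; czero.nez t1, b, cond ; or rd, t0, t1 */
static inline uint64_t select_full(uint64_t cond, uint64_t a, uint64_t b) {
    return czero_eqz(a, cond) | czero_nez(b, cond);
}
```

Only `select_full` needs the three-instruction sequence; the other two patterns need a single Zicond instruction where a cmov would otherwise go.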
While some companies implement a 3R1W integer pipeline and use fusion, others keep the integer side 2R1W.
If you use 2R1W you can get wider issue for the same area. If you have a four-issue integer pipeline, you may be able to add a fifth integer execution unit for less than it would cost to move to 3R1W, and that may give you a higher performance gain.
"3R1W integer pipeline" is kinda ambiguous; I think it'd be extremely stupid for any core to have all its ALUs be 3R. Much more sane is having ~half be such (if even that), and the rest at 2R.
Or, better yet, have the extra 3R read port come from splitting up some of the 2R ALUs; e.g. for a block of 3×2R1W ALUs, be able to split one up for its read ports and run the block as 2×3R1W when needed. That gives 3R1W at 66% of the throughput of 2R1W without any extra register ports (i.e. a ~1.33x throughput benefit for one 3R1W instruction over two 2R1W instructions). It probably has some extra costs from scheduling & co needing to handle 3R, though.
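The 66% and ~1.33x figures follow from counting read ports. A tiny C sketch of that arithmetic, under the assumption that a block's read ports can be pooled freely and that a 3-source operation stands in for exactly two 2-source instructions (function names are made up for illustration):

```c
/* Number of 3-source ops per cycle a block of 2R1W ALUs can issue
 * if its read ports are pooled (assumption: ports are freely shareable). */
static inline int ops_3r_per_cycle(int n_2r_alus) {
    return (n_2r_alus * 2) / 3;
}

/* 3R throughput relative to the block's normal 2R throughput. */
static inline double rel_3r_throughput(int n_2r_alus) {
    return (double)ops_3r_per_cycle(n_2r_alus) / n_2r_alus;
}

/* Gain of issuing one 3R op instead of a pair of 2R instructions:
 * the block retires n/2 pairs per cycle in the 2R-only case. */
static inline double gain_over_2r_pairs(int n_2r_alus) {
    return ops_3r_per_cycle(n_2r_alus) / (n_2r_alus / 2.0);
}
```

For a block of 3 ALUs this gives 2 three-source ops per cycle, i.e. 2/3 of the 2R throughput and a 4/3 gain over issuing pairs of 2R instructions.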