Uh, the ALU implements a DIV instruction directly in hardware? Is it normal to have that as a real instruction in something like a modern CUDA core, or is DIV usually emulated in software instead? Actual hardware divide circuits take up a ton of space, and I wouldn't have expected one in a GPU ALU.
It's so easy to write "DIV: begin alu_out_reg <= rs / rt; end" in your Verilog, but that one line takes a lot of silicon. And the person simulating this might never see that if all they do is simulate the Verilog.
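To give a rough idea of what that one `/` hides, here's a minimal Python sketch of restoring (shift-and-subtract) division, one common textbook way to build an integer divider. This is my own illustration, not what this project or any particular synthesis tool actually generates; the name `restoring_divide` and the 32-bit default width are assumptions.

```python
def restoring_divide(dividend: int, divisor: int, width: int = 32):
    """Unsigned restoring division, producing one quotient bit per step.

    Hypothetical software model of a shift-and-subtract divider;
    `width` is the assumed operand width in bits.
    """
    mask = (1 << width) - 1
    dividend &= mask
    divisor &= mask
    if divisor == 0:
        raise ZeroDivisionError("divisor is zero")

    remainder = 0
    quotient = 0
    # One iteration per bit, most significant bit first.
    for i in range(width - 1, -1, -1):
        # Shift the next dividend bit into the partial remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        # Trial subtraction: keep the result only if it does not go negative.
        if remainder >= divisor:
            remainder -= divisor
            quotient |= 1 << i
    return quotient, remainder


if __name__ == "__main__":
    q, r = restoring_divide(100, 7)
    print(q, r)
```

Each pass through the loop is a full-width compare and subtract. Written as a single-cycle `rs / rt`, all 32 of those steps have to exist as back-to-back logic, whereas a multi-cycle divider reuses one subtractor over roughly 32 clocks. Either way, none of that cost shows up in a plain simulation.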