You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
{{ message }}
This repository has been archived by the owner on Sep 2, 2023. It is now read-only.
The main question is how accurate the approximation should be. A common choice in other ISAs appears to be ~8 bits (the CRAY-1 produced 30 bits, though). We could use different levels of accuracy for different floating-point formats.
Here are a few different possibilities:
Approximation
NR steps for float32
NR steps for float16
NR steps for float8
8
2
1
0
11
2
0
0
12-13
1
0
0
For the instructions to be valuable, they should offer a reasonable improvement over the classic hack:
floatapproximate_rsqrt(float x) {
auto i = std::bit_cast<uint32_t>(x);
i = 0x5f3759df - (i >> 1);
return std::bit_cast<float>(i);
}
...which is very cheap on MRISC32 (only 4 integer instructions without any latencies):
ldi r2, #0x5f3759df ; ldi + or
lsr r1, r1, #1
sub r1, r2, r1
Reciprocal improvements
The approximation can be improved using Newton-Raphson iterations. These iterations can be implemented using regular floating-point arithmetic operations, but it's also possible to provide dedicated instructions that does the majority of the work of an iteration in a single instruction.
The main advantages of such instructions are:
Reduce the latency of consecutive, dependent NR operations.
Support 1/0 = Inf (during NR Inf*0 must result in Inf rather than NaN, which would have been the case without a dedicated instruction).
Possibly improved accuracy as the instruction performs a fused multiply-add.
These instructions would have to go into the format A encoding space.
Reciprocal
One NR step for improving y = 1 / x is: y = y * (2 - x * y).
A suitable instruction (based on FMA) should implement 2 - a * b.
Reciprocal square root
One NR step for improving y = 1 / sqrt(x) is: y = y * (1.5 - (0.5 * x * y * y)).
A suitable instruction (based on FMA) should implement (3 - a * b) / 2.
The text was updated successfully, but these errors were encountered:
Sign up for freeto subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Reciprocal approximations
The following instructions are of special interest, and could make it into the B format encoding space:
The main question is how accurate the approximation should be. A common choice in other ISAs appears to be ~8 bits (the CRAY-1 produced 30 bits, though). We could use different levels of accuracy for different floating-point formats.
Here are a few different possibilities:
For the instructions to be valuable, they should offer a reasonable improvement over the classic hack:
...which is very cheap on MRISC32 (only 4 integer instructions without any latencies):
Reciprocal improvements
The approximation can be improved using Newton-Raphson iterations. These iterations can be implemented using regular floating-point arithmetic operations, but it's also possible to provide dedicated instructions that does the majority of the work of an iteration in a single instruction.
The main advantages of such instructions are:
Inf*0
must result inInf
rather thanNaN
, which would have been the case without a dedicated instruction).These instructions would have to go into the format A encoding space.
Reciprocal
One NR step for improving
y = 1 / x
is:y = y * (2 - x * y)
.A suitable instruction (based on FMA) should implement
2 - a * b
.Reciprocal square root
One NR step for improving
y = 1 / sqrt(x)
is:y = y * (1.5 - (0.5 * x * y * y))
.A suitable instruction (based on FMA) should implement
(3 - a * b) / 2
.The text was updated successfully, but these errors were encountered: