
Add instructions for approximate floating-point reciprocals #140

mbitsnbites opened this issue Mar 25, 2022 · 0 comments

mbitsnbites commented Mar 25, 2022

Reciprocal approximations

The following instructions are of special interest, and could make it into the B format encoding space:

  • FRECPE - Floating-point reciprocal estimate (estimate 1.0/x)
  • FRSQRTE - Floating-point reciprocal square root estimate (estimate 1.0/sqrt(x))

The main question is how accurate the approximation should be. A common choice in other ISAs appears to be ~8 bits (the CRAY-1 produced 30 bits, though). We could use different levels of accuracy for different floating-point formats.

Here are a few different possibilities (each Newton-Raphson (NR) iteration roughly doubles the number of correct bits, so the columns show how many iterations are needed to reach full precision in each format):

Approximation (bits)   NR steps for float32   NR steps for float16   NR steps for float8
8                      2                      1                      0
11                     2                      0                      0
12-13                  1                      0                      0

For the instructions to be valuable, they should offer a reasonable improvement over the classic hack:

#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

float approximate_rsqrt(float x) {
    auto i = std::bit_cast<std::uint32_t>(x);
    i = 0x5f3759df - (i >> 1);  // magic-constant initial guess for 1.0/sqrt(x)
    return std::bit_cast<float>(i);
}

...which is already very cheap on MRISC32 (only 4 integer instructions and no long-latency operations):

        ldi     r2, #0x5f3759df  ; load magic constant (32-bit immediate: ldi + or)
        lsr     r1, r1, #1       ; i >> 1
        sub     r1, r2, r1       ; 0x5f3759df - (i >> 1)
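
For reference, a minimal sketch of checking the raw estimate against the exact value (assuming the approximate_rsqrt() definition above is in scope; the exact figure depends on the input, but the un-refined estimate is only good to a few percent):

#include <cmath>
#include <cstdio>

int main() {
    const float x = 2.0f;
    const float approx = approximate_rsqrt(x);
    const float exact = 1.0f / std::sqrt(x);
    // The raw bit-hack estimate is only accurate to a few percent;
    // NR refinement steps (see below) tighten it further.
    std::printf("approx = %f, exact = %f, rel. err = %e\n",
                approx, exact, (approx - exact) / exact);
    return 0;
}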

Reciprocal improvements

The approximation can be improved using Newton-Raphson iterations. These iterations can be implemented using regular floating-point arithmetic operations, but it's also possible to provide dedicated instructions that do the majority of the work of an iteration in a single instruction.

The main advantages of such instructions are:

  • Reduce the latency of consecutive, dependent NR operations.
  • Support 1/0 = Inf (during an NR iteration, Inf * 0 must produce Inf rather than NaN, which is what regular floating-point multiplies would give).
  • Possibly improved accuracy as the instruction performs a fused multiply-add.

These instructions would have to go into the format A encoding space.

Reciprocal

One NR step for improving y = 1 / x is: y = y * (2 - x * y).

A suitable instruction (based on FMA) should implement 2 - a * b.
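
As a rough C++ sketch of one refinement step, modelling the proposed 2 - a * b operation with a fused multiply-add (the helper names frecp_step and recip_refine are illustrative only, not part of the ISA):

#include <cmath>

// Hypothetical model of the proposed format A instruction: 2 - a*b,
// evaluated as a single fused multiply-add.
inline float frecp_step(float a, float b) {
    return std::fma(-a, b, 2.0f);
}

// One NR refinement of an initial estimate y0 ~ 1/x (e.g. from FRECPE):
// y1 = y0 * (2 - x*y0).
inline float recip_refine(float x, float y0) {
    return y0 * frecp_step(x, y0);
}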

Reciprocal square root

One NR step for improving y = 1 / sqrt(x) is: y = y * (1.5 - (0.5 * x * y * y)).

A suitable instruction (based on FMA) should implement (3 - a * b) / 2.
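
Analogously, a rough sketch of the refinement step, again modelling the proposed (3 - a * b) / 2 operation with a hypothetical helper (names are illustrative only):

#include <cmath>

// Hypothetical model of the proposed format A instruction: (3 - a*b) / 2,
// with the multiply-subtract done as a fused multiply-add.
inline float frsqrt_step(float a, float b) {
    return std::fma(-a, b, 3.0f) * 0.5f;
}

// One NR refinement of an initial estimate y0 ~ 1/sqrt(x),
// e.g. the output of approximate_rsqrt() above:
// y1 = y0 * (1.5 - 0.5*x*y0*y0).
inline float rsqrt_refine(float x, float y0) {
    return y0 * frsqrt_step(x * y0, y0);
}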
