
Add instructions for approximate floating-point reciprocals #140

mbitsnbites opened this issue Mar 25, 2022 · 0 comments

mbitsnbites commented Mar 25, 2022

Reciprocal approximations

The following instructions are of special interest, and could make it into the B format encoding space:

  • FRECPE - Floating-point reciprocal estimate (estimate 1.0/x)
  • FRSQRTE - Floating-point reciprocal square root estimate (estimate 1.0/sqrt(x))

The main question is how accurate the approximation should be. A common choice in other ISAs appears to be ~8 bits (the CRAY-1 produced 30 bits, though). We could use different levels of accuracy for different floating-point formats.

Here are a few different possibilities (each Newton-Raphson (NR) iteration roughly doubles the number of correct bits, so the columns show how many iterations are needed to reach full precision in each format):

Approximation (bits)   NR steps for float32   NR steps for float16   NR steps for float8
8                      2                      1                      0
11                     2                      0                      0
12-13                  1                      0                      0

For the instructions to be valuable, they should offer a reasonable improvement over the classic hack:

#include <bit>      // std::bit_cast (C++20)
#include <cstdint>

float approximate_rsqrt(float x) {
    auto i = std::bit_cast<std::uint32_t>(x);
    i = 0x5f3759df - (i >> 1);  // magic-constant initial guess for 1.0/sqrt(x)
    return std::bit_cast<float>(i);
}

...which is already very cheap on MRISC32 (only 4 integer instructions and no long-latency operations):

        ldi     r2, #0x5f3759df  ; load magic constant (32-bit immediate: ldi + or)
        lsr     r1, r1, #1       ; i >> 1
        sub     r1, r2, r1       ; 0x5f3759df - (i >> 1)
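
For reference, a minimal sketch of checking the raw estimate against the exact value (assuming the approximate_rsqrt() definition above is in scope; the exact figure depends on the input, but the un-refined estimate is only good to a few percent):

#include <cmath>
#include <cstdio>

int main() {
    const float x = 2.0f;
    const float approx = approximate_rsqrt(x);
    const float exact = 1.0f / std::sqrt(x);
    // The raw bit-hack estimate is only accurate to a few percent;
    // NR refinement steps (see below) tighten it further.
    std::printf("approx = %f, exact = %f, rel. err = %e\n",
                approx, exact, (approx - exact) / exact);
    return 0;
}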

Reciprocal improvements

The approximation can be improved using Newton-Raphson iterations. These iterations can be implemented using regular floating-point arithmetic operations, but it's also possible to provide dedicated instructions that do the majority of the work of an iteration in a single instruction.

The main advantages of such instructions are:

  • Reduce the latency of consecutive, dependent NR operations.
  • Support 1/0 = Inf (during an NR iteration, Inf * 0 must produce Inf rather than NaN, which is what regular floating-point multiplies would give).
  • Possibly improved accuracy as the instruction performs a fused multiply-add.

These instructions would have to go into the format A encoding space.

Reciprocal

One NR step for improving y = 1 / x is: y = y * (2 - x * y).

A suitable instruction (based on FMA) should implement 2 - a * b.
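
As a rough C++ sketch of one refinement step, modelling the proposed 2 - a * b operation with a fused multiply-add (the helper names frecp_step and recip_refine are illustrative only, not part of the ISA):

#include <cmath>

// Hypothetical model of the proposed format A instruction: 2 - a*b,
// evaluated as a single fused multiply-add.
inline float frecp_step(float a, float b) {
    return std::fma(-a, b, 2.0f);
}

// One NR refinement of an initial estimate y0 ~ 1/x (e.g. from FRECPE):
// y1 = y0 * (2 - x*y0).
inline float recip_refine(float x, float y0) {
    return y0 * frecp_step(x, y0);
}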

Reciprocal square root

One NR step for improving y = 1 / sqrt(x) is: y = y * (1.5 - (0.5 * x * y * y)).

A suitable instruction (based on FMA) should implement (3 - a * b) / 2.
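
Analogously, a rough sketch of the refinement step, again modelling the proposed (3 - a * b) / 2 operation with a hypothetical helper (names are illustrative only):

#include <cmath>

// Hypothetical model of the proposed format A instruction: (3 - a*b) / 2,
// with the multiply-subtract done as a fused multiply-add.
inline float frsqrt_step(float a, float b) {
    return std::fma(-a, b, 3.0f) * 0.5f;
}

// One NR refinement of an initial estimate y0 ~ 1/sqrt(x),
// e.g. the output of approximate_rsqrt() above:
// y1 = y0 * (1.5 - 0.5*x*y0*y0).
inline float rsqrt_refine(float x, float y0) {
    return y0 * frsqrt_step(x * y0, y0);
}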
