Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Decompiler erroneously swapping operators without changing the operand in floating-point comparisons where data is marked constant (x86) #6528

Open
K1TCH opened this issue May 16, 2024 · 1 comment · May be fixed by #6540
Assignees
Labels
Feature: Decompiler Status: Triage Information is being gathered

Comments

@K1TCH
Copy link

K1TCH commented May 16, 2024

Describe the bug
When a float variable is being compared with a double variable using one of the FCOM instructions, if the double is made constant, the comparator will flip directions, rendering the comparison incorrect.

For example:

if (x < const_1.5) {
    return 0;
}

will become

if (1.5 < x) {
    return 0;
}

To Reproduce
Steps to reproduce the behavior:

  1. Load the attached binary (I used x86:LE:32:default)
  2. Make the value at address 0x0 a double
  3. Make the value at address 0x8 a float
  4. Observe the decompiler
  5. Right-click the double in the listing view
  6. Select Data -> Settings
  7. Click 'normal' in the Mutability settings
  8. Click constant
  9. Click OK
  10. Observe the decompiler again. The operands should be flipped, while the operator stays the same.

Expected behavior
I expected the constant to be placed last in the comparison, flipping the operands and operators as necessary to make the statement still be correct.

Screenshots
Original:
image
After marking the double value as constant:
image

Attachments
fcompError

Environment (please complete the following information):

  • OS: Ubuntu 23.04
  • Java Version: openjdk 17.0.8 2023-07-18
  • Ghidra Version: 10.2.3
  • Ghidra Origin: Official Github release

Additional context
I've also seen the bug on Windows 10 with versions 10.3.2, 10.4, 11.0, and 11.0.2

LukeSerne added a commit to LukeSerne/ghidra that referenced this issue May 19, 2024
The decompiler rule `subflow_convert` would sometimes swap the inputs
to the P-Code ops `FLOAT_LESS` and `FLOAT_LESSEQUAL` if the float that
was traced happened to be the second input of the operation, because
the transformed operation had its inputs hardcoded: the traced float
would always be the first input. While this also affected `FLOAT_EQUAL`
and `FLOAT_NOTEQUAL`, it does not matter in those cases, because
swapping the inputs for those operations is still logically equivalent.

Fixes NationalSecurityAgency#6528.
LukeSerne added a commit to LukeSerne/ghidra that referenced this issue May 19, 2024
The decompiler rule `subflow_convert` would sometimes swap the inputs
to the P-Code ops `FLOAT_LESS` and `FLOAT_LESSEQUAL` if the float that
was traced happened to be the second input of the operation, because
the transformed operation had its inputs hardcoded: the traced float
would always be the first input. While this also affected `FLOAT_EQUAL`
and `FLOAT_NOTEQUAL`, it does not matter in those cases, because
swapping the inputs for those operations is still logically equivalent.

Fixes NationalSecurityAgency#6528.
@LukeSerne
Copy link
Contributor

Reproducing on Ghidra 11.0.3

I loaded the attached file as x86:LE:32:default:gcc on Ghidra 11.0.3 (Github release) on Ubuntu 22.04.

Here's the decompilation at step 4 (the double is not constant):
image

Here, the decompiled condition for the case in which 0xffffffff is returned is in fact equivalent to (Y > X) && !NAN(Y) && !NAN(X), as the following derivation shows, with comments between parentheses. It seems the decompiler is not able to simplify this further.

    (byte)(DOUBLE_00000000 < (double)FLOAT_00000008 | (byte)((ushort)((ushort)(NAN(DOUBLE_00000000) || NAN((double)FLOAT_00000008)) << 10) >> 8) | (byte)((ushort)((ushort)(DOUBLE_00000000 == (double)FLOAT_00000008) << 0xe) >> 8)) == 0
 = { '(byte)x' == 'x & 0xFF'  and  let 'X' := '(double)FLOAT_00000008'  and  let 'Y' := 'DOUBLE_00000000' }
    (Y < X | ((ushort)((ushort)(NAN(Y) || NAN(X)) << 10) >> 8) & 0xFF | ((ushort)((ushort)(Y == X) << 14) >> 8) & 0xFF) & 0xFF == 0
 = { 'a | b | c == 0' => '(a == 0) && (b == 0) && (c == 0)' }
    (Y < X) == 0 && (((ushort)((ushort)(NAN(Y) || NAN(X)) << 10) >> 8) & 0xFF == 0) && (((ushort)((ushort)(Y == X) << 14) >> 8) & 0xFF == 0)
 = { ('(a < b) == 0' => '! (a < b)')  and  ('((a << b) >> c)' == 'a << (b - c)' if c < b)  and  removing of unnecessary casts }
    !(Y < X) && (((NAN(Y) || NAN(X)) << 2) == 0) && (((Y == X) << 6) == 0)
 = { 'a << b == 0' => 'a == 0'  and  '(a == b) == 0' => '! (a == b)' }
    !(Y < X) && ((NAN(Y) || NAN(X)) == 0) && !(Y == X)
 = { 'a && b && c' => 'a && c && b' }
    !(Y < X) && !(Y == X) && ((NAN(Y) || NAN(X)) == 0)
 = { ('! (a < b)' => 'a >= b')  and  ('!(a == b)' => 'a != b') }
    (Y >= X) && (Y != X) && ((NAN(Y) || NAN(X)) == 0)
 = { '(a >= b) && (a != b)' => 'a > b' }
    (Y > X) && ((NAN(Y) || NAN(X)) == 0)
 = { '(a || b) == 0' => '!a && !b' }
    (Y > X) && !NAN(Y) && !NAN(X)

And here's the decompilation at step 10 (the double is constant):
image

Here, the decompiled condition for the case in which 0xffffffff is returned is in fact equivalent to (X > 1.5) && !NAN(X), as the following derivation shows, with comments between parentheses. It seems the decompiler is not able to simplify this further.

	(byte)(FLOAT_00000008 < 1.5 | (byte)((ushort)((ushort)NAN(FLOAT_00000008) << 10) >> 8) | (byte)((ushort)((ushort)(FLOAT_00000008 == 1.5) << 0xe) >> 8)) == 0
 = { '(byte)x' == 'x & 0xFF'  and  let 'X' := 'FLOAT_00000008' }
	(X < 1.5 | ((ushort)(NAN(X) << 10) >> 8) & 0xFF | ((ushort)((X == 1.5) << 14) >> 8) & 0xFF) & 0xFF == 0
 = { 'a | b | c == 0' => '(a == 0) && (b == 0) && (c == 0)' } 
	((X < 1.5) == 0) && (((ushort)(NAN(X) << 10) >> 8) & 0xFF == 0) && (((ushort)((X == 1.5) << 14) >> 8) & 0xFF == 0)
 = { ('(a < b) == 0' => '! (a < b)')  and  ('((a << b) >> c)' == 'a << (b - c)' if c < b)  and  removing of unnecessary casts }
	!(X < 1.5) && (NAN(X) << 2 == 0) && ((X == 1.5) << 6 == 0)
 = { 'a << b == 0' => 'a == 0' }
	!(X < 1.5) && !NAN(X) && !(X == 1.5)
 = { 'a && b && c' => 'a && c && b' }
	!(X < 1.5) && !(X == 1.5) && !NAN(X)
 = { ('! (a < b)' => 'a >= b')  and  ('!(a == b)' => 'a != b') }
	(X >= 1.5) && (X != 1.5) && !NAN(X)
 = { '(a >= b) && (a != b)' => 'a > b' }
	(X > 1.5) && !NAN(X)

So, if the DOUBLE_00000000 is not set as constant (and ignoring the possibility that either float might be NaN), the condition is equivalent to (DOUBLE_00000000 > FLOAT_00000008). If the DOUBLE_00000000 is set as constant, the condition is equivalent to (FLOAT_00000008 > DOUBLE_00000000). So, the bug still exists in Ghidra version 11.0.3.

I also created two xml files for debugging the decompilation, one with the double's mutability set to normal, and one with the mutability set to constant:

fcomp_error_normal.xml
fcomp_error_constant.xml

Tracking down the responsible decompiler rule

Loading the "constant" file in DecompVis, we find that the 49th rule application (of rule subflow_convert) flips the order of the operands to the FLOAT_LESS (incorrectly shown as INT_LESS in the below screenshots because #5063 hasn't been merged yet).

Before the application of rule subflow_convert:
image

After the application of rule subflow_convert:
image

Finding and fixing the bug in subflow_convert

The subflow_convert rule is implemented by the class RuleSubfloatConvert, which uses the class SubfloatFlow to do the analysis and data flow transformation. The documentation of this rule mentions:

/// \brief Class for tracing changes of precision in floating point variables
///
/// It follows the flow of a logical lower precision value stored in higher precision locations
/// and then rewrites the data-flow in terms of the lower precision, eliminating the
/// precision conversions.

Looking through the implementation, we find this suspicious piece of code in traceForward (code from latest master at the time of writing):

case CPUI_FLOAT_EQUAL:
case CPUI_FLOAT_NOTEQUAL:
case CPUI_FLOAT_LESS:
case CPUI_FLOAT_LESSEQUAL:
{
int4 slot = op->getSlot(vn);
TransformVar *rvn2 = setReplacement(op->getIn(1-slot));
if (rvn2 == (TransformVar *)0) return false;
if (rvn == rvn2) {
list<PcodeOp *>::const_iterator ourIter = iter;
--ourIter; // Back up one to our original iterator
slot = op->getRepeatSlot(vn, slot, ourIter);
}
if (preexistingGuard(slot, rvn2)) {
TransformOp *rop = newPreexistingOp(2, op->code(), op);
opSetInput(rop, rvn, 0);
opSetInput(rop, rvn2, 1);
terminatorCount += 1;
}
break;
}

On lines 2655 and 2656, a transform operation is created, and rvn and rvn2 are set to use slot 0 and 1 respectively, regardless of what their input slot is in the current operation. Changing this locally to use slot and 1-slot and then recompiling the decompiler, we find that this indeed fixes the issue - the 1.5 constant is on the left of the < operator:

$ ./decomp_dbg
[decomp]> restore /tmp/fcomp_error_constant.xml.txt
/tmp/fcomp_error_constant.xml.txt successfully loaded: Intel/AMD 32-bit x86
[decomp]> lo fu FUN_0000000c
Function FUN_0000000c: 0x0000000c
[decomp]> decompile
Decompiling FUN_0000000c
Decompilation complete
[decomp]> print C

undefined4 FUN_0000000c(void)

{
  undefined4 uVar1;
  
  if ((byte)(1.5 < FLOAT_00000008 | (byte)((ushort)((ushort)NAN(FLOAT_00000008) << 10) >> 8) |
            (byte)((ushort)((ushort)(FLOAT_00000008 == 1.5) << 0xe) >> 8)) == 0) {
    uVar1 = 0xffffffff;
  }
  else {
    uVar1 = 1;
  }
  return uVar1;
}

I created a PR with this change (#6540) that fixes this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Feature: Decompiler Status: Triage Information is being gathered
Projects
None yet
4 participants