Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add return address stack location to context backtrace #1288 #1740

Open
wants to merge 9 commits into
base: dev
Choose a base branch
from
10 changes: 7 additions & 3 deletions pwndbg/commands/context.py
Original file line number Diff line number Diff line change
Expand Up @@ -41,7 +41,6 @@
"backtrace",
[
ColorParamSpec("prefix", "none", "color for prefix of current backtrace label"),
ColorParamSpec("address", "none", "color for backtrace (address)"),
ColorParamSpec("symbol", "none", "color for backtrace (symbol)"),
disconnect3d marked this conversation as resolved.
Show resolved Hide resolved
ColorParamSpec("frame-label", "none", "color for backtrace (frame label)"),
],
Expand Down Expand Up @@ -802,10 +801,15 @@ def context_backtrace(with_banner=True, target=sys.stdout, width=None):
while True:
prefix = bt_prefix if frame == this_frame else " " * len(bt_prefix)
prefix = f" {c.prefix(prefix)}"
addrsz = c.address(pwndbg.ui.addrsz(frame.pc()))
symbol = c.symbol(pwndbg.gdblib.symbol.get(int(frame.pc())))
frame_pc = frame.pc()
frame_pc_on_stack = pwndbg.gdblib.stack.find_addr_on_stack(frame_pc)
padded_text = "0x" + hex(frame_pc_on_stack)[2:].zfill(pwndbg.gdblib.arch.ptrsize * 2)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex(...) returns a 0x... string, so we don't need to [2:] it and prepend a 0x string

Also nitpick, but we fetch and compute pwndbg.gdblib.arch.ptrsize * 2 in each loop iteration :<

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hex(1234).zfill(8)
'0000x4d2'

i was trying to avoid this - perhaps theres a way to zfill an already 0x prefixed string?

@disconnect3d

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will hoist the computation out of the loop

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about this zfill. Like, do we really want to display those zeroes? :P They are redundant

stackaddr = M.get(frame_pc_on_stack, padded_text)
addrsz = "(" + stackaddr + ") " + M.get(frame_pc)
dmur1 marked this conversation as resolved.
Show resolved Hide resolved
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use f-string literals here and wherever we can?

symbol = c.symbol(pwndbg.gdblib.symbol.get(int(frame_pc)))
if symbol:
addrsz = addrsz + " " + symbol

line = map(str, (prefix, c.frame_label("%s%i" % (backtrace_frame_label, i)), addrsz))
line = " ".join(line)
result.append(line)
Expand Down
2 changes: 1 addition & 1 deletion pwndbg/commands/stack.py
Original file line number Diff line number Diff line change
Expand Up @@ -26,7 +26,7 @@ def retaddr() -> None:
# Find all of them on the stack
start = stack.vaddr
stop = start + stack.memsz
while addresses and start < sp < stop:
while addresses and start <= sp < stop:
dmur1 marked this conversation as resolved.
Show resolved Hide resolved
value = pwndbg.gdblib.memory.u(sp)

if value in addresses:
Expand Down
24 changes: 24 additions & 0 deletions pwndbg/gdblib/stack.py
Original file line number Diff line number Diff line change
Expand Up @@ -136,3 +136,27 @@ def is_executable() -> bool:
nx = True

return not nx


def find_addr_on_stack(addr) -> int:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a bit worried that this may be (way) too slow.

  • We are reading the whole stack memory: what if stack is super big?
  • What if SP does not point to mapped memory? This will crash
  • This can be optimized out in two ways:
  1. We could leverage the existing search functionality we have which will do a find using GDB's API which is written in C and so should be faster
  2. We could read the whole stack memory and use simple .find(...) to find the offset of the bytes of the addr value

The 2) will rather be slower than 1) but if we read the stack memory just once, then we can find the fetched bytes multiple times to look for different return addresses --- which would probably be faster overall?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given this I am not sure if we really want to look for those values on the stack... we are making context slower this way. We can however make this configurable.

@gsingh93 thoughts?

"""
Scans the stack looking to see if it contains the supplied address.

If the supplied address is found the location on the stack is returned or 0 if it cannot be found.
"""
stackaddr = 0

sp = pwndbg.gdblib.regs.sp
stack = pwndbg.gdblib.vmmap.find(sp)
start = stack.vaddr
stop = start + stack.memsz
ptrsize = pwndbg.gdblib.arch.ptrsize

while start <= sp < stop:
value = pwndbg.gdblib.memory.u(sp)
if value == addr:
stackaddr = sp
break
sp += ptrsize

return stackaddr
10 changes: 4 additions & 6 deletions tests/gdb-tests/tests/test_context_commands.py
Original file line number Diff line number Diff line change
Expand Up @@ -201,21 +201,19 @@ def test_context_backtrace_show_proper_symbol_names(start_binary):
== "─────────────────────────────────[ BACKTRACE ]──────────────────────────────────"
)

assert re.match(r".*0 0x[0-9a-f]+ A::foo\(int, int\)", backtrace[2])
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how important is it to tightly fit this regex to the entire output in a test that is mainly of proper symbol names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more accurate we are the less regressions we will have :). I am fine with more relaxed assertions: its better to have some tests than none and its hard to test for everything.

assert re.match(r".* A::foo\(int, int\)", backtrace[2])

# Match A::call_foo()+38 or similar: the offset may change so we match \d+ at the end
assert re.match(r".*1 0x[0-9a-f]+ A::call_foo\(\)\+\d+", backtrace[3])
assert re.match(r".* A::call_foo\(\)\+\d+", backtrace[3])

# Match main+87 or similar offset
assert re.match(r".*2 0x[0-9a-f]+ main\+\d+", backtrace[4])
assert re.match(r".* main\+\d+", backtrace[4])

# Match __libc_start_main+243 or similar offset
# Note: on Ubuntu 22.04 there will be __libc_start_call_main and then __libc_start_main
# but on older distros there will be only __libc_start_main
# Let's not bother too much about it and make it the last call assertion here
assert re.match(
r".*3 0x[0-9a-f]+ (__libc_start_main|__libc_start_call_main)\+\d+", backtrace[5]
)
assert re.match(r".* (__libc_start_main|__libc_start_call_main)\+\d+", backtrace[5])

assert (
backtrace[-2]
Expand Down