Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fetch chunk metadata #2152

Open
wants to merge 12 commits into
base: dev
Choose a base branch
from
Open

Conversation

CptGibbon
Copy link
Collaborator

This PR builds on #2082 by adding a fetch_chunk_metadata() function to ptmalloc.py

It can be used to fetch all of a chunk's metadata given its address, or just fetch specific fields. It returns a dictionary of ChunkField : int. It also handles resolution of chunk fields that have had different names over the years e.g. "prev_size" vs. "mchunk_prev_size".

I know using a ChunkField enum looks over-engineered, but I think it has some benefits:

  1. Developers don't need to care about those renamed chunk fields, they can just use ChunkField.SIZE instead of knowing whether they need "size" or "mchunk_size"
  2. Minimal changes on future field renames, e.g. if "fd" becomes "mchunk_fd", developers just keep using ChunkField.FD
  3. Developers don't need to look up chunk field names in the glibc source code, they can just check the enum

Screenshot of the function in action:
Screenshot 2024-05-08 at 14 21 24

The next step is to integrate fetch_chunk_metadata() into the Chunk class, which will hopefully remove any direct references to the gdb module. Testing should be a bit easier after that, you could create fake chunks for testing by building a dictionary or mocking return values of calls to fetch_chunk_metadata()

pwndbg/heap/ptmalloc.py Outdated Show resolved Hide resolved
pwndbg/heap/ptmalloc.py Outdated Show resolved Hide resolved
Comment on lines 153 to 165
for field in include_only_fields:
if field is ChunkField.PREV_SIZE:
requested_fields.add(prev_size_field_name)
elif field is ChunkField.SIZE:
requested_fields.add(size_field_name)
elif field is ChunkField.FD:
requested_fields.add("fd")
elif field is ChunkField.BK:
requested_fields.add("bk")
elif field is ChunkField.FD_NEXTSIZE:
requested_fields.add("fd_nextsize")
elif field is ChunkField.BK_NEXTSIZE:
requested_fields.add("bk_nextsize")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this might actually be a good time to use a dictionary, mapping ChunkField.PREV_SIZE to prev_size_field_name, and so on. Then the for loop becomes simple and easy to read, and we can use the same dictionary in the next for loop, simplifying that code as well and make it less error prone.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dictionary was my first thought, but I knew @disconnect3d would say "this dictionary is created on every function call" and I'd have to turn it into an if/elif statement 😅
So I'll let you two pick which way you think is best.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need an enum? :P

It feels like without it we could just do include_only_fields=include_only_fields below and we seem to do the same job twice (note: we don't even validate if one passed an invalid value in include_only_fields).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My justification for the Enum is in the description old chap ☝️

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the dictionary, my thought process is usually if it simplifies more than one if-statement it's worth it, otherwise usually an if-statement is better. And in this case it would simplify two.

If the issue is about constructing the dictionary inside the function (I personally don't mind), it could be constructed outside, and the size/prev_size fields could be set inside the function the first time its run. Alternatively, you can create a second function that essentially does what a dictionary would do, taking a key and returning a value, but I honestly don't know how a function call compares to creating a dictionary in terms of performance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the enum, if we don't devs to have to think too much about it, we could also just accept all names for a parameter, i.e. size and mchunk_size. The downsides of that are mainly that this function gets a bit more complex, and that it may not be clear to the caller what's actually going to happen if the field was renamed. With an enum, it's more clear that we'll handle the field names for the caller.

I'm fine either way, will leave it to @disconnect3d.

@disconnect3d
Copy link
Member

Needs fixing conflicts :<

@CptGibbon
Copy link
Collaborator Author

So if I keep the import statement for typing.Set the linter complains, and if I remove it it complains...
What am I missing?

Thanks to @gsingh93 for figuring out what I'd done wrong here: capitalized "Set" must be used for compatibility with Python 3.8
@CptGibbon
Copy link
Collaborator Author

@disconnect3d Conflicts resolved & linter is happy 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants