Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BinExport2 backend #1950

Draft
wants to merge 164 commits into
base: master
Choose a base branch
from
Draft

BinExport2 backend #1950

wants to merge 164 commits into from

Conversation

williballenthin
Copy link
Collaborator

@williballenthin williballenthin commented Jan 26, 2024

closes #1755

don't merge before v7.0.

TODO:

Checklist

  • No CHANGELOG update needed
  • No new tests needed
  • No documentation update needed

williballenthin and others added 30 commits January 25, 2024 11:32
[wip] since there are a few references to BinExport2
that are in progress elsewhre. Next commit will remove them.
@williballenthin williballenthin mentioned this pull request May 14, 2024
7 tasks
@@ -362,16 +362,16 @@ def from_elf(cls, elf: ELFFile):

regions.append(MemoryRegion(segment_rva, segment_data))

return cls(0, tuple(regions))
return cls(base_address, tuple(regions))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@williballenthin let me know if there is a reason we were previously using 0 here for the base address - it was causing read errors that prevented bytes from being extracted properly. I've updated the code to use the base address retrieved from the BinExport file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the existing design, an AddressSpace describes a module (PE/ELF) loaded into its preferred address. For PE, this is image base, and for ELF it's 0x0 because there's no preferred address.

I think it's moderately important for the AddressSpace to depend only on the PE/ELF, to minimize the amount of context/concerns. It might be nice to be able to use the code in another project, for example. The changes you introduced seem to rely on the BinExport data, which I'm initially hesitant of.

It might make more sense for the AddressSpace to use module RVAs everywhere, rather than preferred VAs. shrug

In any case, users of this class need to do a little address arithmetic before they attempt a read. This is expected. Maybe the documentation should be extended.

I do recall I waffled a bit during the implementation of this class, and I'm open to further discussion.

Thoughts?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could introduce a method rebase that shifts an address space from one place to another. Then we can load from a PE/ELF to its preferred location and then rebase to what's found in the BinExport. Since it's only a matter of fixing up the range addresses, it's cheap.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps we could introduce a method rebase that shifts an address space from one place to another. Then we can load from a PE/ELF to its preferred location and then rebase to what's found in the BinExport. Since it's only a matter of fixing up the range addresses, it's cheap.

This sounds like a good option to me. My original change stemmed from capa not doing the arithmetic that you mentioned above before read resulting in read failures for ELF files (and inability to emit bytes features in most cases). Adding a rebase method would remove from AddressSpace users the burden of knowing what arithmetic and when to do it to help reduce bugs like this in the future.

@mike-hunhoff mike-hunhoff added the binexport related to BinExport support label May 29, 2024
Comment on lines +44 to +50
# file/exports
(
"687e79.be2",
"file",
capa.features.file.Export("android::clearDir"),
"xfail: not implemented yet?!",
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't handle name demangling (see #2112)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good, will update the reference in the code - I don't think this is an important feature for now

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💯

Comment on lines +75 to +87
# bb/characteristic(tight loop)
(
"687e79.be2",
"function=0x0",
capa.features.common.Characteristic("tight loop"),
"xfail: not implemented yet",
),
(
"687e79.be2",
"function=0x0",
capa.features.common.Characteristic("tight loop"),
"xfail: not implemented yet",
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mr-tz tight loop is implemented - is this a bug?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't work yet for this sample - I hope it's an easy fix in the capa basic block extractor code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mike-hunhoff and others added 3 commits May 31, 2024 11:51
binexport2_pb.BinExport2 isnt a package so we can't import it like:

    from ...binexport2_pb.BinExport2 import CallGraph
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
binexport related to BinExport support dont merge Indicate a PR that is still being worked on enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

investigate feasibility of a BinExport2 backend
4 participants