feat(reporting): report meta-data information about chunks. #557

qkaiser · 2023-04-14T12:17:40Z

Allow handlers to provide a dict through the metadata attribute of a ValidChunk. The dictionary can hold any metadata the handler considers relevant, but we advise handler writers to report parsed information such as header values.

This metadata dict is later reported as part of our ChunkReports and is available in the JSON report file if the user requested one.

The idea is to expose metadata to further analysis steps through the unblob report. For example, a binary analysis toolkit would read the load address and architecture from a uImage chunk to analyze the file extracted from that chunk with the right settings.
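As a rough illustration of how a downstream tool might consume this, here is a minimal sketch that walks a report and collects the metadata dicts. It assumes only the shape implied by the jq query `.[0].reports[-1].metadata` (a top-level list whose items carry a `reports` list); the helper name `iter_chunk_metadata` and the inline sample are hypothetical, and a real report contains more fields than shown.

```python
import json
from typing import Any, Iterator


def iter_chunk_metadata(report: list[dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every non-empty metadata dict found in an unblob-style JSON report."""
    for task_result in report:
        for entry in task_result.get("reports", []):
            metadata = entry.get("metadata")
            if metadata:
                yield metadata


# Inline sample shaped like the jq query `.[0].reports[-1].metadata`.
# A real report would be loaded with json.load() from the --report file.
sample_report = json.loads("""
[
  {
    "reports": [
      {"some_other_report": true},
      {"metadata": {"version_maj": 0, "version_min": 4, "next_header_offset": 147}}
    ]
  }
]
""")

found = list(iter_chunk_metadata(sample_report))
print(found)
```

An analysis tool could then choose its settings from the collected values instead of re-parsing the chunk header itself.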

A note on the 'as_dict' implementation.

The initial idea was to implement it in dissect.cstruct (see fox-it/dissect.cstruct#29), but due to expected changes in the project's API I chose to implement it in unblob so we're not dependent on another project.
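To make the idea concrete without depending on dissect.cstruct, here is a standard-library sketch of what an as_dict-style conversion does: flatten a parsed header into a plain dict. It is not the implementation from this PR. The `struct` layout follows the 7z signature header as I understand it, the field names are taken from the report output in this description, and rendering raw bytes as a hex string is an assumption of the sketch.

```python
import struct
from collections import namedtuple

# 7z signature header, little-endian, 32 bytes in total.
# Field names follow the report output shown in this PR.
SEVENZIP_HEADER = struct.Struct("<6sBBIQQI")
SevenZipHeader = namedtuple(
    "SevenZipHeader",
    "magic version_maj version_min crc "
    "next_header_offset next_header_size next_header_crc",
)


def as_dict(header: SevenZipHeader) -> dict:
    """Flatten a parsed header into a dict of JSON-friendly values."""
    result = {}
    for name, value in header._asdict().items():
        # Assumption of this sketch: raw bytes are rendered as a hex string.
        result[name] = value.hex() if isinstance(value, bytes) else value
    return result


# Build a header from the sample values, then parse it back.
raw = SEVENZIP_HEADER.pack(
    b"7z\xbc\xaf\x27\x1c", 0, 4, 3845252377, 147, 33, 2089891445
)
header = SevenZipHeader(*SEVENZIP_HEADER.unpack(raw))
header_dict = as_dict(header)
print(header_dict)
```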

Related to #16 and initial discussion in #16 (comment)

You can observe the changes like this:

poetry run unblob -vvv -e /tmp/out -f --report /tmp/report.json tests/integration/archive/sevenzip/__input__/cherry.7z

cat /tmp/report.json| jq '.[0].reports[-1].metadata'
{
  "magic": "7z��'\u001c",
  "version_maj": 0,
  "version_min": 4,
  "crc": 3845252377,
  "next_header_offset": 147,
  "next_header_size": 33,
  "next_header_crc": 2089891445
}

qkaiser · 2023-04-18T15:56:19Z

@e3krisztian I implemented the changes we talked about and added a test.

martonilles · 2023-04-18T20:27:32Z

unblob/report.py

@@ -181,6 +181,7 @@ class ChunkReport(Report):
     end_offset: int
     size: int
     is_encrypted: bool
+    metadata: dict = attr.ib(factory=dict)


Just wondering if we want to validate the metadata dict: do we want to enforce that keys are strings and values are of a certain type, or are we OK with anything, even nested metadata?

Maybe it would be great to somehow have a "namespace" or at least some convention on metadata variable naming?

What if we want to push data from multiple headers, or permissions, etc?

qkaiser · 2023-04-19

I'm OK with enforcing a convention on metadata variable naming. Having a namespace would be too complicated since we can't foresee the metadata field names used by handlers.

I would enforce that metadata is a flat dict: no nested data, keys must be strings, and values must be base types.

I would convey information about created files (timestamps, permissions, owner) through a different mechanism, since it involves far more complex structures.

qkaiser · 2023-04-25

Added a validator. See 99944e3
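For readers following the thread, the rules discussed here (no nesting, string keys, base-type values) could be checked along these lines. This is a sketch and not the validator from 99944e3: the function name `validate_metadata` and the exact set of accepted base types are assumptions.

```python
# Assumption: "base types" means these JSON-friendly scalars.
BASE_TYPES = (str, int, float, bool)


def validate_metadata(metadata: dict) -> dict:
    """Return metadata unchanged if it is a flat dict, else raise TypeError."""
    if not isinstance(metadata, dict):
        raise TypeError("metadata must be a dict")
    for key, value in metadata.items():
        if not isinstance(key, str):
            raise TypeError(f"metadata key {key!r} must be a string")
        if not isinstance(value, BASE_TYPES):
            raise TypeError(f"metadata value for {key!r} must be a base type")
    return metadata


validate_metadata({"version_maj": 0, "crc": 3845252377})  # accepted

try:
    validate_metadata({"header": {"nested": 1}})  # nested dict is rejected
except TypeError as exc:
    print(f"rejected: {exc}")
```

A function like this could be called wherever the metadata dict is assigned, so a handler that reports nested data fails early rather than at report time.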

martonilles · 2023-04-18T20:28:22Z

unblob/handlers/archive/sevenzip.py

@@ -70,4 +70,6 @@ def calculate_chunk(self, file: File, start_offset: int) -> Optional[ValidChunk]
         # We read the signature header here to get the offset to the header database
         first_db_header = start_offset + len(header) + header.next_header_offset
         end_offset = first_db_header + header.next_header_size
-        return ValidChunk(start_offset=start_offset, end_offset=end_offset)
+        return ValidChunk(
+            start_offset=start_offset, end_offset=end_offset, metadata=header


Do we want to pass all attributes from the header as metadata?

qkaiser · 2023-04-19

This point came up when discussing with @e3krisztian yesterday. I think it's better to only pass the most relevant header attributes rather than the whole instance.

qkaiser · 2023-04-25

See 75548d2
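One way to picture "only the most relevant header attributes" is a small helper that copies a chosen subset of fields into the metadata dict. The sketch below is illustrative only and is not the change made in 75548d2: `pick_metadata` is a hypothetical name, `SimpleNamespace` stands in for a parsed header, and the selected fields are an arbitrary example.

```python
from types import SimpleNamespace


def pick_metadata(header: object, fields: tuple[str, ...]) -> dict:
    """Copy only the named attributes of a parsed header into a dict."""
    return {name: getattr(header, name) for name in fields}


# Stand-in for a parsed header object, using values from the sample report output.
header = SimpleNamespace(
    version_maj=0,
    version_min=4,
    crc=3845252377,
    next_header_offset=147,
    next_header_size=33,
    next_header_crc=2089891445,
)

# Which fields count as "most relevant" is a per-handler decision;
# this selection is purely illustrative.
metadata = pick_metadata(header, ("version_maj", "version_min"))
print(metadata)
```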

qkaiser added the enhancement New feature or request label Apr 14, 2023

qkaiser self-assigned this Apr 14, 2023

qkaiser force-pushed the 16-metadata-reporting branch from 0b150ba to 1cc1169 Compare April 14, 2023 14:13

e3krisztian requested changes Apr 17, 2023

qkaiser force-pushed the 16-metadata-reporting branch 2 times, most recently from cfffb33 to 2ed0237 Compare April 18, 2023 15:40

e3krisztian reviewed Apr 18, 2023

martonilles reviewed Apr 18, 2023

qkaiser force-pushed the 16-metadata-reporting branch 3 times, most recently from 5a7bf12 to 443985f Compare May 2, 2023 08:58

qkaiser force-pushed the 16-metadata-reporting branch from 443985f to 77cb778 Compare August 16, 2023 12:16

qkaiser mentioned this pull request Dec 24, 2023

Improve unblob "skip-extraction" mode of operation #692

Merged

qkaiser force-pushed the 16-metadata-reporting branch from 77cb778 to 0f5d9f2 Compare December 24, 2023 09:44

qkaiser force-pushed the 16-metadata-reporting branch 2 times, most recently from 601f123 to 6ef2737 Compare January 4, 2024 15:03

qkaiser force-pushed the 16-metadata-reporting branch from 6ef2737 to a312492 Compare January 20, 2024 16:49

qkaiser force-pushed the 16-metadata-reporting branch from a312492 to ef6e981 Compare February 4, 2024 10:50

qkaiser commented Apr 14, 2023

qkaiser commented Apr 18, 2023

martonilles Apr 18, 2023

martonilles Apr 18, 2023

qkaiser Apr 19, 2023

qkaiser Apr 25, 2023

martonilles Apr 18, 2023

qkaiser Apr 19, 2023

qkaiser Apr 25, 2023
