Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for CWE 252 #451

Merged
merged 26 commits into from
Apr 8, 2024
Merged

Check for CWE 252 #451

merged 26 commits into from
Apr 8, 2024

Conversation

vobst
Copy link
Collaborator

@vobst vobst commented Mar 18, 2024

This PR tracks the progress on the addition of a check for CWE252. It is based on the Taint Analysis abstractions introduced by PR #450. While in draft state, it will be rebased and force-pushed without notice!

Use the new Taint Analysis abstraction to implement a check for CWE252. The check performs a limited interprocedural analysis.

TODO

  • make it compile
  • propagate memory objects
  • settle on taint model
    • one taint and many passes vs. many taints and one pass (Decision: one taint and many passes. Reason: generic taint is something for its own PR)
  • build initial list of "must check" functions:
    • libc
    • Linux
  • tune propagation/warning behavior to balance FP vs FN
  • add unit tests
  • add integration/acceptance tests

@vobst vobst mentioned this pull request Mar 18, 2024
10 tasks
@vobst vobst force-pushed the cwe_252 branch 4 times, most recently from f751c48 to 2a1c4ec Compare March 26, 2024 19:24
@vobst vobst marked this pull request as ready for review March 26, 2024 19:32
@vobst vobst requested a review from Enkelmann March 26, 2024 19:33
@Enkelmann
Copy link
Contributor

In the short term, the huge list of symbol names in lkm_config.json is probably OK. In the long term, we still need to find a better solution, especially since the list may go out of sync with newer Linux kernel versions over time. Maybe provide or document a way how to extract the function list from Linux kernel sources? On the other hand, that would be more inconvenient for users, because they would need to do it before running the cwe_checker...

@vobst
Copy link
Collaborator Author

vobst commented Mar 27, 2024

I am also not super happy with the huge list in the repo, maybe we can start with a trimmed down list, e.g., only API functions that are defined by the core kernel (~4k).

Maybe provide or document a way how to extract the function list from Linux kernel sources? On the other hand, that would be more inconvenient for users, because they would need to do it before running the cwe_checker...

Currently the process I use to generate this list is not something I'd consider ready for publication. Maybe one can do something more elegant but that would be a bigger project.

In the short term, the huge list of symbol names in lkm_config.json is probably OK. In the long term, we still need to find a better solution, especially since the list may go out of sync with newer Linux kernel versions over time.

The list getting out-of-sync isn't that much of a problem though - in the end we want to be able to analyze modules that were compiled for all sorts of kernel versions. Thus, I guess in the long term we should periodically regenerate the list, but take the union with the previous one.

Copy link
Contributor

@Enkelmann Enkelmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good, I only found minor issues.

But we still need tests! Most importantly, we need an acceptance test.

src/cwe_checker_lib/src/checkers/cwe_252.rs Outdated Show resolved Hide resolved
src/cwe_checker_lib/src/checkers/cwe_252.rs Show resolved Hide resolved
src/cwe_checker_lib/src/checkers/cwe_252/context.rs Outdated Show resolved Hide resolved
src/cwe_checker_lib/src/checkers/cwe_252/context.rs Outdated Show resolved Hide resolved
src/cwe_checker_lib/src/checkers/cwe_252/context.rs Outdated Show resolved Hide resolved
src/cwe_checker_lib/src/checkers/cwe_252/context.rs Outdated Show resolved Hide resolved
@Enkelmann
Copy link
Contributor

I am also not super happy with the huge list in the repo, maybe we can start with a trimmed down list, e.g., only API functions that are defined by the core kernel (~4k).

I think that would be a good compromise. 4k is at least a lot less than 15k.

Currently the process I use to generate this list is not something I'd consider ready for publication. Maybe one can do something more elegant but that would be a bigger project.

Then let us postpone it for the time being and stick to using the resulting list in this PR. If it remains a bigger project, we could also think about moving it into a repository separate from the core cwe_checker code.

The list getting out-of-sync isn't that much of a problem though - in the end we want to be able to analyze modules that were compiled for all sorts of kernel versions. Thus, I guess in the long term we should periodically regenerate the list, but take the union with the previous one.

Good point!

@vobst
Copy link
Collaborator Author

vobst commented Apr 2, 2024

I am also not super happy with the huge list in the repo, maybe we can start with a trimmed down list, e.g., only API functions that are defined by the core kernel (~4k).

I think that would be a good compromise. 4k is at least a lot less than 15k.

Turns out there is warn_unused_result, changed the config to include only the ~600 functions that have it.

@vobst vobst requested a review from Enkelmann April 2, 2024 15:36
@vobst vobst changed the title WIP: Check for CWE 252 Check for CWE 252 Apr 2, 2024
test/artificial_samples/SConstruct Outdated Show resolved Hide resolved
test/artificial_samples/SConstruct Outdated Show resolved Hide resolved
test/lkm_samples/cwe_252.c Show resolved Hide resolved
test/artificial_samples/cwe_252.c Outdated Show resolved Hide resolved
test/src/lib.rs Outdated Show resolved Hide resolved
test/src/lib.rs Outdated Show resolved Hide resolved
@vobst
Copy link
Collaborator Author

vobst commented Apr 4, 2024

Force push only affects the commit that rebases with master ... nothing relevant to the review.

lib/abstract_domain: add merge_with method to AbstractDomain has a commit message with further reasoning on the change. Please check if it makes sense or if I missed something.

src/cwe_checker_lib/src/abstract_domain/mod.rs Outdated Show resolved Hide resolved
src/cwe_checker_lib/src/abstract_domain/mod.rs Outdated Show resolved Hide resolved
test/lkm_samples/cwe_252.c Show resolved Hide resolved
Copy link
Contributor

@Enkelmann Enkelmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for the MemoryTaintMergeStrategy everything looks fine. At least I think so, because I had to change my review process due to the force-push.

There is still the question on whether we should use -O0 or -O2 for the acceptance tests. In the end I will follow your decision here if you want to keep -O2. There also might be some old review comments that I have not yet marked as resolved. If some of them are resolved and I just missed it in the review, please tell me!

@vobst
Copy link
Collaborator Author

vobst commented Apr 5, 2024

Except for the MemoryTaintMergeStrategy everything looks fine.

This one should be fixed now.

At least I think so, because I had to change my review process due to the force-push.

Yea, sorry, didn't expect it would throw GH off that much.

There is still the question on whether we should use -O0 or -O2 for the acceptance tests. In the end I will follow your decision here if you want to keep -O2.

I've gone with O0 now since it means that more tests work (in the sense of reporting 9/9). I developed the tests with 02 in mind so I cannot tell if they still work as expected, only that the expected nr of warnings comes out in the end. I guess we will find out once we have more fine grained filtering that can also verify addresses and reasons.

I also modified one of the interprocedural test cases to cover the case where an object from the caller has to be merged with an offset. Seems to work as expected.

There also might be some old review comments that I have not yet marked as resolved. If some of them are resolved and I just missed it in the review, please tell me!

I think all open conversions should be resolved now, but double check for yourself.

Have a nice weekend!

Copy link
Contributor

@Enkelmann Enkelmann left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing left to nitpick for me, PR can be merged. Thanks for the great PR!

@vobst
Copy link
Collaborator Author

vobst commented Apr 8, 2024

Nothing left to nitpick for me, PR can be merged. Thanks for the great PR!

Thanks for taking the time to review this PR and all of the explanations along the way!

I'd now rearrange the commits a bit to make them more suitable for the permanent commit log, i.e., write proper commit messages and combine/reorder things that belong together. This will not change the content of any commit. Then I'd also add one commit that updates the changelog for the current release in the end. Sounds OK?

Valentin Obst added 6 commits April 8, 2024 10:37
Add a callback that allows a taint analysis to hook into the fixpoint
computation when some transfer function maps its input state(s) to the
empty state.

For some analyses this event may be a sink, e.g., for CWE252, while for
many analyses it does not make sense to propagate empty states further
since it is impossible to generate a non-empty state from them; they
may use this hook to optimize resource usage.

Signed-off-by: Valentin Obst <[email protected]>
Break the transition function for `ExternCallStub` edges up into two
parts. This allows analyses that are only interested in handling calls
to library functions to do so in a more convenient way. Reduces
boilerplate code and makes sure they can not forget to call
`handle_empty_state_out`.

Signed-off-by: Valentin Obst <[email protected]>
Seed the check for CWE252 with a list of all libc functions that are
annotated with the compiler attribute `warn_unused_result` in glibc.

Does not include functions that indicate a failure by returning a NULL
pointer since those are handled in the check for CWE476.

Signed-off-by: Valentin Obst <[email protected]>
Enables the CWE252 check for LKMs and seeds it with all functions in the
module API that are annotated with the `warn_unused_result` compiler
attribute.

Signed-off-by: Valentin Obst <[email protected]>
Valentin Obst added 20 commits April 8, 2024 10:37
Many types implement a custom JSON serialization method for
internal debugging purposes.

Add an abstraction for this pattern in form of the `ToJsonCompact` trait.
This enables all types to benefit from the default implementation of a
printing method and makes it easier to use generic programming.

This commit does not convert any existing types that implement this
behavior. They are expected to be converted in an ongoing process.

Signed-off-by: Valentin Obst <[email protected]>
Add a method to obtain the information how to translate abstract
identifiers from the callee to the caller context given the result of a
VSA.

Signed-off-by: Valentin Obst <[email protected]>
Remove the limitation that `update_return_callee` can not be used to
propagate memory taint from the callee to the caller. Do so by renaming
abstract identifiers in the default implementation of `update_return` in
the `TaintAnalysis` trait. In particular, implementers of
`update_return_callee` do not have to case about renaming and can return
the abstract identifiers of the callee context.

Adjust CWE252 to make use of this new feature.

Signed-off-by: Valentin Obst <[email protected]>
The `merge` method always produces a new, owned value. This might
be undesirable in situations where it is possible to modify an
existing value in-place.

Add a new method that allows an abstract domain to provide a method to
merge one object into another in-place.

It is a common pattern to see something like this:

```
*mut_ref = mut_ref.merge(other_ref);
```

where `mut_ref` is a mutable reference to a type that implements
`AbstractDomain`. Note that it is common that such types are just
wrappers around an `Arc` to an inner type that is expensive to clone.
However, while cloning of one of the refs in `merge` may be cheap, it
means that then there are >= 2 references to the underlying `Arc`, which
means that it can never be cheaply modified, i.e., the retuned owned
value will usually involve an expensive clone. However:

```
mut_ref.merge_with(other_ref);
```

can potentially do a cheap modification of the underlying `Arc`.

Due to the default implementation it should always be OK to replace the
first pattern with the second in generic code, i.e., it never decreases
performance and can only increase it if the type provides an optimized
implementation.

Signed-off-by: Valentin Obst <[email protected]>
…tegy`, implement `merge_with`

Add a `merge_map_with` method to the `MapMergeStrategy` and provide a
default implementation in terms of it for the `merge_map` method.

Use `merge_map_with` to provide an optimized implementation of
`merge_with` for `DomainMap`.

Convert all existing implementations of `MapMergeStrategy` to implement
`merge_map_with` instead.

The rationale is change similar to the one detailed in Commit("
7c0ffbe lib/abstract_domain: add `merge_with` method to
`AbstractDomain`").

Signed-off-by: Valentin Obst <[email protected]>
In the original implementation of memory taint propagation in
Commit("e7c25f7 lib/analysis/taint: propagate memory taint") it was
overlooked that we already have an abstraction for maps into abstract
domains that are abstract domains themselves.

Use the `DomainMap` abstraction for the register and memory taint maps
that make up the `State` type.

Signed-off-by: Valentin Obst <[email protected]>
This facilitates method chaining. Added to make the API more flexible,
even though there are no users of it at the moment.

Signed-off-by: Valentin Obst <[email protected]>
Currently we only overwrite memory taint if the PI result for the
target address is very exact.

Weaken the conditions under which we overwrite taint information by
allowing possibly constant or top values for the target address
as long as the target memory object and offset are unique.

This may lead to taint being overwritten with non-tainted values in more
cases.

Signed-off-by: Valentin Obst <[email protected]>
We gain 4 arch-compiler pairs and loose 2 by making this change.

Tests were developed with `-O2` in mind so they might not work as
expected, despite reporting the correct number of warnings. I only
verified correctness manually for `aarch64-clang`.

Signed-off-by: Valentin Obst <[email protected]>
Add instructions how to interpret the acceptance tests for CWE252 to the
source file.

Signed-off-by: Valentin Obst <[email protected]>
Modify existing test case such that it also covers the case where a
memory object from the callee must be merged into an object in the
caller with a non-zero offset.

Signed-off-by: Valentin Obst <[email protected]>
@vobst
Copy link
Collaborator Author

vobst commented Apr 8, 2024

I have cleaned up the history and will merge it once the CI passes.

Is there an unwritten rule that the CHANGES.md must be onliners? I'd be adding the first multiline entry here :)

@Enkelmann
Copy link
Contributor

Is there an unwritten rule that the CHANGES.md must be onliners? I'd be adding the first multiline entry here :)

No, that was just a habit. If you want to preserve the style, keep entries short (maybe 2 lines max?) and use the description texts of the referenced PRs for more detailed descriptions. But the style itself is only personal preference, you can change it up if you prefer a different style for the CHANGES.md.

@vobst
Copy link
Collaborator Author

vobst commented Apr 8, 2024

Is there an unwritten rule that the CHANGES.md must be onliners? I'd be adding the first multiline entry here :)

No, that was just a habit. If you want to preserve the style, keep entries short (maybe 2 lines max?) and use the description texts of the referenced PRs for more detailed descriptions. But the style itself is only personal preference, you can change it up if you prefer a different style for the CHANGES.md.

Okay, just wanted to be sure. It still renders as one line in the UI, looks like the limit there is about 120 chars and I tend to wrap at 80.

@vobst vobst merged commit e1838be into fkie-cad:master Apr 8, 2024
6 checks passed
@vobst vobst mentioned this pull request Apr 11, 2024
6 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants