Show prevalence of rules in the output #1737

Open
wants to merge 50 commits into
base: master
Changes from 35 commits
Commits
7603f85
Entropy Methods
Aayush-Goel-04 Jul 29, 2023
f5b38d5
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 2, 2023
bf1f59b
Sort rules in render based on match probability
Aayush-Goel-04 Aug 5, 2023
31bd6b3
Rendering rules into two sections. * for interesting rules.
Aayush-Goel-04 Aug 6, 2023
9ca4f9d
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 7, 2023
78877f2
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 13, 2023
f5f3e87
update
Aayush-Goel-04 Aug 13, 2023
a6797de
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 19, 2023
0b5a326
Update default.py
Aayush-Goel-04 Aug 19, 2023
def2d98
Merge branch 'Aayush-Goel-04/Issue#520' of https://github.com/Aayush-…
Aayush-Goel-04 Aug 19, 2023
039fdbd
Update utils.py
Aayush-Goel-04 Aug 19, 2023
8a0e61b
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 19, 2023
f6058b1
Update default.py
Aayush-Goel-04 Aug 19, 2023
dc399c3
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Aug 27, 2023
c5302cd
prevalence db update
Aayush-Goel-04 Aug 27, 2023
430bde6
Update default.py
Aayush-Goel-04 Aug 27, 2023
7f1566d
Update capa/render/default.py
Aayush-Goel-04 Aug 28, 2023
24541b6
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Sep 6, 2023
6787555
updated default render
Aayush-Goel-04 Sep 6, 2023
7c84926
Update utils.py
Aayush-Goel-04 Sep 6, 2023
c1f9e72
Revert "Update utils.py"
Aayush-Goel-04 Sep 6, 2023
7d6ec15
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 9, 2023
5c1464c
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 10, 2023
8ede526
Resolving path issues
Aayush-Goel-04 Oct 10, 2023
4476b2c
Update utils.py
Aayush-Goel-04 Oct 10, 2023
6077e99
Update utils.py
Aayush-Goel-04 Oct 10, 2023
bc0d129
Update pyinstaller.spec
Aayush-Goel-04 Oct 16, 2023
12dea73
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 16, 2023
3bce5a9
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 17, 2023
5a0a3a5
Update CHANGELOG.md
Aayush-Goel-04 Oct 20, 2023
e4bb521
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Oct 20, 2023
fe4af5c
render output with prevalence for (v) verbose
Aayush-Goel-04 Oct 20, 2023
95bdf5d
Update utils.py
Aayush-Goel-04 Oct 20, 2023
af57da8
Update RuleMetaData with Prevalence
Aayush-Goel-04 Nov 12, 2023
8057a73
Apply suggestions from code review
Aayush-Goel-04 Nov 12, 2023
5102ca1
Imports, Paths, Comments & Exceptions handled
Aayush-Goel-04 Nov 16, 2023
07553a6
Update result_document.py
Aayush-Goel-04 Nov 16, 2023
2c4931d
Update result_document.py
Aayush-Goel-04 Nov 20, 2023
c531a15
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 3, 2024
61e7459
Added prevalence to verbose
Aayush-Goel-04 Feb 3, 2024
66d0ab7
linter checks
Aayush-Goel-04 Feb 3, 2024
e3ca32b
Revert "linter checks"
Aayush-Goel-04 Feb 3, 2024
f084040
Update result_document.py
Aayush-Goel-04 Feb 3, 2024
b07d600
Merge branch 'master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 5, 2024
10d2140
Convert database to python files
Aayush-Goel-04 Feb 5, 2024
9bebffc
Lint checks
Aayush-Goel-04 Feb 5, 2024
fa89f44
Delete rules_prevalence.json.gz
Aayush-Goel-04 Feb 25, 2024
d93f135
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Feb 25, 2024
08ea4a9
Merge branch 'mandiant:master' into Aayush-Goel-04/Issue#520
Aayush-Goel-04 Mar 6, 2024
7992b1b
Merge branch 'master' into Aayush-Goel-04/Issue#520
mr-tz Mar 13, 2024
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -8,6 +8,7 @@
- binja: add support for forwarded exports #1646 @xusheng6
- binja: add support for symtab names #1504 @xusheng6
- add com class/interface features #322 @Aayush-goel-04
- Show prevalence of rules in the output #520 @Aayush-Goel-04

### Breaking Changes

Binary file not shown.
58 changes: 45 additions & 13 deletions capa/render/default.py
@@ -7,6 +7,7 @@
# See the License for the specific language governing permissions and limitations under the License.

import collections
from typing import Dict

import tabulate

@@ -72,36 +73,67 @@ def rec(match: rd.Match):

def render_capabilities(doc: rd.ResultDocument, ostream: StringIO):
    """
    render capabilities sorted by:
      - prevalence rare -> unknown
      - namespace a -> z

    example::

        +-------------------------------------------------------+-------------------------------------------------+
        | CAPABILITY                                            | NAMESPACE                                       |
        |-------------------------------------------------------+-------------------------------------------------|
        | check for OutputDebugString error (2 matches)         | anti-analysis/anti-debugging/debugger-detection |
        | read and send data from client to server              | c2/file-transfer                                |
        | ...                                                   | ...                                             |
        +-------------------------------------------------------+-------------------------------------------------+
        +-------------------------------------------------------+-------------------------------------------------+------------+
        | CAPABILITY                                            | NAMESPACE                                       | PREVALENCE |
        |-------------------------------------------------------+-------------------------------------------------|------------|
        | check for OutputDebugString error (2 matches)         | anti-analysis/anti-debugging/debugger-detection | rare       |
        | ...                                                   | ...                                             | ...        |
        |-------------------------------------------------------|-------------------------------------------------|------------|
        | read and send data from client to server              | c2/file-transfer                                | common     |
        | ...                                                   | ...                                             | ...        |
        +-------------------------------------------------------+-------------------------------------------------+------------+
    """
    subrule_matches = find_subrule_matches(doc)

    rows = []
    # separate rules based on their prevalence
    common: Dict[str, str] = {"capability": "", "namespace": "", "prevalence": ""}
    had_common = False
    rare: Dict[str, str] = {"capability": "", "namespace": "", "prevalence": ""}
    had_rare = False

    for rule in rutils.capability_rules(doc):
        if rule.meta.name in subrule_matches:
            # rules that are also matched by other rules should not get rendered by default.
            # this cuts down on the amount of output while giving approx the same detail.
            # see #224
            continue

        count = len(rule.matches)
        if count == 1:
            capability = rutils.bold(rule.meta.name)
        else:
            capability = f"{rutils.bold(rule.meta.name)} ({count} matches)"
        rows.append((capability, rule.meta.namespace))

        namespace = rule.meta.namespace if rule.meta.namespace is not None else ""
        prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"

        if "rare" in prevalence:
            rare["capability"] += capability + "\n"
            rare["namespace"] += namespace + "\n"
            rare["prevalence"] += prevalence + "\n"
            had_rare = True
        else:
            common["capability"] += capability + "\n"
            common["namespace"] += namespace + "\n"
            common["prevalence"] += prevalence + "\n"
            had_common = True

    rows = []
    if had_rare:
        rows.append((rare["capability"], rare["namespace"], rare["prevalence"]))
    if had_common:
        rows.append((common["capability"], common["namespace"], common["prevalence"]))

    if rows:
        ostream.write(
            tabulate.tabulate(rows, headers=[width("Capability", 50), width("Namespace", 50)], tablefmt="mixed_outline")
            tabulate.tabulate(
                rows,
                headers=[width("Capability", 50), width("Namespace", 50), width("Prevalence", 10)],
                tablefmt="mixed_grid",
            )
        )
        ostream.write("\n")
    else:
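A minimal sketch of the ordering the docstring describes (prevalence rare -> unknown, then namespace a -> z), assuming the prevalence labels are the strings rare, common, and unknown seen in this diff. RuleRow, PREVALENCE_RANK, sort_rows, and split_sections are hypothetical names used for illustration and are not part of capa; the PR itself accumulates newline-joined strings into two tabulate rows rather than sorting explicitly.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Assumption: only these three labels occur; lower rank sorts first.
PREVALENCE_RANK = {"rare": 0, "common": 1, "unknown": 2}


@dataclass(frozen=True)
class RuleRow:
    """Hypothetical stand-in for the fields the renderer reads from rule.meta."""

    name: str
    namespace: Optional[str]
    prevalence: str


def sort_rows(rows: List[RuleRow]) -> List[RuleRow]:
    """Order rows by prevalence rank, then by namespace (missing namespace sorts first)."""

    def key(row: RuleRow) -> Tuple[int, str]:
        # Labels outside the known set are ranked like "unknown" (an assumption).
        rank = PREVALENCE_RANK.get(row.prevalence, PREVALENCE_RANK["unknown"])
        return (rank, row.namespace or "")

    return sorted(rows, key=key)


def split_sections(rows: List[RuleRow]) -> Tuple[List[RuleRow], List[RuleRow]]:
    """Split sorted rows into the two table sections: rare rules, then everything else."""
    ordered = sort_rows(rows)
    rare = [row for row in ordered if row.prevalence == "rare"]
    other = [row for row in ordered if row.prevalence != "rare"]
    return rare, other


example_rows = [
    RuleRow("write file", "host-interaction/file-system", "common"),
    RuleRow("check for OutputDebugString error", "anti-analysis/anti-debugging/debugger-detection", "rare"),
    RuleRow("some library rule", None, "unknown"),
    RuleRow("send HTTP request", "communication/http", "common"),
]
rare_section, other_section = split_sections(example_rows)
```

Keeping the sort key separate from the rendering step would let the same ordering be reused by other renderers; whether that fits capa's design is left open here.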
24 changes: 24 additions & 0 deletions capa/render/result_document.py
@@ -5,6 +5,8 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.
import gzip
import json
import datetime
import collections
from typing import Dict, List, Tuple, Union, Literal, Optional
@@ -22,6 +24,13 @@
from capa.engine import MatchResults
from capa.helpers import assert_never

try:
    from functools import lru_cache
except ImportError:
    # need to type ignore this due to mypy bug here (duplicate name):
    # https://github.com/python/mypy/issues/1153
    from backports.functools_lru_cache import lru_cache  # type: ignore


class FrozenModel(BaseModel):
    model_config = ConfigDict(frozen=True, extra="forbid")
@@ -501,9 +510,23 @@ class MaecMetadata(FrozenModel):
    model_config = ConfigDict(frozen=True, populate_by_name=True)


@lru_cache(maxsize=None)
def load_rules_prevalence() -> Dict[str, str]:
    CD = Path(__file__).resolve().parent.parent.parent
    file = CD / "assets" / "rules_prevalence_data" / "rules_prevalence.json.gz"
Collaborator

use get_default_root()

def get_default_root() -> Path:

Contributor Author

Using get_default_root works well locally, but it causes a circular import during the PyInstaller build.
@williballenthin I suggest moving such functions to capa.helpers.

Collaborator

moving it makes sense (see #1821 also)

Contributor Author

@mr-tz I suggest we move ahead with proposal 3 in the above-mentioned PR:
move the functions below to a new capa.loader module, or alternatively to capa.helpers.

has_file_limitation
is_supported_format
is_supported_arch
get_arch
is_supported_os
get_os
is_running_standalone
get_default_root
get_default_signatures
get_workspace
get_extractor
get_file_extractors
get_signatures
get_sample_analysis
collect_metadata
compute_dynamic_layout
compute_static_layout
compute_layout

Collaborator

sounds good to me!

    if not file.exists():
        raise FileNotFoundError(f"File '{file}' not found.")
    try:
        with gzip.open(file, "rb") as gzfile:
            return json.loads(gzfile.read().decode("utf-8"))
    except Exception as e:
        raise RuntimeError(f"An error occurred while loading '{file}': {e}")


class RuleMetadata(FrozenModel):
    name: str
    namespace: Optional[str] = None
    prevalence: str
Collaborator

Once finalized, we'll have to update the proto definition and the ResultDocument (rd) files used for testing.

    authors: Tuple[str, ...]
    scope: capa.rules.Scope
    attack: Tuple[AttackSpec, ...] = Field(alias="att&ck")
@@ -521,6 +544,7 @@ def from_capa(cls, rule: capa.rules.Rule) -> "RuleMetadata":
        return cls(
            name=rule.meta.get("name"),
            namespace=rule.meta.get("namespace"),
            prevalence=load_rules_prevalence().get(rule.meta.get("name"), "unknown"),
Collaborator

Is the rule prevalence database distributed with capa the library? I think it's important that people be able to use capa the library without maintaining this database, so perhaps we want to handle the case of the database not existing here?

Contributor Author

If the database is not present, all rule matches will have prevalence unknown in the results.
image

Collaborator

Maybe we can show a warning if no database is found (in case that isn't already done), pointing to one and briefly explaining what it does.

            authors=rule.meta.get("authors"),
            scope=capa.rules.Scope(rule.meta.get("scope")),
            attack=tuple(map(AttackSpec.from_str, rule.meta.get("att&ck", []))),
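The review thread above asks what should happen when the prevalence database is missing. A minimal sketch of one way to handle it, assuming the database is a gzip-compressed JSON object mapping rule names to labels, as load_rules_prevalence reads it: log a warning and fall back to an empty mapping so that every rule reports unknown. load_prevalence_or_empty, prevalence_of, and the explicit path parameter are hypothetical and not part of the PR.

```python
import gzip
import json
import logging
from functools import lru_cache
from pathlib import Path
from typing import Dict

logger = logging.getLogger(__name__)


@lru_cache(maxsize=None)
def load_prevalence_or_empty(path: Path) -> Dict[str, str]:
    """Hypothetical loader: return {} with a warning instead of raising when the file is absent.

    The result is cached per path, so a database that appears later in the
    same process is not picked up.
    """
    if not path.exists():
        logger.warning(
            "rule prevalence database not found at %s; prevalence will be reported as 'unknown'",
            path,
        )
        return {}
    with gzip.open(path, "rb") as gzfile:
        return json.loads(gzfile.read().decode("utf-8"))


def prevalence_of(rule_name: str, path: Path) -> str:
    """Look up one rule, defaulting to 'unknown' as the diff does."""
    return load_prevalence_or_empty(path).get(rule_name, "unknown")
```

With this shape, library users without the database would still get a complete result document, at the cost of a single warning per database path.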
1 change: 0 additions & 1 deletion capa/render/utils.py
@@ -5,7 +5,6 @@
# Unless required by applicable law or agreed to in writing, software distributed under the License
# is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and limitations under the License.

import io
from typing import Union, Iterator

3 changes: 3 additions & 0 deletions capa/render/verbose.py
@@ -139,6 +139,9 @@ def render_rules(ostream, doc: rd.ResultDocument):

            rows.append((key, v))

        prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"
        rows.insert(1, ("prevalence", prevalence))

        if rule.meta.scope != capa.rules.FILE_SCOPE:
            locations = [m[0] for m in doc.rules[rule.meta.name].matches]
            rows.append(("matches", "\n".join(map(format_address, locations))))
3 changes: 3 additions & 0 deletions capa/render/vverbose.py
@@ -305,6 +305,9 @@ def render_rules(ostream, doc: rd.ResultDocument):
            # library rules should not have a namespace
            rows.append(("namespace", rule.meta.namespace))

        prevalence = rutils.bold(rule.meta.prevalence) if rule.meta.prevalence != "unknown" else "unknown"
        rows.append(("prevalence", prevalence))

        if rule.meta.maec.analysis_conclusion or rule.meta.maec.analysis_conclusion_ov:
            rows.append(
                (
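The expression that bolds every prevalence label except unknown appears in default.py, verbose.py, and vverbose.py. A small shared helper is one way to keep the three renderers consistent. This is a sketch only: format_prevalence is a hypothetical name, and bold is a stand-in for rutils.bold that assumes plain ANSI escape codes.

```python
UNKNOWN = "unknown"


def bold(text: str) -> str:
    """Stand-in for rutils.bold (assumption): wrap text in ANSI bold on/off codes."""
    return f"\033[1m{text}\033[0m"


def format_prevalence(prevalence: str) -> str:
    """Return the label unchanged when it is 'unknown', otherwise bolded."""
    return prevalence if prevalence == UNKNOWN else bold(prevalence)
```

Each renderer could then call format_prevalence(rule.meta.prevalence) in place of the repeated conditional expression.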