Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Carry #1111: specs-go/config: add Landlock LSM support #1241

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

Zheaoli
Copy link

@Zheaoli Zheaoli commented Jan 2, 2024

Carry #1111

@Zheaoli
Copy link
Author

Zheaoli commented Jan 2, 2024

cc @l0kod @AkihiroSuda @vbatts @gnoack @cyphar

@AkihiroSuda AkihiroSuda added this to the v1.2.0 (tentative) milestone Jan 2, 2024
@AkihiroSuda
Copy link
Member

cc @kailun-qin (author of #1111) PTAL

@AkihiroSuda
Copy link
Member

Before merging this PR, can we have a POC implementation of runc (or crun, youki) to confirm the design?

@Zheaoli
Copy link
Author

Zheaoli commented Jan 9, 2024

opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.

I will carry it

@@ -340,6 +340,26 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this PR should be carefully updated following the latest advances of landlock (rather than simply carried) as some of the spec proposals might be stale now. Pls see https://docs.kernel.org/userspace-api/landlock.html and https://github.com/landlock-lsm/go-landlock for details.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you got time to continue this PR or I can continue carry(

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls go ahead updating this PR. I'll need some time to follow up and dive into the updates of landlock (even for review).

@l0kod
Copy link

l0kod commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

@gnoack
Copy link

gnoack commented Jan 10, 2024

Thanks for working on this!

You might want to take a look at this talk about backward and forward compatibility challenges: https://archive.fosdem.org/2023/schedule/event/rust_backward_and_forward_compatibility_for_security_features/

FYI, I plan to review these changes next week.

FWIW the most common implementation problem so far has been the use of the "refer" right, whose logic unfortunately works slightly differently than for other access rights. I tried to explain this in https://blog.gnoack.org/post/landlock-best-effort/ last year.

With Linux 6.7, Landlock also has gained support for restricting the use of bind(2) and connect(2) for TCP. (Landlock ABI version 4)

@Zheaoli
Copy link
Author

Zheaoli commented Jan 10, 2024

FYI, I plan to review these changes next week.

Thanks for the important information, I will update this spec in this week ASAP I can

@utam0k
Copy link
Member

utam0k commented Jan 11, 2024

I wonder what the behavior should be when a new process(e.g. docker exec) jumps into the existing container.

@Zheaoli Zheaoli marked this pull request as draft February 27, 2024 16:33
Linux kernel 5.13 adds support for Landlock Linux Security Module (LSM).
This allows unprivileged processes to create safe security sandboxes
that can securely restrict the ambient rights (e.g. global filesystem
access) for themselves.

opencontainers#1110

Co-authored-by: Zheao Li <[email protected]>
Signed-off-by: Kailun Qin <[email protected]>
@Zheaoli Zheaoli marked this pull request as ready for review March 1, 2024 03:23
@Zheaoli
Copy link
Author

Zheaoli commented Mar 1, 2024

I have update the PR, PTAL @kailun-qin @gnoack @l0kod

@Zheaoli Zheaoli requested a review from kailun-qin March 1, 2024 03:24
// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule.
type LandlockFSAction string

// Define actions on files and directories that Landlock can restrict a sandboxed process to.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation nit: The word "define" is usually unnecessary in docstrings (all of the exported symbols are defining something after all)

Ports []string `json:"ports,omitempty" platform:"linux"`
}

// LandlockFSAction used to specify the FS actions that are handled by a ruleset or allowed by a rule.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"LandlockFSAction specifies the FS actions that are handled by a ruleset or allowed by a rule."

("used to specify" is also past tense, which is a bit confusing to read)

// PathBeneath specifies the file-hierarchy typed rules.
PathBeneath []LandlockRulePathBeneath `json:"pathBeneath,omitempty" platform:"linux"`
// PortBeneath specifies the network-socket typed rules.
PortBeneath []LandlockRulePortBeneath `json:"portBeneath,omitempty" platform:"linux"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two types of rules that exist in Landlock are "path beneath" and "net port", in the Linux headers. I would encourage you to stay consistent with that naming, as it will make it easier for users to map your API to the Landlock documentation.

The word "beneath" is referring to the hierarchical nature of file systems -- when an access right is granted on a directory, it also applies to the files beneath that directory. The word "beneath" is inappropriate when used together with ports, as we have no such hierarchy there.

DisableBestEffort bool `json:"disableBestEffort,omitempty" platform:"linux"`
}

// LandlockRuleset identifies a set of rules (i.e., actions on objects) that need to be handled.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Documentation nit: "The Ruleset identifies a set of rules" is a trivial statement which does not provide much additional value, IMHO.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, I think the naming of the "Landlock" and "LandlockRuleset" types is a bit backwards: In Landlock, the "ruleset" (which is a file descriptor under the hood), is actually the thing where users add their "path beneath" and "net port" rules. (Whereas, in this proposal, the "Ruleset" type does not actually hold the rules...? %-))

The "handled access rights" are just the information needed when creating the initial (empty) ruleset. If the kernel, the struct with the access rights is called the "ruleset attributes", but you might as well call it "handled access rights" or something like that, if you have the flexibility in your code to change that name later on.

"$ref": "#/definitions/LandlockRulePathBeneath"
}
},
"ArrayOfLandlockRulePortBeneaths": {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment on the other file about the word "beneath" when talking about network ports

The `ruleset` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm sure you meant "Access" here?

The `ruleset` currently contains the following types:
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
* **`handledAssessNetwork`** (array of strings, OPTIONAL) is an array of NETWORK typed actions that are handled by a ruleset. (The NETWORK typed actions are avaliable when the ABI version >= 4. the behavior of the NETWORK typed actions is not used when the ABI version is less than 4 will depend on the **`disableBestEffort`**)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(The sentences in parenthesis have broken grammar)

1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(This is slightly confusing in my mind that the boolean condition has the "inverted" logic here. I assume that in this config format, it is not possible to call this variable "bestEffortFallback" and to make true the default value instead?)

@gnoack
Copy link

gnoack commented Mar 1, 2024

High level remark: Looks good so far. I added a bunch of documentation nits.

As for Landlock-specific comments, I would recommend to re-think some of the type names, and I left comments in these places.

I think the bigger chance for mistakes is in the implementation itself. - Are you planning to add that in the same PR, or will that come separately?

@Zheaoli
Copy link
Author

Zheaoli commented Mar 1, 2024

I think the bigger chance for mistakes is in the implementation itself. - Are you planning to add that in the same PR, or will that come separately?

I think it would be great to implement it in a single PR(Actually, I don't know which is better to split the PR. Depend on the ABI version or else)

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

opencontainers/runc#3194 POC is here. Still need polish some code and add tests for it.

I will carry it

As there are no tests there, can you share the example config you used to test it?

I'm curious about overlayfs and the behavior in old runtimes.

@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if the runtime doesn't know this field? I guess in runc it will be completely ignored, right? In that case, I think we at least need to expose landlock support via the features command too.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the runtime will ignore the field. I think it's not necessary to add feature command like other optional features.

Copy link
Contributor

@rata rata Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli why not? This is a security feature, I can see use cases where upper layers want to enforce this.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for replying late. I misunderstand something before. I think it's necessary to export the feature we support for landlock in features command

Comment on lines +350 to +351
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How does this work on overlayfs? https://docs.kernel.org/userspace-api/landlock.html#bind-mounts-and-overlayfs

It is very widely used, can you clarify what works and what doesn't?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I'm not testing it on overlayfs yet. But I think it still work on overlayfs. Because we will always target to restrict the rule on the merged hierarchy.

Copy link
Author

@Zheaoli Zheaoli Mar 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you want to figure out is it working on the following circumstance?

  1. We got directories A and B, A and B will merged into C
  2. We give the process write permission for A and B
  3. The process still have write permission for C or not?

Copy link
Contributor

@rata rata Mar 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understandig of the link is, if C is the merged dir by overlayfs, then no, you don't have permissions to write even if you can write to A and B.

But I think if we use C for the rules, it might work. I don't know if we also need permissions in A and B (not 100% clear to me when reading the doc), in which case it will be tricky.

IMHO, it will be nice to understand that, as overlay is a widely used fs for containers. But not sure that will change something on how we support it in OCI, so maybe it is okay?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://github.com/Zheaoli/py-landlock/blob/main/examples/demo.py

I have some overlayfs test case here. PTAL if you got time

Copy link
Contributor

@rata rata May 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli I can try to download the python modules and run that locally. But it would be way simpler if you just share what works and what doesn't :)

At the OCI layer we don't know the lowerdir nor the upperdir, we just know the merged dir. It will be great to find if we apply a landlock policy restricting the merged dir, what effect that has on reading files AND also writing files.

Knowing if it works, or its limitations, is key to evaluate a PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the confusion. Here's the full test result:

  1. A and B are lower dirs, and C is a merged dirs.
  2. Policy to C would no longer affect A and B. The reverse action would be the same(policy on A and B would not affect C).
  3. The write and read (or any permission) on C is not depend on the permission on A and B. The permission between C and AB is independent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks! This LGTM.

I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.

To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go

With this json config:

JSON config:
{
  "forbidden_access": [
    "read_dir",
    "write_file",
    "make_reg"
  ],
  "exceptions": [
    {
      "paths": [
        "./rootfs-merged/tmp"
      ],
      "permitted_access": [
        "read_dir",
        "make_reg",
        "write_file"
      ]
    }
  ],
  "best_effort": false
}

And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh).

The files are created by overlayfs in the upper dir and all is as expected.

@AkihiroSuda AkihiroSuda requested a review from rata May 17, 2024 17:44
Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a cursory look, I need to look later at what field are exactly proposed. But this seems in the right direction to me.

Added some comments/issues that I see.

1. bind
2. connect
* **`ports`** (array of strings, OPTIONAL) is an array of network ports to restrict.
* **`disableBestEffort`** (bool, OPTIONAL) the `disableBestEffort` field disables the best-effort security approach for Landlock access rights.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can't completely work, sadly, due to some historic reasons. I've mentioned this also here: #1241 (comment)

The thing is, if the container runtime doesn't know about landlock, the whole landlock section will be silently ignored. So, you might think you will have this landlock policy or an error, but the truth is, if the runtime doesn't know about landlock, you can end up without a landlock policy at all.

For this reason, we should expose this in the features section too. This way, users of the OCI runtime can know if the runtime knows about landlock and handle that case properly. Without exposing it there, it is not really usable for this security features.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, we can remove the disableBestEffort field and export the features we support in features command instead

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can do boths, right?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can output the feature check like this ?

{
    "linux": {
        "landlock": {
            "enabled": true,
            "bestEffort": {
                "enabled": true // False
            },
            "fs":{
                "enabled": true,
                "actions": [
                    "exectute",
                    "write_file",
                    "read_file",
                    "read_dir",
                    "remove_dir",
                    "remove_file",
                    "make_char",
                    "make_dir",
                    "make_reg",
                    "make_sock",
                    "make_fifo",
                    "make_block",
                    "make_sym",
                    "refer",
                    "truncate"
                ]
            },
            "network": {
                "enabled": true,
                "actions": [
                    "bind",
                    "connect"
                ]
            }
        }
    }
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to expose all of that. I think just the landlock object with the enabled field will be enough.

Then, if the runtime doesn't recognize some of the constants or if the kernel ABI is not the one needed, then make the runtime fail?

Comment on lines +350 to +351
* **`handledAccessFS`** (array of strings, OPTIONAL) is an array of FS typed actions that are handled by a ruleset.
If no rule explicitly allow them, they should then be forbidden.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks! This LGTM.

I was playing with this, to make sure I'm not missing any case and indeed it seems to work just fine.

To play further, I used the example from: https://github.com/landlock-lsm/go-landlock/blob/efb66220540a9ef86aa0160d15e55f429d5b94d9/examples/go-landlock-configurable/main.go

With this json config:

JSON config:
{
  "forbidden_access": [
    "read_dir",
    "write_file",
    "make_reg"
  ],
  "exceptions": [
    {
      "paths": [
        "./rootfs-merged/tmp"
      ],
      "permitted_access": [
        "read_dir",
        "make_reg",
        "write_file"
      ]
    }
  ],
  "best_effort": false
}

And rootfs-merged is an overlayfs that I mounted manually. I tried it works as expected (can't create files on any paths except on ./rootfs-merged/tmp/, can't ls/read files except in that path), also if I chroot into that (chroot rootfs-merged /bin/sh).

The files are created by overlayfs in the upper dir and all is as expected.

3. ABI version >= 3:
1. truncate
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

portBeneath? Is this for all ports below the number? Or is this just a c&p mistake?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it's for all ports below the number. Actually I follow style with pathBeneath

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, okay, thanks. I'm in an off-site this week, so I'll check this up next week.

Copy link
Contributor

@rata rata Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make sense out of that.

If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.

But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?

Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132

@@ -340,6 +340,52 @@ For Linux-based systems, the `process` object supports the following process-spe

* **`class`** (string, REQUIRED) specifies the I/O scheduling class. Possible values are `IOPRIO_CLASS_RT`, `IOPRIO_CLASS_BE`, and `IOPRIO_CLASS_IDLE`.
* **`priority`** (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
* **`landlock`** (object, OPTIONAL) specifies the Landlock unprivileged access control settings for the container process.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

friendly ping?

Copy link
Contributor

@rata rata left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Zheaoli thanks for the PR! Added some comments.

I think the config.json interface needs more work, but I'd love for this to be ready.

I read @gnoack comments, that weren't addressed, and I fully agree with them. Please take a look their review comments.

3. ABI version >= 3:
1. truncate
* **`paths`** (array of strings, OPTIONAL) is an array of files or parent directories of the file hierarchies to restrict.
* **`portBeneath`** (array of objects, OPTIONAL) is an array of the network-hierarchy typed rules.
Copy link
Contributor

@rata rata Jun 11, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't make sense out of that.

If I want to have a rule for port 21, 22, 53 and 80, it seems complicated to have a portBeneath rule. For filesystems that makes a lot of sense, because is all beneath this (i.e. subdirs and everything below that), this is applied. It's in the nature of fs.

But what does it mean for ports? It seems like a complicated way to express it. Am I missing something?

Oh, it seems @gnoack already commented on this. I fully agree: https://github.com/opencontainers/runtime-spec/pull/1241/files#r1508607132

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants