New person accounts not being added to dynamic groups #2756

Closed
rungmc opened this issue May 7, 2024 · 12 comments · Fixed by #2779
Labels
bug Something isn't working

Comments

@rungmc

rungmc commented May 7, 2024

I did this

kanidm person create someuser someuser

I expected the following

New user would be automatically added to the dynamic groups idm_all_accounts and idm_all_persons as was the case with previous versions.
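
A quick way to check membership directly (the domain suffix varies per deployment):

kanidm person get someuser
# On earlier versions the output included lines such as:
#   memberof: idm_all_persons@<your-domain>
#   memberof: idm_all_accounts@<your-domain>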

Kanidm version details

  • Output of kanidm(d) version: 1.2.0
  • Are you running it in a container? If so, which image/tag?: N/A
  • If not a container, how'd you install it: NixOS module
  • Operating System / Version (On Unix please post the output of uname -a): Linux 6.6.29 #1-NixOS SMP PREEMPT_DYNAMIC Sat Apr 27 15:11:44 UTC 2024 x86_64 GNU/Linux

Any other comments

yaleman added a commit to yaleman/kanidm that referenced this issue May 7, 2024
@yaleman added the bug label May 7, 2024
@Firstyear
Member

I can't reproduce:

# KANIDM_INSTANCE=localhost kanidm person create testperson TestPerson
Successfully created display_name="TestPerson" username=testperson
# KANIDM_INSTANCE=localhost kanidm person get testperson
---
class: account
class: memberof
class: object
class: person
displayname: TestPerson
memberof: idm_all_persons@localhost
memberof: idm_all_accounts@localhost
name: testperson
spn: testperson@localhost
uuid: 83b46c2b-220f-4c25-85c9-42167bacec8c

Do you have more details to help here?

@CobaltCause

Something weird about my experience with this issue is that the first user I created does appear in the dynamic groups, but none of the subsequent users do. I only ever used Kanidm 1.2.0, so I can't speak to whether this worked before/differently in prior versions.

I don't know if you're familiar enough with NixOS to make sense of this, but here's the commit where I set up Kanidm (so it shows all the relevant configuration). Here are some less NixOS-y pieces of information:

Shell session demonstrating the problem
$ kanidm group get idm_all_persons --name idm_admin
---
class: account_policy
class: builtin
class: dyngroup
class: group
class: object
class: system
credential_type_minimum: mfa
description: Builtin IDM dynamic group containing all persons.
dynmember: [email protected]
name: idm_all_persons
spn: [email protected]
uuid: 00000000-0000-0000-0000-000000000035

$ echo "The above is already incorrect, there are more users than just me" > /dev/null
$ kanidm person create test-person "Test Person" --name idm_admin
Successfully created display_name="Test Person" username=test-person
$ kanidm group get idm_all_persons --name idm_admin
---
class: account_policy
class: builtin
class: dyngroup
class: group
class: object
class: system
credential_type_minimum: mfa
description: Builtin IDM dynamic group containing all persons.
dynmember: [email protected]
name: idm_all_persons
spn: [email protected]
uuid: 00000000-0000-0000-0000-000000000035

$ echo "Note that test-person does not appear" > /dev/null
$ kanidm person delete test-person --name idm_admin
success - account delete for test-person: deleted
/etc/kanidm/server.toml
bindaddress = "[::1]:6537"
db_path = "/var/lib/kanidm/kanidm.db"
domain = "kanidm.computer.surgery"
log_level = "info"
origin = "https://kanidm.computer.surgery"
role = "WriteReplica"
tls_chain = "/var/lib/acme/kanidm.computer.surgery/fullchain.pem"
tls_key = "/var/lib/acme/kanidm.computer.surgery/key.pem"
trust_x_forward_for = true

[online_backup]
path = "/var/lib/kanidm/backups"
schedule = "00 22 * * *"
versions = 0
/etc/systemd/system/kanidm.service
[Unit]
After=network.target
Description=kanidm identity management daemon

[Service]
Environment="LOCALE_ARCHIVE=/nix/store/fkrs7z887ql1sicyslai7klhzla9llh3-glibc-locales-2.39-31/lib/locale/locale-archive"
Environment="PATH=/nix/store/hq8765g3p1i7qbargnqli5mn0jpsdbfl-coreutils-9.5/bin:/nix/store/d1v47ybpl5cv9ycffgbfagfhvbvj8xdx-findutils-4.9.0/bin:/nix/store/wy37jk2hirzqzx0666w1849kjdgzdam6-gnugrep-3.11/bin:/nix/store/4cps736z7in3d37qc801lwv9z0ib67ps-gnused-4.9/bin:/nix/store/7kq9hvrhcy8g1v0jd07nr54279asg9vc-systemd-255.4/bin:/nix/store/hq8765g3p1i7qbargnqli5mn0jpsdbfl-coreutils-9.5/sbin:/nix/store/d1v47ybpl5cv9ycffgbfagfhvbvj8xdx-findutils-4.9.0/sbin:/nix/store/wy37jk2hirzqzx0666w1849kjdgzdam6-gnugrep-3.11/sbin:/nix/store/4cps736z7in3d37qc801lwv9z0ib67ps-gnused-4.9/sbin:/nix/store/7kq9hvrhcy8g1v0jd07nr54279asg9vc-systemd-255.4/sbin"
Environment="RUST_LOG=info"
Environment="TZDIR=/nix/store/7lzfzr3xsdgscccnfl9rykiwncwwvhbi-tzdata-2024a/share/zoneinfo"
AmbientCapabilities=CAP_NET_BIND_SERVICE
BindPaths=/run/kanidmd:/run/kanidmd
BindPaths=/var/lib/kanidm/backups
BindReadOnlyPaths=/nix/store
BindReadOnlyPaths=-/etc/resolv.conf
BindReadOnlyPaths=-/etc/nsswitch.conf
BindReadOnlyPaths=-/etc/hosts
BindReadOnlyPaths=-/etc/localtime
BindReadOnlyPaths=/var/lib/acme/kanidm.computer.surgery
CapabilityBoundingSet=CAP_NET_BIND_SERVICE
DeviceAllow=
ExecStart=/nix/store/kzwb7cypqmszpijwa5l7vcqx32fdqf7i-kanidm-1.2.0/bin/kanidmd server -c /nix/store/rk9vhdpljakm7bzfsh19y5x7g4kq69n2-server.toml
Group=kanidm
LockPersonality=true
MemoryDenyWriteExecute=true
NoNewPrivileges=true
PrivateDevices=true
PrivateMounts=true
PrivateNetwork=false
PrivateTmp=true
PrivateUsers=false
ProcSubset=pid
ProtectClock=true
ProtectControlGroups=true
ProtectHome=true
ProtectHostname=true
ProtectKernelLogs=true
ProtectKernelModules=true
ProtectKernelTunables=true
ProtectProc=invisible
RestrictAddressFamilies=AF_INET
RestrictAddressFamilies=AF_INET6
RestrictAddressFamilies=AF_UNIX
RestrictNamespaces=true
RestrictRealtime=true
RestrictSUIDSGID=true
RuntimeDirectory=kanidmd
StateDirectory=kanidm
StateDirectoryMode=0700
SystemCallArchitectures=native
SystemCallFilter=@system-service
SystemCallFilter=~@privileged @resources @setuid @keyring
TemporaryFileSystem=/:ro
User=kanidm

[Install]
WantedBy=multi-user.target

@rungmc
Author

rungmc commented May 16, 2024

After messing about with it for way longer than I thought I'd have to, this appears to be an issue that arises if there's ever a change in the certificates being used. So probably anyone using Let's Encrypt certs is going to run into this within 90 days.

Easiest way to reproduce in a test env is just to use kanidmd cert-generate, run the main process, delete the certs, and re-generate.
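
A sketch of those steps outside a container, assuming kanidmd reads its config from /etc/kanidm/server.toml and tls_chain/tls_key point at the chain.pem/key.pem below (all paths illustrative):

kanidmd cert-generate -c /etc/kanidm/server.toml   # generate initial certs
kanidmd server -c /etc/kanidm/server.toml          # run the main process, then stop it
rm /etc/kanidm/chain.pem /etc/kanidm/key.pem       # delete the certs
kanidmd cert-generate -c /etc/kanidm/server.toml   # re-generate
kanidmd server -c /etc/kanidm/server.toml          # persons created now miss the dynamic groups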

Doesn't appear to matter if Kanidm is online or offline when the change occurs, and it's apparently not limited to NixOS. I've been able to duplicate this problem running the official container via Podman as well (single volume persisting the db, certs, and config).

@yaleman
Member

yaleman commented May 16, 2024

After messing about with it for way longer than I thought I'd have to, this appears to be an issue that arises if there's ever a change in the certificates being used. So probably anyone using Let's Encrypt certs is going to run into this within 90 days.

That's... weird and doesn't make any sense - Kanidm doesn't monitor or auto-reload certificates; they're loaded into memory at startup and held for the life of the process. You'd somehow have to be killing it in a way that silently corrupts the database, which is also... strange.

(for context, I've been running LE certs for years now and it works fine)

@Firstyear
Member

Even if you could, every operation is 100% transactional, so database corruption would mean sqlite corruption, which I doubt.

Anyway, please show the output of kanidm person get <name>, where <name> is one of the affected accounts.

@rungmc
Author

rungmc commented May 17, 2024

Here's the output for an affected account:

[root@kanidm:/var/lib/kanidm]# kanidm person get postreset
2024-05-16T23:14:03.437899Z  WARN kanidm_client: verify_ca set to false in client configuration - this may allow network interception of passwords!
---
class: account
class: object
class: person
displayname: THIRD
name: postreset
spn: [email protected]
uuid: e0c8e3a5-878c-4a6e-a8a6-618746cc9327

@Firstyear
Member

As discussed, please try image kanidm/server:gh2756. All the needed logging is enabled at the info level.

Collect the logs by:

  • Stop the server
  • Copy your kanidmd volume as a backup
  • Pull and start with kanidm/server:gh2756
  • Immediately log in as idm_admin and then create a new testperson
  • Wait a moment
  • Stop the server

Then post or email me the logs. You may prefer email so you don't have to redact anything.
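
One way to run those steps with podman; the container name, port, and host volume path are assumptions borrowed from the repro script later in this thread:

podman stop kanidm                                  # stop the server
cp -a ~/kanidm ~/kanidm.bak                         # copy the kanidmd volume as a backup
podman rm kanidm                                    # free the name for the debug image
podman run -d --name kanidm -p 8443:8443 \
  -v ~/kanidm:/data docker.io/kanidm/server:gh2756  # pull and start gh2756
kanidm login -D idm_admin                           # immediately log in as idm_admin
kanidm person create testperson TestPerson          # create a new testperson
sleep 30                                            # wait a moment
podman stop kanidm
podman logs kanidm > gh2756.log                     # logs to post or email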

@Firstyear
Member

Are you also using any other third party tools from nix or other users in your setup?

@yaleman
Member

yaleman commented May 17, 2024

How are you creating the users? Because ... I can't reproduce it either...

@rungmc
Author

rungmc commented May 17, 2024

I'm a little bit floored that this isn't easily reproducible at this point and am heavily questioning my sanity because of that. From what I've seen after I stopped fixating on certs as a potential culprit and swapped over to running the container, the dynamic groups are breaking pretty immediately on restart.

Alright... everything but the kitchen sink follows:

Steps Taken

At this point I'm just focusing on getting the container working since it's the preferred deployment method and is slightly faster to totally destroy and redeploy repeatedly. For the sake of consistency, I basically just threw the evaluation quickstart guide from your docs into a script:

#!/bin/bash

STATEDIR=~/kanidm
DOM=idm.kanidemo.lan
PORT=8443
KANIVER=gh2756
# Working off 1.2.0/latest and gh2756

# Ensure the state directory exists, then create a minimal config file
mkdir -p "${STATEDIR}"
cat << EOF > ${STATEDIR}/server.toml
bindaddress = "[::]:${PORT}"
db_path = "/data/kanidm.db"

tls_chain = "/data/chain.pem"
tls_key = "/data/key.pem"

domain = "${DOM}"
origin = "https://${DOM}:${PORT}"

log_level = "info"
EOF

# Generate certs
podman run --rm -i -t -v ${STATEDIR}:/data \
  docker.io/kanidm/server:${KANIVER} \
  kanidmd cert-generate

# Run container in background
podman run -d --name kanidm \
  -p ${PORT}:${PORT} \
  -v ${STATEDIR}:/data \
  docker.io/kanidm/server:${KANIVER}

printf "\nGiving container time to load up...\n"
sleep 20s

printf "\nRecovering admin...\n\n"
podman exec -i -t kanidm \
  kanidmd recover-account admin

printf "\nRecovering idm_admin...\n\n"
podman exec -i -t kanidm \
  kanidmd recover-account idm_admin

From there, the following actions are performed manually (throwaway usernames vary):

  1. kanidm login -D idm_admin
  2. kanidm person create workingperson FIRSTuser
  3. kanidm person list - Users created on first run are always added to dynamic groups
  4. podman stop kanidm
  5. podman start kanidm
  6. kanidm person create rebootperson SECONDuser
  7. kanidm person list - Users created after the restart are never added to dynamic groups, person list/person get/group list/group get all show consistent info
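
Two extra checks for step 7, mirroring CobaltCause's session above (idm.kanidemo.lan is the domain set in the script; adjust to your setup):

kanidm person get rebootperson                      # affected accounts show no memberof: lines
kanidm group get idm_all_persons --name idm_admin   # dynmember: never gains rebootperson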

That produces these logs with the image specified:

gh2756.log

Additional Context

As I've mentioned elsewhere, this is reproducible on three different machines using multiple methods of deployment. There are really only two factors that are consistent between them:

  1. NixOS Unstable is the bottom layer beneath Podman, LXC, KVM, etc. The only third party tool I use on these installations is Agenix for secrets management, which doesn't interact with Kanidm.
  2. All machines run ZFS.

The following setups have all reliably exhibited this issue (all version 1.2.0):

  1. Running the services.kanidm NixOS module on bare metal (on my server with the "production" instance of Kanidm, using all of my certs and settings).
  2. Firing up a NixOS Unstable LXC container for further testing of the NixOS module (on my workstation, using an exact copy of my certs and settings as well as with generic settings with auto-generated certs).
  3. Deploying the Kanidm container via Podman (multiple variations of the process detailed above).
  4. Running via OpenSUSE Tumbleweed in a VM (KVM), installing the Kanidm client via Zypper, and running the server from a container (above process).
  5. Running via OpenSUSE Leap 15.5 live system (same as Tumbleweed VM).

The following setups DO NOT exhibit this issue:

  1. Firing up a NixOS Stable LXC container which defaults to the 1.1.0-rc.15 package for the client and pulling the kanidm/server:1.1.0-rc.15 image for the server (above process).

@Firstyear
Member

I'm a little bit floored that this isn't easily reproducible at this point and am heavily questioning my sanity because of that. From what I've seen after I stopped fixating on certs as a potential culprit and swapped over to running the container, the dynamic groups are breaking pretty immediately on restart.

For the record, we do believe you that it's a problem, and we really appreciate the time you're putting in to help us here. It's greatly appreciated. We're just as stumped as you that we haven't reproduced it yet.

@Firstyear
Member

I have just reproduced locally. :)
