
Unable to delete all accounts in a loop; got an internal error while deleting some of them #7920

Open
rkomandu opened this issue Mar 21, 2024 · 28 comments
@rkomandu
Collaborator

rkomandu commented Mar 21, 2024

Environment info

  • NooBaa Version: noobaa-core-5.15.0-20240313.el9.x86_64 (ODF 4.15.0 d/s build, 0313)
  • Platform: Standalone NooBaa (non-containerized)

Actual behavior

  1. Created accounts and then deleted them in a loop
  2. for i in $(seq 8100 9000); do mms3 account delete s3user-$i; done

Account s3user-8369 deleted successfully
Account s3user-8370 deleted successfully
Account s3user-8371 deleted successfully
Account s3user-8372 deleted successfully
Account s3user-8373 deleted successfully
Account s3user-8374 deleted successfully
Account s3user-8375 deleted successfully
Account s3user-8376 deleted successfully
Failed to execute command for Account s3user-8377: The server encountered an internal error. Please retry the request --> here
Account s3user-8378 deleted successfully
Account s3user-8379 deleted successfully
Account s3user-8380 deleted successfully
Account s3user-8381 deleted successfully
Account s3user-8382 deleted successfully

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8377
s3user-8377 /mnt/fs1/s3user-8377-dir 8377 8377

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8377
s3user-8377 /mnt/fs1/s3user-8377-dir 8377 8377
[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8376
[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8378

[root@rkomandu-ip-cls-x-worker2 ~]# ls -ld /mnt/fs1/s3user-8377-dir
drwxrwx--- 2 8377 8377 4096 Mar 14 10:27 /mnt/fs1/s3user-8377-dir

[root@rkomandu-ip-cls-x-worker2 ~]# ls -ltr /mnt/fs1/s3user-8377-dir
total 0

Account s3user-8522 deleted successfully
Account s3user-8523 deleted successfully
Account s3user-8524 deleted successfully
Account s3user-8525 deleted successfully
Account s3user-8526 deleted successfully
Account s3user-8527 deleted successfully
Account s3user-8528 deleted successfully
Failed to execute command for Account s3user-8529: The server encountered an internal error. Please retry the request --> here
Account s3user-8530 deleted successfully
Account s3user-8531 deleted successfully
Account s3user-8532 deleted successfully
..
Account s3user-8966 deleted successfully
Account s3user-8967 deleted successfully
Account s3user-8968 deleted successfully
Account s3user-8969 deleted successfully
Account s3user-8970 deleted successfully
Account s3user-8971 deleted successfully
Failed to execute command for Account s3user-8972: The server encountered an internal error. Please retry the request --> here
Account s3user-8973 deleted successfully
Account s3user-8974 deleted successfully
Account s3user-8975 deleted successfully
Account s3user-8976 deleted successfully
Failed to execute command for Account s3user-8977: The server encountered an internal error. Please retry the request --> here
Account s3user-8978 deleted successfully

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8972
s3user-8972 /mnt/fs1/s3user-8972-dir 8972 8972

[root@rkomandu-ip-cls-x-worker2 ~]# mms3 account list | grep 8977
s3user-8977 /mnt/fs1/s3user-8977-dir 8977 8977

We opened a defect internally (RTC 327700), and Ramya determined that it is due to a NooBaa problem.

noobaa_events.log
-------------------------
{"timestamp":"2024-03-19T03:35:46.501712-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Acc
ount deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"
s3user-8970\"},\"pid\":515114}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[515114]:","source":"node"}
{"timestamp":"2024-03-19T03:35:50.833533-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8971\"},\"pid\":515690}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[515690]:","source":"node"}
{"timestamp":"2024-03-19T03:35:58.484860-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8973\"},\"pid\":516371}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[516371]:","source":"node"}
{"timestamp":"2024-03-19T03:36:02.950806-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8974\"},\"pid\":516690}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[516690]:","source":"node"}
{"timestamp":"2024-03-19T03:36:07.077782-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Account deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3user-8975\"},\"pid\":517290}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[517290]:","source":"node"}

As you can observe, there is no event entry for the s3user-8972 delete request.

Expected behavior

The internal error shouldn't be shown; all account deletions in the loop should succeed.

Steps to reproduce

Create accounts in a loop (say 2K), possibly along with buckets, and then try to delete them in a loop from the CES nodes.
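A hedged sketch of the delete side of this reproduction, using only commands already shown in this report (the account range and the 10-second sleep mirror the runs described here; the log file name is arbitrary):

for i in $(seq 8100 9000)
do
    mms3 account delete s3user-$i      # same command as in the loop above
    sleep 10                           # interval used in one of the later runs
done | tee /tmp/mms3-delete.log

# count how many deletions hit the internal error
grep -c 'Failed to execute command' /tmp/mms3-delete.log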

More information - Screenshots / Logs / Other output

@rkomandu rkomandu added the NS-FS label Mar 21, 2024
@naveenpaul1 naveenpaul1 self-assigned this Mar 21, 2024
@naveenpaul1 naveenpaul1 added the Non Containerized Non containerized label Mar 21, 2024
@naveenpaul1
Contributor

naveenpaul1 commented Mar 21, 2024

@rkomandu Please run the test after updating the log level, and share the actual error log.
You can search for the string "NSFS Manage command: exit on error".
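For reference, a hedged example of pulling those lines out of the log (the log path is the one quoted later in this thread):

# show the most recent occurrences of the failure line referred to above
grep -n 'NSFS Manage command: exit on error' /var/log/noobaa.log | tail -20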

@rkomandu
Collaborator Author

Yes, it is reporting exactly that error.

Deleted the 2K accounts in a loop with a 10-second sleep interval and observed the "Failed to execute command" error multiple times:


Account s3user-12394 deleted successfully
Account s3user-12395 deleted successfully
Account s3user-12396 deleted successfully
Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request
Account s3user-12402 deleted successfully
Account s3user-12403 deleted successfully


Account s3user-12397 deleted successfully
Failed to execute command for Account s3user-12398: The server encountered an internal error. Please retry the request
Account s3user-12399 deleted successfully
Failed to execute command for Account s3user-12400: The server encountered an internal error. Please retry the request
Failed to execute command for Account s3user-12401: The server encountered an internal error. Please retry the request

Now let us look at noobaa.log for the 12397 and 12399 deletions; for 12398, the message came as "NSFS Manage command: exit on error".

In noobaa_events.log:

{"timestamp":"2024-03-22T00:26:20.271279-07:00","message":"{\"code\":\"noobaa_account_deleted\",\"message\":\"Account deleted\",\"description\":\"Noobaa Accou
nt deleted\",\"entity_type\":\"NODE\",\"event_type\":\"INFO\",\"scope\":\"NODE\",\"severity\":\"INFO\",\"state\":\"HEALTHY\",\"arguments\":{\"account\":\"s3us
er-12397\"},\"pid\":3271588}","host":"rkomandu-ip-cls-x-worker1","severity":"notice","facility":"local2","syslog-tag":"node[3271588]:","source":"node"}

Now, checking around the above timestamp in the noobaa.log excerpt below for the s3user-12397, s3user-12398 and s3user-12399 account deletions:


2024-03-22T00:26:20.258310-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:20.259054-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.259709-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs: ESC[33mtrueESC[39m
2024-03-22T00:26:20.259808-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json is_gpfs= ESC[33mtrueESC[39m
2024-03-22T00:26:20.260664-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} ESC[33m20ESC[39m /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json File {} ESC[33m21ESC[39m
2024-03-22T00:26:20.264526-07:00 rkomandu-ip-cls-x-worker1 node[3271588]: [/3271588] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12397.json
2024-03-22T00:26:20.464058-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:20.464601-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m184549376ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
...
...
2024-03-22T00:26:20.867639-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m184549376ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:20.867906-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m20971520ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:20.867943-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m33554432ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:20.867968-07:00 rkomandu-ip-cls-x-worker1 node[3062942]: [nsfs/3062942] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m2097152ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:20.956759-07:00 rkomandu-ip-cls-x-worker1 node[3271531]: [/3271531] ESC[36m   [L1]ESC[39m core.cmd.health:: Error: Config file path should be a valid path /mnt/ces-shared-root/ces/s3-config/buckets/bucket-12890.json [Error: No such file or directory] { code: ESC[32m'ENOENT'ESC[39m }
2024-03-22T00:26:21.309702-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] ESC[36m   [L1]ESC[39m core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.310118-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m184549376ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:21.310274-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m20971520ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:21.310301-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m33554432ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:21.310324-07:00 rkomandu-ip-cls-x-worker1 node[3062948]: [nsfs/3062948] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m2097152ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:21.493171-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] ESC[36m   [L1]ESC[39m core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:21.493778-07:00 rkomandu-ip-cls-x-worker1 node[3062959]: [nsfs/3062959] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m184549376ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m

..

2024-03-22T00:26:28.675289-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m  [LOG]^[[39m CONSOLE:: read_rand_seed: got 32 bytes from /dev/random, total 32 ...
2024-03-22T00:26:28.675459-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m  [LOG]^[[39m CONSOLE:: read_rand_seed: closing fd ...
2024-03-22T00:26:28.675954-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m  [LOG]^[[39m CONSOLE:: init_rand_seed: seeding with 32 bytes
2024-03-22T00:26:28.677312-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m   [L1]^[[39m core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/buckets
2024-03-22T00:26:28.678024-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m   [L1]^[[39m core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:28.678507-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m   [L1]^[[39m core.cmd.manage_nsfs:: nsfs.check_and_create_config_dirs: config dir exists: /mnt/ces-shared-root/ces/s3-config/access_keys
**2024-03-22T00:26:28.944651-07:00 rkomandu-ip-cls-x-worker1 node[3272188]: [/3272188] ^[[36m   [L1]^[[39m core.cmd.manage_nsfs:: NSFS Manage command: exit on error Error: No such file or directory**
2024-03-22T00:26:30.464455-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ^[[36m   [L1]^[[39m core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:30.464796-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ^[[36m   [L1]^[[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ^[[33m184549376^[[39m waiting_value ^[[33m0^[[39m buffers length ^[[33m0^[[39m



2024-03-22T00:26:37.530367-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m  [LOG]ESC[39m CONSOLE:: generate_entropy: adding entropy: dd if=/dev/vda bs=1048576 count=32 skip=255731 | md5sum
2024-03-22T00:26:37.538563-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts
2024-03-22T00:26:37.539654-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: NamespaceFS._open_file: mode=r /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:37.540691-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file config_path: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs: ESC[33mtrueESC[39m
2024-03-22T00:26:37.540797-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file unlinking: /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json is_gpfs= ESC[33mtrueESC[39m
2024-03-22T00:26:37.541637-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: Namespace_fs._delete_version_id unlink: File {} ESC[33m21ESC[39m /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json File {} ESC[33m23ESC[39m
2024-03-22T00:26:37.552608-07:00 rkomandu-ip-cls-x-worker1 node[3272625]: [/3272625] ESC[36m   [L1]ESC[39m core.util.native_fs_utils:: native_fs_utils: delete_config_file done /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12399.json
2024-03-22T00:26:40.464133-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.server.bg_services.semaphore_monitor:: semaphore_monitor: START
2024-03-22T00:26:40.464511-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m184549376ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:40.464650-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m20971520ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:40.464676-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/3062940] ESC[36m   [L1]ESC[39m core.util.buffer_utils:: MultiSizeBuffersPool.get_buffers_pool: sem value ESC[33m33554432ESC[39m waiting_value ESC[33m0ESC[39m buffers length ESC[33m0ESC[39m
2024-03-22T00:26:40.464700-07:00 rkomandu-ip-cls-x-worker1 node[3062940]: [nsfs/30629

Whereas the account is still there:

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12398
s3user-12398            /mnt/fs1/s3user-12398-dir                       12398   12398

[root@rkomandu-ip-cls-x-worker1 log]# mms3 account list |grep 12399

[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config
s3-config/                s3-config-backup.tar.bz2
[root@rkomandu-ip-cls-x-worker1 log]# ls -ld /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
-rw------- 1 root root 376 Mar 21 09:12 /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# less /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
[root@rkomandu-ip-cls-x-worker1 log]# cat /mnt/ces-shared-root/ces/s3-config/accounts/s3user-12398.json
{"name":"s3user-12398","creation_date":"2024-03-21T16:12:04.264Z","access_keys":[{"access_key":"EdOlObT16fevKyqcDIY0","secret_key":"73Iako95fyfdaMvFySHxq//BfimhW3+D6LLG+4rv"}],"nsfs_account_config":{"uid":12398,"gid":12398,"new_buckets_path":"/mnt/fs1/s3user-12398-dir","fs_backend":"GPFS"},"email":"s3user-12398","allow_bucket_creation":true,"_id":"65fc5c54a19a460e1e094ab4"}

noobaa.log is about 700 MB:

[root@rkomandu-ip-cls-x-worker1 log]# ls -lh /var/log/noobaa.log
-rw-r--r-- 1 root root 733M Mar 22 01:37 /var/log/noobaa.log

For now, you can check from the above logs, @naveenpaul1. Uploading to GitHub is not possible, and Box also has a size restriction. I need to trim the older noobaa.log content and keep only the current day's entries to see if that reduces the size. However, you can continue from the above log snippets, I think.

@guymguym guymguym added this to the 5.15.3 milestone Apr 21, 2024
@guymguym guymguym modified the milestones: 5.15.3, 5.15.4 May 8, 2024
@romayalon romayalon assigned shirady and unassigned naveenpaul1 May 20, 2024
@shirady
Contributor

shirady commented May 21, 2024

Hi @rkomandu,

  1. From what I understand, you are creating accounts in a loop and then deleting the accounts in a loop.
    In the details of the issue we can see the example of the account s3user-12398 that was not deleted:
    The server encountered an internal error. Please retry the request
    Since it was not deleted, we can still see its config file.

  2. I would like to add the printout of the Internal Error to this issue; it might look like this:
    stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":

  3. I was not able to reproduce this. I created and deleted over 1,000 accounts with a sleep of 1,000 milliseconds.
    You can see that they were all created (and the accounts config directory contains no config files after the deletions).

➜  grep -i AccountCreated run_test_2048.txt | wc -l
    2048
➜  grep -i AccountDeleted run_test_2048.txt | wc -l
    2048

@rkomandu
Collaborator Author

This is on a recent NooBaa 5.15.3 d/s build (0514).


[root@c83f1-app2 ~]# mms3 account delete s3user5001
Failed to execute command for Account s3user5001: The server encountered an internal error. Please retry the request
[root@c83f1-app2 ~]# mms3 account list s3user5001
Failed to list Account s3user5001: Account does not exist
[root@c83f1-app2 ~]# rpm -qa |grep mms3
gpfs.mms3-5.2.1-0.240521.104312.el9.x86_64

[root@c83f1-app2 ~]# mmhealth node show CES S3 |grep s3user5001
S3                 DEGRADED      16 hours ago      s3_access_denied(s3user5001), s3_storage_not_exist(newbucket-5001)
  s3user5001       DEGRADED      16 hours ago      s3_access_denied(s3user5001)
s3_access_denied      s3user5001      WARNING     16 hours ago         Account does not have access to the storage path mentioned in schema.

However, mmhealth shows that the account still exists.

@rkomandu
Collaborator Author

@shirady

I am deleting all the accounts and buckets via the mms3 CLI and hit a similar problem to the one shown above:

the bucket delete passed, while the account delete failed with an internal error

Bucket bucket-10335 deleted successfully
Failed to execute command for Account s3user-10335: The server encountered an internal error. Please retry the request
10335 is done

The noobaa-cli status and list output for the account is shown below:

noobaa-cli account status --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:13.218253 [PID-3055247/TID-3055247] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:13.218355 [PID-3055247/TID-3055247] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:13.548 [/3055247]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055247) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:13.579 [/3055247]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:13.583 [/3055247]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
  "response": {
    "code": "AccountStatus",
    "reply": {
      "_id": "664e0aa633201f0668d9ac6b",
      "name": "s3user-10335",
      "email": "s3user-10335",
      "creation_date": "2024-05-22T15:09:26.881Z",
      "nsfs_account_config": {
        "uid": 10335,
        "gid": 10335,
        "new_buckets_path": "/gpfs/remote_fvt_fs/s3user-10335-dir",
        "fs_backend": "GPFS"
      },
      "allow_bucket_creation": true,
      "master_key_id": "664e04d7f8bda475703e483b"
    }
  }
}

]# noobaa-cli account list --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 08:55:23.493482 [PID-3055620/TID-3055620] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 08:55:23.493593 [PID-3055620/TID-3055620] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 8:55:23.825 [/3055620]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3055620) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 8:55:23.856 [/3055620]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 8:55:23.861 [/3055620]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 8:55:23.862 [/3055620]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 8:55:24.452 [/3055620]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=12
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58\n    at async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)\n    at async Promise.all (index 9)\n    at async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"
  }
}

@shirady
Contributor

shirady commented May 23, 2024

Thank you @rkomandu. From the attached error we can understand that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id); the full edited message:

"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
async Promise.all (index 9)
async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"

This internal error printout came from the account list.

I would verify that when manually deleting the account we see the same details in the error (please run account delete on that account). If you can attach this, please do, so we will have the full information.

cc: @romayalon

@rkomandu
Collaborator Author

# noobaa-cli account delete --name s3user-10335
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
  ENDPOINT_PORT: 6001,
  ENDPOINT_SSL_PORT: 6443,
  GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
  NOOBAA_LOG_LEVEL: 'default',
  ENDPOINT_FORKS: 2,
  UV_THREADPOOL_SIZE: 16,
  NSFS_CALCULATE_MD5: false,
  ALLOW_HTTP: false,
  NSFS_NC_STORAGE_BACKEND: 'GPFS',
  NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
  NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
  NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
  NC_MASTER_KEYS_STORE_TYPE: 'executable',
  NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
  NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-05-23 10:19:24.584673 [PID-3183486/TID-3183486] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-05-23 10:19:24.584759 [PID-3183486/TID-3183486] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
May-23 10:19:24.904 [/3183486]   [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:3183486) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use `node --trace-warnings ...` to show where the warning was created)
May-23 10:19:24.933 [/3183486]   [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: read_rand_seed: closing fd ...
May-23 10:19:24.938 [/3183486]   [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
May-23 10:19:25.531 [/3183486]    [L0] core.manage_nsfs.nc_master_key_manager:: init_from_exec: get master keys response status=OK, version=15
{
  "error": {
    "code": "InternalError",
    "message": "The server encountered an internal error. Please retry the request",
    "detail": "Error: master key id is missing in master_keys_by_id\n    at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)\n    at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)\n    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n    at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29\n    at async Promise.all (index 0)\n    at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)\n    at async fetch_existing_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:416:30)\n    at async fetch_account_data (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:385:16)\n    at async account_management (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:336:18)\n    at async main (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:112:13)"
  }
}

In fact, I would say that all account deletes have the same problem. This is a blocker and a high-priority issue for us now.


@rkomandu
Collaborator Author

Every account delete has this issue now:

Bucket bucket-10001 deleted successfully
Failed to execute command for Account s3user-10001: The server encountered an internal error. Please retry the request
10001 is done
Bucket bucket-10002 deleted successfully
Failed to execute command for Account s3user-10002: The server encountered an internal error. Please retry the request
10002 is done
Bucket bucket-10003 deleted successfully
Failed to execute command for Account s3user-10003: The server encountered an internal error. Please retry the request
10003 is done
Bucket bucket-10004 deleted successfully
Failed to execute command for Account s3user-10004: The server encountered an internal error. Please retry the request
10004 is done
Bucket bucket-10005 deleted successfully
Failed to execute command for Account s3user-10005: The server encountered an internal error. Please retry the request
10005 is done
Bucket bucket-10006 deleted successfully
Failed to execute command for Account s3user-10006: The server encountered an internal error. Please retry the request
10006 is done
Bucket bucket-10007 deleted successfully
Failed to execute command for Account s3user-10007: The server encountered an internal error. Please retry the request
10007 is done
Bucket bucket-10008 deleted successfully
Failed to execute command for Account s3user-10008: The server encountered an internal error. Please retry the request
10008 is done
Bucket bucket-10009 deleted successfully
Failed to execute command for Account s3user-10009: The server encountered an internal error. Please retry the request
10009 is done
Bucket bucket-10010 deleted successfully
Failed to execute command for Account s3user-10010: The server encountered an internal error. Please retry the request
10010 is done
Bucket bucket-10011 deleted successfully
Failed to execute command for Account s3user-10011: The server encountered an internal error. Please retry the request
10011 is done
Bucket bucket-10012 deleted successfully
Failed to execute command for Account s3user-10012: The server encountered an internal error. Please retry the request
10012 is done

@rkomandu
Collaborator Author

Thank you @rkomandu, so we can understand from the error attached that the reason for the internal error is related to encryption (master key id is missing in master_keys_by_id), full edited message:

"detail": "Error: master key id is missing in master_keys_by_id
at NCMasterKeysManager.decryptSync (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:305:59)
at NCMasterKeysManager.decrypt (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:294:21)
at process.processTicksAndRejections (node:internal/process/task_queues:95:5)
at async /usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:335:29
at async Promise.all (index 0)
at async NCMasterKeysManager.decrypt_access_keys (/usr/local/noobaa-core/src/manage_nsfs/nc_master_key_manager.js:333:39)
at async /usr/local/noobaa-core/src/cmd/manage_nsfs.js:665:58
async Semaphore.surround (/usr/local/noobaa-core/src/util/semaphore.js:71:84)
async Promise.all (index 9)
async list_config_files (/usr/local/noobaa-core/src/cmd/manage_nsfs.js:660:29)"

This internal error printout came from the account list.

I would verify that when manually deleting the account we see the same details in the error (please run account delete on that account). If you can attach this, please do, so we will have the full information.

cc: @romayalon

@shirady, this "internal error" is the latest problem w.r.t. the master_key, as posted in the above comments. However, when the defect was opened in the third week of March, encryption was not enabled in d/s ODF 4.15.0. At that time the error was the same as now, but I don't have the noobaa-cli command output from then.

@rkomandu
Collaborator Author

This encryption issue is a new problem with ODF 4.15.3; however, the main problem could still be there, since deleting in a loop is what all of my first few updates were about.

@rkomandu
Collaborator Author

rkomandu commented May 24, 2024

For the latest problem, @romayalon: it is the master_key problem in the CCR. I have updated the RTC defect https://jazz07.rchland.ibm.com:21443/jazz/web/projects/GPFS#action=com.ibm.team.workitem.viewWorkItem&id=330894 (comment 2).

@shirady
The master_key problem can be worked around manually as shown below.

For example, the CCR master_keys file is:

  1  gpfs.ganesha.statdargs.conf
        1  idmapd.conf
       18  _ces_s3.master_keys

# mmccr fget _ces_s3.master_keys /tmp/_ces_s3.master_keys
fget:18

# cat /tmp/_ces_s3.master_keys
{"timestemp":1716479808994,"active_master_key":"664f6740471f94033f84c97b","master_keys_by_id":{"664f6740471f94033f84c97b":{"id":"664f6740471f94033f84c97b","cipher_key":"7uboJzYGucsCimII30BbFCUdgS3zBv/oobwg9TXG0V8=","cipher_iv":"uipRG4v0jVQg8jnR2pcxsA==","encryption_type":"aes-256-gcm"}}}[root@c83f1-app3 s3]#


 cp /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json /tmp/s3user-10001.json

less /tmp/s3user-10001.json

{"_id":"664dffe41b9d01307a23e986","name":"s3user-10001","email":"s3user-10001","creation_date":"2024-05-22T14:23:32.560Z","access_keys":[{"access_key":"8ca0bWSRFCScwdd4s8JJ","encrypted_secret_key":"k07Bhd5oQ2tphVljeV9Y6pUIjyqQLFoEG9nHqFTWTHmwagrgEh/fBg=="}],"nsfs_account_config":{"uid":10001,"gid":10001,"new_buckets_path":"/gpfs/remote_fvt_fs/s3user-10001-dir","fs_backend":"GPFS"},"allow_bucket_creation":true,"master_key_id":"664dd5d4b9b23ffb8378c6da"}




Edit the s3user-10001.json file so that its master_key_id matches the active master key in the CCR, as per /tmp/_ces_s3.master_keys:

 vi /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json   (update master_key_id to the CCR's active master key)
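A minimal sketch of the same manual edit as a one-liner - my own illustration, not an official procedure; it assumes python3 is available and that the CCR file was already fetched to /tmp/_ces_s3.master_keys as shown above:

# read the active master key id from the fetched CCR file
ACTIVE_KEY=$(python3 -c 'import json; print(json.load(open("/tmp/_ces_s3.master_keys"))["active_master_key"])')
# rewrite the account's master_key_id in place to the active id
sed -i "s/\"master_key_id\":\"[^\"]*\"/\"master_key_id\":\"${ACTIVE_KEY}\"/" \
    /ibm/cesSharedRoot/ces/s3-config/accounts/s3user-10001.json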

 mms3 account delete s3user-10001 --debug
Running Command /usr/lpp/mmfs/bin/mmlsconfig cesSharedRoot
Running Command /usr/lpp/mmfs/bin/mmces service list
Running Command: env LC_ALL=C /usr/local/bin/noobaa-cli account delete --name s3user-10001 2>/dev/null
Execute command failed
info:
{
  "response": {
    "code": "AccountDeleted"
  }
}

error: <nil>
Account s3user-10001 deleted successfully
It showed that the account was deleted successfully.

We need to get this fixed ASAP; otherwise the 5.15.3-0415 build with encryption will not work for the basic account delete functionality.

@romayalon
Contributor

@rkomandu
According to the timestamps, this account (creation_date "2024-05-22T14:23:32.560Z") is older than the master key ("timestemp": 1716479808994 = Thursday, May 23, 2024, 3:56:48.994 PM UTC). This is not a valid situation: you had an old account in this config_dir, which is why you see the missing master key error.
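For reference, a quick way to check that epoch-millisecond timestamp on the command line (GNU date assumed; the value is the "timestemp" quoted above):

date -u -d @"$((1716479808994 / 1000))" +'%Y-%m-%d %H:%M:%S UTC'
# prints: 2024-05-23 15:56:48 UTC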

@rkomandu
Collaborator Author

@romayalon,
The updates in the comments linked below occurred on the 5.15.3 d/s build, where 2K accounts and 2K buckets were created. The master_key_id values in the *.json files differ for many accounts; why they differ is not yet clear, and Ramya is analyzing why the master keys are regenerated when "we ran in a loop".
Bottom line: the accounts cannot be deleted, and this is happening because the master_key value is different.

#7920 (comment)
#7920 (comment)
#7920 (comment)

@romayalon
Contributor

@rkomandu @shirady I posted a fix for the encryption issue yesterday; according to @ramya-c3 it is working now. @rkomandu, can you try to reproduce with the new code and share whether you still see the original internal error?

@romayalon
Contributor

@rkomandu @shirady Any news about this one?

@rkomandu
Collaborator Author

rkomandu commented Jun 4, 2024

@romayalon ,

On a physical machine (bare metal), with the RPM you provided on May 30th, we recreated the account delete problem: 47 deletions hit the error when executing concurrently from 3 nodes with 1K accounts each.

https://ibmandredhatguests.slack.com/archives/C015Z7SDWQ0/p1717130490257419?thread_ts=1716826449.194409&cid=C015Z7SDWQ0

@Ramya-c has taken this up further since Friday, discussed it with Guy, and is running experiments.

Unless that is sorted out first, this error, even though related, would be masked IMO. If you can understand and address it from the code flow, then that is the next move; otherwise we need to wait and retry when we have extra cycles. The priority is to sort out that problem.

@romayalon
Contributor

Thanks @rkomandu for the update. We are waiting for the logs of the concurrency tests; please keep us updated with the information you capture about this issue.

@romayalon romayalon modified the milestones: 5.15.4, 5.15.5 Jun 4, 2024
@shirady
Contributor

shirady commented Jun 4, 2024

Hi @romayalon @rkomandu
I will share that I tried to run a concurrency test using a bash loop on my machine, and I am attaching the steps (so you can comment on them).

The steps are described for a few accounts so you can test them, and then change the number of iterations - for example, I changed it from for i in {1501..1503} to for i in {1501..1599}.

Requirements:

  1. (Optional) Create the config root (so I'm sure there are no accounts from a previous run):
sudo mkdir -p /tmp/my-config
sudo chmod 777 /tmp/my-config

  2. Create and give permission to the new-buckets-path:
for i in {1501..1503}
do
   mkdir -p /tmp/nsfs_root_s3user/s3user-$i; chmod 777 /tmp/nsfs_root_s3user/s3user-$i;
done

Steps:

  3. Create accounts:
for i in {1501..1503}
do
   sudo node src/cmd/manage_nsfs account add --name s3user-$i --uid $i --gid $i --new_buckets_path /tmp/nsfs_root_s3user/s3user-$i --config_root /tmp/my-config
done

You can check that you see the accounts config: sudo ls -al /tmp/my-config/accounts

  4. Delete accounts:
for i in {1501..1503}
do
   sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config
done

You can check that no account config remains: sudo ls -al /tmp/my-config/accounts
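As a side note, and not part of the original steps: since the real reproduction runs deletes concurrently from several nodes, a rough single-machine approximation (my own sketch, same --config_root layout as above) is to background the CLI calls:

for i in {1501..1599}
do
   sudo node src/cmd/manage_nsfs account delete --name s3user-$i --config_root /tmp/my-config &
done
wait   # wait for all backgrounded deletes to finish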

@romayalon
Contributor

Thanks @shirady for checking it. @rkomandu, please keep us updated on whether the root cause of the reproduction Ramya sees is related to encryption or not (it might be a reproduction of this issue).

@rkomandu
Collaborator Author

rkomandu commented Jun 4, 2024

@romayalon, as per Ramya's post in the channel here https://ibmandredhatguests.slack.com/archives/C015Z7SDWQ0/p1717495547911289?thread_ts=1716826449.194409&cid=C015Z7SDWQ0, that message and the next 4 comments show that the noobaa-cli status and list commands did not report any master_key problem, as was the case previously (i.e., before the May 30th build). So this delete error is still occurring and can now be related to this defect and to why it is failing.

@romayalon
Contributor

@rkomandu I agree; we just need to validate it using a printout of the full error object/stderr. Thank you.

@shirady
Contributor

shirady commented Jun 4, 2024

@rkomandu we need the details of the error to investigate.
stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>

Did you try to delete one of the accounts (not in the loop) and see the error?
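A hedged sketch of one way to capture that full error detail during a loop run, assuming direct access to noobaa-cli (the account range and file names are arbitrary):

for i in $(seq 10001 10010)
do
   # keep both the JSON response and stderr per account for later inspection
   noobaa-cli account delete --name s3user-$i \
       > /tmp/delete-s3user-$i.out 2> /tmp/delete-s3user-$i.err
done
# list the accounts whose delete returned InternalError
grep -l '"code": "InternalError"' /tmp/delete-s3user-*.out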

@rkomandu
Collaborator Author

rkomandu commented Jun 4, 2024

@shirady, that is what is mentioned in the Slack thread with the noobaa-cli status and list output, as shown above.

@ramya-c3

ramya-c3 commented Jun 5, 2024

Name New Buckets Path Uid Gid User


s3user-31407 /gpfs/remote_fvt_fs/s3user-31407-dir 31407 31407 None
s3user-31038 /gpfs/remote_fvt_fs/s3user-31038-dir 31038 31038 None
s3user-31484 /gpfs/remote_fvt_fs/s3user-31484-dir 31484 31484 None
s3user-31037 /gpfs/remote_fvt_fs/s3user-31037-dir 31037 31037 None
s3user-32316 /gpfs/remote_fvt_fs/s3user-32316-dir 32316 32316 None
s3user-31402 /gpfs/remote_fvt_fs/s3user-31402-dir 31402 31402 None
s3user-32466 /gpfs/remote_fvt_fs/s3user-32466-dir 32466 32466 None
s3user-32171 /gpfs/remote_fvt_fs/s3user-32171-dir 32171 32171 None
s3user-31320 /gpfs/remote_fvt_fs/s3user-31320-dir 31320 31320 None
s3user-31475 /gpfs/remote_fvt_fs/s3user-31475-dir 31475 31475 None
s3user-70749 /gpfs/remote_fvt_fs/s3user-70749-dir 70749 70749 None
s3user-31576 /gpfs/remote_fvt_fs/s3user-31576-dir 31576 31576 None
s3user-31312 /gpfs/remote_fvt_fs/s3user-31312-dir 31312 31312 None
s3user-32180 /gpfs/remote_fvt_fs/s3user-32180-dir 32180 32180 None
s3user-32111 /gpfs/remote_fvt_fs/s3user-32111-dir 32111 32111 None
s3user-31371 /gpfs/remote_fvt_fs/s3user-31371-dir 31371 31371 None
s3user-32250 /gpfs/remote_fvt_fs/s3user-32250-dir 32250 32250 None
s3user-32003 /gpfs/remote_fvt_fs/s3user-32003-dir 32003 32003 None
s3user-31372 /gpfs/remote_fvt_fs/s3user-31372-dir 31372 31372 None
s3user-31041 /gpfs/remote_fvt_fs/s3user-31041-dir 31041 31041 None
s3user-31040 /gpfs/remote_fvt_fs/s3user-31040-dir 31040 31040 None
s3user-70698 /gpfs/remote_fvt_fs/s3user-70698-dir 70698 70698 None
s3user-32120 /gpfs/remote_fvt_fs/s3user-32120-dir 32120 32120 None
s3user-31207 /gpfs/remote_fvt_fs/s3user-31207-dir 31207 31207 None
[root@c83f1-app2 ~]# noobaa-cli account delete --name s3user-31407
load_nsfs_nc_config.setting config.NSFS_NC_CONF_DIR /ibm/cesSharedRoot/ces/s3-config
nsfs: config_dir_path=/ibm/cesSharedRoot/ces/s3-config config.json= {
ENDPOINT_PORT: 6001,
ENDPOINT_SSL_PORT: 6443,
GPFS_DL_PATH: '/usr/lpp/mmfs/lib/libgpfs.so',
NOOBAA_LOG_LEVEL: 'default',
ENDPOINT_FORKS: 2,
UV_THREADPOOL_SIZE: 16,
NSFS_CALCULATE_MD5: false,
ALLOW_HTTP: false,
NSFS_NC_STORAGE_BACKEND: 'GPFS',
NSFS_NC_CONFIG_DIR_BACKEND: 'GPFS',
NSFS_DIR_CACHE_MAX_DIR_SIZE: 536870912,
NSFS_DIR_CACHE_MAX_TOTAL_SIZE: 1073741824,
NC_MASTER_KEYS_STORE_TYPE: 'executable',
NC_MASTER_KEYS_GET_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_get',
NC_MASTER_KEYS_PUT_EXECUTABLE: '/usr/lpp/mmfs/bin/cess3_key_put'
}
2024-06-05 01:07:41.693221 [PID-2987141/TID-2987141] FS::GPFS GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
2024-06-05 01:07:41.693315 [PID-2987141/TID-2987141] FS::GPFS found GPFS lib file GPFS_DL_PATH=/usr/lpp/mmfs/lib/libgpfs.so
Jun-5 1:07:42.024 [/2987141] [LOG] CONSOLE:: detect_fips_mode: found /proc/sys/crypto/fips_enabled with value 0
(node:2987141) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

Please migrate your code to use AWS SDK for JavaScript (v3).
For more information, check the migration guide at https://a.co/7PzMCcy
(Use node --trace-warnings ... to show where the warning was created)
Jun-5 1:07:42.054 [/2987141] [LOG] CONSOLE:: read_rand_seed: reading 32 bytes from /dev/urandom ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: got 32 bytes from /dev/urandom, total 32 ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: read_rand_seed: closing fd ...
Jun-5 1:07:42.060 [/2987141] [LOG] CONSOLE:: init_rand_seed: seeding with 32 bytes
{
"response": {
"code": "AccountDeleted"
}
}

@shirady
Contributor

shirady commented Jun 5, 2024

@ramya-c3 would you please add an explanation?
The list above is of accounts that were not deleted in the loop, and then you successfully deleted one of them?

Edit:
As I understand from Ramya and the comment above:

  1. They tried to delete the accounts in a loop - the list is of accounts that got InternalError and were not deleted.
    One of them is s3user-31407.
  2. They tried to delete one of those accounts (not in a loop), s3user-31407 - it was successful:
    noobaa-cli account delete --name s3user-31407
{
"response": {
"code": "AccountDeleted"
}
}
  3. So the next step is to send the output of the internal error during the loop.
    The structure is mentioned in the comment above:

stdout: '{\n "error": {\n "code": "InternalError",\n "message": "The server encountered an internal error. Please retry the request",\n "detail":<here are details, error stack, etc.>

@ramya-c3 @rkomandu @romayalon
WDYT?

@rkomandu
Collaborator Author

rkomandu commented Jun 5, 2024

@shirady, what you mentioned in your above comment as steps 1 and 2 is correct.

@shirady
Contributor

shirady commented Jun 5, 2024

@rkomandu, as I understand it, in the loop you are using the mms3 command, and you need to change its code in order to see the error details, right?

I would also suggest running the same loop with noobaa-cli and seeing whether there are any issues (as mentioned in the comment above, I was not able to reproduce it).
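On the mms3 side, a hedged alternative (based only on the --debug output already shown earlier in this thread) is to run the loop with --debug and keep the output, since --debug prints the underlying noobaa-cli command and its full JSON response:

for i in $(seq 10001 10010)
do
   mms3 account delete s3user-$i --debug 2>&1 | tee -a /tmp/mms3-account-delete-debug.log
done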

@rkomandu
Collaborator Author

rkomandu commented Jun 5, 2024

@shirady, the dev team will have to change that in mms3, and only then can this be tried.

noobaa-cli - I don't think we will use it here.
