Create WazuhDB endpoint to recalculate agent group hashes #23422

Closed
TomasTurina opened this issue May 14, 2024 · 4 comments · Fixed by #23441
Labels: level/task, module/db (Wazuh DB engine), type/bug (Something isn't working)

Comments

@TomasTurina
Member

TomasTurina commented May 14, 2024

Description

A condition was discovered that can cause the cluster to enter an infinite loop while synchronizing agent group information.

When the master node fails to set the group_hash column for an agent in global.db, the column remains empty until that agent's groups are modified again, which does not happen often. When these columns are synchronized across the cluster, the worker nodes receive the information and recalculate the hashes, producing a result that differs from the master's, which still has some empty hashes. In short, the master does not synchronize empty group hashes while the worker nodes calculate them, and that is why the loop occurs.
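The mismatch can be illustrated with a small sketch. It assumes, purely for illustration, that each node derives one checksum from the concatenation of its non-empty group_hash values, using SHA-1; the function name `global_checksum` is hypothetical and this is not taken from the Wazuh source.

```python
import hashlib

def global_checksum(group_hashes):
    """Hypothetical node-wide checksum: SHA-1 over the non-empty per-agent hashes."""
    joined = "".join(h for h in group_hashes if h)  # empty hashes are skipped
    return hashlib.sha1(joined.encode()).hexdigest()

# Master: the group_hash of the second agent was never written, so it is empty.
master_hashes = ["37a8eec1", None, "37a8eec1"]
# Worker: recalculates every hash locally after receiving the group data.
worker_hashes = ["37a8eec1", "0da05cf3", "37a8eec1"]

print(global_checksum(master_hashes) == global_checksum(worker_hashes))  # False
```

Under this assumption the two nodes keep disagreeing on every synchronization round until the empty hash on the master is filled in.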

To fix this bug, it is proposed to implement a new endpoint in WazuhDB that the framework can use to make the manager recalculate all agent group hashes. This way the master will be able to recalculate bad group hashes without having to wait for the agent to change groups.

This new endpoint will be called recalculate-agent-group-hashes. It will not receive any parameters; it will simply iterate over the list of all agents and recalculate the group_hash for each of them. The group_sync_status column will be set to synced if group_hash does not change and to syncreq if it does (depending on whether the node is a worker or the master).
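A minimal sketch of the proposed behaviour, run against an in-memory SQLite table rather than the real global.db. Several things here are assumptions for illustration only: the simplified table layout, the `agent_groups` column name, the hash function (first 8 hex characters of SHA-256), and the reading that only a master marks a changed row as syncreq.

```python
import hashlib
import sqlite3

def calc_group_hash(groups_csv):
    """Hypothetical hash: first 8 hex chars of SHA-256 over the group list."""
    return hashlib.sha256(groups_csv.encode()).hexdigest()[:8]

def recalculate_agent_group_hashes(conn, is_master):
    """Recompute group_hash for every agent; return how many rows changed."""
    changed = 0
    rows = conn.execute("SELECT id, agent_groups, group_hash FROM agent").fetchall()
    for agent_id, groups_csv, old_hash in rows:
        new_hash = calc_group_hash(groups_csv) if groups_csv else None
        if new_hash == old_hash:
            status = "synced"
        else:
            # Assumed reading of the proposal: only a master has to push the change.
            status = "syncreq" if is_master else "synced"
            changed += 1
        conn.execute(
            "UPDATE agent SET group_hash = ?, group_sync_status = ? WHERE id = ?",
            (new_hash, status, agent_id),
        )
    conn.commit()
    return changed

def demo_db():
    """Build a two-agent table where agent 2 has an empty group_hash."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE agent (id INTEGER PRIMARY KEY, agent_groups TEXT,"
        " group_hash TEXT, group_sync_status TEXT)"
    )
    conn.executemany(
        "INSERT INTO agent VALUES (?, ?, ?, ?)",
        [
            (1, "default", calc_group_hash("default"), "synced"),
            (2, "default,web", None, "synced"),  # hash was never written
        ],
    )
    return conn

conn = demo_db()
print(recalculate_agent_group_hashes(conn, is_master=True))  # 1
```

Only the row with the empty hash is counted as changed; a second call finds nothing to fix.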

@TomasTurina TomasTurina added type/bug Something isn't working module/db Wazuh DB engine level/task labels May 14, 2024
@TomasTurina TomasTurina changed the title Create WazuhDB endpoint to recalculate agent groups hash Create WazuhDB endpoint to recalculate agent group hashes May 14, 2024
@Nicogp
Member

Nicogp commented May 14, 2024

I started working on the issue

  • Added the new endpoint
  • Changed the wdb_global_recalculate_agent_groups_hash function to receive the old hash as a parameter and update the db in case it changes.

Branch:
https://github.com/wazuh/wazuh/tree/fix/23422-groups-hash

First tests (single node):

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  'global sql update agent set group_sync_status="syncreq" where id=2'
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_sync_status": "syncreq"
    }
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "synced"
    }
]
root@nico-VirtualBox:/home/nico#
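The query output above omits the group_hash key for rows whose column is NULL, so affected agents can be found by checking for the missing key. A minimal sketch, assuming the JSON printed by the query has been captured as a string; the helper name `agents_missing_group_hash` is hypothetical, and agent 0 is skipped because it has no hash in either listing.

```python
import json

def agents_missing_group_hash(query_output):
    """Return the ids of agents (other than id 0) whose row has no group_hash."""
    rows = json.loads(query_output)
    return [row["id"] for row in rows if row["id"] != 0 and not row.get("group_hash")]

before = """[
    {"id": 0, "group_sync_status": "synced"},
    {"id": 1, "group_hash": "37a8eec1", "group_sync_status": "synced"},
    {"id": 2, "group_sync_status": "syncreq"}
]"""
after = """[
    {"id": 0, "group_sync_status": "synced"},
    {"id": 1, "group_hash": "37a8eec1", "group_sync_status": "synced"},
    {"id": 2, "group_hash": "0da05cf3", "group_sync_status": "synced"}
]"""

print(agents_missing_group_hash(before))  # [2]
print(agents_missing_group_hash(after))   # []
```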

@Nicogp
Member

Nicogp commented May 14, 2024

Test cluster:

Set the group_hash of agent 2 to NULL to force continuous synchronization between master and worker:

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "synced"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql update agent set group_hash=Null where id=2"
[]
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_sync_status": "synced"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]

cluster.log (worker):

root@vagrant:/home/vagrant# cat /var/ossec/logs/cluster.log | grep "The checksum of master"
2024/05/14 23:02:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:16 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:26 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:36 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:46 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:03:56 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
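The same master/worker checksum pair repeating on every round is the visible symptom of the loop. A small sketch for spotting it, assuming the DEBUG line format shown above; the regular expression and the helper name `repeated_mismatches` are illustrative and not part of Wazuh.

```python
import re

MISMATCH_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) DEBUG: \[Worker (?P<node>[^\]]+)\] "
    r"\[Agent-groups recv\] The checksum of master \((?P<master>[0-9a-f]{40})\) "
    r"and worker \((?P<worker>[0-9a-f]{40})\) are different\.$"
)

def repeated_mismatches(lines):
    """Count how often each (master, worker) checksum pair is reported as different."""
    counts = {}
    for line in lines:
        match = MISMATCH_RE.match(line.strip())
        if match:
            pair = (match["master"], match["worker"])
            counts[pair] = counts.get(pair, 0) + 1
    return counts

MASTER = "4839dba4cbbb040c70be6cdcd0a61e28279ec796"
WORKER = "be5508d938177c1c25938fdfaa328ba4bb021037"
sample = [
    f"2024/05/14 23:02:06 DEBUG: [Worker worker01-node] [Agent-groups recv] "
    f"The checksum of master ({MASTER}) and worker ({WORKER}) are different.",
    f"2024/05/14 23:03:06 DEBUG: [Worker worker01-node] [Agent-groups recv] "
    f"The checksum of master ({MASTER}) and worker ({WORKER}) are different.",
    "2024/05/14 23:03:57 INFO: [Worker worker01-node] [Agent-info sync] "
    "Finished in 0.019s. Updated 0 chunks.",
]

for pair, count in repeated_mismatches(sample).items():
    print(pair, count)
```

A pair whose count keeps growing while neither checksum changes suggests the nodes will not converge on their own.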

Recalculation of hashes in the master node:

root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global recalculate-agent-group-hashes"
ok
root@nico-VirtualBox:/home/nico# python3 wdb-query.py  "global sql select id,group_hash,group_sync_status from agent"
[
    {
        "id": 0,
        "group_sync_status": "synced"
    },
    {
        "id": 1,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    },
    {
        "id": 2,
        "group_hash": "0da05cf3",
        "group_sync_status": "syncreq"
    },
    {
        "id": 4,
        "group_hash": "37a8eec1",
        "group_sync_status": "synced"
    }
]

cluster.log (worker) after recalculation of hashes:

2024/05/14 23:03:57 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.019s. Updated 0 chunks.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:03:58 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:03:58 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:03:58 INFO: [Worker worker01-node] [Integrity check] Finished in 0.035s. Sync not required.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:06 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of master (4839dba4cbbb040c70be6cdcd0a61e28279ec796) and worker (be5508d938177c1c25938fdfaa328ba4bb021037) are different.
2024/05/14 23:04:06 DEBUG: [Worker worker01-node] [Agent-groups recv] Checksum comparison failed (2/5).
2024/05/14 23:04:06 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.011s. Updated 1 chunks.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.001s.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.005s. Updated 0 chunks.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:07 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:07 INFO: [Worker worker01-node] [Integrity check] Finished in 0.022s. Sync not required.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match. Counter reset.
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.006s. Updated 1 chunks.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:16 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:16 INFO: [Worker worker01-node] [Integrity check] Finished in 0.060s. Sync not required.
2024/05/14 23:04:17 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:17 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:17 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.001s.
2024/05/14 23:04:17 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.005s. Updated 0 chunks.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 37 files.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c''
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity check] Finished in 0.053s. Sync required.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_file''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'file_upd''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'file_end''
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_e''
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Starting.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Files to create: 1 | Files to update: 0 | Files to delete: 0
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Worker does not meet integrity checks. Actions required.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Updating local files: Start.
2024/05/14 23:04:25 DEBUG: [Worker worker01-node] [Integrity sync] Updating local files: End.
2024/05/14 23:04:25 INFO: [Worker worker01-node] [Integrity sync] Finished in 0.010s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:26 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.000s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:26 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match.
2024/05/14 23:04:26 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.007s. Updated 1 chunks.
2024/05/14 23:04:26 DEBUG: [Local Server] [Keep alive] Calculating.
2024/05/14 23:04:26 DEBUG: [Local Server] [Keep alive] Calculated.
2024/05/14 23:04:27 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:27 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:27 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.000s.
2024/05/14 23:04:27 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.002s. Updated 0 chunks.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:34 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 38 files.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Sending zip file.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Integrity check] Zip file sent.
2024/05/14 23:04:34 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_m_c_ok''
2024/05/14 23:04:34 INFO: [Worker worker01-node] [Integrity check] Finished in 0.033s. Sync not required.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'new_str''
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'str_upd''
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Main] Command received: 'b'syn_g_m_w''
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Agent-groups recv] Starting.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] 1/1 chunks updated in wazuh-db in 0.001s.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] Obtained 1 chunks of data in 0.001s.
2024/05/14 23:04:36 DEBUG: [Worker worker01-node] [Agent-groups recv] The checksum of both databases match.
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Agent-groups recv] Finished in 0.008s. Updated 1 chunks.
2024/05/14 23:04:36 INFO: [Worker worker01-node] [Keep Alive] Successful response from master: keepalive
2024/05/14 23:04:37 DEBUG: [Worker worker01-node] [Agent-info sync] Permission to synchronize granted.
2024/05/14 23:04:37 INFO: [Worker worker01-node] [Agent-info sync] Starting.
2024/05/14 23:04:37 DEBUG: [Worker worker01-node] [Agent-info sync] Obtained 0 chunks of data in 0.000s.
2024/05/14 23:04:37 INFO: [Worker worker01-node] [Agent-info sync] Finished in 0.002s. Updated 0 chunks.
2024/05/14 23:04:43 DEBUG: [Worker worker01-node] [Integrity check] Permission to synchronize granted.
2024/05/14 23:04:43 INFO: [Worker worker01-node] [Integrity check] Starting.
2024/05/14 23:04:43 DEBUG: [Worker worker01-node] [Integrity check] Compressing 'files_metadata.json' of 38 files.

@Nicogp
Member

Nicogp commented May 15, 2024

Added UTs for:

  • wdb_global_parser() function endpoint recalculate-agent-group-hashes
  • wdb_global_recalculate_all_agent_groups_hash() function

@Nicogp
Member

Nicogp commented May 15, 2024

Update 15/05/2024

  • Changes were applied to avoid modifying the group_sync_status column.
  • Hash recalculation is performed for all agents (except id=0), even when the group column is NULL.
  • PR created and under review.
