SPIKE - Initial `Agent comms API` server design #23395

Selutario · 2024-05-14T09:37:08Z

Epic
#22677

Description

As part of #22677, we want to replace the current wazuh-remoted and wazuh-agentd services with a service that uses a standard protocol such as HTTP and request-driven communication. Events could then be forwarded to any of the Wazuh servers, unlike the current session-oriented approach, where an agent sends all its messages to the server it is connected to.

However, we will also need to maintain a session-oriented connection so that the server can send commands to the agents on demand. Some proposals for this other mode of communication could include the use of websockets or gRPC.

The preliminary design of the server must have a /login endpoint so that agents/clients can authenticate with their credentials and obtain a token. Additionally, requests of four different types must be handled:

Stateless: After receiving events, the API will immediately respond to the client, without waiting to confirm whether the engine can process said events.
Stateful: The API must confirm that the event has been processed and indexed before responding to the client.
Commands: Would be session-oriented. The server must verify that a command exists (they will be listed in an indexer's index) and forward the command to the appropriate place.
Management: API endpoints related to management tasks such as getting the information of a configuration group, receiving a package to upgrade the Agent, etc.

The API must be versioned.

This is a research issue.

Implementation restrictions

The opensearch-py library should be considered for API-Indexer communication.
Use existing libraries as much as possible, avoiding developing very complex components on our own.
Collaborate with the Agent team to align on communication protocols and API integration.

Plan

Analyze everything that Wazuh does and create an extensive list of endpoints to support it. We can rely on the devel-agent team for this and their research issue.
- A high level of detail is not necessary. It is enough for each endpoint in the list to include a description of what it does.
Create a list of use cases. For example:
- How to connect and communicate with the indexer to perform X action.
- How to connect and communicate with the engine to perform Y action.
Investigation of available/candidate and most suitable technologies: websockets, gRPC, etc.
Library research: Starlette, FastAPI, Connexion, etc.
Initial server design that meets the requirements listed in Agent/server communication protocol #22677. For example: how to request the token? How to request a new token once it expires?
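One possible answer to the token renewal question, sketched from the client side: the agent caches its token and calls /login again shortly before it expires. The `TokenCache` class, its `login` callable and the `leeway` parameter are hypothetical names used only for illustration; the issue does not define a renewal mechanism, and a dedicated refresh endpoint would be an equally valid design.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches a token and logs in again shortly before it expires."""

    def __init__(
        self,
        login: Callable[[], Tuple[str, float]],
        leeway: float = 30.0,
        clock: Callable[[], float] = time.time,
    ) -> None:
        # `login` is a hypothetical callable that performs POST /login and
        # returns (token, expiry as a Unix timestamp).
        self._login = login
        self._leeway = leeway
        self._clock = clock
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        """Return a valid token, renewing it when it is about to expire."""
        if self._token is None or self._clock() >= self._expires_at - self._leeway:
            self._token, self._expires_at = self._login()
        return self._token
```

The agent would call `cache.get()` before every request, so renewal happens transparently and no request is sent with a token that is about to expire.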


GGP1 · 2024-05-29T18:39:23Z

API

Note

This is a work in progress

The Agent comms API will be exposed via an HTTP server using TLS as the transport layer. It will contain the endpoints below.

Authentication

POST /login

Log in to the server.

Body: UUID, password
Response: JWT token
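To make the token part of /login concrete, here is a minimal sketch of issuing and verifying a JWT with only the standard library. The HS256 algorithm, the 15-minute default lifetime, and the function names `issue_token` and `verify_token` are assumptions for illustration; the design above only states that the response is a JWT. A real implementation would more likely rely on an existing JWT library.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(agent_uuid: str, secret: bytes, ttl: int = 900,
                now: Optional[float] = None) -> str:
    """Build a signed HS256 JWT whose subject is the agent UUID (assumed claims)."""
    issued_at = int(time.time() if now is None else now)
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": agent_uuid, "iat": issued_at, "exp": issued_at + ttl}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, claims)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature is valid and the token has not expired."""
    try:
        header_b64, claims_b64, signature_b64 = token.split(".")
        signature = _b64url_decode(signature_b64)
        signing_input = f"{header_b64}.{claims_b64}".encode("ascii")
    except ValueError as exc:
        raise ValueError("malformed token") from exc
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    claims = json.loads(_b64url_decode(claims_b64))
    if (time.time() if now is None else now) >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

Every other endpoint would call something like `verify_token` on the bearer token before handling the request, and use the `sub` claim to identify the agent.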

Events

POST /events/stateless

Send events that are not necessarily processed by the engine.

Body: Event
Response: Event received message

POST /events/stateful

Send events that must be processed and persisted.

Body: Event
Response: Processing status
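The difference between the two event endpoints can be sketched with asyncio: the stateless handler enqueues the event and answers right away, while the stateful handler waits until the engine confirms processing. The in-process queue, the `engine_worker` stand-in and the response payloads are assumptions for illustration; they are not the actual engine interface.

```python
import asyncio


async def engine_worker(queue: asyncio.Queue) -> None:
    """Stand-in for the engine: processes queued events one by one."""
    while True:
        event, done = await queue.get()
        await asyncio.sleep(0)  # placeholder for real processing and indexing
        if done is not None and not done.done():
            done.set_result("processed")
        queue.task_done()


async def post_stateless(queue: asyncio.Queue, event: dict) -> dict:
    """Enqueue the event and answer immediately, without waiting for the engine."""
    queue.put_nowait((event, None))
    return {"status": "received"}


async def post_stateful(queue: asyncio.Queue, event: dict, timeout: float = 5.0) -> dict:
    """Enqueue the event and answer only once the engine confirms processing."""
    done = asyncio.get_running_loop().create_future()
    queue.put_nowait((event, done))
    status = await asyncio.wait_for(done, timeout)
    return {"status": status}


async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    worker = asyncio.create_task(engine_worker(queue))
    results = [
        await post_stateless(queue, {"id": 1}),
        await post_stateful(queue, {"id": 2}),
    ]
    await queue.join()  # let the worker drain the stateless event too
    worker.cancel()
    return results
```

Running `asyncio.run(main())` returns the "received" acknowledgement for the stateless event and the "processed" status for the stateful one. The timeout on the stateful path matters: without it, a stalled engine would keep client connections open indefinitely.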

Commands

GET /commands

Subscribe to obtain commands sent by the server. The connection is hijacked by the server and converted into a websocket or SSE connection. It is kept alive during the whole agent session and only the server can send events.

Parameters: UUID (so the server knows which agent to send specific events to)
Response: No response, connection is kept alive

POST /commands

Only other Wazuh components would be able to publish commands; agents would not. Perhaps this could be achieved through a Unix socket if all the commands are received from components in the same node.

Send commands to all or a specific set of agents.

Parameters: command, UUIDs
Response: Command received message
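A minimal sketch of how the two /commands endpoints could fit together inside one server node: GET /commands registers a per-agent queue, and POST /commands validates the command and places it on the queues of the targeted agents. The `CommandBroker` class and its methods are hypothetical, and the set of known commands stands in for the indexer lookup mentioned in the issue description. Routing a command to an agent connected to a different node is out of scope here.

```python
import asyncio
from typing import Dict, Iterable, Optional, Set


class CommandBroker:
    """Routes commands to the queues of subscribed agents (single node only)."""

    def __init__(self, known_commands: Set[str]) -> None:
        # In the design, valid commands are listed in an indexer index;
        # a plain set stands in for that lookup here.
        self._known = known_commands
        self._subscribers: Dict[str, asyncio.Queue] = {}

    def subscribe(self, agent_uuid: str) -> asyncio.Queue:
        """Called when an agent opens GET /commands; returns its command queue."""
        return self._subscribers.setdefault(agent_uuid, asyncio.Queue())

    def unsubscribe(self, agent_uuid: str) -> None:
        """Called when the agent's connection is closed."""
        self._subscribers.pop(agent_uuid, None)

    def publish(self, command: str, uuids: Optional[Iterable[str]] = None) -> int:
        """Queue a command for the given agents (all of them if uuids is None).

        Returns the number of connected agents the command was queued for.
        """
        if command not in self._known:
            raise ValueError(f"unknown command: {command}")
        targets = self._subscribers if uuids is None else uuids
        delivered = 0
        for agent_uuid in list(targets):
            queue = self._subscribers.get(agent_uuid)
            if queue is not None:
                queue.put_nowait(command)
                delivered += 1
        return delivered
```

The GET /commands handler would then loop on `await queue.get()` and write each command to the open connection, calling `unsubscribe` when the agent disconnects.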

Management

GET /configuration

Get information about the group configuration.

Parameters: -
Response: Current configuration

PUT /configuration

Update the group configuration.

Body: New configuration
Response: Updated configuration

GET /upgrade

Download WPK files to upgrade agents.

Parameters: File name
Response: File bytes stream

SSE vs Websockets

The two alternatives I would consider for server-pushed events are Server-Sent Events (SSE) and WebSockets.

Here is an image that explains their differences pretty well.

Server-Sent Events are simpler and use plain HTTP under the hood. Messages can flow in one direction only (server to client) and, because of their simplicity, no external library may be required. Another advantage is that enterprise firewalls can inspect the packets without the issues they sometimes have with WebSockets.

I would only choose WebSockets if the SSE message structure becomes a limitation and we want to use a specific encoding for the commands.
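To illustrate how simple the SSE message structure is, the sketch below serializes one message in the `text/event-stream` format: optional `id:` and `event:` fields, one `data:` field per payload line, and a blank line as terminator. The `format_sse` helper and the `command` event name are illustrative choices, not part of the design.

```python
import json
from typing import Optional


def format_sse(data: str, event: Optional[str] = None,
               event_id: Optional[str] = None) -> str:
    """Serialize one message in the text/event-stream wire format."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Each line of the payload needs its own "data:" field.
    lines.extend(f"data: {line}" for line in data.split("\n"))
    # A blank line terminates the message.
    return "\n".join(lines) + "\n\n"


message = format_sse(json.dumps({"command": "restart"}), event="command", event_id="1")
# message == 'id: 1\nevent: command\ndata: {"command": "restart"}\n\n'
```

Because the format is plain UTF-8 text, binary payloads would have to be encoded (for example as Base64) before being sent, which is the kind of limitation that could tip the decision toward WebSockets.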

API framework

The frameworks that would require the fewest dependency changes are Connexion 3.0 and FastAPI, so only these two are considered. Both are based on Starlette and Uvicorn, and they are very similar in terms of dependencies and performance.

Connexion does not offer built-in support for websockets or SSE. On the other hand, FastAPI does not require any external dependencies for it.

Using SSE with Connexion would require the sse-starlette library.

FastAPI has broader community and maintainer support, but it does not support a spec-first approach like Connexion does. Considering that we already have Connexion 3.0 running in the Server management API, it may be appropriate to continue using it. This is something we must decide with the team.

GGP1 · 2024-05-30T13:15:02Z

API-Indexer communication

The communication with the indexer will be performed through the API it exposes, using the opensearch-py library as an SDK.

For example, if a new agent wants to log in, we craft and send an HTTP POST request to the indexer with the identifiers of the agent so we can validate the authorization token.
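The login lookup could be sketched as follows. To keep the example runnable without a cluster, the indexer call is injected as a `search` callable; with opensearch-py this could be the `search` method of an `OpenSearch` client, which accepts `index` and `body` keyword arguments and returns the usual `hits.hits[]._source` structure. The index name, the document fields (`uuid`, `salt`, `password_hash`) and the PBKDF2 storage scheme are all assumptions, since the schema of the agents list is not defined yet.

```python
import hashlib
import hmac
from typing import Callable

AGENTS_INDEX = "agents"  # hypothetical index name


def hash_password(password: str, salt: bytes) -> str:
    """Derive a hex digest with PBKDF2-HMAC-SHA256 (an assumed storage scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000).hex()


def check_credentials(search: Callable[..., dict], agent_uuid: str, password: str) -> bool:
    """Look the agent up in the indexer and compare password hashes."""
    response = search(
        index=AGENTS_INDEX,
        body={"query": {"term": {"uuid": agent_uuid}}, "size": 1},
    )
    hits = response["hits"]["hits"]
    if not hits:
        return False  # unknown agent
    stored = hits[0]["_source"]
    candidate = hash_password(password, bytes.fromhex(stored["salt"]))
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(candidate, stored["password_hash"])
```

Injecting the search function also makes the /login handler easy to unit test with a fake indexer.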

flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other
end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
    end

    subgraph Wazuh2["Server node 2"]
        api2["Agent comms API"]
    end

end

subgraph Indexer
    subgraph Data_states["Data states"]
        agents_list["Agents list"]
        states["States"]
    end
end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

Agents -- /login --> lb
lb -- /login --> Wazuh1
lb -- /login --> Wazuh2
Wazuh1 -- Read credentials --> agents_list
Wazuh2 -- Read credentials --> agents_list

style Wazuh1 fill:#abc2eb
style Wazuh2 fill:#abc2eb
style Data_states fill:#abc2eb

API-Engine communication

For the communication with the Engine, we will use Unix sockets, like the Server management API does. Since an Engine instance will run on each node of the cluster, the API can always talk to its local Engine and no network communication is needed.

The API receives the request from an agent, builds a request for the Engine and sends it through the socket. For stateless events, it is not necessary to wait for the response.
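The flow just described could look like the sketch below. The wire format is an assumption made for illustration (a 4-byte big-endian length prefix followed by a JSON body), as are the `stateful` and `event` keys; the Engine's actual socket protocol is not specified in this issue.

```python
import json
import socket
import struct
from typing import Optional


def send_frame(sock: socket.socket, payload: dict) -> None:
    """Send a JSON payload prefixed with its length (4-byte big-endian integer)."""
    body = json.dumps(payload).encode("utf-8")
    sock.sendall(struct.pack(">I", len(body)) + body)


def _recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, or raise if the peer closes early."""
    buffer = bytearray()
    while len(buffer) < size:
        chunk = sock.recv(size - len(buffer))
        if not chunk:
            raise ConnectionError("socket closed before the frame was complete")
        buffer.extend(chunk)
    return bytes(buffer)


def recv_frame(sock: socket.socket) -> dict:
    """Read one length-prefixed JSON frame."""
    (size,) = struct.unpack(">I", _recv_exact(sock, 4))
    return json.loads(_recv_exact(sock, size))


def forward_event(sock: socket.socket, event: dict, stateful: bool) -> Optional[dict]:
    """Forward an event to the Engine; wait for its reply only when stateful."""
    send_frame(sock, {"stateful": stateful, "event": event})
    return recv_frame(sock) if stateful else None
```

In the API, `sock` would be a `socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)` connected to the path of the local Engine socket. For stateless events `forward_event` returns as soon as the bytes are handed to the kernel, which matches the "respond immediately" requirement.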

flowchart TD

subgraph Agents
    Endpoints
    Clouds
    Other
end

subgraph Server["Server cluster"]

    subgraph Wazuh1["Server node n"]
        api1["Agent comms API"]
        server1["Server <br/> management API"]
        Engine1["Engine"]
        VD1["VD"]
    end

    subgraph Wazuh2["Server node 2"]
        api2["Agent comms API"]
        server2["Server <br/> management API"]
        Engine2["Engine"]
        VD2["VD"]
    end

end

subgraph lb["Load Balancer"]
    lb_node["Per request"]
end

Agents -- /events/stateless --> lb
lb -- /events/stateless --> Wazuh1
lb -- /events/stateless --> Wazuh2
api1 -- Unix socket --> Engine1
api2 -- Unix socket --> Engine2

style Wazuh1 fill:#abc2eb
style Wazuh2 fill:#abc2eb

Selutario added type/research level/subtask labels May 14, 2024

Selutario mentioned this issue May 14, 2024

SPIKE - PoC implementation for agent and server #23396

Open

jr0me mentioned this issue May 21, 2024

SPIKE - New Agent comms API endpoint client wazuh/wazuh-agent#1

Open

havidarou mentioned this issue May 23, 2024

Agent/server communication protocol #22677

Open

4 tasks

GGP1 self-assigned this May 29, 2024
