Mismatch Datadog metrics vs Artillery Json report #2753

Open
Anonycoders opened this issue May 15, 2024 · 12 comments

@Anonycoders

Hi @hassy 👋🏼

During our tests with Artillery, we noticed that the metrics reported in Datadog (e.g. req/sec) do not match those in the Artillery logs: the numbers in the logs are higher than those in Datadog.
Unfortunately, this deviation caused some issues in our final report.

I hope you can help us with this.

Version info:

2.0.11

Running this command:

run-fargate -o report.json .../test.yml --count 6

To find out what is causing the issue, I created a test .yml that generates 300 req/sec for 5 min (the Sustain phase). Every time, Datadog reports around 150 req/sec, while the Artillery logs show 300 req/sec!

I also tried decreasing the arrival rate and scaling out instead, but even with 6 instances Datadog still does not reach 300 req/sec.
I also tried setting maxVusers: 50 in the Sustain phase, but the result was the same.

Here is the .yml config:

config:
  target: "https://xxxxxxxxxx/api"
  http:
    extendedMetrics: true
  phases:
  - name: WarmUp
    duration: 2m
    arrivalRate: 10
    rampTo: 50
  - name: Sustain
    duration: 5m
    arrivalRate: 50
  plugins:
    ensure: {}
    metrics-by-endpoint:
      useOnlyRequestNames: true
      stripQueryString: true
    publish-metrics:
    - type: datadog
      apiKey: ...
      appKey: ...
      prefix: ...
      tags:
        - XXXX
        - ....
      traces:
        ....

scenarios:
  - name: List users
    flow:
      - get:
          url: "/users"
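As a rough sanity check on what this config should produce, the sketch below estimates the arrivals per phase when the file runs with --count 6. It is not Artillery's scheduler: it assumes each VU sends exactly one request (the scenario has a single GET), that --count multiplies the phase load (as explained further down in this thread), and it approximates the WarmUp ramp as a straight line from 10 to 50 arrivals/sec. The helper names are hypothetical.

```python
# Rough estimate of the load implied by the phases above (not Artillery code).
# Assumptions: one request per VU, and --count runs the full phase config on
# every worker, so totals scale linearly with the number of workers.


def phase_arrivals(duration_s, arrival_rate, ramp_to=None):
    """Approximate arrivals for one phase on a single worker.

    A constant phase produces arrival_rate * duration_s arrivals. For a ramp,
    use the average of the start and end rates; Artillery's discrete ramp may
    differ slightly from this linear approximation.
    """
    if ramp_to is None:
        return arrival_rate * duration_s
    return (arrival_rate + ramp_to) / 2 * duration_s


def summarize(count):
    """Expected totals across all workers for the WarmUp and Sustain phases."""
    warmup = phase_arrivals(120, 10, ramp_to=50)  # WarmUp: 2m, 10 -> 50/s
    sustain = phase_arrivals(300, 50)             # Sustain: 5m at 50/s
    return {
        "warmup_total": warmup * count,
        "sustain_total": sustain * count,
        "sustain_rps": 50 * count,
    }


if __name__ == "__main__":
    print(summarize(count=6))
```

With the config as shown and count=6, this gives 300 arrivals/sec during Sustain (90,000 in total), which is the 300 req/sec figure the Artillery logs report.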

I also found issue #1569, which seems similar.

@InesNi
Contributor

InesNi commented May 15, 2024

Hi @Anonycoders 👋🏻,

Thank you for reporting the issue, and for all the info! Will take a look and get back to you.

@Anonycoders
Author

Hi @InesNi 👋
Is there any update on this issue?
Please let me know if more information is needed from my end.

@jCOTINEAU

jCOTINEAU commented May 30, 2024

Hello, we have a similar issue.

When running a simple phase config:

  phases:
    - duration: 60
      arrivalRate: 1
      name: One per sec

I expect 60 VUs in the aggregate, and that is what I get when running from my machine and on Fargate with --count 1. But when I increase --count, I start seeing unexpected numbers. Is this expected?

After investigating a bit: if I use --output to look at the JSON report, the metrics are wrong in that file too, so my guess is that the problem is not coming from Datadog.
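One way to check whether the discrepancy is already present in the JSON report, before Datadog is involved, is to total a counter from the file and compare it with the expected value (60 × count for the phase above). The sketch assumes the Artillery v2 report layout (a top-level aggregate object plus an intermediate list of per-period objects, each with a counters map); verify the field names against your own report.json. Both helpers are hypothetical.

```python
import json


def load_report(path: str) -> dict:
    """Read an Artillery JSON report written with -o/--output."""
    with open(path) as f:
        return json.load(f)


def counter_totals(report: dict, name: str) -> tuple[int, int]:
    """Return (aggregate total, sum over intermediate periods) for one counter.

    Assumes report["aggregate"]["counters"] and report["intermediate"][i]["counters"];
    a missing key is treated as zero rather than raising.
    """
    aggregate = report.get("aggregate", {}).get("counters", {}).get(name, 0)
    periods = sum(
        period.get("counters", {}).get(name, 0)
        for period in report.get("intermediate", [])
    )
    return aggregate, periods
```

For example, counter_totals(load_report("report.json"), "vusers.created") returns the aggregate total and the per-period sum, which can then be compared with 60 × count.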

@hassy
Member

hassy commented May 30, 2024

@jCOTINEAU can you share more on the numbers you're seeing? --count should act as a multiplier, e.g. with the phases config that you shared and --count 2 you should see a total of 120 VUs created. The period may be slightly longer than 60s, but by no more than a few seconds (this is to be expected in a distributed test as Fargate tasks are not guaranteed to start at the exact same time).
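The multiplier arithmetic can be written out as a small sketch, using the one-phase config quoted above (60 seconds at arrivalRate 1). expected_vusers_created is a hypothetical helper; it ignores the small start-time skew between Fargate tasks and only shows how totals scale with --count.

```python
def expected_vusers_created(duration_s: int, arrival_rate: int, count: int) -> int:
    """Total VUs created across all workers for one constant-rate phase.

    Each worker runs the whole phase, so the per-worker total is multiplied
    by the number of workers rather than divided between them.
    """
    per_worker = duration_s * arrival_rate
    return per_worker * count


for count in (1, 2, 10):
    print(f"--count {count}: {expected_vusers_created(60, 1, count)} VUs")
```

So with --count 10, as in the larger tests mentioned later in the thread, the expected total is 600 vusers.created.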

@hassy
Member

hassy commented May 30, 2024

And for everyone else seeing mismatched numbers between Artillery and Datadog: can you share the exact config of the chart you're using in Datadog? If the correct aggregation is not set in Datadog, the numbers can differ from what Artillery reports.

@jCOTINEAU

@hassy, oh, that is interesting! I did not realize that --count multiplies the phase load; I thought the load would be split between the workers.

I was also suspecting the aggregation; I will take a deeper look and come back here. Thanks a lot for the clarifications.

@hassy
Member

hassy commented May 30, 2024

@jCOTINEAU that's a fair expectation, we need to make it clearer in the docs. We definitely want to add support for distributing the load as you described as well, but no ETA on that yet.
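Until the load can be split automatically, one workaround that follows from the multiplier behaviour is to divide the target rate by the worker count yourself and put the result in arrivalRate. A minimal sketch; per_worker_rate is a hypothetical helper, not an Artillery option, and it deliberately rejects totals that do not split evenly into whole arrivals per second.

```python
def per_worker_rate(total_rate: int, count: int) -> int:
    """Return the arrivalRate each worker needs so that `count` workers sum to total_rate.

    Raises ValueError if the total cannot be split evenly, because the workers
    would then add up to a different total than intended.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    rate, remainder = divmod(total_rate, count)
    if remainder:
        raise ValueError(f"{total_rate}/s cannot be split evenly across {count} workers")
    return rate


print(per_worker_rate(300, 6))
```

For the original report, per_worker_rate(300, 6) returns 50, which is the arrivalRate already used in the Sustain phase above.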

@jCOTINEAU

Side note, the product is insanely cool, integration with fargate and datadog is super smooth, huge kudos!

@hassy
Member

hassy commented May 30, 2024

@jCOTINEAU thank you, very happy to hear that!

@jCOTINEAU

Another question, @hassy: you mentioned Datadog aggregation. What value should be set up there?

Member

hassy commented May 31, 2024

It should always be SUM for counter metrics (like vusers.created or vusers.failed) and depending on the time window you might need to add an explicit rollup with a sum as well: https://docs.datadoghq.com/dashboards/functions/rollup/
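To show why the aggregation matters, here is an illustration with made-up numbers (not data from this thread): six workers each report a counter such as vusers.created once per 10-second period. Summing across workers and then summing over time gives the true total; averaging at either step under-reports it. The functions only mimic Datadog's space aggregation and .rollup() time aggregation; they are not Datadog code.

```python
from statistics import mean

# samples[worker][period]: six workers, six 10-second periods (one minute),
# each worker creating 500 VUs per period (50 arrivals/sec). Made-up data.
samples = [[500] * 6 for _ in range(6)]


def space_aggregate(samples, agg):
    """Combine all workers' values for each period (like Datadog's space aggregation)."""
    return [agg(period) for period in zip(*samples)]


def rollup(series, agg):
    """Collapse a series of per-period values into one time bucket (like .rollup())."""
    return agg(series)


summed = space_aggregate(samples, sum)     # 3000 per period across all workers
averaged = space_aggregate(samples, mean)  # 500 per period: one worker's share

print("sum, then rollup sum:", rollup(summed, sum))     # 18000: the real total
print("sum, then rollup avg:", rollup(summed, mean))    # 3000: one period only
print("avg, then rollup avg:", rollup(averaged, mean))  # 500
```

In Datadog query terms this corresponds to a sum space aggregation plus an explicit sum rollup, for example sum:<prefix>.vusers.created{*}.rollup(sum, 60), assuming the reporter publishes counters as <prefix>.vusers.created, where <prefix> is whatever prefix is set in the publish-metrics config.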

@jCOTINEAU

jCOTINEAU commented May 31, 2024

That is interesting, because when I run bigger tests on Fargate with --count 10, the numbers in Datadog indeed do not match those in the JSON report.

I will dig a bit into the rollup thing, thanks for the clarification.

Edit: even when adding a rollup sum (from 1 second up to X minutes), nothing changes; I am still 2/3 times below the number of vusers created.


4 participants