Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Big find query fails "failed to fetch data from server/group" #815

Open
interfan7 opened this issue Jan 16, 2024 · 0 comments
Open
Labels

Comments

@interfan7
Copy link

Describe the bug
go-carob log seems to handle the query in a breeze, but carbonapi log reports failure with runtime of 30 seconds even if timeout is set to much higher.
IDK what limits it to 30 secs, some lower-level network stack? The error message is not really informative here (a suggestion to improve?).

CarbonAPI Version
0.16.0~1

Logs
jq'ed go-carbon log entry of the request:

{
  "handler": "find",
  "url": "/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A",
  "peer": "127.0.0.1:58152",
  "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "carbonzipper_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "query": [
    "s1.*s2*.s3.s4.*"
  ],
  "format": "protobuf",
  "runtime_seconds": 0.19695218,
  "Files": 55848,
  "find_cache_enabled": true,
  "from_cache": false,
  "http_code": 200,
  "lookups": 0
}

The jq'ed entries of the request which are grepped by UUID:

{
  "level": "DEBUG",
  "timestamp": "2024-01-16T15:23:44.907Z",
  "logger": "zipper",
  "message": "will do query",
  "type": "find",
  "name": "http://127.0.0.1:8080",
  "request": [
    "s1.*s2*.s3.s4.*"
  ],
  "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "query": "s1.*s2*.s3.s4.*"
}
{
  "level": "DEBUG",
  "timestamp": "2024-01-16T15:23:44.908Z",
  "logger": "zipper",
  "message": "trying to get slot",
  "type": "find",
  "name": "http://127.0.0.1:8080",
  "request": [
    "s1.*s2*.s3.s4.*"
  ],
  "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "function": "HttpQuery.doRequest",
  "server": "http://127.0.0.1:8080",
  "uri": "http://127.0.0.1:8080/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A"
}
{
  "level": "DEBUG",
  "timestamp": "2024-01-16T15:23:44.908Z",
  "logger": "zipper",
  "message": "got slot for server",
  "type": "find",
  "name": "http://127.0.0.1:8080",
  "request": [
    "s1.*s2*.s3.s4.*"
  ],
  "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "function": "HttpQuery.doRequest",
  "server": "http://127.0.0.1:8080",
  "uri": "http://127.0.0.1:8080/metrics/find/?format=protobuf&query=s1.%2As2%2A.s3.s4.%2A"
}
{
  "level": "DEBUG",
  "timestamp": "2024-01-16T15:24:14.856Z",
  "logger": "zipper",
  "message": "had errors while fetching result",
  "function": "FindProtoV3",
  "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
  "errors": "failed to fetch data from server/group",
  "errorsVerbose": "failed to fetch data from server/group\nHTTP Code: 500\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:27\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594\n\nCaused By: timeout while fetching Response\nHTTP Code: 504\n\ngithub.com/go-graphite/carbonapi/zipper/types.init\n\t/root/go/src/github.com/go-graphite/carbonapi/zipper/types/errors.go:20\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6321\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.doInit\n\t/usr/local/go/src/runtime/proc.go:6298\nruntime.main\n\t/usr/local/go/src/runtime/proc.go:233\nruntime.goexit\n\t/usr/local/go/src/runtime/asm_amd64.s:1594"
}
{
  "level": "ERROR",
  "timestamp": "2024-01-16T15:24:14.866Z",
  "logger": "access",
  "message": "request failed",
  "data": {
    "handler": "find",
    "carbonapi_uuid": "2900f058-d6ec-4cc0-b3a4-985876fdfd23",
    "url": "/metrics/find?from=1705245823&until=1705418625",
    "peer_ip": "185.114.120.245",
    "host": "graphite-dev.catonet.works:4040",
    "metrics": [
      "s1.*s2*.s3.s4.*"
    ],
    "runtime": 29.992397025,
    "http_code": 500,
    "reason": "failed to fetch data from server/group",
    "uri": "/metrics/find?from=1705245823&until=1705418625",
    "from_cache": false,
    "used_backend_cache": false,
    "zipper_requests": 1,
    "request_headers": {
      "X-Grafana-Org-Id": "1"
    }
  }
}

CarbonAPI Configuration:

listeners:
        - address: "127.0.0.1:8081"
        - address: "[::1]:8081"
        - address: "localhost:8081"
listen: "localhost:8081"

prefix: ""
useCachingDNSResolver: false
cachingDNSRefreshTime: "1m"
expvar:
  enabled: true
  pprofEnabled: false
  listen: ""
headersToPass:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
headersToLog:
  - "X-Dashboard-Id"
  - "X-Grafana-Org-Id"
  - "X-Panel-Id"
define:
  -
    name: "perMinute"
    template: "perSecond({{.argString}})|scale(60)"# Control what status code will be returned where /render or find query do not return any metric. Default is 200
notFoundStatusCode: 200
concurency: 100000
cache:
   type: "mem"
   size_mb: 0
   defaultTimeoutSec: 60
   memcachedServers:
       - "127.0.0.1:1234"
       - "127.0.0.2:1235"
cpus: 0
tz: ""

maxBatchSize: 300000
graphite:
    host: ""
    interval: "60s"
    prefix: "carbon.api"
    pattern: "{prefix}.{fqdn}"
idleConnections: 10
pidFile: ""
upstreams:
    tldCacheDisabled: false

    buckets: 10

    slowLogThreshold: "1s"

    timeouts:
        find: "55s"
        render: "20s"
        connect: "800ms"

    concurrencyLimitPerServer: 0

    keepAliveInterval: "230s"

    maxIdleConnsPerHost: 100

    doMultipleRequestsIfSplit: false

    backends:
        - "http://127.0.0.1:8080"

    backendsv2:
        backends:
          -
            groupName: "group1"
            protocol: "carbonapi_v2_pb"
            lbMethod: "broadcast"
            maxTries: 3
            maxBatchSize: 300000
            keepAliveInterval: "10s"
            concurrencyLimit: 0
            maxIdleConnsPerHost: 1000
            forceAttemptHTTP2: false
            doMultipleRequestsIfSplit: false
            timeouts:
                find: "400s"
                render: "50s"
                connect: "500ms"
            servers:
                - "http://127.0.0.1:8080"

          -
            groupName: "group2"
            protocol: "carbonapi_v3_pb"
            lbMethod: "roundrobin"
            servers:
                - "http://127.0.0.4:8080"
                - "http://127.0.0.5:8080"


    carbonsearch:
        backend: "http://127.0.0.1:8070"
        prefix: "virt.v1.*"
    carbonsearchv2:
        backends:
            -
              groupName: "group1"
              protocol: "carbonapi_v2_pb"
              lbMethod: "broadcast"
              servers:
                  - "http://127.0.0.4:8080"
                  - "http://127.0.0.5:8080"
        prefix: "virt.v1.*"

    graphite09compat: false
expireDelaySec: 10
logger:
    - logger: ""
      file: "stderr"
      level: "debug"
      encoding: "console"
      encodingTime: "iso8601"
      encodingDuration: "seconds"
    - logger: ""
      file: "/var/log/carbonapi/carbonapi.log"
      level: "debug"
      encoding: "json"

Simplified query (if applicable)
Query is noted in multiple lines in logs.

Backend metric retention and aggregation schemas
Will provide if that relevant.

Backend response (if possible)
Single backend in our system. The log of the backend go-carbon is copied above.

Additional context
Our node executes both go-carbon and carbonapi. We have a dev node which reproduces the issue even though it has much less load than the prod because carbonapi almost doesn't receive any queries.

@interfan7 interfan7 added the bug label Jan 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant