Failed to compile eBPF code for the Linux distro 'debian' running kernel version 6.5.0-1018-aws. #264

Open
ccoqueiro opened this issue Apr 25, 2024 · 6 comments
Labels
bug (Something isn't working), help wanted (Extra attention is needed)

Comments

@ccoqueiro

What happened?

Description

When installing the OpenTelemetry eBPF chart, the kernel collector pod, although running, emits the following error:

2024-04-25 17:47:50.398732+00:00 debug [p:28721 t:28721] TCPChannel::connect: Connecting to intake @ opentelemetry-ebpf-reducer:7000
In file included from ../../../src/collector/kernel/bpf_src/render_bpf.c:39:
In file included from include/net/tcp.h:35:
In file included from include/net/sock_reuseport.h:5:
In file included from include/linux/filter.h:9:
include/linux/bpf.h:321:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_root'
                return sizeof(struct bpf_rb_root);
                       ^     ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:321:24: note: forward declaration of 'struct bpf_rb_root'
                return sizeof(struct bpf_rb_root);
                                     ^
include/linux/bpf.h:323:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_node'
                return sizeof(struct bpf_rb_node);

Note that the following command was run before installation:

sudo apt-get install --yes linux-headers-$(uname -r)

Kernel version: Linux show-no-config-i-05bbcdabc7509e781 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

Steps to Reproduce

helm repo add open-telemetry https://open-telemetry.github.io/opentelemetry-helm-charts
helm repo update open-telemetry
helm install my-opentelemetry-ebpf -f ./otel-ebpf-values.yaml open-telemetry/opentelemetry-ebpf
check logs of kernel collector pod

Expected Result

transmission of metrics

Actual Result

Errors in data collection.

eBPF Collector version

latest

Environment information

Environment

Kernel version: Linux show-no-config-i-05bbcdabc7509e781 6.5.0-1018-aws #18~22.04.1-Ubuntu SMP Fri Apr 5 17:44:33 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

eBPF Collector configuration

# Default values for opentelemetry-ebpf.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.

nameOverride: ""
fullnameOverride: ""
clusterName: "demohebnpm"

image:
  tag: ""
  registry: otel
  pullPolicy: IfNotPresent

imagePullSecrets: []

resources: {}

# OTLP gRPC endpoint to send the collected metrics
endpoint:
  address: "0.0.0.0"
  port: 4317

log:
  console: true
  # possible values: { error | warning | info | debug | trace }
  level: debug

debug:
  enabled: true
  storeMinidump: false
  sendUnplannedExitMetric: false

kernelCollector:
  enabled: true
  serviceAccount:
    create: true
    name: ""
  image:
    registry: ""
    tag: ""
    name: opentelemetry-ebpf-kernel-collector

  nodeSelector: {}
  disableHttpMetrics: false

  tolerations:
    - operator: "Exists"
      effect: "NoExecute"
    - operator: "Exists"
      effect: "NoSchedule"

  affinity: {}
  resources: {}

  # uncomment the line below to disable automatic kernel headers fetching
  fetchKernelHeaders: true

  # uncomment to enable enrichment using Docker metadata
  useDockerMetadata: true

  # uncomment to enable enrichment using Nomad metadata (https://www.nomadproject.io/)
  collectNomadMetadata: true

cloudCollector:
  enabled: false
  image:
    registry: ""
    tag: ""
    name: opentelemetry-ebpf-cloud-collector

  serviceAccount:
    create: true
    name: ""
    annotations: {}
      ## eks.amazonaws.com/role-arn: "role-arn-name"

  tolerations: []
  affinity: {}

k8sCollector:
  enabled: true
  serviceAccount:
    create: true
    name: ""
  relay:
    image:
      registry: ""
      tag: ""
      name: opentelemetry-ebpf-k8s-relay
  watcher:
    image:
      registry: ""
      tag: ""
      name: opentelemetry-ebpf-k8s-watcher

  tolerations: []
  affinity: {}

reducer:
  image:
    registry: ""
    tag: ""
    name: opentelemetry-ebpf-reducer
  extraArgs: {}
  ingestShards: 1
  matchingShards: 1
  aggregationShards: 1
  disableInternalMetrics: true
  disableMetrics: []
    ### to disable an entire metric category: ###
    # - tcp.all
    # - udp.all
    # - dns.all
    # - http.all
    ### to disable an individual metric: ###
    ### tcp ###
    # - tcp.bytes
    # - tcp.rtt.num_measurements
    # - tcp.active
    # - tcp.rtt.average
    # - tcp.packets
    # - tcp.retrans
    # - tcp.syn_timeouts
    # - tcp.new_sockets
    # - tcp.resets
    ### udp ###
    # - udp.bytes
    # - udp.packets
    # - udp.active
    # - udp.drops
    ### dns ###
    # - dns.client.duration.average
    # - dns.server.duration.average
    # - dns.active_sockets
    # - dns.responses
    # - dns.timeouts
    ### http ##
    # - http.client.duration.average
    # - http.server.duration.average
    # - http.active_sockets
    # - http.status_code
    ### ebpf_net ##
    # - ebpf_net.span_utilization_fraction
    # - ebpf_net.pipeline_metric_bytes_discarded
    # - ebpf_net.codetiming_min_ns
    # - ebpf_net.entrypoint_info
    # - ebpf_net.otlp_grpc.requests_sent
    # - ebpf_net.connections
    # - ebpf_net.rpc_queue_elem_utilization_fraction
    # - ebpf_net.disconnects
    # - ebpf_net.codetiming_avg_ns
    # - ebpf_net.client_handle_pool
    # - ebpf_net.otlp_grpc.successful_requests
    # - ebpf_net.span_utilization
    # - ebpf_net.up
    # - ebpf_net.rpc_queue_buf_utilization_fraction
    # - ebpf_net.collector_log_count
    # - ebpf_net.time_since_last_message_ns
    # - ebpf_net.bpf_log
    # - ebpf_net.codetiming_count
    # - ebpf_net.message
    # - ebpf_net.otlp_grpc.bytes_sent
    # - ebpf_net.pipeline_message_error
    # - ebpf_net.pipeline_metric_bytes_written
    # - ebpf_net.codetiming_max_ns
    # - ebpf_net.codetiming_sum_ns
    # - ebpf_net.otlp_grpc.failed_requests
    # - ebpf_net.rpc_queue_buf_utilization
    ### to enable all metrics (including metrics turned off by default): ###
    # - none
  enableMetrics: []
    ### Disable metrics flag is evaluated first and only then enable metric flag is evaluated. ###
    ### to enable an entire metric category: ###
    # - tcp.all
    # - udp.all
    # - dns.all
    # - http.all
    # - ebpf_net.all
    ### to enable an individual metric: ###
    ### tcp ###
    # - tcp.bytes
    # - tcp.rtt.num_measurements
    # - tcp.active
    # - tcp.rtt.average
    # - tcp.packets
    # - tcp.retrans
    # - tcp.syn_timeouts
    # - tcp.new_sockets
    # - tcp.resets
    ### udp ###
    # - udp.bytes
    # - udp.packets
    # - udp.active
    # - udp.drops
    ### dns ###
    # - dns.client.duration.average
    # - dns.server.duration.average
    # - dns.active_sockets
    # - dns.responses
    # - dns.timeouts
    ### http ###
    # - http.client.duration.average
    # - http.server.duration.average
    # - http.active_sockets
    # - http.status_code
    ### ebpf_net ###
    # - ebpf_net.span_utilization_fraction
    # - ebpf_net.pipeline_metric_bytes_discarded
    # - ebpf_net.codetiming_min_ns
    # - ebpf_net.entrypoint_info
    # - ebpf_net.otlp_grpc.requests_sent
    # - ebpf_net.connections
    # - ebpf_net.rpc_queue_elem_utilization_fraction
    # - ebpf_net.disconnects
    # - ebpf_net.codetiming_avg_ns
    # - ebpf_net.client_handle_pool
    # - ebpf_net.otlp_grpc.successful_requests
    # - ebpf_net.span_utilization
    # - ebpf_net.rpc_queue_elem_utilization_fraction
    # - ebpf_net.disconnects
    # - ebpf_net.codetiming_avg_ns
    # - ebpf_net.client_handle_pool
    # - ebpf_net.otlp_grpc.successful_requests
    # - ebpf_net.span_utilization
    # - ebpf_net.up
    # - ebpf_net.rpc_queue_buf_utilization_fraction
    # - ebpf_net.collector_log_count
    # - ebpf_net.time_since_last_message_ns
    # - ebpf_net.bpf_log
    # - ebpf_net.codetiming_count
    # - ebpf_net.message
    # - ebpf_net.otlp_grpc.bytes_sent
    # - ebpf_net.pipeline_message_error
    # - ebpf_net.pipeline_metric_bytes_written
    # - ebpf_net.codetiming_max_ns
    # - ebpf_net.span_utilization_max
    # - ebpf_net.client_handle_pool_fraction
    # - ebpf_net.span_utilization_fraction
    # - ebpf_net.rpc_latency_ns
    # - ebpf_net.agg_root_truncation
    # - ebpf_net.clock_offset_ns
    # - ebpf_net.otlp_grpc.metrics_sent
    # - ebpf_net.otlp_grpc.unknown_response_tags
    # - ebpf_net.collector_health
    # - ebpf_net.codetiming_sum_ns
    # - ebpf_net.otlp_grpc.failed_requests
    # - ebpf_net.rpc_queue_buf_utilization

  resources: {}
  nodeSelector: {}
  tolerations: []
  affinity: {}
  service:
    type: ClusterIP
    ports:
      telemetry:
        enabled: true
        servicePort: 7000
        containerPort: 7000
        targetPort: 7000
        protocol: TCP
        appProtocol: http
      stats:
        enabled: true
        servicePort: 7001
        containerPort: 7001
        targetPort: 7001
        protocol: TCP
        appProtocol: http

rbac:
  create: true

Log output

PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
PWD=/srv
SHLVL=0
SSL_CERT_DIR=/etc/ssl/certs
_=/usr/bin/env
===========================================================
resolving kernel headers...
cleaning up stale kprobes...
launching kernel collector...
+ exec /srv/kernel-collector --host-distro debian --kernel-headers-source pre_installed --config-file=/etc/network-explorer/config.yaml --force-docker-metadata --log-console --debug
2024-04-25 17:47:41.000682+00:00 debug [p:28721 t:28721] setting up breakpad...
2024-04-25 17:47:41.000794+00:00 debug [p:28721 t:28721] setting up breakpad...
2024-04-25 17:47:41.000909+00:00 info [p:28721 t:28721] Starting Kernel Collector version 0.10.0 (release)
2024-04-25 17:47:41.000921+00:00 info [p:28721 t:28721] Kernel Collector agent ID is FAIDIN4D2V25Q0YAXWQK8F1QSLM5FG688BQB
2024-04-25 17:47:41.000925+00:00 info [p:28721 t:28721] Running on:
   sysname: Linux
  nodename: show-no-config-i-05bbcdabc7509e781
   release: 6.5.0-1018-aws
   version: #18~22.04.1-Ubuntu SMP Fri Apr  5 17:44:33 UTC 2024
   machine: x86_64
2024-04-25 17:47:41.000947+00:00 info [p:28721 t:28721] HTTP Metrics: Enabled
2024-04-25 17:47:41.000949+00:00 info [p:28721 t:28721] Socket stats interval in seconds: 10
2024-04-25 17:47:41.000950+00:00 info [p:28721 t:28721] Userland TCP: Disabled
2024-04-25 17:47:41.007377+00:00 debug [p:28721 t:28721] Unable to fetch AWS metadata: no metadata returned by AWS
2024-04-25 17:47:41.019944+00:00 debug [p:28721 t:28721] Unable to fetch GCP metadata: error while fetching Google Cloud Platform instance metadata: Could not resolve host: metadata.google.internal
2024-04-25 17:47:41.019960+00:00 debug [p:28721 t:28721] Unable to fetch Nomad metadata - environment variables not found
2024-04-25 17:47:41.019970+00:00 info [p:28721 t:28721] Kernel Collector version 0.10.0 (release) started on host show-no-config-i-05bbcdabc7509e781
2024-04-25 17:47:41.020086+00:00 info [p:28721 t:28721] Node label has been set in config: 'environment':'demohebnpm'
2024-04-25 17:47:41.047126+00:00 debug [p:28721 t:28721] intake record file: ``
2024-04-25 17:47:41.047191+00:00 debug [p:28721 t:28721] starting event loop...
2024-04-25 17:47:50.398714+00:00 info [p:28721 t:28721] connecting to opentelemetry-ebpf-reducer:7000 (binary)...
2024-04-25 17:47:50.398732+00:00 debug [p:28721 t:28721] TCPChannel::connect: Connecting to intake @ opentelemetry-ebpf-reducer:7000
In file included from ../../../src/collector/kernel/bpf_src/render_bpf.c:39:
In file included from include/net/tcp.h:35:
In file included from include/net/sock_reuseport.h:5:
In file included from include/linux/filter.h:9:
include/linux/bpf.h:321:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_root'
                return sizeof(struct bpf_rb_root);
                       ^     ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:321:24: note: forward declaration of 'struct bpf_rb_root'
                return sizeof(struct bpf_rb_root);
                                     ^
include/linux/bpf.h:323:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_rb_node'
                return sizeof(struct bpf_rb_node);
                       ^     ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:323:24: note: forward declaration of 'struct bpf_rb_node'
                return sizeof(struct bpf_rb_node);
                                     ^
include/linux/bpf.h:325:10: error: invalid application of 'sizeof' to an incomplete type 'struct bpf_refcount'
                return sizeof(struct bpf_refcount);
                       ^     ~~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:325:24: note: forward declaration of 'struct bpf_refcount'
                return sizeof(struct bpf_refcount);
                                     ^
include/linux/bpf.h:347:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_rb_root'
                return __alignof__(struct bpf_rb_root);
                       ^          ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:347:29: note: forward declaration of 'struct bpf_rb_root'
                return __alignof__(struct bpf_rb_root);
                                          ^
include/linux/bpf.h:349:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_rb_node'
                return __alignof__(struct bpf_rb_node);
                       ^          ~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:349:29: note: forward declaration of 'struct bpf_rb_node'
                return __alignof__(struct bpf_rb_node);
                                          ^
include/linux/bpf.h:351:10: error: invalid application of '__alignof' to an incomplete type 'struct bpf_refcount'
                return __alignof__(struct bpf_refcount);
                       ^          ~~~~~~~~~~~~~~~~~~~~~
include/linux/bpf.h:351:29: note: forward declaration of 'struct bpf_refcount'
                return __alignof__(struct bpf_refcount);
                                          ^
../../../src/collector/kernel/bpf_src/tcp-processor/bpf_tcp_send_recv.h:184:53: error: no member named 'iov' in 'struct iov_iter'
  bpf_probe_read(&iov, sizeof(iov), &(msg->msg_iter.iov));
                                      ~~~~~~~~~~~~~ ^
../../../src/collector/kernel/bpf_src/tcp-processor/bpf_tcp_send_recv.h:393:53: error: no member named 'iov' in 'struct iov_iter'
  bpf_probe_read(&iov, sizeof(iov), &(msg->msg_iter.iov));
                                      ~~~~~~~~~~~~~ ^
8 errors generated.
2024-04-25 17:47:56.205695+00:00 error [p:28721 t:28721] Cannot initialize BPF program, res=-1

Failed to compile eBPF code for the Linux distro 'debian' running kernel version 6.5.0-1018-aws.

troubleshoot item bpf_compilation_failed (os=Linux,flavor=debian,headers_src=pre_installed,kernel=6.5.0-1018-aws): ProbeHandler couldn't load BPFModule: Success

This usually means that kernel headers weren't installed correctly.

Please reach out to support and include this log in its entirety so we can diagnose and fix
the problem.


In the meantime, please install kernel headers manually on each host before running
the Kernel Collector.

To manually install kernel headers, follow the instructions below:

  - for Debian/Ubuntu based distros, run:

      sudo apt-get install --yes "linux-headers-`uname -r`"

  - for RedHat based distros like CentOS and Amazon Linux, run:

      sudo yum install -y "kernel-devel-`uname -r`"

Additional context

No response

ccoqueiro added the bug label Apr 25, 2024
@yonch
Contributor

yonch commented Apr 25, 2024

The first set of errors (in include/linux/bpf.h) could, at first glance, be due to some internal inconsistency in the kernel headers. For example, take the first error:

so there should be a full definition -- curious.
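For readers unfamiliar with this class of error, here is a minimal standalone C sketch of what the compiler is complaining about. The names (demo_rb_root and the helper functions) are hypothetical and are not taken from the kernel headers; the sketch only illustrates the difference between a forward declaration and a full definition.

```c
#include <assert.h>
#include <stddef.h>

/* Forward declaration only: the type is "incomplete" at this point.
 * The compiler knows the name but not the size or layout. */
struct demo_rb_root;

/* Pointers to an incomplete type are allowed. */
static const struct demo_rb_root *demo_identity(const struct demo_rb_root *p)
{
  return p;
}

/* Writing sizeof(struct demo_rb_root) here, before the full definition,
 * would be rejected with an "invalid application of 'sizeof' to an
 * incomplete type" error, like the one in the log. */

/* Full definition (hypothetical layout): the type is now complete. */
struct demo_rb_root {
  void *node;
};

/* sizeof is valid once the full definition is visible. */
static size_t demo_root_size(void)
{
  size_t n = sizeof(struct demo_rb_root);
  assert(n >= sizeof(void *));
  return n;
}
```

If the headers seen by the compiler only carry the forward declaration of a struct while other code takes its size, the build fails in this way, which is why an inconsistency between header files is one plausible explanation.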

@ccoqueiro would the package repository used to install the packages contain recent versions of the headers? Is the kernel on that machine a recent release in the distro?

@yonch
Contributor

yonch commented Apr 25, 2024

The two errors in bpf_tcp_send_recv.h:

  • v6.5 definition of msghdr
  • msg->msg_iter is a struct iov_iter, defined here
  • Some digging in git history shows commit de4f5fed3f231
-               const struct iovec *iov;
+               /* use iter_iov() to get the current vec */
+               const struct iovec *__iov;
  • Seems to have originated in v6.4:
$ git describe --contains de4f5fed3f231
v6.4-rc1~214^2~10

So we'd want to figure out what iter_iov() does and handle the modified structure with an #if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0) (edit: the < case would contain the old code, and the #else the new code).
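A rough sketch of what such a version guard could look like, written as standalone C so it builds outside a kernel tree. Everything prefixed with demo_ is a hypothetical stand-in for the kernel types, and LINUX_VERSION_CODE is hard-coded to 6.5 as an assumption matching this issue. The kernel's own iter_iov() may handle additional iterator types, so this only covers the renamed field and is not the actual fix.

```c
#include <assert.h>
#include <stddef.h>

/* Same encoding as the kernel's KERNEL_VERSION macro; redefined here
 * only so the sketch compiles without <linux/version.h>. */
#ifndef KERNEL_VERSION
#define KERNEL_VERSION(a, b, c) \
  (((a) << 16) + ((b) << 8) + ((c) > 255 ? 255 : (c)))
#endif

/* Assumption: pretend we are building against 6.5 headers. */
#ifndef LINUX_VERSION_CODE
#define LINUX_VERSION_CODE KERNEL_VERSION(6, 5, 0)
#endif

/* Hypothetical stand-ins for struct iovec and struct iov_iter. */
struct demo_iovec {
  void *iov_base;
  size_t iov_len;
};

struct demo_iov_iter {
#if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0)
  const struct demo_iovec *iov;   /* field name before v6.4 */
#else
  const struct demo_iovec *__iov; /* renamed by commit de4f5fed3f231 */
#endif
};

/* Returns the current iovec pointer regardless of the field name. */
static const struct demo_iovec *demo_iter_iov(const struct demo_iov_iter *iter)
{
  assert(iter != NULL);
#if LINUX_VERSION_CODE < KERNEL_VERSION(6, 4, 0)
  return iter->iov;
#else
  return iter->__iov;
#endif
}
```

In the collector's eBPF source, the same guard would presumably wrap the two bpf_probe_read() calls in bpf_tcp_send_recv.h that currently read msg->msg_iter.iov, selecting the old or new field name at compile time.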

@ccoqueiro
Author

Hello @yonch, yes, as far as I understand. I'm using the opentelemetry-ebpf Helm chart: https://github.com/open-telemetry/opentelemetry-helm-charts/tree/main/charts/opentelemetry-ebpf

@yonch
Contributor

yonch commented Apr 29, 2024

@ccoqueiro I'm wondering if the header package might somehow be old or broken. Is one of these true in your case?

  1. The package repo for the distro (used by apt) is not standard
  2. The kernel header package was installed a long time ago and not updated
  3. The machine is running a bleeding edge kernel for the distro (so the header packaging might be work-in-progress)

If the answer is no, here are a couple of things to try:

  • updating the packages on the system with apt-get upgrade, to see if that fixes the headers
  • running on a machine that does not have headers (e.g., without first running sudo apt-get install --yes linux-headers-$(uname -r)), so that the network collector fetches its own headers

Note that these will probably only fix the first set of errors. The second set requires modifications in the eBPF code. Are you in a position to pursue those, or should we search for community contributors?

@ccoqueiro
Author

Hello @yonch

Answering your questions:

  1. "The package repo for the distro (used by apt) is not standard": The distro I used is an Ubuntu 22.04 provided by AWS, so I understand it is standard.
  2. "The kernel header package was installed a long time ago and not updated": The kernel header package was not installed beforehand; I installed it as a prerequisite for installing the OpenTelemetry eBPF collector.
  3. "The machine is running a bleeding edge kernel for the distro (so the header packaging might be work-in-progress)": I can't answer this question. How could we check this?

"updating the packages on the system apt-get upgrade, see if that fixes the headers": Done, but it did not fix the headers.

"running on a machine that does not have headers (e.g., without first running sudo apt-get install --yes linux-headers-$(uname -r)), so letting the network collector fetch its own headers": I ran this command, installing the header package before installing the eBPF collector, but it didn't help; it kept giving the same error.

"The second set requires modifications in the eBPF code. Are you in a position to pursue those, or should we search for community contributors?": To be quite honest with you, I have no idea how I would do this.

yonch added the help wanted label May 1, 2024
@yonch
Contributor

yonch commented May 1, 2024

Got it @ccoqueiro, I marked this with "help wanted" and will direct contributors here if asked. I'm sorry I don't have anything more immediate for you. If you find anyone who would like to tackle this, I'm happy to work with them!
