node_exporter collector failure for zfs err="couldn't get sysctl: no such file or directory" #2847

Open
void-fm opened this issue Nov 15, 2023 · 3 comments

Comments

@void-fm

void-fm commented Nov 15, 2023

Host operating system: output of uname -a

FreeBSD stable/14-n265566 aarch64 1400500 1400500

node_exporter version: output of node_exporter --version

% node_exporter --version
node_exporter, version (branch: , revision: unknown)
build user:
build date:
go version: go1.20.8
platform: freebsd/arm64
tags: unknown

node_exporter command line flags

(defaults)

node_exporter log output

Nov 15 17:14:22 REDACTED node_exporter[58452]: ts=2023-11-15T17:14:22.737Z caller=collector.go:169 level=error msg="collector failed" name=zfs duration_seconds=0.000465773 err="couldn't get sysctl: no such file or directory"

(every 15 seconds in /var/log/daemon.log)

Workaround for now is editing /usr/local/etc/rc.d/node_exporter and finding this line:

: ${node_exporter_args:=""}

and editing it like so:

: ${node_exporter_args:="--no-collector.zfs"}
then restarting node_exporter.

@eekay35

eekay35 commented Dec 8, 2023

I also found this error after upgrading to FreeBSD 14.0-RELEASE (at least, I hadn't noticed it before that). You shouldn't edit the rc.d script, though: that could cause problems later, and the change will be overwritten by the next node_exporter update. Instead, add the args line to /etc/rc.conf and restart the node_exporter service:

sysrc node_exporter_args="--no-collector.zfs"
service node_exporter restart

dekimsey added a commit to dekimsey/node_exporter that referenced this issue Mar 23, 2024
When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs are not uncommon with major OS updates. With this change, it will be easier for users to report the specific mib that triggers the failure.

Related to prometheus#2847

Signed-off-by: Daniel Kimsey <[email protected]>
@dekimsey
Contributor

dekimsey commented Mar 23, 2024

I pulled the list of mibs being scanned and passed them to sysctl on my FreeBSD 14.0 box; it looks like kstat.zfs.misc.arcstats.p is the missing OID. The code suggests this is known, but it doesn't stop trying to access the value. At least the error is merely noisy: the rest of the zfs stats are being collected normally.
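
A rough Go sketch of that kind of check, for anyone who wants to repeat it: given a list of candidate names and some way to test whether each one exists, report the ones that are missing. The `missingMibs` helper and the `exists` callback are hypothetical; on a real host the callback would be backed by an actual sysctl read (or by running `sysctl -n <name>` by hand), and the `present` set below is invented for the example.

```go
package main

import "fmt"

// missingMibs returns, in input order, every mib for which exists reports false.
// exists is a hypothetical callback; it is not part of node_exporter.
func missingMibs(mibs []string, exists func(mib string) bool) []string {
	var missing []string
	for _, mib := range mibs {
		if !exists(mib) {
			missing = append(missing, mib)
		}
	}
	return missing
}

func main() {
	// Simulated host: only these names resolve (made-up set for illustration).
	present := map[string]bool{
		"kstat.zfs.misc.arcstats.hits":   true,
		"kstat.zfs.misc.arcstats.misses": true,
	}
	exists := func(mib string) bool { return present[mib] }

	candidates := []string{
		"kstat.zfs.misc.arcstats.hits",
		"kstat.zfs.misc.arcstats.p",
		"kstat.zfs.misc.arcstats.misses",
	}
	for _, mib := range missingMibs(candidates, exists) {
		fmt.Println("missing:", mib)
	}
}
```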

Additionally, I spiked a quick change that uses getUname to grab the OS release and then adds the appropriate sysctl stats. It's not clear to me whether that is a road the project would want to go down. I'd be more comfortable proposing it if I had samples from the other *BSDs, so that a major/minor version function could return something sensible for them, but I don't have any.
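
To make the version-gating idea concrete, here is a small Go sketch under stated assumptions; it is not the spiked change itself. It assumes the release string looks like `14.0-RELEASE`, and it assumes that `kstat.zfs.misc.arcstats.p` should only be queried on major versions below 14. That cutoff is a guess based on this thread, not a confirmed fact. `parseRelease` and `arcMibs` are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRelease extracts the major and minor numbers from a release string
// such as "14.0-RELEASE". Other *BSD formats are not handled here.
func parseRelease(release string) (major, minor int, err error) {
	version, _, _ := strings.Cut(release, "-") // drop a suffix like "-RELEASE"
	majorStr, minorStr, found := strings.Cut(version, ".")
	if !found {
		return 0, 0, fmt.Errorf("unexpected release format %q", release)
	}
	major, err = strconv.Atoi(majorStr)
	if err != nil {
		return 0, 0, fmt.Errorf("bad major version in %q: %w", release, err)
	}
	minor, err = strconv.Atoi(minorStr)
	if err != nil {
		return 0, 0, fmt.Errorf("bad minor version in %q: %w", release, err)
	}
	return major, minor, nil
}

// arcMibs returns the names to query for a given major version.
// Assumption: arcstats.p is only available before major version 14.
func arcMibs(major int) []string {
	mibs := []string{
		"kstat.zfs.misc.arcstats.hits",
		"kstat.zfs.misc.arcstats.misses",
	}
	if major < 14 {
		mibs = append(mibs, "kstat.zfs.misc.arcstats.p")
	}
	return mibs
}

func main() {
	major, minor, err := parseRelease("14.0-RELEASE")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(major, minor, arcMibs(major))
}
```

Returning an error for unrecognized formats, rather than guessing, leaves room for the caller to fall back to a conservative mib list on systems whose release strings have not been sampled.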

@eekay35

eekay35 commented Mar 27, 2024

Excellent, thank you! Once this new code goes into place via FBSD's ports/packages, I'll give it a try and verify. Looks good, though. I (and likely many others) appreciate it!

discordianfish pushed a commit that referenced this issue Apr 10, 2024
gitperr pushed a commit to gitperr/node_exporter that referenced this issue Apr 30, 2024
Signed-off-by: David O'Rourke <[email protected]>

chore:remove constant from function (prometheus#2884)

Signed-off-by: tyltr <[email protected]>

build(deps): bump github.com/jsimonetti/rtnetlink from 1.4.0 to 1.4.1 (prometheus#2909)

Bumps [github.com/jsimonetti/rtnetlink](https://github.com/jsimonetti/rtnetlink) from 1.4.0 to 1.4.1.
- [Release notes](https://github.com/jsimonetti/rtnetlink/releases)
- [Commits](jsimonetti/rtnetlink@v1.4.0...v1.4.1)

---
updated-dependencies:
- dependency-name: github.com/jsimonetti/rtnetlink
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

fix hwmon nil ptr (prometheus#2873)

* fix hwmon nil ptr

syslink maybe lost in some cases.

---------

Signed-off-by: TaoGe <[email protected]>

Fix hwmon error capture (prometheus#2915)

Fix golangci-lint "ineffectual assignment" by correctly capturing any
errors within the hwmon gathering loop.

Signed-off-by: Ben Kochie <[email protected]>

Update common Prometheus files (prometheus#2917)

Signed-off-by: prombot <[email protected]>

Revert "Add ZFS freebsd per dataset stats (prometheus#2753)" (prometheus#2925)

This reverts commit f34aaa6.

Signed-off-by: Caleb Webber <[email protected]>

filesystem: fix mountTimeout not working issue (prometheus#2903)

Signed-off-by: DongWei <[email protected]>

Fix description for NodeDiskIOSaturation alert (prometheus#2929)

NodeDiskIOSaturation description should say 30m per the "for" clause

Signed-off-by: Taylor Sly <[email protected]>

Enforce no subprocess policy (prometheus#2926)

Add depguard to golangci-lint to enforce the no-os/exec policy.

Signed-off-by: Ben Kochie <[email protected]>

filesystem: surface device errors (prometheus#2923)

filesystem: surface filesystem device error

Fixes: prometheus#2918
---------

Signed-off-by: Pamela Mei i540369 <[email protected]>

Revert "filesystem: fix mountTimeout not working issue (prometheus#2903)" (prometheus#2932)

This reverts commit 9f1f791.

Signed-off-by: Ben Kochie <[email protected]>

Update common Prometheus files (prometheus#2939)

Signed-off-by: prombot <[email protected]>

Update common Prometheus files (prometheus#2946)

Signed-off-by: prombot <[email protected]>

Update common Prometheus files (prometheus#2949)

Signed-off-by: prombot <[email protected]>

Add multi-cluster support for Nodes dashboard (prometheus#2945)

Signed-off-by: Adrian Berger <[email protected]>

disable selinux,fix end-to-end-test.sh error(prometheus#2934) (prometheus#2937)

Signed-off-by: heyitao <[email protected]>
Co-authored-by: heyitao <[email protected]>

Add new collector and metrics for watchdog (prometheus#2309) (prometheus#2880)

Signed-off-by: Gavin Lam <[email protected]>

Enable watchdog module by default; Add no data error (prometheus#2953)

Signed-off-by: Gavin Lam <[email protected]>

Update common Prometheus files (prometheus#2954)

Signed-off-by: prombot <[email protected]>

build(deps): bump google.golang.org/protobuf from 1.32.0 to 1.33.0 (prometheus#2955)

Bumps google.golang.org/protobuf from 1.32.0 to 1.33.0.

---
updated-dependencies:
- dependency-name: google.golang.org/protobuf
  dependency-type: indirect
...

Signed-off-by: dependabot[bot] <[email protected]>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

Update common Prometheus files (prometheus#2959)

Signed-off-by: prombot <[email protected]>

Sanitize ethtool metric name keys

Apply the same metric name sanitization to the keys as to the metric
names. This avoids conflicting help strings in the metric registry.

Fixes: prometheus#2893

Signed-off-by: Ben Kochie <[email protected]>

Update common Prometheus files

Signed-off-by: prombot <[email protected]>

chore: fix some typos (prometheus#2974)

Signed-off-by: occupyhabit <[email protected]>

collector/textfile: Avoid inconsistent help-texts (prometheus#2962)

Avoid metrics with inconsistent help-texts. The earlier behaviour has
been preserved in the sense that the first encountered instance is still
used to generate metrics, whereas the subsequent inconsistent ones are
ignored along with a few peripheral changes.

```
 # HELP node_scrape_collector_duration_seconds node_exporter: Duration of a collector scrape.
 # TYPE node_scrape_collector_duration_seconds gauge
 node_scrape_collector_duration_seconds{collector="textfile"} 0.0004005
 # HELP node_scrape_collector_success node_exporter: Whether a collector succeeded.
 # TYPE node_scrape_collector_success gauge
 node_scrape_collector_success{collector="textfile"} 1
 # HELP node_textfile_mtime_seconds Unixtime mtime of textfiles successfully read.
 # TYPE node_textfile_mtime_seconds gauge
 node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-bar.prom"} 1.710812009e+09
 node_textfile_mtime_seconds{file="/Users/rexagod/repositories/misc/node_exporter/ne-foo.prom"} 1.710811982e+09
 # HELP node_textfile_scrape_error 1 if there was an error opening or reading a file, 0 otherwise
 # TYPE node_textfile_scrape_error gauge
 node_textfile_scrape_error 1
 # HELP promhttp_metric_handler_errors_total Total number of internal errors encountered by the promhttp metric handler.
 # TYPE promhttp_metric_handler_errors_total counter
 promhttp_metric_handler_errors_total{cause="encoding"} 0
 promhttp_metric_handler_errors_total{cause="gathering"} 0
 # HELP promhttp_metric_handler_requests_in_flight Current number of scrapes being served.
 # TYPE promhttp_metric_handler_requests_in_flight gauge
 promhttp_metric_handler_requests_in_flight 1
 # HELP promhttp_metric_handler_requests_total Total number of scrapes by HTTP status code.
 # TYPE promhttp_metric_handler_requests_total counter
 promhttp_metric_handler_requests_total{code="200"} 0
 promhttp_metric_handler_requests_total{code="500"} 0
 promhttp_metric_handler_requests_total{code="503"} 0
 # HELP tau_infrastructure_performing_maintenance_task At what timestamp a given task started or stopped, the last time it was run.
 # TYPE tau_infrastructure_performing_maintenance_task gauge
 tau_infrastructure_performing_maintenance_task{main_task="nightly",start_or_stop="start",sub_task="main"} 1.64728080198446e+09
```

Fixes: prometheus#2317

Signed-off-by: Pranshu Srivastava <[email protected]>

Update common Prometheus files (prometheus#2973)

Signed-off-by: prombot <[email protected]>

zfs: Log mib when sysctl read fails on FreeBSD

When the zfs collector fails on FreeBSD it doesn't log which `mib` triggered the issue. This makes diagnostics hard.

Incompatibilities in the list of supported mibs are not uncommon with major OS updates. With this change, it will be easier for users to report the specific mib that triggers the failure.

Related to prometheus#2847

Signed-off-by: Daniel Kimsey <[email protected]>

chore: fix typo in comment

Signed-off-by: looklose <[email protected]>

fibre_channel: update procfs to take into account optional attributes (prometheus#2933)

Signed-off-by: machine424 <[email protected]>

refactor: Optimize code by using built-in constants in the standard library (prometheus#2989)

Signed-off-by: coderwander <[email protected]>

os_release.go: Removed caching of modtime/filename of os-release file. (prometheus#2987)

Signed-off-by: Jonathan Davies <[email protected]>

fix: data race of NetClassCollector metrics initialization when multiple requests happen (prometheus#2995)

Signed-off-by: John Guo <[email protected]>

Update common Prometheus files (prometheus#2992)

Signed-off-by: prombot <[email protected]>

Update build (prometheus#3000)

* Update Go to 1.22.
* Update Go modules.
* Use new version collector.
* Use standard library slices package.

Signed-off-by: Ben Kochie <[email protected]>

Fix watchdog_test lint and test failures on macos. (prometheus#3003)

Ensure identical build flags embedded in both files.

Signed-off-by: Chris Cleeland <[email protected]>

Release v1.8.0 (prometheus#3002)

* [CHANGE] exec_bsd: Fix labels for `vm.stats.sys.v_syscall` sysctl prometheus#2895
* [CHANGE] diskstats: Ignore zram devices on linux systems prometheus#2898
* [CHANGE] textfile: Avoid inconsistent help-texts  prometheus#2962
* [CHANGE] os: Removed caching of modtime/filename of os-release file prometheus#2987
* [FEATURE] xfrm: Add new collector prometheus#2866
* [FEATURE] watchdog: Add new collector prometheus#2880
* [ENHANCEMENT] cpu_vulnerabilities: Add mitigation information label prometheus#2806
* [ENHANCEMENT] nfsd: Handle new `wdeleg_getattr` attribute prometheus#2810
* [ENHANCEMENT] netstat: Add TCPOFOQueue to default netstat metrics prometheus#2867
* [ENHANCEMENT] filesystem: surface device errors prometheus#2923
* [ENHANCEMENT] os: Add support end parsing prometheus#2982
* [ENHANCEMENT] zfs: Log mib when sysctl read fails on FreeBSD prometheus#2975
* [ENHANCEMENT] fibre_channel: update procfs to take into account optional attributes prometheus#2933
* [BUGFIX] cpu: Fix debug log in cpu collector prometheus#2857
* [BUGFIX] hwmon: Fix hwmon nil ptr prometheus#2873
* [BUGFIX] hwmon: Fix hwmon error capture prometheus#2915
* [BUGFIX] zfs: Revert "Add ZFS freebsd per dataset stats" prometheus#2925
* [BUGFIX] ethtool: Sanitize ethtool metric name keys prometheus#2940
* [BUGFIX] fix: data race of NetClassCollector metrics initialization prometheus#2995

Signed-off-by: Ben Kochie <[email protected]>

Add logging for ethtool device include/exclude and metrics include flags (prometheus#2979)

Signed-off-by: Sam Leiken <[email protected]>