Infiniband metrics: still not collected when irdma is loaded (PE 1.7.0) #2846
Comments
For the collector to return no data, it means that the FS.InfiniBandClass function in procfs is returning os.ErrNotExist.

    func (c *infinibandCollector) Update(ch chan<- prometheus.Metric) error {
        devices, err := c.fs.InfiniBandClass()
        if err != nil {
            if errors.Is(err, os.ErrNotExist) {
                level.Debug(c.logger).Log("msg", "infiniband statistics not found, skipping")
                return ErrNoData
            }
            ...

There are multiple places in the InfiniBandClass procfs collector which could potentially return os.ErrNotExist. Can you please paste a recursive directory listing of your |
In comparison, when the
Content of the directory related to the IB driver:
The
|
Can you also dig a bit deeper into the
There is one other bit of code in the procfs collector that might be bailing out:

    // Parse legacy counters
    path = filepath.Join(portPath, "counters_ext")
    files, err = os.ReadDir(path)
    if err != nil && !os.IsNotExist(err) {
        return nil, err
    }

There is a good chance that the |
Here is the listing of the
|
Aha, I also misread the code I quoted in my previous comment, since it would tolerate os.ErrNotExist for the
However, this code will bail out on the

    func parseInfiniBandCounters(portPath string) (*InfiniBandCounters, error) {
        var counters InfiniBandCounters
        path := filepath.Join(portPath, "counters")
        files, err := os.ReadDir(path)
        if err != nil {
            return nil, err
        }
        ...
|
I would have assumed that Node Exporter will go through all the paths under
Why is the exporter (seemingly) giving up after its first try? |
@mtds The behaviour is due to fairly generic error handling in the procfs code, whereby it bails out upon pretty much any error. I suspect that the code was originally written by somebody who only had access to Mellanox HCAs, since they are (in my experience) by far the most common IB hardware in use for about the last 10 years. The Intel irdma driver has opted to only implement
This should be a fairly easy fix, but unfortunately will require another release cycle of both procfs and node_exporter. |
@dswarbrick Thanks, it's clear now. For the time being, I guess we can easily implement the workaround on our side (unload the
Should I open a bug report on the |
@mtds I would recommend opening an issue on the procfs repository and referencing this one, while keeping this issue open as a placeholder until a new node_exporter is released with a fix. |
For reference: procfs#589 issue. |
Just pulled and built master. Even with the procfs issue resolved, node_exporter still does not work if irdma is loaded. |
@blixuga Can you please provide debug logs so that we can try to resolve this? The more info, the better. |
Host operating system: output of uname -a
Host operating system: Rocky Linux 8.8
node_exporter version: output of node_exporter --version
node_exporter command line flags
node_exporter log output
Are you running node_exporter in Docker?
No.
What did you do that produced an error?
There's no error whatsoever: the exporter is just not able to collect IB metrics (see next section).
What did you expect to see?
When the irdma module is not loaded, Node Exporter correctly collects and reports IB metrics:
What did you see instead?
Infiniband metrics are not collected when the irdma module is loaded:
Workaround
irdma module:
References