Allow blacklisting sensors #25

Open
rudolf81 opened this issue Jan 24, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@rudolf81

rudolf81 commented Jan 24, 2024

I noticed my SSD temp (around 67 °C) gets picked up as the max temp, and used for the fan rules, etc.

The SSD temp lives under /sys/class/hwmon/hwmon1/temp*_input.
temp1 is the "nvme composite" temp, and temp2 is "nvme sensor 2".
Sensor 2 reads around 67 most of the time.

The SSD is not covered by the Thinkpad cooler (on the T16 at least):
https://laptopmedia.com/wp-content/uploads/2022/08/internals-1000x711.jpg

So controlling the fan speed based on this max temp is not very useful.
TEMP_FILES_GLOB "/sys/class/hwmon/hwmon*/temp*_input" is too broad.

I think a solution could be to allow an override path for TEMP_FILES_GLOB to be specified in /etc/zcfan.conf
On my system, /sys/class/hwmon/hwmon6/ contains all the CPU and GPU temps.

(I dunno if having a single path override is going to be feasible for some hardware configuration - with dedicated graphics chips, which might be reported under a different /sys/class/hwmon/hwmon*/...)
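To see which chip each temperature file belongs to before deciding what to override, something like the following might help. It is only a sketch: `list_temp_sensors` is a made-up helper, not part of zcfan, and it assumes the usual hwmon sysfs layout (a `name` file per chip, optional `temp*_label` files, and readings in millidegrees Celsius).

```python
from pathlib import Path


def list_temp_sensors(root="/sys/class/hwmon"):
    """Return (chip name, input path, label, millidegrees C) per readable sensor.

    Hypothetical helper for inspection only; not part of zcfan.
    """
    sensors = []
    for hwmon in sorted(Path(root).glob("hwmon*")):
        name_file = hwmon / "name"
        chip = name_file.read_text().strip() if name_file.is_file() else "?"
        for temp_input in sorted(hwmon.glob("temp*_input")):
            # temp1_input -> temp1_label; the label file is optional
            label_file = temp_input.with_name(
                temp_input.name.replace("_input", "_label")
            )
            label = label_file.read_text().strip() if label_file.is_file() else ""
            try:
                millideg = int(temp_input.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors fail to read; skip them
            sensors.append((chip, str(temp_input), label, millideg))
    return sensors
```

Looping over `list_temp_sensors()` and printing each tuple should show at a glance whether a reading comes from `nvme`, the CPU, or something else.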

@cdown
Owner

cdown commented Jan 26, 2024

This is by design. On laptops the entire laptop is affected by airflow, and we only have one fan knob, so it's not supposed to be only CPU/GPU specific. 67C is very close to the recommended maximum operating temperature for basically any NVMe drive, so it's normal to pump a bunch of air in there.

It sounds like zcfan is doing the right thing here, that's way too close to max operating temp.

@rudolf81
Author

rudolf81 commented Jan 26, 2024

Thanks for the reply.
/sys/class/hwmon/hwmon1/name is def "nvme"
and the temps:
/sys/class/hwmon/hwmon1/temp*_input:
show 31850 and 67850... I think all the time.

They never change.

I've got the T16 in idle mode... nothing going on and no load on storage.
The zcfan default for low_temp is 70, so if my nvme is sitting at 67, then the fan won't even turn on...

Yes, I see you are right... close to 70 is the danger zone for nvme SSDs...

OK, let's test:
I edited zcfan.conf to set max_temp to 66, to force the fan to run at max.
It's been going full tilt for a few mins now, with no load, and /sys/class/hwmon/hwmon1/temp1_input dropped from 31850 to 29850.
Interesting. I guess there is some airflow on the SSD, even though it is not covered by the heat pipes...

The other temp sensor on /sys/class/hwmon/hwmon1/temp3_input is STILL on 67850.
(There is no temp2 sensor).

Soooo... maybe the actual SSD temp is temp1_input, at around 30 degrees, and temp3_input is not working, or something else? It never changes.
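A rough way to check for a stuck sensor like this is to sample each file a few times while conditions change (fan at full speed, some load, etc.) and flag the ones whose value never moves. The sketch below is an assumption about how one might do that; `find_stuck_sensors` and its parameters are hypothetical names. A sensor that is merely stable over a short idle window would also be flagged, so treat the result as a hint only.

```python
import time
from pathlib import Path


def read_millideg(path):
    """Read one hwmon temp*_input file (millidegrees Celsius)."""
    return int(Path(path).read_text().strip())


def find_stuck_sensors(paths, samples=5, interval=1.0,
                       read=read_millideg, sleep=time.sleep):
    """Return the paths whose reading was identical in every sample.

    `read` and `sleep` are injectable so the logic can be tried
    without touching sysfs.
    """
    history = {p: [] for p in paths}
    for i in range(samples):
        for p in paths:
            history[p].append(read(p))
        if i < samples - 1:
            sleep(interval)
    return [p for p, values in history.items() if len(set(values)) == 1]
```

With readings like the ones above, temp1_input moving from 31850 to 29850 would pass, while temp3_input sitting at 67850 in every sample would be reported.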

@cdown
Owner

cdown commented Feb 29, 2024

I'm willing to add a mechanism to disable faulty sensors, but it would need to be robust and not too cumbersome. One of the problems is hwmon ordering is not deterministic.

So probably what one would have is, instead, the ability to select which sensor(s) they want by name. Alternatively, we can just look for coretemp/k10temp and only use those sensors.

I don't know what the right answer is yet, it requires some thinking in terms of ergonomics and complexity.
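For illustration, selecting by chip name rather than by hwmon number could look roughly like the sketch below. It is written in Python for brevity and is not zcfan code; `max_temp_celsius` and `skip_names` are hypothetical names, and no such config option exists yet. It reads each chip's `name` file, so it does not depend on which `hwmonN` index a chip happens to get on a given boot.

```python
from pathlib import Path


def max_temp_celsius(root="/sys/class/hwmon", skip_names=frozenset()):
    """Return the highest reading in whole degrees C, or None if none readable.

    Chips whose `name` file matches an entry in `skip_names` are ignored
    entirely. `skip_names` is a hypothetical parameter, not a zcfan option.
    """
    best = None
    for hwmon in Path(root).glob("hwmon*"):
        name_file = hwmon / "name"
        name = name_file.read_text().strip() if name_file.is_file() else ""
        if name in skip_names:
            continue
        for temp_input in hwmon.glob("temp*_input"):
            try:
                # sysfs reports millidegrees Celsius
                celsius = int(temp_input.read_text().strip()) // 1000
            except (OSError, ValueError):
                continue  # unreadable sensor; ignore it
            if best is None or celsius > best:
                best = celsius
    return best
```

Calling `max_temp_celsius(skip_names={"nvme"})` would drop every sensor on the NVMe chip, including a working composite reading. Matching on a (name, label) pair instead would be finer grained, at the cost of a more cumbersome config.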

cdown added the enhancement (New feature or request) label Feb 29, 2024
cdown changed the title from "zcfan pics up SSD temp in get_max_temp() instead of CPU/GPU temp" to "Allow blacklisting sensors" Feb 29, 2024
@rudolf81
Author

Thanks again for the reply.

Yes, you are right. They might change numbers.
Blocking them based on name sounds like the right way to do it.
Probably just stating the sensors to skip is better than stating the single sensor to measure... so in case of multiple temp sensors for GPU, etc - it would work better.

Or - do make it work with select sensors only, but have them passed in as optional arguments at runtime (not from config), and then leave it to the user to write a script to hunt and filter for the needed sensor paths, and then pass them to the application at runtime.
(Hmm not sure how that would work if you still wanted to run it as a service)

@Pyntux

Pyntux commented Apr 17, 2024

I have the same problem with the wifi chipset temp...

@rudolf81
Author

@Pyntux - I've cobbled together a crude hard-coded filter in zcfan, to exclude the one I don't want. It works... but I usually end up with a hard crash on the system if I leave it for an extended period. I think if the system enters standby mode or something - it's gone. No coming back.

Probably my own bad code somewhere. I didn't expect a bad array pointer or something in a user-space fan-driver could bring down the whole system...

I'm contemplating writing my own version of this in something like Python, just for fun, and then maybe later in Rust.

Though I'm pondering the use of /proc/acpi/ibm/fan - as it only allows for some 3 pre-set fan speeds, it seems...
It's not PWM or something.

The Thinkpad's own thermal management system, which I guess lives in the BIOS, has much smoother control of the fan (not bracketed speeds). However - it bombs out sometimes (for me at least), and then spins the fan up/down constantly with no hysteresis.
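For anyone trying a rewrite, the hysteresis part with bracketed speeds can be sketched as a pure function. This is an assumption about one reasonable approach, not how zcfan or the firmware does it: `next_level`, the thresholds and the hysteresis value are illustrative (70 matches the low_temp default mentioned earlier, the rest are made up). Writing the chosen level to the fan interface is left out on purpose.

```python
def next_level(current_level, temp, thresholds=(70, 80, 90), hysteresis=10):
    """Pick the next fan level (0 = lowest) from the current level and temp.

    thresholds[i] is the temperature at which level i + 1 kicks in.
    Stepping up is immediate; stepping down from a level only happens
    once temp is `hysteresis` degrees below that level's threshold.
    All values here are illustrative assumptions.
    """
    target = sum(temp >= t for t in thresholds)
    if target >= current_level:
        return target  # same level or hotter: follow the thresholds directly
    level = current_level
    # Cooler: drop one level at a time, only while clearly below the threshold
    while level > target and temp <= thresholds[level - 1] - hysteresis:
        level -= 1
    return level
```

With these numbers, a fan that started at 70 keeps running until the temperature falls to 60, so a reading hovering around 69 to 71 does not make it spin up and down constantly.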
