The server has 3 high-speed industrial-grade fans to keep it cool.
They work well but are quite noisy.
I need to understand better how to control them.
Note!
This article was written for the old System76 server, which is no longer in use.
Fan management
The fans in the system are controlled by the BMC (Baseboard Management Controller).
The server has 3 fans: one generic fan and 2 high-speed industrial-grade fans that keep the CPU cool.
Delta Electronics, model FFB0412UHN
Nidec Ultraflow, model V40W12BGA5-07A1
The fans are driven by a PWM controller that varies their speed depending on the temperature (throttling).
I think the controller is managed by the BIOS, since the BIOS has some settings for it.
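To make the idea of throttling concrete, here is a minimal Python sketch of a linear fan curve that maps a temperature to a PWM duty cycle. The function name and all the numbers (40 and 75 degrees C, 20 % and 100 % duty) are assumptions chosen for illustration; they are not values read from this server's BIOS or BMC.

```python
def fan_duty(temp_c, t_min=40.0, t_max=75.0, duty_min=20.0, duty_max=100.0):
    """Map a temperature in degrees C to a PWM duty cycle in percent.

    Below t_min the fan stays at duty_min, above t_max it runs at duty_max,
    and in between the duty cycle rises linearly. All defaults are
    illustrative assumptions, not values taken from the server.
    """
    if temp_c <= t_min:
        return duty_min
    if temp_c >= t_max:
        return duty_max
    fraction = (temp_c - t_min) / (t_max - t_min)
    return duty_min + fraction * (duty_max - duty_min)


if __name__ == "__main__":
    # Print the curve at a few sample temperatures.
    for t in (30.0, 40.0, 57.5, 75.0, 90.0):
        print(f"{t:5.1f} C -> {fan_duty(t):5.1f} % duty")
```

A real controller would usually add some hysteresis so the fans do not hunt up and down around a single temperature.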
This is an example of the noise I'm talking about.
The problem
The server's fans run at maximum speed all the time, and they are quite noisy.
When the server starts, the fans run at maximum speed during boot, then throttle down almost to a stop, which shows that the BIOS fan control is working.
When the OS takes over, however, the fans go back to maximum speed and apparently ignore the CPU temperature, since they run at full speed at temperatures well below the critical level.
With the older OS and settings the fans ran at almost the lowest speed, so something is missing.
Changing the BIOS settings didn't bring any improvement.
Solution
The ideal solution is to find a utility that can drive the PWM fans and run it, either from a script or in whatever way the utility requires.
Under Linux there are several utilities that deal with sensors and peripherals:
- lm-sensors
- pwmconfig
- i8kutils
- system76-firmware
- system76-utils
Every utility handles sensors, but on some motherboards not all the sensors are supported.
For example, sensors (from lm-sensors) can read the CPU temperature, but it can neither control the PWM fans nor read their feedback.
ipmitool seems able to read my motherboard's sensors.
It seems ipmitool is the way to go.
Alternatives
As alternatives I see two other options:
- Replace the server
Buying a newer server would allow me to install native Pop!_OS with all the updated drivers and firmware, and would also give me a GUI on the console.
But of course the main issue here is the budget.
I'm not sure how much a new server would cost, but it could easily be $2,000 to $3,000 at least.
Besides, my old server works nicely, fits my needs, and now with 32 GB of RAM instead of 4 and a 4 TB HD I can do much more!
I hate to throw away a nice machine because of ONE problem.
- Build a fan controller driven via USB
It may be a slightly awkward solution, but with a cheap microcontroller I could build a PWM controller for the fans. The OS would simply measure the CPU temperature and/or other data (such as the server room temperature) and drive the fans accordingly.
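As a rough idea of what the host side of such a USB controller could look like, here is a small Python sketch. Everything in it is hypothetical: the "FAN <channel> <duty>" text protocol, the class and function names, and the temperature thresholds are assumptions made up for illustration, since no such controller exists yet. The port is any writable binary stream, so a real serial device could be swapped in later.

```python
import io


class HysteresisController:
    """Switch between a quiet and a loud duty cycle with a dead band.

    The fan goes to high_duty at or above on_temp and only returns to
    low_duty at or below off_temp, so it does not flap around one value.
    All default values are illustrative assumptions.
    """

    def __init__(self, on_temp=60.0, off_temp=50.0, low_duty=30, high_duty=100):
        self.on_temp = on_temp
        self.off_temp = off_temp
        self.low_duty = low_duty
        self.high_duty = high_duty
        self.duty = low_duty

    def update(self, temp_c):
        """Return the duty cycle to apply for the latest temperature."""
        if temp_c >= self.on_temp:
            self.duty = self.high_duty
        elif temp_c <= self.off_temp:
            self.duty = self.low_duty
        # Between the two thresholds the previous duty is kept.
        return self.duty


def send_duty(port, channel, duty):
    """Write one command line in a hypothetical 'FAN <channel> <duty>' format."""
    duty = max(0, min(100, int(duty)))  # clamp to a valid percentage
    line = f"FAN {channel} {duty}\n"
    port.write(line.encode("ascii"))
    return line


if __name__ == "__main__":
    port = io.BytesIO()  # stands in for the USB serial device
    controller = HysteresisController()
    for temp in (45.0, 55.0, 65.0, 55.0, 49.0):
        send_duty(port, 1, controller.update(temp))
    print(port.getvalue().decode("ascii"), end="")
```

The dead band between the two thresholds is a design choice: it keeps the fans from switching speed continuously when the temperature sits near a single set point.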
Utilities installed
On Ubuntu 22.04 LTS Server edition, these utilities are installed (and working) to monitor some sensors.
lm-sensors
The utility can read the CPU temperature and, via the RAID controller, the HD temperature.
ipmitool
Like lm-sensors, this utility can read the temperatures, and it can also read the fans and other sensors.
Here are some useful commands:
- sudo ipmitool sensor
Read all the sensors
- sudo ipmitool raw <netfn> <cmd> [data]
Network Function Codes:
VAL HEX STRING
==============================================
0 0x00 Chassis
2 0x02 Bridge
4 0x04 SensorEvent
6 0x06 Application
8 0x08 Firmware
10 0x0a Storage
12 0x0c Transport
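The table above can be turned into a small lookup so that raw commands are written by name instead of by number. This is only a sketch: the function name raw_command is my own, and it just builds the argument list without running anything. The one concrete command used in the example, Get Device ID (Application, command 0x01), is a standard IPMI command; any fan-related raw command would be vendor-specific and is not shown here.

```python
# Network function codes, copied from the ipmitool table above.
NETFN = {
    "chassis": 0x00,
    "bridge": 0x02,
    "sensorevent": 0x04,
    "application": 0x06,
    "firmware": 0x08,
    "storage": 0x0A,
    "transport": 0x0C,
}


def raw_command(netfn, cmd, data=()):
    """Build the argument list for 'ipmitool raw <netfn> <cmd> [data]'.

    netfn is one of the names in NETFN (case-insensitive), cmd is the
    command number and data is an optional sequence of byte values.
    """
    code = NETFN[netfn.lower()]
    args = ["ipmitool", "raw", f"0x{code:02x}", f"0x{cmd:02x}"]
    args += [f"0x{byte:02x}" for byte in data]
    return args


if __name__ == "__main__":
    # Get Device ID: Application network function, command 0x01.
    print(" ".join(raw_command("Application", 0x01)))
```

The resulting list can be passed to subprocess.run (prefixed with sudo if needed) once the command has been checked.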
This is the first reading from ipmitool (sudo ipmitool sensor):
Pwr Unit Status | 0x0 | discrete | 0x0000| na | na | na | na | na | na
IPMI Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Physical Scrty | 0x0 | discrete | 0x0000| na | na | na | na | na | na
FP NMI Diag Int | 0x0 | discrete | 0x0000| na | na | na | na | na | na
SMI Timeout | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event Log | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Event | 0x0 | discrete | 0x0000| na | na | na | na | na | na
Button | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BMC Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
VR Watchdog | 0x0 | discrete | 0x0000| na | na | na | na | na | na
PCH Therm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BMC FW Health | 0x0 | discrete | 0x0000| na | na | na | na | na | na
System Airflow | 50.000 | CFM | ok | na | na | na | na | na | na
BB EDGE Temp | 31.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
Front Panel Temp | 30.000 | degrees C | ok | na | 0.000 | 5.000 | 50.000 | 55.000 | na
PCH Temp | 37.000 | degrees C | ok | na | 0.000 | 5.000 | 98.000 | 103.000 | na
BB BMC Temp | 36.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
BB CPU VR Temp | 33.000 | degrees C | ok | na | 0.000 | 5.000 | 110.000 | 115.000 | na
Exit Air Temp | 33.000 | degrees C | ok | na | 0.000 | 5.000 | 80.000 | 85.000 | na
System Fan 1 | 7742.000 | RPM | ok | na | 392.000 | 588.000 | na | na | na
System Fan 2 | 23128.000 | RPM | ok | na | 392.000 | 588.000 | na | na | na
System Fan 3 | 23128.000 | RPM | ok | na | 392.000 | 588.000 | na | na | na
PS1 Status | 0x0 | discrete | 0x0100| na | na | na | na | na | na
PS1 Power In | 63.000 | Watts | ok | na | na | na | 1533.000 | 1750.000 | na
PS1 Temperature | 32.000 | degrees C | ok | na | na | na | 55.000 | 60.000 | na
P1 Status | 0x0 | discrete | 0x8000| na | na | na | na | na | na
P1 Therm Margin | -65.000 | degrees C | ok | na | na | na | na | na | na
P1 Therm Ctrl % | 0.000 | percent | ok | na | na | na | 30.000 | 50.000 | na
P1 ERR2 | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CATERR | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 ICC Mismatch | 0x0 | discrete | 0x0000| na | na | na | na | na | na
CPU Missing | 0x0 | discrete | 0x0000| na | na | na | na | na | na
P1 DTS Therm Mgn | -65.000 | degrees C | ok | na | na | na | na | na | na
P1 VRD Hot | 0x0 | discrete | 0x0000| na | na | na | na | na | na
DIMM Thrm Mrgn 1 | na | degrees C | na | na | na | na | 5.000 | 10.000 | na
P1 Mem Thrm Trip | 0x0 | discrete | 0x0000| na | na | na | na | na | na
BB +12.0V | 11.883 | Volts | ok | na | 10.635 | 10.947 | 13.027 | 13.391 | na
BB +5.0V1 | 4.959 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +3.3V | 3.253 | Volts | ok | na | 2.953 | 3.039 | 3.554 | 3.654 | na
BB +5.0V2 | 4.959 | Volts | ok | na | 4.460 | 4.590 | 5.415 | 5.566 | na
BB +12.0V V2 | 11.831 | Volts | ok | na | 10.635 | 10.947 | 13.027 | 13.391 | na
BB +1.75V Vccp | 1.769 | Volts | ok | na | 1.611 | 1.653 | 1.902 | 1.960 | na
BB +1.5 P1DDR | na | | na | na | 1.339 | 1.387 | 1.611 | 1.659 | na
BB VBAT | 3.133 | Volts | ok | na | 2.211 | 2.544 | na | na | na
BB +1.05V PCH | 1.038 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.05V AUX | 1.038 | Volts | ok | na | 0.546 | 0.564 | 1.464 | 1.506 | na
BB +1.35V MEM | 1.323 | Volts | ok | na | 1.201 | 1.244 | 1.445 | 1.488 | na
BB +12.0V V1 | 11.883 | Volts | ok | na | 10.635 | 10.947 | 13.027 | 13.391 | na
P1 MTT | 0.000 | percent | ok | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
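The columns are:
- sensor name
- current value
- unit
- status
- thresholds: lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable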
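Since the output is a simple pipe-separated table, it is easy to parse in a script. The Python sketch below splits one line into named fields; the short names for the last six columns (lnr, lcr, lnc, unc, ucr, unr) are my own labels for the lower and upper non-recoverable, critical and non-critical thresholds, and the function name is also an assumption.

```python
COLUMNS = ("name", "value", "unit", "status",
           "lnr", "lcr", "lnc", "unc", "ucr", "unr")


def parse_sensor_line(line):
    """Split one line of 'ipmitool sensor' output into a dict.

    Returns None if the line does not have the expected ten columns.
    The value is converted to float when it is numeric; discrete
    readings such as '0x0' and missing readings ('na') become None.
    """
    fields = [field.strip() for field in line.split("|")]
    if len(fields) != len(COLUMNS):
        return None
    row = dict(zip(COLUMNS, fields))
    try:
        row["value"] = float(row["value"])
    except ValueError:
        row["value"] = None
    return row


if __name__ == "__main__":
    sample = [
        "Exit Air Temp | 33.000 | degrees C | ok | na | 0.000 | 5.000 | 80.000 | 85.000 | na",
        "System Fan 1 | 7742.000 | RPM | ok | na | 392.000 | 588.000 | na | na | na",
        "System Fan 2 | 23128.000 | RPM | ok | na | 392.000 | 588.000 | na | na | na",
    ]
    rows = [parse_sensor_line(line) for line in sample]
    # Keep only the fan sensors, identified by their unit.
    for row in rows:
        if row and row["unit"] == "RPM":
            print(row["name"], row["value"])
```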
This is a more detailed list of the fan characteristics (sudo ipmitool -v sdr type Fan):
Sensor ID : System Fan 1 (0x30)
Entity ID : 29.2 (Fan Device)
Sensor Type (Threshold) : Fan (0x04)
Sensor Reading : 7252 (+/- 0) RPM
Status : ok
Nominal Reading : 12936.000
Normal Minimum : 980.000
Normal Maximum : 24990.000
Lower critical : 392.000
Lower non-critical : 588.000
Positive Hysteresis : 196.000
Negative Hysteresis : 196.000
Minimum sensor range : Unspecified
Maximum sensor range : Unspecified
Event Message Control : Per-threshold
Readable Thresholds : lcr lnc
Settable Thresholds : lcr lnc
Threshold Read Mask : lcr lnc
Assertion Events :
Assertions Enabled : lnc- lcr-
Deassertions Enabled : lnc- lcr-
Sensor ID : System Fan 2 (0x31)
Entity ID : 29.3 (Fan Device)
Sensor Type (Threshold) : Fan (0x04)
Sensor Reading : 22344 (+/- 0) RPM
Status : ok
Nominal Reading : 12936.000
Normal Minimum : 980.000
Normal Maximum : 24990.000
Lower critical : 392.000
Lower non-critical : 588.000
Positive Hysteresis : 196.000
Negative Hysteresis : 196.000
Minimum sensor range : Unspecified
Maximum sensor range : Unspecified
Event Message Control : Per-threshold
Readable Thresholds : lcr lnc
Settable Thresholds : lcr lnc
Threshold Read Mask : lcr lnc
Assertion Events :
Assertions Enabled : lnc- lcr-
Deassertions Enabled : lnc- lcr-
Sensor ID : System Fan 3 (0x32)
Entity ID : 29.4 (Fan Device)
Sensor Type (Threshold) : Fan (0x04)
Sensor Reading : 23912 (+/- 0) RPM
Status : ok
Nominal Reading : 12936.000
Normal Minimum : 980.000
Normal Maximum : 24990.000
Lower critical : 392.000
Lower non-critical : 588.000
Positive Hysteresis : 196.000
Negative Hysteresis : 196.000
Minimum sensor range : Unspecified
Maximum sensor range : Unspecified
Event Message Control : Per-threshold
Readable Thresholds : lcr lnc
Settable Thresholds : lcr lnc
Threshold Read Mask : lcr lnc
Assertion Events :
Assertions Enabled : lnc- lcr-
Deassertions Enabled : lnc- lcr-
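To see how close each fan is to its limit, the verbose output can be parsed and the reading compared with the Normal Maximum field. The Python sketch below does this for a shortened copy of the records above; the function names are my own, and treating "percent of Normal Maximum" as a measure of how hard the fan is running is just my interpretation, not something the BMC reports.

```python
SAMPLE = """\
Sensor ID : System Fan 1 (0x30)
Sensor Reading : 7252 (+/- 0) RPM
Normal Maximum : 24990.000
Sensor ID : System Fan 2 (0x31)
Sensor Reading : 22344 (+/- 0) RPM
Normal Maximum : 24990.000
"""


def parse_sdr_records(text):
    """Split 'ipmitool -v sdr' output into one dict per sensor.

    A new record starts at every 'Sensor ID' line; every other
    'key : value' line is added to the current record.
    """
    records, current = [], None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'key : value' line
        key, value = key.strip(), value.strip()
        if key == "Sensor ID":
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value
    return records


def percent_of_normal_max(record):
    """Return the sensor reading as a percentage of 'Normal Maximum'."""
    reading = float(record["Sensor Reading"].split()[0])
    return 100.0 * reading / float(record["Normal Maximum"])


if __name__ == "__main__":
    for record in parse_sdr_records(SAMPLE):
        print(f'{record["Sensor ID"]}: {percent_of_normal_max(record):.0f} % of normal maximum')
```

With the readings above, System Fan 1 sits at roughly 29 % of its normal maximum and System Fan 2 at roughly 89 %, which matches the noise.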