Losing mains power is risky for a Kubernetes cluster, and even more so when all drives on the nodes are provided via iSCSI. In my setup the UPS is connected to a Synology NAS via USB cable, so the NAS knows when something is wrong with mains power and can shut itself down gracefully. But what should happen to the RPi cluster that is running on iSCSI volumes? The answer is simple: Network UPS Tools (NUT).

Server side

Synology NAS can enable a network UPS server (acting as master). In addition, all RPi nodes have to be added to “Permitted DiskStation devices”.

Client side

On every RPi the NUT package has to be installed and configured in client (slave) mode.

apt-get install nut -y

vi /etc/nut/nut.conf
MODE=netclient

vi /etc/nut/upsmon.conf
MONITOR ups@ 1 monuser secret slave

service nut-client restart
service nut-client status
● nut-monitor.service - Network UPS Tools - power device monitor and shutdown controller
   Loaded: loaded (/lib/systemd/system/nut-monitor.service; enabled; vendor preset: enabled)
   Active: active (running) since Mon 2021-07-19 09:59:28 EEST; 57s ago
  Process: 3961 ExecStart=/sbin/upsmon (code=exited, status=0/SUCCESS)
 Main PID: 3963 (upsmon)
    Tasks: 2 (limit: 4915)
   Memory: 1.0M
   CGroup: /system.slice/nut-monitor.service
           ├─3962 /lib/nut/upsmon
           └─3963 /lib/nut/upsmon
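The same client-side edits can also be scripted instead of being made interactively in vi, which helps when there are several RPi nodes. This is a minimal sketch, assuming the Debian/Raspberry Pi OS layout under /etc/nut and the Synology values used above (UPS name ups, user monuser, password secret). The NAS address is passed as an argument; the function name configure_nut_client and the NUT_DIR override are hypothetical helpers for illustration. Note that it overwrites nut.conf rather than editing it in place.

```shell
# Sketch only: source this file, then call configure_nut_client <nas-host>.
# Assumes the Debian/Raspberry Pi OS layout and the Synology defaults
# (UPS "ups", user "monuser", password "secret").

# Directory holding the NUT configs; hypothetical override for safe testing.
NUT_DIR="${NUT_DIR:-/etc/nut}"

configure_nut_client() {
    nas_host="$1"   # IP or hostname of the Synology NAS (your own value)

    # Client (slave) mode: only upsmon runs, no local driver or upsd.
    # This replaces the whole file, including its comments.
    printf 'MODE=netclient\n' > "$NUT_DIR/nut.conf"

    monitor_line="MONITOR ups@${nas_host} 1 monuser secret slave"

    # Append the MONITOR line only if the exact line is not present yet,
    # so running the function twice does not duplicate it.
    if ! grep -qxF "$monitor_line" "$NUT_DIR/upsmon.conf" 2>/dev/null; then
        printf '%s\n' "$monitor_line" >> "$NUT_DIR/upsmon.conf"
    fi
}
```

For example, run `configure_nut_client 192.0.2.10` as root after sourcing the file (192.0.2.10 stands in for your NAS address), then restart nut-client as shown above.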

With the help of the upsc command you can confirm that the UPS server returns the required data about the UPS status. The main values are battery.charge (%), battery.charge.low (%), battery.runtime (sec), battery.runtime.low (sec) and ups.status (OL = online):

upsc ups@
Init SSL without certificate database
battery.charge: 100
battery.charge.low: 10
battery.charge.warning: 50 2021/08/15 2021/08/15
battery.runtime: 3007
battery.runtime.low: 120
battery.temperature: 29.2
battery.type: PbAc
battery.voltage: 13.5
battery.voltage.nominal: 12.0
device.mfr: American Power Conversion
device.model: Back-UPS CS 350
device.serial: 4B0841P06334
device.type: ups usbhid-ups
driver.parameter.pollfreq: 30
driver.parameter.pollinterval: 5
driver.parameter.port: auto
driver.version: SDS6-0-8700-factory-repack-8700-170413 APC HID 0.95
driver.version.internal: 0.38
input.sensitivity: low
input.transfer.high: 278
input.transfer.low: 160
input.voltage: 234.0
input.voltage.nominal: 230
output.frequency: 50.0
output.voltage: 230.0
output.voltage.nominal: 230.0
ups.beeper.status: enabled
ups.delay.shutdown: 20
ups.delay.start: 30
ups.firmware: 807.q8.I
ups.firmware.aux: q8
ups.load: 16.0
ups.mfr: American Power Conversion 2008/10/07
ups.model: Back-UPS CS 350
ups.productid: 0002
ups.realpower.nominal: 210
ups.serial: 4B0841P06334
ups.status: OL
ups.test.result: No test initiated
ups.timer.reboot: 0
ups.timer.shutdown: -1
ups.timer.start: 0
ups.vendorid: 051d
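To keep an eye on just those main values, the upsc output can be filtered. This is a minimal sketch, assuming upsc prints one `name: value` pair per line as in the output above; ups_main_values is a hypothetical helper name.

```shell
# Print only the main UPS values from upsc output read on stdin,
# as name=value pairs in the order upsc lists them.
ups_main_values() {
    awk -F': ' '
        $1 == "battery.charge"      ||
        $1 == "battery.charge.low"  ||
        $1 == "battery.runtime"     ||
        $1 == "battery.runtime.low" ||
        $1 == "ups.status" { print $1 "=" $2 }
    '
}
```

Usage: `upsc ups@<nas-host> | ups_main_values`, where `<nas-host>` is your NAS. If only one value is needed, upsc also accepts a variable name as a second argument, for example `upsc ups@<nas-host> ups.status`.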

How it works

When the UPS goes critical, i.e. it is on battery and has reached low battery, or a forced shutdown (“FSD”) is signalled, the slaves are supposed to disconnect and shut down right away. Once all slaves have shut down, the master goes down as well.
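The critical condition described above can be expressed as a small check on the ups.status string. This is a simplified sketch, not upsmon's actual implementation: it only looks for the FSD flag, or the OB (on battery) and LB (low battery) flags together, and ignores upsmon's additional rules (for example around losing communication while on battery). ups_is_critical is a hypothetical name.

```shell
# Simplified check: succeed if a ups.status value such as "OB LB" or
# "FSD OB LB" means the UPS is critical, fail otherwise.
ups_is_critical() {
    padded=" $1 "   # pad so every flag is surrounded by spaces
    case "$padded" in
        *" FSD "*) return 0 ;;          # forced shutdown requested
    esac
    case "$padded" in
        *" OB "*)                       # on battery...
            case "$padded" in
                *" LB "*) return 0 ;;   # ...and low battery
            esac
            ;;
    esac
    return 1                            # e.g. "OL", "OB DISCHRG", "OL CHRG LB"
}
```

Matching whole space-separated flags keeps "OL CHRG LB" (low battery but still online) from being treated as critical.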

To test the shutdown, FSD can be triggered on a slave (only that slave goes down) or on the master (all slaves go down) with the command:

upsmon -c fsd

In the logs it looks like this:

Aug 28 13:52:21 p1 upsmon[599]: UPS ups@ forced shutdown in progress
Aug 28 13:52:21 p1 upsmon[599]: Executing automatic power-fail shutdown
Aug 28 13:52:21 p1 upsmon[599]: Auto logout and shutdown proceeding
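After a test FSD it can be handy to confirm, once the node is back up, that its last shutdown really came from upsmon. This is a minimal sketch that looks for two of the messages shown in the log excerpt above; upsmon_fsd_logged is a hypothetical helper name.

```shell
# Succeed if the log text on stdin contains the upsmon messages that
# accompany a forced (FSD) shutdown, as in the excerpt above.
upsmon_fsd_logged() {
    grep -q -e 'forced shutdown in progress' \
            -e 'Executing automatic power-fail shutdown'
}
```

For example: `journalctl -b -1 -t upsmon | upsmon_fsd_logged && echo "previous shutdown was triggered by upsmon"`. Reading the previous boot with `-b -1` only works if the journal is persistent on that node; where rsyslog is installed, `/var/log/syslog` can be piped in instead.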

My own notes