Proxmox Cluster Going Sluggish? Your Offline Node Has a Stale Config
You power on a node that's been offline for a while. Within seconds, the Proxmox web UI starts showing other nodes as dead. Management operations slow to a crawl. Nothing is obviously broken — all the nodes are still pinging — but something is clearly very wrong.
This is the stale corosync config problem, and it's easy to fix once you know what to look for.
What's Happening
Proxmox uses corosync to manage cluster membership. Every config change — adding a node, removing a node, changing votes — increments a config_version in /etc/corosync/corosync.conf. All cluster members must agree on this version.
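For reference, the version lives in the totem block of that file; a trimmed example, with illustrative values:

totem {
  cluster_name: pve
  config_version: 19
  ip_version: ipv4-6
  secauth: on
  version: 2
}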
When a node comes back online after missing several config changes, corosync starts up with an old config_version. The other nodes reject its packets. But corosync doesn't give up — it keeps retrying, flooding the network with rejected authentication attempts. This hammers pvedaemon on every node, causing the web UI to become sluggish and show phantom "dead" nodes even though the cluster itself is technically still quorate.
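You can watch this happening in real time from the returning node's journal (expect a steady stream of rejection messages, roughly one per second):

journalctl -u corosync -f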
Diagnosis
First, confirm the cluster itself is still healthy from a node you trust:
pvecm status
If you see Quorate: Yes, the cluster is fine — the problem is the misbehaving node, not a genuine quorum loss. Note the Config Version value.
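The lines to note from the healthy node look like this (version number illustrative, matching the scenario below):

Name: pve
Config Version: 19
Quorate: Yes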
Then SSH into the suspect node and check:
ssh pve7 systemctl status corosync
ssh pve7 grep config_version /etc/corosync/corosync.conf
Here's what it looks like when you've found the culprit:
● corosync.service - Corosync Cluster Engine
Active: active (running) since Thu 2026-05-07 17:30:55 BST; 8min ago
Main PID: 1146 (corosync)
Memory: 155.9M (peak: 171.9M)
CPU: 11.549s
May 07 17:39:36 pve7 corosync[1146]: [KNET ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:37 pve7 corosync[1146]: [KNET ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:38 pve7 corosync[1146]: [KNET ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:43 pve7 corosync[1146]: [QUORUM] Sync members[1]: 3
May 07 17:39:43 pve7 corosync[1146]: [TOTEM ] A new membership (3.1a3c) was formed. Members
May 07 17:39:43 pve7 corosync[1146]: [QUORUM] Members[1]: 3
May 07 17:39:43 pve7 corosync[1146]: [MAIN ] Completed service synchronization, ready to provide service.
config_version: 10
Three red flags:
- "Packet rejected" — every node is refusing this node's traffic
- "Sync members[1]: 3" — the node has formed its own single-node pseudo-cluster with just itself (nodeid 3)
- config_version: 10 while the live cluster is at version 19 or higher
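For completeness, running pvecm status on the stale node shows the same split from its side; expect an excerpt along these lines (illustrative):

Nodes: 1
Quorate: No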
Fix
Step 1 — Stop corosync on the problem node
systemctl stop corosync
Verify it stopped:
systemctl status corosync
Expected output:
○ corosync.service - Corosync Cluster Engine
Active: inactive (dead) since Thu 2026-05-07 17:40:45 BST; 48s ago
The web UI should recover almost immediately once the flood of rejected packets stops.
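If you'd rather confirm the flood has stopped than take the UI's word for it, count recent rejections on one of the healthy nodes (the count should be zero, or at least no longer climbing):

journalctl -u corosync --since "2 minutes ago" | grep -c "Packet rejected"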
Step 2 — Check the corosync directory
ls -la /etc/corosync/
You'll likely see an authkey from the node's prior cluster membership:
drwxr-xr-x 3 root root 4096 Apr 25 17:21 .
-rw-r--r-- 1 root root 256 Apr 25 16:14 authkey
-rw-r--r-- 1 root root 639 Apr 25 17:21 corosync.conf
Don't manually delete it — pvecm add --force will handle it cleanly.
Step 3 — Rejoin the cluster
Run this from /tmp (pvecm refuses to run from inside /etc/pve/, which gets unmounted and remounted when pve-cluster restarts during the join):
cd /tmp && pvecm add <lead-node-ip> --use_ssh --force
For example:
cd /tmp && pvecm add 10.140.3.10 --use_ssh --force
--use_ssh — uses existing SSH key trust instead of the API password prompt
--force — overrides warnings about existing config, authkey, and VMs (all expected for a rejoin)
You'll see output like:
detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests
WARNING : detected error but forced to continue!
copy corosync auth key
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1778172132.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve7' to cluster.
Step 4 — Verify
pvecm status
Healthy output looks like:
Cluster information
-------------------
Name: pve
Config Version: 20
Transport: knet
Secure auth: on
Quorum information
------------------
Nodes: 5
Quorate: Yes
Votequorum information
----------------------
Expected votes: 7
Total votes: 7
Quorum: 4
Flags: Quorate
Membership information
----------------------
Nodeid Votes Name
0x00000001 3 10.140.3.10
0x00000002 1 10.140.3.80
0x00000003 1 10.140.3.70 (local)
0x00000004 1 10.140.3.82
0x00000006 1 10.140.3.20
All nodes present, Quorate: Yes, and the Config Version has ticked up past where the live cluster sat before the join (pvecm rewrites corosync.conf and bumps the version as part of the handshake, which is expected).
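As a final sanity check, you can confirm every node now agrees on the version. The hostnames below are placeholders for your own:

for n in pve1 pve2 pve7; do
  echo -n "$n: "
  ssh "$n" grep config_version /etc/corosync/corosync.conf
done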
Why This Keeps Happening
Corosync is enabled by default and starts automatically on boot. If a node has been offline long enough to miss cluster config changes, it will always boot into this broken state. The node isn't malfunctioning — it's doing exactly what it's designed to do with the config it has. It's just that the config is stale.
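You can confirm this on any node:

systemctl is-enabled corosync
enabled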
Prevention: Before powering on a long-offline Proxmox node, check your current cluster's config_version with pvecm status. If it's significantly ahead of what the returning node last knew, plan for a rejoin rather than assuming it'll come back cleanly.
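If the node is already powered on, a minimal comparison sketch (assumes root SSH to a healthy node; pve1 is a placeholder, and stop corosync on the returning node first if it has already started):

# run on the returning node; 'pve1' stands in for any healthy cluster node
live=$(ssh pve1 grep config_version /etc/corosync/corosync.conf | awk '{print $2}')
stale=$(awk '/config_version/ {print $2}' /etc/corosync/corosync.conf)
if [ "$live" = "$stale" ]; then
  echo "versions match ($stale): node should rejoin cleanly"
else
  echo "stale config ($stale vs live $live): plan a pvecm rejoin"
fi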
Quick Reference
| Command | Purpose |
| --- | --- |
| pvecm status | Check cluster health and config version |
| systemctl status corosync | Check corosync state on a node |
| grep config_version /etc/corosync/corosync.conf | Check the node's config version |
| systemctl stop corosync | Stop the misbehaving corosync |
| cd /tmp && pvecm add <ip> --use_ssh --force | Rejoin the cluster |