07 May 2026

Proxmox: Removing a Ghost Node from the Web UI

This is a follow-on to Proxmox Cluster Going Sluggish? Your Offline Node Has a Stale Config. That post covers nodes that are misbehaving but still real. This one covers nodes that don't exist at all.


After sorting out a stale corosync config on a returning node, I noticed the web UI was still showing an extra node — PVE9 — on every host in the cluster. It wasn't causing any problems, just sitting there looking wrong. Here's how to get rid of it.

What a Ghost Node Is

A ghost node is a stale directory in /etc/pve/nodes/ with no corresponding corosync membership. The Proxmox web UI reads from the shared cluster filesystem, not from corosync directly — so a leftover directory shows up as a node even if the machine is long gone. It can appear after a node was removed uncleanly, rebuilt under a different name, or never properly decommissioned.
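
Concretely, here's the sort of layout the UI is reading (an illustrative sketch; actual contents vary by setup):

/etc/pve/nodes/
├── pve1/
│   ├── qemu-server/    # VM configs
│   ├── lxc/            # container configs
│   └── pve-ssl.pem
└── pve9/               # the ghost: directory present, no corosync member behind it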

Check It's Actually a Ghost

First confirm the node isn't just offline — check corosync:

pvecm status
Membership information
----------------------
    Nodeid      Votes Name
0x00000001          3 10.140.3.10
0x00000002          1 10.140.3.80
0x00000003          1 10.140.3.70
0x00000004          1 10.140.3.82
0x00000006          1 10.140.3.20

If it's not in this list, it's a ghost. Confirm the directory exists:

grep pve9 /etc/pve/corosync.conf   # nothing
ls /etc/pve/nodes/
# pve1  pve2  pve7  pve8  pve9  xenon
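
To sweep for ghosts in one pass, compare the directory listing against corosync's nodelist. A minimal sketch, assuming each node block in corosync.conf carries a name: line (the Proxmox default) and that directory names match node names:

# Flag any node directory with no matching "name:" entry in the corosync nodelist
for n in /etc/pve/nodes/*/; do
    name=$(basename "$n")
    grep -q "name: $name\$" /etc/pve/corosync.conf || echo "possible ghost: $name"
done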

Check for VMs Before Deleting

The node directory may contain VM or container configs:

ls /etc/pve/nodes/pve9/qemu-server/
ls /etc/pve/nodes/pve9/lxc/

If there are configs, check them before doing anything:

cat /etc/pve/nodes/pve9/qemu-server/102.conf

If the node is truly gone, any disks listed as local-lvm:vm-XXX-disk-Y were on that node's local storage and are already inaccessible. You won't be able to recover them. Make sure you're happy with that before proceeding.
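
If you want a record of what's being abandoned, dump the disk lines from every config first. A rough sketch; the prefixes below cover the common disk keys, not every possible one:

# Print every disk reference in the ghost's VM and container configs
grep -HE '^(scsi|virtio|ide|sata|efidisk|rootfs|mp)[0-9]*:' \
    /etc/pve/nodes/pve9/qemu-server/*.conf \
    /etc/pve/nodes/pve9/lxc/*.conf 2>/dev/null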

Remove It

Try the proper route first:

pvecm delnode pve9

If the node was never in corosync, you'll get:

Node/IP: pve9 is not a known host of the cluster.

In that case, remove the directory directly:

rm -rf /etc/pve/nodes/pve9
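
If you'd like a safety net, the configs are small plain-text files, so archiving them before the rm costs nothing (the archive path here is illustrative):

# Keep a copy of the ghost's configs outside /etc/pve before deleting
tar czf /root/pve9-node-backup.tar.gz -C /etc/pve/nodes pve9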

The deletion replicates across the cluster filesystem immediately. Verify:

ls /etc/pve/nodes/
# pve1  pve2  pve7  pve8  xenon

Reload the web UI — the ghost node is gone.

Quick Reference

Command                                   Purpose
pvecm status                              Confirm node isn't in corosync
ls /etc/pve/nodes/                        List all node directories
ls /etc/pve/nodes/<name>/qemu-server/     Check for VM configs
ls /etc/pve/nodes/<name>/lxc/             Check for container configs
pvecm delnode <name>                      Proper removal (works if node was in corosync)
rm -rf /etc/pve/nodes/<name>              Manual removal for true ghost nodes

Proxmox Cluster Going Sluggish? Your Offline Node Has a Stale Config

You power on a node that's been offline for a while. Within seconds, the Proxmox web UI starts showing other nodes as dead. Management operations slow to a crawl. Nothing is obviously broken — all the nodes are still pinging — but something is clearly very wrong.

This is the stale corosync config problem, and it's easy to fix once you know what to look for.

What's Happening

Proxmox uses corosync to manage cluster membership. Every config change — adding a node, removing a node, changing votes — increments a config_version in /etc/corosync/corosync.conf. All cluster members must agree on this version.

When a node comes back online after missing several config changes, corosync starts up with an old config_version. The other nodes reject its packets. But corosync doesn't give up — it keeps retrying, flooding the network with rejected authentication attempts. This hammers pvedaemon on every node, causing the web UI to become sluggish and show phantom "dead" nodes even though the cluster itself is technically still quorate.
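
For reference, config_version lives in the totem block of corosync.conf. A trimmed excerpt (other keys omitted):

totem {
  cluster_name: pve
  config_version: 19   # every member must agree on this number
  version: 2
  # interface, crypto and link settings omitted
}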

Diagnosis

First, confirm the cluster itself is still healthy from a node you trust:

pvecm status

If you see Quorate: Yes, the cluster is fine — the problem is the misbehaving node, not a genuine quorum loss. Note the Config Version value.

Then SSH into the suspect node and check:

ssh pve7 systemctl status corosync
ssh pve7 grep config_version /etc/corosync/corosync.conf
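
If you're not sure which node is stale, sweep them all. A sketch assuming the inter-node SSH key trust a Proxmox cluster normally has; the hostnames are illustrative:

# Print each node's config_version side by side; the odd one out is the culprit
for h in pve1 pve2 pve7 pve8; do
    printf '%s: ' "$h"
    ssh "$h" grep config_version /etc/corosync/corosync.conf
done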

Here's what it looks like when you've found the culprit:

● corosync.service - Corosync Cluster Engine
     Active: active (running) since Thu 2026-05-07 17:30:55 BST; 8min ago
   Main PID: 1146 (corosync)
     Memory: 155.9M (peak: 171.9M)
        CPU: 11.549s

May 07 17:39:36 pve7 corosync[1146]:   [KNET  ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:37 pve7 corosync[1146]:   [KNET  ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:38 pve7 corosync[1146]:   [KNET  ] rx: Packet rejected from 10.140.3.80:5405
May 07 17:39:43 pve7 corosync[1146]:   [QUORUM] Sync members[1]: 3
May 07 17:39:43 pve7 corosync[1146]:   [TOTEM ] A new membership (3.1a3c) was formed. Members
May 07 17:39:43 pve7 corosync[1146]:   [QUORUM] Members[1]: 3
May 07 17:39:43 pve7 corosync[1146]:   [MAIN  ] Completed service synchronization, ready to provide service.

  config_version: 10

Three red flags:
- "Packet rejected" — every node is refusing this node's traffic
- "Sync members[1]: 3" — the node has formed its own single-node pseudo-cluster with just itself (nodeid 3)
- config_version: 10 while the live cluster is at version 19 or higher
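
To gauge the scale of the flood, count the rejections in the journal (a quick sketch; the time window is arbitrary):

# Count rejected packets logged by corosync in the last ten minutes
journalctl -u corosync --since "10 minutes ago" | grep -c "Packet rejected"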

Fix

Step 1 — Stop corosync on the problem node

systemctl stop corosync

Verify it stopped:

systemctl status corosync

Expected output:

○ corosync.service - Corosync Cluster Engine
     Active: inactive (dead) since Thu 2026-05-07 17:40:45 BST; 48s ago

The web UI should recover almost immediately once the flood of rejected packets stops.

Step 2 — Check the corosync directory

ls -la /etc/corosync/

You'll likely see an authkey from the node's prior cluster membership:

drwxr-xr-x  3 root root 4096 Apr 25 17:21 .
-rw-r--r--  1 root root  256 Apr 25 16:14 authkey
-rw-r--r--  1 root root  639 Apr 25 17:21 corosync.conf

Don't manually delete it — pvecm add --force will handle it cleanly.

Step 3 — Rejoin the cluster

Run this from /tmp (pvecm refuses to run from inside /etc/pve/):

cd /tmp && pvecm add <lead-node-ip> --use_ssh --force

For example:

cd /tmp && pvecm add 10.140.3.10 --use_ssh --force

  • --use_ssh — uses existing SSH key trust instead of the API password prompt
  • --force — overrides warnings about existing config, authkey, and VMs (all expected for a rejoin)

You'll see output like:

detected the following error(s):
* authentication key '/etc/corosync/authkey' already exists
* cluster config '/etc/pve/corosync.conf' already exists
* this host already contains virtual guests

WARNING : detected error but forced to continue!

copy corosync auth key
stopping pve-cluster service
backup old database to '/var/lib/pve-cluster/backup/config-1778172132.sql.gz'
waiting for quorum...OK
(re)generate node files
generate new node certificate
merge authorized SSH keys
generated new node certificate, restart pveproxy and pvedaemon services
successfully added node 'pve7' to cluster.

Step 4 — Verify

pvecm status

Healthy output looks like:

Cluster information
-------------------
Name:             pve
Config Version:   20
Transport:        knet
Secure auth:      on

Quorum information
------------------
Nodes:            5
Quorate:          Yes

Votequorum information
----------------------
Expected votes:   7
Total votes:      7
Quorum:           4
Flags:            Quorate

Membership information
----------------------
    Nodeid      Votes Name
0x00000001          3 10.140.3.10
0x00000002          1 10.140.3.80
0x00000003          1 10.140.3.70 (local)
0x00000004          1 10.140.3.82
0x00000006          1 10.140.3.20

All nodes present, Quorate: Yes, and config_version incremented by 2 during the rejoin. The join itself changes cluster membership, and each change bumps the version, so a jump of more than one is expected.

Why This Keeps Happening

Corosync is enabled by default and starts automatically on boot. If a node has been offline long enough to miss cluster config changes, it will always boot into this broken state. The node isn't malfunctioning — it's doing exactly what it's designed to do with the config it has. It's just that the config is stale.

Prevention: Before powering on a long-offline Proxmox node, check your current cluster's config_version with pvecm status. If it's significantly ahead of what the returning node last knew, plan for a rejoin rather than assuming it'll come back cleanly.
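
For example (a sketch; how you reach the offline node's old config depends on your setup):

# On any healthy node: note the live cluster's version
pvecm status | grep -i 'config version'

# On the returning node, booted with networking unplugged or with its disk
# mounted elsewhere: see what it last knew
grep config_version /etc/corosync/corosync.conf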

Quick Reference

Command                                            Purpose
pvecm status                                       Check cluster health and config version
systemctl status corosync                          Check corosync state on a node
grep config_version /etc/corosync/corosync.conf    Check node's config version
systemctl stop corosync                            Stop the misbehaving corosync
cd /tmp && pvecm add <ip> --use_ssh --force        Rejoin the cluster