limit tasklist to the maximal pmxcfs status entry size
We tried to limit the size of the tasklist by including non-running
tasks only if we have fewer than 25 entries. One reason, among others,
was that a single status entry in the cfs_status.kvhash is limited to
32 KiB.
The "max. 25 entries" heuristic assumes that entries are small, which
is also the norm. But the entry of a failed task, e.g. for a Qemu VM
with a problematic command line, is far longer than the usual task
entry. This led to a situation where the last 25 tasks together were
bigger than 32 KiB, so the ipcc call to the pmxcfs failed with EFBIG.
This then aborted every new task started with fork_worker, and could
render a node partially unusable until "/var/log/pve/tasks/active"
got truncated.
You should soon see an "ipcc_send_rec failed: File too large" error.
After this, all new tasks fail, even if they could succeed. pvestatd
also fails to broadcast the tasklist now. To get out of this do:
To address this, check the length of the serialized list and remove
elements from its end until it no longer exceeds the size limit.
Currently running tasks and chronologically newer ones get
prioritized.
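
The trimming described above can be sketched roughly as follows. This is
an illustrative Python sketch only, not the code from this patch: the
function name trim_tasklist, the JSON serialization, and the "upid" and
"status" fields are assumptions made for the example. It assumes the
list is already ordered with running tasks first and finished tasks
newest first, so removing from the end drops the oldest finished tasks.

```python
import json

# 32 KiB per status entry, as stated in this commit message.
MAX_STATUS_ENTRY_BYTES = 32 * 1024


def trim_tasklist(tasks, limit=MAX_STATUS_ENTRY_BYTES):
    """Drop entries from the end until the serialized list fits `limit`.

    Hypothetical helper: `tasks` must already be sorted by priority
    (running tasks first, then finished tasks from newest to oldest).
    Returns the kept entries and their serialized form as bytes.
    """
    kept = list(tasks)
    payload = json.dumps(kept).encode("utf-8")
    # Measure the serialized size, not the number of entries: one long
    # failed-task entry can outweigh many ordinary ones.
    while kept and len(payload) > limit:
        kept.pop()  # lowest priority entry sits at the end
        payload = json.dumps(kept).encode("utf-8")
    return kept, payload


# One running task followed by 24 oversized failed-task entries.
tasks = [{"upid": "running-task", "status": None}]
tasks += [{"upid": f"failed-{i}", "status": "x" * 4000} for i in range(24)]

kept, payload = trim_tasklist(tasks)
print(len(tasks), "entries reduced to", len(kept), "using", len(payload), "bytes")
```

Re-serializing after each removal is quadratic in the worst case, but
with at most a few dozen entries that cost is negligible, and it keeps
the size check exact.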
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>