Thomas Lamprecht [Mon, 22 Nov 2021 19:15:29 +0000 (20:15 +0100)]
pvescheduler: make jobs tracking more flexible, rework stop
Avoid hard-coding the current implication of the replication stack to
not get started again until the old worker is done.
We still apply the same check, but changing that to let the jobs have
control is rather easy now.
Also rework the stop logic, send terminate to _all_ workers and make
the timeout an actual shared one (not first gets all, remaining get
kill) and send a kill to the stuck, leftover ones in one go at the
end, including some logging so that the admin can actually know about
this non-ideal situation.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
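The reworked stop logic described above can be sketched roughly as follows. This is a Python stand-in, not the Perl code from the commit: `stop_workers`, the `workers` mapping and the polling interval are hypothetical names chosen for illustration. It only shows the idea of one shared deadline for all workers, followed by a single kill pass with logging for whatever is left over.

```python
import logging
import subprocess
import time

log = logging.getLogger("scheduler-sketch")  # hypothetical logger name


def stop_workers(workers, timeout=5.0, poll_interval=0.1):
    """Stop all workers under ONE shared timeout.

    `workers` maps a worker name to a subprocess.Popen object.
    Returns the sorted names of workers that had to be killed.
    """
    # Send terminate to _all_ workers first, so they shut down in parallel.
    for proc in workers.values():
        if proc.poll() is None:
            proc.terminate()

    # The deadline is shared: no worker gets the whole timeout for itself.
    deadline = time.monotonic() + timeout
    pending = dict(workers)
    while pending and time.monotonic() < deadline:
        pending = {name: p for name, p in pending.items() if p.poll() is None}
        if pending:
            time.sleep(poll_interval)

    # Kill the stuck leftovers in one go and log it for the admin.
    stuck = sorted(name for name, p in pending.items() if p.poll() is None)
    if stuck:
        log.warning("workers still running after %.1fs, killing: %s",
                    timeout, ", ".join(stuck))
        for name in stuck:
            pending[name].kill()
        for name in stuck:
            pending[name].wait()
    return stuck
```

With a 5 second timeout and three workers, the whole stop takes at most about 5 seconds plus the kill pass, instead of the first slow worker using up the budget alone.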
Dominik Csapak [Thu, 18 Nov 2021 13:28:30 +0000 (14:28 +0100)]
pvescheduler: reworking child pid tracking
previously, systemd timers were responsible for running replication jobs.
those timers would not restart if the previous one is still running.
though trying again while it is running does no harm really, it spams
the log with errors about not being able to acquire the correct lock
to fix this, we rework the handling of child processes such that we only
start one per loop if there is currently none running. for that,
introduce the types of forks we do and allow one child process per type
(for now, we have 'jobs' and 'replication' as types)
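A minimal sketch of the "one child per type" idea, written in Python rather than the daemon's Perl. `ChildTracker` and `start_if_idle` are hypothetical names, and `subprocess.Popen` stands in for the fork done by the real scheduler; only the type names 'jobs' and 'replication' come from the commit message.

```python
import subprocess


class ChildTracker:
    """Track at most one running child process per fork type."""

    def __init__(self, types=("jobs", "replication")):
        # None means no child was started for that type yet.
        self.children = {kind: None for kind in types}

    def start_if_idle(self, kind, argv):
        """Start a child of `kind` unless the previous one is still running.

        Returns True if a new child was started, False if it was skipped.
        """
        proc = self.children[kind]
        if proc is not None and proc.poll() is None:
            # Old child still running: skip this loop iteration quietly
            # instead of starting a second one that fails to get the lock.
            return False
        self.children[kind] = subprocess.Popen(argv)
        return True
```

Calling `start_if_idle` once per scheduler loop and per type gives the behaviour described above: a long-running replication child blocks only new replication children, not the 'jobs' type.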
Dominik Csapak [Thu, 18 Nov 2021 13:28:29 +0000 (14:28 +0100)]
pvescheduler: catch errors in forked children
if '$sub' dies, the error handler of PVE::Daemon triggers, which
initiates a shutdown of the child, resulting in confusing error logs
(e.g. 'got shutdown request, signal running jobs to stop')
instead, run it under 'eval' and print the error to the syslog
Dominik Csapak [Wed, 17 Nov 2021 14:21:01 +0000 (15:21 +0100)]
api: backup: normalize 'dow' format when converting
the old web ui sends the days as separate parameters, which will
be concatenated by a null-byte in the api, causing the value to land
this way in the jobs.cfg
to fix this, split+join the list to get a well-formed dow list
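The split+join normalization can be illustrated with a short Python sketch. The actual fix lives in the Perl API code; `normalize_dow` and the exact separator set used here (null byte, comma, semicolon, whitespace) are assumptions made for the example.

```python
import re


def normalize_dow(raw):
    """Turn a 'dow' value with mixed separators into a comma-separated list."""
    # Split on null bytes (from concatenated parameters) and on the usual
    # list separators, dropping empty fragments.
    parts = [p for p in re.split(r"[\x00,;\s]+", raw) if p]
    # Join again to get a well-formed list.
    return ",".join(parts)
```

For example, `normalize_dow("mon\0wed\0fri")` yields `"mon,wed,fri"`, while an already well-formed value such as `"mon,tue"` passes through unchanged.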
Thomas Lamprecht [Tue, 16 Nov 2021 13:17:42 +0000 (14:17 +0100)]
ui: qemu: disk edit: drop label widths from advanced columns
this is a historical left over from the time when the bandwidth
limits weren't in their own, separate tab, as there we got quite
long labels and we synced the width up for the remaining fields to
avoid that it looks too much off.
Luckily not required anymore, so just drop it for non BW fields.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Fabian Ebner [Tue, 16 Nov 2021 13:08:22 +0000 (14:08 +0100)]
ui: ceph: osd: handle edge case with dead node
If there is a left-over entry for a dead node in the ceph osd tree
the panel wouldn't show and produce an
Uncaught TypeError: data.versions is undefined
because of an access
node.version = data.versions[node.name];
further below (not visible in the patch itself).
AFAICT, the same issue would also happen when something went wrong
with getting the broadcasted ceph-versions, or when a node is part
of Ceph, but not PVE.
Handle the situation gracefully by always initializing data.versions.
Thomas Lamprecht [Mon, 15 Nov 2021 09:33:05 +0000 (10:33 +0100)]
ui: qemu: disk edit: refactor to more declarative style using bindings
would technically require a versioned dependency bump to widget
toolkit as the `clearOnDisable` flag is new in 3.4-2, but this is
really only for slight UX improvement, so avoid the hard dependency
bump.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
For DB and WAL disks, not only partitions will show up now, but one
more type of disk, that didn't show up before: Namely, GPT-partitioned
disks with any partitions detected as used.
It's confusing as the size shown is of the full disk, with no
indication that a new partition will be appended at the end. This
problem was already present before, but only affected GPT-partitioned
disks where no usage on a partition was detected.
Fabian Ebner [Wed, 6 Oct 2021 09:18:49 +0000 (11:18 +0200)]
partially fix #2285: api: ceph: create osd: allow using partitions
Note that this does not only allow partitions to be used, but for DB
and WAL disks, one more type of disk, that wasn't allowed before.
Namely, GPT-partitioned disks with any partitions detected as used.
The reason is get_disks' behavior:
* Without $include_partitions=1, the disk will have the same usage
as its first used partition, and thus wasn't allowed. (Except in
the case that usage was LVM, where the check was bypassed, but
luckily OSD creation just failed later because no Ceph volume
group would be detected).
* With $include_partitions=1, the disk will have usage 'partitions'
and thus be allowed.
Dominik Csapak [Thu, 11 Nov 2021 11:07:07 +0000 (12:07 +0100)]
api: cluster: add jobs/schedule-analyze api call
a simple api call to simulate calendar event triggers
takes a schedule, an optional number (default 10), an optional starttime
(default 'now') and returns a list with unix timestamps, as well as
human-readable UTC timestamps.
Dominik Csapak [Mon, 25 Oct 2021 14:01:31 +0000 (16:01 +0200)]
api: cephfs: more checks on fs create
namely if the fs is already existing, and if there is currently a
standby mds that can be used for the new fs
previously, only one cephfs was possible, so these checks were not
necessary. now with pacific, it is possible to have multiple cephfs'
and we should check for those.
Dominik Csapak [Mon, 25 Oct 2021 14:01:29 +0000 (16:01 +0200)]
ui: ceph: catch missing version for service list
when a daemon is stopped, the version here is 'undefined'. catch that
instead of letting the template renderer run into an error.
this fixes the rendering of the grid backgrounds
Dominik Csapak [Mon, 25 Oct 2021 14:01:28 +0000 (16:01 +0200)]
api: ceph-mds: get mds state when multiple ceph filesystems exist
by iterating over all of them and saving the name to the active ones
this fixes the issue that an mds that is assigned to an fs other than
the first one in the list gets wrongly shown as offline
broadcast the built-in, statically available version info, e.g.:
{
"release" : "7.0",
"repoid" : "3ce05d40",
"version" : "7.0-14"
}
We can expand this by more actual package version info in the future,
but that certainly needs more elaborate update control mechanisms than
the oneshot at boot we have now.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>