snapshot: replace global sync with a namespace sync
snapshot_create() did a global 'sync' after freeze()
which syncs everything including all other containers and
the host. So if you want to snapshot container A while
container B tries to write to a broken NFS mount the
snapshot will hang in that sync call.
Instead we now enter the container's mount namespace and do
a syncfs() on all of its mountpoints.
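A minimal sketch of the per-filesystem sync described above, assuming a Linux host whose libc exports syncfs(2). The helper names are hypothetical, and the step of entering the container's mount namespace is left out; this only shows how syncing individual mountpoints differs from a global sync.

```python
import ctypes
import os

# Handle to the already-loaded C library; syncfs(2) is Linux-specific.
_libc = ctypes.CDLL(None, use_errno=True)


def syncfs_path(path):
    """Flush only the filesystem that contains `path` (hypothetical helper)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        if _libc.syncfs(fd) != 0:
            err = ctypes.get_errno()
            raise OSError(err, os.strerror(err), path)
    finally:
        os.close(fd)


def sync_mountpoints(mountpoints):
    """Call syncfs() once per mountpoint and return the paths that were synced."""
    synced = []
    for mp in mountpoints:
        syncfs_path(mp)
        synced.append(mp)
    return synced
```

A hung filesystem that is not in the list is never touched, which is the property the commit relies on.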
quotactl(2) requires a path to the device node to work, which
means we need to expose it to the container. Luckily it
doesn't need r/w access to the device. Also, loop devices
will no longer detach from the images, since they are
still mounted in the monitor's mount namespace (which is
unshared from the host to prevent accidental unmounts via
lxc.monitor.unshare).
Note that quota manipulation currently does not work with
unprivileged containers.
Set unfreeze before trying to freeze, otherwise an aborted
or failed lxc-freeze will not be reversed by our error
handling, leaving the container in a (partially) frozen
state.
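The ordering described above can be sketched as follows. This is an illustration only: the function and its callback parameters are hypothetical and stand in for the real freeze, unfreeze and snapshot steps.

```python
def snapshot_while_frozen(freeze, unfreeze, take_snapshot):
    """Run take_snapshot() on a frozen container and always undo the freeze."""
    unfreeze_needed = False
    try:
        # Set the flag *before* freezing: if freeze() is aborted or fails
        # half-way, the cleanup below still thaws the container.
        unfreeze_needed = True
        freeze()
        take_snapshot()
    finally:
        if unfreeze_needed:
            unfreeze()
```

Had the flag been set only after freeze() returned, an exception raised inside freeze() would skip the unfreeze step.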
Make snapshot_create failure handling resemble the
QemuServer codebase more closely and prepare for future code
convergence:
* use $drivehash parameter in snapshot_delete to bypass
check_lock() and delete config lock
* call $snapshot_commit last, it's only needed now if
there were no errors
Since VZDump was the only user of lock_aquire and
lock_release, and does not actually need this split,
we can drop lock_aquire and lock_release.
Since lock_file_full in PVE::Tools now uses the same
refcounting implementation that lock_aquire/release
had, lock_container can simply wrap lock_file_full.
Dominik Csapak [Thu, 4 Feb 2016 12:40:15 +0000 (13:40 +0100)]
improve mountpoint parsing
changes from v1:
renamed function to verify_*
added check for ../ at the beginning
cleaned up regex (\.)? -> \.?
currently we sanitize mountpoints with sanitize_mountpoint, which
tries to remove dots, double-dots and multiple slashes, but it does not
do so correctly (e.g. /test/././ gets truncated to /test./ )
instead of trying to truncate the path, we create a format for mp strings
which throws an error if /./ or /../ exist (also /. and /.. at the end or
../ at the beginning) since there should be no valid use for these in
mountpoint paths anyway
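A rough sketch of such a rejecting check, written in Python for illustration. The regex and the function name are assumptions and not the pattern used in the patch; they only encode the rules listed above.

```python
import re

# Hypothetical pattern: a leading "../", or a "." / ".." path component
# either in the middle of the path or at its end.
_BAD_COMPONENT = re.compile(r'(?:^\.\./|/\.\.?(?:/|$))')


def verify_mountpoint(path):
    """Return `path` unchanged, or raise if it contains dot components."""
    if _BAD_COMPONENT.search(path):
        raise ValueError(f"invalid mountpoint path: {path!r}")
    return path
```

Names that merely start with a dot, such as /mnt/.cache, still pass, because the dot must be a whole path component to be rejected.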
with the new behaviour, we don't need sanitize_mountpoint anymore:
Since lxc.autodev defaults to 1, LXC will mount /dev as
tmpfs and populate it. The removed code was unnecessary,
since the device node was not accessible in the container
anyway. A /dev mountpoint is mounted into the rootfs and
accessible under its mountpoint, even if there is no
associated /dev node in the container.
To make matters worse, there was no cleanup for this device
node, which made all but the first boot of containers with
a configured /dev mountpoint fail until the host itself was
rebooted.
Since the memory cgroup has a memory and a "total" value,
you have to set them in a working order, depending on whether
you're increasing or decreasing the values. (E.g. you
can't reduce the total amount to less than the swap limit
or grow the swap limit to more than the total one.)
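The ordering constraint can be sketched like this. It assumes the cgroup v1 file names memory.limit_in_bytes and memory.memsw.limit_in_bytes (the commit message does not name them) and the rule that the memory limit may never exceed the total at any point; the function itself is hypothetical.

```python
MEM_KEY = "memory.limit_in_bytes"          # memory limit
TOTAL_KEY = "memory.memsw.limit_in_bytes"  # memory + swap ("total") limit


def ordered_limit_writes(old_mem, new_mem, new_total):
    """Return the two (key, value) writes in an order the kernel accepts."""
    if new_mem > new_total:
        raise ValueError("memory limit must not exceed the total limit")
    mem = (MEM_KEY, new_mem)
    total = (TOTAL_KEY, new_total)
    if new_total >= old_mem:
        # Growing (or total still covers the old memory limit):
        # raise the total first, then the memory limit.
        return [total, mem]
    # Shrinking below the old memory limit: lower memory first.
    return [mem, total]
```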
Like with qemu the root user can use -skiplock with 'pct
start' and 'pct stop'.
This does not alter the container's lxc config, instead we
pass PVE_SKIPLOCK=1 via the environment which will be seen
from the prestart hook but not from inside the container.
At this point the underlying file has already been
successfully resized, which means it makes sense to reflect
that change in the config, but the guest will not see the
effect of it. However, a subsequent resize command will
further increase the size relative to the 'new' size, so
after such an error the best option is to manually deal with
the error and perform the necessary resize steps.
Fix #881: uninitialized value on valid lxc.cgroup keys
We have no lxc.cgroup.* keys in $valid_lxc_conf_keys so they
and unknown keys showed an uninitialized value warning for
the new 'eq' operation.
This also avoids the second hash access.
Correctly update parent relations in config file upon snapshot removal.
Previously, only the parent of the current state was updated/removed,
which led to broken parent relations if any snapshot other than the
immediate parent of the current snapshot was removed. To fix this,
the parent relations of all child snapshots of the removed snapshot
are updated/removed as well.
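A small sketch of the re-parenting step, assuming a config shaped like a dict with an optional top-level "parent" key for the current state and a "snapshots" mapping. That layout and the function name are illustrative assumptions, not the actual implementation.

```python
def remove_snapshot(conf, snapname):
    """Drop `snapname` and re-link everything that pointed at it."""
    snapshots = conf["snapshots"]
    removed = snapshots.pop(snapname)
    new_parent = removed.get("parent")  # None if it was the oldest snapshot

    def reparent(entry):
        if entry.get("parent") == snapname:
            if new_parent is None:
                del entry["parent"]
            else:
                entry["parent"] = new_parent

    reparent(conf)  # the current state
    for snap in snapshots.values():
        reparent(snap)  # every remaining snapshot, not just the newest
    return conf
```

Removing a snapshot from the middle of the chain then leaves its children pointing at its former parent.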
Based on code in qemu-server/PVE/QemuServer.pm and parts
of a patch by Gerrit Venema <gmoniker at gmail.com>
Instead of holding the flock for the whole backup operation,
release it at the end of prepare(), and use
lock_container() to remove a potential 'backup' lock
from the config file when the backup is finished.
Wolfgang Link [Fri, 15 Jan 2016 06:25:08 +0000 (07:25 +0100)]
Add mp to required in pct set mount-point.
If mp is not set you get a warning about an empty variable without any real information.
And when you try to start the container, it will fail to start without an explanation.
$comp is a command string and needs to be split. The set of
possible commands is limited and known so splitting by
/\s+/ (as suggested by Marc Cousin) should be safe enough.
* Detection via /etc/SuSE-brand
* Currently only supporting version 13.1 (This apparently
ships no systemd-networkd and has no wicked yet.)
* Introduced ct_modify_file_head_portion: Both Redhat and
SuSE have separate route files for network interfaces, but
with different formats. For consistency the SuSE code also
only changes routes between the BEGIN/END PVE comment lines.
This version also fixes a bug where the route file got
deleted instead of left untouched when no changes were made
(now caught by a testcase).
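To illustrate the idea of only touching a managed block at the head of a file, here is a sketch that works on strings. The marker text and the function name are assumptions; the real ct_modify_file_head_portion may differ in signature and behavior.

```python
BEGIN = "# --- BEGIN PVE ---"  # assumed marker text
END = "# --- END PVE ---"      # assumed marker text


def replace_managed_section(content, managed_lines):
    """Replace the marked block with `managed_lines`; keep all other lines."""
    lines = content.splitlines()
    if BEGIN in lines and END in lines and lines.index(BEGIN) < lines.index(END):
        start, end = lines.index(BEGIN), lines.index(END)
        rest = lines[:start] + lines[end + 1:]
    else:
        rest = lines  # no managed block yet
    head = [BEGIN, *managed_lines, END] if managed_lines else []
    out = head + rest
    return "\n".join(out) + ("\n" if out else "")
```

With no managed lines and no existing block, the input comes back unchanged, mirroring the "left untouched" behavior mentioned above.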
create: don't skip arch detection on unpack errors
The -ignore-unpack-errors option needs to be taken into
account in restore_archive instead of restore_and_configure
as restore_archive is also responsible for arch detection.
For now only Fedora 22 is tested. The setup routines from
the Redhat base can be kept, so the only difference for now
is the version scheme and 'ostype'.
Otherwise this runs through the code causing all kinds of
different errors like use of uninitialized values in
peculiar places or format errors trying to validate empty
string or 'missing property' errors trying to parse empty
property strings...
When using the 'storage:size' notation to allocate a disk we
only modify the volume id, so it makes sense to just update
this along with the size rather than creating a new hash
which would drop extra parameters such as 'backup=yes'.
vzdump: exclude lost+found with unprivileged containers
The lost+found directory is created by mkfs and fsck with
the absolute numeric owner of 0:0 which causes tar on an
unprivileged container to error when trying to read it, so
it needs to be excluded un-anchored.
This doesn't need to be done for rsync as rsync runs as
privileged root.
rsync treats --exclude patterns as anchored when they start with
a slash which they do, and which is our desired behavior,
so we should also include --anchored for our tar command.
honor backup=yes/no for bind and device mountpoints
Initially we skipped bind and device mountpoints because we
didn't start out with a backup property. Now that it is
available it is more appropriate to give control back to the
user. The default is 'off' anyway.
To avoid having to use the ^/ and ^/dev/ regexes which are
easy to forget about there's now a 'type' property on
mountpoints which classifies them via names, for now including
"volume", "bind" and "device".
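The classification can be pictured as a single helper built on the two regexes mentioned above. The function name is hypothetical and the sketch is in Python rather than the project's Perl.

```python
import re


def classify_mountpoint(volume):
    """Map a mountpoint source string to 'device', 'bind' or 'volume'."""
    if re.match(r'^/dev/', volume):
        return "device"  # host device node
    if re.match(r'^/', volume):
        return "bind"    # any other absolute host path
    return "volume"      # storage-managed volume id
```

Callers can then compare against the type name instead of repeating the regexes. The /dev/ test has to come first, since such paths also match ^/.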
The NETWORKING and NETWORKING_IPV6 variables are now set up
in setup_network instead of set_hostname, which now only
sets the hostname.
This changes the variable order so the testcase had to be
adapted.
Note that the HOSTNAME update s// now uses \h instead of \s
for horizontal spaces so it doesn't eat up newlines at the
end of file (caught by the testcase).
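The difference between the two character classes can be shown with a small sketch. Python's re module has no \h, so [ \t] stands in for horizontal whitespace here; the patterns and sample data are illustrative and not the substitution used in the code.

```python
import re

data = "NETWORKING=yes\nHOSTNAME=old\n\n"

# \s also matches "\n", so the trailing newlines become part of the match
# and are replaced along with the HOSTNAME line.
with_s = re.sub(r'^HOSTNAME=.*\s*$', 'HOSTNAME=new', data, flags=re.M)

# Horizontal whitespace only: the match stops at the end of the line.
with_h = re.sub(r'^HOSTNAME=.*[ \t]*$', 'HOSTNAME=new', data, flags=re.M)

print(repr(with_s))  # 'NETWORKING=yes\nHOSTNAME=new'
print(repr(with_h))  # 'NETWORKING=yes\nHOSTNAME=new\n\n'
```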
In some cases the user may genuinely want to ignore unpacking
errors. (Like permission denied errors on mknod commands in
some templates where the user might choose to work around
the problem manually in the running container.)