mountpoint_mount: disallow symlinks in bind mounts
Symlinks in mount paths can cause security issues.
Assume the following setup:
mp1: local:X,mp=/disk2
mp2: /mnt/shared,mp=/shared
Now the container boots and executes this sequence:
ct:# ln -s /var/lib/lxc/$VMID/etc /disk2/shared
ct:# umount /disk2
ct:# ln -s /mnt /disk2
ct:# umount /shared
ct:# rmdir /shared
ct:# ln -s /etc /shared
ct:# poweroff
Now the owner waits for a stop-mode backup of the container
to be created:
mp1 will be mounted to the host's /mnt because the
container's /disk2 is a symlink to /mnt.
mp2 will now access the replaced /mnt/shared, which is a
symlink to the container's /etc, and mount that over the
container's /shared, which is a symlink to the host's /etc.
Until the backup is finished, the container's owner could
then log into the host via ssh using his container's user
credentials.
We'll also unshare the mount namespace when performing such
backups, but it's still a bad idea to allow symlinks to
modify container mount paths.
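The check this commit calls for can be sketched in Python: walk each component of the mount path below the container's root and refuse to continue if any existing component is a symlink. This is a minimal illustration under assumptions, not the actual implementation: `assert_no_symlinks` is a hypothetical name, and a plain check-then-mount like this is still subject to time-of-check/time-of-use races.

```python
import os
import stat


def assert_no_symlinks(root, path):
    """Raise ValueError if any component of `path` below `root` is a symlink.

    Hypothetical helper: `path` is a container path such as "/disk2",
    interpreted relative to `root` (the container's root directory).
    """
    current = root
    for part in path.strip("/").split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            # '..' could walk out of the container root, so reject it outright
            raise ValueError("'..' not allowed in mount path: %r" % path)
        current = os.path.join(current, part)
        try:
            st = os.lstat(current)  # lstat() does not follow the symlink itself
        except FileNotFoundError:
            # nothing below a missing component can be a symlink
            return
        if stat.S_ISLNK(st.st_mode):
            raise ValueError("symlink in mount path: %s" % current)
```

In the attack above, checking "/disk2" after the container replaced it with a symlink to /mnt would raise instead of mounting mp1 onto the host's /mnt.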
-) '-x' is '--one-file-system' (the longer version is easier
to spot).
-) Use --relative's special handling of `/./` in paths in
order to make --one-file-system and --exclude options work
together the way they should.
Here's the issue:
Say you have these files in your container:
/the-file
/mp0/the-file
And assume /mp0 is a mountpoint.
Naturally, you want `--exclude-path /the-file` to exclude
only the first of the two files. This is hard when rsyncing
each mountpoint separately: the rsync command for mp0 sees
files relative to /mp0, so both files would be excluded
unless we modified the exclude paths accordingly, which we
can't, as they can be arbitrary glob patterns.
Now consider rsync's --relative option. Assume the container
is mounted at /temp (i.e. /temp/ and /temp/mp0). Passing
/temp/mp0/ to a plain rsync would copy the contents of the
mp0 mountpoint into the root directory of the destination
(essentially the equivalent of `mv mp0/* /` in the
container's backup). However, rsync's special treatment of
/./ with the --relative option allows us to pass
/temp/./mp0/, which tells rsync that `/mp0` is supposed to
be included in the path: we're actually copying from
/temp/, but we want only its mp0/ directory.
See rsync(1)'s section about --relative for a detailed
description.
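The `/./` trick can be illustrated with a small Python model. `rsync_source` builds the marked source argument, and `preserved_name` models which name a file gets relative to the transfer root when the source contains `/./`. All function names, the `--archive` option, and the destination are assumptions for illustration, and `fnmatch` is only a stand-in for rsync's own pattern rules (for example, its `*` also matches `/`, unlike rsync's).

```python
import fnmatch
import posixpath


def rsync_source(rootdir, mp):
    """Return an rsync source argument carrying the '/./' marker.

    With --relative, only the part after '/./' is recreated on the
    destination, so '/temp/./mp0/' lands in 'mp0/' there.
    """
    path = posixpath.join(rootdir, ".", mp.strip("/"))
    if not path.endswith("/"):
        path += "/"
    return path


def rsync_argv(rootdir, mp, excludes, dest):
    # Hypothetical argv for one mountpoint; patterns are passed through untouched.
    argv = ["rsync", "--archive", "--relative", "--one-file-system"]
    argv += ["--exclude=" + pattern for pattern in excludes]
    argv += [rsync_source(rootdir, mp), dest]
    return argv


def preserved_name(source, rel):
    """Simplified model: the name of file `rel` inside a '/./'-marked source."""
    _, _, tail = source.partition("/./")
    return "/" + posixpath.join(tail, rel)


def is_excluded(pattern, name):
    # fnmatch is a stand-in only; rsync's wildcard rules differ in detail.
    return fnmatch.fnmatchcase(name, pattern)
```

Under this model, the anchored pattern `/the-file` matches the name `/the-file` produced by the root transfer (`/temp/./`), but not `/mp0/the-file` produced by the mp0 transfer (`/temp/./mp0/`), which is the behavior the commit is after.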
Use the array-of-arrays version of run_command to build the
pipe; this should deal with most quoting issues.
Note that tar handles glob patterns in --exclude itself, so
passing the patterns quoted instead of letting the shell
expand them is actually also more correct.
To avoid at least some weird quoting issues, and since tar
has a --one-file-system option, always skips sockets, and
also supports exclusion by pattern, we now simply use tar
directly instead of passing it the files listed by 'find'.
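The idea of passing each command as an argument array, so no shell ever sees (or expands) the glob patterns, can be mirrored in Python with `subprocess`. This is a sketch under assumptions: `tar_argv` and `run_pipe` are hypothetical names standing in for the Perl run_command call, and the tar options are limited to `--one-file-system`, `--exclude`, and the standard `-c`, `-f`, and `-C`.

```python
import subprocess
import sys


def tar_argv(rootdir, excludes):
    """Hypothetical tar invocation; each pattern is its own argv element,
    so tar receives the glob unexpanded and matches it itself."""
    argv = ["tar", "-cf", "-", "--one-file-system"]
    argv += ["--exclude=" + pattern for pattern in excludes]
    argv += ["-C", rootdir, "."]
    return argv


def run_pipe(commands):
    """Run argv lists connected like `cmd1 | cmd2 | ...` without a shell.

    Returns the last command's stdout; raises CalledProcessError if any
    command exits non-zero.
    """
    procs = []
    upstream = None
    for argv in commands:
        proc = subprocess.Popen(argv, stdin=upstream, stdout=subprocess.PIPE)
        if upstream is not None:
            upstream.close()  # parent's copy is not needed; the child holds the pipe
        upstream = proc.stdout
        procs.append(proc)
    output, _ = procs[-1].communicate()
    for argv, proc in zip(commands, procs):
        if proc.wait() != 0:
            raise subprocess.CalledProcessError(proc.returncode, argv)
    return output
```

A backup pipe would then look like `run_pipe([tar_argv(rootdir, excludes), [...compressor argv...]])`, with a pattern such as `./var/log/*` reaching tar verbatim.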
-) The condition apparently ignored /dev/ paths, while those
are exactly what it was supposed to handle (other paths
aren't devices).
-) Get rid of the blockdevices_list heuristics; they don't
work reliably for all types of devices.
-) Check whether a device is a block device via S_ISBLK on
the device file.
-) Don't try to follow symlinks as the name we provide in
the mp config is the name we want in the container.
-) Blacklist loop devices, as they pose a security risk.
-) Loop devices are now attached in mountpoint_mount and
immediately detached in order to set the auto-clear flag.
Keeping track of loop devices is otherwise next to
impossible and a security concern.
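The S_ISBLK check and the loop-device blacklist can be sketched in Python. The helper names are hypothetical; the one fact relied on is that Linux assigns block major number 7 to loop devices. `os.stat()` follows symlinks only to inspect the device node, while the caller keeps the configured name unchanged for use inside the container.

```python
import os
import stat

LOOP_MAJOR = 7  # Linux block major number for loop devices


def is_loop_rdev(rdev):
    """True if the device number belongs to a loop device."""
    return os.major(rdev) == LOOP_MAJOR


def check_mount_device(path):
    """Hypothetical check: accept `path` only if it is a non-loop block device.

    Returns the device number; raises ValueError otherwise.
    """
    st = os.stat(path)
    if not stat.S_ISBLK(st.st_mode):
        raise ValueError("not a block device: %s" % path)
    if is_loop_rdev(st.st_rdev):
        raise ValueError("loop devices are not allowed: %s" % path)
    return st.st_rdev
```

Checking the device number rather than the name means a loop device is rejected even if it is reached through an unexpected path or symlink.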
We mount the filesystems for the container. We do not
support full loop device access for containers for a simple
reason: once a container has detached a loop device, the
startup of another container might reuse it, exposing that
container's devices to the first container and creating
unwanted cross-container access.
Loop devices are also set to auto-clear, so that we do not
need to worry about detaching them when stopping the
container.
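The attach, mount, detach sequence can be sketched in Python with util-linux `losetup` and `mount`. It relies on kernel behavior: detaching a loop device that is still in use (here, mounted) does not free it but sets its auto-clear flag, so it is released on the final unmount. `mount_image_autoclear` and the injectable `run` parameter are hypothetical, added so the call order can be checked without root.

```python
import subprocess


def default_run(argv):
    """Run a command and return its stdout (raises on non-zero exit)."""
    return subprocess.run(argv, check=True, capture_output=True, text=True).stdout


def mount_image_autoclear(image, target, run=default_run):
    """Attach `image` to a free loop device, mount it, then detach right away.

    While the filesystem is mounted, the detach only marks the device
    auto-clear; nothing has to remember it for a later cleanup.
    """
    dev = run(["losetup", "--find", "--show", image]).strip()
    try:
        run(["mount", dev, target])
    finally:
        # If mount failed, the device is unused and this frees it immediately.
        run(["losetup", "--detach", dev])
    return dev
```

Because the detach happens unconditionally in `finally`, no loop device is left behind even when the mount itself fails.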
made create_disk non-private
made print_ct_mountpoint include 'mp=' unless 'nomp' is set
added an mkdirs flag to mount_all
create mount directories for new mountpoints simply by
calling mount_all followed by umount_all
-) format disks after vdisk_alloc instead of in mount_all
-) make mount_all return the loop device list so it can be
passed to umount_all optionally
-) umount_all takes an optional loopdevs list to avoid
listing loopdevs when not necessary
Convert disksize to GB after reading the config file
This fixes a bug when calling pct restore on a vzdump backup:
we store the rootfs size in KB, but create_disks expects a parameter in GB.
So the restore failed while trying to create a petabyte-sized image:
Formatting '/var/lib/vz/images/110/vm-110-disk-1.raw', fmt=raw size=4503599627370496
unable to create image: qemu-img: /var/lib/vz/images/110/vm-110-disk-1.raw: The image size is too large for file format 'raw'
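The size in the error message is consistent with this unit mix-up. Assuming GB here means GiB (1024^3 bytes), a 4 GiB rootfs stored as 4194304 KiB and then read as 4194304 GiB gives 2^52 = 4503599627370496 bytes (4 PiB). The 4 GiB rootfs is an assumed example, and `kb_to_gb` is a hypothetical helper, not the actual fix.

```python
KIB_PER_GIB = 1024 * 1024


def kb_to_gb(size_kb):
    """Hypothetical helper: convert a size stored in KiB to GiB."""
    return size_kb / KIB_PER_GIB


rootfs_kb = 4 * KIB_PER_GIB  # assumed 4 GiB rootfs, stored as 4194304 KiB

# Bug: the KiB value is used where GiB is expected, then scaled to bytes.
wrong_bytes = rootfs_kb * 1024 ** 3

# Fix: convert to GiB right after reading the config, then scale.
right_bytes = kb_to_gb(rootfs_kb) * 1024 ** 3
```

The wrong value is exactly the `size=4503599627370496` in the qemu-img output, while the converted value is the intended 4 GiB.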
All our containers have purely numeric names. If any other
character appears in a container name, skip our hooks in
order to allow lxc to be used manually without interference
from PVE.
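The name test the hooks need is small; here is a Python sketch with a hypothetical function name. It uses the explicit class `[0-9]` rather than `\d`, because in Python 3 `\d` also matches non-ASCII Unicode digits.

```python
import re


def is_pve_container_name(name):
    """True if `name` consists only of ASCII digits, i.e. looks like a PVE container ID."""
    return re.fullmatch(r"[0-9]+", name) is not None
```

A hook would return early without doing anything whenever this is false, leaving manually created lxc containers alone.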
Thomas Lamprecht [Wed, 26 Aug 2015 13:36:28 +0000 (15:36 +0200)]
allow to load configs from CTs located on other nodes
This allows load_config to also load configs of LXC containers
that aren't located on the same node.
This is needed to fix a bug that prevents viewing the noVNC
console of a CT located on another node, whereas viewing it on
the node where the container is located works fine.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
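In the Proxmox cluster filesystem, each node's container configs live under /etc/pve/nodes/<node>/lxc/, which is visible from every cluster node, while /etc/pve/lxc points only at the local node's directory. A cross-node lookup can therefore be sketched as below. Both function names are hypothetical, and `vmlist` is an assumed {vmid: node} mapping standing in for however the cluster's VM list is actually obtained.

```python
def lxc_config_path(vmid, node):
    """Path of a container config in the cluster filesystem for a given node."""
    return "/etc/pve/nodes/%s/lxc/%d.conf" % (node, vmid)


def find_config(vmid, vmlist):
    """Resolve the config path of `vmid` on whichever node owns it.

    `vmlist` is a hypothetical {vmid: node} mapping.
    """
    try:
        node = vmlist[vmid]
    except KeyError:
        raise ValueError("container %d does not exist" % vmid) from None
    return lxc_config_path(vmid, node)
```

Resolving the owning node first means the same lookup works no matter which node serves the console request.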