ringbuf: implement simple and efficient ringbuffer
liblxc will use a ringbuffer implementation that employs mmap()ed memory.
Specifically, the ringbuffer will create an anonymous memory mapping twice the
requested size for the ringbuffer. Afterwards, an in-memory file the requested
size for the ringbuffer will be created. This in-memory file will then be
memory mapped twice into the previously established anonymous memory mapping
thereby effectively splitting the anonymous memory mapping in two halves of
equal size. This will allow the ringbuffer to get rid of any complex boundary
and wrap-around calculation logic. Since the underlying physical memory is the
same in both halves of the memory mapping only a single memcpy() call for both
reads and writes from and to the ringbuffer is needed.
Design Notes:
- Since we're using MAP_FIXED memory mappings to map the same in-memory file
twice into the anonymous memory mapping the kernel requires us to always
operate on properly aligned pages. To guarantee proper page alignment the size
of the ringbuffer must always be a multiple of the kernel's page size. This
also implies that the ringbuffer must be at least one page in size. This
additional requirement is reasonably unproblematic: any ringbuffer smaller
than a single page is very likely useless since the standard page size on
Linux is 4096 bytes.
- Because liblxc is not able to predict the output a user is going to produce
(e.g. users could cat binary files onto the console) and because the
ringbuffer is located in a hotpath and needs to be as performant as possible
liblxc will not parse the buffer.
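
One way a caller could satisfy the page-size requirement from the first design
note is to round the requested size up before creating the ringbuffer. The
helper below is a hypothetical sketch; whether liblxc rounds sizes or rejects
unaligned ones is not stated here.

```c
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: round a requested size up to a whole number of pages,
 * with a minimum of one page. Returns 0 if the result cannot be represented. */
static size_t ring_size_round_up(size_t requested)
{
	long ret = sysconf(_SC_PAGESIZE);
	size_t page, pages;

	if (ret <= 0)
		return 0;
	page = (size_t)ret;

	/* Number of pages needed, rounding any partial page up. */
	pages = requested / page + (requested % page != 0);
	if (pages == 0)
		pages = 1;

	/* Guard against overflow in the multiplication below. */
	if (pages > SIZE_MAX / page)
		return 0;

	return pages * page;
}
```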
Use Case:
The ringbuffer is needed by liblxc in order to safely log the output of write
intensive callers that produce unpredictable output or unpredictable amounts of
output. The console output created by a booting system and the user is one of
those cases. If a container were allowed to log the console's output to a file,
a malicious user could fill up the host filesystem by producing random output
on the container's console if quota support is either
not enabled or not available for the underlying filesystem. Using a ringbuffer
is a reliable and secure way to ensure a fixed-size log.
Closes #1857.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Adam Borowski [Sun, 15 Oct 2017 19:20:34 +0000 (19:20 +0000)]
Use the proper type for rlim_t, fixing build failure on x32.
Assuming a particular width of a type (or equivalence with "long") doesn't
work everywhere. On new architectures, LFS/etc is enabled by default,
making rlim_t same as rlim64_t even if long is only 32-bit.
Not sure how you handle too big values -- you may want to re-check the
strtoull part.
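
A possible way to handle too-big values without assuming the width of rlim_t is
a round-trip check after strtoull(). The helper name parse_rlim_sketch() is
hypothetical and this is only a sketch of the idea, not the code in the tree.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/resource.h>

/* Hypothetical helper: parse a decimal string into rlim_t.
 * Returns 0 on success and -1 on malformed or out-of-range input. */
static int parse_rlim_sketch(const char *s, rlim_t *out)
{
	unsigned long long v;
	char *end;

	/* strtoull() accepts signs and leading whitespace; reject them here. */
	if (*s < '0' || *s > '9')
		return -1;

	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE || *end != '\0')
		return -1;

	/* Round-trip check: fails if rlim_t is narrower than the parsed value,
	 * whatever the actual width of rlim_t is on this architecture. */
	if ((unsigned long long)(rlim_t)v != v)
		return -1;

	*out = (rlim_t)v;
	return 0;
}
```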
Signed-off-by: Adam Borowski <kilobyte@angband.pl>
The kernel only allows 4k writes to most files in /proc including {g,u}id_map
so let's not try to write partial mappings. (This will obviously become a lot
more relevant when my patch to extend the idmap limit in the kernel is merged.)
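
The idea of refusing partial mappings can be sketched as below. The helper
write_idmap_once() and the constant IDMAP_WRITE_LIMIT are hypothetical; the
4096 value is taken from the "4k" above and the exact boundary the kernel
enforces is an assumption.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Assumption: taken from the "4k" limit mentioned in the commit message. */
#define IDMAP_WRITE_LIMIT 4096

/* Hypothetical helper: write the complete mapping in one write() or fail.
 * Returns 0 on success and -1 with errno set on failure. */
static int write_idmap_once(int fd, const char *buf, size_t len)
{
	ssize_t ret;

	/* Never attempt a mapping that cannot go out in a single write. */
	if (len > IDMAP_WRITE_LIMIT) {
		errno = E2BIG;
		return -1;
	}

	do {
		ret = write(fd, buf, len);
	} while (ret < 0 && errno == EINTR);

	if (ret < 0)
		return -1;

	/* A short write would leave a partial mapping; treat it as an error
	 * instead of retrying with the remainder. */
	if ((size_t)ret != len) {
		errno = EIO;
		return -1;
	}

	return 0;
}
```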
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
This adds set_running_config_item() which is the analogue of
get_running_config_item(). In essence it allows a caller to livepatch the
container's in-memory configuration. This POC is severely limited. Here are the
most obvious limitations:
- Only the container's in-memory config can be updated but no further actions
(e.g. on-disk actions) are made.
- Only keys in the "lxc.net." namespace can be changed. This POC also allows
updating an existing network. For example it allows changing the network
type of an existing network. This is obviously nonsense and in a non-POC
implementation this should be blocked.
Use Case:
Callers can hotplug a new network for the container. For example, LXD can
create a pair of veth devices in the host and in the container and add it to
the container's in-memory config. This means the container can later be
queried for the name of the device etc. Note that liblxc will
currently not delete hotplugged network devices on container shutdown since it
won't have the ifindex of the container.
Relates to https://github.com/lxc/lxd/issues/3920 .
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
With the release of LXC 2.1 we started warning users who use LXC through the API
and users who use LXC through the tools equally about updating their config.
This quickly got confusing and annoying to API users who e.g. generate configs
on the fly (e.g. LXD). So instead of unconditionally warning users we make this
opt-in. If LXC detects that the env variable LXC_UPDATE_CONFIG_FORMAT is set
then it will warn the user if any legacy configuration keys are present. If it
is not set however, it will not warn the user. This is ok, since the log will
still log WARN()s for all legacy configuration keys.
The tools will all set LXC_UPDATE_CONFIG_FORMAT since it is very much required
that users update to the new configuration format pre-LXC 3.0.
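
The opt-in check amounts to testing whether the variable is present in the
environment. A minimal sketch, with the hypothetical helper name
warn_about_legacy_keys() and the assumption that any value, including an empty
one, counts as "set":

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical helper: true if the caller opted in to legacy-key warnings.
 * Assumption: the variable only needs to be present; its value is ignored. */
static bool warn_about_legacy_keys(void)
{
	return getenv("LXC_UPDATE_CONFIG_FORMAT") != NULL;
}
```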
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Serge Hallyn [Wed, 4 Oct 2017 05:14:00 +0000 (05:14 +0000)]
implement lxc_string_split_quoted
lxc_string_split_quoted() splits a string on spaces, but keeps
groups in single or double quotes together. In other words,
generally what we'd want for argv behavior.
Switch lxc-execute to use this for lxc.execute.cmd.
Switch lxc-oci template to put the lxc.execute.cmd inside single
quotes, because parse_line() will eat those. If we don't do that,
then if we have lxc.execute.cmd = /bin/echo "hello, world", then the
last double quote will disappear.
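
The splitting behavior can be illustrated with the sketch below. It is not the
lxc_string_split_quoted() implementation: split_quoted_sketch() and
free_argv_sketch() are hypothetical names, and stripping the surrounding quotes
from each group as well as taking an unterminated quote to the end of the
string are assumptions. Quotes in the middle of a word are not treated
specially.

```c
#include <stdlib.h>
#include <string.h>

static void free_argv_sketch(char **argv)
{
	if (!argv)
		return;
	for (size_t i = 0; argv[i]; i++)
		free(argv[i]);
	free(argv);
}

/* Returns a NULL-terminated, heap-allocated argv or NULL on allocation failure. */
static char **split_quoted_sketch(const char *in)
{
	size_t n = 0, cap = 4;
	const char *p = in;
	char **argv = malloc(cap * sizeof(*argv));

	if (!argv)
		return NULL;

	for (;;) {
		const char *start;
		size_t len;

		while (*p == ' ') /* skip separators */
			p++;
		if (*p == '\0')
			break;

		if (*p == '"' || *p == '\'') {
			/* Quoted group: everything up to the matching quote. */
			char quote = *p++;
			const char *end = strchr(p, quote);

			if (!end)
				end = p + strlen(p);
			start = p;
			len = (size_t)(end - start);
			p = *end ? end + 1 : end;
		} else {
			/* Plain word: everything up to the next space. */
			start = p;
			len = strcspn(p, " ");
			p += len;
		}

		if (n + 1 == cap) { /* keep room for the NULL terminator */
			char **tmp = realloc(argv, 2 * cap * sizeof(*argv));

			if (!tmp)
				goto on_error;
			argv = tmp;
			cap *= 2;
		}

		argv[n] = malloc(len + 1);
		if (!argv[n])
			goto on_error;
		memcpy(argv[n], start, len);
		argv[n][len] = '\0';
		n++;
	}

	argv[n] = NULL;
	return argv;

on_error:
	argv[n] = NULL;
	free_argv_sketch(argv);
	return NULL;
}
```

With this sketch, /bin/echo "hello, world" splits into the two arguments
/bin/echo and hello, world.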
We need to clear any ifindices we recorded so liblxc won't have cached stale
data which would cause it to fail on reboot where we don't re-read the on-disk
config file.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
echo "starting" > /tmp/debug
ip link add host1 type veth peer name peer1
ip link set host1 master lxcbr0
ip link set host1 up
ip link set peer1 netns "${LXC_PID}"
=================================
The nic 'peer1' was placed into the container as expected.
For this to work, we pass the container init's pid as LXC_PID in
an environment variable, since lxc-info cannot work at that point.
Change file check to also check file size (`-f` => `-s`)
Because the `fetch` wget wrapper outputs files to stdout we may end up in a
situation where wget fails but the files are still created. This can happen
e.g. when the host date is out of sync leading to a failed certificate
check, resulting in the creation of empty key files.
Once the empty files have been created the template will try to use them which
causes the certificate check to fail.
By using `-s` instead of `-f` the template will re-fetch the files unless they
exist AND have a size greater than zero.
To match names beginning with the letters "f" or "b" one can use
the regular expression "[fb].*" or "(f|b).*", but not "[f|b].*",
which would match strings beginning with "f", "|", or "b".
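
The difference between the three expressions can be checked with POSIX extended
regular expressions. The helper name_matches() is hypothetical and anchors the
pattern so that the whole name has to match; whether the surrounding code
anchors patterns the same way is an assumption.

```c
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: true if the whole name matches the extended regex. */
static bool name_matches(const char *pattern, const char *name)
{
	char anchored[256];
	regex_t re;
	bool ok;
	int n;

	/* Anchor the pattern so it must cover the complete name. */
	n = snprintf(anchored, sizeof(anchored), "^(%s)$", pattern);
	if (n < 0 || (size_t)n >= sizeof(anchored))
		return false;

	if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
		return false;

	ok = regexec(&re, name, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

Inside a bracket expression "|" is an ordinary character, so "[f|b].*" accepts
a name such as "|x" while "[fb].*" and "(f|b).*" reject it.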
Signed-off-by: Christian von Roques <roques@z12.ch>
plamo: Delete unnecessary process during container shutdown
Since some remounts/umounts are executed in the plamo shutdown script,
the filesystem on which a container resides might be mounted
read-only. This patch deletes some mounts and umounts from the shutdown
script. It also deletes the hwclock setting process.
This is technically not necessary but it is a privilege sensitive operation.
Meaning if anyone wants to do something that requires privilege it should be
done before the id switch. So let's move the id switch immediately before the
exec so that it's called at the last possible moment.
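
The ordering argued for above can be sketched as follows. run_as_sketch() is a
hypothetical helper, not the function touched by this patch; it omits
supplementary group handling (setgroups()) and only shows that the id switch
is the last step before the exec.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv as uid/gid in a child process.
 * Returns the child's exit status or -1 on error. */
static int run_as_sketch(uid_t uid, gid_t gid, char *const argv[])
{
	int status;
	pid_t pid, ret;

	pid = fork();
	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Anything that still requires privilege belongs here,
		 * before the id switch. */

		/* Switch the group first: after setuid() the process may no
		 * longer be permitted to change its gid. */
		if (setgid(gid) < 0 || setuid(uid) < 0)
			_exit(126);

		/* The id switch is the last thing done before the exec. */
		execvp(argv[0], argv);
		_exit(127);
	}

	do {
		ret = waitpid(pid, &status, 0);
	} while (ret < 0 && errno == EINTR);

	if (ret != pid)
		return -1;

	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```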
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>