# lxcfs

## Introduction

LXCFS is a small FUSE filesystem written with the intention of making Linux
containers feel more like a virtual machine. It started as a side project of
`LXC` but is usable by any runtime.

LXCFS makes sure that crucial files in `procfs` (and `sysfs`), such as:

```
/proc/cpuinfo
/proc/diskstats
/proc/meminfo
/proc/stat
/proc/swaps
/proc/uptime
/proc/slabinfo
/sys/devices/system/cpu/online
```

are container aware, so that the values they display (e.g. in `/proc/uptime`)
reflect how long the container has been running rather than how long the host
has been running.
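
To illustrate what "container aware" means for `/proc/uptime`, the sketch below derives a container's uptime from the host uptime and the start time of the container's init process. It is a minimal sketch built on three assumptions. First, the start time is field 22 (`starttime`, in clock ticks since boot) of `/proc/<pid>/stat`. Second, the tick rate comes from `sysconf(_SC_CLK_TCK)`. Third, `container_uptime` is a hypothetical name. This is not necessarily how LXCFS computes the value.

```c
/*
 * Hypothetical helper: compute how long a container has been running.
 *
 * host_uptime_secs: first field of the host's /proc/uptime (seconds since boot)
 * init_start_ticks: field 22 ("starttime") of /proc/<init-pid>/stat, i.e. the
 *                   number of clock ticks after boot at which the container's
 *                   init process started
 * ticks_per_sec:    usually sysconf(_SC_CLK_TCK)
 */
double container_uptime(double host_uptime_secs,
                        unsigned long long init_start_ticks,
                        long ticks_per_sec)
{
    if (ticks_per_sec <= 0)
        return host_uptime_secs; /* cannot convert ticks; fall back to host value */

    double started_at = (double)init_start_ticks / (double)ticks_per_sec;

    /* Guard against racy reads producing a negative uptime. */
    if (started_at > host_uptime_secs)
        return 0.0;

    return host_uptime_secs - started_at;
}
```

For example, take a host uptime of 1000 seconds, an init start time of 30000 ticks and 100 ticks per second. The container's init started 300 seconds after boot, so the container has been up for 700 seconds.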

Prior to the implementation of cgroup namespaces by Serge Hallyn, `LXCFS` also
provided a container aware `cgroupfs` tree. It ensured that the container
only had access to cgroups underneath its own cgroup and thus provided
additional safety. On systems without support for cgroup namespaces, `LXCFS`
still provides this feature, but it is mostly considered deprecated.

## Upgrading `LXCFS` without restart

`LXCFS` is split into a shared library (a libtool module, to be precise)
`liblxcfs` and a simple binary `lxcfs`. When upgrading to a newer version of
`LXCFS`, the `lxcfs` binary is not restarted. Instead, it detects that a new
version of the shared library is available and reloads it using `dlclose(3)`
and `dlopen(3)`. This design was chosen so that the FUSE main loop that
`LXCFS` uses does not need to be restarted. If it were, all containers using
`LXCFS` would need to be restarted as well, since they would otherwise be left
with broken FUSE mounts.
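
The reload step can be pictured as closing the old handle and opening the library file again. This is a minimal sketch that assumes a single library handle owned by the main binary. Only `dlclose(3)`, `dlopen(3)` and `dlerror(3)` are real APIs here. `reload_library` is a hypothetical name, not LXCFS's actual code.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Hypothetical sketch of reloading a shared library in place.
 * Closing the old handle drops our reference; on glibc, if that was the
 * last reference, the library is unloaded and its destructors run.
 * Re-opening the (replaced) file then maps the new version and runs its
 * constructors. Returns 0 on success, -1 on failure (handle set to NULL).
 */
int reload_library(void **handle, const char *path)
{
    if (*handle != NULL) {
        if (dlclose(*handle) != 0)
            fprintf(stderr, "dlclose failed: %s\n", dlerror());
        *handle = NULL;
    }

    /* RTLD_NOW resolves all symbols immediately, so a broken library is
     * reported here instead of on first use of a missing symbol. */
    *handle = dlopen(path, RTLD_NOW);
    if (*handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n", path, dlerror());
        return -1;
    }
    return 0;
}
```

After a successful reload, any function pointers obtained earlier with `dlsym(3)` point into the unloaded library. They must be looked up again from the new handle.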

To force a reload of the shared library at the next possible opportunity, send
`SIGUSR1` to the pid of the running `LXCFS` process. The old library file must
be removed before the new one is copied into place: overwriting it in place
would modify a file that the running process still has mapped, which can crash
it. This can be as simple as:

    rm /usr/lib64/lxcfs/liblxcfs.so               # the old library file MUST be deleted first
    cp liblxcfs.so /usr/lib64/lxcfs/liblxcfs.so   # place the new library file
    kill -s USR1 $(pidof lxcfs)                   # trigger the reload

### musl

To achieve smooth upgrades through shared library reloads, `LXCFS` also relies
on the fact that when `dlclose(3)` drops the last reference to the shared
library, its destructors are run, and when `dlopen(3)` is called, its
constructors are run. While this is true for `glibc`, it is not true for `musl`,
where `dlclose(3)` does not unload the library (see the section
[Unloading libraries](https://wiki.musl-libc.org/functional-differences-from-glibc.html)).
Users of `LXCFS` on `musl` are therefore advised to restart `LXCFS` completely,
along with all containers making use of it.
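
For reference, a library typically declares such constructors and destructors with the GCC/Clang `constructor` and `destructor` attributes. This is a minimal sketch with hypothetical names, not LXCFS's actual initialization code.

```c
/* Hypothetical module state that must be set up whenever the library is
 * loaded and torn down when it is unloaded. */
static int module_initialized = 0;

/* Runs when the object is loaded: at program start for an executable,
 * at dlopen(3) time for a shared library. */
__attribute__((constructor))
static void module_init(void)
{
    module_initialized = 1;
}

/* Runs when the object is unloaded: at exit, or on glibc when dlclose(3)
 * drops the last reference. On musl, dlclose(3) does not unload the
 * library, so this does not run at that point. */
__attribute__((destructor))
static void module_fini(void)
{
    module_initialized = 0;
}

int module_is_initialized(void)
{
    return module_initialized;
}
```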

## Building

To build LXCFS, install FUSE and the FUSE development headers using your
distribution's package manager. LXCFS prefers `fuse3` but also works with
sufficiently recent `fuse2` versions:

    git clone https://github.com/lxc/lxcfs
    cd lxcfs
    meson setup -Dinit-script=systemd --prefix=/usr build/
    meson compile -C build/
    sudo meson install -C build/

To build with sanitizers, pass the `-Db_sanitize=...` option to `meson setup`.
For example, to enable ASAN and UBSAN:

    meson setup -Dinit-script=systemd --prefix=/usr build/ -Db_sanitize=address,undefined
    meson compile -C build/

## Usage

The recommended way to run LXCFS is:

    sudo mkdir -p /var/lib/lxcfs
    sudo lxcfs /var/lib/lxcfs

A container runtime wishing to use `LXCFS` should then, on container startup,
bind mount the appropriate files from `/var/lib/lxcfs` (for example
`/var/lib/lxcfs/proc/uptime`) over their counterparts inside the container
(for example `/proc/uptime`).
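
A runtime could perform that bind mount with `mount(2)` and the `MS_BIND` flag. This is a minimal sketch under several assumptions. The container's root filesystem is reachable from the host at `rootfs`. The target file already exists. The caller has `CAP_SYS_ADMIN`. The helpers `lxcfs_paths` and `lxcfs_bind_file` are hypothetical names.

```c
#include <stdio.h>
#include <sys/mount.h>

/*
 * Hypothetical helper: build source and target paths for one LXCFS-provided
 * file or directory. `relpath` is relative to the LXCFS mount point, e.g.
 * "proc/uptime". Returns 0 on success, -1 if a path would be truncated.
 */
int lxcfs_paths(const char *lxcfs_dir, const char *rootfs, const char *relpath,
                char *src, size_t src_len, char *dst, size_t dst_len)
{
    int n = snprintf(src, src_len, "%s/%s", lxcfs_dir, relpath);
    if (n < 0 || (size_t)n >= src_len)
        return -1;
    n = snprintf(dst, dst_len, "%s/%s", rootfs, relpath);
    if (n < 0 || (size_t)n >= dst_len)
        return -1;
    return 0;
}

/* Bind mount one entry into the container; returns mount(2)'s result. */
int lxcfs_bind_file(const char *lxcfs_dir, const char *rootfs, const char *relpath)
{
    char src[4096], dst[4096];
    if (lxcfs_paths(lxcfs_dir, rootfs, relpath,
                    src, sizeof(src), dst, sizeof(dst)) != 0)
        return -1;
    /* MS_BIND makes `src` visible at `dst`; fstype and data are ignored. */
    return mount(src, dst, NULL, MS_BIND, NULL);
}
```

For example, `lxcfs_bind_file("/var/lib/lxcfs", rootfs, "proc/uptime")` would bind `/var/lib/lxcfs/proc/uptime` onto the container's `/proc/uptime`.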

### LXC

To use LXCFS with systemd-based containers, you can either use LXC 1.1, in
which case it should work automatically, or copy the `lxc.mount.hook` and
`lxc.reboot.hook` files (once built) from this tree to `/usr/share/lxcfs`,
make sure they are executable, and then add the following lines to your
container configuration:

```
lxc.mount.auto = cgroup:mixed
lxc.autodev = 1
lxc.kmsg = 0
lxc.include = /usr/share/lxc/config/common.conf.d/00-lxcfs.conf
```

## Using with Docker

```
docker run -it -m 256m --memory-swap 256m \
      -v /var/lib/lxcfs/proc/cpuinfo:/proc/cpuinfo:rw \
      -v /var/lib/lxcfs/proc/diskstats:/proc/diskstats:rw \
      -v /var/lib/lxcfs/proc/meminfo:/proc/meminfo:rw \
      -v /var/lib/lxcfs/proc/stat:/proc/stat:rw \
      -v /var/lib/lxcfs/proc/swaps:/proc/swaps:rw \
      -v /var/lib/lxcfs/proc/uptime:/proc/uptime:rw \
      -v /var/lib/lxcfs/proc/slabinfo:/proc/slabinfo:rw \
      -v /var/lib/lxcfs/sys/devices/system/cpu:/sys/devices/system/cpu:rw \
      ubuntu:18.04 /bin/bash
```

On a system with swap enabled, the `-u` option can be used to report all
swap-related values in `/proc/meminfo` as 0:

    sudo lxcfs -u /var/lib/lxcfs
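
The effect on `/proc/meminfo` can be pictured as a per-line rewrite. This is a minimal sketch, not LXCFS's implementation. It assumes that "values that refer to the swap" means the fields whose names start with `Swap` (`SwapCached`, `SwapTotal`, `SwapFree`). `zero_swap_line` is a hypothetical name.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sketch: rewrite one /proc/meminfo line so that any field
 * whose name starts with "Swap" reports 0 kB; other lines are copied
 * unchanged. Returns the snprintf result (bytes that would be written).
 */
int zero_swap_line(const char *line, char *out, size_t out_len)
{
    if (strncmp(line, "Swap", 4) == 0) {
        const char *colon = strchr(line, ':');
        if (colon != NULL) {
            int name_len = (int)(colon - line);
            return snprintf(out, out_len, "%.*s: 0 kB\n", name_len, line);
        }
    }
    return snprintf(out, out_len, "%s", line);
}
```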

## Swap handling

If you notice that LXCFS does not show any SWAP in your container even though
your system has SWAP, please read this section carefully and look for
instructions on how to enable SWAP accounting for your distribution.

Swap cgroup handling on Linux is very confusing, and there just isn't a
perfect way for LXCFS to handle it.

Terminology used below (these are cgroup v1 memory controller files):
- RAM refers to `memory.usage_in_bytes` and `memory.limit_in_bytes`
- RAM+SWAP refers to `memory.memsw.usage_in_bytes` and `memory.memsw.limit_in_bytes`
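
With this terminology, SWAP usage is not read directly. It is derived as the difference between the RAM+SWAP and RAM usage counters. The sketch below applies that derivation to a cgroup v1 memory cgroup directory, assuming the file names listed above. The helper names are hypothetical. Treating a missing `memory.memsw.*` file as "accounting disabled" is an assumption of this sketch.

```c
#include <inttypes.h>
#include <stdio.h>

/* Read a single unsigned integer from a cgroup v1 control file.
 * Returns 0 on success, -1 on failure. */
int read_cgroup_u64(const char *path, uint64_t *value)
{
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    int ok = fscanf(f, "%" SCNu64, value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Derive SWAP usage for one memory cgroup directory:
 * SWAP = memory.memsw.usage_in_bytes - memory.usage_in_bytes.
 * Returns -1 if either file is missing (e.g. SWAP accounting disabled).
 */
int cgroup_swap_usage(const char *cgroup_dir, uint64_t *swap_bytes)
{
    char path[4096];
    uint64_t ram, ram_swap;

    snprintf(path, sizeof(path), "%s/memory.usage_in_bytes", cgroup_dir);
    if (read_cgroup_u64(path, &ram) != 0)
        return -1;

    snprintf(path, sizeof(path), "%s/memory.memsw.usage_in_bytes", cgroup_dir);
    if (read_cgroup_u64(path, &ram_swap) != 0)
        return -1;

    /* The counters are read separately, so guard against underflow. */
    *swap_bytes = ram_swap > ram ? ram_swap - ram : 0;
    return 0;
}
```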

The main issues are:
- SWAP accounting is often opt-in, requiring a special kernel boot time
  option (`swapaccount=1`) and/or special kernel build options
  (`CONFIG_MEMCG_SWAP`).

- Both a RAM limit and a RAM+SWAP limit can be set. The delta between them,
  however, isn't the available SWAP space, as the kernel is still free to
  swap out as much of the RAM as it likes. This makes it impossible to render
  a SWAP device size: using the delta between RAM and RAM+SWAP for it
  wouldn't account for the kernel swapping out more pages, leading to SWAP
  usage exceeding SWAP total.

- It's impossible to disable SWAP in a given container. The closest
  that can be done is setting swappiness down to 0, which severely limits
  the risk of swapping pages but doesn't eliminate it.
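
A worked example with made-up numbers shows why the delta cannot serve as the SWAP size. Assume a RAM limit of 256 MiB and a RAM+SWAP limit of 512 MiB. Assume also that the kernel has swapped out 400 MiB while 64 MiB remain in RAM:

```math
\begin{aligned}
\text{delta} &= \text{RAM+SWAP limit} - \text{RAM limit} = 512\,\text{MiB} - 256\,\text{MiB} = 256\,\text{MiB} \\
\text{RAM+SWAP usage} &= \text{RAM usage} + \text{SWAP usage} = 64\,\text{MiB} + 400\,\text{MiB} = 464\,\text{MiB} \le 512\,\text{MiB}
\end{aligned}
```

The cgroup stays within both of its limits. Yet a SWAP device sized at the 256 MiB delta would show 400 MiB used out of 256 MiB total.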

As a result, LXCFS has to make some compromises, as follows:
- When SWAP accounting isn't enabled, no SWAP space is reported at all.
  This is simply because there is no way to know the SWAP consumption.
  The container may very well be using some SWAP, but there's no way to
  know how much, and showing a SWAP device would require some kind of SWAP
  usage to be reported. Showing the host value would be completely wrong,
  and showing a value of 0 would be equally wrong.

- Because SWAP usage for a given container can exceed the delta between
  RAM and RAM+SWAP, the SWAP size is always reported as the smaller of
  the RAM+SWAP limit and the size of the host SWAP device itself. This
  ensures that SWAP usage never exceeds the reported SWAP size.

- If swappiness is set to 0 and there is no SWAP usage, no SWAP is reported.
  However, if there is SWAP usage, a SWAP device the size of the usage
  (100% full) is reported. This provides adequate reporting of the memory
  consumption while preventing applications from assuming more SWAP is
  available.
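
The three rules above can be summarized as a small decision function. This is a minimal sketch with hypothetical type and field names, not LXCFS's actual implementation. It assumes the inputs have already been read from the cgroup and the host, with SWAP usage taken as RAM+SWAP usage minus RAM usage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inputs for one container; sizes in bytes. Names are hypothetical. */
struct swap_inputs {
    bool     accounting_enabled; /* memory.memsw.* files available */
    uint64_t memsw_limit;        /* RAM+SWAP limit */
    uint64_t host_swap_total;    /* size of the host SWAP device(s) */
    uint64_t swap_usage;         /* RAM+SWAP usage minus RAM usage */
    uint64_t swappiness;         /* the cgroup's memory.swappiness */
};

struct swap_report {
    uint64_t total;
    uint64_t used;
};

static uint64_t min_u64(uint64_t a, uint64_t b) { return a < b ? a : b; }

struct swap_report report_swap(const struct swap_inputs *in)
{
    struct swap_report r = {0, 0};

    /* No accounting: usage is unknown, so report no SWAP at all. */
    if (!in->accounting_enabled)
        return r;

    /* Swappiness 0: report nothing unless SWAP is already in use, in
     * which case report a 100% full device exactly as large as the usage. */
    if (in->swappiness == 0) {
        r.total = in->swap_usage;
        r.used = in->swap_usage;
        return r;
    }

    /* Otherwise: the smaller of the RAM+SWAP limit and the host SWAP size.
     * Usage is clamped defensively in case the counters were read racily. */
    r.total = min_u64(in->memsw_limit, in->host_swap_total);
    r.used = min_u64(in->swap_usage, r.total);
    return r;
}
```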