Tycho Andersen [Thu, 27 Oct 2022 16:23:08 +0000 (10:23 -0600)]
sysfs: don't mask cpus in /sys/devices/system/cpu
The kernel does not mask the cpu%d dirs when they are offlined:
(root) /sys/devices/system/cpu # cat online
0-7
(root) /sys/devices/system/cpu # chcpu -d 4
CPU 4 disabled
(root) /sys/devices/system/cpu # cat online
0-3,5-7
(root) /sys/devices/system/cpu # cat offline
4
(root) /sys/devices/system/cpu # ls -al
total 0
drwxr-xr-x 16 root root 0 Oct 25 20:42 .
drwxr-xr-x 10 root root 0 Oct 25 20:42 ..
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu0
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu1
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu2
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu3
drwxr-xr-x 5 root root 0 Oct 25 20:42 cpu4
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu5
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu6
drwxr-xr-x 7 root root 0 Oct 25 20:42 cpu7
drwxr-xr-x 2 root root 0 Oct 25 20:43 cpufreq
drwxr-xr-x 2 root root 0 Oct 26 15:19 cpuidle
drwxr-xr-x 2 root root 0 Oct 26 15:19 hotplug
-r--r--r-- 1 root root 4096 Oct 25 20:42 isolated
-r--r--r-- 1 root root 4096 Oct 25 20:43 kernel_max
-r--r--r-- 1 root root 4096 Oct 26 15:19 modalias
-r--r--r-- 1 root root 4096 Oct 26 15:19 offline
-r--r--r-- 1 root root 4096 Oct 25 20:42 online
-r--r--r-- 1 root root 4096 Oct 25 20:43 possible
drwxr-xr-x 2 root root 0 Oct 26 15:19 power
-r--r--r-- 1 root root 4096 Oct 25 20:43 present
drwxr-xr-x 2 root root 0 Oct 26 15:19 smt
-rw-r--r-- 1 root root 4096 Oct 25 20:42 uevent
drwxr-xr-x 2 root root 0 Oct 26 15:19 vulnerabilities
Let's not mask them in lxcfs either. In particular, we have observed this
causing problems with some JVMs' implementation of
Runtime.getRuntime().availableProcessors().
This is a bit of a strange patch: it seems masking this dir was always
incorrect, so we could go back to just not offering it as an lxcfs
endpoint, and having people use sysfs' implementation directly. But maybe
people are expecting it now, so I've left it as a proxy. Perhaps a more
appropriate patch is to just delete it entirely and add an API extension
note?
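To make the mismatch in the transcript above concrete: after CPU 4 is disabled, `online` reads `0-3,5-7` (7 CPUs) while all 8 `cpu%d` directories remain. A minimal sketch of counting the CPUs named by such a list follows. `cpulist_count` is a hypothetical helper written for this note, not part of lxcfs, and how any particular JVM derives its processor count is not stated here.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Count the CPUs named by a kernel-style cpulist such as "0-3,5-7".
 * Returns the count, or -1 if the string is malformed.
 * An empty list (e.g. an empty "offline" file) counts as 0.
 */
static int cpulist_count(const char *list)
{
	int count = 0;
	const char *p = list;

	while (*p && *p != '\n') {
		char *end;
		unsigned long first, last;

		errno = 0;
		first = strtoul(p, &end, 10);
		if (end == p || errno)
			return -1;
		last = first;
		p = end;

		/* Optional "-N" suffix turns a single CPU into a range. */
		if (*p == '-') {
			p++;
			last = strtoul(p, &end, 10);
			if (end == p || errno || last < first)
				return -1;
			p = end;
		}

		count += (int)(last - first + 1);

		if (*p == ',')
			p++;
		else if (*p && *p != '\n')
			return -1;
	}

	return count;
}
```

With the transcript's values, `cpulist_count("0-7")` gives 8 and `cpulist_count("0-3,5-7")` gives 7, while a directory listing still shows cpu0 through cpu7.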
Tycho Andersen [Fri, 28 Oct 2022 20:24:54 +0000 (14:24 -0600)]
/proc/stat: render physical cpu number in non-view mode
When the kernel has an offline CPU, it only renders the online CPUs in
/proc/stat.
When in non-use_view mode, /sys/devices/system/cpu/online shows the CPU
numbers as they actually are on the physical system, but /proc/stat used
"virtual" (i.e. always zero-indexed) numbers, which causes confusion for
some applications. Let's use the same use_view logic in /proc/stat as well.
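A minimal sketch of the numbering choice described above, assuming the renderer knows both the physical CPU number and the position of that CPU among the lines being emitted. `stat_cpu_label` and its parameters are hypothetical names for illustration, not the lxcfs code.

```c
#include <stdbool.h>
#include <stdio.h>

/*
 * Format the label of one per-CPU line of /proc/stat.
 * physical: CPU number as the host kernel reports it
 * ordinal:  zero-based position among the CPUs being rendered
 * use_view: when true, number CPUs by position ("virtual" numbering);
 *           when false, keep the physical number so that it matches
 *           /sys/devices/system/cpu/online
 * Returns what snprintf() returns.
 */
static int stat_cpu_label(char *buf, size_t len, int physical, int ordinal,
			  bool use_view)
{
	return snprintf(buf, len, "cpu%d", use_view ? ordinal : physical);
}
```

With CPU 4 offline, physical CPU 5 is the fifth line rendered (ordinal 4): in non-view mode it is labelled cpu5, in view mode cpu4.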
It was discovered that with libfuse3 we lost the FOPEN_DIRECT_IO flag
on (struct fuse_file)->open_flags. I'm sure that this is the reason
for all the strange bugs that our users have hit recently.
cpuview: fix possible use-after-free in find_proc_stat_node
Our current lock design uses 2 sync primitives.
First (pthread_rwlock) protects hash table buckets.
Second (pthread_mutex) protects each struct cg_proc_stat
from concurrent modification. But the problem is that function
find_proc_stat_node() can return a pointer to the node
(struct cg_proc_stat) which can be freed by prune_proc_stat_history()
call *before* we take pthread_mutex. Moreover, we perform
memory release of (struct cg_proc_stat) in prune_proc_stat_list()
without any protection like refcounter or mutex on (struct cg_proc_stat).
An attempt to guess what happens in:
https://github.com/lxc/lxcfs/issues/565
https://discuss.linuxcontainers.org/t/number-of-cpus-reported-by-proc-stat-fluctuates-causing-issues/15780/14
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
cpuview: pass through personality when reading cpuinfo
Let's change the processing thread's personality if the caller's
personality is different. This allows /proc/cpuinfo to be read properly in
some cases (arm64 relies on current->personality inside the Linux kernel).
https://github.com/lxc/lxcfs/issues/553
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com>
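A minimal sketch of the switch described above, using personality(2). Passing 0xffffffff queries the current value without changing it. `switch_personality` is a hypothetical helper, and how the caller's personality is obtained is left out here.

```c
#include <sys/personality.h>

/* Query the calling thread's personality without changing it. */
static int current_personality(void)
{
	return personality(0xffffffff);
}

/*
 * Switch to `wanted` only if it differs from the current personality.
 * On success (0) the previous value is stored in *saved so that the
 * caller can restore it after reading /proc/cpuinfo.
 */
static int switch_personality(unsigned long wanted, unsigned long *saved)
{
	int cur = current_personality();

	if (cur < 0)
		return -1;

	*saved = (unsigned long)cur;
	if (*saved == wanted)
		return 0;

	return personality(wanted) < 0 ? -1 : 0;
}
```

After the read, the caller would invoke `personality(saved)` again if the value was changed, so that the worker thread does not keep the borrowed personality.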
Mathias Gibbens [Thu, 17 Nov 2022 21:57:58 +0000 (21:57 +0000)]
Fix build on ia64
The relevant code was added in commit 35acc24, but the function/macro
prctl_arg() didn't seem to be defined anywhere in the repo. lxc
currently has a corresponding macro defined in src/lxc/macro.h that
casts the value to an unsigned long. But 0 doesn't require any special
handling, so remove the call to prctl_arg().
Verified that the code compiles properly on Debian's ia64 porterbox
(yttrium).
With fuse3 `fuse_get_context` returns NULL before fuse was
fully initialized, so we must not access it.
Further, we call 'do_reload' for normal initialization as
well, so let's prevent that from re-initializing the
bindings initially and only do this on actual reloads,
otherwise we do it twice on startup.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Fixes #549
`opathdir` was used to replace `opendir` in order to ensure
`O_NOFOLLOW` and `O_CLOEXEC` were set, however it also added
`O_PATH` which prevents `readdir`/`getdents` from being used on
it, causing the `/sys/devices/system/cpu/<subdir>`
directories to be empty.
Instead, let's have an `opendir_flags` utility which simply
passes additional flags to the `open(..., O_DIRECTORY)` call
preceding `fdopendir()`.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
When introducing versioned options, we started using fuse's
"init" callback in order to tell the library to set
`can_use_sys_cpu` and `has_versioned_opts` accordingly.
However, we forgot to also do this on a reload. Fix this by
simply calling `lxcfs_fuse_init()` in `do_reload()` as well.
Additionally: ignore lxcfs_fuse_init()'s return value.
We just "passed through" the private_data from fuse which is
set via the `fuse_main()` call.
It's better to not leave this up to the library anyway in
order to make it easier to be fuse version agnostic in the
future.
Without this, issuing a reload to lxcfs would cause
files in `/sys/devices/system/cpu/` to be visible via
`readdir`, but accessing them would fail:
~ # ls /sys/devices/system/cpu/
ls: /sys/devices/system/cpu/cpuidle: No such file or directory
ls: /sys/devices/system/cpu/uevent: No such file or directory
(...)
Morten Linderud [Sun, 13 Mar 2022 11:36:50 +0000 (12:36 +0100)]
init/meson: Use libdir instead of hardcoded /lib path
Hardcoding `/lib` makes meson create a directory which would conflict on
distros with usrmerge as `/lib` is a symlink. We define `libdir` in the
top-level so we should be using that instead.
Stéphane Graber [Sun, 13 Mar 2022 03:34:15 +0000 (22:34 -0500)]
sysfs: Don't incorrectly filter entries
The filtering logic was completely skipping any entry which was 3
characters or shorter. Instead change the logic to only attempt to parse
those entries longer than 3 characters.
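A minimal sketch of the corrected logic, assuming the filter only cares about `cpu<N>` entries. `keep_entry` and `cpu_visible` are hypothetical names, and the "hide odd CPUs" policy exists only to make the example self-contained.

```c
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy used only for this sketch: hide odd-numbered CPUs. */
static bool cpu_visible(unsigned long cpu)
{
	return cpu % 2 == 0;
}

/*
 * Decide whether a directory entry should be shown.
 * Only names longer than 3 characters of the form "cpu<N>" are parsed
 * and checked; everything else, including short names such as "smt",
 * is passed through instead of being skipped.
 */
static bool keep_entry(const char *name)
{
	if (strlen(name) > 3 && strncmp(name, "cpu", 3) == 0 &&
	    isdigit((unsigned char)name[3])) {
		char *end;
		unsigned long cpu;

		errno = 0;
		cpu = strtoul(name + 3, &end, 10);
		if (errno == 0 && *end == '\0')
			return cpu_visible(cpu);
	}

	return true;
}
```

Under this sketch `smt`, `power`, and `cpufreq` are always kept, and only names like `cpu1` are subject to the CPU check.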
Morten Linderud [Sat, 12 Mar 2022 14:53:51 +0000 (15:53 +0100)]
meson: Include documentation
Documentation was removed from the build system with the migration to
meson. This implements the help2man generation which existed in the
autoconf setup.
Some of these missed the spaces in between and in order to
make this more readable and apparent, put one field per line
and add comments matching the field names.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>