Seth Forshee [Wed, 19 Aug 2020 16:04:30 +0000 (11:04 -0500)]
UBUNTU: SAUCE: i915: Fix build error due to missing struct definition
FTBFS in v5.9-rc1:
In file included from /tmp/kernel-sforshee-f5108e59edd8-jyEs/build/drivers/gpu/drm/i915/i915_active.h:12,
from /tmp/kernel-sforshee-f5108e59edd8-jyEs/build/drivers/gpu/drm/i915/gt/intel_context_param.c:6:
/tmp/kernel-sforshee-f5108e59edd8-jyEs/build/drivers/gpu/drm/i915/i915_active_types.h:35:22: error: field 'rwsem' has incomplete type
35 | struct rw_semaphore rwsem;
| ^~~~~
Fix by adding an include to provide the definition.
Daniel Axtens [Thu, 2 Apr 2020 05:16:32 +0000 (16:16 +1100)]
UBUNTU: SAUCE: (lockdown) powerpc: lock down kernel in secure boot mode
BugLink: https://bugs.launchpad.net/bugs/1855668
PowerNV has recently gained Secure Boot support. If it's enabled through
the firmware and bootloader stack, then lock down the kernel.
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Thu, 10 Oct 2019 15:57:25 +0000 (10:57 -0500)]
UBUNTU: SAUCE: (lockdown) arm64: Allow locking down the kernel under EFI secure boot
Add support to arm64 for the CONFIG_LOCK_DOWN_IN_EFI_SECURE_BOOT
option. When it is enabled, the lockdown LSM will be enabled with
maximum confidentiality when booted under EFI secure boot.
Based on an earlier patch by Linn Crosetto.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[v2: ported to 5.7-rc1 and adapted to the new fdt parsing mechanism]
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Robert Holmes [Tue, 23 Apr 2019 07:39:29 +0000 (07:39 +0000)]
UBUNTU: SAUCE: (lockdown) KEYS: Make use of platform keyring for module signature verify
This patch completes commit 278311e417be ("kexec, KEYS: Make use of
platform keyring for signature verify") which, while adding the
platform keyring for bzImage verification, neglected to also add
this keyring for module verification.
As such, kernel modules signed with keys from the MokList variable
were not successfully verified.
Signed-off-by: Robert Holmes <robeholmes@gmail.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
(cherry picked from commit 0d32c182cdbd50dd2fc8d2063d00705d0052387c
git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
David Howells [Mon, 30 Sep 2019 21:28:16 +0000 (21:28 +0000)]
UBUNTU: SAUCE: (lockdown) efi: Lock down the kernel if booted in secure boot mode
UEFI Secure Boot provides a mechanism for ensuring that the firmware
will only load signed bootloaders and kernels. Certain use cases may
also require that all kernel modules be signed. Add a
configuration option to lock down the kernel (which includes
requiring validly signed modules) if the kernel is secure-booted.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
(cherry picked from commit 2e3e75ce5dfddec32741d4f75d007fbc61aedf39
git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
David Howells [Tue, 27 Feb 2018 10:04:55 +0000 (10:04 +0000)]
UBUNTU: SAUCE: (lockdown) efi: Add an EFI_SECURE_BOOT flag to indicate secure boot mode
UEFI machines can be booted in Secure Boot mode. Add an EFI_SECURE_BOOT
flag that can be passed to efi_enabled() to find out whether secure boot is
enabled.
Move the switch-statement in x86's setup_arch() that interprets the
secure_boot boot parameter to generic code and set the bit there.
Suggested-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
cc: linux-efi@vger.kernel.org
[Rebased for context; efi_is_table_address was moved to arch/x86]
Signed-off-by: Jeremy Cline <jcline@redhat.com>
(cherry picked from commit a080e08b637d48dc9bdf4367447e47948f6d98b8
git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Peter Jones [Mon, 2 Oct 2017 22:18:30 +0000 (18:18 -0400)]
UBUNTU: SAUCE: (lockdown) Make get_cert_list() use efi_status_to_str() to print error messages.
Signed-off-by: Peter Jones <pjones@redhat.com>
Signed-off-by: Jeremy Cline <jcline@redhat.com>
(cherry picked from commit 63ca37a77ff29e3951a77a9a1d30f9fbb714ed79
git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Peter Jones [Mon, 2 Oct 2017 22:22:13 +0000 (18:22 -0400)]
UBUNTU: SAUCE: (lockdown) Add efi_status_to_str() and rework efi_status_to_err().
This adds efi_status_to_str() for use when printing efi_status_t
messages, and reworks efi_status_to_err() so that the two use a common
list of errors.
Signed-off-by: Peter Jones <pjones@redhat.com>
(cherry picked from commit 910a6db8a4b0f38f38a4aa61a0fc473182795151
git://git.kernel.org/pub/scm/linux/kernel/git/jwboyer/fedora.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
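To illustrate the "common list of errors" design, here is a minimal userspace sketch in which both helpers walk one shared table, so the errno mapping and the printable text cannot drift apart. The enum values are placeholders (real efi_status_t error codes set the high bit), the description strings are illustrative, and the errno choices reflect my reading of the kernel's efi_status_to_err(); this is not the patch's code.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Placeholder status codes only: real efi_status_t values are
 * unsigned longs with the high bit set for errors.
 */
typedef enum {
	EFI_SUCCESS,
	EFI_INVALID_PARAMETER,
	EFI_OUT_OF_RESOURCES,
	EFI_DEVICE_ERROR,
	EFI_WRITE_PROTECTED,
	EFI_SECURITY_VIOLATION,
	EFI_NOT_FOUND,
	EFI_ABORTED,
	EFI_UNSUPPORTED /* deliberately absent from the table below */
} efi_status_t;

/* One row of the shared list: status, negative errno, printable text. */
struct efi_error_entry {
	efi_status_t status;
	int err;
	const char *desc;
};

static const struct efi_error_entry efi_error_list[] = {
	{ EFI_SUCCESS,            0,       "Success" },
	{ EFI_INVALID_PARAMETER,  -EINVAL, "Invalid Parameter" },
	{ EFI_OUT_OF_RESOURCES,   -ENOSPC, "Out of Resources" },
	{ EFI_DEVICE_ERROR,       -EIO,    "Device Error" },
	{ EFI_WRITE_PROTECTED,    -EROFS,  "Write Protected" },
	{ EFI_SECURITY_VIOLATION, -EACCES, "Security Violation" },
	{ EFI_NOT_FOUND,          -ENOENT, "Not Found" },
	{ EFI_ABORTED,            -EINTR,  "Aborted" },
};

#define EFI_ERROR_COUNT (sizeof(efi_error_list) / sizeof(efi_error_list[0]))

/* Both public helpers go through this single lookup. */
static const struct efi_error_entry *efi_error_lookup(efi_status_t status)
{
	for (size_t i = 0; i < EFI_ERROR_COUNT; i++)
		if (efi_error_list[i].status == status)
			return &efi_error_list[i];
	return NULL;
}

int efi_status_to_err(efi_status_t status)
{
	const struct efi_error_entry *e = efi_error_lookup(status);

	return e ? e->err : -EINVAL; /* unknown status: generic -EINVAL */
}

const char *efi_status_to_str(efi_status_t status)
{
	const struct efi_error_entry *e = efi_error_lookup(status);

	return e ? e->desc : "Unknown error";
}
```

Adding a new status then means adding one table row rather than touching two switch statements.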
Seth Forshee [Tue, 11 Aug 2020 19:52:12 +0000 (14:52 -0500)]
UBUNTU: hio -- Update to use bio_{start,end}_io_acct with 5.8+
Since e722fff238bb ("block: remove generic_{start,end}_io_acct"),
the generic I/O accounting interfaces are no longer available.
Switch to using the replacements.
Andrea Righi [Thu, 30 Jul 2020 15:31:37 +0000 (17:31 +0200)]
UBUNTU: SAUCE: apply a workaround to re-enable CONFIG_CRYPTO_AEGIS128_SIMD
After the update to gcc 10 we started to experience the following build
errors on ARM:
crypto/aegis128-neon-inner.c: In function 'crypto_aegis128_init_neon':
crypto/aegis128-neon-inner.c:151:3: error: incompatible types when initializing type 'unsigned char' using type 'uint8x16_t'
151 | k ^ vld1q_u8(const0),
| ^
crypto/aegis128-neon-inner.c:152:3: error: incompatible types when initializing type 'unsigned char' using type 'uint8x16_t'
152 | k ^ vld1q_u8(const1),
| ^
This seems to be a gcc bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96377
The workaround (suggested in the bug report) is to enforce a cast to
uint8x16_t.
Apply the workaround so that we can re-enable the driver disabled by 7c950e057db6 ("UBUNTU: [Config] disable CONFIG_CRYPTO_AEGIS128_SIMD").
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
UBUNTU: SAUCE: Revert "radix-tree: Use local_lock for protection"
This reverts commit cfa6705d89b6562f79c40c249f8d94073c4276e4. It
adds a GPL-only export which is leaking into nvidia module
builds. This is being discussed upstream, but revert the change
in the meantime. This is harmless, as the change is really for
RT builds and was not intended to have any functional change
outside of that context.
This is caused by net_cls and net_prio cgroups disabling cgroup BPF and
causing it to stop refcounting when allocating new sockets. Releasing those
sockets will cause the refcount to go negative, leading to the potential
use-after-free.
Though this revert won't prevent the issue from happening as it could still
theoretically be caused by setting net_cls.classid or net_prio.ifpriomap,
this will prevent it from happening on default system configurations. A
combination of systemd use of cgroup BPF and extensive cgroup use including
net_prio will cause this. Reports usually involve using lxd, libvirt,
docker or kubernetes and some systemd service with IPAddressDeny or
IPAddressAllow.
And though this patch has been introduced to avoid some potential memory
leaks, the cure is worse than the disease. We will need to revisit both
issues later on and reapply this patch when we have a real fix for the
crash.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Acked-by: Ian May <ian.may@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
import tempfile


def test():
    with tempfile.TemporaryFile() as fd:
        fd.write("data".encode('utf-8'))
        # re-open the file to get a read-only file descriptor
        return open(f"/proc/self/fd/{fd.fileno()}", "r")


def main():
    fd = test()
    fd.close()


if __name__ == "__main__":
    main()
A similar issue was reported here:
https://github.com/systemd/systemd/issues/14861
Our revalidate methods were very opinionated about whether or not a
lower dentry was valid. In particular, when a lower dentry became
unlinked we simply invalidated it, which caused the above bug to surface.
This has led to bugs where ESTALE was returned for, e.g., temporary files
that were created and directly re-opened afterwards through
/proc/<pid>/fd/<nr-of-deleted-file>. When a file is re-opened through
/proc/<pid>/fd/<nr>, LOOKUP_JUMP is set and the VFS will revalidate via
d_weak_revalidate(). Since the file has been unhashed or has even already
gone negative, we'd fail the open when we should have succeeded.
Navid Emamdoost [Tue, 16 Jun 2020 11:08:49 +0000 (08:08 -0300)]
UBUNTU: SAUCE: nbd_genl_status: null check for nla_nest_start
CVE-2019-16089
nla_nest_start may fail and return NULL. The check is inserted, and
errno is selected based on other call sites within the same source code.
Update: removed extra new line.
v3 Update: added releasing the reply on failure, thanks to Michal Kubecek
for pointing this out.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Reviewed-by: Michal Kubecek <mkubecek@suse.cz>
Acked-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Ben Hutchings [Tue, 16 Aug 2016 16:27:00 +0000 (10:27 -0600)]
UBUNTU: SAUCE: security,perf: Allow further restriction of perf_event_open
https://lkml.org/lkml/2016/1/11/587
This is the GRKERNSEC_PERF_HARDEN feature extracted from grsecurity. It adds
the option to disable perf_event_open() entirely for unprivileged users.
This standalone version doesn't include making the variable read-only
(or renaming it).
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.
This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making
the variable read-only. It also allows enabling further restriction
at run-time regardless of whether the default is changed.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[ saf: resolve conflicts with v5.8-rc1 ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
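A minimal sketch of the level-3 gate, assuming the sysctl is modelled as a plain int and capable(CAP_SYS_ADMIN) as a boolean parameter. The variable and function names are hypothetical, and the checks the kernel already applies at lower paranoia levels are deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in for the kernel.perf_event_paranoid sysctl (assumed plain int). */
static int perf_event_paranoid = 2;

/* Level 3 and above: no perf events at all for unprivileged users. */
static bool paranoid_disallow_all(void)
{
	return perf_event_paranoid > 2;
}

/*
 * Hypothetical gate at the top of perf_event_open(); has_cap_sys_admin
 * stands in for capable(CAP_SYS_ADMIN). Only the new level-3 check is
 * modelled here.
 */
int perf_event_open_check(bool has_cap_sys_admin)
{
	if (paranoid_disallow_all() && !has_cap_sys_admin)
		return -EACCES;
	return 0;
}
```

Because the check reads the sysctl at call time, raising the value to 3 at run time takes effect immediately, which matches the description that further restriction can be enabled regardless of the Kconfig default.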
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c: In function 'shiftfs_fiemap':
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c:731:13: error: dereferencing pointer to incomplete type 'struct fiemap_extent_info'
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c:731:26: error: 'FIEMAP_FLAG_SYNC' undeclared (first use in this function); did you mean 'FS_XFLAG_SYNC'?
It seems that shiftfs was getting linux/fiemap.h included
indirectly before. Include it directly.
UBUNTU: SAUCE: shiftfs: let userns root destroy subvolumes from other users
BugLink: https://bugs.launchpad.net/bugs/1879688
Stéphane reported a bug found during NorthSec, which makes heavy use of
shiftfs. When a subvolume or snapshot is created as userns root in the
container and then chowned to another user, a delete as the root user
will fail. The reason for this is that we drop all capabilities as a
safety measure before calling btrfs ioctls. The only workable fix I
could think of is to retain the CAP_DAC_OVERRIDE capability for the
BTRFS_IOC_SNAP_DESTROY ioctl. All other solutions would be way more
invasive.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
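A sketch of the capability handling described above, using a simplified 64-bit mask in place of kernel_cap_t. The capability bit numbers match <linux/capability.h>; the function name, and the choice to keep CAP_DAC_OVERRIDE only if the caller already had it, are assumptions about what "retain" means in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Capability bit numbers as in <linux/capability.h>. */
#define CAP_DAC_OVERRIDE 1
#define CAP_SYS_ADMIN    21

typedef uint64_t cap_mask_t; /* simplified stand-in for kernel_cap_t */
#define CAP_BIT(c) ((cap_mask_t)1 << (c))

/*
 * Hypothetical: the effective set used while forwarding a btrfs ioctl.
 * All capabilities are dropped as a safety measure, except that
 * CAP_DAC_OVERRIDE survives (if present) for BTRFS_IOC_SNAP_DESTROY,
 * so userns root can delete subvolumes chowned to other users.
 */
cap_mask_t btrfs_ioctl_effective_caps(cap_mask_t caller, bool is_snap_destroy)
{
	cap_mask_t keep = is_snap_destroy ? CAP_BIT(CAP_DAC_OVERRIDE) : 0;

	return caller & keep;
}
```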
Seth Forshee [Tue, 3 Mar 2020 17:09:31 +0000 (11:09 -0600)]
UBUNTU: SAUCE: selftests/net -- disable timeout
Some of our net selftests are timing out in autopkgtest. These
tests pass when run in a different (presumably faster)
environment. It appears that we can't disable the timeout for
individual test cases, so disable the timeout for the net
selftests globally.
Seth Forshee [Tue, 3 Mar 2020 17:23:25 +0000 (11:23 -0600)]
UBUNTU: SAUCE: selftests/net -- disable l2tp.sh test
Our autotest infrastructure tries to disable the test by making
it not executable, but the kselftest runner regards this as an
error. Remove the test from the net selftest makefile to avoid
this.
BugLink: http://bugs.launchpad.net/bugs/1628889
Add support for automatic message tags to the printk macro
families dev_xyz and pr_xyz. The message tag consists of a
component name and a 24 bit hash of the message text. For
each message that is documented in the included kernel message
catalog a man page can be created with a script (which is
included in the patch). The generated man pages contain
explanatory text that is intended to help understand the
messages.
Note that only s390 specific messages are prepared
appropriately and included in the generated message catalog.
This patch is optional as it is very unlikely to be accepted
in the upstream kernel, but is recommended for all distributions
which are built based on the 'Development stream'.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[ saf: Adjust context, fixes for errors caused by 663336ee2628
"device: Add #define dev_fmt similar to #define pr_fmt" ]
[ saf: Adjust context for v5.7-rc, update for move of device
print definitions to dev_printk.h ]
[ saf: Fix yet more conflicts, this time with pr_* macro changes
in v5.8-rc1 ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Andy Whitcroft [Fri, 19 Oct 2018 16:44:53 +0000 (16:44 +0000)]
UBUNTU: SAUCE: overlayfs: ensure mounter privileges when reading directories
BugLink: https://launchpad.net/bugs/1793458
When reading directory contents ensure the mounter has permissions for
the operation over the constituent parts (lower and upper). Where we are
in a namespace this ensures that the mounter (root in that namespace)
has permissions over the files and directories, preventing exposure of
protected files and directory contents.
CVE-2018-6559
Signed-off-by: Andy Whitcroft <apw@canonical.com>
[tyhicks: make use of new upstream check in ovl_permission() for copy-ups]
[tyhicks: make use of creator (mounter) creds hanging off the super block]
Signed-off-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: shiftfs: record correct creator credentials
BugLink: https://bugs.launchpad.net/bugs/1872094
When shiftfs is nested, we were unable to create any files or
access directories because we recorded the wrong creator credentials. We
need to record the credentials of the creator of the lower mark mount
of shiftfs. Otherwise we aren't privileged with respect to the shiftfs
layer in the nesting case. This is similar to how we always record the
user namespace of the base filesystem.
Suggested-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit 581e260 ("block: move block layer internals out of include/linux/genhd.h")
hid disk_map_sector_rcu() (and other blk APIs) from driver code, so add the
prototype back locally.
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
...
WARNING: modpost: drivers/platform/x86/dell-uart-backlight.o(.text+0x979): Section mismatch in reference from the function dell_uart_bl_add() to the variable .init.rodata:dell_uart_backlight_alpha_platform
The function dell_uart_bl_add() references
the variable __initconst dell_uart_backlight_alpha_platform.
This is often because dell_uart_bl_add lacks a __initconst
annotation or the annotation of dell_uart_backlight_alpha_platform is wrong.
dell_uart_bl_add(), which has no __init annotation, references the
__initconst variable dell_uart_backlight_alpha_platform. Fix it by
removing the __initconst annotation from the variable.
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Aaron Ma [Thu, 2 Apr 2020 04:22:28 +0000 (12:22 +0800)]
UBUNTU: SAUCE: e1000e: bump up timeout to wait when ME un-configure ULP mode
BugLink: https://bugs.launchpad.net/bugs/1865570
ME takes more than 2 seconds to finish un-configuring ULP mode after
resume from s2idle on some ThinkPad laptops.
Without a long enough wait, reset and re-init fail with an error.
Fixes: f15bb6dde738cc8fa0 ("e1000e: Add support for S0ix")
BugLink: https://bugs.launchpad.net/bugs/1865570
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
(cherry picked from commit 29ae4ba09c15869dd3dc9b76f92d9d6d50249885
https://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue.git dev-queue)
Signed-off-by: Aaron Ma <aaron.ma@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Andrea Righi [Mon, 9 Mar 2020 17:22:40 +0000 (18:22 +0100)]
UBUNTU: SAUCE: ptp: free ptp clock properly
There is a bug in ptp_clock_unregister() where pps_unregister_source()
can free up resources needed by posix_clock_unregister() to properly
destroy a related sysfs device.
Fix this by calling pps_unregister_source() in ptp_clock_release().
BugLink: https://bugs.launchpad.net/bugs/1864754
Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev")
Tested-by: Piotr Morgwai Kotarbiński <foss@morgwai.pl>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Andrea Righi [Fri, 27 Mar 2020 18:09:21 +0000 (19:09 +0100)]
UBUNTU: SAUCE: mm/page_alloc.c: disable memory reclaim watermark boosting by default
BugLink: https://bugs.launchpad.net/bugs/1861359
High watermark boosting can cause large swap activity under certain
memory intensive workloads, making the system very unresponsive (screen
does not refresh, keyboard not responding, etc.).
Disable this feature by default to prevent potential large swap
activity.
Signed-off-by: Sultan Alsawaf <sultan.alsawaf@canonical.com>
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
AceLan Kao [Tue, 10 Mar 2020 01:48:36 +0000 (09:48 +0800)]
UBUNTU: SAUCE: r8169: disable ASPM L1.1
BugLink: https://bugs.launchpad.net/bugs/1836030
r8169 doesn't support ASPM L1.1, so we don't have to disable ASPM
completely. Disabling only ASPM L1.1 doesn't affect power consumption,
and the network function keeps working after 30 rounds of S3 testing.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
You-Sheng Yang [Mon, 16 Mar 2020 09:27:21 +0000 (17:27 +0800)]
UBUNTU: SAUCE: Input: i8042 - fix the selftest retry logic
BugLink: https://bugs.launchpad.net/bugs/1866734
It returns -ENODEV at the first selftest timeout, so the retry logic
doesn't work. Move the return outside of the while loop so that it really
retries 5 times before returning -ENODEV.
Also, the original loop would retry 6 times; fix this as well.
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
(backported from
https://lore.kernel.org/linux-input/20200310033640.14440-1-vicamo@gmail.com/)
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
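A userspace sketch of the corrected loop shape: a timeout is treated like any other failed attempt, the loop runs at most five times, and -ENODEV is returned only after the loop ends. run_once and the scripted test double are hypothetical stand-ins for issuing the controller selftest command and reading its reply; the delay between attempts is omitted.

```c
#include <errno.h>

#define SELFTEST_ATTEMPTS 5

/* Hypothetical outcome of one controller selftest command. */
enum selftest_result { SELFTEST_OK, SELFTEST_BAD_REPLY, SELFTEST_TIMEOUT };

/* Try up to SELFTEST_ATTEMPTS times; a timeout no longer aborts early. */
int controller_selftest(enum selftest_result (*run_once)(void *), void *ctx)
{
	for (int i = 0; i < SELFTEST_ATTEMPTS; i++) {
		if (run_once(ctx) == SELFTEST_OK)
			return 0;
		/* the real driver sleeps between attempts; omitted here */
	}
	return -ENODEV; /* only after every attempt has failed */
}

/* Test double: replays a scripted sequence of results and counts calls. */
struct scripted_device {
	const enum selftest_result *script;
	int len;
	int calls;
};

static enum selftest_result scripted_run(void *ctx)
{
	struct scripted_device *dev = ctx;
	enum selftest_result r =
		dev->calls < dev->len ? dev->script[dev->calls] : SELFTEST_TIMEOUT;

	dev->calls++;
	return r;
}
```

Bounding the loop with `i < SELFTEST_ATTEMPTS` makes the attempt count explicit, which avoids the off-by-one that produced six attempts.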
AceLan Kao [Wed, 12 Feb 2020 06:53:15 +0000 (14:53 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: increase retry times
BugLink: https://bugs.launchpad.net/bugs/1862885
According to the ODM, the scalar takes some time to activate the panel
during boot, and it can't respond to UART commands within 1 second.
So, we added a retry and wait 2 seconds for the response. But sometimes
it still fails to read the response.
During boot, it sometimes takes more than 2 seconds to respond to
the first command, so we enlarge the retry timeout from 2 seconds to 5
seconds to make sure we get the first response from the scalar.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: You-Sheng Yang <vicamo.yang@canonical.com>
Acked-by: Anthony Wong <anthony.wong@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Andrea Righi [Wed, 12 Feb 2020 09:39:42 +0000 (10:39 +0100)]
UBUNTU: hio -- proc_create() requires a "struct proc_ops" in 5.6
With d56c0d45f0e27f814e87a1676b6bdccccbc252e9 ("proc: decouple proc from
VFS with "struct proc_ops"") proc_create() requires a "struct proc_ops"
instead of a "struct file_operations". Change the code accordingly.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Stefan Bader [Wed, 15 Jan 2020 09:14:28 +0000 (10:14 +0100)]
UBUNTU: SAUCE: md/raid0: Use kernel specific layout
BugLink: https://bugs.launchpad.net/bugs/1850540
This allows rolling out support for the alternate layout, which was
accidentally introduced in kernel v3.14, without causing
breakage on reboot. The real danger is moving between a 3.13 or
older kernel and any newer. This either has already happened and
the damage has potentially been done or is not yet immediate or
not happening at all (if the raid0 array was created by a 3.14+
kernel). So it is better to just warn from the kernel or once the
user-space tool supporting meta-data update gets rolled out, from
there as well.
Once user-space support is in place, and after a bit of waiting time,
this change should be reverted.
UBUNTU: SAUCE: shiftfs: prevent lower dentries from going negative during unlink
BugLink: https://bugs.launchpad.net/bugs/1860041
All non-special files (for shiftfs, special files only include fifos and,
for this case, unix sockets, since we don't allow character and block
devices to be created) go through shiftfs_open() and have their dentry
pinned through this codepath preventing it from going negative. But
fifos don't use the shiftfs fops but rather use the pipefifo_fops which
means they do not go through shiftfs_open() and thus don't have their
dentry pinned that way. Thus, the lower dentries for such files can go
negative on unlink causing segfaults. The following C program can be
used to reproduce the crash:
AceLan Kao [Wed, 8 Jan 2020 07:59:45 +0000 (15:59 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add retry for get scalar status
BugLink: https://bugs.launchpad.net/bugs/1858761
On new platforms, the UART was found to need more than 1 second to respond
to commands during the first 10 seconds after boot.
dell_uart_get_scalar_status() is the first command we send to the scalar,
so it needs to be more reliable than the other commands: we have to make
sure we get a correct response from the scalar. So, add a retry and
increase the read timeout to 2 seconds.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Connor Kuehl <connor.kuehl@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
dann frazier [Wed, 18 Dec 2019 14:29:30 +0000 (07:29 -0700)]
UBUNTU: SAUCE: md/raid0: Link to wiki with guidance on multi-zone RAID0 layout migration
BugLink: https://bugs.launchpad.net/bugs/1850540
Helping an administrator understand this issue and how to deal with it
requires more text than achievable in a kernel error message. Let's
clarify the issue in the Ubuntu wiki, and have the kernel emit a link
to it.
I've submitted a similar change upstream:
https://marc.info/?l=linux-raid&m=157360088014027&w=2
Should it get merged, we should consider replacing this patch with that one.
Otherwise, it is probably safe to drop this SAUCE patch after focal.
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kai-Heng Feng [Thu, 5 Dec 2019 17:05:27 +0000 (01:05 +0800)]
UBUNTU: SAUCE: USB: core: Attempt power cycle port when it's in eSS.Disabled state
BugLink: https://bugs.launchpad.net/bugs/1855312
On Dell TB16, Realtek USB ethernet (r8152) connects to an SMSC hub which
then connects to ASMedia xHCI's root hub:
/: Bus 04.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/2p, 5000M
|__ Port 1: Dev 2, If 0, Class=Hub, Driver=hub/7p, 5000M
|__ Port 2: Dev 3, If 0, Class=Vendor Specific Class, Driver=r8152, 5000M
Bus 004 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 004 Device 002: ID 0424:5537 Standard Microsystems Corp. USB5537B
Bus 004 Device 003: ID 0bda:8153 Realtek Semiconductor Corp. RTL8153 Gigabit Ethernet Adapter
The SMSC hub may disconnect after system resume from suspend. When this
happens, the reset resume attempt fails, and the last resort, disabling
the port to see if something comes up later, also fails.
When the issue occurs, the link state stays in eSS.Disabled state
despite the warm reset attempts. According to the spec this can be caused
by invalid VBus; after some experiments, we found the SMSC hub can be
brought back after a power cycle.
So let's power cycle the port at the end of reset resume attempt, if
it's in eSS.Disabled state.
Kai-Heng Feng [Thu, 5 Dec 2019 17:05:26 +0000 (01:05 +0800)]
UBUNTU: SAUCE: USB: core: Make port power cycle a separate helper function
BugLink: https://bugs.launchpad.net/bugs/1855312
Add a new function, hub_port_power_cycle(), to power cycle a port.
It'll be used by a following patch.
In addition to that, check the return value of usb_hub_set_port_power(),
so we don't need to wait if the set power operation fails.
Furthermore, remove parameter *hdev from usb_hub_set_port_power(), since
we can get *hdev from *hub directly.
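A sketch of the helper's control flow: turn the port off, wait, turn it back on, and return before waiting if the first power request fails. It takes only the hub, mirroring the removal of *hdev. struct demo_hub, its callbacks and the delay parameter are hypothetical; the delay the real hub_port_power_cycle() uses is not stated here.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical hub operations; names are illustrative only. */
struct demo_hub {
	int (*set_port_power)(struct demo_hub *hub, int port, bool on);
	void (*wait_ms)(struct demo_hub *hub, unsigned int ms);
};

/* Power cycle one port; skip the wait if switching power off fails. */
int hub_port_power_cycle_sketch(struct demo_hub *hub, int port,
				unsigned int delay_ms)
{
	int ret = hub->set_port_power(hub, port, false);

	if (ret)
		return ret; /* no point waiting: the port never went down */
	hub->wait_ms(hub, delay_ms);
	return hub->set_port_power(hub, port, true);
}

/* Test double that records calls; fail_off makes the "off" request fail. */
struct fake_hub {
	struct demo_hub ops; /* first member, so the casts below are valid */
	int fail_off;
	int power_calls;
	unsigned int waited_ms;
};

static int fake_set_port_power(struct demo_hub *hub, int port, bool on)
{
	struct fake_hub *f = (struct fake_hub *)hub;

	(void)port;
	f->power_calls++;
	return (!on && f->fail_off) ? -EIO : 0;
}

static void fake_wait_ms(struct demo_hub *hub, unsigned int ms)
{
	struct fake_hub *f = (struct fake_hub *)hub;

	f->waited_ms += ms;
}
```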
UBUNTU: SAUCE: net: ena: fix too long default tx interrupt moderation interval
BugLink: https://bugs.launchpad.net/bugs/1853180
Current default non-adaptive tx interrupt moderation interval is 196 us.
This commit sets it to 0, which is much more sensible as a default value.
It can be modified using ethtool -C.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reference: https://lore.kernel.org/netdev/1572868728-5211-1-git-send-email-akiyano@amazon.com/
Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Fri, 1 Nov 2019 18:35:25 +0000 (13:35 -0500)]
UBUNTU: SAUCE: shiftfs: Correct id translation for lower fs operations
BugLink: https://bugs.launchpad.net/bugs/1850867
Several locations that shift ids translate user/group ids before
performing operations in the lower filesystem, but they translate
them into init_user_ns, whereas they should be translated into
the s_user_ns of the lower filesystem. This results in using
ids other than the intended ones in the lower fs, which will
likely not map into the shift's s_user_ns.
Change these sites to use shift_k[ug]id() to do a translation
into the s_user_ns of the lower filesystem.
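A minimal model of why the target namespace matters, using single-extent id maps like one line of /proc/<pid>/uid_map. shift_kid() follows the from_kuid()-then-make_kuid() pattern I understand shiftfs's shift_k[ug]id() to use; the types and names here are simplified and hypothetical.

```c
#include <stdint.h>

#define INVALID_ID UINT32_MAX

/* One uid_map line: ids [first, first+count) inside the namespace
 * correspond to [lower_first, lower_first+count) in the kernel-wide space. */
struct id_map {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Namespace id -> kernel id, in the spirit of make_kuid(). */
uint32_t map_to_kernel(const struct id_map *m, uint32_t id)
{
	if ((uint32_t)(id - m->first) < m->count)
		return m->lower_first + (id - m->first);
	return INVALID_ID;
}

/* Kernel id -> namespace id, in the spirit of from_kuid(). */
uint32_t map_from_kernel(const struct id_map *m, uint32_t kid)
{
	if ((uint32_t)(kid - m->lower_first) < m->count)
		return m->first + (kid - m->lower_first);
	return INVALID_ID;
}

/* Re-express a kernel id seen through `from` as a kernel id under `to`;
 * for lower fs operations `to` must be the lower fs's s_user_ns. */
uint32_t shift_kid(const struct id_map *from, const struct id_map *to,
		   uint32_t kid)
{
	uint32_t id = map_from_kernel(from, kid);

	return id == INVALID_ID ? INVALID_ID : map_to_kernel(to, id);
}
```

With the shiftfs namespace mapping 0 to 100000 and the lower filesystem's namespace mapping 0 to 200000, container root (kernel id 100000) should become 200000 in the lower fs; translating into init_user_ns instead yields 0, i.e. real root.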
Quoting Jann Horn:
#################### Bug 2: Type confusion ####################
shiftfs_btrfs_ioctl_fd_replace() calls fdget(oldfd), then without further checks
passes the resulting file* into shiftfs_real_fdget(), which does this:
	/* Did the flags change since open? */
	if (unlikely(file->f_flags & ~lowerfd->file->f_flags))
		return shiftfs_change_flags(lowerfd->file, file->f_flags);
	return 0;
}
file->private_data is a void* that points to a filesystem-dependent type; and
some filesystems even use it to store a type-cast number instead of a pointer.
The implicit cast to a "struct shiftfs_file_info *" can therefore be a bad cast.
As a PoC, here I'm causing a type confusion between struct shiftfs_file_info
(with ->realfile at offset 0x10) and struct mm_struct (with vmacache_seqnum at
offset 0x10), and I use that to cause a memory dereference somewhere around
0x4242:
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
[ saf: use f_op->open instead as special inodes in shiftfs sbs
will not use shiftfs open f_ops ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
CVE-2019-15792
Acked-by: Tyler Hicks <tyhicks@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
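A userspace sketch of the fix's idea: only reinterpret private_data after confirming that the file was opened through our own open method, since any other filesystem may store something else there (even a cast integer). The structs and function names are hypothetical stand-ins for struct file and the shiftfs helpers.

```c
#include <errno.h>
#include <stddef.h>

struct demo_file;

struct demo_file_ops {
	int (*open)(struct demo_file *f);
};

struct demo_file {
	const struct demo_file_ops *f_op;
	void *private_data; /* meaning depends on which filesystem owns f */
};

/* Hypothetical per-file state stored only by "our" filesystem. */
struct our_file_info {
	struct demo_file *realfile;
};

static int our_open(struct demo_file *f) { (void)f; return 0; }
static int other_open(struct demo_file *f) { (void)f; return -ENOSYS; }

const struct demo_file_ops our_fops = { our_open };
const struct demo_file_ops other_fops = { other_open };

/* Refuse files we did not open instead of blindly casting private_data. */
int get_real_file(struct demo_file *f, struct demo_file **real)
{
	if (!f->f_op || f->f_op->open != our_open)
		return -EINVAL;
	*real = ((struct our_file_info *)f->private_data)->realfile;
	return 0;
}
```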
Seth Forshee [Fri, 1 Nov 2019 15:41:03 +0000 (10:41 -0500)]
UBUNTU: SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling
BugLink: https://bugs.launchpad.net/bugs/1850867
shiftfs_btrfs_ioctl_fd_replace() installs an fd referencing a
file from the lower filesystem without taking an additional
reference to that file. After the btrfs ioctl completes this fd
is closed, which then puts a reference to that file, leading to a
refcount underflow. Original bug report and test case from Jann
Horn is below.
Fix this, and at the same time simplify the management of the fd
to the lower file for the ioctl. In
shiftfs_btrfs_ioctl_fd_replace(), take the missing reference to
the lower file and set FDPUT_FPUT so that this reference will get
dropped on fdput() in error paths. Do not maintain the struct fd
in the caller, as the fd installed in the fd table is
sufficient to properly clean up. Finally, remove the fdput() in
shiftfs_btrfs_ioctl_fd_restore() as it is redundant with the
__close_fd() call.
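A minimal refcount model of the bug and the fix: the fd table takes over one counted reference on install and drops it on close, so whoever installs a file must own a reference to hand over. All names are hypothetical; a starting count of 1 stands for whoever legitimately holds the lower file.

```c
#include <stddef.h>

/* Hypothetical stand-in for struct file: just a reference count. */
struct ref_file {
	int refcount;
};

static void file_get(struct ref_file *f) { f->refcount++; }
static void file_put(struct ref_file *f) { f->refcount--; } /* no freeing */

/* One fd table slot; it owns exactly one counted reference to its file. */
struct fd_slot {
	struct ref_file *file;
};

/* Like fd_install(): the slot takes over a reference the caller holds. */
static void slot_install(struct fd_slot *slot, struct ref_file *f)
{
	slot->file = f;
}

/* Like closing the fd: drop the slot's reference. */
void close_slot(struct fd_slot *slot)
{
	file_put(slot->file);
	slot->file = NULL;
}

/* Buggy: installs the lower file without owning a reference to give. */
void replace_buggy(struct fd_slot *slot, struct ref_file *lower)
{
	slot_install(slot, lower);
}

/* Fixed: take the reference that the install is about to consume. */
void replace_fixed(struct fd_slot *slot, struct ref_file *lower)
{
	file_get(lower);
	slot_install(slot, lower);
}
```

In the buggy path the count reaches 0 while the original holder still believes it owns a reference, so that holder's eventual put underflows, which is the overdecrement described in the report.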
Original report from Jann Horn:
In shiftfs_btrfs_ioctl_fd_replace() ("//" comments added by me):
	src = fdget(oldfd);
	if (!src.file)
		return -EINVAL;
	// src holds one reference (assuming multithreaded execution)

	ret = shiftfs_real_fdget(src.file, lfd);
	// lfd->file is a file* now, but shiftfs_real_fdget didn't take any
	// extra references
	fdput(src);
	// this drops the only reference we were holding on src, and src was
	// the only thing holding a reference to lfd->file. lfd->file may be
	// dangling at this point.
	if (ret)
		return ret;

	*newfd = get_unused_fd_flags(lfd->file->f_flags);
	if (*newfd < 0) {
		// always a no-op
		fdput(*lfd);
		return *newfd;
	}

	fd_install(*newfd, lfd->file);
	// fd_install() consumes a counted reference, but we don't hold any
	// counted references. so at this point, if lfd->file hasn't been freed
	// yet, its refcount is one lower than it ought to be.

	[...]

	// the following code is refcount-neutral, so the refcount stays one too
	// low.
	if (ret)
		shiftfs_btrfs_ioctl_fd_restore(cmd, *lfd, *newfd, arg, v1, v2);
	/* Did the flags change since open? */
	if (unlikely(file->f_flags & ~lowerfd->file->f_flags))
		return shiftfs_change_flags(lowerfd->file, file->f_flags);
	return 0;
}
Therefore, the following PoC will cause reference count overdecrements; I ran it
with SLUB debugging enabled and got the following splat:
=======================================
user@ubuntu1910vm:~/shiftfs$ cat run.sh
sync
unshare -mUr ./run2.sh
t run2user@ubuntu1910vm:~/shiftfs$ cat run2.sh
set -e
This is an attempted dereference of 0x6b6b6b6b6b6b6b6b, which is POISON_FREE; I
think this corresponds to the load of "realfile->f_op->mmap" in the source code.
We are seeing some EFI based machines failing to boot hard in the EFI
stub:
exit_boot() failed!
efi_main() failed!
This seems to occur when the bootloader (grub2 in this case) has had
to manipulate some additional files due to a change in the way MAAS
boots the machines. We tracked this down to the memory map dance around
efi_get_memory_map(). Basically, we attempt to close boot services and
it informs us it cannot do so because it failed to record the updated
memory map. This occurs when there is insufficient space in the passed
memory map buffer to record changes during the operation. At the point
when this occurs we are unable to call the allocation functions to
reallocate the buffer so we panic.
To avoid this, we allocate some extra slots in the buffer to absorb
any additional entries. This headroom is currently insufficient for
these machines under this use case. Increase EFI_MMAP_NR_SLACK_SLOTS to
provide space for more memory map modifications.
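A sketch of the sizing rule: the buffer passed to GetMemoryMap() is the current map size plus a fixed number of spare descriptor slots, so the map can grow by that many entries before ExitBootServices() succeeds without a reallocation. NR_SLACK_SLOTS is a placeholder; neither the old nor the new value of EFI_MMAP_NR_SLACK_SLOTS is given in this log, and the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/* Placeholder slack count, NOT the kernel's EFI_MMAP_NR_SLACK_SLOTS value. */
#define NR_SLACK_SLOTS 8

/* Buffer size: current map plus room for NR_SLACK_SLOTS more descriptors. */
size_t mmap_buffer_size(size_t map_size, size_t desc_size)
{
	return map_size + desc_size * NR_SLACK_SLOTS;
}

/* Does the map still fit after growing by `extra` descriptors? */
bool mmap_fits(size_t buf_size, size_t map_size, size_t desc_size,
	       size_t extra)
{
	return map_size + desc_size * extra <= buf_size;
}
```

If firmware adds more entries than there are slack slots, the map no longer fits, and at that stage the stub cannot allocate a larger buffer, which is how exit_boot() ends up failing.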
UBUNTU: SAUCE: shiftfs: drop CAP_SYS_RESOURCE from effective capabilities
BugLink: https://bugs.launchpad.net/bugs/1849483
Currently shiftfs allows exceeding project quotas and reserved space on
e.g. ext2. See [1] and especially [2] for a bug report. This is very
much not what we want. Quotas and reserved space settings set on the
host need to be respected. The cause for this issue is overriding the
credentials with the superblock creator's credentials whenever we
perform operations such as fallocate() or writes while retaining
CAP_SYS_RESOURCE.
The fix is to drop CAP_SYS_RESOURCE from the effective capability set
after we have made a copy of the superblock creator's credential at
superblock creation time. This very likely gives us more security than
we had before and the regression potential seems limited. I would like
to try this approach first before coming up with something potentially
more sophisticated. I don't see why CAP_SYS_RESOURCE should become a
limiting factor in most use-cases.
[1]: https://github.com/lxc/lxd/issues/6333
[2]: https://github.com/lxc/lxd/issues/6333#issuecomment-545154838
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
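The capability drop described above amounts to clearing one bit in the effective set of the copied credentials; in the kernel, the cap_lower() helper does this on a kernel_cap_t. The sketch below is a userspace model with a plain 64-bit mask and hypothetical model_* names; only the capability number (24, as in <linux/capability.h>) is taken from the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Capability number as defined in <linux/capability.h>. */
#define MODEL_CAP_SYS_RESOURCE 24

/* Minimal model of a credential: only the effective capability set. */
struct model_cred {
	uint64_t cap_effective;
};

static int model_capable(const struct model_cred *c, int cap)
{
	assert(cap >= 0 && cap < 64);
	return (int)((c->cap_effective >> cap) & 1);
}

/* Copy the creator's credentials at superblock creation time and drop
 * CAP_SYS_RESOURCE from the copy, so operations later performed with
 * these credentials honor quotas and reserved space. The creator's
 * own credentials are left untouched. */
static struct model_cred copy_creator_cred(const struct model_cred *creator)
{
	struct model_cred copy = *creator;

	copy.cap_effective &= ~(UINT64_C(1) << MODEL_CAP_SYS_RESOURCE);
	return copy;
}
```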
BugLink: https://bugs.launchpad.net/bugs/1849482
Set the s_maxbytes limit to MAX_LFS_FILESIZE.
Currently shiftfs needlessly limits the maximum size for fallocate(),
causing calls such as fallocate --length 2GB ./file to fail. This
limitation is arbitrary since it's not caused by the underlay but
rather by shiftfs itself capping the s_maxbytes. This causes bugs such
as the one reported in [1].
[1]: https://github.com/lxc/lxd/issues/6333
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
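The failing check is a simple bound: vfs_fallocate() returns -EFBIG when offset + len exceeds the superblock's s_maxbytes. The model below reproduces that comparison in userspace. The capped value is a hypothetical 2 GiB - 1 limit chosen for illustration, and MODEL_MAX_LFS_FILESIZE assumes a 64-bit kernel, where MAX_LFS_FILESIZE equals LLONG_MAX.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical cap for illustration: 2 GiB - 1. */
#define MODEL_CAPPED_MAXBYTES ((INT64_C(1) << 31) - 1)
/* On 64-bit kernels MAX_LFS_FILESIZE is LLONG_MAX. */
#define MODEL_MAX_LFS_FILESIZE ((int64_t)LLONG_MAX)

/* Model of the size check against s_maxbytes; written as
 * offset > s_maxbytes - len to avoid signed overflow. */
static int model_fallocate_check(int64_t offset, int64_t len, int64_t s_maxbytes)
{
	assert(offset >= 0 && len > 0 && s_maxbytes >= 0);
	if (offset > s_maxbytes - len)
		return -EFBIG;
	return 0;
}
```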
AceLan Kao [Thu, 7 Nov 2019 06:36:44 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add quirk for old platforms
BugLink: https://bugs.launchpad.net/bugs/1813877
Old platforms do not support the DELL_UART_GET_SCALAR command, and the
behavior of the DELL_UART_GET_FIRMWARE_VER command differs from that of
the new firmware, so the new way to check whether the backlight is
controlled by the scalar IC doesn't work on old platforms. We now add
them to a list and use the old way to do the check for them.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
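A rough sketch of the quirk-list idea: a lookup decides whether a platform takes the old detection path. The product names and function names are hypothetical; an in-kernel driver would more typically match on DMI data (for example with dmi_check_system()) than compare strings like this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical product names; the driver's real list is not shown. */
static const char *const old_platforms[] = {
	"Example AIO 1000",
	"Example AIO 2000",
};

static bool is_old_platform(const char *product_name)
{
	size_t i;

	assert(product_name != NULL);
	for (i = 0; i < sizeof(old_platforms) / sizeof(old_platforms[0]); i++)
		if (strcmp(product_name, old_platforms[i]) == 0)
			return true;
	return false;
}

enum detect_method { DETECT_OLD_WAY, DETECT_GET_SCALAR };

/* Old platforms lack the scalar status command, so they fall back to
 * the old check; everything else uses the new command. */
static enum detect_method pick_detect_method(const char *product_name)
{
	return is_old_platform(product_name) ? DETECT_OLD_WAY : DETECT_GET_SCALAR;
}
```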
AceLan Kao [Thu, 7 Nov 2019 06:36:43 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add force parameter
BugLink: https://bugs.launchpad.net/bugs/1813877
Add a force parameter to force-load the driver if the platform doesn't
provide a working scalar status command.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
AceLan Kao [Thu, 7 Nov 2019 06:36:41 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add missing status command
BugLink: https://bugs.launchpad.net/bugs/1813877
DELL_UART_GET_SCALAR has been declared in
drivers/platform/x86/dell-uart-backlight.h, but its definition is
missing. It won't lead to issues on old AIO platforms, since this
command is newly introduced and is not supported by all old AIOs.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Wed, 6 Nov 2019 15:02:19 +0000 (09:02 -0600)]
UBUNTU: SAUCE: fs: Move SB_I_NOSUID to the top of s_iflags
BugLink: https://bugs.launchpad.net/bugs/1851677
SB_I_NOSUID was added by a sauce patch, and over time it has come
to occupy the same bit in s_iflags as SB_I_USERNS_VISIBLE without
being noticed. overlayfs will set SB_I_NOSUID when any lower
mount is nosuid. When this happens for a user namespace mount,
mount_too_revealing() will perform additional, unnecessary checks
which may block mounting when it should be allowed.
Move SB_I_NOSUID to prevent this conflict, and move it to the top
of s_iflags to make future conflicts less likely.
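The conflict is easy to reproduce with any two flags that share a bit. The values below are illustrative, not the kernel's s_iflags definitions; the _Static_assert shows one way to turn such an overlap into a build failure instead of a silent change in which checks mount_too_revealing() performs.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag values only, not the kernel's definitions. */
#define MODEL_SB_I_USERNS_VISIBLE (UINT32_C(1) << 4)
#define MODEL_SB_I_NOSUID_OLD     (UINT32_C(1) << 4)  /* collides */
#define MODEL_SB_I_NOSUID_NEW     (UINT32_C(1) << 31) /* top bit */

/* Catch overlaps at compile time. */
_Static_assert((MODEL_SB_I_USERNS_VISIBLE & MODEL_SB_I_NOSUID_NEW) == 0,
	       "s_iflags bits must not overlap");

/* What a check keyed on the "userns visible" bit would conclude. */
static int looks_userns_visible(uint32_t s_iflags)
{
	return (s_iflags & MODEL_SB_I_USERNS_VISIBLE) != 0;
}
```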
Seth Forshee [Wed, 6 Nov 2019 15:57:30 +0000 (09:57 -0600)]
UBUNTU: SAUCE: ovl: Restore vm_file value when lower fs mmap fails
BugLink: https://bugs.launchpad.net/bugs/1850994
ovl_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().
With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.
Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.
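A userspace model of the fixed reference handling, with hypothetical model_* types standing in for struct file and struct vm_area_struct: on failure vm_file is restored and the extra reference on the lower file is dropped; on success the reference on the original file is dropped instead. This mirrors the description above (it applies equally to ovl_mmap() and shiftfs_mmap()), not the literal patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_file {
	int refcount;
	int (*mmap)(struct model_file *file); /* lower fs mmap */
};

struct model_vma {
	struct model_file *vm_file;
};

static struct model_file *model_get_file(struct model_file *f)
{
	f->refcount++;
	return f;
}

static void model_fput(struct model_file *f)
{
	assert(f->refcount > 0);
	f->refcount--;
}

/* Example lower-fs callbacks for exercising both paths. */
static int lower_mmap_ok(struct model_file *f) { (void)f; return 0; }
static int lower_mmap_fail(struct model_file *f) { (void)f; return -ENODEV; }

static int stacked_mmap(struct model_file *file, struct model_file *realfile,
			struct model_vma *vma)
{
	int ret;

	assert(vma->vm_file == file);
	vma->vm_file = model_get_file(realfile);
	ret = realfile->mmap(realfile);
	if (ret) {
		/* Restore the caller's file and drop the lower reference. */
		vma->vm_file = file;
		model_fput(realfile);
	} else {
		/* vm_file now owns the lower reference; drop the old one. */
		model_fput(file);
	}
	return ret;
}
```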
Seth Forshee [Wed, 6 Nov 2019 15:38:57 +0000 (09:38 -0600)]
UBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails
BugLink: https://bugs.launchpad.net/bugs/1850994
shiftfs_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().
With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.
Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.
UBUNTU: SAUCE: overlayfs: allow with shiftfs as underlay
BugLink: https://bugs.launchpad.net/bugs/1846272
In commit [1] we enabled overlayfs on top of shiftfs. This approach was
buggy since it led to a regression for some standard overlayfs workloads
(cf. [2]).
In our original approach in [1] Seth and I concluded that running
overlayfs on top of shiftfs was not possible because of the way
overlayfs is currently opening files. The fact that it did not pass down
the dentry of shiftfs but rather its own caused shiftfs to be confused,
since shiftfs stashes away necessary information in d_fsdata.
Our solution was to modify open_with_fake_path() to also take a dentry
as an argument, then change overlayfs to pass in the shiftfs dentry
which then would override the dentry in the passed in struct path in
open_with_fake_path().
However, this led to a regression for some standard overlayfs workloads
(cf. [2]).
After various discussions involving Seth and myself in Paris we
concluded the reason for the regression was that we effectively created
a struct path that was comprised of the vfsmount of the overlayfs dentry
and the dentry of shiftfs. This is obviously broken.
The fix is to a) not modify open_with_fake_path() and b) change
overlayfs to do what shiftfs is doing, namely correctly setup the struct
path such that vfsmount and dentry match and are both from shiftfs.
Note that overlayfs already does this for the .open method for
directories. It just did not do it for the .open method for regular
files, leading to this issue. The reason why this hasn't been a problem
for overlayfs so far is that it didn't allow running on top of
filesystems that make use of d_fsdata _implicitly_: it disallows any
filesystem that is itself an overlay or has revalidate methods for its
dentries, as those usually have d_fsdata set up. Any other filesystem
falling into this category would have suffered from the same problem.
Seth managed to trigger the regression with the following script:
#!/bin/bash
utils=(bash cat)
mkdir -p lower/proc upper work root
for util in "${utils[@]}"; do
	path="$(which "$util")"
	dir="$(dirname "$path")"
	mkdir -p "lower/$dir"
	cp -v "$path" "lower/$path"
	libs="$(ldd "$path" | grep -Eo '(/usr)?/lib.*\.[0-9]')"
	for lib in $libs; do
		dir="$(dirname "$lib")"
		mkdir -p "lower/$dir"
		cp -v "$lib" "lower/$lib"
	done
done
In the first iteration, we implemented a kmem cache for struct
shiftfs_file_info which stashed away a struct path and the struct file
for the underlay. The path however was never used anywhere so the struct
shiftfs_file_info and therefore the whole kmem cache can go away.
Instead we move to the same model as overlayfs and just stash away the
struct file for the underlay in file->private_data of the shiftfs struct
file.
Additionally, we split the .open method for files and directories.
Similar to overlayfs, .open for regular files uses open_with_fake_path(),
which ensures that it doesn't contribute to the open file count (since
this would mean we'd count double). The .open method for directories,
however, used dentry_open(), which contributes to the open file count.
The basic logic for opening files is unchanged. The main point is to
ensure that a reference to the underlay's dentry is kept through struct
path.
Various bits and pieces of this were cooked up in discussions Seth and I
had in Paris.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
This causes an early udevadm trigger to fail. On some installer versions of
Ubuntu, this will cause init to exit, thus panicking the system very early
during boot.
Removing the bus_type from the parent device will remove some of the extra
empty files from /sys/devices/vio/, but will keep the rest of the layout for
vio devices, keeping them under /sys/devices/vio/.
We have tested that uevents for vio devices don't change after this fix;
they still contain MODALIAS.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: shiftfs: mark slab objects SLAB_RECLAIM_ACCOUNT
BugLink: https://bugs.launchpad.net/bugs/1842059
Shiftfs does not mark its slab cache as reclaimable. While this is not
a big deal it is not nice to the kernel in general. The shiftfs cache is
not so important that it can't be reclaimed.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1841977
The way we messed with setting i_nlink was brittle and wrong. We used to
set the i_nlink of the shiftfs dentry to be deleted to the i_nlink count
of the underlay dentry of the directory it resided in which makes no
sense whatsoever. We also missed drop_nlink() which is crucial since
i_nlink affects whether a dentry is cleaned up on dput().
With this I cannot reproduce the bug anymore where shiftfs misleads zfs
into believing that a deleted file cannot be removed from disk because
it is still referenced.
Fixes: commit 87011da41961 ("shiftfs: rework and extend") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
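A small model of the link-count bookkeeping, with hypothetical names: the old approach copied the parent directory's count into the victim inode, which leaves it looking referenced, while mirroring the underlay with a single drop_nlink()-style decrement lets the count reach zero.

```c
#include <assert.h>
#include <stdbool.h>

struct model_inode {
	unsigned int i_nlink;
};

/* Like drop_nlink(): decrement, which must not go below zero. */
static void model_drop_nlink(struct model_inode *inode)
{
	assert(inode->i_nlink > 0);
	inode->i_nlink--;
}

/* Old, broken approach: copy the parent directory's link count. */
static void broken_mirror_unlink(struct model_inode *victim,
				 const struct model_inode *parent)
{
	victim->i_nlink = parent->i_nlink;
}

/* Fixed approach: mirror the underlay unlink by dropping one link. */
static void fixed_mirror_unlink(struct model_inode *victim)
{
	model_drop_nlink(victim);
}

static bool unreferenced(const struct model_inode *inode)
{
	return inode->i_nlink == 0;
}
```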
Seth Forshee [Wed, 21 Aug 2019 20:09:45 +0000 (15:09 -0500)]
UBUNTU: SAUCE: selftests: fib_tests: assign address to dummy1 for rp_filter tests
The rp_filter test tries to ping using the dummy1 interface
without assigning it an IP address. Give the interface an IP
address so the tests will pass.
BugLink: https://bugs.launchpad.net/bugs/1837231
This used to pass an unsigned long to copy_from_user() instead of a
void __user * pointer. This will produce a warning with a sufficiently
recent compiler.
Currently shiftfs does not handle O_DIRECT if the underlay supports it.
This is blocking dqlite - an essential part of LXD - from profiting from
the performance benefits of O_DIRECT on suitable filesystems when used
with async io such as aio or io_uring.
Overlayfs cannot support this directly since the upper filesystem in
overlay can be any filesystem. So if the upper filesystem does not
support O_DIRECT but the lower filesystem does you're out of luck.
Shiftfs does not suffer from the same problem since there is no concept
of an upper filesystem in the same way that overlayfs has it.
Essentially, shiftfs is a transparent shim relaying everything to the
underlay while overlayfs' upper layer is not (completely).
UBUNTU: SAUCE: usbip: add -Wno-address-of-packed-member to EXTRA_CFLAGS
Fails to build with gcc 9.1.0 due to
-Werror=address-of-packed-member. One example:
usbip_network.c: In function 'usbip_net_pack_usb_device':
usbip_network.c:79:32: error: taking address of packed member of 'struct usbip_usb_device' may result in an unaligned pointer value [-Werror=address-of-packed-member]
79 | usbip_net_pack_uint32_t(pack, &udev->busnum);
| ^~~~~~~~~~~~~
All of these are code which is explicitly packing a struct, so
add -Wno-address-of-packed-member to EXTRA_CFLAGS to disable this
warning.
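For comparison, the warning can also be avoided in the code itself by copying the packed field through an aligned local. This is an alternative sketch, not what the patch does (the patch only adds the flag); struct wire_dev and net_pack_busnum() are hypothetical, loosely modeled on usbip_net_pack_uint32_t(). Passing &dev->busnum to memcpy() converts it to void *, which GCC generally does not flag, since void * carries no alignment requirement.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical packed wire struct; busnum sits at offset 1. */
struct __attribute__((packed)) wire_dev {
	uint8_t  speed;
	uint32_t busnum;
};

/* Convert busnum to (pack != 0) or from network byte order without
 * ever forming a misaligned uint32_t pointer. */
static void net_pack_busnum(int pack, struct wire_dev *dev)
{
	uint32_t tmp;

	memcpy(&tmp, &dev->busnum, sizeof(tmp));
	tmp = pack ? htonl(tmp) : ntohl(tmp);
	memcpy(&dev->busnum, &tmp, sizeof(tmp));
}
```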
Andy Whitcroft [Wed, 8 May 2019 13:24:40 +0000 (14:24 +0100)]
UBUNTU: SAUCE: tools -- fix add ability to disable libbfd
BugLink: https://bugs.launchpad.net/bugs/1826410
In commit 14541b1e7e ("perf build: Don't unconditionally link the libbfd
feature test to -liberty and -lz") the enablement code changed radically,
neutering our override. Adapt to that new form.
Fixes: 546d50456e ("UBUNTU: SAUCE: tools -- add ability to disable libbfd") Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1828092
We're seeing on the order of 10K cma_alloc() failure messages on
certain systems (HiSilicon D06 w/ SMMU BIOS-disabled, HP m400s).
While we continue to try and identify a solution that avoids
these messages altogether, in the meantime let's lessen the impact
(slow boot time, etc) by ratelimiting these messages. On a D06
w/ SMMU disabled, this drops the error message count from 10758 to
21.
Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
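The ratelimited printk helpers (pr_err_ratelimited() and friends) allow a burst of messages per time window and suppress the rest. The model below is a simplified userspace version of that window logic; by default the kernel allows a burst of 10 messages per 5-second interval, but the struct and function names here are hypothetical, and the sketch ignores locking and the "callbacks suppressed" report.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified model of a ratelimit state: at most `burst` messages
 * per `interval` time units. */
struct model_ratelimit {
	long interval;
	int burst;
	long begin;   /* start of the current window */
	int printed;  /* messages allowed in the current window */
	int missed;   /* messages suppressed in the current window */
};

static bool model_ratelimit_ok(struct model_ratelimit *rs, long now)
{
	assert(rs->interval > 0 && rs->burst > 0);
	if (now - rs->begin >= rs->interval) {
		/* Start a new window. */
		rs->begin = now;
		rs->printed = 0;
		rs->missed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		return true;
	}
	rs->missed++;
	return false;
}
```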
Andrea Righi [Sat, 20 Apr 2019 07:41:00 +0000 (09:41 +0200)]
UBUNTU: SAUCE: integrity: downgrade error to warning
BugLink: https://bugs.launchpad.net/bugs/1766201
In 58441dc86d7b the error "Unable to open file: ..." was downgraded
to a warning in the integrity/ima subsystem. Do the same for a similar
error message in the generic integrity subsystem.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Currently, btrfs workloads employing shiftfs cause regressions.
With btrfs, unprivileged users can already toggle whether a subvolume
is ro or rw. This is broken on current shiftfs as we haven't
whitelisted these ioctl()s.
To prevent such regressions, we need to whitelist the ioctls
BTRFS_IOC_FS_INFO, BTRFS_IOC_SUBVOL_GETFLAGS, and
BTRFS_IOC_SUBVOL_SETFLAGS. All of them should be safe for unprivileged
users.
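A sketch of the whitelist as a switch over ioctl command numbers, using the real constants from the <linux/btrfs.h> UAPI header; the function name is hypothetical, and a real implementation may also need to validate the ioctl argument (for example, the flags passed with BTRFS_IOC_SUBVOL_SETFLAGS).

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/btrfs.h>

/* Hypothetical whitelist check: only the listed commands pass. */
static bool ioctl_is_whitelisted(unsigned int cmd)
{
	switch (cmd) {
	case BTRFS_IOC_FS_INFO:
	case BTRFS_IOC_SUBVOL_GETFLAGS:
	case BTRFS_IOC_SUBVOL_SETFLAGS:
		return true;
	default:
		/* Everything else, e.g. BTRFS_IOC_TREE_SEARCH, is refused. */
		return false;
	}
}
```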
UBUNTU: SAUCE: shiftfs: lock down certain superblock flags
BugLink: https://bugs.launchpad.net/bugs/1827122
This locks down various superblock flags to prevent userns-root from
remounting a superblock with less restrictive options than the original
mark or underlay mount.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
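One way to express "no less restrictive than the original": every locked flag set on the original mount must still be set in the new flags. The choice of locked flags below (read-only, nosuid, nodev, noexec) is an assumption for illustration, the function name is hypothetical, and the MS_* constants come from <sys/mount.h>.

```c
#include <assert.h>
#include <stdbool.h>
#include <sys/mount.h>

/* Assumed set of flags that may not be dropped on remount. */
#define LOCKED_FLAGS (MS_RDONLY | MS_NOSUID | MS_NODEV | MS_NOEXEC)

/* A remount may add restrictions but must keep every locked flag
 * that the original mount had. */
static bool remount_allowed(unsigned long orig_flags, unsigned long new_flags)
{
	unsigned long locked = orig_flags & LOCKED_FLAGS;

	return (new_flags & locked) == locked;
}
```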
Before this commit we used to rely on an llseek method that was
targeted for regular files for both directories and regular files.
However, the realfile's f_pos was not correctly handled when userspace
called lseek(2) on a shiftfs directory file. Give directories their
own llseek operation so that seeking on a directory file is properly
supported.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Before this commit we used to keep a reference to the shiftfs mark
mount's shiftfs_super_info which was stashed in the superblock of the
mark mount. The problem is that we only take a reference to the mount of
the underlay, i.e. the filesystem that is *under* the shiftfs mark
mount. This means that when someone performs a shiftfs mark mount, then a
shiftfs overlay mount, and then immediately unmounts the shiftfs mark
mount, we muck with invalid memory since shiftfs_put_super() might have
already been called, freeing that memory.
Another solution would be to start reference counting. But this would be
overkill. We only care about the passthrough mount option of the mark
mount. And we only need it to verify that on remount the new passthrough
options of the shiftfs overlay are a subset of the mark mount's
passthrough options. In other scenarios we don't care. So copying up is
good enough and also only needs to happen once on mount, i.e. when a new
superblock is created and the .fill_super method is called.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
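The subset rule reduces to a bitmask test against the copied-up value. The bit names and struct below are hypothetical; the values follow the shiftfs passthrough option (1 for reporting the underlay's filesystem type, 2 for ioctl passthrough).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for the passthrough bits. */
#define PASSTHROUGH_STAT  1u
#define PASSTHROUGH_IOCTL 2u
#define PASSTHROUGH_ALL   (PASSTHROUGH_STAT | PASSTHROUGH_IOCTL)

/* Value copied from the mark mount once, when the overlay superblock
 * is filled, instead of keeping a pointer into the mark mount. */
struct overlay_sb_info {
	unsigned int mark_passthrough;
};

/* On remount, the overlay may only request bits the mark mount has. */
static bool passthrough_remount_ok(const struct overlay_sb_info *sbi,
				   unsigned int new_passthrough)
{
	assert((new_passthrough & ~PASSTHROUGH_ALL) == 0);
	return (new_passthrough & ~sbi->mark_passthrough) == 0;
}
```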
UBUNTU: SAUCE: shiftfs: fix passing of attrs to underlay for setattr
BugLink: https://bugs.launchpad.net/bugs/1824717
shiftfs_setattr() makes a copy of the attrs it was passed to pass
to the lower fs. It then calls setattr_prepare() with the original
attrs, and this may make changes which are not reflected in the
attrs passed to the lower fs. To fix this, copy the attrs to the
new struct for the lower fs after calling setattr_prepare().
Additionally, notify_change() may have set ATTR_MODE when one of
ATTR_KILL_S[UG]ID is set, and passing this combination to
notify_change() will trigger a BUG(). Do as overlayfs and
ecryptfs both do, and clear ATTR_MODE if either of those bits
is set.
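A userspace model of the two fixes, with illustrative ATTR_* bit values (not the kernel's) and a hypothetical stand-in for setattr_prepare() that strips the setgid bit from a requested mode: the lower attrs are copied only after the prepare step, and ATTR_MODE is cleared when a kill bit is present.

```c
#include <assert.h>

/* Illustrative ia_valid bits, not the kernel's ATTR_* values. */
#define M_ATTR_MODE      (1u << 0)
#define M_ATTR_KILL_SUID (1u << 1)
#define M_ATTR_KILL_SGID (1u << 2)

struct m_iattr {
	unsigned int ia_valid;
	unsigned int ia_mode;
};

/* Hypothetical stand-in for setattr_prepare(): it may modify the
 * caller's attrs, here by stripping setgid from a requested mode. */
static void m_setattr_prepare(struct m_iattr *attr)
{
	if (attr->ia_valid & M_ATTR_MODE)
		attr->ia_mode &= ~02000u;
}

static struct m_iattr build_lower_attrs(struct m_iattr *attr)
{
	struct m_iattr lower;

	m_setattr_prepare(attr);
	lower = *attr; /* copy after prepare so its changes are kept */
	if (lower.ia_valid & (M_ATTR_KILL_SUID | M_ATTR_KILL_SGID))
		lower.ia_valid &= ~M_ATTR_MODE;
	return lower;
}
```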
UBUNTU: SAUCE: shiftfs: use translated ids when changing lower fs attrs
BugLink: https://bugs.launchpad.net/bugs/1824350
shiftfs_setattr() is preparing a new set of attributes with the
owner translated for the lower fs, but it then passes the
original attrs. As a result the owner is set to the untranslated
owner, which causes the shiftfs inodes to also have incorrect
ids. For example:
# mkdir dir
# touch file
# ls -lh dir file
drwxr-xr-x 2 root root 4.0K Apr 11 13:05 dir
-rw-r--r-- 1 root root 0 Apr 11 13:05 file
# chown 500:500 dir file
# ls -lh dir file
drwxr-xr-x 2 1000500 1000500 4.0K Apr 11 12:42 dir
-rw-r--r-- 1 1000500 1000500 0 Apr 11 12:42 file
Fix this to pass the correct iattr struct to notify_change().
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
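The translation can be modeled as two id-map lookups: map the kernel id up through the mounter's user namespace, then down through the mark creator's. The sketch assumes single-extent maps and a container mapped at host uid 1000000, consistent with the ls output above; the helper names are hypothetical, loosely standing in for from_kuid() and make_kuid().

```c
#include <assert.h>
#include <stdint.h>

#define MODEL_INVALID_ID UINT32_MAX

/* One id-map extent: ids [first, first + count) in the namespace map
 * to [lower_first, lower_first + count) in the kernel id space. */
struct idmap_extent {
	uint32_t first;
	uint32_t lower_first;
	uint32_t count;
};

/* Namespace id -> kernel id. Unsigned wraparound rejects id < first. */
static uint32_t map_down(const struct idmap_extent *m, uint32_t id)
{
	if (id - m->first >= m->count)
		return MODEL_INVALID_ID;
	return m->lower_first + (id - m->first);
}

/* Kernel id -> namespace id. */
static uint32_t map_up(const struct idmap_extent *m, uint32_t kid)
{
	if (kid - m->lower_first >= m->count)
		return MODEL_INVALID_ID;
	return m->first + (kid - m->lower_first);
}

/* Interpret kid in the mounter's namespace, then re-map that value
 * through the mark creator's namespace for the lower fs. */
static uint32_t shift_to_lower(const struct idmap_extent *mounter_ns,
			       const struct idmap_extent *creator_ns,
			       uint32_t kid)
{
	uint32_t id = map_up(mounter_ns, kid);

	return id == MODEL_INVALID_ID ? MODEL_INVALID_ID : map_down(creator_ns, id);
}
```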
BugLink: https://bugs.launchpad.net/bugs/1823186
Shiftfs currently only passes through a few ioctl()s to the underlay. These
are ioctl()s that are generally considered safe. Doing it for random
ioctl()s would be a security issue. Permissions for ioctl()s are not
checked before the filesystem gets involved so if we were to override
credentials we e.g. could do a btrfs tree search in the underlay which we
normally wouldn't be allowed to do.
However, the btrfs filesystem allows unprivileged users to perform various
operations through its ioctl() interface. With shiftfs these ioctl()s are
currently not working. To not regress users that expect btrfs ioctl()s to
work in unprivileged containers, we can create a whitelist of ioctl()s that
we allow to go through to the underlay and for which we also switch
credentials.
The main problem is how we switch credentials. Since permission checks for
ioctl()s are done by the actual filesystem and not by the VFS, any
additional capable(<cap>)-based checks done by the filesystem would
unconditionally pass after we switch credentials. So to make credential
switching safe we drop *all* capabilities when switching credentials. This
means that only inode-based permission checks will pass.
Btrfs also allows unprivileged users to delete snapshots when the
filesystem is mounted with the user_subvol_rm_allowed mount option or if
the caller is capable(CAP_SYS_ADMIN). The latter should never be the case
with unprivileged users. To make sure we only allow removal of snapshots in
the former case, we drop all capabilities (see above) when switching
credentials.
Additionally, btrfs allows the creation of snapshots. To make this work we
need to be (too) clever. When creating snapshots, btrfs requires that an fd to
the directory the snapshot is supposed to be created in be passed along.
This fd obviously references a shiftfs file and as such a shiftfs dentry
and inode. This will cause btrfs to yell EXDEV. To circumvent this
problem we need to silently and temporarily replace the passed-in fd with an fd
that refers to a file that references a btrfs dentry and inode.
BugLink: https://bugs.launchpad.net/bugs/1823186
/* Introduction */
The shiftfs filesystem is implemented as a stacking filesystem. Since it is
a stacking filesystem it shares concepts with overlayfs and ecryptfs.
Usually, shiftfs will be stacked upon another filesystem. The filesystem on
top - shiftfs - is referred to as "upper filesystem" or "overlay" and the
filesystem it is stacked upon is referred to as "lower filesystem" or
"underlay".
/* Marked and Unmarked shiftfs mounts */
To use shiftfs it is necessary that a given mount is marked as shiftable via
the "mark" mount option. Any shiftfs mount without the "mark" mount option
that is not on top of a shiftfs mount with the "mark" mount option will be
refused with EPERM.
After a marked shiftfs mount has been performed other shiftfs mounts
referencing the marked shiftfs mount can be created. These secondary shiftfs
mounts are usually what are of interest.
The marked shiftfs mount will take a reference to the underlying mountpoint of
the directory it is marking as shiftable. Any unmarked shiftfs mounts
referencing this marked shiftfs mount will take a second reference to this
directory as well. This ensures that the underlying marked shiftfs mount can be
unmounted, thereby dropping its reference to the underlying directory, without
invalidating the mountpoint of said directory, since the non-marked shiftfs
mount still holds another reference to it.
/* Stacking Depth */
Shiftfs tries to keep the stack as flat as possible to avoid hitting the
kernel enforced filesystem stacking limit.
/* Permission Model */
When the mark shiftfs mount is created, shiftfs will record the credentials of
the creator of the super block and stash them in the super block. When other
non-mark shiftfs mounts are created that reference the mark shiftfs mount, they
will stash another reference to the creator's credentials. Before calling into
the underlying filesystem, shiftfs will switch to the creator's credentials and
revert to the original credentials after the underlying filesystem operation
returns.
/* Mount Options */
- mark
When set, the mark mount option indicates that the mount in question is
allowed to be shifted. Since shiftfs is mountable by user namespace root in a
non-initial user namespace, this mount option ensures that the system
administrator has decided that the marked mount is safe to be shifted.
To mark a mount as shiftable, CAP_SYS_ADMIN in the user namespace is required.
- passthrough={0,1,2,3}
This mount option functions as a bitmask. When set to a non-zero value,
shiftfs will try to act as an invisible shim sitting on top of the
underlying filesystem.
- 1: Shiftfs will report the filesystem type of the underlay for stat-like
system calls.
- 2: Shiftfs will pass whitelisted ioctl()s through to the underlay.
- 3: Shiftfs will do both 1 and 2.
Note that mount options on a marked mount cannot be changed.
/* Extended Attributes */
Shiftfs will make sure to translate extended attributes.
/* Inode Numbers */
Shiftfs inode numbers are copied up from the underlying filesystem, i.e.
shiftfs inode numbers will be identical to the corresponding underlying
filesystem's inode numbers. This has the advantage that inotify and friends
should work out of the box.
(In essence, shiftfs is nothing but a 1:1 mirror of the underlying filesystem's
dentries and inodes.)
/* Device Support */
Shiftfs only supports the creation of pipe and socket devices. Character and
block devices cannot be created through shiftfs.
James Bottomley [Thu, 4 Apr 2019 13:39:11 +0000 (15:39 +0200)]
UBUNTU: SAUCE: shiftfs: uid/gid shifting bind mount
BugLink: https://bugs.launchpad.net/bugs/1823186
This allows any subtree to be uid/gid shifted and bound elsewhere. It
does this by operating similarly to overlayfs. Its primary use is for
shifting the underlying uids of filesystems used to support
unprivileged (uid shifted) containers. The usual use case here is
that the container is operating with an uid shifted unprivileged root
but sometimes needs to make use of or work with a filesystem image
that has root at real uid 0.
The mechanism is to allow any subordinate mount namespace to mount a
shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only allowing
it to mount marked subtrees (using the -o mark option as root). Once
mounted, the subtree is mapped via the super block user namespace so
that the interior ids of the mounting user namespace are the ids
written to the filesystem.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[ saf: use designated initializers for path declarations to fix errors
with struct randomization ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[update: port to 5.0] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Wed, 27 Feb 2019 14:17:08 +0000 (08:17 -0600)]
UBUNTU: SAUCE: selftests: net: Don't fail test_vxlan_under_vrf on xfail
I changed the test for VXLAN underlay in non-default VRF to print
XFAIL for expected failure, but the script still exits with an
error which makes the test overall fail. Fix this to still exit
successfully following the xfail.
Seth Forshee [Mon, 25 Feb 2019 15:13:40 +0000 (09:13 -0600)]
UBUNTU: SAUCE: selftests: net: Make test for VXLAN underlay in non-default VRF an expected failure
This is a new test and fails with older Ubuntu kernels, so it's
not a regression. Change the output from "FAIL" to "XFAIL" for
now so it won't cause test failures. This is temporary until we
find out the reason the test fails.
UBUNTU: SAUCE: prevent a glibc test failure when looking for obsolete types on headers
BugLink: https://bugs.launchpad.net/bugs/1813060
glibc will look for ulong and other obsolete types on headers, including linux
headers, and warn of their use. That, unfortunately, makes automated testing
fail.
Though that type is only referred to inside a comment, and the test is what needs
fixing, we are temporarily changing the comment to make tests pass.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Seth Forshee [Wed, 6 Feb 2019 21:17:10 +0000 (15:17 -0600)]
UBUNTU: hio -- part_round_stats() removed in 5.0
This can no longer be called. The only place which was still
calling it for 4.14 and later was ssd_update_smart(), and it was
not updating any statistics used there anyhow, so there's no need
to replace the call with anything else.
Seth Forshee [Wed, 6 Feb 2019 20:12:43 +0000 (14:12 -0600)]
UBUNTU: hio -- replace use of do_gettimeofday()
This function was removed in 5.0. In all cases only the seconds
component of the time is used, and we don't have to worry about
backward compatibility, so just replace it with
ktime_get_real_seconds().
Seth Forshee [Wed, 6 Feb 2019 19:49:13 +0000 (13:49 -0600)]
UBUNTU: hio -- stub out BIOVEC_PHYS_MERGEABLE for 4.20+
This was moved to be internal to the block core in 4.20. It looks
to me like the driver doesn't need to be doing this anyway, as
the block layer already tries to merge bio segments when possible.
But in the worst case we still just end up with segments which
could have been merged but are not merged, which doesn't look to
be fatal.
UBUNTU: SAUCE: selftests: net: fix "from" match test in fib_rule_tests.sh
Fix the IPv4 address of the dummy0 interface and ensure that ip_forward
is enabled in the network space to get a valid response when checking
for routes between the gateway and other hosts.
Seth Forshee [Fri, 25 Jan 2019 18:43:49 +0000 (12:43 -0600)]
UBUNTU: SAUCE: selftests/ftrace: Fix tab expansion in trace_marker snapshot trigger test
When trace lines are passed through echo, tabs are being changed
to spaces, causing later string comparisons to fail. Add quotes
around the variables to prevent this.