Casey Schaufler [Thu, 20 Aug 2020 15:43:21 +0000 (08:43 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_cred_getsecid
Change the security_cred_getsecid() interface to fill in a
lsmblob instead of a u32 secid. The associated data elements
in the audit sub-system are changed from a secid to a lsmblob
to accommodate multiple possible LSM audit users.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
cc: linux-integrity@vger.kernel.org Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Thu, 20 Aug 2020 00:28:57 +0000 (17:28 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_inode_getsecid
Change the security_inode_getsecid() interface to fill in a
lsmblob structure instead of a u32 secid. This allows for its
callers to gather data from all registered LSMs. Data is provided
for IMA and audit.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
cc: linux-integrity@vger.kernel.org Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com>
[ saf: resolve conflicts ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Wed, 19 Aug 2020 23:06:37 +0000 (16:06 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_task_getsecid
Change the security_task_getsecid() interface to fill in
a lsmblob structure instead of a u32 secid in support of
LSM stacking. Audit interfaces will need to collect all
possible secids for possible reporting.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com>
cc: linux-integrity@vger.kernel.org Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com>
[ saf: resolve conflicts ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Thu, 19 Mar 2020 16:40:29 +0000 (09:40 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_ipc_getsecid
There may be more than one LSM that provides IPC data
for auditing. Change security_ipc_getsecid() to fill in
a lsmblob structure instead of the u32 secid. The
audit data structure containing the secid will be updated
later, so there is a bit of scaffolding here.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Wed, 19 Aug 2020 16:32:48 +0000 (09:32 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_secid_to_secctx
Change security_secid_to_secctx() to take a lsmblob as input
instead of a u32 secid. It will then call the LSM hooks
using the lsmblob element allocated for that module. The
callers have been updated as well. This allows for the
possibility that more than one module may be called upon
to translate a secid to a string, as can occur in the
audit code.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: LSM: Use lsmblob in security_secctx_to_secid
Change security_secctx_to_secid() to fill in a lsmblob instead
of a u32 secid. Multiple LSMs may be able to interpret the
string, and this allows for setting whichever secid is
appropriate. Change security_secmark_relabel_packet() to use a
lsmblob instead of a u32 secid. In some other cases there is
scaffolding where interfaces have yet to be converted.
UBUNTU: SAUCE: net: Prepare UDS for security module stacking
Change the data used in UDS SO_PEERSEC processing from a
secid to a more general struct lsmblob. Update the
security_socket_getpeersec_dgram() interface to use the
lsmblob. There is a small amount of scaffolding code
that will come out when the security_secid_to_secctx()
code is brought in line with the lsmblob.
Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Tue, 18 Aug 2020 17:12:56 +0000 (10:12 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_kernel_act_as
Change the security_kernel_act_as interface to use a lsmblob
structure in place of the single u32 secid in support of
module stacking. Change its only caller, set_security_override,
to do the same. Change that one's only caller,
set_security_override_from_ctx, to call it with the new
parameter type.
The security module hook is unchanged, still taking a secid.
The infrastructure passes the correct entry from the lsmblob.
lsmblob_init() is used to fill the lsmblob structure, however
this will be removed later in the series when security_secctx_to_secid()
is undated to provide a lsmblob instead of a secid.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Tue, 18 Aug 2020 00:15:27 +0000 (17:15 -0700)]
UBUNTU: SAUCE: LSM: Use lsmblob in security_audit_rule_match
Change the secid parameter of security_audit_rule_match
to a lsmblob structure pointer. Pass the entry from the
lsmblob structure for the approprite slot to the LSM hook.
Change the users of security_audit_rule_match to use the
lsmblob instead of a u32. The scaffolding function lsmblob_init()
fills the blob with the value of the old secid, ensuring that
it is available to the appropriate module hook. The sources of
the secid, security_task_getsecid() and security_inode_getsecid(),
will be converted to use the blob structure later in the series.
At the point the use of lsmblob_init() is dropped.
Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com>
[ saf: resolve conflicts ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Casey Schaufler [Mon, 17 Aug 2020 23:02:56 +0000 (16:02 -0700)]
UBUNTU: SAUCE: LSM: Create and manage the lsmblob data structure.
When more than one security module is exporting data to
audit and networking sub-systems a single 32 bit integer
is no longer sufficient to represent the data. Add a
structure to be used instead.
The lsmblob structure is currently an array of
u32 "secids". There is an entry for each of the
security modules built into the system that would
use secids if active. The system assigns the module
a "slot" when it registers hooks. If modules are
compiled in but not registered there will be unused
slots.
A new lsm_id structure, which contains the name
of the LSM and its slot number, is created. There
is an instance for each LSM, which assigns the name
and passes it to the infrastructure to set the slot.
The audit rules data is expanded to use an array of
security module data rather than a single instance.
Because IMA uses the audit rule functions it is
affected as well.
Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com>
[ saf: resolve conflicts ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[ update to support landlock ] Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
UBUNTU: SAUCE: LSM: Infrastructure management of the sock security
Move management of the sock->sk_security blob out
of the individual security modules and into the security
infrastructure. Instead of allocating the blobs from within
the modules the modules tell the infrastructure how much
space is required, and the space is allocated there.
Acked-by: Paul Moore <paul@paul-moore.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: John Johansen <john.johansen@canonical.com> Acked-by: Stephen Smalley <sds@tycho.nsa.gov> Signed-off-by: Casey Schaufler <casey@schaufler-ca.com> Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Tue, 6 Oct 2020 21:29:39 +0000 (14:29 -0700)]
UBUNTU: SAUCE: apparmor: LSM stacking: switch from SK_CTX() to aa_sock()
LSM: Infrastructure management of the sock security
changes apparmor to use aa_sock() instead of SK_CTX() but doesn't
update the apparmor unix mediation because that code is not upstream.
So make the change here instead of modifying the LSM patch.
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Tue, 6 Oct 2020 21:01:04 +0000 (14:01 -0700)]
UBUNTU: SAUCE: apparmor: rename aa_sock() to aa_unix_sk()
The LSM stacking patches introduce and use a macro aa_sock
which conflicts with the apparmor unix mediation patches. Rename
aa_sock() in apparmor to avoid a conflict.
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
John Johansen [Tue, 6 Oct 2020 21:43:16 +0000 (14:43 -0700)]
UBUNTU: SAUCE: apparmor: disable showing the mode as part of a secid to secctx
Displaying the mode as part of the seectx takes up unnecessary memory,
makes it so we can't use refcounted secctx so we need to alloc/free on
every conversion from secid to secctx and introduces a space that
could be potentially mishandled by tooling.
Eg. In an audit record we get
subj_type=firefix (enforce)
Having the mode reported is not necessary, and might even be confusing
eg. when writing an audit rule to match the above record field you
would use
-F subj_type=firefox
ie. the mode is not included. AppArmor provides ways to find the mode
without reporting as part of the secctx. So disable this by default
before its use is wide spread and we can't. For now we add a sysctl
to control the behavior as we can't guarentee no one is using this.
Signed-off-by: John Johansen <john.johansen@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
ubuntu-host is a module for providing data to containers via proc.
Initially it is populated with a single file, esm-token, for
supplying ESM access tokens.
Seth Forshee [Wed, 19 Aug 2020 16:22:11 +0000 (11:22 -0500)]
UBUNTU: hio -- Updates for move of make_request_fn to struct block_device_operations
Commit c62b37d96b6e ("block: move ->make_request_fn to struct
block_device_operations") from v5.9-rc1 replaces make_request_fn
with a submit_bio method in struct block_device_operations and
removes the request_queue argument. Update the driver accordingly.
Seth Forshee [Tue, 11 Aug 2020 19:52:12 +0000 (14:52 -0500)]
UBUNTU: hio -- Update to use bio_{start,end}_io_acct with 5.8+
Since e722fff238bb "block: remove generic_{start,end}_io_acct"
the generic io accounting interaces are no longer available.
Switch to using the replacements.
Andrea Righi [Thu, 30 Jul 2020 15:31:37 +0000 (17:31 +0200)]
UBUNTU: SAUCE: apply a workaround to re-enable CONFIG_CRYPTO_AEGIS128_SIMD
After the update to gcc 10 we started to experience the following build
errors on ARM:
crypto/aegis128-neon-inner.c: In function 'crypto_aegis128_init_neon':
crypto/aegis128-neon-inner.c:151:3: error: incompatible types when initializing type 'unsigned char' using type 'uint8x16_t'
151 | k ^ vld1q_u8(const0),
| ^
crypto/aegis128-neon-inner.c:152:3: error: incompatible types when initializing type 'unsigned char' using type 'uint8x16_t'
152 | k ^ vld1q_u8(const1),
| ^
This seems to be a gcc bug:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=96377
The workaround (suggested in the bug report) is to enforce a cast to
uint8x16_t.
Apply the workaround so that we can re-enable the driver disabled by 7c950e057db6 ("UBUNTU: [Config] disable CONFIG_CRYPTO_AEGIS128_SIMD").
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
UBUNTU: SAUCE: Revert "radix-tree: Use local_lock for protection"
This reverts commit cfa6705d89b6562f79c40c249f8d94073c4276e4. It
adds a gpl-only export which is leaking into nvidia module
builds. This is being discussed upstream, but revert the change
in the mean time. This is harmless, as the change is really for
RT builds and was not intended to have any functional change
outside of that context.
def test():
with tempfile.TemporaryFile() as fd:
fd.write("data".encode('utf-8'))
# re-open the file to get a read-only file descriptor
return open(f"/proc/self/fd/{fd.fileno()}", "r")
def main():
fd = test()
fd.close()
if __name__ == "__main__":
main()
a similar issue was reported here:
https://github.com/systemd/systemd/issues/14861
Our revalidate methods were very opinionated about whether or not a
lower dentry was valid especially when it became unlinked we simply
invalidated the lower dentry which caused above bug to surface. This has
led to bugs where a ESTALE was returned for e.g. temporary files that
were created and directly re-opened afterwards through
/proc/<pid>/fd/<nr-of-deleted-file>. When a file is re-opened through
/proc/<pid>/fd/<nr> LOOKUP_JUMP is set and the vfs will revalidate via
d_weak_revalidate(). Since the file has been unhashed or even already
gone negative we'd fail the open when we should've succeeded.
Navid Emamdoost [Tue, 16 Jun 2020 11:08:49 +0000 (08:08 -0300)]
UBUNTU: SAUCE: nbd_genl_status: null check for nla_nest_start
CVE-2019-16089
nla_nest_start may fail and return NULL. The check is inserted, and
errno is selected based on other call sites within the same source code.
Update: removed extra new line.
v3 Update: added release reply, thanks to Michal Kubecek for pointing
out.
Signed-off-by: Navid Emamdoost <navid.emamdoost@gmail.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Ben Hutchings [Tue, 16 Aug 2016 16:27:00 +0000 (10:27 -0600)]
UBUNTU: SAUCE: security,perf: Allow further restriction of perf_event_open
https://lkml.org/lkml/2016/1/11/587
The GRKERNSEC_PERF_HARDEN feature extracted from grsecurity. Adds the
option to disable perf_event_open() entirely for unprivileged users.
This standalone version doesn't include making the variable read-only
(or renaming it).
When kernel.perf_event_open is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.
This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making
the variable read-only. It also allows enabling further restriction
at run-time regardless of whether the default is changed.
Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
[ saf: resolve conflicts with v5.8-rc1 ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[ arighi: resolve conflicts with v6.2-rc2 ] Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c: In function 'shiftfs_fiemap':
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c:731:13: error: dereferencing pointer to incomplete type 'struct fiemap_extent_info'
/tmp/kernel-sforshee-6727637082e4-45IQ/build/fs/shiftfs.c:731:26: error: 'FIEMAP_FLAG_SYNC' undeclared (first use in this function); did you mean 'FS_XFLAG_SYNC'?
It seems that shiftfs was getting linux/fiemap.h included
indirectly before. Include it directly.
UBUNTU: SAUCE: shiftfs: let userns root destroy subvolumes from other users
BugLink: https://bugs.launchpad.net/bugs/1879688
Stéphane reported a bug found during NorthSec that makes heavy use of
shiftfs. When a subvolume or snapshot is created as userns root in the
container and then chowned to another user a delete as the root user
will fail. The reason for this is that we drop all capabilities as a
safety measure before calling btrfs ioctls. The only workable fix I
could think of is to retain the CAP_DAC_OVERRIDE capability for the
BTRFS_IOC_SNAP_DESTROY ioctl. All other solutions would be way more
invasive.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Seth Forshee <seth.forshee@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Tue, 3 Mar 2020 17:09:31 +0000 (11:09 -0600)]
UBUNTU: SAUCE: selftests/net -- disable timeout
Some of our net selftests are timing out in autopkgtest. These
tests pass when run in a different (presumably faster)
environment. It appears that we can't disable the timeout for
individual test cases, so disable the timeout for the net
selftests globally.
UBUNTU: SAUCE: shiftfs: record correct creator credentials
BugLink: https://bugs.launchpad.net/bugs/1872094
When shiftfs is nested we failed to be able to create any files or
access directories because we recorded the wrong creator credentials. We
need to record the credentials of the creator of the lowers mark mount
of shiftfs. Otherwise we aren't privileged wrt to the shiftfs layer in
the nesting case. This is similar to how we always record the user
namespace of the base filesystem.
Suggested-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit "581e260 block: move block layer internals out of include/linux/genhd.h"
hid disk_map_sector_rcu() (and other blk APIs) from driver code, locally add
back the prototype.
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
...
WARNING: modpost: drivers/platform/x86/dell-uart-backlight.o(.text+0x979): Section mismatch in reference from the function dell_uart_bl_add() to the variable .init.rodata:dell_uart_backlight_alpha_platform
The function dell_uart_bl_add() references
the variable __initconst dell_uart_backlight_alpha_platform.
This is often because dell_uart_bl_add lacks a __initconst
annotation or the annotation of dell_uart_backlight_alpha_platform is wrong.
dell_uart_bl_add() was referencing an __initconst
dell_uart_backlight_alpha_platform variable without the __init annotation: fix it by removing __initconst
Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Andrea Righi [Mon, 9 Mar 2020 17:22:40 +0000 (18:22 +0100)]
UBUNTU: SAUCE: ptp: free ptp clock properly
There is a bug in ptp_clock_unregister() where pps_unregister_source()
can free up resources needed by posix_clock_unregister() to properly
destroy a related sysfs device.
Fix this by calling pps_unregister_source() in ptp_clock_release().
BugLink: https://bugs.launchpad.net/bugs/1864754 Fixes: a33121e5487b ("ptp: fix the race between the release of ptp_clock and cdev") Tested-by: Piotr Morgwai Kotarbiński <foss@morgwai.pl> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
You-Sheng Yang [Mon, 16 Mar 2020 09:27:21 +0000 (17:27 +0800)]
UBUNTU: SAUCE: Input: i8042 - fix the selftest retry logic
BugLink: https://bugs.launchpad.net/bugs/1866734
It returns -NODEV at the first selftest timeout, so the retry logic
doesn't work. Move the return outside of the while loop to make it real
retry 5 times before returns -ENODEV.
BTW, the origin loop will retry 6 times, also fix this.
Signed-off-by: You-Sheng Yang <vicamo.yang@canonical.com>
(backported from
https://lore.kernel.org/linux-input/20200310033640.14440-1-vicamo@gmail.com/) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
AceLan Kao [Wed, 12 Feb 2020 06:53:15 +0000 (14:53 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: increase retry times
BugLink: https://bugs.launchpad.net/bugs/1862885
From ODM, scalar takes some time to activate panel during booting up,
it can't respond the UART commands within 1 seconds.
So, we add retry and wait 2 seconds for the response. But sometimes it
still fails to read the response.
During the boot up time, it sometimes takes more than 2 seconds to respond
the first command, so we enlarge the retry timeout from 2 seconds to 5
seconds to make sure we get the first response from scalar.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-By: You-Sheng Yang <vicamo.yang@canonical.com> Acked-by: Anthony Wong <anthony.wong@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Andrea Righi [Wed, 12 Feb 2020 09:39:42 +0000 (10:39 +0100)]
UBUNTU: hio -- proc_create() requires a "struct proc_ops" in 5.6
With d56c0d45f0e27f814e87a1676b6bdccccbc252e9 ("proc: decouple proc from
VFS with "struct proc_ops"") proc_create() requires a "struct proc_ops"
instead of a "struct file_operations". Change the code accordingly.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Stefan Bader [Wed, 15 Jan 2020 09:14:28 +0000 (10:14 +0100)]
UBUNTU: SAUCE: md/raid0: Use kernel specific layout
BugLink: https://bugs.launchpad.net/bugs/1850540
This allows to roll out the support for the alternate layout which
accidentally got introduced since kernel v3.14+ without causing
breakage on reboot. The real danger is moving between a 3.13 or
older kernel and any newer. This either has already happened and
the damage has potentially been done or is not yet immediate or
not happening at all (if the raid0 array was created by a 3.14+
kernel). So it is better to just warn from the kernel or once the
user-space tool supporting meta-data update gets rolled out, from
there as well.
Once user-space is in place an with a bit of waiting time this change
should get reverted later.
UBUNTU: SAUCE: shiftfs: prevent lower dentries from going negative during unlink
BugLink: https://bugs.launchpad.net/bugs/1860041
All non-special files (For shiftfs this only includes fifos and - for
this case - unix sockets - since we don't allow character and block
devices to be created.) go through shiftfs_open() and have their dentry
pinned through this codepath preventing it from going negative. But
fifos don't use the shiftfs fops but rather use the pipefifo_fops which
means they do not go through shiftfs_open() and thus don't have their
dentry pinned that way. Thus, the lower dentries for such files can go
negative on unlink causing segfaults. The following C program can be
used to reproduce the crash:
AceLan Kao [Wed, 8 Jan 2020 07:59:45 +0000 (15:59 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add retry for get scalar status
BugLink: https://bugs.launchpad.net/bugs/1858761
Found on new platforms that UART require more than 1 second to respond
commands in the first 10 seconds after booted.
dell_uart_get_scalar_status() is the first command we send to scalar and
this command should be more reliable than other commands, and make sure
we got correct response from scalar. So, add retry and increase the read
timeout to 2 seconds.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
dann frazier [Wed, 18 Dec 2019 14:29:30 +0000 (07:29 -0700)]
UBUNTU: SAUCE: md/raid0: Link to wiki with guidance on multi-zone RAID0 layout migration
BugLink: https://bugs.launchpad.net/bugs/1850540
Helping an administrator understand this issue and how to deal with it
requires more text than achievable in a kernel error message. Let's
clarify the issue in the Ubuntu wiki, and have the kernel emit a link
to it.
I've submitted a similar change upstream:
https://marc.info/?l=linux-raid&m=157360088014027&w=2
Should it get merged, we should consider replacing this patch with that one.
Otherwise, it is probably safe to drop this SAUCE patch after focal.
Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: net: ena: fix too long default tx interrupt moderation interval
BugLink: https://bugs.launchpad.net/bugs/1853180
Current default non-adaptive tx interrupt moderation interval is 196 us.
This commit sets it to 0, which is much more sensible as a default value.
It can be modified using ethtool -C.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com>
Reference: https://lore.kernel.org/netdev/1572868728-5211-1-git-send-email-akiyano@amazon.com/ Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Fri, 1 Nov 2019 18:35:25 +0000 (13:35 -0500)]
UBUNTU: SAUCE: shiftfs: Correct id translation for lower fs operations
BugLink: https://bugs.launchpad.net/bugs/1850867
Several locations which shift ids translate user/group ids before
performing operations in the lower filesystem are translating
them into init_user_ns, whereas they should be translated into
the s_user_ns for the lower filesystem. This will result in using
ids other than the intended ones in the lower fs, which will
likely not map into the shifts s_user_ns.
Change these sites to use shift_k[ug]id() to do a translation
into the s_user_ns of the lower filesystem.
Quoting Jann Horn:
#################### Bug 2: Type confusion ####################
shiftfs_btrfs_ioctl_fd_replace() calls fdget(oldfd), then without further checks
passes the resulting file* into shiftfs_real_fdget(), which does this:
/* Did the flags change since open? */
if (unlikely(file->f_flags & ~lowerfd->file->f_flags))
return shiftfs_change_flags(lowerfd->file, file->f_flags);
return 0;
}
file->private_data is a void* that points to a filesystem-dependent type; and
some filesystems even use it to store a type-cast number instead of a pointer.
The implicit cast to a "struct shiftfs_file_info *" can therefore be a bad cast.
As a PoC, here I'm causing a type confusion between struct shiftfs_file_info
(with ->realfile at offset 0x10) and struct mm_struct (with vmacache_seqnum at
offset 0x10), and I use that to cause a memory dereference somewhere around
0x4242:
Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com>
[ saf: use f_op->open instead as special inodes in shiftfs sbs
will not use shiftfs open f_ops ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
CVE-2019-15792
Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Seth Forshee [Fri, 1 Nov 2019 15:41:03 +0000 (10:41 -0500)]
UBUNTU: SAUCE: shiftfs: Fix refcount underflow in btrfs ioctl handling
BugLink: https://bugs.launchpad.net/bugs/1850867
shiftfs_btrfs_ioctl_fd_replace() installs an fd referencing a
file from the lower filesystem without taking an additional
reference to that file. After the btrfs ioctl completes this fd
is closed, which then puts a reference to that file, leading to a
refcount underflow. Original bug report and test case from Jann
Horn is below.
Fix this, and at the sametime simplify the management of the fd
to the lower file for the ioctl. In
shiftfs_btrfs_ioctl_fd_replace(), take the missing reference to
the lower file and set FDPUT_FPUT so that this reference will get
dropped on fdput() in error paths. Do not maintain the struct fd
in the caller, as it the fd installed in the fd table is
sufficient to properly clean up. Finally, remove the fdput() in
shiftfs_btrfs_ioctl_fd_restore() as it is redundant with the
__close_fd() call.
Original report from Jann Horn:
In shiftfs_btrfs_ioctl_fd_replace() ("//" comments added by me):
src = fdget(oldfd);
if (!src.file)
return -EINVAL;
// src holds one reference (assuming multithreaded execution)
ret = shiftfs_real_fdget(src.file, lfd);
// lfd->file is a file* now, but shiftfs_real_fdget didn't take any
// extra references
fdput(src);
// this drops the only reference we were holding on src, and src was
// the only thing holding a reference to lfd->file. lfd->file may be
// dangling at this point.
if (ret)
return ret;
*newfd = get_unused_fd_flags(lfd->file->f_flags);
if (*newfd < 0) {
// always a no-op
fdput(*lfd);
return *newfd;
}
fd_install(*newfd, lfd->file);
// fd_install() consumes a counted reference, but we don't hold any
// counted references. so at this point, if lfd->file hasn't been freed
// yet, its refcount is one lower than it ought to be.
[...]
// the following code is refcount-neutral, so the refcount stays one too
// low.
if (ret)
shiftfs_btrfs_ioctl_fd_restore(cmd, *lfd, *newfd, arg, v1, v2);
/* Did the flags change since open? */
if (unlikely(file->f_flags & ~lowerfd->file->f_flags))
return shiftfs_change_flags(lowerfd->file, file->f_flags);
return 0;
}
Therefore, the following PoC will cause reference count overdecrements; I ran it
with SLUB debugging enabled and got the following splat:
=======================================
user@ubuntu1910vm:~/shiftfs$ cat run.sh
sync
unshare -mUr ./run2.sh
t run2user@ubuntu1910vm:~/shiftfs$ cat run2.sh
set -e
This is an attempted dereference of 0x6b6b6b6b6b6b6b6b, which is POISON_FREE; I
think this corresponds to the load of "realfile->f_op->mmap" in the source code.
We are seeing some EFI based machines failing to boot hard in the EFI
stub:
exit_boot() failed!
efi_main() failed!
This seems to occur when the bootloader (grub2 in this case) has had
to manipulate some additional files due to a change in the way MAAS
boots the machines. We tracked this down to the memory map dance
efi_get_memory_map(). Basically we attempt to close boot services and
it informs us it cannot do so because it failed to record the updated
memory map. This occurs when there is insufficient space in the passed
memory map buffer to record changes during the operation. At the point
when this occurs we are unable to call the allocation functions to
reallocate the buffer so we panic.
To avoid this we allocate some additional entries in the buffer to cover
any additional entries. This headroom is currently insufficient for
these machines under this use case. Increase EFI_MMAP_NR_SLACK_SLOTS to
provide space for more memory map modifications.
UBUNTU: SAUCE: shiftfs: drop CAP_SYS_RESOURCE from effective capabilities
BugLink: https://bugs.launchpad.net/bugs/1849483
Currently shiftfs allows to exceed project quota and reserved space on
e.g. ext2. See [1] and especially [2] for a bug report. This is very
much not what we want. Quotas and reserverd space settings set on the
host need to respected. The cause for this issue is overriding the
credentials with the superblock creator's credentials whenever we
perform operations such as fallocate() or writes while retaining
CAP_SYS_RESOURCE.
The fix is to drop CAP_SYS_RESOURCE from the effective capability set
after we have made a copy of the superblock creator's credential at
superblock creation time. This very likely gives us more security than
we had before and the regression potential seems limited. I would like
to try this apporach first before coming up with something potentially
more sophisticated. I don't see why CAP_SYS_RESOURCE should become a
limiting factor in most use-cases.
[1]: https://github.com/lxc/lxd/issues/6333
[2]: https://github.com/lxc/lxd/issues/6333#issuecomment-545154838 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1849482
Set the s_maxbytes limit to MAX_LFS_FILESIZE.
Currently shiftfs limits the maximum size for fallocate() needlessly
causing calls such as fallocate --length 2GB ./file to fail. This
limitation is arbitrary since it's not caused by the underlay but
rather by shiftfs itself capping the s_maxbytes. This causes bugs such
as the one reported in [1].
[1]: https://github.com/lxc/lxd/issues/6333 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
AceLan Kao [Thu, 7 Nov 2019 06:36:44 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add quirk for old platforms
BugLink: https://bugs.launchpad.net/bugs/1813877
Old platforms do not support DELL_UART_GET_SCALAR command and the
behavior of DELL_UART_GET_FIRMWARE_VER command is different as the new
firmware, so the new way to check if the backlight is controlled by
scalar IC doesn't work on old platforms. We now add them into a list and
use the old way to do the check.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
AceLan Kao [Thu, 7 Nov 2019 06:36:43 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add force parameter
BugLink: https://bugs.launchpad.net/bugs/1813877
Add force parameter to force load the driver if the platform doesn't
provide a working scalar status command.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
AceLan Kao [Thu, 7 Nov 2019 06:36:41 +0000 (14:36 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: add missing status command
BugLink: https://bugs.launchpad.net/bugs/1813877
DELL_UART_GET_SCALAR has been declared in
drivers/platform/x86/dell-uart-backlight.h, but its definition is
missing. It won't lead to issues on old AIO platforms, since this
command is newly introduced and is not supported by all old AIOs.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Wed, 6 Nov 2019 15:38:57 +0000 (09:38 -0600)]
UBUNTU: SAUCE: shiftfs: Restore vm_file value when lower fs mmap fails
BugLink: https://bugs.launchpad.net/bugs/1850994
shiftfs_mmap() overwrites vma->vm_file before calling the lower
filesystem mmap but does not restore the original value on
failure. This means it is giving a pointer to the lower fs file
back to the caller with no reference, which is a bad practice.
However, it does not lead to any issues with upstream kernels as
no caller accesses vma->vm_file after call_mmap().
With the aufs patches applied the story is different. Whereas
mmap_region() previously fput a local variable containing the
file it assigned to vm_file, it now calls vma_fput() which will
fput vm_file, for which it has no reference, and the reference
for the original vm_file is not put.
Fix this by restoring vma->vm_file to the original value when the
mmap call into the lower fs fails.
In the first iteration, we implemented a kmem cache for struct
shiftfs_file_info which stashed away a struct path and the struct file
for the underlay. The path however was never used anywhere so the struct
shiftfs_file_info and therefore the whole kmem cache can go away.
Instead we move to the same model as overlayfs and just stash away the
struct file for the underlay in file->private_data of the shiftfs struct
file.
Addtionally, we split the .open method for files and directories.
Similar to overlayfs .open for regular files uses open_with_fake_path()
which ensures that it doesn't contribute to the open file count (since
this would mean we'd count double). The .open method for directories
however used dentry_open() which contributes to the open file count.
The basic logic for opening files is unchanged. The main point is to
ensure that a reference to the underlay's dentry is kept through struct
path.
Various bits and pieces of this were cooked up in discussions Seth and I
had in Paris.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
This causes an early udevadm trigger to fail. On some installer versions of
Ubuntu, this will cause init to exit, thus panicing the system very early
during boot.
Removing the bus_type from the parent device will remove some of the extra
empty files from /sys/devices/vio/, but will keep the rest of the layout for
vio devices, keeping them under /sys/devices/vio/.
It has been tested that uevents for vio devices don't change after this fix,
they still contain MODALIAS.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Marcelo Henrique Cerri <marcelo.cerri@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: shiftfs: mark slab objects SLAB_RECLAIM_ACCOUNT
BugLink: https://bugs.launchpad.net/bugs/1842059
Shiftfs does not mark it's slab cache as reclaimable. While this is not
a big deal it is not nice to the kernel in general. The shiftfs cache is
not so important that it can't be reclaimed.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1841977
The way we messed with setting i_nlink was brittle and wrong. We used to
set the i_nlink of the shiftfs dentry to be deleted to the i_nlink count
of the underlay dentry of the directory it resided in which makes no
sense whatsoever. We also missed drop_nlink() which is crucial since
i_nlink affects whether a dentry is cleaned up on dput().
With this I cannot reproduce the bug anymore where shiftfs misleads zfs
into believing that a deleted file can not be removed from disk because
it is still referenced.
Fixes: commit 87011da41961 ("shiftfs: rework and extend") Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1837231
This used to pass an unsigned long to copy_from_user() instead of a
void __user * pointer. This will produce warning with a sufficiently
advanced compiler.
Currently shiftfs does not handle O_DIRECT if the underlay supports it.
This is blocking dqlite - an essential part of LXD - from profiting from
the performance benefits of O_DIRECT on suitable filesystems when used
with async io such as aio or io_uring.
Overlayfs cannot support this directly since the upper filesystem in
overlay can be any filesystem. So if the upper filesystem does not
support O_DIRECT but the lower filesystem does you're out of luck.
Shiftfs does not suffer from the same problem since there is not concept
of an upper filesystem in the same way that overlayfs has it.
Essentially, shiftfs is a transparent shim relaying everything to the
underlay while overlayfs' upper layer is not (completely).
UBUNTU: SAUCE: usbip: add -Wno-address-of-packed-member to EXTRA_CFLAGS
Fails to build with gcc 9.1.0 due to
-Werror=address-of-packed-member. One example:
usbip_network.c: In function 'usbip_net_pack_usb_device':
usbip_network.c:79:32: error: taking address of packed member of 'struct usbip_usb_device' may result in an unaligned pointer value [-Werror=address-of-packed-member]
79 | usbip_net_pack_uint32_t(pack, &udev->busnum);
| ^~~~~~~~~~~~~
All of these are code which is explicitly packing a struct, so
add -Wno-address-of-packed-member to EXTRA_CFLAGS to disable this
warning.
Andy Whitcroft [Wed, 8 May 2019 13:24:40 +0000 (14:24 +0100)]
UBUNTU: SAUCE: tools -- fix add ability to disable libbfd
BugLink: https://bugs.launchpad.net/bugs/1826410
In commit 14541b1e7e ("perf build: Don't unconditionally link the libbfd
feature test to -liberty and -lz") the enablement code changed radically
neutering our override. Adapt to that new form.
Fixes: 546d50456e ("UBUNTU: SAUCE: tools -- add ability to disable libbfd") Signed-off-by: Andy Whitcroft <apw@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Andrea Righi [Sat, 20 Apr 2019 07:41:00 +0000 (09:41 +0200)]
UBUNTU: SAUCE: integrity: downgrade error to warning
BugLink: https://bugs.launchpad.net/bugs/1766201
In 58441dc86d7b the error "Unable to open file: ..." has been downgraded
to warning in the integrity/ima subsystem. Do the same for a similar
error message in the generic integrity subsystem.
Signed-off-by: Andrea Righi <andrea.righi@canonical.com> Acked-by: Connor Kuehl <connor.kuehl@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Currently, btrfs workloads employing shiftfs cause regression.
With btrfs unprivileged users can already toggle whether a subvolume
will be ro or rw. This is broken on current shiftfs as we haven't
whitelisted these ioctls().
To prevent such regression, we need to whitelist the ioctls
BTRFS_IOC_FS_INFO, BTRFS_IOC_SUBVOL_GETFLAGS, and
BTRFS_IOC_SUBVOL_SETFLAGS. All of them should be safe for unprivileged
users.
UBUNTU: SAUCE: shiftfs: lock down certain superblock flags
BugLink: https://bugs.launchpad.net/bugs/1827122
This locks down various superblock flags to prevent userns-root from
remounting a superblock with less restrictive options than the original
mark or underlay mount.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Before this commit we used to rely on an llseek method that was
targeted for regular files for both directories and regular files.
However, the realfile's f_pos was not correctly handled when userspace
called lseek(2) on a shiftfs directory file. Give directories their
own llseek operation so that seeking on a directory file is properly
supported.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Before this commit we used to keep a reference to the shiftfs mark
mount's shiftfs_super_info which was stashed in the superblock of the
mark mount. The problem is that we only take a reference to the mount of
the underlay, i.e. the filesystem that is *under* the shiftfs mark
mount. This means when someone performs a shiftfs mark mount, then a
shiftfs overlay mount and then immediately unmounts the shiftfs mark
mount we muck with invalid memory since shiftfs_put_super might have
already been called freeing that memory.
Another solution would be to start reference counting. But this would be
overkill. We only care about the passthrough mount option of the mark
mount. And we only need it to verify that on remount the new passthrough
options of the shiftfs overlay are a subset of the mark mount's
passthrough options. In other scenarios we don't care. So copying up is
good enough and also only needs to happen once on mount, i.e. when a new
superblock is created and the .fill_super method is called.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: shiftfs: fix passing of attrs to underaly for setattr
BugLink: https://bugs.launchpad.net/bugs/1824717
shiftfs_setattr() makes a copy of the attrs it was passed to pass
to the lower fs. It then calls setattr_prepare() with the original
attrs, and this may make changes which are not reflected in the
attrs passed to the lower fs. To fix this, copy the attrs to the
new struct for the lower fs after calling setattr_prepare().
Additionally, notify_change() may have set ATTR_MODE when one of
ATTR_KILL_S[UG]ID is set, and passing this combination to
notify_change() will trigger a BUG(). Do as overlayfs and
ecryptfs both do, and clear ATTR_MODE if either of those bits
is set.
UBUNTU: SAUCE: shiftfs: use translated ids when chaning lower fs attrs
BugLink: https://bugs.launchpad.net/bugs/1824350
shiftfs_setattr() is preparing a new set of attributes with the
owner translated for the lower fs, but it then passes the
original attrs. As a result the owner is set to the untranslated
owner, which causes the shiftfs inodes to also have incorrect
ids. For example:
# mkdir dir
# touch file
# ls -lh dir file
drwxr-xr-x 2 root root 4.0K Apr 11 13:05 dir
-rw-r--r-- 1 root root 0 Apr 11 13:05 file
# chown 500:500 dir file
# ls -lh dir file
drwxr-xr-x 2 10005001000500 4.0K Apr 11 12:42 dir
-rw-r--r-- 1 10005001000500 0 Apr 11 12:42 file
Fix this to pass the correct iattr struct to notify_change().
Reviewed-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BugLink: https://bugs.launchpad.net/bugs/1823186
Shiftfs currently only passes through a few ioctl()s to the underlay. These
are ioctl()s that are generally considered safe. Doing it for random
ioctl()s would be a security issue. Permissions for ioctl()s are not
checked before the filesystem gets involved so if we were to override
credentials we e.g. could do a btrfs tree search in the underlay which we
normally wouldn't be allowed to do.
However, the btrfs filesystem allows unprivileged users to perform various
operations through its ioctl() interface. With shiftfs these ioctl() are
currently not working. To not regress users that expect btrfs ioctl()s to
work in unprivileged containers we can create a whitelist of ioctl()s that
we allow to go through to the underlay and for which we also switch
credentials.
The main problem is how we switch credentials. Since permissions checks for
ioctl()s are
done by the actual file system and not by the vfs this would mean that any
additional capable(<cap>)-based checks done by the filesystem would
unconditonally pass after we switch credentials. So to make credential
switching safe we drop *all* capabilities when switching credentials. This
means that only inode-based permission checks will pass.
Btrfs also allows unprivileged users to delete snapshots when the
filesystem is mounted with user_subvol_rm_allowed mount option or if the
the callers is capable(CAP_SYS_ADMIN). The latter should never be the case
with unprivileged users. To make sure we only allow removal of snapshots in
the former case we drop all capabilities (see above) when switching
credentials.
Additonally, btrfs allows the creation of snapshots. To make this work we
need to be (too) clever. When doing snapshots btrfs requires that an fd to
the directory the snapshot is supposed to be created in be passed along.
This fd obviously references a shiftfs file and as such a shiftfs dentry
and inode. This will cause btrfs to yell EXDEV. To circumnavigate this
problem we need to silently temporarily replace the passed in fd with an fd
that refers to a file that references a btrfs dentry and inode.
BugLink: https://bugs.launchpad.net/bugs/1823186
/* Introduction */
The shiftfs filesystem is implemented as a stacking filesystem. Since it is
a stacking filesystem it shares concepts with overlayfs and ecryptfs.
Usually, shiftfs will be stacked upon another filesystem. The filesystem on
top - shiftfs - is referred to as "upper filesystem" or "overlay" and the
filesystem it is stacked upon is referred to as "lower filesystem" or
"underlay".
/* Marked and Unmarked shiftfs mounts */
To use shiftfs it is necessary that a given mount is marked as shiftable via
the "mark" mount option. Any mount of shiftfs without the "mark" mount option
not on top of a shiftfs mount with the "mark" mount option will be refused with
EPERM.
After a marked shiftfs mount has been performed other shiftfs mounts
referencing the marked shiftfs mount can be created. These secondary shiftfs
mounts are usually what are of interest.
The marked shiftfs mount will take a reference to the underlying mountpoint of
the directory it is marking as shiftable. Any unmarked shiftfts mounts
referencing this marked shifts mount will take a second reference to this
directory as well. This ensures that the underlying marked shiftfs mount can be
unmounted thereby dropping the reference to the underlying directory without
invalidating the mountpoint of said directory since the non-marked shiftfs
mount still holds another reference to it.
/* Stacking Depth */
Shiftfs tries to keep the stack as flat as possible to avoid hitting the
kernel enforced filesystem stacking limit.
/* Permission Model */
When the mark shiftfs mount is created shiftfs will record the credentials of
the creator of the super block and stash it in the super block. When other
non-mark shiftfs mounts are created that reference the mark shiftfs mount they
will stash another reference to the creators credentials. Before calling into
the underlying filesystem shiftfs will switch to the creators credentials and
revert to the original credentials after the underlying filesystem operation
returns.
/* Mount Options */
- mark
When set the mark mount option indicates that the mount in question is
allowed to be shifted. Since shiftfs it mountable in by user namespace root
non-initial user namespace this mount options ensures that the system
administrator has decided that the marked mount is safe to be shifted.
To mark a mount as shiftable CAP_SYS_ADMIN in the user namespace is required.
- passthrough={0,1,2,3}
This mount options functions as a bitmask. When set to a non-zero value
shiftfs will try to act as an invisible shim sitting on top of the
underlying filesystem.
- 1: Shifts will report the filesystem type of the underlay for stat-like
system calls.
- 2: Shiftfs will passthrough whitelisted ioctl() to the underlay.
- 3: Shiftfs will both use 1 and 2.
Note that mount options on a marked mount cannot be changed.
/* Extended Attributes */
Shiftfs will make sure to translate extended attributes.
/* Inodes Numbers */
Shiftfs inodes numbers are copied up from the underlying filesystem, i.e.
shiftfs inode numbers will be identical to the corresponding underlying
filesystem's inode numbers. This has the advantage that inotify and friends
should work out of the box.
(In essence, shiftfs is nothing but a 1:1 mirror of the underlying filesystem's
dentries and inodes.)
/* Device Support */
Shiftfs only supports the creation of pipe and socket devices. Character and
block devices cannot be created through shiftfs.
James Bottomley [Thu, 4 Apr 2019 13:39:11 +0000 (15:39 +0200)]
UBUNTU: SAUCE: shiftfs: uid/gid shifting bind mount
BugLink: https://bugs.launchpad.net/bugs/1823186
This allows any subtree to be uid/gid shifted and bound elsewhere. It
does this by operating simlarly to overlayfs. Its primary use is for
shifting the underlying uids of filesystems used to support
unpriviliged (uid shifted) containers. The usual use case here is
that the container is operating with an uid shifted unprivileged root
but sometimes needs to make use of or work with a filesystem image
that has root at real uid 0.
The mechanism is to allow any subordinate mount namespace to mount a
shiftfs filesystem (by marking it FS_USERNS_MOUNT) but only allowing
it to mount marked subtrees (using the -o mark option as root). Once
mounted, the subtree is mapped via the super block user namespace so
that the interior ids of the mounting user namespace are the ids
written to the filesystem.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[ saf: use designated initializers for path declarations to fix errors
with struct randomization ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[update: port to 5.0] Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Acked-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: prevent a glibc test failure when looking for obsolete types on headers
BugLink: https://bugs.launchpad.net/bugs/1813060
glibc will look for ulong and other obsolete types on headers, including linux
headers, and warn of their use. That, unfortunately, makes automated testing
fail.
Though that type is only referred inside a comment, and the test is what needs
fixing, we are temporarily changing the comment to make tests pass.
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Seth Forshee [Wed, 6 Feb 2019 21:17:10 +0000 (15:17 -0600)]
UBUNTU: hio -- part_round_stats() removed in 5.0
This can no longer be called. The only place which was still
calling it for 4.14 and later was ssd_update_smart(), and it was
not updating any statistics used there anyhow, so there's no need
to replace the call with anything else.
Seth Forshee [Wed, 6 Feb 2019 20:12:43 +0000 (14:12 -0600)]
UBUNTU: hio -- replace use of do_gettimeofday()
This function was removed in 5.0. In all cases only the seconds
component of the time is used, and we don't have to worry about
backward compatibility, so just replace it with
ktime_get_real_seconds();
Seth Forshee [Wed, 6 Feb 2019 19:49:13 +0000 (13:49 -0600)]
UBUNTU: hio -- stub out BIOVEC_PHYS_MERGEABLE for 4.20+
This was moved to be internal to the block core in 4.20. It looks
to me like the driver doesn't need to be doing this anyway, as
the block layer already tries to merge bio segments when possible.
But in the worst case we still just end up with segments which
could have been merged but are not merged, which doesn't look to
be fatal.
UBUNTU: SAUCE: selftests: net: fix "from" match test in fib_rule_tests.sh
Fix the IPv4 address of the dummy0 interface and ensure that ip_forward
is enabled in the network space to get a valid response when checking
for routes between the gateway and other hosts.
Seth Forshee [Fri, 25 Jan 2019 18:43:49 +0000 (12:43 -0600)]
UBUNTU: SAUCE: selftests/ftrace: Fix tab expansion in trace_marker snapshot trigger test
When trace lines are passed through echo tabs are being changed
to spaces, causing later string comparisons to fail. Add quotes
around the variables to prevent this.
UBUNTU: SAUCE: binder: give binder_alloc its own debug mask file
Currently both binder.c and binder_alloc.c both register the
/sys/module/binder_linux/paramters/debug_mask file which leads to conflicts
in sysfs. This commit gives binder_alloc.c its own
/sys/module/binder_linux/paramters/alloc_debug_mask file.
Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The Android binder driver needs to become a module for the sake of shipping
Anbox. To do this we need to export the following functions since binder is
currently still using them:
Juerg Haefliger [Fri, 18 Jan 2019 12:40:02 +0000 (13:40 +0100)]
UBUNTU: SAUCE: fan: Fix NULL pointer dereference
BugLink: https://bugs.launchpad.net/bugs/1811803
Fix a NULL pointer dereference in fan code that can easily be triggered
by running:
$ sudo ip link add foo type ipip
Signed-off-by: Juerg Haefliger <juergh@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
UBUNTU: SAUCE: x86/quirks: Scan all busses for early PCI quirks
BugLink: https://bugs.launchpad.net/bugs/1797990
Recently was noticed in an HP GEN9 system that kdump couldn't succeed
due to an irq storm coming from an Intel NIC, narrowed down to be lack
of clearing the MSI/MSI-X enable bits during the kdump kernel boot.
For that, we need an early quirk to manually turn off MSI/MSI-X for
PCI devices - this was worked as an optional boot parameter in a
(~subsequent~) previous patch.
Problem is that in our test system, the Intel NICs were not present in
any secondary bus under the first PCIe root complex, so they couldn't
be reached by the recursion in check_dev_quirk(). Modern systems,
specially with multi-processors and multiple NUMA nodes expose multiple
root complexes, describing more than one PCI hierarchy domain. Currently
the simple recursion present in the early-quirks code from x86 starts a
descending recursion from bus 0000:00, and reach many other busses by
navigating this hierarchy walking through the bridges. This is not
enough in systems with more than one root complex/host bridge, since
the recursion won't "traverse" to other root complexes by starting
statically in 0000:00 (for more details, see [0]).
This patch hence implements the full bus/device/function scan in
early_quirks(), by checking all possible busses instead of using a
recursion based on the first root bus or limiting the search scope to
the first 32 busses (like it was done in the beginning [1]).
[1] From historical perspective, early PCI scan dates back
to BitKeeper, added by Andi Kleen's "[PATCH] APIC fixes for x86-64",
on October/2003. It initially restricted the search to the first
32 busses and slots.
Due to a potential bug found in Nvidia chipsets, the scan
was changed to run only in the first root bus: see
commit 8659c406ade3 ("x86: only scan the root bus in early PCI quirks")
Finally, secondary busses reachable from the 1st bus were re-added back by:
commit 850c321027c2 ("x86/quirks: Reintroduce scanning of secondary buses")
Reported-by: Dan Streetman <ddstreet@canonical.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com>
[mfo: v2:
- gate the bus-scan differences with the cmdline option.
- update changelog: subsequent/previous patch.] Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
UBUNTU: SAUCE: x86/quirks: Add parameter to clear MSIs early on boot
BugLink: https://bugs.launchpad.net/bugs/1797990
We observed a kdump failure in x86 that was narrowed down to MSI irq
storm coming from a PCI network device. The bug manifests as a lack of
progress in the boot process of kdump kernel, and a flood of kernel
messages like:
[...]
[ 342.265294] do_IRQ: 0.155 No irq handler for vector
[ 342.266916] do_IRQ: 0.155 No irq handler for vector
[ 347.258422] do_IRQ: 14053260 callbacks suppressed
[...]
The root cause of the issue is that kexec process of the kdump kernel
doesn't ensure PCI devices are reset or MSI capabilities are disabled,
so a PCI adapter could produce a huge amount of irqs which would steal
all the processing time for the CPU (specially since we usually restrict
kdump kernel to use a single CPU only).
This patch implements the kernel parameter "pci=clearmsi" to clear the
MSI/MSI-X enable bits in the Message Control register for all PCI devices
during early boot time, thus preventing potential issues in the kexec'ed
kernel. PCI spec also supports/enforces this need (see PCI Local Bus
spec sections 6.8.1.3 and 6.8.2.3).
Suggested-by: Dan Streetman <ddstreet@canonical.com> Suggested-by: Gavin Shan <shan.gavin@linux.alibaba.com> Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
UBUNTU: SAUCE: x86/PCI: Export find_cap() to be used in early PCI code
BugLink: https://bugs.launchpad.net/bugs/1797990
This patch exports (and renames) the function find_cap() to be used
in the early PCI quirk code, by the next patch.
This is being moved out from AGP code to generic early-PCI code
since it's not AGP-specific and can be used for any PCI device.
No functional changes intended.
Signed-off-by: Guilherme G. Piccoli <gpiccoli@canonical.com> Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com> Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
AceLan Kao [Thu, 20 Sep 2018 08:41:14 +0000 (16:41 +0800)]
UBUNTU: SAUCE: platform/x86: dell-uart-backlight: new backlight driver for DELL AIO
BugLink: https://bugs.launchpad.net/bugs/1727235
The Dell AIO machines released after 2017 come with a UART interface
to communicate with the backlight scalar board. This driver creates
a standard backlight interface and talks to the scalar board through
UART.
In DSDT this uart port will be defined as
Name (_HID, "DELL0501")
Name (_CID, EisaId ("PNP0501")
The 8250 PNP driver will be loaded by default, and this driver uses
"DELL0501" to confirm the uart port is a backlight interface and
leverage the port created by 8250 PNP driver to communicate with
the scalar board.
v2:
1. move struct uart_cmd to .c file
2. make dell_uart_get_bl_power() inline
3. add space to ternary operator "bl_cmd->cmd[2] = power ? 0 : 1;"
4. check return value of kzalloc()
5. add kzfree()
6. check return value of backlight_device_register()
7. check return value of filp_open() at init
v3:
1. Fix compiling warning.
v4:
1. make *tty and *ftty static
2. bl_cmd->ret[0] will never be less than 0, fixed the if statement
3. fix some line over 80 chars warnings.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Mon, 26 Feb 2018 21:32:55 +0000 (15:32 -0600)]
UBUNTU: SAUCE: tools: use CC for linking acpi tools
Prior to 7ed1c1901fe5 ("tools: fix cross-compile var clobbering")
the acpi tools makefiles were using gcc for linking. That commit
causes ld to be used instead, however this doesn't work as the
flags supplied are meant for gcc and not ld. Change the acpi
tools rules to use $(QUIET_LINK)$(CC) for linking to fix this
regression.
John Johansen [Sun, 17 Jun 2018 10:56:25 +0000 (03:56 -0700)]
UBUNTU: SAUCE: apparmor: patch to provide compatibility with v2.x net rules
The networking rules upstreamed in 4.17 have a deliberate abi break
with the older 2.x network rules.
This patch provides compatibility with the older rules for those
still using an apparmor 2.x userspace and still want network rules
to work on a newer kernel.
Signed-off-by: John Johansen <john.johansen@canonical.com>
[ saf: resolve conflicts when rebasing to 4.20 ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
[ arighi: resolve conflicts when rebasing to 6.2 ] Signed-off-by: Andrea Righi <andrea.righi@canonical.com>