Seth Forshee [Tue, 29 Nov 2016 04:02:48 +0000 (22:02 -0600)]
UBUNTU: [Config] Update annotations
Update the annotations file based on the current configurations.
Add bug numbers for some options, and flag some options for
review which require another look.
Seth Forshee [Fri, 18 Nov 2016 19:37:40 +0000 (13:37 -0600)]
UBUNTU: [Config] Fix up CONFIG_I2C_SLAVE values
The annotations file says this should be =n on all architectures
except for armhf, but it is currently =y on several others. This
is because it's being selected by CONFIG_I2C_EMEV2, which is only
really relevant for armhf, and CONFIG_I2C_RCAR, which is only
relevant on armhf and arm64. Disable CONFIG_I2C_EMEV2 and
CONFIG_I2C_SLAVE for all other architectures, and update the
annotations file entry for CONFIG_I2C_EMEV2 accordingly.
Seth Forshee [Tue, 15 Nov 2016 16:36:49 +0000 (10:36 -0600)]
UBUNTU: [Config] Fix s390x config carnage
In all the config carnage during the yakkety cycle some of the
s390x config options got changed by mistake. Try to fix this
based on xenial's config and the git history from yakkety.
Seth Forshee [Tue, 22 Nov 2016 14:32:17 +0000 (08:32 -0600)]
UBUNTU: [Debian] config-check -- Make it easier to find annotations syntax errors
The errors from eval due to malformed annotations values often
are not very helpful in tracking down the source of the error.
Make this easier by printing the name of the confguration option
along with the error string.
UBUNTU: SAUCE: (namespace) block_dev: Forbid unprivileged mounting when device is opened for writing
For unprivileged mounts to be safe the user must not be able to
make changes to the backing store while it is mounted. This patch
takes a step towards preventing this by refusing to mount in a
user namepspace if the block device is open for writing and
refusing attempts to open the block device for writing by non-
root while it is mounted in a user namespace.
To prevent this from happening we use i_writecount in the inodes
of the bdev filesystem similarly to how it is used for regular
files. Whenever the device is opened for writing i_writecount
is checked; if it is negative the open returns -EBUSY, otherwise
i_writecount is incremented. On mount, a positive i_writecount
results in mount_bdev returning -EBUSY, otherwise i_writecount
is decremented. Opens by root and mounts from init_user_ns do not
check nor modify i_writecount.
Seth Forshee [Tue, 9 Feb 2016 19:26:34 +0000 (13:26 -0600)]
UBUNTU: SAUCE: (namespace) ext4: Add module parameter to enable user namespace mounts
This is still an experimental feature, so disable it by default
and allow it only when the system administrator supplies the
userns_mounts=true module parameter.
Seth Forshee [Sat, 18 Oct 2014 11:02:09 +0000 (13:02 +0200)]
UBUNTU: SAUCE: (namespace) ext4: Add support for unprivileged mounts from user namespaces
Support unprivileged mounting of ext4 volumes from user
namespaces. This requires the following changes:
- Perform all uid, gid, and projid conversions to/from disk
relative to s_user_ns. In many cases this will already be
handled by the vfs helper functions. This also requires
updates to handle cases where ids may not map into s_user_ns.
A new helper, projid_valid_eq(), is added to help with this.
- Update most capability checks to check for capabilities in
s_user_ns rather than init_user_ns. These mostly reflect
changes to the filesystem that a user in s_user_ns could
already make externally by virtue of having write access to
the backing device.
- Restrict unsafe options in either the mount options or the
ext4 superblock. Currently the only concerning option is
errors=panic, and this is made to require CAP_SYS_ADMIN in
init_user_ns.
- Verify that unprivileged users have the required access to the
journal device at the path passed via the journal_path mount
option.
Note that for the journal_path and the journal_dev mount
options, and for external journal devices specified in the
ext4 superblock, devcgroup restrictions will be enforced by
__blkdev_get(), (via blkdev_get_by_dev()), ensuring that the
user has been granted appropriate access to the block device.
- Set the FS_USERNS_MOUNT flag on the filesystem types supported
by ext4.
sysfs attributes for ext4 mounts remain writable only by real
root.
Seth Forshee [Thu, 2 Oct 2014 20:34:45 +0000 (15:34 -0500)]
UBUNTU: SAUCE: (namespace) fuse: Restrict allow_other to the superblock's namespace or a descendant
Unprivileged users are normally restricted from mounting with the
allow_other option by system policy, but this could be bypassed
for a mount done with user namespace root permissions. In such
cases allow_other should not allow users outside the userns
to access the mount as doing so would give the unprivileged user
the ability to manipulate processes it would otherwise be unable
to manipulate. Restrict allow_other to apply to users in the same
userns used at mount or a descendant of that namespace. Also
export current_in_userns() for use by fuse when built as a
module.
Seth Forshee [Wed, 24 Aug 2016 16:47:05 +0000 (11:47 -0500)]
UBUNTU: SAUCE: (namespace) fuse: Translate ids in posix acl xattrs
Fuse currently lacks comprehensive support for posix ACLs, but
some fuse filesystems process the acl xattrs internally. For this
to continue to work the ids within the xattrs need to be mapped
into s_user_ns when written to the filesystem and mapped from
s_user_ns when read.
Seth Forshee [Thu, 26 Jun 2014 16:58:11 +0000 (11:58 -0500)]
UBUNTU: SAUCE: (namespace) fuse: Support fuse filesystems outside of init_user_ns
In order to support mounts from namespaces other than
init_user_ns, fuse must translate uids and gids to/from the
userns of the process servicing requests on /dev/fuse. This
patch does that, with a couple of restrictions on the namespace:
- The userns for the fuse connection is fixed to the namespace
from which /dev/fuse is opened.
- The namespace must be the same as s_user_ns.
These restrictions simplify the implementation by avoiding the
need to pass around userns references and by allowing fuse to
rely on the checks in inode_change_ok for ownership changes.
Either restriction could be relaxed in the future if needed.
For cuse the namespace used for the connection is also simply
current_user_ns() at the time /dev/cuse is opened.
UBUNTU: SAUCE: (namespace) fuse: Add support for pid namespaces
When the userspace process servicing fuse requests is running in
a pid namespace then pids passed via the fuse fd are not being
translated into that process' namespace. Translation is necessary
for the pid to be useful to that process.
Since no use case currently exists for changing namespaces all
translations can be done relative to the pid namespace in use
when fuse_conn_init() is called. For fuse this translates to
mount time, and for cuse this is when /dev/cuse is opened. IO for
this connection from another namespace will return errors.
Requests from processes whose pid cannot be translated into the
target namespace will have a value of 0 for in.h.pid.
File locking changes based on previous work done by Eric
Biederman.
UBUNTU: SAUCE: (namespace) posix_acl: Export posix_acl_fix_xattr_userns() to modules
Fuse will make use of this function to provide backwards-
compatible acl support when proper posix acl support is added.
Add a check to return immediately if the to and from namespaces
are the same, and remove equivalent checks from its callers.
Also return an error code to indicate to callers whether or not
the conversion of the id between the user namespaces was
successful. For a valid xattr the id will continue to be changed
regardless to maintain the current behaviour for existing
callers, so they do not require updates to handle failed
conversions.
Seth Forshee [Sun, 15 Feb 2015 20:35:35 +0000 (14:35 -0600)]
UBUNTU: SAUCE: (namespace) fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems
The user in control of a super block should be allowed to freeze
and thaw it. Relax the restrictions on the FIFREEZE and FITHAW
ioctls to require CAP_SYS_ADMIN in s_user_ns.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
UBUNTU: SAUCE: (namespace) capabilities: Allow privileged user in s_user_ns to set security.* xattrs
A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.
The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
UBUNTU: SAUCE: (namespace) fs: Allow superblock owner to access do_remount_sb()
Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.
Signed-off-by: Seth Forshee <seth.forshee@canonical.com> Acked-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Serge Hallyn <serge.hallyn@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
UBUNTU: SAUCE: (namespace) fs: Allow superblock owner to change ownership of inodes
Allow users with CAP_SYS_CHOWN over the superblock of a filesystem to
chown files. Ordinarily the capable_wrt_inode_uidgid check is
sufficient to allow access to files but when the underlying filesystem
has uids or gids that don't map to the current user namespace it is
not enough, so the chown permission checks need to be extended to
allow this case.
Calling chown on filesystem nodes whose uid or gid don't map is
necessary if those nodes are going to be modified as writing back
inodes which contain uids or gids that don't map is likely to cause
filesystem corruption of the uid or gid fields.
Once chown has been called the existing capable_wrt_inode_uidgid
checks are sufficient, to allow the owner of a superblock to do anything
the global root user can do with an appropriate set of capabilities.
For the proc filesystem this relaxation of permissions is not safe, as
some files are owned by users (particularly GLOBAL_ROOT_UID) outside
of the control of the mounter of the proc and that would be unsafe to
grant chown access to. So update setattr on proc to disallow changing
files whose uids or gids are outside of proc's s_user_ns.
The original version of this patch was written by: Seth Forshee. I
have rewritten and rethought this patch enough so it's really not the
same thing (certainly it needs a different description), but he
deserves credit for getting out there and getting the conversation
started, and finding the potential gotcha's and putting up with my
semi-paranoid feedback.
Inspired-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
[ saf: Context adjustments in proc ] Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Seth Forshee [Wed, 7 Oct 2015 19:53:33 +0000 (14:53 -0500)]
UBUNTU: SAUCE: (namespace) mtd: Check permissions towards mtd block device inode when mounting
Unprivileged users should not be able to mount mtd block devices
when they lack sufficient privileges towards the block device
inode. Update mount_mtd() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.
Seth Forshee [Wed, 7 Oct 2015 19:49:47 +0000 (14:49 -0500)]
UBUNTU: SAUCE: (namespace) block_dev: Check permissions towards block device inode when mounting
Unprivileged users should not be able to mount block devices when
they lack sufficient privileges towards the block device inode.
Update blkdev_get_by_path() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.
UBUNTU: SAUCE: (namespace) block_dev: Support checking inode permissions in lookup_bdev()
When looking up a block device by path no permission check is
done to verify that the user has access to the block device inode
at the specified path. In some cases it may be necessary to
check permissions towards the inode, such as allowing
unprivileged users to mount block devices in user namespaces.
Add an argument to lookup_bdev() to optionally perform this
permission check. A value of 0 skips the permission check and
behaves the same as before. A non-zero value specifies the mask
of access rights required towards the inode at the specified
path. The check is always skipped if the user has CAP_SYS_ADMIN.
All callers of lookup_bdev() currently pass a mask of 0, so this
patch results in no functional change. Subsequent patches will
add permission checks where appropriate.
Tim Gardner [Thu, 17 Nov 2016 19:18:05 +0000 (12:18 -0700)]
UBUNTU: SAUCE: UEFI: bpf: disable bpf when module security is enabled
BPF carnage - Hi, It looks like CONFIG_BPF_EVENTS needs to be disabled
in secure boot environments since you can read kernel memory (and
hence, the hibernation image signing key) by attaching an eBPF program
to a tracepoint through a perf_event_open() fd which uses bpf_probe_read()
and either bpf_trace_printk() or bpf_probe_write_user(). (Or, rather,
kernel memory _reads_ need to be added to the threat model if a private
key is held in kernel memory.) -Kees -- Kees Cook Nexus Security
Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Conflicts:
kernel/bpf/syscall.c
Ming Lei [Thu, 3 Nov 2016 01:20:01 +0000 (09:20 +0800)]
UBUNTU: SAUCE: hio: splitting bio in the entry of .make_request_fn
BugLink: http://bugs.launchpad.net/bugs/1638700
From v4.3, the incoming bio can be very big[1], and it is
required to split it first in .make_request_fn(), so
we need to do that for hio.c too.
Signed-off-by: Ming Lei <ming.lei@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
This program is free software; you can redistribute it and/or modify it
under the terms and conditions of the GNU General Public License,
version 2, as published by the Free Software Foundation.
This program is distributed in the hope it will be useful, but WITHOUT
ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
more details.
Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Acked-by: Tim Gardner <tim.gardner@canonical.com> Signed-off-by: Kamal Mostafa <kamal@canonical.com> BugLink: http://bugs.launchpad.net/bugs/1635594 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Leann Ogasawara <leann.ogasawara@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Rob Nelson [Tue, 17 Nov 2015 23:47:27 +0000 (15:47 -0800)]
UBUNTU: SAUCE: nvme: improve performance for virtual NVMe devices
This change provides a mechanism to reduce the number of MMIO doorbell
writes for the NVMe driver. When running in a virtualized environment
like QEMU, the cost of an MMIO is quite hefy here. The main idea for
the patch is provide the device two memory location locations:
1) to store the doorbell values so they can be lookup without the doorbell
MMIO write
2) to store an event index.
I believe the doorbell value is obvious, the event index not so much.
Similar to the virtio specificaiton, the virtual device can tell the
driver (guest OS) not to write MMIO unless you are writing past this
value.
FYI: doorbell values are written by the nvme driver (guest OS) and the
event index is written by the virtual device (host OS).
The patch implements a new admin command that will communicate where
these two memory locations reside. If the command fails, the nvme
driver will work as before without any optimizations.
Contributions:
Eric Northup <digitaleric@google.com>
Frank Swiderski <fes@google.com>
Ted Tso <tytso@mit.edu>
Keith Busch <keith.busch@intel.com>
Just to give an idea on the performance boost with the vendor
extension: Running fio [1], a stock NVMe driver I get about 200K read
IOPs with my vendor patch I get about 1000K read IOPs. This was
running with a null device i.e. the backing device simply returned
success on every read IO request.
Signed-off-by: Rob Nelson <rlnelson@google.com>
[mlin: port for upstream] Signed-off-by: Ming Lin <mlin@kernel.org>
[koike: updated for current APIs] Signed-off-by: Helen Mae Koike Fornazier <helen.koike@collabora.co.uk>
Conflicts:
drivers/nvme/host/Kconfig
drivers/nvme/host/pci.c
Andy Whitcroft [Wed, 26 Oct 2016 16:47:57 +0000 (17:47 +0100)]
UBUNTU: [Config] switch squashfs to single threaded decode
There is some issue with squashfs decoding when done in a multi-threaded
manner which leads to large memory consumption. Either we have a leak
or more probabally we have pathalogical case leading to horrible internal
fragmentation. For the moment turn it off while it can be investigated.
BugLink: http://bugs.launchpad.net/bugs/1636847 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Brad Figg <brad.figg@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Gavin Guo [Wed, 12 Oct 2016 01:13:35 +0000 (09:13 +0800)]
UBUNTU: SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone balance.
BugLink: http://bugs.launchpad.net/bugs/1518457
On an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory).
Occasionally (about once a day), kswapd0 falls into a busy loop and
spins on 100% CPU usage indefinitely. Reject to do the zone balance
when the memory is too small.
Andy Whitcroft [Thu, 6 Oct 2016 13:22:12 +0000 (14:22 +0100)]
UBUNTU: SAUCE: (no-up) include/linux/security.h -- fix syntax error with CONFIG_SECURITYFS=n
commit c2ac27f7a443 ("securityfs: update interface to allow
inode_ops, and setup from vfs") introduced a syntax error
in include/linux/security.h when CONFIG_SECURITYFS is not set.
This is exercised by the zfcpdump-kernel for s390x.
BugLink: http://bugs.launchpad.net/bugs/1630990 Signed-off-by: Andy Whitcroft <apw@canonical.com> Acked-by: Colin King <colin.king@canonical.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com>