git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log

livepatch: add sympos as disambiguator field to klp_reloc

In cases of duplicate symbols, sympos will be used to disambiguate instead
of val. By default sympos will be 0, and patching will only succeed if
the symbol is unique. Specifying a positive value will ensure that
occurrence of the symbol in kallsyms for the patched object will be used
for patching if it is valid. For external relocations sympos is not
supported.

Remove klp_verify_callback, klp_verify_args and klp_verify_vmlinux_symbol
as they are no longer used.

From the klp_reloc structure remove val, as it can be refactored as a
local variable in klp_write_object_relocations.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit 064c89df6247cd829a7880cc8a87b7ed2cdfccd8)
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

livepatch: add old_sympos as disambiguator field to klp_func

Currently, patching objects with duplicate symbol names fail because the
creation of the sysfs function directory collides with the previous
attempt. Appending old_addr to the function name is problematic as it
reveals the address of the function being patch to a normal user. Using
the symbol's occurrence in kallsyms to postfix the function name in the
sysfs directory solves the issue of having consistent unique names and
ensuring that the address is not exposed to a normal user.

In addition, using the symbol position as the user's method to disambiguate
symbols instead of addr allows for disambiguating symbols in modules as
well for both function addresses and for relocs. This also simplifies much
of the code. Special handling for kASLR is no longer needed and can be
removed. The klp_find_verify_func_addr function can be replaced by
klp_find_object_symbol, and klp_verify_vmlinux_symbol and its callback can
be removed completely.

In cases of duplicate symbols, old_sympos will be used to disambiguate
instead of old_addr. By default old_sympos will be 0, and patching will
only succeed if the symbol is unique. Specifying a positive value will
ensure that occurrence of the symbol in kallsyms for the patched object
will be used for patching if it is valid.

In addition, make old_addr an internal structure field not to be specified
by the user. Finally, remove klp_find_verify_func_addr as it can be
replaced by klp_find_object_symbol directly.

Support for symbol position disambiguation for relocations is added in the
next patch in this series.

Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
(cherry picked from commit b2b018ef48675a9a524fa9791ea7d67fdac405f7)
Signed-off-by: Chris J Arges <chris.j.arges@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: cavium: liquidio: fix check for in progress flag

smatch detected a suspicious looking bitop condition:

drivers/net/ethernet/cavium/liquidio/lio_main.c:2529
handle_timestamp() warn: suspicious bitop condition

(skb_shinfo(skb)->tx_flags | SKBTX_IN_PROGRESS is always non-zero,
so the logic is definitely not correct. Use & to mask the correct
bit.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 19a6d156a7bd080f3a855a40a4a08ab475e34b4a)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: cavium: liquidio: Return correct error code

The return value of vmalloc on failure of allocation of memory should
be -ENOMEM and not -1.

Found using Coccinelle. A simplified version of the semantic patch
used is:

//<smpl>
@@
expression *e;
identifier l1;
position p,q;
@@

e@q = vmalloc(...);
if@p (e == NULL) {
...
goto l1;
}
l1:
...
return -1
+ -ENOMEM
;
//</smpl

The single call site of the containing function checks whether the
returned value is -1, so this check is changed as well. The single call
site of this call site, however, only checks whether the value is not 0,
so no further change was required.

Signed-off-by: Amitoj Kaur Chawla <amitoj1606@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 08a965ec93ad0495802462c32b73241d658e189d)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: Alloc higher order pages when pagesize is small

Allocate higher order pages when pagesize is small, this will
reduce number of calls to page allocator and wastage of memory.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 6e4be8d6717cb63c58f6b404e63a881c76d8878c)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: bgx: Add log message when setting mac address

Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 1d82efaca87ecf53e97c696f9d0a9adefea0c7b5)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: bgx: Use standard firmware node infrastructure.

In the case of OF device tree, the firmware information is attached to
the BGX device structure in the standard manner, so use the firmware
iterators and accessors where possible.

Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit eee326fd83348ed39a06c0db999ed513d10d9c39)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: Assign affinity hints to vf's interrupts

This affinity hint can be used by user space irqbalance tool to set
preferred CPU mask for irqs registered by this VF. Irqbalance needs
to be in 'exact' mode to set irq affinity same as indicated by
affinity hint.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit fb4b7d98a0215fc3310c8415a86acfe726de395c)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: Use napi_schedule_irqoff()

napi_schedule is being called from hard irq context, hence
switch to napi_schedule_irqoff which avoids unneeded call
to local_irq_save and local_irq_restore.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit ef0a4d8601760b346d9d0893f2a554c338861c4f)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net, thunderx: Add TX timeout and RX buffer alloc failure stats.

When system is low on atomic memory, too many error messages are logged.
Since this is not a total failure but a simple switch to non-atomic allocation
better to have a stat.

Also add a stat for reset, kicked due to transmit watchdog timeout.

Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit a05d4845907a6f0296612d24956b189a51fb8df7)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: prefetch: add missing #include for spin_lock_prefetch

As of 52e662326e1e ("arm64: prefetch: don't provide spin_lock_prefetch
with LSE"), spin_lock_prefetch is patched at runtime when the LSE atomics
are in use. This relies on the ARM64_LSE_ATOMIC_INSN macro to drive
the alternatives framework, but that macro is only available via
asm/lse.h, which isn't explicitly included in processor.h. Consequently,
drivers can run into build failures such as:

   In file included from include/linux/prefetch.h:14:0,
                    from drivers/net/ethernet/intel/i40e/i40e_txrx.c:27:
   arch/arm64/include/asm/processor.h: In function 'spin_lock_prefetch':
   arch/arm64/include/asm/processor.h:183:15: error: expected string literal before 'ARM64_LSE_ATOMIC_INSN'
     asm volatile(ARM64_LSE_ATOMIC_INSN(

This patch add the missing include and gets things building again.

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from linux-next commit afb83cc3f0e4f86ea0e1cc3db7a90f58f1abd4d5)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: lib: patch in prfm for copy_page if requested

On ThunderX T88 pass 1 and pass 2, there is no hardware prefetching so
we need to patch in explicit software prefetching instructions

Prefetching improves this code by 60% over the original code and 2x
over the code without prefetching for the affected hardware using the
benchmark code at https://github.com/apinski-cavium/copy_page_benchmark

Signed-off-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from linux-next commit 60e0a09db24adc8809696307e5d97cc4ba7cb3e0)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: lib: improve copy_page to deal with 128 bytes at a time

We want to avoid lots of different copy_page implementations, settling
for something that is "good enough" everywhere and hopefully easy to
understand and maintain whilst we're at it.

This patch reworks our copy_page implementation based on discussions
with Cavium on the list and benchmarking on Cortex-A processors so that:

  - The loop is unrolled to copy 128 bytes per iteration

  - The reads are offset so that we read from the next 128-byte block
    in the same iteration that we store the previous block

  - Explicit prefetch instructions are removed for now, since they hurt
    performance on CPUs with hardware prefetching

  - The loop exit condition is calculated at the start of the loop

Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from linux-next commit 223e23e8aa26b0bb62c597637e77295e14f6a62c)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: prefetch: add alternative pattern for CPUs without a prefetcher

Most CPUs have a hardware prefetcher which generally performs better
without explicit prefetch instructions issued by software, however
some CPUs (e.g. Cavium ThunderX) rely solely on explicit prefetch
instructions.

This patch adds an alternative pattern (ARM64_HAS_NO_HW_PREFETCH) to
allow our library code to make use of explicit prefetch instructions
during things like copy routines only when the CPU does not have the
capability to perform the prefetching itself.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from linux-next commit d5370f754875460662abe8561388e019d90dd0c4)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: prefetch: don't provide spin_lock_prefetch with LSE

The LSE atomics rely on us not dirtying data at L1 if we can avoid it,
otherwise many of the potential scalability benefits are lost.

This patch replaces spin_lock_prefetch with a nop when the LSE atomics
are in use, so that users don't shoot themselves in the foot by causing
needless coherence traffic at L1.

Signed-off-by: Will Deacon <will.deacon@arm.com>
Tested-by: Andrew Pinski <apinski@cavium.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from linux-next commit cd5e10bdf3795d22f10787bb1991c43798c885d5)
Signed-off-by: Dann Frazier <dann.frazier@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

arm64: KVM: Configure TCR_EL2.PS at runtime

Setting TCR_EL2.PS to 40 bits is wrong on systems with less that
less than 40 bits of physical addresses. and breaks KVM on systems
where the RAM is above 40 bits.

This patch uses ID_AA64MMFR0_EL1.PARange to set TCR_EL2.PS dynamically,
just like we already do for VTCR_EL2.PS.

[Marc: rewrote commit message, patch tidy up]

Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 3c5b1d92b3b02be07873d611a27950addff544d3)
[ dannf: backported to v4.4 ]
Signed-off-by: dann frazier <dann.frazier@canonical.com>

irqchip/gic-v3: Make sure read from ICC_IAR1_EL1 is visible on redestributor

The ARM GICv3 specification mentions the need for dsb after a read
from the ICC_IAR1_EL1 register:

4.1.1 Physical CPU Interface:
The effects of reading ICC_IAR0_EL1 and ICC_IAR1_EL1
on the state of a returned INTID are not guaranteed
to be visible until after the execution of a DSB.

Not having this could result in missed interrupts, so let's add the
required barrier.

[Marc: fixed commit message]

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: Tirumalesh Chalamarla <tchalamarla@caviumnetworks.com>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 1a1ebd5fb1e203ee8cc73508cc7a38ac4b804596)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: cavium: liquidio: use helpers ns_to_timespec64()

Convert the driver to use ns_to_timespec64() to keep consistency
with timespec64_to_ns() instead of open coding the same logic.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 286af315d3f153595ce718fb1e442891f14ed5c0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: Enable CQE count threshold interrupt

This feature is introduced in pass-2 chip and with this CQ interrupt
coalescing will work based on both timer and count.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b9687b48a63a12ea31442f64dc77d41e83d0e478)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: HW TSO support for pass-2 hardware

This adds support for offloading TCP segmentation to HW in pass-2
revision of hardware. Both driver level SW TSO for pass1.x chips
and HW TSO for pass-2 chip will co-exist. Modified SQ descriptor
structures to reflect pass-2 hw implementation.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 40fb5f8a60f33133d36afde35a9ad865d35e4423)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net, thunderx: Remove unnecessary rcv buffer start address management

Since we have moved on to using allocated pages to carve receive
buffers instead of netdev_alloc_skb() there is no need to store
any pointers for later retrieval. Earlier we had to store
skb and skb->data pointers which later are used to handover
received packet to network stack.

This will avoid an unnecessary cache miss as well.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 668dda06d48fc16a5b40e6a32057bd18589e3f95)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: thunderx: nicvf_queues: nivc_*_intr: remove duplication

The same switch-case repeates for nivc_*_intr functions.
In this patch it is moved to a helper nicvf_int_type_to_mask().

By the way:
- Unneeded write to NICVF register dropped if int_type is unknown.
- netdev_dbg() is used instead of netdev_err().

Signed-off-by: Yury Norov <yury.norov@auriga.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Acked-by: Vadim Lomovtsev <Vadim.Lomovtsev@caiumnetworks.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b45ceb406e4fd3045180b8d70bff60b1d43c7ff4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: rebase to v4.4.2

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) cgroup: Add documentation for cgroup namespaces

BugLink: http://bugs.launchpad.net/bugs/1546775
Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) Add FS_USERNS_FLAG to cgroup fs

BugLink: http://bugs.launchpad.net/bugs/1546775
allowing root in a non-init user namespace to mount it.  This should
now be safe, because

1. non-init-root cannot mount a previously unbound subsystem
2. the task doing the mount must be privileged with respect to the
   user namespace owning the cgroup namespace
3. the mounted subsystem will have its current cgroup as the root dentry.
   the permissions will be unchanged, so tasks will receive no new
   privilege over the cgroups which they did not have on the original
   mounts.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) cgroup: mount cgroupns-root when inside non-init cgroupns

BugLink: http://bugs.launchpad.net/bugs/1546775
This patch enables cgroup mounting inside userns when a process
as appropriate privileges. The cgroup filesystem mounted is
rooted at the cgroupns-root. Thus, in a container-setup, only
the hierarchy under the cgroupns-root is exposed inside the container.
This allows container management tools to run inside the containers
without depending on any global state.

Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) kernfs: define kernfs_node_dentry

BugLink: http://bugs.launchpad.net/bugs/1546775
Add a new kernfs api is added to lookup the dentry for a particular
kernfs path.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) cgroup: cgroup namespace setns support

BugLink: http://bugs.launchpad.net/bugs/1546775
setns on a cgroup namespace is allowed only if
task has CAP_SYS_ADMIN in its current user-namespace and
over the user-namespace associated with target cgroupns.
No implicit cgroup changes happen with attaching to another
cgroupns. It is expected that the somone moves the attaching
process under the target cgroupns-root.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) cgroup: introduce cgroup namespaces

BugLink: http://bugs.launchpad.net/bugs/1546775
Introduce the ability to create new cgroup namespace. The newly created
cgroup namespace remembers the cgroup of the process at the point
of creation of the cgroup namespace (referred as cgroupns-root).
The main purpose of cgroup namespace is to virtualize the contents
of /proc/self/cgroup file. Processes inside a cgroup namespace
are only able to see paths relative to their namespace root
(unless they are moved outside of their cgroupns-root, at which point
they will see a relative path from their cgroupns-root).
For a correctly setup container this enables container-tools
(like libcontainer, lxc, lmctfy, etc.) to create completely virtualized
containers without leaking system level cgroup hierarchy to the task.
This patch only implements the 'unshare' part of the cgroupns.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) sched: new clone flag CLONE_NEWCGROUP for cgroup namespace

BugLink: http://bugs.launchpad.net/bugs/1546775
CLONE_NEWCGROUP will be used to create new cgroup namespace.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: (noup) kernfs: Add API to generate relative kernfs path

BugLink: http://bugs.launchpad.net/bugs/1546775
The new function kernfs_path_from_node() generates and returns kernfs
path of a given kernfs_node relative to a given parent kernfs_node.

Signed-off-by: Aditya Kali <adityakali@google.com>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: Mark all devices as "might sleep"

BugLink: http://bugs.launchpad.net/bugs/1520436
Commit d1cd21427747 ("pwm: Set enable state properly on failed call to
enable") introduced a mutex that is needed to protect internal state of
PWM devices. Since that mutex is acquired in pwm_set_polarity() and in
pwm_enable() and might potentially block, all PWM devices effectively
become "might sleep".

It's rather pointless to keep the .can_sleep field around, but given
that there are external users let's postpone the removal for the next
release cycle.

Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit ff01c944cfa939f3474c28d88223213494aedf0b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: omap-dmtimer: Potential NULL dereference on error

BugLink: http://bugs.launchpad.net/bugs/1520436
"omap" is NULL so we can't dereference it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 074726402b82f14ca377da0b4a4767674c3d1ff8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: add HAS_IOMEM dependency to PWM_FSL_FTM

BugLink: http://bugs.launchpad.net/bugs/1520436
Ran into this on UML:

drivers/built-in.o: In function `fsl_pwm_probe':
linux/drivers/pwm/pwm-fsl-ftm.c:436: undefined reference to `devm_ioremap_resource'
collect2: error: ld returned 1 exit status

devm_ioremap_resource() is defined only when HAS_IOMEM is selected.

Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Cc: Xiubo Li <Li.Xiubo@freescale.com>
Cc: Alison Wang <b18965@freescale.com>
Cc: Jingchang Lu <b35083@freescale.com>
Cc: Sascha Hauer <s.hauer@pengutronix.de>
Cc: Yuan Yao <yao.yuan@freescale.com>
Cc: Thierry Reding <thierry.reding@gmail.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 36d5be4bc9059f8123e818c8b63a4049cf1d0e0f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: Add PWM driver for OMAP using dual-mode timers

BugLink: http://bugs.launchpad.net/bugs/1520436
Adds support for using a OMAP dual-mode timer with PWM capability
as a Linux PWM device. The driver controls the timer by using the
dmtimer API.

Add a platform_data structure for each pwm-omap-dmtimer nodes containing
the dmtimers functions in order to get driver not rely on platform
specific functions.

Cc: Grant Erickson <marathon96@gmail.com>
Cc: NeilBrown <neilb@suse.de>
Cc: Joachim Eastwood <manabian@gmail.com>
Suggested-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
Acked-by: Tony Lindgren <tony@atomide.com>
[thierry.reding@gmail.com: coding style bikeshed, fix timer leak]
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 6604c6556db9e41c85f2839f66bd9d617bcf9f87)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: rcar: Improve accuracy of frequency division setting

BugLink: http://bugs.launchpad.net/bugs/1520436
From: Ryo Kodama <ryo.kodama.vz@renesas.com>

When period_ns is set to the same value of RCAR_PWM_MAX_CYCLE in
rcar_pwm_get_clock_division(), this function should allow such value
for improving accuracy of frequency division setting.

Signed-off-by: Ryo Kodama <ryo.kodama.vz@renesas.com>
Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 72c16a9f98afad073b4a9c947c1c89bfb886ffcb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpc32xx: return ERANGE, if requested period is not supported

BugLink: http://bugs.launchpad.net/bugs/1520436
Instead of silent acceptance of unsupported requested configuration
for PWM period and setting the boundary supported value, return
-ERANGE to a caller.

Duty period value equal to 0 or period is still accepted to allow
configuration by PWM sysfs interface, when it is set to 0 by default.

For reference this is a list of restrictions on period_ns == 1/freq:

  | PWM parent clock | parent clock divisor | max freq | min freq |
  +------------------+----------------------+----------+----------+
  |   HCLK == 13 MHz |      1 (min)         | 50.7 KHz | 198.3 Hz |
  |   HCLK == 13 MHz |     15 (max)         | 3.38 KHz | 13.22 Hz |
  |  RTC == 32.7 KHz |      1 (min)         |   128 Hz |   0.5 Hz |
  |  RTC == 32.7 KHz |     15 (max)         | 8.533 Hz | 0.033 Hz |

Note that PWM sysfs interface does not support setting of period more
than NSEC_PER_SEC / MAX_INT32 ~ 2 seconds, however this PWM controller
supports a period up to 30 seconds.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit d6dbdf0ddefa581e49c2abe5fb0eb17d14111d89)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpc32xx: fix and simplify duty cycle and period calculations

BugLink: http://bugs.launchpad.net/bugs/1520436
The change fixes a problem, if duty_ns is too small in comparison
to period_ns (as a valid corner case duty_ns is 0 ns), then due to
PWM_DUTY() macro applied on a value the result is overflowed over 8
bits, and instead of the highest bitfield duty cycle value 0xff the
invalid duty cycle bitfield value 0x00 is written.

For reference the LPC32xx spec defines PWMx_DUTY bitfield description
is this way and it seems to be correct:

[Low]/[High] = [PWM_DUTY]/[256-PWM_DUTY], where 0 < PWM_DUTY <= 255.

In addition according to my oscilloscope measurements LPC32xx PWM is
"tristate" in sense that it produces a wave with floating min/max
voltage levels for different duty cycle values, for corner cases:

  PWM_DUTY == 0x01 => signal is in range from -1.05v to 0v
  ....
  PWM_DUTY == 0x80 => signal is in range from -0.75v to +0.75v
  ....
  PWM_DUTY == 0xff => signal is in range from 0v to +1.05v

  PWM_DUTY == 0x00 => signal is around 0v, PWM is off

Due to this peculiarity on very long period ranges (less than 1KHz)
and odd pre-divider values PWM generated wave does not remind a
clock shape signal, but rather a heartbit shape signal with positive
and negative peaks, so I would recommend to use high-speed HCLK clock
as a PWM parent clock and avoid using RTC clock as a parent.

The change corrects PWM output in corner cases and prevents any
possible overflows in calculation of values for PWM_DUTY and
PWM_RELOADV bitfields, thus helper macro definitions may be removed.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 5a9fc9c666d5d759699cf5495bda85f1da0d747e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpc32xx: make device usable with common clock framework

BugLink: http://bugs.launchpad.net/bugs/1520436
As a preparatory change for switching LPC32xx mach support to common
clock framework fix clk_enable/clk_disable calls without matching
clk_prepare/clk_unprepare.

The driver can not be used on a platform with common clock framework
until clk_prepare/clk_unprepare calls are added, otherwise clk_enable
calls will fail and a WARN is generated:

    # echo 1 > /sys/bus/platform/drivers/lpc32xx-pwm/4005c000.pwm/pwm/pwmchip0/pwm0/enable
    ------------[ cut here ]------------
    WARNING: CPU: 0 PID: 701 at drivers/clk/clk.c:727 clk_core_enable+0x2c/0xa4()
    Modules linked in: sc16is7xx
    CPU: 0 PID: 701 Comm: sh Tainted: G        W       4.3.0-rc2+ #171
    Hardware name: LPC32XX SoC (Flattened Device Tree)
    Backtrace:
    [<>] (dump_backtrace) from [<>] (show_stack+0x18/0x1c)
    [<>] (show_stack) from [<>] (dump_stack+0x20/0x28)
    [<>] (dump_stack) from [<>] (warn_slowpath_common+0x90/0xb8)
    [<>] (warn_slowpath_common) from [<>] (warn_slowpath_null+0x24/0x2c)
    [<>] (warn_slowpath_null) from [<>] (clk_core_enable+0x2c/0xa4)
    [<>] (clk_core_enable) from [<>] (clk_enable+0x24/0x38)
    [<>] (clk_enable) from [<>] (lpc32xx_pwm_enable+0x1c/0x40)
    [<>] (lpc32xx_pwm_enable) from [<>] (pwm_enable+0x48/0x5c)
    [<>] (pwm_enable) from [<>] (pwm_enable_store+0x5c/0x78)
    [<>] (pwm_enable_store) from [<>] (dev_attr_store+0x20/0x2c)
    [<>] (dev_attr_store) from [<>] (sysfs_kf_write+0x44/0x50)
    [<>] (sysfs_kf_write) from [<>] (kernfs_fop_write+0x134/0x194)
    [<>] (kernfs_fop_write) from [<>] (__vfs_write+0x34/0xdc)
    [<>] (__vfs_write) from [<>] (vfs_write+0xb8/0x140)
    [<>] (vfs_write) from [<>] (SyS_write+0x50/0x90)
    [<>] (SyS_write) from [<>] (ret_fast_syscall+0x0/0x38)

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 82aff048dde444334c7045fab620b13058ff15a7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpc32xx: correct number of PWM channels from 2 to 1

BugLink: http://bugs.launchpad.net/bugs/1520436
LPC32xx SoC has two independent PWM controllers, they have different
clock parents, clock gates and even slightly different controls, and
each of these two PWM controllers has one output channel. Due to
almost similar controls arranged in a row it is incorrectly set that
there is one PWM controller with two channels, fix this problem, which
at the moment prevents separate configuration of different clock
parents and gates for both PWM controllers.

The change makes previous PWM device node description incompatible
with this update.

Signed-off-by: Vladimir Zapolskiy <vz@mleia.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit ebe1fca35038df28b5c183e8486863e765364ec1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: fsl-ftm: Fix clock enable/disable when using PM

BugLink: http://bugs.launchpad.net/bugs/1520436
A FTM PWM instance enables/disables three clocks: The bus clock, the
counter clock and the PWM clock. The bus clock gets enabled on
pwm_request, whereas the counter and PWM clocks will be enabled upon
pwm_enable.

The driver has three closesly related issues when enabling/disabling
clocks during suspend/resume:
- The three clocks are not treated differently in regards to the
  individual PWM state enabled/requested. This can lead to clocks
  getting disabled which have not been enabled in the first place
  (a PWM channel which only has been requested going through
  suspend/resume).

- When entering suspend, the current behavior relies on the
  FTM_OUTMASK register: If a PWM output is unmasked, the driver
  assumes the clocks are enabled. However, some PWM instances
  have only 2 channels connected (e.g. Vybrid's FTM1). In that case,
  the FTM_OUTMASK reads 0x3 if all channels are disabled, even if
  the code wrote 0xff to it before. For those PWM instances, the
  current approach to detect enabled PWM signals does not work.

- A third issue applies to the bus clock only, which can get enabled
  multiple times (once for each PWM channel of a PWM chip). This is
  fine, however when entering suspend mode, the clock only gets
  disabled once.

This change introduces a different approach by relying on the enable
and prepared counters of the clock framework and using the frameworks
PWM signal states to address all three issues.

Clocks get disabled during suspend and back enabled on resume
regarding to the PWM channels individual state (requested/enabled).

Since we do not count the clock enables in the driver, this change no
longer clears the Status and Control registers Clock Source Selection
(FTM_SC[CLKS]). However, since we disable the selected clock anyway,
and we explicitly select the clock source on reenabling a PWM channel
this approach should not make a difference in practice.

Signed-off-by: Stefan Agner <stefan@agner.ch>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 816aec2325e620b6454474372a21f90a8740cb28)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpss: Rework the sequence of programming PWM_SW_UPDATE

BugLink: http://bugs.launchpad.net/bugs/1520436
Setting of PWM_SW_UPDATE is bit different in Intel Broxton compared to the
previous generation SoCs. Previously it was OK to set the bit many times
(from userspace via sysfs for example) before the PWM is actually enabled.

Starting from Intel Broxton it seems that we must set PWM_SW_UPDATE only
once before the PWM is enabled. Otherwise it is possible that the PWM does
not start properly.

Change the sequence of how PWM_SW_UPDATE is programmed so that we only set
it in pwm_lpss_config() when the PWM is already enabled. The initial
setting of PWM_SW_UPDATE will be done when PWM gets enabled. This should
make the driver work with the previous generation Intel SoCs and Broxton.

Add also small delay after the bit is set to let the hardware propagate it
properly.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 37670676a122a38e72ecd9dac0feff2a3dac967f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpss: Select core part automatically

BugLink: http://bugs.launchpad.net/bugs/1520436
We have two users of core part right now. Let them to select core part
automatically.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 6f90a00c6667dce5651341f0629443cf7951b235)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpss: Update PWM setting for Broxton

BugLink: http://bugs.launchpad.net/bugs/1520436
For Broxton PWM controller, base unit is defined as 8-bit integer
and 14-bit fraction, so need to update base unit setting to output
wave with right frequency.

Signed-off-by: Qipeng Zha <qipeng.zha@intel.com>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 883e4d070fe125028c0579d8666b820aadf458fb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: bcm2835: Fix email address specification

BugLink: http://bugs.launchpad.net/bugs/1520436
Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit 6ef7d1c46f0cbe2b8e9c66d5d95ffa5a612df45d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: bcm2835: Prevent division by zero

BugLink: http://bugs.launchpad.net/bugs/1520436
It's possible that the PWM clock becomes an orphan. So better check the
result of clk_get_rate() in order to prevent a division by zero.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit fd13c14426299e75983a0cd3edf53dfa4083a70a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: bcm2835: Calculate scaler in ->config()

BugLink: http://bugs.launchpad.net/bugs/1520436
Currently pwm-bcm2835 assumes a fixed clock rate and stores the
resulting scaler in the driver structure. But with the upcoming
PWM clock support for clk-bcm2835 the rate could change, so
calculate the scaler in the ->config() callback.

Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit ebe88b6ae41ff8f2b48608b6019c4341aa24bcea)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

pwm: lpss: Remove ->free() callback

BugLink: http://bugs.launchpad.net/bugs/1520436
The LPSS PWM driver calls pwm_lpss_disable() when the PWM device is
released (for example unexported from sysfs). This in turn calls
pm_runtime_put() which makes runtime PM count to be unbalanced if the
device has not been enabled at this point.

This is easy to reproduce:

  # cd /sys/class/pwm/pwmchip0
  # echo 0 > export
  # echo 0 > unexport

The count is unbalanced and prevents the PWM device from being powered on
next time.

Fix this by removing ->free() callback. There are no resources to be
released anyway.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
(cherry picked from commit c7b91b33cf446ec09eedf4594cff8d7b85ef6870)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] CONFIG_PWM_OMAP_DMTIMER=m

BugLink: http://bugs.launchpad.net/bugs/1520436
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Debian] hv: hv_set_ifconfig -- fix numerous parameter handling issues

Fix a number of KVP parameter parsing issues:

1) we should not be separating the prefix and instance numbers with an '_',
2) IPADDR/NETMASK instance 0 does have a suffix which we do not provide,
3) GATEWAY instance 0 is inconsistant,
4) IPv6 should be configured whether IPv4 is DHCP or not, and
5) DHCP mode is selected via BOOTPROTO=dhcp not DHCP=yes.

BugLink: http://bugs.launchpad.net/bugs/1540586
Signed-off-by: Andy Whitcroft <apw@canonical.com>

UBUNTU: [Debian] hv: hv_set_ifconfig -- switch to approved indentation

BugLink: http://bugs.launchpad.net/bugs/1540586
Signed-off-by: Andy Whitcroft <apw@canonical.com>

openvswitch: allow management from inside user namespaces

Operations with the GENL_ADMIN_PERM flag fail permissions checks because
this flag means we call netlink_capable, which uses the init user ns.

Instead, let's introduce a new flag, GENL_UNS_ADMIN_PERM for operations
which should be allowed inside a user namespace.

The motivation for this is to be able to run openvswitch in unprivileged
containers. I've tested this and it seems to work, but I really have no
idea about the security consequences of this patch, so thoughts would be
much appreciated.

v2: use the GENL_UNS_ADMIN_PERM flag instead of a check in each function
v3: use separate ifs for UNS_ADMIN_PERM and ADMIN_PERM, instead of one
massive one

Reported-by: James Page <james.page@canonical.com>
Signed-off-by: Tycho Andersen <tycho.andersen@canonical.com>
CC: Eric Biederman <ebiederm@xmission.com>
CC: Pravin Shelar <pshelar@ovn.org>
CC: Justin Pettit <jpettit@nicira.com>
CC: "David S. Miller" <davem@davemloft.net>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 4a92602aa1cd5bbaeedbd9536ff992f7d26fe9d1)
Signed-off-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Add an i/o barrier

BugLink: http://bugs.launchpad.net/bugs/1544679
A barrier should be added to ensure proper ordering of memory mapped
writes.

Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Kashyap Desai <kashyap.desai@broadcom.com>
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 4327614e7154a29a5db10469e49282f61598f41a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Fix SMAP issue

BugLink: http://bugs.launchpad.net/bugs/1544679
Inside compat IOCTL hook of driver, driver was using wrong address of
ioc->frame.raw which leads sense_ioc_ptr to be calculated wrongly and
failing IOCTL.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 5ee15e20a7616bff57ab085271b2c05d1967a6ff)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Fix for IO failing post OCR in SRIOV environment

BugLink: http://bugs.launchpad.net/bugs/1544679
Driver assumes that VFs always have peers present whenever they have
same LD IDs. But this is not the case. This patch handles the above
mentioned by explicitly checking for a peer before making HA/non-HA path
decision.

Signed-off-by: Uday Lingala <uday.lingala@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 07d6c5aa4d4128cadb7ff0557576f70085f2cae1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid: fix null pointer check in megasas_detach_one().

BugLink: http://bugs.launchpad.net/bugs/1544679
The pd_seq_sync pointer can't be NULL, we have to check its entries
instead.

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit bd27c97fae40ff3a96e91750304296d851b3c5f2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: driver version upgrade

BugLink: http://bugs.launchpad.net/bugs/1544679
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit aee21fb6abaf67eb2ad8a72a8aabbe5e1837207e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: SPERC OCR changes

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch fixes online controller resets on SRIOV-enabled series of
Avago controllers.

1) Remove late detection heartbeat.

2) Change in the behavior if the FW found in READY/OPERATIONAL state.

Signed-off-by: Uday Lingala <uday.lingala@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 92d21b2e4c9e9ac9f027b8ce0b74eb4c03230214)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Introduce module parameter for SCSI command timeout

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will introduce module-parameter for SCSI command timeout
value and fix setting of resetwaittime beyond a value.

Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit f4a01f737e56ab0054e4c1d93c3870f0e0803ba0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: MFI adapter OCR changes

BugLink: http://bugs.launchpad.net/bugs/1544679
Optimized MFI adapters' OCR path, particularly
megasas_wait_for_outstanding() function.

Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit c05c7e5c091a6527bed7f31e0238749c845a4744)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Make adprecovery variable atomic

BugLink: http://bugs.launchpad.net/bugs/1544679
Make instance->adprecovery variable atomic and removes hba_lock spinlock
while accessing instance->adprecovery.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 19a9f01b55a094d46f02c50d7906514f239ab737)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: IO throttling support

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will add capability in driver to tell firmware that it can
throttle IOs in case controller's queue depth is downgraded post OFU
(online firmware upgrade). This feature will ensure firmware can be
downgraded from higher queue depth to lower queue depth without needing
system reboot. Added throttling code in IO path of driver, in case OS
tries to send more IOs than post OFU firmware's queue depth.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit a78fae31ce973dc68021d040fe23d85dfebd62be)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Dual queue depth support

BugLink: http://bugs.launchpad.net/bugs/1544679
1. For iMR controllers, firmware will report two queue depths:

- Controller-wide queue depth
- LDIO queue depth (240)

Controller-wide queue depth will be greater among the two. Using this
new feature, iMR can provide larger Queue depth(QD) for JBOD and limited
QD for Virtual Disk(VD).

2. megaraid_sas driver will throttle read/write LDIOs based on "LDIO
Queue Depth".

3. Dual queue depth can be enabled/disabled via module parameter. It is
enabled by default if the firmware supports it. Only specific firmware
builds will enable the feature.

4. Added sysfs parameter "ldio_outstanding" which permits querying the
number of outstanding LDIO requests at runtime.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 6516e25f7dc9f19d3a1138a1a9c19840f5962a48)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Code optimization build_and_issue_cmd return-type

BugLink: http://bugs.launchpad.net/bugs/1544679
build_and_issue_cmd should return SCSI_MLQUEUE_HOST_BUSY for a few error
cases instead of returning 1.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 6af29f582aae28bd3098cc1f6f037a8c4d49d5b9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Reply Descriptor Post Queue (RDPQ) support

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will create a reply queue pool for each MSI-X index and will
provide an array of base addresses instead of the single address of
legacy mode. Using this new interface the driver can support higher
queue depths through scattered DMA pools.

If array mode is not supported driver will fall back to the legacy
method of reply pool allocation. This limits controller queue depth to
1K max. To enable a queue depth of more than 1K driver requires firmware
to support array mode and scratch_pad3 will provide the new queue depth
value.

When RDPQ is used, downgrading to an older firmware release should not
be permitted. This may cause firmware fault and is not supported.

Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 7cb94182656215333ded440ab785e88aa850283f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Fastpath region lock bypass

BugLink: http://bugs.launchpad.net/bugs/1544679
Firmware will fill out per-LD data to tell driver whether a particular
LD supports region lock bypass. If yes, then driver will send non-FP
LDIO to region lock bypass FIFO. With this change in driver, firmware
will optimize certain code to improve performance.

Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 9ddddb30a4b9e5e8bb20c255056bf319c5767a87)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Update device queue depth based on interface type

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will update device Queue depth based on interface type(SAS,
SATA..) for sysPDs. For Virtual disks(VDs), there will be no change in
queue depth (will remain 256). To fetch interface type (SAS or SATA or
FC..) of syspD, driver will send DCMD MR_DCMD_PD_GET_INFO.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 86596b4b69767c628763c892b39c58e514890ac1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Task management support

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch adds task management for SCSI commands. Added functions are
task abort and target reset.

1. Currently, megaraid_sas driver performs controller reset when any IO
times out. With task management support added, task abort and target
reset will be tried to recover timed out IO. If task management fails,
then controller reset will be performaned. If the task management
request times out, fail the request and escalate to the next
level (controller reset).

2. mr_device_priv_data will be allocated for all generations of
controller, but is_tm_capable flag will never be set for
controllers (prior to Invader series) as firmware support is not
available for task management.

3. Task management capable firmware will set is_tm_capable flag in
firmware API.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 31796fa184ee12f290da92f695845e3c780ff11d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Syncing request flags macro names with firmware

BugLink: http://bugs.launchpad.net/bugs/1544679
Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit 4331dc0b8a9071e02404f43bd78cfce38eacf8cf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: MFI IO timeout handling

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will do proper error handling for DCMD timeout failure cases
for Fusion adapters:

1. For MFI adapters, in case of DCMD timeout (DCMD which must return
SUCCESS) driver will call kill adapter.

2. What action needs to be taken in case of DCMD timeout is decided by
function dcmd_timeout_ocr_possible(). DCMD timeout causing OCR is
applicable to the following commands:

MR_DCMD_PD_LIST_QUERY
MR_DCMD_LD_GET_LIST
MR_DCMD_LD_LIST_QUERY
MR_DCMD_CTRL_SET_CRASH_DUMP_PARAMS
MR_DCMD_SYSTEM_PD_MAP_GET_INFO
MR_DCMD_LD_MAP_GET_INFO

3. If DCMD fails from driver init path there are certain DCMDs which
must return SUCCESS. If those DCMDs fail, driver bails out. For optional
DCMDs like pd_info etc., driver continues without executing certain
functionality.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit da8939d675e3fdd9652500c37640423062497edc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

megaraid_sas: Do not allow PCI access during OCR

BugLink: http://bugs.launchpad.net/bugs/1544679
This patch will do synhronization between OCR function and AEN function
using "reset_mutex" lock. reset_mutex will be acquired only in the
first half of the AEN function which issues a DCMD. Second half of the
function which calls SCSI API (scsi_add_device/scsi_remove_device)
should be out of reset_mutex to avoid deadlock between scsi_eh thread
and driver.

During chip reset (inside OCR function), there should not be any PCI
access and AEN function (which is called in delayed context) may be
firing DCMDs (doing PCI writes) when chip reset is happening in parallel
which will cause FW fault. This patch will solve the problem by making
AEN thread and OCR thread mutually exclusive.

Signed-off-by: Sumit Saxena <sumit.saxena@avagotech.com>
Signed-off-by: Kashyap Desai <kashyap.desai@avagotech.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit e5bf0a869b770ff035cd64e5a8b66a3c6bdd9f3b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] Update bnx2x d-i firmware to 7.12.30

BugLink: http://bugs.launchpad.net/bugs/1544321
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: Ubuntu-4.4.0-6.21

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fuse: Add module parameter to enable user namespace mounts

This is still an experimental feature, so disable it by default
and allow it only when the system administrator supplies the
userns_mounts=true module parameter.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: ext4: Add module parameter to enable user namespace mounts

This is still an experimental feature, so disable it by default
and allow it only when the system administrator supplies the
userns_mounts=true module parameter.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: ext4: Add support for unprivileged mounts from user namespaces

Support unprivileged mounting of ext4 volumes from user
namespaces. This requires the following changes:

- Perform all uid and gid conversions to/from disk relative to
   s_user_ns. In many cases this will already be handled by the
   vfs helper functions. This also requires updates to handle
   cases where ids may not map into s_user_ns.

- Update most capability checks to check for capabilities in
   s_user_ns rather than init_user_ns. These mostly reflect
   changes to the filesystem that a user in s_user_ns could
   already make externally by virtue of having write access to
   the backing device.

- Restrict unsafe options in either the mount options or the
   ext4 superblock. Currently the only concerning option is
   errors=panic, and this is made to require CAP_SYS_ADMIN in
   init_user_ns.

- Verify that unprivileged users have the required access to the
   journal device at the path passed via the journal_path mount
   option.

   Note that for the journal_path and the journal_dev mount
   options, and for external journal devices specified in the
   ext4 superblock, devcgroup restrictions will be enforced by
   __blkdev_get(), (via blkdev_get_by_dev()), ensuring that the
   user has been granted appropriate access to the block device.

- Set the FS_USERNS_MOUNT flag on the filesystem types supported
   by ext4.

sysfs attributes for ext4 mounts remain writable only by real
root.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: block_dev: Forbid unprivileged mounting when device is opened for writing

For unprivileged mounts to be safe the user must not be able to
make changes to the backing store while it is mounted. This patch
takes a step towards preventing this by refusing to mount in a
user namepspace if the block device is open for writing and
refusing attempts to open the block device for writing by non-
root while it is mounted in a user namespace.

To prevent this from happening we use i_writecount in the inodes
of the bdev filesystem similarly to how it is used for regular
files. Whenever the device is opened for writing i_writecount
is checked; if it is negative the open returns -EBUSY, otherwise
i_writecount is incremented. On mount, a positive i_writecount
results in mount_bdev returning -EBUSY, otherwise i_writecount
is decremented. Opens by root and mounts from init_user_ns do not
check nor modify i_writecount.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: ima/evm: Allow root in s_user_ns to set xattrs

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Allow CAP_SYS_ADMIN in s_user_ns to freeze and thaw filesystems

The user in control of a super block should be allowed to freeze
and thaw it. Relax the restrictions on the FIFREEZE and FITHAW
ioctls to require CAP_SYS_ADMIN in s_user_ns.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: evm: Translate user/group ids relative to s_user_ns when computing HMAC

The EVM HMAC should be calculated using the on disk user and
group ids, so the k[ug]ids in the inode must be translated
relative to the s_user_ns of the inode's super block.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: quota: Convert ids relative to s_user_ns

When dealing with mounts from user namespaces quota user ids must
be translated relative to s_user_ns, especially when they will be
stored on disk. For purposes of the in-kernel hash table using
init_user_ns is still okay and may help reduce hash collisions,
so continue using init_user_ns there.

These changes introduce the possibility that the conversion
operations in struct qtree_fmt_operations could fail if an id
cannot be legitimately converted. Change these operations to
return int rather than void, and update the callers to handle
failures.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns

For filesystems mounted from a user namespace on-disk ids should
be translated relative to s_users_ns rather than init_user_ns.

When an id in the filesystem doesn't exist in s_user_ns the
associated id in the inode will be set to INVALID_[UG]ID, which
turns these into de facto "nobody" ids. This actually maps pretty
well into the way most code already works, and those places where
it didn't were fixed in previous patches. Moving forward vfs code
needs to be careful to handle instances where ids in inodes may
be invalid.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: mtd: Check permissions towards mtd block device inode when mounting

Unprivileged users should not be able to mount mtd block devices
when they lack sufficient privileges towards the block device
inode. Update mount_mtd() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fuse: Allow user namespace mounts

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fuse: Restrict allow_other to the superblock's namespace or a descendant

Unprivileged users are normally restricted from mounting with the
allow_other option by system policy, but this could be bypassed
for a mount done with user namespace root permissions. In such
cases allow_other should not allow users outside the userns
to access the mount as doing so would give the unprivileged user
the ability to manipulate processes it would otherwise be unable
to manipulate. Restrict allow_other to apply to users in the same
userns used at mount or a descendant of that namespace.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fuse: Support fuse filesystems outside of init_user_ns

In order to support mounts from namespaces other than
init_user_ns, fuse must translate uids and gids to/from the
userns of the process servicing requests on /dev/fuse. This
patch does that, with a couple of restrictions on the namespace:

- The userns for the fuse connection is fixed to the namespace
from which /dev/fuse is opened.

- The namespace must be the same as s_user_ns.

These restrictions simplify the implementation by avoiding the
need to pass around userns references and by allowing fuse to
rely on the checks in inode_change_ok for ownership changes.
Either restriction could be relaxed in the future if needed.

For cuse the namespace used for the connection is also simply
current_user_ns() at the time /dev/cuse is opened.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fuse: Add support for pid namespaces

If the userspace process servicing fuse requests is running in
a pid namespace then pids passed via the fuse fd need to be
translated relative to that namespace. Capture the pid namespace
in use when the filesystem is mounted and use this for pid
translation.

Since no use case currently exists for changing namespaces all
translations are done relative to the pid namespace in use when
/dev/fuse is opened. Mounting or /dev/fuse IO from another
namespace will return errors.

Requests from processes whose pid cannot be translated into the
target namespace are not permitted, except for requests
allocated via fuse_get_req_nofail_nopages. For no-fail requests
in.h.pid will be 0 if the pid translation fails.

File locking changes based on previous work done by Eric
Biederman.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: capabilities: Allow privileged user in s_user_ns to set security.* xattrs

A privileged user in s_user_ns will generally have the ability to
manipulate the backing store and insert security.* xattrs into
the filesystem directly. Therefore the kernel must be prepared to
handle these xattrs from unprivileged mounts, and it makes little
sense for commoncap to prevent writing these xattrs to the
filesystem. The capability and LSM code have already been updated
to appropriately handle xattrs from unprivileged mounts, so it
is safe to loosen this restriction on setting xattrs.

The exception to this logic is that writing xattrs to a mounted
filesystem may also cause the LSM inode_post_setxattr or
inode_setsecurity callbacks to be invoked. SELinux will deny the
xattr update by virtue of applying mountpoint labeling to
unprivileged userns mounts, and Smack will deny the writes for
any user without global CAP_MAC_ADMIN, so loosening the
capability check in commoncap is safe in this respect as well.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Allow superblock owner to access do_remount_sb()

Superblock level remounts are currently restricted to global
CAP_SYS_ADMIN, as is the path for changing the root mount to
read only on umount. Loosen both of these permission checks to
also allow CAP_SYS_ADMIN in any namespace which is privileged
towards the userns which originally mounted the filesystem.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Don't remove suid for CAP_FSETID in s_user_ns

Expand the check in should_remove_suid() to keep privileges for
CAP_FSETID in s_user_ns rather than init_user_ns.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Update posix_acl support to handle user namespace mounts

ids in on-disk ACLs should be converted to s_user_ns instead of
init_user_ns as is done now. This introduces the possibility for
id mappings to fail, and when this happens syscalls will return
EOVERFLOW.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Refuse uid/gid changes which don't map into s_user_ns

Add checks to inode_change_ok to verify that uid and gid changes
will map into the superblock's user namespace. If they do not
fail with -EOVERFLOW. This cannot be overriden with ATTR_FORCE.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: cred: Reject inodes with invalid ids in set_create_file_as()

Using INVALID_[UG]ID for the LSM file creation context doesn't
make sense, so return an error if the inode passed to
set_create_file_as() has an invalid id.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Check for invalid i_uid in may_follow_link()

Filesystem uids which don't map into a user namespace may result
in inode->i_uid being INVALID_UID. A symlink and its parent
could have different owners in the filesystem can both get
mapped to INVALID_UID, which may result in following a symlink
when this would not have otherwise been permitted when protected
symlinks are enabled.

Add a new helper function, uid_valid_eq(), and use this to
validate that the ids in may_follow_link() are both equal and
valid. Also add an equivalent helper for gids, which is
currently unused.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: Smack: Handle labels consistently in untrusted mounts

The SMACK64, SMACK64EXEC, and SMACK64MMAP labels are all handled
differently in untrusted mounts. This is confusing and
potentically problematic. Change this to handle them all the same
way that SMACK64 is currently handled; that is, read the label
from disk and check it at use time. For SMACK64 and SMACK64MMAP
access is denied if the label does not match smk_root. To be
consistent with suid, a SMACK64EXEC label which does not match
smk_root will still allow execution of the file but will not run
with the label supplied in the xattr.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: userns: Replace in_userns with current_in_userns

All current callers of in_userns pass current_user_ns as the
first argument. Simplify by replacing in_userns with
current_in_userns which checks whether current_user_ns is in the
namespace supplied as an argument.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: selinux: Add support for unprivileged mounts from user namespaces

Security labels from unprivileged mounts in user namespaces must
be ignored. Force superblocks from user namespaces whose labeling
behavior is to use xattrs to use mountpoint labeling instead.
For the mountpoint label, default to converting the current task
context into a form suitable for file objects, but also allow the
policy writer to specify a different label through policy
transition rules.

Pieced together from code snippets provided by Stephen Smalley.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Stephen Smalley <sds@tycho.nsa.gov>
Acked-by: James Morris <james.l.morris@oracle.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: fs: Treat foreign mounts as nosuid

If a process gets access to a mount from a different user
namespace, that process should not be able to take advantage of
setuid files or selinux entrypoints from that filesystem.  Prevent
this by treating mounts from other mount namespaces and those not
owned by current_user_ns() or an ancestor as nosuid.

This will make it safer to allow more complex filesystems to be
mounted in non-root user namespaces.

This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
setgid, and file capability bits can no longer be abused if code in
a user namespace were to clear nosuid on an untrusted filesystem,
but this patch, by itself, is insufficient to protect the system
from abuse of files that, when execed, would increase MAC privilege.

As a more concrete explanation, any task that can manipulate a
vfsmount associated with a given user namespace already has
capabilities in that namespace and all of its descendents.  If they
can cause a malicious setuid, setgid, or file-caps executable to
appear in that mount, then that executable will only allow them to
elevate privileges in exactly the set of namespaces in which they
are already privileges.

On the other hand, if they can cause a malicious executable to
appear with a dangerous MAC label, running it could change the
caller's security context in a way that should not have been
possible, even inside the namespace in which the task is confined.

As a hardening measure, this would have made CVE-2014-5207 much
more difficult to exploit.

Signed-off-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: James Morris <james.l.morris@oracle.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: block_dev: Check permissions towards block device inode when mounting

Unprivileged users should not be able to mount block devices when
they lack sufficient privileges towards the block device inode.
Update blkdev_get_by_path() to validate that the user has the
required access to the inode at the specified path. The check
will be skipped for CAP_SYS_ADMIN, so privileged mounts will
continue working as before.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>