git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log

KVM: arm64: Save/Restore guest DISR_EL1

BugLink: http://bugs.launchpad.net/bugs/1756096
If we deliver a virtual SError to the guest, the guest may defer it
with an ESB instruction. The guest reads the deferred value via DISR_EL1,
but the guests view of DISR_EL1 is re-mapped to VDISR_EL2 when HCR_EL2.AMO
is set.

Add the KVM code to save/restore VDISR_EL2, and make it accessible to
userspace as DISR_EL1.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit c773ae2b34760a1ae409614aa31cdded81a645a5)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm64: Set an impdef ESR for Virtual-SError using VSESR_EL2.

BugLink: http://bugs.launchpad.net/bugs/1756096
Prior to v8.2's RAS Extensions, the HCR_EL2.VSE 'virtual SError' feature
generated an SError with an implementation defined ESR_EL1.ISS, because we
had no mechanism to specify the ESR value.

On Juno this generates an all-zero ESR, the most significant bit 'ISV'
is clear indicating the remainder of the ISS field is invalid.

With the RAS Extensions we have a mechanism to specify this value, and the
most significant bit has a new meaning: 'IDS - Implementation Defined
Syndrome'. An all-zero SError ESR now means: 'RAS error: Uncategorized'
instead of 'no valid ISS'.

Add KVM support for the VSESR_EL2 register to specify an ESR value when
HCR_EL2.VSE generates a virtual SError. Change kvm_inject_vabt() to
specify an implementation-defined value.

We only need to restore the VSESR_EL2 value when HCR_EL2.VSE is set, KVM
save/restores this bit during __{,de}activate_traps() and hardware clears the
bit once the guest has consumed the virtual-SError.

Future patches may add an API (or KVM CAP) to pend a virtual SError with
a specified ESR.

Cc: Dongjiu Geng <gengdongjiu@huawei.com>
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 4715c14bc136687bb79d12e24aafdc0f38786eb7)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm/arm64: mask/unmask daif around VHE guests

BugLink: http://bugs.launchpad.net/bugs/1756096
Non-VHE systems take an exception to EL2 in order to world-switch into the
guest. When returning from the guest KVM implicitly restores the DAIF
flags when it returns to the kernel at EL1.

With VHE none of this exception-level jumping happens, so KVMs
world-switch code is exposed to the host kernel's DAIF values, and KVM
spills the guest-exit DAIF values back into the host kernel.
On entry to a guest we have Debug and SError exceptions unmasked, KVM
has switched VBAR but isn't prepared to handle these. On guest exit
Debug exceptions are left disabled once we return to the host and will
stay this way until we enter user space.

Add a helper to mask/unmask DAIF around VHE guests. The unmask can only
happen after the hosts VBAR value has been synchronised by the isb in
__vhe_hyp_call (via kvm_call_hyp()). Masking could be as late as
setting KVMs VBAR value, but is kept here for symmetry.

Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(backported from commit 4f5abad9e826bd579b0661efa32682d9c9bc3fa8)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: kernel: Prepare for a DISR user

BugLink: http://bugs.launchpad.net/bugs/1756096
KVM would like to consume any pending SError (or RAS error) after guest
exit. Today it has to unmask SError and use dsb+isb to synchronise the
CPU. With the RAS extensions we can use ESB to synchronise any pending
SError.

Add the necessary macros to allow DISR to be read and converted to an
ESR.

We clear the DISR register when we enable the RAS cpufeature, and the
kernel has not executed any ESB instructions. Any value we find in DISR
must have belonged to firmware. Executing an ESB instruction is the
only way to update DISR, so we can expect firmware to have handled
any deferred SError. By the same logic we clear DISR in the idle path.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(backported from commit 68ddbf09ec5a888ec850edd7e7438d2daf069c56)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: Unconditionally enable IESB on exception entry/return for firmware-first

BugLink: http://bugs.launchpad.net/bugs/1756096
ARM v8.2 has a feature to add implicit error synchronization barriers
whenever the CPU enters or returns from an exception level. Add this to the
features we always enable. CPUs that don't support this feature will treat
the bit as RES0.

This feature causes RAS errors that are not yet visible to software to
become pending SErrors. We expect to have firmware-first RAS support
so synchronised RAS errors will be take immediately to EL3.
Any system without firmware-first handling of errors will take the SError
either immediatly after exception return, or when we unmask SError after
entry.S's work.

Adding IESB to the ELx flags causes it to be enabled by KVM and kexec
too.

Platform level RAS support may require additional firmware support.

Cc: Christoffer Dall <christoffer.dall@linaro.org>
Suggested-by: Will Deacon <will.deacon@arm.com>
Link: https://www.spinics.net/lists/kvm-arm/msg28192.html
Acked-by: Marc Zyngier <marc.zyngier@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f751daa4f9d3da07e2777ea0c1ba2d58ff2c860f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: kernel: Survive corrected RAS errors notified by SError

BugLink: http://bugs.launchpad.net/bugs/1756096
Prior to v8.2, SError is an uncontainable fatal exception. The v8.2 RAS
extensions use SError to notify software about RAS errors, these can be
contained by the Error Syncronization Barrier.

An ACPI system with firmware-first may use SError as its 'SEI'
notification. Future patches may add code to 'claim' this SError as a
notification.

Other systems can distinguish these RAS errors from the SError ESR and
use the AET bits and additional data from RAS-Error registers to handle
the error. Future patches may add this kernel-first handling.

Without support for either of these we will panic(), even if we received
a corrected error. Add code to decode the severity of RAS errors. We can
safely ignore contained errors where the CPU can continue to make
progress. For all other errors we continue to panic().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 6bf0dcfd713563bd2e13ceb53217305c28a8aa5f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: cpufeature: Detect CPU RAS Extentions

BugLink: http://bugs.launchpad.net/bugs/1756096
ARM's v8.2 Extentions add support for Reliability, Availability and
Serviceability (RAS). On CPUs with these extensions system software
can use additional barriers to isolate errors and determine if faults
are pending. Add cpufeature detection.

Platform level RAS support may require additional firmware support.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
Signed-off-by: Xie XiuQi <xiexiuqi@huawei.com>
[Rebased added config option, reworded commit message]
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(backported from commit 64c02720ea3598bf5143b672274d923a941b8053)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: sysreg: Move to use definitions for all the SCTLR bits

BugLink: http://bugs.launchpad.net/bugs/1756096
__cpu_setup() configures SCTLR_EL1 using some hard coded hex masks,
and el2_setup() duplicates some this when setting RES1 bits.

Lets make this the same as KVM's hyp_init, which uses named bits.

First, we add definitions for all the SCTLR_EL{1,2} bits, the RES{1,0}
bits, and those we want to set or clear.

Add a build_bug checks to ensures all bits are either set or clear.
This means we don't need to preserve endian-ness configuration
generated elsewhere.

Finally, move the head.S and proc.S users of these hard-coded masks
over to the macro versions.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 7a00d68ebe5f07cb1db17e7fedfd031f0d87e8bb)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

firmware: arm_sdei: Discover SDEI support via ACPI

BugLink: http://bugs.launchpad.net/bugs/1756096
SDEI defines a new ACPI table to indicate the presence of the interface.
The conduit is discovered in the same way as PSCI.

For ACPI we need to create the platform device ourselves as SDEI doesn't
have an entry in the DSDT.

The SDEI platform device should be created after ACPI has been initialised
so that we can parse the table, but before GHES devices are created, which
may register SDE events if they use SDEI as their notification type.

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 677a60bd2003ff5517a0b502365112531446a3e3)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: acpi: Remove __init from acpi_psci_use_hvc() for use by SDEI

BugLink: http://bugs.launchpad.net/bugs/1756096
SDEI inherits the 'use hvc' bit that is also used by PSCI. PSCI does all
its initialisation early, SDEI does its late.

Remove the __init annotation from acpi_psci_use_hvc().

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit fa31ab77ced9fbab87fbac4fca3682009b7f9be2)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

firmware: arm_sdei: add support for CPU private events

BugLink: http://bugs.launchpad.net/bugs/1756096
Private SDE events are per-cpu, and need to be registered and enabled
on each CPU.

Hide this detail from the caller by adapting our {,un}register and
{en,dis}able calls to send an IPI to each CPU if the event is private.

CPU private events are unregistered when the CPU is powered-off, and
re-registered when the CPU is brought back online. This saves bringing
secondary cores back online to call private_reset() on shutdown, kexec
and resume from hibernate.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f92b5462a2f22d13a75dc663f7b2fac16a3e61cb)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

firmware: arm_sdei: Add support for CPU and system power states

BugLink: http://bugs.launchpad.net/bugs/1756096
When a CPU enters an idle lower-power state or is powering off, we
need to mask SDE events so that no events can be delivered while we
are messing with the MMU as the registered entry points won't be valid.

If the system reboots, we want to unregister all events and mask the CPUs.
For kexec this allows us to hand a clean slate to the next kernel
instead of relying on it to call sdei_{private,system}_data_reset().

For hibernate we unregister all events and re-register them on restore,
in case we restored with the SDE code loaded at a different address.
(e.g. KASLR).

Add all the notifiers necessary to do this. We only support shared events
so all events are left registered and enabled over CPU hotplug.

Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: James Morse <james.morse@arm.com>
[catalin.marinas@arm.com: added CPU_PM_ENTER_FAILED case]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit da351827240e1705cca64bb8ae526f0ce1068048)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: kernel: Add arch-specific SDEI entry code and CPU masking

BugLink: http://bugs.launchpad.net/bugs/1756096
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications.

Such notifications enter the kernel at the registered entry-point
with the register values of the interrupted CPU context. Because this
is not a CPU exception, it cannot reuse the existing entry code.
(crucially we don't implicitly know which exception level we interrupted),

Add the entry point to entry.S to set us up for calling into C code. If
the event interrupted code that had interrupts masked, we always return
to that location. Otherwise we pretend this was an IRQ, and use SDEI's
complete_and_resume call to return to vbar_el1 + offset.

This allows the kernel to deliver signals to user space processes. For
KVM this triggers the world switch, a quick spin round vcpu_run, then
back into the guest, unless there are pending signals.

Add sdei_mask_local_cpu() calls to the smp_send_stop() code, this covers
the panic() code-path, which doesn't invoke cpuhotplug notifiers.

Because we can interrupt entry-from/exit-to another EL, we can't trust the
value in sp_el0 or x29, even if we interrupted the kernel, in this case
the code in entry.S will save/restore sp_el0 and use the value in
__entry_task.

When we have VMAP stacks we can interrupt the stack-overflow test, which
stirs x0 into sp, meaning we have to have our own VMAP stacks. For now
these are allocated when we probe the interface. Future patches will add
refcounting hooks to allow the arch code to allocate them lazily.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit f5df26961853d6809d704cedcaf082c57f635a64)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: uaccess: Add PAN helper

BugLink: http://bugs.launchpad.net/bugs/1756096
Add __uaccess_{en,dis}able_hw_pan() helpers to set/clear the PSTATE.PAN
bit.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit e1281f56f114f3a945bf9ec30698bd3caa59d322)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: Add vmap_stack header file

BugLink: http://bugs.launchpad.net/bugs/1756096
Today the arm64 arch code allocates an extra IRQ stack per-cpu. If we
also have SDEI and VMAP stacks we need two extra per-cpu VMAP stacks.

Move the VMAP stack allocation out to a helper in a new header file.
This avoids missing THREADINFO_GFP, or getting the all-important alignment
wrong.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Reviewed-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit ed8b20d457d72e9e2a30533b436fdb4ea1c70b38)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

firmware: arm_sdei: Add driver for Software Delegated Exceptions

BugLink: http://bugs.launchpad.net/bugs/1756096
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement firmware notifications (such as
firmware-first RAS) or promote an IRQ that has been promoted to a
firmware-assisted NMI.

Add the code for detecting the SDEI version and the framework for
registering and unregistering events. Subsequent patches will add the
arch-specific backend code and the necessary power management hooks.

Only shared events are supported, power management, private events and
discovery for ACPI systems will be added by later patches.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit ad6eb31ef90355993eb55ff77e0e855ae7d91e4c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

Docs: dt: add devicetree binding for describing arm64 SDEI firmware

BugLink: http://bugs.launchpad.net/bugs/1756096
The Software Delegated Exception Interface (SDEI) is an ARM standard
for registering callbacks from the platform firmware into the OS.
This is typically used to implement RAS notifications, or from an
IRQ that has been promoted to a firmware-assisted NMI.

Add a new devicetree binding to describe the SDE firmware interface.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 86f04f640058143388f56c048f86e66ea5204ae2)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm64: Stop save/restoring host tpidr_el1 on VHE

BugLink: http://bugs.launchpad.net/bugs/1756096
Now that a VHE host uses tpidr_el2 for the cpu offset we no longer
need KVM to save/restore tpidr_el1. Move this from the 'common' code
into the non-vhe code. While we're at it, on VHE we don't need to
save the ELR or SPSR as kernel_entry in entry.S will have pushed these
onto the kernel stack, and will restore them from there. Move these
to the non-vhe code as we need them to get back to the host.

Finally remove the always-copy-tpidr we hid in the stage2 setup
code, cpufeature's enable callback will do this for VHE, we only
need KVM to do it for non-vhe. Add the copy into kvm-init instead.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 1f742679c33bc083722cb0b442a95d458c491b56)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

arm64: alternatives: use tpidr_el2 on VHE hosts

BugLink: http://bugs.launchpad.net/bugs/1756096
Now that KVM uses tpidr_el2 in the same way as Linux's cpu_offset in
tpidr_el1, merge the two. This saves KVM from save/restoring tpidr_el1
on VHE hosts, and allows future code to blindly access per-cpu variables
without triggering world-switch.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 6d99b68933fbcf51f84fcbba49246ce1209ec193)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm64: Change hyp_panic()s dependency on tpidr_el2

BugLink: http://bugs.launchpad.net/bugs/1756096
Make tpidr_el2 a cpu-offset for per-cpu variables in the same way the
host uses tpidr_el1. This lets tpidr_el{1,2} have the same value, and
on VHE they can be the same register.

KVM calls hyp_panic() when anything unexpected happens. This may occur
while a guest owns the EL1 registers. KVM stashes the vcpu pointer in
tpidr_el2, which it uses to find the host context in order to restore
the host EL1 registers before parachuting into the host's panic().

The host context is a struct kvm_cpu_context allocated in the per-cpu
area, and mapped to hyp. Given the per-cpu offset for this CPU, this is
easy to find. Change hyp_panic() to take a pointer to the
struct kvm_cpu_context. Wrap these calls with an asm function that
retrieves the struct kvm_cpu_context from the host's per-cpu area.

Copy the per-cpu offset from the hosts tpidr_el1 into tpidr_el2 during
kvm init. (Later patches will make this unnecessary for VHE hosts)

We print out the vcpu pointer as part of the panic message. Add a back
reference to the 'running vcpu' in the host cpu context to preserve this.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit c97e166e54b662717d20ec2e36761758d2b6a7c2)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm/arm64: Convert kvm_host_cpu_state to a static per-cpu allocation

BugLink: http://bugs.launchpad.net/bugs/1756096
kvm_host_cpu_state is a per-cpu allocation made from kvm_arch_init()
used to store the host EL1 registers when KVM switches to a guest.

Make it easier for ASM to generate pointers into this per-cpu memory
by making it a static allocation.

Signed-off-by: James Morse <james.morse@arm.com>
Acked-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 36989e7fd386a9a5822c48691473863f8fbb404d)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

KVM: arm64: Store vcpu on the stack during __guest_enter()

BugLink: http://bugs.launchpad.net/bugs/1756096
KVM uses tpidr_el2 as its private vcpu register, which makes sense for
non-vhe world switch as only KVM can access this register. This means
vhe Linux has to use tpidr_el1, which KVM has to save/restore as part
of the host context.

If the SDEI handler code runs behind KVMs back, it mustn't access any
per-cpu variables. To allow this on systems with vhe we need to make
the host use tpidr_el2, saving KVM from save/restoring it.

__guest_enter() stores the host_ctxt on the stack, do the same with
the vcpu.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Christoffer Dall <cdall@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
(cherry picked from commit 32b03d1059667a39e089c45ee38ec9c16332430f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/DPC: Enable DPC only if AER is available

BugLink: http://bugs.launchpad.net/bugs/1756094
The "Determination of DPC Control" implementation note in PCIe r4.0, sec
6.1.10, recommends the operating system always link DPC control to the
control of AER, as the two functionalities are strongly connected.

To avoid conflicts over whether platform firmware or the OS controls DPC,
enable DPC only if AER is enabled in the OS, and the device's error
handling does not have firmware-first AER handling.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
(cherry picked from commit eed85ff4c0da72640dcf7c0737c5a08bca2958e7)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/AER: Return error if AER is not supported

BugLink: http://bugs.launchpad.net/bugs/1756094
get_device_error_info() reads error information from registers in the AER
capability. If we call it for a device that has no AER capability, it
should return an error, but previously it returned success.

Return 0 (error) if the device doesn't have an AER capability.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
(cherry picked from commit 0f6f1d9fca4ad91ce9b30dc0aa847b0947786261)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports

BugLink: http://bugs.launchpad.net/bugs/1756094
PCIe Downstream Ports normally have only a Device 0 below them.  To
optimize enumeration, we don't scan for other devices *unless* the
PCI_SCAN_ALL_PCIE_DEVS flag is set by set by quirks or the
"pci=pcie_scan_all" kernel parameter.

Previously PCI_SCAN_ALL_PCIE_DEVS only affected scanning below Switch
Downstream Ports, not Root Ports.

But the "Nemo" system, also known as the AmigaOne X1000, has a PA Semi Root
Port whose link leads to an AMD/ATI SB600 South Bridge.  The Root Port is a
PCIe device, of course, but the SB600 contains only conventional PCI
devices with no visible PCIe port.

Simplify and restructure only_one_child() so that we scan for all possible
devices below Root Ports as well as Switch Downstream Ports when
PCI_SCAN_ALL_PCIE_DEVS is set.

This is enough to make Nemo work with "pci=pcie_scan_all".  We would also
like to add a quirk to set PCI_SCAN_ALL_PCIE_DEVS automatically on Nemo so
users wouldn't have to use the "pci=pcie_scan_all" parameter, but we don't
have that yet.

Link: https://lkml.kernel.org/r/CAErSpo55Q8Q=5p6_+uu7ahnw+53ibVDNRXxrzRV9QnUr_9EUfw@mail.gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198057
Reported-and-Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit d57f0b8c81393e7105331ac037fa465d5a45c65f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/ASPM: Unexport internal ASPM interfaces

BugLink: http://bugs.launchpad.net/bugs/1756094
Several of the interfaces defined in include/linux/pci-aspm.h are used only
internally from the PCI core:

  pcie_aspm_init_link_state()
  pcie_aspm_exit_link_state()
  pcie_aspm_pm_state_change()
  pcie_aspm_powersave_config_link()
  pcie_aspm_create_sysfs_dev_files()
  pcie_aspm_remove_sysfs_dev_files()

Move these to the internal drivers/pci/pci.h header so they don't clutter
the driver interface.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit 7d8e7d19b095ae70b1ca483ca36e7985a108abe5)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/ASPM: Enable Latency Tolerance Reporting when supported

BugLink: http://bugs.launchpad.net/bugs/1756094
Enable Latency Tolerance Reporting (LTR). Note that LTR must be enabled in
the Root Port first, and must not be enabled in any downstream device
unless the Root Port and all intermediate Switches also support LTR.
See PCIe r3.1, sec 6.18.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
(cherry picked from commit c46fd358070f22ba68d6e74c22016a33b914c20a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics

BugLink: http://bugs.launchpad.net/bugs/1756094
Per PCIe r3.1, sec 5.5.1, LTR_L1.2_THRESHOLD determines whether we enter
the L1.2 Link state: if L1.2 is enabled and downstream devices have
reported that they can tolerate latency of at least LTR_L1.2_THRESHOLD, we
must enter L1.2 when CLKREQ# is de-asserted.

The implication is that LTR_L1.2_THRESHOLD is the time required to
transition the Link from L0 to L1.2 and back to L0, and per sec 5.5.3.3.1,
Figures 5-16 and 5-17, it appears that the absolute minimum time for those
transitions would be T(POWER_OFF) + T(L1.2) + T(POWER_ON) + T(COMMONMODE).

Therefore, compute LTR_L1.2_THRESHOLD as:

    2us T(POWER_OFF)
  + 4us T(L1.2)
  + T(POWER_ON)
  + T(COMMONMODE)
  = LTR_L1.2_THRESHOLD

Previously we set LTR_L1.2_THRESHOLD to a fixed value of 163840ns
(163.84us):

  #define LTR_L1_2_THRESHOLD_BITS     ((1 << 21) | (1 << 23) | (1 << 30))
  ((1 << 21) | (1 << 23) | (1 << 30)) = 0x40a00000
  LTR_L1.2_THRESHOLD_Value = (0x40a00000 & 0x03ff0000) >> 16 = 0xa0 = 160
  LTR_L1.2_THRESHOLD_Scale = (0x40a00000 & 0xe0000000) >> 29 = 0x2 (* 1024ns)
  LTR_L1.2_THRESHOLD = 160 * 1024ns = 163840ns

Obviously this doesn't account for the circuit characteristics of different
implementations.

Note that while firmware may enable LTR, Linux itself currently does not
enable LTR.  When L1.2 is enabled but LTR is not, LTR_L1.2_THRESHOLD is
ignored and we always enter L1.2 when it is enabled and CLKREQ# is
de-asserted.  So this patch should not have any effect unless firmware
enables LTR.

Fixes: f1f0366dd6be ("PCI/ASPM: Calculate and save the L1.2 timing parameters")
Link: https://www.coreboot.org/pipermail/coreboot-gerrit/2015-March/021134.html
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Cc: Kenji Chen <kenji.chen@intel.com>
Cc: Patrick Georgi <pgeorgi@google.com>
Cc: Rajat Jain <rajatja@google.com>
(cherry picked from commit 80d7d7a904fac3f8114448dbb8cc9fa253b10120)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI

BugLink: http://bugs.launchpad.net/bugs/1756094
PCIe correctable errors are corrected by hardware. Software may log them,
but no other software intervention is required.

There are two paths to enter the AER recovery code: (1) the native path
where Linux fields the AER interrupt and reads the AER registers directly,
and (2) the ACPI path where firmware reads the AER registers and hands them
off to Linux via the ACPI APEI path.

The AER do_recovery() function calls driver error reporting callbacks
(error_detected(), mmio_enabled(), resume(), etc), attempts recovery (for
fatal errors), and logs a "AER: Device recovery successful" message.

Since there's nothing to recover for correctable errors, the native path
already skips do_recovery(), so it doesn't call the driver callbacks and or
emit the message. Make the APEI path do the same.

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit b9f80fdc4244b417154ec30d3bc7ec3e76085634)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PCI / PM: Support for LEAVE_SUSPENDED driver flag

BugLink: http://bugs.launchpad.net/bugs/1756094
Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit bd755d770ac78e8eeda05877ba66cc66f151e10e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

PM / core: Add LEAVE_SUSPENDED driver flag

BugLink: http://bugs.launchpad.net/bugs/1756094
Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit. That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanisn is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
(backported from commit 0d4b54c6fee87ff60b0bc1007ca487449698468d)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to userspace

BugLink: http://bugs.launchpad.net/bugs/1756094
Export device table of v3 hw to userspace, or auto probe will fail for
v3 hw.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES

BugLink: http://bugs.launchpad.net/bugs/1756094
Do some modifications for configuring for hip08 ES

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()

BugLink: http://bugs.launchpad.net/bugs/1756094
When device gone, NULL pointer can be accessed in free_device callback
if during SAS controller reset as we clear structure sas_dev prior.

Actually we can only set dev_type as SAS_PHY_UNUSED and not clear
structure sas_dev as all the members of structure sas_dev will be
re-initialized after device found.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0d762b3af2a5b5095fec18aa4d61f408638aa9ca)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: make local symbol host_attrs static

BugLink: http://bugs.launchpad.net/bugs/1756094
Fixes the following sparse warning:

drivers/scsi/hisi_sas/hisi_sas_main.c:1691:25: warning:
symbol 'host_attrs' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1e15feacb9d3743ca0b314a6daf8cc59c90b1046)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: Change frame type for SET MAX commands

BugLink: http://bugs.launchpad.net/bugs/1756094
According to ATA protocol, SET MAX commands belong to different frame
types. So judge features field of SET MAX commands to decide which
frame type they belongs to.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 468f4b8d0711146f0075513e6047079a26fc3903)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add v3 hw suspend and resume

BugLink: http://bugs.launchpad.net/bugs/1756094
For v3 hw SAS, it supports configuring power state from D0 to D3 for entering
Low Power status and power state from D3 to D0 for quit Low Power status.

When power state from D0 to D3, HW will send FLR to clear the registers of
ECAM and BAR space, and when power state from D3 to D0, it will clear the
registers of ECAM space only.

So when suspend, need to do like controller reset (including disable
interrupts/DQ/PHY/BUS), and also release slots after FLR. When resume,
re-config the registers of BAR space.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4d0951ee70d348b694ce2bbdcc65b684239da4b4)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: re-add the lldd_port_deformed()

BugLink: http://bugs.launchpad.net/bugs/1756094
In function sas_suspend_devices(), it requires callback lldd_port_deformed
callback to be implemented if lldd_port_deformed is implemented.

So add a stub for lldd_port_deformed.

Callback lldd_port_deformed was not required as the port deformation is done
elsewhere in the LLDD.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(backported from commit 336bd78bdabf39dbcee6b41f9628c6e51d1c25b0)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO

BugLink: http://bugs.launchpad.net/bugs/1756094
This patch fix SAS_QUEUE_FULL problem. The test situation is close port while
running IO.

In sas_eh_handle_sas_errors(), SCSI EH will free sas_task of the device if
lldd_I_T_nexus_reset() return TMF_RESP_FUNC_COMPLETE or -ENODEV. But in our
SAS driver, we only free slots of the device when the return value is
TMF_RESP_FUNC_COMPLETE. So if the return value is -ENODEV, the slot resource
will not free any more.

As an solution, we should also free slots of the device in
lldd_I_T_nexus_reset() if the return value is -ENODEV.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9960a24a1c96a40d6ab984ffefdd0e3003a3377e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add internal abort dev in some places

BugLink: http://bugs.launchpad.net/bugs/1756094
We should do internal abort dev before TMF_ABORT_TASK_SET and TMF_LU_RESET.
Because we may only have done internal abort for single IO in the earlier part
of SCSI EH process. Even the internal abort to the single IO, we also don't
know whether it is successful.

Besides, we should release slots of the device in hisi_sas_abort_task_set() if
the abort is successful.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2a03813123c4beb0b60be6b3b65a6b30f7124579)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: judge result of internal abort

BugLink: http://bugs.launchpad.net/bugs/1756094
Normally, hardware should ensure that internal abort timeout will never
happen. If happen, it would be an SoC failure. What's more, HW will not
process any other commands if an internal abort hasn't return CQ, and they
will time out also.

So, we should judge the result of internal abort in SCSI EH, if it is failed,
we should give up to do TMF/softreset and return failure to the upper layer
directly.

This patch do following things to achieve this:

1. When internal abort timeout happened, we set return value to -EIO in
   hisi_sas_internal_task_abort().

2. If prep_abort() is not support, let hisi_sas_internal_task_abort() return
   TMF_RESP_FUNC_FAILED.

3. If hisi_sas_internal_task_abort() return an negative number, it can be
   thought that it not executed properly or internal abort timeout. Then we
   won't do behind TMF or softreset, and return failure directly.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 813709f2e1e07fa872c05f43801a05828d33a70a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: do link reset for some CHL_INT2 ints

BugLink: http://bugs.launchpad.net/bugs/1756094
We should do link reset of PHY when identify timeout or STP link timeout. They
are internal events of SOC and are notified to driver through interrupts of
CHL_INT2.

Besides, we should add an delay work to do link reset as it needs sleep. So,
this patch add an new PHY event HISI_PHYE_LINK_RESET for this.

Notes: v2 HW doesn't report the event of STP link timeout. So, we only need
to handle event of identify timeout for v2 HW.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 057c3d1f07617049671a41bf05652d20071eb639)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: use an general way to delay PHY work

BugLink: http://bugs.launchpad.net/bugs/1756094
Use an general way to do delay work for a PHY. Then it will be easier to add
new delayed work for a PHY in future.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e537b62b0796042e1ab66657c4dab662d19e9f0b)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add v2 hw port AXI error handling support

BugLink: http://bugs.launchpad.net/bugs/1756094
Add port AXI errors handling for v2 hw. We do host controller reset for such
errors.

Besides, change port muli-bits ECC error handling, and we should also do host
reset for such error. So, this patch put them in the same struct with port AXI
error.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 72f7fc3050d55e9877ecc56f33b7a434fca186f5)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: improve int_chnl_int_v2_hw() consistency with v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
Change code format of int_chnl_int_v2_hw() to be consistent with v3 hw to
reduce an tag indent.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f64715d2837bee8fcd71f3e13acc7f02c9e9d98a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add some print to enhance debugging

BugLink: http://bugs.launchpad.net/bugs/1756094
Add some print at some places such as error info and cq of exception IO,
device found etc, and also adjust some log levels.

All this to assist debugging ability.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f1c88211454ff8063b358f9ebe250f0fe429319c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add RAS feature for v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
We use PCIe AER to support RAS feature for v3 hw. This driver should do
following two things to support this:

1. Enable RAS interrupts, so that errors can be reported to RAS module.

2. Realize err_handler for sas_v3_pci_driver. Then if non-fatal error is
detected, print error source and try to recover SAS controller.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1aaf81e0e34988ff56b317b568f92fe6ca447da2)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: change ncq process for v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
For v3 hw, each NCQ will return a CQ, so it is no need to acquire IPTT from
ITCT, just acquire it from IPTT field of CQ.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9f347b2face51d782d1e03f2f05b7c3f93a6dc9a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: add an mechanism to do reset work synchronously

BugLink: http://bugs.launchpad.net/bugs/1756094
Sometimes it is required to know when the controller reset has completed and
also if it has completed successfully. For such places, we call
hisi_sas_controller_reset() directly before. That may lead to multiple calls
to this function.

This patch create a per-reset structure which contains a completion structure
and status flag to know when the reset completes and also the status. It is
also in hisi_hba.wq to do reset work.

As all host reset works are done in hisi_hba.wq, we don't worry multiple calls
to hisi_sas_controller_reset().

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e402acdb664134f948b62d13b7db866295689f38)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: modify hisi_sas_dev_gone() for reset

BugLink: http://bugs.launchpad.net/bugs/1756094
Do a couple of changes for when HISI_SAS_RESET_BIT is set for HBA:

- Clearing ITCT is not necessary

- Remove internal abort as it will fail during reset

Flag sas_dev->dev_type is kept as SAS_PHY_UNUSED.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f8e45ec226e2c00c1da9cf156ea59a159e9b4ea6)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: some optimizations of host controller reset

BugLink: http://bugs.launchpad.net/bugs/1756094
This patch do following optimizations to host controller reset:

1. Unblock scsi requests before rescanning topology, as SCSI command need be
   used if new device is found during rescanning topology.

2. Remove drain_workqueue(hisi_hba->wq) and drain_workqueue(shost->work_q), as
   there is no need to ensure that all PHYs event are done before exiting host
   reset.

3. Improve message print level of host reset. Host reset is an important and
   very few occurrence event. We should know its progress even when not
   debugging.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fb51e7a8d38484687337f16636c5be9528e00fed)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: optimise port id refresh function

BugLink: http://bugs.launchpad.net/bugs/1756094
Currently refreshing the PHY port id after reset is done in the rescan
topology function, which is quite late in the reset process. It could be moved
earlier in the process, as the port id can be refreshed once the PHYs become
ready.

In addition to this, we should set the hisi_sas_dev port id to 0xff (invalid
port id) if all PHYs of this port remain down for the same device.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a669bdbf4939ac72eff6b3ae33f771a1ef28448c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: relocate clearing ITCT and freeing device

BugLink: http://bugs.launchpad.net/bugs/1756094
In certain scenarios we may just want to clear the ITCT for a device, and not
free other resources like the SATA bitmap using in v2 hw.

To facilitate this, this patch relocates the code of clearing ITCT from
free_device() to a new hw interface clear_itct(). Then for some hw, we should
not realise free_device() if there's nothing left to do for it.

[mkp: typo]

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0258141aaab3007949ba0e67c3d28436354429bb)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: ata: enhance the definition of SET MAX feature field value

BugLink: http://bugs.launchpad.net/bugs/1756094
There are two other values for SET MAX feature field according to ata
protocol. So definite them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d5c15c2c22a8d4e0e82ca95eac5a6ccd175c0762)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: hisi_sas: fix dma_unmap_sg() parameter

BugLink: http://bugs.launchpad.net/bugs/1756094
For function dma_unmap_sg(), the <nents> parameter should be number of
elements in the scatterlist prior to the mapping, not after the mapping.

Fix this usage.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit dc1e4730e2b636065628f8427b675788bca83d34)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] set NOBP and expoline options for s390

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390/entry.S: fix spurious zeroing of r0

BugLink: http://bugs.launchpad.net/bugs/1754580
when a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:

.Lcleanup_system_call:
[...]
0:      # update accounting time stamp
        mvc     __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
        # set up saved register r11
        lg      %r15,__LC_KERNEL_STACK
        la      %r9,STACK_FRAME_OVERHEAD(%r15)
        stg     %r9,24(%r11)            # r11 pt_regs pointer
        # fill pt_regs
        mvc     __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
--->    stmg    %r0,%r7,__PT_R0(%r9)

The problem is now, that we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.

Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d3f468963cd6fd6d2aa5e26aed8b24232096d0e1)

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: do not bypass BPENTER for interrupt system calls

BugLink: http://bugs.launchpad.net/bugs/1754580
The system call path can be interrupted before the switch back to the
standard branch prediction with BPENTER has been done. The critical
section cleanup code skips forward to .Lsysc_do_svc and bypasses the
BPENTER. In this case the kernel and all subsequent code will run with
the limited branch prediction.

Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d5feec04fe578c8dbd9e2e1439afc2f0af761ed4)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)

BugLink: http://bugs.launchpad.net/bugs/1754580
I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which
obviously always evaluate to false. Fix this.

Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 2cb370d615e9fbed9e95ed222c2c8f337181aa90)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: introduce execute-trampolines for branches

BugLink: http://bugs.launchpad.net/bugs/1754580
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction_return= compiler options to create a kernel fortified against
the specte v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

basr %r14,%r1

is replaced with a PC relative call to a new thunk

brasl %r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
exrl 0,0f
j .
0: br %r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameter "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit f19fbd5ed642dc31c809596412dab1ed56f2f156)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: run user space and KVM guests with modified branch prediction

BugLink: http://bugs.launchpad.net/bugs/1754580
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.

To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be call, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 6b73044b2b0081ee3dd1cd6eaab7dee552601efb)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: add options to change branch prediction behaviour for the kernel

BugLink: http://bugs.launchpad.net/bugs/1754580
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d768bd892fc8f066cd3aa000eb1867bcf32db0ee)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390/alternative: use a copy of the facility bit mask

BugLink: http://bugs.launchpad.net/bugs/1754580
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit cf1489984641369611556bf00c48f945c77bcf02)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: add optimized array_index_mask_nospec

BugLink: http://bugs.launchpad.net/bugs/1754580
Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit e2dd833389cc4069a96b57bdd24227b5f52288f5)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

s390: scrub registers on kernel entry and KVM exit

BugLink: http://bugs.launchpad.net/bugs/1754580
Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 7041d28115e91f2144f811ffe8a195c696b1e1d0)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: s390/crypto: Fix kernel crash on aes_s390 module remove.

BugLink: http://bugs.launchpad.net/bugs/1753424
A kernel crash occurs when the aes_s390 kernel module is
removed on machines < z14. This only happens on kernel
version 4.15 and higher on machines not supporting MSA 8.

The reason for the crash is a unconditional
crypto_unregister_aead() invocation where no previous
crypto_register_aead() had been called. The fix now
remembers if there has been a successful registration and
only then calls the unregister function upon kernel module
remove.

The code now crashing has been introduced with
"bf7fa03 s390/crypto: add s390 platform specific aes gcm support."

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: [Config] fix up retpoline abi files

Differences fall into three categories:

- Removed vulnerable sites.

- Changes in pv_*_ops constant offsets.

- Changes within an __init function.

All of these changes are safe.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add missing unlock in WQ full logic

BugLink: http://bugs.launchpad.net/bugs/1752182
Commit 6e8e1c14c61e ("scsi: lpfc: Add WQ Full Logic for NVME Target") fails
the static checker. Checker correctly identified a missing unlock on a
return path.

Add the unlock.

Fixes: 6e8e1c14c61e ("scsi: lpfc: Add WQ Full Logic for NVME Target")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 917d59ac5e26a45dce8b00d38bb0a338a7f22f23 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: use __raw_writeX on DPP copies

BugLink: http://bugs.launchpad.net/bugs/1752182
Commit 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
fails compilation on some 32-bit systems as writeq() is not supported on
all architectures. Additionally, it was pointed out that as writeX()
does byteswapping if necessary for pci vs the cpu endianness, the code
was broken on BE PPC.

After discussions with Arnd Bergmann, we've resolved the issue
to the following:
  Instead of writeX(), use __raw_writeX() - which writes to io
    space while preserving byte order. To use this, the code
    was changed to use a different buffer that lpfc prepped
    via sli_pcimem_bcopy() that was set to the bytestream to
    be written.
  On platforms with __raw_writeq support, use the routine, otherwise
    use __raw_writel()

[mkp: checkpatch]

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4c06619fc4da5b7aae76f1dde25bfea3246f2591 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Change Copyright of 12.0.0.0 modified files to 2018

BugLink: http://bugs.launchpad.net/bugs/1752182
Updated Copyright in files updated as part of 12.0.0.0

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cf8037f8d08a078d263a9b725e3ae7603ad0d42e linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: update driver version to 12.0.0.0

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 12.0.0.0

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6efb23804153fac93bc49ef6e76707f35fbe2163 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Work around NVME cmd iu SGL type

BugLink: http://bugs.launchpad.net/bugs/1752182
The hardware offload for NVME commands was created when the
FC-NVME standard was setting SGL Descriptor Type to SGL Data
Block Descriptor (0h) and SGL Descriptor Sub Type to Address (0h).

A late change in NVMe-over-Fabrics obsoleted these values, creating
a transport SGL descriptor type with new values to go into these
fields.

For initial hardware support, in order to be compliant to the spec,
use host-supplied cmd IU buffers instead of the adapter generated
values. Later hardware will correct this.

Add a module parameter to override this offload disablement if looking
for lowest latency. This is reasonable as nothing in FC-NVME uses
the SQE SGL values.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4e565cf04138fca6ffeb884044febf922b2306d0 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix nvme embedded io length on new hardware

BugLink: http://bugs.launchpad.net/bugs/1752182
Newer hardware more strictly enforces buffer lenghts, causing an
mis-set value to be identified. Older hardware won't catch it.
The difference is benign on old hardware.

Set the right embedded buffer length for nvme ios.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 63452e144662a90b77fcdb27bd33c8b43655b850 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add embedded data pointers for enhanced performance

BugLink: http://bugs.launchpad.net/bugs/1752182
The current driver isn't taking advantage of a performance hint whereby
the initial data buffer descriptor can be placed in the WQE as well as
the SGL.

Add the logic to detect support for the feature and to use it when
supported.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0bc2b7c5317bd51df571e9d1131547901215f6c9 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Enable fw download on if_type=6 devices

BugLink: http://bugs.launchpad.net/bugs/1752182
Current code is very explicit in what it allows to be downloaded.
The driver checking prevented G7 firmware download. The driver
checking is unnecessary as the device will validate what it receives.

Revise the firmware download interface checking.
Added a little debug support in case there is still a failure.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1feb8204a12ed7987bffa75311754edc1367680f linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add if_type=6 support for cycling valid bits

BugLink: http://bugs.launchpad.net/bugs/1752182
Traditional SLI4 required the driver to clear Valid bits on
EQEs and CQEs after consuming them.

The new if_type=6 hardware will cycle the value for what is
valid on each queue itteration. The driver no longer has to
touch the valid bits. This also means all the cpu cache
dirtying and perhaps flush/refill's done by the hardware
in accessing the EQ/CQ elements is eliminated.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7365f6fdbba559f7e814519fafe6e4956f68b6be linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add 64G link speed support

BugLink: http://bugs.launchpad.net/bugs/1752182
The G7 adapter supports 64G link speeds. Add support to the driver.

In addition, a small cleanup to replace the odd bitmap logic with
a switch case.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fbd8a6ba65443a8a79183edd9c2e1ad302339063 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add PCI Ids for if_type=6 hardware

BugLink: http://bugs.launchpad.net/bugs/1752182
Add PCI ids for the new G7 adapter

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c238b9b6eae399e81d36382b09c2e969c154b7ee linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add push-to-adapter support to sli4

BugLink: http://bugs.launchpad.net/bugs/1752182
New if_type=6 adapters support an additional BAR that provides
apertures to allow direct WQE to adapter push support - termed
Direct Packet Push (DPP). WQ creation differs slightly to ask for
a WQ to be DPP-ized. When submitting a WQE to a DPP WQ, it is
submitted to the host memory for the WQ normally, but is also
written by the host cpu directly to a BAR aperture. Write buffer
coalescing in hardware is (hopefully) turned on, enabling single
pci write operation support. The doorbell is thing rung to indicate
the WQE is available and was pushed to the aperture.

This patch:
- Updates the WQ Create commands for the DPP options
- Adds the bar mapping for if_type=6 DPP bar
- Adds the WQE pushing to the DDP aperture received from WQ create
- Adds a new module parameter to disable DPP operation if desired.
Default is enabled.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1351e69fc6db30e186295f1c9495d03cef6a01a2 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add SLI-4 if_type=6 support to the code base

BugLink: http://bugs.launchpad.net/bugs/1752182
New hardware supports a SLI-4 interface, but with a new if_type
variant of 6.

If_type=6 has a different PCI BAR map, separate EQ/CQ doorbells,
and some changes in doorbell formats.

Add the changes for the if_type into headers, adapter initialization
and control flows. Add new eq and cq handlers.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 27d6ac0a6e830043bd5db89fee8adddb41ada2f7 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Rework sli4 doorbell infrastructure

BugLink: http://bugs.launchpad.net/bugs/1752182
Up until now, all SLI-4 devices had the same doorbells at the same
bar locations. With newer hardware, there are now independent EQ and
CQ doorbells and the bar locations differ.

Prepare the code for new hardware by separating the eq/cq doorbell into
separate components. The components can be set based on if_type.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9dd35425a50c667ae2b6c2cda201425ed2d3fd25 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Rework lpfc to allow different sli4 cq and eq handlers

BugLink: http://bugs.launchpad.net/bugs/1752182
Up until now, an SLI-4 device had no variance in the way it handled
its EQs and CQs. With newer hardware, there are now differences in
doorbells and some differences in how entries are valid.

Prepare the code for new hardware by creating a sli4-based callout
table that can be set based on if_type.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b71413dd01bbf302236cfb61df44702ea838dd75 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright

BugLink: http://bugs.launchpad.net/bugs/1752182
Updated Copyright in files updated 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 128bddacc4dd7c86070e1e0534687e3083a89d52 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: update driver version to 11.4.0.7

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e9d2f1667ea12bd2f997a7529fb41cce8e0036d linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix nonrecovery of NVME controller after cable swap.

BugLink: http://bugs.launchpad.net/bugs/1752182
In a test that is doing large numbers of cable swaps on the target, the
nvme controllers wouldn't reconnect.

During the cable swaps, the targets n_port_id would change. This
information was passed to the nvme-fc transport, in the new remoteport
registration. However, the nvme-fc transport didn't update the n_port_id
value in the remoteport struct when it reused an existing structure.
Later, when a new association was attempted on the remoteport, the
driver's NVME LS routine would use the stale n_port_id from the
remoteport struct to address the LS. As the device is no longer at that
address, the LS would go into never never land.

Separately, the nvme-fc transport will be corrected to update the
n_port_id value on a re-registration.

However, for now, there's no reason to use the transports values. The
private pointer points to the drivers node structure and the node
structure is up to date. Therefore, revise the LS routine to use the
drivers data structures for the LS. Augmented the debug message for
better debugging in the future.

Also removed a duplicate if check that seems to have slipped in.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 815a9c437617e221842d12b3366ff6911b3df628 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Treat SCSI Write operation Underruns as an error

BugLink: http://bugs.launchpad.net/bugs/1752182
Currently, write underruns (mismatch of amount transferred vs scsi
status and its residual) detected by the adapter are not being flagged
as an error. Its expected the target controls the data transfer and
would appropriately set the RSP values. Only read underruns are treated
as errors.

Revise the SCSI error handling to treat write underruns as an error as
well.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 45634a86ca6e98dbcaddb763f8e90ad243057789 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix header inclusion in lpfc_nvmet

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver was inappropriately pulling in the nvme host's nvme.h
header. What it really needed was the standard <linux/nvme.h> header.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8d731d1aa993c44fcf4de0dbd42059e00cf37102 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Validate adapter support for SRIU option

BugLink: http://bugs.launchpad.net/bugs/1752182
When using the special option to suppress the response iu, ensure the
adapter fully supports the feature by checking feature flags from the
adapter and validating the support when formatting the WQE.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 20aefac3a9a23b56db43f1fe1b3ae72c87e39137 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix SCSI io host reset causing kernel crash

BugLink: http://bugs.launchpad.net/bugs/1752182
During SCSI error handling escalation to host reset, the SCSI io
routines were moved off the txcmplq, but the individual io's ON_CMPLQ
flag wasn't cleared. Thus, a background thread saw the io and attempted
to access it as if on the txcmplq.

Clear the flag upon removal.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c1dd9111b7f78a90bccd2e4abb9b9bb6319a4c64 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Indicate CONF support in NVMe PRLI

BugLink: http://bugs.launchpad.net/bugs/1752182
Revise the NVME PRLI to indicate CONF support.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a5ff06817eb86d022bc11993850a42732d7e6979 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix issue_lip if link is disabled

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver ignored checks on whether the link should be kept
administratively down after a link bounce. Correct the checks.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2289e9598dde9705400559ca2606fb8c145c34f0 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing

BugLink: http://bugs.launchpad.net/bugs/1752182
During link bounce testing in a point-to-point topology, the host may
enter a soft lockup on the lpfc_worker thread:

    Call Trace:
     lpfc_work_done+0x1f3/0x1390 [lpfc]
     lpfc_do_work+0x16f/0x180 [lpfc]
     kthread+0xc7/0xe0
     ret_from_fork+0x3f/0x70

The driver was simultaneously setting a combination of flags that caused
lpfc_do_work()to effectively spin between slow path work and new event
data, causing the lockup.

Ensure in the typical wq completions, that new event data flags are set
if the slow path flag is running. The slow path will eventually
reschedule the wq handling.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 161df4f09987ae2e9f0f97f0b38eee298b4a39ff linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Allow set of maximum outstanding SCSI cmd limit for a target

BugLink: http://bugs.launchpad.net/bugs/1752182
Make the attribute writeable.

Remove the ramp up to logic as its unnecessary, simply set depth. Add
debug message if depth changed, possibly reducing limit, yet our
outstanding count has yet to catch up with it.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 64bf009933bc84a7fb44ff50f86af0201b8be0c3 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix RQ empty firmware trap

BugLink: http://bugs.launchpad.net/bugs/1752182
When nvme target deferred receive logic waits for exchange resources,
the corresponding receive buffer is not replenished with the hardware.
This can result in a lack of asynchronous receive buffer resources in
the hardware, resulting in a "2885 Port Status Event: ... error
1=0x52004a01 ..." message.

Correct by replenishing the buffer whenenver the deferred logic kicks
in. Update corresponding debug messages and statistics as well.

[mkp: applied by hand]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 411de511c6943554cdc4173c3f522029db2f75c7 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix IO failure during hba reset testing with nvme io.

BugLink: http://bugs.launchpad.net/bugs/1752182
A stress test repeatedly resetting the adapter while performing io would
eventually report I/O failures and missing nvme namespaces.

The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done(). If the
transport was also running an abort for that IO, the driver would fail
the abort with message 6140. Failing the abort is not allowed by the
nvme-fc transport, as it mandates that the io must be returned back to
the transport. As that does not happen, the transport controller delete
has an outstanding reference and can't complete teardown.

The NULL-ing of the private pointer should be done only when the io is
considered complete. It's complete when the adapter returns the exchange
with the "exchange busy" flag clear.

Move the NULL'ing of the structure to the done case. This leaves the io
contexts set while it is busy and until the subsequent XRI_ABORTED
completion which returns the exchange is received.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 91455b850956bc13708a074bd1400f54aae74890 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Fix PRLI handling when topology type changes

BugLink: http://bugs.launchpad.net/bugs/1752182
The lpfc driver does not discover a target when the topology changes
from switched-fabric to direct-connect. The target rejects the PRLI from
the initiator in direct-connect as the driver is using the old S_ID from
the switched topology.

The driver was inappropriately clearing the VP bit to register the VPI,
which is what is associated with the S_ID.

Fix by leaving the VP bit set (it was set earlier) and as the VFI is
being re-registered, set the UPDT bit.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2c3b2a8f652566c5b35d945f0c8146555d2062ec linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Add WQ Full Logic for NVME Target

BugLink: http://bugs.launchpad.net/bugs/1752182
I/O conditions on the nvme target may have the driver submitting to a
full hardware wq. The hardware wq is a shared resource among all nvme
controllers. When the driver hit a full wq, it failed the io posting
back to the nvme-fc transport, which then escalated it into errors.

Correct by maintaining a sideband queue within the driver that is added
to when the WQ full condition is hit, and drained from as soon as new WQ
space opens up.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e8e1c14c61e54253098521127cd5ac0b959dd32 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: correct debug counters for abort

BugLink: http://bugs.launchpad.net/bugs/1752182
Existing code was using the wrong field for the completion status when
comparing whether to increment abort statistics

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8ae337013674d5c1e803429356b85cba2ce12067 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: move placement of target destroy on driver detach

BugLink: http://bugs.launchpad.net/bugs/1752182
Ensure nvme localports/targetports are torn down before dismantling the
adapter sli interface on driver detachment. This aids leaving
interfaces live while nvme may be making callbacks to abort it.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 281d61902ffbab47901f8616a38a45144627dd9e linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

scsi: lpfc: Increase CQ and WQ sizes for SCSI

BugLink: http://bugs.launchpad.net/bugs/1752182
Increased CQ and WQ sizes for SCSI FCP, matching those used for NVMe
development.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c176ffa0841c632593c5007f1d1c9ed126481daa linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>