git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
6 years ago PCI/DPC: Enable DPC only if AER is available
Keith Busch [Wed, 24 Jan 2018 23:03:18 +0000 (17:03 -0600)]
PCI/DPC: Enable DPC only if AER is available

BugLink: http://bugs.launchpad.net/bugs/1756094
The "Determination of DPC Control" implementation note in PCIe r4.0, sec
6.1.10, recommends the operating system always link DPC control to the
control of AER, as the two functionalities are strongly connected.

To avoid conflicts over whether platform firmware or the OS controls DPC,
enable DPC only if AER is enabled in the OS, and the device's error
handling does not have firmware-first AER handling.
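
The rule above can be sketched as a small predicate. This is only an
illustration, not the kernel code: struct port_state and dpc_may_enable()
are hypothetical names invented for this sketch.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the state consulted before enabling DPC. */
struct port_state {
	bool aer_enabled_in_os;   /* the OS has AER handling enabled */
	bool aer_firmware_first;  /* platform firmware handles AER first */
};

/* DPC may be enabled only when the OS, not firmware, controls AER. */
static bool dpc_may_enable(const struct port_state *p)
{
	if (p->aer_firmware_first)
		return false;
	return p->aer_enabled_in_os;
}
```

With this shape, a firmware-first device never gets DPC enabled by the OS,
regardless of the AER setting.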

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
(cherry picked from commit eed85ff4c0da72640dcf7c0737c5a08bca2958e7)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI/AER: Return error if AER is not supported
Keith Busch [Tue, 19 Dec 2017 21:06:40 +0000 (14:06 -0700)]
PCI/AER: Return error if AER is not supported

BugLink: http://bugs.launchpad.net/bugs/1756094
get_device_error_info() reads error information from registers in the AER
capability.  If we call it for a device that has no AER capability, it
should return an error, but previously it returned success.

Return 0 (error) if the device doesn't have an AER capability.
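
A minimal sketch of that return convention, assuming a simplified device
model. The mock_* types and the function below are hypothetical and only
mirror the 1 = success, 0 = error convention described above.

```c
#include <stdint.h>

/* Hypothetical device model: aer_cap is the config-space offset of the
 * AER capability, or 0 when the device has none. */
struct mock_dev {
	uint16_t aer_cap;
	uint32_t uncor_status;
};

struct mock_err_info {
	uint32_t status;
};

/* Returns 1 if error information was read, 0 on error. */
static int mock_get_device_error_info(const struct mock_dev *dev,
				      struct mock_err_info *info)
{
	info->status = 0;
	if (!dev->aer_cap)
		return 0;	/* no AER capability: nothing to read */
	info->status = dev->uncor_status;
	return 1;
}
```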

Signed-off-by: Keith Busch <keith.busch@intel.com>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <helgaas@kernel.org>
(cherry picked from commit 0f6f1d9fca4ad91ce9b30dc0aa847b0947786261)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports
Bjorn Helgaas [Thu, 30 Nov 2017 21:22:39 +0000 (15:22 -0600)]
PCI: Make PCI_SCAN_ALL_PCIE_DEVS work for Root as well as Downstream Ports

BugLink: http://bugs.launchpad.net/bugs/1756094
PCIe Downstream Ports normally have only a Device 0 below them.  To
optimize enumeration, we don't scan for other devices *unless* the
PCI_SCAN_ALL_PCIE_DEVS flag is set by quirks or the
"pci=pcie_scan_all" kernel parameter.

Previously PCI_SCAN_ALL_PCIE_DEVS only affected scanning below Switch
Downstream Ports, not Root Ports.

But the "Nemo" system, also known as the AmigaOne X1000, has a PA Semi Root
Port whose link leads to an AMD/ATI SB600 South Bridge.  The Root Port is a
PCIe device, of course, but the SB600 contains only conventional PCI
devices with no visible PCIe port.

Simplify and restructure only_one_child() so that we scan for all possible
devices below Root Ports as well as Switch Downstream Ports when
PCI_SCAN_ALL_PCIE_DEVS is set.
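
The resulting decision could look roughly like the sketch below. It is a
self-contained model, not the real only_one_child(): enum port_type and
scan_only_device0() are hypothetical names used for illustration.

```c
#include <stdbool.h>

/* Hypothetical classification of the bridge above the bus being scanned. */
enum port_type { PORT_NONE, PORT_ROOT, PORT_SWITCH_DOWNSTREAM, PORT_OTHER };

/* Returns true when only Device 0 needs to be scanned below the bridge. */
static bool scan_only_device0(enum port_type bridge, bool scan_all_pcie_devs)
{
	/* The flag wins for every port type, including Root Ports. */
	if (scan_all_pcie_devs)
		return false;

	/* Normal case: a PCIe Downstream Port leads to a single Device 0. */
	return bridge == PORT_ROOT || bridge == PORT_SWITCH_DOWNSTREAM;
}
```

Checking the flag first is what makes Root Ports and Switch Downstream
Ports behave the same way.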

This is enough to make Nemo work with "pci=pcie_scan_all".  We would also
like to add a quirk to set PCI_SCAN_ALL_PCIE_DEVS automatically on Nemo so
users wouldn't have to use the "pci=pcie_scan_all" parameter, but we don't
have that yet.

Link: https://lkml.kernel.org/r/CAErSpo55Q8Q=5p6_+uu7ahnw+53ibVDNRXxrzRV9QnUr_9EUfw@mail.gmail.com
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198057
Reported-and-Tested-by: Christian Zigotzky <chzigotzky@xenosoft.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit d57f0b8c81393e7105331ac037fa465d5a45c65f)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI/ASPM: Unexport internal ASPM interfaces
Bjorn Helgaas [Fri, 15 Dec 2017 14:57:28 +0000 (08:57 -0600)]
PCI/ASPM: Unexport internal ASPM interfaces

BugLink: http://bugs.launchpad.net/bugs/1756094
Several of the interfaces defined in include/linux/pci-aspm.h are used only
internally from the PCI core:

  pcie_aspm_init_link_state()
  pcie_aspm_exit_link_state()
  pcie_aspm_pm_state_change()
  pcie_aspm_powersave_config_link()
  pcie_aspm_create_sysfs_dev_files()
  pcie_aspm_remove_sysfs_dev_files()

Move these to the internal drivers/pci/pci.h header so they don't clutter
the driver interface.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit 7d8e7d19b095ae70b1ca483ca36e7985a108abe5)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI/ASPM: Enable Latency Tolerance Reporting when supported
Bjorn Helgaas [Tue, 28 Nov 2017 22:43:50 +0000 (16:43 -0600)]
PCI/ASPM: Enable Latency Tolerance Reporting when supported

BugLink: http://bugs.launchpad.net/bugs/1756094
Enable Latency Tolerance Reporting (LTR).  Note that LTR must be enabled in
the Root Port first, and must not be enabled in any downstream device
unless the Root Port and all intermediate Switches also support LTR.
See PCIe r3.1, sec 6.18.

Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
(cherry picked from commit c46fd358070f22ba68d6e74c22016a33b914c20a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics
Bjorn Helgaas [Fri, 17 Nov 2017 20:26:42 +0000 (14:26 -0600)]
PCI/ASPM: Calculate LTR_L1.2_THRESHOLD from device characteristics

BugLink: http://bugs.launchpad.net/bugs/1756094
Per PCIe r3.1, sec 5.5.1, LTR_L1.2_THRESHOLD determines whether we enter
the L1.2 Link state: if L1.2 is enabled and downstream devices have
reported that they can tolerate latency of at least LTR_L1.2_THRESHOLD, we
must enter L1.2 when CLKREQ# is de-asserted.

The implication is that LTR_L1.2_THRESHOLD is the time required to
transition the Link from L0 to L1.2 and back to L0, and per sec 5.5.3.3.1,
Figures 5-16 and 5-17, it appears that the absolute minimum time for those
transitions would be T(POWER_OFF) + T(L1.2) + T(POWER_ON) + T(COMMONMODE).

Therefore, compute LTR_L1.2_THRESHOLD as:

    2us T(POWER_OFF)
  + 4us T(L1.2)
  + T(POWER_ON)
  + T(COMMONMODE)
  = LTR_L1.2_THRESHOLD
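
As a worked sketch of the sum above, assuming all times are expressed in
microseconds. The function name and the sample inputs are hypothetical;
real T(POWER_ON) and T(COMMONMODE) values come from the devices.

```c
#include <stdint.h>

#define T_POWER_OFF_US	2u	/* T(POWER_OFF) term from the sum above */
#define T_L1_2_US	4u	/* T(L1.2) term from the sum above */

/* Sum the four terms to get LTR_L1.2_THRESHOLD in microseconds. */
static uint32_t ltr_l1_2_threshold_us(uint32_t t_power_on_us,
				      uint32_t t_common_mode_us)
{
	return T_POWER_OFF_US + T_L1_2_US + t_power_on_us + t_common_mode_us;
}
```

For example, with a made-up T(POWER_ON) of 10us and T(COMMONMODE) of 20us,
the threshold would be 2 + 4 + 10 + 20 = 36us.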

Previously we set LTR_L1.2_THRESHOLD to a fixed value of 163840ns
(163.84us):

  #define LTR_L1_2_THRESHOLD_BITS     ((1 << 21) | (1 << 23) | (1 << 30))
  ((1 << 21) | (1 << 23) | (1 << 30)) = 0x40a00000
  LTR_L1.2_THRESHOLD_Value = (0x40a00000 & 0x03ff0000) >> 16 = 0xa0 = 160
  LTR_L1.2_THRESHOLD_Scale = (0x40a00000 & 0xe0000000) >> 29 = 0x2 (* 1024ns)
  LTR_L1.2_THRESHOLD = 160 * 1024ns = 163840ns

Obviously this doesn't account for the circuit characteristics of different
implementations.
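
The decoding of the old fixed value can be reproduced with a short sketch.
The masks and shifts are the ones used in the arithmetic above; the
assumption that each scale step multiplies the unit by 32 (1ns, 32ns,
1024ns, ...) is consistent with scale 2 meaning 1024ns. The macro and
function names are made up for this sketch.

```c
#include <stdint.h>

#define LTR_VALUE_MASK	0x03ff0000u
#define LTR_VALUE_SHIFT	16
#define LTR_SCALE_MASK	0xe0000000u
#define LTR_SCALE_SHIFT	29

/* Decode the LTR_L1.2_THRESHOLD value and scale fields into nanoseconds. */
static uint64_t ltr_threshold_ns(uint32_t reg)
{
	uint32_t value = (reg & LTR_VALUE_MASK) >> LTR_VALUE_SHIFT;
	uint32_t scale = (reg & LTR_SCALE_MASK) >> LTR_SCALE_SHIFT;

	/* Assumed encoding: unit = 32^scale ns, i.e. a shift by 5 * scale. */
	return (uint64_t)value << (5 * scale);
}
```

Passing 0x40a00000 yields value 160 and scale 2, so 160 * 1024ns = 163840ns.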

Note that while firmware may enable LTR, Linux itself currently does not
enable LTR.  When L1.2 is enabled but LTR is not, LTR_L1.2_THRESHOLD is
ignored and we always enter L1.2 when it is enabled and CLKREQ# is
de-asserted.  So this patch should not have any effect unless firmware
enables LTR.

Fixes: f1f0366dd6be ("PCI/ASPM: Calculate and save the L1.2 timing parameters")
Link: https://www.coreboot.org/pipermail/coreboot-gerrit/2015-March/021134.html
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Vidya Sagar <vidyas@nvidia.com>
Cc: Kenji Chen <kenji.chen@intel.com>
Cc: Patrick Georgi <pgeorgi@google.com>
Cc: Rajat Jain <rajatja@google.com>
(cherry picked from commit 80d7d7a904fac3f8114448dbb8cc9fa253b10120)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI
Tyler Baicar [Mon, 28 Aug 2017 17:09:44 +0000 (11:09 -0600)]
PCI/AER: Skip recovery callbacks for correctable errors from ACPI APEI

BugLink: http://bugs.launchpad.net/bugs/1756094
PCIe correctable errors are corrected by hardware.  Software may log them,
but no other software intervention is required.

There are two paths to enter the AER recovery code: (1) the native path
where Linux fields the AER interrupt and reads the AER registers directly,
and (2) the ACPI path where firmware reads the AER registers and hands them
off to Linux via the ACPI APEI path.

The AER do_recovery() function calls driver error reporting callbacks
(error_detected(), mmio_enabled(), resume(), etc), attempts recovery (for
fatal errors), and logs an "AER: Device recovery successful" message.

Since there's nothing to recover for correctable errors, the native path
already skips do_recovery(), so it doesn't call the driver callbacks or
emit the message.  Make the APEI path do the same.
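
The intended behavior for both paths can be modeled as below. This is a
simplified sketch with hypothetical mock_* names; the recovered counter
merely stands in for a call to do_recovery().

```c
/* Hypothetical severity classes for this sketch. */
enum mock_severity { MOCK_CORRECTABLE, MOCK_NONFATAL, MOCK_FATAL };

struct mock_stats {
	int logged;	/* errors that were logged */
	int recovered;	/* times recovery (driver callbacks) would run */
};

/* Log every error, but run recovery only for uncorrectable ones. */
static void mock_handle_error(enum mock_severity sev, struct mock_stats *st)
{
	st->logged++;
	if (sev == MOCK_CORRECTABLE)
		return;		/* hardware already corrected it */
	st->recovered++;
}
```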

Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org>
[bhelgaas: changelog]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit b9f80fdc4244b417154ec30d3bc7ec3e76085634)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PCI / PM: Support for LEAVE_SUSPENDED driver flag
Rafael J. Wysocki [Sat, 18 Nov 2017 14:33:52 +0000 (15:33 +0100)]
PCI / PM: Support for LEAVE_SUSPENDED driver flag

BugLink: http://bugs.launchpad.net/bugs/1756094
Add support for DPM_FLAG_LEAVE_SUSPENDED to the PCI bus type by
making it (a) set the power.may_skip_resume status bit for devices
that, from its perspective, may be left in suspend after system
wakeup from sleep and (b) return early from pci_pm_resume_noirq()
for devices whose remaining resume callbacks during the transition
under way are going to be skipped by the PM core.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit bd755d770ac78e8eeda05877ba66cc66f151e10e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago PM / core: Add LEAVE_SUSPENDED driver flag
Rafael J. Wysocki [Sat, 18 Nov 2017 14:31:49 +0000 (15:31 +0100)]
PM / core: Add LEAVE_SUSPENDED driver flag

BugLink: http://bugs.launchpad.net/bugs/1756094
Define and document a new driver flag, DPM_FLAG_LEAVE_SUSPENDED, to
instruct the PM core and middle-layer (bus type, PM domain, etc.)
code that it is desirable to leave the device in runtime suspend
after system-wide transitions to the working state (for example,
the device may be slow to resume and it may be better to avoid
resuming it right away).

Generally, the middle-layer code involved in the handling of the
device is expected to indicate to the PM core whether or not the
device may be left in suspend with the help of the device's
power.may_skip_resume status bit.  That has to happen in the "noirq"
phase of the preceding system suspend (or analogous) transition.
The middle layer is then responsible for handling the device as
appropriate in its "noirq" resume callback which is executed
regardless of whether or not the device may be left suspended, but
the other resume callbacks (except for ->complete) will be skipped
automatically by the core if the device really can be left in
suspend.

The additional power.must_resume status bit introduced for the
implementation of this mechanism is used internally by the PM core
to track the requirement to resume the device (which may depend on
its children etc).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
(backported from commit 0d4b54c6fee87ff60b0bc1007ca487449698468d)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to userspace
chenxiang [Wed, 7 Mar 2018 02:46:02 +0000 (10:46 +0800)]
UBUNTU: SAUCE: scsi: hisi_sas: export device table of v3 hw to userspace

BugLink: http://bugs.launchpad.net/bugs/1756094
Export device table of v3 hw to userspace, or auto probe will fail for
v3 hw.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES
chenxiang [Tue, 16 Jan 2018 09:14:19 +0000 (17:14 +0800)]
UBUNTU: SAUCE: scsi: hisi_sas: config for hip08 ES

BugLink: http://bugs.launchpad.net/bugs/1756094
Make some modifications to configure hip08 ES.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()
Xiang Chen [Wed, 17 Jan 2018 16:46:54 +0000 (00:46 +0800)]
scsi: hisi_sas: fix a bug in hisi_sas_dev_gone()

BugLink: http://bugs.launchpad.net/bugs/1756094
When a device is gone during SAS controller reset, a NULL pointer can be
accessed in the free_device callback, because we clear structure sas_dev
beforehand.

Instead, we can just set dev_type to SAS_PHY_UNUSED and not clear
structure sas_dev, as all of its members will be re-initialized after the
device is found.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0d762b3af2a5b5095fec18aa4d61f408638aa9ca)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: make local symbol host_attrs static
Wei Yongjun [Thu, 11 Jan 2018 11:13:58 +0000 (11:13 +0000)]
scsi: hisi_sas: make local symbol host_attrs static

BugLink: http://bugs.launchpad.net/bugs/1756094
Fixes the following sparse warning:

drivers/scsi/hisi_sas/hisi_sas_main.c:1691:25: warning:
 symbol 'host_attrs' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1e15feacb9d3743ca0b314a6daf8cc59c90b1046)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: Change frame type for SET MAX commands
chenxiang [Thu, 28 Dec 2017 10:20:47 +0000 (18:20 +0800)]
scsi: hisi_sas: Change frame type for SET MAX commands

BugLink: http://bugs.launchpad.net/bugs/1756094
According to the ATA protocol, SET MAX commands belong to different frame
types. So check the features field of a SET MAX command to decide which
frame type it belongs to.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 468f4b8d0711146f0075513e6047079a26fc3903)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add v3 hw suspend and resume
Xiang Chen [Fri, 8 Dec 2017 17:16:50 +0000 (01:16 +0800)]
scsi: hisi_sas: add v3 hw suspend and resume

BugLink: http://bugs.launchpad.net/bugs/1756094
v3 hw SAS supports moving the power state from D0 to D3 to enter the
low-power state, and from D3 to D0 to leave it.

When the power state goes from D0 to D3, HW sends an FLR to clear the
registers of ECAM and BAR space; when it goes from D3 to D0, HW clears the
registers of ECAM space only.

So on suspend we need to do the same as for a controller reset (including
disabling interrupts/DQ/PHY/BUS), and also release slots after the FLR. On
resume, re-configure the registers of BAR space.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4d0951ee70d348b694ce2bbdcc65b684239da4b4)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: re-add the lldd_port_deformed()
Xiang Chen [Fri, 8 Dec 2017 17:16:49 +0000 (01:16 +0800)]
scsi: hisi_sas: re-add the lldd_port_deformed()

BugLink: http://bugs.launchpad.net/bugs/1756094
Function sas_suspend_devices() requires the lldd_port_deformed callback to
be implemented if lldd_port_formed is implemented.

So add a stub for lldd_port_deformed.

Callback lldd_port_deformed was not required as the port deformation is done
elsewhere in the LLDD.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(backported from commit 336bd78bdabf39dbcee6b41f9628c6e51d1c25b0)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO
Xiang Chen [Fri, 8 Dec 2017 17:16:48 +0000 (01:16 +0800)]
scsi: hisi_sas: fix SAS_QUEUE_FULL problem while running IO

BugLink: http://bugs.launchpad.net/bugs/1756094
This patch fixes a SAS_QUEUE_FULL problem. The test scenario is closing a port
while running IO.

In sas_eh_handle_sas_errors(), SCSI EH frees the sas_task of the device if
lldd_I_T_nexus_reset() returns TMF_RESP_FUNC_COMPLETE or -ENODEV.  But in our
SAS driver, we only free the slots of the device when the return value is
TMF_RESP_FUNC_COMPLETE. So if the return value is -ENODEV, the slot resources
are never freed.

As a solution, we should also free the slots of the device in
lldd_I_T_nexus_reset() if the return value is -ENODEV.
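
The corrected condition can be sketched as a predicate over the reset
result. The MOCK_* constants are placeholders defined only for this sketch
and should not be read as the driver's definitions; should_release_slots()
is a hypothetical helper name.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder response codes for this sketch only. */
#define MOCK_TMF_RESP_FUNC_COMPLETE	0x00
#define MOCK_TMF_RESP_FUNC_FAILED	0x05

/* Slots must be released in both cases where SCSI EH frees the sas_task. */
static bool should_release_slots(int reset_rc)
{
	return reset_rc == MOCK_TMF_RESP_FUNC_COMPLETE || reset_rc == -ENODEV;
}
```

Keeping the driver's condition identical to the one used by SCSI EH is what
prevents slots from leaking until the queue is full.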

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9960a24a1c96a40d6ab984ffefdd0e3003a3377e)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add internal abort dev in some places
Xiaofei Tan [Fri, 8 Dec 2017 17:16:47 +0000 (01:16 +0800)]
scsi: hisi_sas: add internal abort dev in some places

BugLink: http://bugs.launchpad.net/bugs/1756094
We should do an internal abort of the device before TMF_ABORT_TASK_SET and
TMF_LU_RESET, because we may have done an internal abort for only a single IO
in the earlier part of the SCSI EH process. Even for that single-IO internal
abort, we don't know whether it succeeded.

Besides, we should release the slots of the device in
hisi_sas_abort_task_set() if the abort is successful.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2a03813123c4beb0b60be6b3b65a6b30f7124579)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: judge result of internal abort
Xiaofei Tan [Fri, 8 Dec 2017 17:16:46 +0000 (01:16 +0800)]
scsi: hisi_sas: judge result of internal abort

BugLink: http://bugs.launchpad.net/bugs/1756094
Normally, hardware should ensure that an internal abort never times out. If
it does, that is an SoC failure. What's more, HW will not process any other
commands while an internal abort has not returned a CQ, so they will time out
as well.

So we should check the result of the internal abort in SCSI EH. If it failed,
we should give up on TMF/softreset and return failure to the upper layer
directly.

This patch does the following to achieve this:

1. When an internal abort times out, set the return value to -EIO in
   hisi_sas_internal_task_abort().

2. If prep_abort() is not supported, let hisi_sas_internal_task_abort() return
   TMF_RESP_FUNC_FAILED.

3. If hisi_sas_internal_task_abort() returns a negative number, it either did
   not execute properly or the internal abort timed out. In that case we skip
   the subsequent TMF or softreset and return failure directly.
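
The three steps can be modeled as follows. This is a sketch under stated
assumptions: struct mock_hw, the MOCK_* placeholder codes, and both
functions are hypothetical and only mirror the return-value convention
described in the list.

```c
#include <errno.h>
#include <stdbool.h>

/* Placeholder response codes for this sketch only. */
#define MOCK_TMF_RESP_FUNC_COMPLETE	0x00
#define MOCK_TMF_RESP_FUNC_FAILED	0x05

/* Hypothetical hardware description. */
struct mock_hw {
	bool has_prep_abort;	/* hw implements prep_abort() */
	bool abort_timed_out;	/* the internal abort never completed */
};

/* Models the return convention of the internal abort (steps 1 and 2). */
static int mock_internal_task_abort(const struct mock_hw *hw)
{
	if (!hw->has_prep_abort)
		return MOCK_TMF_RESP_FUNC_FAILED;
	if (hw->abort_timed_out)
		return -EIO;
	return MOCK_TMF_RESP_FUNC_COMPLETE;
}

/* Step 3: a negative result means EH must not continue with TMF/softreset. */
static bool may_continue_eh(int abort_rc)
{
	return abort_rc >= 0;
}
```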

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 813709f2e1e07fa872c05f43801a05828d33a70a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: do link reset for some CHL_INT2 ints
Xiaofei Tan [Fri, 8 Dec 2017 17:16:45 +0000 (01:16 +0800)]
scsi: hisi_sas: do link reset for some CHL_INT2 ints

BugLink: http://bugs.launchpad.net/bugs/1756094
We should do a link reset of the PHY on an identify timeout or an STP link
timeout. Both are internal events of the SoC and are reported to the driver
through CHL_INT2 interrupts.

Besides, we should add delayed work to do the link reset, as it needs to
sleep. So this patch adds a new PHY event, HISI_PHYE_LINK_RESET, for this.

Note: v2 HW doesn't report the STP link timeout event, so we only need to
handle the identify timeout event for v2 HW.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 057c3d1f07617049671a41bf05652d20071eb639)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: use an general way to delay PHY work
Xiaofei Tan [Fri, 8 Dec 2017 17:16:44 +0000 (01:16 +0800)]
scsi: hisi_sas: use an general way to delay PHY work

BugLink: http://bugs.launchpad.net/bugs/1756094
Use a general way to do delayed work for a PHY. This makes it easier to add
new delayed work for a PHY in the future.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e537b62b0796042e1ab66657c4dab662d19e9f0b)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add v2 hw port AXI error handling support
Xiaofei Tan [Fri, 8 Dec 2017 17:16:43 +0000 (01:16 +0800)]
scsi: hisi_sas: add v2 hw port AXI error handling support

BugLink: http://bugs.launchpad.net/bugs/1756094
Add port AXI error handling for v2 hw. We do a host controller reset for such
errors.

Besides, change the port multi-bit ECC error handling, as we should also do a
host reset for such errors. So this patch puts them in the same struct as the
port AXI errors.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 72f7fc3050d55e9877ecc56f33b7a434fca186f5)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: improve int_chnl_int_v2_hw() consistency with v3 hw
Xiaofei Tan [Fri, 8 Dec 2017 17:16:42 +0000 (01:16 +0800)]
scsi: hisi_sas: improve int_chnl_int_v2_hw() consistency with v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
Change the code format of int_chnl_int_v2_hw() to be consistent with v3 hw
and remove one level of indentation.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f64715d2837bee8fcd71f3e13acc7f02c9e9d98a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add some print to enhance debugging
Xiang Chen [Fri, 8 Dec 2017 17:16:41 +0000 (01:16 +0800)]
scsi: hisi_sas: add some print to enhance debugging

BugLink: http://bugs.launchpad.net/bugs/1756094
Add prints in several places, such as the error info and CQ of an exception
IO and device found, and also adjust some log levels.

All this is to assist debugging.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f1c88211454ff8063b358f9ebe250f0fe429319c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add RAS feature for v3 hw
Xiaofei Tan [Fri, 8 Dec 2017 17:16:40 +0000 (01:16 +0800)]
scsi: hisi_sas: add RAS feature for v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
We use PCIe AER to support the RAS feature for v3 hw.  The driver should do
the following two things to support this:

1. Enable RAS interrupts, so that errors can be reported to the RAS module.

2. Implement err_handler for sas_v3_pci_driver. Then if a non-fatal error is
   detected, print the error source and try to recover the SAS controller.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1aaf81e0e34988ff56b317b568f92fe6ca447da2)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: change ncq process for v3 hw
Xiang Chen [Fri, 8 Dec 2017 17:16:39 +0000 (01:16 +0800)]
scsi: hisi_sas: change ncq process for v3 hw

BugLink: http://bugs.launchpad.net/bugs/1756094
For v3 hw, each NCQ command returns a CQ, so there is no need to acquire the
IPTT from the ITCT; just acquire it from the IPTT field of the CQ.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9f347b2face51d782d1e03f2f05b7c3f93a6dc9a)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: add an mechanism to do reset work synchronously
Xiaofei Tan [Fri, 8 Dec 2017 17:16:38 +0000 (01:16 +0800)]
scsi: hisi_sas: add an mechanism to do reset work synchronously

BugLink: http://bugs.launchpad.net/bugs/1756094
Sometimes it is required to know when the controller reset has completed and
also whether it completed successfully.  In such places, we previously called
hisi_sas_controller_reset() directly. That could lead to multiple calls to
this function.

This patch creates a per-reset structure which contains a completion structure
and a status flag, to know when the reset completes and with what status. The
reset work itself is also done in hisi_hba.wq.

As all host reset work is done in hisi_hba.wq, we don't need to worry about
multiple calls to hisi_sas_controller_reset().

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e402acdb664134f948b62d13b7db866295689f38)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: modify hisi_sas_dev_gone() for reset
Xiang Chen [Fri, 8 Dec 2017 17:16:37 +0000 (01:16 +0800)]
scsi: hisi_sas: modify hisi_sas_dev_gone() for reset

BugLink: http://bugs.launchpad.net/bugs/1756094
Make a couple of changes for when HISI_SAS_RESET_BIT is set for the HBA:

 - Clearing ITCT is not necessary

 - Remove internal abort as it will fail during reset

Flag sas_dev->dev_type is kept as SAS_PHY_UNUSED.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f8e45ec226e2c00c1da9cf156ea59a159e9b4ea6)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: some optimizations of host controller reset
Xiaofei Tan [Fri, 8 Dec 2017 17:16:36 +0000 (01:16 +0800)]
scsi: hisi_sas: some optimizations of host controller reset

BugLink: http://bugs.launchpad.net/bugs/1756094
This patch makes the following optimizations to host controller reset:

1. Unblock SCSI requests before rescanning the topology, as SCSI commands need
   to be used if a new device is found during the rescan.

2. Remove drain_workqueue(hisi_hba->wq) and drain_workqueue(shost->work_q), as
   there is no need to ensure that all PHY events are done before exiting host
   reset.

3. Raise the print level of host reset messages. Host reset is an important
   and rare event. We should know its progress even when not debugging.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fb51e7a8d38484687337f16636c5be9528e00fed)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: optimise port id refresh function
Xiaofei Tan [Fri, 8 Dec 2017 17:16:35 +0000 (01:16 +0800)]
scsi: hisi_sas: optimise port id refresh function

BugLink: http://bugs.launchpad.net/bugs/1756094
Currently refreshing the PHY port id after reset is done in the rescan
topology function, which is quite late in the reset process. It could be moved
earlier in the process, as the port id can be refreshed once the PHYs become
ready.

In addition to this, we should set the hisi_sas_dev port id to 0xff (invalid
port id) if all PHYs of this port remain down for the same device.

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a669bdbf4939ac72eff6b3ae33f771a1ef28448c)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: relocate clearing ITCT and freeing device
Xiaofei Tan [Fri, 8 Dec 2017 17:16:34 +0000 (01:16 +0800)]
scsi: hisi_sas: relocate clearing ITCT and freeing device

BugLink: http://bugs.launchpad.net/bugs/1756094
In certain scenarios we may just want to clear the ITCT for a device, and not
free other resources like the SATA bitmap used in v2 hw.

To facilitate this, this patch relocates the code that clears the ITCT from
free_device() to a new hw interface, clear_itct().  Then for some hw, we need
not implement free_device() if there's nothing left for it to do.

[mkp: typo]

Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0258141aaab3007949ba0e67c3d28436354429bb)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: ata: enhance the definition of SET MAX feature field value
chenxiang [Thu, 28 Dec 2017 10:20:46 +0000 (18:20 +0800)]
scsi: ata: enhance the definition of SET MAX feature field value

BugLink: http://bugs.launchpad.net/bugs/1756094
There are two other values for the SET MAX feature field according to the ATA
protocol. So define them.

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit d5c15c2c22a8d4e0e82ca95eac5a6ccd175c0762)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: hisi_sas: fix dma_unmap_sg() parameter
Xiang Chen [Fri, 8 Dec 2017 17:16:33 +0000 (01:16 +0800)]
scsi: hisi_sas: fix dma_unmap_sg() parameter

BugLink: http://bugs.launchpad.net/bugs/1756094
For function dma_unmap_sg(), the <nents> parameter should be number of
elements in the scatterlist prior to the mapping, not after the mapping.

Fix this usage.
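
The distinction can be illustrated with a mock of the map/unmap pair. The
mock_* functions and struct mock_task below are hypothetical stand-ins, not
the DMA API; the mock mapping merges one entry purely to make the two
counts differ.

```c
/* Hypothetical mock: mapping may merge adjacent entries, so the count it
 * returns can be smaller than the count passed in. */
static int mock_dma_map_sg(int nents)
{
	return nents > 1 ? nents - 1 : nents;
}

/* Returns the count it was called with so callers can be checked. */
static int mock_dma_unmap_sg(int nents)
{
	return nents;
}

struct mock_task {
	int num_scatter;	/* entries in the scatterlist before mapping */
	int n_elem;		/* entries reported by the mapping */
};

static void mock_map(struct mock_task *t)
{
	t->n_elem = mock_dma_map_sg(t->num_scatter);
}

/* Unmap with the pre-mapping count, not the count returned by the map. */
static int mock_unmap(const struct mock_task *t)
{
	return mock_dma_unmap_sg(t->num_scatter);
}
```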

Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit dc1e4730e2b636065628f8427b675788bca83d34)
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoUBUNTU: [Config] set NOBP and expoline options for s390
Seth Forshee [Fri, 16 Mar 2018 15:28:34 +0000 (10:28 -0500)]
UBUNTU: [Config] set NOBP and expoline options for s390

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390/entry.S: fix spurious zeroing of r0
Christian Borntraeger [Mon, 5 Mar 2018 19:18:47 +0000 (19:18 +0000)]
s390/entry.S: fix spurious zeroing of r0

BugLink: http://bugs.launchpad.net/bugs/1754580
When a system call is interrupted we might call the critical section
cleanup handler that re-does some of the operations. When we are between
.Lsysc_vtime and .Lsysc_do_svc we might also redo the saving of the
problem state registers r0-r7:

.Lcleanup_system_call:
[...]
0:      # update accounting time stamp
        mvc     __LC_LAST_UPDATE_TIMER(8),__LC_SYNC_ENTER_TIMER
        # set up saved register r11
        lg      %r15,__LC_KERNEL_STACK
        la      %r9,STACK_FRAME_OVERHEAD(%r15)
        stg     %r9,24(%r11)            # r11 pt_regs pointer
        # fill pt_regs
        mvc     __PT_R8(64,%r9),__LC_SAVE_AREA_SYNC
--->    stmg    %r0,%r7,__PT_R0(%r9)

The problem is that by then we might have already zeroed out r0.
The fix is to move the zeroing of r0 after sysc_do_svc.

Reported-by: Farhan Ali <alifm@linux.vnet.ibm.com>
Fixes: 7041d28115e91 ("s390: scrub registers on kernel entry and KVM exit")
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d3f468963cd6fd6d2aa5e26aed8b24232096d0e1)

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: do not bypass BPENTER for interrupt system calls
Martin Schwidefsky [Thu, 22 Feb 2018 12:42:29 +0000 (13:42 +0100)]
s390: do not bypass BPENTER for interrupt system calls

BugLink: http://bugs.launchpad.net/bugs/1754580
The system call path can be interrupted before the switch back to the
standard branch prediction with BPENTER has been done. The critical
section cleanup code skips forward to .Lsysc_do_svc and bypasses the
BPENTER. In this case the kernel and all subsequent code will run with
the limited branch prediction.

Fixes: eacf67eb9b32 ("s390: run user space and KVM guests with modified branch prediction")
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d5feec04fe578c8dbd9e2e1439afc2f0af761ed4)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)
Eugeniu Rosca [Sat, 17 Feb 2018 23:10:29 +0000 (00:10 +0100)]
s390: Replace IS_ENABLED(EXPOLINE_*) with IS_ENABLED(CONFIG_EXPOLINE_*)

BugLink: http://bugs.launchpad.net/bugs/1754580
I've accidentally stumbled upon the IS_ENABLED(EXPOLINE_*) lines, which
obviously always evaluate to false. Fix this.
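
Why the unprefixed form is always false can be seen from a simplified re-creation of the preprocessor trick behind IS_ENABLED(). This is a sketch, not the kernel header: the helper macro names are hypothetical, only the built-in (=y) case is modelled, and CONFIG_EXPOLINE_FULL is defined by hand here purely for illustration.

```c
/* Expands to "0," only when pasted onto a symbol whose value is 1. */
#define ARG_PLACEHOLDER_1 0,
#define TAKE_SECOND_ARG(ignored, val, ...) val

/*
 * If 'option' is defined as 1, the placeholder inserts an extra leading
 * argument and the second argument becomes 1. Otherwise the pasted token
 * is junk that stays glued to the first argument, and the second is 0.
 */
#define IS_DEFINED_3(arg1_or_junk) TAKE_SECOND_ARG(arg1_or_junk 1, 0, 0)
#define IS_DEFINED_2(val) IS_DEFINED_3(ARG_PLACEHOLDER_##val)
#define IS_DEFINED_1(x) IS_DEFINED_2(x)
#define IS_ENABLED(option) IS_DEFINED_1(option)

/* What the build system would provide for an option set to y. */
#define CONFIG_EXPOLINE_FULL 1

/* Correct spelling: the macro exists, so this returns 1. */
int with_prefix(void)
{
	return IS_ENABLED(CONFIG_EXPOLINE_FULL);
}

/* Missing CONFIG_ prefix: no such macro exists, so this returns 0. */
int without_prefix(void)
{
	return IS_ENABLED(EXPOLINE_FULL);
}
```

The preprocessor raises no error for the unknown symbol, which is why a mistake of this kind compiles cleanly and is easy to miss.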

Fixes: f19fbd5ed642 ("s390: introduce execute-trampolines for branches")
Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 2cb370d615e9fbed9e95ed222c2c8f337181aa90)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: introduce execute-trampolines for branches
Martin Schwidefsky [Fri, 26 Jan 2018 11:46:47 +0000 (12:46 +0100)]
s390: introduce execute-trampolines for branches

BugLink: http://bugs.launchpad.net/bugs/1754580
Add CONFIG_EXPOLINE to enable the use of the new -mindirect-branch= and
-mfunction-return= compiler options to create a kernel fortified against
the spectre v2 attack.

With CONFIG_EXPOLINE=y all indirect branches will be issued with an
execute type instruction. For z10 or newer the EXRL instruction will
be used, for older machines the EX instruction. The typical indirect
call

basr %r14,%r1

is replaced with a PC relative call to a new thunk

brasl %r14,__s390x_indirect_jump_r1

The thunk contains the EXRL/EX instruction to the indirect branch

__s390x_indirect_jump_r1:
exrl 0,0f
j .
0: br %r1

The detour via the execute type instruction has a performance impact.
To get rid of the detour the new kernel parameters "nospectre_v2" and
"spectre_v2=[on,off,auto]" can be used. If the parameter is specified
the kernel and module code will be patched at runtime.

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit f19fbd5ed642dc31c809596412dab1ed56f2f156)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: run user space and KVM guests with modified branch prediction
Martin Schwidefsky [Tue, 16 Jan 2018 06:36:46 +0000 (07:36 +0100)]
s390: run user space and KVM guests with modified branch prediction

BugLink: http://bugs.launchpad.net/bugs/1754580
Define TIF_ISOLATE_BP and TIF_ISOLATE_BP_GUEST and add the necessary
plumbing in entry.S to be able to run user space and KVM guests with
limited branch prediction.

To switch a user space process to limited branch prediction the
s390_isolate_bp() function has to be called, and to run a vCPU of a KVM
guest associated with the current task with limited branch prediction
call s390_isolate_bp_guest().

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 6b73044b2b0081ee3dd1cd6eaab7dee552601efb)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: add options to change branch prediction behaviour for the kernel
Martin Schwidefsky [Tue, 16 Jan 2018 06:11:45 +0000 (07:11 +0100)]
s390: add options to change branch prediction behaviour for the kernel

BugLink: http://bugs.launchpad.net/bugs/1754580
Add the PPA instruction to the system entry and exit path to switch
the kernel to a different branch prediction behaviour. The instructions
are added via CPU alternatives and can be disabled with the "nospec"
or the "nobp=0" kernel parameter. If the default behaviour selected
with CONFIG_KERNEL_NOBP is set to "n" then the "nobp=1" parameter can be
used to enable the changed kernel branch prediction.

Acked-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit d768bd892fc8f066cd3aa000eb1867bcf32db0ee)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390/alternative: use a copy of the facility bit mask
Martin Schwidefsky [Tue, 16 Jan 2018 06:03:44 +0000 (07:03 +0100)]
s390/alternative: use a copy of the facility bit mask

BugLink: http://bugs.launchpad.net/bugs/1754580
To be able to switch off specific CPU alternatives with kernel parameters
make a copy of the facility bit mask provided by STFLE and use the copy
for the decision to apply an alternative.

Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit cf1489984641369611556bf00c48f945c77bcf02)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: add optimized array_index_mask_nospec
Martin Schwidefsky [Fri, 26 Jan 2018 11:01:55 +0000 (12:01 +0100)]
s390: add optimized array_index_mask_nospec

BugLink: http://bugs.launchpad.net/bugs/1754580
Add an optimized version of the array_index_mask_nospec function for
s390 based on a compare and a subtract with borrow.
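
For orientation, the contract of such a mask function can be sketched in portable C. This is not the s390 implementation, which uses a compare and a subtract with borrow in assembly. The formula is modelled on the generic branch-free variant and assumes index and size are both at most LONG_MAX and that right-shifting a negative long is arithmetic, as it is with GCC. The names `index_mask_nospec()` and `read_clamped()` are hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/*
 * Returns all ones when index < size and zero otherwise, without a
 * conditional branch in the source.
 */
unsigned long index_mask_nospec(unsigned long index, unsigned long size)
{
	long v = ~(long)(index | (size - 1UL - index));

	return (unsigned long)(v >> (sizeof(long) * CHAR_BIT - 1));
}

/*
 * Bounds-checked read: even if the branch is mispredicted, the masked
 * index is forced to 0 for out-of-range values.
 */
int read_clamped(const int *array, unsigned long size, unsigned long index)
{
	if (index >= size)
		return -1;

	index &= index_mask_nospec(index, size);
	return array[index];
}
```

The mask is applied after the bounds check so that a speculatively executed load cannot use an out-of-range index.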

Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit e2dd833389cc4069a96b57bdd24227b5f52288f5)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agos390: scrub registers on kernel entry and KVM exit
Martin Schwidefsky [Tue, 16 Jan 2018 12:27:30 +0000 (13:27 +0100)]
s390: scrub registers on kernel entry and KVM exit

BugLink: http://bugs.launchpad.net/bugs/1754580
Clear all user space registers on entry to the kernel and all KVM guest
registers on KVM guest exit if the register does not contain either a
parameter or a result value.

Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 7041d28115e91f2144f811ffe8a195c696b1e1d0)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoUBUNTU: SAUCE: s390/crypto: Fix kernel crash on aes_s390 module remove.
Harald Freudenberger [Thu, 1 Mar 2018 15:40:00 +0000 (16:40 +0100)]
UBUNTU: SAUCE: s390/crypto: Fix kernel crash on aes_s390 module remove.

BugLink: http://bugs.launchpad.net/bugs/1753424
A kernel crash occurs when the aes_s390 kernel module is
removed on machines < z14. This only happens on kernel
version 4.15 and higher on machines not supporting MSA 8.

The reason for the crash is an unconditional
crypto_unregister_aead() invocation where no previous
crypto_register_aead() had been called. The fix now
remembers if there has been a successful registration and
only then calls the unregister function upon kernel module
remove.
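
The pattern described above can be sketched in user space as follows. It is only an illustration: `mock_register()` and `mock_unregister()` are hypothetical stand-ins for the crypto API calls, and the `hw_has_gcm` parameter is an assumed way of modelling whether the machine supports the feature.

```c
/* Number of live registrations held by the mock crypto core. */
int registered_count;

/* Mock registration: always succeeds and bumps the counter. */
int mock_register(void)
{
	registered_count++;
	return 0;
}

/* Mock unregistration: would corrupt state if called without a register. */
void mock_unregister(void)
{
	registered_count--;
}

/* Remembers whether the init path really registered the algorithm. */
int aead_registered;

int module_init_sketch(int hw_has_gcm)
{
	if (hw_has_gcm) {
		int ret = mock_register();

		if (ret)
			return ret;
		aead_registered = 1;
	}
	return 0;
}

void module_exit_sketch(void)
{
	/* Unregister only what was successfully registered. */
	if (aead_registered) {
		mock_unregister();
		aead_registered = 0;
	}
}
```

On a machine without the feature, the exit path skips the unregister call entirely instead of undoing a registration that never happened.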

The code now crashing has been introduced with
"bf7fa03 s390/crypto: add s390 platform specific aes gcm support."

Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoUBUNTU: [Config] fix up retpoline abi files
Seth Forshee [Fri, 16 Mar 2018 12:47:51 +0000 (07:47 -0500)]
UBUNTU: [Config] fix up retpoline abi files

Differences fall into three categories:

 - Removed vulnerable sites.

 - Changes in pv_*_ops constant offsets.

 - Changes within an __init function.

All of these changes are safe.

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add missing unlock in WQ full logic
James Smart [Sat, 10 Mar 2018 18:28:48 +0000 (10:28 -0800)]
scsi: lpfc: Add missing unlock in WQ full logic

BugLink: http://bugs.launchpad.net/bugs/1752182
Commit 6e8e1c14c61e ("scsi: lpfc: Add WQ Full Logic for NVME Target") fails
the static checker. The checker correctly identified a missing unlock on a
return path.

Add the unlock.
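
The class of bug is easy to reproduce in miniature. The following user-space sketch is not the lpfc code: `struct mock_lock`, `struct wq` and `wq_post()` are hypothetical, and the lock is a plain counter so that a leaked lock is observable.

```c
/* Mock lock: 'held' counts acquisitions that were never released. */
struct mock_lock {
	int held;
};

void mock_lock_acquire(struct mock_lock *l)
{
	l->held++;
}

void mock_lock_release(struct mock_lock *l)
{
	l->held--;
}

/* Hypothetical work queue with a fixed capacity. */
struct wq {
	struct mock_lock lock;
	int used;
	int capacity;
};

/* Returns 0 when an entry was posted, -1 when the queue is full. */
int wq_post(struct wq *q)
{
	mock_lock_acquire(&q->lock);

	if (q->used >= q->capacity) {
		/* The early return must drop the lock as well. */
		mock_lock_release(&q->lock);
		return -1;
	}

	q->used++;
	mock_lock_release(&q->lock);
	return 0;
}
```

Without the release on the queue-full path, the next caller would block or deadlock on a lock nobody is going to drop.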

Fixes: 6e8e1c14c61e ("scsi: lpfc: Add WQ Full Logic for NVME Target")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 917d59ac5e26a45dce8b00d38bb0a338a7f22f23 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: use __raw_writeX on DPP copies
James Smart [Mon, 5 Mar 2018 18:29:03 +0000 (10:29 -0800)]
scsi: lpfc: use __raw_writeX on DPP copies

BugLink: http://bugs.launchpad.net/bugs/1752182
Commit 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
fails compilation on some 32-bit systems as writeq() is not supported on
all architectures. Additionally, it was pointed out that as writeX()
does byteswapping if necessary for pci vs the cpu endianness, the code
was broken on BE PPC.

After discussions with Arnd Bergmann, we've resolved the issue
to the following:
  Instead of writeX(), use __raw_writeX() - which writes to io
    space while preserving byte order. To use this, the code
    was changed to use a different buffer that lpfc prepped
    via sli_pcimem_bcopy() that was set to the bytestream to
    be written.
  On platforms with __raw_writeq support, use the routine, otherwise
    use __raw_writel()
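
The two points above can be sketched in user space roughly as follows. This is an illustration under stated assumptions, not the driver's routine: `push_bytestream()` is a hypothetical name, ordinary memory stands in for the io aperture, and the UINTPTR_MAX test is only a stand-in for "the platform provides a native 64-bit raw write".

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy a pre-built byte stream to a (mock) aperture with no byte
 * swapping. len must be a multiple of 8 and the aperture must be
 * aligned for 64-bit stores.
 */
void push_bytestream(volatile void *aperture, const void *src, size_t len)
{
	const unsigned char *s = src;
	size_t i;

#if UINTPTR_MAX > 0xffffffffu
	/* 64-bit stores, analogous to using the 64-bit raw write. */
	volatile uint64_t *dst = aperture;

	for (i = 0; i < len / sizeof(uint64_t); i++) {
		uint64_t v;

		memcpy(&v, s + i * sizeof(v), sizeof(v));
		dst[i] = v;
	}
#else
	/* Fallback: twice as many 32-bit stores, same bytes on the wire. */
	volatile uint32_t *dst = aperture;

	for (i = 0; i < len / sizeof(uint32_t); i++) {
		uint32_t v;

		memcpy(&v, s + i * sizeof(v), sizeof(v));
		dst[i] = v;
	}
#endif
}
```

Because the values are stored exactly as they sit in the source buffer, the destination ends up byte-for-byte identical to the prepared stream on both little-endian and big-endian hosts.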

[mkp: checkpatch]

Fixes: 1351e69fc6db ("scsi: lpfc: Add push-to-adapter support to sli4")
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4c06619fc4da5b7aae76f1dde25bfea3246f2591 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Change Copyright of 12.0.0.0 modified files to 2018
James Smart [Thu, 22 Feb 2018 16:18:52 +0000 (08:18 -0800)]
scsi: lpfc: Change Copyright of 12.0.0.0 modified files to 2018

BugLink: http://bugs.launchpad.net/bugs/1752182
Updated Copyright in files updated as part of 12.0.0.0

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cf8037f8d08a078d263a9b725e3ae7603ad0d42e linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: update driver version to 12.0.0.0
James Smart [Thu, 22 Feb 2018 16:18:51 +0000 (08:18 -0800)]
scsi: lpfc: update driver version to 12.0.0.0

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 12.0.0.0

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6efb23804153fac93bc49ef6e76707f35fbe2163 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Work around NVME cmd iu SGL type
James Smart [Thu, 22 Feb 2018 16:18:50 +0000 (08:18 -0800)]
scsi: lpfc: Work around NVME cmd iu SGL type

BugLink: http://bugs.launchpad.net/bugs/1752182
The hardware offload for NVME commands was created when the
FC-NVME standard was setting SGL Descriptor Type to SGL Data
Block Descriptor (0h) and SGL Descriptor Sub Type to Address (0h).

A late change in NVMe-over-Fabrics obsoleted these values, creating
a transport SGL descriptor type with new values to go into these
fields.

For initial hardware support, in order to be compliant to the spec,
use host-supplied cmd IU buffers instead of the adapter generated
values. Later hardware will correct this.

Add a module parameter to override this offload disablement if looking
for lowest latency. This is reasonable as nothing in FC-NVME uses
the SQE SGL values.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4e565cf04138fca6ffeb884044febf922b2306d0 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix nvme embedded io length on new hardware
James Smart [Thu, 22 Feb 2018 16:18:49 +0000 (08:18 -0800)]
scsi: lpfc: Fix nvme embedded io length on new hardware

BugLink: http://bugs.launchpad.net/bugs/1752182
Newer hardware more strictly enforces buffer lengths, causing a
mis-set value to be identified. Older hardware won't catch it.
The difference is benign on old hardware.

Set the right embedded buffer length for nvme ios.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 63452e144662a90b77fcdb27bd33c8b43655b850 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add embedded data pointers for enhanced performance
James Smart [Thu, 22 Feb 2018 16:18:48 +0000 (08:18 -0800)]
scsi: lpfc: Add embedded data pointers for enhanced performance

BugLink: http://bugs.launchpad.net/bugs/1752182
The current driver isn't taking advantage of a performance hint whereby
the initial data buffer descriptor can be placed in the WQE as well as
the SGL.

Add the logic to detect support for the feature and to use it when
supported.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 0bc2b7c5317bd51df571e9d1131547901215f6c9 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Enable fw download on if_type=6 devices
James Smart [Thu, 22 Feb 2018 16:18:47 +0000 (08:18 -0800)]
scsi: lpfc: Enable fw download on if_type=6 devices

BugLink: http://bugs.launchpad.net/bugs/1752182
Current code is very explicit in what it allows to be downloaded.
The driver checking prevented G7 firmware download. The driver
checking is unnecessary as the device will validate what it receives.

Revise the firmware download interface checking.
Added a little debug support in case there is still a failure.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1feb8204a12ed7987bffa75311754edc1367680f linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add if_type=6 support for cycling valid bits
James Smart [Thu, 22 Feb 2018 16:18:46 +0000 (08:18 -0800)]
scsi: lpfc: Add if_type=6 support for cycling valid bits

BugLink: http://bugs.launchpad.net/bugs/1752182
Traditional SLI4 required the driver to clear Valid bits on
EQEs and CQEs after consuming them.

The new if_type=6 hardware will cycle the value for what is
valid on each queue iteration. The driver no longer has to
touch the valid bits. This also eliminates the cpu cache
dirtying, and perhaps the flush/refills done by the hardware,
when accessing the EQ/CQ elements.
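
A cycling valid bit of this kind can be sketched in user space as follows. The layout and names (`struct entry`, `struct queue`, `queue_consume()`, `hw_post()`) are hypothetical and do not reflect the real EQE/CQE format. The sketch only shows the idea that the consumer tracks which value currently means "valid" and flips it on each wrap, so it never writes to the entries.

```c
#include <stddef.h>
#include <string.h>

#define QUEUE_DEPTH 4

/* Hypothetical queue entry: one valid flag plus a payload word. */
struct entry {
	unsigned int valid;
	unsigned int payload;
};

struct queue {
	struct entry e[QUEUE_DEPTH];
	size_t head;               /* next slot the consumer looks at */
	unsigned int expect_valid; /* value that means "new" on this pass */
};

void queue_init(struct queue *q)
{
	memset(q, 0, sizeof(*q));
	q->expect_valid = 1; /* entries start at 0, so 1 marks new ones */
}

/* Stand-in for the hardware writing an entry into host memory. */
void hw_post(struct queue *q, size_t slot, unsigned int valid,
	     unsigned int payload)
{
	q->e[slot].payload = payload;
	q->e[slot].valid = valid;
}

/* Returns 1 and stores the payload if a new entry is present, else 0. */
int queue_consume(struct queue *q, unsigned int *payload)
{
	const struct entry *ent = &q->e[q->head];

	if (ent->valid != q->expect_valid)
		return 0;

	*payload = ent->payload;

	/* The entry is left untouched; only consumer state changes. */
	if (++q->head == QUEUE_DEPTH) {
		q->head = 0;
		q->expect_valid ^= 1;
	}
	return 1;
}
```

After one full pass the stale entries still carry the old valid value, but the consumer now expects the opposite value, so they are ignored until the producer overwrites them.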

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 7365f6fdbba559f7e814519fafe6e4956f68b6be linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add 64G link speed support
James Smart [Thu, 22 Feb 2018 16:18:45 +0000 (08:18 -0800)]
scsi: lpfc: Add 64G link speed support

BugLink: http://bugs.launchpad.net/bugs/1752182
The G7 adapter supports 64G link speeds. Add support to the driver.

In addition, a small cleanup to replace the odd bitmap logic with
a switch case.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fbd8a6ba65443a8a79183edd9c2e1ad302339063 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add PCI Ids for if_type=6 hardware
James Smart [Thu, 22 Feb 2018 16:18:44 +0000 (08:18 -0800)]
scsi: lpfc: Add PCI Ids for if_type=6 hardware

BugLink: http://bugs.launchpad.net/bugs/1752182
Add PCI ids for the new G7 adapter

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c238b9b6eae399e81d36382b09c2e969c154b7ee linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add push-to-adapter support to sli4
James Smart [Thu, 22 Feb 2018 16:18:43 +0000 (08:18 -0800)]
scsi: lpfc: Add push-to-adapter support to sli4

BugLink: http://bugs.launchpad.net/bugs/1752182
New if_type=6 adapters support an additional BAR that provides
apertures to allow direct WQE to adapter push support - termed
Direct Packet Push (DPP). WQ creation differs slightly to ask for
a WQ to be DPP-ized. When submitting a WQE to a DPP WQ, it is
submitted to the host memory for the WQ normally, but is also
written by the host cpu directly to a BAR aperture.  Write buffer
coalescing in hardware is (hopefully) turned on, enabling single
pci write operation support. The doorbell is then rung to indicate
the WQE is available and was pushed to the aperture.

This patch:
- Updates the WQ Create commands for the DPP options
- Adds the bar mapping for if_type=6 DPP bar
- Adds the WQE pushing to the DPP aperture received from WQ create
- Adds a new module parameter to disable DPP operation if desired.
  Default is enabled.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 1351e69fc6db30e186295f1c9495d03cef6a01a2 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Add SLI-4 if_type=6 support to the code base
James Smart [Thu, 22 Feb 2018 16:18:42 +0000 (08:18 -0800)]
scsi: lpfc: Add SLI-4 if_type=6 support to the code base

BugLink: http://bugs.launchpad.net/bugs/1752182
New hardware supports a SLI-4 interface, but with a new if_type
variant of 6.

If_type=6 has a different PCI BAR map, separate EQ/CQ doorbells,
and some changes in doorbell formats.

Add the changes for the if_type into headers, adapter initialization
and control flows. Add new eq and cq handlers.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 27d6ac0a6e830043bd5db89fee8adddb41ada2f7 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Rework sli4 doorbell infrastructure
James Smart [Thu, 22 Feb 2018 16:18:41 +0000 (08:18 -0800)]
scsi: lpfc: Rework sli4 doorbell infrastructure

BugLink: http://bugs.launchpad.net/bugs/1752182
Up until now, all SLI-4 devices had the same doorbells at the same
bar locations. With newer hardware, there are now independent EQ and
CQ doorbells and the bar locations differ.

Prepare the code for new hardware by separating the eq/cq doorbell into
separate components. The components can be set based on if_type.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9dd35425a50c667ae2b6c2cda201425ed2d3fd25 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Rework lpfc to allow different sli4 cq and eq handlers
James Smart [Thu, 22 Feb 2018 16:18:40 +0000 (08:18 -0800)]
scsi: lpfc: Rework lpfc to allow different sli4 cq and eq handlers

BugLink: http://bugs.launchpad.net/bugs/1752182
Up until now, an SLI-4 device had no variance in the way it handled
its EQs and CQs. With newer hardware, there are now differences in
doorbells and some differences in how entries are valid.

Prepare the code for new hardware by creating a sli4-based callout
table that can be set based on if_type.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b71413dd01bbf302236cfb61df44702ea838dd75 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright
James Smart [Tue, 30 Jan 2018 23:59:03 +0000 (15:59 -0800)]
scsi: lpfc: Update 11.4.0.7 modified files for 2018 Copyright

BugLink: http://bugs.launchpad.net/bugs/1752182
Updated Copyright in files updated as part of 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 128bddacc4dd7c86070e1e0534687e3083a89d52 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: update driver version to 11.4.0.7
James Smart [Tue, 30 Jan 2018 23:59:02 +0000 (15:59 -0800)]
scsi: lpfc: update driver version to 11.4.0.7

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 11.4.0.7

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e9d2f1667ea12bd2f997a7529fb41cce8e0036d linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix nonrecovery of NVME controller after cable swap.
James Smart [Tue, 30 Jan 2018 23:59:01 +0000 (15:59 -0800)]
scsi: lpfc: Fix nonrecovery of NVME controller after cable swap.

BugLink: http://bugs.launchpad.net/bugs/1752182
In a test that is doing large numbers of cable swaps on the target, the
nvme controllers wouldn't reconnect.

During the cable swaps, the target's n_port_id would change. This
information was passed to the nvme-fc transport, in the new remoteport
registration. However, the nvme-fc transport didn't update the n_port_id
value in the remoteport struct when it reused an existing structure.
Later, when a new association was attempted on the remoteport, the
driver's NVME LS routine would use the stale n_port_id from the
remoteport struct to address the LS. As the device is no longer at that
address, the LS would go into never never land.

Separately, the nvme-fc transport will be corrected to update the
n_port_id value on a re-registration.

However, for now, there's no reason to use the transport's values.  The
private pointer points to the driver's node structure and the node
structure is up to date. Therefore, revise the LS routine to use the
driver's data structures for the LS. Augmented the debug message for
better debugging in the future.

Also removed a duplicate if check that seems to have slipped in.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 815a9c437617e221842d12b3366ff6911b3df628 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Treat SCSI Write operation Underruns as an error
James Smart [Tue, 30 Jan 2018 23:59:00 +0000 (15:59 -0800)]
scsi: lpfc: Treat SCSI Write operation Underruns as an error

BugLink: http://bugs.launchpad.net/bugs/1752182
Currently, write underruns (mismatch of amount transferred vs scsi
status and its residual) detected by the adapter are not being flagged
as an error. It's expected the target controls the data transfer and
would appropriately set the RSP values.  Only read underruns are treated
as errors.

Revise the SCSI error handling to treat write underruns as an error as
well.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 45634a86ca6e98dbcaddb763f8e90ad243057789 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix header inclusion in lpfc_nvmet
James Smart [Tue, 30 Jan 2018 23:58:59 +0000 (15:58 -0800)]
scsi: lpfc: Fix header inclusion in lpfc_nvmet

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver was inappropriately pulling in the nvme host's nvme.h
header. What it really needed was the standard <linux/nvme.h> header.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8d731d1aa993c44fcf4de0dbd42059e00cf37102 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Validate adapter support for SRIU option
James Smart [Tue, 30 Jan 2018 23:58:58 +0000 (15:58 -0800)]
scsi: lpfc: Validate adapter support for SRIU option

BugLink: http://bugs.launchpad.net/bugs/1752182
When using the special option to suppress the response iu, ensure the
adapter fully supports the feature by checking feature flags from the
adapter and validating the support when formatting the WQE.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 20aefac3a9a23b56db43f1fe1b3ae72c87e39137 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix SCSI io host reset causing kernel crash
James Smart [Tue, 30 Jan 2018 23:58:57 +0000 (15:58 -0800)]
scsi: lpfc: Fix SCSI io host reset causing kernel crash

BugLink: http://bugs.launchpad.net/bugs/1752182
During SCSI error handling escalation to host reset, the SCSI io
routines were moved off the txcmplq, but the individual io's ON_CMPLQ
flag wasn't cleared.  Thus, a background thread saw the io and attempted
to access it as if on the txcmplq.

Clear the flag upon removal.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c1dd9111b7f78a90bccd2e4abb9b9bb6319a4c64 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Indicate CONF support in NVMe PRLI
James Smart [Tue, 30 Jan 2018 23:58:56 +0000 (15:58 -0800)]
scsi: lpfc: Indicate CONF support in NVMe PRLI

BugLink: http://bugs.launchpad.net/bugs/1752182
Revise the NVME PRLI to indicate CONF support.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a5ff06817eb86d022bc11993850a42732d7e6979 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix issue_lip if link is disabled
James Smart [Tue, 30 Jan 2018 23:58:55 +0000 (15:58 -0800)]
scsi: lpfc: Fix issue_lip if link is disabled

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver ignored checks on whether the link should be kept
administratively down after a link bounce. Correct the checks.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2289e9598dde9705400559ca2606fb8c145c34f0 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing
James Smart [Tue, 30 Jan 2018 23:58:54 +0000 (15:58 -0800)]
scsi: lpfc: Fix soft lockup in lpfc worker thread during LIP testing

BugLink: http://bugs.launchpad.net/bugs/1752182
During link bounce testing in a point-to-point topology, the host may
enter a soft lockup on the lpfc_worker thread:

    Call Trace:
     lpfc_work_done+0x1f3/0x1390 [lpfc]
     lpfc_do_work+0x16f/0x180 [lpfc]
     kthread+0xc7/0xe0
     ret_from_fork+0x3f/0x70

The driver was simultaneously setting a combination of flags that caused
lpfc_do_work() to effectively spin between slow path work and new event
data, causing the lockup.

Ensure in the typical wq completions, that new event data flags are set
if the slow path flag is running. The slow path will eventually
reschedule the wq handling.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 161df4f09987ae2e9f0f97f0b38eee298b4a39ff linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: lpfc: Allow set of maximum outstanding SCSI cmd limit for a target
James Smart [Tue, 30 Jan 2018 23:58:53 +0000 (15:58 -0800)]
scsi: lpfc: Allow set of maximum outstanding SCSI cmd limit for a target

BugLink: http://bugs.launchpad.net/bugs/1752182
Make the attribute writeable.

Remove the ramp-up logic, as it's unnecessary; simply set the depth.  Add
a debug message if the depth changed, possibly reducing the limit, while
our outstanding count has yet to catch up with it.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 64bf009933bc84a7fb44ff50f86af0201b8be0c3 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix RQ empty firmware trap
James Smart [Tue, 30 Jan 2018 23:58:52 +0000 (15:58 -0800)]
scsi: lpfc: Fix RQ empty firmware trap

BugLink: http://bugs.launchpad.net/bugs/1752182
When nvme target deferred receive logic waits for exchange resources,
the corresponding receive buffer is not replenished with the hardware.
This can result in a lack of asynchronous receive buffer resources in
the hardware, resulting in a "2885 Port Status Event: ... error
1=0x52004a01 ..." message.

Correct by replenishing the buffer whenever the deferred logic kicks
in.  Update corresponding debug messages and statistics as well.

[mkp: applied by hand]

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 411de511c6943554cdc4173c3f522029db2f75c7 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix IO failure during hba reset testing with nvme io.
James Smart [Tue, 30 Jan 2018 23:58:51 +0000 (15:58 -0800)]
scsi: lpfc: Fix IO failure during hba reset testing with nvme io.

BugLink: http://bugs.launchpad.net/bugs/1752182
A stress test repeatedly resetting the adapter while performing io would
eventually report I/O failures and missing nvme namespaces.

The driver was setting the nvmefc_fcp_req->private pointer to NULL
during the IO completion routine before upcalling done().  If the
transport was also running an abort for that IO, the driver would fail
the abort with message 6140. Failing the abort is not allowed by the
nvme-fc transport, as it mandates that the io must be returned back to
the transport. As that does not happen, the transport controller delete
has an outstanding reference and can't complete teardown.

The NULL-ing of the private pointer should be done only when the io is
considered complete. It's complete when the adapter returns the exchange
with the "exchange busy" flag clear.

Move the NULL'ing of the structure to the done case. This leaves the io
contexts set while it is busy and until the subsequent XRI_ABORTED
completion which returns the exchange is received.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 91455b850956bc13708a074bd1400f54aae74890 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix PRLI handling when topology type changes
James Smart [Tue, 30 Jan 2018 23:58:50 +0000 (15:58 -0800)]
scsi: lpfc: Fix PRLI handling when topology type changes

BugLink: http://bugs.launchpad.net/bugs/1752182
The lpfc driver does not discover a target when the topology changes
from switched-fabric to direct-connect. The target rejects the PRLI from
the initiator in direct-connect as the driver is using the old S_ID from
the switched topology.

The driver was inappropriately clearing the VP bit to register the VPI,
which is what is associated with the S_ID.

Fix by leaving the VP bit set (it was set earlier) and as the VFI is
being re-registered, set the UPDT bit.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2c3b2a8f652566c5b35d945f0c8146555d2062ec linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Add WQ Full Logic for NVME Target
James Smart [Tue, 30 Jan 2018 23:58:49 +0000 (15:58 -0800)]
scsi: lpfc: Add WQ Full Logic for NVME Target

BugLink: http://bugs.launchpad.net/bugs/1752182
I/O conditions on the nvme target may have the driver submitting to a
full hardware wq. The hardware wq is a shared resource among all nvme
controllers. When the driver hit a full wq, it failed the io posting
back to the nvme-fc transport, which then escalated it into errors.

Correct by maintaining a sideband queue within the driver that is added
to when the WQ full condition is hit, and drained from as soon as new WQ
space opens up.
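
The sideband-queue idea described above can be modeled in a few lines.
This is only an illustrative sketch, not the driver's code: the class
name, the `capacity` parameter, and the method names are hypothetical,
and the hardware WQ is reduced to a simple in-flight counter.

```python
from collections import deque


class WorkQueueModel:
    """Toy model of a bounded hardware WQ with a driver-side overflow queue."""

    def __init__(self, capacity):
        self.capacity = capacity   # hypothetical number of hardware WQ slots
        self.in_flight = 0         # entries currently occupying the WQ
        self.pending = deque()     # sideband queue used while the WQ is full
        self.posted = []           # order in which io reached the hardware

    def _post(self, io):
        self.in_flight += 1
        self.posted.append(io)

    def submit(self, io):
        # Post directly only if there is room and nothing is already waiting,
        # so deferred io keeps its original ordering.
        if self.in_flight < self.capacity and not self.pending:
            self._post(io)
            return "posted"
        self.pending.append(io)
        return "deferred"

    def complete(self):
        # A completion frees one slot; drain the sideband queue into it.
        self.in_flight -= 1
        while self.pending and self.in_flight < self.capacity:
            self._post(self.pending.popleft())
```

In this model a submission that finds the WQ full is parked instead of
being failed back to the caller, and it is posted as soon as a
completion frees a slot.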

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 6e8e1c14c61e54253098521127cd5ac0b959dd32 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: correct debug counters for abort
James Smart [Tue, 30 Jan 2018 23:58:48 +0000 (15:58 -0800)]
scsi: lpfc: correct debug counters for abort

BugLink: http://bugs.launchpad.net/bugs/1752182
Existing code was using the wrong field for the completion status when
comparing whether to increment abort statistics.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8ae337013674d5c1e803429356b85cba2ce12067 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: move placement of target destroy on driver detach
James Smart [Tue, 30 Jan 2018 23:58:47 +0000 (15:58 -0800)]
scsi: lpfc: move placement of target destroy on driver detach

BugLink: http://bugs.launchpad.net/bugs/1752182
Ensure nvme localports/targetports are torn down before dismantling the
adapter sli interface on driver detachment.  This aids leaving
interfaces live while nvme may be making callbacks to abort it.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 281d61902ffbab47901f8616a38a45144627dd9e linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Increase CQ and WQ sizes for SCSI
James Smart [Tue, 30 Jan 2018 23:58:46 +0000 (15:58 -0800)]
scsi: lpfc: Increase CQ and WQ sizes for SCSI

BugLink: http://bugs.launchpad.net/bugs/1752182
Increased CQ and WQ sizes for SCSI FCP, matching those used for NVMe
development.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c176ffa0841c632593c5007f1d1c9ed126481daa linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix frequency of Release WQE CQEs
James Smart [Tue, 30 Jan 2018 23:58:45 +0000 (15:58 -0800)]
scsi: lpfc: Fix frequency of Release WQE CQEs

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver controls when the hardware sends completions that communicate
consumption of elements from the WQ. This is done by setting a WQEC bit
on a WQE.

The current driver sets it on every Nth WQE posting. However, the driver
isn't clearing the bit if the WQE is reused. Thus, if the queue depth
isn't evenly divisible by N, with enough time, it can be set on every
element, creating a lot of overhead and risking CQ full conditions.

Correct by clearing the bit when not setting it on an Nth element.
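
The effect of the stale bit can be reproduced with a small simulation.
This is a sketch under stated assumptions, not the driver's logic: the
function name is hypothetical, and the depth of 10 and N of 3 used in
the note below are illustrative values, not the adapter's real ones.

```python
def simulate_wqec(depth, n, postings, clear_when_not_nth):
    """Post `postings` WQEs into a circular queue of `depth` slots.

    The WQEC flag is set on every n-th posting. When `clear_when_not_nth`
    is False, a reused slot keeps whatever flag it had before, which is
    the behavior the commit message describes as the bug.
    Returns the per-slot WQEC flags after the last posting.
    """
    wqec = [False] * depth
    for i in range(postings):
        slot = i % depth            # slots are reused as the queue wraps
        if (i + 1) % n == 0:
            wqec[slot] = True       # request a release completion here
        elif clear_when_not_nth:
            wqec[slot] = False      # the fix: drop any stale flag
    return wqec
```

With a depth of 10 and N of 3, thirty postings leave the flag set on all
10 slots when stale flags are kept, but on only 4 slots when the flag is
cleared on non-Nth postings.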

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 04673e38f56b30cd39b1fa0f386137d818b17781 linux-next)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago treewide: Use DEVICE_ATTR_WO
Joe Perches [Tue, 19 Dec 2017 18:15:09 +0000 (10:15 -0800)]
treewide: Use DEVICE_ATTR_WO

BugLink: http://bugs.launchpad.net/bugs/1752182
Convert DEVICE_ATTR uses to DEVICE_ATTR_WO where possible.

Done with perl script:

$ git grep -w --name-only DEVICE_ATTR | \
  xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(?:\s*S_IWUSR\s*|\s*0200\s*)\)?\s*,\s*NULL\s*,\s*\1_store\s*\)/DEVICE_ATTR_WO(\1)/g; print;}'

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6cbaefb4bf2ce6746e49c972289702133b347ffa)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago treewide: Use DEVICE_ATTR_RO
Joe Perches [Tue, 19 Dec 2017 18:15:08 +0000 (10:15 -0800)]
treewide: Use DEVICE_ATTR_RO

BugLink: http://bugs.launchpad.net/bugs/1752182
Convert DEVICE_ATTR uses to DEVICE_ATTR_RO where possible.

Done with perl script:

$ git grep -w --name-only DEVICE_ATTR | \
  xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(?:\s*S_IRUGO\s*|\s*0444\s*)\)?\s*,\s*\1_show\s*,\s*NULL\s*\)/DEVICE_ATTR_RO(\1)/g; print;}'
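
To see what the substitution above matches, the same regular expression
can be tried on a single declaration. The sketch below ports the pattern
to Python's `re` module for illustration only; the helper name and the
sample declarations are hypothetical and are not taken from the kernel
tree.

```python
import re

# Same pattern as the Perl one-liner: the attribute name captured by group 1
# must reappear as "<name>_show", the mode must be S_IRUGO or 0444, and the
# store callback must be NULL.
RO_PATTERN = re.compile(
    r"\bDEVICE_ATTR\s*\(\s*(\w+)\s*,"
    r"\s*\(?(?:\s*S_IRUGO\s*|\s*0444\s*)\)?\s*,"
    r"\s*\1_show\s*,\s*NULL\s*\)"
)


def convert_ro(source):
    """Rewrite matching DEVICE_ATTR() uses to DEVICE_ATTR_RO()."""
    return RO_PATTERN.sub(r"DEVICE_ATTR_RO(\1)", source)
```

A declaration such as `static DEVICE_ATTR(foo, S_IRUGO, foo_show, NULL);`
is rewritten to `static DEVICE_ATTR_RO(foo);`, while one whose show
function is not named after the attribute is left untouched because the
`\1` backreference fails to match.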

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Harald Freudenberger <freude@linux.vnet.ibm.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Acked-by: Corey Minyard <cminyard@mvista.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c828a8920307185b7194b575731e8387c99a5a67)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago treewide: Use DEVICE_ATTR_RW
Joe Perches [Tue, 19 Dec 2017 18:15:07 +0000 (10:15 -0800)]
treewide: Use DEVICE_ATTR_RW

BugLink: http://bugs.launchpad.net/bugs/1752182
Convert DEVICE_ATTR uses to DEVICE_ATTR_RW where possible.

Done with perl script:

$ git grep -w --name-only DEVICE_ATTR | \
  xargs perl -i -e 'local $/; while (<>) { s/\bDEVICE_ATTR\s*\(\s*(\w+)\s*,\s*\(?(\s*S_IRUGO\s*\|\s*S_IWUSR|\s*S_IWUSR\s*\|\s*S_IRUGO\s*|\s*0644\s*)\)?\s*,\s*\1_show\s*,\s*\1_store\s*\)/DEVICE_ATTR_RW(\1)/g; print;}'

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Felipe Balbi <felipe.balbi@linux.intel.com>
Acked-by: Andy Shevchenko <andy.shevchenko@gmail.com>
Acked-by: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Jarkko Nikula <jarkko.nikula@bitmer.com>
Acked-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b6b996b6cdeecf7e1646c87422e04e446ddce124)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: fix a couple of minor indentation issues
Colin Ian King [Fri, 22 Dec 2017 00:39:36 +0000 (00:39 +0000)]
scsi: lpfc: fix a couple of minor indentation issues

BugLink: http://bugs.launchpad.net/bugs/1752182
Several statements are indented too far; fix these.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 8fd03fd17ff903abf91583344aaea2043cbccdad)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: don't dereference localport before it has been null checked
Colin Ian King [Fri, 22 Dec 2017 00:28:52 +0000 (00:28 +0000)]
scsi: lpfc: don't dereference localport before it has been null checked

BugLink: http://bugs.launchpad.net/bugs/1752182
localport is being dereferenced to assign lport and then immediately
afterwards localport is being sanity checked to see if it is null.  Fix
this by dereferencing localport only after it has been null checked.

Detected by CoverityScan, CID#1463038 ("Dereference before null check")

Fixes: 3a8cefbfc5ee ("scsi: lpfc: Beef up stat counters for debug")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 5c665aeb65aa066775763e59110ba4f5b5917bb6)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: scsi_transport_fc: fix typos on 64/128 GBit define names
James Smart [Thu, 21 Dec 2017 22:25:52 +0000 (14:25 -0800)]
scsi: scsi_transport_fc: fix typos on 64/128 GBit define names

BugLink: http://bugs.launchpad.net/bugs/1752182
The define names specified 64Bit/128Bit, not 64GBIT/128GBIT.  Correct
the names.

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cc019a5a3b58670efe765f19aec42e28c16d7aed)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: correct sg_seg_cnt attribute min vs default
James Smart [Tue, 19 Dec 2017 18:57:50 +0000 (10:57 -0800)]
scsi: lpfc: correct sg_seg_cnt attribute min vs default

BugLink: http://bugs.launchpad.net/bugs/1752182
Prior patch mixed up what argument in the macro was what, so min value
was placed as the "default" argument, and the default value was placed
as the "min" argument. Thus, when the default was applied, it looked
like the default was smaller than the allowed min.

Swap argument positions to correct.

[mkp: fixed checkpatch warning]

Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b996ce39960e6239d3d30745749b0b17239cadce)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: update driver version to 11.4.0.6
James Smart [Sat, 9 Dec 2017 01:18:11 +0000 (17:18 -0800)]
scsi: lpfc: update driver version to 11.4.0.6

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 11.4.0.6

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2f7005debea691ee83b575ed089eba80081c8bc3)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Beef up stat counters for debug
James Smart [Sat, 9 Dec 2017 01:18:10 +0000 (17:18 -0800)]
scsi: lpfc: Beef up stat counters for debug

BugLink: http://bugs.launchpad.net/bugs/1752182
If log verbose is not turned on, it's hard to tell when certain error
paths get hit. Add stats counters and corresponding logic to
debugfs/sysfs to aid understanding what paths were traversed.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 4b056682d8812af30c6e6022f653b75abe2f26c7)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix infinite wait when driver unregisters a remote NVME port.
James Smart [Sat, 9 Dec 2017 01:18:09 +0000 (17:18 -0800)]
scsi: lpfc: Fix infinite wait when driver unregisters a remote NVME port.

BugLink: http://bugs.launchpad.net/bugs/1752182
When unregistering a remote port the lpfc driver would eventually wait
for the remoteport_unreg done callback. But the driver never completed
the io aborts that would allow the connections to terminate, so the
unreg done callback was never issued.  It turns out the coding style of
the driver allowed the wait to occur on the same cpu that the deferred
isr is called on. Blocking for the wait blocked the isr, and as the isr
didn't run, the io aborts wouldn't finish.

Turns out there was never a good reason to block waiting for the unreg
done in the first place. The driver can continue execution and the ref
counting within the driver will do the right thing.

Resolve by removing the wait and patching up a few cases where the ref
counting didn't look right - mainly cases where the remote port comes
back before the aborts had completed and the unreg done had been
called. Additionally, a few places which used pointer values to guide
driver actions weren't protected by lock, so correct those.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3fd78355cdd59dbfec60e03a539378e3e3498c38)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix issues connecting with nvme initiator
James Smart [Sat, 9 Dec 2017 01:18:08 +0000 (17:18 -0800)]
scsi: lpfc: Fix issues connecting with nvme initiator

BugLink: http://bugs.launchpad.net/bugs/1752182
In the lpfc discovery engine, when acting as an nvme target, the driver
may be performing mailbox io with the adapter for port login when an
NVME PRLI is received from the host. Rather than queue the PRLI and
eventually send a response after the mailbox traffic, the driver
rejected the io with an error response.

Turns out this particular initiator didn't like the rejection values
(unable to process command/command in progress) so it never attempted a
retry of the PRLI. Thus the host never established nvme connectivity
with the lpfc target.

By changing the rejection values (to Logical Busy/nothing more), the
initiator accepted the response and would retry the PRLI, resulting in
nvme connectivity.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit e06351a002214d152142906a546006e3446d1ef7)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix SCSI LUN discovery when SCSI and NVME enabled
James Smart [Sat, 9 Dec 2017 01:18:07 +0000 (17:18 -0800)]
scsi: lpfc: Fix SCSI LUN discovery when SCSI and NVME enabled

BugLink: http://bugs.launchpad.net/bugs/1752182
When enabled for both SCSI and NVME support, and connected pt2pt to a
SCSI only target, the driver nodelist entry for the remote port is left
in PRLI_ISSUE state and no SCSI LUNs are discovered. Works fine if only
configured for SCSI support.

Error was due to some of the prli points still reflecting the need to
send only 1 PRLI. On a lot of fabric configs, targets were NVME only,
which meant the fabric-reported protocol attributes were only telling
the driver one protocol or the other. Thus things worked fine. With
pt2pt, the driver must send a PRLI for both protocols as there are no
hints on what the target supports. Thus pt2pt targets were hitting the
multiple PRLI issues.

Complete the dual PRLI support. Track explicitly whether scsi (fcp) or
nvme prli's have been sent. Accurately track protocol support detected
on each node as reported by the fabric or probed by PRLI traffic.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 9de416ac67b54d666327ba927a190f4b7259f4a0)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Increase SCSI CQ and WQ sizes.
James Smart [Sat, 9 Dec 2017 01:18:06 +0000 (17:18 -0800)]
scsi: lpfc: Increase SCSI CQ and WQ sizes.

BugLink: http://bugs.launchpad.net/bugs/1752182
Increased the sizes of the SCSI WQs and CQs so that SCSI operation is
similar to that used by NVME. However, the size increase is restricted
to those newer adapters that can support the larger WQE size, and thus
bigger queue sizes.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit a51e41b671f18b4387b7150f64e1578729776302)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix receive PRLI handling
James Smart [Sat, 9 Dec 2017 01:18:05 +0000 (17:18 -0800)]
scsi: lpfc: Fix receive PRLI handling

BugLink: http://bugs.launchpad.net/bugs/1752182
Handling a rcv'ed PRLI incorrectly can cause the ndlp to end up in the
wrong state or the driver to ACC and PRLI when it should send LS_RJT.

The cause was due to the driver not properly looking at the PRLI type
and taking the multiple protocol support into consideration.

Resolved by adding checks in the various PRLI receive points to validate
PRLI type and reject if not valid for the enabled protocols and mode
(host vs target).

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit b95e29b75d3eebf989907c848f3b10eb5a0117fa)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix -EOVERFLOW behavior for NVMET and defer_rcv
James Smart [Sat, 9 Dec 2017 01:18:04 +0000 (17:18 -0800)]
scsi: lpfc: Fix -EOVERFLOW behavior for NVMET and defer_rcv

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver is all set to handle the defer_rcv api for the nvmet_fc
transport, yet didn't properly recognize the return status when the
defer_rcv occurred. The driver treated it simply as an error and aborted
the io. Several residual issues occurred at that point.

Finish the defer_rcv support: recognize the return status when the io
request is being handled in a deferred style. This stops the rogue
aborts. Replenish the async cmd rcv buffer in the deferred receive if
needed.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cbc5de1b8a0f67beeafa9e474803709368f55175)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: update driver version to 11.4.0.5
James Smart [Tue, 21 Nov 2017 00:00:44 +0000 (16:00 -0800)]
scsi: lpfc: update driver version to 11.4.0.5

BugLink: http://bugs.launchpad.net/bugs/1752182
Update the driver version to 11.4.0.5

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit ba48077f23d29218c25e057b037c0813f78de94c)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix random heartbeat timeouts during heavy IO
James Smart [Sat, 9 Dec 2017 01:18:03 +0000 (17:18 -0800)]
scsi: lpfc: Fix random heartbeat timeouts during heavy IO

BugLink: http://bugs.launchpad.net/bugs/1752182
NVME targets appear to randomly disconnect from the initiator when
running heavy IO.

The error occurs because the host's aggregate (across all controllers)
io load was beyond the maximum exchange count for nvme on the adapter. The
driver was properly returning a resource busy status, but the io load
was so great heartbeat commands would be bounced and not have a
successful retry within the fuzz amount for the nvme heartbeat (yes, a
very high io load!). Thus the target was terminating the controller due
to a keep alive failure.

Resolve by reserving a few exchanges (by counters) which can be used
when the adapter is out of normal exchanges and the command is a NVME
heartbeat command. As counters are used, while the reserved command is
outstanding, as soon as any other exchange completes, the counters are
adjusted and the reserved count is replenished. The heartbeat completes
execution in a normal fashion.
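
The counter-based reservation described above might be modeled as
follows. This is a minimal sketch rather than the driver's code: the
class, its parameters, and the method names are hypothetical, and the
real driver tracks exchanges in adapter-specific structures.

```python
class ExchangePoolModel:
    """Toy model of exchange accounting with a small heartbeat reserve."""

    def __init__(self, total, reserved):
        self.reserved_max = reserved          # size of the heartbeat reserve
        self.reserved_free = reserved         # reserve entries still unused
        self.normal_free = total - reserved   # exchanges open to any command

    def allocate(self, is_heartbeat=False):
        # Normal exchanges are used first by every command type.
        if self.normal_free > 0:
            self.normal_free -= 1
            return True
        # Only a heartbeat may dip into the reserve once normal ones run out.
        if is_heartbeat and self.reserved_free > 0:
            self.reserved_free -= 1
            return True
        return False  # caller reports resource busy

    def release(self):
        # Any completion replenishes the reserve before the normal pool.
        if self.reserved_free < self.reserved_max:
            self.reserved_free += 1
        else:
            self.normal_free += 1
```

Because only counters are adjusted, it does not matter which exchange
completes first: the next completion of any command tops the reserve
back up, so a later heartbeat can again succeed under full load.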

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit cf1a1d3e2d88af49472014db0c82779b4fe85455)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: small sg cnt cleanup
James Smart [Tue, 21 Nov 2017 00:00:43 +0000 (16:00 -0800)]
scsi: lpfc: small sg cnt cleanup

BugLink: http://bugs.launchpad.net/bugs/1752182
The logic for sg_seg_cnt is a bit convoluted. This patch tries to clean
up a couple of areas, especially around the +2 and +1 logic.

This patch:

- Cleans up the lpfc_sg_seg_cnt attribute to specify a real minimum
  rather than making the minimum be whatever the default is.

- Removes the hardcoding of +2 (for the number of elements we use in a
  sgl for cmd iu and rsp iu) and +1 (an additional entry to compensate
  for nvme's reduction of io size based on a possible partial page)
  logic in sg list initialization. In the case where the +1 logic is
  referenced in host and target io checks, use the values set in the
  transport template as that value was properly set.

There can certainly be more done in this area and it will be addressed
in combined host/target driver effort.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 81e6a63728a409ae0e0061c1dc5adb4a85cc4869)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix driver handling of nvme resources during unload
James Smart [Tue, 21 Nov 2017 00:00:42 +0000 (16:00 -0800)]
scsi: lpfc: Fix driver handling of nvme resources during unload

BugLink: http://bugs.launchpad.net/bugs/1752182
During driver unload, the driver may crash due to NULL pointers.  The
NULL pointers were due to the driver not protecting itself sufficiently
during some of the teardown paths.  Additionally, the driver was not
waiting for and cleaning up nvme io resources. As such, the driver wasn't
making the callbacks to the transport, stalling the transport's
association teardown.

This patch waits for io clean up before tearing down and adds checks
for possible NULL pointers.

Cc: <stable@vger.kernel.org> # 4.12+
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit c3725bdcdf28f5e2f3a78b69e9dd010f49284a09)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Fix crash during driver unload with running nvme traffic
James Smart [Tue, 21 Nov 2017 00:00:41 +0000 (16:00 -0800)]
scsi: lpfc: Fix crash during driver unload with running nvme traffic

BugLink: http://bugs.launchpad.net/bugs/1752182
When the driver is unloading, the nvme transport could be in the process
of submitting new requests, may send abort requests to terminate
associations, or may make LS-related requests.  The driver's abort and
request entry points are currently unaware of the unloading state and
start the requests even though the infrastructure to complete them
is being torn down.

Change the entry points for new requests to check whether unloading and
if so, reject the requests. Abort routines check unloading, and if so,
noop the request. An abort is noop'd as the teardown paths are already
aborting/terminating the io outstanding at the time the teardown
initiated.
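
The entry-point behavior described above can be sketched as a simple
guard. This is an illustrative model only: the class, the `unloading`
attribute, and the return conventions are hypothetical and do not
reflect the driver's actual flags or error codes.

```python
class NvmeEntryPointsModel:
    """Toy stand-in for the driver's nvme request and abort entry points."""

    def __init__(self):
        self.unloading = False    # set once driver unload begins
        self.started = []         # requests handed on for processing
        self.aborts_issued = []   # aborts actually sent on

    def submit(self, request):
        # New requests are rejected once unload has started.
        if self.unloading:
            return False
        self.started.append(request)
        return True

    def abort(self, request):
        # Aborts become a no-op during unload, since the teardown path is
        # already terminating the outstanding io.
        if self.unloading:
            return
        self.aborts_issued.append(request)
```

In this model a request submitted before unload is accepted, while both
a new submission and an abort arriving after `unloading` is set leave
the tracked state unchanged.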

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 3386f4bdd243ad5a9094d390297602543abe9902)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years ago scsi: lpfc: Correct driver deregistrations with host nvme transport
James Smart [Tue, 21 Nov 2017 00:00:40 +0000 (16:00 -0800)]
scsi: lpfc: Correct driver deregistrations with host nvme transport

BugLink: http://bugs.launchpad.net/bugs/1752182
The driver's interaction with the host nvme transport has been incorrect
for a while. The driver did not wait for the unregister callbacks
(waited only 5 jiffies). Thus the driver may remove objects that may be
referenced by subsequent abort commands from the transport, and the
actual unregister callback was effectively a noop. This was especially
problematic if the driver was unloaded.

The driver now waits for the unregister callbacks, as it should, before
continuing with teardown.

Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit add9d6be3d650bf897b1c3feadabcf42e216acdb)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>