git.proxmox.com Git - mirror_ubuntu-artful-kernel.git/log

Drivers: hv: ring_buffer: remove stray smp_read_barrier_depends()

smp_read_barrier_depends() does nothing on almost all arcitectures
including x86 and having it in the beginning of
hv_get_ringbuffer_availbytes() does not provide any guarantees anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 45870a441361d1c05a5f767c4ece2f6e30e0da9c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: ring_buffer.c: fix comment style

Convert 6+-string comments repeating function names to normal kernel-style
comments and fix a couple of other comment style issues. No textual or
functional changes intended.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 822f18d4d3e9d4efb4996bbe562d0f99ab82d7dd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: fix crash when device is removed from host side

The crash is observed when a service is being disabled host side while
userspace daemon is connected to the device:

[   90.244859] general protection fault: 0000 [#1] SMP
...
[   90.800082] Call Trace:
[   90.800082]  [<ffffffff81187008>] __fput+0xc8/0x1f0
[   90.800082]  [<ffffffff8118716e>] ____fput+0xe/0x10
...
[   90.800082]  [<ffffffff81015278>] do_signal+0x28/0x580
[   90.800082]  [<ffffffff81086656>] ? finish_task_switch+0xa6/0x180
[   90.800082]  [<ffffffff81443ebf>] ? __schedule+0x28f/0x870
[   90.800082]  [<ffffffffa01ebbaa>] ? hvt_op_read+0x12a/0x140 [hv_utils]
...

The problem is that hvutil_transport_destroy() which does misc_deregister()
freeing the appropriate device is reachable by two paths: module unload
and from util_remove(). While module unload path is protected by .owner in
struct file_operations util_remove() path is not. Freeing the device while
someone holds an open fd for it is a show stopper.

In general, it is not possible to revoke an fd from all users so the only
way to solve the issue is to defer freeing the hvutil_transport structure.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9420098adc50a88d4a441e0f92d54bfa7af44448)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: introduce HVUTIL_TRANSPORT_DESTROY mode

When Hyper-V host asks us to remove some util driver by closing the
appropriate channel there is no easy way to force the current file
descriptor holder to hang up but we can start to respond -EBADF to all
operations asking it to exit gracefully.

As we're setting hvt->mode from two separate contexts now we need to use
a proper locking.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a15025660d4703a8b37290a14734cb4a84875770)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: rename outmsg_lock

As a preparation to reusing outmsg_lock to protect test-and-set openrations
on 'mode' rename it the more general 'lock'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a72f3a4ccff22de879a1f599210ecdd9bd483a43)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: fix memory leak on on_msg() failure

inmsg should be freed in case of on_msg() failure to avoid memory leak.
Preserve the error code from on_msg().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 1f75338b6fece2bbd42ac3623830c65e2df6e031)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tools: hv: vss: fix the write()'s argument: error -> vss_msg

Fix the write()'s argument in the daemon code.

Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Cc: stable@vger.kernel.org
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a689d2510f188e75391dbebacbddfd74d42f2a7e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: Invoke the poll function after handshake

When the handshake with daemon is complete, we should poll the channel since
during the handshake, we will not be processing any messages. This is a
potential bug if the host is waiting for a response from the guest.
I would like to thank Dexuan for pointing this out.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2d0c3b5ad739697a68dc8a444f5b9f4817cf8f8f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Force all channel messages to be delivered on CPU 0

Force all channel messages to be delivered on CPU0. These messages are not
performance critical and are used during the setup and teardown of the
channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b282e4c06fe89fc750fb26791c0bd7b25315973a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers/hv: correct tsc page sequence invalid value

Hypervisor Top Level Functional Specification v3/4 says
that TSC page sequence value = -1(0xFFFFFFFF) is used to
indicate that TSC page no longer reliable source of reference
timer. Unfortunately, we found that Windows Hyper-V guest
side implementation uses sequence value = 0 to indicate
that Tsc page no longer valid. This is clearly visible
inside Windows 2012R2 ntoskrnl.exe HvlGetReferenceTime()
function dissassembly:

HvlGetReferenceTime proc near
                 xchg    ax, ax
loc_1401C3132:
                 mov     rax, cs:HvlpReferenceTscPage
                 mov     r9d, [rax]
                 test    r9d, r9d
                 jz      short loc_1401C3176
                 rdtsc
                 mov     rcx, cs:HvlpReferenceTscPage
                 shl     rdx, 20h
                 or      rdx, rax
                 mov     rax, [rcx+8]
                 mov     rcx, cs:HvlpReferenceTscPage
                 mov     r8, [rcx+10h]
                 mul     rdx
                 mov     rax, cs:HvlpReferenceTscPage
                 add     rdx, r8
                 mov     ecx, [rax]
                 cmp     ecx, r9d
                 jnz     short loc_1401C3132
                 jmp     short loc_1401C3184
loc_1401C3176:
                 mov     ecx, 40000020h
                 rdmsr
                 shl     rdx, 20h
                 or      rdx, rax
loc_1401C3184:
                 mov     rax, rdx
                 retn
HvlGetReferenceTime endp

This patch aligns Tsc page invalid sequence value with
Windows Hyper-V guest implementation which is more
compatible with both Hyper-V hypervisor and KVM hypervisor.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c35b82ef0294ae5052120615f5cfcef17c5a6bf7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Fix a Host signaling bug

Currently we have two policies for deciding when to signal the host:
One based on the ring buffer state and the other based on what the
VMBUS client driver wants to do. Consider the case when the client
wants to explicitly control when to signal the host. In this case,
if the client were to defer signaling, we will not be able to signal
the host subsequently when the client does want to signal since the
ring buffer state will prevent the signaling. Implement logic to
have only one signaling policy in force for a given channel.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Tested-by: Haiyang Zhang <haiyangz@microsoft.com>
Cc: <stable@vger.kernel.org> # v4.2+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8599846d73997cdbccf63f23394d871cfad1e5e6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Allow for MMIO claims that span ACPI _CRS records

This patch makes 16GB GPUs work in Hyper-V VMs, since, for
compatibility reasons, the Hyper-V BIOS lists MMIO ranges in 2GB
chunks in its root bus's _CRS object.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 40f26f3168bf7a4da490db308dc0bd9f9923f41f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: channge vmbus_connection.channel_lock to mutex

spinlock is unnecessary here.
mutex is enough.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d6f591e339d23f434efda11917da511870891472)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: release relid on error in vmbus_process_offer()

We want to simplify vmbus_onoffer_rescind() by not invoking
hv_process_channel_removal(NULL, ...).

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f52078cf5711ce47c113a58702b35c8ff5f212f5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: fix rescind-offer handling for device without a driver

In the path vmbus_onoffer_rescind() -> vmbus_device_unregister() ->
device_unregister() -> ... -> __device_release_driver(), we can see for a
device without a driver loaded: dev->driver is NULL, so
dev->bus->remove(dev), namely vmbus_remove(), isn't invoked.

As a result, vmbus_remove() -> hv_process_channel_removal() isn't invoked
and some cleanups(like sending a CHANNELMSG_RELID_RELEASED message to the
host) aren't done.

We can demo the issue this way:
1. rmmod hv_utils;
2. disable the Heartbeat Integration Service in Hyper-V Manager and lsvmbus
shows the device disappears.
3. re-enable the Heartbeat in Hyper-V Manager and modprobe hv_utils, but
lsvmbus shows the device can't appear again.
This is because, the host thinks the VM hasn't released the relid, so can't
re-offer the device to the VM.

We can fix the issue by moving hv_process_channel_removal()
from vmbus_close_internal() to vmbus_device_release(), since the latter is
always invoked on device_unregister(), whether or not the dev has a driver
loaded.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 34c6801e3310ad286c7bb42bc88d42926b8f99bf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: do sanity check of channel state in vmbus_close_internal()

This fixes an incorrect assumption of channel state in the function.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 64b7faf903dae2df94d89edf2c688b16751800e4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: serialize process_chn_event() and vmbus_close_internal()

process_chn_event(), running in the tasklet, can race with
vmbus_close_internal() in the case of SMP guest, e.g., when the former is
accessing channel->inbound.ring_buffer, the latter could be freeing the
ring_buffer pages.

To resolve the race, we can serialize them by disabling the tasklet when
the latter is running here.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 63d55b2aeb5e4faa170316fee73c3c47ea9268c7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Get rid of the unused irq variable

The irq we extract from ACPI is not used - we deliver hypervisor
interrupts on a special vector. Make the necessary adjustments.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit efc267226b827f347a329c395e16c08973b0e3d6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Get rid of the unused macro

The macro VMBUS_DEVICE() is unused; get rid of it.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 90e031fa06ad6b660a8e9bebbb80bd30e555a2a5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Use uuid_le_cmp() for comparing GUIDs

Use uuid_le_cmp() for comparing GUIDs.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4ae9250893485f380275e7d5cb291df87c4d9710)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: Use uuid_le type consistently

Consistently use uuid_le type in the Hyper-V driver code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit af3ff643ea91ba64dd8d0b1cbed54d44512f96cd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vss: run only on supported host versions

The Backup integration service on WS2012 has appearently trouble to
negotiate with a guest which does not support the provided util version.
Currently the VSS driver supports only version 5/0. A WS2012 offers only
version 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns an
empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
continues to send ICMSGTYPE_NEGOTIATE messages. The result are weird
errors during boot and general misbehaviour.

Check the Windows version to work around the host bug, skip hv_vss_init
on WS2012 and older.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ed9ba608e4851144af8c7061cbb19f751c73e998)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Define the channel type for Hyper-V PCI Express pass-through

This defines the channel type for PCI front-ends in Hyper-V VMs.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3053c762444a83ec6a8777f9476668b23b8ab180)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Export the API to invoke a hypercall on Hyper-V

This patch exposes the function that hv_vmbus.ko uses to make hypercalls. This
is necessary for retargeting an interrupt when it is given a new affinity.

Since we are exporting this API, rename the API as it will be visible outside
the hv.c file.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a108393dbf764efb2405f21ca759806c65b8bc16)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Export a function that maps Linux CPU num onto Hyper-V proc num

This patch exposes the mapping between Linux CPU number and Hyper-V virtual
processor number. This is necessary because the hypervisor needs to know which
virtual processors to target when making a mapping in the Interrupt Redirection
Table in the I/O MMU.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 619848bd074343ff2bdeeafca0be39748f6da372)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers/hv: cleanup synic msrs if vmbus connect failed

Before vmbus_connect() synic is setup per vcpu - this means
hypervisor receives writes at synic msr's and probably allocate
hypervisor resources per synic setup.

If vmbus_connect() failed for some reason it's neccessary to cleanup
synic setup by call hv_synic_cleanup() at each vcpu to get a chance
to free allocated resources by hypervisor per synic.

This patch does appropriate cleanup in case of vmbus_connect() failure.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 17efbee8ba02ef00d3b270998978f8a1a90f1d92)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: use memdup_user in hvt_op_write

Use memdup_user to handle OOM.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b00359642c2427da89dc8f77daa2c9e8a84e6d76)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: util: catch allocation errors

Catch allocation errors in hvutil_transport_send.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cdc0c0c94e4e6dfa371d497a3130f83349b6ead6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tools: hv: remove repeated HV_FCOPY string

HV_FCOPY is already used as identifier in syslog.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6dfb867cea9e93ae9220f0b2e702b0440e4c8b4b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tools: hv: report ENOSPC errors in hv_fcopy_daemon

Currently some "Unspecified error 0x80004005" is reported on the Windows
side if something fails. Handle the ENOSPC case and return
ERROR_DISK_FULL, which allows at least Copy-VMFile to report a meaning
full error.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b4ed5d1682c6613988c2eb1de55df5ac9988afcc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: run polling callback always in interrupt context

All channel interrupts are bound to specific VCPUs in the guest
at the point channel is created. While currently, we invoke the
polling function on the correct CPU (the CPU to which the channel
is bound to) in some cases we may run the polling function in
a non-interrupt context. This potentially can cause an issue as the
polling function can be interrupted by the channel callback function.
Fix the issue by running the polling function on the appropriate CPU
at interrupt level. Additional details of the issue being addressed by
this patch are given below:

Currently hv_fcopy_onchannelcallback is called from interrupts and also
via the ->write function of hv_utils. Since the used global variables to
maintain state are not thread safe the state can get out of sync.
This affects the variable state as well as the channel inbound buffer.

As suggested by KY adjust hv_poll_channel to always run the given
callback on the cpu which the channel is bound to. This avoids the need
for locking because all the util services are single threaded and only
one transaction is active at any given point in time.

Additionally, remove the context variable, they will always be the same as
recv_channel.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3cace4a616108539e2730f8dc21a636474395e0f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: util: Increase the timeout for util services

Util services such as KVP and FCOPY need assistance from daemon's running
in user space. Increase the timeout so we don't prematurely terminate
the transaction in the kernel. Host sets up a 60 second timeout for
all util driver transactions. The host will retry the transaction if it
times out. Set the guest timeout at 30 seconds.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c0b200cfb0403740171c7527b3ac71d03f82947a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: fix build warning

We were getting build warning about unused variable "tsc_msr" and
"va_tsc" while building for i386 allmodconfig.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9220e39b5c900c67ddcb517d52fe52d90fb5e3c8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: i40e: shut up uninitialized variable warnings

intel/i40e/i40e_txrx.c: In function 'i40e_xmit_frame_ring':
intel/i40e/i40e_txrx.c:2367:20: error: 'oiph' may be used uninitialized in this function [-Werror=maybe-uninitialized]
intel/i40e/i40e_txrx.c:2317:16: note: 'oiph' was declared here
intel/i40e/i40e_txrx.c:2367:17: error: 'oudph' may be used uninitialized in this function [-Werror=maybe-uninitialized]
intel/i40e/i40e_txrx.c:2316:17: note: 'oudph' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 79febbc19b81b5242339bffb90e4dbea15015dde)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

i40e: fix build warnings

Fixes following build warnings :

drivers/net/ethernet/intel/i40e/i40e_main.c:7057:13: warning:
'i40e_sync_udp_filters_subtask' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8524:13: warning:
'i40e_add_vxlan_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8569:13: warning:
'i40e_del_vxlan_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8604:13: warning:
'i40e_add_geneve_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8651:13: warning:
'i40e_del_geneve_port' defined but not used [-Wunused-function]

Fixes: 6a899024058d ("i40e: geneve tunnel offload support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5cae7615b613381a04d3dd06b8237234cc3f7cc9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: nvme merge cleanup

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Export NVMe attributes to sysfs group

BugLink: http://bugs.launchpad.net/bugs/1531539
Adds all controller information to attribute list exposed to sysfs, and
appends the reset_controller attribute to it. The nvme device is created
with this attribute list, so driver no long manages its attributes.

Reported-by: Sujith Pandel <sujithpshankar@gmail.com>
Cc: Sujith Pandel <sujithpshankar@ gmail.com>
Cc: David Milburn <dmilburn@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 779ff75617099f4defe14e20443b95019a4c5ae8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Shutdown controller only for power-off

BugLink: http://bugs.launchpad.net/bugs/1531539
We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5cdb68c2c10f0865122656833cd07636a4143ee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: IO queue deletion re-write

BugLink: http://bugs.launchpad.net/bugs/1531539
The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit db3cbfff5bcc0b9a82d8c71f00b9d60fad215871)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Remove queue freezing on resets

BugLink: http://bugs.launchpad.net/bugs/1531539
NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 25646264e15af96c5c630fc742708b1eb3339222)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Use a retryable error code on reset

BugLink: http://bugs.launchpad.net/bugs/1531539
A negative status has the "do not retry" bit set, which makes it not
retryable. Use a fake status that can potentially be retried on reset.

An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1d49c38c4865c596b01b31a52540275c1bb383e7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Fix admin queue ring wrap

BugLink: http://bugs.launchpad.net/bugs/1531539
The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e3e9d50cd6ed392bb716e35c134d1e82707c51b4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: make SG_IO support optional

BugLink: http://bugs.launchpad.net/bugs/1531539
Translation SCSI commands to NVMe commands is rather pointless in general
as applications must not expext to be able to use SCSI commands on a
generic block device.

Make the huge translation layer optional and hope no one will ever enable
it in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 4490733250b8b272a6d3e66352dd7b8025409549)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/Makefile

nvme: fixes for NVME_IOCTL_IO_CMD on the char device

BugLink: http://bugs.launchpad.net/bugs/1531539
Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O. Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bfd8947194b2e2a53db82bbc7eb7c15d028c46db)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: synchronize access to ctrl->namespaces

BugLink: http://bugs.launchpad.net/bugs/1531539
Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d3b8ac15a5eb938e6a01909f6cc8ae4b5d3a17)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: Move nvme_freeze/unfreeze_queues to nvme core

BugLink: http://bugs.launchpad.net/bugs/1531539
Nothing pci specific about them and We'll need them exported
in other transports too.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 363c9aacb6c59bb63148dd115632880a4aed4d88)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Export namespace attributes to sysfs

BugLink: http://bugs.launchpad.net/bugs/1531539
Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's
kobject.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2b9b6e86bca7209de02754fc84acf7ab3e78734e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Add pci error handlers

BugLink: http://bugs.launchpad.net/bugs/1531539
Requests enabling pcie aer support. Shuts down the controller on error
detected with io frozen state prior to requesting slot reset; resumes
controller after reset completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a0a3408ee614848c27b0d36c2fe490da3b387b8d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge iod and cmd_info

BugLink: http://bugs.launchpad.net/bugs/1531539
Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4800d6d1548e0d5ab94f2216d41d94282e2588c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: meta_sg doesn't have to be an array

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf68405705bd35c09ec1f7528718dce5af88daff)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: properly free resources for cancelled command

BugLink: http://bugs.launchpad.net/bugs/1531539
We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit eee417b0697827a6e120199b126b447af3c81b47)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify completion handling

BugLink: http://bugs.launchpad.net/bugs/1531539
Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver. Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double copletions internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit aae239e1910ebc27ec9f7e8b25904a69626cf28c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: special case AEN requests

BugLink: http://bugs.launchpad.net/bugs/1531539
AEN requests are different from other requests in that they don't time out
or can easily be cancelled. Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit adf68f21c15572c68d9fadae618a09cf324b9814)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch abort to blk_execute_rq_nowait

BugLink: http://bugs.launchpad.net/bugs/1531539
And remove the now unused nvme_submit_cmd helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e7a2a87d5938bbebe1637c82fbde94ea6be3ef78)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch delete SQ/CQ to blk_execute_rq_nowait

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d8f32166a9c587e87a3a86f654c73d40b6b5df00)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: factor out a few helpers from req_completion

BugLink: http://bugs.launchpad.net/bugs/1531539
We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7688faa6dd2c99ce5d66571d9ad65535ec39e8cb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: fix admin queue depth

BugLink: http://bugs.launchpad.net/bugs/1531539
The number in tag_set->queue depth includes the reserved tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4680072003df14230e9eeeeefb617401012234a5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Simplify metadata setup

BugLink: http://bugs.launchpad.net/bugs/1531539
We no longer require the two-pass setup for block integrity.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4b9d5b151046ff717819864f93cb8e012b347bce)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Remove device management handles on remove

BugLink: http://bugs.launchpad.net/bugs/1531539
We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 53029b0441bbd263dbb2ee6429572b1732dad4de)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Use unbounded work queue for all work

BugLink: http://bugs.launchpad.net/bugs/1531539
Removes all usage of the global work queue so work can't be
scheduled on two different work queues, and removes nvme's work queue
singlethreadedness so controllers can be driven in parallel.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: keep the dead controller removal on the system workqueue to avoid
deadlocks]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92f7a1624bbc2361b96db81de89aee1baae40da9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Implement namespace list scanning

BugLink: http://bugs.launchpad.net/bugs/1531539
The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparesly
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation. To be relaxed
later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch abort_limit to an atomic_t

BugLink: http://bugs.launchpad.net/bugs/1531539
There is no lock to sychronize access to the abort_limit field of
struct nvme_ctrl, so switch it to an atomic_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6bf25d16410d8d95e3552f31c6a99e3fc3d31752)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: remove dead controllers from a work item

BugLink: http://bugs.launchpad.net/bugs/1531539
Compared to the kthread this gives us multiple call prevention for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5c8809e650772be87ba04595a8ccf278bab7b543)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge probe_work and reset_work

BugLink: http://bugs.launchpad.net/bugs/1531539
If we're using two work queues we're always going to run into races where
one item is tearing down what the other one is initializing. So insted
merge the two work queues, and let the old probe_work also tear the
controller down first if it was alive. Together with the better detection
of the probe path using a flag this gives us a properly serialized
reset/probe path that also doesn't accidentally trigger when two commands
time out and the second one tries to reset the controller while the first
reset is still in progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit fd634f4142861e533ac57e88ece8e98ab5851edb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: do not restart the request timeout if we're resetting the controller

BugLink: http://bugs.launchpad.net/bugs/1531539
Otherwise we're never going to complete a command when it is restarted just
after we completed all other outstanding commands in nvme_clear_queue.

The controller must be disabled prior to completing a presumed lost
command, do this by directly shutting down the controller before
queueing the reset work, and return EH_HANDLED from the timeout handler
after we shut the controller down.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split and rebase]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1569a16180aef4311ff5fc54f54b23ae9e8a03e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify resets

BugLink: http://bugs.launchpad.net/bugs/1531539
Don't delete the controller from dev_list before queuing a reset, instead
just check for it being reset in the polling kthread. This allows to remove
the dev_list_lock in various places, and in addition we can simply rely on
checking the queue_work return value to see if we could reset a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 846cc05f95d599801f296d8599e82686ebd395f0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add NVME_SC_CANCELLED

BugLink: http://bugs.launchpad.net/bugs/1531539
To properly document how we are using a negative Linux error value to
communicate request cancellations inside the driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 297465c873ae8c99180617ca904dc1a4a738f25d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge nvme_abort_req and nvme_timeout

BugLink: http://bugs.launchpad.net/bugs/1531539
We want to be able to return bettern error values frmo nvme_timeout, which
is significantly easier if the two functions are merged. Also clean up and
reduce the printk spew so that we only get one message per abort.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 31c7c7d2c9f17dc98a98c59c17e184bf164ee760)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: don't take the I/O queue q_lock in nvme_timeout

BugLink: http://bugs.launchpad.net/bugs/1531539
There is nothing it protects, but it makes lockdep unhappy in many different
ways.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4c9f748f0ee88447b28546991f60f43a7319aafd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: protect against simultaneous shutdown invocations

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 77bf25ea70200cddf083f74b7f617e5f07fac8bd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: only add a controller to dev_list after it's been fully initialized

BugLink: http://bugs.launchpad.net/bugs/1531539
Without this we can easily get bad derferences on nvmeq->d_db when the nvme
kthread tries to poll the CQs for controllers that are in half initialized
state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7385014c073263b077442439299fad013edd4409)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: only ignore hardware errors in nvme_create_io_queues

BugLink: http://bugs.launchpad.net/bugs/1531539
Half initialized queues due to kernel error returns or timeout are still a
good reason to give up on initializing a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 749941f2365db8198b5d75c83a575ee6e55bf03b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: precedence bug in nvme_pr_clear()

BugLink: http://bugs.launchpad.net/bugs/1531539
The "|" operator has higher precedence than "?:" so this didn't work as
intended. I had previously fixed this bug, but it we copied the older
unfixed version when we moved the function between files.

Fixes: 1673f1f08c88 ('nvme: move block_device_operations and ns/ctrl freeing to common code')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8c0b39155048d5a24f25c6c60aa83729927b04cd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: fix another 32-bit build warning

BugLink: http://bugs.launchpad.net/bugs/1531539
The nvme_user_cmd function was recently moved around from one file
to another, which made a warning reappear that I had fixed before
at some point:

drivers/nvme/host/core.c: In function 'nvme_user_cmd':
drivers/nvme/host/core.c:424:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

This applies the same workaround that we have elsewhere in the
driver with an extra type cast to uintptr_t.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Link: https://lkml.org/lkml/2015/10/9/611
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d1ea7be5f755bf1a4d4fdccc35880fcf5069df60)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: fix build with CONFIG_NVM enabled

BugLink: http://bugs.launchpad.net/bugs/1531539
Looks like I didn't test with CONFIG_NVM enabled, and neither did
the build bot.

Most of this is really weird crazy shit in the lighnvm support, though.

Struct nvme_ns is a structure for the NVM I/O command set, and it has
no business poking into it.  Second this commit:

commit 47b3115ae7b799be8b77b0f024215ad4f68d6460
Author: Wenwei Tao <ww.tao0320@gmail.com>
Date:   Fri Nov 20 13:47:55 2015 +0100

    nvme: lightnvm: use admin queues for admin cmds

Does even more crazy stuff.  If a function gets a request_queue parameter
passed it'd better use that and not look for another one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit ac02dddec63385ffef1397d3f56cec4108bcafe9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/lightnvm.c

blk-integrity: empty implementation when disabled

BugLink: http://bugs.launchpad.net/bugs/1531539
This patch moves the blk_integrity_payload definition outside the
CONFIG_BLK_DEV_INTERITY dependency and provides empty function
implementations when the kernel configuration disables integrity
extensions. This simplifies drivers that make use of these to map user
data so they don't need to repeat the same configuration checks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Updated by Jens to pass an error pointer return from
bio_integrity_alloc(), otherwise if CONFIG_BLK_DEV_INTEGRITY isn't
set, we return a weird ENOMEM from __nvme_submit_user_cmd()
if a meta buffer is set.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 06c1e3902aa74b7432a7e82bb4a5aca233a42839)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: refactor set_queue_count

BugLink: http://bugs.launchpad.net/bugs/1531539
Split out a helper that just issues the Set Features and interprets the
result which can go to common code, and document why we are ignoring
non-timeout error returns in the PCIe driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9a0be7abb62ff2a7dc3360ab45c31f29b3faf642)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move chardev and sysfs interface to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
For this we need to add a proper controller init routine and a list of
all controllers that is in addition to the list of PCIe controllers,
which stays in pci.c. Note that we remove the sysfs device when the
last reference to a controller is dropped now - the old code would have
kept it around longer, which doesn't make much sense.

This requires a new ->reset_ctrl operation to implement controleller
resets, and a new ->write_reg32 operation that is required to implement
subsystem resets. We also now store caches copied of the NVMe compliance
version and the flag if a controller is attached to a subsystem or not in
the generic controller structure now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixes for pr merge]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f3ca80fc11c3af566eacd99cf821c1a48035c63b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move namespace scanning to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
The namespace scanning code has been mostly generic already, we just
need to store a pointer to the tagset in the nvme_ctrl structure, and
add a method to check if a controller is I/O incapable. The latter
will hopefully be replaced by a proper controller state machine soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixed pr conflicts]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 5bae7f73d378a986671a3cad717c721b38f80d9e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/pci.c

nvme: move the call to nvme_init_identify earlier

BugLink: http://bugs.launchpad.net/bugs/1531539
We want to record the identify and CAP values even if no I/O queue
is available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ce4541f40a949cd9a9c9f308b1a6a86914ce6e1a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add a common helper to read Identify Controller data

BugLink: http://bugs.launchpad.net/bugs/1531539
And add the 64-bit register read operation for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7fd8930f26be4c9078684b2fef14da0503771bf2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_{enable,disable,shutdown}_ctrl to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5fd4ce1b005bd6ede913763f65efae9af6f7f386)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move remaining CC setup into nvme_enable_ctrl

BugLink: http://bugs.launchpad.net/bugs/1531539
Remove the calculation of all the bits written into the CC register into
nvme_enable_ctrl, so that they can be moved into the core NVMe driver in
the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1b2eb374651f0496b86ed5f095d4c448bff214fa)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add explicit quirk handling

BugLink: http://bugs.launchpad.net/bugs/1531539
Add an enum for all workarounds not in the spec and identify the affected
controllers at probe time.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 106198edb74cdf3fe1aefa6ad1e199b58ab7c4cb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move block_device_operations and ns/ctrl freeing to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
This moves the block_device_operations over to common code mostly
as-is. The only change is that the ns and ctrl refcounting got some
small refcounting to have wrappers around the kref_put operations.

A new free_ctrl operation is added to allow the PCI driver to free
it's ressources on the final drop.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Moved the integrity and pr changes due to merge conflict]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1673f1f08c8876f3942b4fa5e8f6a40215f15a94)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use the block layer for userspace passthrough metadata

BugLink: http://bugs.launchpad.net/bugs/1531539
Use the integrity API to pass through metadata from userspace. For PI
enabled devices this means that we now validate the reftag, which seems
like an unintentional ommission in the old code.

Thanks to Keith Busch for testing and fixes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Skip metadata setup on admin commands]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0b7f1f26f95a51ab11d4dc0adee230212b3cd675)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split __nvme_submit_sync_cmd

BugLink: http://bugs.launchpad.net/bugs/1531539
Add a separate nvme_submit_user_cmd for commands that directly DMA
to or from userspace. We'll add metadata support to that soon and
the common version would become too messy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4160982e7594481d6b7f90aa693638a37d20ea17)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_setup_flush and nvme_setup_rw to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
And mark them inline so that we don't slow down the I/O submission path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 22944e9981db1e496d983298fd420a8c6b758c80)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_error_status to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
And mark it inline so that we don't slow down the completion path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 15a190f7f57a2e46717490c35ac09882042a200b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: factor out a nvme_unmap_data helper

BugLink: http://bugs.launchpad.net/bugs/1531539
This is the counter part to nvme_map_data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d4f6c3aba5b496a2cb80a8e8e082ae51e46579f3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: refactor nvme_queue_rq

BugLink: http://bugs.launchpad.net/bugs/1531539
This "backports" the structure I've used for the fabrics driver. It
mostly started out as a cleanup so that I could actually understand
the code, but I think it also qualifies as a micro-optimization due
to the reduced time we hold q_lock and disable interrupts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ba1ca37ea4e320c108c356eb8c91ac652afc57dd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify nvme_setup_prps calling convention

BugLink: http://bugs.launchpad.net/bugs/1531539
Pass back a true/false value instead of the length which needs a compare
with the bytes in the request and drop the pointless gfp_t argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d2b571746d1c3fa10b7a0aa00859b296a98d12)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split a new struct nvme_ctrl out of struct nvme_dev

BugLink: http://bugs.launchpad.net/bugs/1531539
The new struct nvme_ctrl will be used by the common NVMe code that sits
on top of struct request_queue and the new nvme_ctrl_ops abstraction.
It only contains the bare minimum required, which consists of values
sampled during controller probe, the admin queue pointer and a second
struct device pointer at the moment, but more will follow later. Only
values that are not used in the I/O fast path should be moved to
struct nvme_ctrl so that drivers can optimize their cache line usage
easily. That's also the reason why we have two device pointers as
the struct device is used for DMA mapping purposes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1c63dc66580d4bbb6d2b75bf184b5aa105ba5bdb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use vendor it from identify

BugLink: http://bugs.launchpad.net/bugs/1531539
Use the vendor ID from the identify data instead of the PCI device to
make the SCSI translation layer independent from the PCI driver. The NVMe
spec defines them as having the same value for current PCIe devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 01fec28a6f3ba96d4f46a538eae089dd92189fd1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split nvme_trans_device_id_page

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf7d3ebbd219d8ad948e812d03e1decfd96c97d0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use offset instead of a struct for registers

BugLink: http://bugs.launchpad.net/bugs/1531539
This makes life easier for future non-PCI drivers where access to the
registers might be more complicated. Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
[Fixed CMBSZ offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 7a67cbea653e444d04d7e850ab9631a14a196422)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/pci.c

nvme: split command submission helpers out of pci.c

BugLink: http://bugs.launchpad.net/bugs/1531539
Create a new core.c and start by adding the command submission helpers
to it, which are already abstracted away from the actual hardware queues
by the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 21d34711e1b5970acfb22bddf1fefbfbd7e0123b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/Makefile

nvme: move struct nvme_iod to pci.c

BugLink: http://bugs.launchpad.net/bugs/1531539
This structure is specific to the PCIe driver internals and should be moved
to pci.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71bd150c71072014d98bff6dc2db3229306ece35)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] CONFIG_BLK_DEV_NVME_SCSI=y

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

blk-mq: add a flags parameter to blk_mq_alloc_request

BugLink: http://bugs.launchpad.net/bugs/1531539
We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6f3b0e8bcf3cbb87a7459b3ed018d31d918df3f8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>