git.proxmox.com Git - mirror_ubuntu-artful-kernel.git/log

Drivers: hv: vmbus: Use uuid_le_cmp() for comparing GUIDs

Use uuid_le_cmp() for comparing GUIDs.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4ae9250893485f380275e7d5cb291df87c4d9710)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
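
The GUID comparison mentioned in the commit above can be illustrated with a small userspace sketch. It assumes a GUID is a plain 16-byte value compared bytewise; `guid16` and `guid16_cmp` are hypothetical names made up for this illustration and are not the kernel's definitions.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical userspace stand-in for a 16-byte GUID value. */
typedef struct {
    uint8_t b[16];
} guid16;

/*
 * Compare two GUIDs bytewise.
 * Returns 0 when they are equal and a nonzero value otherwise,
 * following the usual memcmp() convention.
 */
static int guid16_cmp(const guid16 *a, const guid16 *b)
{
    return memcmp(a->b, b->b, sizeof(a->b));
}
```

Routing every comparison through one helper keeps call sites from open-coding `memcmp()` with a hand-written size.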

Drivers: hv: vmbus: Use uuid_le type consistently

Consistently use uuid_le type in the Hyper-V driver code.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit af3ff643ea91ba64dd8d0b1cbed54d44512f96cd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vss: run only on supported host versions

The Backup integration service on WS2012 apparently has trouble
negotiating with a guest which does not support the provided util version.
Currently the VSS driver supports only version 5/0. A WS2012 offers only
version 1/x and 3/x, and vmbus_prep_negotiate_resp correctly returns an
empty icframe_vercnt/icmsg_vercnt. But the host ignores that and
continues to send ICMSGTYPE_NEGOTIATE messages. The result is weird
errors during boot and general misbehaviour.

Check the Windows version to work around the host bug, skip hv_vss_init
on WS2012 and older.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ed9ba608e4851144af8c7061cbb19f751c73e998)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Define the channel type for Hyper-V PCI Express pass-through

This defines the channel type for PCI front-ends in Hyper-V VMs.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3053c762444a83ec6a8777f9476668b23b8ab180)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Export the API to invoke a hypercall on Hyper-V

This patch exposes the function that hv_vmbus.ko uses to make hypercalls. This
is necessary for retargeting an interrupt when it is given a new affinity.

Since we are exporting this API, rename the API as it will be visible outside
the hv.c file.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a108393dbf764efb2405f21ca759806c65b8bc16)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers:hv: Export a function that maps Linux CPU num onto Hyper-V proc num

This patch exposes the mapping between Linux CPU number and Hyper-V virtual
processor number. This is necessary because the hypervisor needs to know which
virtual processors to target when making a mapping in the Interrupt Redirection
Table in the I/O MMU.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 619848bd074343ff2bdeeafca0be39748f6da372)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

drivers/hv: cleanup synic msrs if vmbus connect failed

Before vmbus_connect() the synic is set up per vcpu - this means the
hypervisor receives writes to the synic MSRs and probably allocates
hypervisor resources per synic setup.

If vmbus_connect() fails for some reason, it is necessary to clean up the
synic setup by calling hv_synic_cleanup() on each vcpu, giving the
hypervisor a chance to free the resources it allocated per synic.

This patch does appropriate cleanup in case of vmbus_connect() failure.

Signed-off-by: Andrey Smetanin <asmetanin@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
CC: "K. Y. Srinivasan" <kys@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 17efbee8ba02ef00d3b270998978f8a1a90f1d92)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: utils: use memdup_user in hvt_op_write

Use memdup_user to handle OOM.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b00359642c2427da89dc8f77daa2c9e8a84e6d76)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: util: catch allocation errors

Catch allocation errors in hvutil_transport_send.

Fixes: 14b50f80c32d ('Drivers: hv: util: introduce hv_utils_transport abstraction')
Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cdc0c0c94e4e6dfa371d497a3130f83349b6ead6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tools: hv: remove repeated HV_FCOPY string

HV_FCOPY is already used as identifier in syslog.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6dfb867cea9e93ae9220f0b2e702b0440e4c8b4b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

tools: hv: report ENOSPC errors in hv_fcopy_daemon

Currently an "Unspecified error 0x80004005" is reported on the Windows
side if something fails. Handle the ENOSPC case and return
ERROR_DISK_FULL, which at least allows Copy-VMFile to report a
meaningful error.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b4ed5d1682c6613988c2eb1de55df5ac9988afcc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
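
The error translation described in the commit above might look roughly like the sketch below. The enum and its values are hypothetical placeholders chosen for this illustration; they are not the codes used by the daemon or by the host protocol. Only `ENOSPC` comes from `<errno.h>`.

```c
#include <errno.h>

/* Hypothetical result codes for this sketch, not the real protocol values. */
enum sketch_fcopy_status {
    SKETCH_FCOPY_OK = 0,
    SKETCH_FCOPY_DISK_FULL,   /* stands in for ERROR_DISK_FULL */
    SKETCH_FCOPY_UNSPECIFIED  /* stands in for the generic failure */
};

/*
 * Translate a saved errno value from a failed write into a status the
 * peer can report in a meaningful way. Anything not handled explicitly
 * still collapses into the generic failure.
 */
static enum sketch_fcopy_status sketch_map_errno(int err)
{
    switch (err) {
    case 0:
        return SKETCH_FCOPY_OK;
    case ENOSPC:
        return SKETCH_FCOPY_DISK_FULL;
    default:
        return SKETCH_FCOPY_UNSPECIFIED;
    }
}
```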

Drivers: hv: utils: run polling callback always in interrupt context

All channel interrupts are bound to specific VCPUs in the guest
at the point the channel is created. While we currently invoke the
polling function on the correct CPU (the CPU to which the channel
is bound), in some cases we may run the polling function in
a non-interrupt context. This can potentially cause an issue, as the
polling function can be interrupted by the channel callback function.
Fix the issue by running the polling function on the appropriate CPU
at interrupt level. Additional details of the issue being addressed by
this patch are given below:

Currently hv_fcopy_onchannelcallback is called from interrupts and also
via the ->write function of hv_utils. Since the global variables used to
maintain state are not thread safe, the state can get out of sync.
This affects the variable state as well as the channel inbound buffer.

As suggested by KY adjust hv_poll_channel to always run the given
callback on the cpu which the channel is bound to. This avoids the need
for locking because all the util services are single threaded and only
one transaction is active at any given point in time.

Additionally, remove the context variable; it will always be the same as
recv_channel.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3cace4a616108539e2730f8dc21a636474395e0f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: util: Increase the timeout for util services

Util services such as KVP and FCOPY need assistance from daemons running
in user space. Increase the timeout so we don't prematurely terminate
the transaction in the kernel. Host sets up a 60 second timeout for
all util driver transactions. The host will retry the transaction if it
times out. Set the guest timeout at 30 seconds.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c0b200cfb0403740171c7527b3ac71d03f82947a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

Drivers: hv: vmbus: fix build warning

We were getting build warnings about the unused variables "tsc_msr" and
"va_tsc" while building for i386 allmodconfig.

Signed-off-by: Sudip Mukherjee <sudip@vectorindia.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9220e39b5c900c67ddcb517d52fe52d90fb5e3c8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

net: i40e: shut up uninitialized variable warnings

intel/i40e/i40e_txrx.c: In function 'i40e_xmit_frame_ring':
intel/i40e/i40e_txrx.c:2367:20: error: 'oiph' may be used uninitialized in this function [-Werror=maybe-uninitialized]
intel/i40e/i40e_txrx.c:2317:16: note: 'oiph' was declared here
intel/i40e/i40e_txrx.c:2367:17: error: 'oudph' may be used uninitialized in this function [-Werror=maybe-uninitialized]
intel/i40e/i40e_txrx.c:2316:17: note: 'oudph' was declared here

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 79febbc19b81b5242339bffb90e4dbea15015dde)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

i40e: fix build warnings

Fixes the following build warnings:

drivers/net/ethernet/intel/i40e/i40e_main.c:7057:13: warning:
'i40e_sync_udp_filters_subtask' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8524:13: warning:
'i40e_add_vxlan_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8569:13: warning:
'i40e_del_vxlan_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8604:13: warning:
'i40e_add_geneve_port' defined but not used [-Wunused-function]
drivers/net/ethernet/intel/i40e/i40e_main.c:8651:13: warning:
'i40e_del_geneve_port' defined but not used [-Wunused-function]

Fixes: 6a899024058d ("i40e: geneve tunnel offload support")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 5cae7615b613381a04d3dd06b8237234cc3f7cc9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: SAUCE: nvme merge cleanup

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Export NVMe attributes to sysfs group

BugLink: http://bugs.launchpad.net/bugs/1531539
Adds all controller information to attribute list exposed to sysfs, and
appends the reset_controller attribute to it. The nvme device is created
with this attribute list, so the driver no longer manages its attributes.

Reported-by: Sujith Pandel <sujithpshankar@gmail.com>
Cc: Sujith Pandel <sujithpshankar@gmail.com>
Cc: David Milburn <dmilburn@redhat.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 779ff75617099f4defe14e20443b95019a4c5ae8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Shutdown controller only for power-off

BugLink: http://bugs.launchpad.net/bugs/1531539
We don't need to shutdown a controller for a reset. A controller in a
shutdown state may take longer to become ready than one that was simply
disabled. This patch has the driver shut down a controller only if the
device is about to be powered off or being removed. When taking the
controller down for a reset reason, the controller will be disabled
instead.

Function names have been updated in this patch to reflect their changed
semantics.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a5cdb68c2c10f0865122656833cd07636a4143ee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: IO queue deletion re-write

BugLink: http://bugs.launchpad.net/bugs/1531539
The nvme driver deletes IO queues asynchronously since this operation
may potentially take an undesirable amount of time with a large number
of queues if done serially.

The driver used to manage coordinating asynchronous deletions. This
patch simplifies that by leveraging the block layer rather than using
kthread workers and chaining more complicated callbacks.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit db3cbfff5bcc0b9a82d8c71f00b9d60fad215871)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Remove queue freezing on resets

BugLink: http://bugs.launchpad.net/bugs/1531539
NVMe submits all commands through the block layer now. This means we
can let requests queue at the blk-mq hardware context since there is no
path that bypasses this anymore so we don't need to freeze the queues
anymore. The driver can simply stop the h/w queues from running during
a reset instead.

This also fixes a WARN in percpu_ref_reinit when the queue was unfrozen
with requeued requests.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 25646264e15af96c5c630fc742708b1eb3339222)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Use a retryable error code on reset

BugLink: http://bugs.launchpad.net/bugs/1531539
A negative status has the "do not retry" bit set, which makes it not
retryable. Use a fake status that can potentially be retried on reset.

An aborted command's status is overridden by the timeout handler so
that it won't be retried, which is necessary to keep initialization from
getting into a reset loop.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1d49c38c4865c596b01b31a52540275c1bb383e7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
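
Why a negative status is never retried can be shown with a small sketch. It assumes the "do not retry" flag lives in bit 14 of a 16-bit status word; the macro and function names below are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed position of the "do not retry" flag in a 16-bit status word. */
#define SKETCH_SC_DNR 0x4000u

/*
 * A status is retryable only when the "do not retry" bit is clear.
 * Small negative errno-style values are stored in two's complement,
 * so their upper bits (including bit 14) are all set.
 */
static bool sketch_status_is_retryable(int status)
{
    return ((uint16_t)status & SKETCH_SC_DNR) == 0;
}
```

For example, `-5` truncated to 16 bits is `0xFFFB`, which has bit 14 set, while a small positive status such as `0x0007` does not.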

NVMe: Fix admin queue ring wrap

BugLink: http://bugs.launchpad.net/bugs/1531539
The tag set queue depth needs to be one less than the h/w queue depth
so we don't wrap the circular buffer. This conforms to the specification
defined "Full Queue" condition.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e3e9d50cd6ed392bb716e35c134d1e82707c51b4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
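
The "Full Queue" condition referred to in the commit above can be sketched with a generic head/tail ring. This is a minimal userspace illustration, not the driver's data structure; `struct sketch_ring` and its helpers are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical circular queue tracked only by head and tail indices. */
struct sketch_ring {
    unsigned head;   /* next slot the consumer reads */
    unsigned tail;   /* next slot the producer writes */
    unsigned depth;  /* number of slots in the ring */
};

/* Empty: head and tail coincide. */
static bool sketch_ring_empty(const struct sketch_ring *r)
{
    return r->head == r->tail;
}

/* Full: advancing tail by one more slot would make it equal to head. */
static bool sketch_ring_full(const struct sketch_ring *r)
{
    return (r->tail + 1) % r->depth == r->head;
}

/* Claim one slot; refuses instead of wrapping onto unconsumed entries. */
static bool sketch_ring_push(struct sketch_ring *r)
{
    if (sketch_ring_full(r))
        return false;
    r->tail = (r->tail + 1) % r->depth;
    return true;
}

/* Usable entries are one fewer than the slot count. */
static unsigned sketch_ring_usable(unsigned depth)
{
    assert(depth >= 2);
    return depth - 1;
}
```

Because "empty" and "full" must be distinguishable from the two indices alone, one slot always stays unused, which is why the number of outstanding entries has to be capped at depth minus one.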

nvme: make SG_IO support optional

BugLink: http://bugs.launchpad.net/bugs/1531539
Translating SCSI commands to NVMe commands is rather pointless in general
as applications must not expect to be able to use SCSI commands on a
generic block device.

Make the huge translation layer optional and hope no one will ever enable
it in the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 4490733250b8b272a6d3e66352dd7b8025409549)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/Makefile

nvme: fixes for NVME_IOCTL_IO_CMD on the char device

BugLink: http://bugs.launchpad.net/bugs/1531539
Make sure we synchronize access to the namespaces list and grab a reference
to the namespace before doing I/O. Make sure to reject the ioctl if multiple
namespaces are present as it's entirely unsafe, and warn when using it even
with a single namespace.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bfd8947194b2e2a53db82bbc7eb7c15d028c46db)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: synchronize access to ctrl->namespaces

BugLink: http://bugs.launchpad.net/bugs/1531539
Currently traversal and modification of ctrl->namespaces happens completely
unsynchronized, which can be fixed by the addition of a simple mutex.

Note: nvme_dev_ioctl will be handled in the next patch.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d3b8ac15a5eb938e6a01909f6cc8ae4b5d3a17)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: Move nvme_freeze/unfreeze_queues to nvme core

BugLink: http://bugs.launchpad.net/bugs/1531539
There is nothing PCI specific about them, and we'll need them exported
in other transports too.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 363c9aacb6c59bb63148dd115632880a4aed4d88)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Export namespace attributes to sysfs

BugLink: http://bugs.launchpad.net/bugs/1531539
Exposes the NGUID, EUI-64, and NSID to sysfs entries under the disk's
kobject.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2b9b6e86bca7209de02754fc84acf7ab3e78734e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Add pci error handlers

BugLink: http://bugs.launchpad.net/bugs/1531539
Requests enabling pcie aer support. Shuts down the controller on error
detected with io frozen state prior to requesting slot reset; resumes
controller after reset completes.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a0a3408ee614848c27b0d36c2fe490da3b387b8d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge iod and cmd_info

BugLink: http://bugs.launchpad.net/bugs/1531539
Merge the two per-request structures in the nvme driver into a single
one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f4800d6d1548e0d5ab94f2216d41d94282e2588c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: meta_sg doesn't have to be an array

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf68405705bd35c09ec1f7528718dce5af88daff)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: properly free resources for cancelled command

BugLink: http://bugs.launchpad.net/bugs/1531539
We need to move freeing of resources to the ->complete handler to ensure
they are also freed when we cancel the command.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit eee417b0697827a6e120199b126b447af3c81b47)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify completion handling

BugLink: http://bugs.launchpad.net/bugs/1531539
Now that all commands are executed as block layer requests we can remove the
internal completion in the NVMe driver. Note that we can simply call
blk_mq_complete_request to abort commands as the block layer will protect
against double completions internally.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit aae239e1910ebc27ec9f7e8b25904a69626cf28c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: special case AEN requests

BugLink: http://bugs.launchpad.net/bugs/1531539
AEN requests are different from other requests in that they don't time out
and can't easily be cancelled. Because of that we should not use the blk-mq
infrastructure but just special case them in the completion path.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit adf68f21c15572c68d9fadae618a09cf324b9814)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch abort to blk_execute_rq_nowait

BugLink: http://bugs.launchpad.net/bugs/1531539
And remove the now unused nvme_submit_cmd helper.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e7a2a87d5938bbebe1637c82fbde94ea6be3ef78)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch delete SQ/CQ to blk_execute_rq_nowait

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d8f32166a9c587e87a3a86f654c73d40b6b5df00)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: factor out a few helpers from req_completion

BugLink: http://bugs.launchpad.net/bugs/1531539
We'll need them in other places later.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7688faa6dd2c99ce5d66571d9ad65535ec39e8cb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: fix admin queue depth

BugLink: http://bugs.launchpad.net/bugs/1531539
The number in tag_set->queue depth includes the reserved tags.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4680072003df14230e9eeeeefb617401012234a5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Simplify metadata setup

BugLink: http://bugs.launchpad.net/bugs/1531539
We no longer require the two-pass setup for block integrity.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4b9d5b151046ff717819864f93cb8e012b347bce)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Remove device management handles on remove

BugLink: http://bugs.launchpad.net/bugs/1531539
We don't want to allow new references to open on a device that is
removed. This ties the lifetime of these handles to the physical device's
presence rather than to the open reference count.

Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 53029b0441bbd263dbb2ee6429572b1732dad4de)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Use unbounded work queue for all work

BugLink: http://bugs.launchpad.net/bugs/1531539
Removes all usage of the global work queue so work can't be
scheduled on two different work queues, and removes nvme's work queue
singlethreadedness so controllers can be driven in parallel.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: keep the dead controller removal on the system workqueue to avoid
deadlocks]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 92f7a1624bbc2361b96db81de89aee1baae40da9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

NVMe: Implement namespace list scanning

BugLink: http://bugs.launchpad.net/bugs/1531539
The NVMe 1.1 specification provides an identify mode to return a
list of active namespaces. This is more efficient to discover which
namespace identifiers are active on a controller, providing potentially
significant improvement in scan time for controllers with sparsely
populated namespaces.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: add quirk for the broken Qemu Identify implementation. To be relaxed
later]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 540c801c65eb58e05e0ca38b6fd644a83d7e2b33)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: switch abort_limit to an atomic_t

BugLink: http://bugs.launchpad.net/bugs/1531539
There is no lock to synchronize access to the abort_limit field of
struct nvme_ctrl, so switch it to an atomic_t.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6bf25d16410d8d95e3552f31c6a99e3fc3d31752)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
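
The idea of guarding a shared limit with an atomic counter instead of a lock can be sketched in userspace with C11 atomics. This is only an analogy to the kernel's `atomic_t`; the helper names are hypothetical and the take/release logic is an assumption made for illustration, not the driver's code.

```c
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Try to reserve one slot from a shared limit without taking a lock.
 * atomic_fetch_sub() returns the value held before the decrement, so a
 * result of zero or less means no slot was available and the decrement
 * is undone.
 */
static bool sketch_abort_slot_take(atomic_int *limit)
{
    if (atomic_fetch_sub(limit, 1) <= 0) {
        atomic_fetch_add(limit, 1);
        return false;
    }
    return true;
}

/* Return a previously reserved slot to the shared limit. */
static void sketch_abort_slot_release(atomic_int *limit)
{
    atomic_fetch_add(limit, 1);
}
```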

nvme: remove dead controllers from a work item

BugLink: http://bugs.launchpad.net/bugs/1531539
Compared to the kthread this gives us multiple call prevention for free.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5c8809e650772be87ba04595a8ccf278bab7b543)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge probe_work and reset_work

BugLink: http://bugs.launchpad.net/bugs/1531539
If we're using two work queues we're always going to run into races where
one item is tearing down what the other one is initializing. So instead
merge the two work queues, and let the old probe_work also tear the
controller down first if it was alive. Together with the better detection
of the probe path using a flag this gives us a properly serialized
reset/probe path that also doesn't accidentally trigger when two commands
time out and the second one tries to reset the controller while the first
reset is still in progress.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit fd634f4142861e533ac57e88ece8e98ab5851edb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: do not restart the request timeout if we're resetting the controller

BugLink: http://bugs.launchpad.net/bugs/1531539
Otherwise we're never going to complete a command when it is restarted just
after we completed all other outstanding commands in nvme_clear_queue.

The controller must be disabled prior to completing a presumed lost
command, do this by directly shutting down the controller before
queueing the reset work, and return EH_HANDLED from the timeout handler
after we shut the controller down.

Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split and rebase]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e1569a16180aef4311ff5fc54f54b23ae9e8a03e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify resets

BugLink: http://bugs.launchpad.net/bugs/1531539
Don't delete the controller from dev_list before queuing a reset, instead
just check for it being reset in the polling kthread. This allows us to remove
the dev_list_lock in various places, and in addition we can simply rely on
checking the queue_work return value to see if we could reset a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 846cc05f95d599801f296d8599e82686ebd395f0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add NVME_SC_CANCELLED

BugLink: http://bugs.launchpad.net/bugs/1531539
To properly document how we are using a negative Linux error value to
communicate request cancellations inside the driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 297465c873ae8c99180617ca904dc1a4a738f25d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: merge nvme_abort_req and nvme_timeout

BugLink: http://bugs.launchpad.net/bugs/1531539
We want to be able to return better error values from nvme_timeout, which
is significantly easier if the two functions are merged. Also clean up and
reduce the printk spew so that we only get one message per abort.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 31c7c7d2c9f17dc98a98c59c17e184bf164ee760)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: don't take the I/O queue q_lock in nvme_timeout

BugLink: http://bugs.launchpad.net/bugs/1531539
There is nothing it protects, but it makes lockdep unhappy in many different
ways.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4c9f748f0ee88447b28546991f60f43a7319aafd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: protect against simultaneous shutdown invocations

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Keith Busch <keith.busch@intel.com>
[hch: split from a larger patch]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 77bf25ea70200cddf083f74b7f617e5f07fac8bd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: only add a controller to dev_list after it's been fully initialized

BugLink: http://bugs.launchpad.net/bugs/1531539
Without this we can easily get bad dereferences on nvmeq->d_db when the nvme
kthread tries to poll the CQs for controllers that are in half initialized
state.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7385014c073263b077442439299fad013edd4409)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: only ignore hardware errors in nvme_create_io_queues

BugLink: http://bugs.launchpad.net/bugs/1531539
Half initialized queues due to kernel error returns or timeout are still a
good reason to give up on initializing a controller.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 749941f2365db8198b5d75c83a575ee6e55bf03b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: precedence bug in nvme_pr_clear()

BugLink: http://bugs.launchpad.net/bugs/1531539
The "|" operator has higher precedence than "?:" so this didn't work as
intended. I had previously fixed this bug, but we copied the older
unfixed version when we moved the function between files.

Fixes: 1673f1f08c88 ('nvme: move block_device_operations and ns/ctrl freeing to common code')
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8c0b39155048d5a24f25c6c60aa83729927b04cd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
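
The precedence problem described in the commit above can be reproduced in a few lines. The functions below are hypothetical and only mirror the shape of the expression (a base value OR-ed with a conditional flag); they are not the code from nvme_pr_clear().

```c
#include <stdint.h>

/* Hypothetical flag bit used only in this sketch. */
#define SKETCH_FLAG (1u << 3)

/*
 * Buggy form: "|" binds tighter than "?:", so this is parsed as
 * (base | extra) ? SKETCH_FLAG : 0 and the base value is lost.
 */
static uint32_t sketch_flags_buggy(uint32_t base, uint32_t extra)
{
    return base | extra ? SKETCH_FLAG : 0;
}

/* Fixed form: parentheses make the conditional an operand of "|". */
static uint32_t sketch_flags_fixed(uint32_t base, uint32_t extra)
{
    return base | (extra ? SKETCH_FLAG : 0);
}
```

With `base = 0x1` and `extra = 0`, the buggy form yields `0x8` because `base | extra` is nonzero, while the fixed form yields `0x1`.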

nvme: fix another 32-bit build warning

BugLink: http://bugs.launchpad.net/bugs/1531539
The nvme_user_cmd function was recently moved around from one file
to another, which made a warning reappear that I had fixed before
at some point:

drivers/nvme/host/core.c: In function 'nvme_user_cmd':
drivers/nvme/host/core.c:424:4: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]

This applies the same workaround that we have elsewhere in the
driver with an extra type cast to uintptr_t.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: 1673f1f08c88 ("nvme: move block_device_operations and ns/ctrl freeing to common code")
Link: https://lkml.org/lkml/2015/10/9/611
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d1ea7be5f755bf1a4d4fdccc35880fcf5069df60)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
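
The workaround mentioned in the commit above, an intermediate cast through `uintptr_t`, can be sketched as follows. The structure and helper names are hypothetical; the sketch assumes a command structure that carries a buffer address in a fixed 64-bit field.

```c
#include <stdint.h>

/* Hypothetical command layout that stores a buffer address as 64 bits. */
struct sketch_user_cmd {
    uint64_t addr;
};

/*
 * Convert the 64-bit field to a pointer. Casting straight from uint64_t
 * to a pointer triggers -Wint-to-pointer-cast where pointers are 32 bits
 * wide; going through uintptr_t first makes the width change explicit.
 */
static void *sketch_cmd_buffer(const struct sketch_user_cmd *cmd)
{
    return (void *)(uintptr_t)cmd->addr;
}

/* The reverse direction, used when filling in the command. */
static uint64_t sketch_ptr_to_u64(const void *p)
{
    return (uint64_t)(uintptr_t)p;
}
```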

NVMe: fix build with CONFIG_NVM enabled

BugLink: http://bugs.launchpad.net/bugs/1531539
Looks like I didn't test with CONFIG_NVM enabled, and neither did
the build bot.

Most of this is really weird crazy shit in the lightnvm support, though.

Struct nvme_ns is a structure for the NVM I/O command set, and it has
no business poking into it.  Second this commit:

commit 47b3115ae7b799be8b77b0f024215ad4f68d6460
Author: Wenwei Tao <ww.tao0320@gmail.com>
Date:   Fri Nov 20 13:47:55 2015 +0100

    nvme: lightnvm: use admin queues for admin cmds

Does even more crazy stuff.  If a function gets a request_queue parameter
passed it'd better use that and not look for another one.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit ac02dddec63385ffef1397d3f56cec4108bcafe9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/lightnvm.c

blk-integrity: empty implementation when disabled

BugLink: http://bugs.launchpad.net/bugs/1531539
This patch moves the blk_integrity_payload definition outside the
CONFIG_BLK_DEV_INTEGRITY dependency and provides empty function
implementations when the kernel configuration disables integrity
extensions. This simplifies drivers that make use of these to map user
data so they don't need to repeat the same configuration checks.
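
The general pattern can be sketched like this. HAVE_INTEGRITY, struct payload and payload_add() are hypothetical stand-ins, not the block layer's real names, and the stub's return value of 0 is an assumption.

```c
#include <stddef.h>

/* The type is defined unconditionally so callers can always name it. */
struct payload {
	int unused;
};

#ifdef HAVE_INTEGRITY	/* hypothetical stand-in for the config option */
int payload_add(struct payload *p, size_t len);	/* real code lives elsewhere */
#else
/* Empty implementation used when the feature is compiled out. */
static inline int payload_add(struct payload *p, size_t len)
{
	(void)p;
	(void)len;
	return 0;
}
#endif

/* A caller needs no #ifdef of its own. */
static int map_user_data(struct payload *p, size_t len)
{
	return payload_add(p, len);
}
```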

Signed-off-by: Keith Busch <keith.busch@intel.com>
Updated by Jens to pass an error pointer return from
bio_integrity_alloc(), otherwise if CONFIG_BLK_DEV_INTEGRITY isn't
set, we return a weird ENOMEM from __nvme_submit_user_cmd()
if a meta buffer is set.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 06c1e3902aa74b7432a7e82bb4a5aca233a42839)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: refactor set_queue_count

BugLink: http://bugs.launchpad.net/bugs/1531539
Split out a helper that just issues the Set Features and interprets the
result which can go to common code, and document why we are ignoring
non-timeout error returns in the PCIe driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 9a0be7abb62ff2a7dc3360ab45c31f29b3faf642)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move chardev and sysfs interface to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
For this we need to add a proper controller init routine and a list of
all controllers that is in addition to the list of PCIe controllers,
which stays in pci.c. Note that we remove the sysfs device when the
last reference to a controller is dropped now - the old code would have
kept it around longer, which doesn't make much sense.

This requires a new ->reset_ctrl operation to implement controller
resets, and a new ->write_reg32 operation that is required to implement
subsystem resets. We also now store cached copies of the NVMe compliance
version and the flag if a controller is attached to a subsystem or not in
the generic controller structure.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixes for pr merge]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f3ca80fc11c3af566eacd99cf821c1a48035c63b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move namespace scanning to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
The namespace scanning code has been mostly generic already, we just
need to store a pointer to the tagset in the nvme_ctrl structure, and
add a method to check if a controller is I/O incapable. The latter
will hopefully be replaced by a proper controller state machine soon.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Fixed pr conflicts]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 5bae7f73d378a986671a3cad717c721b38f80d9e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/pci.c

nvme: move the call to nvme_init_identify earlier

BugLink: http://bugs.launchpad.net/bugs/1531539
We want to record the identify and CAP values even if no I/O queue
is available.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ce4541f40a949cd9a9c9f308b1a6a86914ce6e1a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add a common helper to read Identify Controller data

BugLink: http://bugs.launchpad.net/bugs/1531539
And add the 64-bit register read operation for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 7fd8930f26be4c9078684b2fef14da0503771bf2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_{enable,disable,shutdown}_ctrl to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5fd4ce1b005bd6ede913763f65efae9af6f7f386)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move remaining CC setup into nvme_enable_ctrl

BugLink: http://bugs.launchpad.net/bugs/1531539
Move the calculation of all the bits written into the CC register into
nvme_enable_ctrl, so that they can be moved into the core NVMe driver in
the future.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1b2eb374651f0496b86ed5f095d4c448bff214fa)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: add explicit quirk handling

BugLink: http://bugs.launchpad.net/bugs/1531539
Add an enum for all workarounds not in the spec and identify the affected
controllers at probe time.
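
A rough sketch of such a scheme, assuming a flag enum and a lookup table keyed by PCI vendor and device ID. All names and IDs below are made up for illustration and are not the driver's actual table.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical quirk bits; one bit per out-of-spec workaround. */
enum ctrl_quirks {
	QUIRK_STRIPE_SIZE = 1 << 0,
	QUIRK_SLOW_RESET  = 1 << 1,
};

struct quirk_entry {
	uint16_t vendor;
	uint16_t device;
	unsigned long quirks;
};

/* Made-up IDs, for illustration only. */
static const struct quirk_entry quirk_table[] = {
	{ 0x1234, 0x0001, QUIRK_STRIPE_SIZE },
	{ 0x1234, 0x0002, QUIRK_STRIPE_SIZE | QUIRK_SLOW_RESET },
};

/* Called once at probe time; returns 0 for controllers without quirks. */
static unsigned long lookup_quirks(uint16_t vendor, uint16_t device)
{
	size_t i;

	for (i = 0; i < sizeof(quirk_table) / sizeof(quirk_table[0]); i++)
		if (quirk_table[i].vendor == vendor &&
		    quirk_table[i].device == device)
			return quirk_table[i].quirks;
	return 0;
}
```

The rest of the driver can then test a single bit instead of repeating ID comparisons at each workaround site.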

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 106198edb74cdf3fe1aefa6ad1e199b58ab7c4cb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move block_device_operations and ns/ctrl freeing to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
This moves the block_device_operations over to common code mostly
as-is. The only change is that the ns and ctrl refcounting got some
small refactoring to have wrappers around the kref_put operations.

A new free_ctrl operation is added to allow the PCI driver to free
its resources on the final drop.
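
The shape of such wrappers can be sketched in user-space C. This uses a plain, non-atomic counter instead of a kref, and every name here is hypothetical.

```c
struct ctrl;

/* Transport-specific operations; free_ctrl runs on the final put. */
struct ctrl_ops {
	void (*free_ctrl)(struct ctrl *ctrl);
};

struct ctrl {
	int refcount;			/* non-atomic, for illustration only */
	const struct ctrl_ops *ops;
};

static void ctrl_get(struct ctrl *ctrl)
{
	ctrl->refcount++;
}

/* Wrapper around the final-drop logic so callers never touch the count. */
static void ctrl_put(struct ctrl *ctrl)
{
	if (--ctrl->refcount == 0)
		ctrl->ops->free_ctrl(ctrl);
}

/* Demo transport that only records that its free hook was called. */
static int demo_freed;

static void demo_free_ctrl(struct ctrl *ctrl)
{
	(void)ctrl;
	demo_freed++;
}

static const struct ctrl_ops demo_ops = { demo_free_ctrl };
```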

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Moved the integrity and pr changes due to merge conflict]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1673f1f08c8876f3942b4fa5e8f6a40215f15a94)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use the block layer for userspace passthrough metadata

BugLink: http://bugs.launchpad.net/bugs/1531539
Use the integrity API to pass through metadata from userspace. For PI
enabled devices this means that we now validate the reftag, which seems
like an unintentional omission in the old code.

Thanks to Keith Busch for testing and fixes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
[Skip metadata setup on admin commands]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0b7f1f26f95a51ab11d4dc0adee230212b3cd675)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split __nvme_submit_sync_cmd

BugLink: http://bugs.launchpad.net/bugs/1531539
Add a separate nvme_submit_user_cmd for commands that directly DMA
to or from userspace. We'll add metadata support to that soon and
the common version would become too messy.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4160982e7594481d6b7f90aa693638a37d20ea17)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_setup_flush and nvme_setup_rw to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
And mark them inline so that we don't slow down the I/O submission path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 22944e9981db1e496d983298fd420a8c6b758c80)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: move nvme_error_status to common code

BugLink: http://bugs.launchpad.net/bugs/1531539
And mark it inline so that we don't slow down the completion path by
having to turn it into a forced out of line call.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 15a190f7f57a2e46717490c35ac09882042a200b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: factor out a nvme_unmap_data helper

BugLink: http://bugs.launchpad.net/bugs/1531539
This is the counterpart to nvme_map_data.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d4f6c3aba5b496a2cb80a8e8e082ae51e46579f3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: refactor nvme_queue_rq

BugLink: http://bugs.launchpad.net/bugs/1531539
This "backports" the structure I've used for the fabrics driver. It
mostly started out as a cleanup so that I could actually understand
the code, but I think it also qualifies as a micro-optimization due
to the reduced time we hold q_lock and disable interrupts.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ba1ca37ea4e320c108c356eb8c91ac652afc57dd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: simplify nvme_setup_prps calling convention

BugLink: http://bugs.launchpad.net/bugs/1531539
Pass back a true/false value instead of the length which needs a compare
with the bytes in the request and drop the pointless gfp_t argument.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 69d2b571746d1c3fa10b7a0aa00859b296a98d12)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split a new struct nvme_ctrl out of struct nvme_dev

BugLink: http://bugs.launchpad.net/bugs/1531539
The new struct nvme_ctrl will be used by the common NVMe code that sits
on top of struct request_queue and the new nvme_ctrl_ops abstraction.
It only contains the bare minimum required, which consists of values
sampled during controller probe, the admin queue pointer and a second
struct device pointer at the moment, but more will follow later. Only
values that are not used in the I/O fast path should be moved to
struct nvme_ctrl so that drivers can optimize their cache line usage
easily. That's also the reason why we have two device pointers as
the struct device is used for DMA mapping purposes.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 1c63dc66580d4bbb6d2b75bf184b5aa105ba5bdb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use vendor ID from identify

BugLink: http://bugs.launchpad.net/bugs/1531539
Use the vendor ID from the identify data instead of the PCI device to
make the SCSI translation layer independent from the PCI driver. The NVMe
spec defines them as having the same value for current PCIe devices.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 01fec28a6f3ba96d4f46a538eae089dd92189fd1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: split nvme_trans_device_id_page

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit bf7d3ebbd219d8ad948e812d03e1decfd96c97d0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

nvme: use offset instead of a struct for registers

BugLink: http://bugs.launchpad.net/bugs/1531539
This makes life easier for future non-PCI drivers where access to the
registers might be more complicated. Note that Linux drivers are
pretty evenly split between the two versions, and in fact the NVMe
driver already uses offsets for the doorbells.
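
To illustrate why offsets help, here is a small sketch in which register access goes through an ops table that takes a byte offset. The ops structure and helper names are assumptions; the offsets used for CAP, VS and CC follow the NVMe register map.

```c
#include <stdint.h>
#include <string.h>

/* Register offsets as plain numbers instead of fields of a mapped struct. */
enum {
	REG_CAP = 0x00,	/* Controller Capabilities */
	REG_VS  = 0x08,	/* Version */
	REG_CC  = 0x14,	/* Controller Configuration */
};

/* Hypothetical transport hook: any backend can implement a read by offset. */
struct reg_ops {
	uint32_t (*read32)(void *ctx, uint32_t off);
};

/* Memory-backed implementation standing in for an MMIO BAR. */
static uint32_t mem_read32(void *ctx, uint32_t off)
{
	uint32_t val;

	memcpy(&val, (uint8_t *)ctx + off, sizeof(val));
	return val;
}

static const struct reg_ops mem_ops = { mem_read32 };

/* Common code only knows the offset, not how the register is reached. */
static uint32_t read_version(const struct reg_ops *ops, void *ctx)
{
	return ops->read32(ctx, REG_VS);
}
```

A non-PCI transport could supply a different read32 that, for example, sends a message to fetch the value, without any change to read_version().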

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
[Fixed CMBSZ offset]
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 7a67cbea653e444d04d7e850ab9631a14a196422)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/pci.c

nvme: split command submission helpers out of pci.c

BugLink: http://bugs.launchpad.net/bugs/1531539
Create a new core.c and start by adding the command submission helpers
to it, which are already abstracted away from the actual hardware queues
by the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(back ported from commit 21d34711e1b5970acfb22bddf1fefbfbd7e0123b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/nvme/host/Makefile

nvme: move struct nvme_iod to pci.c

BugLink: http://bugs.launchpad.net/bugs/1531539
This structure is specific to the PCIe driver internals and should be moved
to pci.c.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Keith Busch <keith.busch@intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71bd150c71072014d98bff6dc2db3229306ece35)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

UBUNTU: [Config] CONFIG_BLK_DEV_NVME_SCSI=y

BugLink: http://bugs.launchpad.net/bugs/1531539
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

blk-mq: add a flags parameter to blk_mq_alloc_request

BugLink: http://bugs.launchpad.net/bugs/1531539
We already have the reserved flag, and a nowait flag awkwardly encoded as
a gfp_t. Add a real flags argument to make the scheme more extensible and
allow for a nicer calling convention.
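
A sketch of the idea, assuming two hypothetical flag bits that replace a separate "reserved" boolean and a "may sleep" hint derived from a gfp_t:

```c
#include <stdbool.h>

/* Hypothetical flag bits; new behaviours can be added without new arguments. */
enum {
	REQ_NOWAIT   = 1 << 0,	/* fail instead of sleeping for a free tag */
	REQ_RESERVED = 1 << 1,	/* allocate from the reserved pool */
};

struct alloc_params {
	bool may_sleep;
	bool reserved;
};

/* Decode the single flags word into the decisions the allocator needs. */
static struct alloc_params decode_flags(unsigned int flags)
{
	struct alloc_params p = {
		.may_sleep = !(flags & REQ_NOWAIT),
		.reserved  = (flags & REQ_RESERVED) != 0,
	};

	return p;
}
```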

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 6f3b0e8bcf3cbb87a7459b3ed018d31d918df3f8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: ensure that nvm_dev_ops can be used without CONFIG_NVM

BugLink: http://bugs.launchpad.net/bugs/1531539
null_blk defines an empty version of this ops structure if CONFIG_NVM
isn't set, but it doesn't know the type. Move those bits out of the
protection of CONFIG_NVM in the main lightnvm include.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit a7fd9a4f3e8179bab31e4637236ebb0e0b7867c6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: introduce factory reset

BugLink: http://bugs.launchpad.net/bugs/1531539
Now that a device can be managed using the system blocks, a method to
reset the device is necessary as well. This patch introduces logic to
reset the device easily to factory state and exposes it through an
ioctl.

The ioctl takes the following flags:

  NVM_FACTORY_ERASE_ONLY_USER
      By default all blocks, except host-reserved blocks are erased upon
      factory reset. Instead of this, only erase host-reserved blocks.
  NVM_FACTORY_RESET_HOST_BLKS
      Mark host-reserved blocks to be erased and set their type to free.
  NVM_FACTORY_RESET_GRWN_BBLKS
      Mark "grown bad blocks" to be erased and set their type to free.
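
As a sketch of how such flags might be combined and validated before acting on them (the bit positions and the helper below are assumptions and are not taken from the patch):

```c
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define FACTORY_ERASE_ONLY_USER		(1u << 0)
#define FACTORY_RESET_HOST_BLKS		(1u << 1)
#define FACTORY_RESET_GRWN_BBLKS	(1u << 2)

#define FACTORY_ALL_FLAGS	(FACTORY_ERASE_ONLY_USER | \
				 FACTORY_RESET_HOST_BLKS | \
				 FACTORY_RESET_GRWN_BBLKS)

/* Reject any request that sets a bit outside the known set. */
static int factory_flags_valid(uint32_t flags)
{
	return (flags & ~FACTORY_ALL_FLAGS) == 0;
}
```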

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 8b4970c41f88ad772771f87b1c82c395248a84d8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: use system block for mm initialization

BugLink: http://bugs.launchpad.net/bugs/1531539
Use system block information to register the appropriate media manager.
This enables the LightNVM subsystem to instantiate a media manager
selected by the user, instead of relying on automatic detection by each
media manager loaded in the kernel.

A device must now be initialized before it can proceed to initialize its
media manager. Upon initialization, the configured media manager is
automatically initialized as well.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b769207678176d590ea61ce7a64c9100925668b7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: introduce ioctl to initialize device

BugLink: http://bugs.launchpad.net/bugs/1531539
Based on the previous patch, we now introduce an ioctl to initialize the
device using nvm_init_sysblock and create the necessary system blocks.
The user may specify the media manager that they wish to instantiate on
top. Default from user-space will be "gennvm".

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 5569615424613aa006005f18b03a3a12738a47d7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: core on-disk initialization

BugLink: http://bugs.launchpad.net/bugs/1531539
An Open-Channel SSD shall be initialized before use. To initialize, we
define an on-disk format, that keeps a small set of metadata to bring up
the media manager on top of the device.

The initial step is introduced to allow a user to format the disks for a
given media manager. During format, a system block is stored on one to
three separate luns on the device. Each lun has the system block
duplicated. During initialization, the system block can be retrieved and
the appropriate media manager can be initialized.

The on-disk format currently covers (struct nvm_system_block):

- Magic value "NVMS".
- Monotonically increasing sequence number.
- The physical block erase count.
- Version of the system block format.
- Media manager type.
- Media manager superblock physical address.

The interface provides three functions to manage the system block:

int nvm_init_sysblock(struct nvm_dev *, struct nvm_sb_info *)
int nvm_get_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)
int nvm_update_sysblock(struct nvm_dev *dev, struct nvm_sb_info *)

Each implements a part of the logic to manage the system block. The
initialization creates the first system blocks and marks them on the
device. Get retrieves the latest system block by scanning all pages in
the associated system blocks. The update sysblock writes new metadata
and allocates a new block if necessary.
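
The fields listed above and the "latest copy wins" rule can be sketched as follows. Field names, widths and the helper are assumptions for illustration; the real on-disk layout and endianness handling are defined by the patch itself.

```c
#include <stddef.h>
#include <stdint.h>

#define SYSBLK_MAGIC	0x4E564D53u	/* ASCII "NVMS" */

/* Illustrative in-memory view of one system block copy. */
struct sysblk {
	uint32_t magic;		/* magic value */
	uint32_t seqnr;		/* monotonically increasing sequence number */
	uint32_t erase_cnt;	/* physical block erase count */
	uint16_t version;	/* system block format version */
	char     mmtype[8];	/* media manager type */
	uint64_t fs_ppa;	/* media manager superblock address */
};

/*
 * Scan all copies and return the valid one with the highest sequence
 * number, or NULL if no copy carries the magic value.
 */
static const struct sysblk *latest_sysblk(const struct sysblk *copies, size_t n)
{
	const struct sysblk *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (copies[i].magic != SYSBLK_MAGIC)
			continue;
		if (!best || copies[i].seqnr > best->seqnr)
			best = &copies[i];
	}
	return best;
}
```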

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e3eb3799f7e0d0924ceeba672ab271865de2802d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: introduce mlc lower page table mappings

BugLink: http://bugs.launchpad.net/bugs/1531539
NAND MLC memories have both lower and upper pages. When programming,
both of these must be written, before data can be read. However,
these lower and upper pages might not be placed at even and odd flash
pages, but can be skipped. Therefore each flash memory has its lower
pages defined, which can then be used when programming and to know when
padding is necessary.

This patch implements the lower page definition in the specification,
and exposes it through a simple lookup table at dev->lptbl.
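
A lookup over such a table might look like the sketch below. The table contents are invented, and the helper is hypothetical; the message does not show how dev->lptbl is actually consumed.

```c
#include <stddef.h>

/*
 * Return 1 if "page" appears in the lower page table, 0 otherwise.
 * lptbl holds the flash page numbers that are lower pages.
 */
static int is_lower_page(const int *lptbl, size_t n, int page)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (lptbl[i] == page)
			return 1;
	return 0;
}
```

For a made-up table { 0, 1, 2, 3, 6, 7 }, page 6 is a lower page while page 4 is not, which is the kind of irregular layout the message refers to.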

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ca5927e7ab5307965104ca58bbb29d110b1d4545)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: add mccap support

BugLink: http://bugs.launchpad.net/bugs/1531539
Some flash media has extended capabilities, such as programming SLC
pages on MLC/TLC flash, erase/program suspend, scramble and encryption.
MCCAP is introduced to detect support for these capabilities in the
command set.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit f9a9995072904f2d67d649545f17f81e00f4985e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: manage open and closed blocks separately

BugLink: http://bugs.launchpad.net/bugs/1531539
LightNVM targets need to know the state of the flash block when doing
flash optimizations. An example is implementing a write buffer to
respect the flash page size. Currently, block state is not accounted
for; the media manager only differentiates among free, bad and in-use
blocks.

This patch adds the logic in the generic media manager to enable
targets to manage open and closed blocks separately, and it implements
such management in rrpc. It also adds a set of flags to describe the
state of the block (open, closed, free, bad).

In order to avoid taking two locks (nvm_lun and rrpc_lun) consecutively,
we introduce lockless get_/put_block primitives so that the open and
close list locks and future common logic is handled within the nvm_lun
lock.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit ff0e498bfa185fad5e86c4c7a2db4f9648d2344f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: fix missing grown bad block type

BugLink: http://bugs.launchpad.net/bugs/1531539
The get/set bad block interface defines good block, factory bad block,
grown bad block, device reserved block, and host reserved block.
Unfortunately the grown bad block was missing, leaving the offsets wrong
for device and host side reserved blocks.

This patch adds the missing type and corrects the offsets.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b5d4acd4cbf5029a2616084d9e9f392046d53a37)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: reference rrpc lun in rrpc block

BugLink: http://bugs.launchpad.net/bugs/1531539
Currently, a rrpc block only points to its nvm_lun. If a user wants to
find the associated rrpc lun, it will have to calculate the index and
look it up manually. By referencing the rrpc lun directly, this step can
be omitted, at the cost of a larger memory footprint.

This is important for upcoming patches that implement write buffering in
rrpc.

Signed-off-by: Javier González <javier@cnexlabs.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d7a64d275b39e19c010cdfd8728cc64f14b59bda)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: introduce nvm_submit_ppa

BugLink: http://bugs.launchpad.net/bugs/1531539
Internal logic for both core and media managers does not have a
backing bio for issuing I/Os. Introduce nvm_submit_ppa to allow raw
I/Os to be submitted to the underlying device driver.

The function takes the device, ppa, data buffer and its length and
will submit the I/O synchronously to the device. The return value may
therefore be used to detect any errors regarding the issued I/O.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 09719b62fdab031e39b39a6470364a372abdf3f4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: move rq->error to nvm_rq->error

BugLink: http://bugs.launchpad.net/bugs/1531539
Instead of passing request error into the LightNVM modules, incorporate
it into the nvm_rq.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 72d256ecc5d0c8cbcc0bd5c6d983b434df556cb4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: support multiple ppas in nvm_erase_ppa

BugLink: http://bugs.launchpad.net/bugs/1531539
Sometimes a user wants to erase multiple PPAs at the same time. Extend
nvm_erase_ppa to take multiple ppas and number of ppas to be erased.

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 81e681d3f7424fc2f03b6269e15c63131473c98f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: move the pages per block check out of the loop

BugLink: http://bugs.launchpad.net/bugs/1531539
There is no need to check whether the device's pages per block is
beyond rrpc support every time we init a lun; we only need
to check it once before entering the lun init loop.

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 4b79beb4c36d697e940e9f70d72399c71230a418)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: sectors first in ppa list

BugLink: http://bugs.launchpad.net/bugs/1531539
The Westlake controller requires that the PPA list has sectors defined
sequentially. Currently, the PPA list is created with planes first, then
sectors. Change this to sectors first, then planes.
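
One reading of the two orderings, expressed as index formulas into a flat list. This is an interpretation of the description above rather than code from the patch, and the helper names are hypothetical.

```c
#include <stddef.h>

/* Planes first: the plane index varies fastest within the list. */
static size_t idx_planes_first(size_t sec, size_t pl, size_t nr_planes)
{
	return sec * nr_planes + pl;
}

/*
 * Sectors first: the sector index varies fastest, so all sectors of one
 * plane are listed sequentially before moving to the next plane.
 */
static size_t idx_sectors_first(size_t sec, size_t pl, size_t nr_sectors)
{
	return pl * nr_sectors + sec;
}
```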

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 556755e941837ebc4b4859dd7f74f2ed2dd00fc7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: fix locking and mempool in rrpc_lun_gc

BugLink: http://bugs.launchpad.net/bugs/1531539
This patch fixes two issues in rrpc_lun_gc:

1. prio_list is protected by rrpc_lun's lock, not nvm_lun's, so
acquire rlun's lock instead of lun's before operating on the list.

2. We delete the block from prio_list before allocating gcb, but the
gcb allocation may fail. In that case we return without putting the
block back on the list, so it will never be reclaimed. To solve
this issue, delete the block only after the gcb allocation succeeds.
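
The second point can be sketched with a singly linked list and an injectable allocator. All names are hypothetical and locking is left out for brevity.

```c
#include <stddef.h>
#include <stdlib.h>

struct block {
	struct block *next;
};

struct gc_item {
	struct block *blk;
};

/* Allocator that always fails, used to exercise the error path. */
static void *failing_alloc(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Allocate the gc item first and unlink the block only once that has
 * succeeded, so a failed allocation leaves the list untouched.
 */
static struct gc_item *queue_for_gc(struct block **list,
				    void *(*alloc)(size_t))
{
	struct block *blk = *list;
	struct gc_item *gcb;

	if (!blk)
		return NULL;

	gcb = alloc(sizeof(*gcb));
	if (!gcb)
		return NULL;	/* blk is still on the list */

	*list = blk->next;	/* unlink only after success */
	blk->next = NULL;
	gcb->blk = blk;
	return gcb;
}
```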

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit b262924be03d5d2ae735bc9a4b37eb2c613f61f8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: put block back to gc list on its reclaim fail

BugLink: http://bugs.launchpad.net/bugs/1531539
We delete a block from the gc list before reclaiming it, so
put it back on the list if the reclaim fails; otherwise
this block will never be reclaimed and become programmable
again.

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit d0ca798f960ad7d86f5186fe312c131d00563eb7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: check bi_error in gc

BugLink: http://bugs.launchpad.net/bugs/1531539
We should check the completion status of the last I/O before
starting another one.

Signed-off-by: Wenwei Tao <ww.tao0320@gmail.com>
Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 2b11c1b24e50a26d435f1d59955f1268053623b7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

lightnvm: return the get_bb_tbl return value

BugLink: http://bugs.launchpad.net/bugs/1531539
During get_bb_tbl, a callback is used to allow a user-specific scan
function to be called. The callback may return an error, and in that
case, the return value is overridden. However, the callback error is
needed when the fault is a user error and not a kernel error. For
example, when a user tries to initialize the same device twice. The
get_bb_tbl callback should be able to communicate this.
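
A minimal sketch of propagating the callback's result instead of replacing it with a generic error. The function names, the callback signature and the use of -EEXIST are assumptions made for the example.

```c
#include <errno.h>

typedef int (*scan_fn)(void *priv);

/*
 * Hypothetical table walker: whatever the user callback returns is
 * handed back to the caller unchanged.
 */
static int get_bb_tbl_sketch(scan_fn scan, void *priv)
{
	int ret = scan(priv);

	if (ret)
		return ret;	/* keep the callback's own error code */
	return 0;
}

/* Demo callback: refuse to initialize a device that is already set up. */
static int reject_second_init(void *priv)
{
	int *already_initialized = priv;

	return *already_initialized ? -EEXIST : 0;
}
```

A caller can then tell a user mistake, such as a double initialization, apart from an internal failure by looking at the returned code.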

Signed-off-by: Matias Bjørling <m@bjorling.me>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 22513215b83d62a7f5e3494209b69d4d8c266ab8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>