Hence use be16_to_cpu() to convert it to CPU endianness. Compile-tested
only.
Fixes: af808ece5ce9 ("IB/SA: Check dlid before SA agent queries for ClassPortInfo") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Wed, 20 Jun 2018 14:11:39 +0000 (17:11 +0300)]
IB: Improve uverbs_cleanup_ucontext algorithm
Improve uverbs_cleanup_ucontext algorithm to work properly when the
topology graph of the objects cannot be determined at compile time. This
is the case with objects created via the devx interface in mlx5.
Typically uverbs objects must be created in a strict topologically sorted
order, so that LIFO ordering will generally cause them to be freed
properly. There are only a few cases (eg memory windows) where objects can
point to things out of the strict LIFO order.
Instead of using an explicit ordering scheme where the HW destroy is not
allowed to fail, go over the list multiple times and allow the destroy
function to fail. If progress halts then a final, desperate, cleanup is
done before leaking the memory. This indicates a driver bug.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Tue, 26 Jun 2018 22:24:48 +0000 (15:24 -0700)]
IB/srpt: Support HCAs with more than two ports
Since there are adapters that have four ports, increase the size of
the srpt_device.port[] array. This patch avoids that the following
warning is hit with quad port Chelsio adapters:
Reported-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Steve Wise <swise@opengridcomputing.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: <stable@vger.kernel.org> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Sagi Grimberg [Wed, 27 Jun 2018 08:03:45 +0000 (11:03 +0300)]
IB/iser: set can_queue earlier to allow setting higher queue depth
We need to set can_queue earlier than when enabling the scsi host.
in a blk-mq enabled environment, the tagset allocation is taken
from can_queue which cannot be modified later. Also, pass an updated
.can_queue to iscsi_session_setup to have enough iscsi tasks allocated
in the session kfifo.
Reported-by: Karandeep Chahal <karandeepchahal@gmail.com> Signed-off-by: Sagi Grimberg <sagi@grimberg.me> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Vijay Immanuel [Tue, 19 Jun 2018 01:48:56 +0000 (18:48 -0700)]
IB/rxe: don't clear the tx queue on every transfer
Do not call sk_dst_set() on every packet transfer because
that calls sk_tx_queue_clear(), which clears the tx queue.
A QP must stay on the same tx queue to maintain packet order.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Wed, 27 Jun 2018 07:44:26 +0000 (10:44 +0300)]
IB/cm: Remove now useless rcu_lock in dst_fetch_ha
This lock used to be protecting a call to dst_get_neighbour_noref,
however the below commit changed it to dst_neigh_lookup which no longer
requires rcu.
Access to nud_state, neigh_event_send or rdma_copy_addr does not require
RCU, so delete the lock.
Fixes: 02b619555ad6 ("infiniband: Convert dst_fetch_ha() over to dst_neigh_lookup().") Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Jason Gunthorpe [Sun, 24 Jun 2018 13:57:50 +0000 (16:57 +0300)]
IB/mlx4: Create slave AH's directly
Since slave GID's do not exist in the core gid table we can no longer use
the core code to help do this without creating inconsistencies. Directly
create the AH using mlx4 internal APIs.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Leon Romanovsky [Sun, 24 Jun 2018 08:23:47 +0000 (11:23 +0300)]
RDMA/uverbs: Don't overwrite NULL pointer with ZERO_SIZE_PTR
Number of specs is provided by user and in valid case can be equal to zero.
Such argument causes to call to kcalloc() with zero-length request and in
return the ZERO_SIZE_PTR is assigned. This pointer is different from NULL
and makes various if (..) checks to success.
Fixes: b6ba4a9aa59f ("IB/uverbs: Add support for flow counters") Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Mon, 25 Jun 2018 21:21:15 +0000 (15:21 -0600)]
RDMA/uverbs: Check existence of create_flow callback
In the accepted series "Refactor ib_uverbs_write path", we presented the
roadmap to get rid of uverbs_cmd_mask and uverbs_ex_cmd_mask fields in
favor of simple check of function pointer. So let's put NULL check of
create_flow function callback despite the fact that uverbs_ex_cmd_mask
still exists.
Link: https://www.spinics.net/lists/linux-rdma/msg60753.html Suggested-by: Michael J Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Bart Van Assche [Fri, 22 Jun 2018 15:11:17 +0000 (08:11 -0700)]
MAINTAINERS: Update SRP entries
Reflect the acquisition of SanDisk by Western Digital in my e-mail
address. Remove the reference to David Dillow's git tree since SRP patches
are queued by Doug and Jason. Remove the reference to the OpenFabrics
website since the srp_daemon source code has been moved from that website
into the rdma-core project. Add an entry for the SRP target driver.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Bart Van Assche <bart.vanassche@sandisk.com> Cc: David Dillow <dillow@google.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Wed, 13 Jun 2018 17:19:42 +0000 (11:19 -0600)]
IB/usnic: Update with bug fixes from core code
usnic has a modified version of the core codes' ib_umem_get() and
related, and the copy misses many of the bug fixes done over the years:
Commit bc3e53f682d9 ("mm: distinguish between mlocked and pinned pages")
Commit 87773dd56d54 ("IB: ib_umem_release() should decrement mm->pinned_vm
from ib_umem_get")
Commit 8494057ab5e4 ("IB/uverbs: Prevent integer overflow in ib_umem_get
address arithmetic")
Commit 8abaae62f3fd ("IB/core: disallow registering 0-sized memory region")
Commit 66578b0b2f69 ("IB/core: don't disallow registering region starting
at 0x0")
Commit 53376fedb9da ("RDMA/core: not to set page dirty bit if it's already
set.")
Commit 8e907ed48827 ("IB/umem: Use the correct mm during ib_umem_release")
Yishai Hadas [Tue, 19 Jun 2018 07:43:56 +0000 (10:43 +0300)]
IB/mlx4: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.
Specifically, Upon internal error state modify QP to an error state can
be assumed to be success as each in-progress WR going to be flushed in
error in any case as expected by that modify command.
In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.
In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.
As the above depends on scheduling the code takes the relevant locks
and actions to make sure that the completion handler for that WR will
always be called after that the post_send/recv were issued but not in
parallel to the other task that handles the error flow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Tue, 19 Jun 2018 07:43:55 +0000 (10:43 +0300)]
IB/mlx5: Add support for drain SQ & RQ
This patch follows the logic from ib_core but considers the internal
device state upon executing the involved commands.
Specifically,
Upon internal error state modify QP to an error state can be assumed to
be success as each in-progress WR going to be flushed in error in any
case as expected by that modify command.
In addition,
As the drain should never fail the driver makes sure that post_send/recv
will succeed even if the device is already in an internal error state.
As such once the driver will supply the simulated/SW CQEs the CQE for
the drain WR will be handled as well.
In case of an internal error state the CQE for the drain WR may be
completed as part of the main task that handled the error state or by
the task that issued the drain WR.
As the above depends on scheduling the code takes the relevant locks and
actions to make sure that the completion handler for that WR will always
be called after that the post_send/recv were issued but not in parallel
to the other task that handles the error flow.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 19 Jun 2018 07:59:19 +0000 (10:59 +0300)]
IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *'
While processing a path record entry in CM messages the associated GID
attribute is now also supplied.
Currently for RoCE a netdevice's net namespace pointer and ifindex are
stored in path record entry. Both of these fields of the netdev can change
anytime while processing CM messages. Additionally storing net namespace
without holding reference will lead to use-after-free crash. Therefore it
is removed. Netdevice information for RoCE is instead provided via
referenced gid attribute in ib_cm requests.
Such a design leads to a situation where the kernel can crash when the net
pointer becomes invalid. However today it is always initialized to
init_net, which cannot become invalid. In order to support processing
packets in any arbitrary namespace of the received packet, it is necessary
to avoid such conditions.
This patch removes the dependency on the net pointer and ifindex; instead
it will rely on SGID attribute which contains a pointer to netdev.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Parav Pandit [Tue, 19 Jun 2018 07:59:18 +0000 (10:59 +0300)]
IB/cm: Pass the sgid_attr through various events
Make the sgid_attr available along with path information to the event
consumer, this allows the consumer to keep using the same GID table entry
as the event is related to.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Parav Pandit [Tue, 19 Jun 2018 07:59:15 +0000 (10:59 +0300)]
IB: Make ib_init_ah_from_mcmember set sgid_attr
This is really just a CM support function, normally a multicast address
does not have a specific SGID - but the RDMA CM usage model does restrict
things to the netdevice the CM id is bound to, at least for roce case.
Store the selected table entry in the sgid_attr for everything else to
use.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Parav Pandit [Tue, 19 Jun 2018 07:59:14 +0000 (10:59 +0300)]
IB: Make ib_init_ah_attr_from_wc set sgid_attr
The work completion is inspected to determine what dgid table entry was
used to receieve the packet, produces a sgid_attr that matches and sticks
it in the ah_attr.
All callers of this function are now required to release the ah_attr on
success.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Michael J. Ruhl [Wed, 20 Jun 2018 16:43:23 +0000 (09:43 -0700)]
IB/hfi1: Remove INTx support and simplify MSIx usage
The INTx IRQ support does not work for all HF1 IRQ handlers
(specifically the receive data IRQs).
Remove all supporting code for the INTx IRQ.
If the requested MSIx vector request is unsuccessful, do not allow the
driver to continue.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Kamenee Arumugam <kamenee.arumugam@intel.com> Reviewed-by: Sadanand Warrier <sadanand.warrier@intel.com> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:43:14 +0000 (09:43 -0700)]
IB/hfi1: Reorg ctxtdata and rightsize fields
Many fields in ctxtdata are incorrectly sized and the organization of the
fields within the structure is a jumble.
Fix by:
- Correcting oversize fields.
- Putting fields common to all contexts at the top with hot fields
at the top.
- Moving PSM fields to the bottom of the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:43:06 +0000 (09:43 -0700)]
IB/hfi1: Remove caches of chip CSRs
Remove the sizeable cache of the chip sizing CSRs and replace with CSR
reads as needed.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:57 +0000 (09:42 -0700)]
IB/hfi1: Remove unused/writeonly devdata fields
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:49 +0000 (09:42 -0700)]
IB/hfi1: Rightsize ctxt_eager_bufs fields
Fields in this structure are sized excessively based on hardware
limitations and input values.
Fix by reducing fields as appropriate and repositioning to close holes in
the structure.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:40 +0000 (09:42 -0700)]
IB/hfi1: Remove rcvctrl from ctxtdata
It is only ever written.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mike Marciniszyn [Wed, 20 Jun 2018 16:42:31 +0000 (09:42 -0700)]
IB/hfi1: Remove rcvhdrq_size
The usage of this ctxt data field is not hot path and the value can be
computed on demand to cut down the ctxtdata bloat.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Thu, 21 Jun 2018 12:31:25 +0000 (15:31 +0300)]
IB/core: Free GID table entry during GID deletion
If we already hold the table->lock when doing the kref_put it means we are
in a context where it is safe to do the deletion synchronously, with no
need for the work queue.
This helps to eliminate issues when GID change is requested as part of MAC
address change or bonding event change where expectation is to replace the
GID almost immediately.
Fixes: b150c3862d21 ("IB/core: Introduce GID entry reference counts") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Thu, 21 Jun 2018 12:31:24 +0000 (15:31 +0300)]
RDMA/cma: Consider net namespace while leaving multicast group
When sending multicast leave request, consider the net ns in which this
cm_id is created.
Code was duplicated in cma_leave_mc_groups() and rdma_leave_multicast(),
which is now done using a helper function cma_leave_roce_mc_group().
Fixes: bee3c3c91865 ("IB/cma: Join and leave multicast groups with IGMP") Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Move some s_flags defines out of rdmavt and into hfi1 because they are
hfi1 specific and therefore should remain in the driver instead of
bubbling up to rdmavt.
Document device specific ranges in rdmavt and remap
those in hfi1.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Kaike Wan <kaike.wan@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The field is based on a constant that can never change.
Use the define to assign the register instead.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This field should be in ctxtdata to allow for better locality of access by
eliminating a dd dereference.
The new field is now side-by-side with rcvhdrqentsize since the rhf_offset
is a function of the rcvhdrqentsize.
Both fields are now correctly sized as u8.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
IB/hfi1: Move normal functions from hfi1_devdata to const array
The current implementation precludes having receive context specific
packet type receive handlers.
Fix this by adding adding c99 const array for the existing handlers and
remove the current 72 bytes of pointers from devdata.
A new pointer in hfi1_ctxtdata will point to the const array.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 17 Jun 2018 10:00:05 +0000 (13:00 +0300)]
IB/mlx5: Add DEVX query EQN support
Return the matching device EQN for a given user vector number via the
DEVX interface.
Note:
EQs are owned by the kernel and shared by all user processes.
Basically, a user CQ can point to any EQ.
The kernel doesn't enforce any such limitation today either.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 17 Jun 2018 10:00:04 +0000 (13:00 +0300)]
IB/mlx5: Add DEVX support for memory registration
Add support to register a memory with the firmware via the DEVX
interface.
The driver translates a given user address to ib_umem then it will
register the physical addresses with the firmware and get a unique id
for this registration to be used for this virtual address.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 17 Jun 2018 10:00:03 +0000 (13:00 +0300)]
IB/mlx5: Add support for DEVX query UAR
Return a device UAR index for a given user index via the DEVX interface.
Security note:
The hardware protection mechanism works like this: Each device object that
is subject to UAR doorbells (QP/SQ/CQ) gets a UAR ID (called uar_page in
the device specification manual) upon its creation. Then upon doorbell,
hardware fetches the object context for which the doorbell was rang, and
validates that the UAR through which the DB was rang matches the UAR ID
of the object.
If no match the doorbell is silently ignored by the hardware. Of
course, the user cannot ring a doorbell on a UAR that was not mapped to
it.
Now in devx, as the devx kernel does not manipulate the QP/SQ/CQ command
mailboxes (except tagging them with UID), we expose to the user its UAR
ID, so it can embed it in these objects in the expected specification
format. So the only thing the user can do is hurt itself by creating a
QP/SQ/CQ with a UAR ID other than his, and then in this case other users
may ring a doorbell on its objects.
The consequence of that will be that another user can schedule a QP/SQ
of the buggy user for execution (just insert it to the hardware schedule
queue or arm its CQ for event generation), no further harm is expected.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Yishai Hadas [Sun, 17 Jun 2018 09:59:57 +0000 (12:59 +0300)]
IB/mlx5: Introduce DEVX
Introduce DEVX to enable direct device commands in downstream patches
from this series.
In that mode of work the firmware manages the isolation between
processes' resources and as such a DEVX user id is created and assigned
to the given user context upon allocation request.
A capability check is done to make sure that this feature is really
supported by the firmware prior to creating the DEVX user id.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Sun, 17 Jun 2018 09:59:54 +0000 (12:59 +0300)]
IB/uverbs: Allow an empty namespace in ioctl() framework
The ioctl parser framework wrongly assumed that each namespace is
populated. This could lead to NULL dereferences. Fix the parser to
always check that a given namespace indeed exists.
Fixes: fac9658cabb9 ("IB/core: Add new ioctl interface") Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Sun, 17 Jun 2018 09:59:53 +0000 (12:59 +0300)]
IB/uverbs: Add a macro to define a type with no kernel known size
Sometimes the uverbs uAPI doesn't really care about the structure it gets
from user-space. All it wants to do is to allocate enough space and send
it to the hardware/provider driver. Adding a UVERBS_ATTR_MIN_SIZE that
could be used for this scenarios. We use USHRT_MAX as the kernel known
size to bypass any zero validations.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Sun, 17 Jun 2018 09:59:52 +0000 (12:59 +0300)]
IB/uverbs: Add PTR_IN attributes that are allocated/copied automatically
Adding UVERBS_ATTR_SPEC_F_ALLOC_AND_COPY flag to PTR_IN attributes.
By using this flag, the parse automatically allocates and copies the
user-space data. This data is accessible by using uverbs_attr_get_len
and uverbs_attr_get_alloced_ptr inline accessor functions from the
handler.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Yishai Hadas <yishaih@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Sun, 17 Jun 2018 09:59:51 +0000 (12:59 +0300)]
IB/uverbs: Refactor uverbs_finalize_objects
uverbs_finalize_objects is currently used only to commit or abort
objects. Since we want to add automatic allocation/free of PTR_IN
attributes, moving it to uverbs_ioctl.c and renamit it to
uverbs_finalize_attrs.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Steve Wise [Mon, 18 Jun 2018 15:05:26 +0000 (08:05 -0700)]
IB/core: add max_send_sge and max_recv_sge attributes
This patch replaces the ib_device_attr.max_sge with max_send_sge and
max_recv_sge. It allows ulps to take advantage of devices that have very
different send and recv sge depths. For example cxgb4 has a max_recv_sge
of 4, yet a max_send_sge of 16. Splitting out these attributes allows
much more efficient use of the SQ for cxgb4 with ulps that use the RDMA_RW
API. Consider a large RDMA WRITE that has 16 scattergather entries.
With max_sge of 4, the ulp would send 4 WRITE WRs, but with max_sge of
16, it can be done with 1 WRITE WR.
Acked-by: Sagi Grimberg <sagi@grimberg.me> Acked-by: Christoph Hellwig <hch@lst.de> Acked-by: Selvin Xavier <selvin.xavier@broadcom.com> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Vijay Immanuel [Wed, 13 Jun 2018 01:12:05 +0000 (18:12 -0700)]
IB/rxe: support for 802.1q VLAN on the listener
Set the vlan flag and vlan_id field in the wc for rdma_listen()
to work over VLAN. This is required by ib_init_ah_attr_from_wc()
which is called by the CM REQ handler.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Reviewed-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Vijay Immanuel [Thu, 14 Jun 2018 01:48:37 +0000 (18:48 -0700)]
IB/rxe: increase max MR limit
Increase the max MR limit to support more I/O queues
for NVMe over Fabrics hosts.
Signed-off-by: Vijay Immanuel <vijayi@attalasystems.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allocate agent IDs from a global IDR instead of an atomic variable.
This eliminates the possibility of reusing an ID which is already in
use after 4 billion registrations. We limit the assigned ID to be less
than 2^24 as the mlx4 driver uses the most significant byte of the agent
ID to store the slave number. Users unlucky enough to see a collision
between agent numbers and slave numbers see messages like:
mlx4_ib: egress mad has non-null tid msb:1 class:4 slave:0
and the MAD layer stops working.
We look up the agent under protection of the RCU lock, which means we
have to free the agent using kfree_rcu, and only increment the reference
counter if it is not 0.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com> Acked-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Tested-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allow users of the IDR to use the XArray lock for their own
synchronisation purposes. The IDR continues to rely on the caller to
handle locking, but this lets the caller use the lock embedded in the
IDR data structure instead of allocating their own lock.
Signed-off-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Wed, 13 Jun 2018 07:22:09 +0000 (10:22 +0300)]
RDMA: Convert drivers to use the AH's sgid_attr in post_wr paths
For UD the drivers were doing a sgid_index lookup into the cache to get
the attrs, however we can now directly access the same attrs stores in
the ib_ah instead and remove the lookup.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Parav Pandit [Wed, 13 Jun 2018 07:22:07 +0000 (10:22 +0300)]
IB/mlx4: Use GID attribute from ah attribute
While converting GID index from attribute to that of the HCA, GID
attribute is available from the ah_attr. Make use of GID attribute
to simplify the code and also avoid avoid GID query.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Parav Pandit [Wed, 13 Jun 2018 07:22:06 +0000 (10:22 +0300)]
RDMA: Convert drivers to use sgid_attr instead of sgid_index
The core code now ensures that all driver callbacks that receive an
rdma_ah_attrs will have a sgid_attr's pointer if there is a GRH present.
Drivers can use this pointer instead of calling a query function with
sgid_index. This simplifies the drivers and also avoids races where a
gid_index lookup may return different data if it is changed.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Jason Gunthorpe [Wed, 13 Jun 2018 07:22:05 +0000 (10:22 +0300)]
IB{cm, core}: Introduce and use ah_attr copy, move, replace APIs
Introduce AH attribute copy, move and replace APIs to be used by core and
provider drivers.
In CM code flow when ah attribute might be re-initialized twice while
processing incoming request, or initialized once while from path record
while sending out CM requests. Therefore use rdma_move_ah_attr API to
handle such scenarios instead of memcpy().
Provider drivers keeps a copy ah_attr during the lifetime of the ah.
Therefore, use rdma_replace_ah_attr() which conditionally release
reference to old ah_attr and holds reference to new attribute whose
referrence is released when the AH is freed.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Jason Gunthorpe [Wed, 13 Jun 2018 07:22:03 +0000 (10:22 +0300)]
IB/core: Add a sgid_attr pointer to struct rdma_ah_attr
The sgid_attr will ultimately replace the sgid_index in the ah_attr.
This will allow for all layers to have a consistent view of what
gid table entry was selected as processing runs through all stages of the
stack.
This commit introduces the pointer and ensures it is set before calling
any driver callback that includes a struct ah_attr callback, allowing
future patches to adjust both the drivers and the callers to use
sgid_attr instead of sgid_index.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Matthew Wilcox [Fri, 8 Jun 2018 17:42:17 +0000 (10:42 -0700)]
IB/mad: Agent registration is process context only
Document this (it's implicitly true due to sleeping operations already
in use in both registration and deregistration). Use this fact to use
spin_lock_irq instead of spin_lock_irqsave. This improves performance
slightly.
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Jason Gunthorpe [Tue, 12 Jun 2018 02:56:50 +0000 (20:56 -0600)]
IB/rxe: Do not hide uABI stuff in memcpy
struct rxe_global_route and struct ib_global_route are not the same thing
and should not be memcpy'd over each other, do a member by member copy
instead. This allows the layout of the in-kernel struct ib_global_route to
be changed without breaking rxe.
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Shiraz Saleem [Fri, 1 Jun 2018 17:18:36 +0000 (12:18 -0500)]
i40iw: Reorganize acquire/release of locks in i40iw_manage_apbvt
Commit f43c00c04bbf ("i40iw: Extend port reuse support for listeners")
introduces a sparse warning:
include/linux/spinlock.h:365:9: sparse: context imbalance in
'i40iw_manage_apbvt' - unexpected unlock
Fix this by reorganizing the acquire/release of locks in
i40iw_manage_apbvt and add a new function i40iw_cqp_manage_abvpt_cmd
to perform the CQP command. Also, use __clear_bit and __test_and_set_bit
as we do not need atomic versions.
Fixes: f43c00c04bbf ("i40iw: Extend port reuse support for listeners") Suggested-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Leon Romanovsky [Tue, 5 Jun 2018 04:55:02 +0000 (07:55 +0300)]
RDMA/uverbs: Refactor flow_resources_alloc() function
Simplify the flow_resources_alloc() function call by reducing
number of goto statements.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:22 +0000 (08:40 +0300)]
IB: Replace ib_query_gid/ib_get_cached_gid with rdma_query_gid
If the gid_attr argument is NULL then the functions behave identically to
rdma_query_gid. ib_query_gid just calls ib_get_cached_gid, so everything
can be consolidated to one function.
Now that all callers either use rdma_query_gid() or ib_get_cached_gid(),
ib_query_gid() API is removed.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:19 +0000 (08:40 +0300)]
net/smc: Replace ib_query_gid with rdma_get_gid_attr
Push the copy of the gid_attr into the SMC code. This probably doesn't
push it far enough, as it looks like the conn->lgr should potentially hold
the reference for its lifetime.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:17 +0000 (08:40 +0300)]
IB/core: Introduce GID attribute get, put and hold APIs
This patch introduces three APIs, rdma_get_gid_attr(),
rdma_put_gid_attr(), and rdma_hold_gid_attr() which expose the reference
counting for GID table entries to the entire stack. The kref counting is
based on the struct ib_gid_attr pointer
Later patches will convert more cache query function to return struct
ib_gid_attrs.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:16 +0000 (08:40 +0300)]
RDMA: Use GID from the ib_gid_attr during the add_gid() callback
Now that ib_gid_attr contains the GID, make use of that in the add_gid()
callback functions for the provider drivers to simplify the add_gid()
implementations.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:15 +0000 (08:40 +0300)]
IB/core: Introduce GID entry reference counts
In order to be able to expose pointers to the ib_gid_attrs in the GID
table we need to make it so the value of the pointer cannot be
changed. Thus each GID table entry gets a unique piece of kref'd memory
that is written only during initialization and remains constant for its
lifetime.
This eventually will allow the struct ib_gid_attrs to be returned without
copy from many of query the APIs, but it also provides a way to track when
all users of a HW table index go away.
For roce we no longer allow an in-use HW table index to be re-used for a
new an different entry. When a GID table entry needs to be removed it is
hidden from the find API, but remains as a valid HW index and all
ib_gid_attr points remain valid. The HW index is not relased until all
users put the kref.
Later patches will broadly replace the use of the sgid_index integer with
the kref'd structure.
Ultimately this will prevent security problems where the OS changes the
properties of a HW GID table entry while an active user object is still
using the entry.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:14 +0000 (08:40 +0300)]
IB/core: Store default GID property per-table instead of per-entry
There are at max one or two default GIDs for RoCE. Instead of storing
a default GID property for all the GIDs, store default GID indices as
individual bit per table.
This allows a future simplification to get rid of the GID property field.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Parav Pandit [Tue, 5 Jun 2018 05:40:13 +0000 (08:40 +0300)]
IB/core: Do not set the gid type when reserving default entries
When default GIDs are added, their gid type is set by
ib_cache_gid_set_default_gid(). There is no need to set the gid type of a
free GID entry during GID table initialization.
Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Linus Torvalds [Sat, 16 Jun 2018 20:37:55 +0000 (05:37 +0900)]
Merge tag 'for-linus-20180616' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into -rc1. This contains:
- bsg_open vs bsg_unregister race fix (Anatoliy)
- NVMe pull request from Christoph, with fixes for regressions in
this window, FC connect/reconnect path code unification, and a
trace point addition.
- timeout fix (Christoph)
- remove a few unused functions (Christoph)
- blk-mq tag_set reinit fix (Roman)"
* tag 'for-linus-20180616' of git://git.kernel.dk/linux-block:
bsg: fix race of bsg_open and bsg_unregister
block: remov blk_queue_invalidate_tags
nvme-fabrics: fix and refine state checks in __nvmf_check_ready
nvme-fabrics: handle the admin-only case properly in nvmf_check_ready
nvme-fabrics: refactor queue ready check
blk-mq: remove blk_mq_tagset_iter
nvme: remove nvme_reinit_tagset
nvme-fc: fix nulling of queue data on reconnect
nvme-fc: remove reinit_request routine
blk-mq: don't time out requests again that are in the timeout handler
nvme-fc: change controllers first connect to use reconnect path
nvme: don't rely on the changed namespace list log
nvmet: free smart-log buffer after use
nvme-rdma: fix error flow during mapping request data
nvme: add bio remapping tracepoint
nvme: fix NULL pointer dereference in nvme_init_subsystem
blk-mq: reinit q->tag_set_list entry only after grace period
Linus Torvalds [Sat, 16 Jun 2018 20:25:18 +0000 (05:25 +0900)]
Merge tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental
Pull documentation fixes from Mauro Carvalho Chehab:
"This solves a series of broken links for files under Documentation,
and improves a script meant to detect such broken links (see
scripts/documentation-file-ref-check).
The changes on this series are:
- can.rst: fix a footnote reference;
- crypto_engine.rst: Fix two parsing warnings;
- Fix a lot of broken references to Documentation/*;
- improve the scripts/documentation-file-ref-check script, in order
to help detecting/fixing broken references, preventing
false-positives.
After this patch series, only 33 broken references to doc files are
detected by scripts/documentation-file-ref-check"
* tag 'docs-broken-links' of git://linuxtv.org/mchehab/experimental: (26 commits)
fix a series of Documentation/ broken file name references
Documentation: rstFlatTable.py: fix a broken reference
ABI: sysfs-devices-system-cpu: remove a broken reference
devicetree: fix a series of wrong file references
devicetree: fix name of pinctrl-bindings.txt
devicetree: fix some bindings file names
MAINTAINERS: fix location of DT npcm files
MAINTAINERS: fix location of some display DT bindings
kernel-parameters.txt: fix pointers to sound parameters
bindings: nvmem/zii: Fix location of nvmem.txt
docs: Fix more broken references
scripts/documentation-file-ref-check: check tools/*/Documentation
scripts/documentation-file-ref-check: get rid of false-positives
scripts/documentation-file-ref-check: hint: dash or underline
scripts/documentation-file-ref-check: add a fix logic for DT
scripts/documentation-file-ref-check: accept more wildcards at filenames
scripts/documentation-file-ref-check: fix help message
media: max2175: fix location of driver's companion documentation
media: v4l: fix broken video4linux docs locations
media: dvb: point to the location of the old README.dvb-usb file
...
Linus Torvalds [Sat, 16 Jun 2018 20:06:18 +0000 (05:06 +0900)]
Merge tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull fsnotify updates from Jan Kara:
"fsnotify cleanups unifying handling of different watch types.
This is the shortened fsnotify series from Amir with the last five
patches pulled out. Amir has modified those patches to not change
struct inode but obviously it's too late for those to go into this
merge window"
* tag 'fsnotify_for_v4.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
fsnotify: add fsnotify_add_inode_mark() wrappers
fanotify: generalize fanotify_should_send_event()
fsnotify: generalize send_to_group()
fsnotify: generalize iteration of marks by object type
fsnotify: introduce marks iteration helpers
fsnotify: remove redundant arguments to handle_event()
fsnotify: use type id to identify connector object type