]> git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log
mirror_ubuntu-jammy-kernel.git
7 years agoMerge branch 'mlx' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:44:25 +0000 (14:44 -0500)]
Merge branch 'mlx' into merge-test

7 years agoMerge branch 'hfi1' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:44:08 +0000 (14:44 -0500)]
Merge branch 'hfi1' into merge-test

7 years agoMerge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test
Doug Ledford [Wed, 14 Dec 2016 19:43:14 +0000 (14:43 -0500)]
Merge branches 'chelsio', 'debug-cleanup', 'hns' and 'i40iw' into merge-test

7 years agoIB/mthca: Replace pci_pool_alloc by pci_pool_zalloc
Souptick Joarder [Thu, 1 Dec 2016 18:41:59 +0000 (00:11 +0530)]
IB/mthca: Replace pci_pool_alloc by pci_pool_zalloc

In mthca_create_ah(), pci_pool_alloc() followed by memset will be
replaced by pci_pool_zalloc()

Signed-off-by: Souptick joarder <jrdr.linux@gmail.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5, calc_sq_size(): Make a debug message more informative
Bart Van Assche [Tue, 6 Dec 2016 01:19:52 +0000 (17:19 -0800)]
mlx5, calc_sq_size(): Make a debug message more informative

Make it clear that qp->sq.wqe_cnt is not the number of WQEs.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5: Remove a set-but-not-used variable
Bart Van Assche [Tue, 6 Dec 2016 01:18:27 +0000 (17:18 -0800)]
mlx5: Remove a set-but-not-used variable

This has been detected by building the mlx5 driver with W=1.

Fixes: 1a412fb1caa2 ('net/mlx5: Fixes: 1a412fb1caa2 (IB/mlx5: Modify QP
commands via mlx5 ifc')
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agomlx5: Use { } instead of { 0 } to init struct
Bart Van Assche [Tue, 6 Dec 2016 01:18:08 +0000 (17:18 -0800)]
mlx5: Use { } instead of { 0 } to init struct

Detected by sparse.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Eli Cohen <eli@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Rework special QP creation error path
Bart Van Assche [Mon, 14 Nov 2016 16:44:11 +0000 (08:44 -0800)]
IB/mlx4: Rework special QP creation error path

The special QP creation error path relies on offset_of(struct mlx4_ib_sqp,
qp) == 0. Remove this assumption because that makes the QP creation
code easier to understand.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Only put mmap_info ref if it exists
Jim Foraker [Tue, 1 Nov 2016 20:44:12 +0000 (13:44 -0700)]
IB/rdmavt: Only put mmap_info ref if it exists

rvt_create_qp() creates qp->ip only when a qp creation request comes from
userspace (udata is not NULL).  If we exceed the number of available
queue pairs however, the error path always attempts to put a kref to this
structure.  If the requestor is inside the kernel, this leads to a crash.

We fix this by checking that qp->ip is not NULL before caling kref_put().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Acked-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Handle the kthread worker using the new API
Petr Mladek [Wed, 19 Oct 2016 12:07:20 +0000 (14:07 +0200)]
IB/rdmavt: Handle the kthread worker using the new API

Use the new API to create and destroy the cq kthread worker.
The API hides some implementation details.

In particular, kthread_create_worker() allocates and initializes
struct kthread_worker. It runs the kthread the right way and stores
task_struct into the worker structure. In addition, the *on_cpu()
variant binds the kthread to the given cpu and the related memory
node.

kthread_destroy_worker() flushes all pending works, stops
the kthread and frees the structure.

This patch does not change the existing behavior. Note that we must
use the on_cpu() variant because the function starts the kthread
and it must bind it to the right CPU before waking. The numa node
is associated for given CPU as well.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Avoid queuing work into a destroyed cq kthread worker
Petr Mladek [Wed, 19 Oct 2016 12:07:19 +0000 (14:07 +0200)]
IB/rdmavt: Avoid queuing work into a destroyed cq kthread worker

The memory barrier is not enough to protect queuing works into
a destroyed cq kthread. Just imagine the following situation:

CPU1 CPU2

rvt_cq_enter()
  worker =  cq->rdi->worker;

rvt_cq_exit()
  rdi->worker = NULL;
  smp_wmb();
  kthread_flush_worker(worker);
  kthread_stop(worker->task);
  kfree(worker);

  // nothing queued yet =>
  // nothing flushed and
  // happily stopped and freed

  if (likely(worker)) {
     // true => read before CPU2 acted
     cq->notify = RVT_CQ_NONE;
     cq->triggered++;
     kthread_queue_work(worker, &cq->comptask);

  BANG: worker has been flushed/stopped/freed in the meantime.

This patch solves this by protecting the critical sections by
rdi->n_cqs_lock. It seems that this lock is not much contended
and looks reasonable for this purpose.

One catch is that rvt_cq_enter() might be called from IRQ context.
Therefore we must always take the lock with IRQs disabled to avoid
a possible deadlock.

Signed-off-by: Petr Mladek <pmladek@suse.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: avoid a -Wmaybe-uninitialize warning
Arnd Bergmann [Tue, 25 Oct 2016 16:16:20 +0000 (18:16 +0200)]
IB/mlx4: avoid a -Wmaybe-uninitialize warning

There is an old warning about mlx4_SW2HW_EQ_wrapper on x86:

ethernet/mellanox/mlx4/resource_tracker.c: In function ‘mlx4_SW2HW_EQ_wrapper’:
ethernet/mellanox/mlx4/resource_tracker.c:3071:10: error: ‘eq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

The problem here is that gcc won't track the state of the variable
across a spin_unlock. Moving the assignment out of the lock is
safe here and avoids the warning.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: avoid bogus -Wmaybe-uninitialized warning
Arnd Bergmann [Mon, 24 Oct 2016 20:48:21 +0000 (22:48 +0200)]
IB/mlx5: avoid bogus -Wmaybe-uninitialized warning

We get a false-positive warning in linux-next for the mlx5 driver:

infiniband/hw/mlx5/mr.c: In function ‘mlx5_ib_reg_user_mr’:
infiniband/hw/mlx5/mr.c:1172:5: error: ‘order’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1161:6: note: ‘order’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘ncont’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1160:6: note: ‘ncont’ was declared here
infiniband/hw/mlx5/mr.c:1173:6: error: ‘page_shift’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1158:6: note: ‘page_shift’ was declared here
infiniband/hw/mlx5/mr.c:1143:13: error: ‘npages’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
infiniband/hw/mlx5/mr.c:1159:6: note: ‘npages’ was declared here

I had a trivial workaround for gcc-5 or higher, but that didn't work
on gcc-4.9 unfortunately.

The only way I found to avoid the warnings for gcc-4.9, short of
initializing each of the arguments first was to change the calling
conventions to separate the error code from the umem pointer. This
avoids casting the error codes from one pointer to another incompatible
pointer, and lets gcc figure out when that the data is actually valid
whenever we return successfully.

Acked-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Properly adjust rate limit on QP state transitions
Bodong Wang [Thu, 1 Dec 2016 11:43:16 +0000 (13:43 +0200)]
IB/mlx5: Properly adjust rate limit on QP state transitions

- Add MODIFY_QP_EX CMD to extend modify_qp.
- Rate limit will be updated in the following state transactions: RTR2RTS,
  RTS2RTS. The limit will be removed when SQ is in RST and ERR state.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Extend modify_qp and support packet pacing
Bodong Wang [Thu, 1 Dec 2016 11:43:15 +0000 (13:43 +0200)]
IB/uverbs: Extend modify_qp and support packet pacing

An new uverbs command ib_uverbs_ex_modify_qp is added to support more QP
attributes. User driver should choose to call the legacy/extended API
based on input mask.

IB_USER_LAST_QP_ATTR_MASK is added to indicated the maximum bit position
which supports legacy ib_uverbs_modify_qp.
IB_USER_LEGACY_LAST_QP_ATTR_MASK indicates the maximum bit position
which supports ib_uverbs_ex_modify_qp, the value of this mask should be
updated if new mask is added later.

Along with this change, rate_limit is supported by the extended command,
user driver could use it to control packet packing.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Support rate limit for packet pacing
Bodong Wang [Thu, 1 Dec 2016 11:43:14 +0000 (13:43 +0200)]
IB/core: Support rate limit for packet pacing

Add new member rate_limit to ib_qp_attr which holds the packet pacing rate
in kbps, 0 means unlimited.

IB_QP_RATE_LIMIT is added to ib_attr_mask and could be used by RAW
QPs when changing QP state from RTR to RTS, RTS to RTS.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 packet pacing capabilities when querying device
Bodong Wang [Thu, 1 Dec 2016 11:43:13 +0000 (13:43 +0200)]
IB/mlx5: Report mlx5 packet pacing capabilities when querying device

Enable mlx5 based hardware to report packet pacing capabilities
from kernel to user space. Packet pacing allows to limit the rate to any
number between the maximum and minimum, based on user settings.

The capabilities are exposed to user space through query_device by uhw.
The following capabilities are reported:

1. The maximum and minimum rate limit in kbps supported by packet pacing.
2. Bitmap showing which QP types are supported by packet pacing operation.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Support RAW Ethernet when RoCE is disabled
Or Gerlitz [Sun, 27 Nov 2016 14:51:36 +0000 (16:51 +0200)]
IB/mlx5: Support RAW Ethernet when RoCE is disabled

On some environments, such as certain SRIOV VF configurations, RoCE is
not supported for mlx5 Ethernet ports. Currently, the driver will not
open IB device on that port.

This is problematic, since we do want user-space RAW Ethernet (RAW_PACKET
QPs) functionality to remain in place. For that end, enhance the relevant
driver flows such that we do create a device instance in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Rename RoCE related helpers to reflect being Eth ones
Or Gerlitz [Sun, 27 Nov 2016 14:51:35 +0000 (16:51 +0200)]
IB/mlx5: Rename RoCE related helpers to reflect being Eth ones

This is a pre-step towards having mlx5 IB device also over Eth ports where
RoCE is not supported. We change the roce enable/disable and roce_lag
init/fini function names to have _eth instead of _roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Refactor registration to netdev notifier
Or Gerlitz [Sun, 27 Nov 2016 14:51:34 +0000 (16:51 +0200)]
IB/mlx5: Refactor registration to netdev notifier

Refactor the netdev notifier registration into a small helper function.

This is a pre-step towards having mlx5 IB device over an Ethernet port
which doesn't support RoCE. Also, renamed the de-registration helper
and the new helper as netdev notifier and not roce, to make it clear
this is not only used with roce.

This patch doesn't change any functionality.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Use u64 for UMR length
Maor Gottlieb [Sun, 27 Nov 2016 13:18:22 +0000 (15:18 +0200)]
IB/mlx5: Use u64 for UMR length

The fast_registration length is used to convey length for memory
registrations through UMR which can be of any size up to 2^64.

Change the length type to be u64.

Fixes: 968e78dd9644 ('IB/mlx5: Enhance UMR support to allow partial page table update')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Avoid system crash when enabling many VFs
Eli Cohen [Sun, 27 Nov 2016 13:18:21 +0000 (15:18 +0200)]
IB/mlx5: Avoid system crash when enabling many VFs

When enabling many VFs, the total amount of DMA mappings increase
significantly. This causes DMA allocations to take a lot of time
since they are serialized in the kernel.

As a result the driver enters into fatal condition due to
timeout and the system hangs. To recover from this we disable
MR cache for VFs.

PFs will still have a full cache and VFs cache can be manipulated
as usual after driver load.

Fixes: e126ba97dba9 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Assign SRQ type earlier
Maor Gottlieb [Sun, 27 Nov 2016 13:18:20 +0000 (15:18 +0200)]
IB/mlx5: Assign SRQ type earlier

Move the SRQ type assignment to be before actually using it
in create_srq_user() and in create_srq_kernel() functions.

Fixes: af1ba291c5e4 ('{net, IB}/mlx5: Refactor internal SRQ API')
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Fix out-of-range array index in destroy qp flow
Jack Morgenstein [Sun, 27 Nov 2016 13:18:19 +0000 (15:18 +0200)]
IB/mlx4: Fix out-of-range array index in destroy qp flow

For non-special QPs, the port value becomes non-zero only at the
RESET-to-INIT transition. If the QP has not undergone that transition,
its port number value is still zero.

If such a QP is destroyed before being moved out of the RESET state,
subtracting one from the qp port number results in a negative value.
Using that negative value as an index into the qp1_proxy array
results in an out-of-bounds array reference.

Fix this by testing that the QP type is one that uses qp1_proxy before
using the port number. For special QPs of all types, the port number is
specified at QP creation time.

Fixes: 9433c188915c ("IB/mlx4: Invoke UPDATE_QP for proxy QP1 on MAC changes")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Make create/destroy_ah available to userspace
Moni Shoua [Wed, 23 Nov 2016 06:23:26 +0000 (08:23 +0200)]
IB/mlx5: Make create/destroy_ah available to userspace

Advertise that create_ah and destroy_ah verbs are accessible from
uverbs interface.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Use kernel driver to help userspace create ah
Moni Shoua [Wed, 23 Nov 2016 06:23:25 +0000 (08:23 +0200)]
IB/mlx5: Use kernel driver to help userspace create ah

Resolving a MAC address for a given IP address in userspace is inefficient.
This patch lets mlx5 user driver using the kernel driver to resolve the mac
and get the answer in the private section of the response.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Let create_ah return extended response to user
Moni Shoua [Wed, 23 Nov 2016 06:23:24 +0000 (08:23 +0200)]
IB/core: Let create_ah return extended response to user

Add struct ib_udata to the signature of create_ah callback that is
implemented by IB device drivers. This allows HW drivers to return extra
data to the userspace library.
This patch prepares the ground for mlx5 driver to resolve destination
mac address for a given GID and return it to userspace.
This patch was previously submitted by Knut Omang as a part of the
patch set to support Oracle's Infiniband HCA (SIF).

Signed-off-by: Knut Omang <knut.omang@oracle.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report that device has udata response in create_ah
Moni Shoua [Wed, 23 Nov 2016 06:23:23 +0000 (08:23 +0200)]
IB/mlx5: Report that device has udata response in create_ah

To make mlx5 user driver aware of whether kernel driver returns dmac
in user data response add a new flag that will be returned back to
user-space through alloc_ucontext.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Change ib_resolve_eth_dmac to use it in create AH
Moni Shoua [Wed, 23 Nov 2016 06:23:22 +0000 (08:23 +0200)]
IB/core: Change ib_resolve_eth_dmac to use it in create AH

The function ib_resolve_eth_dmac() requires struct qp_attr * and
qp_attr_mask as parameters while the function might be useful to resolve
dmac for address handles. This patch changes the signature of the
function so it can be used in the flow of creating an address handle.

Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support to match inner packet fields
Moses Reuben [Mon, 14 Nov 2016 17:04:52 +0000 (19:04 +0200)]
IB/mlx5: Add support to match inner packet fields

Add support to match packet fields which are tunneled,
i.e. support matching the header of the inner packet which is the result of
or bit operation of the original header and the IB_FLOW_SPEC_INNER type.

The combination of IB_FLOW_SPEC_INNER | IB_FLOW_SPEC_VXLAN_TUNNEL is not
needed to be checked, because the IB core has this check already.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Introduce inner flow steering
Moses Reuben [Mon, 14 Nov 2016 17:04:51 +0000 (19:04 +0200)]
IB/core: Introduce inner flow steering

For a tunneled packet which contains external and internal headers,
we refer to the external headers as "outer fields" and the internal
headers as "inner fields".

Example of a tunneled packet:

{ L2 | L3 | L4 | tunnel header | L2 | L3 | l4 | data }
  |     |    |         |         |    |    |
{       outer fields           }{ inner fields }

This patch introduces a new flag for flow steering rules
- IB_FLOW_SPEC_INNER - which specifies that the rule applies
to the inner fields, rather than to the outer fields of the packet.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Support Vxlan tunneling specification
Moses Reuben [Mon, 14 Nov 2016 17:04:50 +0000 (19:04 +0200)]
IB/mlx5: Support Vxlan tunneling specification

Add support to receive specific Vxlan packet in ConnectX-4.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/uverbs: Add support for Vxlan protocol
Moses Reuben [Mon, 14 Nov 2016 17:04:49 +0000 (19:04 +0200)]
IB/uverbs: Add support for Vxlan protocol

Add ib_uverbs_flow_spec_tunnel to define the rule to match Vxlan,
the type, size, reserved fields are identical to rest of the protocols,
and are used to identify the spec.
The tunnel id is the vni value of the Vxlan protocol, and it is used
as part of the steering rule, it is limited by the mask.
The steering rule configured on the hardware does a match
according to vni and other protocols.
In the same way as rest of the protocols that we match,
the uniq field's of each protocol are represented on
the val and the mask.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Align structure ib_flow_spec_type
Moses Reuben [Mon, 14 Nov 2016 17:04:48 +0000 (19:04 +0200)]
IB/core: Align structure ib_flow_spec_type

Aligned the structure ib_flow_spec_type indentation,
after adding a new definition.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/core: Add flow spec tunneling support
Moses Reuben [Mon, 14 Nov 2016 17:04:47 +0000 (19:04 +0200)]
IB/core: Add flow spec tunneling support

In order to support tunneling, that can be used by the QP,
both struct ib_flow_spec_tunnel and struct ib_flow_tunnel_filter can be
used to more IP or UDP based tunneling protocols (e.g NVGRE, GRE, etc).

IB_FLOW_SPEC_VXLAN_TUNNEL type flow specification is added to use this
functionality and match specific Vxlan packets.

In similar to IPv6, we check overflow of the vni value by
comparing with the maximum size.

Signed-off-by: Moses Reuben <mosesr@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Add support for CQE compressing
Bodong Wang [Mon, 31 Oct 2016 10:16:45 +0000 (12:16 +0200)]
IB/mlx5: Add support for CQE compressing

CQE compressing reduces PCI overhead by coalescing and compressing
multiple CQEs into a single merged CQE. Successful compressing
improves message rate especially for small packet traffic.

CQE compressing is supported for all 64B CQE formats (with certain
limitations) generated by RQ/Responder or by SQ/Requestor.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 CQE compression caps during query
Bodong Wang [Mon, 31 Oct 2016 10:16:44 +0000 (12:16 +0200)]
IB/mlx5: Report mlx5 CQE compression caps during query

The capabilities include:
- Max number of compressed and aggregated CQEs in a single session,
  while zero means unsupported.
- For Responder, there are two formats of mini CQE: mini CQE with Rx
  hash and mini CQE with checksum. They're mutual exclusive.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx5: Report mlx5 multi packet WQE caps during query
Bodong Wang [Mon, 31 Oct 2016 10:15:21 +0000 (12:15 +0200)]
IB/mlx5: Report mlx5 multi packet WQE caps during query

The capabilities whether hardware support multi packet WQE or not is
exposed to user space through query_device by uhw.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agonet/mlx5: Report multi packet WQE capabilities
Leon Romanovsky [Mon, 31 Oct 2016 10:15:20 +0000 (12:15 +0200)]
net/mlx5: Report multi packet WQE capabilities

Multi packet WQE enables sending multiple fix sized packets
using a single WQE. The exposed field reports such HW support.

Signed-off-by: Bodong Wang <bodong@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rxe: Increase max number of completions to 32k
Yonatan Cohen [Wed, 16 Nov 2016 08:39:16 +0000 (10:39 +0200)]
IB/rxe: Increase max number of completions to 32k

Increase limit of max CQE from 8K to 32K to allow demanding
applications to work over SoftRoCE with same configuration
as most RoCEv2 HW vendors have.

Fixes: 8700e3e7c485 ("Soft RoCE driver")
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: Check if GRH is available before using it
Eran Ben Elisha [Thu, 10 Nov 2016 09:31:01 +0000 (11:31 +0200)]
IB/mlx4: Check if GRH is available before using it

Before reading GRH attributes, need to make sure AH contains GRH,
and in addition, initialize GID type.

Fixes: dbf727de7440 ('IB/core: Use GID table in AH creation and dmac resolution')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs
Eran Ben Elisha [Thu, 10 Nov 2016 09:31:00 +0000 (11:31 +0200)]
IB/mlx4: When no DMFS for IPoIB, don't allow NET_IF QPs

According to the firmware spec, FLOW_STEERING_IB_UC_QP_RANGE command is
supported only if dmfs_ipoib bit is set.

If it isn't set we want to ensure allocating NET_IF QPs fail. We do so
by filling out the allocation bitmap. By thus, the NET_IF QPs allocating
function won't find any free QP and will fail.

Fixes: c1c98501121e ('IB/mlx4: Add support for steerable IB UD QPs')
Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Reorganize structures to align with HW capabilities
Henry Orosco [Tue, 6 Dec 2016 22:16:20 +0000 (16:16 -0600)]
i40iw: Reorganize structures to align with HW capabilities

Some resources are incorrectly organized and at odds with
HW capabilities. Specifically, ILQ, IEQ, QPs, MSS, QOS
and statistics belong in a VSI.

Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix incorrect check for error
Mustafa Ismail [Tue, 6 Dec 2016 21:49:35 +0000 (15:49 -0600)]
i40iw: Fix incorrect check for error

In i40iw_ieq_handle_partial() the check for !status is incorrect.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Assign MSS only when it is a new MTU
Mustafa Ismail [Tue, 6 Dec 2016 21:49:34 +0000 (15:49 -0600)]
i40iw: Assign MSS only when it is a new MTU

Currently we are changing the MSS regardless of whether
there is a change or not in MTU. Fix to make the
assignment of MSS dependent on an MTU change.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix race condition in terminate timer's handler
Shiraz Saleem [Tue, 6 Dec 2016 21:49:33 +0000 (15:49 -0600)]
i40iw: Fix race condition in terminate timer's handler

Add a QP reference when terminate timer is started to ensure
the destroy QP doesn't race ahead to free the QP while it is being
referenced in the terminate timer's handler.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix memory leak in CQP destroy when in reset
Mustafa Ismail [Tue, 6 Dec 2016 21:49:32 +0000 (15:49 -0600)]
i40iw: Fix memory leak in CQP destroy when in reset

On a device close, the control QP (CQP) is destroyed by calling
cqp_destroy which destroys the CQP and frees its SD buffer memory.
However, if the reset flag is true, cqp_destroy is never called and
leads to a memory leak on SD buffer memory. Fix this by always calling
cqp_destroy, on device close, regardless of reset. The exception to this
when CQP create fails. In this case, the SD buffer memory is already
freed on an error check and there is no need to call cqp_destroy.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix QP flush to not hang on empty queues or failure
Shiraz Saleem [Tue, 6 Dec 2016 21:49:31 +0000 (15:49 -0600)]
i40iw: Fix QP flush to not hang on empty queues or failure

When flush QP and there are no pending work requests, signal completion
to unblock i40iw_drain_sq and i40iw_drain_rq which are waiting on
completion for iwqp->sq_drained and iwqp->sq_drained respectively.
Also, signal completion if flush QP fails to prevent the drain SQ or RQ
from being blocked indefintely.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix double free of QP
Mustafa Ismail [Tue, 6 Dec 2016 21:49:30 +0000 (15:49 -0600)]
i40iw: Fix double free of QP

A QP can be double freed if i40iw_cm_disconn() is
called while it is currently being freed by
i40iw_rem_ref(). The fix in i40iw_cm_disconn() will
first check if the QP is already freed before
making another request for the QP to be freed.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Use correct src address in memcpy to rdma stats counters
Shiraz Saleem [Fri, 11 Nov 2016 16:55:41 +0000 (10:55 -0600)]
i40iw: Use correct src address in memcpy to rdma stats counters

hw_stats is a pointer to i40_iw_dev_stats struct in i40iw_get_hw_stats().
Use hw_stats and not &hw_stats in the memcpy to copy the i40iw device stats
data into rdma_hw_stats counters.

Fixes: b40f4757daa1 ("IB/core: Make device counter infrastructure dynamic")
Cc: stable@vger.kernel.org # 4.7+
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG
Thomas Huth [Wed, 5 Oct 2016 11:55:38 +0000 (13:55 +0200)]
i40iw: Remove macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG

The macros I40IW_STAG_KEY_FROM_STAG and I40IW_STAG_INDEX_FROM_STAG are
apparently bad - they are using the logical "&&" operation which
does not make sense here. It should have been a bitwise "&" instead.
Since the macros seem to be completely unused, let's simply remove
them so that nobody accidentially uses them in the future. And while
we're at it, also remove the unused macro I40IW_CREATE_STAG.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Define platform_config_table_limits once
Bart Van Assche [Tue, 6 Dec 2016 00:48:11 +0000 (16:48 -0800)]
IB/hfi1: Define platform_config_table_limits once

Defining static data structures in a header file is wrong because
this causes the data structure to be instantiated once in every .c
file it is included in. Hence move the definition of a static
array from a header file into the only .c file in which it is used.

Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
Cc: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: constify mmu_notifier_ops structure
Bhumika Goyal [Sat, 19 Nov 2016 09:47:48 +0000 (15:17 +0530)]
IB/hfi1: constify mmu_notifier_ops structure

Declare the structure mmu_notifier_ops as const as it is only stored in
the ops field of a mmu_notifier structure. The ops field is of type
const struct mmu_notifier_ops *, so mmu_notifier_ops structures having
this property can be declared as const.
Done using coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct mmu_notifier_ops i@p = {...};

@ok1@
identifier r1.i;
position p;
struct mmu_rb_handler handler;
@@
handler.mn.ops=&i@p

@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
static
+const
struct mmu_notifier_ops i={...};

@depends on !bad disable optional_qualifier@
identifier r1.i;
@@
+const
struct mmu_notifier_ops i;

File size before:
   text    data     bss     dec     hex filename
   3566      72      16    3654     e46
drivers/infiniband/hw/hfi1/mmu_rb.o

File size after:
   text    data     bss     dec     hex filename
   3658       0      16    3674     e5a
drivers/infiniband/hw/hfi1/mmu_rb.o

Signed-off-by: Bhumika Goyal <bhumirks@gmail.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt, IB/hfi1, IB/qib: Add inlines for mtu division
Mike Marciniszyn [Thu, 8 Dec 2016 03:34:37 +0000 (19:34 -0800)]
IB/rdmavt, IB/hfi1, IB/qib: Add inlines for mtu division

Add rvt_div_round_up_mtu() and rvt_div_mtu() routines to
do the computation based on the pmtu and the log_pmtu.

Change divides in qib, hfi1 to use the new inlines.

Reviewed-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1,IB/qib: use rvt swqe mr deref helper
Mike Marciniszyn [Thu, 8 Dec 2016 03:34:31 +0000 (19:34 -0800)]
IB/hfi1,IB/qib: use rvt swqe mr deref helper

Convert to use new swqe put routine.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Add swqe mr deref helper
Mike Marciniszyn [Thu, 8 Dec 2016 03:34:25 +0000 (19:34 -0800)]
IB/rdmavt: Add swqe mr deref helper

Add a helper to release mr references held by
an swqe.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Avoid credit return allocation for cpu-less NUMA nodes
Harish Chegondi [Thu, 8 Dec 2016 03:34:19 +0000 (19:34 -0800)]
IB/hfi1: Avoid credit return allocation for cpu-less NUMA nodes

Do not allocate credit return base and DMA memory for
NUMA nodes without CPUs.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1,IB/qib: Use new send completion helper
Mike Marciniszyn [Thu, 8 Dec 2016 03:34:12 +0000 (19:34 -0800)]
IB/hfi1,IB/qib: Use new send completion helper

Convert cq completion returns in both rdmavt drivers
to use the new helper.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Add a send completion helper
Mike Marciniszyn [Thu, 8 Dec 2016 03:34:06 +0000 (19:34 -0800)]
IB/rdmavt: Add a send completion helper

This is for use by client drivers to drive
send completions into a CQ.

A new exported table allows for the mapping
of ib_wr_opcode into a ib_wc_opcode.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/qib: Use standard refcount wrapper for QPs
Sebastian Sanchez [Thu, 8 Dec 2016 03:34:00 +0000 (19:34 -0800)]
IB/qib: Use standard refcount wrapper for QPs

Use the standard driver wrapper for QP reference counters.
This makes the code more maintainable.

Fixes: Commit 4d6f85c3fa55 ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use reference count wrapper for MRs
Sebastian Sanchez [Thu, 8 Dec 2016 03:33:53 +0000 (19:33 -0800)]
IB/hfi1: Use reference count wrapper for MRs

Some parts of the code don't use the standard driver
wrapper for memory region reference counters. Use the
standard driver wrapper throughout the code.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Replace qp->refcount release code with standard driver wrapper
Sebastian Sanchez [Thu, 8 Dec 2016 03:33:46 +0000 (19:33 -0800)]
IB/hfi1: Replace qp->refcount release code with standard driver wrapper

Some parts of the code don't use the standard release
wrapper rvt_put_qp() for decrementing and testing
the refcount to then try to use a resource.
Replace this code with the standard driver wrapper.

Fixes: Commit 4d6f85c3fa55 ("IB/rdmavt, IB/qib, IB/hfi1: Use new QP put get routines")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Preserve external device completed bit
Dean Luick [Thu, 8 Dec 2016 03:33:40 +0000 (19:33 -0800)]
IB/hfi1: Preserve external device completed bit

The driver should not change the external device request
completed bit when not actually doing an external device
request.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove critical section gap in sc_buffer_alloc()
Sebastian Sanchez [Thu, 8 Dec 2016 03:33:33 +0000 (19:33 -0800)]
IB/hfi1: Remove critical section gap in sc_buffer_alloc()

In sc_buffer_alloc(), the sc->alloc_lock is released
before calling sc_release_update(), and it is reacquired
after the function call. This causes CPU lock trading.
Fix it by not dropping the lock before calling
sc_release_update().

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove usage of qp->s_cur_sge
Mitko Haralanov [Thu, 8 Dec 2016 03:33:27 +0000 (19:33 -0800)]
IB/hfi1: Remove usage of qp->s_cur_sge

The s_cur_sge field in the qp structure holds a pointer to the
SGE of the currently processed WQE. It assumes the protection
of the RVT_S_BUSY flag to prevent the changing of this field
while the send engine is using it. This scheme works as long
as there is only one instance of the send engine running at a
time.

Scaling of the send engine to multiple cores would break this
assumption as there could be multiple instances of the send engine
running on different CPUs. This opens a window where the QP's
RVT_S_BUSY flag is not set but the send engine is still running.

To prevent accidental changing of the s_cur_sge pointer, the QP's
dependence on it is removed. The SGE pointer is now stored in the
verbs_txreq, which is a per-packet data structure. This ensures
that each individual packet has it's own pointer, which is setup
while the RVT_S_BUSY flag is set.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Add trace of MR segs
Mike Marciniszyn [Thu, 8 Dec 2016 03:33:20 +0000 (19:33 -0800)]
IB/rdmavt: Add trace of MR segs

Add tracing of MR segment information.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Add special setting for low power AOC
Dean Luick [Thu, 8 Dec 2016 03:33:13 +0000 (19:33 -0800)]
IB/hfi1: Add special setting for low power AOC

Low power QSFP AOC cables require a different SerDes
Tx PLL bandwidth setting than the default.  The
8051 firmware does not know the details, so the driver
needs to tell the firmware through a special setting.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove definition of unused hfi1_affinity struct
Tadeusz Struk [Thu, 8 Dec 2016 03:33:06 +0000 (19:33 -0800)]
IB/hfi1: Remove definition of unused hfi1_affinity struct

The struct hfi1_affinity is not used anymore.
We use the struct hfi1_affinity_node and hfi1_affinity_node_list
instead.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Remove dependence on qp->s_cur_size
Don Hiatt [Thu, 8 Dec 2016 03:33:00 +0000 (19:33 -0800)]
IB/hfi1: Remove dependence on qp->s_cur_size

The qp->s_cur_size field assumes that the S_BUSY bit protects
the field from modification after the slock is dropped. Scaling the
send engine to multiple cores would break that assumption.

Correct the issue by carrying the payload size in the txreq structure.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Show statistics counters under IB stats interface
Jianxin Xiong [Thu, 8 Dec 2016 03:32:53 +0000 (19:32 -0800)]
IB/hfi1: Show statistics counters under IB stats interface

Previously tools like hfi1stats had to access these counters through
debugfs, which often caused permission issue for non-root users. It is
not always acceptable to change the debugfs mounting permission due
to security concerns. When exposed under the IB stats interface, the
counters are universally readable by default.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/rdmavt: Fix trace hierarchy
Dennis Dalessandro [Thu, 8 Dec 2016 03:32:47 +0000 (19:32 -0800)]
IB/rdmavt: Fix trace hierarchy

Split rdmavt traces into separate files to preserve the original
hierarchy since only one trace sub system may now be defined per header
file.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Disable header suppression for short packets
Jakub Pawlak [Thu, 8 Dec 2016 03:32:41 +0000 (19:32 -0800)]
IB/hfi1: Disable header suppression for short packets

For the received packets with payload less or equal 8DWS
RxDmaDataFifoRdUncErr is not reported. There is set RHF.EccErr
if the header is not suppressed. When such packet is detected
on the send side the header suppression mechanism is disabled
by clearing SH bit in the packet header.

Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jakub Pawlak <jakub.pawlak@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Export 8051 memory and LCB registers via debugfs
Dean Luick [Thu, 8 Dec 2016 03:32:34 +0000 (19:32 -0800)]
IB/hfi1: Export 8051 memory and LCB registers via debugfs

Both the 8051 memory and LCB register access require multiple
steps and coordination with the driver.  This cannot be safely
done with resource0 alone.

The 8051 memory is exported read-only.  LCB is exported read/write.

Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Use non-atomic __test_and_clear_bit in hot path
Sebastian Sanchez [Thu, 8 Dec 2016 03:32:28 +0000 (19:32 -0800)]
IB/hfi1: Use non-atomic __test_and_clear_bit in hot path

qp->r_aflags is already protected by qp->r_lock, therefore,
test_and_clear_bit() doesn't need to be atomic. Profile
shows this function call is costly.

Change the test_and_clear_bit() call to use the non-atomic
variant.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Fix dc8051 multiple qword memory reads
Dean Luick [Thu, 8 Dec 2016 03:32:22 +0000 (19:32 -0800)]
IB/hfi1: Fix dc8051 multiple qword memory reads

When reading multiple dc8051 data memory locations
at once, the read enabled field must be toggled
at every address change.  Do that by writing only
the address first, then writing the enable.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoIB/hfi1: Read new EPROM format
Dean Luick [Thu, 8 Dec 2016 03:32:15 +0000 (19:32 -0800)]
IB/hfi1: Read new EPROM format

Add the ability to read the new EPROM format.

Reviewed-by: Easwar Hariharan <easwar.hariharan@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoiw_cxgb4: Fix error return code in c4iw_rdev_open()
Wei Yongjun [Sat, 17 Sep 2016 00:41:37 +0000 (00:41 +0000)]
iw_cxgb4: Fix error return code in c4iw_rdev_open()

Fix to return error code -ENOMEM from the __get_free_page() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 05eb23893c2c ("cxgb4/iw_cxgb4: Doorbell Drop Avoidance Bug Fixes")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Add request for reset on CQP timeout
Henry Orosco [Wed, 30 Nov 2016 21:14:15 +0000 (15:14 -0600)]
i40iw: Add request for reset on CQP timeout

When CQP times out, send a request to LAN driver for reset.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Code cleanup, remove check of PBLE pages
Henry Orosco [Wed, 30 Nov 2016 21:13:47 +0000 (15:13 -0600)]
i40iw: Code cleanup, remove check of PBLE pages

Remove check for zero 'pages' of unallocated pbles calculated in
add_pble_pool(); as it can never be true.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Correctly fail loopback connection if no listener
Shiraz Saleem [Wed, 30 Nov 2016 21:12:35 +0000 (15:12 -0600)]
i40iw: Correctly fail loopback connection if no listener

Fail the connect and return the proper error code if a client
is started with local IP address and there is no corresponding
loopback listener.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fill in IRD value when on connect request
Shiraz Saleem [Wed, 30 Nov 2016 21:12:11 +0000 (15:12 -0600)]
i40iw: Fill in IRD value when on connect request

IRD is not populated on connect request and application is
getting 0 for the value. Fill in the correct value on
connect request.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Set TOS field in IP header
Shiraz Saleem [Wed, 30 Nov 2016 21:09:34 +0000 (15:09 -0600)]
i40iw: Set TOS field in IP header

Set the TOS field in IP header with the value passed in
from application. If there is mismatch between the remote
client's TOS and listener, set the listener Tos to the higher
of the two values.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Add NULL check for ibqp event handler
Shiraz Saleem [Wed, 30 Nov 2016 21:09:07 +0000 (15:09 -0600)]
i40iw: Add NULL check for ibqp event handler

Add NULL check for ibqp event handler before calling it to report
QP events, as it might not initialized.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Faisal Latif <faisal.latif@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Replace list_for_each_entry macro with safe version
Mustafa Ismail [Wed, 30 Nov 2016 21:08:34 +0000 (15:08 -0600)]
i40iw: Replace list_for_each_entry macro with safe version

Use list_for_each_entry_safe macro for the IPv6 addr list
as IPv6 addresses can be deleted while going through the
list.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Add IP addr handling on netdev events
Mustafa Ismail [Wed, 30 Nov 2016 21:07:30 +0000 (15:07 -0600)]
i40iw: Add IP addr handling on netdev events

Disable listeners and disconnect all connected QPs on
a netdev interface down event. On an interface up event,
the listeners are re-enabled.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Add missing cleanup on device close
Mustafa Ismail [Wed, 30 Nov 2016 20:59:26 +0000 (14:59 -0600)]
i40iw: Add missing cleanup on device close

On i40iw device close, disconnect all connected QPs by moving
them to error state; and block further QPs, PDs and CQs from
being created. Additionally, make sure all resources have been
freed before deallocating the ibdev as part of the device close.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Add 2MB page support
Henry Orosco [Wed, 30 Nov 2016 20:57:40 +0000 (14:57 -0600)]
i40iw: Add 2MB page support

Add support to allow each independent memory region to
be configured for 2MB page size in addition to 4KB
page size.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Utilize physically mapped memory regions
Henry Orosco [Wed, 30 Nov 2016 20:56:14 +0000 (14:56 -0600)]
i40iw: Utilize physically mapped memory regions

Add support to use physically mapped WQ's and MR's if determined
that the OS registered user-memory for the region is physically
contiguous. This feature will eliminate the need for unnecessarily
setting up and using PBL's when not required.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix incorrect assignment of SQ head
Henry Orosco [Tue, 22 Nov 2016 15:44:20 +0000 (09:44 -0600)]
i40iw: Fix incorrect assignment of SQ head

The SQ head is incorrectly incremented when the number
of WQEs required is greater than the number available.
The fix is to use the I40IW_RING_MOV_HEAD_BY_COUNT
macro. This checks for the SQ full condition first and
only if SQ has room for the request, then we move the
head appropriately.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove variable flush_code and check to set qp->sq_flush
Henry Orosco [Thu, 10 Nov 2016 04:20:31 +0000 (22:20 -0600)]
i40iw: Remove variable flush_code and check to set qp->sq_flush

The flush_code variable in i40iw_bld_terminate_hdr() is obsolete and
the check to set qp->sq_flush is unreachable. Currently flush code is
populated in setup_term_hdr() and both SQ and RQ are flushed always
as part of the tear down flow.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove check on return from device_init_pestat()
Henry Orosco [Thu, 10 Nov 2016 03:42:26 +0000 (21:42 -0600)]
i40iw: Remove check on return from device_init_pestat()

Remove unnecessary check for return code from
device_init_pestat() and change func to void.

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Use runtime check for IS_ENABLED(CONFIG_IPV6)
Henry Orosco [Thu, 10 Nov 2016 03:34:02 +0000 (21:34 -0600)]
i40iw: Use runtime check for IS_ENABLED(CONFIG_IPV6)

To be consistent, use the runtime check instead of
conditional compile.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Use actual page size
Henry Orosco [Thu, 10 Nov 2016 03:33:32 +0000 (21:33 -0600)]
i40iw: Use actual page size

In i40iw_post_send, use the actual page size instead of
encoded page size. This is to be consistent with the
rest of the file.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove NULL check for cm_node->iwdev
Henry Orosco [Thu, 10 Nov 2016 03:32:25 +0000 (21:32 -0600)]
i40iw: Remove NULL check for cm_node->iwdev

It is not necessary to check cm_node->iwdev in
i40iw_rem_ref_cm_node() as it can never be NULL after
a successful call out of i40iw_make_cm_node().

Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Remove checks for more than 48 bytes inline data
Henry Orosco [Thu, 10 Nov 2016 03:32:03 +0000 (21:32 -0600)]
i40iw: Remove checks for more than 48 bytes inline data

Remove dead code, which isn't executed because we
return error if the data size is greater than 48 bytes.

Inline data size greater than 48 bytes isn't supported
and the maximum WQE size is 64 bytes.

Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Query device accounts for internal rsrc
Henry Orosco [Thu, 10 Nov 2016 03:30:28 +0000 (21:30 -0600)]
i40iw: Query device accounts for internal rsrc

Some resources are consumed internally and not available to the user.
After hw is initialized, figure out how many resources are consumed
and subtract those numbers from the initial max device capability in
i40iw_query_device().

Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Optimize inline data copy
Henry Orosco [Thu, 10 Nov 2016 03:28:02 +0000 (21:28 -0600)]
i40iw: Optimize inline data copy

Use memcpy for inline data copy in sends
and writes instead of byte by byte copy.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Fix for LAN handler removal
Henry Orosco [Thu, 10 Nov 2016 03:27:02 +0000 (21:27 -0600)]
i40iw: Fix for LAN handler removal

If i40iw_open() fails for any reason, the LAN handler
is not being removed. Modify i40iw_deinit_device()
to always remove the handler.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Correct values for max_recv_sge, max_send_sge
Henry Orosco [Thu, 10 Nov 2016 03:26:39 +0000 (21:26 -0600)]
i40iw: Correct values for max_recv_sge, max_send_sge

When creating QPs, ensure init_attr->cap.max_recv_sge
is clipped to MAX_FRAG_COUNT.

Expose MAX_FRAG_COUNT for max_recv_sge and max_send_sge in
i40iw_query_qp().

Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Reviewed-By: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
7 years agoi40iw: Use vector when creating CQs
Henry Orosco [Thu, 10 Nov 2016 03:24:48 +0000 (21:24 -0600)]
i40iw: Use vector when creating CQs

Assign each CEQ vector to a different CPU when possible, then
when creating a CQ, use the vector for the CEQ id. This
allows completion work to be distributed over multiple cores.

Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>