Mike Marciniszyn [Tue, 24 May 2016 19:50:23 +0000 (12:50 -0700)]
IB/hfi1: Fix hard lockup due to not using save/restore spin lock
Commit b9b06cb6feda
("IB/hfi1: Fix missing lock/unlock in verbs drain callback")
added a spin lock.
Unfortunately, the new lock code can be called from base level (with
interrupts enabled), and an interrupt that stacks on top of it will
attempt to get the same lock.
Fix by using the flag save/restore spin lock variant
(spin_lock_irqsave()/spin_unlock_irqrestore()).
Cc: stable@vger.kernel.org # 4.6+
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
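As a rough illustration of why the save/restore variant matters, here is a userspace model in which the interrupt-enable state and the lock are plain variables (all names are hypothetical; this is not kernel code). With a plain lock taken at base level, an interrupt wanting the same lock could still run on the CPU; the irqsave model disables interrupts while the lock is held and then restores whatever state the caller had.

```c
#include <stdbool.h>

/* Userspace model only: in the kernel these are per-CPU hardware states. */
static bool irqs_enabled = true;
static bool lock_held = false;

/* Models spin_lock_irqsave(): save the interrupt state, disable, then lock. */
static unsigned long model_lock_irqsave(void)
{
    unsigned long flags = irqs_enabled ? 1UL : 0UL;
    irqs_enabled = false;
    lock_held = true;
    return flags;
}

/* Models spin_unlock_irqrestore(): unlock, then restore the saved state. */
static void model_unlock_irqrestore(unsigned long flags)
{
    lock_held = false;
    irqs_enabled = (flags != 0);
}

/* An interrupt that wants the same lock deadlocks if it can run on this
 * CPU while the lock is held. */
static bool interrupt_would_deadlock(void)
{
    return irqs_enabled && lock_held;
}

/* Models a plain spin_lock() taken at base level: interrupts stay enabled. */
static bool plain_lock_is_safe(void)
{
    irqs_enabled = true;
    lock_held = true;
    bool safe = !interrupt_would_deadlock();
    lock_held = false;
    return safe;
}

/* Enter the locked section from a caller with the given interrupt state.
 * Check that no interrupt can deadlock inside it and that the caller's
 * state is restored unchanged afterwards. */
static bool locked_section_is_safe(bool caller_irqs_enabled)
{
    irqs_enabled = caller_irqs_enabled;
    unsigned long flags = model_lock_irqsave();
    bool safe = !interrupt_would_deadlock();
    model_unlock_irqrestore(flags);
    return safe && irqs_enabled == caller_irqs_enabled;
}
```

Restoring the saved flags, rather than unconditionally re-enabling interrupts, is what keeps the pattern correct when the caller already had interrupts disabled.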
Jianxin Xiong [Tue, 24 May 2016 19:50:10 +0000 (12:50 -0700)]
IB/hfi1, qib: Add ieth to the packet header definitions
A new union member "ieth" (Invalidate Extended Transport Header) is
added to the packet header definition in preparation for supporting
the send with invalidate opcode.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The TODO list for the hfi1 driver was completed during 4.6. In addition,
other objections raised (which go far beyond what was in the TODO list)
have been addressed as well. It is now time to move the driver out of
staging and into the drivers/infiniband sub-tree.
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Do not free hfi1 cdev parent structure early
The deletion of a cdev is not a fence for holding off references to the
structure. The driver attempts to delete the cdev and then proceeds to
free the parent structure, the hfi1_devdata, or dd. This can potentially
lead to a kernel panic in situations where a user has an FD for the cdev
open and the PCI device gets removed. If the user then closes the FD,
there will be a NULL dereference when trying to put the cdev's
kobject.
Fix this by pointing the cdev's kobject.parent at a new kobject embedded
in its parent structure. Also take a reference when the device is opened
and put it back when it is closed.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
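A minimal userspace sketch of the lifetime rule, assuming a plain integer refcount in place of the kobject reference counting the driver actually uses (struct devdata, cdev_open() and the other names are hypothetical): the structure is freed only when the last reference, whether the driver's or an open FD's, is dropped.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a device structure (like hfi1_devdata) whose
 * lifetime is tied to a reference count instead of to cdev deletion. */
struct devdata {
    int refcount;    /* references held by the driver and by open FDs */
};

static int freed_count = 0;   /* lets the example observe when a free happens */

static struct devdata *devdata_alloc(void)
{
    struct devdata *dd = malloc(sizeof(*dd));
    if (dd)
        dd->refcount = 1;     /* the driver's own reference */
    return dd;
}

static void devdata_get(struct devdata *dd) { dd->refcount++; }

/* Free only when the last reference is dropped. */
static void devdata_put(struct devdata *dd)
{
    if (--dd->refcount == 0) {
        free(dd);
        freed_count++;
    }
}

/* open() takes a reference, close() drops it, removal drops the driver's. */
static void cdev_open(struct devdata *dd)  { devdata_get(dd); }
static void cdev_close(struct devdata *dd) { devdata_put(dd); }
static void pci_remove(struct devdata *dd) { devdata_put(dd); }

/* Device removed while an FD is still open: the structure must survive
 * until the FD is closed. */
static bool remove_then_close_is_safe(void)
{
    struct devdata *dd = devdata_alloc();
    if (!dd)
        return false;
    int before = freed_count;
    cdev_open(dd);
    pci_remove(dd);
    bool alive_after_remove = (freed_count == before);
    cdev_close(dd);            /* last reference: freed here */
    return alive_after_remove && freed_count == before + 1;
}
```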
IB/hfi1: Remove write(), use ioctl() for user cmds
Remove the write() handler for user space commands now that ioctl
handling is available. User apps will need to change to use ioctl from
this point forward.
ioctl() is better suited to what user space commands need to do than the
write() interface. Add ioctl definitions for all existing write commands
and the handling for those. The write() interface will be removed in a
follow-on patch.
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The snoop/diag interface would be better served by a more general
implementation, perhaps one usable by other drivers. Go ahead and remove
the code now and get rid of the char dev. We can put the feature back
when we have a more agreeable solution.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Remove the UI char device, which exposes direct access to registers for
user space. This was put in to aid in debugging the hardware. We are
looking into alternative means of providing the same functionality. This
removes another char device from HFI1's footprint.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
hfi1 currently exports a cdev that can be used to target all of the HFIs
in the system. However, there is a problem with this approach in
that the devices could be on different subnets. User space can
figure this out and explicitly tell the driver on which device
to create a context.
Remove the multi-purpose cdev leaving a dedicated cdev for each port.
Also remove the striping capability that is dependent upon the user
choosing the multi-purpose cdev. It is now up to user space to determine
how to stripe contexts.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jianxin Xiong [Thu, 19 May 2016 12:21:57 +0000 (05:21 -0700)]
IB/hfi1: Fix bug that blocks process on exit after port bounce
During the processing of a user SDMA request, if there was an
error before the request counter was increased, the state of
the packet queue could be updated incorrectly, causing the
counter to underflow. As a result, the process could get
stuck later since the counter could never get back to 0.
This patch adds a condition to guard the packet queue update
so that the counter is only decreased if it has been increased
before the error happens.
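The guard can be sketched as a flag recording whether the increment happened. This is a simplified, single-threaded model with hypothetical names (struct pkt_queue, process_request()); the completion is modeled synchronously, and the unguarded mode reproduces the underflow.

```c
#include <stdbool.h>

/* Hypothetical model of a user SDMA packet queue's in-flight counter. */
struct pkt_queue {
    int n_reqs;   /* outstanding requests; the process waits for 0 on exit */
};

/* Process one request. 'fail_early' injects an error before the counter
 * is incremented; 'fail_late' injects one after. 'guarded' selects the
 * fixed error path. Returns 0 on success, -1 on error. */
static int process_request(struct pkt_queue *pq, bool guarded,
                           bool fail_early, bool fail_late)
{
    bool counted = false;     /* did we increment n_reqs? */

    if (fail_early)
        goto err;

    pq->n_reqs++;
    counted = true;

    if (fail_late)
        goto err;

    pq->n_reqs--;             /* completion, modeled synchronously here */
    return 0;

err:
    if (!guarded || counted)  /* the fix: only undo an increment that happened */
        pq->n_reqs--;
    return -1;
}

/* Run one request on a fresh queue and report the counter afterwards. */
static int counter_after(bool guarded, bool fail_early, bool fail_late)
{
    struct pkt_queue pq = { 0 };
    process_request(&pq, guarded, fail_early, fail_late);
    return pq.n_reqs;
}
```

An unguarded early failure leaves the counter at -1, so a wait for it to reach 0 never finishes; the guarded version leaves it at 0 in every case.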
Jubin John [Thu, 19 May 2016 12:21:50 +0000 (05:21 -0700)]
IB/qib: Remove unused qib_7322_intr_msgs[]
Building the qib driver with gcc version 6.1.0 raises the following
build warning:
drivers/infiniband/hw/qib/qib_iba7322.c:1311:39: warning:
'qib_7322_intr_msgs' defined but not used [-Wunused-const-variable=]
static const struct qib_hwerror_msgs qib_7322_intr_msgs[] = {
^~~~~~~~~~~~~~~~~~
Remove the unused qib_7322_intr_msgs[].
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Thu, 19 May 2016 12:21:37 +0000 (05:21 -0700)]
IB/hfi1: Fix sdma_event_names[] build warning
sdma_event_names[] is only used within CONFIG_SDMA_VERBOSITY ifdefs, so
when CONFIG_SDMA_VERBOSITY is disabled, it results in the following
0-day build warning:
>> drivers/infiniband/hw/hfi1/sdma.c:137:27: warning: 'sdma_event_names'
>> defined but not used [-Wunused-const-variable=]
static const char * const sdma_event_names[] = {
^~~~~~~~~~~~~~~~
This occurs on the following compiler:
compiler: gcc-6 (Debian 6.1.1-1) 6.1.1 20160430
For more information check:
https://lists.01.org/pipermail/kbuild-all/2016-May/020060.html
Fix this warning by defining sdma_event_names[] only within the
CONFIG_SDMA_VERBOSITY ifdefs.
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
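The shape of the fix, as a sketch: the array and its only user both sit inside the same preprocessor guard, so a build without CONFIG_SDMA_VERBOSITY never sees an unused static const array. The entries and the event_name() helper are hypothetical placeholders, not the driver's real table.

```c
#include <stddef.h>

/* Uncomment to model a build with CONFIG_SDMA_VERBOSITY enabled. */
/* #define CONFIG_SDMA_VERBOSITY 1 */

#ifdef CONFIG_SDMA_VERBOSITY
/* Defined only when its users are compiled in. */
static const char * const sdma_event_names[] = {
    "example_event_a",   /* hypothetical entries */
    "example_event_b",
};
#endif

/* Returns the event name in verbose builds, or NULL otherwise. */
static const char *event_name(size_t idx)
{
#ifdef CONFIG_SDMA_VERBOSITY
    if (idx < sizeof(sdma_event_names) / sizeof(sdma_event_names[0]))
        return sdma_event_names[idx];
#else
    (void)idx;
#endif
    return NULL;
}
```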
Mitko Haralanov [Thu, 19 May 2016 12:21:18 +0000 (05:21 -0700)]
IB/hfi1: Fix an interval RB node reference count leak
Commit e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which
may cause corruption") introduced a bug which may cause the reference
count of an interval RB node to be leaked in the case where an SDMA
transfer from that node completes at the same time as the node is
being extended.
If a node is being extended, it is first removed from the RB tree
in order to be processed without the risk of an invalidation event
removing the node at the same time.
If an SDMA completion happens during that time, the completion handler
will fail to find the node in the RB tree and, therefore, fail to
correctly decrement its refcount. This leaves the node in the tree and
its pages pinned for the duration of the user process.
To prevent this from happening, the I/O vector holds a reference to the
RB node, which is used during the SDMA completion instead of looking
up the node in the RB tree.
As a side effect, this change also improves performance by avoiding
the RB tree lookup.
Fixes: e88c9271d9f8 ("IB/hfi1: Fix buffer cache corner case which may cause corruption")
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
staging/rdma/hfi1: use RCU_INIT_POINTER() when NULLing.
It is safe to use RCU_INIT_POINTER() to NULL a pointer, instead of
rcu_assign_pointer().
This results in slightly smaller/faster code.
Signed-off-by: Muhammad Falak R Wani <falakreyaz@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Ashutosh Dixit [Thu, 12 May 2016 17:24:00 +0000 (10:24 -0700)]
IB/hfi1: Change hfi1_init loop to preserve error returns
If one iteration of the loop causes an error return and a later iteration
doesn't, the later iteration causes the earlier error condition to be
lost. This could result in driver probe succeeding when it should have
failed. Therefore save off the error return in the loop itself rather than
outside the loop.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
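The loop pattern, sketched with a hypothetical per-device init callback. This version keeps the first error; the essential point is only that a later success cannot clear an earlier failure.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-device init step used only for illustration. */
typedef int (*init_fn)(size_t idx);

/* Run init_one() for every device and keep the first error seen.
 * The error is recorded inside the loop, so a later success does not
 * overwrite it. */
static int init_all(init_fn init_one, size_t ndevs)
{
    int ret = 0;

    for (size_t i = 0; i < ndevs; i++) {
        int r = init_one(i);
        if (r && !ret)
            ret = r;
    }
    return ret;
}

/* Example steps: device 0 fails and the rest succeed, or all succeed. */
static int fail_first(size_t idx) { return idx == 0 ? -ENOMEM : 0; }
static int always_ok(size_t idx)  { (void)idx; return 0; }
```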
Jianxin Xiong [Thu, 12 May 2016 17:23:53 +0000 (10:23 -0700)]
ib_pack.h: Add opcode definition for send with invalidate
The opcodes for "SEND Last with Invalidate" and "SEND Only with
Invalidate" have been defined for RC in IBA Specification Vol 1
since Release 1.2. Add the definitions to the header file in
preparation for supporting these opcodes in rdmavt-based drivers.
Jianxin Xiong [Thu, 12 May 2016 17:23:47 +0000 (10:23 -0700)]
IB/hfi1: Keep SC_USER as the last send context type
SC_USER needs to be the last send context type to ensure other
send context types get their allocation when num_user_contexts
is set to a large number.
This fixes a panic when the module parameter num_user_contexts
is set to 141 or larger.
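A toy model of why the ordering matters, assuming contexts are handed out from a fixed pool in enum order, with each request clamped to what remains. The enum values, counts, and pool size are made up for illustration and do not reflect the driver's real allocation logic.

```c
/* Hypothetical send context types; SC_USER is deliberately last. */
enum sc_type { SC_KERNEL, SC_ACK, SC_USER, SC_MAX };

/* Hand out contexts from a fixed pool in enum order. Each type gets what
 * it asks for, clamped to what is left. */
static void allocate(const int requested[SC_MAX], int pool, int granted[SC_MAX])
{
    for (int t = 0; t < SC_MAX; t++) {
        int n = requested[t] < pool ? requested[t] : pool;
        granted[t] = n;
        pool -= n;
    }
}

/* How many kernel contexts survive an oversized user request. */
static int kernel_granted(int user_request, int pool)
{
    int req[SC_MAX] = { [SC_KERNEL] = 4, [SC_ACK] = 4, [SC_USER] = user_request };
    int got[SC_MAX];
    allocate(req, pool, got);
    return got[SC_KERNEL];
}
```

Because SC_USER is visited last, an oversized user request can only consume what is left over; if it came first, the kernel types could be left with nothing.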
Dean Luick [Thu, 12 May 2016 17:23:41 +0000 (10:23 -0700)]
IB/hfi1: Immediately apply congestion setting MAD
The handling of the congestion setting MAD packet only
saved off the values, waiting for a congestion control
table packet before going active. Instead, immediately
apply the values.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Thu, 12 May 2016 17:23:22 +0000 (10:23 -0700)]
IB/hfi1: Fix hfi_rcvhdr tracepoint
The hfi_rcvhdr tracepoint has the ctxt and eflags switched in the
prototype of the trace event, compared to the args and usage of the
trace function. Fix this by swapping these 2 fields in the trace event
prototype.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Jubin John [Thu, 12 May 2016 17:23:16 +0000 (10:23 -0700)]
IB/hfi1: Remove unnecessary header
While running perftests, there is a significant utilization of the
random number daemon. This is due to the linux/random.h header being
included in qp.c and verbs.c. However, none of the functions from this
header are being used in these files, so remove the unnecessary header.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mitko Haralanov [Thu, 12 May 2016 17:23:09 +0000 (10:23 -0700)]
IB/hfi1: Improve performance of interval RB trees
The interval RB tree management functions use handlers to
store user-specific callbacks for the various tree operations.
These handlers are put on a doubly-linked list. When an RB
tree function is called, the list is searched for the handler
of the particular tree.
The list which holds the handlers is modified very rarely: when
a handler is created and when a handler is removed. On the other
hand, it is searched very often. This is a perfect usage scenario
for RCU.
The result is a much lower overhead of traversing the list, as most
of the time no locking will be required.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Thu, 12 May 2016 17:22:57 +0000 (10:22 -0700)]
IB/hfi1: Fix pio wait counter double increment
The code unconditionally increments the pio wait counter,
making the counter inaccurate and unusable.
Fixes: 14553ca11039 ("staging/rdma/hfi1: Adaptive PIO for short messages")
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The external device configuration was incorrectly shifted to byte 3 of
the 32-bit DC_HOST_COMM_SETTINGS instead of byte 0. This patch corrects
the shift and provides the cable capability information in byte 0.
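For reference, placing a value in byte 0 versus byte 3 of a 32-bit word differs only in the shift amount (0 versus 24 bits). The set_byte() helper below is hypothetical and simply shows both placements; it does not reproduce the driver's field layout.

```c
#include <stdint.h>

/* Hypothetical helper: place an 8-bit value into byte 'n' (0 = least
 * significant, valid range 0..3) of a 32-bit settings word, leaving the
 * other bytes intact. */
static uint32_t set_byte(uint32_t word, unsigned n, uint8_t value)
{
    unsigned shift = 8u * n;
    word &= ~((uint32_t)0xFF << shift);
    return word | ((uint32_t)value << shift);
}
```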
Easwar Hariharan [Thu, 12 May 2016 17:22:39 +0000 (10:22 -0700)]
IB/hfi1: Wait for QSFP modules to initialize
The function level reset in init_chip() and subsequent write of all 1s
to the ASIC_QSFP registers effectively resets attached active and
optical QSFP modules that pay attention to the RESET_N pin.
We subsequently try to access the QSFP management interface to qualify
and tune the channel and fabric SerDes before enough time (2 seconds
per the SFF-8679 spec for QSFP28 modules) has elapsed for the module to
finish initialization. This access fails, which causes the channel
tuning algorithm to fail and prevents us from bringing the link up.
This patch checks the port type prior to beginning channel and SerDes
tuning, and if found to be QSFP, watches for the QSFP initialization
complete interrupt, with a maximum timeout of 2 seconds, to allow the
initialization to complete.
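The wait can be sketched as a bounded poll. This is an assumption for illustration: the patch watches for the initialization-complete interrupt, whereas this model polls a hypothetical ready() callback, with the sleep injected so the logic can be tested without real delays.

```c
#include <stdbool.h>

/* Hypothetical polling model: ready() reports whether the "module init
 * complete" condition has been seen; sleep_ms() waits between checks. */
typedef bool (*ready_fn)(void *ctx);
typedef void (*sleep_fn)(unsigned ms);

/* Returns 0 once ready, or -1 if timeout_ms elapses first. */
static int wait_for_module_init(ready_fn ready, void *ctx, sleep_fn sleep_ms,
                                unsigned timeout_ms, unsigned poll_ms)
{
    if (poll_ms == 0)
        poll_ms = 1;              /* avoid spinning forever on a zero step */

    for (unsigned waited = 0; ; waited += poll_ms) {
        if (ready(ctx))
            return 0;
        if (waited >= timeout_ms)
            return -1;
        sleep_ms(poll_ms);
    }
}

/* Test helpers: a condition that turns true after N failed checks. */
static bool ready_after(void *ctx)
{
    int *remaining = ctx;
    return (*remaining)-- <= 0;
}

static void no_sleep(unsigned ms) { (void)ms; }

/* 2 second budget, checked every 100 ms (simulated). */
static int simulate(int not_ready_checks)
{
    int remaining = not_ready_checks;
    return wait_for_module_init(ready_after, &remaining, no_sleep, 2000, 100);
}
```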
Easwar Hariharan [Thu, 12 May 2016 17:22:33 +0000 (10:22 -0700)]
IB/hfi1: Ignore non-temperature warnings on a downed link
QSFP modules can raise an interrupt to inform us of expected conditions
while the link is down, such as RX power low. Actively ignore these
conditions when the link is down as they only add reporting noise.
Continue reporting conditions that are valid at all times, such as
temperature alarms and warnings.
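A sketch of the filtering rule with made-up flag bits (the real QSFP interrupt flags are laid out differently): when the link is down, only the always-valid temperature conditions survive.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical alarm bits; the real QSFP interrupt flags differ. */
#define QSFP_TEMP_ALARM   (1u << 0)
#define QSFP_TEMP_WARN    (1u << 1)
#define QSFP_RX_POWER_LOW (1u << 2)
#define QSFP_TX_BIAS_LOW  (1u << 3)

/* Conditions that are meaningful whether or not the link is up. */
#define QSFP_ALWAYS_VALID (QSFP_TEMP_ALARM | QSFP_TEMP_WARN)

/* Return the subset of raised conditions worth reporting. */
static uint32_t reportable(uint32_t flags, bool link_up)
{
    return link_up ? flags : (flags & QSFP_ALWAYS_VALID);
}
```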
Add the needed mlx5_ifc hardware bits and structs
for the following features:
* Add vport to steering commands for SRIOV ACL support
* Add mlcr, pcmr and mcia registers for dump module EEPROM
* Add support for FCS, beacon LED and disable_link bits to
  HCA caps
* Add CQE period mode bit in CQ context for CQE-based CQ
  moderation support
* Add UMR SQ bit for fragmented memory registration
* Add needed bits and caps for Striding RQ support
In order to avoid possible future conflicts between the rdma and
net-next trees, we added all expected updates to this file for this
release. If more changes are needed, we plan to submit them only
through one of the subsystems, probably net-next.
All updated bits in this patch will be used later in
the upcoming submissions to the net-next and rdma trees.
Since all srp_map_finish_fr() callers pass a non-zero value as
the fourth argument (sg_nents), the sg_nents == 0 check in that
function can be removed. Add a count == 0 check in the caller
of that function.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/srp: Avoid that mapping failure triggers an infinite loop
The srp_queuecommand() function translates ENOMEM into QUEUE_FULL
which causes the SCSI mid-layer to retry the command. All other
error codes are translated into DID_ERROR which causes the SCSI
command to fail. Return E2BIG if mapping will always fail, to
prevent the SCSI mid-layer from resubmitting the command
forever.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
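The resulting policy can be summarized as a small mapping. The disposition names are hypothetical stand-ins for the SCSI result codes mentioned above; only -ENOMEM is treated as retryable, so a permanent failure such as -E2BIG ends the command instead of looping.

```c
#include <errno.h>

/* Hypothetical dispositions standing in for the SCSI result codes. */
enum disposition {
    DISP_OK,
    DISP_RETRY,    /* like QUEUE_FULL: the mid-layer resubmits later */
    DISP_FAIL,     /* like DID_ERROR: the command fails */
};

static enum disposition map_result(int err)
{
    if (err == 0)
        return DISP_OK;
    if (err == -ENOMEM)
        return DISP_RETRY;      /* transient: a retry may succeed */
    return DISP_FAIL;           /* e.g. -E2BIG: a retry can never succeed */
}
```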
Ensure that req->nmdesc is set correctly in srp_map_sg() if mapping
fails. This avoids a memory descriptor leak on mapping failure.
Report srp_map_sg() failure to the caller.
Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Laurence Oberman <loberman@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Hariprasad S [Wed, 4 May 2016 19:57:36 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: move QP -> ERROR on fatal disconnect errors
In c4iw_ep_disconnect(), if we fail to initiate a close operation, then
move the qp to ERROR to disassociate the ep from the qp. Failure to do
this will leak the ep resources.
Hariprasad S [Wed, 4 May 2016 19:57:35 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: don't use abort_connection in process_mpa_request()
Instead return whether the caller needs to disconnect. This is part of
getting rid of abort_connection() altogether so we properly clean up on
send_abort() failures.
Hariprasad S [Wed, 4 May 2016 19:57:32 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: remove connection abort from process_mpa_reply
Instead, have the caller, rx_data(), handle the close/abort like
it does for process_mpa_request(). This is part of getting rid of
abort_connection() altogether so we properly clean up on send_abort()
failures.
Hariprasad S [Wed, 4 May 2016 19:57:31 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: ensure eps don't get freed while the mutex is held
In rx_data(), with the ep in FPDU_MODE and refcnt=2, if we get unexpected
streaming data, we call c4iw_modify_rc_qp() and move the qp from
RTS -> TERMINATE. In c4iw_modify_rc_qp(), if rdma_fini() returns
an error, a reference to the ep will be dropped (refcnt=1). Then rx_data()
calls c4iw_ep_disconnect(), which starts the close operation.
But if send_halfclose() fails in c4iw_ep_disconnect(), we will call
release_ep_resources(), dropping the ep reference, which reduces the
refcnt to 0 and frees the ep. However, we still hold the ep mutex at that
point, so we have a use-after-free bug. There is a similar issue where
peer_close() calls c4iw_ep_disconnect().
The solution is to add a reference to the ep in c4iw_ep_disconnect()
after acquiring the mutex, and release it after releasing the mutex.
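A single-threaded model of the fix, with hypothetical names: the mutex is a flag, and "freeing" is recorded rather than performed so the bug condition can be observed. Without the extra reference, the ep is freed while its mutex is still held; with it, the final put happens after the unlock.

```c
#include <stdbool.h>

/* Hypothetical endpoint model: 'freed' marks where the real code would
 * free the ep, and 'freed_while_locked' records the bug condition. */
struct ep_model {
    int refcnt;
    bool mutex_held;
    bool freed;
    bool freed_while_locked;
};

static void ep_get(struct ep_model *ep) { ep->refcnt++; }

static void ep_put(struct ep_model *ep)
{
    if (--ep->refcnt == 0) {
        ep->freed = true;
        ep->freed_while_locked = ep->mutex_held;
    }
}

/* Disconnect where the close fails and the resources are released,
 * dropping what may be the last reference. */
static void ep_disconnect(struct ep_model *ep, bool hold_extra_ref)
{
    ep->mutex_held = true;          /* mutex_lock() */
    if (hold_extra_ref)
        ep_get(ep);                 /* the fix: pin ep while the mutex is held */
    ep_put(ep);                     /* release_ep_resources() on failure */
    ep->mutex_held = false;         /* mutex_unlock() */
    if (hold_extra_ref)
        ep_put(ep);                 /* final put happens after the unlock */
}

/* Run one disconnect on an ep holding a single reference. */
static struct ep_model run_disconnect(bool hold_extra_ref)
{
    struct ep_model ep = { .refcnt = 1 };
    ep_disconnect(&ep, hold_extra_ref);
    return ep;
}
```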
Hariprasad S [Wed, 4 May 2016 19:57:30 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: stop ep timer on close failure
In c4iw_ep_disconnect(), if we start the ep timer to begin a close,
but send_halfclose() fails, we need to stop the timer and send a CLOSE
event up to the IWCM before releasing the resources. Otherwise, we can
crash when the ep timer fires if the ep is referencing a previous instance
of the device. This can happen as part of adapter reset/recovery, for
instance.
Hariprasad S [Wed, 4 May 2016 19:57:29 +0000 (01:27 +0530)]
RDMA/iw_cxgb4: release ep resources on accept arp failure
If ARP fails before the CPL_PASS_ACCEPT_RPL is seen by hardware, the tid
will be stuck in SYN_PEND and never released. So create an arp failure
handler specifically for this message to release the endpoint resources.
In pass_accept_rpl_arp_failure(), put the parent endpoint so it will
be freed when destroyed. Also we don't need to call release_tid() here
because _c4iw_free_ep() calls cxgb4_remove_tid() which releases the
hwtid.
If we get an ABORT_REQ_RSS instead of a PASS_ESTABLISH (because the
peer's ACK to our SYN is never received), then put the parent as well
in peer_abort().
Treat accept_cr() failures just like ARP failures: put the parent ep
and release the ep resources, destroying the tid.
The ARP failure handlers are called in an atomic context, so we need to
schedule some of the processing which might block, namely _c4iw_free_ep(),
which needs a mutex. So create a "special" CPL opcode and handler and
schedule it via sched() to be run by process_work() in a blockable context.
Also rework the active open arp failure handler to make use of
release_ep_resources(). This allows both the active and passive arp
failure handlers to use the same deferred cleanup function.
iSER currently has a couple places that set max_sectors in either the host
template or SCSI host, and all of them get it wrong.
This patch instead uses a single assignment that (hopefully) gets it right:
the max_sectors value must be derived from the number of segments in the
FR or FMR structure, but use one segment fewer than that number (multiplied
by the page size), as it has to handle the case of non-aligned I/O.
Without this, I get trivial-to-reproduce hangs when running xfstests
(on XFS) over iSER to Linux targets.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <maxg@mellanox.com>
Acked-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Doug Ledford <dledford@redhat.com>
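As a worked example of the calculation (assuming 4 KiB pages and 512-byte sectors; the driver may additionally cap the value with other limits): with 128 segments the limit is (128 - 1) * 4096 / 512 = 1016 sectors. The hypothetical pages_spanned() helper shows why a segment is held back: a page-sized buffer that starts mid-page touches two pages.

```c
/* Hypothetical helper: the largest transfer, in 512-byte sectors, that
 * still fits in nsegs page-sized segments even if the buffer does not
 * start on a page boundary. One segment is held back for that case. */
static unsigned int max_sectors_for(unsigned int nsegs, unsigned int page_size)
{
    if (nsegs < 2)
        return 0;
    return ((nsegs - 1) * page_size) >> 9;   /* bytes -> 512-byte sectors */
}

/* Number of pages touched by 'len' bytes starting at byte 'offset'. */
static unsigned long pages_spanned(unsigned long offset, unsigned long len,
                                   unsigned long page_size)
{
    if (len == 0)
        return 0;
    unsigned long first = offset / page_size;
    unsigned long last = (offset + len - 1) / page_size;
    return last - first + 1;
}
```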
RDMA/i40iw: Fix for checking if the QP is destroyed
Fix for checking if the QP associated with a completion
has been destroyed while processing CQ elements.
If that is the case, move the CQ head to the next element
and continue completion processing.
The STag index mask is calculated incorrectly, missing
the 14-bit minimum requirement. Add a max macro to use
either the number of MRs or 14 bits in the mask size calculation.
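One plausible reading of the fix, sketched in C: compute the bits needed to index the number of MRs and never go below 14. The helper names and the exact rounding (ceil(log2(n))) are assumptions, not taken from the driver.

```c
#include <stdint.h>

#define STAG_MIN_INDEX_BITS 14u   /* minimum from the commit message */

/* Bits needed to index n entries, i.e. ceil(log2(n)) for n >= 1. */
static unsigned int bits_for(uint32_t n)
{
    unsigned int bits = 0;
    while (bits < 32 && ((uint64_t)1 << bits) < n)
        bits++;
    return bits;
}

/* Hypothetical form of the mask calculation: at least 14 bits wide. */
static uint32_t stag_index_mask(uint32_t max_mr)
{
    unsigned int b = bits_for(max_mr);
    if (b < STAG_MIN_INDEX_BITS)
        b = STAG_MIN_INDEX_BITS;
    return b >= 32 ? UINT32_MAX : (((uint32_t)1 << b) - 1);
}
```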
Ismail, Mustafa [Mon, 18 Apr 2016 15:33:09 +0000 (10:33 -0500)]
RDMA/i40iw: Adding queue drain functions
Add SQ and RQ drain functions, which block until all
previously posted WRs in the specified queue have completed.
A completion object is signaled to unblock the thread
when the last CQE for the corresponding queue is processed.
Ismail, Mustafa [Mon, 18 Apr 2016 15:33:08 +0000 (10:33 -0500)]
RDMA/i40iw: Fix SD calculation for initial HMC creation
Correct SD calculation by using base address returned from commit FPM.
This removes any assumptions about resource ordering and alignment
requirements. Also consolidate SD estimation code into i40iw_est_sd().
Jubin John [Thu, 14 Apr 2016 15:31:53 +0000 (08:31 -0700)]
IB/hfi1: Serialize hrtimer function calls
hrtimer functions do not guarantee serialization, so we extend the
cca_timer_lock to cover the hrtimer_forward_now() in the hrtimer
callback handler and the hrtimer_start() in process_becn(). This
prevents races between these two functions updating the hrtimer state,
which led to problems such as:
kernel BUG at kernel/hrtimer.c:1282!
encountered during validation of the CCA feature.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Thu, 14 Apr 2016 15:31:42 +0000 (08:31 -0700)]
IB/hfi1: Correctly report neighbor link down reason
The code to save the link down reason for reporting to the SMA
was in a location before the actual reason was read. Move the
SMA link down reason assignment to a better location.
Dean Luick [Thu, 14 Apr 2016 15:31:36 +0000 (08:31 -0700)]
IB/hfi1: Use the neighbor link down reason only when valid
The 8051 uses a link down reason to inform the driver why the
link went down. The neighbor planned link down reason code is
only valid when a link down idle message is received by the 8051.
Enhance the explanation on why the link went down.
Dean Luick [Thu, 14 Apr 2016 15:31:30 +0000 (08:31 -0700)]
IB/hfi1: Ignore link downgrade with 0 lanes
Versions of the 8051 firmware < 0.38 may report a link failure
as a link downgrade with a width of 0 followed by a link down
notification. Ignore the zero width downgrade notification -
the driver should follow the link down path.
Dean Luick [Tue, 12 Apr 2016 18:32:06 +0000 (11:32 -0700)]
IB/hfi1: Add RSM rule for user FECN handling
Add a receive side mapping rule to extract expected user packets with
the FECN bit set and place them in an eager buffer. This will allow
user libraries to recognize that a FECN was sent when using header
suppression and respond appropriately.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Tue, 12 Apr 2016 18:31:11 +0000 (11:31 -0700)]
IB/hfi1: Move QOS decision logic into its own function
The decision to use QOS affects other resource allocation.
Move the QOS decision logic into its own function so it can
be called by other interested parties.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Tue, 12 Apr 2016 18:30:51 +0000 (11:30 -0700)]
IB/hfi1: Extract RSM map table init from QOS
Refactor the allocation, tracking, and writing of the RSM map table
into its own set of routines. This will allow the map table to be
passed to multiple users to fill in as needed. Start with the original
user, QOS.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/hfi1: Reduce kernel context pio buffer allocation
The pio buffers were pooled evenly among all kernel contexts and
user contexts. However, the demand from kernel contexts is much
lower than user contexts. This patch reduces the allocation for
kernel contexts and thus makes more credits available for PSM,
helping performance. This is especially useful on high core-count
systems where large numbers of contexts are used.
A new context type SC_VL15 is added to distinguish the context used
for VL15 from other kernel contexts. The reason is that VL15 needs
to support 2 KB packets, while other kernel contexts only need to
support packets up to the size determined by "piothreshold", which
has a default value of 256.
The new allocation method allows triple buffering of the largest pio
packets configured for these contexts. This is sufficient to maintain
verbs performance. The largest pio packet size is 2048B for VL15
and "piothreshold" for other kernel contexts. A cap is applied to
"piothreshold" to avoid excessive buffer allocation.
The special case where SDMA is disabled is handled differently. In
that case, the original pooling allocation is used to better
support the much higher pio traffic.
Notice that if adaptive pio is disabled (piothreshold==0), the pio
buffer size doesn't matter for non-VL15 kernel send contexts when
SDMA is enabled because pio is not used at all on these contexts
and thus the new allocation is still valid. If SDMA is disabled then
pooling allocation is used as mentioned in previous paragraph.
Adjustment is also made to the calculation of the credit return
threshold for the kernel contexts. Instead of being based purely on
the MTU size, a percentage-based threshold is also considered and
the smaller of the two is chosen. This is necessary to ensure
that, with the reduced buffer allocation, credits are returned in
time to avoid unnecessary stalls in the send path.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mark Debbage <mark.debbage@intel.com>
Reviewed-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
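The sizing rules can be sketched as follows, assuming credits are counted in 64-byte PIO blocks and treating the MTU-based threshold and the percentage as inputs (the function names and the example percentage are hypothetical). Triple-buffering a 2048B VL15 packet needs 3 * 32 = 96 blocks, while a 256B "piothreshold" packet needs 3 * 4 = 12.

```c
/* Assumed PIO block size; credits are counted in blocks in this sketch. */
#define PIO_BLOCK_SIZE 64u

static unsigned int div_round_up(unsigned int a, unsigned int b)
{
    return (a + b - 1) / b;
}

/* Triple-buffer the largest PIO packet a kernel context will send. */
static unsigned int kernel_ctx_credits(unsigned int max_pio_packet)
{
    return 3u * div_round_up(max_pio_packet, PIO_BLOCK_SIZE);
}

/* Credit return threshold: the smaller of an MTU-based value and a
 * percentage of the context's credits. */
static unsigned int credit_return_threshold(unsigned int mtu_based,
                                            unsigned int credits,
                                            unsigned int percent)
{
    unsigned int pct_based = credits * percent / 100u;
    return mtu_based < pct_based ? mtu_based : pct_based;
}
```

Taking the minimum matters for the small contexts: with only 12 credits, an MTU-based threshold alone could exceed the whole allocation, while the percentage keeps credits flowing back.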
Jubin John [Tue, 12 Apr 2016 18:30:08 +0000 (11:30 -0700)]
IB/hfi1: Change default number of user contexts
Change the default number of user contexts to the number of real
(non-HT) cpu cores in order to reduce the division of hfi1 hardware
contexts in the case of high core counts with hyper-threading enabled.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
Signed-off-by: Jubin John <jubin.john@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Tue, 12 Apr 2016 18:28:56 +0000 (11:28 -0700)]
IB/hfi1: Remove unreachable code
Remove unreachable code from RC ack handling to fix an
smatch error.
Fixes: 633d27399514 ("staging/rdma/hfi1: use mod_timer when appropriate")
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Dean Luick [Tue, 12 Apr 2016 18:28:36 +0000 (11:28 -0700)]
IB/hfi1: Fix double QSFP resource acquire on cache refresh
The function refresh_qsfp_cache() acquires the i2c chain resource,
but one caller already holds the resource. Change the acquire so
all calls to refresh_qsfp_cache() are covered by the acquire and
remove the acquire within refresh_qsfp_cache().
Dean Luick [Tue, 12 Apr 2016 18:26:21 +0000 (11:26 -0700)]
IB/hfi1: Guard against concurrent I2C access across all chains
The discrete ASIC board design makes the two I2C chains not
independent of each other. That is, only one chain can safely
be accessed at a time. For discrete ASIC devices, adjust the
resource locking so that access to one I2C chain will lock both
of the chains.
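A sketch of the locking rule with hypothetical resource bits: on a discrete board, taking either chain takes both, so a second concurrent acquisition fails, while on an integrated board the two chains remain independent.

```c
#include <stdbool.h>

/* Hypothetical resource bits for the two I2C chains. */
#define I2C_CHAIN0 (1u << 0)
#define I2C_CHAIN1 (1u << 1)

/* Which resource bits must be held to touch one chain. On a discrete
 * board the chains are not independent, so both are taken together. */
static unsigned int i2c_resource_mask(unsigned int chain, bool discrete)
{
    if (discrete)
        return I2C_CHAIN0 | I2C_CHAIN1;
    return chain == 0 ? I2C_CHAIN0 : I2C_CHAIN1;
}

/* Acquire all bits in 'mask' at once, or none of them. */
static bool try_acquire(unsigned int *held, unsigned int mask)
{
    if (*held & mask)
        return false;
    *held |= mask;
    return true;
}

/* Can chain 1 be acquired while chain 0 is already held? */
static bool concurrent_access_allowed(bool discrete)
{
    unsigned int held = 0;
    try_acquire(&held, i2c_resource_mask(0, discrete));
    return try_acquire(&held, i2c_resource_mask(1, discrete));
}
```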
The pre-LNI SerDes and channel tuning algorithm already checks for
module presence assertion for the relevant port types. The extraneous
check removed in this patch blocks link up for port types for which
the module presence assertion is not relevant.