Jia-Ju Bai [Mon, 5 Jun 2017 12:23:40 +0000 (20:23 +0800)]
rxe: Fix a sleep-in-atomic bug in post_one_send
The driver may sleep under a spin lock, and the function call path is:
post_one_send (acquire the lock by spin_lock_irqsave)
init_send_wqe
copy_from_user --> may sleep
There is no flow that makes "qp->is_user" true, and copy_from_user may
cause bug when a non-user pointer is used. So the lines of copy_from_user
and check of "qp->is_user" are removed.
Signed-off-by: Jia-Ju Bai <baijiaju1990@163.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Moni Shoua <monis@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Ram Amrani [Mon, 5 Jun 2017 13:32:27 +0000 (16:32 +0300)]
RDMA/qedr: Add 64KB PAGE_SIZE support to user-space queues
Add 64KB PAGE_SIZE support to user-space CQ, SQ and RQ queues.
De-facto it means that code was added to translate 64KB
pages to smaller 4KB pages that the FW can handle. Otherwise,
the FW would wrap (or jump to the next page) when reaching 4KB
while the user space library will continue on the same large page.
Note that MR code remains as is since the FW supports larger pages
for MRs.
Devesh Sharma [Mon, 22 May 2017 10:15:40 +0000 (03:15 -0700)]
RDMA/bnxt_re: Fix RQE posting logic
This patch adds code to ring RQ Doorbell aggressively
so that the adapter can DMA RQ buffers sooner, instead
of DMA all WQEs in the post_recv WR list together at the
end of the post_recv verb.
Also use spinlock to serialize RQ posting
Somnath Kotur [Mon, 22 May 2017 10:15:36 +0000 (03:15 -0700)]
RDMA/bnxt_re: Add HW workaround for avoiding stall for UD QPs
HW stalls out after 0x800000 WQEs are posted for UD QPs.
To workaround this problem, driver will send a modify_qp cmd
to the HW at around the halfway mark(0x400000) so that FW
can accordingly modify the QP context in the HW to prevent this
stall.
This workaround needs to be done for UD, QP1 and Raw Ethertype
packets. Added a counter to keep track of WQEs posted during post_send.
Selvin Xavier [Mon, 22 May 2017 10:15:34 +0000 (03:15 -0700)]
RDMA/bnxt_re: Dereg MR in FW before freeing the fast_reg_page_list
If the host buffers are freed before destroying MR in HW,
HW could try accessing these buffers. This could cause a host
crash. Fixing the code to avoid this condition.
Eddie Wai [Wed, 14 Jun 2017 10:26:23 +0000 (03:26 -0700)]
RDMA/bnxt_re: HW workarounds for handling specific conditions
This patch implements the following HW workarounds
1. The SQ depth needs to be augmented by 128 + 1 to avoid running
into an Out of order CQE issue
2. Workaround to handle the problem where the HW fast path engine continues
to access DMA memory in retranmission mode even after the WQE has
already been completed. If the HW reports this condition, driver detects
it and posts a Fence WQE. The driver stops reporting the completions
to stack until it receives completion for Fence WQE.
Signed-off-by: Eddie Wai <eddie.wai@broadcom.com> Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Devesh Sharma [Mon, 22 May 2017 10:15:31 +0000 (03:15 -0700)]
RDMA/bnxt_re: Fixing the Control path command and response handling
Fixing a concurrency issue with creq handling. Each caller
was given a globally managed crsq element, which was
accessed outside a lock. This could result in corruption,
if lot of applications are simultaneously issuing Control Path
commands. Now, each caller will provide its own response buffer
and the responses will be copied under a lock.
Also, Fixing the queue full condition check for the CMDQ.
As a part of these changes, the control path code is refactored
to remove the code replication in the response status checking.
Roland Dreier [Tue, 6 Jun 2017 16:22:00 +0000 (09:22 -0700)]
IB/addr: Fix setting source address in addr6_resolve()
Commit eea40b8f624f ("infiniband: call ipv6 route lookup via the stub
interface") introduced a regression in address resolution when connecting
to IPv6 destination addresses. The old code called ip6_route_output(),
while the new code calls ipv6_stub->ipv6_dst_lookup(). The two are almost
the same, except that ipv6_dst_lookup() also calls ip6_route_get_saddr()
if the source address is in6addr_any.
This means that the test of ipv6_addr_any(&fl6.saddr) now never succeeds,
and so we never copy the source address out. This ends up causing
rdma_resolve_addr() to fail, because without a resolved source address,
cma_acquire_dev() will fail to find an RDMA device to use. For me, this
causes connecting to an NVMe over Fabrics target via RoCE / IPv6 to fail.
Fix this by copying out fl6.saddr if ipv6_addr_any() is true for the original
source address passed into addr6_resolve(). We can drop our call to
ipv6_dev_get_saddr() because ipv6_dst_lookup() already does that work.
Fixes: eea40b8f624 ("infiniband: call ipv6 route lookup via the stub interface") Cc: <stable@vger.kernel.org> # 3.12+ Signed-off-by: Roland Dreier <roland@purestorage.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Majd Dibbiny [Sun, 21 May 2017 16:09:54 +0000 (19:09 +0300)]
RDMA/SA: Fix kernel panic in CMA request handler flow
Commit 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB and
ROCE specific fields) moved the service_id to be specific attribute
for IB and OPA SA Path Record, and thus wasn't assigned for RoCE.
This caused to the following kernel panic in the CMA request handler flow:
This patch moves it back to the common SA Path Record structure
and removes the redundant setter and getter.
Tested on Connect-IB and Connect-X4 in Infiniband and RoCE respectively.
Fixes: 9fdca4da4d8c (IB/SA: Split struct sa_path_rec based on IB ands
ROCE specific fields) Signed-off-by: Majd Dibbiny <majd@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Qing Huang [Thu, 18 May 2017 23:33:53 +0000 (16:33 -0700)]
RDMA/core: not to set page dirty bit if it's already set.
This change will optimize kernel memory deregistration operations.
__ib_umem_release() used to call set_page_dirty_lock() against every
writable page in its memory region. Its purpose is to keep data
synced between CPU and DMA device when swapping happens after mem
deregistration ops. Now we choose not to set page dirty bit if it's
already set by kernel prior to calling __ib_umem_release(). This
reduces memory deregistration time by half or even more when we ran
application simulation test program.
Leon Romanovsky [Thu, 18 May 2017 04:40:33 +0000 (07:40 +0300)]
RDMA/uverbs: Declare local function static and add brackets to sizeof
Commit 57520751445b ("IB/SA: Add OPA path record type") introduced
new local function __ib_copy_path_rec_to_user, but didn't limit its
scope. This produces the following sparse warning:
drivers/infiniband/core/uverbs_marshall.c:99:6: warning:
symbol '__ib_copy_path_rec_to_user' was not declared. Should it be
static?
In addition, it used sizeof ... notations instead of sizeof(...), which
is correct in C, but a little bit misleading. Let's change it too.
Fixes: 57520751445b ("IB/SA: Add OPA path record type") Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Israel Rukshin [Thu, 11 May 2017 15:52:36 +0000 (18:52 +0300)]
RDMA/srp: Fix NULL deref at srp_destroy_qp()
If srp_init_qp() fails at srp_create_ch_ib() then ch->send_cq
may be NULL.
Calling directly to ib_destroy_qp() is sufficient because
no work requests were posted on the created qp.
Fixes: 9294000d6d89 ("IB/srp: Drain the send queue before destroying a QP") Cc: <stable@vger.kernel.org> Signed-off-by: Israel Rukshin <israelr@mellanox.com> Reviewed-by: Max Gurtovoy <maxg@mellanox.com> Reviewed-by: Bart van Assche <bart.vanassche@sandisk.com>-- Signed-off-by: Doug Ledford <dledford@redhat.com>
Leon Romanovsky [Sun, 14 May 2017 10:32:06 +0000 (13:32 +0300)]
RDMA/IPoIB: Limit the ipoib_dev_uninit_default scope
ipoib_dev_uninit_default() call is used in ipoib_main.c file only
and it generates the following warning from smatch tool:
drivers/infiniband/ulp/ipoib/ipoib_main.c:1593:6: warning:
symbol 'ipoib_dev_uninit_default' was not declared. Should it
be static?
Honggang Li [Thu, 11 May 2017 12:14:28 +0000 (20:14 +0800)]
RDMA/IPoIB: Replace netdev_priv with ipoib_priv for ipoib_get_link_ksettings
ipoib_dev_init accesses the wrong private data for the IPoIB device.
Commit cd565b4b51e5 (IB/IPoIB: Support acceleration options callbacks)
changed ipoib_priv from being identical to netdev_priv to being an
area inside of, but not the same pointer as, the netdev_priv pointer.
As such, the struct we want is the ipoib_priv area, not the netdev_priv
area, so use the right accessor, otherwise we kernel panic.
Max Gurtovoy [Sun, 28 May 2017 07:53:10 +0000 (10:53 +0300)]
net/mlx5: Define interface bits for fencing UMR wqe
HW can implement UMR wqe re-transmission in various ways.
Thus, add HCA cap to distinguish the needed fence for UMR to make
sure that the wqe wouldn't fail on mkey checks.
Signed-off-by: Max Gurtovoy <maxg@mellanox.com> Acked-by: Leon Romanovsky <leon@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Sun, 21 May 2017 16:11:13 +0000 (19:11 +0300)]
RDMA/mlx4: Fix MAD tunneling when SRIOV is enabled
The cited patch added a type field to structures ib_ah and rdma_ah_attr.
Function mlx4_ib_query_ah() builds an rdma_ah_attr structure from the
data in an mlx4_ib_ah structure (which contains both an ib_ah structure
and an address vector).
For mlx4_ib_query_ah() to work properly, the type field in the contained
ib_ah structure must be set correctly.
In the outgoing MAD tunneling flow, procedure mlx4_ib_multiplex_mad()
paravirtualizes a MAD received from a slave and sends the processed
mad out over the wire. During this processing, it populates an
mlx4_ib_ah structure and calls mlx4_ib_query_ah().
The cited commit overlooked setting the type field in the contained
ib_ah structure before invoking mlx4_ib_query_ah(). As a result, the
type field remained uninitialized, and the rdma_ah_attr structure was
incorrectly built. This resulted in improperly built MADs being sent out
over the wire.
This patch properly initializes the type field in the contained ib_ah
structure before calling mlx4_ib_query_ah(). The rdma_ah_attr structure
is then generated correctly.
Fixes: 44c58487d51a ("IB/core: Define 'ib' and 'roce' rdma_ah_attr types") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Marciniszyn [Fri, 12 May 2017 16:02:00 +0000 (09:02 -0700)]
RDMA/qib,hfi1: Fix MR reference count leak on write with immediate
The handling of IB_RDMA_WRITE_ONLY_WITH_IMMEDIATE will leak a memory
reference when a buffer cannot be allocated for returning the immediate
data.
The issue is that the rkey validation has already occurred and the RNR
nak fails to release the reference that was fruitlessly gotten. The
the peer will send the identical single packet request when its RNR
timer pops.
The fix is to release the held reference prior to the rnr nak exit.
This is the only sequence the requires both rkey validation and the
buffer allocation on the same packet.
Cc: Stable <stable@vger.kernel.org> # 4.7+ Tested-by: Tadeusz Struk <tadeusz.struk@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA/hfi1: Defer setting VL15 credits to link-up interrupt
Keep VL15 credits at 0 during LNI, before link-up. Store
VL15 credits value during verify cap interrupt and set
in after link-up. This addresses an issue where VL15 MAD
packets could be sent by one side of the link before
the other side is ready to receive them.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com> Reviewed-by: Dean Luick <dean.luick@intel.com> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA/hfi1: change PCI bar addr assignments to Linux API functions
The Omni-Path adapter driver fails to load on the ppc64le platform
due to invalid PCI setup.
This patch makes the PCI configuration more robust and will
fix 64 bit addressing for ppc64le.
Signed-off-by: Steven L Roberts <robers97@gmail.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
RDMA/hfi1: fix array termination by appending NULL to attr array
This fixes a kernel panic when loading the hfi driver as a dynamic module.
Signed-off-by: Steven L Roberts <robers97@gmail.com> Reviewed-by: Leon Romanovsky <leon@kernel.org> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Raju Rangoju [Wed, 31 May 2017 06:36:58 +0000 (12:06 +0530)]
RDMA/iw_cxgb4: fix the calculation of ipv6 header size
Take care of ipv6 checks while computing header length for deducing mtu
size of ipv6 servers. Due to the incorrect header length computation for
ipv6 servers, wrong mss is reported to the peer (client).
Raju Rangoju [Mon, 15 May 2017 06:40:39 +0000 (06:40 +0000)]
RDMA/iw_cxgb4: Avoid touch after free error in ARP failure handlers
The patch 761e19a504af (RDMA/iw_cxgb4: Handle return value of
c4iw_ofld_send() in abort_arp_failure()) from May 6, 2016
leads to the following static checker warning:
drivers/infiniband/hw/cxgb4/cm.c:575 abort_arp_failure()
warn: passing freed memory 'skb'
Also fixes skb leak when l2t resolution fails
Fixes: 761e19a504afa55 (RDMA/iw_cxgb4: Handle return value of
c4iw_ofld_send() in abort_arp_failure()) Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Raju Rangoju <rajur@chelsio.com> Reviewed-by: Steve Wise <swise@opengridcomputing.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Shiraz Saleem [Fri, 19 May 2017 21:14:02 +0000 (16:14 -0500)]
RDMA/i40iw: Remove MSS change support
MSS change on active QPs is not supported. Store new MSS
value for new QPs only. Remove code to modify MSS on the fly.
This also resolves a crash on QP modify to QP 0.
Mustafa Ismail [Thu, 11 May 2017 04:32:14 +0000 (23:32 -0500)]
RDMA/i40iw: Fix device initialization error path
Some error paths in i40iw_initialize_dev are doing
additional and unnecessary work before exiting.
Correctly free resources allocated prior to error
and return with correct status code.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intelcom> Signed-off-by: Henry Orosco <henry.orosco@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Linus Torvalds [Sun, 28 May 2017 23:18:27 +0000 (16:18 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC management fixes from Eduardo Valentin:
- fixes to TI SoC driver, Broadcom, qoriq
- small sparse warning fix on thermal core
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
thermal: broadcom: ns-thermal: default on iProc SoCs
ti-soc-thermal: Fix a typo in a comment line
ti-soc-thermal: Delete error messages for failed memory allocations in ti_bandgap_build()
ti-soc-thermal: Use devm_kcalloc() in ti_bandgap_build()
thermal: core: make thermal_emergency_poweroff static
thermal: qoriq: remove useless call for of_thermal_get_trip_points()
Linus Torvalds [Sat, 27 May 2017 16:39:09 +0000 (09:39 -0700)]
Merge tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial fixes from Greg KH:
"Here are some serial and tty fixes for 4.12-rc3. They are a bit bigger
than normal, which is why I had them bake in linux-next for a few
weeks and didn't send them to you for -rc2.
They revert a few of the serdev patches from 4.12-rc1, and bring
things back to how they were in 4.11, to try to make things a bit more
stable there. Rob and Johan both agree that this is the way forward,
so this isn't people squabbling over semantics. Other than that, just
a few minor serial driver fixes that people have had problems with.
All of these have been in linux-next for a few weeks with no reported
issues"
* tag 'tty-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: altera_uart: call iounmap() at driver remove
serial: imx: ensure UCR3 and UFCR are setup correctly
MAINTAINERS/serial: Change maintainer of jsm driver
serial: enable serdev support
tty/serdev: add serdev registration interface
serdev: Restore serdev_device_write_buf for atomic context
serial: core: fix crash in uart_suspend_port
tty: fix port buffer locking
tty: ehv_bytechan: clean up init error handling
serial: ifx6x60: fix use-after-free on module unload
serial: altera_jtaguart: adding iounmap()
serial: exar: Fix stuck MSIs
serial: efm32: Fix parity management in 'efm32_uart_console_get_options()'
serdev: fix tty-port client deregistration
Revert "tty_port: register tty ports with serdev bus"
drivers/tty: 8250: only call fintek_8250_probe when doing port I/O
Linus Torvalds [Sat, 27 May 2017 16:28:34 +0000 (09:28 -0700)]
Merge tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fix running SPU programs on Cell, and a few other minor fixes.
Thanks to Alistair Popple, Jeremy Kerr, Michael Neuling, Nicholas
Piggin"
* tag 'powerpc-4.12-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
powerpc/spufs: Fix hash faults for kernel regions
powerpc: Fix booting P9 hash with CONFIG_PPC_RADIX_MMU=N
powerpc/powernv/npu-dma.c: Fix opal_npu_destroy_context() call
selftests/powerpc: Fix TM resched DSCR test with some compilers
Linus Torvalds [Sat, 27 May 2017 16:17:58 +0000 (09:17 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A series of fixes for X86:
- The final fix for the end-of-stack issue in the unwinder
- Handle non PAT systems gracefully
- Prevent access to uninitiliazed memory
- Move early delay calaibration after basic init
- Fix Kconfig help text
- Fix a cross compile issue
- Unbreak older make versions"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/timers: Move simple_udelay_calibration past init_hypervisor_platform
x86/alternatives: Prevent uninitialized stack byte read in apply_alternatives()
x86/PAT: Fix Xorg regression on CPUs that don't support PAT
x86/watchdog: Fix Kconfig help text file path reference to lockup watchdog documentation
x86/build: Permit building with old make versions
x86/unwind: Add end-of-stack check for ftrace handlers
Revert "x86/entry: Fix the end of the stack for newly forked tasks"
x86/boot: Use CROSS_COMPILE prefix for readelf
Linus Torvalds [Sat, 27 May 2017 16:02:41 +0000 (09:02 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Thomas Gleixner:
- Synchronization of tools and kernel headers
- A series of fixes for perf report addressing various failures:
* Handle invalid maps proper
* Plug a memory leak
* Handle frames and callchain order correctly
- Fixes for handling inlines and children mode
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools/include: Sync kernel ABI headers with tooling headers
perf tools: Put caller above callee in --children mode
perf report: Do not drop last inlined frame
perf report: Always honor callchain order for inlined nodes
perf script: Add --inline option for debugging
perf report: Fix off-by-one for non-activation frames
perf report: Fix memory leak in addr2line when called by addr2inlines
perf report: Don't crash on invalid maps in `-g srcline` mode
Linus Torvalds [Sat, 27 May 2017 15:30:30 +0000 (08:30 -0700)]
Merge tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fixes from Steven Rostedt:
"There's been a few memory issues found with ftrace.
One was simply a memory leak where not all was being freed that should
have been in releasing a file pointer on set_graph_function.
Then Thomas found that the ftrace trampolines were marked for
read/write as well as execute. To shrink the possible attack surface,
he added calls to set them to ro. Which also uncovered some other
issues with freeing module allocated memory that had its permissions
changed.
Kprobes had a similar issue which is fixed and a selftest was added to
trigger that issue again"
* tag 'trace-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
x86/ftrace: Make sure that ftrace trampolines are not RWX
x86/mm/ftrace: Do not bug in early boot on irqs_disabled in cpu_flush_range()
selftests/ftrace: Add a testcase for many kprobe events
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
ftrace: Fix memory leak in ftrace_graph_release()
Thomas Gleixner [Thu, 25 May 2017 08:57:51 +0000 (10:57 +0200)]
x86/ftrace: Make sure that ftrace trampolines are not RWX
ftrace use module_alloc() to allocate trampoline pages. The mapping of
module_alloc() is RWX, which makes sense as the memory is written to right
after allocation. But nothing makes these pages RO after writing to them.
Add proper set_memory_rw/ro() calls to protect the trampolines after
modification.
Interrupts should not be enabled at this early in the boot process. It is
also fine to leave interrupts enabled during this time as there's only one
CPU running, and on_each_cpu() means to only run on the current CPU.
If early_boot_irqs_disabled is set, it is safe to run cpu_flush_range() with
interrupts disabled. Don't trigger a BUG_ON() in that case.
Masami Hiramatsu [Fri, 26 May 2017 04:44:54 +0000 (13:44 +0900)]
selftests/ftrace: Add a testcase for many kprobe events
Add a testcase to test kprobes via ftrace interface
with many concurrent kprobe events.
This tries to add many kprobe events (up to 256) on
kernel functions. To avoid making ftrace-based
kprobes (kprobes on fentry), it skips first N bytes
(on x86 N=5, on ppc or arm N=4) of function entry.
After that, it enables all those events, disable it,
and remove it.
Since the unoptimization buffer reclaiming will
be delayed, after removing events, it will wait
enough time.
Masami Hiramatsu [Thu, 25 May 2017 10:38:17 +0000 (19:38 +0900)]
kprobes/x86: Fix to set RWX bits correctly before releasing trampoline
Fix kprobes to set(recover) RWX bits correctly on trampoline
buffer before releasing it. Releasing readonly page to
module_memfree() crash the kernel.
Without this fix, if kprobes user register a bunch of kprobes
in function body (since kprobes on function entry usually
use ftrace) and unregister it, kernel hits a BUG and crash.
Luis Henriques [Thu, 25 May 2017 15:20:38 +0000 (16:20 +0100)]
ftrace: Fix memory leak in ftrace_graph_release()
ftrace_hash is being kfree'ed in ftrace_graph_release(), however the
->buckets field is not. This results in a memory leak that is easily
captured by kmemleak:
Linus Torvalds [Fri, 26 May 2017 23:45:13 +0000 (16:45 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input layer fixes from Dmitry Torokhov:
"Just a few fixups to a couple of drivers"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: elan_i2c - ignore signals when finishing updating firmware
Input: elan_i2c - clear INT before resetting controller
Input: atmel_mxt_ts - add T100 as a readable object
Input: edt-ft5x06 - increase allowed data range for threshold parameter
1) Fix state pruning in bpf verifier wrt. alignment, from Daniel
Borkmann.
2) Handle non-linear SKBs properly in SCTP ICMP parsing, from Davide
Caratti.
3) Fix bit field definitions for rss_hash_type of descriptors in mlx5
driver, from Jesper Brouer.
4) Defer slave->link updates until bonding is ready to do a full commit
to the new settings, from Nithin Sujir.
5) Properly reference count ipv4 FIB metrics to avoid use after free
situations, from Eric Dumazet and several others including Cong Wang
and Julian Anastasov.
6) Fix races in llc_ui_bind(), from Lin Zhang.
7) Fix regression of ESP UDP encapsulation for TCP packets, from
Steffen Klassert.
8) Fix mdio-octeon driver Kconfig deps, from Randy Dunlap.
9) Fix regression in setting DSCP on ipv6/GRE encapsulation, from Peter
Dawson.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (43 commits)
ipv4: add reference counting to metrics
net: ethernet: ax88796: don't call free_irq without request_irq first
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
sctp: fix ICMP processing if skb is non-linear
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
bonding: Don't update slave->link until ready to commit
test_bpf: Add a couple of tests for BPF_JSGE.
bpf: add various verifier test cases
bpf: fix wrong exposure of map_flags into fdinfo for lpm
bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
bpf: properly reset caller saved regs after helper call and ld_abs/ind
bpf: fix incorrect pruning decision when alignment must be tracked
arp: fixed -Wuninitialized compiler warning
tcp: avoid fastopen API to be used on AF_UNSPEC
net: move somaxconn init from sysctl code
net: fix potential null pointer dereference
geneve: fix fill_info when using collect_metadata
virtio-net: enable TSO/checksum offloads for Q-in-Q vlans
be2net: Fix offload features for Q-in-Q packets
vlan: Fix tcp checksum offloads in Q-in-Q vlans
...
- Fix warnings about unused variables that appeared in -rc1.
- Don't spew errors when bmapping a local format directory
- Fix an off-by-one error in a delalloc eof assertion
- Make fsmap only return inode information for CAP_SYS_ADMIN
- Fix a potential mount time deadlock recovering cow extents
- Fix unaligned memory access in _btree_visit_blocks
- Fix various SEEK_HOLE/SEEK_DATA bugs"
* tag 'xfs-4.12-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
xfs: Fix missed holes in SEEK_HOLE implementation
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs: fix unaligned access in xfs_btree_visit_blocks
xfs: avoid mount-time deadlock in CoW extent recovery
xfs: only return detailed fsmap info if the caller has CAP_SYS_ADMIN
xfs: bad assertion for delalloc an extent that start at i_size
xfs: fix warnings about unused stack variables
xfs: BMAPX shouldn't barf on inline-format directories
xfs: fix indlen accounting error on partial delalloc conversion
2) At the same time run following loop :
while :
do
ip ro add 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
ip ro del 10.246.7.152 dev eth0 src 10.246.7.151 mtu 1500
done
Cong Wang attempted to add back rt->fi in commit 82486aa6f1b9 ("ipv4: restore rt->fi for reference counting")
but this proved to add some issues that were complex to solve.
Instead, I suggested to add a refcount to the metrics themselves,
being a standalone object (in particular, no reference to other objects)
I tried to make this patch as small as possible to ease its backport,
instead of being super clean. Note that we believe that only ipv4 dst
need to take care of the metric refcount. But if this is wrong,
this patch adds the basic infrastructure to extend this to other
families.
Many thanks to Julian Anastasov for reviewing this patch, and Cong Wang
for his efforts on this problem.
Fixes: 2860583fe840 ("ipv4: Kill rt->fi") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Reviewed-by: Julian Anastasov <ja@ssi.bg> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ax88796: don't call free_irq without request_irq first
The function ax_init_dev (which is called only from the driver's .probe
function) calls free_irq in the error path without having requested the
irq in the first place. So drop the free_irq call in the error path.
Fixes: 825a2ff1896e ("AX88796 network driver") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Peter Dawson [Thu, 25 May 2017 20:35:18 +0000 (06:35 +1000)]
ip6_tunnel, ip6_gre: fix setting of DSCP on encapsulated packets
This fix addresses two problems in the way the DSCP field is formulated
on the encapsulating header of IPv6 tunnels.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=195661
1) The IPv6 tunneling code was manipulating the DSCP field of the
encapsulating packet using the 32b flowlabel. Since the flowlabel is
only the lower 20b it was incorrect to assume that the upper 12b
containing the DSCP and ECN fields would remain intact when formulating
the encapsulating header. This fix handles the 'inherit' and
'fixed-value' DSCP cases explicitly using the extant dsfield u8 variable.
2) The use of INET_ECN_encapsulate(0, dsfield) in ip6_tnl_xmit was
incorrect and resulted in the DSCP value always being set to 0.
Commit 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class
is non-0") caused the regression by masking out the flowlabel
which exposed the incorrect handling of the DSCP portion of the
flowlabel in ip6_tunnel and ip6_gre.
Fixes: 90427ef5d2a4 ("ipv6: fix flow labels when the traffic class is non-0") Signed-off-by: Peter Dawson <peter.a.dawson@boeing.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Thu, 25 May 2017 17:14:56 +0000 (19:14 +0200)]
sctp: fix ICMP processing if skb is non-linear
sometimes ICMP replies to INIT chunks are ignored by the client, even if
the encapsulated SCTP headers match an open socket. This happens when the
ICMP packet is carried by a paged skb: use skb_header_pointer() to read
packet contents beyond the SCTP header, so that chunk header and initiate
tag are validated correctly.
v2:
- don't use skb_header_pointer() to read the transport header, since
icmp_socket_deliver() already puts these 8 bytes in the linear area.
- change commit message to make specific reference to INIT chunks.
Signed-off-by: Davide Caratti <dcaratti@redhat.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
linzhang [Thu, 25 May 2017 06:07:18 +0000 (14:07 +0800)]
net: llc: add lock_sock in llc_ui_bind to avoid a race condition
There is a race condition in llc_ui_bind if two or more processes/threads
try to bind a same socket.
If more processes/threads bind a same socket success that will lead to
two problems, one is this action is not what we expected, another is
will lead to kernel in unstable status or oops(in my simple test case,
cause llc2.ko can't unload).
The current code is test SOCK_ZAPPED bit to avoid a process to
bind a same socket twice but that is can't avoid more processes/threads
try to bind a same socket at the same time.
So, add lock_sock in llc_ui_bind like others, such as llc_ui_connect.
Signed-off-by: Lin Zhang <xiaolou4617@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 26 May 2017 18:05:22 +0000 (11:05 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A collection of fixes that should go into this series. This contains:
- A set of NVMe fixes, pulled from Christoph. This includes a set of
fixes for the fiber channel bits from James Smart, rdma queue depth
fix from Marta, controller removal fixes from Ming, and some more
APST quirk updates from Andy.
- A blk-mq debugfs fix from Bart, fixing a problem with the
untangling of the sysfs and debugfs blk-mq bits that was added in
this series.
- Error code fix in add_partition() from Dan.
- A small series of fixes for the new blk-throttle code from Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block: (21 commits)
blk-mq: Only register debugfs attributes for blk-mq queues
nvme: Quirk APST on Intel 600P/P3100 devices
nvme: only setup block integrity if supported by the driver
nvme: replace is_flags field in nvme_ctrl_ops with a flags field
nvme-pci: consistencly use ctrl->device for logging
partitions/msdos: FreeBSD UFS2 file systems are not recognized
block: fix an error code in add_partition()
blk-throttle: force user to configure all settings for io.low
blk-throttle: respect 0 bps/iops settings for io.low
blk-throttle: output some debug info in trace
blk-throttle: add hierarchy support for latency target and idle time
nvme_fc: remove extra controller reference taken on reconnect
nvme_fc: correct nvme status set on abort
nvme_fc: set logging level on resets/deletes
nvme_fc: revise comment on teardown
nvme_fc: Support ctrl_loss_tmo
nvme_fc: get rid of local reconnect_delay
blk-mq: remove blk_mq_abort_requeue_list()
nvme: avoid to use blk_mq_abort_requeue_list()
nvme: use blk_mq_start_hw_queues() in nvme_kill_queues()
...
Linus Torvalds [Fri, 26 May 2017 17:51:18 +0000 (10:51 -0700)]
Merge tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
- fix PCI_ENDPOINT build error (merged for v4.12)
- fix Switchtec driver (merged for v4.12)
- fix imx6 config read timeouts, fallout from changing to non-postable
reads
- add PM "needs_resume" flag for i915 suspend issue
* tag 'pci-v4.12-fixes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI/PM: Add needs_resume flag to avoid suspend complete optimization
PCI: imx6: Fix config read timeout handling
switchtec: Fix minor bug with partition ID register
switchtec: Use new cdev_device_add() helper function
PCI: endpoint: Make PCI_ENDPOINT depend on HAS_DMA
Linus Torvalds [Fri, 26 May 2017 16:35:22 +0000 (09:35 -0700)]
Merge tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client
Pul ceph fixes from Ilya Dryomov:
"A bunch of make W=1 and static checker fixups, a RECONNECT_SEQ
messenger patch from Zheng and Luis' fallocate fix"
* tag 'ceph-for-4.12-rc3' of git://github.com/ceph/ceph-client:
ceph: check that the new inode size is within limits in ceph_fallocate()
libceph: cleanup old messages according to reconnect seq
libceph: NULL deref on crush_decode() error path
libceph: fix error handling in process_one_ticket()
libceph: validate blob_struct_v in process_one_ticket()
libceph: drop version variable from ceph_monmap_decode()
libceph: make ceph_msg_data_advance() return void
libceph: use kbasename() and kill ceph_file_part()
Linus Torvalds [Fri, 26 May 2017 16:05:35 +0000 (09:05 -0700)]
Merge tag 'mmc-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"This contains fixes to make the WiFi work again for the ARM64 Hikey
board.
Together with a couple of DTS updates for the Hikey board we have also
extended the mmc pwrseq_simple, to support a new power-off-delay-us DT
property, as that was required to enable a graceful power off sequence
for the WiFi chip"
* tag 'mmc-v4.12-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
arm64: dts: hikey: Fix WiFi support
arm64: dts: hi6220: Move board data from the dwmmc nodes to hikey dts
arm64: dts: hikey: Add the SYS_5V and the VDD_3V3 regulators
arm64: dts: hi6220: Move the fixed_5v_hub regulator to the hikey dts
arm64: dts: hikey: Add clock for the pmic mfd
mfd: dts: hi655x: Add clock binding for the pmic
mmc: pwrseq_simple: Parse DTS for the power-off-delay-us property
mmc: dt: pwrseq-simple: Invent power-off-delay-us
Linus Torvalds [Fri, 26 May 2017 16:03:09 +0000 (09:03 -0700)]
Merge tag 'sound-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This contains a few HD-audio device-specific quirks and an endianess
fix for USB-audio, as well as the update of quirk model list document.
All fixes are small and trivial.
The document update could have been postponed, but it's a good thing
for user and has absolutely zero risk of breakage, so included here"
* tag 'sound-4.12-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - apply STAC_9200_DELL_M22 quirk for Dell Latitude D430
ALSA: hda - Update the list of quirk models
ALSA: hda - Provide dual-codecs model option for a few Realtek codecs
ALSA: hda - Apply dual-codec quirk for MSI Z270-Gaming mobo
ALSA: hda - No loopback on ALC299 codec
ALSA: usb-audio: fix Amanero Combo384 quirk on big-endian hosts
Linus Torvalds [Fri, 26 May 2017 15:54:06 +0000 (08:54 -0700)]
Merge tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Not a whole lot happening here, a set of amdgpu fixes and one core
deadlock fix, and some misc drivers fixes"
* tag 'drm-fixes-for-v4.12-rc3' of git://people.freedesktop.org/~airlied/linux:
drm/amdgpu: fix null point error when rmmod amdgpu.
drm/amd/powerplay: fix a signedness bugs
drm/amdgpu: fix NULL pointer panic of emit_gds_switch
drm/radeon: Unbreak HPD handling for r600+
drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu: fix fundamental suspend/resume issue
drm/gma500/psb: Actually use VBT mode when it is found
drm: Fix deadlock retry loop in page_flip_ioctl
drm: qxl: Delay entering atomic context during cursor update
drm/radeon: Fix oops upon driver load on PowerXpress laptops
Jens Axboe [Fri, 26 May 2017 15:11:19 +0000 (09:11 -0600)]
Merge branch 'nvme-4.12' of git://git.infradead.org/nvme into for-linus
Christoph writes:
"A couple of fixes for the next rc on the nvme front. Various FC fixes
from James, controller removal fixes from Ming (including a block layer
patch), a APST related device quirk from Andy, a RDMA fix for small
queue depth device from Marta, as well as fixes for the lack of
metadata support in non-PCIe drivers and the printk logging format from
me."
Bart Van Assche [Thu, 25 May 2017 23:38:06 +0000 (16:38 -0700)]
blk-mq: Only register debugfs attributes for blk-mq queues
The code in blk-mq-debugfs.c assumes that it is working on a blk-mq
queue and is not intended to work on a blk-sq queue. Hence only
register blk-mq debugfs attributes for blk-mq queues.
Fixes: commit 9c1051aacde8 ("blk-mq: untangle debugfs and sysfs") Signed-off-by: Bart Van Assche <bart.vanassche@sandisk.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Andy Lutomirski [Wed, 24 May 2017 22:06:31 +0000 (15:06 -0700)]
nvme: Quirk APST on Intel 600P/P3100 devices
They have known firmware bugs. A fix is apparently in the works --
once fixed firmware is available, someone from Intel (Hi, Keith!)
can adjust the quirk accordingly.
Cc: stable@vger.kernel.org # v4.11 Cc: Kai-Heng Feng <kai.heng.feng@canonical.com> Cc: Mario Limonciello <mario_limonciello@dell.com> Signed-off-by: Andy Lutomirski <luto@kernel.org> Signed-off-by: Christoph Hellwig <hch@lst.de>
nvme: only setup block integrity if supported by the driver
Currently only the PCIe driver supports metadata, so we should not claim
integrity support for the other drivers. This prevents nasty crashes
with targets that advertise metadata support on fabrics.
Also use the opportunity to factor out some code into a separate helper
that isn't even compiled if CONFIG_BLK_DEV_INTEGRITY is disabled.
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Keith Busch <keith.busch@intel.com>
Dave Airlie [Fri, 26 May 2017 01:51:55 +0000 (11:51 +1000)]
Merge branch 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux into drm-fixes
A bunch of bug fixes:
- Fix display flickering on some chips at high refresh rates
- suspend/resume fix
- hotplug fix
- a couple of segfault fixes for certain cases
* 'drm-fixes-4.12' of git://people.freedesktop.org/~agd5f/linux:
drm/amdgpu: fix null point error when rmmod amdgpu.
drm/amd/powerplay: fix a signedness bugs
drm/amdgpu: fix NULL pointer panic of emit_gds_switch
drm/radeon: Unbreak HPD handling for r600+
drm/amd/powerplay/smu7: disable mclk switching for high refresh rates
drm/amd/powerplay/smu7: add vblank check for mclk switching (v2)
drm/radeon/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu/ci: disable mclk switching for high refresh rates (v2)
drm/amdgpu: fix fundamental suspend/resume issue
Dave Airlie [Fri, 26 May 2017 01:51:28 +0000 (11:51 +1000)]
Merge tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc into drm-fixes
Core Changes:
- Don't drop vblank reference more than once in cases of ww retry (Daniel)
Driver Changes:
- radeon: Fix oops during radeon probe trying to reference wrong device (Lukas)
- qxl: Avoid sleeping while in atomic context on cursor update (Gabriel)
- gma500: Use VBT mode instead of pre-programmed mode for LVDS (Patrik)
Cc: Lukas Wunner <lukas@wunner.de> Cc: Gabriel Krisman Bertazi <krisman@collabora.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Patrik Jakobsson <patrik.r.jakobsson@gmail.com>
* tag 'drm-misc-fixes-2017-05-25' of git://anongit.freedesktop.org/git/drm-misc:
drm/gma500/psb: Actually use VBT mode when it is found
drm: Fix deadlock retry loop in page_flip_ioctl
drm: qxl: Delay entering atomic context during cursor update
drm/radeon: Fix oops upon driver load on PowerXpress laptops
Nithin Sujir [Thu, 25 May 2017 02:45:17 +0000 (19:45 -0700)]
bonding: Don't update slave->link until ready to commit
In the loadbalance arp monitoring scheme, when a slave link change is
detected, the slave->link is immediately updated and slave_state_changed
is set. Later down the function, the rtnl_lock is acquired and the
changes are committed, updating the bond link state.
However, the acquisition of the rtnl_lock can fail. The next time the
monitor runs, since slave->link is already updated, it determines that
link is unchanged. This results in the bond link state permanently out
of sync with the slave link.
This patch modifies bond_loadbalance_arp_mon() to handle link changes
identical to bond_ab_arp_{inspect/commit}(). The new link state is
maintained in slave->new_link until we're ready to commit at which point
it's copied into slave->link.
NOTE: miimon_{inspect/commit}() has a more complex state machine
requiring the use of the bond_{propose,commit}_link_state() functions
which maintains the intermediate state in slave->link_new_state. The arp
monitors don't require that.
Testing: This bug is very easy to reproduce with the following steps.
1. In a loop, toggle a slave link of a bond slave interface.
2. In a separate loop, do ifconfig up/down of an unrelated interface to
create contention for rtnl_lock.
Within a few iterations, the bond link goes out of sync with the slave
link.
Signed-off-by: Nithin Nayak Sujir <nsujir@tintri.com> Cc: Mahesh Bandewar <maheshb@google.com> Cc: Jay Vosburgh <jay.vosburgh@canonical.com> Acked-by: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Daney [Wed, 24 May 2017 23:35:49 +0000 (16:35 -0700)]
test_bpf: Add a couple of tests for BPF_JSGE.
Some JITs can optimize comparisons with zero. Add a couple of
BPF_JSGE tests against immediate zero.
Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 25 May 2017 17:44:29 +0000 (13:44 -0400)]
Merge branch 'bpf-fixes'
Daniel Borkmann says:
====================
Various BPF fixes
Follow-up to fix incorrect pruning when alignment tracking is
in use and to properly clear regs after call to not leave stale
data behind, also a fix that adds bpf_clone_redirect to the
bpf_helper_changes_pkt_data helper and exposes correct map_flags
for lpm map into fdinfo. For details, please see individual
patches.
v1 -> v2:
- Reworked first patch so that env->strict_alignment is the
final indicator on whether we have to deal with strict
alignment rather than having CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS
checks on various locations, so only checking env->strict_alignment
is sufficient after that. Thanks for spotting, Dave!
- Added patch 3 and 4.
- Rest as is.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:09 +0000 (01:05 +0200)]
bpf: add various verifier test cases
This patch adds various verifier test cases:
1) A test case for the pruning issue when tracking alignment
is used.
2) Various PTR_TO_MAP_VALUE_OR_NULL tests to make sure pointer
arithmetic turns such register into UNKNOWN_VALUE type.
3) Test cases for the special treatment of LD_ABS/LD_IND to
make sure verifier doesn't break calling convention here.
Latter is needed, since f.e. arm64 JIT uses r1 - r5 for
storing temporary data, so they really must be marked as
NOT_INIT.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:08 +0000 (01:05 +0200)]
bpf: fix wrong exposure of map_flags into fdinfo for lpm
trie_alloc() always needs to have BPF_F_NO_PREALLOC passed in via
attr->map_flags, since it does not support preallocation yet. We
check the flag, but we never copy the flag into trie->map.map_flags,
which is later on exposed into fdinfo and used by loaders such as
iproute2. Latter uses this in bpf_map_selfcheck_pinned() to test
whether a pinned map has the same spec as the one from the BPF obj
file and if not, bails out, which is currently the case for lpm
since it exposes always 0 as flags.
Also copy over flags in array_map_alloc() and stack_map_alloc().
They always have to be 0 right now, but we should make sure to not
miss to copy them over at a later point in time when we add actual
flags for them to use.
Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation") Reported-by: Jarno Rajahalme <jarno@covalent.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:07 +0000 (01:05 +0200)]
bpf: add bpf_clone_redirect to bpf_helper_changes_pkt_data
The bpf_clone_redirect() still needs to be listed in
bpf_helper_changes_pkt_data() since we call into
bpf_try_make_head_writable() from there, thus we need
to invalidate prior pkt regs as well.
Fixes: 36bbef52c7eb ("bpf: direct packet write and access for helpers for clsact progs") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:06 +0000 (01:05 +0200)]
bpf: properly reset caller saved regs after helper call and ld_abs/ind
Currently, after performing helper calls, we clear all caller saved
registers, that is r0 - r5 and fill r0 depending on struct bpf_func_proto
specification. The way we reset these regs can affect pruning decisions
in later paths, since we only reset register's imm to 0 and type to
NOT_INIT. However, we leave out clearing of other variables such as id,
min_value, max_value, etc, which can later on lead to pruning mismatches
due to stale data.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 24 May 2017 23:05:05 +0000 (01:05 +0200)]
bpf: fix incorrect pruning decision when alignment must be tracked
Currently, when we enforce alignment tracking on direct packet access,
the verifier lets the following program pass despite doing a packet
write with unaligned access:
from 6 to 8:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=0,max_value=1 R10=fp
8: (b7) r0 = 0
9: (95) exit
from 5 to 10:
R0=pkt(id=0,off=14,r=0) R1=ctx R2=pkt(id=0,off=0,r=0)
R3=pkt_end R7=inv,min_value=2 R10=fp
10: (07) r0 += 1
11: (05) goto pc-6
6: safe <----- here, wrongly found safe
processed 15 insns
However, if we enforce a pruning mismatch by adding state into r8
which is then being mismatched in states_equal(), we find that for
the otherwise same program, the verifier detects a misaligned packet
access when actually walking that path:
... will let the above pass. The situation we run into is that
old->off <= cur->off (14 <= 15), meaning that prior walked paths
went with smaller offset, which was later used in the packet
access after successful packet range check and found to be safe
already.
For example: Given is R0=pkt(id=0,off=0,r=0). Adding offset 14
as in above program to it, results in R0=pkt(id=0,off=14,r=0)
before the packet range test. Now, testing this against R3=pkt_end
with 'if r0 > r3 goto out' will transform R0 into R0=pkt(id=0,off=14,r=14)
for the case when we're within bounds. A write into the packet
at offset *(u32 *)(r0 -4), that is, 2 + 14 -4, is valid and
aligned (2 is for NET_IP_ALIGN). After processing this with
all fall-through paths, we later on check paths from branches.
When the above skb->mark test is true, then we jump near the
end of the program, perform r0 += 1, and jump back to the
'if r0 > r3 goto out' test we've visited earlier already. This
time, R0 is of type R0=pkt(id=0,off=15,r=0), and we'll prune
that part because this time we'll have a larger safe packet
range, and we already found that with off=14 all further insn
were already safe, so it's safe as well with a larger off.
However, the problem is that the subsequent write into the packet
with 2 + 15 -4 is then unaligned, and not caught by the alignment
tracking. Note that min_align, aux_off, and aux_off_align were
all 0 in this example.
Since we cannot tell at this time what kind of packet access was
performed in the prior walk and what minimal requirements it has
(we might do so in the future, but that requires more complexity),
fix it to disable this pruning case for strict alignment for now,
and let the verifier do check such paths instead. With that applied,
the test cases pass and reject the program due to misalignment.
Fixes: d1174416747d ("bpf: Track alignment of register values in the verifier.")
Reference: http://patchwork.ozlabs.org/patch/761909/ Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4/arp.c:880:35: warning: 'addr_type' may be used uninitialized in
this function [-Wmaybe-uninitialized]
While the code logic seems to be correct and doesn't allow the variable
to be used uninitialized, and the warning is not consistently
reproducible, it's still worth fixing it for other people not to waste
time looking at the warning in case it pops up in the build environment.
Yes, compiler is probably at fault, but we will need to accommodate.
Fixes: 7d472a59c0e5 ("arp: always override existing neigh entries with gratuitous ARP") Signed-off-by: Ihar Hrachyshka <ihrachys@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Wang [Wed, 24 May 2017 16:59:31 +0000 (09:59 -0700)]
tcp: avoid fastopen API to be used on AF_UNSPEC
Fastopen API should be used to perform fastopen operations on the TCP
socket. It does not make sense to use fastopen API to perform disconnect
by calling it with AF_UNSPEC. The fastopen data path is also prone to
race conditions and bugs when using with AF_UNSPEC.
One issue reported and analyzed by Vegard Nossum is as follows:
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Thread A: Thread B:
------------------------------------------------------------------------
sendto()
- tcp_sendmsg()
- sk_stream_memory_free() = 0
- goto wait_for_sndbuf
- sk_stream_wait_memory()
- sk_wait_event() // sleep
| sendto(flags=MSG_FASTOPEN, dest_addr=AF_UNSPEC)
| - tcp_sendmsg()
| - tcp_sendmsg_fastopen()
| - __inet_stream_connect()
| - tcp_disconnect() //because of AF_UNSPEC
| - tcp_transmit_skb()// send RST
| - return 0; // no reconnect!
| - sk_stream_wait_connect()
| - sock_error()
| - xchg(&sk->sk_err, 0)
| - return -ECONNRESET
- ... // wake up, see sk->sk_err == 0
- skb_entail() on TCP_CLOSE socket
If the connection is reopened then we will send a brand new SYN packet
after thread A has already queued a buffer. At this point I think the
socket internal state (sequence numbers etc.) becomes messed up.
When the new connection is closed, the FIN-ACK is rejected because the
sequence number is outside the window. The other side tries to
retransmit,
but __tcp_retransmit_skb() calls tcp_trim_head() on an empty skb which
corrupts the skb data length and hits a BUG() in copy_and_csum_bits().
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Hence, this patch adds a check for AF_UNSPEC in the fastopen data path
and return EOPNOTSUPP to user if such case happens.
Fixes: cf60af03ca4e7 ("tcp: Fast Open client - sendmsg(MSG_FASTOPEN)") Reported-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Wei Wang <weiwan@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Kapl [Wed, 24 May 2017 08:22:22 +0000 (10:22 +0200)]
net: move somaxconn init from sysctl code
The default value for somaxconn is set in sysctl_core_net_init(), but this
function is not called when kernel is configured without CONFIG_SYSCTL.
This results in the kernel not being able to accept TCP connections,
because the backlog has zero size. Usually, the user ends up with:
"TCP: request_sock_TCP: Possible SYN flooding on port 7. Dropping request. Check SNMP counters."
If SYN cookies are not enabled the connection is rejected.
Before ef547f2ac16 (tcp: remove max_qlen_log), the effects were less
severe, because the backlog was always at least eight slots long.
Signed-off-by: Roman Kapl <roman.kapl@sysgo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
KT Liao [Tue, 23 May 2017 20:41:47 +0000 (13:41 -0700)]
Input: elan_i2c - ignore signals when finishing updating firmware
Use wait_for_completion_timeout() instead of
wait_for_completion_interruptible_timeout() to avoid stray signals ruining
firmware update. Our timeout is only 300 msec so we are fine simply letting
it expire in case device misbehaves.
KT Liao [Thu, 25 May 2017 17:06:21 +0000 (10:06 -0700)]
Input: elan_i2c - clear INT before resetting controller
Some old touchpad FWs need to have interrupt cleared before issuing reset
command after updating firmware. We clear interrupt by attempting to read
full report from the controller, and discarding any data read.
Add null check to avoid a potential null pointer dereference.
Addresses-Coverity-ID: 1408831 Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Garver [Tue, 23 May 2017 22:37:27 +0000 (18:37 -0400)]
geneve: fix fill_info when using collect_metadata
Since 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") fill_info
does not return UDP_ZERO_CSUM6_RX when using COLLECT_METADATA. This is
because it uses ip_tunnel_info_af() with the device level info, which is
not valid for COLLECT_METADATA.
Fix by checking for the presence of the actual sockets.
Fixes: 9b4437a5b870 ("geneve: Unify LWT and netdev handling.") Signed-off-by: Eric Garver <e@erig.me> Acked-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jan Kara [Thu, 18 May 2017 23:36:24 +0000 (16:36 -0700)]
xfs: Move handling of missing page into one place in xfs_find_get_desired_pgoff()
Currently several places in xfs_find_get_desired_pgoff() handle the case
of a missing page. Make them all handled in one place after the loop has
terminated.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Jan Kara [Thu, 18 May 2017 23:36:23 +0000 (16:36 -0700)]
xfs: Fix off-by-in in loop termination in xfs_find_get_desired_pgoff()
There is an off-by-one error in loop termination conditions in
xfs_find_get_desired_pgoff() since 'end' may index a page beyond end of
desired range if 'endoff' is page aligned. It doesn't have any visible
effects but still it is good to fix it.
Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Jan Kara [Thu, 18 May 2017 23:36:22 +0000 (16:36 -0700)]
xfs: Fix missed holes in SEEK_HOLE implementation
XFS SEEK_HOLE implementation could miss a hole in an unwritten extent as
can be seen by the following command:
xfs_io -c "falloc 0 256k" -c "pwrite 0 56k" -c "pwrite 128k 8k"
-c "seek -h 0" file
wrote 57344/57344 bytes at offset 0
56 KiB, 14 ops; 0.0000 sec (49.312 MiB/sec and 12623.9856 ops/sec)
wrote 8192/8192 bytes at offset 131072
8 KiB, 2 ops; 0.0000 sec (70.383 MiB/sec and 18018.0180 ops/sec)
Whence Result
HOLE 139264
Where we can see that hole at offset 56k was just ignored by SEEK_HOLE
implementation. The bug is in xfs_find_get_desired_pgoff() which does
not properly detect the case when pages are not contiguous.
Fix the problem by properly detecting when found page has larger offset
than expected.
CC: stable@vger.kernel.org Fixes: d126d43f631f996daeee5006714fed914be32368 Signed-off-by: Jan Kara <jack@suse.cz> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eryu Guan [Tue, 23 May 2017 15:30:46 +0000 (08:30 -0700)]
xfs: fix off-by-one on max nr_pages in xfs_find_get_desired_pgoff()
xfs_find_get_desired_pgoff() is used to search for offset of hole or
data in page range [index, end] (both inclusive), and the max number
of pages to search should be at least one, if end == index.
Otherwise the only page is missed and no hole or data is found,
which is not correct.
When block size is smaller than page size, this can be demonstrated
by preallocating a file with size smaller than page size and writing
data to the last block. E.g. run this xfs_io command on a 1k block
size XFS on x86_64 host.
# xfs_io -fc "falloc 0 3k" -c "pwrite 2k 1k" \
-c "seek -d 0" /mnt/xfs/testfile
wrote 1024/1024 bytes at offset 2048
1 KiB, 1 ops; 0.0000 sec (33.675 MiB/sec and 34482.7586 ops/sec)
Whence Result
DATA EOF
Data at offset 2k was missed, and lseek(2) returned ENXIO.
This is uncovered by generic/285 subtest 07 and 08 on ppc64 host,
where pagesize is 64k. Because a recent change to generic/285
reduced the preallocated file size to smaller than 64k.
Cc: stable@vger.kernel.org # v3.7+ Signed-off-by: Eryu Guan <eguan@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Eric Sandeen [Tue, 23 May 2017 02:54:10 +0000 (19:54 -0700)]
xfs: fix unaligned access in xfs_btree_visit_blocks
This structure copy was throwing unaligned access warnings on sparc64:
Kernel unaligned access at TPC[1043c088] xfs_btree_visit_blocks+0x88/0xe0 [xfs]
xfs_btree_copy_ptrs does a memcpy, which avoids it.
Signed-off-by: Eric Sandeen <sandeen@redhat.com> Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
Nicholas Piggin [Sat, 20 May 2017 04:29:49 +0000 (14:29 +1000)]
powerpc: Add PPC_FEATURE userspace bits for SCV and DARN instructions
Providing "scv" support to userspace requires kernel support, so it
must be advertised as independently to the base ISA 3 instruction set.
The darn instruction relies on firmware enablement, so it has been
decided to split this out from the core ISA 3 feature as well.
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Jeremy Kerr [Wed, 24 May 2017 06:49:59 +0000 (16:49 +1000)]
powerpc/spufs: Fix hash faults for kernel regions
Commit ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with
_PAGE_PRIVILEGED") swapped _PAGE_USER for _PAGE_PRIVILEGED, and
introduced check_pte_access() which denied kernel access to
non-_PAGE_PRIVILEGED pages.
However, it didn't add _PAGE_PRIVILEGED to the hash fault handler
for spufs' kernel accesses, so the DMAs required to establish SPE
memory no longer work.
This change adds _PAGE_PRIVILEGED to the hash fault handler for
kernel accesses.
Fixes: ac29c64089b7 ("powerpc/mm: Replace _PAGE_USER with _PAGE_PRIVILEGED") Cc: stable@vger.kernel.org # v4.7+ Signed-off-by: Jeremy Kerr <jk@ozlabs.org> Reported-by: Sombat Tragolgosol <sombat3960@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>