Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f1e90f6e2c1fb0e491f910540314015324fed1e2 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1798182
Improves socket memory utilization when receiving packets larger
than 128 bytes (the previous rx copybreak) and smaller than 256 bytes.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 87731f0c681c9682c5521e5197d89e561b7da395 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: limit refill Rx threshold to 256 to avoid latency issues
BugLink: http://bugs.launchpad.net/bugs/1798182
Currently Rx refill is done when the number of required descriptors is
above 1/8 queue size. With a default of 1024 entries per queue the
threshold is 128 descriptors.
There is intention to increase the queue size to 8196 entries.
In this case threshold of 1024 descriptors is too large and can hurt
latency.
Add another limitation to Rx threshold to be at most 256 descriptors.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0574bb806dad29a3dada0ee42b01645477d48282 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: use CSUM_CHECKED device indication to report skb's checksum status
BugLink: http://bugs.launchpad.net/bugs/1798182
Set skb->ip_summed to the correct value as reported by the device.
Add counter for the case where rx csum offload is enabled but
device didn't check it.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb36bb36e1f17d2a7b9a9751e5cfec4235b46c93 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: add functions for handling Low Latency Queues in ena_netdev
BugLink: http://bugs.launchpad.net/bugs/1798182
This patch includes all code changes necessary in ena_netdev to enable
packet sending via the LLQ placemnt mode.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 38005ca816a7ef5516dc8e59ae95716739aa75b0 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: add functions for handling Low Latency Queues in ena_com
BugLink: http://bugs.launchpad.net/bugs/1798182
This patch introduces APIs for detection, initialization, configuration
and actual usage of low latency queues(LLQ). It extends transmit API with
creation of LLQ descriptors in device memory (which include host buffers
descriptors as well as packet header)
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 689b2bdaaa1480ad2c14bdc4c6eaf38284549022 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: introduce Low Latency Queues data structures according to ENA spec
BugLink: http://bugs.launchpad.net/bugs/1798182
Low Latency Queues(LLQ) allow usage of device's memory for descriptors
and headers. Such queues decrease processing time since data is already
located on the device when driver rings the doorbell.
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a7982b8ec947052df6d4467b3a81571f02f528e0 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 095f2f1facba0c78f23750dba65c78cef722c1ea linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1798182
Reduce fastpath overhead by making ena_com_tx_comp_req_id_get() inline.
Also move it to ena_eth_com.h file with its dependency function
ena_com_cq_inc_head().
Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0e575f8542d1f4d74df30b5a9ba419c5373d01a1 linux-next) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: fix NULL dereference due to untimely napi initialization
BugLink: http://bugs.launchpad.net/bugs/1798182
napi poll functions should be initialized before running request_irq(),
to handle a rare condition where there is a pending interrupt, causing
the ISR to fire immediately while the poll function wasn't set yet,
causing a NULL dereference.
Fixes: 1738cd3ed342 ("net: ena: Add a driver for Amazon Elastic Network Adapters (ENA)") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 78a55d05def95144ca5fa9a64c49b2a0636a9866) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: fix rare bug when failed restart/resume is followed by driver removal
BugLink: http://bugs.launchpad.net/bugs/1798182
In a rare scenario when ena_device_restore() fails, followed by device
remove, an FLR will not be issued. In this case, the device will keep
sending asynchronous AENQ keep-alive events, even after driver removal,
leading to memory corruption.
Fixes: 8c5c7abdeb2d ("net: ena: add power management ops to the ENA driver") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d7703ddbd7c9cb1ab7c08e1b85b314ff8cea38e9) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
net: ena: fix warning in rmmod caused by double iounmap
BugLink: http://bugs.launchpad.net/bugs/1798182
Memory mapped with devm_ioremap is automatically freed when the driver
is disconnected from the device. Therefore there is no need to
explicitly call devm_iounmap.
Fixes: 0857d92f71b6 ("net: ena: add missing unmap bars on device removal") Fixes: 411838e7b41c ("net: ena: fix rare kernel crash when bar memory remap fails") Signed-off-by: Arthur Kiyanovski <akiyano@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d79c3888bde6581da7ff9f9d6f581900ecb5e632) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Eric Dumazet [Thu, 27 Sep 2018 16:31:58 +0000 (09:31 -0700)]
net: ena: remove ndo_poll_controller
BugLink: http://bugs.launchpad.net/bugs/1798182
As diagnosed by Song Liu, ndo_poll_controller() can
be very dangerous on loaded hosts, since the cpu
calling ndo_poll_controller() might steal all NAPI
contexts (for all RX/TX queues of the NIC). This capture
can last for unlimited amount of time, since one
cpu is generally not able to drain all the queues under load.
ena uses NAPI for TX completions, so we better let core
networking stack call the napi->poll() to avoid the capture.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Netanel Belgazal <netanel@amazon.com> Cc: Saeed Bishara <saeedb@amazon.com> Cc: Zorik Machulsky <zorik@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 21627982e4fff76a053f4d08d7fb56e532e08d52) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Acked-by: Khalid Elmously <khalid.elmously@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The upcoming fix for the -EBUSY return from affinity settings requires to
use the irq_move_irq() functionality even on irq remapped interrupts. To
avoid the out of line call, move the check for the pending bit into an
inline helper.
Preparatory change for the real fix. No functional change.
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Cc: Dou Liyang <douly.fnst@cn.fujitsu.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The case that interrupt affinity setting fails with -EBUSY can be handled
in the kernel completely by using the already available generic pending
infrastructure.
If a irq_chip::set_affinity() fails with -EBUSY, handle it like the
interrupts for which irq_chip::set_affinity() can only be invoked from
interrupt context. Copy the new affinity mask to irq_desc::pending_mask and
set the affinity pending bit. The next raised interrupt for the affected
irq will check the pending bit and try to set the new affinity from the
handler. This avoids that -EBUSY is returned when an affinity change is
requested from user space and the previous change has not been cleaned
up. The new affinity will take effect when the next interrupt is raised
from the device.
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.819273597@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The generic pending interrupt mechanism moves interrupts from the interrupt
handler on the original target CPU to the new destination CPU. This is
required for x86 and ia64 due to the way the interrupt delivery and
acknowledge works if the interrupts are not remapped.
However that update can fail for various reasons. Some of them are valid
reasons to discard the pending update, but the case, when the previous move
has not been fully cleaned up is not a legit reason to fail.
Check the return value of irq_do_set_affinity() for -EBUSY, which indicates
a pending cleanup, and rearm the pending move in the irq dexcriptor so it's
tried again when the next interrupt arrives.
Fixes: 996c591227d9 ("x86/irq: Plug vector cleanup race") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.386544292@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special ir_ack_apic_edge() implementation which is
merily a wrapper around ack_APIC_irq().
Preparatory change for the real fix
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.555716895@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of the special uv_ack_apic() implementation which is
merily a wrapper around ack_APIC_irq().
Preparatory change for the real fix
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.721691398@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
To address the EBUSY fail of interrupt affinity settings in case that the
previous setting has not been cleaned up yet, use the new apic_ack_irq()
function instead of directly invoking ack_APIC_irq().
Preparatory change for the real fix
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.639011135@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
apic_ack_edge() is explicitely for handling interrupt affinity cleanup when
interrupt remapping is not available or disable.
Remapped interrupts and also some of the platform specific special
interrupts, e.g. UV, invoke ack_APIC_irq() directly.
To address the issue of failing an affinity update with -EBUSY the delayed
affinity mechanism can be reused, but ack_APIC_irq() does not handle
that. Adding this to ack_APIC_irq() is not possible, because that function
is also used for exceptions and directly handled interrupts like IPIs.
Create a new function, which just contains the conditional invocation of
irq_move_irq() and the final ack_APIC_irq().
Reuse the new function in apic_ack_edge().
Preparatory change for the real fix.
Fixes: dccfe3147b42 ("x86/vector: Simplify vector move cleanup") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Song Liu <liu.song.a23@gmail.com> Cc: Dmitry Safonov <0x7f454c46@gmail.com> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Tariq Toukan <tariqt@mellanox.com> Link: https://lkml.kernel.org/r/20180604162224.471925894@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Several people observed the WARN_ON() in irq_matrix_free() which triggers
when the caller tries to free an vector which is not in the allocation
range. Song provided the trace information which allowed to decode the root
cause.
The rework of the vector allocation mechanism failed to preserve a sanity
check, which prevents setting a new target vector/CPU when the previous
affinity change has not fully completed.
As a result a half finished affinity change can be overwritten, which can
cause the leak of a irq descriptor pointer on the previous target CPU and
double enqueue of the hlist head into the cleanup lists of two or more
CPUs. After one CPU cleaned up its vector the next CPU will invoke the
cleanup handler with vector 0, which triggers the out of range warning in
the matrix allocator.
Prevent this by checking the apic_data of the interrupt whether the
move_in_progress flag is false and the hlist node is not hashed. Return
-EBUSY if not.
This prevents the damage and restores the behaviour before the vector
allocation rework, but due to other changes in that area it also widens the
chance that user space can observe -EBUSY. In theory this should be fine,
but actually not all user space tools handle -EBUSY correctly. Addressing
that is not part of this fix, but will be addressed in follow up patches.
Fixes: 69cde0004a4b ("x86/vector: Use matrix allocator for vector assignment") Reported-by: Dmitry Safonov <0x7f454c46@gmail.com> Reported-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Song Liu <songliubraving@fb.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Mike Travis <mike.travis@hpe.com> Cc: Borislav Petkov <bp@alien8.de> Link: https://lkml.kernel.org/r/20180604162224.303870257@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Commit 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when
failure) has fixed a memory leak in the failure path, however
the patch returned a positive value on get_cpu_device() failure
instead of the previous negative value. Fix this incorrect error
return value properly.
Fixes: 05829d9431df (cpufreq: ti-cpufreq: kfree opp_data when failure) Cc: 4.14+ <stable@vger.kernel.org> # v4.14+ Signed-off-by: Suman Anna <s-anna@ti.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
DP83620 register set is compatible with the DP83848, but it also supports
100base-FX. When the hardware is configured such as that fiber mode is
enabled, autonegotiation is not possible.
The chip, however, doesn't expose this information via BMSR_ANEGCAPABLE.
Instead, this bit is always set high, even if the particular hardware
configuration makes it so that auto negotiation is not possible [1]. Under
these circumstances, the phy subsystem keeps trying for autonegotiation to
happen, without success.
Hereby, we inspect BMCR_ANENABLE bit after genphy_config_init, which on
reset is set to 0 when auto negotiation is disabled, and so we use this
value instead of BMSR_ANEGCAPABLE.
In __alloc_pages_slowpath() we reset zonelist and preferred_zoneref for
allocations that can ignore memory policies. The zonelist is obtained
from current CPU's node. This is a problem for __GFP_THISNODE
allocations that want to allocate on a different node, e.g. because the
allocating thread has been migrated to a different CPU.
This has been observed to break SLAB in our 4.4-based kernel, because
there it relies on __GFP_THISNODE working as intended. If a slab page
is put on wrong node's list, then further list manipulations may corrupt
the list because page_to_nid() is used to determine which node's
list_lock should be locked and thus we may take a wrong lock and race.
Current SLAB implementation seems to be immune by luck thanks to commit 511e3a058812 ("mm/slab: make cache_grow() handle the page allocated on
arbitrary node") but there may be others assuming that __GFP_THISNODE
works as promised.
We can fix it by simply removing the zonelist reset completely. There
is actually no reason to reset it, because memory policies and cpusets
don't affect the zonelist choice in the first place. This was different
when commit 183f6371aac2 ("mm: ignore mempolicies when using
ALLOC_NO_WATERMARK") introduced the code, as mempolicies provided their
own restricted zonelists.
We might consider this for 4.17 although I don't know if there's
anything currently broken.
SLAB is currently not affected, but in kernels older than 4.7 that don't
yet have 511e3a058812 ("mm/slab: make cache_grow() handle the page
allocated on arbitrary node") it is. That's at least 4.4 LTS. Older
ones I'll have to check.
So stable backports should be more important, but will have to be
reviewed carefully, as the code went through many changes. BTW I think
that also the ac->preferred_zoneref reset is currently useless if we
don't also reset ac->nodemask from a mempolicy to NULL first (which we
probably should for the OOM victims etc?), but I would leave that for a
separate patch.
Link: http://lkml.kernel.org/r/20180525130853.13915-1-vbabka@suse.cz Signed-off-by: Vlastimil Babka <vbabka@suse.cz> Fixes: 183f6371aac2 ("mm: ignore mempolicies when using ALLOC_NO_WATERMARK") Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Michal Hocko <mhocko@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The HID descriptor for the 2nd-gen Intuos Pro large (PTH-860) contains
a typo which defines an incorrect logical maximum Y value. This causes
a small portion of the bottom of the tablet to become unusable (both
because the area is below the "bottom" of the tablet and because
'wacom_wac_event' ignores out-of-range values). It also results in a
skewed aspect ratio.
To fix this, we add a quirk to 'wacom_usage_mapping' which overwrites
the data with the correct value.
Current ISH driver only registers suspend/resume PM callbacks which don't
support hibernation (suspend to disk). Basically after hiberation, the ISH
can't resume properly and user may not see sensor events (for example: screen
rotation may not work).
User will not see a crash or panic or anything except the following message
in log:
hid-sensor-hub 001F:8086:22D8.0001: timeout waiting for response from ISHTP device
So this patch adds support for S4/hiberbation to ISH by using the
SIMPLE_DEV_PM_OPS() MACRO instead of struct dev_pm_ops directly. The suspend
and resume functions will now be used for both suspend to RAM and hibernation.
If power management is disabled, SIMPLE_DEV_PM_OPS will do nothing, the suspend
and resume related functions won't be used, so mark them as __maybe_unused to
clarify that this is the intended behavior, and remove #ifdefs for power
management.
As long as a symlink inode remains in-core, the destination (and
therefore size) will not be re-fetched from the server, as it cannot
change. The original implementation of the attribute cache assumed that
setting the expiry time in the past was sufficient to cause a re-fetch
of all attributes on the next getattr. That does not work in this case.
The bug manifested itself as follows. When the command sequence
The page loading code trusts the data provided in the firmware images
a bit too much and may cause a buffer overflow or copy unknown data if
the block sizes don't match what we expect.
To prevent potential problems, harden the code by checking if the
sizes we are copying are what we expect.
Commit 184add2ca23c ("libata: Apply NOLPM quirk for SanDisk
SD7UB3Q*G1001 SSDs") disabled LPM for SanDisk SD7UB3Q*G1001 SSDs.
This has lead to several reports of users of that SSD where LPM
was working fine and who know have a significantly increased idle
power consumption on their laptops.
Likely there is another problem on the T450s from the original
reporter which gets exposed by the uncore reaching deeper sleep
states (higher PC-states) due to LPM being enabled. The problem as
reported, a hardfreeze about once a day, already did not sound like
it would be caused by LPM and the reports of the SSD working fine
confirm this. The original reporter is ok with dropping the quirk.
A X250 user has reported the same hard freeze problem and for him
the problem went away after unrelated updates, I suspect some GPU
driver stack changes fixed things.
TL;DR: The original reporters problem were triggered by LPM but not
an LPM issue, so drop the quirk for the SSD in question.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1583207 Cc: stable@vger.kernel.org Cc: Richard W.M. Jones <rjones@redhat.com> Cc: Lorenzo Dalrio <lorenzo.dalrio@gmail.com> Reported-by: Lorenzo Dalrio <lorenzo.dalrio@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org> Acked-by: "Richard W.M. Jones" <rjones@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
According to current code implementation, detecting the long
idle period is done by checking if the interval between two
adjacent utilization update handlers is long enough. Although
this mechanism can detect if the idle period is long enough
(no utilization hooks invoked during idle period), it might
not cover a corner case: if the task has occupied the CPU
for too long which causes no context switches during that
period, then no utilization handler will be launched until this
high prio task is scheduled out. As a result, the idle_periods
field might be calculated incorrectly because it regards the
100% load as 0% and makes the conservative governor who uses
this field confusing.
Change the detection to compare the idle_time with sampling_rate
directly.
Reported-by: Artem S. Tashkinov <t.artem@mailcity.com> Signed-off-by: Chen Yu <yu.c.chen@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If the policy limits are updated via cpufreq_update_policy() and
subsequently via sysfs, the limits stored in user_policy may be
set incorrectly.
For example, if both min and max are set via sysfs to the maximum
available frequency, user_policy.min and user_policy.max will also
be the maximum. If a policy notifier triggered by
cpufreq_update_policy() lowers both the min and the max at this
point, that change is not reflected by the user_policy limits, so
if the max is updated again via sysfs to the same lower value,
then user_policy.max will be lower than user_policy.min which
shouldn't happen. In particular, if one of the policy CPUs is
then taken offline and back online, cpufreq_set_policy() will
fail for it due to a failing limits check.
To prevent that from happening, initialize the min and max fields
of the new_policy object to the ones stored in user_policy that
were previously set via sysfs.
Signed-off-by: Kevin Wangtao <kevin.wangtao@hisilicon.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
[ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
cgwb_release() punts the actual release to cgwb_release_workfn() on
system_wq. Depending on the number of cgroups or block devices, there
can be a lot of cgwb_release_workfn() in flight at the same time.
We're periodically seeing close to 256 kworkers getting stuck with the
following stack trace and overtime the entire system gets stuck.
1. A lot of cgwb_release_workfn() is queued at the same time and all
system_wq kworkers are assigned to execute them.
2. They all end up calling synchronize_rcu_expedited(). One of them
wins and tries to perform the expedited synchronization.
3. However, that invovles queueing rcu_exp_work to system_wq and
waiting for it. Because #1 is holding all available kworkers on
system_wq, rcu_exp_work can't be executed. cgwb_release_workfn()
is waiting for synchronize_rcu_expedited() which in turn is waiting
for cgwb_release_workfn() to free up some of the kworkers.
We shouldn't be scheduling hundreds of cgwb_release_workfn() at the
same time. There's nothing to be gained from that. This patch
updates cgwb release path to use a dedicated percpu workqueue with
@max_active of 1.
While this resolves the problem at hand, it might be a good idea to
isolate rcu_exp_work to its own workqueue too as it can be used from
various paths and is prone to this sort of indirect A-A deadlocks.
It is not allowed to reinit q->tag_set_list list entry while RCU grace
period has not completed yet, otherwise the following soft lockup in
blk_mq_sched_restart() happens:
1. There are several MQ queues with shared tags.
2. One queue is about to be freed and now task is in
blk_mq_del_queue_tag_set().
3. Other CPU is in blk_mq_sched_restart() and loops over all queues in
tag list in order to find hctx to restart.
Because linked list entry was modified in blk_mq_del_queue_tag_set()
without proper waiting for a grace period, blk_mq_sched_restart()
never ends, spining in list_for_each_entry_rcu_rr(), thus soft lockup.
Fix is simple: reinit list entry after an RCU grace period elapsed.
Fixes: Fixes: 705cda97ee3a ("blk-mq: Make it safe to use RCU to iterate over blk_mq_tag_set.tag_list") Cc: stable@vger.kernel.org Cc: Sagi Grimberg <sagi@grimberg.me> Cc: linux-block@vger.kernel.org Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When we stopped relying on the bdev everywhere I broke updating the
block device size on the fly, which ceph relies on. We can't just do
set_capacity, we also have to do bd_set_size so things like parted will
notice the device size change.
Fixes: 29eaadc ("nbd: stop using the bdev everywhere")
cc: stable@vger.kernel.org Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
I messed up changing the size of an NBD device while it was connected by
not actually updating the device or doing the uevent. Fix this by
updating everything if we're connected and we change the size.
cc: stable@vger.kernel.org Fixes: 639812a ("nbd: don't set the device size until we're connected") Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
This fixes a use after free bug, we shouldn't be doing disk->queue right
after we do del_gendisk(disk). Save the queue and do the cleanup after
the del_gendisk.
Validate_buf () function checks for an expected minimum sized response
passed to query_info() function.
For security information, the size of a security descriptor can be
smaller (one subauthority, no ACEs) than the size of the structure
that defines FileInfoClass of FileAllInformation.
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=199725 Cc: <stable@vger.kernel.org> Signed-off-by: Shirish Pargaonkar <shirishpargaonkar@gmail.com> Reviewed-by: Noah Morrison <noah.morrison@rubrik.com> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Each of the strings that we want to put into the buf[MAX_FLAG_OPT_SIZE]
in flags_read() is two characters long. But the sprintf() adds
a trailing newline and will add a terminating NUL byte. So
MAX_FLAG_OPT_SIZE needs to be 4.
sprintf() calls vsnprintf() and *that* does return:
" * The return value is the number of characters which would
* be generated for the given input, excluding the trailing
* '\0', as per ISO C99."
When 'kzalloc()' fails in 'snd_hda_attach_pcm_stream()', a new pcm instance is
created without setting its operators via 'snd_pcm_set_ops()'. Following
operations on the new pcm instance can trigger kernel null pointer dereferences
and cause kernel oops.
This bug was found with my work on building a gray-box fault-injection tool for
linux-kernel-module binaries. A kernel null pointer dereference was confirmed
from line 'substream->ops->open()' in function 'snd_pcm_open_substream()' in
file 'sound/core/pcm_native.c'.
This patch fixes the bug by calling 'snd_device_free()' in the error handling
path of 'kzalloc()', which removes the new pcm instance from the snd card before
returns with an error code.
[BUG]
Btrfs can create compressed extent without checksum (even though it
shouldn't), and if we then try to replace device containing such extent,
the result device will contain all the uncompressed data instead of the
compressed one.
Test case already submitted to fstests:
https://patchwork.kernel.org/patch/10442353/
[CAUSE]
When handling compressed extent without checksum, device replace will
goe into copy_nocow_pages() function.
In that function, btrfs will get all inodes referring to this data
extents and then use find_or_create_page() to get pages direct from that
inode.
The problem here is, pages directly from inode are always uncompressed.
And for compressed data extent, they mismatch with on-disk data.
Thus this leads to corrupted compressed data extent written to replace
device.
[FIX]
In this attempt, we could just remove the "optimization" branch, and let
unified scrub_pages() to handle it.
Although scrub_pages() won't bother reusing page cache, it will be a
little slower, but it does the correct csum checking and won't cause
such data corruption caused by "optimization".
Note about the fix: this is the minimal fix that can be backported to
older stable trees without conflicts. The whole callchain from
copy_nocow_pages() can be deleted, and will be in followup patches.
Fixes: ff023aac3119 ("Btrfs: add code to scrub to copy read data to another disk") CC: stable@vger.kernel.org # 4.4+ Reported-by: James Harvey <jamespharvey20@gmail.com> Reviewed-by: James Harvey <jamespharvey20@gmail.com> Signed-off-by: Qu Wenruo <wqu@suse.com>
[ remove code removal, add note why ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
In cow_file_range(), create_io_em() may fail, but its return value is
not recorded. Then return value may be 0 even it failed which is a
wrong behavior.
Let cow_file_range() return PTR_ERR(em) if create_io_em() failed.
Fixes: 6f9994dbabe5 ("Btrfs: create a helper to create em for IO") CC: stable@vger.kernel.org # 4.11+ Signed-off-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If we have invalid flags set, when we error out we must drop our writer
counter and free the buffer we allocated for the arguments. This bug is
trivially reproduced with the following program on 4.7+:
ret = ioctl(fd, BTRFS_IOC_RM_DEV_V2, &vol_args);
if (ret == -1)
perror("ioctl");
close(fd);
return EXIT_SUCCESS;
}
When unmounting the filesystem, we'll hit the
WARN_ON(mnt_get_writers(mnt)) in cleanup_mnt() and also may prevent the
filesystem to be remounted read-only as the writer count will stay
lifted.
Fixes: 6b526ed70cf1 ("btrfs: introduce device delete by devid") CC: stable@vger.kernel.org # 4.9+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
In btrfs_clone_files(), we must check the NODATASUM flag while the
inodes are locked. Otherwise, it's possible that btrfs_ioctl_setflags()
will change the flags after we check and we can end up with a party
checksummed file.
The race window is only a few instructions in size, between the if and
the locks which is:
3834 if (S_ISDIR(src->i_mode) || S_ISDIR(inode->i_mode))
3835 return -EISDIR;
where the setflags must be run and toggle the NODATASUM flag (provided
the file size is 0). The clone will block on the inode lock, segflags
takes the inode lock, changes flags, releases log and clone continues.
Not impossible but still needs a lot of bad luck to hit unintentionally.
Fixes: 0e7b824c4ef9 ("Btrfs: don't make a file partly checksummed through file clone") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ update changelog ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
syzbot is hitting WARN() at kernfs_add_one() [1].
This is because kernfs_create_link() is confused by previous device_add()
call which continued without setting dev->kobj.parent field when
get_device_parent() failed by memory allocation fault injection.
Fix this by propagating the error from class_dir_create_and_add() to
the calllers of get_device_parent().
ext4_resize_fs() has an off-by-one bug when checking whether growing of
a filesystem will not overflow inode count. As a result it allows a
filesystem with 8192 inodes per group to grow to 64TB which overflows
inode count to 0 and makes filesystem unusable. Fix it.
Cc: stable@vger.kernel.org Fixes: 3f8a6411fbada1fa482276591e037f3b1adcf55b Reported-by: Jaco Kroon <jaco@uls.co.za> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If ext4_find_inline_data_nolock() returns an error it needs to get
reflected up to ext4_iget(). In order to fix this,
ext4_iget_extra_inode() needs to return an error (and not return
void).
This is related to "ext4: do not allow external inodes for inline
data" (which fixes CVE-2018-11412) in that in the errors=continue
case, it would be useful to for userspace to receive an error
indicating that file system is corrupted.
Currently in ext4_punch_hole we're going to skip the mtime update if
there are no actual blocks to release. However we've actually modified
the file by zeroing the partial block so the mtime should be updated.
Moreover the sync and datasync handling is skipped as well, which is
also wrong. Fix it.
When ext4_ind_map_blocks() computes a length of a hole, it doesn't count
with the fact that mapped offset may be somewhere in the middle of the
completely empty subtree. In such case it will return too large length
of the hole which then results in lseek(SEEK_DATA) to end up returning
an incorrect offset beyond the end of the hole.
Fix the problem by correctly taking offset within a subtree into account
when computing a length of a hole.
Fixes: facab4d9711e7aa3532cb82643803e8f1b9518e8 CC: stable@vger.kernel.org Reported-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
This happened through fault injection where aead_req allocation in
tls_do_encryption() eventually failed and we returned -ENOMEM from
the function. Turns out that the use-after-free is triggered from
tls_sw_sendmsg() in the second tls_push_record(). The error then
triggers a jump to waiting for memory in sk_stream_wait_memory()
resp. returning immediately in case of MSG_DONTWAIT. What follows is
the trim_both_sgl(sk, orig_size), which drops elements from the sg
list added via tls_sw_sendmsg(). Now the use-after-free gets triggered
when the socket is being closed, where tls_sk_proto_close() callback
is invoked. The tls_complete_pending_work() will figure that there's
a pending closed tls record to be flushed and thus calls into the
tls_push_pending_closed_record() from there. ctx->push_pending_record()
is called from the latter, which is the tls_sw_push_pending_record()
from sw path. This again calls into tls_push_record(). And here the
tls_fill_prepend() will panic since the buffer address has been freed
earlier via trim_both_sgl(). One way to fix it is to move the aead
request allocation out of tls_do_encryption() early into tls_push_record().
This means we don't prep the tls header and advance state to the
TLS_PENDING_CLOSED_RECORD before allocation which could potentially
fail happened. That fixes the issue on my side.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Reported-by: syzbot+5c74af81c547738e1684@syzkaller.appspotmail.com Reported-by: syzbot+709f2810a6a05f11d4d3@syzkaller.appspotmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Dave Watson <davejwatson@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Tun, tap, virtio, packet and uml vector all use struct virtio_net_hdr
to communicate packet metadata to userspace.
For skbuffs with vlan, the first two return the packet as it may have
existed on the wire, inserting the VLAN tag in the user buffer. Then
virtio_net_hdr.csum_start needs to be adjusted by VLAN_HLEN bytes.
Commit f09e2249c4f5 ("macvtap: restore vlan header on user read")
added this feature to macvtap. Commit 3ce9b20f1971 ("macvtap: Fix
csum_start when VLAN tags are present") then fixed up csum_start.
Virtio, packet and uml do not insert the vlan header in the user
buffer.
When introducing virtio_net_hdr_from_skb to deduplicate filling in
the virtio_net_hdr, the variant from macvtap which adds VLAN_HLEN was
applied uniformly, breaking csum offset for packets with vlan on
virtio and packet.
Make insertion of VLAN_HLEN optional. Convert the callers to pass it
when needed.
Fixes: e858fae2b0b8f4 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion") Fixes: 1276f24eeef2 ("packet: use common code for virtio_net_hdr and skb GSO conversion") Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
After commit 6b229cf77d68 ("udp: add batching to udp_rmem_release()")
the sk_rmem_alloc field does not measure exactly anymore the
receive queue length, because we batch the rmem release. The issue
is really apparent only after commit 0d4a6608f68c ("udp: do rmem bulk
free even if the rx sk queue is empty"): the user space can easily
check for an empty socket with not-0 queue length reported by the 'ss'
tool or the procfs interface.
We need to use a custom UDP helper to report the correct queue length,
taking into account the forward allocation deficit.
Reported-by: trevor.francis@46labs.com Fixes: 6b229cf77d68 ("UDP: add batching to udp_rmem_release()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
commit 079096f103fa ("tcp/dccp: install syn_recv requests into ehash
table") introduced an optimization for the handling of child sockets
created for a new TCP connection.
But this optimization passes any data associated with the last ACK of the
connection handshake up the stack without verifying its checksum, because it
calls tcp_child_process(), which in turn calls tcp_rcv_state_process()
directly. These lower-level processing functions do not do any checksum
verification.
Insert a tcp_checksum_complete call in the TCP_NEW_SYN_RECEIVE path to
fix this.
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Tested-by: Balbir Singh <bsingharora@gmail.com> Reviewed-by: Balbir Singh <bsingharora@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
use nla_strlcpy() to avoid copying data beyond the length of TCA_DEF_DATA
netlink attribute, in case it is less than SIMP_MAX_DATA and it does not
end with '\0' character.
v2: fix errors in the commit message, thanks Hangbin Liu
Fixes: fa1b1cff3d06 ("net_cls_act: Make act_simple use of netlink policy.") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
IPVS setups with local client and remote tunnel server need
to create exception for the local virtual IP. What we do is to
change PMTU from 64KB (on "lo") to 1460 in the common case.
Suggested-by: Martin KaFai Lau <kafai@fb.com> Fixes: 45e4fd26683c ("ipv6: Only create RTF_CACHE routes after encountering pmtu exception") Fixes: 7343ff31ebf0 ("ipv6: Don't create clones of host routes.") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: David Ahern <dsahern@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Commit 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end
of NCM frame") added logic to reserve space for the NDP at the
end of the NTB/skb. This reservation did not take the final
alignment of the NDP into account, causing us to reserve too
little space. Additionally the padding prior to NDP addition did
not ensure there was enough space for the NDP.
The NTB/skb with the NDP appended would then exceed the configured
max size. This caused the final padding of the NTB to use a
negative count, padding to almost INT_MAX, and resulting in:
Commit e1069bbfcf3b ("net: cdc_ncm: Reduce memory use when kernel
memory low") made this bug much more likely to trigger by reducing
the NTB size under memory pressure.
Link: https://bugs.debian.org/893393 Reported-by: Горбешко Богдан <bodqhrohro@gmail.com> Reported-and-tested-by: Dennis Wassenberg <dennis.wassenberg@secunet.com> Cc: Enrico Mioso <mrkiko.rs@gmail.com> Fixes: 4a0e3e989d66 ("cdc_ncm: Add support for moving NDP to end of NCM frame") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The check for cpu hit statistics was not returning immediate false for
any non vport rep netdev and hence we crashed (say on mlx5 probed VFs) if
user-space tool was calling into any possible netdev in the system.
Fix that by doing a proper check before dereferencing.
Fixes: 1d447a39142e ('net/mlx5e: Extendable vport representor netdev private data') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Reported-by: Eli Cohen <eli@melloanox.com> Reviewed-by: Eli Cohen <eli@melloanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
(cherry picked from commit 8ffd569aaa818f2624ca821d9a246342fa8b8c50) Signed-off-by: Talat Batheesh <talatb@mellanox.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Corey Minyard [Mon, 22 Oct 2018 19:35:55 +0000 (14:35 -0500)]
ipmi: Remove ACPI SPMI probing from the SSIF (I2C) driver
The IPMI spec states:
The purpose of the SPMI Table is to provide a mechanism that can
be used by the OSPM (an ACPI term for “OS Operating System-directed
configuration and Power Management” essentially meaning an ACPI-aware
OS or OS loader) very early in the boot process, e.g., before the
ability to execute ACPI control methods in the OS is available.
When we are probing IPMI in Linux, ACPI control methods are available,
so we shouldn't be probing using SPMI. It could cause some confusion
during the probing process.
Eddie.Horng [Wed, 24 Oct 2018 06:54:50 +0000 (14:54 +0800)]
cap_inode_getsecurity: use d_find_any_alias() instead of d_find_alias()
BugLink: https://bugs.launchpad.net/bugs/1786729
The code in cap_inode_getsecurity(), introduced by commit 8db6c34f1dbc
("Introduce v3 namespaced file capabilities"), should use
d_find_any_alias() instead of d_find_alias() do handle unhashed dentry
correctly. This is needed, for example, if execveat() is called with an
open but unlinked overlayfs file, because overlayfs unhashes dentry on
unlink.
This is a regression of real life application, first reported at
https://www.spinics.net/lists/linux-unionfs/msg05363.html
Below reproducer and setup can reproduce the case.
const char* exec="echo";
const char *newargv[] = { "echo", "hello", NULL};
const char *newenviron[] = { NULL };
int fd, err;
On regular filesystem, for example, ext4 read xattr from
disk and return to execveat(), will not trigger this issue, however,
the overlay attr handler pass real dentry to vfs_getxattr() will.
This reproducer calls fgetxattr() with an unlinked fd, involkes
vfs_getxattr() then reproduced the case that d_find_alias() in
cap_inode_getsecurity() can't find the unlinked dentry.
Suggested-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Amir Goldstein <amir73il@gmail.com> Acked-by: Serge E. Hallyn <serge@hallyn.com> Fixes: 8db6c34f1dbc ("Introduce v3 namespaced file capabilities") Cc: <stable@vger.kernel.org> # v4.14 Signed-off-by: Eddie Horng <eddie.horng@mediatek.com> Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
(cherry picked from commit 355139a8dba446cc11a424cddbf7afebc3041ba1) Signed-off-by: Po-Hsu Lin <po-hsu.lin@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Corey Minyard [Thu, 25 Oct 2018 15:04:52 +0000 (10:04 -0500)]
ipmi:ssif: Add support for multi-part transmit messages > 2 parts
The spec was fairly confusing about how multi-part transmit messages
worked, so the original implementation only added support for two
part messages. But after talking about it with others and finding
something I missed, I think it makes more sense.
The spec mentions smbus command 8 in a table at the end of the
section on SSIF support as the end transaction. If that works,
then all is good and as it should be. However, some implementations
seem to use a middle transaction <32 bytes tomark the end because of the
confusion in the spec, even though that is an SMBus violation if
the number of bytes is zero.
So this change adds some tests, if command=8 works, it uses that,
otherwise if an empty end transaction works, it uses a middle
transaction <32 bytes to mark the end. If neither works, then
it limits the size to 63 bytes as it is now.
BugLink: https://bugs.launchpad.net/bugs/1799794 Cc: Harri Hakkarainen <harri@cavium.com> Cc: Bazhenov, Dmitry <dmitry.bazhenov@auriga.com> Cc: Mach, Dat <Dat.Mach@cavium.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
(cherry picked from commit 10042504ed92c06077b8a20a4edd67ba784847d4) Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Tyler Hicks [Wed, 31 Oct 2018 00:55:25 +0000 (00:55 +0000)]
sysfs: Fix regression when adding a file to an existing group
BugLink: https://launchpad.net/bugs/1784501
Commit 5f81880d5204 ("sysfs, kobject: allow creating kobject belonging
to arbitrary users") incorrectly changed the argument passed as the
parent parameter when calling sysfs_add_file_mode_ns(). This caused some
sysfs attribute files to not be added correctly to certain groups.
Fixes: 5f81880d5204 ("sysfs, kobject: allow creating kobject belonging to arbitrary users") Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d1753390274f7760e5b593cb657ea34f0617e559) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Tyler Hicks [Wed, 31 Oct 2018 00:55:24 +0000 (00:55 +0000)]
bridge: make sure objects belong to container's owner
BugLink: https://launchpad.net/bugs/1784501
When creating various bridge objects in /sys/class/net/... make sure
that they belong to the container's owner instead of global root (if
they belong to a container/namespace).
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 705e0dea4d52ef420a7d37fd9cc6725092e5e1ff) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Tyler Hicks [Wed, 31 Oct 2018 00:55:23 +0000 (00:55 +0000)]
net: create reusable function for getting ownership info of sysfs inodes
BugLink: https://launchpad.net/bugs/1784501
Make net_ns_get_ownership() reusable by networking code outside of core.
This is useful, for example, to allow bridge related sysfs files to be
owned by container root.
Add a function comment since this is a potentially dangerous function to
use given the way that kobject_get_ownership() works by initializing uid
and gid before calling .get_ownership().
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit fbdeaed408cf2728c62640c10848ddb1b67e63d3)
[tyhicks: Minor context changes] Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Dmitry Torokhov [Wed, 31 Oct 2018 00:55:22 +0000 (00:55 +0000)]
net-sysfs: make sure objects belong to container's owner
BugLink: https://launchpad.net/bugs/1784501
When creating various objects in /sys/class/net/... make sure that they
belong to container's owner instead of global root (if they belong to a
container/namespace).
Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b0e37c0d8a6abed0cd1b611314a7ebf50b0a8ed4) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Tyler Hicks [Wed, 31 Oct 2018 00:55:21 +0000 (00:55 +0000)]
net-sysfs: require net admin in the init ns for setting tx_maxrate
BugLink: https://launchpad.net/bugs/1784501
An upcoming change will allow container root to open some /sys/class/net
files for writing. The tx_maxrate attribute can result in changes
to actual hardware devices so err on the side of caution by requiring
CAP_NET_ADMIN in the init namespace in the corresponding attribute store
operation.
Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3033fced2f689d4a870b3ba6a8a676db1261d262) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Dmitry Torokhov [Wed, 31 Oct 2018 00:55:20 +0000 (00:55 +0000)]
driver core: set up ownership of class devices in sysfs
BugLink: https://launchpad.net/bugs/1784501
Plumb in get_ownership() callback for devices belonging to a class so that
they can be created with uid/gid different from global root. This will
allow network devices in a container to belong to container's root and not
global root.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9944e894c1266dc8515c82d1ff752d681215526b) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Dmitry Torokhov [Wed, 31 Oct 2018 00:55:19 +0000 (00:55 +0000)]
kobject: kset_create_and_add() - fetch ownership info from parent
BugLink: https://launchpad.net/bugs/1784501
This change implements get_ownership() for ksets created with
kset_create_and_add() call by fetching ownership data from parent kobject.
This is done mostly for benefit of "queues" attribute of net devices so
that corresponding directory belongs to container's root instead of global
root for network devices in a container.
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d028b6f703209dbe96201b2714ff46625877128e) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Dmitry Torokhov [Wed, 31 Oct 2018 00:55:18 +0000 (00:55 +0000)]
sysfs, kobject: allow creating kobject belonging to arbitrary users
BugLink: https://launchpad.net/bugs/1784501
Normally kobjects and their sysfs representation belong to global root,
however it is not necessarily the case for objects in separate namespaces.
For example, objects in separate network namespace logically belong to the
container's root and not global root.
This change lays groundwork for allowing network namespace objects
ownership to be transferred to container's root user by defining
get_ownership() callback in ktype structure and using it in sysfs code to
retrieve desired uid/gid when creating sysfs objects for given kobject.
Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5f81880d5204ee2388fd9a75bb850ccd526885b7) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Dmitry Torokhov [Wed, 31 Oct 2018 00:55:17 +0000 (00:55 +0000)]
kernfs: allow creating kernfs objects with arbitrary uid/gid
BugLink: https://launchpad.net/bugs/1784501
This change allows creating kernfs files and directories with arbitrary
uid/gid instead of always using GLOBAL_ROOT_UID/GID by extending
kernfs_create_dir_ns() and kernfs_create_file_ns() with uid/gid arguments.
The "simple" kernfs_create_file() and kernfs_create_dir() are left alone
and always create objects belonging to the global root.
When creating symlinks ownership (uid/gid) is taken from the target kernfs
object.
Co-Developed-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 488dee96bb62f0b3d9e678cf42574034d5b033a5) Signed-off-by: Tyler Hicks <tyhicks@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
BugLink: http://bugs.launchpad.net/bugs/1800849
When the oom killer kills a userspace process in the page fault handler
while in guest context, the fault handler fails to release the mm_sem
if the FAULT_FLAG_RETRY_NOWAIT option is set. This leads to a deadlock
when tearing down the mm when the process terminates. This bug can only
happen when pfault is enabled, so only KVM clients are affected.
The problem arises in the rare cases in which handle_mm_fault does not
release the mm_sem. This patch fixes the issue by manually releasing
the mm_sem when needed.
Fixes: 24eb3a824c4f3 ("KVM: s390: Add FAULT_FLAG_RETRY_NOWAIT for guest fault") Cc: <stable@vger.kernel.org> # 3.15+ Signed-off-by: Claudio Imbrenda <imbrenda@linux.vnet.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 306d6c49ac9ded11114cb53b0925da52f2c2ada1) Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
net/af_iucv: fix skb handling on HiperTransport xmit error
BugLink: http://bugs.launchpad.net/bugs/1800639
When sending an skb, afiucv_hs_send() bails out on various error
conditions. But currently the caller has no way of telling whether the
skb was freed or not - resulting in potentially either
a) leaked skbs from iucv_send_ctrl(), or
b) double-free's from iucv_sock_sendmsg().
As dev_queue_xmit() will always consume the skb (even on error), be
consistent and also free the skb from all other error paths. This way
callers no longer need to care about managing the skb.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Reviewed-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit b2f543949acd1ba64313fdad9e672ef47550d773) Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
net/af_iucv: drop inbound packets with invalid flags
BugLink: http://bugs.launchpad.net/bugs/1800639
Inbound packets may have any combination of flag bits set in their iucv
header. If we don't know how to handle a specific combination, drop the
skb instead of leaking it.
To clarify what error is returned in this case, replace the hard-coded
0 with the corresponding macro.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit 222440996d6daf635bed6cb35041be22ede3e8a0) Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
AceLan Kao [Tue, 6 Nov 2018 10:53:12 +0000 (18:53 +0800)]
SAUCE: nvme: add quirk to not call disable function when suspending
BugLink: https://bugs.launchpad.net/bugs/1801875
Call nvme_dev_disable() function leads to the power consumption goes
up to 2.2 Watt during suspend-to-idle, and from SK hynix FE, they
suggest us to use its own APST feature to do the power management during
s2idle.
After D3 is diabled and nvme_dev_disable() is not called while
suspending, the power consumption drops to 0.77 Watt during s2idle.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
AceLan Kao [Tue, 6 Nov 2018 10:53:11 +0000 (18:53 +0800)]
SAUCE: pci: prevent sk hynix nvme from entering D3
BugLink: https://bugs.launchpad.net/bugs/1801875
It leads to the power consumption raises to 2.2W during s2idle, while
it consumes less than 1W during long idle if put SK hynix nvme to D3
and then enter s2idle.
From SK hynix FE, MS Windows doesn't put nvme to D3, and uses its own
APST feature to do the power management.
To leverage its APST feature during s2idle, we can't disable nvme
device while suspending, too.
BTW, prevent it from entering D3 will increase the power consumtion around
0.13W ~ 0.15W during short/long idle, and the power consumption during
s2idle becomes 0.77W.
Signed-off-by: AceLan Kao <acelan.kao@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Daniel Drake [Thu, 18 Oct 2018 09:14:00 +0000 (11:14 +0200)]
Input: i8042 - enable keyboard wakeups by default when s2idle is used
BugLink: https://bugs.launchpad.net/bugs/1798552
Previously, on typical consumer laptops, pressing a key on the keyboard
when the system is in suspend would cause it to wake up (default or
unconditional behaviour). This happens because the EC generates a SCI
interrupt in this scenario.
That is no longer true on modern laptops based on Intel WhiskeyLake,
including Acer Swift SF314-55G, Asus UX333FA, Asus UX433FN and Asus
UX533FD. We confirmed with Asus EC engineers that the "Modern Standby"
design has been modified so that the EC no longer generates a SCI
in this case; the keyboard controller itself should be used for wakeup.
In order to retain the standard behaviour of being able to use the
keyboard to wake up the system, enable serio wakeups by default on
platforms that are using s2idle.
Link: https://lkml.kernel.org/r/CAB4CAwfQ0mPMqCLp95TVjw4J0r5zKPWkSvvkK4cpZUGE--w8bQ@mail.gmail.com Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
(cherry picked from commit 684bec1092b6991ff2a7751e8a763898576eb5c2) Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-By: AceLan Kao <acelan.kao@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
xfrm: Fix NULL pointer dereference when skb_dst_force clears the dst_entry.
BugLink: https://bugs.launchpad.net/bugs/1801878
Since commit 222d7dbd258d ("net: prevent dst uses after free")
skb_dst_force() might clear the dst_entry attached to the skb.
The xfrm code don't expect this to happen, so we crash with
a NULL pointer dereference in this case. Fix it by checking
skb_dst(skb) for NULL after skb_dst_force() and drop the packet
in cast the dst_entry was cleared.
Fixes: 222d7dbd258d ("net: prevent dst uses after free") Reported-by: Tobias Hommel <netdev-list@genoetigt.de> Reported-by: Kristian Evensen <kristian.evensen@gmail.com> Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
(cherry picked from commit 9e1437937807b0122e8da1ca8765be2adca9aee6) Signed-off-by: Gavin Guo <gavin.guo@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Julian Wiedmann [Wed, 16 May 2018 07:37:25 +0000 (09:37 +0200)]
s390/qdio: reset old sbal_state flags
BugLink: http://bugs.launchpad.net/bugs/1801686
When allocating a new AOB fails, handle_outbound() is still capable of
transmitting the selected buffer (just without async completion).
But if a previous transfer on this queue slot used async completion, its
sbal_state flags field is still set to QDIO_OUTBUF_STATE_FLAG_PENDING.
So when the upper layer driver sees this stale flag, it expects an async
completion that never happens.
Fix this by unconditionally clearing the flags field.
Fixes: 104ea556ee7f ("qdio: support asynchronous delivery of storage blocks") Cc: <stable@vger.kernel.org> #v3.2+ Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
(cherry picked from commit 64e03ff72623b8c2ea89ca3cb660094e019ed4ae) Signed-off-by: Frank Heimes <frank.heimes@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Yunsheng Lin [Wed, 7 Nov 2018 01:25:38 +0000 (18:25 -0700)]
net: hns3: Set tx ring' tc info when netdev is up
BugLink: https://bugs.launchpad.net/bugs/1802023
The HNS3_RING_TX_RING_TC_REG register is used to map tx ring to
specific tc, the tx queue to tc mapping is needed by the hardware
to do the correct tx schedule.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit 1c77215480bcfa0852575180f997bd156f2aef17)
[ dannf: Trivial offset fix in hns3_enet.h ] Signed-off-by: dann frazier <dann.frazier@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Jean Delvare [Wed, 31 Oct 2018 17:54:16 +0000 (13:54 -0400)]
s390: qeth: Fix potential array overrun in cmd/rc lookup
BugLink: https://bugs.launchpad.net/bugs/1800641
Functions qeth_get_ipa_msg and qeth_get_ipa_cmd_name are modifying
the last member of global arrays without any locking that I can see.
If two instances of either function are running at the same time,
it could cause a race ultimately leading to an array overrun (the
contents of the last entry of the array is the only guarantee that
the loop will ever stop).
Performing the lookups without modifying the arrays is admittedly
slower (two comparisons per iteration instead of one) but these
are operations which are rare (should only be needed in error
cases or when debugging, not during successful operation) and it
seems still less costly than introducing a mutex to protect the
arrays in question.
As a side bonus, it allows us to declare both arrays as const data.
Signed-off-by: Jean Delvare <jdelvare@suse.de> Cc: Julian Wiedmann <jwi@linux.ibm.com> Cc: Ursula Braun <ubraun@linux.ibm.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 048a7f8b4ec085d5c56ad4a3bf450389a4aed5f9) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Jason Ekstrand [Wed, 31 Oct 2018 18:28:09 +0000 (14:28 -0400)]
drm/syncobj: Don't leak fences when WAIT_FOR_SUBMIT is set
BugLink: https://bugs.launchpad.net/bugs/1798165
We attempt to get fences earlier in the hopes that everything will
already have fences and no callbacks will be needed. If we do succeed
in getting a fence, getting one a second time will result in a duplicate
ref with no unref. This is causing memory leaks in Vulkan applications
that create a lot of fences; playing for a few hours can, apparently,
bring down the system.
Cc: stable@vger.kernel.org
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=107899 Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Signed-off-by: Jason Ekstrand <jason@jlekstrand.net> Signed-off-by: Sean Paul <seanpaul@chromium.org> Link: https://patchwork.freedesktop.org/patch/msgid/20180926071703.15257-1-jason.ekstrand@intel.com
(cherry picked from commit 337fe9f5c1e7de1f391c6a692531379d2aa2ee11) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The change also provides stronger context for authentication and share
connection (see MS-SMB2 3.3.5.7 and MS-SRVS 3.1.6.8) as noted by
Tom Talpey, and addresses comments about the buffer size for the UNC
made by Aurélien Aptel.
Signed-off-by: Thomas Werschlein <thomas.werschlein@geo.uzh.ch> Signed-off-by: Steve French <stfrench@microsoft.com> CC: Tom Talpey <ttalpey@microsoft.com> Reviewed-by: Aurelien Aptel <aaptel@suse.com> CC: Stable <stable@vger.kernel.org>
(cherry picked from commit 395a2076b4064f97d3fce03af15210ff2a7bb7f9) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Kleber Souza <kleber.souza@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>