tls_sw_sendmsg() allocates plaintext and encrypted SG entries using
function sk_alloc_sg(). In case the number of SG entries hit
MAX_SKB_FRAGS, sk_alloc_sg() returns -ENOSPC and sets the variable for
current SG index to '0'. This leads to calling of function
tls_push_record() with 'sg_encrypted_num_elem = 0' and later causes
kernel crash. To fix this, set the number of SG elements to the number
of elements in plaintext/encrypted SG arrays in case sk_alloc_sg()
returns -ENOSPC.
Fixes: 3c4d7559159b ("tls: kernel TLS support") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When initializing the device (procedure init_one), the driver
calls mlx5_pci_init to perform pci initialization. As part of this
initialization, mlx5_pci_init creates a debugfs directory.
If this creation fails, init_one aborts, returning failure to
the caller (which is the probe method caller).
The main reason for such a failure to occur is if the debugfs
directory already exists. This can happen if the last time
mlx5_pci_close was called, debugfs_remove (silently) failed due
to the debugfs directory not being empty.
Guarantee that such a debugfs_remove failure will not occur by
instead calling debugfs_remove_recursive in procedure mlx5_pci_close.
Fixes: 59211bd3b632 ("net/mlx5: Split the load/unload flow into hardware and software flows") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Currently, mlx5_attach_interface does not check for error
after calling intf->attach or intf->add. When these two calls
fails, the client is not initialized and will cause issues such as
kernel panic on invalid address in the teardown path (mlx5_detach_interface)
Fixes: 737a234bb638 ("net/mlx5: Introduce attach/detach to interface API") Signed-off-by: Huy Nguyen <huyn@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When a rds sock is bound, it is inserted into the bind_hash_table
which is protected by RCU. But when releasing rds sock, after it
is removed from this hash table, it is freed immediately without
respecting RCU grace period. This could cause some use-after-free
as reported by syzbot.
Mark the rds sock with SOCK_RCU_FREE before inserting it into the
bind_hash_table, so that it would be always freed after a RCU grace
period.
The other problem is in rds_find_bound(), the rds sock could be
freed in between rhashtable_lookup_fast() and rds_sock_addref(),
so we need to extend RCU read lock protection in rds_find_bound()
to close this race condition.
Reported-and-tested-by: syzbot+8967084bcac563795dc6@syzkaller.appspotmail.com Reported-by: syzbot+93a5839deb355537440f@syzkaller.appspotmail.com Cc: Sowmini Varadhan <sowmini.varadhan@oracle.com> Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: rds-devel@oss.oracle.com Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oarcle.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
With performance optimization the spi transfer and messages of basic
register operations like qcaspi_read_register moved into the private
driver structure. But they weren't protected against mutual access
(e.g. between driver kthread and ethtool). So dumping the QCA7000
registers via ethtool during network traffic could make spi_sync
hang forever, because the completion in spi_message is overwritten.
So revert the optimization completely.
Fixes: 291ab06ecf676 ("net: qualcomm: new Ethernet over SPI driver for QCA700") Signed-off-by: Stefan Wahren <stefan.wahren@i2se.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When the mlx5 health mechanism detects a problem while the driver
is in the middle of init_one or remove_one, the driver needs to prevent
the health mechanism from scheduling future work; if future work
is scheduled, there is a problem with use-after-free: the system WQ
tries to run the work item (which has been freed) at the scheduled
future time.
Prevent this by disabling work item scheduling in the health mechanism
when the driver is in the middle of init_one() or remove_one().
Fixes: e126ba97dba9 ("mlx5: Add driver for Mellanox Connect-IB adapters") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Reviewed-by: Feras Daoud <ferasda@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
DMA allocated memory is lost in be_cmd_get_profile_config() when we
call it with non-NULL port_res parameter.
Signed-off-by: Petr Oros <poros@redhat.com> Reviewed-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If FI_EXTRA_ATTR is set in inode by fuzzing, inode.i_addr[0] will be
parsed as inode.i_extra_isize, then in __recover_inline_status, inline
data address will beyond boundary of page, result in accessing invalid
memory.
So in this condition, during reading inode page, let's do sanity check
with EXTRA_ATTR feature of fs and extra_attr bit of inode, if they're
inconsistent, deny to load this inode.
- Overview
Out-of-bound access in f2fs_iget() when mounting a corrupted f2fs image
- Reproduce
The following message will be got in KASAN build of 4.18 upstream kernel.
[ 819.392227] ==================================================================
[ 819.393901] BUG: KASAN: slab-out-of-bounds in f2fs_iget+0x736/0x1530
[ 819.395329] Read of size 4 at addr ffff8801f099c968 by task mount/1292
My Chromebook Plus (kevin) is spitting the following at boot time:
(NULL device *): hwmon: 'sbs-9-000b' is not a valid name attribute, please fix
Clearly, __hwmon_device_register is unhappy about the property name.
Some investigation reveals that thermal_add_hwmon_sysfs doesn't
sanitize the name of the attribute.
In order to keep it quiet, let's replace '-' with '_' in hwmon->type
This is consistent with what iio-hwmon does since b92fe9e3379c8.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The pxa3xx driver uses the pinctrl-single driver since a while which
does not implement a .gpio_set_direction() callback. The pinmux core
will simply return 0 in this case, and the pxa3xx gpio driver hence
believes the pinctrl driver did its job and returns as well.
This effectively makes pxa_gpio_direction_{input,output} no-ops.
To fix this, do not call into the pinctrl subsystem for the PXA3xx
platform for now. We can revert this once the pinctrl-single driver
learned to support setting pin directions.
Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
dax_pmem_percpu_exit() waits for dax_pmem_percpu_release() to invoke the
dax_pmem->cmp completion. Unfortunately this approach to cleaning up
the percpu_ref only works after devm_memremap_pages() was successful.
If devm_add_action_or_reset() or devm_memremap_pages() fails,
dax_pmem_percpu_release() is not invoked. Therefore
dax_pmem_percpu_exit() hangs waiting for the completion:
rc = devm_add_action_or_reset(dev, dax_pmem_percpu_exit,
&dax_pmem->ref);
if (rc)
return rc;
For the case when sbi->segs_per_sec > 1 with lfs mode, take
section:segment = 5 for example, if the section prefree_map is
...previous section | current section (1 1 0 1 1) | next section...,
then the start = x, end = x + 1, after start = start_segno +
sbi->segs_per_sec, start = x + 5, then it will skip x + 3 and x + 4, but
their bitmap is still set, which will cause duplicated
f2fs_issue_discard of this same section in the next write_checkpoint:
round 1: section bitmap : 1 1 1 1 1, all valid, prefree_map: 0 0 0 0 0
then rm data block NO.2, block NO.2 becomes invalid, prefree_map: 0 0 1 0 0
write_checkpoint: section bitmap: 1 1 0 1 1, prefree_map: 0 0 0 0 0,
prefree of NO.2 is cleared, and no discard issued
round 2: rm data block NO.0, NO.1, NO.3, NO.4
all invalid, but prefree bit of NO.2 is set and cleared in round 1, then
prefree_map: 1 1 0 1 1
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1, no
valid blocks of this section, so discard issued, but this time prefree
bit of NO.3 and NO.4 is skipped due to start = start_segno + sbi->segs_per_sec;
round 3:
write_checkpoint: section bitmap: 0 0 0 0 0, prefree_map: 0 0 0 1 1 ->
0 0 0 0 0, no valid blocks of this section, so discard issued,
this time prefree bit of NO.3 and NO.4 is cleared, but the discard of
this section is sent again...
To fix this problem, we can align the start and end value to section
boundary for fstrim and real-time discard operation, and decide to issue
discard only when the whole section is invalid, which can issue discard
aligned to section size as much as possible and avoid redundant discard.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If we attempt to request more blocks than we have room for, we try to
instead request as much as we can, however, alloc_valid_block_count
is not decremented to match the new value, allowing it to drift higher
until the next checkpoint. This always decrements it when the requested
amount cannot be fulfilled.
Signed-off-by: Daniel Rosenberg <drosen@google.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The periphery can't be accessed before we set the
INIT_DONE bit which initializes the device.
A previous patch added a reconfiguration of the MSI-X
tables upon resume, but at that point in the flow,
INIT_DONE wasn't set. Since the reconfiguration of the
MSI-X tables require periphery access, it failed.
The difference between WoWLAN and without WoWLAN is that
in WoWLAN, iwl_trans_pcie_d3_suspend clears the INIT_DONE
without clearing the STATUS_DEVICE_ENABLED bit in the
software status. Because of that, the resume code thinks
that the device is enabled, but the INIT_DONE bit has been
cleared.
To fix this, don't reconfigure the MSI-X tables in case
WoWLAN is enabled. It will be done in
iwl_trans_pcie_d3_resume anyway.
During normal IO, FW can return IO with 'port unavailble' status. Driver
would send a LOGO to remote port for session resync. On an off chance, a
PLOGI could arrive before sending the LOGO. This patch will skip sendiing
LOGO if a PLOGI just came in.
Signed-off-by: Quinn Tran <quinn.tran@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
- Return valid error codes from ppc4xx_setup_pcieh_hw(), have it clean
up after itself, and only access hardware after all possible error
conditions have been handled.
- Use devm_kzalloc() instead of kzalloc() in ppc4xx_msi_probe()
Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
drivers/pinctrl/berlin/berlin.c:237 berlin_pinctrl_build_state()
warn: passing devm_ allocated variable to kfree. 'pctrl->functions'
As we will be calling krealloc() on pointer 'pctrl->functions', which means
kfree() will be called in there, devm_kzalloc() shouldn't be used with
the allocation in the first place. Fix the warning by calling kcalloc()
and managing the free procedure in error path on our own.
Fixes: 3de68d331c24 ("pinctrl: berlin: add the core pinctrl driver for Marvell Berlin SoCs") Signed-off-by: YueHaibing <yuehaibing@huawei.com> Reviewed-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
activate_managed() returns EINVAL instead of -EINVAL in case of
error. While this is unlikely to happen, the positive return value would
cause further malfunction at the call site.
Fixes: 2db1f959d9dc ("x86/vector: Handle managed interrupts proper") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Otherwise kernel uses random MAC which is not very conveniet.
With that change in place use might set desired MAC in U-Boot
with "setenv ethaddr 11:22:33:44:55:66", save environment and
then from boot to boot the same MAC will be used by the kernel.
One other note for this to happen it's required to pass
board's .dtb in U-Boot's "bootm" command like that:
------------------->8-----------------
bootm 0x82000000 - 0x84000000
------------------->8-----------------
Here 0x82000000 is location of uImage while
0x80000000 is location of either axs10x.dtb or hsdk.dtb
previously loaded from SD-card, USB storage or TFTP server.
Since commit e641a317830b ("KVM: PPC: Book3S HV: Unify dirty page map
between HPT and radix", 2017-10-26), kvm_unmap_radix() computes the
number of PAGE_SIZEd pages being unmapped and passes it to
kvmppc_update_dirty_map(), which expects to be passed the page size
instead. Consequently it will only mark one system page dirty even
when a large page (for example a THP page) is being unmapped. The
consequence of this is that part of the THP page might not get copied
during live migration, resulting in memory corruption for the guest.
This fixes it by computing and passing the page size in kvm_unmap_radix().
Cc: stable@vger.kernel.org # v4.15+ Fixes: e641a317830b (KVM: PPC: Book3S HV: Unify dirty page map between HPT and radix) Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
After commit b196d88aba8a ("tun: fix use after free for ptr_ring") we
need clean up tx ring during release(). But unfortunately, it tries to
do the cleanup blindly after socket were destroyed which will lead
another use-after-free. Fix this by doing the cleanup before dropping
the last reference of the socket in __tun_detach().
Backport Note :-
Upstream commit moves the ptr_ring_cleanup call from tun_chr_close to
__tun_detach. Upstream applied that patch after replacing skb_array with
ptr_ring. This patch moves the skb_array_cleanup call from
tun_chr_close to __tun_detach.
Reported-by: Andrei Vagin <avagin@virtuozzo.com> Acked-by: Andrei Vagin <avagin@virtuozzo.com> Fixes: b196d88aba8a ("tun: fix use after free for ptr_ring") Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zubin Mithra <zsm@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
(cherry picked from commit 2a522aef03a4e11d6fdc05a107a023943b103e39) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
We used to initialize ptr_ring during TUNSETIFF, this is because its
size depends on the tx_queue_len of netdevice. And we try to clean it
up when socket were detached from netdevice. A race were spotted when
trying to do uninit during a read which will lead a use after free for
pointer ring. Solving this by always initialize a zero size ptr_ring
in open() and do resizing during TUNSETIFF, and then we can safely do
cleanup during close(). With this, there's no need for the workaround
that was introduced by commit 4df0bfc79904 ("tun: fix a memory leak
for tfile->tx_array").
Backport Note :-
Comparison with the upstream patch:
[1] A "semantic revert" of the changes made in 4df0bfc799("tun: fix a memory leak for tfile->tx_array"). 4df0bfc799 was applied upstream, and then skb array was changed
to use ptr_ring. The upstream patch then removes the changes introduced
by 4df0bfc799. This backport does the same; "revert" the changes
made by 4df0bfc799.
[2] xdp_rxq_info_unreg() being called in relevant locations
As xdp_rxq_info related patches are not present in 4.14, these
changes are not needed in the backport.
[3] An instance of ptr_ring_init needs to be replaced by skb_array_init
Inside tun_attach()
[4] ptr_ring_cleanup needs to be replaced by skb_array_cleanup
Inside tun_chr_close()
Note that the backport for 7063efd33b ("tuntap: fix use after free during release")
needs to be applied on top of this patch.
Reported-by: syzbot+e8b902c3c3fadf0a9dba@syzkaller.appspotmail.com Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Michael S. Tsirkin <mst@redhat.com> Fixes: 1576d9860599 ("tun: switch to use skb array for tx") Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Zubin Mithra <zsm@chromium.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com>
(cherry picked from commit 4320b96333c761620b1ac5cc12b8d5265b3e4b8b) Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:56 +0000 (07:58 -0700)]
net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends
BugLink: https://bugs.launchpad.net/bugs/1836117
After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.
While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.
We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88078d98d1bb085d72af8437707279e203524fa5) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
ipv6: defrag: drop non-last frags smaller than min mtu
BugLink: https://bugs.launchpad.net/bugs/1836117
don't bother with pathological cases, they only waste cycles.
IPv6 requires a minimum MTU of 1280 so we should never see fragments
smaller than this (except last frag).
v3: don't use awkward "-offset + len"
v2: drop IPv4 part, which added same check w. IPV4_MIN_MTU (68).
There were concerns that there could be even smaller frags
generated by intermediate nodes, e.g. on radio networks.
Cc: Peter Oskolkov <posk@google.com> Cc: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0ed4229b08c13c84a3c301a08defdc9e7f4467e6) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 385114dec8a49b5e5945e77ba7de6356106713f4) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Peter Oskolkov [Thu, 13 Sep 2018 14:58:52 +0000 (07:58 -0700)]
ip: discard IPv4 datagrams with overlapping segments.
BugLink: https://bugs.launchpad.net/bugs/1836117
This behavior is required in IPv6, and there is little need
to tolerate overlapping fragments in IPv4. This change
simplifies the code and eliminates potential DDoS attack vectors.
Tested: ran ip_defrag selftest (not yet available uptream).
Suggested-by: David S. Miller <davem@davemloft.net> Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Florian Westphal <fw@strlen.de> Acked-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7969e5c40dfd04799d4341f1b7cd266b6e47f227) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:51 +0000 (07:58 -0700)]
inet: frags: fix ip6frag_low_thresh boundary
BugLink: https://bugs.launchpad.net/bugs/1836117
Giving an integer to proc_doulongvec_minmax() is dangerous on 64bit arches,
since linker might place next to it a non zero value preventing a change
to ip6frag_low_thresh.
ip6frag_low_thresh is not used anymore in the kernel, but we do not
want to prematuraly break user scripts wanting to change it.
Since specifying a minimal value of 0 for proc_doulongvec_minmax()
is moot, let's remove these zero values in all defrag units.
Fixes: 6e00f7dd5e4e ("ipv6: frags: fix /proc/sys/net/ipv6/ip6frag_low_thresh") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3d23401283e80ceb03f765842787e0e79ff598b7) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:50 +0000 (07:58 -0700)]
inet: frags: get rid of ipfrag_skb_cb/FRAG_CB
BugLink: https://bugs.launchpad.net/bugs/1836117
ip_defrag uses skb->cb[] to store the fragment offset, and unfortunately
this integer is currently in a different cache line than skb->next,
meaning that we use two cache lines per skb when finding the insertion point.
By aliasing skb->ip_defrag_offset and skb->dev, we pack all the fields
in a single cache line and save precious memory bandwidth.
Note that after the fast path added by Changli Gao in commit d6bebca92c66 ("fragment: add fast path for in-order fragments")
this change wont help the fast path, since we still need
to access prev->len (2nd cache line), but will show great
benefits when slow path is entered, since we perform
a linear scan of a potentially long list.
Also, note that this potential long list is an attack vector,
we might consider also using an rb-tree there eventually.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bf66337140c64c27fa37222b7abca7e49d63fb57) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:49 +0000 (07:58 -0700)]
inet: frags: reorganize struct netns_frags
BugLink: https://bugs.launchpad.net/bugs/1836117
Put the read-mostly fields in a separate cache line
at the beginning of struct netns_frags, to reduce
false sharing noticed in inet_frag_kill()
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c2615cf5a761b32bf74e85bddc223dfff3d9b9f0) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Move @nelems at the end of struct rhashtable so that first cache line
is shared between all cpus, because almost never dirtied.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5d672a0780d9e7118caad4c171ec88b8299398d) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 05c0b86b9696802fd0ce5676a92a63f1b455bdf3) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
While fixing the bug at that time, it also added a very high cost
for DDOS frags, as the ICMP rate limit is applied after this
expensive operation (skb_clone() + consume_skb(), implying memory
allocations, copy, and freeing)
We can use skb_get(head) here, all we want is to make sure skb wont
be freed by another cpu.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1eec5d5670084ee644597bd26c25e22c69b9f748) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:45 +0000 (07:58 -0700)]
inet: frags: break the 2GB limit for frags storage
BugLink: https://bugs.launchpad.net/bugs/1836117
Some users are willing to provision huge amounts of memory to be able
to perform reassembly reasonnably well under pressure.
Current memory tracking is using one atomic_t and integers.
Switch to atomic_long_t so that 64bit arches can use more than 2GB,
without any cost for 32bit arches.
Note that this patch avoids an overflow error, if high_thresh was set
to ~2GB, since this test in inet_frag_alloc() was never true :
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3e67f106f619dcfaf6f4e2039599bdb69848c714) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2d44ed22e607f9a285b049de2263e3840673a260) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Note: in the future, we should try hard to avoid the skb_clone()
since this is a serious performance cost.
Under DDOS, the ICMP message wont be sent because of rate limits.
Fact that ip6_expire_frag_queue() does not use skb_clone() is
disturbing too. Presumably IPv6 should have the same
issue than the one we fixed in commit ec4fbd64751d
("inet: frag: release spinlock before calling icmp_send()")
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 399d1404be660d355192ff4df5ccc3f4159ec1e4) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Also since we use rhashtable we can bring back the number of fragments
in "grep FRAG /proc/net/sockstat /proc/net/sockstat6" that was
removed in commit 434d305405ab ("inet: frag: don't account number
of fragment queues")
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6befe4a78b1553edb6eed3a78b4bcd9748526672) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:41 +0000 (07:58 -0700)]
inet: frags: use rhashtables for reassembly units
BugLink: https://bugs.launchpad.net/bugs/1836117
Some applications still rely on IP fragmentation, and to be fair linux
reassembly unit is not working under any serious load.
It uses static hash tables of 1024 buckets, and up to 128 items per bucket (!!!)
A work queue is supposed to garbage collect items when host is under memory
pressure, and doing a hash rebuild, changing seed used in hash computations.
This work queue blocks softirqs for up to 25 ms when doing a hash rebuild,
occurring every 5 seconds if host is under fire.
Then there is the problem of sharing this hash table for all netns.
It is time to switch to rhashtables, and allocate one of them per netns
to speedup netns dismantle, since this is a critical metric these days.
Lookup is now using RCU. A followup patch will even remove
the refcount hold/release left from prior implementation and save
a couple of atomic operations.
Before this patch, 16 cpus (16 RX queue NIC) could not handle more
than 1 Mpps frags DDOS.
After the patch, I reach 9 Mpps without any tuning, and can use up to 2GB
of storage for the fragments (exact number depends on frags being evicted
after timeout)
Eric Dumazet [Thu, 13 Sep 2018 14:58:40 +0000 (07:58 -0700)]
rhashtable: add schedule points
BugLink: https://bugs.launchpad.net/bugs/1836117
Rehashing and destroying large hash table takes a lot of time,
and happens in process context. It is safe to add cond_resched()
in rhashtable_rehash_table() and rhashtable_free_and_destroy()
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ae6da1f503abb5a5081f9f6c4a6881de97830f3e) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The only sysctl that is not per-netns is not used :
ip6frag_secret_interval
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Nikolay Borisov <kernel@kyup.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 18dcbe12fe9fca0ab825f7eff993060525ac2503) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This is a prereq to "inet: frags: use rhashtables for reassembly units"
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 807f1844df4ac23594268fa9f41902d0549e92aa) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This is a prereq to "inet: frags: use rhashtables for reassembly units"
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5b975bab23615cd0fdf67af6c9298eb01c4b9f61) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:35 +0000 (07:58 -0700)]
inet: frags: refactor ipfrag_init()
BugLink: https://bugs.launchpad.net/bugs/1836117
We need to call inet_frags_init() before register_pernet_subsys(),
as a prereq for following patch ("inet: frags: use rhashtables for reassembly units")
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 483a6e4fa055123142d8956866fe2aa9c98d546d) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Eric Dumazet [Thu, 13 Sep 2018 14:58:34 +0000 (07:58 -0700)]
inet: frags: add a pointer to struct netns_frags
BugLink: https://bugs.launchpad.net/bugs/1836117
In order to simplify the API, add a pointer to struct inet_frags.
This will allow us to make things less complex.
These functions no longer have a struct inet_frags parameter :
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 093ba72914b696521e4885756a68a3332782c8de) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This patch changes the return value to eventually propagate an
error.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 787bea7748a76130566f881c2342a0be4127d182) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Currently if the cm_id is not bound to any netdevice, than for such cm_id,
net namespace is ignored; which is incorrect.
Regardless of cm_id bound to a netdevice or not, net namespace must
match. When a cm_id is bound to a netdevice, in such case net namespace
and netdevice both must match.
Fixes: 4c21b5bcef73 ("IB/cma: Add net_dev and private data checks to RDMA CM") Signed-off-by: Parav Pandit <parav@mellanox.com> Reviewed-by: Daniel Jurgens <danielj@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If a driver causes DMA cache maintenance with a zero length then we
currently BUG and kill the kernel. As this is a scenario that we may
well be able to recover from, WARN & return in the condition instead.
If the client is sending a layoutget, but the server issues a callback
to recall what it thinks may be an outstanding layout, then we may find
an uninitialised layout attached to the inode due to the layoutget.
In that case, it is appropriate to return NFS4ERR_NOMATCHING_LAYOUT
rather than NFS4ERR_DELAY, as the latter can end up deadlocking.
This patch adds to do sanity check with {sit,nat}_ver_bitmap_bytesize
during mount, in order to avoid accessing across cache boundary with
this abnormal bitmap size.
- Overview
buffer overrun in build_sit_info() when mounting a crafted f2fs image
When attaching a device to an IOMMU group with
CONFIG_DEBUG_ATOMIC_SLEEP=y:
BUG: sleeping function called from invalid context at mm/slab.h:421
in_atomic(): 1, irqs_disabled(): 128, pid: 61, name: kworker/1:1
...
Call trace:
...
arm_lpae_alloc_pgtable+0x114/0x184
arm_64_lpae_alloc_pgtable_s1+0x2c/0x128
arm_32_lpae_alloc_pgtable_s1+0x40/0x6c
alloc_io_pgtable_ops+0x60/0x88
ipmmu_attach_device+0x140/0x334
ipmmu_attach_device() takes a spinlock, while arm_lpae_alloc_pgtable()
allocates memory using GFP_KERNEL. Originally, the ipmmu-vmsa driver
had its own custom page table allocation implementation using
GFP_ATOMIC, hence the spinlock was fine.
Fix this by replacing the spinlock by a mutex, like the arm-smmu driver
does.
Fixes: f20ed39f53145e45 ("iommu/ipmmu-vmsa: Use the ARM LPAE page table allocator") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Look up of buffers in s5p_mfc_handle_frame_new, s5p_mfc_handle_frame_copy_time
functions is not working properly for DMA addresses above 2 GiB. As a result
flags and timestamp of returned buffers are not set correctly and it breaks
operation of GStreamer/OMX plugins which rely on the CAPTURE buffer queue
flags.
Due to improper return type of the get_dec_y_adr, get_dspl_y_adr callbacks
and sign bit extension these callbacks return incorrect address values,
e.g. 0xfffffffffefc0000 instead of 0x00000000fefc0000. Then the statement:
is always false, which breaks looking up capture queue buffers.
To ensure proper matching by address u32 type is used for the DMA
addresses. This should work on all related SoCs, since the MFC DMA
address width is not larger than 32-bit.
Changes done in this patch are minimal as there is a larger patch series
pending refactoring the whole driver.
The driver only registers one input device, which uses the screen
parameters from the first T9 instance. The first T63 instance also uses
those parameters.
It is incorrect to send input reports from the second instances of these
objects if they are enabled: the input scaling will be wrong and the
positions will be mashed together.
This also causes problems on Android if the number of slots exceeds 32.
In the future, this could be handled by looking for enabled touch object
instances and creating an input device for each one.
More than one io_mode feature can be requested when creating a dm cache
device (as is: last one wins). The io_mode selections are incompatible
with one another, we should force them to be selected exclusively. Add
a counter to check for more than one io_mode selection.
Fixes: 629d0a8a1a10 ("dm cache metadata: add "metadata2" feature") Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The function dcb_app_lookup walks the list of specified DCB APP entries,
looking for one that matches a given criteria: ifindex, selector,
protocol ID and optionally also priority. The "don't care" value for
priority is set to 0, because that priority has not been allowed under
CEE regime, which predates the IEEE standardization.
Under IEEE, 0 is a valid priority number. But because dcb_app_lookup
considers zero a wild card, attempts to add an APP entry with priority 0
fail when other entries exist for a given ifindex / selector / PID
triplet.
Fix by changing the wild-card value to -1.
Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
of_find_compatible_node() returns a device_node pointer with refcount
incremented and must be decremented explicitly.
As this code is using the result only to check presence of the interrupt
controller (!NULL) but not actually using the result otherwise the
refcount can be decremented here immediately again.
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/19820/ Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As Wen Xu reported in bugzilla, after image was injected with random data
by fuzzing, inline inode would contain invalid reserved blkaddr, then
during inline conversion, we will encounter illegal memory accessing
reported by KASAN, the root cause of this is when writing out converted
inline page, we will use invalid reserved blkaddr to update sit bitmap,
result in accessing memory beyond sit bitmap boundary.
In order to fix this issue, let's do sanity check with reserved block
address of inline inode to avoid above condition.
Locking the root adapter for __i2c_transfer will deadlock if the
device sits behind a mux-locked I2C mux. Switch to the finer-grained
i2c_lock_bus with the I2C_LOCK_SEGMENT flag. If the device does not
sit behind a mux-locked mux, the two locking variants are equivalent.
Signed-off-by: Peter Rosin <peda@axentia.se> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Wolfram Sang <wsa@the-dreams.de> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
An SPI TPM device managed directly on an embedded board using
the SPI bus and some GPIO or similar line as IRQ handler will
pass the IRQn from the TPM device associated with the SPI
device. This is already handled by the SPI core, so make sure
to pass this down to the core as well.
(The TPM core habit of using -1 to signal no IRQ is dubious
(as IRQ 0 is NO_IRQ) but I do not want to mess with that
semantic in this patch.)
Cc: Mark Brown <broonie@kernel.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If segment type in SSA and SIT is inconsistent, we will encounter below
BUG_ON during GC, to avoid this panic, let's just skip doing GC on such
segment.
The bug is triggered with image reported in below link:
In synchronous scenario, like in checkpoint(), we are going to flush
dirty node pages to device synchronously, we can easily failed
writebacking node page due to trylock_page() failure, especially in
condition of intensive lock competition, which can cause long latency
of checkpoint(). So let's use lock_page() in synchronous scenario to
avoid this issue.
Signed-off-by: Yunlei He <heyunlei@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
It is incorrect to enable TX/RX queues (call by mvneta_port_up()) for
port without link. Indeed MTU change for interface without link causes TX
queues to stuck.
Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP
network unit") Signed-off-by: Yelena Krivosheev <yelena@marvell.com>
[gregory.clement: adding Fixes tags and rewording commit log] Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The AMD pinctrl driver demultiplexes GPIO interrupts and fires off their
individual handlers.
If one of these GPIO irqs is configured as a level interrupt, and its
downstream handler is a threaded ONESHOT interrupt, the GPIO interrupt
source is masked by handle_level_irq() until the eventual return of the
threaded irq handler. During this time the level GPIO interrupt status
will still report as high until the actual gpio source is cleared - both
in the individual GPIO interrupt status bit (INTERRUPT_STS_OFF) and in
its corresponding "WAKE_INT_STATUS_REG" bit.
Thus, if another GPIO interrupt occurs during this time,
amd_gpio_irq_handler() will see that the (masked-and-not-yet-cleared)
level irq is still pending and incorrectly call its handler again.
To fix this, have amd_gpio_irq_handler() check for both interrupts status
and mask before calling generic_handle_irq().
Note: Is it possible that this bug was the source of the interrupt storm
on Ryzen when using chained interrupts before commit ba714a9c1dea85
("pinctrl/amd: Use regular interrupt instead of chained")?
Signed-off-by: Daniel Kurtz <djkurtz@chromium.org> Acked-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If ioh_gpio_probe() fails on devm_irq_alloc_descs() then chip may point
to any element of chip_save array, so reverse iteration from pointer chip
may become chip_save[-1] and gpiochip_remove() will operate with wrong
memory.
The patch fix the error path of ioh_gpio_probe() to correctly bypass
chip_save array.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This fixes two issues with setting hid->name information.
CC net/bluetooth/hidp/core.o
In function ‘hidp_setup_hid’,
inlined from ‘hidp_session_dev_init’ at net/bluetooth/hidp/core.c:815:9,
inlined from ‘hidp_session_new’ at net/bluetooth/hidp/core.c:953:8,
inlined from ‘hidp_connection_add’ at net/bluetooth/hidp/core.c:1366:8:
net/bluetooth/hidp/core.c:778:2: warning: ‘strncpy’ output may be truncated copying 127 bytes from a string of length 127 [-Wstringop-truncation]
strncpy(hid->name, req->name, sizeof(req->name) - 1);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
CC net/bluetooth/hidp/core.o
net/bluetooth/hidp/core.c: In function ‘hidp_setup_hid’:
net/bluetooth/hidp/core.c:778:38: warning: argument to ‘sizeof’ in ‘strncpy’ call is the same expression as the source; did you mean to use the size of the destination? [-Wsizeof-pointer-memaccess]
strncpy(hid->name, req->name, sizeof(req->name));
^
Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The tx completion of multiple mgmt frames can be bundled
in a single event and sent by the firmware to host, if this
capability is not disabled explicitly by the host. If the host
cannot handle the bundled mgmt tx completion, this capability
support needs to be disabled in the wmi init cmd, sent to the firmware.
Add the host capability indication flag in the wmi ready command,
to let firmware know the features supported by the host driver.
This field is ignored if it is not supported by firmware.
Set the host capability indication flag(i.e. host_capab) to zero,
for disabling the support of bundle mgmt tx completion. This will
indicate the firmware to send completion event for every mgmt tx
completion, instead of bundling them together and sending in a single
event.
The mock / test version of pmem_direct_access() needs to check the
validity of pointers kaddr and pfn for NULL assignment. If anyone
equals to NULL, it doesn't need to calculate the value.
If pointer equals to NULL, that is to say callers may have no need for
kaddr or pfn, so this patch is prepared for allowing them to pass in
NULL instead of having to pass in a local pointer or variable that
they then just throw away.
Suggested-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Huaisheng Ye <yehs1@lenovo.com> Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
tw_probe() returns 0 in case of fail of tw_initialize_device_extension(),
pci_resource_start() or tw_reset_sequence() and releases resources.
twl_probe() returns 0 in case of fail of twl_initialize_device_extension(),
pci_iomap() and twl_reset_sequence(). twa_probe() returns 0 in case of
fail of tw_initialize_device_extension(), ioremap() and
twa_reset_sequence().
The patch adds retval initialization for these cases.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Acked-by: Adam Radford <aradford@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
isa_virt_to_bus() & isa_bus_to_virt() claim to treat ISA bus addresses
as being identical to physical addresses, but they fail to do so in the
presence of a non-zero PHYS_OFFSET.
Correct this by having them use virt_to_phys() & phys_to_virt(), which
consolidates the calculations to one place & ensures that ISA bus
addresses do indeed match physical addresses.
Signed-off-by: Paul Burton <paul.burton@mips.com>
Patchwork: https://patchwork.linux-mips.org/patch/20047/ Cc: James Hogan <jhogan@kernel.org> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: linux-mips@linux-mips.org Cc: Vladimir Kondratiev <vladimir.kondratiev@intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When receiving a beacon or probe response, we should update the
boottime_ns field which is the timestamp the frame was received at.
(cf mac80211.h)
This fixes a scanning issue with Android since it relies on this
timestamp to determine when the AP has been seen for the last time
(via the nl80211 BSS_LAST_SEEN_BOOTTIME parameter).
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The QCA4019 hw1.0 firmware 10.4-3.2.1-00050 and 10.4-3.5.3-00053 (and most
likely all other) seem to ignore the WMI_CHAN_FLAG_DFS flag during the
scan. This results in transmission (probe requests) on channels which are
not "available" for transmissions.
Since the firmware is closed source and nothing can be done from our side
to fix the problem in it, the driver has to work around this problem. The
WMI_CHAN_FLAG_PASSIVE seems to be interpreted by the firmware to not
scan actively on a channel unless an AP was detected on it. Simple probe
requests will then be transmitted by the STA on the channel.
ath10k must therefore also use this flag when it queues a radar channel for
scanning. This should reduce the chance of an active scan when the channel
might be "unusable" for transmissions.
Fixes: e8a50f8ba44b ("ath10k: introduce DFS implementation") Signed-off-by: Sven Eckelmann <sven.eckelmann@openmesh.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The tx power applied by set_txpower is limited by the CTL (conformance
test limit) entries in the EEPROM. These can change based on the user
configured regulatory domain.
Depending on the EEPROM data this can cause the tx power to become too
limited, if the original regdomain CTLs impose lower limits than the CTLs
of the user configured regdomain.
To fix this issue, set the initial channel limits without any CTL
restrictions and only apply the CTL at run time when setting the channel
and the real tx power.
Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Add missing in_8() accessors to init_pmu() and pmu_sr_intr().
This fixes several sparse warnings:
drivers/macintosh/via-pmu.c:536:29: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:537:33: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1455:17: warning: dereference of noderef expression
drivers/macintosh/via-pmu.c:1456:69: warning: dereference of noderef expression
Tested-by: Stan Johnson <userm57@yahoo.com> Signed-off-by: Finn Thain <fthain@telegraphics.com.au> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
'perf record' will error out if both --delay and LBR are applied.
For example:
# perf record -D 1000 -a -e cycles -j any -- sleep 2
Error:
dummy:HG: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
#
A dummy event is added implicitly for initial delay, which has the same
configurations as real sampling events. The dummy event is a software
event. If LBR is configured, perf must error out.
The dummy event will only be used to track PERF_RECORD_MMAP while perf
waits for the initial delay to enable the real events. The BRANCH_STACK
bit can be safely cleared for the dummy event.
After applying the patch:
# perf record -D 1000 -a -e cycles -j any -- sleep 2
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.054 MB perf.data (828 samples) ]
#
Reported-by: Sunil K Pandey <sunil.k.pandey@intel.com> Signed-off-by: Kan Liang <kan.liang@linux.intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1531145722-16404-1-git-send-email-kan.liang@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
'perf c2c' scans read/write accesses and tries to find false sharing
cases, so when the events it wants were not asked for or ended up not
taking place, we get no histograms.
So do not try to display entry details if there's not any. Currently
this ends up in crash:
Fix build warnings in f2fs when CONFIG_PROC_FS is not enabled
by marking the unused functions as __maybe_unused.
../fs/f2fs/sysfs.c:519:12: warning: 'segment_info_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:546:12: warning: 'segment_bits_seq_show' defined but not used [-Wunused-function]
../fs/f2fs/sysfs.c:570:12: warning: 'iostat_info_seq_show' defined but not used [-Wunused-function]
Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Chao Yu <yuchao0@huawei.com> Cc: linux-f2fs-devel@lists.sourceforge.net Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
For the case when sbi->segs_per_sec > 1, take section:segment = 5 for
example, if segment 1 is just used and allocate new segment 2, and the
blocks of segment 1 is invalidated, at this time, the previous code will
use __set_test_and_free to free the free_secmap and free_sections++,
this is not correct since it is still a current section, so fix it.
Signed-off-by: Yunlong Song <yunlong.song@huawei.com> Reviewed-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If config CONFIG_F2FS_FAULT_INJECTION is on, for both read or write path
we will call find_lock_page() to get the page, but for read path, it
missed to passing FGP_ACCESSED to allocator to active the page in LRU
list, result in being reclaimed in advance incorrectly, fix it.
Reported-by: Xianrong Zhou <zhouxianrong@huawei.com> Signed-off-by: Chao Yu <yuchao0@huawei.com> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If number of isa and pci boards exceed NUM_BOARDS on the path
rp_init()->init_PCI()->register_PCI() then buffer overwrite occurs
in register_PCI() on assign rcktpt_io_addr[i].
The patch adds check on upper bound for index of registered
board in register_PCI.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
clk_evt memory is not being freed when the synic is shutdown
or when there is an allocation error. Add the appropriate
kfree() call, along with a comment to clarify how the memory
gets freed after an allocation error. Make the free path
consistent by removing checks for NULL since kfree() and
free_page() already do the check.
Signed-off-by: Michael Kelley <mikelley@microsoft.com> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
static struct ro_vpd and rw_vpd are initialized by vpd_sections_init()
in vpd_probe() based on header's ro and rw sizes.
In vpd_remove() vpd_section_destroy() performs deinitialization based
on enabled flag, which is set to true by vpd_sections_init().
This leads to call of vpd_section_destroy() on already destroyed section
for probe-release-probe-release sequence if first probe performs
ro_vpd initialization and second probe does not initialize it.
The patch adds changing enabled flag on vpd_section_destroy and adds
cleanup on the error path of vpd_sections_init.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The uio_unregister_device() function assumes that if "info->uio_dev" is
non-NULL that means "info" is fully allocated. Setting info->uio_de
has to be the last thing in the function.
In the current code, if request_threaded_irq() fails then we return with
info->uio_dev set to non-NULL but info is not fully allocated and it can
lead to double frees.
Fixes: beafc54c4e2f ("UIO: Add the User IO core code") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The CSID decodes the input data stream. When the input comes from
the Test Generator the format of the stream is set on the source
media pad. When the input comes from the CSIPHY the format is the
one on the sink media pad. Use the proper format for each case.
timer_base::must_forward_clock is indicating that the base clock might be
stale due to a long idle sleep.
The forwarding of the base clock takes place in the timer softirq or when a
timer is enqueued to a base which is idle. If the enqueue of timer to an
idle base happens from a remote CPU, then the following race can happen:
CPU0 CPU1
run_timer_softirq mod_timer
base = lock_timer_base(timer);
base->must_forward_clk = false
if (base->must_forward_clk)
forward(base); -> skipped
enqueue_timer(base, timer, idx);
-> idx is calculated high due to
stale base
unlock_timer_base(timer);
base = lock_timer_base(timer);
forward(base);
The root cause is that timer_base::must_forward_clk is cleared outside the
timer_base::lock held region, so the remote queuing CPU observes it as
cleared, but the base clock is still stale. This can cause large
granularity values for timers, i.e. the accuracy of the expiry time
suffers.
Prevent this by clearing the flag with timer_base::lock held, so that the
forwarding takes place before the cleared flag is observable by a remote
CPU.