Hash faults are not resoved in NMI context, instead causing the access
to fail. This is done because perf interrupts can get backtraces
including walking the user stack, and taking a hash fault on those could
deadlock on the HPTE lock if the perf interrupt hits while the same HPTE
lock is being held by the hash fault code. The user-access for the stack
walking will notice the access failed and deal with that in the perf
code.
The reason to allow perf interrupts in is to better profile hash faults.
The problem with this is any hash fault on a kernel access that happens
in NMI context will crash, because kernel accesses must not fail.
Hard lockups, system reset, machine checks that access vmalloc space
including modules and including stack backtracing and symbol lookup in
modules, per-cpu data, etc could all run into this problem.
Fix this by disallowing perf interrupts in the hash fault code (the
direct hash fault is covered by MSR[EE]=0 so the PMI disable just needs
to extend to the preload case). This simplifies the tricky logic in hash
faults and perf, at the cost of reduced profiling of hash faults.
perf can still latch addresses when interrupts are disabled, it just
won't get the stack trace at that point, so it would still find hot
spots, just sometimes with confusing stack chains.
An alternative could be to allow perf interrupts here but always do the
slowpath stack walk if we are in nmi context, but that slows down all
perf interrupt stack walking on hash though and it does not remove as
much tricky code.
Reported-by: Laurent Dufour <ldufour@linux.ibm.com> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Tested-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204035348.545435-1-npiggin@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c3543bac6efaecf97404c30c0513fab3b3fd5dca) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Before, the hardware would be allowed to transmit injected 802.11 MPDUs
as A-MSDU. This resulted in corrupted frames being transmitted. Now,
injected MPDUs are transmitted as-is, without A-MSDU.
The fix was verified with frame injection on MT7915 hardware, both with
and without the injected frame being encrypted.
If the hardware cannot do A-MSDU aggregation on MPDUs, this problem
would also be present in the TX path where mac80211 does the 802.11
encapsulation. However, I have not observed any such problem when
disabling IEEE80211_HW_SUPPORTS_TX_ENCAP_OFFLOAD to force that mode.
Therefore this fix is isolated to injected frames only.
The same A-MSDU logic is also present in the mt7921 driver, so it is
likely that this fix should be applied there too. I do not have access
to mt7921 hardware so I have not been able to test that.
Signed-off-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit df467929a0408efccd9495ccc5b650d9a3782313) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In pm8001_chip_set_dev_state_req(), pm8001_chip_fw_flash_update_req(),
pm80xx_chip_phy_ctl_req() and pm8001_chip_reg_dev_req() add missing calls
to pm8001_tag_free() to free the allocated tag when pm8001_mpi_build_cmd()
fails.
Similarly, in pm8001_exec_internal_task_abort(), if the chip ->task_abort
method fails, the tag allocated for the abort request task must be
freed. Add the missing call to pm8001_tag_free().
The call to pm8001_ccb_task_free() at the end of
pm8001_mpi_task_abort_resp() already frees the ccb tag. So when the device
NCQ_ABORT_ALL_FLAG is set, the tag should not be freed again. Also change
the hardcoded 0xBFFFFFFF value to ~NCQ_ABORT_ALL_FLAG as it ought to be.
The declaration of the local variable destination1 in pm80xx_pci_mem_copy()
as a pointer to a u32 results in the sparse warning:
warning: incorrect type in assignment (different base types)
expected unsigned int [usertype]
got restricted __le32 [usertype]
Furthermore, the destination" argument of pm80xx_pci_mem_copy() is wrongly
declared with the const attribute.
Fix both problems by changing the type of the "destination" argument to
"__le32 *" and use this argument directly inside the pm80xx_pci_mem_copy()
function, thus removing the need for the destination1 local variable.
Resolve build errors reported against UML build for undefined
ioport_map() and ioport_unmap() functions. Without this config
option a device cannot have vfio_pci_core_device.has_vga set,
so the existing function would always return -EINVAL anyway.
The driver has a fallback so make the message informational
rather than a warning. The driver has a fallback if the
Component Resource Association Table (CRAT) is missing, so
make this informational now.
Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/1906 Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f325d3e1dcc85fc3cd984f30fd443ab2f3b42631) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Update both bio-based and request-based DM to requeue IO if the
mapping table not available.
This race of IO being submitted before the DM device ready is so
narrow, yet possible for initial table load given that the DM device's
request_queue is created prior, that it best to requeue IO to handle
this unlikely case.
Reported-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit da52e8b9dad67549f3b607286b08efa3cc84081f) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
It appears like cmd could be a Spectre v1 gadget as it's supplied by a
user and used as an array index. Prevent the contents of kernel memory
from being leaked to userspace via speculative execution by using
array_index_nospec.
Signed-off-by: Jordy Zomer <jordy@pwning.systems> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 02cc46f397eb3691c56affbd5073e54f7a82ac32) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In case user space sends a packet destined to a broadcast address when a
matching broadcast route is not configured, the kernel will create a
unicast neighbour entry that will never be resolved [1].
When the broadcast route is configured, the unicast neighbour entry will
not be invalidated and continue to linger, resulting in packets being
dropped.
Solve this by invalidating unresolved neighbour entries for broadcast
addresses after routes for these addresses are internally configured by
the kernel. This allows the kernel to create a broadcast neighbour entry
following the next route lookup.
Another possible solution that is more generic but also more complex is
to have the ARP code register a listener to the FIB notification chain
and invalidate matching neighbour entries upon the addition of broadcast
routes.
It is also possible to wave off the issue as a user space problem, but
it seems a bit excessive to expect user space to be that intimately
familiar with the inner workings of the FIB/neighbour kernel code.
Reported-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Tested-by: Wang Hai <wanghai38@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 049072749a5e1fe277ee64746bbe9fca5c55f3f9) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Quoting the header comments, IRQF_ONESHOT is "Used by threaded interrupts
which need to keep the irq line disabled until the threaded handler has
been run.". When applied to an interrupt that doesn't request a threaded
irq then IRQF_ONESHOT has a lesser known (undocumented?) side effect,
which it to disable the forced threading of irqs (and for "normal" kernels
it is a nop). In this case I can find no evidence that suppressing forced
threading is intentional. Had it been intentional then a driver must adopt
the raw_spinlock API in order to avoid deadlocks on PREEMPT_RT kernels
(and avoid calling any kernel API that uses regular spinlocks).
Fix this by removing the spurious additional flag.
This change is required for my Snapdragon 7cx Gen2 tablet to boot-to-GUI
with PREEMPT_RT enabled.
During disassociation we're decreasing the phy's ref count.
If the ref count becomes 0, we're configuring the phy ctxt
to the default channel (the lowest channel which the device
can operate on). Currently we're not checking whether the
the default channel is enabled or not. Fix it by configuring
the phy ctxt to the lowest channel which is enabled.
Currently, fragmented EBS was set for a channel only if the 'hb_type'
was set to fragmented or balanced scan. However, 'hb_type' is set only
in case of CDB, and thus fragmented EBS is never set for a channel for
non-CDB devices. Fix it.
The quirk handling may need to set some different properties
which means using a different swnode, move the setting of the swnode
to inside dwc3_pci_quirks() so that the quirk handling can choose
a different swnode.
Normally, the queues are disabled when the channels are deactivated, and
enabled when the channels are activated. However, on register, the
channels are not active, but the queues are enabled by default. This
change fixes it, preventing mlx5e_xmit from running when the channels
are deactivated in the beginning.
Signed-off-by: Maxim Mikityanskiy <maximmi@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c64f3707cdf9347db765cdc68091751a60da5bd1) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The AXP288's recommended and factory default Vhold value (minimum
input voltage below which the input current draw will be reduced)
is 4.4V. This lines up with other charger IC's such as the TI
bq2419x/bq2429x series which use 4.36V or 4.44V.
For some reason some BIOS-es initialize Vhold to 4.6V or even 4.7V
which combined with the typical voltage drop over typically low
wire gauge micro-USB cables leads to the input-current getting
capped below 1A (with a 2A capable dedicated charger) based on Vhold.
This leads to slow charging, or even to the device slowly discharging
if the device is in heavy use.
As the Linux AXP288 drivers use the builtin BC1.2 charger detection
and send the input-current-limit according to the detected charger
there really is no reason not to use the recommended 4.4V Vhold.
Set Vhold to 4.4V to fix the slow charging issue on various devices.
There is one exception, the special-case of the HP X2 2-in-1s which
combine this BC1.2 capable PMIC with a Type-C port and a 5V/3A factory
provided charger with a Type-C plug which does not do BC1.2. These
have their input-current-limit hardcoded to 3A (like under Windows)
and use a higher Vhold on purpose to limit the current when used
with other chargers. To avoid touching Vhold on these HP X2 laptops
the code setting Vhold is added to an else branch of the if checking
for these models.
Note this also fixes the sofar unused VBUS_ISPOUT_VHOLD_SET_MASK
define, which was wrong.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 83efc05c857961e4f05750959521dada4ac7a333) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Commit 1f9ad21c3b38 ("powerpc/mm: Implement set_memory() routines")
included a spin_lock() to change_page_attr() in order to
safely perform the three step operations. But then
commit 9f7853d7609d ("powerpc/mm: Fix set_memory_*() against
concurrent accesses") modify it to use pte_update() and do
the operation safely against concurrent access.
In the meantime, Maxime reported some spinlock recursion.
Remove the read / modify / write sequence to make the operation atomic
and remove the spin_lock() in change_page_attr().
To do the operation atomically, we can't use pte modification helpers
anymore. Because all platforms have different combination of bits, it
is not easy to use those bits directly. But all have the
_PAGE_KERNEL_{RO/ROX/RW/RWX} set of flags. All we need it to compare
two sets to know which bits are set or cleared.
For instance, by comparing _PAGE_KERNEL_ROX and _PAGE_KERNEL_RO you
know which bit gets cleared and which bit get set when changing exec
permission.
The driver is missing to set the residual size while completing an
I/O. Ensure proper data transfer size is reported to the kernel on I/O
completion based on the transfer length reported by the firmware.
The Qualcomm PCI bridge device (Device ID 0x0110) found in chipsets such as
SM8450 does not set the Command Completed bit unless writes to the Slot
Command register change "Control" bits.
removed the need to disable bottom half while acquiring
listening_hash.lock. There are still two callers left which disable
bottom half before the lock is acquired.
On PREEMPT_RT the softirqs are preemptible and local_bh_disable() acts
as a lock to ensure that resources, that are protected by disabling
bottom halves, remain protected.
This leads to a circular locking dependency if the lock acquired with
disabled bottom halves is also acquired with enabled bottom halves
followed by disabling bottom halves. This is the reverse locking order.
It has been observed with inet_listen_hashbucket::lock:
Drop local_bh_disable() around __inet_hash() which acquires
listening_hash->lock. Split inet_unhash() and acquire the
listen_hashbucket lock without disabling bottom halves; the inet_ehash
lock with disabled bottom halves.
The copy test uses the memcpy() to copy data between IO memory spaces.
This can trigger an alignment fault error (pasted the error logs below)
because memcpy() may use unaligned accesses on a mapped memory that is
just IO, which does not support unaligned memory accesses.
Fix it by using the correct memcpy API to copy from/to IO memory.
During event processing, events are read from the event queue one
by one until the queue is empty.If the master device continuously
requests address access at the same time and the SMMU generates
events, the cyclic processing of the event takes a long time and
softlockup warnings may be reported.
Aardvark hardware supports Multi-MSI and MSI_FLAG_MULTI_PCI_MSI is already
set for the MSI chip. But when allocating MSI interrupt numbers for
Multi-MSI, the numbers need to be properly aligned, otherwise endpoint
devices send MSI interrupt with incorrect numbers.
Fix this issue by using function bitmap_find_free_region() instead of
bitmap_find_next_zero_area().
To ensure that aligned MSI interrupt numbers are used by endpoint devices,
we cannot use Linux virtual irq numbers (as they are random and not
properly aligned). Instead we need to use the aligned hwirq numbers.
This change fixes receiving MSI interrupts on Armada 3720 boards and
allows using NVMe disks which use Multi-MSI feature with 3 interrupts.
Without this NVMe disks freeze booting as linux nvme-core.c is waiting
60s for an interrupt.
Avoid dropping into shell if the controller is in locked up state.
Driver issues SIS soft reset to bring back the controller to SIS mode while
OS boots into kdump mode.
If the controller is in lockup state, SIS soft reset does not work.
Since the controller lockup code has not been cleared, driver considers the
firmware is no longer up and running. Driver returns back an error code to
OS and the kdump fails.
Link: https://lore.kernel.org/r/164375212337.440833.11955356190354940369.stgit@brunhilda.pdev.net Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com> Reviewed-by: Scott Benesh <scott.benesh@microchip.com> Reviewed-by: Scott Teel <scott.teel@microchip.com> Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com> Signed-off-by: Don Brace <don.brace@microchip.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 330c4e1b4ec44a1deda9198475c84dbed36bb07e) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Cc: Christian König <christian.koenig@amd.com> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <Alexander.Deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Rajneesh Bhardwaj <rajneesh.bhardwaj@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 0a922366d6d9b2532344b3763a54090ab9b50f59) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
On large config LPARs (having 192 and more cores), Linux fails to boot
due to insufficient memory in the first memblock. It is due to the
memory reservation for the crash kernel which starts at 128MB offset of
the first memblock. This memory reservation for the crash kernel doesn't
leave enough space in the first memblock to accommodate other essential
system resources.
The crash kernel start address was set to 128MB offset by default to
ensure that the crash kernel get some memory below the RMA region which
is used to be of size 256MB. But given that the RMA region size can be
512MB or more, setting the crash kernel offset to mid of RMA size will
leave enough space for the kernel to allocate memory for other system
resources.
Since the above crash kernel offset change is only applicable to the LPAR
platform, the LPAR feature detection is pushed before the crash kernel
reservation. The rest of LPAR specific initialization will still
be done during pseries_probe_fw_features as usual.
This patch is dependent on changes to paca allocation for boot CPU. It
expect boot CPU to discover 1T segment support which is introduced by
the patch posted here:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html
While testing a patch that will follow later
("net: add netns refcount tracker to struct nsproxy")
I found that devtmpfs_init() was called before init_net
was initialized.
This is a bug, because devtmpfs_setup() calls
ksys_unshare(CLONE_NEWNS);
This has the effect of increasing init_net refcount,
which will be later overwritten to 1, as part of setup_net(&init_net)
We had too many prior patches [1] trying to work around the root cause.
Really, make sure init_net is in BSS section, and that net_ns_init()
is called earlier at boot time.
Note that another patch ("vfs: add netns refcount tracker
to struct fs_context") also will need net_ns_init() being called
before vfs_caches_init()
As a bonus, this patch saves around 4KB in .data section.
[1]
f8c46cb39079 ("netns: do not call pernet ops for not yet set up init_net namespace") b5082df8019a ("net: Initialise init_net.count to 1") 734b65417b24 ("net: Statically initialize init_net.dev_base_head")
v2: fixed a build error reported by kernel build bots (CONFIG_NET=n)
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 9c1ace066f22fe490d240dba8e13aeee9a88c589) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This fixes minor data-races in ip6_mc_input() and
batadv_mcast_mla_rtr_flags_softif_get_ipv6()
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4790998fdd0d0c682512af9b4a875bc80da73836) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
There are cases where clang compiler is packaged in a way
readelf is a symbolic link to llvm-readelf. In such cases,
llvm-readelf will be used instead of default binutils readelf,
and the following error will appear during libbpf build:
# Warning: Num of global symbols in
# /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367)
# does NOT match with num of versioned symbols in
# /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383).
# Please make sure all LIBBPF_API symbols are versioned in libbpf.map.
# --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ...
# +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ...
# @@ -324,6 +324,22 @@
# btf__str_by_offset
# btf__type_by_id
# btf__type_cnt
# +LIBBPF_0.0.1
# +LIBBPF_0.0.2
# +LIBBPF_0.0.3
# +LIBBPF_0.0.4
# +LIBBPF_0.0.5
# +LIBBPF_0.0.6
# +LIBBPF_0.0.7
# +LIBBPF_0.0.8
# +LIBBPF_0.0.9
# +LIBBPF_0.1.0
# +LIBBPF_0.2.0
# +LIBBPF_0.3.0
# +LIBBPF_0.4.0
# +LIBBPF_0.5.0
# +LIBBPF_0.6.0
# +LIBBPF_0.7.0
# libbpf_attach_type_by_name
# libbpf_find_kernel_btf
# libbpf_find_vmlinux_btf_id
# make[2]: *** [Makefile:184: check_abi] Error 1
# make[1]: *** [Makefile:140: all] Error 2
The above failure is due to different printouts for some ABS
versioned symbols. For example, with the same libbpf.so,
$ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0
...
$ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so | grep "LIBBPF" | grep ABS
134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0
202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0
...
The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does.
Such a difference caused libbpf build failure with llvm-readelf.
The proposed fix filters out all ABS symbols as they are not part of the comparison.
This works for both binutils readelf and llvm-readelf.
Reported-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220204214355.502108-1-yhs@fb.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e9da1df2c021642a0541eaf2add19dbe0d9e7685) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When adding 6GHz channels to scan request based on reported
co-located APs, don't add channels that have only APs with
"non-transmitted" BSSes if they only match the wildcard SSID since
they will be found by probing the "transmitted" BSS.
Even if it is only a false-positive since skip_buf0/skip_buf1 are only
used in mt76_dma_tx_cleanup_idx routine, initialize skip_unmap in
mt76_dma_rx_fill in order to fix the following UBSAN report:
Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 55c93a89e31dcf2f99482be8dd28d99a3431dd9a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If the nic fails to start, it is possible that the
reset_work has already been scheduled. Ensure the
work item is canceled so we do not have use-after-free
crash in case cleanup is called before the work item
is executed.
This fixes crash on my x86_64 apu2 when mt7921k radio
fails to work. Radio still fails, but OS does not
crash.
Signed-off-by: Ben Greear <greearb@candelatech.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 38fbe806645090c07aa97171f20fc62c3d7d3a98) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
As stated in [1], negative current values are used for discharging
batteries.
AXP PMICs internally have two different ADC channels for shunt current
measurement: one used during charging and one during discharging.
The values reported by these ADCs are unsigned.
While the driver properly selects ADC channel to get the data from,
it doesn't apply negative sign when reporting discharging current.
[1] Documentation/ABI/testing/sysfs-class-power
Signed-off-by: Evgeny Boger <boger@wirenboard.com> Acked-by: Chen-Yu Tsai <wens@csie.org> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 793a3704589358ac55e7ec881552b250f384a6ec) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
[why]
Unlock is needed on the error handling path to prevent dead lock.
v3d_submit_cl_ioctl and v3d_submit_csd_ioctl is missing unlock.
[how]
Fix this by changing goto target on the error handling path. So
changing the goto to target an error handling path
that includes drm_gem_unlock reservations.
coccinelle report:
./drivers/scsi/bfa/bfad_attr.c:908:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:860:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:888:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:853:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:808:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:728:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:822:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:927:9-17:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:900:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:874:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:714:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/bfa/bfad_attr.c:839:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
coccinelle report:
./drivers/scsi/mvsas/mv_init.c:699:8-16:
WARNING: use scnprintf or sprintf
./drivers/scsi/mvsas/mv_init.c:747:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit() instead of scnprintf() or sprintf().
Menglong Dong reports that the documentation for the dst_port field in
struct bpf_sock is inaccurate and confusing. From the BPF program PoV, the
field is a zero-padded 16-bit integer in network byte order. The value
appears to the BPF user as if laid out in memory as so:
32-, 16-, and 8-bit wide loads from the field are all allowed, but only if
the offset into the field is 0.
32-bit wide loads from dst_port are especially confusing. The loaded value,
after converting to host byte order with bpf_ntohl(dst_port), contains the
port number in the upper 16-bits.
Remove the confusion by splitting the field into two 16-bit fields. For
backward compatibility, allow 32-bit wide loads from offsetof(struct
bpf_sock, dst_port).
While at it, allow loads 8-bit loads at offset [0] and [1] from dst_port.
pm_runtime_get_sync() will increase the rumtime PM counter
even when it returns an error. Thus a pairing decrement is needed
to prevent refcount leak. Fix this by replacing this API with
pm_runtime_resume_and_get(), which will not change the runtime
PM counter on error. Besides, a matching decrement is needed
on the error handling path to keep the counter balanced.
According to the man page of TCP_CORK [1], if set, don't send out
partial frames. All queued partial frames are sent when option is
cleared again.
When applications call setsockopt to disable TCP_CORK, this call is
protected by lock_sock(), and tries to mod_delayed_work() to 0, in order
to send pending data right now. However, the delayed work smc_tx_work is
also protected by lock_sock(). There introduces lock contention for
sending data.
To fix it, send pending data directly which acts like TCP, without
lock_sock() protected in the context of setsockopt (already lock_sock()ed),
and cancel unnecessary dealyed work, which is protected by lock.
[1] https://linux.die.net/man/7/tcp
Signed-off-by: Tony Lu <tonylu@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 35380262304ff5fc9432c0d9d6793d5d39634cc6) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If amss.bin was missing ath11k would crash during 'rmmod ath11k_pci'. The
reason for that was that we were using mhi_async_power_up() which does not
check any errors. But mhi_sync_power_up() on the other hand does check for
errors so let's use that to fix the crash.
I was not able to find a reason why an async version was used.
ath11k_mhi_start() (which enables state ATH11K_MHI_POWER_ON) is called from
ath11k_hif_power_up(), which can sleep. So sync version should be safe to use
here.
The issue here is that board file loading happens after ath11k_pci_probe()
succesfully returns (ath11k initialisation happends asynchronously) and the
suspend handler is still enabled, of course failing as ath11k is not properly
initialised. Fix this by checking ATH11K_FLAG_QMI_FAIL during both suspend and
resume.
SVM ioctls take proper svms->lock to handle race conditions, don't need
take process mutex to serialize ioctls. This also fixes circular locking
warning:
WARNING: possible circular locking dependency detected
Signed-off-by: Philip Yang <Philip.Yang@amd.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e84b0438010d3359ae4830ba44108150f4839a92) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
coccinelle report:
./drivers/ptp/ptp_sysfs.c:17:8-16:
WARNING: use scnprintf or sprintf
./drivers/ptp/ptp_sysfs.c:390:8-16:
WARNING: use scnprintf or sprintf
Use sysfs_emit instead of scnprintf or sprintf makes more sense.
Reported-by: Zeal Robot <zealci@zte.com.cn> Signed-off-by: Yang Guang <yang.guang5@zte.com.cn> Signed-off-by: David Yang <davidcomponentone@gmail.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 1eb598045326e72805efb4abab998022c126dbb9) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Variable ret in function cdnsp_decode_trb is initialized but not
used. To fix this compiler warning patch adds checking whether the
data buffer has not been overflowed.
According to the Tegra Technical Reference Manual, the seq_num
field of control endpoint is not [31:24] but [31:27]. Bit 24
is reserved and bit 26 is splitxstate.
The change fixes the wrong control endpoint's definitions.
[Why]
If the DPCD caps specifies a PSR version newer than PSR_VERSION_1 then
we fallback to using PSR_VERSION_1 in amdgpu_dm_set_psr_caps.
This gets overriden with the raw DPCD value in amdgpu_dm_link_setup_psr,
which can result in DMCUB hanging if we pass in an unsupported PSR
version number.
[How]
Fix the hang by using link->psr_settings.psr_version directly during
amdgpu_dm_link_setup_psr.
Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Reviewed-by: Anthony Koo <Anthony.Koo@amd.com> Acked-by: Rodrigo Siqueira <Rodrigo.Siqueira@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 6040c99cb1a18c8da3f84e5051db12b6353a2576) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
[why]
Resource release is needed on the error handling path
to prevent memory leak.
[how]
Fix this by adding kfree on the error handling path.
Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Yongzhi Liu <lyz_cs@pku.edu.cn> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 7e10369c72db7a0e2f77b2e306aadc07aef6b07a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This issue takes place in an error path in
amdgpu_cs_fence_to_handle_ioctl(). When `info->in.what` falls into
default case, the function simply returns -EINVAL, forgetting to
decrement the reference count of a dma_fence obj, which is bumped
earlier by amdgpu_cs_get_fence(). This may result in reference count
leaks.
Fix it by decreasing the refcount of specific object before returning
the error code.
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xin Xiong <xiongx18@fudan.edu.cn> Signed-off-by: Xin Tan <tanxin.ctf@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3edd8646cb7c11b57c90e026bda6f21076223f5b) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
[Why]
For allow eDP hot-plug feature, the stream signal may change to VIRTUAL
when plug-out and back to eDP when plug-in. OS will still setPathMode
with same timing for each plugging, but eDP gets no stream update as we
don't check signal type changing back as keeping it VIRTUAL. It's also
unsafe for future cases that stream signal is switched with same timing.
[How]
Check stream signal type change include previous HDMI signal case.
Reviewed-by: Aric Cyr <Aric.Cyr@amd.com> Acked-by: Wayne Lin <wayne.lin@amd.com> Signed-off-by: Dale Zhao <dale.zhao@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit c4b64a80554e57a68b594f2920988f4bc39768d9) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The bug was found during fuzzing. Stacktrace locates it in
ath5k_eeprom_convert_pcal_info_5111.
When none of the curve is selected in the loop, idx can go
up to AR5K_EEPROM_N_PD_CURVES. The line makes pd out of bound.
pd = &chinfo[pier].pd_curves[idx];
There are many OOB writes using pd later in the code. So I
added a sanity check for idx. Checks for other loops involving
AR5K_EEPROM_N_PD_CURVES are not needed as the loop index is not
used outside the loops.
The patch is NOT tested with real device.
The following is the fuzzing report
BUG: KASAN: slab-out-of-bounds in ath5k_eeprom_read_pcal_info_5111+0x126a/0x1390 [ath5k]
Write of size 1 at addr ffff8880174a4d60 by task modprobe/214
When RDTSCP is supported but RDPID is not supported in host,
RDPID emulation is available. However, __kvm_get_msr() would
only fail when RDTSCP/RDPID both are disabled in guest, so
the emulator wouldn't inject a #UD when RDPID is disabled but
RDTSCP is enabled in guest.
Fixes: fb6d4d340e05 ("KVM: x86: emulate RDPID") Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com>
Message-Id: <1dfd46ae5b76d3ed87bde3154d51c64ea64c99c1.1646226788.git.houwenlong.hwl@antgroup.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit d5f6f44e04c30ad52bf4ca8b415b2d3caa2cf465) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
HSW_IN_TX* bits are used in generic code which are not supported on
AMD. Worse, these bits overlap with AMD EventSelect[11:8] and hence
using HSW_IN_TX* bits unconditionally in generic code is resulting in
unintentional pmu behavior on AMD. For example, if EventSelect[11:8]
is 0x2, pmc_reprogram_counter() wrongly assumes that
HSW_IN_TX_CHECKPOINTED is set and thus forces sampling period to be 0.
Also per the SDM, both bits 32 and 33 "may only be set if the processor
supports HLE or RTM" and for "IN_TXCP (bit 33): this bit may only be set
for IA32_PERFEVTSEL2."
Opportunistically eliminate code redundancy, because if the HSW_IN_TX*
bit is set in pmc->eventsel, it is already set in attr.config.
Reported-by: Ravi Bangoria <ravi.bangoria@amd.com> Reported-by: Jim Mattson <jmattson@google.com> Fixes: 103af0a98788 ("perf, kvm: Support the in_tx/in_tx_cp modifiers in KVM arch perfmon emulation v5") Co-developed-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Like Xu <likexu@tencent.com>
Message-Id: <20220309084257.88931-1-likexu@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a997e0f5aa553274f42f0d4cf36aaddc812d7484) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
AMD EPYC CPUs never raise a #GP for a WRMSR to a PerfEvtSeln MSR. Some
reserved bits are cleared, and some are not. Specifically, on
Zen3/Milan, bits 19 and 42 are not cleared.
When emulating such a WRMSR, KVM should not synthesize a #GP,
regardless of which bits are set. However, undocumented bits should
not be passed through to the hardware MSR. So, rather than checking
for reserved bits and synthesizing a #GP, just clear the reserved
bits.
This may seem pedantic, but since KVM currently does not support the
"Host/Guest Only" bits (41:40), it is necessary to clear these bits
rather than synthesizing #GP, because some popular guests (e.g Linux)
will set the "Host Only" bit even on CPUs that don't support
EFER.SVME, and they don't expect a #GP.
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379957] unchecked MSR access error: WRMSR to 0xc0010200 (tried to write 0x0000020000130026) at rIP: 0xffffffff9b276a28 (native_write_msr+0x8/0x30)
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379958] Call Trace:
Feb 23 03:59:58 Ubuntu1804 kernel: [ 405.379963] amd_pmu_disable_event+0x27/0x90
Fixes: ca724305a2b0 ("KVM: x86/vPMU: Implement AMD vPMU code for KVM") Reported-by: Lotus Fenn <lotusf@google.com> Signed-off-by: Jim Mattson <jmattson@google.com> Reviewed-by: Like Xu <likexu@tencent.com> Reviewed-by: David Dunn <daviddunn@google.com>
Message-Id: <20220226234131.2167175-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit e7bab9898249ee21c46a722ae8e3c67f1712573a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Include kvm_cache_regs.h to pick up the definition of is_guest_mode(),
which is referenced by nested_svm_virtualize_tpr() in svm.h. Remove
include from svm_onhpyerv.c which was done only because of lack of
include in svm.h.
Fixes: 883b0a91f41ab ("KVM: SVM: Move Nested SVM Implementation to nested.c") Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Sean Christopherson <seanjc@google.com> Cc: kvm@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Peter Gonda <pgonda@google.com>
Message-Id: <20220304161032.2270688-1-pgonda@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5483640f8efbe4a9f42361935633572b9b925436) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The third nybble of AMD's event select overlaps with Intel's IN_TX and
IN_TXCP bits. Therefore, we can't use AMD64_RAW_EVENT_MASK on Intel
platforms that support TSX.
Declare a raw_event_mask in the kvm_pmu structure, initialize it in
the vendor-specific pmu_refresh() functions, and use that mask for
PERF_TYPE_RAW configurations in reprogram_gp_counter().
Fixes: 710c47651431 ("KVM: x86/pmu: Use AMD64_RAW_EVENT_MASK for PERF_TYPE_RAW") Signed-off-by: Jim Mattson <jmattson@google.com>
Message-Id: <20220308012452.3468611-1-jmattson@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a82fe0ba1c52a9e04d998ab226c518ef07295cf7) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
One of KFENCE's main design principles is that with increasing uptime,
allocation coverage increases sufficiently to detect previously
undetected bugs.
We have observed that frequent long-lived allocations of the same source
(e.g. pagecache) tend to permanently fill up the KFENCE pool with
increasing system uptime, thus breaking the above requirement. The
workaround thus far had been increasing the sample interval and/or
increasing the KFENCE pool size, but is no reliable solution.
To ensure diverse coverage of allocations, limit currently covered
allocations of the same source once pool utilization reaches 75%
(configurable via `kfence.skip_covered_thresh`) or above. The effect is
retaining reasonable allocation coverage when the pool is close to full.
A side-effect is that this also limits frequent long-lived allocations
of the same source filling up the pool permanently.
Uniqueness of an allocation for coverage purposes is based on its
(partial) allocation stack trace (the source). A Counting Bloom filter
is used to check if an allocation is covered; if the allocation is
currently covered, the allocation is skipped by KFENCE.
Testing was done using:
(a) a synthetic workload that performs frequent long-lived
allocations (default config values; sample_interval=1;
num_objects=63), and
(b) normal desktop workloads on an otherwise idle machine where
the problem was first reported after a few days of uptime
(default config values).
In both test cases the sampled allocation rate no longer drops to zero
at any point. In the case of (b) we observe (after 2 days uptime) 15%
unique allocations in the pool, 77% pool utilization, with 20% "skipped
allocations (covered)".
Move the saving of the stack trace of allocations into __kfence_alloc(),
so that the stack entries array can be used outside of
kfence_guarded_alloc() and we avoid potentially unwinding the stack
multiple times.
Link: https://lkml.kernel.org/r/20210923104803.2620285-3-elver@google.com Signed-off-by: Marco Elver <elver@google.com> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Acked-by: Alexander Potapenko <glider@google.com> Cc: Aleksandr Nogikh <nogikh@google.com> Cc: Jann Horn <jannh@google.com> Cc: Taras Madan <tarasmadan@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 44b44b64b4da9edbbfff52872a5b0c10acda54aa) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When 'index' is a big numbers, it may become negative which forced
to 'int'. then 'index << part_shift' might overflow to a positive
value that is not greater than '0xfffff', then sysfs might complains
about duplicate creation. Because of this, move the 'index' judgment
to the front will fix it and be better.
Fixes: b0d9111a2d53 ("nbd: use an idr to keep track of nbd devices") Fixes: 940c264984fd ("nbd: fix possible overflow for 'first_minor' in nbd_dev_add()") Signed-off-by: Zhang Wensheng <zhangwensheng5@huawei.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/20220310093224.4002895-1-zhangwensheng5@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 5142720dbe51befeb25f204f912ef1ad93fba343) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Reason is when add delay in recv_work, lead to release the last reference
of 'nbd->config_refs'. nbd_config_put will call flush_workqueue to make
all work finish. Obviously, it will lead to deadloop.
To solve this issue, according to Josef's suggestion move 'recv_work'
init from start device to nbd_dev_add, then destroy 'recv_work'when
nbd device teardown.
We never checked for errors on add_disk() as this function
returned void. Now that this is fixed, use the shiny new
error handling.
Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 3e526e9ae0e4d35765a4a5344695ded0c60c6223) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
As the potential failure of the wm8350_register_irq(),
it should be better to check it and return error if fails.
Also, it need not free 'wm_rtc->rtc' since it will be freed
automatically.
Fixes: 077eaf5b40ec ("rtc: rtc-wm8350: add support for WM8350 RTC") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Acked-by: Charles Keepax <ckeepax@opensource.cirrus.com> Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220303085030.291793-1-jiasheng@iscas.ac.cn Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit f8008f42d46377096178c43de6aeec81cd6f3eeb) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Due to dropped inclusion of asm-generic/xor.h, xor_block_8regs symbol is
missing with CONFIG64 and break compilation, as the asm/xor_64.h also did
not include it. The patch recreate the logic from arch/x86, which check
whether AVX is available and add fallbacks for 32bit and 64bit config of
um.
A very minor additional "fix" is, the return of the macro parameter
instead of NULL, as this is the original intent of the macro, but
this does not change the actual behavior.
Fixes: c0ecca6604b8 ("um: enable the use of optimized xor routines in UML") Signed-off-by: Benjamin Beichler <benjamin.beichler@uni-rostock.de> Acked-By: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b257272f5483eaa0b2e5139d3ac2299e2cfd6100) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Due to some renaming, we ended up with the "indirect iomem"
naming in Kconfig, following INDIRECT_PIO. However, clearly
I missed following through on that in the ifdefs, but so far
INDIRECT_IOMEM_FALLBACK isn't used by any architecture.
Reported-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Fixes: ca2e334232b6 ("lib: add iomem emulation (logic_iomem)") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2be1a7f09635031aab33b7400d3c991876b74b5d) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Commit 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup")
killed PCIe on my XGene-1 box (a Mustang board). The machine itself
is still alive, but half of its storage (over NVMe) is gone, and the
NVMe driver just times out.
Note that this machine boots with a device tree provided by the
UEFI firmware (2016 vintage), which could well be non conformant
with the spec, hence the breakage.
With the patch reverted, the box boots 5.17-rc8 with flying colors.
Link: https://lore.kernel.org/all/Yf2wTLjmcRj+AbDv@xps13.dannf Link: https://lore.kernel.org/r/20220321104843.949645-2-maz@kernel.org Fixes: 6dce5aa59e0b ("PCI: xgene: Use inbound resources for setup") Signed-off-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: stable@vger.kernel.org Cc: Rob Herring <robh@kernel.org> Cc: Toan Le <toan@os.amperecomputing.com> Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Cc: Krzysztof Wilczyński <kw@linux.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Stéphane Graber <stgraber@ubuntu.com> Cc: dann frazier <dann.frazier@canonical.com>
[dannf: minor context adjustment] Signed-off-by: dann frazier <dann.frazier@canonical.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 541b7456fc4d370d22f9213636e03bf536c3d616) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Matthew Wilcox reported that there is a missing mmap_lock in
file_files_note that could possibly lead to a user after free.
Solve this by using the existing vma snapshot for consistency
and to avoid the need to take the mmap_lock anywhere in the
coredump code except for dump_vma_snapshot.
Update the dump_vma_snapshot to capture vm_pgoff and vm_file
that are neeeded by fill_files_note.
Add free_vma_snapshot to free the captured values of vm_file.
Reported-by: Matthew Wilcox <willy@infradead.org> Link: https://lkml.kernel.org/r/20220131153740.2396974-1-willy@infradead.org Cc: stable@vger.kernel.org Fixes: a07279c9a8cd ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot") Fixes: 2aa362c49c31 ("coredump: extend core dump note section to contain file names of mapped files") Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 39fd0cc079c98dafcf355997ada7b5e67f0bb10a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Move the call of dump_vma_snapshot and kvfree(vma_meta) out of the
individual coredump routines into do_coredump itself. This makes
the code less error prone and easier to maintain.
Make the vma snapshot available to the coredump routines
in struct coredump_params. This makes it easier to
change and update what is captures in the vma snapshot
and will be needed for fixing fill_file_notes.
Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f6ca862806df3762170cd3251852330304e781c9) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Pass the non-aligned size to __iommu_dma_map when using swiotlb bounce
buffers in iommu_dma_map_page, to account for min_align_mask.
To deal with granule alignment, __iommu_dma_map maps iova_align(size +
iova_off) bytes starting at phys - iova_off. If iommu_dma_map_page
passes aligned size when using swiotlb, then this becomes
iova_align(iova_align(orig_size) + iova_off). Normally iova_off will be
zero when using swiotlb. However, this is not the case for devices that
set min_align_mask. When iova_off is non-zero, __iommu_dma_map ends up
mapping an extra page at the end of the buffer. Beyond just being a
security issue, the extra page is not cleaned up by __iommu_dma_unmap.
This causes problems when the IOVA is reused, due to collisions in the
iommu driver. Just passing the original size is sufficient, since
__iommu_dma_map will take care of granule alignment.
Add an argument to swiotlb_tbl_map_single that specifies the desired
alignment of the allocated buffer. This is used by dma-iommu to ensure
the buffer is aligned to the iova granule size when using swiotlb with
untrusted sub-granule mappings. This addresses an issue where adjacent
slots could be exposed to the untrusted device if IO_TLB_SIZE < iova
granule < PAGE_SIZE.
Signed-off-by: David Stevens <stevensd@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210929023300.335969-7-stevensd@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Mario Limonciello <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3e44e136560cb66dd5156889f811b870789b9499) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Introduce a new dev_use_swiotlb function to guard swiotlb code, instead
of overloading dev_is_untrusted. This allows CONFIG_SWIOTLB to be
checked more broadly, so the swiotlb related code can be removed more
aggressively.
Signed-off-by: David Stevens <stevensd@chromium.org> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210929023300.335969-6-stevensd@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Mario Limonciello <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 52d23f5f091509acda49b6b38edad96f175c06ff) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Fold the _swiotlb helper functions into the respective _page functions,
since recent fixes have moved all logic from the _page functions to the
_swiotlb functions.
Signed-off-by: David Stevens <stevensd@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20210929023300.335969-5-stevensd@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Mario Limonciello <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit bc05d84824c0d1deba7a36bc63785d611833c513) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Calling the iommu_dma_sync_*_for_cpu functions during unmap can cause
two copies out of the swiotlb buffer. Do the arch sync directly in
__iommu_dma_unmap_swiotlb instead to avoid this. This makes the call to
iommu_dma_sync_sg_for_cpu for untrusted devices in iommu_dma_unmap_sg no
longer necessary, so move that invocation later in the function.
Signed-off-by: David Stevens <stevensd@chromium.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Robin Murphy <robin.murphy@arm.com> Link: https://lore.kernel.org/r/20210929023300.335969-4-stevensd@google.com Signed-off-by: Joerg Roedel <jroedel@suse.de> Cc: Mario Limonciello <Mario.Limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c3841d020b82f6d92f8ac84ef2bb2f398c6eae7e) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
FNAME(cmpxchg_gpte) is an inefficient mess. It is at least decent if it
can go through get_user_pages_fast(), but if it cannot then it tries to
use memremap(); that is not just terribly slow, it is also wrong because
it assumes that the VM_PFNMAP VMA is contiguous.
The right way to do it would be to do the same thing as
hva_to_pfn_remapped() does since commit add6a0cd1c5b ("KVM: MMU: try to
fix up page faults before giving up", 2016-07-05), using follow_pte()
and fixup_user_fault() to determine the correct address to use for
memremap(). To do this, one could for example extract hva_to_pfn()
for use outside virt/kvm/kvm_main.c. But really there is no reason to
do that either, because there is already a perfectly valid address to
do the cmpxchg() on, only it is a userspace address. That means doing
user_access_begin()/user_access_end() and writing the code in assembly
to handle exceptions correctly. Worse, the guest PTE can be 8-byte
even on i686 so there is the extra complication of using cmpxchg8b to
account for. But at least it is an efficient mess.
(Thanks to Linus for suggesting improvement on the inline assembly).
Reported-by: Qiuhao Li <qiuhao@sysec.org> Reported-by: Gaoning Pan <pgn@zju.edu.cn> Reported-by: Yongkang Jia <kangel@zju.edu.cn> Reported-by: syzbot+6cde2282daa792c49ab8@syzkaller.appspotmail.com Debugged-by: Tadeusz Struk <tadeusz.struk@linaro.org> Tested-by: Maxim Levitsky <mlevitsk@redhat.com> Cc: stable@vger.kernel.org Fixes: bd53cb35a3e9 ("X86/KVM: Handle PFNs outside of kernel reach when touching GPTEs") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8771d9673e0bdb7148299f3c074667124bde6dff) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Since MMC core handles runtime PM reference counting, we can avoid doing
redundant runtime PM work in the driver. That means the only thing
commit 5b4258f6721f ("misc: rtsx: rts5249 support runtime PM") misses is
to always enable runtime PM, to let its parent driver enable ASPM in the
runtime idle routine.
drivers/block/n64cart.c: In function ‘n64cart_submit_bio’:
drivers/block/n64cart.c:91:26: error: ‘struct bio’ has no member named ‘bi_disk’
91 | struct device *dev = bio->bi_disk->private_data;
| ^~
CC drivers/slimbus/qcom-ctrl.o
CC drivers/auxdisplay/hd44780.o
CC drivers/watchdog/watchdog_core.o
CC drivers/nvme/host/fault_inject.o
AR drivers/accessibility/braille/built-in.a
make[2]: *** [scripts/Makefile.build:288: drivers/block/n64cart.o] Error 1
Fixes: 309dca309fc3 ("block: store a block_device pointer in struct bio"); Reported-by: k2ci <kernel-bot@kylinos.cn> Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220321071216.1549596-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a9bbdeef768faac1a00a17f98d40af218e29be71) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This commit fixes a couple of typos: s/--doall/--do-all/ and
s/--doallmodconfig/--do-allmodconfig/.
[ paulmck: Add Fixes: supplied by Paul Menzel. ]
Fixes: a115a775a8d5 ("torture: Add "make allmodconfig" to torture.sh") Reported-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2a710a5c59e9ff2214def7199a4f5e7fd48f0247) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This is a mix of a documentation fix with some additions to the
"panic_print" syscall / parameter. The goal here is being able to collect
all CPUs backtraces during a panic event and also to enable "panic_print"
in a kdump event - details of the reasoning and design choices in the
patches.
This patch (of 3):
Commit de6da1e8bcf0 ("panic: add an option to replay all the printk
message in buffer") added a new bit to the sysctl/kernel parameter
"panic_print", but the documentation was added only in
kernel-parameters.txt, not in the sysctl guide.
Fix it here by adding bit 5 to sysctl admin-guide documentation.
Moving to an EPOLL based IRQ controller broke uml_mconsole stop/go
commands. This fixes it and restores stop/go functionality.
Fixes: ff6a17989c08 ("Epoll based IRQ controller") Signed-off-by: Anton Ivanov <anton.ivanov@cambridgegreys.com> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 166abd13eab0e816a73c7d936be95d819fcfc6eb) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
this patch support tick_delay bit[31:30] without enhance_timing feature.
Fixes: f84d866ab43f("spi: mediatek: add tick_delay support") Signed-off-by: Leilk Liu <leilk.liu@mediatek.com> Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Link: https://lore.kernel.org/r/20220315032411.2826-2-leilk.liu@mediatek.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit dd8772224c19ef0851baa8bea4aafd50a2276fa5) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
According to subdevice interface specification found in V4L2 API
documentation, set format pad operations should not affect image
geometry set in preceding image processing steps. Unfortunately, that
requirement is not respected by the driver implementation of set format
as it was not the case when that code was still implementing a pair of
now obsolete .s_mbus_fmt() / .try_mbus_fmt() video operations before
they have been merged and reused as an implementation of .set_fmt() pad
operation by commit 717fd5b4907a ("[media] v4l2: replace try_mbus_fmt
by set_fmt").
Exclude non-compliant crop rectangle adjustments from set format try,
as well as a call to .set_selection() from set format active processing
path, so only frame scaling is applied as needed and crop rectangle is
no longer modified.
[Sakari Ailus: Rebase on subdev state patches]
Fixes: 717fd5b4907a ("[media] v4l2: replace try_mbus_fmt by set_fmt") Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2a6e0695ddd51d90de298951bd53d04f5aef03c4) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Try requests are now only supported by format processing pad operations
implemented by the driver. The driver selection API operations
currently respond to them with -EINVAL. While that is correct, it
constraints video device drivers to not use subdevice cropping at all
while processing user requested active frame size, otherwise their set
try format results might differ from active. As a consequence, we
can't fix set format pad operation as not to touch crop rectangle since
that would affect users not being able to set arbitrary frame sizes.
Moreover, without a working set try selection support we are not able
to use pad config crop rectangle as a reference while processing set
try format requests.
Implement missing try selection support. Moreover, as it will be now
possible to maintain the pad config crop rectangle via selection API,
start using it instead of the active one as a reference while
processing set try format requests.
is_unscaled_ok() helper, now also called from set selection operation,
has been just moved up in the source file to avoid a prototype, with no
functional changes.
[Sakari Ailus: Rebase on subdev state patches]
Fixes: 717fd5b4907a ("[media] v4l2: replace try_mbus_fmt by set_fmt") Signed-off-by: Janusz Krzysztofik <jmkrzyszt@gmail.com> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3995d4cf529c3369ad2901d8b9d0a8bcfc6cc840) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>