While commit 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving
mesh") fixed a memory leak on mesh leave / teardown it introduced a
potential memory corruption caused by a double free when rejoining the
mesh:
This double free / kernel panics can be reproduced by using wpa_supplicant
with an encrypted mesh (if set up without encryption via "iw" then
ifmsh->ie is always NULL, which avoids this issue). And then calling:
$ iw dev mesh0 mesh leave
$ iw dev mesh0 mesh join my-mesh
Note that typically these commands are not used / working when using
wpa_supplicant. And it seems that wpa_supplicant or wpa_cli are going
through a NETDEV_DOWN/NETDEV_UP cycle between a mesh leave and mesh join
where the NETDEV_UP resets the mesh.ie to NULL via a memcpy of
default_mesh_setup in cfg80211_netdev_notifier_call, which then avoids
the memory corruption, too.
The issue was first observed in an application which was not using
wpa_supplicant but "Senf" instead, which implements its own calls to
nl80211.
Fixing the issue by removing the kfree()'ing of the mesh IE in the mesh
join function and leaving it solely up to the mesh leave to free the
mesh IE.
Cc: stable@vger.kernel.org Fixes: 6a01afcf8468 ("mac80211: mesh: Free ie data when leaving mesh") Reported-by: Matthias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Signed-off-by: Linus Lüssing <ll@simonwunderlich.de> Tested-by: Mathias Kretschmer <mathias.kretschmer@fit.fraunhofer.de> Link: https://lore.kernel.org/r/20220310183513.28589-1-linus.luessing@c0d3.blue Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 12e407a8ef17623823fd0c066fbd7f103953d28d) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Three architectures check the end of a user access against the
address limit without taking a possible overflow into account.
Passing a negative length or another overflow in here returns
success when it should not.
Use the most common correct implementation here, which optimizes
for a constant 'size' argument, and turns the common case into a
single comparison.
Cc: stable@vger.kernel.org Fixes: da551281947c ("csky: User access") Fixes: f663b60f5215 ("microblaze: Fix uaccess_ok macro") Fixes: 7567746e1c0d ("Hexagon: Add user access functions") Reported-by: David Laight <David.Laight@aculab.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e65d28d4e9bf90a35ba79c06661a572a38391dec) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Currently rcu_preempt_deferred_qs_irqrestore() releases rnp->boost_mtx
before reporting the expedited quiescent state. Under heavy real-time
load, this can result in this function being preempted before the
quiescent state is reported, which can in turn prevent the expedited grace
period from completing. Tim Murray reports that the resulting expedited
grace periods can take hundreds of milliseconds and even more than one
second, when they should normally complete in less than a millisecond.
This was fine given that there were no particular response-time
constraints for synchronize_rcu_expedited(), as it was designed
for throughput rather than latency. However, some users now need
sub-100-millisecond response-time constratints.
This patch therefore follows Neeraj's suggestion (seconded by Tim and
by Uladzislau Rezki) of simply reversing the two operations.
Reported-by: Tim Murray <timmurray@google.com> Reported-by: Joel Fernandes <joelaf@google.com> Reported-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Neeraj Upadhyay <quic_neeraju@quicinc.com> Reviewed-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by: Tim Murray <timmurray@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Sandeep Patil <sspatil@google.com> Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 058d62a03e7d057d5eeec0db800117765ff23e6c) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If virtio_gpu_object_shmem_init() fails (e.g. due to fault injection, as it
happened in the bug report by syzbot), virtio_gpu_array_put_free() could be
called with objs equal to NULL.
Ensure that objs is not NULL in virtio_gpu_array_put_free(), or otherwise
return from the function.
Cc: stable@vger.kernel.org # 5.13.x Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com> Reported-by: syzbot+e9072e90624a31dfa85f@syzkaller.appspotmail.com Fixes: 377f8331d0565 ("drm/virtio: fix possible leak/unlock virtio_gpu_object_array") Link: http://patchwork.freedesktop.org/patch/msgid/20211213183122.838119-1-roberto.sassu@huawei.com Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b094fece3810c71ceee6f0921676cb65d4e68c5a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The implementations of aead and skcipher in the QAT driver do not
support properly requests with the CRYPTO_TFM_REQ_MAY_BACKLOG flag set.
If the HW queue is full, the driver returns -EBUSY but does not enqueue
the request.
This can result in applications like dm-crypt waiting indefinitely for a
completion of a request that was never submitted to the hardware.
To avoid this problem, disable the registration of all crypto algorithms
in the QAT driver by setting the number of crypto instances to 0 at
configuration time.
Cc: stable@vger.kernel.org Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cb807cb52a8e399dfe7d58a64549cbbbb176ba9d) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Clevo NL5xRU and NL5xNU/TUXEDO Aura 15 Gen1 and Gen2 have both a working
native and video interface. However the default detection mechanism first
registers the video interface before unregistering it again and switching
to the native interface during boot. This results in a dangling SBIOS
request for backlight change for some reason, causing the backlight to
switch to ~2% once per boot on the first power cord connect or disconnect
event. Setting the native interface explicitly circumvents this buggy
behaviour by avoiding the unregistering process.
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 57a2b3f8bf1c91f4a7daf40cc77a095d8fc14493) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
For some reason, the Microsoft Surface Go 3 uses the standard ACPI
interface for battery information, but does not use the standard PNP0C0A
HID. Instead it uses MSHW0146 as identifier. Add that ID to the driver
as this seems to work well.
Additionally, the power state is not updated immediately after the AC
has been (un-)plugged, so add the respective quirk for that.
Signed-off-by: Maximilian Luz <luzmaximilian@gmail.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a01ac24114899d46dafcb0dc0c204418f9d28ea4) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
On this board the ACPI RSDP structure points to both a RSDT and an XSDT,
but the XSDT points to a truncated FADT. This causes all sorts of trouble
and usually a complete failure to boot after the following error occurs:
This leaves the ACPI implementation in such a broken state that subsequent
kernel subsystem initialisations go wrong, resulting in among others
mismapped PCI memory, SATA and USB enumeration failures, and freezes.
As this is an older embedded platform that will likely never see any BIOS
updates to address this issue and its default shipping OS only complies to
ACPI 1.0, work around this by forcing `acpi=rsdt`. This patch, applied on
top of Linux 5.10.102, was confirmed on real hardware to fix the issue.
Signed-off-by: Mark Cilissen <mark@yotsuba.nl> Cc: All applicable <stable@vger.kernel.org> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8942aac690161dd3bd8c91044c6f867d9f10a364) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
All packets on ingress (except for jumbo) are terminated with a 4-bytes
CRC checksum. It's the responsability of the driver to strip those 4
bytes. Unfortunately a change dating back to March 2017 re-shuffled some
code and made the CRC stripping code effectively dead.
This change re-orders that part a bit such that the datalen is
immediately altered if needed.
Fixes: 4902a92270fb ("drivers: net: xgene: Add workaround for errata 10GE_8/ENET_11") Cc: stable@vger.kernel.org Signed-off-by: Stephane Graber <stgraber@ubuntu.com> Tested-by: Stephane Graber <stgraber@ubuntu.com> Link: https://lore.kernel.org/r/20220322224205.752795-1-stgraber@ubuntu.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3e27eafac6590d09164e4ad00a2e99b30e4ea37e) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Tests 72 and 78 for ALSA in kselftest fail due to reading
inconsistent values from some devices on a VirtualBox
Virtual Machine using the snd_intel8x0 driver for the AC'97
Audio Controller device.
Taking for example test number 72, this is what the test reports:
"Surround Playback Volume.0 expected 1 but read 0, is_volatile 0"
"Surround Playback Volume.1 expected 0 but read 1, is_volatile 0"
These errors repeat for each value from 0 to 31.
Taking a look at these error messages it is possible to notice
that the written values are read back swapped.
When the write is performed, these values are initially stored in
an array used to sanity-check them and write them in the pcmreg
array. To write them, the two one-byte values are packed together
in a two-byte variable through bitwise operations: the first
value is shifted left by one byte and the second value is stored in the
right byte through a bitwise OR. When reading the values back,
right shifts are performed to retrieve the previously stored
bytes. These shifts are executed in the wrong order, thus
reporting the values swapped as shown above.
This patch fixes this mistake by reversing the read
operations' order.
Signed-off-by: Giacomo Guiduzzi <guiduzzi.giacomo@gmail.com> Signed-off-by: Paolo Valente <paolo.valente@linaro.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220322200653.15862-1-guiduzzi.giacomo@gmail.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c2052ad0c74fd0a1c6def7dd5a9bbad520f52687) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
For the RODE NT-USB the lowest Playback mixer volume setting mutes the
audio output. But it is not reported as such causing e.g. PulseAudio to
accidentally mute the device when selecting a low volume.
Fix this by applying the existing quirk for this kind of issue when the
device is detected.
snd_pcm_reset() is a non-atomic operation, and it's allowed to run
during the PCM stream running. It implies that the manipulation of
hw_ptr and other parameters might be racy.
This patch adds the PCM stream lock at appropriate places in
snd_pcm_*_reset() actions for covering that.
We have no protection against concurrent PCM buffer preallocation
changes via proc files, and it may potentially lead to UAF or some
weird problem. This patch applies the PCM open_mutex to the proc
write operation for avoiding the racy proc writes and the PCM stream
open (and further operations).
Like the previous fixes to hw_params and hw_free ioctl races, we need
to paper over the concurrent prepare ioctl calls against hw_params and
hw_free, too.
This patch implements the locking with the existing
runtime->buffer_mutex for prepare ioctls. Unlike the previous case
for snd_pcm_hw_hw_params() and snd_pcm_hw_free(), snd_pcm_prepare() is
performed to the linked streams, hence the lock can't be applied
simply on the top. For tracking the lock in each linked substream, we
modify snd_pcm_action_group() slightly and apply the buffer_mutex for
the case stream_lock=false (formerly there was no lock applied)
there.
In the current PCM design, the read/write syscalls (as well as the
equivalent ioctls) are allowed before the PCM stream is running, that
is, at PCM PREPARED state. Meanwhile, we also allow to re-issue
hw_params and hw_free ioctl calls at the PREPARED state that may
change or free the buffers, too. The problem is that there is no
protection against those mix-ups.
This patch applies the previously introduced runtime->buffer_mutex to
the read/write operations so that the concurrent hw_params or hw_free
call can no longer interfere during the operation. The mutex is
unlocked before scheduling, so we don't take it too long.
Currently we have neither proper check nor protection against the
concurrent calls of PCM hw_params and hw_free ioctls, which may result
in a UAF. Since the existing PCM stream lock can't be used for
protecting the whole ioctl operations, we need a new mutex to protect
those racy calls.
This patch introduced a new mutex, runtime->buffer_mutex, and applies
it to both hw_params and hw_free ioctl code paths. Along with it, the
both functions are slightly modified (the mmap_count check is moved
into the state-check block) for code simplicity.
Reported-by: Hu Jiahui <kirin.say@gmail.com> Cc: <stable@vger.kernel.org> Reviewed-by: Jaroslav Kysela <perex@perex.cz> Link: https://lore.kernel.org/r/20220322170720.3529-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 33061d0fba51d2bf70a2ef9645f703c33fe8e438) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
On a HP 288 Pro G8, the front mic could not be detected.In order to
get it working, the pin configuration needs to be set correctly, and
the ALC671_FIXUP_HP_HEADSET_MIC2 fixup needs to be applied.
New device id for Corsair Virtuoso SE RGB Wireless that currently is not
in the mixer_map. This entry in the mixer_map is necessary in order to
label its mixer appropriately and allow userspace to pick the correct
volume controls. For instance, my own Corsair Virtuoso SE RGB Wireless
headset has this new ID and consequently, the sidetone and volume are not
working correctly without this change.
> sudo lsusb -v | grep -i corsair
Bus 007 Device 011: ID 1b1c:0a40 Corsair CORSAIR VIRTUOSO SE Wireless Gam
idVendor 0x1b1c Corsair
iManufacturer 1 Corsair
iProduct 2 CORSAIR VIRTUOSO SE Wireless Gaming Headset
We've got syzbot reports hitting INT_MAX overflow at vmalloc()
allocation that is called from snd_pcm_plug_alloc(). Although we
apply the restrictions to input parameters, it's based only on the
hw_params of the underlying PCM device. Since the PCM OSS layer
allocates a temporary buffer for the data conversion, the size may
become unexpectedly large when more channels or higher rates is given;
in the reported case, it went over INT_MAX, hence it hits WARN_ON().
This patch is an attempt to avoid such an overflow and an allocation
for too large buffers. First off, it adds the limit of 1MB as the
upper bound for period bytes. This must be large enough for all use
cases, and we really don't want to handle a larger temporary buffer
than this size. The size check is performed at two places, where the
original period bytes is calculated and where the plugin buffer size
is calculated.
In addition, the driver uses array_size() and array3_size() for
multiplications to catch overflows for the converted period size and
buffer bytes.
This is essentially a revert of the commit dc865fb9e7c2 ("ASoC: sti:
Use snd_pcm_stop_xrun() helper"), which converted the manual
snd_pcm_stop() calls with snd_pcm_stop_xrun().
The commit above introduced a deadlock as snd_pcm_stop_xrun() itself
takes the PCM stream lock while the caller already holds it. Since
the conversion was done only for consistency reason and the open-call
with snd_pcm_stop() to the XRUN state is a correct usage, let's revert
the commit back as the fix.
Whenever llc_ui_bind() and/or llc_ui_autobind()
took a reference on a netdevice but subsequently fail,
they must properly release their reference
or risk the infamous message from unregister_netdevice()
at device dismantle.
unregister_netdevice: waiting for eth0 to become free. Usage count = 3
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: 赵子轩 <beraphin@gmail.com> Reported-by: Stoyan Manolov <smanolov@suse.de> Link: https://lore.kernel.org/r/20220323004147.1990845-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e9072996108387ab19b497f5b557c93f98d96b0b) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When an invalid (non existing) handle is used in a TPM command,
that uses the resource manager interface (/dev/tpmrm0) the resource
manager tries to load it from its internal cache, but fails and
the tpm_dev_transmit returns an -EINVAL error to the caller.
The existing async handler doesn't handle these error cases
currently and the condition in the poll handler never returns
mask with EPOLLIN set.
The result is that the poll call blocks and the application gets stuck
until the user_read_timer wakes it up after 120 sec.
Change the tpm_dev_async_work function to handle error conditions
returned from tpm_dev_transmit they are also reflected in the poll mask
and a correct error code could passed back to the caller.
Cc: Jarkko Sakkinen <jarkko@kernel.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: <linux-integrity@vger.kernel.org> Cc: <stable@vger.kernel.org> Cc: <linux-kernel@vger.kernel.org> Fixes: 9e1b74a63f77 ("tpm: add support for nonblocking operation") Tested-by: Jarkko Sakkinen<jarkko@kernel.org> Signed-off-by: Tadeusz Struk <tstruk@gmail.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org> Cc: Tadeusz Struk <tadeusz.struk@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 42b9f6d19faa86ae74a35574b9db71cebae5cf10) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
After the recent changes made by commit c2e39305299f01 ("btrfs: clear
extent buffer uptodate when we fail to write it") and its followup fix,
commit 651740a5024117 ("btrfs: check WRITE_ERR when trying to read an
extent buffer"), we can now end up not cleaning up space reservations of
log tree extent buffers after a transaction abort happens, as well as not
cleaning up still dirty extent buffers.
This happens because if writeback for a log tree extent buffer failed,
then we have cleared the bit EXTENT_BUFFER_UPTODATE from the extent buffer
and we have also set the bit EXTENT_BUFFER_WRITE_ERR on it. Later on,
when trying to free the log tree with free_log_tree(), which iterates
over the tree, we can end up getting an -EIO error when trying to read
a node or a leaf, since read_extent_buffer_pages() returns -EIO if an
extent buffer does not have EXTENT_BUFFER_UPTODATE set and has the
EXTENT_BUFFER_WRITE_ERR bit set. Getting that -EIO means that we return
immediately as we can not iterate over the entire tree.
In that case we never update the reserved space for an extent buffer in
the respective block group and space_info object.
When this happens we get the following traces when unmounting the fs:
This also means that in case we have log tree extent buffers that are
still dirty, we can end up not cleaning them up in case we find an
extent buffer with EXTENT_BUFFER_WRITE_ERR set on it, as in that case
we have no way for iterating over the rest of the tree.
This issue is very often triggered with test cases generic/475 and
generic/648 from fstests.
The issue could almost be fixed by iterating over the io tree attached to
each log root which keeps tracks of the range of allocated extent buffers,
log_root->dirty_log_pages, however that does not work and has some
inconveniences:
1) After we sync the log, we clear the range of the extent buffers from
the io tree, so we can't find them after writeback. We could keep the
ranges in the io tree, with a separate bit to signal they represent
extent buffers already written, but that means we need to hold into
more memory until the transaction commits.
How much more memory is used depends a lot on whether we are able to
allocate contiguous extent buffers on disk (and how often) for a log
tree - if we are able to, then a single extent state record can
represent multiple extent buffers, otherwise we need multiple extent
state record structures to track each extent buffer.
In fact, my earlier approach did that:
However that can cause a very significant negative impact on
performance, not only due to the extra memory usage but also because
we get a larger and deeper dirty_log_pages io tree.
We got a report that, on beefy machines at least, we can get such
performance drop with fsmark for example:
2) We would be doing it only to deal with an unexpected and exceptional
case, which is basically failure to read an extent buffer from disk
due to IO failures. On a healthy system we don't expect transaction
aborts to happen after all;
3) Instead of relying on iterating the log tree or tracking the ranges
of extent buffers in the dirty_log_pages io tree, using the radix
tree that tracks extent buffers (fs_info->buffer_radix) to find all
log tree extent buffers is not reliable either, because after writeback
of an extent buffer it can be evicted from memory by the release page
callback of the btree inode (btree_releasepage()).
Since there's no way to be able to properly cleanup a log tree without
being able to read its extent buffers from disk and without using more
memory to track the logical ranges of the allocated extent buffers do
the following:
1) When we fail to cleanup a log tree, setup a flag that indicates that
failure;
2) Trigger writeback of all log tree extent buffers that are still dirty,
and wait for the writeback to complete. This is just to cleanup their
state, page states, page leaks, etc;
3) When unmounting the fs, ignore if the number of bytes reserved in a
block group and in a space_info is not 0 if, and only if, we failed to
cleanup a log tree. Also ignore only for metadata block groups and the
metadata space_info object.
This is far from a perfect solution, but it serves to silence test
failures such as those from generic/475 and generic/648. However having
a non-zero value for the reserved bytes counters on unmount after a
transaction abort, is not such a terrible thing and it's completely
harmless, it does not affect the filesystem integrity in any way.
Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4c5d94990fa2fd609360ecd0f7e183212a7d115c) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Before this patch, the symbol end address fixup to be called, needed two
conditions being met:
if (prev->end == prev->start && prev->end != curr->start)
Where
"prev->end == prev->start" means that prev is zero-long
(and thus needs a fixup)
and
"prev->end != curr->start" means that fixup hasn't been applied yet
However, this logic is incorrect in the following situation:
In this case, prev->start == prev->end == curr->start == curr->end,
thus the condition above thinks that "we need a fixup due to zero
length of prev symbol, but it has been probably done, since the
prev->end == curr->start", which is wrong.
After the patch, the execution path proceeds to arch__symbols__fixup_end
function which fixes up the size of prev symbol by adding page_size to
its end offset.
Fixes: 3b01a413c196c910 ("perf symbols: Improve kallsyms symbol end addr calculation") Signed-off-by: Michael Petlan <mpetlan@redhat.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Link: http://lore.kernel.org/lkml/20220317135536.805-1-mpetlan@redhat.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 23775775b9a66b2ea1bb4873282998351f928c3e) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Syzbot reported warning in usb_submit_urb() which is caused by wrong
endpoint type. There was a check for the number of endpoints, but not
for the type of endpoint.
Fix it by replacing old desc.bNumEndpoints check with
usb_find_common_endpoints() helper for finding endpoints
Fail log:
usb 5-1: BOGUS urb xfer, pipe 1 != type 3
WARNING: CPU: 2 PID: 48 at drivers/usb/core/urb.c:502 usb_submit_urb+0xed2/0x18a0 drivers/usb/core/urb.c:502
Modules linked in:
CPU: 2 PID: 48 Comm: kworker/2:2 Not tainted 5.17.0-rc6-syzkaller-00226-g07ebd38a0da2 #0
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Workqueue: usb_hub_wq hub_event
...
Call Trace:
<TASK>
aiptek_open+0xd5/0x130 drivers/input/tablet/aiptek.c:830
input_open_device+0x1bb/0x320 drivers/input/input.c:629
kbd_connect+0xfe/0x160 drivers/tty/vt/keyboard.c:1593
Fixes: 8e20cf2bce12 ("Input: aiptek - fix crash on detecting device without endpoints") Reported-and-tested-by: syzbot+75cccf2b7da87fb6f84b@syzkaller.appspotmail.com Signed-off-by: Pavel Skripkin <paskripkin@gmail.com> Link: https://lore.kernel.org/r/20220308194328.26220-1-paskripkin@gmail.com Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e762f57ff255af28236cd02ca9fc5c7e5a089d31) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This is caused by mpt3sas_base_sync_reply_irqs() using an invalid reply_q
pointer outside of the list_for_each_entry() loop. At the end of the full
list traversal the pointer is invalid.
Move the _base_process_reply_queue() call inside of the loop.
Link: https://lore.kernel.org/r/d625deae-a958-0ace-2ba3-0888dd0a415b@ddn.com Fixes: 711a923c14d9 ("scsi: mpt3sas: Postprocessing of target and LUN reset") Cc: stable@vger.kernel.org Acked-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com> Signed-off-by: Matt Lupfer <mlupfer@ddn.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0cd2dd4bcf4abc812148c4943f966a3c8dccb00f) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Although the bug manifested in the driver core, the real cause was a
race with the gadget core. dev_uevent() does:
if (dev->driver)
add_uevent_var(env, "DRIVER=%s", dev->driver->name);
and between the test and the dereference of dev->driver, the gadget
core sets dev->driver to NULL.
The race wouldn't occur if the gadget core registered its devices on
a real bus, using the standard synchronization techniques of the
driver core. However, it's not necessary to make such a large change
in order to fix this bug; all we need to do is make sure that
udc->dev.driver is always NULL.
In fact, there is no reason for udc->dev.driver ever to be set to
anything, let alone to the value it currently gets: the address of the
gadget's driver. After all, a gadget driver only knows how to manage
a gadget, not how to manage a UDC.
This patch simply removes the statements in the gadget core that touch
udc->dev.driver.
Fixes: 2ccea03a8f7e ("usb: gadget: introduce UDC Class") CC: <stable@vger.kernel.org> Reported-and-tested-by: syzbot+348b571beb5eeb70a582@syzkaller.appspotmail.com Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Link: https://lore.kernel.org/r/YiQgukfFFbBnwJ/9@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2015c23610cd0efadaeca4d3a8d1dae9a45aa35a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The newly introduced TRAMP_VALIAS definition causes a build warning
with clang-14:
arch/arm64/include/asm/vectors.h:66:31: error: arithmetic on a null pointer treated as a cast from integer to pointer is a GNU extension [-Werror,-Wnull-pointer-arithmetic]
return (char *)TRAMP_VALIAS + SZ_2K * slot;
Change the addition to something clang does not complain about.
Fixes: bd09128d16fa ("arm64: Add percpu vectors for EL1") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: James Morse <james.morse@arm.com> Link: https://lore.kernel.org/r/20220316183833.1563139-1-arnd@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 222f5e2d7f20a8852a4f79efb353b9f8adadfa1c) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Recent commit 974578017fc1 ("iavf: Add waiting so the port is
initialized in remove") adds a wait-loop at the beginning of
iavf_remove() to ensure that port initialization is finished
prior unregistering net device. This causes a regression
in reboot/shutdown scenario because in this case callback
iavf_shutdown() is called and this callback detaches the device,
makes it down if it is running and sets its state to __IAVF_REMOVE.
Later shutdown callback of associated PF driver (e.g. ice_shutdown)
is called. That callback calls among other things sriov_disable()
that calls indirectly iavf_remove() (see stack trace below).
As the adapter state is already __IAVF_REMOVE then the mentioned
loop is end-less and shutdown process hangs.
The patch fixes this by checking adapter's state at the beginning
of iavf_remove() and skips the rest of the function if the adapter
is already in remove state (shutdown is in progress).
Reproducer:
1. Create VF on PF driven by ice or i40e driver
2. Ensure that the VF is bound to iavf driver
3. Reboot
ACL rules can be offloaded to VCAP IS2 either through chain 0, or, since
the blamed commit, through a chain index whose number encodes a specific
PAG (Policy Action Group) and lookup number.
The chain number is translated through ocelot_chain_to_pag() into a PAG,
and through ocelot_chain_to_lookup() into a lookup number.
The problem with the blamed commit is that the above 2 functions don't
have special treatment for chain 0. So ocelot_chain_to_pag(0) returns
filter->pag = 224, which is in fact -32, but the "pag" field is an u8.
So we end up programming the hardware with VCAP IS2 entries having a PAG
of 224. But the way in which the PAG works is that it defines a subset
of VCAP IS2 filters which should match on a packet. The default PAG is
0, and previous VCAP IS1 rules (which we offload using 'goto') can
modify it. So basically, we are installing filters with a PAG on which
no packet will ever match. This is the hardware equivalent of adding
filters to a chain which has no 'goto' to it.
Restore the previous functionality by making ACL filters offloaded to
chain 0 go to PAG 0 and lookup number 0. The choice of PAG is clearly
correct, but the choice of lookup number isn't "as before" (which was to
leave the lookup a "don't care"). However, lookup 0 should be fine,
since even though there are ACL actions (policers) which have a
requirement to be used in a specific lookup, that lookup is 0.
Fixes: 226e9cd82a96 ("net: mscc: ocelot: only install TCAM entries into a specific lookup and PAG") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Link: https://lore.kernel.org/r/20220316192117.2568261-1-vladimir.oltean@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 800a17adb531d417b8d219b25e37b4de4c3a6a8c) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The RXCHK block will return a partial checksum of 0 if it encounters
a problem while receiving a packet. Since a 1's complement sum can
only produce this result if no bits are set in the received data
stream it is fair to treat it as an invalid partial checksum and
not pass it up the stack.
Fixes: 810155397890 ("net: bcmgenet: use CHECKSUM_COMPLETE for NETIF_F_RXCSUM") Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Link: https://lore.kernel.org/r/20220317012812.1313196-1-opendmb@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 2d7cff7e1fee5a00115f5afcb185e8f8c765d407) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Commit b7a49f73059f ("bnx2x: Utilize firmware 7.13.21.0")
added request_firmware() logic in probe() which caused
load failure when firmware file is not present in initrd (below),
as access to firmware file is not feasible during probe.
Direct firmware load for bnx2x/bnx2x-e2-7.13.15.0.fw failed with error -2
Direct firmware load for bnx2x/bnx2x-e2-7.13.21.0.fw failed with error -2
This patch fixes this issue by -
1. Removing request_firmware() logic from the probe()
such that .ndo_open() handle it as it used to handle
it earlier
2. Given request_firmware() is removed from probe(), so
driver has to relax FW version comparisons a bit against
the already loaded FW version (by some other PFs of same
adapter) to allow different compatible/close enough FWs with which
multiple PFs may run with (in different environments), as the
given PF who is in probe flow has no idea now with which firmware
file version it is going to initialize the device in ndo_open()
Fix a number of undefined references to drm_kms_helper.ko in
drm_dp_helper.ko:
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_duplicate_state':
drm_dp_mst_topology.c:(.text+0x2df0): undefined reference to `__drm_atomic_helper_private_obj_duplicate_state'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_delayed_destroy_work':
drm_dp_mst_topology.c:(.text+0x370c): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_up_req_work':
drm_dp_mst_topology.c:(.text+0x7938): undefined reference to `drm_kms_helper_hotplug_event'
arm-suse-linux-gnueabi-ld: drivers/gpu/drm/dp/drm_dp_mst_topology.o: in function `drm_dp_mst_link_probe_work':
drm_dp_mst_topology.c:(.text+0x82e0): undefined reference to `drm_kms_helper_hotplug_event'
This happens if panel-edp.ko has been configured with
DRM_PANEL_EDP=y
DRM_DP_HELPER=y
DRM_KMS_HELPER=m
which builds DP helpers into the kernel and KMS helpers sa a module.
Making DRM_PANEL_EDP select DRM_KMS_HELPER resolves this problem.
To avoid a resulting cyclic dependency with DRM_PANEL_BRIDGE, don't
make the latter depend on DRM_KMS_HELPER and fix the one DRM bridge
drivers that doesn't already select DRM_KMS_HELPER. As KMS helpers
cannot be selected directly by the user, config symbols should avoid
depending on it anyway.
Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 3755d35ee1d2 ("drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP") Acked-by: Sam Ravnborg <sam@ravnborg.org> Tested-by: Brian Masney <bmasney@redhat.com> Reported-by: kernel test robot <lkp@intel.com> Cc: Thomas Zimmermann <tzimmermann@suse.de> Cc: Naresh Kamboju <naresh.kamboju@linaro.org> Cc: Linux Kernel Functional Testing <lkft@linaro.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Maxime Ripard <mripard@kernel.org> Cc: dri-devel@lists.freedesktop.org Cc: Dave Airlie <airlied@redhat.com> Cc: Thierry Reding <thierry.reding@gmail.com> Link: https://patchwork.freedesktop.org/patch/478296/ Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 11dab4a800a4ffcceb805fd6bce33062669a3b5d) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If display timings were read from the devicetree using
of_get_display_timing() and pixelclk-active is defined
there, the flag DISPLAY_FLAGS_SYNC_POSEDGE/NEGEDGE is
automatically generated. Through the function
drm_bus_flags_from_videomode() e.g. called in the
panel-simple driver this flag got into the bus flags,
but then in imx_pd_bridge_atomic_check() the bus flag
check failed and will not initialize the display. The
original commit fe141cedc433 does not explain why this
check was introduced. So remove the bus flags check,
because it stops the initialization of the display with
valid bus flags.
Fixes: fe141cedc433 ("drm/imx: pd: Use bus format/flags provided by the bridge when available") Signed-off-by: Christoph Niedermaier <cniedermaier@dh-electronics.com> Cc: Marek Vasut <marex@denx.de> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: David Airlie <airlied@linux.ie> Cc: Daniel Vetter <daniel@ffwll.ch> Cc: Shawn Guo <shawnguo@kernel.org> Cc: Sascha Hauer <s.hauer@pengutronix.de> Cc: Pengutronix Kernel Team <kernel@pengutronix.de> Cc: Fabio Estevam <festevam@gmail.com> Cc: NXP Linux Team <linux-imx@nxp.com> Cc: linux-arm-kernel@lists.infradead.org
To: dri-devel@lists.freedesktop.org Tested-by: Max Krummenacher <max.krummenacher@toradex.com> Acked-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Marek Vasut <marex@denx.de> Link: https://patchwork.freedesktop.org/patch/msgid/20220201113643.4638-1-cniedermaier@dh-electronics.com Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 43dcd410809e180cabecc6de88d3ba07f1ab45f3) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
As the potential failure of the kvmalloc_array(),
it should be better to check and restore the 'data'
if fails in order to avoid the dereference of the
NULL pointer.
Fix double free possibility in iavf_disable_vf, as crit_lock is
freed in caller, iavf_reset_task. Add kernel-doc for iavf_disable_vf.
Remove mutex_unlock in iavf_disable_vf.
Without this patch there is double free scenario, when calling
iavf_reset_task.
Fixes: e85ff9c631e1 ("iavf: Fix deadlock in iavf_reset_task") Signed-off-by: Przemyslaw Patynowski <przemyslawx.patynowski@intel.com> Suggested-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Mateusz Palczewski <mateusz.palczewski@intel.com> Tested-by: Konrad Jankowski <konrad0.jankowski@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 735f918cc2c8510583af8f0e4ded48ba6f19c9a4) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
As the potential failure of the dma_map_single(),
it should be better to check it and return error
if fails.
Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b2bc45327e8cbb1adefcf5cbd53c67868428982a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
syzbot found that when an AF_PACKET socket is using PACKET_COPY_THRESH
and mmap operations, tpacket_rcv() is queueing skbs with
garbage in skb->cb[], triggering a too big copy [1]
Presumably, users of af_packet using mmap() already gets correct
metadata from the mapped buffer, we can simply make sure
to clear 12 bytes that might be copied to user space later.
BUG: KASAN: stack-out-of-bounds in memcpy include/linux/fortify-string.h:225 [inline]
BUG: KASAN: stack-out-of-bounds in packet_recvmsg+0x56c/0x1150 net/packet/af_packet.c:3489
Write of size 165 at addr ffffc9000385fb78 by task syz-executor233/3631
This bug resulted in only the current mode being resumed and suspended when
the PHY supported both fiber and copper modes and when the PHY only supported
copper mode the fiber mode would incorrectly be attempted to be resumed and
suspended.
Fixes: 3758be3dc162 ("Marvell phy: add functions to suspend and resume both interfaces: fiber and copper links.") Signed-off-by: Kurt Cancemi <kurt@x64architecture.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20220312201512.326047-1-kurt@x64architecture.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit a296f3ae8009fe1ee7384279d621d3bbca43580b) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Commit 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr")
introduced an incorrect check, which leads to all ESP packets over
either TCPv6 or UDPv6 encapsulation being dropped. In this particular
case, offset is negative, since skb->data points to the ESP header in
the following chain of headers, while skb->network_header points to
the IPv6 header:
IPv6 | ext | ... | ext | UDP | ESP | ...
That doesn't seem to be a problem, especially considering that if we
reach esp6_input_done2, we're guaranteed to have a full set of headers
available (otherwise the packet would have been dropped earlier in the
stack). However, it means that the return value will (intentionally)
be negative. We can make the test more specific, as the expected
return value of ipv6_skip_exthdr will be the (negated) size of either
a UDP header, or a TCP header with possible options.
In the future, we should probably either make ipv6_skip_exthdr
explicitly accept negative offsets (and adjust its return value for
error cases), or make ipv6_skip_exthdr only take non-negative
offsets (and audit all callers).
Fixes: 5f9c55c8066b ("ipv6: check return value of ipv6_skip_exthdr") Reported-by: Xiumei Mu <xmu@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit b9820bf09f599c907e32af09674e4bb1296711eb) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When iterating over sockets using vsock_for_each_connected_socket, make
sure that a transport filters out sockets that don't belong to the
transport.
There actually was an issue caused by this; in a nested VM
configuration, destroying the nested VM (which often involves the
closing of /dev/vhost-vsock if there was h2g connections to the nested
VM) kills not only the h2g connections, but also all existing g2h
connections to the (outmost) host which are totally unrelated.
Tested: Executed the following steps on Cuttlefish (Android running on a
VM) [1]: (1) Enter into an `adb shell` session - to have a g2h
connection inside the VM, (2) open and then close /dev/vhost-vsock by
`exec 3< /dev/vhost-vsock && exec 3<&-`, (3) observe that the adb
session is not reset.
alx_reinit has a lockdep assertion that the alx->mtx mutex must be held.
alx_reinit is called from two places: alx_reset and alx_change_mtu.
alx_reset does acquire alx->mtx before calling alx_reinit.
alx_change_mtu does not acquire this mutex, nor do its callers or any
path towards alx_change_mtu.
Acquire the mutex in alx_change_mtu.
The issue was introduced when the fine-grained locking was introduced
to the code to replace the RTNL. The same commit also introduced the
lockdep assertion.
When "dump_apple_properties" is used on the kernel boot command line,
it causes an Unknown parameter message and the string is added to init's
argument strings:
Unknown kernel command line parameters "dump_apple_properties
BOOT_IMAGE=/boot/bzImage-517rc6 efivar_ssdt=newcpu_ssdt", will be
passed to user space.
Run /sbin/init as init process
with arguments:
/sbin/init
dump_apple_properties
with environment:
HOME=/
TERM=linux
BOOT_IMAGE=/boot/bzImage-517rc6
efivar_ssdt=newcpu_ssdt
Similarly when "efivar_ssdt=somestring" is used, it is added to the
Unknown parameter message and to init's environment strings, polluting
them (see examples above).
Change the return value of the __setup functions to 1 to indicate
that the __setup options have been handled.
Fixes: 58c5475aba67 ("x86/efi: Retrieve and assign Apple device properties") Fixes: 475fb4e8b2f4 ("efi / ACPI: load SSTDs from EFI variables") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reported-by: Igor Zhbanov <i.zhbanov@omprussia.ru>
Link: lore.kernel.org/r/64644a2f-4a20-bab3-1e15-3b2cdd0defe3@omprussia.ru Cc: Ard Biesheuvel <ardb@kernel.org> Cc: linux-efi@vger.kernel.org Cc: Lukas Wunner <lukas@wunner.de> Cc: Octavian Purdila <octavian.purdila@intel.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Matt Fleming <matt@codeblueprint.co.uk> Link: https://lore.kernel.org/r/20220301041851.12459-1-rdunlap@infradead.org Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
(cherry picked from commit 4b49ba22a25383ed6c7ce9e2c1ba12246dd618fc) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
commit f86c3ed55920 ("drm/mgag200: Split PLL setup into compute and
update functions") introduced a regression for g200wb and g200ew.
The PLLs are not set up properly, and VGA screen stays
black, or displays "out of range" message.
MGA1064_WB_PIX_PLLC_N/M/P was mistakenly replaced with
MGA1064_PIX_PLLC_N/M/P which have different addresses.
blkcg_init_queue() may add rq qos structures to request queue, previously
blk_cleanup_queue() calls rq_qos_exit() to release them, but commit 8e141f9eb803 ("block: drain file system I/O on del_gendisk")
moves rq_qos_exit() into del_gendisk(), so memory leak is caused
because queues may not have disk, such as un-present scsi luns, nvme
admin queue, ...
Fixes the issue by adding rq_qos_exit() to blk_cleanup_queue() back.
BTW, v5.18 won't need this patch any more since we move
blkcg_init_queue()/blkcg_exit_queue() into disk allocation/release
handler, and patches have been in for-5.18/block.
Cc: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org Fixes: 8e141f9eb803 ("block: drain file system I/O on del_gendisk") Reported-by: syzbot+b42749a851a47a0f581b@syzkaller.appspotmail.com Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20220314043018.177141-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d4ad8736ac982111bb0be8306bf19c8207f6600e) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The reason for the livelock is that swapcache_prepare() always returns
EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so that it
cannot jump out of the loop. We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run. We try to lower the
priority of the task stuck in a livelock so that the task that clears
the SWAP_HAS_CACHE flag will run. The results show that the system
returns to normal after the priority is lowered.
In our testing, multiple real-time tasks are bound to the same core, and
the task in the livelock is the highest priority task of the core, so
the livelocked task cannot be preempted.
Although cond_resched() is used by __read_swap_cache_async, it is an
empty function in the preemptive system and cannot achieve the purpose
of releasing the CPU. A high-priority task cannot release the CPU
unless preempted by a higher-priority task. But when this task is
already the highest priority task on this core, other tasks will not be
able to be scheduled. So we think we should replace cond_resched() with
schedule_timeout_uninterruptible(1), schedule_timeout_interruptible will
call set_current_state first to set the task state, so the task will be
removed from the running queue, so as to achieve the purpose of giving
up the CPU and prevent it from running in kernel mode for too long.
(akpm: ugly hack becomes uglier. But it fixes the issue in a
backportable-to-stable fashion while we hopefully work on something
better)
Link: https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn> Reported-by: Zeal Robot <zealci@zte.com.cn> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn> Acked-by: Hugh Dickins <hughd@google.com> Cc: Naoya Horiguchi <naoya.horiguchi@nec.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Roger Quadros <rogerq@kernel.org> Cc: Ziliang Guo <guo.ziliang@zte.com.cn> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6829aa17ca55e6e61c9420aedf4a38e12c4e0320) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Once s_root is set, genric_shutdown_super() will be called if
fill_super() fails. That means, we will call ocfs2_dismount_volume()
twice in such case, which can lead to kernel crash.
Fix this issue by initializing filecheck kobj before setting s_root.
Link: https://lkml.kernel.org/r/20220310081930.86305-1-joseph.qi@linux.alibaba.com Fixes: 5f483c4abb50 ("ocfs2: add kobject for online file check") Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b786b64dcb312922f9d671b8bd75d3dec5ed9a53) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The generate function in struct rng_alg expects that the destination
buffer is completely filled if the function returns 0. qcom_rng_read()
can run into a situation where the buffer is partially filled with
randomness and the remaining part of the buffer is zeroed since
qcom_rng_generate() doesn't check the return value. This issue can
be reproduced by running the following from libkcapi:
The generated OUTFILE will have three huge sections that contain all
zeros, and this is caused by the code where the test
'val & PRNG_STATUS_DATA_AVAIL' fails.
Let's fix this issue by ensuring that qcom_rng_read() always returns
with a full buffer if the function returns success. Let's also have
qcom_rng_generate() return the correct value.
Here's some statistics from the ent project
(https://www.fourmilab.ch/random/) that shows information about the
quality of the generated numbers:
Optimum compression would reduce the size
of this 9000000 byte file by 2 percent.
Chi square distribution for 9000000 samples is 9329962.81, and
randomly would exceed this value less than 0.01 percent of the
times.
Arithmetic mean value of data bytes is 119.3731 (127.5 = random).
Monte Carlo value for Pi is 3.197293333 (error 1.77 percent).
Serial correlation coefficient is 0.159130 (totally uncorrelated =
0.0).
Without this patch, the results of the chi-square test is 0.01%, and
the numbers are certainly not random according to ent's project page.
The results improve with this patch:
Optimum compression would reduce the size
of this 9000000 byte file by 0 percent.
Chi square distribution for 9000000 samples is 258.77, and randomly
would exceed this value 42.24 percent of the times.
Arithmetic mean value of data bytes is 127.5006 (127.5 = random).
Monte Carlo value for Pi is 3.141277333 (error 0.01 percent).
Serial correlation coefficient is 0.000468 (totally uncorrelated =
0.0).
This change was tested on a Nexus 5 phone (msm8974 SoC).
Signed-off-by: Brian Masney <bmasney@redhat.com> Fixes: ceec5f5b5988 ("crypto: qcom-rng - Add Qcom prng driver") Cc: stable@vger.kernel.org # 4.19+ Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org> Reviewed-by: Andrew Halaney <ahalaney@redhat.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ab9337c7cb6f875b6286440b1adfbeeef2b2b2bd) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Eric Dumazet [Wed, 18 May 2022 04:39:44 +0000 (01:39 -0300)]
net/sched: cls_u32: fix netns refcount changes in u32_change()
We are now able to detect extra put_net() at the moment
they happen, instead of much later in correct code paths.
u32_init_knode() / tcf_exts_init() populates the ->exts.net
pointer, but as mentioned in tcf_exts_init(),
the refcount on netns has not been elevated yet.
The refcount is taken only once tcf_exts_get_net()
is called.
So the two u32_destroy_key() calls from u32_change()
are attempting to release an invalid reference on the netns.
Fixes: 35c55fc156d8 ("cls_u32: use tcf_exts_get_net() before call_rcu()") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 3db09e762dc79584a69c10d74a6b98f89a9979f8)
CVE-2022-29581 Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Acked-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Andrea Righi <andrea.righi@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Tadeusz Struk [Mon, 9 May 2022 14:00:54 +0000 (16:00 +0200)]
ext4: limit length to bitmap_maxbytes - blocksize in punch_hole
BugLink: https://bugs.launchpad.net/bugs/1972281
Syzbot found an issue [1] in ext4_fallocate().
The C reproducer [2] calls fallocate(), passing size 0xffeffeff000ul,
and offset 0x1000000ul, which, when added together exceed the
bitmap_maxbytes for the inode. This triggers a BUG in
ext4_ind_remove_space(). According to the comments in this function
the 'end' parameter needs to be one block after the last block to be
removed. In the case when the BUG is triggered it points to the last
block. Modify the ext4_punch_hole() function and add constraint that
caps the length to satisfy the one before laster block requirement.
Setting PTRACE_O_SUSPEND_SECCOMP is supposed to be a highly privileged
operation because it allows the tracee to completely bypass all seccomp
filters on kernels with CONFIG_CHECKPOINT_RESTORE=y. It is only supposed to
be settable by a process with global CAP_SYS_ADMIN, and only if that
process is not subject to any seccomp filters at all.
However, while these permission checks were done on the PTRACE_SETOPTIONS
path, they were missing on the PTRACE_SEIZE path, which also sets
user-specified ptrace flags.
Move the permissions checks out into a helper function and let both
ptrace_attach() and ptrace_setoptions() call it.
Cc: stable@kernel.org Fixes: 13c4a90119d2 ("seccomp: add ptrace options for suspend/resume") Signed-off-by: Jann Horn <jannh@google.com> Link: https://lkml.kernel.org/r/20220319010838.1386861-1-jannh@google.com Signed-off-by: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b6d75218ff65f4d63c9cf4986f6c55666fb90a1a) Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
CONFIG_STRICT_SIGALTSTACK_SIZE is intend for enforcing strict checking
of the sigaltstack size against the *real size of the FPU frame*,
enabling it is risky since it may lead to the broken of legacy
applications which already allocate a too small sigaltstack but can
still work because they never get a signal delivered. (lin-x-wang)
Fixes: cf1383fe60 ("x86/signal: Implement sigaltstack size validation") Signed-off-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Chang S. Bae [Sat, 29 Jan 2022 17:36:47 +0000 (09:36 -0800)]
selftests/x86/amx: Update the ARCH_REQ_XCOMP_PERM test
BugLink: https://bugs.launchpad.net/bugs/1967750
Update the arch_prctl test to check the permission bitmap whether the
requested feature is added as expected or not.
Every non-dynamic feature that is enabled is permitted already for use.
TILECFG is not dynamic feature. Ensure the bit is always on from
ARCH_GET_XCOMP_PERM.
Yang Zhong [Sat, 29 Jan 2022 17:36:46 +0000 (09:36 -0800)]
x86/fpu/xstate: Fix the ARCH_REQ_XCOMP_PERM implementation
BugLink: https://bugs.launchpad.net/bugs/1967750
ARCH_REQ_XCOMP_PERM is supposed to add the requested feature to the
permission bitmap of thread_group_leader()->fpu. But the code overwrites
the bitmap with the requested feature bit only rather than adding it.
Fix the code to add the requested feature bit to the master bitmask.
Fixes: db8268df0983 ("x86/arch_prctl: Add controls for dynamic XSTATE components") Signed-off-by: Yang Zhong <yang.zhong@intel.com> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Paolo Bonzini <bonzini@gnu.org> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20220129173647.27981-2-chang.seok.bae@intel.com
(backported from commit 063452fd94d153d4eb38ad58f210f3d37a09cca4) Acked-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Jim Mattson [Thu, 3 Feb 2022 19:43:07 +0000 (11:43 -0800)]
x86/cpufeatures: Put the AMX macros in the word 18 block
BugLink: https://bugs.launchpad.net/bugs/1967750
These macros are for bits in CPUID.(EAX=7,ECX=0):EDX, not for bits in
CPUID(EAX=7,ECX=1):EAX. Put them with their brethren.
[ bp: Sort word 18 bits properly, as caught by Like Xu
<like.xu.linux@gmail.com> ]
Support for large, "dynamic" fpstates was recently merged. This
included code to ensure that sigaltstacks are sufficiently sized for
these large states. A new lock was added to remove races between
enabling large features and setting up sigaltstacks.
== Problem ==
The new lock (sigaltstack_lock()) is acquired in the sigreturn path
before restoring the old sigaltstack. Unfortunately, contention on the
new lock causes a measurable signal handling performance regression [1].
However, the common case is that no *changes* are made to the
sigaltstack state at sigreturn.
== Solution ==
do_sigaltstack() acquires sigaltstack_lock() and is used for both
sys_sigaltstack() and restoring the sigaltstack in sys_sigreturn().
Check for changes to the sigaltstack before taking the lock. If no
changes were made, return before acquiring the lock.
This removes lock contention from the common-case sigreturn path.
Marco Elver [Fri, 26 Nov 2021 12:47:46 +0000 (13:47 +0100)]
x86/fpu/signal: Initialize sw_bytes in save_xstate_epilog()
BugLink: https://bugs.launchpad.net/bugs/1967750
save_sw_bytes() did not fully initialize sw_bytes, which caused KMSAN
to report an infoleak (see below).
Initialize sw_bytes explicitly to avoid this.
Local variable sw_bytes created at:
save_xstate_epilog+0x80/0x510 arch/x86/kernel/fpu/signal.c:121
copy_fpstate_to_sigframe+0x861/0xb60 arch/x86/kernel/fpu/signal.c:245
Bytes 20-47 of 48 are uninitialized
Memory access of size 48 starts at ffff8880801d3a18
Data copied to user address 00007ffd90e2ef50
=====================================================
Chang S. Bae [Tue, 26 Oct 2021 09:11:57 +0000 (02:11 -0700)]
Documentation/x86: Add documentation for using dynamic XSTATE features
BugLink: https://bugs.launchpad.net/bugs/1967750
Explain how dynamic XSTATE features can be enabled via the
architecture-specific prctl() along with dynamic sigframe size and
first use trap handling.
Fix:
Documentation/x86/xstate.rst:15: WARNING: Title underline too short.
as reported by Stephen Rothwell <sfr@canb.auug.org.au>
Chang S. Bae [Tue, 26 Oct 2021 12:25:25 +0000 (05:25 -0700)]
selftests/x86/amx: Add context switch test
BugLink: https://bugs.launchpad.net/bugs/1967750
XSAVE state is thread-local. The kernel switches between thread
state at context switch time. Generally, running a selftest for
a while will naturally expose it to some context switching and
and will test the XSAVE code.
Instead of just hoping that the tests get context-switched at
random times, force context-switches on purpose. Spawn off a few
userspace threads and force context-switches between them.
Ensure that the kernel correctly context switches each thread's
unique AMX state.
Chang S. Bae [Tue, 26 Oct 2021 12:25:24 +0000 (05:25 -0700)]
selftests/x86/amx: Add test cases for AMX state management
BugLink: https://bugs.launchpad.net/bugs/1967750
AMX TILEDATA is a very large XSAVE feature. It could have caused
nasty XSAVE buffer space waste in two places:
* Signal stacks
* Kernel task_struct->fpu buffers
To avoid this waste, neither of these buffers have AMX state by
default. The non-default features are called "dynamic" features.
There is an arch_prctl(ARCH_REQ_XCOMP_PERM) which allows a task
to declare that it wants to use AMX or other "dynamic" XSAVE
features. This arch_prctl() ensures that sufficient sigaltstack
space is available before it will succeed. It also expands the
task_struct buffer.
Functions of this test:
* Test arch_prctl(ARCH_REQ_XCOMP_PERM). Ensure that it checks for
proper sigaltstack sizing and that the sizing is enforced for
future sigaltstack calls.
* Ensure that ARCH_REQ_XCOMP_PERM is inherited across fork()
* Ensure that TILEDATA use before the prctl() is fatal
* Ensure that TILEDATA is cleared across fork()
Note: Generally, compiler support is needed to do something with
AMX. Instead, directly load AMX state from userspace with a
plain XSAVE. Do not depend on the compiler.
Chang S. Bae [Thu, 21 Oct 2021 22:55:27 +0000 (15:55 -0700)]
x86/fpu/amx: Enable the AMX feature in 64-bit mode
BugLink: https://bugs.launchpad.net/bugs/1967750
Add the AMX state components in XFEATURE_MASK_USER_SUPPORTED and the
TILE_DATA component to the dynamic states and update the permission check
table accordingly.
This is only effective on 64 bit kernels as for 32bit kernels
XFEATURE_MASK_TILE is defined as 0.
TILE_DATA is caller-saved state and the only dynamic state. Add build time
sanity check to ensure the assumption that every dynamic feature is caller-
saved.
Make AMX state depend on XFD as it is dynamic feature.
Chang S. Bae [Thu, 21 Oct 2021 22:55:26 +0000 (15:55 -0700)]
x86/fpu: Add XFD handling for dynamic states
BugLink: https://bugs.launchpad.net/bugs/1967750
To handle the dynamic sizing of buffers on first use the XFD MSR has to be
armed. Store the delta between the maximum available and the default
feature bits in init_fpstate where it can be retrieved for task creation.
If the delta is non zero then dynamic features are enabled. This needs also
to enable the static key which guards the XFD updates. This is delayed to
an initcall because the FPU setup runs before jump labels are initialized.
Chang S. Bae [Thu, 21 Oct 2021 22:55:25 +0000 (15:55 -0700)]
x86/fpu: Calculate the default sizes independently
BugLink: https://bugs.launchpad.net/bugs/1967750
When dynamically enabled states are supported the maximum and default sizes
for the kernel buffers and user space interfaces are not longer identical.
Put the necessary calculations in place which only take the default enabled
features into account.
Chang S. Bae [Thu, 21 Oct 2021 22:55:24 +0000 (15:55 -0700)]
x86/fpu/amx: Define AMX state components and have it used for boot-time checks
BugLink: https://bugs.launchpad.net/bugs/1967750
The XSTATE initialization uses check_xstate_against_struct() to sanity
check the size of XSTATE-enabled features. AMX is a XSAVE-enabled feature,
and its size is not hard-coded but discoverable at run-time via CPUID.
The AMX state is composed of state components 17 and 18, which are all user
state components. The first component is the XTILECFG state of a 64-byte
tile-related control register. The state component 18, called XTILEDATA,
contains the actual tile data, and the state size varies on
implementations. The architectural maximum, as defined in the CPUID(0x1d,
1): EAX[15:0], is a byte less than 64KB. The first implementation supports
8KB.
Check the XTILEDATA state size dynamically. The feature introduces the new
tile register, TMM. Define one register struct only and read the number of
registers from CPUID. Cross-check the overall size with CPUID again.
Chang S. Bae [Thu, 21 Oct 2021 22:55:23 +0000 (15:55 -0700)]
x86/fpu/xstate: Prepare XSAVE feature table for gaps in state component numbers
BugLink: https://bugs.launchpad.net/bugs/1967750
The kernel checks at boot time which features are available by walking a
XSAVE feature table which contains the CPUID feature bit numbers which need
to be checked whether a feature is available on a CPU or not. So far the
feature numbers have been linear, but AMX will create a gap which the
current code cannot handle.
Make the table entries explicitly indexed and adjust the loop code
accordingly to prepare for that.
No functional change.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Len Brown <len.brown@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-20-chang.seok.bae@intel.com
(cherry picked from commit 70c3f1671b0cbc386b387f1de33b7837e276a195) Acked-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Chang S. Bae [Thu, 21 Oct 2021 22:55:22 +0000 (15:55 -0700)]
x86/fpu/xstate: Add fpstate_realloc()/free()
BugLink: https://bugs.launchpad.net/bugs/1967750
The fpstate embedded in struct fpu is the default state for storing the FPU
registers. It's sized so that the default supported features can be stored.
For dynamically enabled features the register buffer is too small.
The #NM handler detects first use of a feature which is disabled in the
XFD MSR. After handling permission checks it recalculates the size for
kernel space and user space state and invokes fpstate_realloc() which
tries to reallocate fpstate and install it.
Provide the allocator function which checks whether the current buffer size
is sufficient and if not allocates one. If allocation is successful the new
fpstate is initialized with the new features and sizes and the now enabled
features is removed from the task's XFD mask.
realloc_fpstate() uses vzalloc(). If use of this mechanism grows to
re-allocate buffers larger than 64KB, a more sophisticated allocation
scheme that includes purpose-built reclaim capability might be justified.
Chang S. Bae [Thu, 21 Oct 2021 22:55:21 +0000 (15:55 -0700)]
x86/fpu/xstate: Add XFD #NM handler
BugLink: https://bugs.launchpad.net/bugs/1967750
If the XFD MSR has feature bits set then #NM will be raised when user space
attempts to use an instruction related to one of these features.
When the task has no permissions to use that feature, raise SIGILL, which
is the same behavior as #UD.
If the task has permissions, calculate the new buffer size for the extended
feature set and allocate a larger fpstate. In the unlikely case that
vzalloc() fails, SIGSEGV is raised.
The allocation function will be added in the next step. Provide a stub
which fails for now.
Chang S. Bae [Thu, 21 Oct 2021 22:55:20 +0000 (15:55 -0700)]
x86/fpu: Update XFD state where required
BugLink: https://bugs.launchpad.net/bugs/1967750
The IA32_XFD_MSR allows to arm #NM traps for XSTATE components which are
enabled in XCR0. The register has to be restored before the tasks XSTATE is
restored. The life time rules are the same as for FPU state.
XFD is updated on return to userspace only when the FPU state of the task
is not up to date in the registers. It's updated before the XRSTORS so
that eventually enabled dynamic features are restored as well and not
brought into init state.
Also in signal handling for restoring FPU state from user space the
correctness of the XFD state has to be ensured.
Chang S. Bae [Thu, 21 Oct 2021 22:55:18 +0000 (15:55 -0700)]
x86/fpu: Add XFD state to fpstate
BugLink: https://bugs.launchpad.net/bugs/1967750
Add storage for XFD register state to struct fpstate. This will be used to
store the XFD MSR state. This will be used for switching the XFD MSR when
FPU content is restored.
Add a per-CPU variable to cache the current MSR value so the MSR has only
to be written when the values are different.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-15-chang.seok.bae@intel.com
(cherry picked from commit 8bf26758ca9659866b844dd51037314b4c0fa6bd) Acked-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Chang S. Bae [Thu, 21 Oct 2021 22:55:16 +0000 (15:55 -0700)]
x86/cpufeatures: Add eXtended Feature Disabling (XFD) feature bit
BugLink: https://bugs.launchpad.net/bugs/1967750
Intel's eXtended Feature Disable (XFD) feature is an extension of the XSAVE
architecture. XFD allows the kernel to enable a feature state in XCR0 and
to receive a #NM trap when a task uses instructions accessing that state.
This is going to be used to postpone the allocation of a larger XSTATE
buffer for a task to the point where it is actually using a related
instruction after the permission to use that facility has been granted.
XFD is not used by the kernel, but only applied to userspace. This is a
matter of policy as the kernel knows how a fpstate is reallocated and the
XFD state.
The compacted XSAVE format is adjustable for dynamic features. Make XFD
depend on XSAVES.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-13-chang.seok.bae@intel.com
(cherry picked from commit c351101678ce54492b6e09810ec02efc0df036a9) Acked-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Chang S. Bae [Thu, 21 Oct 2021 22:55:15 +0000 (15:55 -0700)]
x86/fpu: Reset permission and fpstate on exec()
BugLink: https://bugs.launchpad.net/bugs/1967750
On exec(), extended register states saved in the buffer is cleared. With
dynamic features, each task carries variables besides the register states.
The struct fpu has permission information and struct fpstate contains
buffer size and feature masks. They are all dynamically updated with
dynamic features.
Reset the current task's entire FPU data before an exec() so that the new
task starts with default permission and fpstate.
Rename the register state reset function because the old naming confuses as
it does not reset struct fpstate.
Thomas Gleixner [Thu, 21 Oct 2021 22:55:14 +0000 (15:55 -0700)]
x86/fpu: Prepare fpu_clone() for dynamically enabled features
BugLink: https://bugs.launchpad.net/bugs/1967750
The default portion of the parent's FPU state is saved in a child task.
With dynamic features enabled, the non-default portion is not saved in a
child's fpstate because these register states are defined to be
caller-saved. The new task's fpstate is therefore the default buffer.
Fork inherits the permission of the parent.
Also, do not use memcpy() when TIF_NEED_FPU_LOAD is set because it is
invalid when the parent has dynamic features.
Chang S. Bae [Thu, 21 Oct 2021 22:55:13 +0000 (15:55 -0700)]
x86/fpu/signal: Prepare for variable sigframe length
BugLink: https://bugs.launchpad.net/bugs/1967750
The software reserved portion of the fxsave frame in the signal frame
is copied from structures which have been set up at boot time. With
dynamically enabled features the content of these structures is no
longer correct because the xfeatures and size can be different per task.
Calculate the software reserved portion at runtime and fill in the
xfeatures and size values from the tasks active fpstate.
Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021225527.10184-10-chang.seok.bae@intel.com
(cherry picked from commit 53599b4d54b9b8dda1d537a558946869d2acbddc) Acked-by: Dimitri John Ledkov <dimitri.ledkov@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Thomas Gleixner [Thu, 21 Oct 2021 22:55:12 +0000 (15:55 -0700)]
x86/signal: Use fpu::__state_user_size for sigalt stack validation
BugLink: https://bugs.launchpad.net/bugs/1967750
Use the current->group_leader->fpu to check for pending permissions to use
extended features and validate against the resulting user space size which
is stored in the group leaders fpu struct as well.
This prevents a task from installing a too small sized sigaltstack after
permissions to use dynamically enabled features have been granted, but
the task has not (yet) used a related instruction.