Lockdep reports the below circular locking dependency issue. The
mmap_lock acquisition while holding pci_bus_sem is due to the use of
copy_to_user() from within a pci_walk_bus() callback.
Building the devices array directly into the user buffer is only for
convenience. Instead we can allocate a local buffer for the array,
bounded by the number of devices on the bus/slot, fill the device
information into this local buffer, then copy it into the user buffer
outside the bus walk callback.
======================================================
WARNING: possible circular locking dependency detected
6.9.0-rc5+ #39 Not tainted
------------------------------------------------------
CPU 0/KVM/4113 is trying to acquire lock: ffff99a609ee18a8 (&vdev->vma_lock){+.+.}-{4:4}, at: vfio_pci_mmap_fault+0x35/0x1a0 [vfio_pci_core]
but task is already holding lock: ffff99a243a052a0 (&mm->mmap_lock){++++}-{4:4}, at: vaddr_get_pfns+0x3f/0x170 [vfio_iommu_type1]
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
... and use it to limit the virtual terminals to just N_TTY. They are
kind of special, and in particular, the "con_write()" routine violates
the "writes cannot sleep" rule that some ldiscs rely on.
This avoids the
BUG: sleeping function called from invalid context at kernel/printk/printk.c:2659
when N_GSM has been attached to a virtual console, and gsmld_write()
calls con_write() while holding a spinlock, and con_write() then tries
to get the console lock.
The standard PCIe configuration read-write interface is used to
access the configuration space of the peripheral PCIe devices
of the mips processor after the PCIe link surprise down, it can
generate kernel panic caused by "Data bus error". So it is
necessary to add PCIe link status check for system protection.
When the PCIe link is down or in training, assigning a value
of 0 to the configuration address can prevent read-write behavior
to the configuration space of peripheral PCIe devices, thereby
preventing kernel panic.
Signed-off-by: Songyang Li <leesongyang@outlook.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Hewlett-Packard HP Pavilion 17 Notebook PC/1972 is an Intel Ivy Bridge
system with a muxless AMD Radeon dGPU. Attempting to use the dGPU fails
with the following sequence:
ACPI Error: Aborting method \AMD3._ON due to previous error (AE_AML_LOOP_TIMEOUT) (20230628/psparse-529)
radeon 0000:01:00.0: not ready 1023ms after resume; waiting
radeon 0000:01:00.0: not ready 2047ms after resume; waiting
radeon 0000:01:00.0: not ready 4095ms after resume; waiting
radeon 0000:01:00.0: not ready 8191ms after resume; waiting
radeon 0000:01:00.0: not ready 16383ms after resume; waiting
radeon 0000:01:00.0: not ready 32767ms after resume; waiting
radeon 0000:01:00.0: not ready 65535ms after resume; giving up
radeon 0000:01:00.0: Unable to change power state from D3cold to D0, device inaccessible
The issue is that the Root Port the dGPU is connected to can't handle the
transition from D3cold to D0 so the dGPU can't properly exit runtime PM.
The existing logic in pci_bridge_d3_possible() checks for systems that are
newer than 2015 to decide that D3 is safe. This would nominally work for
an Ivy Bridge system (which was discontinued in 2015), but this system
appears to have continued to receive BIOS updates until 2017 and so this
existing logic doesn't appropriately capture it.
Add the system to bridge_d3_blacklist to prevent D3cold from being used.
Link: https://lore.kernel.org/r/20240307163709.323-1-mario.limonciello@amd.com Reported-by: Eric Heintzmann <heintzmann.eric@free.fr> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3229 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Eric Heintzmann <heintzmann.eric@free.fr> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
An overflow can occur in a situation where src.centiseconds
takes the value of 255. This situation is unlikely, but there
is no validation check anywere in the code.
Found by Linux Verification Center (linuxtesting.org) with Svace.
Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Roman Smirnov <r.smirnov@omp.ru> Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru> Signed-off-by: Jan Kara <jack@suse.cz>
Message-Id: <20240327132755.13945-1-r.smirnov@omp.ru> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Newer Qualcomm platforms (sm8450+) successfully handle busy state and
send the Command Completion after sending the Busy state. Older devices
have firmware bug and can not continue after sending the CCI_BUSY state,
but the command that leads to CCI_BUSY is already forbidden by the
NO_PARTNER_PDOS quirk.
Follow other UCSI glue drivers and drop special handling for CCI_BUSY
event. Let the UCSI core properly handle this state.
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20240408-qcom-ucsi-fixes-bis-v1-3-716c145ca4b1@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The Lenovo Yoga Tablet 2 Pro 1380 model is the exception to the rule that
devices which use the Crystal Cove PMIC without using ACPI for battery and
AC power_supply class support use the USB-phy for charger detection.
Unlike the Lenovo Yoga Tablet 2 830 / 1050 models this model has an extra
LC824206XA Micro USB switch which does the charger detection.
Add a DMI quirk to not set the "linux,phy_charger_detect" property on
the 1380 model. This quirk matches on the BIOS version to differentiate
the 1380 model from the 830 and 1050 models which otherwise have
the same DMI strings.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com> Link: https://lore.kernel.org/r/20240406140127.17885-1-hdegoede@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
I have a use case where nr_buffers = 3 and in which each descriptor is composed by 3
segments, resulting in the DMA channel descs_allocated to be 9. Since axi_desc_put()
handles the hw_desc considering the descs_allocated, this scenario would result in a
kernel panic (hw_desc array will be overrun).
To fix this, the proposal is to add a new member to the axi_dma_desc structure,
where we keep the number of allocated hw_descs (axi_desc_alloc()) and use it in
axi_desc_put() to handle the hw_desc array correctly.
Additionally I propose to remove the axi_chan_start_first_queued() call after completing
the transfer, since it was identified that unbalance can occur (started descriptors can
be interrupted and transfer ignored due to DMA channel not being enabled).
The incompatible device in my possession has a sticker that says
"F5U002 Rev 2" and "P80453-B", and lsusb identifies it as
"050d:0002 Belkin Components IEEE-1284 Controller". There is a bug
report from 2007 from Michael Trausch who was seeing the exact same
errors that I saw in 2024 trying to use this cable.
In f2fs_remount, SB_INLINECRYPT flag will be clear and re-set.
If create new file or open file during this gap, these files
will not use inlinecrypt. Worse case, it may lead to data
corruption if wrappedkey_v0 is enable.
Thread A: Thread B:
-f2fs_remount -f2fs_file_open or f2fs_new_inode
-default_options
<- clear SB_INLINECRYPT flag
-fscrypt_select_encryption_impl
-parse_options
<- set SB_INLINECRYPT again
Signed-off-by: Yunlei He <heyunlei@oppo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
When setting the guid via configfs it is possible to test if
its value is one of the kernel supported ones by calling
uvc_format_by_guid on it. If the result is NULL, we know the
guid is unsupported and can be ignored.
Signed-off-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Link: https://lore.kernel.org/r/20240221-uvc-gadget-configfs-guid-v1-1-f0678ca62ebb@pengutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
If an error happens in ftrace, ftrace_kill() will prevent disarming
kprobes. Eventually, the ftrace_ops associated with the kprobes will be
freed, yet the kprobes will still be active, and when triggered, they
will use the freed memory, likely resulting in a page fault and panic.
This behavior can be reproduced quite easily, by creating a kprobe and
then triggering a ftrace_kill(). For simplicity, we can simulate an
ftrace error with a kernel module like [1]:
sudo perf probe --add commit_creds
sudo perf trace -e probe:commit_creds
# In another terminal
make
sudo insmod ftrace_killer.ko # calls ftrace_kill(), simulating bug
# Back to perf terminal
# ctrl-c
sudo perf probe --del commit_creds
After a short period, a page fault and panic would occur as the kprobe
continues to execute and uses the freed ftrace_ops. While ftrace_kill()
is supposed to be used only in extreme circumstances, it is invoked in
FTRACE_WARN_ON() and so there are many places where an unexpected bug
could be triggered, yet the system may continue operating, possibly
without the administrator noticing. If ftrace_kill() does not panic the
system, then we should do everything we can to continue operating,
rather than leave a ticking time bomb.
Link: https://lore.kernel.org/all/20240501162956.229427-1-stephen.s.brennan@oracle.com/ Signed-off-by: Stephen Brennan <stephen.s.brennan@oracle.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Guo Ren <guoren@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
In the following concurrency we will access the uninitialized rs->lock:
ext4_fill_super
ext4_register_sysfs
// sysfs registered msg_ratelimit_interval_ms
// Other processes modify rs->interval to
// non-zero via msg_ratelimit_interval_ms
ext4_orphan_cleanup
ext4_msg(sb, KERN_INFO, "Errors on filesystem, "
__ext4_msg
___ratelimit(&(EXT4_SB(sb)->s_msg_ratelimit_state)
if (!rs->interval) // do nothing if interval is 0
return 1;
raw_spin_trylock_irqsave(&rs->lock, flags)
raw_spin_trylock(lock)
_raw_spin_trylock
__raw_spin_trylock
spin_acquire(&lock->dep_map, 0, 1, _RET_IP_)
lock_acquire
__lock_acquire
register_lock_class
assign_lock_key
dump_stack();
ratelimit_state_init(&sbi->s_msg_ratelimit_state, 5 * HZ, 10);
raw_spin_lock_init(&rs->lock);
// init rs->lock here
and get the following dump_stack:
=========================================================
INFO: trying to register non-static key.
The code is fine but needs lockdep annotation, or maybe
you didn't initialize this object before use?
turning off the locking correctness validator.
CPU: 12 PID: 753 Comm: mount Tainted: G E 6.7.0-rc6-next-20231222 #504
[...]
Call Trace:
dump_stack_lvl+0xc5/0x170
dump_stack+0x18/0x30
register_lock_class+0x740/0x7c0
__lock_acquire+0x69/0x13a0
lock_acquire+0x120/0x450
_raw_spin_trylock+0x98/0xd0
___ratelimit+0xf6/0x220
__ext4_msg+0x7f/0x160 [ext4]
ext4_orphan_cleanup+0x665/0x740 [ext4]
__ext4_fill_super+0x21ea/0x2b10 [ext4]
ext4_fill_super+0x14d/0x360 [ext4]
[...]
=========================================================
Normally interval is 0 until s_msg_ratelimit_state is initialized, so
___ratelimit() does nothing. But registering sysfs precedes initializing
rs->lock, so it is possible to change rs->interval to a non-zero value
via the msg_ratelimit_interval_ms interface of sysfs while rs->lock is
uninitialized, and then a call to ext4_msg triggers the problem by
accessing an uninitialized rs->lock. Therefore register sysfs after all
initializations are complete to avoid such problems.
Signed-off-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240102133730.1098120-1-libaokun1@huawei.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
If devm_add_action() returns -ENOMEM, then MSIs are allocated but not
not freed on teardown. Use devm_add_action_or_reset() instead to keep
the static analyser happy.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Aleksandr Aprelkov <aaprelkov@usergate.com> Link: https://lore.kernel.org/r/20240403053759.643164-1-aaprelkov@usergate.com
[will: Tweak commit message, remove warning message] Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The P2SB could get an invalid BAR from the BIOS, and that won't be fixed
up until pcibios_assign_resources(), which is an fs_initcall().
- Move p2sb_fs_init() to an fs_initcall_sync(). This is still early
enough to avoid a race with any dependent drivers.
- Add a check for IORESOURCE_UNSET in p2sb_valid_resource() to catch
unset BARs going forward.
- Return error values from p2sb_fs_init() so that the 'initcall_debug'
cmdline arg provides useful data.
Signed-off-by: Ben Fradella <bfradell@netapp.com> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Klara Modin <klarasmodin@gmail.com> Reviewed-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Link: https://lore.kernel.org/r/20240509164905.41016-1-bcfradella@proton.me Reviewed-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
With -Wextra clang warns about pointer arithmetic using a null pointer.
When building with CONFIG_PCI=n, that triggers a warning in the IO
accessors, eg:
In file included from linux/arch/powerpc/include/asm/io.h:672:
linux/arch/powerpc/include/asm/io-defs.h:23:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
23 | DEF_PCI_AC_RET(inb, u8, (unsigned long port), (port), pio, port)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
...
linux/arch/powerpc/include/asm/io.h:591:53: note: expanded from macro '__do_inb'
591 | #define __do_inb(port) readb((PCI_IO_ADDR)_IO_BASE + port);
| ~~~~~~~~~~~~~~~~~~~~~ ^
That is because when CONFIG_PCI=n, _IO_BASE is defined as 0.
Although _IO_BASE is defined as plain 0, the cast (PCI_IO_ADDR) converts
it to void * before the addition with port happens.
Instead the addition can be done first, and then the cast. The resulting
value will be the same, but avoids the warning, and also avoids void
pointer arithmetic which is apparently non-standard.
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYtEh8zmq8k8wE-8RZwW-Qr927RLTn+KqGnq1F=ptaaNsA@mail.gmail.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240503075619.394467-1-mpe@ellerman.id.au Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Modify the code so it can be compiled tested in configurations that do
not have ACPI enabled.
It fixes the following errors:
drivers/media/pci/intel/ipu-bridge.c:103:30: error: implicit declaration of function ‘acpi_device_handle’; did you mean ‘acpi_fwnode_handle’? [-Werror=implicit-function-declaration]
drivers/media/pci/intel/ipu-bridge.c:103:30: warning: initialization of ‘acpi_handle’ {aka ‘void *’} from ‘int’ makes pointer from integer without a cast [-Wint-conversion]
drivers/media/pci/intel/ipu-bridge.c:110:17: error: implicit declaration of function ‘for_each_acpi_dev_match’ [-Werror=implicit-function-declaration]
drivers/media/pci/intel/ipu-bridge.c:110:74: error: expected ‘;’ before ‘for_each_acpi_consumer_dev’
drivers/media/pci/intel/ipu-bridge.c:104:29: warning: unused variable ‘consumer’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:103:21: warning: unused variable ‘handle’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:166:38: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:185:43: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:191:30: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:196:30: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:202:30: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:223:31: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:236:18: error: implicit declaration of function ‘acpi_get_physical_device_location’ [-Werror=implicit-function-declaration]
drivers/media/pci/intel/ipu-bridge.c:236:56: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:238:31: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:256:31: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:275:31: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:280:30: error: invalid use of undefined type ‘struct acpi_device’
drivers/media/pci/intel/ipu-bridge.c:469:26: error: implicit declaration of function ‘acpi_device_hid’; did you mean ‘dmi_device_id’? [-Werror=implicit-function-declaration]
drivers/media/pci/intel/ipu-bridge.c:468:74: warning: format ‘%s’ expects argument of type ‘char *’, but argument 4 has type ‘int’ [-Wformat=]
drivers/media/pci/intel/ipu-bridge.c:637:58: error: expected ‘;’ before ‘{’ token
drivers/media/pci/intel/ipu-bridge.c:696:1: warning: label ‘err_put_adev’ defined but not used [-Wunused-label]
drivers/media/pci/intel/ipu-bridge.c:693:1: warning: label ‘err_put_ivsc’ defined but not used [-Wunused-label]
drivers/media/pci/intel/ipu-bridge.c:691:1: warning: label ‘err_free_swnodes’ defined but not used [-Wunused-label]
drivers/media/pci/intel/ipu-bridge.c:632:40: warning: unused variable ‘primary’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:632:31: warning: unused variable ‘fwnode’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:733:73: error: expected ‘;’ before ‘{’ token
drivers/media/pci/intel/ipu-bridge.c:725:24: warning: unused variable ‘csi_dev’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:724:43: warning: unused variable ‘adev’ [-Wunused-variable]
drivers/media/pci/intel/ipu-bridge.c:599:12: warning: ‘ipu_bridge_instantiate_ivsc’ defined but not used [-Wunused-function]
drivers/media/pci/intel/ipu-bridge.c:444:13: warning: ‘ipu_bridge_create_connection_swnodes’ defined but not used [-Wunused-function]
drivers/media/pci/intel/ipu-bridge.c:297:13: warning: ‘ipu_bridge_create_fwnode_properties’ defined but not used [-Wunused-function]
drivers/media/pci/intel/ipu-bridge.c:155:12: warning: ‘ipu_bridge_check_ivsc_dev’ defined but not used [-Wunused-function]
Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
plpar_hcall(), plpar_hcall9(), and related functions expect callers to
provide valid result buffers of certain minimum size. Currently this
is communicated only through comments in the code and the compiler has
no idea.
For example, if I write a bug like this:
long retbuf[PLPAR_HCALL_BUFSIZE]; // should be PLPAR_HCALL9_BUFSIZE
plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf, ...);
This compiles with no diagnostics emitted, but likely results in stack
corruption at runtime when plpar_hcall9() stores results past the end
of the array. (To be clear this is a contrived example and I have not
found a real instance yet.)
To make this class of error less likely, we can use explicitly-sized
array parameters instead of pointers in the declarations for the hcall
APIs. When compiled with -Warray-bounds[1], the code above now
provokes a diagnostic like this:
error: array argument is too small;
is of size 32, callee requires at least 72 [-Werror,-Warray-bounds]
60 | plpar_hcall9(H_ALLOCATE_VAS_WINDOW, retbuf,
| ^ ~~~~~~
[1] Enabled for LLVM builds but not GCC for now. See commit 0da6e5fd6c37 ("gcc: disable '-Warray-bounds' for gcc-13 too") and
related changes.
Signed-off-by: Nathan Lynch <nathanl@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240408-pseries-hvcall-retbuf-v1-1-ebc73d7253cf@linux.ibm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
There is a race condition in which a rendering job might take just long
enough to trigger the drm sched job timeout handler but also still
complete before the hard reset is done by the timeout handler.
This runs into race conditions not expected by the timeout handler.
In some very specific cases it currently may result in a refcount
imbalance on lima_pm_idle, with a stack dump such as:
We can prevent that race condition entirely by masking the irqs at the
beginning of the timeout handler, at which point we give up on waiting
for that job entirely.
The irqs will be enabled again at the next hard reset which is already
done as a recovery by the timeout handler.
This is needed because we want to reset those devices in device-agnostic
code such as lima_sched.
In particular, masking irqs will be useful before a hard reset to
prevent race conditions.
The Z830 has some buttons that will only work properly as "quickstart"
buttons. To enable them in that mode, a value between 1 and 7 must be
used for HCI_HOTKEY_EVENT. Windows uses 0x5 on this laptop so use that for
maximum predictability and compatibility.
As there is not yet a known way of auto detection, this patch uses a DMI
quirk table. A module parameter is exposed to allow setting this on other
models for testing.
Signed-off-by: Arvid Norlander <lkml@vorpal.se> Tested-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20240131111641.4418-3-W_Armin@gmx.de Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
[WHY]
PSP can access DCN registers during command submission and we need
to ensure that DCN is not in PG before doing so.
[HOW]
Add a callback to DM to lock and notify DC for idle optimization exit.
It can't be DC directly because of a potential race condition with the
link protection thread and the rest of DM operation.
Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Acked-by: Alex Hung <alex.hung@amd.com> Signed-off-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This fixes some CHECKs reported by the checkpatch script.
Issues reported in ath3k.c:
-------
ath3k.c
-------
CHECK: Please don't use multiple blank lines
+
+
CHECK: Blank lines aren't necessary after an open brace '{'
+static const struct usb_device_id ath3k_blist_tbl[] = {
+
CHECK: Alignment should match open parenthesis
+static int ath3k_load_firmware(struct usb_device *udev,
+ const struct firmware *firmware)
CHECK: Alignment should match open parenthesis
+ err = usb_bulk_msg(udev, pipe, send_buf, size,
+ &len, 3000);
CHECK: Unnecessary parentheses around 'len != size'
+ if (err || (len != size)) {
CHECK: Alignment should match open parenthesis
+static int ath3k_get_version(struct usb_device *udev,
+ struct ath3k_version *version)
CHECK: Alignment should match open parenthesis
+static int ath3k_load_fwfile(struct usb_device *udev,
+ const struct firmware *firmware)
CHECK: Alignment should match open parenthesis
+ err = usb_bulk_msg(udev, pipe, send_buf, size,
+ &len, 3000);
CHECK: Unnecessary parentheses around 'len != size'
+ if (err || (len != size)) {
CHECK: Blank lines aren't necessary after an open brace '{'
+ switch (fw_version.ref_clock) {
+
CHECK: Alignment should match open parenthesis
+ snprintf(filename, ATH3K_NAME_LEN, "ar3k/ramps_0x%08x_%d%s",
+ le32_to_cpu(fw_version.rom_version), clk_value, ".dfu");
CHECK: Alignment should match open parenthesis
+static int ath3k_probe(struct usb_interface *intf,
+ const struct usb_device_id *id)
CHECK: Alignment should match open parenthesis
+ BT_ERR("Firmware file \"%s\" not found",
+ ATH3K_FIRMWARE);
CHECK: Alignment should match open parenthesis
+ BT_ERR("Firmware file \"%s\" request failed (err=%d)",
+ ATH3K_FIRMWARE, ret);
Signed-off-by: Uri Arev <me@wantyapps.xyz> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Lenovo Slim 7 16ARH7 is a machine with switchable graphics between AMD
and Nvidia, and the backlight can't be adjusted properly unless
acpi_backlight=native is passed. Although nvidia-wmi-backlight is
present and loaded, this doesn't work as expected at all.
For making it working as default, add the corresponding quirk entry
with a DMI matching "LENOVO" "82UX".
Link: https://bugzilla.suse.com/show_bug.cgi?id=1217750 Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Adjusts the report descriptor for N-Key devices to
make the output count 0x01 which completely avoids
the need for a block of filtering.
Signed-off-by: Luke D. Jones <luke@ljones.dev> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This device sometimes doesn't send touch release signals when moving
from >=4 fingers to <4 fingers. Using MT_QUIRK_NOT_SEEN_MEANS_UP instead
of MT_QUIRK_ALWAYS_VALID makes sure that no touches become stuck.
MT_QUIRK_FORCE_MULTI_INPUT is not necessary for this device, but does no
harm.
Signed-off-by: Sean O'Brien <seobrien@chromium.org> Signed-off-by: Jiri Kosina <jkosina@suse.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
During chip recovery (e.g. chip reset), there is a possible situation that
kernel worker reset_work is holding the lock and waiting for kernel thread
stat_worker to be parked, while stat_worker is waiting for the release of
the same lock.
It causes a deadlock resulting in the dumping of hung tasks messages and
possible rebooting of the device.
This patch prevents the execution of stat_worker during the chip recovery.
Signed-off-by: Leon Yen <leon.yen@mediatek.com> Signed-off-by: Ming Yen Hsieh <MingYen.Hsieh@mediatek.com> Signed-off-by: Felix Fietkau <nbd@nbd.name> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
BUG: KCSAN: data-race in net_rx_action / netpoll_send_skb
write (marked) to 0xffff8881164168b0 of 4 bytes by interrupt on cpu 10:
net_rx_action (./include/linux/netpoll.h:90 net/core/dev.c:6712 net/core/dev.c:6822)
<snip>
read to 0xffff8881164168b0 of 4 bytes by task 1 on cpu 2:
netpoll_send_skb (net/core/netpoll.c:319 net/core/netpoll.c:345 net/core/netpoll.c:393)
netpoll_send_udp (net/core/netpoll.c:?)
<snip>
value changed: 0x0000000a -> 0xffffffff
This happens because netpoll_owner_active() needs to check if the
current CPU is the owner of the lock, touching napi->poll_owner
non atomically. The ->poll_owner field contains the current CPU holding
the lock.
Use an atomic read to check if the poll owner is the current CPU.
Signed-off-by: Breno Leitao <leitao@debian.org> Link: https://lore.kernel.org/r/20240429100437.3487432-1-leitao@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This switch family supports four LEDs for each of its six ports. Each
LED group is composed of one of these four LEDs from all six ports. LED
groups can be configured to display hardware information, such as link
activity, or manually controlled through a bitmap in registers
RTL8366RB_LED_0_1_CTRL_REG and RTL8366RB_LED_2_3_CTRL_REG.
After a reset, the default LED group configuration for groups 0 to 3
indicates, respectively, link activity, link at 1000M, 100M, and 10M, or
RTL8366RB_LED_CTRL_REG as 0x5432. These configurations are commonly used
for LED indications. However, the driver was replacing that
configuration to use manually controlled LEDs (RTL8366RB_LED_FORCE)
without providing a way for the OS to control them. The default
configuration is deemed more useful than fixed, uncontrollable turned-on
LEDs.
The driver was enabling/disabling LEDs during port_enable/disable.
However, these events occur when the port is administratively controlled
(up or down) and are not related to link presence. Additionally, when a
port N was disabled, the driver was turning off all LEDs for group N,
not only the corresponding LED for port N in any of those 4 groups. In
such cases, if port 0 was brought down, the LEDs for all ports in LED
group 0 would be turned off. As another side effect, the driver was
wrongly warning that port 5 didn't have an LED ("no LED for port 5").
Since showing the administrative state of ports is not an orthodox way
to use LEDs, it was not worth it to fix it and all this code was
dropped.
The code to disable LEDs was simplified only changing each LED group to
the RTL8366RB_LED_OFF state. Registers RTL8366RB_LED_0_1_CTRL_REG and
RTL8366RB_LED_2_3_CTRL_REG are only used when the corresponding LED
group is configured with RTL8366RB_LED_FORCE and they don't need to be
cleaned. The code still references an LED controlled by
RTL8366RB_INTERRUPT_CONTROL_REG, but as of now, no test device has
actually used it. Also, some magic numbers were replaced by macros.
Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
There is a 'malloc' call, which can be unsuccessful.
This patch will add the malloc failure checking
to avoid possible null dereference and give more information
about test fail reasons.
Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Link: https://lore.kernel.org/r/20240423082102.2018886-1-chentao@kylinos.cn Signed-off-by: Will Deacon <will@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This happens when TC does a mirred egress redirect from the root qdisc of
device A to the root qdisc of device B. As long as these two locks aren't
protecting the same qdisc, they can be acquired in chain: add a per-qdisc
lockdep key to silence false warnings.
This dynamic key should safely replace the static key we have in sch_htb:
it was added to allow enqueueing to the device "direct qdisc" while still
holding the qdisc root lock.
v2: don't use static keys anymore in HTB direct qdiscs (thanks Eric Dumazet)
CC: Maxim Mikityanskiy <maxim@isovalent.com> CC: Xiumei Mu <xmu@redhat.com> Reported-by: Christoph Paasch <cpaasch@apple.com> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/451 Signed-off-by: Davide Caratti <dcaratti@redhat.com> Link: https://lore.kernel.org/r/7dc06d6158f72053cf877a82e2a7a5bd23692faa.1713448007.git.dcaratti@redhat.com Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
This copper module comes with broken TX_FAULT indicator which must be
ignored for it to work.
Co-authored-by: Josef Schlehofer <pepe.schlehofer@gmail.com> Signed-off-by: Daniel Golle <daniel@makrotopia.org>
[ rebased on top of net-next ] Signed-off-by: Marek Behún <kabel@kernel.org> Link: https://lore.kernel.org/r/20240423090025.29231-1-kabel@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
trace_drop_common() is called with preemption disabled, and it acquires
a spin_lock. This is problematic for RT kernels because spin_locks are
sleeping locks in this configuration, which causes the following splat:
BUG: sleeping function called from invalid context at kernel/locking/spinlock_rt.c:48
in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 449, name: rcuc/47
preempt_count: 1, expected: 0
RCU nest depth: 2, expected: 2
5 locks held by rcuc/47/449:
#0: ff1100086ec30a60 ((softirq_ctrl.lock)){+.+.}-{2:2}, at: __local_bh_disable_ip+0x105/0x210
#1: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0xbf/0x130
#2: ffffffffb394a280 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0x11c/0x210
#3: ffffffffb394a160 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x360/0xc70
#4: ff1100086ee07520 (&data->lock){+.+.}-{2:2}, at: trace_drop_common.constprop.0+0xb5/0x290
irq event stamp: 139909
hardirqs last enabled at (139908): [<ffffffffb1df2b33>] _raw_spin_unlock_irqrestore+0x63/0x80
hardirqs last disabled at (139909): [<ffffffffb19bd03d>] trace_drop_common.constprop.0+0x26d/0x290
softirqs last enabled at (139892): [<ffffffffb07a1083>] __local_bh_enable_ip+0x103/0x170
softirqs last disabled at (139898): [<ffffffffb0909b33>] rcu_cpu_kthread+0x93/0x1f0
Preemption disabled at:
[<ffffffffb1de786b>] rt_mutex_slowunlock+0xab/0x2e0
CPU: 47 PID: 449 Comm: rcuc/47 Not tainted 6.9.0-rc2-rt1+ #7
Hardware name: Dell Inc. PowerEdge R650/0Y2G81, BIOS 1.6.5 04/15/2022
Call Trace:
<TASK>
dump_stack_lvl+0x8c/0xd0
dump_stack+0x14/0x20
__might_resched+0x21e/0x2f0
rt_spin_lock+0x5e/0x130
? trace_drop_common.constprop.0+0xb5/0x290
? skb_queue_purge_reason.part.0+0x1bf/0x230
trace_drop_common.constprop.0+0xb5/0x290
? preempt_count_sub+0x1c/0xd0
? _raw_spin_unlock_irqrestore+0x4a/0x80
? __pfx_trace_drop_common.constprop.0+0x10/0x10
? rt_mutex_slowunlock+0x26a/0x2e0
? skb_queue_purge_reason.part.0+0x1bf/0x230
? __pfx_rt_mutex_slowunlock+0x10/0x10
? skb_queue_purge_reason.part.0+0x1bf/0x230
trace_kfree_skb_hit+0x15/0x20
trace_kfree_skb+0xe9/0x150
kfree_skb_reason+0x7b/0x110
skb_queue_purge_reason.part.0+0x1bf/0x230
? __pfx_skb_queue_purge_reason.part.0+0x10/0x10
? mark_lock.part.0+0x8a/0x520
...
trace_drop_common() also disables interrupts, but this is a minor issue
because we could easily replace it with a local_lock.
Replace the spin_lock with raw_spin_lock to avoid sleeping in atomic
context.
Signed-off-by: Wander Lairson Costa <wander@redhat.com> Reported-by: Hu Chunyu <chuhu@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The x86 Android tablets on which quirks to skip looking for a matching
UartSerialBus resource and instead unconditionally create a serial bus
device (serdev) are necessary there are 2 sorts of serialports:
ACPI enumerated highspeed designware UARTs, these are the ones which
typcially need to be skipped since they need a serdev for the attached
BT HCI.
A PNP enumerated UART which is part of the PCU. So far the existing
quirks have ignored this. But on the Lenovo Yoga Tablet 2 Pro 1380
models this is used for a custom fastcharging protocol. There is
a Micro USB switch which can switch the USB data lines to this uart
and then a 600 baud protocol is used to configure the charger for
a voltage higher then 5V.
Add a new ACPI_QUIRK_PNP_UART1_SKIP quirk type and set this for
the existing entry for the Lenovo Yoga Tablet 2 830 / 1050 models.
Note this will lead to unnecessarily also creating a serdev for
the PCU UART on the 830 / 1050 which don't need this, but the UART
is not used otherwise there so that is not a problem.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
memcpy: detected field-spanning write (size 28) of single field "dst" at include/linux/sockptr.h:49 (size 16)
Refactor the code to be more explicit.
Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Kees Cook <keescook@chromium.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
gcc-9 and some other older versions produce a false-positive warning
for zeroing two fields
In file included from include/linux/string.h:369,
from drivers/net/wireless/ath/ath9k/main.c:18:
In function 'fortify_memset_chk',
inlined from 'ath9k_ps_wakeup' at drivers/net/wireless/ath/ath9k/main.c:140:3:
include/linux/fortify-string.h:462:25: error: call to '__write_overflow_field' declared with attribute warning: detected write beyond size of field (1st parameter); maybe use struct_group()? [-Werror=attribute-warning]
462 | __write_overflow_field(p_size_field, size);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Using a struct_group seems to reliably avoid the warning and
not make the code much uglier. The combined memset() should even
save a couple of cpu cycles.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://msgid.link/20240328135509.3755090-3-arnd@kernel.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
In the test, after map is closed and then after two rcu grace periods,
it is assumed that map_id is not available to user space.
But the above assumption cannot be guaranteed. After zero or one
or two rcu grace periods in different siturations, the actual
freeing-map-work is put into a workqueue. Later on, when the work
is dequeued, the map will be actually freed.
See bpf_map_put() in kernel/bpf/syscall.c.
By using workqueue, there is no ganrantee that map will be actually
freed after a couple of rcu grace periods. This patch removed
such map leak detection and then the test can pass consistently.
Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240322061353.632136-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
In some systems, the netcat server can incur in delay to start listening.
When this happens, the test can randomly fail in various points.
This is an example error message:
# ip gre none gso
# encap 192.168.1.1 to 192.168.1.2, type gre, mac none len 2000
# test basic connectivity
# Ncat: Connection refused.
The issue stems from a race condition between the netcat client and server.
The test author had addressed this problem by implementing a sleep, which
I have removed in this patch.
This patch introduces a function capable of sleeping for up to two seconds.
However, it can terminate the waiting period early if the port is reported
to be listening.
The ssb_device_uevent() function first attempts to convert the 'dev' pointer
to 'struct ssb_device *'. However, it mistakenly dereferences 'dev' before
performing the NULL check, potentially leading to a NULL pointer
dereference if 'dev' is NULL.
To fix this issue, move the NULL check before dereferencing the 'dev' pointer,
ensuring that the pointer is valid before attempting to use it.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Signed-off-by: Rand Deeb <rand.sec96@gmail.com> Signed-off-by: Kalle Valo <kvalo@kernel.org> Link: https://msgid.link/20240306123028.164155-1-rand.sec96@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Historically, the signed integer overflow sanitizer did not work in the
kernel due to its interaction with `-fwrapv` but this has since been
changed [1] in the newest version of Clang; It was re-enabled in the
kernel with Commit 557f8c582a9ba8ab ("ubsan: Reintroduce signed overflow
sanitizer").
Let's rework this overflow checking logic to not actually perform an
overflow during the check itself, thus avoiding the UBSAN splat.
When the rcutorture tests start to exit, the rcu_torture_cleanup() is
invoked to stop kthreads and release resources, if the stall-task
kthreads exist, cpu-stall has started and the rcutorture.stall_cpu
is set to a larger value, the rcu_torture_cleanup() will be blocked
for a long time and the hung-task may occur, this commit therefore
add kthread_should_stop() to the loop of cpu-stall operation, when
rcutorture tests ends, no need to wait for cpu-stall to end, exit
directly.
The "pipe_count > RCU_TORTURE_PIPE_LEN" check has a comment saying "Should
not happen, but...". This is only true when testing an RCU whose grace
periods are always long enough. This commit therefore fixes this comment.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Closes: https://lore.kernel.org/lkml/CAHk-=wi7rJ-eGq+xaxVfzFEgbL9tdf6Kc8Z89rCpfcQOKm74Tw@mail.gmail.com/ Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
1) The command type does something on the prep side that triggers an
audit call.
2) The thread hasn't done any operations before this that triggered
an audit call inside ->issue(), where we have audit_uring_entry()
and audit_uring_exit().
Work around this by issuing a blanket NOP operation before the SQPOLL
does anything.
Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
When the qm uninit command is executed, the err data needs to
be released to prevent memory leakage. The error information
release operation and uacce_remove are integrated in
qm_remove_uacce.
So add the qm_remove_uacce to qm uninit to avoid err memory
leakage.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
The AIV is one of the SEC resources. When releasing resources,
it need to release the AIV resources at the same time.
Otherwise, memory leakage occurs.
The aiv resource release is added to the sec resource release
function.
Signed-off-by: Chenghai Huang <huangchenghai2@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
As the old padata code can execute in softirq context, disable
softirqs for the new padata_do_mutithreaded code too as otherwise
lockdep will get antsy.
Reported-by: syzbot+0cb5bb0f4bf9e79db3b3@syzkaller.appspotmail.com Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Acked-by: Daniel Jordan <daniel.m.jordan@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Manuel Diewald <manuel.diewald@canonical.com> Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
For case there is no more inodes for IO in io list from last wb_writeback,
We may bail out early even there is inode in dirty list should be written
back. Only bail out when we queued once to avoid missing dirtied inode.
BugLink: https://bugs.launchpad.net/bugs/2073358
Commit 861e8086029e ("e1000e: move force SMBUS from enable ulp function
to avoid PHY loss issue") resolved a PHY access loss during suspend on
Meteor Lake consumer platforms, but it affected corporate systems
incorrectly.
A better fix, working for both consumer and corporate systems, was
proposed in commit bfd546a552e1 ("e1000e: move force SMBUS near the end
of enable_ulp function"). However, it introduced a regression on older
devices, such as [8086:15B8], [8086:15F9], [8086:15BE].
This patch aims to fix the secondary regression, by limiting the scope of
the changes to Meteor Lake platforms only.
Fixes: bfd546a552e1 ("e1000e: move force SMBUS near the end of enable_ulp function") Reported-by: Todd Brandt <todd.e.brandt@intel.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218940 Reported-by: Dieter Mummenschanz <dmummenschanz@web.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218936 Signed-off-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Mor Bar-Gabay <morx.bar.gabay@intel.com> (A Contingent Worker at Intel) Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20240709203123.2103296-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 76a0a3f9cc2fbd0e56671706bb74a9a988397898) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Hui Wang [Thu, 18 Jul 2024 08:57:16 +0000 (16:57 +0800)]
e1000e: move force SMBUS near the end of enable_ulp function
BugLink: https://bugs.launchpad.net/bugs/2073358
The commit 861e8086029e ("e1000e: move force SMBUS from enable ulp
function to avoid PHY loss issue") introduces a regression on
PCH_MTP_I219_LM18 (PCIID: 0x8086550A). Without the referred commit, the
ethernet works well after suspend and resume, but after applying the
commit, the ethernet couldn't work anymore after the resume and the
dmesg shows that the NIC link changes to 10Mbps (1000Mbps originally):
[ 43.305084] e1000e 0000:00:1f.6 enp0s31f6: NIC Link is Up 10 Mbps Full Duplex, Flow Control: Rx/Tx
Without the commit, the force SMBUS code will not be executed if
"return 0" or "goto out" is executed in the enable_ulp(), and in my
case, the "goto out" is executed since FWSM_FW_VALID is set. But after
applying the commit, the force SMBUS code will be ran unconditionally.
Here move the force SMBUS code back to enable_ulp() and put it
immediately ahead of hw->phy.ops.release(hw), this could allow the
longest settling time as possible for interface in this function and
doesn't change the original code logic.
The issue was found on a Lenovo laptop with the ethernet hw as below:
00:1f.6 Ethernet controller [0200]: Intel Corporation Device [8086:550a]
(rev 20).
And this patch is verified (cable plug and unplug, system suspend
and resume) on Lenovo laptops with ethernet hw: [8086:550a],
[8086:550b], [8086:15bb], [8086:15be], [8086:1a1f], [8086:1a1c] and
[8086:0dc7].
Fixes: 861e8086029e ("e1000e: move force SMBUS from enable ulp function to avoid PHY loss issue") Signed-off-by: Hui Wang <hui.wang@canonical.com> Acked-by: Vitaly Lifshits <vitaly.lifshits@intel.com> Tested-by: Naama Meir <naamax.meir@linux.intel.com> Reviewed-by: Simon Horman <horms@kernel.org> Reviewed-by: Paul Menzel <pmenzel@molgen.mpg.de> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com> Tested-by: Zhang Rui <rui.zhang@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://lore.kernel.org/r/20240528-net-2024-05-28-intel-net-fixes-v1-1-dc8593d2bbc6@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit bfd546a552e140b0a4c8a21527c39d6d21addb28) Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Acked-by: Thibault Ferrante <thibault.ferrante@canonical.com> Acked-by: Noah Wager <noah.wager@canonical.com> Acked-by: Andrei Gherzan <andrei.gherzan@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
When using an initializer for a union only one of the union members
must be initialized. The initializer for the acpi_object union variable
passed as argument to the SID ACPI method was initializing both
the type and the integer members of the union.
Unfortunately rather then complaining about this gcc simply ignores
the first initializer and only used the second integer.value = 1
initializer. Leaving type set to 0 which leads to the argument being
skipped by acpi acpi_ns_evaluate() resulting in:
Since IWL_FW_CMD_VER_UNKNOWN = 99, then my change to consider
cmd_ver >= 7 instead of cmd_ver = 7 included also firmwares that don't
advertise the command version at all. This made us send a command with a
bad size and because of that, the firmware hit a BAD_COMMAND immediately
after handling the REDUCE_TX_POWER_CMD command.
debug_pagealloc is a debug feature which clears the valid bit in page table
entry for freed pages to detect illegal accesses to freed memory.
For this feature to work, virtual mapping must have PAGE_SIZE resolution.
(No, we cannot map with huge pages and split them only when needed; because
pages can be allocated/freed in atomic context and page splitting cannot be
done in atomic context)
Force linear mapping to use small pages if debug_pagealloc is enabled.
Note that it is not necessary to force the entire linear mapping, but only
those that are given to memory allocator. Some parts of memory can keep
using huge page mapping (for example, kernel's executable code). But these
parts are minority, so keep it simple. This is just a debug feature, some
extra overhead should be acceptable.
[Why]
Commit:
- commit 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload allocation/removement")
accidently overwrite the commit
- commit 54d217406afe ("drm: use mgr->dev in drm_dbg_kms in drm_dp_add_payload_part2")
which cause regression.
[How]
Recover the original NULL fix and remove the unnecessary input parameter 'state' for
drm_dp_add_payload_part2().
Fixes: 5aa1dfcdf0a4 ("drm/mst: Refactor the flow for payload allocation/removement") Reported-by: Leon Weiß <leon.weiss@ruhr-uni-bochum.de> Link: https://lore.kernel.org/r/38c253ea42072cc825dc969ac4e6b9b600371cc8.camel@ruhr-uni-bochum.de/ Cc: lyude@redhat.com Cc: imre.deak@intel.com Cc: stable@vger.kernel.org Cc: regressions@lists.linux.dev Reviewed-by: Harry Wentland <harry.wentland@amd.com> Acked-by: Jani Nikula <jani.nikula@intel.com> Signed-off-by: Wayne Lin <Wayne.Lin@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240307062957.2323620-1-Wayne.Lin@amd.com
(cherry picked from commit 4545614c1d8da603e57b60dd66224d81b6ffc305) Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Temperature is stored as 16bit value in two's complement format. Current
implementation ignores the sign bit. Make it aware of the sign bit by
using sign_extend32.
If CONFIG_DRM_AUX_HPD_BRIDGE is not enabled, the aux-bridge.h header
provides a stub for the bridge's functions. Correct the arguments list
of one of those stubs to match the argument list of the non-stubbed
function.
Fixes: e5ca263508f7 ("drm/bridge: aux-hpd: separate allocation and registration") Reported-by: kernel test robot <lkp@intel.com> Cc: stable <stable@kernel.org> Closes: https://lore.kernel.org/oe-kbuild-all/202405110428.TMCfb1Ut-lkp@intel.com/ Cc: Johan Hovold <johan+linaro@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by: Johan Hovold <johan+linaro@kernel.org> Link: https://lore.kernel.org/r/20240511-fix-aux-hpd-stubs-v1-1-98dae71dfaec@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
dm-integrity could set discard_granularity lower than the logical block
size. This could result in failures when sending discard requests to
dm-integrity.
transaction id should be updated in ocfs2_unlink and ocfs2_link.
Otherwise, inode link will be wrong after journal replay even fsync was
called before power failure:
=======================================================================
$ touch testdir/bar
$ ln testdir/bar testdir/bar_link
$ fsync testdir/bar
$ stat -c %h $SCRATCH_MNT/testdir/bar
1
$ stat -c %h $SCRATCH_MNT/testdir/bar
1
=======================================================================
Link: https://lkml.kernel.org/r/20240408082041.20925-4-glass.su@suse.com Fixes: ccd979bdbce9 ("[PATCH] OCFS2: The Second Oracle Cluster Filesystem") Signed-off-by: Su Yue <glass.su@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Jiri reported that the current kexec_dprintk() always prints out debugging
message whenever kexec/kdmmp loading is triggered. That is not wanted.
The debugging message is supposed to be printed out when 'kexec -s -d' is
specified for kexec/kdump loading.
After investigating, the reason is the current kexec_dprintk() takes
printk(KERN_INFO) or printk(KERN_DEBUG) depending on whether '-d' is
specified. However, distros usually have defaulg log level like below:
[~]# cat /proc/sys/kernel/printk
7 4 1 7
So, even though '-d' is not specified, printk(KERN_DEBUG) also always
prints out. I thought printk(KERN_DEBUG) is equal to pr_debug(), it's
not.
Fix it by changing to use pr_info() instead which are expected to work.
Link: https://lkml.kernel.org/r/20240409042238.1240462-1-bhe@redhat.com Fixes: cbc2fe9d9cb2 ("kexec_file: add kexec_file flag to control debug printing") Signed-off-by: Baoquan He <bhe@redhat.com> Reported-by: Jiri Slaby <jirislaby@kernel.org> Closes: https://lore.kernel.org/all/4c775fca-5def-4a2d-8437-7130b02722a2@kernel.org Reviewed-by: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that AMD Radeon S3 SSD drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support both
HIPM and DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Doru Iorgulescu <doru.iorgulescu1@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that Crucial CT240BX500SSD1 drives do not handle low power
modes correctly. The problem was most likely not seen before because no
one had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Aarrayy <lp610mh@gmail.com> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218832 Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Commit 7627a0edef54 ("ata: ahci: Drop low power policy board type")
dropped the board_ahci_low_power board type, and instead enables LPM if:
-The AHCI controller reports that it supports LPM (Partial/Slumber), and
-CONFIG_SATA_MOBILE_LPM_POLICY != 0, and
-The port is not defined as external in the per port PxCMD register, and
-The port is not defined as hotplug capable in the per port PxCMD
register.
Partial and Slumber LPM states can either be initiated by HIPM or DIPM.
For HIPM (host initiated power management) to get enabled, both the AHCI
controller and the drive have to report that they support HIPM.
For DIPM (device initiated power management) to get enabled, only the
drive has to report that it supports DIPM. However, the HBA will reject
device requests to enter LPM states which the HBA does not support.
The problem is that Apacer AS340 drives do not handle low power modes
correctly. The problem was most likely not seen before because no one
had used this drive with a AHCI controller with LPM enabled.
Add a quirk so that we do not enable LPM for this drive, since we see
command timeouts if we do (even though the drive claims to support DIPM).
Fixes: 7627a0edef54 ("ata: ahci: Drop low power policy board type") Cc: stable@vger.kernel.org Reported-by: Tim Teichmann <teichmanntim@outlook.de> Closes: https://lore.kernel.org/linux-ide/87bk4pbve8.ffs@tglx/ Reviewed-by: Mika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Signed-off-by: Niklas Cassel <cassel@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Commit b8b8b4e0c052 ("ata: ahci: Add Intel Alder Lake-P AHCI controller
to low power chipsets list") added Intel Alder Lake to the ahci_pci_tbl.
Because of the way that the Intel PCS quirk was implemented, having
an explicit entry in the ahci_pci_tbl caused the Intel PCS quirk to
be applied. (The quirk was not being applied if there was no explict
entry.)
Thus, entries that were added to the ahci_pci_tbl also got the Intel
PCS quirk applied.
The quirk was cleaned up in commit 7edbb6059274 ("ahci: clean up
intel_pcs_quirk"), such that it is clear which entries that actually
applies the Intel PCS quirk.
Newer Intel AHCI controllers do not need the Intel PCS quirk,
and applying it when not needed actually breaks some platforms.
Do not apply the Intel PCS quirk for Intel Alder Lake.
This is in line with how things worked before commit b8b8b4e0c052 ("ata:
ahci: Add Intel Alder Lake-P AHCI controller to low power chipsets list"),
such that certain platforms using Intel Alder Lake will work once again.
For ${atomic}_sub_and_test() the @i parameter is the value to subtract,
not add. Fix the typo in the kerneldoc template and generate the headers
with this update.
Fixes: ad8110706f38 ("locking/atomic: scripts: generate kerneldoc comments") Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Carlos Llamas <cmllamas@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Kees Cook <keescook@chromium.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20240515133844.3502360-1-cmllamas@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[portia: updated sha1sum on atomic headers] Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Since commit 0a44dfc07074 ("wifi: mac80211: simplify non-chanctx
drivers") ieee80211_hw_config() is no longer called with changed = ~0.
rtlwifi relied on ~0 in order to ignore the default retry limits of
4/7, preferring 48/48 in station mode and 7/7 in AP/IBSS.
RTL8192DU has a lot of packet loss with the default limits from
mac80211. Fix it by ignoring IEEE80211_CONF_CHANGE_RETRY_LIMITS
completely, because it's the simplest solution.
the commit referenced below leads to this NUMA configuration no longer
being refused by a CONFIG_NUMA=y kernel (previously
NUMA: nodes only cover 6144MB of your 8185MB e820 RAM. Not used.
No NUMA configuration found
Faking a node at [mem 0x0000000000000000-0x000000027fffffff]
was seen in the log directly after the message quoted above), because of
memblock_validate_numa_coverage() checking for NUMA_NO_NODE (only). This
in turn led to memblock_alloc_range_nid()'s warning about MAX_NUMNODES
triggering, followed by a NULL deref in memmap_init() when trying to
access node 64's (NODE_SHIFT=6) node data.
To compensate said change, make memblock_set_node() warn on and adjust
a passed in value of MAX_NUMNODES, just like various other functions
already do.
Fixes: ff6c3d81f2e8 ("NUMA: optimize detection of memory with no node id assigned by firmware") Signed-off-by: Jan Beulich <jbeulich@suse.com> Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/1c8a058c-5365-4f27-a9f1-3aeb7fb3e7b2@suse.com Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
It is reported that commit 950210887670 ("thermal: core: Drop
trips_disabled bitmask") causes the maximum frequency of CPUs to drop
further down with every system sleep-wake cycle on Intel Core i7-4710HQ.
This turns out to be due to a trip point whose temperature is equal to 0
degrees Celsius which is acted on every time the system wakes from sleep.
Before commit 950210887670 this trip point would be disabled wia the
trips_disabled bitmask, but now it is treated as a valid one.
Since ACPI thermal control is generally about protection against
overheating, trip points with temperature of 0 centigrade or below are
not particularly useful there, so initialize them all as invalid which
fixes the problem at hand.
Fixes: 950210887670 ("thermal: core: Drop trips_disabled bitmask") Closes: https://lore.kernel.org/linux-pm/3f71747b-f852-4ee0-b384-cf46b2aefa3f@gmx.com Reported-by: Tibor Billes <tbilles@gmx.com> Tested-by: Tibor Billes <tbilles@gmx.com> Cc: 6.7+ <stable@vger.kernel.org> # 6.7+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
When devm_regmap_init_i2c() fails, regmap_ee could be error pointer,
instead of checking for IS_ERR(regmap_ee), regmap is checked which looks
like a copy paste error.
In case of error in the bmi323_trigger_handler() function, the
function exits without calling the iio_trigger_notify_done()
which is responsible for informing the attached trigger that
the process is done and in case there is a .reenable(), to
call it.
Firmware interface 1.10.2.118 has increased the size of
HWRM_PORT_PHY_QCFG response beyond the maximum size that can be
forwarded. When the VF's link state is not the default auto state,
the PF will need to forward the response back to the VF to indicate
the forced state. This regression may cause the VF to fail to
initialize.
Fix it by capping the HWRM_PORT_PHY_QCFG response to the maximum
96 bytes. The SPEEDS2_SUPPORTED flag needs to be cleared because the
new speeds2 fields are beyond the legacy structure. Also modify
bnxt_hwrm_fwd_resp() to print a warning if the message size exceeds 96
bytes to make this failure more obvious.
Tests show that user fence signalling requires kind of write barrier,
otherwise not all writes performed by the workload will be available
to userspace. It is already done for render and compute, we need it
also for the rest: video, gsc, copy.
Fixes: dd08ebf6c352 ("drm/xe: Introduce a new DRM driver for Intel GPUs") Signed-off-by: Andrzej Hajda <andrzej.hajda@intel.com> Reviewed-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20240605-fix_user_fence_posted-v3-2-06e7932f784a@intel.com
(cherry picked from commit 3ad7d18c5dad75ed38098c7cc3bc9594b4701399) Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
If the card doesn't have display hardware, hpd_work and hpd_lock are
left uninitialized which causes BUG when attempting to schedule hpd_work
on runtime PM resume.
Fix it by adding headless flag to DRM and skip any hpd if it's set.
Due to timer wheel implementation, a timer will usually fire
after its schedule.
For instance, for HZ=1000, a timeout between 512ms and 4s
has a granularity of 64ms.
For this range of values, the extra delay could be up to 63ms.
For TCP, this means that tp->rcv_tstamp may be after
inet_csk(sk)->icsk_timeout whenever the timer interrupt
finally triggers, if one packet came during the extra delay.
We need to make sure tcp_rtx_probe0_timed_out() handles this case.
Fixes: e89688e3e978 ("net: tcp: fix unexcepted socket die when snd_wnd is 0") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Menglong Dong <imagedong@tencent.com> Acked-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://lore.kernel.org/r/20240607125652.1472540-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
When the noop_qdisc owner isn't initialized, then it will be 0,
so packets will erroneously be regarded as having been subject
to recursion as long as only CPU 0 queues them. For non-SMP,
that's all packets, of course. This causes a change in what's
reported to userspace, normally noop_qdisc would drop packets
silently, but with this change the syscall returns -ENOBUFS if
RECVERR is also set on the socket.
Fix this by initializing the owner field to -1, just like it
would be for dynamically allocated qdiscs by qdisc_alloc().
Fixes: 0f022d32c3ec ("net/sched: Fix mirred deadlock on device recursion") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/r/20240607175340.786bfb938803.I493bf8422e36be4454c08880a8d3703cea8e421a@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
It is reported that commit 31a0fa0019b0 ("thermal/debugfs: Pass cooling
device state to thermal_debug_cdev_add()") causes the ACPI fan driver
to fail probing on some systems which turns out to be due to the _FST
control method returning an invalid value until _FSL is first evaluated
for the given fan. If this happens, the .get_cur_state() cooling device
callback returns an error and __thermal_cooling_device_register() fails
as uses that callback after commit 31a0fa0019b0.
Arguably, _FST should not return an invalid value even if it is
evaluated before _FSL, so this may be regarded as a platform firmware
issue, but at the same time it is not a good enough reason for failing
the cooling device registration where the initial cooling device state
is only needed to initialize a thermal debug facility.
Accordingly, modify __thermal_cooling_device_register() to avoid
calling thermal_debug_cdev_add() instead of returning an error if the
initial .get_cur_state() callback invocation fails.
Fixes: 31a0fa0019b0 ("thermal/debugfs: Pass cooling device state to thermal_debug_cdev_add()") Closes: https://lore.kernel.org/linux-acpi/20240530153727.843378-1-laura.nao@collabora.com Reported-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org> Tested-by: Laura Nao <laura.nao@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
The pcmtest driver tests use the kselftest harness which requires that
_GNU_SOURCE is defined but nothing causes it to be defined. Since the
KHDR_INCLUDES Makefile variable has had the required define added let's
use that, this should provide some futureproofing.
Fixes: daef47b89efd ("selftests: Compile kselftest headers with -D_GNU_SOURCE") Signed-off-by: Mark Brown <broonie@kernel.org> Reviewed-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>
Since commit 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and
CRASH_DUMP dependency"), crashing_cpu is not available without
CONFIG_CRASH_DUMP. Fix compile error on 64-BIT 85xx owing to this
change.
Fixes: 5c4233cc0920 ("powerpc/kdump: Split KEXEC_CORE and CRASH_DUMP dependency") Cc: stable@vger.kernel.org # v6.9+ Reported-by: Christian Zigotzky <chzigotzky@xenosoft.de> Closes: https://lore.kernel.org/all/fa247ae4-5825-4dbe-a737-d93b7ab4d4b9@xenosoft.de/ Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240510080757.560159-1-hbathini@linux.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Portia Stephens <portia.stephens@canonical.com> Signed-off-by: Roxana Nicolescu <roxana.nicolescu@canonical.com>