Compiling the nvlink code relies on SPAPR_TCE_IOMMU, otherwise there
are compile errors:
drivers/vfio/pci/vfio_pci_nvlink2.c:101:10: error: implicit declaration of function 'mm_iommu_put' [-Werror,-Wimplicit-function-declaration]
ret = mm_iommu_put(data->mm, data->mem);
This is because PPC only defines these functions when that config option
is set.
Previously this wasn't a problem by chance as SPAPR_TCE_IOMMU was the only
IOMMU that could have satisfied IOMMU_API on POWERNV.
Fixes: 179209fa1270 ("vfio: IOMMU_API should be selected") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Message-Id: <0-v1-83dba9768fc3+419-vfio_nvlink2_kconfig_jgg@nvidia.com> Signed-off-by: Alex Williamson <alex.williamson@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The SOR resets are exclusively shared with the SOR power domain. This
means that exclusive access can only be granted temporarily and in order
for that to work, a rigorous sequence must be observed. To ensure that a
single consumer gets exclusive access to a reset, each consumer must
implement a rigorous protocol using the reset_control_acquire() and
reset_control_release() functions.
However, these functions alone don't provide any guarantees at the
system level. Drivers need to ensure that only a single consumer has
access to the reset at the same time. In order for the SOR to be able to
exclusively access its reset, it must therefore ensure that the SOR
power domain is not powered off by holding on to a runtime PM reference
to that power domain across the reset assert/deassert operation.
This used to work by accident, but the problem was revealed when more
devices recently started to rely on the SOR power domain.
Fixes: 11c632e1cfd3 ("drm/tegra: sor: Implement acquire/release for reset") Reported-by: Jonathan Hunter <jonathanh@nvidia.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Coupling of display controllers used to rely on runtime PM to take the
companion controller out of reset. Commit fd67e9c6ed5a ("drm/tegra: Do
not implement runtime PM") accidentally broke this when runtime PM was
removed.
Restore this functionality by reusing the hierarchical host1x client
suspend/resume infrastructure that's similar to runtime PM and which
perfectly fits this use-case.
Fixes: fd67e9c6ed5a ("drm/tegra: Do not implement runtime PM") Reported-by: Dmitry Osipenko <digetx@gmail.com> Reported-by: Paul Fertser <fercerpav@gmail.com> Tested-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
syzbot is reporting a NULL pointer dereference at reiserfs_security_init()
[1]. Commit ab17c4f02156c4f7 ("reiserfs: fixup xattr_root caching")
assumes that REISERFS_SB(s)->xattr_root != NULL in
reiserfs_xattr_jcreate_nblocks(), despite that commit having made the
REISERFS_SB(sb)->priv_root != NULL && REISERFS_SB(s)->xattr_root == NULL
case possible.
I guess that commit 6cb4aff0a77cc0e6 ("reiserfs: fix oops while creating
privroot with selinux enabled") wanted to check xattr_root != NULL
before reiserfs_xattr_jcreate_nblocks(), since its changelog talks
about the xattr root.
The issue is that while creating the privroot during mount
reiserfs_security_init calls reiserfs_xattr_jcreate_nblocks which
dereferences the xattr root. The xattr root doesn't exist, so we get
an oops.
Therefore, update reiserfs_xattrs_initialized() to check both the
privroot and the xattr root.
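A minimal sketch of such a combined check, assuming the helper lives next
to the other xattr helpers (illustrative, not the exact diff):

	static inline bool reiserfs_xattrs_initialized(struct super_block *sb)
	{
		/* Both roots must exist before xattr block counts are valid. */
		return REISERFS_SB(sb)->priv_root && REISERFS_SB(sb)->xattr_root;
	}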
Link: https://syzkaller.appspot.com/bug?id=8abaedbdeb32c861dc5340544284167dd0e46cde Reported-and-tested-by: syzbot <syzbot+690cb1e51970435f9775@syzkaller.appspotmail.com> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Fixes: 6cb4aff0a77c ("reiserfs: fix oops while creating privroot with selinux enabled") Acked-by: Jeff Mahoney <jeffm@suse.com> Acked-by: Jan Kara <jack@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The page table of AMDGPU requires alignment to the CPU page size, so we
should check the ioctl parameters for it. Return -EINVAL if some
parameter is unaligned to the CPU page size, instead of corrupting the
page table silently.
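A hedged sketch of such a parameter check (the field names are
illustrative, not the exact upstream diff):

	/* Reject requests that are not aligned to the CPU page size. */
	if (args->va_address & ~PAGE_MASK ||
	    args->offset_in_bo & ~PAGE_MASK ||
	    args->map_size & ~PAGE_MASK)
		return -EINVAL;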
Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In Mesa, dev_info.gart_page_size is used for alignment and it was
set to AMDGPU_GPU_PAGE_SIZE (4KB). However, the page table of the AMDGPU
driver requires alignment on CPU pages. So, for non-4KB-page systems,
gart_page_size should be max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE).
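The change amounts to a one-liner in the info-query path (sketch; the
surrounding structure name is illustrative):

	/* Advertise an alignment that satisfies both CPU and GPU paging. */
	dev_info.gart_page_size = max_t(u32, PAGE_SIZE, AMDGPU_GPU_PAGE_SIZE);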
Signed-off-by: Rui Wang <wangr@lemote.com> Signed-off-by: Huacai Chen <chenhc@lemote.com> Link: https://github.com/loongson-community/linux-stable/commit/caa9c0a1
[Xi: rebased for drm-next, use max_t for checkpatch,
and reworded commit message.] Signed-off-by: Xi Ruoyao <xry111@mengyan1223.wang> BugLink: https://gitlab.freedesktop.org/drm/amd/-/issues/1549 Tested-by: Dan Horák <dan@danny.cz> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The offset calculation wasn't correct, as start addresses are in pfn,
not in bytes.
CC: stable@vger.kernel.org Signed-off-by: Nirmoy Das <nirmoy.das@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Do the same thing we do for Renoir. We can check, but since the sbios
has already started DPM, the check will always return true, which
causes the driver to skip some of the SMU init when it shouldn't.
Reviewed-by: Zhan Liu <zhan.liu@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The amdgpu driver uses a 4-byte data type for the DQM fence memory
and transmits the GPU address of the fence memory to the microcode
through the query status PM4 message. However, the query status
PM4 message definition and the microcode processing both work on
8 bytes. The fence memory only allocates 4 bytes, but the microcode
writes 8 bytes, so there is memory corruption.
Changes since v1:
* Change dqm->fence_addr to a u64 pointer to fix this issue;
also fix up the query_status and amdkfd_fence_wait_timeout functions
to use a 64-bit fence value to keep them consistent.
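A sketch of the type change (illustrative, not the full diff):

	/* Before: 4-byte storage, overrun by the microcode's 8-byte write. */
	unsigned int *fence_addr;

	/* After: 8-byte storage matching the query status PM4 semantics. */
	uint64_t *fence_addr;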
Signed-off-by: Qu Huang <jinsdb@126.com> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Felix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
There are code paths that rely on zero_pfn to be fully initialized
before core_initcall. For example, wq_sysfs_init() is a core_initcall
function that eventually results in a call to kernel_execve, which
causes a page fault with a subsequent mmput. If zero_pfn is not
initialized by then it may not get cleaned up properly and result in an
error:
BUG: Bad rss-counter state mm:(ptrval) type:MM_ANONPAGES val:1
Here is an analysis of the race as seen on a MIPS device. On this
particular MT7621 device (Ubiquiti ER-X), zero_pfn is PFN 0 until
initialized, at which point it becomes PFN 5120:
1. wq_sysfs_init calls into kobject_uevent_env at core_initcall:
kobject_uevent_env+0x7e4/0x7ec
kset_register+0x68/0x88
bus_register+0xdc/0x34c
subsys_virtual_register+0x34/0x78
wq_sysfs_init+0x1c/0x4c
do_one_initcall+0x50/0x1a8
kernel_init_freeable+0x230/0x2c8
kernel_init+0x10/0x100
ret_from_kernel_thread+0x14/0x1c
2. kobject_uevent_env() calls call_usermodehelper_exec() which executes
kernel_execve asynchronously.
3. Memory allocations in kernel_execve cause a page fault, bumping the
MM reference counter:
add_mm_counter_fast+0xb4/0xc0
handle_mm_fault+0x6e4/0xea0
__get_user_pages.part.78+0x190/0x37c
__get_user_pages_remote+0x128/0x360
get_arg_page+0x34/0xa0
copy_string_kernel+0x194/0x2a4
kernel_execve+0x11c/0x298
call_usermodehelper_exec_async+0x114/0x194
4. In case zero_pfn has not been initialized yet, zap_pte_range does
not decrement the MM_ANONPAGES RSS counter and the BUG message is
triggered shortly afterwards when __mmdrop checks the ref counters:
__mmdrop+0x98/0x1d0
free_bprm+0x44/0x118
kernel_execve+0x160/0x1d8
call_usermodehelper_exec_async+0x114/0x194
ret_from_kernel_thread+0x14/0x1c
To avoid races such as described above, initialize init_zero_pfn at
early_initcall level. Depending on the architecture, ZERO_PAGE is
either constant or gets initialized even earlier, at paging_init, so
there is no issue with initializing zero_pfn earlier.
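The fix itself is small; a sketch matching the shape of the init code in
mm/memory.c:

	static int __init init_zero_pfn(void)
	{
		zero_pfn = page_to_pfn(ZERO_PAGE(0));
		return 0;
	}
	early_initcall(init_zero_pfn);	/* was: core_initcall(init_zero_pfn) */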
The s390 specific vdso function __arch_get_hw_counter() is supposed to
consider tod clock steering.
If a tod clock steering event happens and the tod clock is set to a
new value __arch_get_hw_counter() will not return the real tod clock
value but slowly drift it from the old delta until the returned value
finally matches the real tod clock value again.
Unfortunately the type of tod_steering_delta is unsigned while it is
supposed to be signed. Whether the vdso code drifts the clock value in
the positive or negative direction depends on whether tod_steering_delta
is negative or positive.
In the worst case, instead of drifting the clock slowly, it will
jump in the opposite direction by a factor of two.
Fix this by simply making tod_steering_delta signed.
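A sketch of the one-line type fix in the vdso data:

	/* Signed, so the vdso drifts the clock in the correct direction. */
	s64 tod_steering_delta;		/* was: u64 tod_steering_delta; */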
Commit cbc3b92ce037 fixed an issue by modifying the macros of the stack
trace event so that user space could parse it properly. Originally the
stack trace format exposed to user space showed that the call stack was a
dynamic array. But it is not actually a dynamic array in the way that
other dynamic event arrays work, and this broke user space parsing of it.
The update was to make the array look like it has 8 entries in it. Helper
functions were added to parse it correctly, as the stack was dynamic, but
its size was determined by the size of the event stored.
Although this fixed how user space read the event, it changed the
internal structure used for the stack trace event. It changed the array
size from [0] to [8] (adding 8 entries). This increased the size of the
stack trace event by 8 words. The size reserved on the ring buffer was the
size of the stack trace event plus the number of stack entries found in
the stack trace. That commit caused the amount reserved to be 8 words
more than needed, because it did not expect the caller field to have any
size. This produced 8 entries of garbage (random data read back) in the
stack trace event.
Instead, subtract the size of the caller field from the size of the event
to make sure that only the amount needed to store the stack trace is
reserved.
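A sketch of the corrected reservation (names follow kernel/trace/trace.c;
treat this as illustrative):

	size = nr_entries * sizeof(unsigned long);
	event = __trace_buffer_lock_reserve(buffer, TRACE_STACK,
					    /* subtract the 8-entry caller[]
					     * placeholder so it isn't
					     * counted twice */
					    (sizeof(*entry) - sizeof(entry->caller)) + size,
					    trace_ctx);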
rpm_active indicates how many times the supplier usage_count has been
incremented. Consequently it must be updated after pm_runtime_get_sync() of
the supplier, not before.
Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
pm_runtime_put_suppliers() must not decrement rpm_active unless the
consumer is suspended. That is because, otherwise, it could suspend
suppliers for an active consumer.
That can happen as follows:
static int driver_probe_device(struct device_driver *drv, struct device *dev)
{
int ret = 0;
pm_runtime_get_suppliers(dev);
if (dev->parent)
pm_runtime_get_sync(dev->parent);
At this point, dev can runtime suspend so rpm_put_suppliers() can run,
rpm_active becomes 1 (the lowest value).
pm_runtime_barrier(dev);
if (initcall_debug)
ret = really_probe_debug(dev, drv);
else
ret = really_probe(dev, drv);
The probe callback can have runtime resumed dev and then done a runtime
put, so dev is awaiting autosuspend, but rpm_active is 2.
pm_request_idle(dev);
if (dev->parent)
pm_runtime_put(dev->parent);
pm_runtime_put_suppliers(dev);
Now pm_runtime_put_suppliers() will put the supplier,
i.e. rpm_active 2 -> 1, but the consumer can still be active.
return ret;
}
Fix by checking the runtime status. For any status other than
RPM_SUSPENDED, rpm_active can be considered to be "owned" by
rpm_[get/put]_suppliers() and pm_runtime_put_suppliers() need do nothing.
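A hedged sketch of the resulting logic in pm_runtime_put_suppliers()
(simplified; locking and RCU details omitted):

	list_for_each_entry_rcu(link, &dev->links.suppliers, c_node) {
		if (!link->supplier_preactivated)
			continue;
		link->supplier_preactivated = false;
		/* Only drop rpm_active if the consumer is really suspended. */
		if (pm_runtime_status_suspended(dev) &&
		    refcount_dec_not_one(&link->rpm_active))
			pm_runtime_put(link->supplier);
	}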
Reported-by: Asutosh Das <asutoshd@codeaurora.org> Fixes: 4c06c4e6cf63 ("driver core: Fix possible supplier PM-usage counter imbalance") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Fixing nested_vmcb_check_save to avoid all TOC/TOU races
is a bit harder in released kernels, so do the bare minimum
and prevent EFER.SVME from being cleared. Clearing it is
problematic because svm_set_efer frees the data structures
for nested virtualization if EFER.SVME is cleared.
Also check that EFER.SVME remains set after a nested vmexit;
clearing it could happen if the bit is zero in the save area
that is passed to KVM_SET_NESTED_STATE (the save area of the
nested state corresponds to the nested hypervisor's state
and is restored on the next nested vmexit).
Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Avoid races between check and use of the nested VMCB controls. This
for example ensures that the VMRUN intercept is always reflected to the
nested hypervisor, instead of being processed by the host. Without this
patch, it is possible to end up with svm->nested.hsave pointing to
the MSR permission bitmap for nested guests.
This bug is CVE-2021-29657.
Reported-by: Felix Wilhelm <fwilhelm@google.com> Cc: stable@vger.kernel.org Fixes: 2fcf4876ada ("KVM: nSVM: implement on demand allocation of the nested state") Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
coprocessor_flush is not a part of the fast exception handlers, but it
uses parts of the fast coprocessor handling code, which is why it's in
the same source file. It uses the call0 opcode to invoke those parts, so
there are no
limitations on their relative location, but the rest of the code calls
coprocessor_flush with call8 and that doesn't work when vectors are
placed in a different gigabyte-aligned area than the rest of the kernel.
Move coprocessor_flush from the .exception.text section to the .text so
that it's reachable from the rest of the kernel with call8.
Cc: stable@vger.kernel.org Signed-off-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If a uaccess (e.g. get_user()) triggers a fault and there's a
fault signal pending, the handler will return to the uaccess without
having performed a uaccess fault fixup, and so the CPU will immediately
execute the uaccess instruction again, whereupon it will livelock
bouncing between that instruction and the fault handler.
We found that alc_update_headset_mode() is not called on some machines
when unplugging the headset. As a result, the
ALC_HEADSET_MODE_UNPLUGGED mode can't be set and current_headset_type
is not cleared; if the user plugs in a different type of headset next
time, determine_headset_type() will not be called and the audio jack is
set to the headset type from the previous time.
On the Dell machines which connect the dmic to the PCH, if we open
gnome-sound-settings and unplug the headset, this issue happens. Those
machines disable auto-mute via ucm and have no internal mic in the
input source, so update_headset_mode() is not called by cap_sync_hook
or automute_hook when unplugging; and because gnome-sound-settings is
open, the codec does not enter the runtime_suspend state, so
update_headset_mode() is not called by alc_resume when unplugging
either. In this case the hp_automute_hook is called when unplugging,
so add an update_headset_mode() call to that function.
Cc: <stable@vger.kernel.org> Signed-off-by: Hui Wang <hui.wang@canonical.com> Link: https://lore.kernel.org/r/20210320091542.6748-2-hui.wang@canonical.com Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The recently added PM prepare and complete callbacks don't have the
sanity check whether the card instance has been properly initialized,
which may potentially lead to Oops.
This patch adds the azx_is_pm_ready() call in each place
appropriately like other PM callbacks.
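A simplified sketch of the added guard in the prepare callback (the
complete callback gets the same check):

	static int azx_prepare(struct device *dev)
	{
		struct snd_card *card = dev_get_drvdata(dev);
		struct azx *chip;

		/* Bail out if the card instance was never fully set up. */
		if (!azx_is_pm_ready(card))
			return 0;

		chip = card->private_data;
		chip->pm_prepared = 1;
		return 0;
	}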
Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210329113059.25035-2-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The card power state change via snd_power_change_state() at system
suspend/resume seems to have been dropped mistakenly during the PM code
rewrite.
The card power state doesn't play much role nowadays but it's still
referred in a few places such as the HDMI codec driver.
This patch restores them, but in a more appropriate place now in the
prepare and complete callbacks.
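In sketch form, the restored calls sit in the PM callbacks like this
(simplified):

	/* in the prepare callback, before suspending: */
	snd_power_change_state(card, SNDRV_CTL_POWER_D3hot);

	/* in the complete callback, after resuming: */
	snd_power_change_state(card, SNDRV_CTL_POWER_D0);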
Fixes: f5dac54d9d93 ("ALSA: hda: Separate runtime and system suspend") Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20210329113059.25035-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Logitech ConferenceCam Connect is a compound USB device with UVC and
UAC. Not 100% reproducible, but sometimes it keeps responding with STALL
to every control transfer once it receives a get_freq request.
This patch adds 046d:0x084c to the snd_usb_get_sample_rate_quirk list.
Commit 71da201f38df ("ACPI: scan: Defer enumeration of devices with
_DEP lists") dropped the following 2 lines from acpi_init_device_object():
/* Assume there are unmet deps until acpi_device_dep_initialize() runs */
device->dep_unmet = 1;
This leaves the initial value of dep_unmet at the 0 from the kzalloc(),
causing the acpi_bus_get_status() call in acpi_add_single_object() to
actually call _STA even though there may be unmet deps, leading to errors
like these:
[ 0.123579] ACPI Error: No handler for Region [ECRM] (00000000ba9edc4c)
[GenericSerialBus] (20170831/evregion-166)
[ 0.123601] ACPI Error: Region GenericSerialBus (ID=9) has no handler
(20170831/exfldio-299)
[ 0.123618] ACPI Error: Method parse/execution failed
\_SB.I2C1.BAT1._STA, AE_NOT_EXIST (20170831/psparse-550)
Fix this by re-adding the dep_unmet = 1 initialization to
acpi_init_device_object() and modifying acpi_bus_check_add() to make sure
that dep_unmet always gets set up there, overriding the initial 1 value.
This re-fixes the issue initially fixed by
commit 63347db0affa ("ACPI / scan: Use acpi_bus_get_status() to initialize
ACPI_TYPE_DEVICE devs"), which introduced the removed
"device->dep_unmet = 1;" statement.
This issue was noticed, and the fix tested, on a Dell Venue 10 Pro 5055.
Fixes: 71da201f38df ("ACPI: scan: Defer enumeration of devices with _DEP lists") Suggested-by: Rafael J. Wysocki <rafael@kernel.org> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Cc: 5.11+ <stable@vger.kernel.org> # 5.11+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Commit 496121c02127 ("ACPI: processor: idle: Allow probing on platforms
with one ACPI C-state") broke CPU0 hotplug on certain systems, e.g.
I'm observing the following on AWS Nitro (e.g r5b.xlarge but other
instance types are affected as well):
In fact, the above mentioned commit only revealed the problem and did
not introduce it. On x86, an NMI is used to wake up a CPU, and the
hlt_play_dead()/mwait_play_dead() loops are prepared to handle it:
/*
* If NMI wants to wake up CPU0, start CPU0.
*/
if (wakeup_cpu0())
start_cpu0();
cpuidle_play_dead() -> acpi_idle_play_dead() (which is now being called on
systems where it wasn't called before the above mentioned commit) serves
the same purpose but it doesn't have a path for CPU0. What happens now on
wakeup is:
- NMI is sent to CPU0
- wakeup_cpu0_nmi() works as expected
- we get back to while (1) loop in acpi_idle_play_dead()
- safe_halt() puts CPU0 to sleep again.
The straightforward/minimal fix is to add the special handling for CPU0
on x86, and that is what this patch does.
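A sketch of the special case added to the idle loop in
acpi_idle_play_dead(), mirroring the hlt_play_dead()/mwait_play_dead()
handling (simplified):

	while (1) {
		if (cx->entry_method == ACPI_CSTATE_HALT)
			safe_halt();
		else if (cx->entry_method == ACPI_CSTATE_SYSTEMIO)
			inb(cx->address);
		else
			return -ENODEV;

	#if defined(CONFIG_X86) && defined(CONFIG_HOTPLUG_CPU)
		/* If NMI wants to wake up CPU0, start CPU0. */
		if (wakeup_cpu0())
			start_cpu0();
	#endif
	}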
Fixes: 496121c02127 ("ACPI: processor: idle: Allow probing on platforms with one ACPI C-state") Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The following problem has been reported by George Kennedy:
Since commit 7fef431be9c9 ("mm/page_alloc: place pages to tail
in __free_pages_core()") the following use after free occurs
intermittently when ACPI tables are accessed.
BUG: KASAN: use-after-free in ibft_init+0x134/0xc49
Read of size 4 at addr ffff8880be453004 by task swapper/0/1
CPU: 3 PID: 1 Comm: swapper/0 Not tainted 5.12.0-rc1-7a7fd0d #1
Call Trace:
dump_stack+0xf6/0x158
print_address_description.constprop.9+0x41/0x60
kasan_report.cold.14+0x7b/0xd4
__asan_report_load_n_noabort+0xf/0x20
ibft_init+0x134/0xc49
do_one_initcall+0xc4/0x3e0
kernel_init_freeable+0x5af/0x66b
kernel_init+0x16/0x1d0
ret_from_fork+0x22/0x30
ACPI tables mapped via kmap() do not have their mapped pages
reserved and the pages can be "stolen" by the buddy allocator.
Apparently, on the affected system, the ACPI table in question is
not located in "reserved" memory, like ACPI NVS or ACPI Data, that
will not be used by the buddy allocator, so the memory occupied by
that table has to be explicitly reserved to prevent the buddy
allocator from using it.
In order to address this problem, rearrange the initialization of the
ACPI tables on x86 to locate the initial tables earlier and reserve
the memory occupied by them.
The other architectures using ACPI should not be affected by this
change.
Link: https://lore.kernel.org/linux-acpi/1614802160-29362-1-git-send-email-george.kennedy@oracle.com/ Reported-by: George Kennedy <george.kennedy@oracle.com> Tested-by: George Kennedy <george.kennedy@oracle.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Cc: 5.10+ <stable@vger.kernel.org> # 5.10+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Multiple BPF helpers that can manipulate/increase the size of the SKB use
__bpf_skb_max_len() as the max length. This function limits the size
against the current net_device MTU (skb->dev->mtu).
When a BPF prog grows the packet size, it should not be limited to the
MTU. The MTU is a transmit limitation, and software receiving this packet
should be allowed to increase the size. Furthermore, the current MTU
check in __bpf_skb_max_len uses the MTU from the ingress/current
net_device, which in the case of redirects is the wrong net_device.
This patch keeps a sanity max limit of SKB_MAX_ALLOC (16KiB). The real limit
is elsewhere in the system. Jesper's testing[1] showed it was not possible
to exceed 8KiB when expanding the SKB size via BPF-helper. The limiting
factor is the define KMALLOC_MAX_CACHE_SIZE, which is 8192 for the
SLUB allocator (CONFIG_SLUB) in case PAGE_SIZE is 4096. This define is
in effect because this is called from softirq context; see
__gfp_pfmemalloc_flags() and __do_kmalloc_node(). Jakub's testing showed
that frames above 16KiB can cause NICs to reset (but not crash). Keep the
sanity limit at this level, as the memory layer can differ based on
kernel config.
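A sketch of the change (the old helper shown for contrast):

	/* Before: capped at the (possibly wrong) device MTU. */
	static u32 __bpf_skb_max_len(struct sk_buff *skb)
	{
		return skb->dev ? skb->dev->mtu + skb->dev->hard_header_len :
				  SKB_MAX_ALLOC;
	}

	/* After: one sanity cap; the real limits live elsewhere. */
	#define BPF_SKB_MAX_LEN SKB_MAX_ALLOC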
I hit the warning below when cat-ing a small (about 80 bytes) txt file
on 9pfs (msize=2097152 is passed as a 9p mount option). The reason is we
miss iov_iter_advance() if the read count is 0 in the zerocopy case, so
we don't truncate the pipe, and then iov_iter_pipe() thinks the pipe is
full. Fix it by removing the exception for 0 to ensure we call
iov_iter_advance() even on an empty read in the zerocopy case.
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
lmc sets the sc->lmc_media pointer when there is a matching device.
However, when no matching device is found, this pointer is NULL
and the following dereference results in a null-ptr-deref.
To fix this issue, unregister the hdlc device and return an error.
Signed-off-by: Tong Zhang <ztong0001@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In ipa_cmd_register_write_valid() we verify that values we will
supply to a REGISTER_WRITE IPA immediate command will fit in
the fields that need to hold them. This patch fixes some issues
in that function and ipa_cmd_register_write_offset_valid().
The dev_err() call in ipa_cmd_register_write_offset_valid() has
some printf format errors:
- The name of the register (corresponding to the string format
specifier) was not supplied.
- The IPA base offset and offset need to be supplied separately to
match the other format specifiers.
Also make the ~0 constant used there to compute the maximum
supported offset value explicitly unsigned.
There are two other issues in ipa_cmd_register_write_valid():
- There's no need to check the hash flush register for platforms
(like IPA v4.2) that do not support hashed tables
- The highest possible endpoint number, whose status register
offset is computed, is COUNT - 1, not COUNT.
Fix these problems, and add some additional commentary.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
This patch actually fixes a bug, though the bug doesn't affect the two
currently supported platforms. The fix implements the GSI memory
pointers a bit differently.
For IPA version 4.5 and above, the address space for almost all GSI
registers is adjusted downward by a fixed amount. This is currently
handled by adjusting the I/O virtual address pointer after it has
been mapped. The bug is that the pointer is not "de-adjusted" as it
should be when it's unmapped.
This patch fixes that error, but it does so by maintaining one "raw"
pointer for the mapped memory range. This is assigned when the
memory is mapped and used to unmap the memory. This pointer is also
used to access the two registers that do *not* sit in the "adjusted"
memory space.
Rather than adjusting *that* pointer, we maintain a separate pointer
that's an adjusted copy of the "raw" pointer, and that is used for
most GSI register accesses.
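In sketch form, the two pointers live side by side in the GSI structure:

	struct gsi {
		/* ... */
		void __iomem *virt_raw;	/* memory as returned by ioremap() */
		void __iomem *virt;	/* virt_raw adjusted for register use */
		/* ... */
	};

so that unmapping uses virt_raw while normal register accesses go
through virt.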
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
We do not support inter-EE channel or event ring commands. Inter-EE
interrupts are disabled (and never re-enabled) for all channels and
event rings, so we have no need for the GSI registers that clear
those interrupt conditions. So remove their definitions.
Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If a DDP broadcast packet is sent out to a non-gateway target, it is
also looped back. There is a potential for the loopback device to have a
longer hardware header length than the original target route's device,
which can result in the skb not being created with enough room for the
loopback device's hardware header. This patch fixes the issue by
determining that a loopback will be necessary prior to allocating the
skb, and if so, ensuring the skb has enough room.
This was discovered while testing a new driver that creates a LocalTalk
network interface (LTALK_HLEN = 1). It caused an skb_under_panic.
Signed-off-by: Doug Brown <doug@schmorgal.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The aq_nic_start function can fail in a variety of cases which leave
the device in a broken state.
An example case where the start function fails is the
request_threaded_irq which can be interrupted, resulting in a EINTR
result. This can be manually triggered by bringing the link up (e.g. ip
link set up) and triggering a SIGINT on the initiating process (e.g.
Ctrl+C). This would put the device into a half configured state.
Subsequently bringing the link up again would cause the napi_enable to
BUG.
In order to correctly clean up a failed attempt to start the device,
call aq_nic_stop.
Signed-off-by: Nathan Rossi <nathan.rossi@digi.com> Reviewed-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
ieee80211_find_sta_by_ifaddr() must be called under the RCU lock and
the resulting pointer is only valid under RCU lock as well.
Fix ath10k_wmi_tlv_op_pull_peer_stats_info() to hold RCU lock before it
calls ieee80211_find_sta_by_ifaddr() and release it when the resulting
pointer is no longer needed.
This problem was found while reviewing code to debug RCU warn from
ath10k_wmi_tlv_parse_peer_stats_info().
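The shape of the fix (sketch; variable names illustrative): take the RCU
read lock around the lookup and everything that uses the returned
pointer:

	rcu_read_lock();
	sta = ieee80211_find_sta_by_ifaddr(ar->hw, mac_addr, NULL);
	if (sta) {
		/* use sta only while the RCU read lock is held */
	}
	rcu_read_unlock();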
The only thing we do touching the device in hard interrupt context
is, at most, writing an interrupt ACK register, which doesn't race
with anything protected by the reg_lock.
Thus, avoid disabling interrupts here for potentially long periods
of time; particularly long periods have been observed when dumping
firmware memory (leading to lockup warnings on some devices).
Initialize the dummy FIB offload module after debugfs, so that the FIB
module could create its own directory there.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When __ath11k_mac_register() fails after ieee80211_register_hw() has
already succeeded, it sets wiphy->dev.parent to NULL via
SET_IEEE80211_DEV(ar->hw, NULL) at the end of __ath11k_mac_register().
cfg80211_get_drvinfo() can then be called via the call stack below, but
wiphy->dev.parent is NULL, so the kernel crashes.
After applying this patch the kernel crash is gone. Below is the test
case's sequence of function calls and the log when the wlan load fails
in ath11k_regd_update() and __ath11k_mac_register() returns failure:
When ath11k_regd_update() or ath11k_debugfs_register() fails,
mac80211's ieee80211_unregister_hw() is called; it waits until
cfg80211_get_drvinfo() has finished, and wiphy->dev.parent is not NULL
at that moment. Only after that is wiphy->dev.parent set to NULL by
SET_IEEE80211_DEV(ar->hw, NULL) at the end of __ath11k_mac_register(),
so the kernel crash no longer happens.
Tested-on: QCA6390 hw2.0 PCI WLAN.HST.1.0.1-01740-QCAHSTSWPLZ_V2_TO_X86-1 Signed-off-by: Wen Gong <wgong@codeaurora.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/1608607824-16067-1-git-send-email-wgong@codeaurora.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
This ensures that previous association attempts do not leave stale
statuses on subsequent attempts.
This fixes the WARN_ON(!cr->bss) from __cfg80211_connect_result() when
connecting to an AP after a previous connection failure (e.g. where EAP
fails due to an incorrect psk but association succeeded). In some
scenarios, brcmf_is_linkup() was indeed reporting a link-up event too
early due to a stale BRCMF_VIF_STATUS_ASSOC_SUCCESS bit, thus reporting
to cfg80211 a connection result with a zeroed bssid (vif->profile.bssid
is still empty) and causing the WARN_ON due to the call to
cfg80211_get_bss() with the empty bssid.
The MPTCP_PUSH_PENDING define is 6 and these tests should be testing if
BIT(6) is set.
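The fix is mechanical (sketch):

	/* Before: 6 used as a mask tests bits 1 and 2 by accident. */
	bool push = flags & MPTCP_PUSH_PENDING;

	/* After: test bit 6 as intended. */
	bool push = flags & BIT(MPTCP_PUSH_PENDING);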
Fixes: c2e6048fa1cf ("mptcp: fix race in release_cb") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When slave is NULL or slave_ops->ndo_neigh_setup is NULL, no error
return code is assigned in bond_neigh_init().
To fix this bug, ret is assigned -EINVAL in these cases.
Fixes: 9e99bfefdbce ("bonding: fix bond_neigh_init()") Reported-by: TOTE Robot <oslab@tsinghua.edu.cn> Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If we receive an MPTCP_PUSH_PENDING event from a subflow while
mptcp_release_cb() is serving the previous one, the latter
will be delayed up to the next release_sock(msk).
Address the issue by implementing a test/serve loop for such
events.
Additionally rename the push helper to __mptcp_push_pending()
to be more consistent with the existing code.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Since 20dd3850bcf8 ("can: Speed up CAN frame receiption by using
ml_priv") the CAN framework uses per-device specific data in the AF_CAN
protocol. For this purpose struct net_device->ml_priv is used. Later
the ml_priv usage in CAN was extended for other users, one of them being
CAN_J1939.
Later in the kernel, ml_priv was converted to a union used by other
drivers; e.g. the tun driver started storing its stats pointer there.
Since tun devices can claim to be a CAN device, CAN-specific protocols
will wrongly interpret this pointer, which will cause system crashes.
Mostly this issue is visible in the CAN_J1939 stack.
To fix this issue, we request a dedicated CAN pointer within the
net_device struct.
Reported-by: syzbot+5138c4dd15a0401bec7b@syzkaller.appspotmail.com Fixes: 20dd3850bcf8 ("can: Speed up CAN frame receiption by using ml_priv") Fixes: ffd956eef69b ("can: introduce CAN midlayer private and allocate it automatically") Fixes: 9d71dd0c7009 ("can: add support of SAE J1939 protocol") Fixes: 497a5757ce4e ("tun: switch to net core provided statistics counters") Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/r/20210223070127.4538-1-o.rempel@pengutronix.de Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
mptcp re-used inet(6)_release, so the subflow sockets were ignored.
We need to invoke the ip(v6)_mc_drop_socket function to ensure mcast
join resources get freed.
Fixes: 717e79c867ca5 ("mptcp: Add setsockopt()/getsockopt() socket operations") Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/110 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently we move orphaned msk sockets directly from FIN_WAIT2
state to CLOSE, with the rationale that incoming additional
data could be just dropped by the TCP stack/TW sockets.
Anyhow we miss sending MPTCP-level ack on incoming DATA_FIN,
and that may hang the peers.
Fixes: e16163b6e2b7 ("mptcp: refactor shutdown and close") Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
# tc filter add dev $h2 ingress protocol ip pref 1 handle 101 flower \
$tcflags dst_ip 192.0.2.2 ip_ttl 63 action drop
doesn't drop all IPv4 packets that match the configured TTL / destination
address. In particular, if "fragment offset" or "more fragments" have a
non-zero value in the IPv4 header, the setting of FLOW_DISSECTOR_KEY_IP
is simply ignored. Fix this by dissecting IPv4 TTL and TOS before the
fragment info; while at it, add a selftest for tc flower's match on
'ip_ttl' that verifies the correct behavior.
Fixes: 518d8a2e9bad ("net/flow_dissector: add support for dissection of misc ip header fields") Reported-by: Shuang Li <shuali@redhat.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently we do not schedule the MPTCP retransmission
timer after pushing the data when such action happens
in the subflow context.
This may cause a hang-up in active-backup scenarios, or
even when only single-subflow msks are involved, if we lose
some peer's ack.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The mptcp subflow route_req() callback performs the subflow
req initialization after the route_req() check. If the latter
fails, mptcp-specific bits of the current request sockets
are left uninitialized.
The above causes bad things at req socket disposal time, when
the mptcp resources are cleared.
This change addresses the issue by splitting subflow_init_req()
into the actual initialization and the mptcp-specific checks.
The initialization is moved before any possibly failing check.
Reported-by: Christoph Paasch <cpaasch@apple.com> Fixes: 7ea851d19b23 ("tcp: merge 'init_req' and 'route_req' functions") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The current mptcp_poll() implementation gives unexpected
results after shutdown(SEND_SHUTDOWN) and when the msk
status is TCP_CLOSE.
Set the correct mask.
Fixes: 8edf08649eed ("mptcp: rework poll+nospace handling") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently all errors received on msk subflows are ignored.
We need to catch at least the errors on connect() and
on fallback sockets.
Use a custom sk_error_report callback at subflow level,
and do the real action under the msk socket lock - via
the usual sock_owned_by_user()/release_callback() schema.
Fixes: 6e628cd3a8f7 ("mptcp: use mptcp release_cb for delayed tasks") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Mat Martineau <mathew.j.martineau@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The condition should be skipped if the CPU ID is equal to nthreads.
The patch doesn't fix any actual issue since
nthreads = min_t(unsigned int, num_present_cpus(), MVPP2_MAX_THREADS).
On all current Armada platforms, the number of CPUs is
less than MVPP2_MAX_THREADS.
Fixes: e531f76757eb ("net: mvpp2: handle cases where more CPUs are available than s/w threads") Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Stefan Chulski <stefanc@marvell.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In ext4_rename(), when RENAME_WHITEOUT fails to add the new entry into
the directory, we end up dropping the newly created whiteout inode under
the running transaction. After commit 9b88f9fb0d2 ("ext4: Do not iput
inode under running transaction"), we follow the assumption that evict()
does not get called from a transaction context, but ext4_rename() breaks
it. Although it's not a real problem, it is better to obey the rule, so
this patch adds the inode to the orphan list and stops the transaction
before the final iput().
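A hedged, simplified sketch of the reworked error path in ext4_rename():

	if (whiteout) {
		if (retval)
			/* Defer the actual drop: let the orphan list
			 * release the inode after the transaction stops.
			 */
			ext4_orphan_add(handle, whiteout);
		unlock_new_inode(whiteout);
		ext4_journal_stop(handle);
		iput(whiteout);		/* final iput outside the handle */
	}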
The intent is to avoid writing init code after init (because the text
might have been freed). The code is needlessly different between
jump_label and static_call and not obviously correct.
The existing code relies on the fact that the module loader clears the
init layout, such that within_module_init() always fails, while
jump_label relies on the module state which is more obvious and
matches the kernel logic.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Sumit Garg <sumit.garg@linaro.org> Link: https://lkml.kernel.org/r/20210318113610.636651340@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The underlying problem is not introduced by the cited commit, yet that
commit uncovered it. The cited commit relies on valid pages, which is
not a given due to some bugs. For now, just warn and work around the
issue by ignoring the bad ttm objects.
Below is some debug info gathered while debugging this issue:
nouveau 0000:01:00.0: DRM: ttm_dma->num_pages: 2048
nouveau 0000:01:00.0: DRM: ttm_dma->pages is NULL
nouveau 0000:01:00.0: DRM: ttm_dma: 00000000e96058e7
nouveau 0000:01:00.0: DRM: ttm_dma->page_flags:
nouveau 0000:01:00.0: DRM: ttm_dma: Populated: 1
nouveau 0000:01:00.0: DRM: ttm_dma: No Retry: 0
nouveau 0000:01:00.0: DRM: ttm_dma: SG: 256
nouveau 0000:01:00.0: DRM: ttm_dma: Zero Alloc: 0
nouveau 0000:01:00.0: DRM: ttm_dma: Swapped: 0
This patch causes a panic when rebooting my Dell Poweredge r440. I do
not have the full panic log as it's lost at that stage of the reboot and
I do not have a serial console. Reverting this patch makes my system
able to reboot again.
Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In ww_acquire_init(), mutex_acquire() is gated by CONFIG_DEBUG_LOCK_ALLOC.
The dep_map in the ww_acquire_ctx structure is also gated by the
same config. However mutex_release() in ww_acquire_fini() is gated by
CONFIG_DEBUG_MUTEXES. It is possible to set CONFIG_DEBUG_MUTEXES without
setting CONFIG_DEBUG_LOCK_ALLOC though it is an unlikely configuration.
That may cause a compilation error as dep_map isn't defined in this
case. Fix this potential problem by enclosing mutex_release() inside
CONFIG_DEBUG_LOCK_ALLOC.
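A sketch of the fix, gating the release on the same option that defines
dep_map:

	static inline void ww_acquire_fini(struct ww_acquire_ctx *ctx)
	{
	#ifdef CONFIG_DEBUG_LOCK_ALLOC
		/* dep_map only exists under CONFIG_DEBUG_LOCK_ALLOC */
		mutex_release(&ctx->dep_map, _THIS_IP_);
	#endif
		/* ... remaining debug teardown ... */
	}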
Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lore.kernel.org/r/20210316153119.13802-3-longman@redhat.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The use_ww_ctx flag is passed to mutex_optimistic_spin(), but the
function doesn't use it. The frequent use of the (use_ww_ctx && ww_ctx)
combination is repetitive.
In fact, ww_ctx should not be used at all if !use_ww_ctx. Simplify the
ww_mutex code by dropping use_ww_ctx from mutex_optimistic_spin() and
clearing ww_ctx if !use_ww_ctx. In this way, we can replace
(use_ww_ctx && ww_ctx) by just (ww_ctx).
There is a chance that the stats buffer allocation for some cooling
device fails due to a very high cooling device max state value.
Later, the cooling device's stats-update sysfs code can try to access
the stats data for that same cooling device, leading to a NULL pointer
dereference.
Add a NULL pointer check before accessing thermal cooling device
stats data. It fixes the following bug
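A simplified sketch of the guard pattern applied to the stats sysfs
handlers (one handler shown; the locking in the real code is omitted):

	static ssize_t total_trans_show(struct device *dev,
					struct device_attribute *attr,
					char *buf)
	{
		struct thermal_cooling_device *cdev = to_cooling_device(dev);
		struct cooling_dev_stats *stats = cdev->stats;

		/* Stats allocation may have failed at registration time. */
		if (!stats)
			return 0;

		return sprintf(buf, "%u\n", stats->total_trans);
	}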
We do some IO operations in the snd_soc_component_set_jack callback
function, and snd_soc_component_set_jack() will be called when the soc
component is removed. However, we should not access SoundWire registers
when the bus is suspended.
So set regcache_cache_only(regmap, true) to avoid register access
during the soc component removal process.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com> Reviewed-by: Kai Vehmanen <kai.vehmanen@linux.intel.com> Reviewed-by: Rander Wang <rander.wang@intel.com> Link: https://lore.kernel.org/r/20210316005254.29699-1-yung-chuan.liao@linux.intel.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The simple-card/audio-graph-card drivers do not handle the MCLK clock
when it is specified in the codec device node. The expectation here is
that the codec should actually own the MCLK clock and do the necessary
setup in the driver.
Suggested-by: Mark Brown <broonie@kernel.org> Suggested-by: Michael Walle <michael@walle.cc> Signed-off-by: Sameer Pujar <spujar@nvidia.com> Link: https://lore.kernel.org/r/1615829492-8972-3-git-send-email-spujar@nvidia.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
request_irq() won't accept a name which contains a slash, so we need to
replace it with something else; otherwise it will trigger a warning
and the entry in /proc/irq/ will not be created.
Since the .name might be used by userspace and we don't want to break
userspace, only the parameters passed to request_irq() are changed.
request_irq() won't accept a name which contains a slash, so we need to
replace it with something else; otherwise it will trigger a warning
and the entry in /proc/irq/ will not be created.
Since the .name might be used by userspace and we don't want to break
userspace, only the parameters passed to request_irq() are changed.
In st_open(), if STp->in_use is true, STp will be freed by
scsi_tape_put(). However, STp is still used by DEBC_printk() afterwards.
It is better to call DEBC_printk() before scsi_tape_put().
Link: https://lore.kernel.org/r/20210311064636.10522-1-lyl2019@mail.ustc.edu.cn Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi> Signed-off-by: Lv Yunlong <lyl2019@mail.ustc.edu.cn> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
io_sq_thread_finish() is called in io_ring_ctx_free(), so the SQPOLL
task is potentially still running and submitting new requests. It's not
a disaster because a "try" variant of percpu_ref_get is used, but it is
far from nice.
Remove ctx from the sqd ctx list earlier, before cancellation loop, so
SQPOLL can't find it and so won't submit new requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
It's racy to modify req->flags from a not owning context, e.g. linked
timeout calling req_set_fail_links() for the master request might race
with that request setting/clearing flags while being executed
concurrently. Just remove req_set_fail_links(prev) from
io_link_timeout_fn(), io_async_find_and_cancel() and functions down the
line take care of setting the fail bit.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Don't send fake signals to PF_IO_WORKER threads, they don't accept
signals. Just treat them like kthreads in this regard, all they need
is a wakeup as no forced kernel/user transition is needed.
When the server tries to do a callback and the client fails it due to
authentication problems, we need the server to set the callback-down
flag in RENEW so that the client can recover.
Suggested-by: Bruce Fields <bfields@redhat.com> Signed-off-by: Olga Kornievskaia <kolga@netapp.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Tested-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/linux-nfs/FB84E90A-1A03-48B3-8BF7-D9D10AC2C9FE@oracle.com/T/#t Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The driver was setting bit clock polarity opposite to intended polarity.
Also simplify the code by grouping ADC and DAC clock configurations into
a single field.
Many systems do not use ACPI and hence do not provide a DMI table. On
non-ACPI systems a warning, such as the following, is printed on boot.
WARNING KERN tegra-audio-graph-card sound: ASoC: no DMI vendor name!
The variable 'dmi_available' is not exported and so currently cannot be
used by kernel modules without adding an accessor. However, it is
possible to use the function is_acpi_device_node() to determine if the
sound card is an ACPI device and hence indicate if we expect a DMI table
to be present. Therefore, call is_acpi_device_node() to see if we are
using ACPI and only parse the DMI table if we are booting with ACPI.
Most steps in this table are steps of 3dB (300 centi-dB), so we can
simplify the table.
This not only reduces the amount of space it takes inside the kernel,
it also makes alsa-lib's mixer code actually accept the table, whereas
before this change alsa-lib saw the "ADC PGA Gain" control as a
control without a dB scale.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210228160441.241110-1-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The original default value written to the DAP_AVC_CTRL register during
sgtl5000_i2c_probe() was 0x0510. This would incorrectly write values to
bits 4 and 10, which are defined as RESERVED. It would also not set
bits 12 and 14 to their correct RESET values of 0x1, and instead set
them to 0x0. While the DAP_AVC module is effectively disabled because
the EN bit is 0, this default value is still writing invalid values to
registers that are marked as read-only and RESERVED as well as not
setting bits 12 and 14 to their correct default values as defined by the
datasheet.
The correct value that should be written to the DAP_AVC_CTRL register is
0x5100, which configures the register bits to the default values defined
by the datasheet, and prevents any writes to bits defined as
'read-only'. Generally speaking, it is best practice to NOT attempt to
write values to registers/bits defined as RESERVED, as it generally
produces unwanted/undefined behavior, or errors.
Also, all credit for this patch should go to my colleague Dan MacDonald
<dmacdonald@curbellmedical.com> for finding this error in the first
place.
The adc_vol_tlv volume control has a range from -17.625 dB to +30 dB,
not -176.25 dB to +300 dB. This wrong scale is especially a problem in
userspace apps which translate the dB scale to a linear scale. With the
logarithmic dB scale being off by a factor of 10, we lose all precision
in the lower area of the range when apps translate things to a linear
scale.
E.g. the 0 dB default, which corresponds with a value of 47 of the
0 - 127 range for the control, would be shown as 0/100 in alsa-mixer.
Since the centi-dB values used in the TLV struct cannot represent the
0.375 dB step size used by these controls, change the TLV definition
for them to specify a min and max value instead of min + stepsize.
Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc-
vol-tlv values being off by a factor of 10") which made the exact same
change to the rt5670 codec driver.
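In sketch form, the change swaps the stepped scale for a min/max
definition (values in centi-dB; the exact rounded min value is per the
driver, shown here for illustration):

	/* Before: reads as -176.25 dB min with 3.75 dB steps, off by 10x. */
	static const DECLARE_TLV_DB_SCALE(adc_vol_tlv, -17625, 375, 0);

	/* After: -17.625 dB ... +30 dB as min/max, since centi-dB cannot
	 * express the 0.375 dB step size.
	 */
	static const DECLARE_TLV_DB_MINMAX(adc_vol_tlv, -1762, 3000);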
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210226143817.84287-3-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The adc_vol_tlv volume control has a range from -17.625 dB to +30 dB,
not -176.25 dB to +300 dB. This wrong scale is especially a problem in
userspace apps which translate the dB scale to a linear scale. With the
logarithmic dB scale being off by a factor of 10, we lose all precision
in the lower area of the range when apps translate things to a linear
scale.
E.g. the 0 dB default, which corresponds with a value of 47 of the
0 - 127 range for the control, would be shown as 0/100 in alsa-mixer.
Since the centi-dB values used in the TLV struct cannot represent the
0.375 dB step size used by these controls, change the TLV definition
for them to specify a min and max value instead of min + stepsize.
Note this mirrors commit 3f31f7d9b540 ("ASoC: rt5670: Fix dac- and adc-
vol-tlv values being off by a factor of 10") which made the exact same
change to the rt5670 codec driver.
Signed-off-by: Hans de Goede <hdegoede@redhat.com> Link: https://lore.kernel.org/r/20210226143817.84287-2-hdegoede@redhat.com Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In case isi.nr_pages is 0, we make sis->pages (which is an
unsigned int) a huge value in iomap_swapfile_activate() by assigning -1.
This could cause a kernel crash in kernel v4.18 (with the below
signature), or could lead to unknown issues on newer kernels if the
fake big swap gets used.
Fix this issue by returning -EINVAL in case nr_pages is 0, since it
is anyway an invalid swapfile. Looks like this issue will be hit when
we have a pagesize < blocksize type of configuration.
I was able to hit the issue in case of a tiny swap file with below
test script.
https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/scripts/swap-issue.sh
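A simplified sketch of the added validation in iomap_swapfile_activate():

	/*
	 * If this swapfile doesn't contain even a single page-aligned
	 * contiguous range of blocks, reject it: the -1 assigned to
	 * sis->pages below would otherwise wrap to a huge value.
	 */
	if (isi.nr_pages == 0)
		return -EINVAL;

	sis->max = isi.nr_pages;
	sis->pages = isi.nr_pages - 1;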
kernel crash analysis on v4.18
==============================
On the v4.18 kernel, it causes a kernel panic, since sis->pages becomes
a huge value and isi.nr_extents is 0. When 0 is returned, it is
considered a swapfile over NFS and SWP_FILE is set
(sis->flags |= SWP_FILE). Then, when swapoff gets called, it calls
a_ops->swap_deactivate() if (sis->flags & SWP_FILE) is true. Since
a_ops->swap_deactivate() is NULL in the case of XFS, it causes the
panic below.
Panic signature on v4.18 kernel:
=======================================
root@qemu:/home/qemu# [ 8291.723351] XFS (loop2): Unmounting Filesystem
[ 8292.123104] XFS (loop2): Mounting V5 Filesystem
[ 8292.132451] XFS (loop2): Ending clean mount
[ 8292.263362] Adding 4294967232k swap on /mnt1/test/swapfile. Priority:-2 extents:1 across:274877906880k
[ 8292.277834] Unable to handle kernel paging request for instruction fetch
[ 8292.278677] Faulting instruction address: 0x00000000
cpu 0x19: Vector: 400 (Instruction Access) at [c0000009dd5b7ad0]
pc: 0000000000000000
lr: c0000000003eb9dc: destroy_swap_extents+0xfc/0x120
sp: c0000009dd5b7d50
msr: 8000000040009033
current = 0xc0000009b6710080
paca = 0xc00000003ffcb280 irqmask: 0x03 irq_happened: 0x01
pid = 5604, comm = swapoff
Linux version 4.18.0 (riteshh@xxxxxxx) (gcc version 8.4.0 (Ubuntu 8.4.0-1ubuntu1~18.04)) #57 SMP Wed Mar 3 01:33:04 CST 2021
enter ? for help
[link register ] c0000000003eb9dc destroy_swap_extents+0xfc/0x120
[c0000009dd5b7d50] c0000000025a7058 proc_poll_event+0x0/0x4 (unreliable)
[c0000009dd5b7da0] c0000000003f0498 sys_swapoff+0x3f8/0x910
[c0000009dd5b7e30] c00000000000bbe4 system_call+0x5c/0x70
Exception: c01 (System Call) at 00007ffff7d208d8
Signed-off-by: Ritesh Harjani <riteshh@linux.ibm.com>
[djwong: rework the comment to provide more details] Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
svc_authenticate sets rq_authop and calls svcauth_gss_accept. If the
kmalloc(sizeof(*svcdata), GFP_KERNEL) there fails, rq_auth_data is left
NULL and SVC_DENIED is returned.
This causes svc_process_common to go to err_bad_auth, and eventually
call svc_authorise. That calls ->release == svcauth_gss_release, which
tries to dereference rq_auth_data.
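A minimal sketch of a guard in svcauth_gss_release() that would avoid the
dereference (simplified; the in-tree fix may be shaped differently):

    static int svcauth_gss_release(struct svc_rqst *rqstp)
    {
        struct gss_svc_data *gsd = (struct gss_svc_data *)rqstp->rq_auth_data;

        /* svcauth_gss_accept() can fail its kmalloc() before
         * rq_auth_data is populated, so gsd may be NULL here. */
        if (!gsd)
            return 0;

        /* ... normal release of gsd->rsci and reply fixups ... */
        return 0;
    }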
When NFSD_V4 is enabled and CRYPTO is disabled,
Kbuild gives the following warning:
WARNING: unmet direct dependencies detected for CRYPTO_SHA256
Depends on [n]: CRYPTO [=n]
Selected by [y]:
- NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y]
WARNING: unmet direct dependencies detected for CRYPTO_MD5
Depends on [n]: CRYPTO [=n]
Selected by [y]:
- NFSD_V4 [=y] && NETWORK_FILESYSTEMS [=y] && NFSD [=y] && PROC_FS [=y]
This is because NFSD_V4 selects CRYPTO_MD5 and CRYPTO_SHA256,
without depending on or selecting CRYPTO, despite those config options
being subordinate to CRYPTO.
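A hedged Kconfig sketch of the fix this implies, namely having NFSD_V4
select the crypto core itself (the exact hunk and the neighbouring lines
are assumptions):

    config NFSD_V4
        bool "NFS server support for NFS version 4"
        depends on NFSD && PROC_FS
        # selecting CRYPTO satisfies the dependency of the hash selects
        select CRYPTO
        select CRYPTO_MD5
        select CRYPTO_SHA256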
Signed-off-by: Julian Braha <julianbraha@gmail.com> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When generic/371 is run on kvm-xfstests using 5.10 and 5.11 kernels, it
fails at significant rates on the two test scenarios that disable
delayed allocation (ext3conv and data_journal) and force actual block
allocation for the fallocate and pwrite functions in the test. The
failure rate on 5.10 for both ext3conv and data_journal on one test
system typically runs about 85%. On 5.11, the failure rate on ext3conv
sometimes drops to as low as 1% while the rate on data_journal
increases to nearly 100%.
The observed failures are largely due to ext4_should_retry_alloc()
cutting off block allocation retries when s_mb_free_pending (used to
indicate that a transaction in progress will free blocks) is 0.
However, free space is usually available when this occurs during runs
of generic/371. It appears that a thread attempting to allocate
blocks is just missing transaction commits in other threads that
increase the free cluster count and reset s_mb_free_pending while
the allocating thread isn't running. Explicitly testing for free space
availability avoids this race.
The current code uses a post-increment operator in the conditional
expression that determines whether the retry limit has been exceeded.
This means that the conditional expression uses the value of the
retry counter before it's increased, resulting in an extra retry cycle.
The current code actually retries twice before hitting its retry limit
rather than once.
Increasing the retry limit to 3 from the current actual maximum retry
count of 2 in combination with the change described above reduces the
observed failure rate to less than 0.1% on both ext3conv and
data_journal with what should be limited impact on users sensitive to
the overhead caused by retries.
A per filesystem percpu counter exported via sysfs is added to allow
users or developers to track the number of times the retry limit is
exceeded without resorting to debugging methods. This should provide
some insight into worst case retry behavior.
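Taken together, a hedged sketch of what the retry helper could look like
after these changes (the counter and helper names are assumptions, not
necessarily the merged ext4 code):

    int ext4_should_retry_alloc(struct super_block *sb, int *retries)
    {
        struct ext4_sb_info *sbi = EXT4_SB(sb);

        if (!sbi->s_journal)
            return 0;

        /* Pre-increment so a limit of 3 really allows three retries. */
        if (++(*retries) > 3) {
            /* per-filesystem percpu counter, exported via sysfs */
            percpu_counter_inc(&sbi->s_sra_exceeded_retry_limit);
            return 0;
        }

        /*
         * If nothing suggests blocks are about to be freed, we may have
         * just missed a commit that already freed some: test free space
         * explicitly instead of giving up.
         */
        smp_mb();
        if (atomic_read(&sbi->s_mb_free_pending) == 0)
            return ext4_has_free_clusters(sbi, 1, 0);

        /* Kick and wait for a commit that may free more blocks. */
        (void) jbd2_journal_force_commit_nested(sbi->s_journal);
        return 1;
    }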
Signed-off-by: Eric Whitney <enwlinux@gmail.com> Link: https://lore.kernel.org/r/20210218151132.19678-1-enwlinux@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Right now "mount -t virtiofs -o dax myfs /mnt/virtiofs" succeeds even
if filesystem deivce does not have a cache window and hence DAX can't
be supported.
This gives a false sense to user that they are using DAX with virtiofs
but fact of the matter is that they are not.
Fix this by returning error if dax can't be supported and user has asked
for it.
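A hedged sketch of the mount-time check this describes (the field and
label names are assumptions):

    if (ctx->dax) {
        /* DAX was requested, but the device exposes no cache window,
         * so DAX cannot actually work: fail the mount instead. */
        if (!fs->dax_dev) {
            err = -EINVAL;
            pr_err("virtio-fs: device does not support dax\n");
            goto err_out;
        }
        fsc->dax_dev = fs->dax_dev;
    }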
Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
linear map range is not checked correctly.
The start physical address that the linear map covers can actually be at
the end of the range because of randomization. Check for that and, if so,
reduce it to 0.
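A sketch of the kind of check described, modelled on arm64's hotplug
range validation (the symbol names follow arm64 conventions but are
assumptions here):

    u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
    u64 end_linear_pa = __pa(PAGE_END - 1);

    if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) {
        /* With a randomized linear map the start PA can land past
         * the end PA; reduce it to 0 so the [0, end] range still
         * covers all addressable physical memory. */
        if (start_linear_pa > end_linear_pa)
            start_linear_pa = 0;
    }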
This can be verified on QEMU by setting kaslr-seed to ~0ul:
Piotr Krysiuk [Tue, 6 Apr 2021 20:59:39 +0000 (21:59 +0100)]
UBUNTU: SAUCE: bpf, x86: Validate computation of branch displacements for x86-32
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
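As a sketch of the mitigation (modelled on the x86-64 do_jit() emit loop;
the x86-32 fix is analogous), the image-population pass can assert that
every instruction lands exactly where the previous pass recorded it:

    /* When populating the image, check that each instruction's offset
     * still matches addrs[] from the last pass; a mismatch means no
     * fixed point was reached, so reject the program. */
    if (image) {
        if (unlikely(proglen + ilen > oldproglen ||
                     proglen + ilen != addrs[i])) {
            pr_err("bpf_jit: fatal error\n");
            return -EFAULT;
        }
        memcpy(image + proglen, temp, ilen);
    }
    proglen += ilen;
    addrs[i] = proglen;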
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
(cherry picked from commit 26f55a59dc65ff77cd1c4b37991e26497fc68049
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git)
CVE-2021-29154 Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Piotr Krysiuk [Mon, 5 Apr 2021 21:52:15 +0000 (22:52 +0100)]
UBUNTU: SAUCE: bpf, x86: Validate computation of branch displacements for x86-64
The branch displacement logic in the BPF JIT compilers for x86 assumes
that, for any generated branch instruction, the distance cannot
increase between optimization passes.
But this assumption can be violated due to how the distances are
computed. Specifically, whenever a backward branch is processed in
do_jit(), the distance is computed by subtracting the positions in the
machine code from different optimization passes. This is because part
of addrs[] is already updated for the current optimization pass, before
the branch instruction is visited.
And so the optimizer can expand blocks of machine code in some cases.
This can confuse the optimizer logic, where it assumes that a fixed
point has been reached for all machine code blocks once the total
program size stops changing. And then the JIT compiler can output
abnormal machine code containing incorrect branch displacements.
To mitigate this issue, we assert that a fixed point is reached while
populating the output image. This rejects any problematic programs.
The issue affects both x86-32 and x86-64. We mitigate separately to
ease backporting.
Signed-off-by: Piotr Krysiuk <piotras@gmail.com> Reviewed-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
(cherry picked from commit e4d4d456436bfb2fe412ee2cd489f7658449b098
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf.git)
CVE-2021-29154 Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
v2:
* Move Wa_14010685332 into its own function - vsyrjala
* Add TODO comment about figuring out if we can move this workaround - imre
v3:
* Rename cnp_irq_post_reset() to cnp_display_clock_wa()
* Add TODO item mentioning we need to clarify which platforms this
workaround applies to
* Just use ibx_irq_reset() in gen8_irq_reset(). This code should be
functionally equivalent on gen9 bc to the code v2 added
* Drop the icp_hpd_irq_setup() call in spt_hpd_irq_setup();
icp_hpd_irq_setup() looks to be more or less identical to
spt_hpd_irq_setup() minus additionally enabling one port. Will update
i915 to use icp_hpd_irq_setup() for ICP in a separate patch.
v4:
* Revert Wa_14010685332 system list in comments to how it was before
* Add back HAS_PCH_SPLIT() check before calling ibx_irq_reset()
Cc: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Tejas Upadhyay <tejaskumarx.surendrakumar.upadhyay@intel.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20210217180016.1937401-1-lyude@redhat.com
(cherry picked from commit 59b7cb44cffde6ab5b8ace1aef9b79f50be1c3eb git://anongit.freedesktop.org/drm-tip) Signed-off-by: Koba Ko <koba.ko@canonical.com> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>