Nicholas Piggin [Sat, 30 Jan 2021 13:08:24 +0000 (23:08 +1000)]
powerpc/64s: move bad_page_fault handling to C
This simplifies code, and it is also useful when introducing
interrupt handler wrappers when introducing wrapper functionality
that doesn't cope with asm entry code calling into more than one
handler function.
32-bit and 64e still have some such cases, which limits some ways
they can use interrupt wrappers.
Nicholas Piggin [Sat, 30 Jan 2021 13:08:22 +0000 (23:08 +1000)]
powerpc/64s: add do_bad_page_fault_segv handler
This function acts like an interrupt handler so it needs to follow
the standard interrupt handler function signature which will be
introduced in a future change.
Nicholas Piggin [Sat, 30 Jan 2021 13:08:16 +0000 (23:08 +1000)]
powerpc: remove arguments from fault handler functions
Make mm fault handlers all just take the pt_regs * argument and load
DAR/DSISR from that. Make those that return a value return long.
This is done to make the function signatures match other handlers, which
will help with a future patch to add wrappers. Explicit arguments could
be added for performance but that would require more wrapper macro
variants.
Interrupts that occur in kernel mode expect that context tracking
is set to kernel. Enabling local irqs before context tracking
switches from guest to host means interrupts can come in and trigger
warnings about wrong context, and possibly worse.
Nicholas Piggin [Sat, 30 Jan 2021 13:08:11 +0000 (23:08 +1000)]
powerpc/64s: interrupt exit improve bounding of interrupt recursion
When replaying pending soft-masked interrupts when an interrupt returns
to an irqs-enabled context, there is a special case required if this was
an asynchronous interrupt to avoid unbounded interrupt recursion.
This case was not tested for in the case the asynchronous interrupt hit
in user context, because a subsequent nested interrupt would by definition
hit in kernel mode, which then exits via the kernel path which does test
this case.
There is no reason to allow this for such interrupts. While recursion is
bounded at the next level, it's simpler and uses less stack to apply the
replay logic consistently.
This also expands the comment which was really pretty poor and didn't
explain the problem (I can say that because I wrote it).
powerpc/pci: Move PHB discovery for PCI_DN using platforms
Make powernv, pseries, powermac and maple use ppc_mc.discover_phbs.
These platforms need to be done together because they all depend on
pci_dn's being created from the DT. The pci_dn contains a pointer to
the relevant pci_controller so they need to be created after the
pci_controller structures are available, but before PCI devices are
scanned. Currently this ordering is provided by initcalls and the
sequence is:
1. PHBs are discovered (setup_arch) (early boot, pre-initcalls)
2. pci_dn are created from the unflattended DT (core initcall)
3. PHBs are scanned pcibios_init() (subsys initcall)
The new ppc_md.discover_phbs() function is also a core_initcall so we
can't guarantee ordering between the creation of pci_controllers and
the creation of pci_dn's which require a pci_controller. We could use
the postcore, or core_sync initcall levels, but it's cleaner to just
move the pci_dn setup into the per-PHB inits which occur inside of
.discover_phb() for these platforms. This brings the boot-time path in
line with the PHB hotplug path that is used for pseries DLPAR
operations too.
Signed-off-by: Oliver O'Halloran <oohall@gmail.com>
[mpe: Squash powermac & maple in to avoid breakage those platforms,
convert memblock allocs to use kmalloc to avoid warnings] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20201103043523.916109-2-oohall@gmail.com
On many powerpc platforms the discovery and initalisation of
pci_controllers (PHBs) happens inside of setup_arch(). This is very early
in boot (pre-initcalls) and means that we're initialising the PHB long
before many basic kernel services (slab allocator, debugfs, a real ioremap)
are available.
On PowerNV this causes an additional problem since we map the PHB registers
with ioremap(). As of commit d538aadc2718 ("powerpc/ioremap: warn on early
use of ioremap()") a warning is printed because we're using the "incorrect"
API to setup and MMIO mapping in searly boot. The kernel does provide
early_ioremap(), but that is not intended to create long-lived MMIO
mappings and a seperate warning is printed by generic code if
early_ioremap() mappings are "leaked."
This is all fixable with dumb hacks like using early_ioremap() to setup
the initial mapping then replacing it with a real ioremap later on in
boot, but it does raise the question: Why the hell are we setting up the
PHB's this early in boot?
The old and wise claim it's due to "hysterical rasins." Aside from amused
grapes there doesn't appear to be any real reason to maintain the current
behaviour. Already most of the newer embedded platforms perform PHB
discovery in an arch_initcall and between the end of setup_arch() and the
start of initcalls none of the generic kernel code does anything PCI
related. On powerpc scanning PHBs occurs in a subsys_initcall so it should
be possible to move the PHB discovery to a core, postcore or arch initcall.
This patch adds the ppc_md.discover_phbs hook and a core_initcall stub that
calls it. The core_initcalls are the earliest to be called so this will
any possibly issues with dependency between initcalls. This isn't just an
academic issue either since on pseries and PowerNV EEH init occurs in an
arch_initcall and depends on the pci_controllers being available, similarly
the creation of pci_dns occurs at core_initcall_sync (i.e. between core and
postcore initcalls). These problems need to be addressed seperately.
The pnv_phb->initialized flag is an odd beast. It was added back in 2012 in
commit db1266c85261 ("powerpc/powernv: Skip check on PE if necessary") to
allow devices to be enabled even if the device had not yet been assigned to
a PE. Allowing the device to be enabled before the PE is configured may
cause spurious EEH events since none of the IOMMU context has been setup.
I'm not entirely sure why this was ever necessary. My best guess is that it
was an workaround for a bug or some other undesireable behaviour from the
PCI core. Either way, it's unnecessary now since as of commit dc3d8f85bb57
("powerpc/powernv/pci: Re-work bus PE configuration") we can guarantee that
the PE will be configured before the PCI core will allow drivers to bind to
the device.
It's also worth pointing out that the ->initialized flag is only set in
pnv_pci_ioda_create_dbgfs(). That function has its entire body wrapped
in #ifdef CONFIG_DEBUG_FS. As a result, for kernels built without debugfs
(i.e. petitboot) the other checks in pnv_pci_enable_device_hook() are
bypassed entirely.
Christophe Leroy [Wed, 23 Dec 2020 09:38:48 +0000 (09:38 +0000)]
powerpc/xmon: Enable breakpoints on 8xx
Since commit 4ad8622dc548 ("powerpc/8xx: Implement hw_breakpoint"),
8xx has breakpoints so there is no reason to opt breakpoint logic
out of xmon for the 8xx.
Markus Elfring [Tue, 27 Aug 2019 06:44:20 +0000 (08:44 +0200)]
powerpc/82xx: Delete an unnecessary of_node_put() call in pq2ads_pci_init_irq()
A null pointer would be passed to a call of the function “of_node_put”
immediately after a call of the function “of_find_compatible_node” failed
at one place.
Remove this superfluous function call.
This issue was detected by using the Coccinelle software.
Markus Elfring [Tue, 27 Aug 2019 11:34:02 +0000 (13:34 +0200)]
powerpc/pseries: Delete an unnecessary kfree() call in dlpar_store()
A null pointer would be passed to a call of the function “kfree”
immediately after a call of the function “kstrdup” failed at one place.
Remove this superfluous function call.
This issue was detected by using the Coccinelle software.
Qian Cai [Sun, 10 May 2020 05:15:59 +0000 (01:15 -0400)]
powerpc/mm/book3s64/iommu: fix some RCU-list locks
It is safe to traverse mm->context.iommu_group_mem_list with either
mem_list_mutex or the RCU read lock held. Silence a few RCU-list false
positive warnings and fix a few missing RCU read locks.
arch/powerpc/mm/book3s64/iommu_api.c:330 RCU-list traversed in non-reader section!!
Ganesh Goudar [Thu, 28 Jan 2021 10:41:43 +0000 (16:11 +0530)]
powerpc/mce: Remove per cpu variables from MCE handlers
Access to per-cpu variables requires translation to be enabled on
pseries machine running in hash mmu mode, Since part of MCE handler
runs in realmode and part of MCE handling code is shared between ppc
architectures pseries and powernv, it becomes difficult to manage
these variables differently on different architectures, So have
these variables in paca instead of having them as per-cpu variables
to avoid complications.
Ganesh Goudar [Thu, 28 Jan 2021 10:41:42 +0000 (16:11 +0530)]
powerpc/mce: Reduce the size of event arrays
Maximum recursive depth of MCE is 4, Considering the maximum depth
allowed reduce the size of event to 10 from 100. This saves us ~19kB
of memory and has no fatal consequences.
Pingfan Liu [Thu, 22 Oct 2020 06:51:19 +0000 (14:51 +0800)]
powerpc/time: Enable sched clock for irqtime
When CONFIG_IRQ_TIME_ACCOUNTING and CONFIG_VIRT_CPU_ACCOUNTING_GEN, powerpc
does not enable "sched_clock_irqtime" and can not utilize irq time
accounting.
Like x86, powerpc does not use the sched_clock_register() interface. So it
needs an dedicated call to enable_sched_clock_irqtime() to enable irq time
accounting.
The "ibm,arch-vec-5-platform-support" property is a list of pairs of
bytes representing the options and values supported by the platform
firmware. At boot time, Linux scans this list and activates the
available features it recognizes : Radix and XIVE.
A recent change modified the number of entries to loop on and 8 bytes,
4 pairs of { options, values } entries are always scanned. This is
fine on KVM but not on PowerVM which can advertises less. As a
consequence on this platform, Linux reads extra entries pointing to
random data, interprets these as available features and tries to
activate them, leading to a firmware crash in
ibm,client-architecture-support.
Fix that by using the property length of "ibm,arch-vec-5-platform-support".
Fixes: ab91239942a9 ("powerpc/prom: Remove VLA in prom_check_platform_support()") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210122075029.797013-1-clg@kaod.org
powerpc/eeh: Add a debugfs interface to check if a driver supports recovery
If a PCI device's current driver implements the error handling callbacks
EEH can use them to recover the device after an error occurs. For devices
without the error handling callbacks we recover them by removing the device
and re-scanning it so the PCI core puts the device back into a known good
state.
Currently there's no way for userspace to determine if the driver supports
recovery or not which makes it difficult to write automated tests for EEH.
This patch addressing that by adding a debugfs interface for querying if
a specific device can be recovered or not.
The basic EEH test ignores VFs since we the way the eeh_dev_break debugfs
interface works means that if multiple VFs are enabled we may cause errors
on all them them. However, we can work around that by only enabling a
single VF at a time.
This patch adds some infrastructure for finding SR-IOV capable devices and
enabling / disabling VFs so we can exercise the VF specific EEH recovery
paths. Two new tests are added, one for testing EEH aware devices and one
for EEH un-aware VFs.
powerpc/sstep: Fix incorrect return from analyze_instr()
We currently just percolate the return value from analyze_instr()
to the caller of emulate_step(), especially if it is a -1.
For one particular case (opcode = 4) for instructions that aren't
currently emulated, we are returning 'should not be single-stepped'
while we should have returned 0 which says 'did not emulate, may
have to single-step'.
Fixes: 930d6288a26787 ("powerpc: sstep: Add support for maddhd, maddhdu, maddld instructions") Signed-off-by: Ananth N Mavinakayanahalli <ananth@linux.ibm.com> Suggested-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Sandipan Das <sandipan@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/161157999039.64773.14950289716779364766.stgit@thinktux.local
Nicholas Piggin [Mon, 18 Jan 2021 12:34:51 +0000 (22:34 +1000)]
powerpc: Always enable queued spinlocks for 64s, disable for others
Queued spinlocks have shown to have good performance and fairness
properties even on smaller (2 socket) POWER systems. This selects them
automatically for 64s. For other platforms they are de-selected, the
standard spinlock is far simpler and smaller code, and single chips
with a handful of cores is unlikely to show any improvement.
CONFIG_EXPERT still allows this to be changed, e.g., to help debug
performance or correctness issues.
Cédric Le Goater [Sat, 12 Dec 2020 14:27:07 +0000 (15:27 +0100)]
powerpc/vas: Fix IRQ name allocation
The VAS device allocates a generic interrupt to handle page faults but
the IRQ name doesn't show under /proc. This is because it's on
stack. Allocate the name.
KVM: PPC: Make the VMX instruction emulation routines static
These are only used locally. It fixes these W=1 compile errors :
../arch/powerpc/kvm/powerpc.c:1521:5: error: no previous prototype for ‘kvmppc_get_vmx_dword’ [-Werror=missing-prototypes]
1521 | int kvmppc_get_vmx_dword(struct kvm_vcpu *vcpu, int index, u64 *val)
| ^~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1539:5: error: no previous prototype for ‘kvmppc_get_vmx_word’ [-Werror=missing-prototypes]
1539 | int kvmppc_get_vmx_word(struct kvm_vcpu *vcpu, int index, u64 *val)
| ^~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1557:5: error: no previous prototype for ‘kvmppc_get_vmx_hword’ [-Werror=missing-prototypes]
1557 | int kvmppc_get_vmx_hword(struct kvm_vcpu *vcpu, int index, u64 *val)
| ^~~~~~~~~~~~~~~~~~~~
../arch/powerpc/kvm/powerpc.c:1575:5: error: no previous prototype for ‘kvmppc_get_vmx_byte’ [-Werror=missing-prototypes]
1575 | int kvmppc_get_vmx_byte(struct kvm_vcpu *vcpu, int index, u64 *val)
| ^~~~~~~~~~~~~~~~~~~
Fixes: acc9eb9305fe ("KVM: PPC: Reimplement LOAD_VMX/STORE_VMX instruction mmio emulation with analyse_instr() input") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-19-clg@kaod.org
../arch/powerpc/mm/book3s64/slb.c:380:6: error: no previous prototype for ‘preload_new_slb_context’ [-Werror=missing-prototypes]
380 | void preload_new_slb_context(unsigned long start, unsigned long sp)
| ^~~~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/mm/book3s64/hash_utils.c:1867:6: error: no previous prototype for ‘hpte_insert_repeating’ [-Werror=missing-prototypes]
1867 | long hpte_insert_repeating(unsigned long hash, unsigned long vpn,
| ^~~~~~~~~~~~~~~~~~~~~
../arch/powerpc/mm/book3s64/hash_utils.c:1515:5: error: no previous prototype for ‘__hash_page’ [-Werror=missing-prototypes]
1515 | int __hash_page(unsigned long trap, unsigned long ea, unsigned long dsisr,
| ^~~~~~~~~~~
../arch/powerpc/mm/book3s64/hash_utils.c:1850:6: error: no previous prototype for ‘low_hash_fault’ [-Werror=missing-prototypes]
1850 | void low_hash_fault(struct pt_regs *regs, unsigned long address, int rc)
| ^~~~~~~~~~~~~~
Commit 650b55b707fd ("powerpc: Add prefixed instructions to instruction
data type") removed the use of patch_imm32_load_insns(). Clean it up
to fix this W=1 compile error :
../arch/powerpc/kernel/optprobes.c:149:6: error: no previous prototype for ‘patch_imm32_load_insns’ [-Werror=missing-prototypes]
149 | void patch_imm32_load_insns(unsigned int val, kprobe_opcode_t *addr)
Fixes: 650b55b707fd ("powerpc: Add prefixed instructions to instruction data type") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-11-clg@kaod.org
powerpc/pseries/ras: Make init_ras_hotplug_IRQ() static
init_ras_hotplug_IRQ() is a local routine used by a machine init call
and it doesn't need to be external.
It fixes this W=1 compile error:
../arch/powerpc/platforms/pseries/ras.c:125:12: error: no previous prototype for ‘init_ras_hotplug_IRQ’ [-Werror=missing-prototypes]
125 | int __init init_ras_hotplug_IRQ(void)
| ^~~~~~~~~~~~~~~~~~~~
powerpc/pseries/eeh: Make pseries_pcibios_bus_add_device() static
pseries_pcibios_bus_add_device() is a local routine defining the
pcibios_bus_add_device() handler of the pseries machine in
eeh_pseries_init(). It doesn't need to be external.
It fixes this W=1 compile error:
../arch/powerpc/platforms/pseries/eeh_pseries.c:46:6: error: no previous prototype for ‘pseries_pcibios_bus_add_device’ [-Werror=missing-prototypes]
46 | void pseries_pcibios_bus_add_device(struct pci_dev *pdev)
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: dae7253f9f78 ("powerpc/pseries: Add pseries SR-IOV Machine dependent calls") Signed-off-by: Cédric Le Goater <clg@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210104143206.695198-4-clg@kaod.org
The last use of 'status' was removed in 2012. Remove the variable to
fix this W=1 compile error.
../arch/powerpc/platforms/pseries/ras.c: In function ‘ras_epow_interrupt’:
../arch/powerpc/platforms/pseries/ras.c:318:6: error: variable ‘status’ set but not used [-Werror=unused-but-set-variable]
318 | int status;
| ^~~~~~
Kajol Jain [Mon, 28 Dec 2020 08:52:04 +0000 (14:22 +0530)]
powerpc/perf/hv-24x7: Dont create sysfs event files for dummy events
The hv_24x7 performance monitoring unit creates a list of supported
events from the event catalog obtained via HCALL. The hv_24x7 catalog
could also contain invalid or dummy events with names like RESERVED*.
These events do not have any hardware counters backing them. Add a
check to string compare the event names to filter such events out.
Michael Ellerman [Thu, 17 Dec 2020 00:53:06 +0000 (11:53 +1100)]
powerpc/64s/kuap: Use mmu_has_feature()
In commit 8150a153c013 ("powerpc/64s: Use early_mmu_has_feature() in
set_kuap()") we switched the KUAP code to use early_mmu_has_feature(),
to avoid a bug where we called set_kuap() before feature patching had
been done, leading to recursion and crashes.
That path, which called probe_kernel_read() from printk(), has since
been removed, see commit 2ac5a3bf7042 ("vsprintf: Do not break early
boot with probing addresses").
Additionally probe_kernel_read() no longer invokes any KUAP routines,
since commit fe557319aa06 ("maccess: rename probe_kernel_{read,write}
to copy_{from,to}_kernel_nofault") and c33165253492 ("powerpc: use
non-set_fs based maccess routines").
So it should now be safe to use mmu_has_feature() in the KUAP
routines, because we shouldn't invoke them prior to feature patching.
This is essentially a revert of commit 8150a153c013 ("powerpc/64s: Use
early_mmu_has_feature() in set_kuap()"), but we've since added a
second usage of early_mmu_has_feature() in get_kuap(), so we convert
that to use mmu_has_feature() as well.
Linus Torvalds [Sat, 2 Jan 2021 20:22:46 +0000 (12:22 -0800)]
Merge tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 cleanups from Vasily Gorbik:
"Update defconfigs and sort config select list"
* tag 's390-5.11-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/Kconfig: sort config S390 select list once again
s390: update defconfigs
Linus Torvalds [Sat, 2 Jan 2021 19:53:05 +0000 (11:53 -0800)]
Merge tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix a crash in intel_pstate during resume from suspend-to-RAM
that may occur after recent changes and two resource leaks in error
paths in the operating performance points (OPP) framework, add a new
C-states table to intel_idle and update the cpuidle MAINTAINERS entry
to cover the governors too.
Specifics:
- Fix recently introduced crash in the intel_pstate driver that
occurs if scale-invariance is disabled during resume from
suspend-to-RAM due to inconsistent changes of APERF or MPERF MSR
values made by the platform firmware (Rafael Wysocki).
- Fix a memory leak and add a missing clk_put() in error paths in the
OPP framework (Quanyang Wang, Viresh Kumar).
- Add new C-states table for SnowRidge processors to the intel_idle
driver (Artem Bityutskiy).
- Update the MAINTAINERS entry for cpuidle to make it clear that the
governors are covered by it too (Lukas Bulwahn)"
* tag 'pm-5.11-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: add SnowRidge C-state table
cpufreq: intel_pstate: Fix fast-switch fallback path
opp: Call the missing clk_put() on error
opp: fix memory leak in _allocate_opp_table
MAINTAINERS: include governors into CPU IDLE TIME MANAGEMENT FRAMEWORK
Linus Torvalds [Fri, 1 Jan 2021 20:58:07 +0000 (12:58 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a load of driver fixes (12 ufs, 1 mpt3sas, 1 cxgbi).
The big core two fixes are for power management ("block: Do not accept
any requests while suspended" and "block: Fix a race in the runtime
power management code") which finally sorts out the resume problems
we've occasionally been having.
To make the resume fix, there are seven necessary precursors which
effectively renames REQ_PREEMPT to REQ_PM, so every "special" request
in block is automatically a power management exempt one.
All of the non-PM preempt cases are removed except for the one in the
SCSI Parallel Interface (spi) domain validation which is a genuine
case where we have to run requests at high priority to validate the
bus so this becomes an autopm get/put protected request"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (22 commits)
scsi: cxgb4i: Fix TLS dependency
scsi: ufs: Un-inline ufshcd_vops_device_reset function
scsi: ufs: Re-enable WriteBooster after device reset
scsi: ufs-mediatek: Use correct path to fix compile error
scsi: mpt3sas: Signedness bug in _base_get_diag_triggers()
scsi: block: Do not accept any requests while suspended
scsi: block: Remove RQF_PREEMPT and BLK_MQ_REQ_PREEMPT
scsi: core: Only process PM requests if rpm_status != RPM_ACTIVE
scsi: scsi_transport_spi: Set RQF_PM for domain validation commands
scsi: ide: Mark power management requests with RQF_PM instead of RQF_PREEMPT
scsi: ide: Do not set the RQF_PREEMPT flag for sense requests
scsi: block: Introduce BLK_MQ_REQ_PM
scsi: block: Fix a race in the runtime power management code
scsi: ufs-pci: Enable UFSHCD_CAP_RPM_AUTOSUSPEND for Intel controllers
scsi: ufs-pci: Fix recovery from hibernate exit errors for Intel controllers
scsi: ufs-pci: Ensure UFS device is in PowerDown mode for suspend-to-disk ->poweroff()
scsi: ufs-pci: Fix restore from S4 for Intel controllers
scsi: ufs-mediatek: Keep VCC always-on for specific devices
scsi: ufs: Allow regulators being always-on
scsi: ufs: Clear UAC for RPMB after ufshcd resets
...
Linus Torvalds [Fri, 1 Jan 2021 20:49:09 +0000 (12:49 -0800)]
Merge tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Two minor block fixes from this last week that should go into 5.11:
- Add missing NOWAIT debugfs definition (Andres)
- Fix kerneldoc warning introduced this merge window (Randy)"
* tag 'block-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
block: add debugfs stanza for QUEUE_FLAG_NOWAIT
fs: block_dev.c: fix kernel-doc warnings from struct block_device changes
Linus Torvalds [Fri, 1 Jan 2021 20:29:49 +0000 (12:29 -0800)]
Merge tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes that should go into 5.11, all marked for stable as well:
- Fix issue around identity COW'ing and users that share a ring
across processes
- Fix a hang associated with unregistering fixed files (Pavel)
- Move the 'process is exiting' cancelation a bit earlier, so
task_works aren't affected by it (Pavel)"
* tag 'io_uring-5.11-2021-01-01' of git://git.kernel.dk/linux-block:
kernel/io_uring: cancel io_uring before task works
io_uring: fix io_sqe_files_unregister() hangs
io_uring: add a helper for setting a ref node
io_uring: don't assume mm is constant across submits
Linus Torvalds [Mon, 28 Dec 2020 19:40:22 +0000 (11:40 -0800)]
depmod: handle the case of /sbin/depmod without /sbin in PATH
Commit 436e980e2ed5 ("kbuild: don't hardcode depmod path") stopped
hard-coding the path of depmod, but in the process caused trouble for
distributions that had that /sbin location, but didn't have it in the
PATH (generally because /sbin is limited to the super-user path).
Work around it for now by just adding /sbin to the end of PATH in the
depmod.sh script.
Pavel Begunkov [Wed, 30 Dec 2020 21:34:16 +0000 (21:34 +0000)]
kernel/io_uring: cancel io_uring before task works
For cancelling io_uring requests it needs either to be able to run
currently enqueued task_works or having it shut down by that moment.
Otherwise io_uring_cancel_files() may be waiting for requests that won't
ever complete.
Go with the first way and do cancellations before setting PF_EXITING and
so before putting the task_work infrastructure into a transition state
where task_work_run() would better not be called.
Pavel Begunkov [Wed, 30 Dec 2020 21:34:15 +0000 (21:34 +0000)]
io_uring: fix io_sqe_files_unregister() hangs
io_sqe_files_unregister() uninterruptibly waits for enqueued ref nodes,
however requests keeping them may never complete, e.g. because of some
userspace dependency. Make sure it's interruptible otherwise it would
hang forever.