Pingfan Liu [Fri, 28 Aug 2020 13:39:57 +0000 (21:39 +0800)]
arm64/relocate_kernel: remove redundant code
Kernel startup entry point requires disabling MMU and D-cache.
As for kexec-reboot, taking a close look at "msr sctlr_el1, x12" in
__cpu_soft_restart as the following:
-1. booted at EL1
The instruction is enough to disable MMU and I/D cache for
EL1 regime.
-2. booted at EL2, using VHE
Access to SCTLR_EL1 is redirected to SCTLR_EL2 in EL2. So the instruction
is enough to disable MMU and clear I+C bits for EL2 regime.
-3. booted at EL2, not using VHE
The instruction itself can not affect EL2 regime. But The hyp-stub doesn't
enable the MMU and I/D cache for EL2 regime. And KVM also disable them for EL2
regime when its unloaded, or execute a HVC_SOFT_RESTART call. So when
kexec-reboot, the code in KVM has prepare the requirement.
As a conclusion, disabling MMU and clearing I+C bits in
SYM_CODE_START(arm64_relocate_new_kernel) is redundant, and can be removed
Signed-off-by: Pingfan Liu <kernelfans@gmail.com> Cc: James Morse <james.morse@arm.com> Cc: Geoff Levand <geoff@infradead.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Kees Cook <keescook@chromium.org> Cc: Remi Denis-Courmont <remi.denis.courmont@huawei.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: kvmarm@lists.cs.columbia.edu Link: https://lore.kernel.org/r/1598621998-20563-1-git-send-email-kernelfans@gmail.com
To: linux-arm-kernel@lists.infradead.org Signed-off-by: Will Deacon <will@kernel.org>
The macro was introduced by commit <ecf35a237a85> ("arm64: PTE/PMD
contiguous bit definition") at the beginning. It's only used by
commit <348a65cdcbbf> ("arm64: Mark kernel page ranges contiguous"),
which was reverted later by commit <667c27597ca8>. This makes the
macro unused.
This removes the unused macro (CONT_RANGE_OFFSET).
arm64/cpuinfo: Define HWCAP name arrays per their actual bit definitions
HWCAP name arrays (hwcap_str, compat_hwcap_str, compat_hwcap2_str) that are
scanned for /proc/cpuinfo are detached from their bit definitions making it
vulnerable and difficult to correlate. It is also bit problematic because
during /proc/cpuinfo dump these arrays get traversed sequentially assuming
they reflect and match actual HWCAP bit sequence, to test various features
for a given CPU. This redefines name arrays per their HWCAP bit definitions
. It also warns after detecting any feature which is not expected on arm64.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Brown <broonie@kernel.org> Cc: Dave Martin <Dave.Martin@arm.com> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599630535-29337-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
In certain page migration situations, a THP page can be migrated without
being split into it's constituent subpages. This saves time required to
split a THP and put it back together when required. But it also saves an
wider address range translation covered by a single TLB entry, reducing
future page fault costs.
A previous patch changed platform THP helpers per generic memory semantics,
clearing the path for THP migration support. This adds two more THP helpers
required to create PMD migration swap entries. Now enable THP migration via
ARCH_ENABLE_THP_MIGRATION.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599627183-14453-3-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
arm64/mm: Change THP helpers to comply with generic MM semantics
pmd_present() and pmd_trans_huge() are expected to behave in the following
manner during various phases of a given PMD. It is derived from a previous
detailed discussion on this topic [1] and present THP documentation [2].
pmd_present(pmd):
- Returns true if pmd refers to system RAM with a valid pmd_page(pmd)
- Returns false if pmd refers to a migration or swap entry
pmd_trans_huge(pmd):
- Returns true if pmd refers to system RAM and is a trans huge mapping
Once PMD_SECT_VALID gets cleared, it results in pmd_present() return false
on the PMD entry. It will need another bit apart from PMD_SECT_VALID to re-
affirm pmd_present() as true during the THP split process. To comply with
above mentioned semantics, pmd_trans_huge() should also check pmd_present()
first before testing presence of an actual transparent huge mapping.
The solution:
Ideally PMD_TYPE_SECT should have been used here instead. But it shares the
bit position with PMD_SECT_VALID which is used for THP invalidation. Hence
it will not be there for pmd_present() check after pmdp_invalidate().
A new software defined PMD_PRESENT_INVALID (bit 59) can be set on the PMD
entry during invalidation which can help pmd_present() return true and in
recognizing the fact that it still points to memory.
This bit is transient. During the split process it will be overridden by a
page table page representing normal pages in place of erstwhile huge page.
Other pmdp_invalidate() callers always write a fresh PMD value on the entry
overriding this transient PMD_PRESENT_INVALID bit, which makes it safe.
arm64: topology: Stop using MPIDR for topology information
In the absence of ACPI or DT topology data, we fallback to haphazardly
decoding *something* out of MPIDR. Sadly, the contents of that register are
mostly unusable due to the implementation leniancy and things like Aff0
having to be capped to 15 (despite being encoded on 8 bits).
Consider a simple system with a single package of 32 cores, all under the
same LLC. We ought to be shoving them in the same core_sibling mask, but
MPIDR is going to look like:
The scheduler's MC domain (all CPUs with same LLC) is going to be built via
arch_topology.c::cpu_coregroup_mask()
In there we try to figure out a sensible mask out of the topology
information we have. In short, here we'll pick the smallest of NUMA or
core sibling mask.
MC mask for CPU16 will thus be 16-19... Uh oh. CPUs 16-19 are in two
different unique MC spans, and the scheduler has no idea what to make of
that. That triggers the WARN_ON() added by commit
We could try to come up with some cleverer scheme to figure out which of
the available masks to pick, but really if one of those masks resulted from
MPIDR then it should be discarded because it's bound to be bogus.
I was hoping to give MPIDR a chance for SMT, to figure out which threads are
in the same core using Aff1-3 as core ID, but Sudeep and Robin pointed out
to me that there are systems out there where *all* cores have non-zero
values in their higher affinity fields (e.g. RK3288 has "5" in all of its
cores' MPIDR.Aff1), which would expose a bogus core ID to userspace.
Stop using MPIDR for topology information. When no other source of topology
information is available, mark each CPU as its own core and its NUMA node
as its LLC domain.
arm64/mm/ptdump: Add address markers for BPF regions
Kernel virtual region [BPF_JIT_REGION_START..BPF_JIT_REGION_END] is missing
from address_markers[], hence relevant page table entries are not displayed
with /sys/kernel/debug/kernel_page_tables. This adds those missing markers.
While here, also rename arch/arm64/mm/dump.c which sounds bit ambiguous, as
arch/arm64/mm/ptdump.c instead.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Steven Price <steven.price@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Link: https://lore.kernel.org/r/1599208259-11191-1-git-send-email-anshuman.khandual@arm.com Signed-off-by: Will Deacon <will@kernel.org>
Qi Liu [Fri, 4 Sep 2020 09:57:38 +0000 (17:57 +0800)]
arm64: perf: Remove unnecessary event_idx check
event_idx is obtained from armv8pmu_get_event_idx(), and this idx must be
between ARMV8_IDX_CYCLE_COUNTER and cpu_pmu->num_events. So it's unnecessary
to do this check. Let's remove it.
Ard Biesheuvel [Tue, 25 Aug 2020 13:54:40 +0000 (15:54 +0200)]
arm64: get rid of TEXT_OFFSET
TEXT_OFFSET serves no purpose, and for this reason, it was redefined
as 0x0 in the v5.8 timeframe. Since this does not appear to have caused
any issues that require us to revisit that decision, let's get rid of the
macro entirely, along with any references to it.
Zenghui Yu [Tue, 18 Aug 2020 06:36:25 +0000 (14:36 +0800)]
ACPI/IORT: Remove the unused inline functions
Since commit 8212688600ed ("ACPI/IORT: Fix build error when IOMMU_SUPPORT
is disabled"), iort_fwspec_iommu_ops() and iort_add_device_replay() are not
needed anymore when CONFIG_IOMMU_API is not selected. Let's remove them.
Zenghui Yu [Tue, 18 Aug 2020 06:36:24 +0000 (14:36 +0800)]
ACPI/IORT: Drop the unused @ops of iort_add_device_replay()
Since commit d2e1a003af56 ("ACPI/IORT: Don't call iommu_ops->add_device
directly"), we use the IOMMU core API to replace a direct invoke of the
specified callback. The parameter @ops has therefore became unused. Let's
drop it.
drivers/perf: hisi: Add missing include of linux/module.h
MODULE_*** is used in HiSilicon uncore PMU drivers and is provided by
linux/module.h, but the header file is not directly included. Add the
missing include.
Yue Hu [Tue, 4 Aug 2020 08:53:47 +0000 (16:53 +0800)]
arm64: traps: Add str of description to panic() in die()
Currently, there are different description strings in die() such as
die("Oops",,), die("Oops - BUG",,). And panic() called by die() will
always show "Fatal exception" or "Fatal exception in interrupt".
Note that panic() will run any panic handler via panic_notifier_list.
And the string above will be formatted and placed in static buf[]
which will be passed to handler.
So panic handler can not distinguish which Oops it is from the buf if
we want to do some things like reserve the string in memory or panic
statistics. It's not benefit to debug. We need to add more codes to
troubleshoot. Let's utilize existing resource to make debug much simpler.
Memory Tagging Extension (part of the ARMv8.5 Extensions) provides
a mechanism to detect the sources of memory related errors which
may be vulnerable to exploitation, including bounds violations,
use-after-free, use-after-return, use-out-of-scope and use before
initialization errors.
Add Memory Tagging Extension documentation for the arm64 linux
kernel support.
Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Co-developed-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Szabolcs Nagy <szabolcs.nagy@arm.com> Cc: Will Deacon <will@kernel.org>
Steven Price [Wed, 13 May 2020 15:37:51 +0000 (16:37 +0100)]
arm64: mte: Save tags when hibernating
When hibernating the contents of all pages in the system are written to
disk, however the MTE tags are not visible to the generic hibernation
code. So just before the hibernation image is created copy the tags out
of the physical tag storage into standard memory so they will be
included in the hibernation image. After hibernation apply the tags back
into the physical tag storage.
Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Will Deacon <will@kernel.org>
Steven Price [Wed, 13 May 2020 15:37:50 +0000 (16:37 +0100)]
arm64: mte: Enable swap of tagged pages
When swapping pages out to disk it is necessary to save any tags that
have been set, and restore when swapping back in. Make use of the new
page flag (PG_ARCH_2, locally named PG_mte_tagged) to identify pages
with tags. When swapping out these pages the tags are stored in memory
and later restored when the pages are brought back in. Because shmem can
swap pages back in without restoring the userspace PTE it is also
necessary to add a hook for shmem.
Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: move function prototypes to mte.h]
[catalin.marinas@arm.com: drop '_tags' from arch_swap_restore_tags()] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <will@kernel.org>
Steven Price [Wed, 13 May 2020 15:37:49 +0000 (16:37 +0100)]
mm: Add arch hooks for saving/restoring tags
Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to
every physical page, when swapping pages out to disk it is necessary to
save these tags, and later restore them when reading the pages back.
Add some hooks along with dummy implementations to enable the
arch code to handle this.
Three new hooks are added to the swap code:
* arch_prepare_to_swap() and
* arch_swap_invalidate_page() / arch_swap_invalidate_area().
One new hook is added to shmem:
* arch_swap_restore()
Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: add unlock_page() on the error path]
[catalin.marinas@arm.com: dropped the _tags suffix] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
fs: Handle intra-page faults in copy_mount_options()
The copy_mount_options() function takes a user pointer argument but no
size and it tries to read up to a PAGE_SIZE. However, copy_from_user()
is not guaranteed to return all the accessible bytes if, for example,
the access crosses a page boundary and gets a fault on the second page.
To work around this, the current copy_mount_options() implementation
performs two copy_from_user() passes, first to the end of the current
page and the second to what's left in the subsequent page.
On arm64 with MTE enabled, access to a user page may trigger a fault
after part of the buffer in a page has been copied (when the user
pointer tag, bits 56-59, no longer matches the allocation tag stored in
memory). Allow copy_mount_options() to handle such intra-page faults by
resorting to byte at a time copy in case of copy_from_user() failure.
Note that copy_from_user() handles the zeroing of the kernel buffer in
case of error.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Catalin Marinas [Mon, 30 Mar 2020 09:29:38 +0000 (10:29 +0100)]
arm64: mte: ptrace: Add PTRACE_{PEEK,POKE}MTETAGS support
Add support for bulk setting/getting of the MTE tags in a tracee's
address space at 'addr' in the ptrace() syscall prototype. 'data' points
to a struct iovec in the tracer's address space with iov_base
representing the address of a tracer's buffer of length iov_len. The
tags to be copied to/from the tracer's buffer are stored as one tag per
byte.
On successfully copying at least one tag, ptrace() returns 0 and updates
the tracer's iov_len with the number of tags copied. In case of error,
either -EIO or -EFAULT is returned, trying to follow the ptrace() man
page.
Note that the tag copying functions are not performance critical,
therefore they lack optimisations found in typical memory copy routines.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Alan Hayward <Alan.Hayward@arm.com> Cc: Luis Machado <luis.machado@linaro.org> Cc: Omair Javaid <omair.javaid@linaro.org>
Catalin Marinas [Tue, 10 Dec 2019 11:19:15 +0000 (11:19 +0000)]
arm64: mte: Allow user control of the generated random tags via prctl()
The IRG, ADDG and SUBG instructions insert a random tag in the resulting
address. Certain tags can be excluded via the GCR_EL1.Exclude bitmap
when, for example, the user wants a certain colour for freed buffers.
Since the GCR_EL1 register is not accessible at EL0, extend the
prctl(PR_SET_TAGGED_ADDR_CTRL) interface to include a 16-bit field in
the first argument for controlling which tags can be generated by the
above instruction (an include rather than exclude mask). Note that by
default all non-zero tags are excluded. This setting is per-thread.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
Catalin Marinas [Wed, 27 Nov 2019 10:30:15 +0000 (10:30 +0000)]
arm64: mte: Allow user control of the tag check mode via prctl()
By default, even if PROT_MTE is set on a memory range, there is no tag
check fault reporting (SIGSEGV). Introduce a set of option to the
exiting prctl(PR_SET_TAGGED_ADDR_CTRL) to allow user control of the tag
check fault mode:
PR_MTE_TCF_NONE - no reporting (default)
PR_MTE_TCF_SYNC - synchronous tag check fault reporting
PR_MTE_TCF_ASYNC - asynchronous tag check fault reporting
These options translate into the corresponding SCTLR_EL1.TCF0 bitfield,
context-switched by the kernel. Note that the kernel accesses to the
user address space (e.g. read() system call) are not checked if the user
thread tag checking mode is PR_MTE_TCF_NONE or PR_MTE_TCF_ASYNC. If the
tag checking mode is PR_MTE_TCF_SYNC, the kernel makes a best effort to
check its user address accesses, however it cannot always guarantee it.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
Catalin Marinas [Fri, 29 Nov 2019 12:45:08 +0000 (12:45 +0000)]
mm: Allow arm64 mmap(PROT_MTE) on RAM-based files
Since arm64 memory (allocation) tags can only be stored in RAM, mapping
files with PROT_MTE is not allowed by default. RAM-based files like
those in a tmpfs mount or memfd_create() can support memory tagging, so
update the vm_flags accordingly in shmem_mmap().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
Catalin Marinas [Fri, 9 Aug 2019 14:48:46 +0000 (15:48 +0100)]
arm64: mte: Validate the PROT_MTE request via arch_validate_flags()
Make use of the newly introduced arch_validate_flags() hook to
sanity-check the PROT_MTE request passed to mmap() and mprotect(). If
the mapping does not support MTE, these syscalls will return -EINVAL.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
Catalin Marinas [Mon, 25 Nov 2019 17:27:06 +0000 (17:27 +0000)]
mm: Introduce arch_validate_flags()
Similarly to arch_validate_prot() called from do_mprotect_pkey(), an
architecture may need to sanity-check the new vm_flags.
Define a dummy function always returning true. In addition to
do_mprotect_pkey(), also invoke it from mmap_region() prior to updating
vma->vm_page_prot to allow the architecture code to veto potentially
inconsistent vm_flags.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org>
Catalin Marinas [Wed, 27 Nov 2019 10:00:27 +0000 (10:00 +0000)]
arm64: mte: Add PROT_MTE support to mmap() and mprotect()
To enable tagging on a memory range, the user must explicitly opt in via
a new PROT_MTE flag passed to mmap() or mprotect(). Since this is a new
memory type in the AttrIndx field of a pte, simplify the or'ing of these
bits over the protection_map[] attributes by making MT_NORMAL index 0.
There are two conditions for arch_vm_get_page_prot() to return the
MT_NORMAL_TAGGED memory type: (1) the user requested it via PROT_MTE,
registered as VM_MTE in the vm_flags, and (2) the vma supports MTE,
decided during the mmap() call (only) and registered as VM_MTE_ALLOWED.
arch_calc_vm_prot_bits() is responsible for registering the user request
as VM_MTE. The newly introduced arch_calc_vm_flag_bits() sets
VM_MTE_ALLOWED if the mapping is MAP_ANONYMOUS. An MTE-capable
filesystem (RAM-based) may be able to set VM_MTE_ALLOWED during its
mmap() file ops call.
In addition, update VM_DATA_DEFAULT_FLAGS to allow mprotect(PROT_MTE) on
stack or brk area.
The Linux mmap() syscall currently ignores unknown PROT_* flags. In the
presence of MTE, an mmap(PROT_MTE) on a file which does not support MTE
will not report an error and the memory will not be mapped as Normal
Tagged. For consistency, mprotect(PROT_MTE) will not report an error
either if the memory range does not support MTE. Two subsequent patches
in the series will propose tightening of this behaviour.
Kevin Brodsky [Mon, 25 Nov 2019 17:27:06 +0000 (17:27 +0000)]
mm: Introduce arch_calc_vm_flag_bits()
Similarly to arch_calc_vm_prot_bits(), introduce a dummy
arch_calc_vm_flag_bits() invoked from calc_vm_flag_bits(). This macro
can be overridden by architectures to insert specific VM_* flags derived
from the mmap() MAP_* flags.
Signed-off-by: Kevin Brodsky <Kevin.Brodsky@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
When the Memory Tagging Extension is enabled, two pages are identical
only if both their data and tags are identical.
Make the generic memcmp_pages() a __weak function and add an
arm64-specific implementation which returns non-zero if any of the two
pages contain valid MTE tags (PG_mte_tagged set). There isn't much
benefit in comparing the tags of two pages since these are normally used
for heap allocations and likely to differ anyway.
When the Memory Tagging Extension is enabled, the tags need to be
preserved across page copy (e.g. for copy-on-write, page migration).
Introduce MTE-aware copy_{user_,}highpage() functions to copy tags to
the destination if the source page has the PG_mte_tagged flag set.
copy_user_page() does not need to handle tag copying since, with this
patch, it is only called by the DAX code where there is no source page
structure (and no source tags).
Catalin Marinas [Mon, 4 May 2020 13:42:36 +0000 (14:42 +0100)]
arm64: mte: Clear the tags when a page is mapped in user-space with PROT_MTE
Pages allocated by the kernel are not guaranteed to have the tags
zeroed, especially as the kernel does not (yet) use MTE itself. To
ensure the user can still access such pages when mapped into its address
space, clear the tags via set_pte_at(). A new page flag - PG_mte_tagged
(PG_arch_2) - is used to track pages with valid allocation tags.
Since the zero page is mapped as pte_special(), it won't be covered by
the above set_pte_at() mechanism. Clear its tags during early MTE
initialisation.
Co-developed-by: Steven Price <steven.price@arm.com> Signed-off-by: Steven Price <steven.price@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org>
mm: Preserve the PG_arch_2 flag in __split_huge_page_tail()
When a huge page is split into normal pages, part of the head page flags
are transferred to the tail pages. However, the PG_arch_* flags are not
part of the preserved set.
PG_arch_2 is used by the arm64 MTE support to mark pages that have valid
tags. The absence of such flag would cause the arm64 set_pte_at() to
clear the tags in order to avoid stale tags exposed to user or the
swapping out hooks to ignore the tags. Not preserving PG_arch_2 on huge
page splitting leads to tag corruption in the tail pages.
Preserve the newly added PG_arch_2 flag in __split_huge_page_tail().
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
Steven Price [Wed, 22 Apr 2020 14:25:27 +0000 (15:25 +0100)]
mm: Add PG_arch_2 page flag
For arm64 MTE support it is necessary to be able to mark pages that
contain user space visible tags that will need to be saved/restored e.g.
when swapped out.
To support this add a new arch specific flag (PG_arch_2). This flag is
only available on 64-bit architectures due to the limited number of
spare page flags on the 32-bit ones.
Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: use CONFIG_64BIT for guarding this new flag] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Andrew Morton <akpm@linux-foundation.org>
arm64: mte: Handle synchronous and asynchronous tag check faults
The Memory Tagging Extension has two modes of notifying a tag check
fault at EL0, configurable through the SCTLR_EL1.TCF0 field:
1. Synchronous raising of a Data Abort exception with DFSC 17.
2. Asynchronous setting of a cumulative bit in TFSRE0_EL1.
Add the exception handler for the synchronous exception and handling of
the asynchronous TFSRE0_EL1.TF0 bit setting via a new TIF flag in
do_notify_resume().
On a tag check failure in user-space, whether synchronous or
asynchronous, a SIGSEGV will be raised on the faulting thread.
Catalin Marinas [Wed, 26 Aug 2020 17:52:59 +0000 (18:52 +0100)]
arm64: kvm: mte: Hide the MTE CPUID information from the guests
KVM does not support MTE in guests yet, so clear the corresponding field
in the ID_AA64PFR1_EL1 register. In addition, inject an undefined
exception in the guest if it accesses one of the GCR_EL1, RGSR_EL1,
TFSR_EL1 or TFSRE0_EL1 registers. While the emulate_sys_reg() function
already injects an undefined exception, this patch prevents the
unnecessary printk.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Steven Price <steven.price@arm.com> Acked-by: Marc Zyngier <maz@kernel.org>
arm64: mte: CPU feature detection and initial sysreg configuration
Add the cpufeature and hwcap entries to detect the presence of MTE. Any
secondary CPU not supporting the feature, if detected on the boot CPU,
will be parked.
Add the minimum SCTLR_EL1 and HCR_EL2 bits for enabling MTE. The Normal
Tagged memory type is configured in MAIR_EL1 before the MMU is enabled
in order to avoid disrupting other CPUs in the CnP domain.
Catalin Marinas [Wed, 27 Nov 2019 09:51:13 +0000 (09:51 +0000)]
arm64: mte: Use Normal Tagged attributes for the linear map
Once user space is given access to tagged memory, the kernel must be
able to clear/save/restore tags visible to the user. This is done via
the linear mapping, therefore map it as such. The new MT_NORMAL_TAGGED
index for MAIR_EL1 is initially mapped as Normal memory and later
changed to Normal Tagged via the cpufeature infrastructure. From a
mismatched attribute aliases perspective, the Tagged memory is
considered a permission and it won't lead to undefined behaviour.
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Suzuki K Poulose <Suzuki.Poulose@arm.com>
Will Deacon [Mon, 22 Jun 2020 12:37:09 +0000 (13:37 +0100)]
arm64: vdso: Fix unusual formatting in *setup_additional_pages()
There's really no need to put every parameter on a new line when calling
a function with a long name, so reformat the *setup_additional_pages()
functions in the vDSO setup code to follow the usual conventions.
Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Will Deacon <will@kernel.org>
Linus Torvalds [Sun, 30 Aug 2020 22:53:44 +0000 (15:53 -0700)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- fix regression in af_alg that affects iwd
- restore polling delay in qat
- fix double free in ingenic on error path
- fix potential build failure in sa2ul due to missing Kconfig dependency
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: af_alg - Work around empty control messages without MSG_MORE
crypto: sa2ul - add Kconfig selects to fix build error
crypto: ingenic - Drop kfree for memory allocated with devm_kzalloc
crypto: qat - add delay before polling mailbox
Linus Torvalds [Sun, 30 Aug 2020 19:01:23 +0000 (12:01 -0700)]
Merge tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"Three interrupt related fixes for X86:
- Move disabling of the local APIC after invoking fixup_irqs() to
ensure that interrupts which are incoming are noted in the IRR and
not ignored.
- Unbreak affinity setting.
The rework of the entry code reused the regular exception entry
code for device interrupts. The vector number is pushed into the
errorcode slot on the stack which is then lifted into an argument
and set to -1 because that's regs->orig_ax which is used in quite
some places to check whether the entry came from a syscall.
But it was overlooked that orig_ax is used in the affinity cleanup
code to validate whether the interrupt has arrived on the new
target. It turned out that this vector check is pointless because
interrupts are never moved from one vector to another on the same
CPU. That check is a historical leftover from the time where x86
supported multi-CPU affinities, but not longer needed with the now
strict single CPU affinity. Famous last words ...
- Add a missing check for an empty cpumask into the matrix allocator.
The affinity change added a warning to catch the case where an
interrupt is moved on the same CPU to a different vector. This
triggers because a condition with an empty cpumask returns an
assignment from the allocator as the allocator uses for_each_cpu()
without checking the cpumask for being empty. The historical
inconsistent for_each_cpu() behaviour of ignoring the cpumask and
unconditionally claiming that CPU0 is in the mask struck again.
Sigh.
plus a new entry into the MAINTAINER file for the HPE/UV platform"
* tag 'x86-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
x86/irq: Unbreak interrupt affinity setting
x86/hotplug: Silence APIC only after all interrupts are migrated
MAINTAINERS: Add entry for HPE Superdome Flex (UV) maintainers
Linus Torvalds [Sun, 30 Aug 2020 18:56:54 +0000 (11:56 -0700)]
Merge tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"A set of fixes for interrupt chip drivers:
- Revert the platform driver conversion of interrupt chip drivers as
it turned out to create more problems than it solves.
- Fix a trivial typo in the new module helpers which made probing
reliably fail.
- Small fixes in the STM32 and MIPS Ingenic drivers
- The TI firmware rework which had badly managed dependencies and had
to wait post rc1"
* tag 'irq-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/ingenic: Leave parent IRQ unmasked on suspend
irqchip/stm32-exti: Avoid losing interrupts due to clearing pending bits by mistake
irqchip: Revert modular support for drivers using IRQCHIP_PLATFORM_DRIVER helperse
irqchip: Fix probing deferal when using IRQCHIP_PLATFORM_DRIVER helpers
arm64: dts: k3-am65: Update the RM resource types
arm64: dts: k3-am65: ti-sci-inta/intr: Update to latest bindings
arm64: dts: k3-j721e: ti-sci-inta/intr: Update to latest bindings
irqchip/ti-sci-inta: Add support for INTA directly connecting to GIC
irqchip/ti-sci-inta: Do not store TISCI device id in platform device id field
dt-bindings: irqchip: Convert ti, sci-inta bindings to yaml
dt-bindings: irqchip: ti, sci-inta: Update docs to support different parent.
irqchip/ti-sci-intr: Add support for INTR being a parent to INTR
dt-bindings: irqchip: Convert ti, sci-intr bindings to yaml
dt-bindings: irqchip: ti, sci-intr: Update bindings to drop the usage of gic as parent
firmware: ti_sci: Add support for getting resource with subtype
firmware: ti_sci: Drop unused structure ti_sci_rm_type_map
firmware: ti_sci: Drop the device id to resource type translation
Linus Torvalds [Sun, 30 Aug 2020 18:50:53 +0000 (11:50 -0700)]
Merge tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fix from Thomas Gleixner:
"A single fix for the scheduler:
- Make is_idle_task() __always_inline to prevent the compiler from
putting it out of line into the wrong section because it's used
inside noinstr sections"
* tag 'sched-urgent-2020-08-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched: Use __always_inline on is_idle_task()
Linus Torvalds [Sun, 30 Aug 2020 17:56:12 +0000 (10:56 -0700)]
Merge tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Revert our removal of PROT_SAO, at least one user expressed an
interest in using it on Power9. Instead don't allow it to be used in
guests unless enabled explicitly at compile time.
- A fix for a crash introduced by a recent change to FP handling.
- Revert a change to our idle code that left Power10 with no idle
support.
- One minor fix for the new scv system call path to set PPR.
- Fix a crash in our "generic" PMU if branch stack events were enabled.
- A fix for the IMC PMU, to correctly identify host kernel samples.
- The ADB_PMU powermac code was found to be incompatible with
VMAP_STACK, so make them incompatible in Kconfig until the code can
be fixed.
- A build fix in drivers/video/fbdev/controlfb.c, and a documentation
fix.
Thanks to Alexey Kardashevskiy, Athira Rajeev, Christophe Leroy,
Giuseppe Sacco, Madhavan Srinivasan, Milton Miller, Nicholas Piggin,
Pratik Rajesh Sampat, Randy Dunlap, Shawn Anastasio, Vaidyanathan
Srinivasan.
* tag 'powerpc-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/32s: Disable VMAP stack which CONFIG_ADB_PMU
Revert "powerpc/powernv/idle: Replace CPU feature check with PVR check"
powerpc/perf: Fix reading of MSR[HV/PR] bits in trace-imc
powerpc/perf: Fix crashes with generic_compat_pmu & BHRB
powerpc/64s: Fix crash in load_fp_state() due to fpexc_mode
powerpc/64s: scv entry should set PPR
Documentation/powerpc: fix malformed table in syscall64-abi
video: fbdev: controlfb: Fix build for COMPILE_TEST=y && PPC_PMAC=n
selftests/powerpc: Update PROT_SAO test to skip ISA 3.1
powerpc/64s: Disallow PROT_SAO in LPARs by default
Revert "powerpc/64s: Remove PROT_SAO support"
Linus Torvalds [Sun, 30 Aug 2020 17:51:03 +0000 (10:51 -0700)]
Merge tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Let's try this again... Here are some USB fixes for 5.9-rc3.
This differs from the previous pull request for this release in that
the usb gadget patch now does not break some systems, and actually
does what it was intended to do. Many thanks to Marek Szyprowski for
quickly noticing and testing the patch from Andy Shevchenko to resolve
this issue.
Additionally, some more new USB quirks have been added to get some new
devices to work properly based on user reports.
Other than that, the patches are all here, and they contain:
- usb gadget driver fixes
- xhci driver fixes
- typec fixes
- new quirks and ids
- fixes for USB patches that went into 5.9-rc1.
All of these have been tested in linux-next with no reported issues"
* tag 'usb-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits)
usb: storage: Add unusual_uas entry for Sony PSZ drives
USB: Ignore UAS for JMicron JMS567 ATA/ATAPI Bridge
usb: host: ohci-exynos: Fix error handling in exynos_ohci_probe()
USB: gadget: u_f: Unbreak offset calculation in VLAs
USB: quirks: Ignore duplicate endpoint on Sound Devices MixPre-D
usb: typec: tcpm: Fix Fix source hard reset response for TDA 2.3.1.1 and TDA 2.3.1.2 failures
USB: PHY: JZ4770: Fix static checker warning.
USB: gadget: f_ncm: add bounds checks to ncm_unwrap_ntb()
USB: gadget: u_f: add overflow checks to VLA macros
xhci: Always restore EP_SOFT_CLEAR_TOGGLE even if ep reset failed
xhci: Do warm-reset when both CAS and XDEV_RESUME are set
usb: host: xhci: fix ep context print mismatch in debugfs
usb: uas: Add quirk for PNY Pro Elite
tools: usb: move to tools buildsystem
USB: Fix device driver race
USB: Also match device drivers using the ->match vfunc
usb: host: xhci-tegra: fix tegra_xusb_get_phy()
usb: host: xhci-tegra: otg usb2/usb3 port init
usb: hcd: Fix use after free in usb_hcd_pci_remove()
usb: typec: ucsi: Hold con->lock for the entire duration of ucsi_register_port()
...
Linus Torvalds [Sun, 30 Aug 2020 17:47:23 +0000 (10:47 -0700)]
Merge tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fix from Borislav Petkov:
"A fix to properly clear ghes_edac driver state on driver remove so
that a subsequent load can probe the system properly (Shiju Jose)"
* tag 'edac_urgent_for_v5.9_rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Fix NULL pointer dereference in ghes_edac_register()
Thomas Gleixner [Sun, 30 Aug 2020 17:07:53 +0000 (19:07 +0200)]
genirq/matrix: Deal with the sillyness of for_each_cpu() on UP
Most of the CPU mask operations behave the same way, but for_each_cpu() and
it's variants ignore the cpumask argument and claim that CPU0 is always in
the mask. This is historical, inconsistent and annoying behaviour.
The matrix allocator uses for_each_cpu() and can be called on UP with an
empty cpumask. The calling code does not expect that this succeeds but
until commit e027fffff799 ("x86/irq: Unbreak interrupt affinity setting")
this went unnoticed. That commit added a WARN_ON() to catch cases which
move an interrupt from one vector to another on the same CPU. The warning
triggers on UP.
Add a check for the cpumask being empty to prevent this.
Fixes: 2f75d9e1c905 ("genirq: Implement bitmap matrix allocator") Reported-by: kernel test robot <rong.a.chen@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org
Linus Torvalds [Sat, 29 Aug 2020 20:50:56 +0000 (13:50 -0700)]
fsldma: fix very broken 32-bit ppc ioread64 functionality
Commit ef91bb196b0d ("kernel.h: Silence sparse warning in
lower_32_bits") caused new warnings to show in the fsldma driver, but
that commit was not to blame: it only exposed some very incorrect code
that tried to take the low 32 bits of an address.
That made no sense for multiple reasons, the most notable one being that
that code was intentionally limited to only 32-bit ppc builds, so "only
low 32 bits of an address" was completely nonsensical. There were no
high bits to mask off to begin with.
But even more importantly fropm a correctness standpoint, turning the
address into an integer then caused the subsequent address arithmetic to
be completely wrong too, and the "+1" actually incremented the address
by one, rather than by four.
Which again was incorrect, since the code was reading two 32-bit values
and trying to make a 64-bit end result of it all. Surprisingly, the
iowrite64() did not suffer from the same odd and incorrect model.
This code has never worked, but it's questionable whether anybody cared:
of the two users that actually read the 64-bit value (by way of some C
preprocessor hackery and eventually the 'get_cdar()' inline function),
one of them explicitly ignored the value, and the other one might just
happen to work despite the incorrect value being read.
This patch at least makes it not fail the build any more, and makes the
logic superficially sane. Whether it makes any difference to the code
_working_ or not shall remain a mystery.
Linus Torvalds [Sat, 29 Aug 2020 19:51:58 +0000 (12:51 -0700)]
Merge tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Vasily Gorbik:
- Disable preemption trace in percpu macros since the lockdep code
itself uses percpu variables now and it causes recursions.
- Fix kernel space 4-level paging broken by recent vmem rework.
* tag 's390-5.9-4' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/vmem: fix vmem_add_range for 4-level paging
s390: don't trace preemption in percpu macros
Linus Torvalds [Sat, 29 Aug 2020 19:44:30 +0000 (12:44 -0700)]
Merge tag 'for-linus-5.9-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two fixes for Xen: one needed for ongoing work to support virtio with
Xen, and one for a corner case in IRQ handling with Xen"
* tag 'for-linus-5.9-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
arm/xen: Add misuse warning to virt_to_gfn
xen/xenbus: Fix granting of vmalloc'd memory
XEN uses irqdesc::irq_data_common::handler_data to store a per interrupt XEN data pointer which contains XEN specific information.
Linus Torvalds [Sat, 29 Aug 2020 19:37:00 +0000 (12:37 -0700)]
Merge tag 'hwmon-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull hwmon fixes from Guenter Roeck:
- Fix tempeerature scale in gsc-hwmon driver
- Fix divide by 0 error in nct7904 driver
- Drop non-existing attribute from pmbus/isl68137 driver
- Fix status check in applesmc driver
* tag 'hwmon-for-v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
hwmon: (gsc-hwmon) Scale temperature to millidegrees
hwmon: (applesmc) check status earlier.
hwmon: (nct7904) Correct divide by 0
hwmon: (pmbus/isl68137) remove READ_TEMPERATURE_1 telemetry for RAA228228
Linus Torvalds [Fri, 28 Aug 2020 23:38:29 +0000 (16:38 -0700)]
Merge tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- nbd timeout fix (Hou)
- device size fix for loop LOOP_CONFIGURE (Martijn)
- MD pull from Song with raid5 stripe size fix (Yufen)
* tag 'block-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
md/raid5: make sure stripe_size as power of two
loop: Set correct device size when using LOOP_CONFIGURE
nbd: restore default timeout when setting it to zero
Linus Torvalds [Fri, 28 Aug 2020 23:23:16 +0000 (16:23 -0700)]
Merge tag 'io_uring-5.9-2020-08-28' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
"A few fixes in here, all based on reports and test cases from folks
using it. Most of it is stable material as well:
- Hashed work cancelation fix (Pavel)
- poll wakeup signalfd fix
- memlock accounting fix
- nonblocking poll retry fix
- ensure we never return -ERESTARTSYS for reads
- ensure offset == -1 is consistent with preadv2() as documented
- IOPOLL -EAGAIN handling fixes
- remove useless task_work bounce for block based -EAGAIN retry"
* tag 'io_uring-5.9-2020-08-28' of git://git.kernel.dk/linux-block:
io_uring: don't bounce block based -EAGAIN retry off task_work
io_uring: fix IOPOLL -EAGAIN retries
io_uring: clear req->result on IOPOLL re-issue
io_uring: make offset == -1 consistent with preadv2/pwritev2
io_uring: ensure read requests go through -ERESTART* transformation
io_uring: don't use poll handler if file can't be nonblocking read/written
io_uring: fix imbalanced sqo_mm accounting
io_uring: revert consumed iov_iter bytes on error
io-wq: fix hang after cancelling pending hashed work
io_uring: don't recurse on tsk->sighand->siglock with signalfd
Linus Torvalds [Fri, 28 Aug 2020 20:23:31 +0000 (13:23 -0700)]
Merge tag 'devprop-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull device properties framework fix from Rafael Wysocki:
"Prevent the promotion of the secondary firmware node of a device to
the primary one from leaking a pointer (Heikki Krogerus)"
* tag 'devprop-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
device property: Fix the secondary firmware node handling in set_primary_fwnode()
Linus Torvalds [Fri, 28 Aug 2020 20:17:14 +0000 (13:17 -0700)]
Merge tag 'acpi-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These fix two recent issues in the ACPI memory mappings management
code and tighten up error handling in the ACPI driver for AMD SoCs
(APD).
Specifics:
- Avoid redundant rounding to the page size in acpi_os_map_iomem() to
address a recently introduced issue with the EFI memory map
permission check on ARM64 (Ard Biesheuvel).
- Fix acpi_release_memory() to wait until the memory mappings
released by it have been really unmapped (Rafael Wysocki).
- Make the ACPI driver for AMD SoCs (APD) check the return value of
acpi_dev_get_property() to avoid failures in the cases when the
device property under inspection is missing (Furquan Shaikh)"
* tag 'acpi-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: OSL: Prevent acpi_release_memory() from returning too early
ACPI: ioremap: avoid redundant rounding to OS page size
ACPI: SoC: APD: Check return value of acpi_dev_get_property()
Linus Torvalds [Fri, 28 Aug 2020 20:12:09 +0000 (13:12 -0700)]
Merge tag 'pm-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix the recently added Tegra194 cpufreq driver and the handling
of devices using runtime PM during system-wide suspend, improve the
intel_pstate driver documentation and clean up the cpufreq core.
Specifics:
- Make the recently added Tegra194 cpufreq driver use
read_cpuid_mpir() instead of cpu_logical_map() to avoid exporting
logical_cpu_map (Sumit Gupta).
- Drop the automatic system wakeup event reporting for devices with
pending runtime-resume requests during system-wide suspend to avoid
spurious aborts of the suspend flow (Rafael Wysocki).
- Fix build warning in the intel_pstate driver documentation and
improve the wording in there (Randy Dunlap).
- Clean up two pieces of code in the cpufreq core (Viresh Kumar)"
* tag 'pm-5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: Use WARN_ON_ONCE() for invalid relation
cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()
PM: sleep: core: Fix the handling of pending runtime resume requests
Documentation: fix pm/intel_pstate build warning and wording
cpufreq: replace cpu_logical_map() with read_cpuid_mpir()
* pm-cpufreq:
cpufreq: Use WARN_ON_ONCE() for invalid relation
cpufreq: No need to verify cpufreq_driver in show_scaling_cur_freq()
Documentation: fix pm/intel_pstate build warning and wording
cpufreq: replace cpu_logical_map() with read_cpuid_mpir()
Linus Torvalds [Fri, 28 Aug 2020 18:37:33 +0000 (11:37 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Fix kernel build with the integrated LLVM assembler which doesn't see
the -Wa,-march option.
- Fix "make vdso_install" when COMPAT_VDSO is disabled.
- Make KVM more robust if the AT S1E1R instruction triggers an
exception (architecture corner cases).
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
KVM: arm64: Survive synchronous exceptions caused by AT instructions
KVM: arm64: Add kvm_extable for vaxorcism code
arm64: vdso32: make vdso32 install conditional
arm64: use a common .arch preamble for inline assembly
Herbert Xu [Fri, 28 Aug 2020 07:11:25 +0000 (17:11 +1000)]
kernel.h: Silence sparse warning in lower_32_bits
I keep getting sparse warnings in crypto such as:
CHECK drivers/crypto/ccree/cc_hash.c
drivers/crypto/ccree/cc_hash.c:49:9: warning: cast truncates bits from constant value (47b5481dbefa4fa4 becomes befa4fa4)
drivers/crypto/ccree/cc_hash.c:49:26: warning: cast truncates bits from constant value (db0c2e0d64f98fa7 becomes 64f98fa7)
[.. many more ..]
This patch removes the warning by adding a mask to keep sparse
happy.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 28 Aug 2020 17:57:14 +0000 (10:57 -0700)]
Merge tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs
Pull writeback fixes from Jan Kara:
"Fixes for writeback code occasionally skipping writeback of some
inodes or livelocking sync(2)"
* tag 'writeback_for_v5.9-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs:
writeback: Drop I_DIRTY_TIME_EXPIRE
writeback: Fix sync livelock due to b_dirty_time processing
writeback: Avoid skipping inode writeback
writeback: Protect inode->i_io_list with inode->i_lock
Linus Torvalds [Fri, 28 Aug 2020 17:41:00 +0000 (10:41 -0700)]
Merge tag 'gfs2-v5.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 fix from Andreas Gruenbacher:
"Fix a memory leak on filesystem withdraw.
We didn't detect this bug because we have slab merging on by default
(CONFIG_SLAB_MERGE_DEFAULT). Adding 'slub_nomerge' to the kernel
command line exposed the problem"
* tag 'gfs2-v5.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: add some much needed cleanup for log flushes that fail
Linus Torvalds [Fri, 28 Aug 2020 17:33:04 +0000 (10:33 -0700)]
Merge tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"We have an inode number handling change, prompted by s390x which is a
64-bit architecture with a 32-bit ino_t, a patch to disallow leases to
avoid potential data integrity issues when CephFS is re-exported via
NFS or CIFS and a fix for the bulk of W=1 compilation warnings"
* tag 'ceph-for-5.9-rc3' of git://github.com/ceph/ceph-client:
ceph: don't allow setlease on cephfs
ceph: fix inode number handling on arches with 32-bit ino_t
libceph: add __maybe_unused to DEFINE_CEPH_FEATURE
Paulo Alcantara [Thu, 27 Aug 2020 14:20:19 +0000 (11:20 -0300)]
cifs: fix check of tcon dfs in smb1
For SMB1, the DFS flag should be checked against tcon->Flags rather
than tcon->share_flags. While at it, add an is_tcon_dfs() helper to
check for DFS capability in a more generic way.
Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Reviewed-by: Shyam Prasad N <nspmangalore@gmail.com>
Linus Torvalds [Fri, 28 Aug 2020 17:15:33 +0000 (10:15 -0700)]
Merge tag 'mfd-fixes-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd
Pull MFD fixes from Lee Jones:
- fix double free
- handle devicetree disabled devices gracefully
* tag 'mfd-fixes-5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd:
mfd: mfd-core: Ensure disabled devices are ignored without error
mfd: core: Fix double-free in mfd_remove_devices_fn()
Linus Torvalds [Fri, 28 Aug 2020 16:46:48 +0000 (09:46 -0700)]
Merge tag 'drm-fixes-2020-08-28' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"As expected a bit of an rc3 uptick, amdgpu and msm are the main ones,
one msm patch was from the merge window, but had dependencies and we
dropped it until the other tree had landed. Otherwise it's a couple of
fixes for core, and etnaviv, and single i915, exynos, omap fixes.
I'm still tracking the Sandybridge gpu relocations issue, if we don't
see much movement I might just queue up the reverts. I'll talk to
Daniel next week once he's back from holidays.
core:
- Take modeset bkl for legacy drivers
dp_mst:
- Allow null crtc in dp_mst
i915:
- Fix command parser desc matching with masks
amdgpu:
- Misc display fixes
- Backlight fixes
- MPO fix for DCN1
- Fixes for Sienna Cichlid
- Fixes for Navy Flounder
- Vega SW CTF fixes
- SMU fix for Raven
- Fix a possible overflow in INFO ioctl
- Gfx10 clockgating fix
exynos:
- Just drop __iommu annotation to fix sparse warning
omap:
- locking state fix"
* tag 'drm-fixes-2020-08-28' of git://anongit.freedesktop.org/drm/drm: (41 commits)
drm/amd/display: Fix memleak in amdgpu_dm_mode_config_init
drm/amdgpu: disable runtime pm for navy_flounder
drm/amd/display: Retry AUX write when fail occurs
drm/amdgpu: Fix buffer overflow in INFO ioctl
drm/amd/powerplay: Fix hardmins not being sent to SMU for RV
drm/amdgpu: use MODE1 reset for navy_flounder by default
drm/amd/pm: correct the thermal alert temperature limit settings
drm/amdgpu: add asd fw check before loading asd
drm/amd/display: Keep current gain when ABM disable immediately
drm/amd/display: Fix passive dongle mistaken as active dongle in EDID emulation
drm/amd/display: Revert HDCP disable sequence change
drm/amd/display: Send DISPLAY_OFF after power down on boot
drm/amdgpu/gfx10: refine mgcg setting
drm/amd/pm: correct Vega20 swctf limit setting
drm/amd/pm: correct Vega12 swctf limit setting
drm/amd/pm: correct Vega10 swctf limit setting
drm/amd/pm: set VCN pg per instances
drm/amd/pm: enable run_btc callback for sienna_cichlid
drivers: gpu: amd: Initialize amdgpu_dm_backlight_caps object to 0 in amdgpu_dm_update_backlight_caps
drm/amd/display: Reject overlay plane configurations in multi-display scenarios
...
James Morse [Fri, 21 Aug 2020 14:07:07 +0000 (15:07 +0100)]
KVM: arm64: Set HCR_EL2.PTW to prevent AT taking synchronous exception
AT instructions do a translation table walk and return the result, or
the fault in PAR_EL1. KVM uses these to find the IPA when the value is
not provided by the CPU in HPFAR_EL1.
If a translation table walk causes an external abort it is taken as an
exception, even if it was due to an AT instruction. (DDI0487F.a's D5.2.11
"Synchronous faults generated by address translation instructions")
While we previously made KVM resilient to exceptions taken due to AT
instructions, the device access causes mismatched attributes, and may
occur speculatively. Prevent this, by forbidding a walk through memory
described as device at stage2. Now such AT instructions will report a
stage2 fault.
Such a fault will cause KVM to restart the guest. If the AT instructions
always walk the page tables, but guest execution uses the translation cached
in the TLB, the guest can't make forward progress until the TLB entry is
evicted. This isn't a problem, as since commit 5dcd0fdbb492 ("KVM: arm64:
Defer guest entry when an asynchronous exception is pending"), KVM will
return to the host to process IRQs allowing the rest of the system to keep
running.
Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
James Morse [Fri, 21 Aug 2020 14:07:06 +0000 (15:07 +0100)]
KVM: arm64: Survive synchronous exceptions caused by AT instructions
KVM doesn't expect any synchronous exceptions when executing, any such
exception leads to a panic(). AT instructions access the guest page
tables, and can cause a synchronous external abort to be taken.
The arm-arm is unclear on what should happen if the guest has configured
the hardware update of the access-flag, and a memory type in TCR_EL1 that
does not support atomic operations. B2.2.6 "Possible implementation
restrictions on using atomic instructions" from DDI0487F.a lists
synchronous external abort as a possible behaviour of atomic instructions
that target memory that isn't writeback cacheable, but the page table
walker may behave differently.
Make KVM robust to synchronous exceptions caused by AT instructions.
Add a get_user() style helper for AT instructions that returns -EFAULT
if an exception was generated.
While KVM's version of the exception table mixes synchronous and
asynchronous exceptions, only one of these can occur at each location.
Re-enter the guest when the AT instructions take an exception on the
assumption the guest will take the same exception. This isn't guaranteed
to make forward progress, as the AT instructions may always walk the page
tables, but guest execution may use the translation cached in the TLB.
This isn't a problem, as since commit 5dcd0fdbb492 ("KVM: arm64: Defer guest
entry when an asynchronous exception is pending"), KVM will return to the
host to process IRQs allowing the rest of the system to keep running.
Cc: stable@vger.kernel.org # <v5.3: 5dcd0fdbb492 ("KVM: arm64: Defer guest entry when an asynchronous exception is pending") Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
James Morse [Fri, 21 Aug 2020 14:07:05 +0000 (15:07 +0100)]
KVM: arm64: Add kvm_extable for vaxorcism code
KVM has a one instruction window where it will allow an SError exception
to be consumed by the hypervisor without treating it as a hypervisor bug.
This is used to consume asynchronous external abort that were caused by
the guest.
As we are about to add another location that survives unexpected exceptions,
generalise this code to make it behave like the host's extable.
KVM's version has to be mapped to EL2 to be accessible on nVHE systems.
The SError vaxorcism code is a one instruction window, so has two entries
in the extable. Because the KVM code is copied for VHE and nVHE, we end up
with four entries, half of which correspond with code that isn't mapped.
Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
vdso32 should only be installed if CONFIG_COMPAT_VDSO is enabled,
since it's not even supposed to be compiled otherwise, and arm64
builds without a 32bit crosscompiler will fail.
Fixes: 8d75785a8142 ("ARM64: vdso32: Install vdso32 from vdso_install") Signed-off-by: Frank van der Linden <fllinden@amazon.com> Cc: stable@vger.kernel.org [5.4+] Link: https://lore.kernel.org/r/20200827234012.19757-1-fllinden@amazon.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Sami Tolvanen [Thu, 27 Aug 2020 20:36:08 +0000 (13:36 -0700)]
arm64: use a common .arch preamble for inline assembly
Commit 7c78f67e9bd9 ("arm64: enable tlbi range instructions") breaks
LLVM's integrated assembler, because -Wa,-march is only passed to
external assemblers and therefore, the new instructions are not enabled
when IAS is used.
This change adds a common architecture version preamble, which can be
used in inline assembly blocks that contain instructions that require
a newer architecture version, and uses it to fix __TLBI_0 and __TLBI_1
with ARM64_TLB_RANGE.
Lee Jones [Wed, 19 Aug 2020 08:16:50 +0000 (09:16 +0100)]
mfd: mfd-core: Ensure disabled devices are ignored without error
Commit e49aa9a9bd22 ("mfd: core: Make a best effort attempt to match
devices with the correct of_nodes") changed the semantics for disabled
devices in mfd_add_device(). Instead of silently ignoring a disabled
child device, an error was returned. On receipt of the error
mfd_add_devices() the precedes to remove *all* child devices and
returns an all-failed error to the caller, which will inevitably fail
the parent device as well.
This patch reverts back to the old semantics and ignores child devices
which are disabled in Device Tree.
Fixes: e49aa9a9bd22 ("mfd: core: Make a best effort attempt to match devices with the correct of_nodes") Reported-by: Icenowy Zheng <icenowy@aosc.io> Tested-by: Icenowy Zheng <icenowy@aosc.io> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Alan Stern [Wed, 26 Aug 2020 14:32:29 +0000 (10:32 -0400)]
usb: storage: Add unusual_uas entry for Sony PSZ drives
The PSZ-HA* family of USB disk drives from Sony can't handle the
REPORT OPCODES command when using the UAS protocol. This patch adds
an appropriate quirks entry.
Reported-and-tested-by: Till Dörges <doerges@pre-sense.de> Signed-off-by: Alan Stern <stern@rowland.harvard.edu> CC: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20200826143229.GB400430@rowland.harvard.edu Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Yufen Yu [Thu, 20 Aug 2020 13:22:05 +0000 (09:22 -0400)]
md/raid5: make sure stripe_size as power of two
Commit 3b5408b98e4d ("md/raid5: support config stripe_size by sysfs
entry") make stripe_size as a configurable value. It just requires
stripe_size as multiple of 4KB.
In fact, we should make sure stripe_size as power of two. Otherwise,
stripe_shift which is the result of ilog2 can not represent the real
stripe_size. Then, stripe_hash() and stripe_hash_locks_hash() may
get unexpected value.
Fixes: 3b5408b98e4d ("md/raid5: support config stripe_size by sysfs entry") Signed-off-by: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Jens Axboe [Thu, 27 Aug 2020 22:46:24 +0000 (16:46 -0600)]
io_uring: don't bounce block based -EAGAIN retry off task_work
These events happen inline from submission, so there's no need to
bounce them through the original task. Just set them up for retry
and issue retry directly instead of going over task_work.
Jens Axboe [Thu, 27 Aug 2020 22:40:19 +0000 (16:40 -0600)]
io_uring: fix IOPOLL -EAGAIN retries
This normally isn't hit, as polling is mostly done on NVMe with deep
queue depths. But if we do run into request starvation, we need to
ensure that retries are properly serialized.
Reported-by: Andres Freund <andres@anarazel.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>