Jeremy Linton [Wed, 26 Jun 2019 21:37:17 +0000 (16:37 -0500)]
arm_pmu: acpi: spe: Add initial MADT/SPE probing
ACPI 6.3 adds additional fields to the MADT GICC
structure to describe SPE PPI's. We pick these out
of the cached reference to the madt_gicc structure
similarly to the core PMU code. We then create a platform
device referring to the IRQ and let the user/module loader
decide whether to load the SPE driver.
Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Jeremy Linton [Wed, 26 Jun 2019 21:37:16 +0000 (16:37 -0500)]
ACPI/PPTT: Add function to return ACPI 6.3 Identical tokens
ACPI 6.3 adds a flag to indicate that child nodes are all
identical cores. This is useful to authoritatively determine
if a set of (possibly offline) cores are identical or not.
Since the flag doesn't give us a unique id we can generate
one and use it to create bitmaps of sibling nodes, or simply
in a loop to determine if a subset of cores are identical.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
Jeremy Linton [Wed, 26 Jun 2019 21:37:15 +0000 (16:37 -0500)]
ACPI/PPTT: Modify node flag detection to find last IDENTICAL
The ACPI specification implies that the IDENTICAL flag should be
set on all non leaf nodes where the children are identical.
This means that we need to be searching for the last node with
the identical flag set rather than the first one.
Since this flag is also dependent on the table revision, we
need to add a bit of extra code to verify the table revision,
and the next node's state in the traversal. Since we want to
avoid function pointers here, lets just special case
the IDENTICAL flag.
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: Hanjun Guo <hanjun.guo@linaro.org> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Will Deacon <will@kernel.org>
These probably meant to identify non present entries which can now be
achieved with PMD_SECT_VALID or PTE_VALID bits. Hence just drop these
unused symbols which are not required anymore.
Cc: Will Deacon <will.deacon@arm.com> Cc: Steve Capper <steve.capper@arm.com> Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Aaro Koskinen [Mon, 17 Jun 2019 20:35:19 +0000 (23:35 +0300)]
arm64: Implement panic_smp_self_stop()
Currently arm64 uses the default implementation of panic_smp_self_stop()
where the CPU runs in a cpu_relax() loop unable to receive IPIs anymore.
As a result, when two CPUs panic() simultaneously we get "SMP: failed to
stop secondary CPUs" warnings and extra delays before a reset, because
smp_send_stop() still tries to stop the other paniced CPU.
Provide an implementation of panic_smp_self_stop() that is identical to
the IPI CPU stop handler, so that the online status of stopped CPUs gets
properly updated.
Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Jayachandran C [Mon, 17 Jun 2019 20:35:18 +0000 (23:35 +0300)]
arm64: Improve parking of stopped CPUs
The current code puts the stopped cpus in an 'yield' instruction loop.
Using a busy loop here is unnecessary, we can use the cpu_park_loop()
function here to do a wfi/wfe.
Signed-off-by: Jayachandran C <jnair@caviumnetworks.com> Signed-off-by: Aaro Koskinen <aaro.koskinen@nokia.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Jun 2019 18:10:55 +0000 (19:10 +0100)]
arm64: Expose FRINT capabilities to userspace
ARMv8.5 introduces the FRINT series of instructions for rounding floating
point numbers to integers. Provide a capability to userspace in order to
allow applications to determine if the system supports these instructions.
Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Mark Brown [Tue, 18 Jun 2019 18:10:54 +0000 (19:10 +0100)]
arm64: Expose ARMv8.5 CondM capability to userspace
ARMv8.5 adds new instructions XAFLAG and AXFLAG to translate the
representation of the results of floating point comparisons between the
native ARM format and an alternative format used by some software. Add
a hwcap allowing userspace to determine if they are present, since we
referred to earlier CondM extensions as FLAGM call these extensions
FLAGM2.
Signed-off-by: Mark Brown <broonie@kernel.org> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Ard Biesheuvel [Thu, 23 May 2019 10:22:56 +0000 (11:22 +0100)]
arm64: bpf: do not allocate executable memory
The BPF code now takes care of mapping the code pages executable
after mapping them read-only, to ensure that no RWX mapped regions
are needed, even transiently. This means we can drop the executable
permissions from the mapping at allocation time.
Ard Biesheuvel [Thu, 23 May 2019 10:22:55 +0000 (11:22 +0100)]
arm64/kprobes: set VM_FLUSH_RESET_PERMS on kprobe instruction pages
In order to avoid transient inconsistencies where freed code pages
are remapped writable while stale TLB entries still exist on other
cores, mark the kprobes text pages with the VM_FLUSH_RESET_PERMS
attribute. This instructs the core vmalloc code not to defer the
TLB flush when this region is unmapped and returned to the page
allocator.
Ard Biesheuvel [Thu, 23 May 2019 10:22:53 +0000 (11:22 +0100)]
arm64: module: create module allocations without exec permissions
Now that the core code manages the executable permissions of code
regions of modules explicitly, it is no longer necessary to create
the module vmalloc regions with RWX permissions, and we can create
them with RW- permissions instead, which is preferred from a
security perspective.
Florian Fainelli [Mon, 17 Jun 2019 22:29:59 +0000 (15:29 -0700)]
arm64: Allow user selection of ARM64_MODULE_PLTS
Make ARM64_MODULE_PLTS a selectable Kconfig symbol, since some people
might have very big modules spilling out of the dedicated module area
into vmalloc. Help text is copied from the ARM 32-bit counterpart and
modified to a mention of KASLR and specific ARM errata workaround(s).
Ard Biesheuvel [Wed, 19 Jun 2019 12:18:31 +0000 (14:18 +0200)]
acpi/arm64: ignore 5.1 FADTs that are reported as 5.0
Some Qualcomm Snapdragon based laptops built to run Microsoft Windows
are clearly ACPI 5.1 based, given that that is the first ACPI revision
that supports ARM, and introduced the FADT 'arm_boot_flags' field,
which has a non-zero field on those systems.
So in these cases, infer from the ARM boot flags that the FADT must be
5.1 or later, and treat it as 5.1.
Acked-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Lee Jones <lee.jones@linaro.org> Reviewed-by: Graeme Gregory <graeme.gregory@linaro.org> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com> Acked-by: Hanjun Guo <guohanjun@huawei.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
The reason is init_gic_priority_masking() may unmask PSR.I while the
irq stacks are not inited yet. Some "NMI" could be raised unfortunately
and it will just go into this exception.
In this patch, we just write the PMR in smp_prepare_boot_cpu(), and delay
unmasking PSR.I after irq stacks inited in init_IRQ().
Fixes: e79321883842 ("arm64: Switch to PMR masking when starting CPUs") Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Wei Li <liwei391@huawei.com>
[JT: make init_gic_priority_masking() not modify daif, rebase on other
priority masking fixes] Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Thierry [Tue, 11 Jun 2019 09:38:11 +0000 (10:38 +0100)]
arm64: irqflags: Introduce explicit debugging for IRQ priorities
Using IRQ priority masking to enable/disable interrupts is a bit
sensitive as it requires to deal with both ICC_PMR_EL1 and PSR.I.
Introduce some validity checks to both highlight the states in which
functions dealing with IRQ enabling/disabling can (not) be called, and
bark a warning when called in an unexpected state.
Since these checks are done on hotpaths, introduce a build option to
choose whether to do the checking.
Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Thierry [Tue, 11 Jun 2019 09:38:10 +0000 (10:38 +0100)]
arm64: Fix incorrect irqflag restore for priority masking
When using IRQ priority masking to disable interrupts, in order to deal
with the PSR.I state, local_irq_save() would convert the I bit into a
PMR value (GIC_PRIO_IRQOFF). This resulted in local_irq_restore()
potentially modifying the value of PMR in undesired location due to the
state of PSR.I upon flag saving [1].
In an attempt to solve this issue in a less hackish manner, introduce
a bit (GIC_PRIO_IGNORE_PMR) for the PMR values that can represent
whether PSR.I is being used to disable interrupts, in which case it
takes precedence of the status of interrupt masking via PMR.
GIC_PRIO_PSR_I_SET is chosen such that (<pmr_value> |
GIC_PRIO_PSR_I_SET) does not mask more interrupts than <pmr_value> as
some sections (e.g. arch_cpu_idle(), interrupt acknowledge path)
requires PMR not to mask interrupts that could be signaled to the
CPU when using only PSR.I.
Julien Thierry [Tue, 11 Jun 2019 09:38:09 +0000 (10:38 +0100)]
arm64: Fix interrupt tracing in the presence of NMIs
In the presence of any form of instrumentation, nmi_enter() should be
done before calling any traceable code and any instrumentation code.
Currently, nmi_enter() is done in handle_domain_nmi(), which is much
too late as instrumentation code might get called before. Move the
nmi_enter/exit() calls to the arch IRQ vector handler.
On arm64, it is not possible to know if the IRQ vector handler was
called because of an NMI before acknowledging the interrupt. However, It
is possible to know whether normal interrupts could be taken in the
interrupted context (i.e. if taking an NMI in that context could
introduce a potential race condition).
When interrupting a context with IRQs disabled, call nmi_enter() as soon
as possible. In contexts with IRQs enabled, defer this to the interrupt
controller, which is in a better position to know if an interrupt taken
is an NMI.
Fixes: bc3c03ccb464 ("arm64: Enable the support of pseudo-NMIs") Cc: <stable@vger.kernel.org> # 5.1.x- Cc: Will Deacon <will.deacon@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jason Cooper <jason@lakedaemon.net> Cc: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Thierry [Tue, 11 Jun 2019 09:38:06 +0000 (10:38 +0100)]
arm64: Do not enable IRQs for ct_user_exit
For el0_dbg and el0_error, DAIF bits get explicitly cleared before
calling ct_user_exit.
When context tracking is disabled, DAIF gets set (almost) immediately
after. When context tracking is enabled, among the first things done
is disabling IRQs.
What is actually needed is:
- PSR.D = 0 so the system can be debugged (should be already the case)
- PSR.A = 0 so async error can be handled during context tracking
Do not clear PSR.I in those two locations.
Reviewed-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: James Morse <james.morse@arm.com> Cc: Will Deacon <will.deacon@arm.com> Signed-off-by: Julien Thierry <julien.thierry@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Masayoshi Mizuma [Fri, 14 Jun 2019 13:11:41 +0000 (09:11 -0400)]
arm64/mm: Correct the cache line size warning with non coherent device
If the cache line size is greater than ARCH_DMA_MINALIGN (128),
the warning shows and it's tainted as TAINT_CPU_OUT_OF_SPEC.
However, it's not good because as discussed in the thread [1], the cpu
cache line size will be problem only on non-coherent devices.
Since the coherent flag is already introduced to struct device,
show the warning only if the device is non-coherent device and
ARCH_DMA_MINALIGN is smaller than the cpu cache size.
'default n' is the default value for any bool or tristate Kconfig
setting so there is no need to write it explicitly.
Also since commit f467c5640c29 ("kconfig: only write '# CONFIG_FOO
is not set' for visible symbols") the Kconfig behavior is the same
regardless of 'default n' being present or not:
...
One side effect of (and the main motivation for) this change is making
the following two definitions behave exactly the same:
config FOO
bool
config FOO
bool
default n
With this change, neither of these will generate a
'# CONFIG_FOO is not set' line (assuming FOO isn't selected/implied).
That might make it clearer to people that a bare 'default n' is
redundant.
...
Frank Li [Wed, 1 May 2019 18:43:29 +0000 (18:43 +0000)]
drivers/perf: imx_ddr: Add DDR performance counter support to perf
Add DDR performance monitor support for iMX8QXP. The PMU consists of 3
programmable event counters and a single dedicated cycle counter.
Example usage:
$ perf stat -a -e \
imx8_ddr0/read-cycles/,imx8_ddr0/write-cycles/,imx8_ddr0/precharge/ ls
- or -
$ perf stat -a -e \
imx8_ddr0/cycles/,imx8_ddr0/read-access/,imx8_ddr0/write-access/ ls
Other events are supported, and advertised via perf list.
Reviewed-by: Andrey Smirnov <andrew.smirnov@gmail.com> Signed-off-by: Frank Li <Frank.Li@nxp.com>
[will: rewrote commit message/kconfig and used #defines for dev/cpuhp names] Signed-off-by: Will Deacon <will.deacon@arm.com>
Add binding documentation for the imx8qxp DDR performance monitor unit.
Signed-off-by: Frank Li <Frank.Li@nxp.com> Reviewed-by: Rob Herring <robh@kernel.org>
[will: Removed trailing newline] Signed-off-by: Will Deacon <will.deacon@arm.com>
Mark Rutland [Mon, 10 Jun 2019 12:41:07 +0000 (13:41 +0100)]
arm64: mm: avoid redundant READ_ONCE(*ptep)
In set_pte_at(), we read the old pte value so that it can be passed into
checks for racy hw updates. These checks are only performed for
CONFIG_DEBUG_VM, and the value is not used otherwise.
Since we read the pte value with READ_ONCE(), the compiler cannot elide
the redundant read for !CONFIG_DEBUG_VM kernels.
Let's ameliorate matters by moving the read and the checks into a
helper, __check_racy_pte_update(), which only performs the read when the
value will be used. This also allows us to reformat the conditions for
clarity.
Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Sudeep Holla [Thu, 23 May 2019 09:06:18 +0000 (10:06 +0100)]
arm64: ptrace: add support for syscall emulation
Add PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP support on arm64.
We don't need any special handling for PTRACE_SYSEMU_SINGLESTEP.
It's quite difficult to generalize handling PTRACE_SYSEMU cross
architectures and avoid calls to tracehook_report_syscall_entry twice.
Different architecture have different mechanism to indicate NO_SYSCALL
and trying to generalise adds more code for no gain.
Sudeep Holla [Thu, 23 May 2019 09:06:17 +0000 (10:06 +0100)]
arm64: add PTRACE_SYSEMU{,SINGLESTEP} definations to uapi headers
x86 and um use 31 and 32 for PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP
while powerpc uses different value maybe for legacy reasons.
Though handling of PTRACE_SYSEMU can be made architecture independent,
it's hard to make these definations generic. To add to this existing
mess few architectures like arm, c6x and sh use 31 for PTRACE_GETFDPIC
(get the ELF fdpic loadmap address). It's not possible to move the
definations to generic headers.
So we unfortunately have to duplicate the same defination to ARM64 if
we need to support PTRACE_SYSEMU and PTRACE_SYSEMU_SINGLESTEP.
Sudeep Holla [Thu, 23 May 2019 09:06:15 +0000 (10:06 +0100)]
ptrace: move clearing of TIF_SYSCALL_EMU flag to core
While the TIF_SYSCALL_EMU is set in ptrace_resume independent of any
architecture, currently only powerpc and x86 unset the TIF_SYSCALL_EMU
flag in ptrace_disable which gets called from ptrace_detach.
Let's move the clearing of TIF_SYSCALL_EMU flag to __ptrace_unlink
which gets executed from ptrace_detach and also keep it along with
or close to clearing of TIF_SYSCALL_TRACE.
Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Acked-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/mm: Drop task_struct argument from __do_page_fault()
The task_struct argument is not getting used in __do_page_fault(). Hence
just drop it and use current or cuurent->mm instead where ever required.
This does not change any functionality.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/mm: Drop mmap_sem before calling __do_kernel_fault()
There is an inconsistency between down_read_trylock() success and failure
paths while dealing with kernel access for non exception table areas where
it calls __do_kernel_fault(). In case of failure it just bails out without
holding mmap_sem but when it succeeds it does so while holding mmap_sem.
Fix this inconsistency by just dropping mmap_sem in success path as well.
__do_kernel_fault() calls die_kernel_fault() which then calls show_pte().
show_pte() in this path might become bit more unreliable without holding
mmap_sem. But there are already instances [1] in do_page_fault() where
die_kernel_fault() gets called without holding mmap_sem. show_pte() can
be made more robust independently but in a later patch.
[1] Conditional block for (is_ttbr0_addr && is_el1_permission_fault)
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
We don't currently set the FAULT_FLAG_INSTRUCTION mm flag for EL0
instruction aborts. This has no functional impact, as we don't override
arch_vma_access_permitted(), and the default implementation always returns
true. However, it would be helpful to provide the flag so that it can be
consumed by tracepoints such as dax_pmd_fault.
This patch sets the FAULT_FLAG_INSTRUCTION flag for EL0 instruction aborts.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/mm: Change BUG_ON() to VM_BUG_ON() in [pmd|pud]_set_huge()
There are no callers for the functions which will pass unaligned physical
addresses. Hence just change these BUG_ON() checks into VM_BUG_ON() which
gets compiled out unless CONFIG_VM_DEBUG is enabled.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Grall [Thu, 30 May 2019 11:30:58 +0000 (12:30 +0100)]
arm64/cpufeature: Convert hook_lock to raw_spin_lock_t in cpu_enable_ssbs()
cpu_enable_ssbs() is called via stop_machine() as part of the cpu_enable
callback. A spin lock is used to ensure the hook is registered before
the rest of the callback is executed.
On -RT spin_lock() may sleep. However, all the callees in stop_machine()
are expected to not sleep. Therefore a raw_spin_lock() is required here.
Given this is already done under stop_machine() and the work done under
the lock is quite small, the latency should not increase too much.
Miles Chen [Tue, 28 May 2019 16:08:20 +0000 (00:08 +0800)]
arm64: mm: make CONFIG_ZONE_DMA32 configurable
This change makes CONFIG_ZONE_DMA32 defuly y and allows users
to overwrite it only when CONFIG_EXPERT=y.
For the SoCs that do not need CONFIG_ZONE_DMA32, this is the
first step to manage all available memory by a single
zone(normal zone) to reduce the overhead of multiple zones.
The change also fixes a build error when CONFIG_NUMA=y and
CONFIG_ZONE_DMA32=n.
arch/arm64/mm/init.c:195:17: error: use of undeclared identifier 'ZONE_DMA32'
max_zone_pfns[ZONE_DMA32] = PFN_DOWN(max_zone_dma_phys());
Change since v1:
1. only expose CONFIG_ZONE_DMA32 when CONFIG_EXPERT=y
2. remove redundant IS_ENABLED(CONFIG_ZONE_DMA32)
Cc: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Miles Chen <miles.chen@mediatek.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/mm: Simplify protection flag creation for kernel huge mappings
Even though they have got the same value, PMD_TYPE_SECT and PUD_TYPE_SECT
get used for kernel huge mappings. But before that first the table bit gets
cleared using leaf level PTE_TABLE_BIT. Though functionally they are same,
we should use page table level specific macros to be consistent as per the
MMU specifications. Create page table level specific wrappers for kernel
huge mapping entries and just drop mk_sect_prot() which does not have any
other user.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Shaokun Zhang [Tue, 28 May 2019 02:16:54 +0000 (10:16 +0800)]
arm64: cacheinfo: Update cache_line_size detected from DT or PPTT
cache_line_size is derived from CTR_EL0.CWG field and is called mostly
for I/O device drivers. For some platforms like the HiSilicon Kunpeng920
server SoC, cache line sizes are different between L1/2 cache and L3
cache while L1 cache line size is 64-byte and L3 is 128-byte, but
CTR_EL0.CWG is misreporting using L1 cache line size.
We shall correct the right value which is important for I/O performance.
Let's update the cache line size if it is detected from DT or PPTT
information.
Shaokun Zhang [Tue, 28 May 2019 02:16:53 +0000 (10:16 +0800)]
drivers: base: cacheinfo: Add variable to record max cache line size
Add coherency_max_size variable to record the maximum cache line size
for different cache levels. If it is available, we will synchronize
it as cache line size, otherwise we will use CTR_EL0.CWG reporting
in cache_line_size() for arm64.
Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Jeremy Linton <jeremy.linton@arm.com> Cc: Will Deacon <will.deacon@arm.com> Reviewed-by: Sudeep Holla <sudeep.holla@arm.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Grall [Tue, 21 May 2019 17:21:39 +0000 (18:21 +0100)]
arm64/fpsimd: Don't disable softirq when touching FPSIMD/SVE state
When the kernel is compiled with CONFIG_KERNEL_MODE_NEON, some part of
the kernel may be able to use FPSIMD/SVE. This is for instance the case
for crypto code.
Any use of FPSIMD/SVE in the kernel are clearly marked by using the
function kernel_neon_{begin, end}. Furthermore, this can only be used
when may_use_simd() returns true.
The current implementation of may_use_simd() allows softirq to use
FPSIMD/SVE unless it is currently in use (i.e kernel_neon_busy is true).
When in use, softirqs usually fall back to a software method.
At the moment, as a softirq may use FPSIMD/SVE, softirqs are disabled
when touching the FPSIMD/SVE context. This has the drawback to disable
all softirqs even if they are not using FPSIMD/SVE.
Since a softirq is supposed to check may_use_simd() anyway before
attempting to use FPSIMD/SVE, there is limited reason to keep softirq
disabled when touching the FPSIMD/SVE context. Instead, we can simply
disable preemption and mark the FPSIMD/SVE context as in use by setting
CPU's fpsimd_context_busy flag.
Two new helpers {get, put}_cpu_fpsimd_context are introduced to mark
the area using FPSIMD/SVE context and they are used to replace
local_bh_{disable, enable}. The functions kernel_neon_{begin, end} are
also re-implemented to use the new helpers.
Additionally, double-underscored versions of the helpers are provided to
called when preemption is already disabled. These are only relevant on
paths where irqs are disabled anyway, so they are not needed for
correctness in the current code. Let's use them anyway though: this
marks critical sections clearly and will help to avoid mistakes during
future maintenance.
The change has been benchmarked on Linux 5.1-rc4 with defconfig.
On Juno2:
* hackbench 100 process 1000 (10 times)
* .7% quicker
On ThunderX 2:
* hackbench 1000 process 1000 (20 times)
* 3.4% quicker
Reviewed-by: Dave Martin <dave.martin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Julien Grall [Tue, 21 May 2019 17:21:38 +0000 (18:21 +0100)]
arm64/fpsimd: Introduce fpsimd_save_and_flush_cpu_state() and use it
The only external user of fpsimd_save() and fpsimd_flush_cpu_state() is
the KVM FPSIMD code.
A following patch will introduce a mechanism to acquire owernship of the
FPSIMD/SVE context for performing context management operations. Rather
than having to export the new helpers to get/put the context, we can just
introduce a new function to combine fpsimd_save() and
fpsimd_flush_cpu_state().
This has also the advantage to remove any external call of fpsimd_save()
and fpsimd_flush_cpu_state(), so they can be turned static.
Lastly, the new function can also be used in the PM notifier.
Reviewed-by: Dave Martin <Dave.Martin@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Julien Grall <julien.grall@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/mm: Move PTE_VALID from SW defined to HW page table entry definitions
PTE_VALID signifies that the last level page table entry is valid and it is
MMU recognized while walking the page table. This is not a software defined
PTE bit and should not be listed like one. Just move it to appropriate
header file.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Suzuki Poulose <suzuki.poulose@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
arm64/hugetlb: Use macros for contiguous huge page sizes
Replace all open encoded contiguous huge page size computations with
available macro encodings CONT_PTE_SIZE and CONT_PMD_SIZE. There are other
instances where these macros are used in the file and this change makes it
consistently use the same mnemonic.
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: Steve Capper <steve.capper@arm.com> Cc: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Linus Torvalds [Sun, 2 Jun 2019 18:10:01 +0000 (11:10 -0700)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"Two fixes: a quirk for KVM guests running on certain AMD CPUs, and a
KASAN related build fix"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/CPU/AMD: Don't force the CPB cap when running under a hypervisor
x86/boot: Provide KASAN compatible aliases for string routines
Linus Torvalds [Sun, 2 Jun 2019 18:08:12 +0000 (11:08 -0700)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"On the kernel side there's a bunch of ring-buffer ordering fixes for a
reproducible bug, plus a PEBS constraints regression fix.
Plus tooling fixes"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
tools headers UAPI: Sync kvm.h headers with the kernel sources
perf record: Fix s390 missing module symbol and warning for non-root users
perf machine: Read also the end of the kernel
perf test vmlinux-kallsyms: Ignore aliases to _etext when searching on kallsyms
perf session: Add missing swap ops for namespace events
perf namespace: Protect reading thread's namespace
tools headers UAPI: Sync drm/drm.h with the kernel
tools headers UAPI: Sync drm/i915_drm.h with the kernel
tools headers UAPI: Sync linux/fs.h with the kernel
tools headers UAPI: Sync linux/sched.h with the kernel
tools arch x86: Sync asm/cpufeatures.h with the with the kernel
tools include UAPI: Update copy of files related to new fspick, fsmount, fsconfig, fsopen, move_mount and open_tree syscalls
perf arm64: Fix mksyscalltbl when system kernel headers are ahead of the kernel
perf data: Fix 'strncat may truncate' build failure with recent gcc
perf/ring-buffer: Use regular variables for nesting
perf/ring-buffer: Always use {READ,WRITE}_ONCE() for rb->user_page data
perf/ring_buffer: Add ordering to rb->nest increment
perf/ring_buffer: Fix exposing a temporarily decreased data_head
perf/x86/intel/ds: Fix EVENT vs. UEVENT PEBS constraints
Linus Torvalds [Sun, 2 Jun 2019 18:06:13 +0000 (11:06 -0700)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Two EFI fixes: a quirk for weird systabs, plus add more robust error
handling in the old 1:1 mapping code"
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Allow the number of EFI configuration tables entries to be zero
efi/x86/Add missing error handling to old_memmap 1:1 mapping code
Linus Torvalds [Sun, 2 Jun 2019 17:22:38 +0000 (10:22 -0700)]
Merge tag 'spdx-5.2-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull SPDX fixes from Greg KH:
"Here are just two small patches, that fix up some found SPDX
identifier issues.
The first patch fixes an error in a previous SPDX fixup patch, that
causes build errors when doing 'make clean' on the tree (the fact that
almost no one noticed it reflects the fact that kernel developers
don't like doing that option very often...)
The second patch fixes up a number of places in the tree where people
mistyped the string "SPDX-License-Identifier". Given that people can
not even type their own name all the time without mistakes, this was
bound to happen, and odds are, we will have to add some type of check
for this to checkpatch.pl to catch this happening in the future.
Both of these have passed testing by 0-day"
* tag 'spdx-5.2-rc3-2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
treewide: fix typos of SPDX-License-Identifier
crypto: ux500 - fix license comment syntax error
Linus Torvalds [Sun, 2 Jun 2019 17:21:04 +0000 (10:21 -0700)]
Merge tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"A minor fix to our IMC PMU code to print a less confusing error
message when the driver can't initialise properly.
A fix for a bug where a user requesting an unsupported branch sampling
filter can corrupt PMU state, preventing the PMU from counting
properly.
And finally a fix for a bug in our support for kexec_file_load(),
which prevented loading a kernel and initramfs. Most versions of kexec
don't yet use kexec_file_load().
Thanks to: Anju T Sudhakar, Dave Young, Madhavan Srinivasan, Ravi
Bangoria, Thiago Jung Bauermann"
* tag 'powerpc-5.2-3' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/kexec: Fix loading of kernel + initramfs with kexec_file_load()
powerpc/perf: Fix MMCRA corruption by bhrb_filter
powerpc/powernv: Return for invalid IMC domain
Linus Torvalds [Sun, 2 Jun 2019 17:19:39 +0000 (10:19 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Fixes for PPC and s390"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: PPC: Book3S HV: Restore SPRG3 in kvmhv_p9_guest_entry()
KVM: PPC: Book3S HV: Fix lockdep warning when entering guest on POWER9
KVM: PPC: Book3S HV: XIVE: Fix page offset when clearing ESB pages
KVM: PPC: Book3S HV: XIVE: Take the srcu read lock when accessing memslots
KVM: PPC: Book3S HV: XIVE: Do not clear IRQ data of passthrough interrupts
KVM: PPC: Book3S HV: XIVE: Introduce a new mutex for the XIVE device
KVM: PPC: Book3S HV: XIVE: Fix the enforced limit on the vCPU identifier
KVM: PPC: Book3S HV: XIVE: Do not test the EQ flag validity when resetting
KVM: PPC: Book3S HV: XIVE: Clear file mapping when device is released
KVM: PPC: Book3S HV: Don't take kvm->lock around kvm_for_each_vcpu
KVM: PPC: Book3S: Use new mutex to synchronize access to rtas token list
KVM: PPC: Book3S HV: Use new mutex to synchronize MMU setup
KVM: PPC: Book3S HV: Avoid touching arch.mmu_ready in XIVE release functions
KVM: s390: Do not report unusabled IDs via KVM_CAP_MAX_VCPU_ID
kvm: fix compile on s390 part 2
Linus Torvalds [Sun, 2 Jun 2019 17:16:09 +0000 (10:16 -0700)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal
Pull thermal SoC fix from Eduardo Valentin:
"A single revert, detected to cause issues on the tsens driver"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal:
Revert "drivers: thermal: tsens: Add new operation to check if a sensor is enabled"
Linus Torvalds [Sun, 2 Jun 2019 17:14:25 +0000 (10:14 -0700)]
Merge tag 'led-fixes-for-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds
Pull LED fix from Jacek Anaszewski:
"Fix for a recent change in LED core, that didn't take into account the
possibility of calling led_blink_setup() from atomic context"
* tag 'led-fixes-for-5.2-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/j.anaszewski/linux-leds:
leds: avoid flush_work in atomic context
Linus Torvalds [Sun, 2 Jun 2019 16:26:34 +0000 (09:26 -0700)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Six minor fixes to device drivers and one to the multipath alua
handler.
The most extensive fix is the zfcp port remove prevention one, but
it's impact is only s390"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: libsas: delete sas port if expander discover failed
scsi: libsas: only clear phy->in_shutdown after shutdown event done
scsi: scsi_dh_alua: Fix possible null-ptr-deref
scsi: smartpqi: properly set both the DMA mask and the coherent DMA mask
scsi: zfcp: fix to prevent port_remove with pure auto scan LUNs (only sdevs)
scsi: zfcp: fix missing zfcp_port reference put on -EBUSY from port_remove
scsi: libcxgbi: add a check for NULL pointer in cxgbi_check_route()
Linus Torvalds [Sun, 2 Jun 2019 15:51:30 +0000 (08:51 -0700)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"Various fixes and followups"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm, compaction: make sure we isolate a valid PFN
include/linux/generic-radix-tree.h: fix kerneldoc comment
kernel/signal.c: trace_signal_deliver when signal_group_exit
drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
spdxcheck.py: fix directory structures
kasan: initialize tag to 0xff in __kasan_kmalloc
z3fold: fix sheduling while atomic
scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
ocfs2: fix error path kobject memory leak
memcg: make it work on sparse non-0-node systems
mm, memcg: consider subtrees in memory.events
prctl_set_mm: downgrade mmap_sem to read lock
prctl_set_mm: refactor checks from validate_prctl_map
kernel/fork.c: make max_threads symbol static
arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes
arch/parisc/configs/c8000_defconfig: remove obsoleted CONFIG_DEBUG_SLAB_LEAK
mm/vmalloc.c: fix typo in comment
lib/sort.c: fix kernel-doc notation warnings
mm: fix Documentation/vm/hmm.rst Sphinx warnings
When we have holes in a normal memory zone, we could endup having
cached_migrate_pfns which may not necessarily be valid, under heavy memory
pressure with swapping enabled ( via __reset_isolation_suitable(),
triggered by kswapd).
Later if we fail to find a page via fast_isolate_freepages(), we may end
up using the migrate_pfn we started the search with, as valid page. This
could lead to accessing NULL pointer derefernces like below, due to an
invalid mem_section pointer.
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [47/1825]
Mem abort info:
ESR = 0x96000004
Exception class = DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
Data abort info:
ISV = 0, ISS = 0x00000004
CM = 0, WnR = 0
user pgtable: 4k pages, 48-bit VAs, pgdp = 0000000082f94ae9
[0000000000000008] pgd=0000000000000000
Internal error: Oops: 96000004 [#1] SMP
...
CPU: 10 PID: 6080 Comm: qemu-system-aar Not tainted 510-rc1+ #6
Hardware name: AmpereComputing(R) OSPREY EV-883832-X3-0001/OSPREY, BIOS 4819 09/25/2018
pstate: 60000005 (nZCv daif -PAN -UAO)
pc : set_pfnblock_flags_mask+0x58/0xe8
lr : compaction_alloc+0x300/0x950
[...]
Process qemu-system-aar (pid: 6080, stack limit = 0x0000000095070da5)
Call trace:
set_pfnblock_flags_mask+0x58/0xe8
compaction_alloc+0x300/0x950
migrate_pages+0x1a4/0xbb0
compact_zone+0x750/0xde8
compact_zone_order+0xd8/0x118
try_to_compact_pages+0xb4/0x290
__alloc_pages_direct_compact+0x84/0x1e0
__alloc_pages_nodemask+0x5e0/0xe18
alloc_pages_vma+0x1cc/0x210
do_huge_pmd_anonymous_page+0x108/0x7c8
__handle_mm_fault+0xdd4/0x1190
handle_mm_fault+0x114/0x1c0
__get_user_pages+0x198/0x3c0
get_user_pages_unlocked+0xb4/0x1d8
__gfn_to_pfn_memslot+0x12c/0x3b8
gfn_to_pfn_prot+0x4c/0x60
kvm_handle_guest_abort+0x4b0/0xcd8
handle_exit+0x140/0x1b8
kvm_arch_vcpu_ioctl_run+0x260/0x768
kvm_vcpu_ioctl+0x490/0x898
do_vfs_ioctl+0xc4/0x898
ksys_ioctl+0x8c/0xa0
__arm64_sys_ioctl+0x28/0x38
el0_svc_common+0x74/0x118
el0_svc_handler+0x38/0x78
el0_svc+0x8/0xc
Code: f8607840f100001f8b0114019a801020 (f9400400)
---[ end trace af6a35219325a9b6 ]---
The issue was reported on an arm64 server with 128GB with holes in the
zone (e.g, [32GB@4GB, 96GB@544GB]), with a swap device enabled, while
running 100 KVM guest instances.
This patch fixes the issue by ensuring that the page belongs to a valid
PFN when we fallback to using the lower limit of the scan range upon
failure in fast_isolate_freepages().
Link: http://lkml.kernel.org/r/1558711908-15688-1-git-send-email-suzuki.poulose@arm.com Fixes: 5a811889de10f1eb ("mm, compaction: use free lists to quickly locate a migration target") Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reported-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Mel Gorman <mgorman@techsingularity.net> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Qian Cai <cai@lca.pw> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Zhenliang Wei [Sat, 1 Jun 2019 05:30:52 +0000 (22:30 -0700)]
kernel/signal.c: trace_signal_deliver when signal_group_exit
In the fixes commit, removing SIGKILL from each thread signal mask and
executing "goto fatal" directly will skip the call to
"trace_signal_deliver". At this point, the delivery tracking of the
SIGKILL signal will be inaccurate.
Therefore, we need to add trace_signal_deliver before "goto fatal" after
executing sigdelset.
Note: SEND_SIG_NOINFO matches the fact that SIGKILL doesn't have any info.
Link: http://lkml.kernel.org/r/20190425025812.91424-1-weizhenliang@huawei.com Fixes: cf43a757fd4944 ("signal: Restore the stop PTRACE_EVENT_EXIT") Signed-off-by: Zhenliang Wei <weizhenliang@huawei.com> Reviewed-by: Christian Brauner <christian@brauner.io> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Ivan Delalande <colona@arista.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Deepa Dinamani <deepa.kernel@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Qian Cai [Sat, 1 Jun 2019 05:30:49 +0000 (22:30 -0700)]
drivers/iommu/intel-iommu.c: fix variable 'iommu' set but not used
Commit cf04eee8bf0e ("iommu/vt-d: Include ACPI devices in iommu=pt")
added for_each_active_iommu() in iommu_prepare_static_identity_mapping()
but never used the each element, i.e, "drhd->iommu".
drivers/iommu/intel-iommu.c: In function
'iommu_prepare_static_identity_mapping':
drivers/iommu/intel-iommu.c:3037:22: warning: variable 'iommu' set but
not used [-Wunused-but-set-variable]
struct intel_iommu *iommu;
Fixed the warning by appending a compiler attribute __maybe_unused for it.
Link: http://lkml.kernel.org/r/20190523013314.2732-1-cai@lca.pw Signed-off-by: Qian Cai <cai@lca.pw> Suggested-by: Andrew Morton <akpm@linux-foundation.org> Cc: Joerg Roedel <jroedel@suse.de> Cc: David Woodhouse <dwmw2@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The LICENSE directory has recently changed structure and this makes
spdxcheck fails as per below:
FAIL: "Blob or Tree named 'other' not found"
Traceback (most recent call last):
File "scripts/spdxcheck.py", line 240, in <module>
spdx = read_spdxdata(repo)
File "scripts/spdxcheck.py", line 41, in read_spdxdata
for el in lictree[d].traverse():
[...]
KeyError: "Blob or Tree named 'other' not found"
Fix the script to restore the correctness on checkpatch License checking.
References: 62be257e986d ("LICENSES: Rename other to deprecated")
References: 8ea8814fcdcb ("LICENSES: Clearly mark dual license only licenses") Link: http://lkml.kernel.org/r/20190523084755.56739-1-vincenzo.frascino@arm.com Signed-off-by: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Joe Perches <joe@perches.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jeremy Cline <jcline@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
When building with -Wuninitialized and CONFIG_KASAN_SW_TAGS unset, Clang
warns:
mm/kasan/common.c:484:40: warning: variable 'tag' is uninitialized when
used here [-Wuninitialized]
kasan_unpoison_shadow(set_tag(object, tag), size);
^~~
set_tag ignores tag in this configuration but clang doesn't realize it at
this point in its pipeline, as it points to arch_kasan_set_tag as being
the point where it is used, which will later be expanded to (void
*)(object) without a use of tag. Initialize tag to 0xff, as it removes
this warning and doesn't change the meaning of the code.
Fabiano Rosas [Sat, 1 Jun 2019 05:30:36 +0000 (22:30 -0700)]
scripts/gdb: fix invocation when CONFIG_COMMON_CLK is not set
CLK_GET_RATE_NOCACHE depends on CONFIG_COMMON_CLK. Importing constants.py
when CONFIG_COMMON_CLK is not defined causes:
(gdb) lx-symbols
(...)
File "scripts/gdb/linux/proc.py", line 15, in <module>
from linux import constants
File "scripts/gdb/linux/constants.py", line 2, in <module>
LX_CLK_GET_RATE_NOCACHE = gdb.parse_and_eval("CLK_GET_RATE_NOCACHE")
gdb.error: No symbol "CLK_GET_RATE_NOCACHE" in current context.
Link: http://lkml.kernel.org/r/20190523195313.24701-1-farosas@linux.ibm.com Fixes: e7e6f462c1be ("scripts/gdb: print cached rate in lx-clk-summary") Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Cc: Jan Kiszka <jan.kiszka@siemens.com> Cc: Kieran Bingham <kbingham@kernel.org> Cc: Leonard Crestez <leonard.crestez@nxp.com> Cc: Jackie Liu <liuyun01@kylinos.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Sat, 1 Jun 2019 05:30:33 +0000 (22:30 -0700)]
mm/gup: continue VM_FAULT_RETRY processing even for pre-faults
When get_user_pages*() is called with pages = NULL, the processing of
VM_FAULT_RETRY terminates early without actually retrying to fault-in all
the pages.
If the pages in the requested range belong to a VMA that has userfaultfd
registered, handle_userfault() returns VM_FAULT_RETRY *after* user space
has populated the page, but for the gup pre-fault case there's no actual
retry and the caller will get no pages although they are present.
This issue was uncovered when running post-copy memory restore in CRIU
after d9c9ce34ed5c ("x86/fpu: Fault-in user stack if
copy_fpstate_to_sigframe() fails").
After this change, the copying of FPU state to the sigframe switched from
copy_to_user() variants which caused a real page fault to get_user_pages()
with pages parameter set to NULL.
In post-copy mode of CRIU, the destination memory is managed with
userfaultfd and lack of the retry for pre-fault case in get_user_pages()
causes a crash of the restored process.
Making the pre-fault behavior of get_user_pages() the same as the "normal"
one fixes the issue.
Link: http://lkml.kernel.org/r/1557844195-18882-1-git-send-email-rppt@linux.ibm.com Fixes: d9c9ce34ed5c ("x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails") Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Andrei Vagin <avagin@gmail.com> [https://travis-ci.org/avagin/linux/builds/533184940] Tested-by: Hugh Dickins <hughd@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Borislav Petkov <bp@suse.de> Cc: Pavel Machek <pavel@ucw.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If a call to kobject_init_and_add() fails we should call kobject_put()
otherwise we leak memory.
Add call to kobject_put() in the error path of call to
kobject_init_and_add(). Please note, this has the side effect that the
release method is called if kobject_init_and_add() fails.
Link: http://lkml.kernel.org/r/20190513033458.2824-1-tobin@kernel.org Signed-off-by: Tobin C. Harding <tobin@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jiri Slaby [Sat, 1 Jun 2019 05:30:26 +0000 (22:30 -0700)]
memcg: make it work on sparse non-0-node systems
We have a single node system with node 0 disabled:
Scanning NUMA topology in Northbridge 24
Number of physical nodes 2
Skipping disabled node 0
Node 1 MemBase 0000000000000000 Limit 00000000fbff0000
NODE_DATA(1) allocated [mem 0xfbfda000-0xfbfeffff]
This causes crashes in memcg when system boots:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
#PF error: [normal kernel read fault]
...
RIP: 0010:list_lru_add+0x94/0x170
...
Call Trace:
d_lru_add+0x44/0x50
dput.part.34+0xfc/0x110
__fput+0x108/0x230
task_work_run+0x9f/0xc0
exit_to_usermode_loop+0xf5/0x100
It is reproducible as far as 4.12. I did not try older kernels. You have
to have a new enough systemd, e.g. 241 (the reason is unknown -- was not
investigated). Cannot be reproduced with systemd 234.
The system crashes because the size of lru array is never updated in
memcg_update_all_list_lrus and the reads are past the zero-sized array,
causing dereferences of random memory.
The root cause are list_lru_memcg_aware checks in the list_lru code. The
test in list_lru_memcg_aware is broken: it assumes node 0 is always
present, but it is not true on some systems as can be seen above.
So fix this by avoiding checks on node 0. Remember the memcg-awareness by
a bool flag in struct list_lru.
Link: http://lkml.kernel.org/r/20190522091940.3615-1-jslaby@suse.cz Fixes: 60d3fd32a7a9 ("list_lru: introduce per-memcg lists") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Suggested-by: Vladimir Davydov <vdavydov.dev@gmail.com> Acked-by: Vladimir Davydov <vdavydov.dev@gmail.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Chris Down [Sat, 1 Jun 2019 05:30:22 +0000 (22:30 -0700)]
mm, memcg: consider subtrees in memory.events
memory.stat and other files already consider subtrees in their output, and
we should too in order to not present an inconsistent interface.
The current situation is fairly confusing, because people interacting with
cgroups expect hierarchical behaviour in the vein of memory.stat,
cgroup.events, and other files. For example, this causes confusion when
debugging reclaim events under low, as currently these always read "0" at
non-leaf memcg nodes, which frequently causes people to misdiagnose breach
behaviour. The same confusion applies to other counters in this file when
debugging issues.
Aggregation is done at write time instead of at read-time since these
counters aren't hot (unlike memory.stat which is per-page, so it does it
at read time), and it makes sense to bundle this with the file
notifications.
After this patch, events are propagated up the hierarchy:
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 0
oom 0
oom_kill 0
[root@ktst ~]# systemd-run -p MemoryMax=1 true
Running as unit: run-r251162a189fb4562b9dabfdc9b0422f5.service
[root@ktst ~]# cat /sys/fs/cgroup/system.slice/memory.events
low 0
high 0
max 7
oom 1
oom_kill 1
As this is a change in behaviour, this can be reverted to the old
behaviour by mounting with the `memory_localevents' flag set. However, we
use the new behaviour by default as there's a lack of evidence that there
are any current users of memory.events that would find this change
undesirable.
akpm: this is a behaviour change, so Cc:stable. THis is so that
forthcoming distros which use cgroup v2 are more likely to pick up the
revised behaviour.
Link: http://lkml.kernel.org/r/20190208224419.GA24772@chrisdown.name Signed-off-by: Chris Down <chris@chrisdown.name> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Roman Gushchin <guro@fb.com> Cc: Dennis Zhou <dennis@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Koutný [Sat, 1 Jun 2019 05:30:19 +0000 (22:30 -0700)]
prctl_set_mm: downgrade mmap_sem to read lock
The commit a3b609ef9f8b ("proc read mm's {arg,env}_{start,end} with mmap
semaphore taken.") added synchronization of reading argument/environment
boundaries under mmap_sem. Later commit 88aa7cc688d4 ("mm: introduce
arg_lock to protect arg_start|end and env_start|end in mm_struct") avoided
the coarse use of mmap_sem in similar situations. But there still
remained two places that (mis)use mmap_sem.
get_cmdline should also use arg_lock instead of mmap_sem when it reads the
boundaries.
The second place that should use arg_lock is in prctl_set_mm. By
protecting the boundaries fields with the arg_lock, we can downgrade
mmap_sem to reader lock (analogous to what we already do in
prctl_set_mm_map).
[akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-3-mkoutny@suse.com Fixes: 88aa7cc688d4 ("mm: introduce arg_lock to protect arg_start|end and env_start|end in mm_struct") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Laurent Dufour <ldufour@linux.ibm.com> Co-developed-by: Laurent Dufour <ldufour@linux.ibm.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Koutný [Sat, 1 Jun 2019 05:30:16 +0000 (22:30 -0700)]
prctl_set_mm: refactor checks from validate_prctl_map
Despite comment of validate_prctl_map claims there are no capability
checks, it is not completely true since commit 4d28df6152aa ("prctl: Allow
local CAP_SYS_ADMIN changing exe_file"). Extract the check out of the
function and make the function perform purely arithmetic checks.
This patch should not change any behavior, it is mere refactoring for
following patch.
[akpm@linux-foundation.org: coding style fixes] Link: http://lkml.kernel.org/r/20190502125203.24014-2-mkoutny@suse.com Signed-off-by: Michal Koutný <mkoutny@suse.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Cyrill Gorcunov <gorcunov@gmail.com> Cc: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Laurent Dufour <ldufour@linux.ibm.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Yang Shi <yang.shi@linux.alibaba.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
arch/arm/boot/compressed/decompress.c: fix build error due to lz4 changes
include/linux/cpumask.h: In function 'cpumask_parse':
include/linux/cpumask.h:636:21: error: implicit declaration of function 'strchrnul'; did you mean 'strchr'? [-Werror=implicit-function-declaration]
Because arch/arm/boot/compressed/decompress.c does
#define _LINUX_STRING_H_
preventing linux/string.h from providing strchrnul. It also #includes
asm/string.h, which for arm has a declaration of strchr(), explaining why
this didn't use to fail.
Link: http://lkml.kernel.org/r/20190528115346.f5a7kn3hdnuf5rts@linutronix.de Fixes: 3713a4e1fdb8d ("include/linux/cpumask.h: fix double string traverse in cpumask_parse") Suggested-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Yury Norov <ynorov@marvell.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Randy Dunlap [Sat, 1 Jun 2019 05:30:00 +0000 (22:30 -0700)]
lib/sort.c: fix kernel-doc notation warnings
Fix kernel-doc notation in lib/sort.c by using correct function parameter
names.
lib/sort.c:59: warning: Excess function parameter 'size' description in 'swap_words_32'
lib/sort.c:83: warning: Excess function parameter 'size' description in 'swap_words_64'
lib/sort.c:110: warning: Excess function parameter 'size' description in 'swap_bytes'
Link: http://lkml.kernel.org/r/60e25d3d-68d1-bde2-3b39-e4baa0b14907@infradead.org Fixes: 37d0ec34d111a ("lib/sort: make swap functions more generic") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Cc: George Spelvin <lkml@sdf.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Masahiro Yamada [Sat, 1 Jun 2019 03:22:42 +0000 (12:22 +0900)]
treewide: fix typos of SPDX-License-Identifier
Prior to the adoption of SPDX, it was difficult for tools to determine
the correct license due to incomplete or badly formatted license text.
The SPDX solves this issue, assuming people can correctly spell
"SPDX-License-Identifier" although this assumption is broken in some
places.
Since scripts/spdxcheck.py parses only lines that exactly matches to
the correct tag, it cannot (should not) detect this kind of error.
If the correct tag is missing, scripts/checkpatch.pl warns like this:
WARNING: Missing or malformed SPDX-License-Identifier tag in line *
So, people should notice it before the patch submission, but in reality
broken tags sometimes slip in. The checkpatch warning is not useful for
checking the committed files globally since large number of files still
have no SPDX tag.
Also, I am not sure about the legal effect when the SPDX tag is broken.
Anyway, these typos are absolutely worth fixing. It is pretty easy to
find suspicious lines by grep.
Paolo Bonzini [Fri, 31 May 2019 22:48:45 +0000 (00:48 +0200)]
Merge tag 'kvm-ppc-fixes-5.2-1' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc into kvm-master
PPC KVM fixes for 5.2
- Several bug fixes for the new XIVE-native code.
- Replace kvm->lock by other mutexes in several places where we hold a
vcpu mutex, to avoid lock order inversions.
- Fix a lockdep warning on guest entry for radix-mode guests.
- Fix a bug causing user-visible corruption of SPRG3 on the host.
John Pittman [Thu, 23 May 2019 21:49:39 +0000 (17:49 -0400)]
block: print offending values when cloned rq limits are exceeded
While troubleshooting issues where cloned request limits have been
exceeded, it is often beneficial to know the actual values that
have been breached. Print these values, assisting in ease of
identification of root cause of the breach.
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: John Pittman <jpittman@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>