There is a race window where quota was redirted once we drop dq_list_lock inside dqput(),
but before we grab dquot->dq_lock inside dquot_release()
TASK1 TASK2 (chowner)
->dqput()
we_slept:
spin_lock(&dq_list_lock)
if (dquot_dirty(dquot)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->write_dquot(dquot);
goto we_slept
if (test_bit(DQ_ACTIVE_B, &dquot->dq_flags)) {
spin_unlock(&dq_list_lock);
dquot->dq_sb->dq_op->release_dquot(dquot);
dqget()
mark_dquot_dirty()
dqput()
goto we_slept;
}
So dquot dirty quota will be released by TASK1, but on next we_sleept loop
we detect this and call ->write_dquot() for it.
XFSTEST: https://github.com/dmonakhov/xfstests/commit/440a80d4cbb39e9234df4d7240aee1d551c36107
Link: https://lore.kernel.org/r/20191031103920.3919-2-dmonakhov@openvz.org CC: stable@vger.kernel.org Signed-off-by: Dmitry Monakhov <dmtrmonakhov@yandex-team.ru> Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The PCI INTx interrupts and other LSI interrupts are handled differently
under a sPAPR platform. When the interrupt source characteristics are
queried, the hypervisor returns an H_INT_ESB flag to inform the OS
that it should be using the H_INT_ESB hcall for interrupt management
and not loads and stores on the interrupt ESB pages.
A default -1 value is returned for the addresses of the ESB pages. The
driver ignores this condition today and performs a bogus IO mapping.
Recent changes and the DEBUG_VM configuration option make the bug
visible with :
When the machine crash handler is invoked, all interrupts are masked
but interrupts which have not been started yet do not have an ESB page
mapped in the Linux address space. This crashes the 'crash kexec'
sequence on sPAPR guests.
To fix, force the mapping of the ESB page when an interrupt is being
mapped in the Linux IRQ number space. This is done by setting the
initial state of the interrupt to OFF which is not necessarily the
case on PowerNV.
Fixes: 243e25112d06 ("powerpc/xive: Native exploitation of the XIVE interrupt controller") Cc: stable@vger.kernel.org # v4.12+ Signed-off-by: Cédric Le Goater <clg@kaod.org> Reviewed-by: Greg Kurz <groug@kaod.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191031063100.3864-1-clg@kaod.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When calling __kernel_sync_dicache with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.
This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.
When tracing etm data of multiple threads on multiple cpus through perf
interface, some link devices are shared between paths of different cpus.
It creates race conditions when different cpus wants to enable/disable
the same link device at the same time.
Example 1:
Two cpus want to enable different ports of a coresight funnel, thus
calling the funnel enable operation at the same time. But the funnel
enable operation isn't reentrantable.
Example 2:
For an enabled coresight dynamic replicator with refcnt=1, one cpu wants
to disable it, while another cpu wants to enable it. Ideally we still have
an enabled replicator with refcnt=1 at the end. But in reality the result
is uncertain.
Since coresight devices claim themselves when enabled for self-hosted
usage, the race conditions above usually make the link devices not usable
after many cycles.
To fix the race conditions, this patch uses spinlocks to serialize
enabling/disabling link devices.
Commit c7fd62bc69d02 ("stm class: Introduce framing protocol drivers")
forgot to tear down the link between an stm device and its protocol
driver when policy is removed. This leads to an invalid pointer reference
if one tries to write to an stm device after the policy has been removed
and the protocol driver module unloaded, leading to the below splat:
If dev->dma_device->params == NULL then the maximum DMA segment size is 64
KB. See also the dma_get_max_seg_size() implementation. This patch fixes
the following kernel warning:
The MMC card detection GPIO polarity is active low on TAO3530, like in many
other similar boards. Now the card is not detected and it is unable to
mount rootfs from an SD card.
Fix this by using the correct polarity.
This incorrect polarity was defined already in the commit 30d95c6d7092
("ARM: dts: omap3: Add Technexion TAO3530 SOM omap3-tao3530.dtsi") in v3.18
kernel and later changed to use defined GPIO constants in v4.4 kernel by
the commit 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags
cell for OMAP2+ boards").
While the latter commit did not introduce the issue I'm marking it with
Fixes tag due the v4.4 kernels still being maintained.
Fixes: 3a637e008e54 ("ARM: dts: Use defined GPIO constants in flags cell for OMAP2+ boards") Cc: linux-stable <stable@vger.kernel.org> # 4.4+ Signed-off-by: Jarkko Nikula <jarkko.nikula@bitmer.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Pandora_wl1251_init_card was used to do special pdata based
setup of the sdio mmc interface. This does no longer work with
v4.7 and later. A fix requires a device tree based mmc3 setup.
Therefore we move the special setup to omap_hsmmc.c instead
of calling some pdata supplied init_card function.
The new code checks for a DT child node compatible to wl1251
so it will not affect other MMC3 use cases.
Generally, this code was and still is a hack and should be
moved to mmc core to e.g. read such properties from optional
DT child nodes.
Fixes: 81eef6ca9201 ("mmc: omap_hsmmc: Use dma_request_chan() for requesting DMA channel") Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Cc: <stable@vger.kernel.org> # v4.7+
[Ulf: Fixed up some checkpatch complaints] Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In s3c64xx_eint_eint0_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
Several functions use for_each_child_of_node() loop with a break to find
a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
In s3c24xx_eint_init() the for_each_child_of_node() loop is used with a
break to find a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
In exynos_eint_wkup_init() the for_each_child_of_node() loop is used
with a break to find a matching child node. Although each iteration of
for_each_child_of_node puts the previous node, but early exit from loop
misses it. This leads to leak of device node.
Cc: <stable@vger.kernel.org> Fixes: 43b169db1841 ("pinctrl: add exynos4210 specific extensions for samsung pinctrl driver") Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Each iteration of for_each_child_of_node puts the previous node, but in
the case of a return from the middle of the loop, there is no put, thus
causing a memory leak. Hence add an of_node_put before the return of
exynos_eint_wkup_init() error path.
Issue found with Coccinelle.
As explained in the following commit a9a1a4833613 ("pinctrl:
armada-37xx: Fix gpio interrupt setup") the armada_37xx_irq_set_type()
function can be called before the initialization of the mask field.
That means that we can't use this field in this function and need to
workaround it using hwirq.
Fixes: 30ac0d3b0702 ("pinctrl: armada-37xx: Add edge both type gpio irq support") Cc: stable@vger.kernel.org Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com> Link: https://lore.kernel.org/r/20191115155752.2562-1-gregory.clement@bootlin.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Certain ACPI-enumerated devices represented as platform devices in
Linux, like fans, require special low-level power management handling
implemented by their drivers that is not in agreement with the ACPI
PM domain behavior. That leads to problems with managing ACPI fans
during system-wide suspend and resume.
For this reason, make acpi_dev_pm_attach() skip the affected devices
by adding a list of device IDs to avoid to it and putting the IDs of
the affected devices into that list.
Fixes: e5cc8ef31267 (ACPI / PM: Provide ACPI PM callback routines for subsystems) Reported-by: Zhang Rui <rui.zhang@intel.com> Tested-by: Todd Brandt <todd.e.brandt@linux.intel.com> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In i2c_acpi_remove_space_handler(), a leak occurs whenever the
"data" parameter is initialized to 0 before being passed to
acpi_bus_get_private_data().
This is because the NULL pointer check in acpi_bus_get_private_data()
(condition->if(!*data)) returns EINVAL and, in consequence, memory is
never freed in i2c_acpi_remove_space_handler().
Fix the NULL pointer check in acpi_bus_get_private_data() to follow
the analogous check in acpi_get_data_full().
Signed-off-by: Vamshi K Sthambamkadi <vamshi.k.sthambamkadi@gmail.com>
[ rjw: Subject & changelog ] Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
acpi_os_map_cleanup checks map->refcount outside of acpi_ioremap_lock
before freeing the map. This creates a race condition the can result
in the map being freed more than once.
A panic can be caused by running
for ((i=0; i<10; i++))
do
for ((j=0; j<100000; j++))
do
cat /sys/firmware/acpi/tables/data/BERT >/dev/null
done &
done
This patch makes sure that only the process that drops the reference
to 0 does the freeing.
Fixes: b7c1fadd6c2e ("ACPI: Do not use krefs under a mutex in osl.c") Signed-off-by: Francesco Ruggeri <fruggeri@arista.com> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Valerio and others reported that commit 84c8b58ed3ad ("ACPI / hotplug /
PCI: Don't scan bridges managed by native hotplug") prevents some recent
LG and HP laptops from booting with endless loop of:
ACPI Error: No handler or method for GPE 08, disabling event (20190215/evgpe-835)
ACPI Error: No handler or method for GPE 09, disabling event (20190215/evgpe-835)
ACPI Error: No handler or method for GPE 0A, disabling event (20190215/evgpe-835)
...
What seems to happen is that during boot, after the initial PCI enumeration
when EC is enabled the platform triggers ACPI Notify() to one of the root
ports. The root port itself looks like this:
The BIOS has configured the root port so that it does not have I/O bridge
window.
Now when the ACPI Notify() is triggered ACPI hotplug handler calls
acpiphp_native_scan_bridge() for each non-hotplug bridge (as this system is
using native PCIe hotplug) and pci_assign_unassigned_bridge_resources() to
allocate resources.
The device connected to the root port is a PCIe switch (Thunderbolt
controller) with two hotplug downstream ports. Because of the hotplug ports
__pci_bus_size_bridges() tries to add "additional I/O" of 256 bytes to each
(DEFAULT_HOTPLUG_IO_SIZE). This gets further aligned to 4k as that's the
minimum I/O window size so each hotplug port gets 4k I/O window and the
same happens for the root port (which is also hotplug port). This means
3 * 4k = 12k I/O window.
Because of this pci_assign_unassigned_bridge_resources() ends up opening a
I/O bridge window for the root port at first available I/O address which
seems to be in range 0x1000 - 0x3fff. Normally this range is used for ACPI
stuff such as GPE bits (below is part of /proc/ioports):
However, when the ACPI Notify() happened this range was not yet reserved
for ACPI/PNP (that happens later) so PCI gets it. It then starts writing to
this range and accidentally stomps over GPE bits among other things causing
the endless stream of messages about missing GPE handler.
This problem does not happen if "pci=hpiosize=0" is passed in the kernel
command line. The reason is that then the kernel does not try to allocate
the additional 256 bytes for each hotplug port.
Fix this by allocating resources directly below the non-hotplug bridges
where a new device may appear as a result of ACPI Notify(). This avoids the
hotplug bridges and prevents opening the additional I/O window.
The iGPU / GFX0 device's _PS0 method on the ASUS T200TA depends on the
I2C1 controller (which is connected to the embedded controller). But unlike
in the T100TA/T100CHI this dependency is not listed in the _DEP of the GFX0
device.
This results in the dev_WARN_ONCE(..., "Transfer while suspended\n") call
in i2c-designware-master.c triggering and the AML code not working as it
should.
This commit fixes this by adding a dmi based quirk mechanism for devices
which miss a _DEP, and adding a quirk for the LNXVIDEO depending on the
I2C1 device on the Asus T200TA.
Fixes: 2d71ee0ce72f ("ACPI / LPSS: Add a device link from the GPU to the BYT I2C5 controller") Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Various Asus Bay Trail devices (T100TA, T100CHI, T200TA) have an embedded
controller connected to I2C1 and the iGPU (LNXVIDEO) _PS0/_PS3 methods
access it, so we need to add a consumer link from LNXVIDEO to I2C1 on
these devices to avoid suspend/resume ordering problems.
Fixes: 2d71ee0ce72f ("ACPI / LPSS: Add a device link from the GPU to the BYT I2C5 controller") Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
So far on Bay Trail (BYT) we only have been adding a device_link adding
the iGPU (LNXVIDEO) device as consumer for the I2C controller for the
PMIC for I2C5, but the PMIC only uses I2C5 on BYT CR (cost reduced) on
regular BYT platforms I2C7 is used and we were not adding the device_link
sometimes causing resume ordering issues.
This commit adds LNXVIDEO -> BYT I2C7 to the lpss_device_links table,
fixing this.
Fixes: 2d71ee0ce72f ("ACPI / LPSS: Add a device link from the GPU to the BYT I2C5 controller") Tested-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: 4.20+ <stable@vger.kernel.org> # 4.20+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The following build warning occurred on powerpc 64-bit builds:
drivers/cpufreq/powernv-cpufreq.c: In function 'init_chip_info':
drivers/cpufreq/powernv-cpufreq.c:1070:1: warning: the frame size of
1040 bytes is larger than 1024 bytes [-Wframe-larger-than=]
This is with a cross-compiler based on gcc 8.1.0, which I got from:
https://mirrors.edge.kernel.org/pub/tools/crosstool/files/bin/x86_64/8.1.0/
The warning is due to putting 1024 bytes on the stack:
unsigned int chip[256];
...and it's also undesirable to have a hard limit on the number of
CPUs here.
Fix both problems by dynamically allocating based on num_possible_cpus,
as recommended by Michael Ellerman.
Fixes: 053819e0bf840 ("cpufreq: powernv: Handle throttling due to Pmax capping at chip level") Signed-off-by: John Hubbard <jhubbard@nvidia.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.10+ <stable@vger.kernel.org> # 4.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
There is no locking in this sysfs show function so stats printing can
race with a devfreq_update_status called as part of freq switching or
with initialization.
Also add an assert in devfreq_update_status to make it clear that lock
must be held by caller.
Commit a753bfcfdb1f ("intel_th: Make the switch allocate its subdevices")
factored out intel_th_subdevice_alloc() from intel_th_populate(), but got
the error path wrong, resulting in two instances of a double put_device()
on a freshly initialized, but not 'added' device.
Fix this by only doing one put_device() in the error path.
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: a753bfcfdb1f ("intel_th: Make the switch allocate its subdevices") Reported-by: Wen Yang <wenyang@linux.alibaba.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: stable@vger.kernel.org # v4.14+ Link: https://lore.kernel.org/r/20191120130806.44028-2-alexander.shishkin@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When a root user or a user with CAP_SYS_ADMIN privilege uses any
trace_imc performance monitoring unit events, to monitor application
or KVM threads, it may result in a checkstop (System crash).
The cause is frequent switching of the "trace/accumulation" mode of
the In-Memory Collection hardware (LDBAR).
This patch disables the trace_imc PMU unit entirely to avoid
triggering the checkstop. A future patch will reenable it at a later
stage once a workaround has been developed.
Fixes: 012ae244845f ("powerpc/perf: Trace imc PMU functions") Cc: stable@vger.kernel.org # v5.2+ Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com> Tested-by: Hariharan T.S. <hari@linux.ibm.com>
[mpe: Add pr_info_once() so dmesg shows the PMU has been disabled] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191118034452.9939-1-maddy@linux.vnet.ibm.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
As David reported [1], ENODATA returns when attempting
to modify files by using EROFS as an overlayfs lower layer.
The root cause is that listxattr could return unexpected
-ENODATA by mistake for inodes without xattr. That breaks
listxattr return value convention and it can cause copy
up failure when used with overlayfs.
Resolve by zeroing out if no xattr is found for listxattr.
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "early hits" metric measures the likelihood of a
situation in which the idle duration measured after wakeup falls into
to given bin, but the time till the next timer (sleep length) falls
into a bin corresponding to one of the deeper idle states. It is
used when the "hits" and "misses" metrics indicate that the state
"matching" the sleep length should not be selected, so that the state
with the maximum "early hits" value is selected instead of it.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
As far as the "early hits" metric is concerned, teo_select() tries to
take disabled states into account, but the state index corresponding
to the maximum "early hits" value computed by it may be incorrect.
Namely, it always uses the index of the previous maximum "early hits"
state then, but there may be enabled idle states closer to the
disabled one in question. In particular, if the current candidate
state (whose index is the idx value) is closer to the disabled one
and the "early hits" value of the disabled state is greater than the
current maximum, the index of the current candidate state (idx)
should replace the "maximum early hits state" index.
Modify the code to handle that case correctly.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Reported-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The TEO governor uses idle duration "bins" defined in accordance with
the CPU idle states table provided by the driver, so that each "bin"
covers the idle duration range between the target residency of the
idle state corresponding to it and the target residency of the closest
deeper idle state. The governor collects statistics for each bin
regardless of whether or not the idle state corresponding to it is
currently enabled.
In particular, the "hits" and "misses" metrics measure the likelihood
of a situation in which both the time till the next timer (sleep
length) and the idle duration measured after wakeup fall into the
given bin. Namely, if the "hits" value is greater than the "misses"
one, that situation is more likely than the one in which the sleep
length falls into the given bin, but the idle duration measured after
wakeup falls into a bin corresponding to one of the shallower idle
states.
If the idle state corresponding to the given bin is disabled, it
cannot be selected and if it turns out to be the one that should be
selected, a shallower idle state needs to be used instead of it.
Nevertheless, the metrics collected for the bin corresponding to it
are still valid and need to be taken into account as though that
state had not been disabled.
For this reason, make teo_select() always use the "hits" and "misses"
values of the idle duration range that the sleep length falls into
even if the specific idle state corresponding to it is disabled and
if the "hits" values is greater than the "misses" one, select the
closest enabled shallower idle state in that case.
Fixes: b26bf6ab716f ("cpuidle: New timer events oriented governor for tickless systems") Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: 5.1+ <stable@vger.kernel.org> # 5.1+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Fix __cpuidle_set_driver() to check if any of the CPUs in the mask has
a driver different from drv already and, if so, return -EBUSY before
updating any cpuidle_drivers per-CPU pointers.
If a process is interrupted while accessing the radio device and the
core lock is contended, release() could return early and fail to update
the interrupt mask.
Note that the return value of the v4l2 release file operation is
ignored.
Fixes: 87d1a50ce451 ("[media] V4L2: WL1273 FM Radio: TI WL1273 FM radio driver") Cc: stable <stable@vger.kernel.org> # 2.6.38 Cc: Matti Aaltonen <matti.j.aaltonen@nokia.com> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If a process is interrupted while accessing the video device and the
device lock is contended, release() could return early and fail to free
related resources.
Note that the return value of the v4l2 release file operation is
ignored.
Commit 953aaa1492c53 ("media: rockchip/vpu: Prepare things to support decoders")
changed the conditions under S_FMT was allowed for OUTPUT
CAPTURE buffers.
However, and according to the mem-to-mem stateless decoder specification,
in order to support dynamic resolution changes, S_FMT should be allowed
even if OUTPUT buffers have been allocated.
Relax decoder S_FMT restrictions on OUTPUT buffers, allowing a
resolution modification, provided the pixel format stays the same.
Tested on RK3288 platforms using ChromiumOS Video Decode/Encode
Accelerator Unittests.
[hverkuil: fix typo: In other -> In order]
Fixes: 953aaa1492c53 ("media: rockchip/vpu: Prepare things to support decoders") Signed-off-by: Ezequiel Garcia <ezequiel@collabora.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Cc: <stable@vger.kernel.org> # for v5.4 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
On older HW or under a hypervisor, w/o the instruction-execution-
protection (IEP) facility, and also w/o EDAT-1, a translation-specification
exception may be recognized when bit 55 of a pte is one (_PAGE_NOEXEC).
The current code tries to prevent setting _PAGE_NOEXEC in such cases,
by removing it within set_pte_at(). However, ptep_set_access_flags()
will modify a pte directly, w/o using set_pte_at(). There is at least
one scenario where this can result in an active pte with _PAGE_NOEXEC
set, which would then lead to a panic due to a translation-specification
exception (write to swapped out page):
do_swap_page
pte = mk_pte (with _PAGE_NOEXEC bit)
set_pte_at (will remove _PAGE_NOEXEC bit in page table, but keep it
in local variable pte)
vmf->orig_pte = pte (pte still contains _PAGE_NOEXEC bit)
do_wp_page
wp_page_reuse
entry = vmf->orig_pte (still with _PAGE_NOEXEC bit)
ptep_set_access_flags (writes entry with _PAGE_NOEXEC bit)
Fix this by clearing _PAGE_NOEXEC already in mk_pte_phys(), where the
pgprot value is applied, so that no pte with _PAGE_NOEXEC will ever be
visible, if it is not supported. The check in set_pte_at() can then also
be removed.
memcpy() call with "idata == NULL && ilen == 0" results in undefined
behavior in ar5523_cmd(). For example, NULL is passed in callchain
"ar5523_stat_work() -> ar5523_cmd_write() -> ar5523_cmd()". This patch
adds ilen check before memcpy() call in ar5523_cmd() to prevent an
undefined behavior.
Cc: Pontus Fuchs <pontus.fuchs@gmail.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: David Laight <David.Laight@ACULAB.COM> Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
memcpy() in wmi_set_ie() and wmi_update_ft_ies() is called with
src == NULL and len == 0. This is an undefined behavior. Fix it
by checking "ie_len > 0" before the memcpy() calls.
As suggested by GCC documentation:
"The pointers passed to memmove (and similar functions in <string.h>)
must be non-null even when nbytes==0, so GCC can use that information
to remove the check after the memmove call." [1]
[1] https://gcc.gnu.org/gcc-4.9/porting_to.html
Cc: Maya Erez <merez@codeaurora.org> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: stable@vger.kernel.org Signed-off-by: Denis Efremov <efremov@linux.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
It is reported that sysfs buffer overflow can be triggered if the system
has too many CPU cores(>841 on 4K PAGE_SIZE) when showing CPUs of
hctx via /sys/block/$DEV/mq/$N/cpu_list.
Use snprintf to avoid the potential buffer overflow.
This version doesn't change the attribute format, and simply stops
showing CPU numbers if the buffer is going to overflow.
Cc: stable@vger.kernel.org Fixes: 676141e48af7("blk-mq: don't dump CPU -> hw queue map on driver load") Signed-off-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
If pers->make_request fails in md_flush_request(), the bio is lost. To
fix this, pass back a bool to indicate if the original make_request call
should continue to handle the I/O and instead of assuming the flush logic
will push it to completion.
Convert md_flush_request to return a bool and no longer calls the raid
driver's make_request function. If the return is true, then the md flush
logic has or will complete the bio and the md make_request call is done.
If false, then the md make_request function needs to keep processing like
it is a normal bio. Let the original call to md_handle_request handle any
need to retry sending the bio to the raid driver's make_request function
should it be needed.
Also mark md_flush_request and the make_request function pointer as
__must_check to issue warnings should these critical return values be
ignored.
Fixes: 2bc13b83e629 ("md: batch flush requests.") Cc: stable@vger.kernel.org # # v4.19+ Cc: NeilBrown <neilb@suse.com> Signed-off-by: David Jeffery <djeffery@redhat.com> Reviewed-by: Xiao Ni <xni@redhat.com> Signed-off-by: Song Liu <songliubraving@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Audmix support two substream, When two substream start
to run, the trigger function may be called by two substream
in same time, that the priv->tdms may be updated wrongly.
The expected priv->tdms is 0x3, but sometimes the
result is 0x2, or 0x1.
Fixes: be1df61cf06e ("ASoC: fsl: Add Audio Mixer CPU DAI driver") Signed-off-by: Shengjiu Wang <shengjiu.wang@nxp.com> Acked-by: Nicolin Chen <nicoleotsuka@gmail.com> Reviewed-by: Daniel Baluta <daniel.baluta@nxp.com> Link: https://lore.kernel.org/r/1e706afe53fdd1fbbbc79277c48a98f8416ba873.1573458378.git.shengjiu.wang@nxp.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The headphone jack on buddy was broken with the following commit:
commit 6b5da66322c5 ("ASoC: rt5645: read jd1_1 status for jd
detection").
This changes the jd_mode for buddy to 4 so buddy can read from the same
register that was used in the working version of this driver without
affecting any other devices that might use this, since no other device uses
jd_mode = 4. To test this I plugged and uplugged the headphone jack, verifying
audio works.
Signed-off-by: Jacob Rasmussen <jacobraz@google.com> Reviewed-by: Ross Zwisler <zwisler@google.com> Link: https://lore.kernel.org/r/20191111185957.217244-1-jacobraz@google.com Signed-off-by: Mark Brown <broonie@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
008847f66c3 ("workqueue: allow rescuer thread to do more work.") made
the rescuer worker requeue the pwq immediately if there may be more
work items which need rescuing instead of waiting for the next mayday
timer expiration. Unfortunately, it doesn't check whether the pwq is
already on the mayday list and unconditionally gets the ref and moves
it onto the list. This doesn't corrupt the list but creates an
additional reference to the pwq. It got queued twice but will only be
removed once.
This leak later can trigger pwq refcnt warning on workqueue
destruction and prevent freeing of the workqueue.
Before actually destrying a workqueue, destroy_workqueue() checks
whether it's actually idle. If it isn't, it prints out a bunch of
warning messages and leaves the workqueue dangling. It unfortunately
has a couple issues.
* Mayday list queueing increments pwq's refcnts which gets detected as
busy and fails the sanity checks. However, because mayday list
queueing is asynchronous, this condition can happen without any
actual work items left in the workqueue.
* Sanity check failure leaves the sysfs interface behind too which can
lead to init failure of newer instances of the workqueue.
This patch fixes the above two by
* If a workqueue has a rescuer, disable and kill the rescuer before
sanity checks. Disabling and killing is guaranteed to flush the
existing mayday list.
Commit 75d66ffb48efb3 added backing device health checks and as a part
of these checks, check_events() block ops template call is invoked in
dm-zoned mapping path as well as in reclaim and flush path. Calling
check_events() with ATA or SCSI backing devices introduces a blocking
scsi_test_unit_ready() call being made in sd_check_events(). Even though
the overhead of calling scsi_test_unit_ready() is small for ATA zoned
devices, it is much larger for SCSI and it affects performance in a very
negative way.
Fix this performance regression by executing check_events() only in case
of any I/O errors. The function dmz_bdev_is_dying() is modified to call
only blk_queue_dying(), while calls to check_events() are made in a new
helper function, dmz_check_bdev().
Existing RNG data read timeout is 200us but it doesn't cover EIP76 RNG
data rate which takes approx. 700us to produce 16 bytes of output data
as per testing results. So configure the timeout as 1000us to also take
account of lack of udelay()'s reliability.
Fixes: 383212425c92 ("hwrng: omap - Add device variant for SafeXcel IP-76 found in Armada 8K") Cc: <stable@vger.kernel.org> Signed-off-by: Sumit Garg <sumit.garg@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In ovl_rename(), if new upper is hardlinked to old upper underneath
overlayfs before upper dirs are locked, user will get an ESTALE error
and a WARN_ON will be printed.
Changes to underlying layers while overlayfs is mounted may result in
unexpected behavior, but it shouldn't crash the kernel and it shouldn't
trigger WARN_ON() either, so relax this WARN_ON().
On non-samefs overlay without xino, non pure upper inodes should use a
pseudo_dev assigned to each unique lower fs and pure upper inodes use the
real upper st_dev.
It is fine for an overlay pure upper inode to use the same st_dev;st_ino
values as the real upper inode, because the content of those two different
filesystem objects is always the same.
In this case, however:
- two filesystems, A and B
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
Non pure upper overlay inode, whose origin is in layer 1 will have the same
st_dev;st_ino values as the real lower inode. This may result with a false
positive results of 'diff' between the real lower and copied up overlay
inode.
Fix this by using the upper st_dev;st_ino values in this case. This breaks
the property of constant st_dev;st_ino across copy up of this case. This
breakage will be fixed by a later patch.
In the past, overlayfs required that lower fs have non null uuid in
order to support nfs export and decode copy up origin file handles.
Commit 9df085f3c9a2 ("ovl: relax requirement for non null uuid of
lower fs") relaxed this requirement for nfs export support, as long
as uuid (even if null) is unique among all lower fs.
However, said commit unintentionally also relaxed the non null uuid
requirement for decoding copy up origin file handles, regardless of
the unique uuid requirement.
Amend this mistake by disabling decoding of copy up origin file handle
from lower fs with a conflicting uuid.
We still encode copy up origin file handles from those fs, because
file handles like those already exist in the wild and because they
might provide useful information in the future.
There is an unhandled corner case described by Miklos this way:
- two filesystems, A and B, both have null uuid
- upper layer is on A
- lower layer 1 is also on A
- lower layer 2 is on B
In this case bad_uuid won't be set for B, because the check only
involves the list of lower fs. Hence we'll try to decode a layer 2
origin on layer 1 and fail.
We will deal with this corner case later.
Reported-by: Colin Ian King <colin.king@canonical.com> Tested-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/lkml/20191106234301.283006-1-colin.king@canonical.com/ Fixes: 9df085f3c9a2 ("ovl: relax requirement for non null uuid ...") Cc: stable@vger.kernel.org # v4.20+ Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Newer versions of awk spit out these fun warnings:
awk: ../lib/raid6/unroll.awk:16: warning: regexp escape sequence `\#' is not a known regexp operator
As commit 700c1018b86d ("x86/insn: Fix awk regexp warnings") showed, it
turns out that there are a number of awk strings that do not need to be
escaped and newer versions of awk now warn about this.
Fix the string up so that no warning is produced. The exact same kernel
module gets created before and after this patch, showing that it wasn't
needed.
In commit 38506ecefab9 ("rtlwifi: rtl_pci: Start modification for
new drivers"), a callback needed to check if the hardware has released
a buffer indicating that a DMA operation is completed was not added.
Fixes: 38506ecefab9 ("rtlwifi: rtl_pci: Start modification for new drivers") Cc: Stable <stable@vger.kernel.org> # v3.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In commit 38506ecefab9 ("rtlwifi: rtl_pci: Start modification for
new drivers"), a callback to get the RX buffer address was added to
the PCI driver. Unfortunately, driver rtl8192de was not modified
appropriately and the code runs into a WARN_ONCE() call. The use
of an incorrect array is also fixed.
Fixes: 38506ecefab9 ("rtlwifi: rtl_pci: Start modification for new drivers") Cc: Stable <stable@vger.kernel.org> # 3.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Testing with the new fsstress support for subvolumes uncovered a pretty
bad problem with rename exchange on subvolumes. We're modifying two
different subvolumes, but we only start the transaction on one of them,
so the other one is not added to the dirty root list. This is caught by
btrfs_cow_block() with a warning because the root has not been updated,
however if we do not modify this root again we'll end up pointing at an
invalid root because the root item is never updated.
Fix this by making sure we add the destination root to the trans list,
the same as we do with normal renames. This fixes the corruption.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT") CC: stable@vger.kernel.org # 4.9+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Backreference walking, which is used by send to figure if it can issue
clone operations instead of write operations, can be very slow and use
too much memory when extents have many references. This change simply
skips backreference walking when an extent has more than 64 references,
in which case we fallback to a write operation instead of a clone
operation. This limit is conservative and in practice I observed no
signicant slowdown with up to 100 references and still low memory usage
up to that limit.
This is a temporary workaround until there are speedups in the backref
walking code, and as such it does not attempt to add extra interfaces or
knobs to tweak the threshold.
The last user of btrfs_bio::flags was removed in commit 326e1dbb5736
("block: remove management of bi_remaining when restoring original
bi_end_io"), remove it.
(Tagged for stable as the structure is heavily used and space savings
are desirable.)
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
During a cyclic writeback, extent_write_cache_pages() uses done_index
to update the writeback_index after the current run is over. However,
instead of current index + 1, it gets to to the current index itself.
Unfortunately, this, combined with returning on EOF instead of looping
back, can lead to the following pathlogical behavior.
1. There is a single file which has accumulated enough dirty pages to
trigger balance_dirty_pages() and the writer appending to the file
with a series of short writes.
2. balance_dirty_pages kicks in, wakes up background writeback and sleeps.
3. Writeback kicks in and the cursor is on the last page of the dirty
file. Writeback is started or skipped if already in progress. As
it's EOF, extent_write_cache_pages() returns and the cursor is set
to done_index which is pointing to the last page.
4. Writeback is done. Nothing happens till balance_dirty_pages
finishes, at which point we go back to #1.
This can almost completely stall out writing back of the file and keep
the system over dirty threshold for a long time which can mess up the
whole system. We encountered this issue in production with a package
handling application which can reliably reproduce the issue when
running under tight memory limits.
Reading the comment in the error handling section, this seems to be to
avoid accidentally skipping a page in case the write attempt on the
page doesn't succeed. However, this concern seems bogus.
On each page, the code either:
* Skips and moves onto the next page.
* Fails issue and sets done_index to index + 1.
* Successfully issues and continue to the next page if budget allows
and not EOF.
IOW, as long as it's not EOF and there's budget, the code never
retries writing back the same page. Only when a page happens to be
the last page of a particular run, we end up retrying the page, which
can't possibly guarantee anything data integrity related. Besides,
cyclic writes are only used for non-syncing writebacks meaning that
there's no data integrity implication to begin with.
Fix it by always setting done_index past the current page being
processed.
Note that this problem exists in other writepages too.
CC: stable@vger.kernel.org # 4.19+ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When doing a buffered write it's possible to leave the subv_writers
counter of the root, used for synchronization between buffered nocow
writers and snapshotting. This happens in an exceptional case like the
following:
1) We fail to allocate data space for the write, since there's not
enough available data space nor enough unallocated space for allocating
a new data block group;
2) Because of that failure, we try to go to NOCOW mode, which succeeds
and therefore we set the local variable 'only_release_metadata' to true
and set the root's sub_writers counter to 1 through the call to
btrfs_start_write_no_snapshotting() made by check_can_nocow();
3) The call to btrfs_copy_from_user() returns zero, which is very unlikely
to happen but not impossible;
4) No pages are copied because btrfs_copy_from_user() returned zero;
5) We call btrfs_end_write_no_snapshotting() which decrements the root's
subv_writers counter to 0;
6) We don't set 'only_release_metadata' back to 'false' because we do
it only if 'copied', the value returned by btrfs_copy_from_user(), is
greater than zero;
7) On the next iteration of the while loop, which processes the same
page range, we are now able to allocate data space for the write (we
got enough data space released in the meanwhile);
8) After this if we fail at btrfs_delalloc_reserve_metadata(), because
now there isn't enough free metadata space, or in some other place
further below (prepare_pages(), lock_and_cleanup_extent_if_need(),
btrfs_dirty_pages()), we break out of the while loop with
'only_release_metadata' having a value of 'true';
9) Because 'only_release_metadata' is 'true' we end up decrementing the
root's subv_writers counter to -1 (through a call to
btrfs_end_write_no_snapshotting()), and we also end up not releasing the
data space previously reserved through btrfs_check_data_free_space().
As a consequence the mechanism for synchronizing NOCOW buffered writes
with snapshotting gets broken.
Fix this by always setting 'only_release_metadata' to false at the start
of each iteration.
Fixes: 8257b2dc3c1a ("Btrfs: introduce btrfs_{start, end}_nocow_write() for each subvolume") Fixes: 7ee9e4405f26 ("Btrfs: check if we can nocow if we don't have data space") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
In the fixup worker, if we fail to mark the range as delalloc in the io
tree, we must release the previously reserved metadata, as well as update
the outstanding extents counter for the inode, otherwise we leak metadata
space.
In pratice we can't return an error from btrfs_set_extent_delalloc(),
which is just a wrapper around __set_extent_bit(), as for most errors
__set_extent_bit() does a BUG_ON() (or panics which hits a BUG_ON() as
well) and returning an -EEXIST error doesn't happen in this case since
the exclusive bits parameter always has a value of 0 through this code
path. Nevertheless, just fix the error handling in the fixup worker,
in case one day __set_extent_bit() can return an error to this code
path.
Fixes: f3038ee3a3f101 ("btrfs: Handle btrfs_set_extent_delalloc failure in fixup worker") CC: stable@vger.kernel.org # 4.19+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
This is because the free side drops the ref without the lock, and then
takes the lock if our refcount is 0. So you can have nodes on the tree
that have a refcount of 0. Fix this by zero'ing out that element in our
temporary array so we don't try to kill it again.
CC: stable@vger.kernel.org # 4.14+ Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ add comment ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
The page we were trying to drop had a page->private, but had no
page->mapping and so called drop_buffers, assuming that we had a
buffer_head on the page, and then panic'ed trying to deref 1, which is
our page->private for data pages.
This is happening because we're truncating the free space cache while
we're trying to load the free space cache. This isn't supposed to
happen, and I'll fix that in a followup patch. However we still
shouldn't allow those sort of mistakes to result in messing with pages
that do not belong to us. So add the page->mapping check to verify that
we still own this page after dropping and re-acquiring the page lock.
This page being unlocked as:
btrfs_readpage
extent_read_full_page
__extent_read_full_page
__do_readpage
if (!nr)
unlock_page <-- nr can be 0 only if submit_extent_page
returns an error
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
[ add callchain ] Signed-off-by: David Sterba <dsterba@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When the implementation of SKBs with fraglist was sent upstream, a
merge-damage occurred and half the patch was not applied.
This causes problems in high-throughput situations with AX200 devices,
including low throughput and FW crashes.
Introduce the part that was missing from the original patch.
Fixes: 0044f1716c4d ("iwlwifi: pcie: support transmitting SKBs with fraglist") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by: Johannes Berg <johannes.berg@intel.com>
[ This patch was created by me, but the original author of this code
is Johannes, so his s-o-b is here and he's marked as the author of
the patch. ] Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Since the role_store() uses strncmp(), it's possible to refer
out-of-memory if the sysfs data size is smaller than strlen("host").
This patch fixes it by using sysfs_streq() instead of strncmp().
Reported-by: Pavel Machek <pavel@denx.de> Fixes: 9bb86777fb71 ("phy: rcar-gen3-usb2: add sysfs for usb role swap") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Yoshihiro Shimoda <yoshihiro.shimoda.uh@renesas.com> Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be> Acked-by: Pavel Machek <pavel@denx.de> Signed-off-by: Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Clear ep0's DWC3_EP_TRANSFER_STARTED flag if the END_TRANSFER command is
completed. Otherwise, we can't start control transfer again after
END_TRANSFER.
Normally the END_TRANSFER command completion handler will clear the
DWC3_EP_TRANSFER_STARTED flag. However, if the command was sent without
interrupt on completion, then the flag will not be cleared. Make sure to
clear the flag in this case.
This patch corrects the condition to kick the transfer without
giving back the requests when either request has remaining data
or when there are pending SGs. The && check was introduced during
spliting up the dwc3_gadget_ep_cleanup_completed_requests() function.
In case we have to migrate a ballon page to a newpage of another zone, the
managed page count of both zones is wrong. Paired with memory offlining
(which will adjust the managed page count), we can trigger kernel crashes
and all kinds of different symptoms.
In another instance (older kernel), I was able to observe this
(negative page count :/):
[ 180.896971] Offlined Pages 32768
[ 182.667462] Offlined Pages 32768
[ 184.408117] Offlined Pages 32768
[ 186.026321] Offlined Pages 32768
[ 187.684861] Offlined Pages 32768
[ 189.227013] Offlined Pages 32768
[ 190.830303] Offlined Pages 32768
[ 190.833071] Built 1 zonelists, mobility grouping on. Total pages: -36920272750453009
In another instance (older kernel), I was no longer able to start any
process:
[root@vm ~]# [ 214.348068] Offlined Pages 32768
[ 215.973009] Offlined Pages 32768
cat /proc/meminfo
-bash: fork: Cannot allocate memory
[root@vm ~]# cat /proc/meminfo
-bash: fork: Cannot allocate memory
Fix it by properly adjusting the managed page count when migrating if
the zone changed. The managed page count of the zones now looks after
unplug of the DIMM (and after deflating the balloon) just like before
inflating the balloon (and plugging+onlining the DIMM).
We'll temporarily modify the totalram page count. If this ever becomes a
problem, we can fine tune by providing helpers that don't touch
the totalram pages (e.g., adjust_zone_managed_page_count()).
Please note that fixing up the managed page count is only necessary when
we adjusted the managed page count when inflating - only if we
don't have VIRTIO_BALLOON_F_DEFLATE_ON_OOM. With that feature, the
managed page count is not touched when inflating/deflating.
Reported-by: Yumei Huang <yuhuang@redhat.com> Fixes: 3dcc0571cd64 ("mm: correctly update zone->managed_pages") Cc: <stable@vger.kernel.org> # v3.11+ Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Jiang Liu <liuj97@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Igor Mammedov <imammedo@redhat.com> Cc: virtualization@lists.linux-foundation.org Signed-off-by: David Hildenbrand <david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
When virt_wifi interface is created, virt_wifi_newlink() is called and
it calls register_netdevice().
if register_netdevice() fails, it internally would call
->priv_destructor(), which is virt_wifi_net_device_destructor() and
it frees netdev. but virt_wifi_newlink() still use netdev.
So, use-after-free would occur in virt_wifi_newlink().
Test commands:
ip link add dummy0 type dummy
modprobe bonding
ip link add bonding_masters link dummy0 type virt_wifi
Splat looks like:
[ 202.220554] BUG: KASAN: use-after-free in virt_wifi_newlink+0x88b/0x9a0 [virt_wifi]
[ 202.221659] Read of size 8 at addr ffff888061629cb8 by task ip/852
Change calculating of position page containing BBM
If none of BBM flags are set then function nand_bbm_get_next_page
reports EINVAL. It causes that BBM is not read at all during scanning
factory bad blocks. The result is that the BBT table is build without
checking factory BBM at all. For Micron flash memories none of these
flags are set if page size is different than 2048 bytes.
Address this regression by:
- adding NAND_BBM_FIRSTPAGE chip flag without any condition. It solves
issue only for Micron devices.
- changing the nand_bbm_get_next_page_function. It will return 0
if no of BBM flag is set and page parameter is 0. After that modification
way of discovering factory bad blocks will work similar as in kernel
version 5.1.
Cc: stable@vger.kernel.org Fixes: f90da7818b14 (mtd: rawnand: Support bad block markers in first, second or last page) Signed-off-by: Piotr Sroka <piotrs@cadence.com> Reviewed-by: Frieder Schrempf <frieder.schrempf@kontron.de> Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Any write with either dd or flashcp to a device driven by the
spear_smi.c driver will pass through the spear_smi_cpy_toio()
function. This function will get called for chunks of up to 256 bytes.
If the amount of data is smaller, we may have a problem if the data
length is not 4-byte aligned. In this situation, the kernel panics
during the memcpy:
# dd if=/dev/urandom bs=1001 count=1 of=/dev/mtd6
spear_smi_cpy_toio [620] dest c9070000, src c7be8800, len 256
spear_smi_cpy_toio [620] dest c9070100, src c7be8900, len 256
spear_smi_cpy_toio [620] dest c9070200, src c7be8a00, len 256
spear_smi_cpy_toio [620] dest c9070300, src c7be8b00, len 233
Unhandled fault: external abort on non-linefetch (0x808) at 0xc90703e8
[...]
PC is at memcpy+0xcc/0x330
The above error occurs because the implementation of memcpy_toio()
tries to optimize the number of I/O by writing 4 bytes at a time as
much as possible, until there are less than 4 bytes left and then
switches to word or byte writes.
Unfortunately, the specification states about the Write Burst mode:
"the next AHB Write request should point to the next
incremented address and should have the same size (byte,
half-word or word)"
This means ARM architecture implementation of memcpy_toio() cannot
reliably be used blindly here. Workaround this situation by update the
write path to stick to byte access when the burst length is not
multiple of 4.
Fixes: f18dbbb1bfe0 ("mtd: ST SPEAr: Add SMI driver for serial NOR flash") Cc: Russell King <linux@armlinux.org.uk> Cc: Boris Brezillon <boris.brezillon@collabora.com> Cc: stable@vger.kernel.org Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Keeping interrupts on could result in brcmfmac freeing some resources
and then IRQ handlers trying to use them. That was obviously a straight
path for crashing a kernel.
When an IRQ occurs, regmap_{read,write,...}() is invoked in atomic
context. Regmap must indicate register IO is fast so that a spinlock is
used instead of a mutex to avoid sleeping in atomic context:
The problem arises because our read() function grabs a lock of the
circular buffer, finds something of interest, then invokes copy_to_user()
straight from the buffer, which in turn takes mm->mmap_sem. In the same
time, the callback mon_bin_vma_fault() is invoked under mm->mmap_sem.
It attempts to take the fetch lock and deadlocks.
This patch does away with protecting of our page list with any
semaphores, and instead relies on the kernel not close the device
while mmap is active in a process.
In addition, we prohibit re-sizing of a buffer while mmap is active.
This way, when (now unlocked) fault is processed, it works with the
page that is intended to be mapped-in, and not some other random page.
Note that this may have an ABI impact, but hopefully no legitimate
program is this wrong.
Explicitly initialize URB structure urb_list field in usb_init_urb().
This field can be potentially accessed uninitialized and its
initialization is coherent with the usage of list_del_init() in
usb_hcd_unlink_urb_from_ep() and usb_giveback_urb_bh() and its
explicit initialization in usb_hcd_submit_urb() error path.
Make sure that the interrupt interface has an endpoint before trying to
access its endpoint descriptors to avoid dereferencing a NULL pointer.
The driver binds to the interrupt interface with interface number 0, but
must not assume that this interface or its current alternate setting are
the first entries in the corresponding configuration arrays.
Fixes: b72458a80c75 ("[PATCH] USB: Eagle and ADI 930 usb adsl modem driver") Cc: stable <stable@vger.kernel.org> # 2.6.16 Signed-off-by: Johan Hovold <johan@kernel.org> Link: https://lore.kernel.org/r/20191210112601.3561-2-johan@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Connor Kuehl <connor.kuehl@canonical.com> Signed-off-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>