Guoqing Jiang [Mon, 23 Dec 2019 09:49:01 +0000 (10:49 +0100)]
md/raid1: use bucket based mechanism for IO serialization
Since raid1 had already used bucket based mechanism to reduce
the conflict between write IO and resync IO, it is possible to
speed up performance for io serialization with refer to the
same mechanism.
To align with the barrier bucket mechanism, we created arrays
(with the same number of BARRIER_BUCKETS_NR) for spinlock, rb
tree and waitqueue. Then we can reduce lock competition with
multiple spinlocks, boost search performance with multiple rb
trees and also reduce thundering herd problem with multiple
waitqueues.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:49:00 +0000 (10:49 +0100)]
md: introduce a new struct for IO serialization
Obviously, IO serialization could cause the degradation of
performance a lot. In order to reduce the degradation, so a
rb interval tree is added in raid1 to speed up the check of
collision.
So, a rb root is needed in md_rdev, then abstract all the
serialize related members to a new struct (serial_in_rdev),
embed it into md_rdev.
Of course, we need to free the struct if it is not needed
anymore, so rdev/rdevs_uninit_serial are added accordingly.
And they should be called when destroty memory pool or can't
alloc memory.
And we need to consider to call mddev_destroy_serial_pool
in case serialize_policy/write-behind is disabled, bitmap
is destroyed or in __md_stop_writes.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:48:58 +0000 (10:48 +0100)]
raid1: serialize the overlap write
Before dispatch write bio, raid1 array which enables
serialize_policy need to check if overlap exists between
this bio and previous on-flying bios. If there is overlap,
then it has to wait until the collision is disappeared.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:48:57 +0000 (10:48 +0100)]
md: reorgnize mddev_create/destroy_serial_pool
So far, IO serialization is used for two scenarios:
1. raid1 which enables write-behind mode, and there is rdev in the array
which is multi-queue device and flaged with writemostly.
2. IO serialization is enabled or disabled by change serialize_policy.
So introduce rdev_need_serial to check the first scenario. And for 1, IO
serialization is enabled automatically while 2 is controlled manually.
And it is possible that both scenarios are true, so for create serial pool,
rdev/rdevs_init_serial should be separate from check if the pool existed or
not. Then for destroy pool, we need to check if the pool is needed by other
rdevs due to the first scenario.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:48:56 +0000 (10:48 +0100)]
md: add serialize_policy sysfs node for raid1
With the new sysfs node, we can use it to control if raid1 array
wants io serialization or not. So mddev_create_serial_pool and
mddev_destroy_serial_pool are called in serialize_policy_store
to enable or disable the serialization.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:48:55 +0000 (10:48 +0100)]
md: prepare for enable raid1 io serialization
1. The related resources (spin_lock, list and waitqueue) are needed for
address raid1 reorder overlap issue too, in this case, rdev is set to
NULL for mddev_create/destroy_serial_pool which implies all rdevs need
to handle these resources.
And also add "is_suspend" to mddev_destroy_serial_pool since it will
be called under suspended situation, which also makes both create and
destroy pool have same arguments.
2. Introduce rdevs_init_serial which is called if raid1 io serialization
is enabled since all rdevs need to init related stuffs.
3. rdev_init_serial and clear_bit(CollisionCheck, &rdev->flags) should
be called between suspend and resume.
No need to export mddev_create_serial_pool since it is only called in
md-mod module.
Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Guoqing Jiang [Mon, 23 Dec 2019 09:48:54 +0000 (10:48 +0100)]
md: fix a typo s/creat/create
It actually means create here, so fix the typo.
Reported-by: Song Liu <liu.song.a23@gmail.com> Signed-off-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com> Signed-off-by: Song Liu <songliubraving@fb.com>
Zhengyuan Liu [Fri, 20 Dec 2019 02:21:28 +0000 (10:21 +0800)]
md/raid6: fix algorithm choice under larger PAGE_SIZE
There are several algorithms available for raid6 to generate xor and syndrome
parity, including basic int1, int2 ... int32 and SIMD optimized implementation
like sse and neon. To test and choose the best algorithms at the initial
stage, we need provide enough disk data to feed the algorithms. However, the
disk number we provided depends on page size and gfmul table, seeing bellow:
const int disks = (65536/PAGE_SIZE) + 2;
So when come to 64K PAGE_SIZE, there is only one data disk plus 2 parity disk,
as a result the chosed algorithm is not reliable. For example, on my arm64
machine with 64K page enabled, it will choose intx32 as the best one, although
the NEON implementation is better.
This patch tries to fix the problem by defining a constant raid6 disk number to
supporting arbitrary page size.
Suggested-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
Zhengyuan Liu [Fri, 20 Dec 2019 02:21:27 +0000 (10:21 +0800)]
raid6/test: fix a compilation warning
The compilation warning is redefination showed as following:
In file included from tables.c:2:
../../../include/linux/export.h:180: warning: "EXPORT_SYMBOL" redefined
#define EXPORT_SYMBOL(sym) __EXPORT_SYMBOL(sym, "")
In file included from tables.c:1:
../../../include/linux/raid/pq.h:61: note: this is the location of the previous definition
#define EXPORT_SYMBOL(sym)
Fixes: 69a94abb82ee ("export.h, genksyms: do not make genksyms calculate CRC of trimmed symbols") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
Zhengyuan Liu [Fri, 20 Dec 2019 02:21:26 +0000 (10:21 +0800)]
raid6/test: fix a compilation error
The compilation error is redeclaration showed as following:
In file included from ../../../include/linux/limits.h:6,
from /usr/include/x86_64-linux-gnu/bits/local_lim.h:38,
from /usr/include/x86_64-linux-gnu/bits/posix1_lim.h:161,
from /usr/include/limits.h:183,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:194,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/syslimits.h:7,
from /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed/limits.h:34,
from ../../../include/linux/raid/pq.h:30,
from algos.c:14:
../../../include/linux/types.h:114:15: error: conflicting types for ‘int64_t’
typedef s64 int64_t;
^~~~~~~
In file included from /usr/include/stdint.h:34,
from /usr/lib/gcc/x86_64-linux-gnu/8/include/stdint.h:9,
from /usr/include/inttypes.h:27,
from ../../../include/linux/raid/pq.h:29,
from algos.c:14:
/usr/include/x86_64-linux-gnu/bits/stdint-intn.h:27:19: note: previous \
declaration of ‘int64_t’ was here
typedef __int64_t int64_t;
Fixes: 54d50897d544 ("linux/kernel.h: split *_MAX and *_MIN macros into <linux/limits.h>") Signed-off-by: Zhengyuan Liu <liuzhengyuan@kylinos.cn> Signed-off-by: Song Liu <songliubraving@fb.com>
In the current implementation, final zone-mgmt request is issued with
submit_bio_wait() which marks the bio REQ_SYNC. This is needed since
immediate action is expected for zone-mgmt requests as these are
blocking operations. This also bypasses the scheduler in the
blk_mq_make_request() and dispatches the request directly into the
hw ctx.
This patch marks all the chained bios REQ_SYNC so that we can have
above-mentioned behavior for non-final bios also.
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Bob Liu <bob.liu@oracle.com> Signed-off-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Herbert Xu [Mon, 23 Dec 2019 08:13:51 +0000 (16:13 +0800)]
block: Allow t10-pi to be modular
Currently t10-pi can only be built into the block layer which via
crc-t10dif pulls in a whole chunk of the Crypto API. In fact all
users of t10-pi work as modules and there is no reason for it to
always be built-in.
This patch adds a new hidden option for t10-pi that is selected
automatically based on BLK_DEV_INTEGRITY and whether the users
of t10-pi are built-in or not.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Begunkov [Thu, 28 Nov 2019 21:11:55 +0000 (00:11 +0300)]
blk-mq: optimise blk_mq_flush_plug_list()
Instead of using list_del_init() in a loop, that generates a lot of
unnecessary memory read/writes, iterate from the first request of a
batch and cut out a sublist with list_cut_before().
Apart from removing the list node initialisation part, this is more
register-friendly, and the assembly uses the stack less intensively.
list_empty() at the beginning is done with hope, that the compiler can
optimise out the same check in the following list_splice_init().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Linus Torvalds [Wed, 18 Dec 2019 16:54:15 +0000 (08:54 -0800)]
Merge tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A slightly high amount at this time, but all good and small fixes:
- A PCM core fix that initializes the buffer properly for avoiding
information leaks; it is a long-standing minor problem, but good to
fix better now
- A few ASoC core fixes for the init / cleanup ordering issues that
surfaced after the recent refactoring
- Lots of SOF and topology-related fixes went in, as usual as such
hot topics
- Several ASoC codec and platform-specific small fixes: wm89xx,
realtek, and max98090, AMD, Intel-SST
- A fix for the previous incomplete regression of HD-audio, now
hitting Nvidia HDMI
- A few HD-audio CA0132 codec fixes"
* tag 'sound-5.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (27 commits)
ALSA: hda - Downgrade error message for single-cmd fallback
ASoC: wm8962: fix lambda value
ALSA: hda: Fix regression by strip mask fix
ALSA: hda/ca0132 - Fix work handling in delayed HP detection
ALSA: hda/ca0132 - Avoid endless loop
ALSA: hda/ca0132 - Keep power on during processing DSP response
ALSA: pcm: Avoid possible info leaks from PCM stream buffers
ASoC: Intel: common: work-around incorrect ACPI HID for CML boards
ASoC: SOF: Intel: split cht and byt debug window sizes
ASoC: SOF: loader: fix snd_sof_fw_parse_ext_data
ASoC: SOF: loader: snd_sof_fw_parse_ext_data log warning on unknown header
ASoC: simple-card: Don't create separate link when platform is present
ASoC: topology: Check return value for soc_tplg_pcm_create()
ASoC: topology: Check return value for snd_soc_add_dai_link()
ASoC: core: only flush inited work during free
ASoC: Intel: bytcr_rt5640: Update quirk for Teclast X89
ASoC: core: Init pcm runtime work early to avoid warnings
ASoC: Intel: sst: Add missing include <linux/io.h>
ASoC: max98090: fix possible race conditions
ASoC: max98090: exit workaround earlier if PLL is locked
...
Linus Torvalds [Tue, 17 Dec 2019 21:27:02 +0000 (13:27 -0800)]
Merge tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A mix of regression fixes and regular fixes for stable trees:
- fix swapped error messages for qgroup enable/rescan
- fixes for NO_HOLES feature with clone range
- fix deadlock between iget/srcu lock/synchronize srcu while freeing
an inode
- fix double lock on subvolume cross-rename
- tree log fixes
* fix missing data checksums after replaying a log tree
* also teach tree-checker about this problem
* skip log replay on orphaned roots
- fix maximum devices constraints for RAID1C -3 and -4
- send: don't print warning on read-only mount regarding orphan
cleanup
- error handling fixes"
* tag 'for-5.5-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: send: remove WARN_ON for readonly mount
btrfs: do not leak reloc root if we fail to read the fs root
btrfs: skip log replay on orphaned roots
btrfs: handle ENOENT in btrfs_uuid_tree_iterate
btrfs: abort transaction after failed inode updates in create_subvol
Btrfs: fix hole extent items with a zero size after range cloning
Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
Btrfs: make tree checker detect checksum items with overlapping ranges
Btrfs: fix missing data checksums after replaying a log tree
btrfs: return error pointer from alloc_test_extent_buffer
btrfs: fix devs_max constraints for raid1c3 and raid1c4
btrfs: tree-checker: Fix error format string for size_t
btrfs: don't double lock the subvol_sem for rename exchange
btrfs: handle error in btrfs_cache_block_group
btrfs: do not call synchronize_srcu() in inode_tree_del
Btrfs: fix cloning range with a hole when using the NO_HOLES feature
btrfs: Fix error messages in qgroup_rescan_init
Linus Torvalds [Tue, 17 Dec 2019 21:08:41 +0000 (13:08 -0800)]
Merge tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A small set of fixes for mostly minor issues here, the only real code
ones are Wen Yang's fixes for error handling in the core and Christian
Marussi's list_voltage() change which is a fix for disruptively bad
performance for regulators with continuous voltage control (which are
rare)"
* tag 'regulator-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: rn5t618: fix module aliases
regulator: max77650: add of_match table
regulator: core: avoid unneeded .list_voltage calls
regulator: s5m8767: Fix a warning message
regulator: core: fix regulator_register() error paths to properly release rdev
regulator: fix use after free issue
Linus Torvalds [Tue, 17 Dec 2019 21:06:31 +0000 (13:06 -0800)]
Merge tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"A relatively large set of fixes here, the biggest part of it is for
fallout from the GPIO descriptor rework that affected several of the
devices with usable native chip select support. There's also some new
PCI IDs for Intel Jasper Lake devices.
The conversion to platform_get_irq() in the fsl driver is an
incremental fix for build errors introduced on SPARC by the earlier
fix for error handling in probe in that driver"
* tag 'spi-fix-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: fsl: use platform_get_irq() instead of of_irq_to_resource()
spi: nxp-fspi: Ensure width is respected in spi-mem operations
spi: spi-ti-qspi: Fix a bug when accessing non default CS
spi: fsl: don't map irq during probe
spi: spi-cavium-thunderx: Add missing pci_release_regions()
spi: sprd: Fix the incorrect SPI register
gpiolib: of: Make of_gpio_spi_cs_get_count static
spi: fsl: Handle the single hardwired chipselect case
gpio: Handle counting of Freescale chipselects
spi: fsl: Fix GPIO descriptor support
spi: dw: Correct handling of native chipselect
spi: cadence: Correct handling of native chipselect
spi: pxa2xx: Add support for Intel Jasper Lake
Linus Torvalds [Tue, 17 Dec 2019 19:17:03 +0000 (11:17 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fix from Ingo Molnar:
"Fix kexec booting with certain EFI memory map layouts"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/efi: Update e820 with reserved EFI boot services data to fix kexec breakage
Linus Torvalds [Tue, 17 Dec 2019 19:11:08 +0000 (11:11 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer fixes from Ingo Molnar:
"Add HPET quirks for the Intel 'Coffee Lake H' and 'Ice Lake' platforms"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/intel: Disable HPET on Intel Ice Lake platforms
x86/intel: Disable HPET on Intel Coffee Lake H platforms
Linus Torvalds [Tue, 17 Dec 2019 19:03:57 +0000 (11:03 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf tooling fixes from Ingo Molnar:
"These are all perf tooling changes: most of them are fixes.
Note that the large CPU count related fixes go beyond regression
fixes, but the IPI-flood symptoms are severe enough that I think
justifies their inclusion"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (38 commits)
perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES
libtraceevent: Allow custom libdir path
perf header: Fix false warning when there are no duplicate cache entries
perf metricgroup: Fix printing event names of metric group with multiple events
perf/x86/pmu-events: Fix Kernel_Utilization metric
perf top: Do not bail out when perf_env__read_cpuid() returns ENOSYS
perf arch: Make the default get_cpuid() return compatible error
tools headers kvm: Sync linux/kvm.h with the kernel sources
tools headers UAPI: Update tools's copy of drm.h headers
tools headers UAPI: Sync drm/i915_drm.h with the kernel sources
perf inject: Fix processing of ID index for injected instruction tracing
perf report: Bail out --mem-mode if mem info is not available
perf report: Make -F more strict like -s
perf report/top TUI: Replace pr_err() with ui__error()
libtraceevent: Copy pkg-config file to output folder when using O=
libtraceevent: Fix lib installation with O=
perf kvm: Clarify the 'perf kvm' -i and -o command line options
tools arch x86: Sync asm/cpufeatures.h with the kernel sources
perf beauty: Add CLEAR_SIGHAND support for clone's flags arg
...
Linus Torvalds [Tue, 17 Dec 2019 19:00:46 +0000 (11:00 -0800)]
Merge branch 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fixes from Ingo Molnar:
"Tone down mutex debugging complaints, and annotate/fix spinlock
debugging data accesses for KCSAN"
* 'locking-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Revert "locking/mutex: Complain upon mutex API misuse in IRQ contexts"
locking/spinlock/debug: Fix various data races
Linus Torvalds [Tue, 17 Dec 2019 18:39:55 +0000 (10:39 -0800)]
Merge branch 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull EFI fixes from Ingo Molnar:
"Protect presistent EFI memory reservations from kexec, fix EFIFB early
console, EFI stub graphics output fixes and other misc fixes."
* 'efi-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
efi: Don't attempt to map RCI2 config table if it doesn't exist
efi/earlycon: Remap entire framebuffer after page initialization
efi: Fix efi_loaded_image_t::unload type
efi/gop: Fix memory leak in __gop_query32/64()
efi/gop: Return EFI_SUCCESS if a usable GOP was found
efi/gop: Return EFI_NOT_FOUND if there are no usable GOPs
efi/memreserve: Register reservations as 'reserved' in /proc/iomem
Takashi Iwai [Tue, 17 Dec 2019 13:18:32 +0000 (14:18 +0100)]
Merge tag 'asoc-fix-v5.5-rc2' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Fixes for v5.5
A collection of fixes since the merge window, mostly driver specific but
there's a few in the core that clean up fallout from the refactorings
done in the last cycle.
Linus Torvalds [Tue, 17 Dec 2019 00:43:07 +0000 (16:43 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"I didn't get a batch in this weekend, so here's what we queued up last
week and today.
- A couple of defconfigs add back debugfs -- it used to be implicitly
enabled through CONFIG_TRACING, but 0e4a459f56c32d3e ("tracing:
Remove unnecessary DEBUG_FS dependency") removed that.
- The rest are mostly minor fixlets of the usual kind; some DT
tweaks, a headerfile refactor that needs a build fix now, etc"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (30 commits)
ARM: bcm: Add missing sentinel to bcm2711_compat[]
ARM: shmobile: defconfig: Restore debugfs support
bus: ti-sysc: Fix missing reset delay handling
ARM: imx: Fix boot crash if ocotp is not found
ARM: imx_v6_v7_defconfig: Explicitly restore CONFIG_DEBUG_FS
ARM: dts: imx6ul-evk: Fix peripheral regulator
arm64: dts: ls1028a: fix reboot node
ARM: mmp: include the correct cputype.h
ARM: dts: am437x-gp/epos-evm: fix panel compatible
arm64: dts: ls1028a: fix typo in TMU calibration data
ARM: imx: Correct ocotp id for serial number support of i.MX6ULL/ULZ SoCs
ARM: dts: bcm283x: Fix critical trip point
ARM: omap2plus_defconfig: Add back DEBUG_FS
ARM: omap2plus_defconfig: enable NET_SWITCHDEV
ARM: dts: am335x-sancloud-bbe: fix phy mode
bus: ti-sysc: Fix missing force mstandby quirk handling
reset: Do not register resource data for missing resets
reset: Fix {of,devm}_reset_control_array_get kerneldoc return types
reset: brcmstb: Remove resource checks
dt-bindings: reset: Fix brcmstb-reset example
...
Olof Johansson [Mon, 16 Dec 2019 19:33:13 +0000 (11:33 -0800)]
Merge tag 'samsung-fixes-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into arm/fixes
Samsung fixes for v5.5
1. Restore debugfs support in exynos_defconfig (as now it is not
selected as dependency of tracing). Debugfs is required by systemd
and several tests.
2. Maintainers updates.
* tag 'samsung-fixes-5.5' of https://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: exynos_defconfig: Restore debugfs support
MAINTAINERS: Include Samsung SoC serial driver in Samsung SoC entry
MAINTAINERS: Update Lukasz Luba's email address
Olof Johansson [Mon, 16 Dec 2019 19:33:04 +0000 (11:33 -0800)]
Merge tag 'renesas-fixes-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel into arm/fixes
Renesas fixes for v5.5
- Restore debugfs support
* tag 'renesas-fixes-for-v5.5-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/renesas-devel:
ARM: shmobile: defconfig: Restore debugfs support
Linus Torvalds [Mon, 16 Dec 2019 18:06:04 +0000 (10:06 -0800)]
Merge tag 'linux-kselftest-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
- ftrace and safesetid test fixes from Masami Hiramatsu
- Kunit fixes from Brendan Higgins, Iurii Zaikin, and Heidi Fahim
- Kselftest framework fixes from SeongJae Park and Michael Ellerman
* tag 'linux-kselftest-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
kselftest: Support old perl versions
kselftest/runner: Print new line in print of timeout log
selftests: Fix dangling documentation references to kselftest_module.sh
Documentation: kunit: add documentation for kunit_tool
Documentation: kunit: fix typos and gramatical errors
kunit: testing kunit: Bug fix in test_run_timeout function
fs/ext4/inode-test: Fix inode test on 32 bit platforms.
selftests: safesetid: Fix Makefile to set correct test program
selftests: safesetid: Check the return value of setuid/setgid
selftests: safesetid: Move link library to LDLIBS
selftests/ftrace: Fix multiple kprobe testcase
selftests/ftrace: Do not to use absolute debugfs path
selftests/ftrace: Fix ftrace test cases to check unsupported
selftests/ftrace: Fix to check the existence of set_ftrace_filter
Eric Auger [Tue, 26 Nov 2019 17:54:13 +0000 (18:54 +0100)]
iommu: fix KASAN use-after-free in iommu_insert_resv_region
In case the new region gets merged into another one, the nr list node is
freed. Checking its type while completing the merge algorithm leads to
a use-after-free. Use new->type instead.
Fixes: 4dbd258ff63e ("iommu: Revisit iommu_insert_resv_region() implementation") Signed-off-by: Eric Auger <eric.auger@redhat.com> Reported-by: Qian Cai <cai@lca.pw> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Cc: Stable <stable@vger.kernel.org> #v5.3+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Mon, 16 Dec 2019 03:50:23 +0000 (19:50 -0800)]
Fix root mounting with no mount options
The "trivial conversion" in commit cccaa5e33525 ("init: use do_mount()
instead of ksys_mount()") was totally broken, since it didn't handle the
case of a NULL mount data pointer. And while I had "tested" it (and
presumably Dominik had too) that bug was hidden by me having options.
Ed Maste [Thu, 12 Dec 2019 14:53:46 +0000 (14:53 +0000)]
perf vendor events s390: Remove name from L1D_RO_EXCL_WRITES description
In 7fcfa9a2d9 an unintended prefix "Counter:18 Name:" was removed from
the description for L1D_RO_EXCL_WRITES, but the extra name remained in
the description. Remove it too.
Fixes: 7fcfa9a2d9a7 ("perf list: Fix s390 counter long description for L1D_RO_EXCL_WRITES") Signed-off-by: Ed Maste <emaste@freebsd.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Link: http://lore.kernel.org/lkml/20191212145346.5026-1-emaste@freefall.freebsd.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ed Maste [Thu, 12 Dec 2019 14:34:46 +0000 (14:34 +0000)]
perf vendor events s390: Fix counter long description for DTLB1_GPAGE_WRITES
The cf_z13 counter DTLB1_GPAGE_WRITES included a prefix
'Counter:132\tName:'.
This is incorrect; remove the prefix as with 7fcfa9a2d9 for cf_z14.
Signed-off-by: Ed Maste <emaste@freebsd.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Greentime Hu <green.hu@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nick Hu <nickhu@andestech.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Vincent Chen <deanbo422@gmail.com> Link: http://lore.kernel.org/lkml/20191212143446.88582-1-emaste@freefall.freebsd.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sudip Mukherjee [Sat, 7 Dec 2019 11:14:40 +0000 (11:14 +0000)]
libtraceevent: Allow custom libdir path
When I use prefix=/usr and try to install libtraceevent in my laptop it
tries to install in /usr/lib64. I am not having any folder as /usr/lib64
and also the debian policy doesnot allow installing in /usr/lib64. It
should be in /usr/lib/x86_64-linux-gnu/.
Quote: No package for a 64 bit architecture may install files in
/usr/lib64/ or in a subdirectory of it.
Make it more flexible by allowing to mention libdir_relative while
installing so that distros can mention the path according to their
policy or use the default one.
Takashi Iwai [Mon, 16 Dec 2019 15:12:24 +0000 (16:12 +0100)]
ALSA: hda - Downgrade error message for single-cmd fallback
We made the error message for the CORB/RIRB communication clearer by
upgrading to dev_WARN() so that user can notice better. But this
struck us like a boomerang: now it caught syzbot and reported back as
a fatal issue although it's not really any too serious bug that worth
for stopping the whole system.
OK, OK, let's be softy, downgrade it to the standard dev_err() again.
Fixes: dd65f7e19c69 ("ALSA: hda - Show the fatal CORB/RIRB error more clearly") Reported-by: syzbot+b3028ac3933f5c466389@syzkaller.appspotmail.com Link: https://lore.kernel.org/r/20191216151224.30013-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
Michael Walle [Wed, 11 Dec 2019 19:57:30 +0000 (20:57 +0100)]
spi: nxp-fspi: Ensure width is respected in spi-mem operations
Make use of a core helper to ensure the desired width is respected
when calling spi-mem operators.
Otherwise only the SPI controller will be matched with the flash chip,
which might lead to wrong widths. Also consider the width specified by
the user in the device tree.
ARM: bcm: Add missing sentinel to bcm2711_compat[]
commit 781fa0a95424 ("ARM: bcm: Add support for BCM2711 SoC")
breaks boot of many other platforms (e.g. OMAP or i.MX6) if
CONFIG_ARCH_BCM2835 is enabled in addition to some multiplatform
config (e.g. omap2plus_defconfig). The symptom is that the OMAP
based board does not show any activity beyond "Starting Kernel ..."
even with earlycon.
Reverting the mentioned commit makes it work again.
The real fix is to add the missing NULL sentinel to the
bcm2711_compat[] variable-length array.
Fixes: 781fa0a95424 ("ARM: bcm: Add support for BCM2711 SoC") Acked-by: Stefan Wahren <wahrenst@gmx.net> Signed-off-by: H. Nikolaus Schaller <hns@goldelico.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Linus Torvalds [Sun, 15 Dec 2019 22:58:13 +0000 (14:58 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"A small collection of -rc fixes. Mostly. One API addition, but that's
because we wanted to use it in a fix. There's also a bug fix that is
going to render the 5.5 kernel's soft-RoCE driver incompatible with
all soft-RoCE versions prior, but it's required to actually implement
the protocol according to the RoCE spec and required in order for the
soft-RoCE driver to be able to successfully work with actual RoCE
hardware.
Summary:
- Update Steve Wise info
- Fix for soft-RoCE crc calculations (will break back compatibility,
but only with the soft-RoCE driver, which has had this bug since it
was introduced and it is an on-the-wire bug, but will make
soft-RoCE fully compatible with real RoCE hardware)
- cma init fixup
- counters oops fix
- fix for mlx4 init/teardown sequence
- fix for mkx5 steering rules
- introduce a cleanup API, which isn't a fix, but we want to use it
in the next fix
- fix for mlx5 memory management that uses API in previous patch"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix device memory flows
IB/core: Introduce rdma_user_mmap_entry_insert_range() API
IB/mlx5: Fix steering rule of drop and count
IB/mlx4: Follow mirror sequence of device add during device removal
RDMA/counter: Prevent auto-binding a QP which are not tracked with res
rxe: correctly calculate iCRC for unaligned payloads
Update mailmap info for Steve Wise
RDMA/cma: add missed unregister_pernet_subsys in init failure
Linus Torvalds [Sun, 15 Dec 2019 20:27:31 +0000 (12:27 -0800)]
Merge tag 'riscv/for-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V fixes from Paul Walmsley:
"Two minor build fixes:
- Fix builds of the ELF loader when built with 'make -j1' (nommu
only)
- Fix CONFIG_SOC_SIFIVE builds when CONFIG_TTY is disabled (found
during randconfig testing)"
* tag 'riscv/for-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
riscv: only select serial sifive if TTY is enabled
riscv: Fix build dependency for loader
Linus Torvalds [Sun, 15 Dec 2019 20:24:44 +0000 (12:24 -0800)]
Merge tag 'for-linus-5.5b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull xen fixes from Juergen Gross:
"Two fixes: one for a resource accounting bug in some configurations
and a fix for another patch which went into rc1"
* tag 'for-linus-5.5b-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/balloon: fix ballooned page accounting without hotplug enabled
xen-blkback: prevent premature module unload
Linus Torvalds [Sun, 15 Dec 2019 19:36:12 +0000 (11:36 -0800)]
Merge branch 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux
Pull ksys_mount() and ksys_dup() removal from Dominik Brodowski:
"This small series replaces all in-kernel calls to the
userspace-focused ksys_mount() and ksys_dup() with calls to
kernel-centric functions:
For each replacement of ksys_mount() with do_mount(), one needs to
verify that the first and third parameter (char *dev_name, char *type)
are strings allocated in kernelspace and that the fifth parameter
(void *data) is either NULL or refers to a full page (only occurence
in init/do_mounts.c::do_mount_root()). The second and fourth
parameters (char *dir_name, unsigned long flags) are passed by
ksys_mount() to do_mount() unchanged, and therefore do not require
particular care.
Moreover, instead of pretending to be userspace, the opening of
/dev/console as stdin/stdout/stderr can be implemented using in-kernel
functions as well. Thereby, ksys_dup() can be removed for good"
[ This doesn't get rid of the special "kernel init runs with KERNEL_DS"
case, but it at least removes _some_ of the users of "treat kernel
pointers as user pointers for our magical init sequence".
One day we'll hopefully be rid of it all, and can initialize our
init_thread addr_limit to USER_DS. - Linus ]
* 'remove-ksys-mount-dup' of git://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux:
fs: remove ksys_dup()
init: unify opening /dev/console as stdin/stdout/stderr
init: use do_mount() instead of ksys_mount()
initrd: use do_mount() instead of ksys_mount()
devtmpfs: use do_mount() instead of ksys_mount()
Linus Torvalds [Sat, 14 Dec 2019 20:51:57 +0000 (12:51 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"24 fixes, all in drivers. The lion's share (16) are qla2xxx and the
rest are iscsi (3), ufs (2), smarpqi, lpfc and libsas"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
scsi: iscsi: Avoid potential deadlock in iscsi_if_rx func
scsi: iscsi: Fix a potential deadlock in the timeout handler
scsi: smartpqi: Update attribute name to `driver_version`
scsi: libsas: stop discovering if oob mode is disconnected
scsi: ufs: Disable autohibern8 feature in Cadence UFS
scsi: iscsi: qla4xxx: fix double free in probe
scsi: ufs: Give an unique ID to each ufs-bsg
scsi: qla2xxx: Add debug dump of LOGO payload and ELS IOCB
scsi: qla2xxx: Ignore PORT UPDATE after N2N PLOGI
scsi: qla2xxx: Don't defer relogin unconditonally
scsi: qla2xxx: Send Notify ACK after N2N PLOGI
scsi: qla2xxx: Configure local loop for N2N target
scsi: qla2xxx: Fix PLOGI payload and ELS IOCB dump length
scsi: qla2xxx: Don't call qlt_async_event twice
scsi: qla2xxx: Allow PLOGI in target mode
scsi: qla2xxx: Change discovery state before PLOGI
scsi: qla2xxx: Drop superfluous INIT_WORK of del_work
scsi: qla2xxx: Initialize free_work before flushing it
scsi: qla2xxx: Use explicit LOGO in target mode
scsi: qla2xxx: Ignore NULL pointer in tcm_qla2xxx_free_mcmd
...
Linus Torvalds [Sat, 14 Dec 2019 20:49:20 +0000 (12:49 -0800)]
Merge tag 'char-misc-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc driver fixes from Greg KH:
"Here are six small fixes for some reported char/misc driver issues:
- fix build warnings with new 'awk' with the raid6 code
- four interconnect driver bugfixes
- binder fix for reported problem
All of these except the binder fix have been in linux-next with no
reported issues. The binder fix is "new" but Todd says it is good as
he has tested it :)"
* tag 'char-misc-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
binder: fix incorrect calculation for num_valid
interconnect: qcom: msm8974: Walk the list safely on node removal
interconnect: qcom: qcs404: Walk the list safely on node removal
interconnect: qcom: sdm845: Walk the list safely on node removal
interconnect: qcom: Fix Kconfig indentation
lib: raid6: fix awk build warnings
Linus Torvalds [Sat, 14 Dec 2019 20:45:36 +0000 (12:45 -0800)]
Merge tag 'driver-core-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are two small driver core fixes to resolve some reported issues
The first is to handle the much-reported (by the build systems)
problem that superH does not boot anymore.
The second handles an issue in the new platform logic that a number of
people ran into with the automated tests in kbuild
Both of these have been in linux-next with no reported issues"
* tag 'driver-core-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
drivers: Fix boot problem on SuperH
of/platform: Unconditionally pause/resume sync state during kernel init
Linus Torvalds [Sat, 14 Dec 2019 20:43:57 +0000 (12:43 -0800)]
Merge tag 'staging-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO fixes from Greg KH:
"Here are a number of small staging and IIO driver fixes for reported
issues for 5.5-rc2
Nothing major, a bunch of tiny IIO driver issues resolved, and some
staging driver fixes for things that people ran into with 5.5-rc1.
Full details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'staging-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (28 commits)
fbtft: Fix the initialization from property algorithm
staging: rtl8712: fix interface sanity check
staging: rtl8188eu: fix interface sanity check
staging: gigaset: add endpoint-type sanity check
staging: gigaset: fix illegal free on probe errors
staging: gigaset: fix general protection fault on probe
staging: vchiq: call unregister_chrdev_region() when driver registration fails
staging: exfat: fix multiple definition error of `rename_file'
staging/wlan-ng: add CRC32 dependency in Kconfig
staging: hp100: Fix build error without ETHERNET
staging: fbtft: Do not hardcode SPI CS polarity inversion
staging: exfat: properly support discard in clr_alloc_bitmap()
staging/octeon: Mark Ethernet driver as BROKEN
iio: adc: max9611: Fix too short conversion time delay
iio: ad7949: fix channels mixups
iio: imu: st_lsm6dsx: do not power-off accel if events are enabled
iio: imu: st_lsm6dsx: track hw FIFO buffering with fifo_mask
iio: imu: st_lsm6dsx: fix decimation factor estimation
iio: imu: inv_mpu6050: fix temperature reporting using bad unit
iio: humidity: hdc100x: fix IIO_HUMIDITYRELATIVE channel reporting
...
Linus Torvalds [Sat, 14 Dec 2019 20:40:39 +0000 (12:40 -0800)]
Merge tag 'usb-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are some small USB driver fixes for reported issues for 5.5-rc2
There's the usual gadget and xhci fixes, as well as some other
problems that syzbot has been finding during it's fuzzing runs. Full
details are in the shortlog.
All of these have been in linux-next with no reported issues"
* tag 'usb-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (21 commits)
usb: dwc3: pci: add ID for the Intel Comet Lake -H variant
xhci: make sure interrupts are restored to correct state
xhci: handle some XHCI_TRUST_TX_LENGTH quirks cases as default behaviour.
xhci: Increase STS_HALT timeout in xhci_suspend()
usb: xhci: only set D3hot for pci device
xhci: fix USB3 device initiated resume race with roothub autosuspend
xhci: Fix memory leak in xhci_add_in_port()
USB: Fix incorrect DMA allocations for local memory pool drivers
usb: gadget: fix wrong endpoint desc
usb: dwc3: ep0: Clear started flag on completion
usb: dwc3: gadget: Clear started flag for non-IOC
usb: dwc3: gadget: Fix logical condition
USB: atm: ueagle-atm: add missing endpoint check
USB: adutux: fix interface sanity check
USB: idmouse: fix interface sanity checks
USB: serial: io_edgeport: fix epic endpoint lookup
usb: mon: Fix a deadlock in usbmon between mmap and read
usb: common: usb-conn-gpio: Don't log an error on probe deferral
usb: core: urb: fix URB structure initialization function
usb: typec: fix use after free in typec_register_port()
...
Linus Torvalds [Sat, 14 Dec 2019 19:44:14 +0000 (11:44 -0800)]
Merge tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6
Pull cfis fixes from Steve French:
"Three small smb3 fixes: this addresses two recent issues reported in
additional testing during rc1, a refcount underflow and a problem with
an intermittent crash in SMB2_open_init"
* tag '5.5-rc1-smb3-fixes' of git://git.samba.org/sfrench/cifs-2.6:
CIFS: Close cached root handle only if it has a lease
SMB3: Fix crash in SMB2_open_init due to uninitialized field in compounding path
smb3: fix refcount underflow warning on unmount when no directory leases
Linus Torvalds [Sat, 14 Dec 2019 19:13:54 +0000 (11:13 -0800)]
Merge tag 'ovl-fixes-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Fix some bugs and documentation"
* tag 'ovl-fixes-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
docs: filesystems: overlayfs: Fix restview warnings
docs: filesystems: overlayfs: Rename overlayfs.txt to .rst
ovl: relax WARN_ON() on rename to self
ovl: fix corner case of non-unique st_dev;st_ino
ovl: don't use a temp buf for encoding real fh
ovl: make sure that real fid is 32bit aligned in memory
ovl: fix lookup failure on multi lower squashfs
Takashi Iwai [Sat, 14 Dec 2019 17:52:17 +0000 (18:52 +0100)]
ALSA: hda: Fix regression by strip mask fix
The commit e38e486d66e2 ("ALSA: hda: Modify stream stripe mask only
when needed") tried to address the regression by the unconditional
application of the stripe mask, but this caused yet another
regression for the previously working devices. Namely, the patch
clears the azx_dev->stripe flag at snd_hdac_stream_clear(), but this
may be called multiple times before restarting the stream, so this
ended up with clearance of the flag for the whole time.
This patch fixes the regression by moving the azx_dev->stripe flag
clearance at the counter-part, the close callback of HDMI codec
driver instead.
Takashi Iwai [Fri, 13 Dec 2019 08:51:11 +0000 (09:51 +0100)]
ALSA: hda/ca0132 - Fix work handling in delayed HP detection
CA0132 has the delayed HP jack detection code that is invoked from the
unsol handler, but it does a few weird things: it contains the cancel
of a work inside the work handler, and yet it misses the cancel-sync
call at (runtime-)suspend. This patch addresses those issues.
Takashi Iwai [Fri, 13 Dec 2019 08:51:09 +0000 (09:51 +0100)]
ALSA: hda/ca0132 - Keep power on during processing DSP response
We need to keep power on while processing the DSP response via unsol
event. Each snd_hda_codec_read() call does the power management, so
it should work normally, but still it's safer to keep the power up for
the whole function.
Takashi Iwai [Wed, 11 Dec 2019 15:57:42 +0000 (16:57 +0100)]
ALSA: pcm: Avoid possible info leaks from PCM stream buffers
The current PCM code doesn't initialize explicitly the buffers
allocated for PCM streams, hence it might leak some uninitialized
kernel data or previous stream contents by mmapping or reading the
buffer before actually starting the stream.
Since this is a common problem, this patch simply adds the clearance
of the buffer data at hw_params callback. Although this does only
zero-clear no matter which format is used, which doesn't mean the
silence for some formats, but it should be OK because the intention is
just to clear the previous data on the buffer.
Todd Kjos [Fri, 13 Dec 2019 20:25:31 +0000 (12:25 -0800)]
binder: fix incorrect calculation for num_valid
For BINDER_TYPE_PTR and BINDER_TYPE_FDA transactions, the
num_valid local was calculated incorrectly causing the
range check in binder_validate_ptr() to miss out-of-bounds
offsets.
Fixes: bde4a19fc04f ("binder: use userspace pointer as base of buffer space") Signed-off-by: Todd Kjos <tkjos@google.com> Cc: stable <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20191213202531.55010-1-tkjos@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Linus Torvalds [Fri, 13 Dec 2019 22:58:24 +0000 (14:58 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio fixes from Michael Tsirkin:
"Some fixes and cleanup patches"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
virtio_balloon: divide/multiply instead of shifts
virtio_balloon: name cleanups
virtio-balloon: fix managed page counts when migrating pages between zones
Linus Torvalds [Fri, 13 Dec 2019 22:45:40 +0000 (14:45 -0800)]
Merge branch 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
- removal of an old API where all in-kernel users have been converted
as of this merge window.
- a kdoc fix
- a new helper that will make dependencies for the next API conversion
a tad easier
* 'i2c/for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: add helper to check if a client has a driver attached
i2c: fix header file kernel-doc warning
i2c: remove i2c_new_dummy() API
Linus Torvalds [Fri, 13 Dec 2019 22:43:26 +0000 (14:43 -0800)]
Merge tag 'pm-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These add PM QoS support to devfreq and fix a few issues in that
subsystem, fix two cpuidle issues and do one minor cleanup in there,
and address an ACPI power management problem related to devices with
special power management requirements, like fans.
Specifics:
- Add PM QoS support, based on the frequency QoS introduced during
the 5.4 cycle, to devfreq (Leonard Crestez).
- Fix some assorted devfreq issues (Leonard Crestez).
- Fix an unintentional cpuidle behavior change (introduced during the
5.4 cycle) related to the active polling time limit (Marcelo
Tosatti).
- Fix a recently introduced cpuidle helper function and do a minor
cleanup in the cpuidle core (Rafael Wysocki).
- Avoid adding devices with special power management requirements,
like fans, to the generic ACPI PM domain (Rafael Wysocki)"
* tag 'pm-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpuidle: Drop unnecessary type cast in cpuidle_poll_time()
cpuidle: Fix cpuidle_driver_state_disabled()
ACPI: PM: Avoid attaching ACPI PM domain to certain devices
cpuidle: use first valid target residency as poll time
PM / devfreq: Use PM QoS for sysfs min/max_freq
PM / devfreq: Add PM QoS support
PM / devfreq: Don't fail devfreq_dev_release if not in list
PM / devfreq: Introduce get_freq_range helper
PM / devfreq: Set scaling_max_freq to max on OPP notifier error
PM / devfreq: Fix devfreq_notifier_call returning errno
Linus Torvalds [Fri, 13 Dec 2019 22:40:38 +0000 (14:40 -0800)]
Merge tag 'sound-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"A small collection of fixes.
The main changes are fixes for a couple of regressions in AMD HD-audio
and FireWire that were introduced in 5.5-rc1. The rest are small fixes
for echoaudio and FireWire, as well as a usual Dell HD-audio fixup"
* tag 'sound-5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek - Line-out jack doesn't work on a Dell AIO
ALSA: hda/hdmi - Fix duplicate unref of pci_dev
ALSA: fireface: fix return value in error path of isochronous resources reservation
ALSA: oxfw: fix return value in error path of isochronous resources reservation
ALSA: firewire-motu: fix double unlocked 'motu->mutex'
ALSA: echoaudio: simplify get_audio_levels
Linus Torvalds [Fri, 13 Dec 2019 22:36:45 +0000 (14:36 -0800)]
Merge tag 'drm-fixes-2019-12-13' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Usual round of rc2 fixes.
i915 and amdgpu leading the charge, but a few others in here,
including some nouveau fixes, all seems pretty for rc2, but hey it's a
Fri 13th pull so I'm sure it'll cause untold bad fortune.
i915:
- GPU hang on idle transition
- GLK+ FBC corruption fix
- non-priv OA access on Tigerlake
- HDCP state fix
- CI found race fixes
amdgpu:
- renoir DC fixes
- GFX8 fence flush alignment with userspace
- Arcturus power profile fix
- DC aux + i2c over aux fixes
- GPUVM invalidation semaphore fixes
- gfx10 golden registers update
mgag200:
- expand startadd fix
panfrost:
- devfreq fix
- memory fixes
mcde:
- DSI pointer deref fix"
* tag 'drm-fixes-2019-12-13' of git://anongit.freedesktop.org/drm/drm: (51 commits)
drm/amdgpu: add invalidate semaphore limit for SRIOV in gmc10
drm/amdgpu: add invalidate semaphore limit for SRIOV and picasso in gmc9
drm/amdgpu: avoid using invalidate semaphore for picasso
Revert "drm/amdgpu: dont schedule jobs while in reset"
drm/amdgpu: fix license on Kconfig and Makefiles
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/amdgpu/gfx10: update gfx golden settings for navi14
drm/amdgpu/gfx10: update gfx golden settings
drm/i915: Serialise with remote retirement
drm/amd/display: include linux/slab.h where needed
drm/amd/display: fix undefined struct member reference
drm/nouveau/kms/nv50-: fix panel scaling
drm/nouveau/kms/nv50-: Limit MST BPC to 8
drm/nouveau/kms/nv50-: Store the bpc we're using in nv50_head_atom
drm/nouveau/kms/nv50-: Call outp_atomic_check_view() before handling PBN
drm/nouveau: Fix drm-core using atomic code-paths on pre-nv50 hardware
drm/nouveau: Move the declaration of struct nouveau_conn_atom up a bit
drm/i915/gt: Detect if we miss WaIdleLiteRestore
drm/i915/hdcp: Nuke intel_hdcp_transcoder_config()
...
Linus Torvalds [Fri, 13 Dec 2019 22:27:19 +0000 (14:27 -0800)]
Merge tag 'for-linus-20191212' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- stable fix for the bi_size overflow. Not a corruption issue, but a
case wher we could merge but disallowed (Andreas)
- NVMe pull request via Keith, with various fixes.
- MD pull request from Song.
- Merge window regression fix for the rq passthrough stats (Logan)
- Remove unused blkcg_drain_queue() function (Guoqing)
* tag 'for-linus-20191212' of git://git.kernel.dk/linux-block:
blk-cgroup: remove blkcg_drain_queue
block: fix NULL pointer dereference in account statistics with IDE
md: make sure desc_nr less than MD_SB_DISKS
md: raid1: check rdev before reference in raid1_sync_request func
raid5: need to set STRIPE_HANDLE for batch head
block: fix "check bi_size overflow before merge"
nvme/pci: Fix read queue count
nvme/pci Limit write queue sizes to possible cpus
nvme/pci: Fix write and poll queue types
nvme/pci: Remove last_cq_head
nvme: Namepace identification descriptor list is optional
nvme-fc: fix double-free scenarios on hw queues
nvme: else following return is not needed
nvme: add error message on mismatching controller ids
nvme_fc: add module to ops template to allow module references
nvmet-loop: Avoid preallocating big SGL for data
nvme-fc: Avoid preallocating big SGL for data
nvme-rdma: Avoid preallocating big SGL for data
Linus Torvalds [Fri, 13 Dec 2019 22:24:54 +0000 (14:24 -0800)]
Merge tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block
Pull io_uring fixes from Jens Axboe:
- A tweak to IOSQE_IO_LINK (also marked for stable) to allow links that
don't sever if the result is < 0.
This is mostly for linked timeouts, where if we ask for a pure
timeout we always get -ETIME. This makes links useless for that case,
hence allow a case where it works.
- Five minor optimizations to fix and improve cases that regressed
since v5.4.
- An SQTHREAD locking fix.
- A sendmsg/recvmsg iov assignment fix.
- Net fix where read_iter/write_iter don't honor IOCB_NOWAIT, and
subsequently ensuring that works for io_uring.
- Fix a case where for an invalid opcode we might return -EBADF instead
of -EINVAL, if the ->fd of that sqe was set to an invalid fd value.
* tag 'io_uring-5.5-20191212' of git://git.kernel.dk/linux-block:
io_uring: ensure we return -EINVAL on unknown opcode
io_uring: add sockets to list of files that support non-blocking issue
net: make socket read/write_iter() honor IOCB_NOWAIT
io_uring: only hash regular files for async work execution
io_uring: run next sqe inline if possible
io_uring: don't dynamically allocate poll data
io_uring: deferred send/recvmsg should assign iov
io_uring: sqthread should grab ctx->uring_lock for submissions
io-wq: briefly spin for new work after finishing work
io-wq: remove worker->wait waitqueue
io_uring: allow unbreakable links
Linus Torvalds [Fri, 13 Dec 2019 22:13:15 +0000 (14:13 -0800)]
Merge tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM multipath by restoring full path selector functionality for
bio-based configurations that don't haave a SCSI device handler.
- Fix dm-btree removal to ensure non-root btree nodes have at least
(max_entries / 3) entries. This resolves userspace thin_check
utility's report of "too few entries in btree_node".
- Fix both the DM thin-provisioning and dm-clone targets to properly
flush the data device prior to metadata commit. This resolves the
potential for inconsistency across a power loss event when the data
device has a volatile writeback cache.
- Small documentation fixes to dm-clone and dm-integrity.
* tag 'for-5.5/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
docs: dm-integrity: remove reference to ARC4
dm thin: Flush data device before committing metadata
dm thin metadata: Add support for a pre-commit callback
dm clone: Flush destination device before committing metadata
dm clone metadata: Use a two phase commit
dm clone metadata: Track exact changes per transaction
dm btree: increase rebalance threshold in __rebalance2()
dm: add dm-clone to the documentation index
dm mpath: remove harmful bio-based optimization
Linus Torvalds [Fri, 13 Dec 2019 22:02:12 +0000 (14:02 -0800)]
Merge tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull FIELD_SIZEOF conversion from Kees Cook:
"A mostly mechanical treewide conversion from FIELD_SIZEOF() to
sizeof_field(). This avoids the redundancy of having 2 macros
(actually 3) doing the same thing, and consolidates on sizeof_field().
While "field" is not an accurate name, it is the common name used in
the kernel, and doesn't result in any unintended innuendo.
As there are still users of FIELD_SIZEOF() in -next, I will clean up
those during this coming development cycle and send the final old
macro removal patch at that time"
* tag 'sizeof_field-v5.5-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
treewide: Use sizeof_field() macro
MIPS: OCTEON: Replace SIZEOF_FIELD() macro
Anand Jain [Thu, 5 Dec 2019 11:39:07 +0000 (19:39 +0800)]
btrfs: send: remove WARN_ON for readonly mount
We log warning if root::orphan_cleanup_state is not set to
ORPHAN_CLEANUP_DONE in btrfs_ioctl_send(). However if the filesystem is
mounted as readonly we skip the orphan item cleanup during the lookup
and root::orphan_cleanup_state remains at the init state 0 instead of
ORPHAN_CLEANUP_DONE (2). So during send in btrfs_ioctl_send() we hit the
warning as below.
The warning exists because having orphan inodes could confuse send and
cause it to fail or produce incorrect streams. The two cases that would
cause such send failures, which are already fixed are:
1) Inodes that were unlinked - these are orphanized and remain with a
link count of 0. These caused send operations to fail because it
expected to always find at least one path for an inode. However this
is no longer a problem since send is now able to deal with such
inodes since commit 46b2f4590aab ("Btrfs: fix send failure when root
has deleted files still open") and treats them as having been
completely removed (the state after an orphan cleanup is performed).
2) Inodes that were in the process of being truncated. These resulted in
send not knowing about the truncation and potentially issue write
operations full of zeroes for the range from the new file size to the
old file size. This is no longer a problem because we no longer
create orphan items for truncation since commit f7e9e8fc792f ("Btrfs:
stop creating orphan items for truncate").
As such before these commits, the WARN_ON here provided a clue in case
something went wrong. Instead of being a warning against the
root::orphan_cleanup_state value, it could have been more accurate by
checking if there were actually any orphan items, and then issue a
warning only if any exists, but that would be more expensive to check.
Since orphanized inodes no longer cause problems for send, just remove
the warning.
Reported-by: Christoph Anton Mitterer <calestyo@scientia.net> Link: https://lore.kernel.org/linux-btrfs/21cb5e8d059f6e1496a903fa7bfc0a297e2f5370.camel@scientia.net/ CC: stable@vger.kernel.org # 4.19+ Suggested-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Anand Jain <anand.jain@oracle.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 6 Dec 2019 14:37:18 +0000 (09:37 -0500)]
btrfs: do not leak reloc root if we fail to read the fs root
If we fail to read the fs root corresponding with a reloc root we'll
just break out and free the reloc roots. But we remove our current
reloc_root from this list higher up, which means we'll leak this
reloc_root. Fix this by adding ourselves back to the reloc_roots list
so we are properly cleaned up.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 6 Dec 2019 14:37:17 +0000 (09:37 -0500)]
btrfs: skip log replay on orphaned roots
My fsstress modifications coupled with generic/475 uncovered a failure
to mount and replay the log if we hit a orphaned root. We do not want
to replay the log for an orphan root, but it's completely legitimate to
have an orphaned root with a log attached. Fix this by simply skipping
replaying the log. We still need to pin it's root node so that we do
not overwrite it while replaying other logs, as we re-read the log root
at every stage of the replay.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 6 Dec 2019 16:39:00 +0000 (11:39 -0500)]
btrfs: handle ENOENT in btrfs_uuid_tree_iterate
If we get an -ENOENT back from btrfs_uuid_iter_rem when iterating the
uuid tree we'll just continue and do btrfs_next_item(). However we've
done a btrfs_release_path() at this point and no longer have a valid
path. So increment the key and go back and do a normal search.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Fri, 6 Dec 2019 14:37:15 +0000 (09:37 -0500)]
btrfs: abort transaction after failed inode updates in create_subvol
We can just abort the transaction here, and in fact do that for every
other failure in this function except these two cases.
CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 5 Dec 2019 16:58:41 +0000 (16:58 +0000)]
Btrfs: fix hole extent items with a zero size after range cloning
Normally when cloning a file range if we find an implicit hole at the end
of the range we assume it is because the NO_HOLES feature is enabled.
However that is not always the case. One well known case [1] is when we
have a power failure after mixing buffered and direct IO writes against
the same file.
In such cases we need to punch a hole in the destination file, and if
the NO_HOLES feature is not enabled, we need to insert explicit file
extent items to represent the hole. After commit 690a5dbfc51315
("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning
extents"), we started to insert file extent items representing the hole
with an item size of 0, which is invalid and should be 53 bytes (the size
of a btrfs_file_extent_item structure), resulting in all sorts of
corruptions and invalid memory accesses. This is detected by the tree
checker when we attempt to write a leaf to disk.
The problem can be sporadically triggered by test case generic/561 from
fstests. That test case does not exercise power failure and creates a new
filesystem when it starts, so it does not use a filesystem created by any
previous test that tests power failure. However the test does both
buffered and direct IO writes (through fsstress) and it's precisely that
which is creating the implicit holes in files. That happens even before
the commit mentioned earlier. I need to investigate why we get those
implicit holes to check if there is a real problem or not. For now this
change fixes the regression of introducing file extent items with an item
size of 0 bytes.
Fix the issue by calling btrfs_punch_hole_range() without passing a
btrfs_clone_extent_info structure, which ensures file extent items are
inserted to represent the hole with a correct item size. We were passing
a btrfs_clone_extent_info with a value of 0 for its 'item_size' field,
which was causing the insertion of file extent items with an item size
of 0.
Reported-by: David Sterba <dsterba@suse.com> Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 6 Dec 2019 12:27:39 +0000 (12:27 +0000)]
Btrfs: fix removal logic of the tree mod log that leads to use-after-free issues
When a tree mod log user no longer needs to use the tree it calls
btrfs_put_tree_mod_seq() to remove itself from the list of users and
delete all no longer used elements of the tree's red black tree, which
should be all elements with a sequence number less then our equals to
the caller's sequence number. However the logic is broken because it
can delete and free elements from the red black tree that have a
sequence number greater then the caller's sequence number:
1) At a point in time we have sequence numbers 1, 2, 3 and 4 in the
tree mod log;
2) The task which got assigned the sequence number 1 calls
btrfs_put_tree_mod_seq();
3) Sequence number 1 is deleted from the list of sequence numbers;
4) The current minimum sequence number is computed to be the sequence
number 2;
5) A task using sequence number 2 is at tree_mod_log_rewind() and gets
a pointer to one of its elements from the red black tree through
a call to tree_mod_log_search();
6) The task with sequence number 1 iterates the red black tree of tree
modification elements and deletes (and frees) all elements with a
sequence number less then or equals to 2 (the computed minimum sequence
number) - it ends up only leaving elements with sequence numbers of 3
and 4;
7) The task with sequence number 2 now uses the pointer to its element,
already freed by the other task, at __tree_mod_log_rewind(), resulting
in a use-after-free issue. When CONFIG_DEBUG_PAGEALLOC=y it produces
a trace like the following:
Fix this by making btrfs_put_tree_mod_seq() skip deletion of elements that
have a sequence number equals to the computed minimum sequence number, and
not just elements with a sequence number greater then that minimum.
Fixes: bd989ba359f2ac ("Btrfs: add tree modification log functions") CC: stable@vger.kernel.org # 4.4+ Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Mon, 2 Dec 2019 11:01:03 +0000 (11:01 +0000)]
Btrfs: make tree checker detect checksum items with overlapping ranges
Having checksum items, either on the checksums tree or in a log tree, that
represent ranges that overlap each other is a sign of a corruption. Such
case confuses the checksum lookup code and can result in not being able to
find checksums or find stale checksums.
So add a check for such case.
This is motivated by a recent fix for a case where a log tree had checksum
items covering ranges that overlap each other due to extent cloning, and
resulted in missing checksums after replaying the log tree. It also helps
detect past issues such as stale and outdated checksums due to overlapping,
commit 27b9a8122ff71a ("Btrfs: fix csum tree corruption, duplicate and
outdated checksums").
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Thu, 5 Dec 2019 16:58:30 +0000 (16:58 +0000)]
Btrfs: fix missing data checksums after replaying a log tree
When logging a file that has shared extents (reflinked with other files or
with itself), we can end up logging multiple checksum items that cover
overlapping ranges. This confuses the search for checksums at log replay
time causing some checksums to never be added to the fs/subvolume tree.
Consider the following example of a file that shares the same extent at
offsets 0 and 256Kb:
[ bytenr 13631488, offset 64Kb, len 192Kb ]
64Kb 256Kb
[ bytenr 13893632, offset 0, len 256Kb ]
256Kb 512Kb
When logging the inode, at tree-log.c:copy_items(), when processing the
file extent item at offset 0, we log a checksum item covering the range 13959168 to 14024704, which corresponds to 13893632 + 64Kb and 13893632 +
64Kb + 64Kb, respectively.
Later when processing the extent item at offset 256K, we log the checksums
for the range from 13893632 to 14155776 (which corresponds to 13893632 +
256Kb). These checksums get merged with the checksum item for the range
from 13631488 to 13893632 (13631488 + 256Kb), logged by a previous fsync.
So after this we get the two following checksum items in the log tree:
(...)
item 6 key (EXTENT_CSUM EXTENT_CSUM 13631488) itemoff 3095 itemsize 512
range start 13631488 end 14155776 length 524288
item 7 key (EXTENT_CSUM EXTENT_CSUM 13959168) itemoff 3031 itemsize 64
range start 13959168 end 14024704 length 65536
The first one covers the range from the second one, they overlap.
So far this does not cause a problem after replaying the log, because
when replaying the file extent item for offset 256K, we copy all the
checksums for the extent 13893632 from the log tree to the fs/subvolume
tree, since searching for an checksum item for bytenr 13893632 leaves us
at the first checksum item, which covers the whole range of the extent.
However if we write 64Kb to file offset 256Kb for example, we will
not be able to find and copy the checksums for the last 128Kb of the
extent at bytenr 13893632, referenced by the file range 384Kb to 512Kb.
After writing 64Kb into file offset 256Kb we get the following extent
layout for our file:
[ bytenr 13631488, offset 64Kb, len 192Kb ]
64Kb 256Kb
[ bytenr 14155776, offset 0, len 64Kb ]
256Kb 320Kb
[ bytenr 13893632, offset 64Kb, len 192Kb ]
320Kb 512Kb
After fsync'ing the file, if we have a power failure and then mount
the filesystem to replay the log, the following happens:
1) When replaying the file extent item for file offset 320Kb, we
lookup for the checksums for the extent range from 13959168
(13893632 + 64Kb) to 14155776 (13893632 + 256Kb), through a call
to btrfs_lookup_csums_range();
2) btrfs_lookup_csums_range() finds the checksum item that starts
precisely at offset 13959168 (item 7 in the log tree, shown before);
3) However that checksum item only covers 64Kb of data, and not 192Kb
of data;
4) As a result only the checksums for the first 64Kb of data referenced
by the file extent item are found and copied to the fs/subvolume tree.
The remaining 128Kb of data, file range 384Kb to 512Kb, doesn't get
the corresponding data checksums found and copied to the fs/subvolume
tree.
5) After replaying the log userspace will not be able to read the file
range from 384Kb to 512Kb, because the checksums are missing and
resulting in an -EIO error.
The following steps reproduce this scenario:
$ mkfs.btrfs -f /dev/sdc
$ mount /dev/sdc /mnt/sdc
$ dmesg | tail
[165305.003464] BTRFS info (device sdc): no csum found for inode 257 start 401408
[165305.004014] BTRFS info (device sdc): no csum found for inode 257 start 405504
[165305.004559] BTRFS info (device sdc): no csum found for inode 257 start 409600
[165305.005101] BTRFS info (device sdc): no csum found for inode 257 start 413696
[165305.005627] BTRFS info (device sdc): no csum found for inode 257 start 417792
[165305.006134] BTRFS info (device sdc): no csum found for inode 257 start 421888
[165305.006625] BTRFS info (device sdc): no csum found for inode 257 start 425984
[165305.007278] BTRFS info (device sdc): no csum found for inode 257 start 430080
[165305.008248] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
[165305.009550] BTRFS warning (device sdc): csum failed root 5 ino 257 off 393216 csum 0x1337385e expected csum 0x00000000 mirror 1
Fix this simply by deleting first any checksums, from the log tree, for the
range of the extent we are logging at copy_items(). This ensures we do not
get checksum items in the log tree that have overlapping ranges.
This is a long time issue that has been present since we have the clone
(and deduplication) ioctl, and can happen both when an extent is shared
between different files and within the same file.
A test case for fstests follows soon.
CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Dan Carpenter [Tue, 3 Dec 2019 11:24:58 +0000 (14:24 +0300)]
btrfs: return error pointer from alloc_test_extent_buffer
Callers of alloc_test_extent_buffer have not correctly interpreted the
return value as error pointer, as alloc_test_extent_buffer should behave
as alloc_extent_buffer. The self-tests were unaffected but
btrfs_find_create_tree_block could call both functions and that would
cause problems up in the call chain.
Fixes: faa2dbf004e8 ("Btrfs: add sanity tests for new qgroup accounting code") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
David Sterba [Wed, 27 Nov 2019 15:10:54 +0000 (16:10 +0100)]
btrfs: fix devs_max constraints for raid1c3 and raid1c4
The value 0 for devs_max means to spread the allocated chunks over all
available devices, eg. stripe for RAID0 or RAID5. This got mistakenly
copied to the RAID1C3/4 profiles. The intention is to have exactly 3 and
4 copies respectively.
Fixes: 47e6f7423b91 ("btrfs: add support for 3-copy replication (raid1c3)") Fixes: 8d6fac0087e5 ("btrfs: add support for 4-copy replication (raid1c4)") Signed-off-by: David Sterba <dsterba@suse.com>
Andreas Färber [Fri, 8 Nov 2019 21:38:52 +0000 (22:38 +0100)]
btrfs: tree-checker: Fix error format string for size_t
Argument BTRFS_FILE_EXTENT_INLINE_DATA_START is defined as offsetof(),
which returns type size_t, so we need %zu instead of %lu.
This fixes a build warning on 32-bit ARM:
../fs/btrfs/tree-checker.c: In function 'check_extent_data_item':
../fs/btrfs/tree-checker.c:230:43: warning: format '%lu' expects argument of type 'long unsigned int', but argument 5 has type 'unsigned int' [-Wformat=]
230 | "invalid item size, have %u expect [%lu, %u)",
| ~~^
| long unsigned int
| %u
Fixes: 153a6d299956 ("btrfs: tree-checker: Check item size before reading file extent type") Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Andreas Färber <afaerber@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 19 Nov 2019 18:59:20 +0000 (13:59 -0500)]
btrfs: don't double lock the subvol_sem for rename exchange
If we're rename exchanging two subvols we'll try to lock this lock
twice, which is bad. Just lock once if either of the ino's are subvols.
Fixes: cdd1fedf8261 ("btrfs: add support for RENAME_EXCHANGE and RENAME_WHITEOUT") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 19 Nov 2019 18:59:00 +0000 (13:59 -0500)]
btrfs: handle error in btrfs_cache_block_group
We have a BUG_ON(ret < 0) in find_free_extent from
btrfs_cache_block_group. If we fail to allocate our ctl we'll just
panic, which is not good. Instead just go on to another block group.
If we fail to find a block group we don't want to return ENOSPC, because
really we got a ENOMEM and that's the root of the problem. Save our
return from btrfs_cache_block_group(), and then if we still fail to make
our allocation return that ret so we get the right error back.
Tested with inject-error.py from bcc.
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Josef Bacik [Tue, 19 Nov 2019 18:59:35 +0000 (13:59 -0500)]
btrfs: do not call synchronize_srcu() in inode_tree_del
Testing with the new fsstress uncovered a pretty nasty deadlock with
lookup and snapshot deletion.
Process A
unlink
-> final iput
-> inode_tree_del
-> synchronize_srcu(subvol_srcu)
Process B
btrfs_lookup <- srcu_read_lock() acquired here
-> btrfs_iget
-> find inode that has I_FREEING set
-> __wait_on_freeing_inode()
We're holding the srcu_read_lock() while doing the iget in order to make
sure our fs root doesn't go away, and then we are waiting for the inode
to finish freeing. However because the free'ing process is doing a
synchronize_srcu() we deadlock.
Fix this by dropping the synchronize_srcu() in inode_tree_del(). We
don't need people to stop accessing the fs root at this point, we're
only adding our empty root to the dead roots list.
A larger much more invasive fix is forthcoming to address how we deal
with fs roots, but this fixes the immediate problem.
Fixes: 76dda93c6ae2 ("Btrfs: add snapshot/subvolume destroy ioctl") CC: stable@vger.kernel.org # 4.4+ Signed-off-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Since commit 0e4a459f56c32d3e ("tracing: Remove unnecessary DEBUG_FS
dependency"), CONFIG_DEBUG_FS is no longer auto-enabled. This breaks
booting Debian 9, as systemd needs debugfs:
[FAILED] Failed to mount /sys/kernel/debug.
See 'systemctl status sys-kernel-debug.mount' for details.
[DEPEND] Dependency failed for Local File Systems.
...
You are in emergGive root password for maintenance
(or press Control-D to continue):
Fix this by enabling CONFIG_DEBUG_FS explicitly.
See also commit 18977008f44c66bd ("ARM: multi_v7_defconfig: Restore
debugfs support").
Filipe Manana [Thu, 5 Dec 2019 16:57:39 +0000 (16:57 +0000)]
Btrfs: fix cloning range with a hole when using the NO_HOLES feature
When using the NO_HOLES feature if we clone a range that contains a hole
and a temporary ENOSPC happens while dropping extents from the target
inode's range, we can end up failing and aborting the transaction with
-EEXIST or with a corrupt file extent item, that has a length greater
than it should and overlaps with other extents. For example when cloning
the following range from inode A to inode B:
Target range: [1MB, 6MB) (same as source, to make it easier to explain)
The following can happen:
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 2MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 2MB - 1MB =
1MB;
4) We then attempt to insert a file extent item at inode B with a file
offset of 5MB, which is the value of clone_info->file_offset. This
fails with error -EEXIST because there's already an extent at that
offset (extent B4);
5) We abort the current transaction with -EEXIST and return that error
to user space as well.
Target range: [1MB, 12MB) (same as source, to make it easier to explain)
1) btrfs_punch_hole_range() gets -ENOSPC from __btrfs_drop_extents();
2) At that point, 'cur_offset' is set to 1MB and __btrfs_drop_extents()
set 'drop_end' to 5MB, meaning it was able to drop only extent B2;
3) We then compute 'clone_len' as 'drop_end' - 'cur_offset' = 5MB - 1MB =
4MB;
4) We then insert a file extent item at inode B with a file offset of 11MB
which is the value of clone_info->file_offset, and a length of 4MB (the
value of 'clone_len'). So we get 2 extents items with ranges that
overlap and an extent length of 4MB, larger then the extent A2 from
inode A (1MB length);
5) After that we end the transaction, balance the btree dirty pages and
then start another or join the previous transaction. It might happen
that the transaction which inserted the incorrect extent was committed
by another task so we end up with extent corruption if a power failure
happens.
So fix this by making sure we attempt to insert the extent to clone at
the destination inode only if we are past dropping the sub-range that
corresponds to a hole.
Fixes: 690a5dbfc51315 ("Btrfs: fix ENOSPC errors, leading to transaction aborts, when cloning extents") Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>