Omar Sandoval [Thu, 3 Nov 2016 17:28:12 +0000 (10:28 -0700)]
Btrfs: prevent ioctls from interfering with a swap file
A later patch will implement swap file support for Btrfs, but before we
do that, we need to make sure that the various Btrfs ioctls cannot
change a swap file.
When a swap file is active, we must make sure that the extents of the
file are not moved and that they don't become shared. That means that
the following are not safe:
Don't allow those to happen on an active swap file.
Additionally, balance, resize, device remove, and device replace are
also unsafe if they affect an active swapfile. Add a red-black tree of
block groups and devices which contain an active swapfile. Relocation
checks each block group against this tree and skips it or errors out for
balance or resize, respectively. Device remove and device replace check
the tree for the device they will operate on.
Note that we don't have to worry about chattr -C (disable nocow), which
we ignore for non-empty files, because an active swapfile must be
non-empty and can't be truncated. We also don't have to worry about
autodefrag because it's only done on COW files. Truncate and fallocate
are already taken care of by the generic code. Device add doesn't do
relocation so it's not an issue, either.
Signed-off-by: Omar Sandoval <osandov@fb.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This is the counterpart to merge_extent_hook, similarly, it's used only
for data/freespace inodes so let's remove it, rename it and call it
directly where necessary. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This callback is used only for data and free space inodes. Such inodes
are guaranteed to have their extent_io_tree::private_data set to the
inode struct. Exploit this fact to directly call the function. Also give
it a more descriptive name. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This is the counterpart to ex-set_bit_hook (now btrfs_set_delalloc_extent),
similar to what was done before remove clear_bit_hook and rename the
function. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This callback is used to properly account delalloc extents for data
inodes (ordinary file inodes and freespace v1 inodes). Those can be
easily identified since they have their extent_io trees ->private_data
member point to the inode. Let's exploit this fact to remove the
needless indirection through extent_io_hooks and directly call the
function. Also give the function a name which reflects its purpose -
btrfs_set_delalloc_extent.
This patch also modified test_find_delalloc so that the extent_io_tree
used for testing doesn't have its ->private_data set which would have
caused a crash in btrfs_set_delalloc_extent due to the btrfs_inode->root
member not being initialised. The old version of the code also didn't
call set_bit_hook since the extent_io ops weren't set for the inode. No
functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This callback was only used in debug builds by btrfs_leak_debug_check.
A better approach is to move its implementation in
btrfs_leak_debug_check and ensure the latter is only executed for extent
tree which have ->private_data set i.e. relate to a data node and not
the btree one. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
This callback is ony ever called for data page writeout so there is no
need to actually abstract it via extent_io_ops. Lets just export it,
remove the definition of the callback and call it directly in the
functions that invoke the callback. Also rename the function to
btrfs_writepage_endio_finish_ordered since what it really does is
account finished io in the ordered extent data structures. No
functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Thu, 1 Nov 2018 12:09:47 +0000 (14:09 +0200)]
btrfs: Remove extent_io_ops::writepage_start_hook
This hook is called only from __extent_writepage_io which is already
called only from the data page writeout path. So there is no need to
make an indirect call via extent_io_ops. This patch just removes the
callback definition, exports the callback function and calls it directly
at the only call site. Also give the function a more descriptive name.
No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Nikolay Borisov [Thu, 1 Nov 2018 12:09:46 +0000 (14:09 +0200)]
btrfs: Remove extent_io_ops::fill_delalloc
This callback is called only from writepage_delalloc which in turn is
guaranteed to be called from the data page writeout path. In the end
there is no reason to have the call to this function to be indrected via
the extent_io_ops structure. This patch removes the callback definition,
exports the function and calls it directly. No functional changes.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Nikolay Borisov <nborisov@suse.com> Reviewed-by: David Sterba <dsterba@suse.com>
[ rename to btrfs_run_delalloc_range ] Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 5 Oct 2018 09:45:54 +0000 (17:45 +0800)]
btrfs: volumes: Make sure there is no overlap of dev extents at mount time
Enhance btrfs_verify_dev_extents() to remember previous checked dev
extents, so it can verify no dev extents can overlap.
Analysis from Hans:
"Imagine allocating a DATA|DUP chunk.
In the chunk allocator, we first set...
max_stripe_size = SZ_1G;
max_chunk_size = BTRFS_MAX_DATA_CHUNK_SIZE
... which is 10GiB.
Then...
/* we don't want a chunk larger than 10% of writeable space */
max_chunk_size = min(div_factor(fs_devices->total_rw_bytes, 1),
max_chunk_size);
Imagine we only have one 7880MiB block device in this filesystem. Now
max_chunk_size is down to 788MiB.
The next step in the code is to search for max_stripe_size * dev_stripes
amount of free space on the device, which is in our example 1GiB * 2 =
2GiB. Imagine the device has exactly 1578MiB free in one contiguous
piece. This amount of bytes will be put in devices_info[ndevs - 1].max_avail
Next we recalculate the stripe_size (which is actually the device extent
length), based on the actual maximum amount of available raw disk space:
stripe_size = div_u64(devices_info[ndevs - 1].max_avail, dev_stripes);
stripe_size is now 789MiB
Next we do...
data_stripes = num_stripes / ncopies
...where data_stripes ends up as 1, because num_stripes is 2 (the amount
of device extents we're going to have), and DUP has ncopies 2.
Next there's a check...
if (stripe_size * data_stripes > max_chunk_size)
...which matches because 789MiB * 1 > 788MiB.
We go into the if code, and next is...
stripe_size = div_u64(max_chunk_size, data_stripes);
...which resets stripe_size to max_chunk_size: 788MiB
Next is a fun one...
/* bump the answer up to a 16MB boundary */
stripe_size = round_up(stripe_size, SZ_16M);
...which changes stripe_size from 788MiB to 800MiB.
We're not done changing stripe_size yet...
/* But don't go higher than the limits we found while searching
* for free extents
*/
stripe_size = min(devices_info[ndevs - 1].max_avail,
stripe_size);
This is bad. max_avail is twice the stripe_size (we need to fit 2 device
extents on the same device for DUP).
The result here is that 800MiB < 1578MiB, so it's unchanged. However,
the resulting DUP chunk will need 1600MiB disk space, which isn't there,
and the second dev_extent might extend into the next thing (next
dev_extent? end of device?) for 22MiB.
The last shown line of code relies on a situation where there's twice
the value of stripe_size present as value for the variable stripe_size
when it's DUP. This was actually the case before commit 92e222df7b
"btrfs: alloc_chunk: fix DUP stripe size handling", from which I quote:
"[...] in the meantime there's a check to see if the stripe_size does
not exceed max_chunk_size. Since during this check stripe_size is twice
the amount as intended, the check will reduce the stripe_size to
max_chunk_size if the actual correct to be used stripe_size is more than
half the amount of max_chunk_size."
In the previous version of the code, the 16MiB alignment (why is this
done, by the way?) would result in a 50% chance that it would actually
do an 8MiB alignment for the individual dev_extents, since it was
operating on double the size. Does this matter?
Does it matter that stripe_size can be set to anything which is not
16MiB aligned because of the amount of remaining available disk space
which is just taken?
What is the main purpose of this round_up?
The most straightforward thing to do seems something like...
stripe_size = min(
div_u64(devices_info[ndevs - 1].max_avail, dev_stripes),
stripe_size
)
..just putting half of the max_avail into stripe_size."
Qu Wenruo [Fri, 2 Nov 2018 01:39:50 +0000 (09:39 +0800)]
btrfs: Refactor find_free_extent loops update into find_free_extent_update_loop
We have a complex loop design for find_free_extent(), that has different
behavior for each loop, some even includes new chunk allocation.
Instead of putting such a long code into find_free_extent() and makes it
harder to read, just extract them into find_free_extent_update_loop().
With all the cleanups, the main find_free_extent() should be pretty
barebone:
find_free_extent()
|- Iterate through all block groups
| |- Get a valid block group
| |- Try to do clustered allocation in that block group
| |- Try to do unclustered allocation in that block group
| |- Check if the result is valid
| | |- If valid, then exit
| |- Jump to next block group
|
|- Push harder to find free extents
|- If not found, re-iterate all block groups
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com>
[ copy callchain from changelog to function comment ] Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 2 Nov 2018 01:39:49 +0000 (09:39 +0800)]
btrfs: Refactor unclustered extent allocation into find_free_extent_unclustered()
This patch will extract unclsutered extent allocation code into
find_free_extent_unclustered().
And this helper function will use return value to indicate what to do
next.
This should make find_free_extent() a little easier to read.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com>
[Update merge conflict with fb5c39d7a887 ("btrfs: don't use ctl->free_space for max_extent_size")] Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Qu Wenruo [Fri, 2 Nov 2018 01:39:47 +0000 (09:39 +0800)]
btrfs: Introduce find_free_extent_ctl structure for later rework
Instead of tons of different local variables in find_free_extent(),
extract them into find_free_extent_ctl structure, and add better
explanation for them.
Some modification may looks redundant, but will later greatly simplify
function parameter list during find_free_extent() refactor.
Also add two comments to co-operate with fb5c39d7a887 ("btrfs: don't use
ctl->free_space for max_extent_size"), to make ffe_ctl->max_extent_size
update more reader-friendly.
Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Su Yue <suy.fnst@cn.fujitsu.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Introduce a new wrapper update_bytes_pinned to replace open coded
bytes_pinned modifiers. Now the underflows of space_info::bytes_pinned
get detected and reported.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
Although we have space_info::bytes_may_use underflow detection in
btrfs_free_reserved_data_space_noquota(), we have more callers who are
subtracting number from space_info::bytes_may_use.
So instead of doing underflow detection for every caller, introduce a
new wrapper update_bytes_may_use() to replace open coded bytes_may_use
modifiers.
This also introduce a macro to declare more wrappers, but currently
space_info::bytes_may_use is the mostly interesting one.
Reviewed-by: Josef Bacik <josef@toxicpanda.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Lu Fengqi <lufq.fnst@cn.fujitsu.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 26 Oct 2018 16:15:21 +0000 (17:15 +0100)]
Btrfs: remove no longer used stuff for tracking pending ordered extents
Tracking pending ordered extents per transaction was introduced in commit 50d9aa99bd35 ("Btrfs: make sure logged extents complete in the current
transaction V3") and later updated in commit 161c3549b45a ("Btrfs: change
how we wait for pending ordered extents").
However now that on fsync we always wait for ordered extents to complete
before logging, done in commit 5636cf7d6dc8 ("btrfs: remove the logged
extents infrastructure"), we no longer need the stuff to track for pending
ordered extents, which was not completely removed in the mentioned commit.
So remove the remaining of the pending ordered extents infrastructure.
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Filipe Manana [Fri, 26 Oct 2018 20:26:40 +0000 (21:26 +0100)]
Btrfs: remove no longer used logged range variables when logging extents
The logged_start and logged_end variables, at btrfs_log_changed_extents,
were added in commit 8c6c592831a0 ("btrfs: log csums for all modified
extents"). However since the recent simplification for fsync, which makes
us wait for all ordered extents to complete before logging extents, we
no longer need those variables. Commit a2120a473a80 ("btrfs: clean up the
left over logged_list usage") forgot to remove them.
Reviewed-by: Nikolay Borisov <nborisov@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
Linus Torvalds [Fri, 14 Dec 2018 23:35:30 +0000 (15:35 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"11 fixes"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
scripts/spdxcheck.py: always open files in binary mode
checkstack.pl: fix for aarch64
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
fs/iomap.c: get/put the page in iomap_page_create/release()
hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page()
memblock: annotate memblock_is_reserved() with __init_memblock
psi: fix reference to kernel commandline enable
arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h
mm/sparse: add common helper to mark all memblocks present
mm: introduce common STRUCT_PAGE_MAX_SHIFT define
alpha: fix hang caused by the bootmem removal
Thierry Reding [Fri, 14 Dec 2018 22:17:24 +0000 (14:17 -0800)]
scripts/spdxcheck.py: always open files in binary mode
The spdxcheck script currently falls over when confronted with a binary
file (such as Documentation/logo.gif). To avoid that, always open files
in binary mode and decode line-by-line, ignoring encoding errors.
One tricky case is when piping data into the script and reading it from
standard input. By default, standard input will be opened in text mode,
so we need to reopen it in binary mode.
The breakage only happens with python3 and results in a
UnicodeDecodeError (according to Uwe).
Link: http://lkml.kernel.org/r/20181212131210.28024-1-thierry.reding@gmail.com Fixes: 6f4d29df66ac ("scripts/spdxcheck.py: make python3 compliant") Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Jeremy Cline <jcline@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Joe Perches <joe@perches.com> Cc: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Right now, checkstack.pl isn't able to print anything on aarch64,
because it won't be able to match the stating objdump line of a function
due to this missing space. Hence, it displays every stack as zero-size.
After this patch, checkpatch.pl is able to match the start of a
function's objdump, and is then able to calculate each function's stack
correctly.
Andrea Arcangeli [Fri, 14 Dec 2018 22:17:17 +0000 (14:17 -0800)]
userfaultfd: check VM_MAYWRITE was set after verifying the uffd is registered
Calling UFFDIO_UNREGISTER on virtual ranges not yet registered in uffd
could trigger an harmless false positive WARN_ON. Check the vma is
already registered before checking VM_MAYWRITE to shut off the false
positive warning.
Link: http://lkml.kernel.org/r/20181206212028.18726-2-aarcange@redhat.com Cc: <stable@vger.kernel.org> Fixes: 29ec90660d68 ("userfaultfd: shmem/hugetlbfs: only allow to register VM_MAYWRITE vmas") Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> Reported-by: syzbot+06c7092e7d71218a2c16@syzkaller.appspotmail.com Acked-by: Mike Rapoport <rppt@linux.ibm.com> Acked-by: Hugh Dickins <hughd@google.com> Acked-by: Peter Xu <peterx@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
fs/iomap.c: get/put the page in iomap_page_create/release()
migrate_page_move_mapping() expects pages with private data set to have
a page_count elevated by 1. This is what used to happen for xfs through
the buffer_heads code before the switch to iomap in commit 82cb14175e7d
("xfs: add support for sub-pagesize writeback without buffer_heads").
Not having the count elevated causes move_pages() to fail on memory
mapped files coming from xfs.
Make iomap compatible with the migrate_page_move_mapping() assumption by
elevating the page count as part of iomap_page_create() and lowering it
in iomap_page_release().
It causes the move_pages() syscall to misbehave on memory mapped files
from xfs. It does not not move any pages, which I suppose is "just" a
perf issue, but it also ends up returning a positive number which is out
of spec for the syscall. Talking to Michal Hocko, it sounds like
returning positive numbers might be a necessary update to move_pages()
anyway though
(https://lkml.kernel.org/r/20181116114955.GJ14706@dhcp22.suse.cz).
I only hit this in tests that verify that move_pages() actually moved
the pages. The test also got confused by the positive return from
move_pages() (it got treated as a success as positive numbers were not
expected and not handled) making it a bit harder to track down what's
going on.
Link: http://lkml.kernel.org/r/20181115184140.1388751-1-pjaroszynski@nvidia.com Fixes: 82cb14175e7d ("xfs: add support for sub-pagesize writeback without buffer_heads") Signed-off-by: Piotr Jaroszynski <pjaroszynski@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Darrick J. Wong <darrick.wong@oracle.com> Cc: Brian Foster <bfoster@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yongkai Wu [Fri, 14 Dec 2018 22:17:10 +0000 (14:17 -0800)]
hugetlbfs: call VM_BUG_ON_PAGE earlier in free_huge_page()
A stack trace was triggered by VM_BUG_ON_PAGE(page_mapcount(page), page)
in free_huge_page(). Unfortunately, the page->mapping field was set to
NULL before this test. This made it more difficult to determine the
root cause of the problem.
Move the VM_BUG_ON_PAGE tests earlier in the function so that if they do
trigger more information is present in the page struct.
Link: http://lkml.kernel.org/r/1543491843-23438-1-git-send-email-nic_w@163.com Signed-off-by: Yongkai Wu <nic_w@163.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yueyi Li [Fri, 14 Dec 2018 22:17:06 +0000 (14:17 -0800)]
memblock: annotate memblock_is_reserved() with __init_memblock
Found warning:
WARNING: EXPORT symbol "gsi_write_channel_scratch" [vmlinux] version generation failed, symbol will not be versioned.
WARNING: vmlinux.o(.text+0x1e0a0): Section mismatch in reference from the function valid_phys_addr_range() to the function .init.text:memblock_is_reserved()
The function valid_phys_addr_range() references
the function __init memblock_is_reserved().
This is often because valid_phys_addr_range lacks a __init
annotation or the annotation of memblock_is_reserved is wrong.
Baruch Siach [Fri, 14 Dec 2018 22:17:03 +0000 (14:17 -0800)]
psi: fix reference to kernel commandline enable
The kernel commandline parameter named in CONFIG_PSI_DEFAULT_DISABLED
help text contradicts the documentation in kernel-parameters.txt, and
the code. Fix that.
Link: http://lkml.kernel.org/r/20181203213416.GA12627@cmpxchg.org Fixes: e0c274472d ("psi: make disabling/enabling easier for vendor kernels") Signed-off-by: Baruch Siach <baruch@tkos.co.il> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mark Brown [Fri, 14 Dec 2018 22:17:00 +0000 (14:17 -0800)]
arch/sh/include/asm/io.h: provide prototypes for PCI I/O mapping in asm/io.h
Most architectures provide prototypes for the PCI I/O mapping operations
when asm/io.h is included but SH doesn't currently do that, leading to
for example warnings in sound/pci/hda/patch_ca0132.c when pci_iomap() is
used on current -next. Make SH more consistent with other architectures
by including asm-generic/pci_iomap.h in asm/io.h.
Link: http://lkml.kernel.org/r/20181106175142.27988-1-broonie@kernel.org Signed-off-by: Mark Brown <broonie@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Cc: Rich Felker <dalias@libc.org> Cc: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Logan Gunthorpe [Fri, 14 Dec 2018 22:16:57 +0000 (14:16 -0800)]
mm/sparse: add common helper to mark all memblocks present
Presently the arches arm64, arm and sh have a function which loops
through each memblock and calls memory present. riscv will require a
similar function.
Introduce a common memblocks_present() function that can be used by all
the arches. Subsequent patches will cleanup the arches that make use of
this.
Link: http://lkml.kernel.org/r/20181107205433.3875-3-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Oscar Salvador <osalvador@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Logan Gunthorpe [Fri, 14 Dec 2018 22:16:53 +0000 (14:16 -0800)]
mm: introduce common STRUCT_PAGE_MAX_SHIFT define
This define is used by arm64 to calculate the size of the vmemmap
region. It is defined as the log2 of the upper bound on the size of a
struct page.
We move it into mm_types.h so it can be defined properly instead of set
and checked with a build bug. This also allows us to use the same
define for riscv.
Link: http://lkml.kernel.org/r/20181107205433.3875-2-logang@deltatee.com Signed-off-by: Logan Gunthorpe <logang@deltatee.com> Acked-by: Will Deacon <will.deacon@arm.com> Acked-by: Andrew Morton <akpm@linux-foundation.org> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Mike Rapoport [Fri, 14 Dec 2018 22:16:50 +0000 (14:16 -0800)]
alpha: fix hang caused by the bootmem removal
The conversion of alpha to memblock as the early memory manager caused
boot to hang as described at [1].
The issue is caused because for CONFIG_DISCTONTIGMEM=y case,
memblock_add() is called using memory start PFN that had been rounded
down to the nearest 8Mb and it caused memblock to see more memory that
is actually present in the system.
Besides, memblock allocates memory from high addresses while bootmem was
using low memory, which broke the assumption that early allocations are
always accessible by the hardware.
This patch ensures that memblock_add() is using the correct PFN for the
memory start and forces memblock to use bottom-up allocations.
[1] https://lkml.org/lkml/2018/11/22/1032
Link: http://lkml.kernel.org/r/1543233216-25833-1-git-send-email-rppt@linux.ibm.com Reported-by: Meelis Roos <mroos@linux.ee> Signed-off-by: Mike Rapoport <rppt@linux.ibm.com> Tested-by: Meelis Roos <mroos@linux.ee> Cc: Richard Henderson <rth@twiddle.net> Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru> Cc: Matt Turner <mattst88@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 14 Dec 2018 20:18:30 +0000 (12:18 -0800)]
Merge tag 'for-linus-20181214' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three small fixes for this week. contains:
- spectre indexing fix for aio (Jeff)
- fix for the previous zeroing bio fix, we don't need it for user
mapped pages, and in fact it breaks some applications if we do
(Keith)
- allocation failure fix for null_blk with zoned (Shin'ichiro)"
* tag 'for-linus-20181214' of git://git.kernel.dk/linux-block:
block: Fix null_blk_zoned creation failure with small number of zones
aio: fix spectre gadget in lookup_ioctx
block/bio: Do not zero user pages
Linus Torvalds [Fri, 14 Dec 2018 17:33:34 +0000 (09:33 -0800)]
Merge tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"One notable fix for our change to split pt_regs between user/kernel,
we forgot to update BPF to use the user-visible type which was an ABI
break for BPF programs.
A slightly ugly but minimal fix to do_syscall_trace_enter() so that we
use tracehook_report_syscall_entry() properly. We'll rework the code
in next to avoid the empty if body.
Seven commits fixing bugs in the new papr_scm (Storage Class Memory)
driver. The driver was finally able to be tested on the other
hypervisor which exposed several bugs. The fixes are all fairly
minimal at least.
Fix a crash in our MSI code if an MSI-capable device is plugged into a
non-MSI capable PHB, only seen on older hardware (MPC8378).
Fix our legacy serial code to look for "stdout-path" since the device
trees were updated to use that instead of "linux,stdout-path".
A change to the COFF zImage code to fix booting old powermacs.
A couple of minor build fixes.
Thanks to: Benjamin Herrenschmidt, Daniel Axtens, Dmitry V. Levin,
Elvira Khabirova, Oliver O'Halloran, Paul Mackerras, Radu Rendec, Rob
Herring, Sandipan Das"
* tag 'powerpc-4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/ptrace: replace ptrace_report_syscall() with a tracehook call
powerpc/mm: Fallback to RAM if the altmap is unusable
powerpc/papr_scm: Use ibm,unit-guid as the iset cookie
powerpc/papr_scm: Fix DIMM device registration race
powerpc/papr_scm: Remove endian conversions
powerpc/papr_scm: Update DT properties
powerpc/papr_scm: Fix resource end address
powerpc/papr_scm: Use depend instead of select
powerpc/bpf: Fix broken uapi for BPF_PROG_TYPE_PERF_EVENT
powerpc/boot: Fix build failures with -j 1
powerpc: Look for "stdout-path" when setting up legacy consoles
powerpc/msi: Fix NULL pointer access in teardown code
powerpc/mm: Fix linux page tables build with some configs
powerpc: Fix COFF zImage booting on old powermacs
Linus Torvalds [Fri, 14 Dec 2018 17:12:02 +0000 (09:12 -0800)]
Merge tag 'drm-fixes-2018-12-14' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"While I hoped things would calm down, the world hasn't joined with me,
but it's a few things scattered over a wide area. The i915 workarounds
regression fix is probably the largest, the rest are more usual sized.
We also get some new AMD PCI IDs.
There is also a patch in here to MAINTAINERS to added Daniel as an
official DRM toplevel co-maintainer, he's decided he wants to step up
and share the glory, and he'll likely process next weeks fixes while
I'm away on holidays.
Summary:
amdgpu:
- some new PCI IDs
- fixed firmware image updates
- power management fixes
- locking warning fix
nouveau:
- framebuffer flushing fix
- memory leak fix
- tegra device init regression fix
vmwgfx:
- OOM kernel memory fix
- excess return in function fix
i915:
- the biggest fix is a regression fix where workarounds weren't
getting reapplied after a gpu hang causing further crashing, this
fixes the workaround application to make it happen again
- GPU hang fixes for Braswell and some GEN3 GPUs
- GVT fix for broadwell tiling
rockchip:
- revert to fix a regression causing a WARN on shutdown
mediatek:
- avoid crash attaching to non-existant bridges"
* tag 'drm-fixes-2018-12-14' of git://anongit.freedesktop.org/drm/drm: (23 commits)
drm/vmwgfx: Protect from excessive execbuf kernel memory allocations v3
MAINTAINERS: Daniel for drm co-maintainer
drm/amdgpu: drop fclk/gfxclk ratio setting
drm/vmwgfx: remove redundant return ret statement
drm/i915: Flush GPU relocs harder for gen3
drm/i915: Allocate a common scratch page
drm/i915/execlists: Apply a full mb before execution for Braswell
drm/nouveau/kms: Fix memory leak in nv50_mstm_del()
drm/nouveau/kms/nv50-: also flush fb writes when rewinding push buffer
drm/amdgpu: Fix DEBUG_LOCKS_WARN_ON(depth <= 0) in amdgpu_ctx.lock
Revert "drm/rockchip: Allow driver to be shutdown on reboot/kexec"
drm/nouveau/drm/nouveau: tegra: Call nouveau_drm_device_init()
drm/amdgpu/powerplay: Apply avfs cks-off voltages on VI
drm/amdgpu: update SMC firmware image for polaris10 variants
drm/amdkfd: add new vega20 pci id
drm/amdkfd: add new vega10 pci ids
drm/amdgpu: add some additional vega20 pci ids
drm/amdgpu: add some additional vega10 pci ids
drm/amdgpu: update smu firmware images for VI variants (v2)
drm/i915: Introduce per-engine workarounds
...
Linus Torvalds [Fri, 14 Dec 2018 00:35:58 +0000 (16:35 -0800)]
Merge tag 'xarray-4.20-rc7' of git://git.infradead.org/users/willy/linux-dax
Pull XArray fixes from Matthew Wilcox:
"Two bugfixes, each with test-suite updates, two improvements to the
test-suite without associated bugs, and one patch adding a missing
API"
* tag 'xarray-4.20-rc7' of git://git.infradead.org/users/willy/linux-dax:
XArray: Fix xa_alloc when id exceeds max
XArray tests: Check iterating over multiorder entries
XArray tests: Handle larger indices more elegantly
XArray: Add xa_cmpxchg_irq and xa_cmpxchg_bh
radix tree: Don't return retry entries from lookup
Linus Torvalds [Thu, 13 Dec 2018 20:57:21 +0000 (12:57 -0800)]
Merge tag 'linux-kselftest-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull Kselftest fix from Shuah Khan:
"A single fix for a seccomp test from Kees Cook."
* tag 'linux-kselftest-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests/seccomp: Remove SIGSTOP si_pid check
Matthew Wilcox [Thu, 13 Dec 2018 18:57:42 +0000 (13:57 -0500)]
XArray: Fix xa_alloc when id exceeds max
Specifying a starting ID greater than the maximum ID isn't something
attempted very often, but it should fail. It was succeeding due to
xas_find_marked() returning the wrong error state, so add tests for
both xa_alloc() and xas_find_marked().
Fixes: b803b42823d0 ("xarray: Add XArray iterators") Signed-off-by: Matthew Wilcox <willy@infradead.org>
Linus Torvalds [Thu, 13 Dec 2018 19:02:23 +0000 (11:02 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
"We have 5 small fixes for this pull request. One is a performance
regression, so not necessarily strictly a fix, but it was small and
reasonable and claimed to avoid thrashing in the scheduler, so I took
it. The remaining are all legitimate fixes that match the "we take
fixes any time" criteria.
Summary:
- One performance regression for hfi1
- One kasan fix for hfi1
- A couple mlx5 fixes
- A core oops fix"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/core: Fix oops in netdev_next_upper_dev_rcu()
IB/mlx5: Block DEVX umem from the non applicable cases
IB/mlx5: Fix implicit ODP interrupted page fault
IB/hfi1: Fix an out-of-bounds access in get_hw_stats
IB/hfi1: Fix a latency issue for small messages
Linus Torvalds [Thu, 13 Dec 2018 18:59:02 +0000 (10:59 -0800)]
Merge tag 'mmc-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull mmc fixes from Ulf Hansson:
"MMC core:
- Fixup RPMB requests to use mrq->sbc when sending CMD23
MMC host:
- omap: Fix broken MMC/SD on OMAP15XX/OMAP5910/OMAP310
- sdhci-omap: Fix DCRC error handling during tuning
- sdhci: Fixup the timeout check window for clock and reset"
* tag 'mmc-v4.20-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci: fix the timeout check window for clock and reset
mmc: sdhci-omap: Fix DCRC error handling during tuning
MMC: OMAP: fix broken MMC on OMAP15XX/OMAP5910/OMAP310
mmc: core: use mrq->sbc when sending CMD23 for RPMB
Linus Torvalds [Thu, 13 Dec 2018 18:54:13 +0000 (10:54 -0800)]
Merge tag 'sound-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Only usual suspects here: a few more fixups for Realtek HD-audio on
various PCs, including a regression fix in the previous fix for Lenovo
X1 Carbon, as well as a typo fix in the recent Fireface patch"
* tag 'sound-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda/realtek: Enable audio jacks of ASUS UX433FN/UX333FA with ALC294
ALSA: hda/realtek: Enable audio jacks of ASUS UX533FD with ALC294
ALSA: hda/realtek: ALC294 mic and headset-mode fixups for ASUS X542UN
ALSA: fireface: fix reference to wrong register for clock configuration
ALSA: hda/realtek - Fix the mute LED regresion on Lenovo X1 Carbon
ALSA: hda/realtek - Fixed headphone issue for ALC700
Thomas Hellstrom [Wed, 12 Dec 2018 10:52:08 +0000 (11:52 +0100)]
drm/vmwgfx: Protect from excessive execbuf kernel memory allocations v3
With the new validation code, a malicious user-space app could
potentially submit command streams with enough buffer-object and resource
references in them to have the resulting allocated validion nodes and
relocations make the kernel run out of GFP_KERNEL memory.
Protect from this by having the validation code reserve TTM graphics
memory when allocating.
Linus Torvalds [Thu, 13 Dec 2018 02:29:24 +0000 (18:29 -0800)]
Merge tag 'for-4.20/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix DM cache metadata to verify that a cache has block before trying
to continue with operation that requires them.
- Fix bio-based DM core's dm_make_request() to properly impose device
limits on individual bios by making use of blk_queue_split().
- Fix long-standing race with how DM thinp notified userspace of
thin-pool mode state changes before they were actually made.
- Fix the zoned target's bio completion handling; this is a fairly
invassive fix at this stage but it is localized to the zoned target.
Any zoned target users will benefit from this fix.
* tag 'for-4.20/dm-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm thin: bump target version
dm thin: send event about thin-pool state change _after_ making it
dm zoned: Fix target BIO completion handling
dm: call blk_queue_split() to impose device limits on bios
dm cache metadata: verify cache has blocks in blocks_are_clean_separate_dirty()
Linus Torvalds [Thu, 13 Dec 2018 02:24:32 +0000 (18:24 -0800)]
Merge tag 'media/v4.20-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- one regression at vsp1 driver
- some last time changes for the upcoming request API logic and for
stateless codec support. As the stateless codec "cedrus" driver is at
staging, don't apply the MPEG controls as part of the main V4L2 API,
as those may not be ready for production yet.
* tag 'media/v4.20-5' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: Add a Kconfig option for the Request API
media: extended-controls.rst: add note to the MPEG2 state controls
media: mpeg2-ctrls.h: move MPEG2 state controls to non-public header
media: vicodec: set state resolution from raw format
media: vivid: drop v4l2_ctrl_request_complete() from start_streaming
media: vb2: don't unbind/put the object when going to state QUEUED
media: vb2: keep a reference to the request until dqbuf
media: vb2: skip request checks for VIDIOC_PREPARE_BUF
media: vb2: don't call __vb2_queue_cancel if vb2_start_streaming failed
media: cedrus: Fix a NULL vs IS_ERR() check
media: vsp1: Fix LIF buffer thresholds
Linus Torvalds [Thu, 13 Dec 2018 02:19:44 +0000 (18:19 -0800)]
Merge tag 'ovl-fixes-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs
Pull overlayfs fixes from Miklos Szeredi:
"Needed to revert a patch, because it possibly introduces a security
hole. Since the patch is basically a conceptual cleanup, not a bug
fix, it's safe to revert. I'm not giving up on this, and discussions
seemed to have reached an agreement over how to move forward, but that
can wait 'till the next release.
The other two patches are fixes for bugs introduced in recent
releases"
* tag 'ovl-fixes-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/vfs:
Revert "ovl: relax permission checking on underlying layers"
ovl: fix decode of dir file handle with multi lower layers
ovl: fix missing override creds in link of a metacopy upper
Linus Torvalds [Thu, 13 Dec 2018 02:17:35 +0000 (18:17 -0800)]
Merge tag 'fuse-fixes-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse
Pull fuse fixes from Miklos Szeredi:
"There's one patch fixing a minor but long lived bug, the others are
fixing regressions introduced in this cycle"
* tag 'fuse-fixes-4.20-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS
fuse: Fix memory leak in fuse_dev_free()
fuse: fix revalidation of attributes for permission check
fuse: fix fsync on directory
fuse: Add bad inode check in fuse_destroy_inode()
Linus Torvalds [Thu, 13 Dec 2018 02:15:29 +0000 (18:15 -0800)]
Merge tag 'trace-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"While running various ftrace tests on new development code, the
kmemleak detector found some allocations that were not freed
correctly.
This fixes a couple of leaks in the event trigger code as well as in
adding function trace filters in trace instances"
* tag 'trace-v4.20-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix memory leak of instance function hash filters
tracing: Fix memory leak in set_trigger_filter()
tracing: Fix memory leak in create_filter()
Daniel Vetter [Mon, 10 Dec 2018 10:30:01 +0000 (11:30 +0100)]
MAINTAINERS: Daniel for drm co-maintainer
lkml and Linus gained a CoC, and it's serious this time. Which means
my no 1 reason for declining to officially step up as drm maintainer
is gone, and I didn't find any new good excuse.
I chatted with a few people in private already, and the biggest
concern is that I mislay my community hat and start running around
with my intel hat only. Or some other convenient abuse of trust.
That's why this patch doesn't just need a lot of acks that mean "yeah
seems fine to me", but a lot of acks that mean "yeah we'll tell you
when you're over the line and usurp you from that comfy chair if you
don't get it". Which I think we've been done a fairly good job here at
dri-devel in general, but better to be clear.
Rough idea is that I'll do this for maybe 2-3 years, helping Dave
figure out a group model for drm overall. And getting the tooling and
infrastructure for that off the ground. Then step down again because
some other shiny thing that needs chasing. Of course as plans tend to
do, this one will probably pan out a bit different in reality.
Cc: David Airlie <airlied@linux.ie> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Daniel Vetter <daniel.vetter@intel.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Acked-by: Eric Anholt <eric@anholt.net> Acked-by: Gerd Hoffmann <kraxel@redhat.com> Acked-by: Daniel Stone <daniels@collabora.com> Signed-off-by: Daniel Vetter <daniel@ffwll.ch> Acked-by: Neil Armstrong <narmstrong@baylibre.com> Acked-by: Thierry Reding <treding@nvidia.com> Acked-by: Thomas Hellstrom <thellstrom@vmware.com> Acked-by: Sean Paul <sean@poorly.run> Signed-off-by: Dave Airlie <airlied@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181210103001.30549-1-daniel.vetter@ffwll.ch
Mark Zhang [Wed, 5 Dec 2018 13:50:49 +0000 (15:50 +0200)]
IB/core: Fix oops in netdev_next_upper_dev_rcu()
When support for bonding of RoCE devices was added, there was
necessarily a link between the RoCE device and the paired netdevice that
was part of the bond. If you remove the mlx4_en module, that paired
association is broken (the RoCE device is still present but the paired
netdevice has been released). We need to account for this in
is_upper_ndev_bond_master_filter() and filter out those links with a
broken pairing or else we later oops in netdev_next_upper_dev_rcu().
Fixes: 408f1242d940 ("IB/core: Delete lower netdevice default GID entries in bonding scenario") Signed-off-by: Mark Zhang <markz@mellanox.com> Reviewed-by: Parav Pandit <parav@mellanox.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
Mike Snitzer [Wed, 12 Dec 2018 14:39:54 +0000 (09:39 -0500)]
dm thin: bump target version
Decoupled version bump from commit f6c367585d0 ("dm thin: send event
about thin-pool state change _after_ making it") because version bumps
just create conflicts when backporting to the stable trees.
Colin Ian King [Thu, 4 Oct 2018 17:49:53 +0000 (18:49 +0100)]
drm/vmwgfx: remove redundant return ret statement
The return statement is redundant as there is a return statement
immediately before it so we have dead code that can be removed.
Also remove the unused declaration of ret.
Detected by CoverityScan, CID#1473793 ("Structurally dead code")
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
Chris Wilson [Fri, 7 Dec 2018 13:40:37 +0000 (13:40 +0000)]
drm/i915: Flush GPU relocs harder for gen3
Adding an extra MI_STORE_DWORD_IMM to the gpu relocation path for gen3
was good, but still not good enough. To survive 24+ hours under test we
needed to perform not one, not two but three extra store-dw. Doing so
for each GPU relocation was a little unsightly and since we need to
worry about userspace hitting the same issues, we should apply the dummy
store-dw into the EMIT_FLUSH.
Chris Wilson [Tue, 4 Dec 2018 14:15:16 +0000 (14:15 +0000)]
drm/i915: Allocate a common scratch page
Currently we allocate a scratch page for each engine, but since we only
ever write into it for post-sync operations, it is not exposed to
userspace nor do we care for coherency. As we then do not care about its
contents, we can use one page for all, reducing our allocations and
avoid complications by not assuming per-engine isolation.
For later use, it simplifies engine initialisation (by removing the
allocation that required struct_mutex!) and means that we can always rely
on there being a scratch page.
v2: Check that we allocated a large enough scratch for I830 w/a
Fixes: 06e562e7f515 ("drm/i915/ringbuffer: Delay after EMIT_INVALIDATE for gen4/gen5") # v4.18.20
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=108850 Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181204141522.13640-1-chris@chris-wilson.co.uk Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: <stable@vger.kernel.org> # v4.18.20+
(cherry picked from commit 5179749925933575a67f9d8f16d0cc204f98a29f)
[Joonas: Use new function in gen9_init_indirectctx_bb too] Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Chris Wilson [Thu, 6 Dec 2018 08:44:31 +0000 (08:44 +0000)]
drm/i915/execlists: Apply a full mb before execution for Braswell
Braswell is really picky about having our writes posted to memory before
we execute or else the GPU may see stale values. A wmb() is insufficient
as it only ensures the writes are visible to other cores, we need a full
mb() to ensure the writes are in memory and visible to the GPU.
The most frequent failure in flushing before execution is that we see
stale PTE values and execute the wrong pages.
References: 987abd5c62f9 ("drm/i915/execlists: Force write serialisation into context image vs execution") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com> Cc: stable@vger.kernel.org Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181206084431.9805-3-chris@chris-wilson.co.uk
(cherry picked from commit 490b8c65b9db45896769e1095e78725775f47b3e) Signed-off-by: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Lyude Paul [Tue, 11 Dec 2018 23:56:20 +0000 (18:56 -0500)]
drm/nouveau/kms: Fix memory leak in nv50_mstm_del()
Noticed this while working on redoing the reference counting scheme in
the DP MST helpers. Nouveau doesn't attempt to call
drm_dp_mst_topology_mgr_destroy() at all, which leaves it leaking all of
the resources for drm_dp_mst_topology_mgr and it's children mstbs+ports.
Fixes: f479c0ba4a17 ("drm/nouveau/kms/nv50: initial support for DP 1.2 multi-stream") Signed-off-by: Lyude Paul <lyude@redhat.com> Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Kees Cook [Thu, 6 Dec 2018 23:50:38 +0000 (15:50 -0800)]
selftests/seccomp: Remove SIGSTOP si_pid check
Commit f149b3155744 ("signal: Never allocate siginfo for SIGKILL or SIGSTOP")
means that the seccomp selftest cannot check si_pid under SIGSTOP anymore.
Since it's believed[1] there are no other userspace things depending on the
old behavior, this removes the behavioral check in the selftest, since it's
more a "extra" sanity check (which turns out, maybe, not to have been
useful to test).
block: Fix null_blk_zoned creation failure with small number of zones
null_blk_zoned creation fails if the number of zones specified is equal to or is
smaller than 64 due to a memory allocation failure in blk_alloc_zones(). With
such a small number of zones, the required memory size for all zones descriptors
fits in a single page, and the page order for alloc_pages_node() is zero. Allow
this value in blk_alloc_zones() for the allocation to succeed.
Fixes: bf5054569653 "block: Introduce blk_revalidate_disk_zones()" Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Chad Austin [Mon, 10 Dec 2018 18:54:52 +0000 (10:54 -0800)]
fuse: continue to send FUSE_RELEASEDIR when FUSE_OPEN returns ENOSYS
When FUSE_OPEN returns ENOSYS, the no_open bit is set on the connection.
Because the FUSE_RELEASE and FUSE_RELEASEDIR paths share code, this
incorrectly caused the FUSE_RELEASEDIR request to be dropped and never sent
to userspace.
Pass an isdir bool to distinguish between FUSE_RELEASE and FUSE_RELEASEDIR
inside of fuse_file_put.
Fixes: 7678ac50615d ("fuse: support clients that don't implement 'open'") Cc: <stable@vger.kernel.org> # v3.14 Signed-off-by: Chad Austin <chadaustin@fb.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Mike Snitzer [Tue, 11 Dec 2018 18:31:40 +0000 (13:31 -0500)]
dm thin: send event about thin-pool state change _after_ making it
Sending a DM event before a thin-pool state change is about to happen is
a bug. It wasn't realized until it became clear that userspace response
to the event raced with the actual state change that the event was
meant to notify about.
Fix this by first updating internal thin-pool state to reflect what the
DM event is being issued about. This fixes a long-standing racey/buggy
userspace device-mapper-test-suite 'resize_io' test that would get an
event but not find the state it was looking for -- so it would just go
on to hang because no other events caused the test to reevaluate the
thin-pool's state.
Cc: stable@vger.kernel.org Signed-off-by: Mike Snitzer <snitzer@redhat.com>
The reason is that the hashes that hold the filters to set_ftrace_filter and
set_ftrace_notrace are not freed if they contain any data on the instance
and the instance is removed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org Fixes: 591dffdade9f ("ftrace: Allow for function tracing instance to filter functions") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
When create_event_filter() fails in set_trigger_filter(), the filter may
still be allocated and needs to be freed. The caller expects the
data->filter to be updated with the new filter, even if the new filter
failed (we could add an error message by setting set_str parameter of
create_event_filter(), but that's another update).
But because the error would just exit, filter was left hanging and
nothing could free it.
Found by kmemleak detector.
Cc: stable@vger.kernel.org Fixes: bac5fb97a173a ("tracing: Add and use generic set_trigger_filter() implementation") Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
The create_filter() calls create_filter_start() which allocates a
"parse_error" descriptor, but fails to call create_filter_finish() that
frees it.
The op_stack and inverts in predicate_parse() were also not freed.
Found by kmemleak detector.
Cc: stable@vger.kernel.org Fixes: 80765597bc587 ("tracing: Rewrite filter logic to be simpler and faster") Reviewed-by: Tom Zanussi <tom.zanussi@linux.intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jeff Moyer [Tue, 11 Dec 2018 17:37:49 +0000 (12:37 -0500)]
aio: fix spectre gadget in lookup_ioctx
Matthew pointed out that the ioctx_table is susceptible to spectre v1,
because the index can be controlled by an attacker. The below patch
should mitigate the attack for all of the aio system calls.
Cc: stable@vger.kernel.org Reported-by: Matthew Wilcox <willy@infradead.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Luis Henriques [Mon, 10 Dec 2018 10:23:12 +0000 (10:23 +0000)]
ceph: make 'nocopyfrom' a default mount option
Since we found a problem with the 'copy-from' operation after objects have
been truncated, offloading object copies to OSDs should be discouraged
until the issue is fixed.
Thus, this patch adds the 'nocopyfrom' mount option to the default mount
options which effectily means that remote copies won't be done in
copy_file_range unless they are explicitly enabled at mount time.
drm/amdgpu: Fix DEBUG_LOCKS_WARN_ON(depth <= 0) in amdgpu_ctx.lock
If CS is submitted using guilty ctx, we terminate amdgpu_cs_parser_init
before locking ctx->lock, latter in amdgpu_cs_parser_fini we still are
trying to release the lock just becase parser->ctx != NULL.
Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
It causes new warnings [1] on shutdown when running the Google Kevin or
Scarlet (RK3399) boards under Chrome OS. Presumably our usage of DRM is
different than what Marc and Heiko test.
We're looking at a different approach (e.g., [2]) to replace this, but
IMO the revert should be taken first, as it already propagated to
-stable.
We need to invalidate the caches *before* clearing the buffer via the
non-cacheable alias, else in the worst case __dma_flush_area() may
write back dirty lines over the top of our nice new zeros.
Fixes: dd65a941f6ba ("arm64: dma-mapping: clear buffers allocated with FORCE_CONTIGUOUS flag") Cc: <stable@vger.kernel.org> # 4.18.x- Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
As part of commit cfea88a4d866 ("drm/nouveau: Start using new drm_dev
initialization helpers"), the initialization of the Nouveau DRM device
was reworked and along the way the platform driver initialization was
left incomplete. Add a call to nouveau_drm_device_init() to make sure
all of the structures are properly initialized.
Signed-off-by: Thierry Reding <treding@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Tested-by: Marcel Ziswiler <marcel.ziswiler@toradex.com> Signed-off-by: Ben Skeggs <bskeggs@redhat.com>
Adding brackets allows to multiply the register value,
masked by TS1_RAMP_COEFF_MASK, by an ADJUST value
properly and not to multiply ADJUST by register value and
then mask the whole.
Fixes: 1d693155 ("thermal: add stm32 thermal driver") Reviewed-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: David Hernandez Sanchez <david.hernandezsanchez@st.com> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Daniel Lezcano [Fri, 30 Nov 2018 08:00:32 +0000 (09:00 +0100)]
thermal/drivers/hisi: Fix number of sensors on hi3660
Without this patch the thermal driver is broken on hi3660.
The dual sensors support patchset was partially merged, unfortunately
the dual thermal zones definition is not available in the DT yet, so
when the driver tries to register all the sensors that fails.
By reducing to 1 the number of sensors on the hi3660, we switch back
to the previous functionnality.
Fixes: 8c6c36846f11 (thermal/drivers/hisi: Add the dual clusters sensors for hi3660) Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Linus Torvalds [Mon, 10 Dec 2018 19:04:41 +0000 (11:04 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID subsystem fixes from Jiri Kosina:
- two device-specific quirks from Hans de Goede and Nic Soudée
- reintroduction of (mistakenly remocved) ABS_RESERVED from Peter
Hutterer
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
Input: restore EV_ABS ABS_RESERVED
HID: quirks: fix RetroUSB.com devices
HID: ite: Add USB id match for another ITE based keyboard rfkill key quirk
Linus Torvalds [Mon, 10 Dec 2018 17:06:22 +0000 (09:06 -0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Olof Johansson:
"The usual batch; most of them are DT tweaks to fix misdescribed
hardware. Beyond that:
- A bugfix for MMP2 CPU detection, it's been there quite a while but
makes sense to fix now anyway.
- Some power management tweaks:
+ disabling of CPU idle power state on Marvell Armada 7K/8K
(Macchiatobin et al)
+ Increase of minimum voltage on BananaPi M3
+ Tweak of power ramp time for DVFS on NXP/Freescale i.MX7SX
- A couple of MAINTAINER updates:
+ MMP has a new volunteer to look after it
+ Mediatek adds a few keywords, IRC channel and wiki URL"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: dts: imx7d-nitrogen7: Fix the description of the Wifi clock
ARM: imx: update the cpu power up timing setting on i.mx6sx
Revert "arm64: dts: marvell: add CPU Idle power state support on Armada 7K/8K"
ARM: dts: imx7d-pico: Describe the Wifi clock
ARM: dts: realview: Fix some more duplicate regulator nodes
MAINTAINERS: update entry for MMP platform
ARM: mmp/mmp2: fix cpu_is_mmp2() on mmp2-dt
MAINTAINERS: mediatek: Update SoC entry
ARM: dts: bcm2837: Fix polarity of wifi reset GPIOs
arm64: dts: mt7622: Drop the general purpose timer node
arm64: dts: mt7622: fix no more console output on BPI-R64 board
arm64: dts: mt7622: fix no more console output on rfb1
ARM: dts: sun8i: a83t: bananapi-m3: increase vcc-pd voltage to 3.3V
backlight: pwm_bl: Fix brightness levels for non-DT case.
Commit '88ba95bedb79 ("backlight: pwm_bl: Compute brightness of LED
linearly to human eye")' allows the possibility to compute a default
brightness table when there isn't the brightness-levels property in the
DT. Unfortunately the changes made broke the pwm backlight for the
non-DT boards.
Usually, the non-DT boards don't pass the brightness levels via platform
data, instead, it sets the max_brightness in their platform data and the
driver calculates the level without a table. The offending patch assumed
that when there is no brightness levels table we should create one, but this
is clearly wrong for the non-DT case.
After this patch the code handles the DT and the non-DT case taking in
consideration also if max_brightness is set or not.
Fixes: 88ba95bedb79 ("backlight: pwm_bl: Compute brightness of LED linearly to human eye") Reported-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Tested-by: Robert Jarzmik <robert.jarzmik@free.fr> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Lee Jones <lee.jones@linaro.org>
Jian-Hong Pan [Fri, 7 Dec 2018 09:17:13 +0000 (17:17 +0800)]
ALSA: hda/realtek: Enable audio jacks of ASUS UX433FN/UX333FA with ALC294
The ASUS UX433FN and UX333FA with ALC294 cannot detect the headset MIC
and output through the internal speaker and the headphone until
ALC294_FIXUP_ASUS_SPK and ALC294_FIXUP_ASUS_HEADSET_MIC quirk applied.
Signed-off-by: Daniel Drake <drake@endlessm.com> Signed-off-by: Jian-Hong Pan <jian-hong@endlessm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de>