ocfs2: fix start offset to ocfs2_zero_range_for_truncate()
If we punch a hole on a reflink such that following conditions are met:
1. start offset is on a cluster boundary
2. end offset is not on a cluster boundary
3. (end offset is somewhere in another extent) or
(hole range > MAX_CONTIG_BYTES(1MB)),
we dont COW the first cluster starting at the start offset. But in this
case, we were wrongly passing this cluster to
ocfs2_zero_range_for_truncate() to zero out. This will modify the
cluster in place and zero it in the source too.
Fix this by skipping this cluster in such a scenario.
To reproduce:
1. Create a random file of say 10 MB
xfs_io -c 'pwrite -b 4k 0 10M' -f 10MBfile
2. Reflink it
reflink -f 10MBfile reflnktest
3. Punch a hole at starting at cluster boundary with range greater that
1MB. You can also use a range that will put the end offset in another
extent.
fallocate -p -o 0 -l 1048615 reflnktest
4. sync
5. Check the first cluster in the source file. (It will be zeroed out).
dd if=10MBfile iflag=direct bs=<cluster size> count=1 | hexdump -C
Link: http://lkml.kernel.org/r/1470957147-14185-1-git-send-email-ashish.samant@oracle.com Signed-off-by: Ashish Samant <ashish.samant@oracle.com> Reported-by: Saar Maoz <saar.maoz@oracle.com> Reviewed-by: Srinivas Eeda <srinivas.eeda@oracle.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Joseph Qi <joseph.qi@huawei.com> Cc: Eric Ren <zren@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 19 Sep 2016 21:44:38 +0000 (14:44 -0700)]
cgroup: duplicate cgroup reference when cloning sockets
When a socket is cloned, the associated sock_cgroup_data is duplicated
but not its reference on the cgroup. As a result, the cgroup reference
count will underflow when both sockets are destroyed later on.
Fixes: bd1060a1d671 ("sock, cgroup: add sock->sk_cgroup") Link: http://lkml.kernel.org/r/20160914194846.11153-2-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Mon, 19 Sep 2016 21:44:36 +0000 (14:44 -0700)]
mm: memcontrol: make per-cpu charge cache IRQ-safe for socket accounting
During cgroup2 rollout into production, we started encountering css
refcount underflows and css access crashes in the memory controller.
Splitting the heavily shared css reference counter into logical users
narrowed the imbalance down to the cgroup2 socket memory accounting.
The problem turns out to be the per-cpu charge cache. Cgroup1 had a
separate socket counter, but the new cgroup2 socket accounting goes
through the common charge path that uses a shared per-cpu cache for all
memory that is being tracked. Those caches are safe against scheduling
preemption, but not against interrupts - such as the newly added packet
receive path. When cache draining is interrupted by network RX taking
pages out of the cache, the resuming drain operation will put references
of in-use pages, thus causing the imbalance.
Disable IRQs during all per-cpu charge cache operations.
Fixes: f7e1cb6ec51b ("mm: memcontrol: account socket memory in unified hierarchy memory controller") Link: http://lkml.kernel.org/r/20160914194846.11153-1-hannes@cmpxchg.org Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: <stable@vger.kernel.org> [4.5+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Joseph Qi [Mon, 19 Sep 2016 21:44:33 +0000 (14:44 -0700)]
ocfs2: fix double unlock in case retry after free truncate log
If ocfs2_reserve_cluster_bitmap_bits() fails with ENOSPC, it will try to
free truncate log and then retry. Since ocfs2_try_to_free_truncate_log
will lock/unlock global bitmap inode, we have to unlock it before
calling this function. But when retry reserve and it fails with no
global bitmap inode lock taken, it will unlock again in error handling
branch and BUG.
This issue also exists if no need retry and then ocfs2_inode_lock fails.
So fix it.
Fixes: 2070ad1aebff ("ocfs2: retry on ENOSPC if sufficient space in truncate log") Link: http://lkml.kernel.org/r/57D91939.6030809@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jiufei Xue <xuejiufei@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Jan Kara [Mon, 19 Sep 2016 21:44:30 +0000 (14:44 -0700)]
fanotify: fix list corruption in fanotify_get_response()
fanotify_get_response() calls fsnotify_remove_event() when it finds that
group is being released from fanotify_release() (bypass_perm is set).
However the event it removes need not be only in the group's notification
queue but it can have already moved to access_list (userspace read the
event before closing the fanotify instance fd) which is protected by a
different lock. Thus when fsnotify_remove_event() races with
fanotify_release() operating on access_list, the list can get corrupted.
Fix the problem by moving all the logic removing permission events from
the lists to one place - fanotify_release().
Fixes: 5838d4442bd5 ("fanotify: fix double free of pending permission events") Link: http://lkml.kernel.org/r/1473797711-14111-3-git-send-email-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz> Reported-by: Miklos Szeredi <mszeredi@redhat.com> Tested-by: Miklos Szeredi <mszeredi@redhat.com> Reviewed-by: Miklos Szeredi <mszeredi@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Mon, 19 Sep 2016 21:44:24 +0000 (14:44 -0700)]
ocfs2: fix trans extend while free cached blocks
The root cause of this issue is the same with the one fixed by the last
patch, but this time credits for allocator inode and group descriptor
may not be consumed before trans extend.
Junxiao Bi [Mon, 19 Sep 2016 21:44:21 +0000 (14:44 -0700)]
ocfs2: fix trans extend while flush truncate log
Every time, ocfs2_extend_trans() included a credit for truncate log
inode, but as that inode had been managed by jbd2 running transaction
first time, it will not consume that credit until
jbd2_journal_restart().
Since total credits to extend always included the un-consumed ones,
there will be more and more un-consumed credit, at last
jbd2_journal_restart() will fail due to credit number over the half of
max transction credit.
The following error was caught when unlinking a large file with many
extents:
Commit c01d5b300774 ("shmem: get_unmapped_area align huge page") makes
use of shm_get_unmapped_area() in shm_file_operations() unconditional to
CONFIG_MMU.
As Tony Battersby pointed this can lead NULL-pointer dereference on
machine with CONFIG_MMU=y and CONFIG_SHMEM=n. In this case ipc/shm is
backed by ramfs which doesn't provide f_op->get_unmapped_area for
configurations with MMU.
The solution is to provide dummy f_op->get_unmapped_area for ramfs when
CONFIG_MMU=y, which just call current->mm->get_unmapped_area().
Fixes: c01d5b300774 ("shmem: get_unmapped_area align huge page") Link: http://lkml.kernel.org/r/20160912102704.140442-1-kirill.shutemov@linux.intel.com Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Tony Battersby <tonyb@cybernetics.com> Tested-by: Tony Battersby <tonyb@cybernetics.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 62c230bc1790 ("mm: add support for a filesystem to activate
swap files and use direct_IO for writing swap pages") replaced the
swap_aops dirty hook from __set_page_dirty_no_writeback() with
swap_set_page_dirty().
For normal cases without these special SWP flags code path falls back to
__set_page_dirty_no_writeback() so the behaviour is expected to be the
same as before.
But swap_set_page_dirty() makes use of the page_swap_info() helper to
get the swap_info_struct to check for the flags like SWP_FILE,
SWP_BLKDEV etc as desired for those features. This helper has
BUG_ON(!PageSwapCache(page)) which is racy and safe only for the
set_page_dirty_lock() path.
For the set_page_dirty() path which is often needed for cases to be
called from irq context, kswapd() can toggle the flag behind the back
while the call is getting executed when system is low on memory and
heavy swapping is ongoing.
This ends up with undesired kernel panic.
This patch just moves the check outside the helper to its users
appropriately to fix kernel panic for the described path. Couple of
users of helpers already take care of SwapCache condition so I skipped
them.
Link: http://lkml.kernel.org/r/1473460718-31013-1-git-send-email-santosh.shilimkar@oracle.com Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Joe Perches <joe@perches.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Rik van Riel <riel@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Jens Axboe <axboe@fb.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> [4.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ian Kent [Mon, 19 Sep 2016 21:44:12 +0000 (14:44 -0700)]
autofs: use dentry flags to block walks during expire
Somewhere along the way the autofs expire operation has changed to hold
a spin lock over expired dentry selection. The autofs indirect mount
expired dentry selection is complicated and quite lengthy so it isn't
appropriate to hold a spin lock over the operation.
Commit 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") added a
might_sleep() to dput() causing a WARN_ONCE() about this usage to be
issued.
But the spin lock doesn't need to be held over this check, the autofs
dentry info. flags are enough to block walks into dentrys during the
expire.
I've left the direct mount expire as it is (for now) because it is much
simpler and quicker than the indirect mount expire and adding spin lock
release and re-aquires would do nothing more than add overhead.
Fixes: 47be61845c77 ("fs/dcache.c: avoid soft-lockup in dput()") Link: http://lkml.kernel.org/r/20160912014017.1773.73060.stgit@pluto.themaw.net Signed-off-by: Ian Kent <raven@themaw.net> Reported-by: Takashi Iwai <tiwai@suse.de> Tested-by: Takashi Iwai <tiwai@suse.de> Cc: Takashi Iwai <tiwai@suse.de> Cc: NeilBrown <neilb@suse.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
dump_page() uses page_mapcount() to get mapcount of the page.
page_mapcount() has VM_BUG_ON_PAGE(PageSlab(page)) as mapcount doesn't
make sense for slab pages and the field in struct page used for other
information.
It leads to recursion if dump_page() called for slub page and DEBUG_VM
is enabled:
mm, thp: fix leaking mapped pte in __collapse_huge_page_swapin()
Currently, khugepaged does not permit swapin if there are enough young
pages in a THP. The problem is when a THP does not have enough young
pages, khugepaged leaks mapped ptes.
This patch prohibits leaking mapped ptes.
Link: http://lkml.kernel.org/r/1472820276-7831-1-git-send-email-ebru.akagunduz@gmail.com Signed-off-by: Ebru Akagunduz <ebru.akagunduz@gmail.com> Suggested-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Andrea Arcangeli <aarcange@redhat.com> Reviewed-by: Rik van Riel <riel@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
khugepaged: fix use-after-free in collapse_huge_page()
hugepage_vma_revalidate() tries to re-check if we still should try to
collapse small pages into huge one after the re-acquiring mmap_sem.
The problem Dmitry Vyukov reported[1] is that the vma found by
hugepage_vma_revalidate() can be suitable for huge pages, but not the
same vma we had before dropping mmap_sem. And dereferencing original
vma can lead to fun results..
Let's use vma hugepage_vma_revalidate() found instead of assuming it's the
same as what we had before the lock was dropped.
Joseph Qi [Mon, 19 Sep 2016 21:43:55 +0000 (14:43 -0700)]
ocfs2/dlm: fix race between convert and migration
Commit ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery")
checks if lockres master has changed to identify whether new master has
finished recovery or not. This will introduce a race that right after
old master does umount ( means master will change), a new convert
request comes.
In this case, it will reset lockres state to DLM_RECOVERING and then
retry convert, and then fail with lockres->l_action being set to
OCFS2_AST_INVALID, which will cause inconsistent lock level between
ocfs2 and dlm, and then finally BUG.
Since dlm recovery will clear lock->convert_pending in
dlm_move_lockres_to_recovery_list, we can use it to correctly identify
the race case between convert and recovery. So fix it.
Fixes: ac7cf246dfdb ("ocfs2/dlm: fix race between convert and recovery") Link: http://lkml.kernel.org/r/57CE1569.8010704@huawei.com Signed-off-by: Joseph Qi <joseph.qi@huawei.com> Signed-off-by: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mfasheh@suse.de> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Li Zhong [Mon, 19 Sep 2016 21:43:52 +0000 (14:43 -0700)]
mem-hotplug: don't clear the only node in new_node_page()
Commit 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest
neighbor node when mem-offline") introduced new_node_page() for memory
hotplug.
In new_node_page(), the nid is cleared before calling
__alloc_pages_nodemask(). But if it is the only node of the system, and
the first round allocation fails, it will not be able to get memory from
an empty nodemask, and will trigger oom.
The patch checks whether it is the last node on the system, and if it
is, then don't clear the nid in the nodemask.
Fixes: 394e31d2ceb4 ("mem-hotplug: alloc new page from a nearest neighbor node when mem-offline") Link: http://lkml.kernel.org/r/1473044391.4250.19.camel@TP420 Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com> Reported-by: John Allen <jallen@linux.vnet.ibm.com> Acked-by: Vlastimil Babka <vbabka@suse.cz> Cc: Xishi Qiu <qiuxishi@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'usb-4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB fixes from Greg KH:
"Here are two small fixes, and one new device id, for 4.8-rc7
The fixes solve a build error that was reported in your tree for the
blackfin arch, and resolve an issue with a number of broken USB
devices that reported the wrong interval rate. Included here is also
a new device id for the usb-serial driver.
All have been in linux-next with no reported issues"
* tag 'usb-4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
USB: change bInterval default to 10 ms
usb: musb: Fix tusb6010 compile error on blackfin
USB: serial: simple: add support for another Infineon flashloader
Merge tag 'fixes-for-linus-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull uaccess fixes from Guenter Roeck:
"Two patches fixing problems introduced with copy_from_user changes"
* tag 'fixes-for-linus-v4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
openrisc: fix the fix of copy_from_user()
avr32: fix 'undefined reference to `___copy_from_user'
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Thomas Gleixner:
"A couple of small fixes to x86 perf drivers:
- Measure L2 for HW_CACHE* events on AMD
- Fix the address filter handling in the intel/pt driver
- Handle the BTS disabling at the proper place"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
perf/x86/intel/pt: Do validate the size of a kernel address filter
perf/x86/intel/pt: Fix kernel address filter's offset validation
perf/x86/intel/pt: Fix an off-by-one in address filter configuration
perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull irq fixes from Thomas Gleixner:
"Two patches from Boris which address a potential deadlock in the atmel
irq chip driver"
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/atmel-aic: Fix potential deadlock in ->xlate()
genirq: Provide irq_gc_{lock_irqsave,unlock_irqrestore}() helpers
Since commit acb2505d0119 ("openrisc: fix copy_from_user()"),
copy_from_user() returns the number of bytes requested, not the
number of bytes not copied.
avr32: fix 'undefined reference to `___copy_from_user'
avr32 builds fail with:
arch/avr32/kernel/built-in.o: In function `arch_ptrace':
(.text+0x650): undefined reference to `___copy_from_user'
arch/avr32/kernel/built-in.o:(___ksymtab+___copy_from_user+0x0): undefined
reference to `___copy_from_user'
kernel/built-in.o: In function `proc_doulongvec_ms_jiffies_minmax':
(.text+0x5dd8): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `proc_dointvec_minmax_sysadmin':
sysctl.c:(.text+0x6174): undefined reference to `___copy_from_user'
kernel/built-in.o: In function `ptrace_has_cap':
ptrace.c:(.text+0x69c0): undefined reference to `___copy_from_user'
kernel/built-in.o:ptrace.c:(.text+0x6b90): more undefined references to
`___copy_from_user' follow
Merge tag 'mmc-v4.8-rc6' of git://git.linaro.org/people/ulf.hansson/mmc
Pull MMC fixes from Ulf Hansson:
"MMC host:
- omap/omap_hsmmc: Initialize dma_slave_config to avoid random data
- sdhci-st: Handle interconnect clock"
* tag 'mmc-v4.8-rc6' of git://git.linaro.org/people/ulf.hansson/mmc:
mmc: omap: Initialize dma_slave_config to avoid random data in it's fields
mmc: omap_hsmmc: Initialize dma_slave_config to avoid random data
mmc: sdhci-st: Handle interconnect clock
dt-bindings: mmc: sdhci-st: Mention the discretionary "icn" clock
Merge tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Fixes for code merged this cycle:
- Fix restore of SPRs upon wake up from hypervisor state loss from
Gautham R Shenoy
- Fix the state of root PE from Gavin Shan
- Detach from PE on releasing PCI device from Gavin Shan
- Fix size of NUM_CPU_FTR_KEYS on 32-bit
- Fix missed TCE invalidations that should fallback to OPAL"
* tag 'powerpc-4.8-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
powerpc/powernv: Detach from PE on releasing PCI device
powerpc/powernv: Fix the state of root PE
powerpc/kernel: Fix size of NUM_CPU_FTR_KEYS on 32-bit
powerpc/powernv: Fix restore of SPRs upon wake up from hypervisor state loss
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"Small set of cifs fixes"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
Move check for prefix path to within cifs_get_root()
Compare prepaths when comparing superblocks
Fix memory leaks in cifs_do_mount()
Merge tag 'drm-fixes-for-4.8-rc6' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Two sets of i915 fixes, one set of vc4 crasher fixes, and a couple of
atmel fixes.
Nothing too out there at this stage, though I think some people are
holidaying so it's been quiet enough"
* tag 'drm-fixes-for-4.8-rc6' of git://people.freedesktop.org/~airlied/linux:
drm/i915: Ignore OpRegion panel type except on select machines
Revert "drm/i915/psr: Make idle_frames sensible again"
drm/i915: Restore lost "Initialized i915" welcome message
drm/vc4: mark vc4_bo_cache_purge() static
drm/i915: Add GEN7_PCODE_MIN_FREQ_TABLE_GT_RATIO_OUT_OF_RANGE to SNB
drm/i915: disable 48bit full PPGTT when vGPU is active
drm/i915: enable vGPU detection for all
drm/atmel-hlcdc: Make ->reset() implementation static
drm: atmel-hlcdc: Fix vertical scaling
drm/vc4: Allow some more signals to be packed with uniform resets.
drm/i915/dvo: Remove dangling call to drm_encoder_cleanup()
Merge tag 'pm-4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"More annotations of tracepoints in the runtime PM framework to prevent
RCU from complaining when that code is invoked from the idle path
(Paul McKenney)"
* tag 'pm-4.8-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM / runtime: Use _rcuidle for runtime suspend tracepoints
Dave Airlie [Fri, 16 Sep 2016 21:57:55 +0000 (07:57 +1000)]
Merge tag 'drm-vc4-fixes-2016-09-14' of https://github.com/anholt/linux into drm-fixes
This pull request brings in a fix for crashes in X on VC4.
* tag 'drm-vc4-fixes-2016-09-14' of https://github.com/anholt/linux:
drm/vc4: mark vc4_bo_cache_purge() static
drm/vc4: Allow some more signals to be packed with uniform resets.
Dave Airlie [Fri, 16 Sep 2016 21:57:21 +0000 (07:57 +1000)]
Merge tag 'drm-intel-fixes-2016-09-15' of git://anongit.freedesktop.org/drm-intel into drm-fixes
i915 fixes from Jani.
* tag 'drm-intel-fixes-2016-09-15' of git://anongit.freedesktop.org/drm-intel:
drm/i915: Ignore OpRegion panel type except on select machines
Revert "drm/i915/psr: Make idle_frames sensible again"
drm/i915: Restore lost "Initialized i915" welcome message
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma fixes from Doug Ledford:
"Round three of 4.8 rc fixes.
This is likely the last rdma pull request this cycle. The new rxe
driver had a few issues (you probably saw the boot bot bug report) and
they should be addressed now. There are a couple other fixes here,
mainly mlx4. There are still two outstanding issues that need
resolved but I don't think their fix will make this kernel cycle.
Summary:
- Various fixes to rdmavt, ipoib, mlx5, mlx4, rxe"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/rdmavt: Don't vfree a kzalloc'ed memory region
IB/rxe: Fix kmem_cache leak
IB/rxe: Fix race condition between requester and completer
IB/rxe: Fix duplicate atomic request handling
IB/rxe: Fix kernel panic in udp_setup_tunnel
IB/mlx5: Set source mac address in FTE
IB/mlx5: Enable MAD_IFC commands for IB ports only
IB/mlx4: Diagnostic HW counters are not supported in slave mode
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
IB/mlx4: Fix code indentation in QP1 MAD flow
IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
IB/ipoib: Don't allow MC joins during light MC flush
IB/rxe: fix GFP_KERNEL in spinlock context
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"Here are a couple of bugfixes for v4.8-rc.
Most of them have actually been around for a while this time but for
some reason didn't get applied early on. The shmobile regulator fix
is the only one that isn't completely obvious.
Device tree changes:
- archtimer interrupts must be level triggered (multiple platforms)
- fix for USB and MMC clocks on STiH410
- fix split DT repository in case of raspberry-pi 3
- a new use of skeleton.dtsi on arm64 has crept in after that was
removed.
defconfig updates:
- xilinx vdma has a new Kconfig symbol name
- keystone requires CONFIG_NOP_USB_XCEIV since v4.8-rc1
Code fixes:
- fix regulator quirk on shmobile
- suspend-to-ram regression on EXYNOS
Maintainer updates:
- Javier Martinez Canillas is now a reviewer for Samsung EXYNOS"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
ARM: keystone: defconfig: Fix USB configuration
arm64: dts: Fix broken architected timer interrupt trigger
ARM: multi_v7_defconfig: update XILINX_VDMA
ARM64: dts: bcm: Use a symlink to R-Pi dtsi files from arch=arm
ARM: dts: Remove use of skeleton.dtsi from bcm283x.dtsi
ARM: dts: STiH407-family: Provide interconnect clock for consumption in ST SDHCI
ARM: dts: STiH410: Handle interconnect clock required by EHCI/OHCI (USB)
ARM: shmobile: fix regulator quirk for Gen2
ARM: EXYNOS: Clear OF_POPULATED flag from PMU node in IRQ init callback
MAINTAINERS: Add myself as reviewer for Samsung Exynos support
Merge branch 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM fixes from Russell King:
"Most of this update are fixes primarily discovered from testing on the
older StrongARM 1110 and PXA systems, as a result of recent interest
from several people in these platforms:
- Locomo interrupt handling incorrectly stores the handler data in
the chip's private data slot: when Locomo is combined with an
interrupt controller who's chip uses the chip private data, this
leads to an oops.
- SA1111 was missing a call to clk_disable() to clean up after a
failed probe.
- SA1111 and PCMCIA suspend/resume was broken:
The PCMCIA "ds" layer was using the legacy bus suspend/resume
methods, which the core PM code is no longer calling as a result of
device_pm_check_callbacks() introduced in commit aa8e54b559479
("PM / sleep: Go direct_complete if driver has no callbacks").
SA1111 was broken due to changes to PCMCIA which makes PCMCIA
suspend itself later than the SA1111 code expects, and resume
before the SA1111 code has initialised access to the pcmcia
sub-device.
- the default SA1111 interrupt mask polarity got messed up when it
was converted to use a dynamic interrupt base number for its
interrupts.
- fix platform_get_irq() error code propagation, which was causing
problems on platforms where the interrupt may not be available at
probe time in DT setups.
- fix the lack of clock to PCMCIA code on PXA platforms, which was
omitted in conversions of PXA to CCF.
- fix an oops in the PXA PCMCIA code caused by a previous commit not
realising that Lubbock is different from the rest of the PXA PCMCIA
drivers.
- ensure that SA1111 low-level PCMCIA drivers propagate their error
codes to the main probe function, rather than the driver silently
accepting a failure.
- fix the sa11xx debugfs reporting of timing information, which
always indicated zero due to the clock being a factor of 1000 out.
- fix the polarity of the status change signal reported from the
sockets.
Lastly, one ARM specific commit from Stefan Agner fixing the LPAE
cache attributes"
* 'fixes' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: pxa/lubbock: add pcmcia clock
ARM: locomo: fix locomo irq handling
ARM: 8612/1: LPAE: initialize cache policy correctly
ARM: sa1111: fix missing clk_disable()
ARM: sa1111: fix pcmcia suspend/resume
ARM: sa1111: fix pcmcia interrupt mask polarity
ARM: sa1111: fix error code propagation in sa1111_probe()
pcmcia: lubbock: fix sockets configuration
pcmcia: sa1111: fix propagation of lowlevel board init return code
pcmcia: soc_common: fix SS_STSCHG polarity
pcmcia: sa11xx_base: add units to the timing information
pcmcia: sa11xx_base: fix reporting of timing information
pcmcia: ds: fix suspend/resume
Colin Ian King [Fri, 9 Sep 2016 07:15:37 +0000 (08:15 +0100)]
IB/rdmavt: Don't vfree a kzalloc'ed memory region
The userspace memory region 'mr' is allocated with kzalloc in
__rvt_alloc_mr however it is incorrectly being freed with vfree in
__rvt_free_mr. Fix this by using kfree to free it.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Leon Romanovsky <leonro@mellanox.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com> Signed-off-by: Doug Ledford <dledford@redhat.com>
IB/rxe: Fix race condition between requester and completer
rxe_requester() is sending a pkt with rxe_xmit_packet() and
then calls rxe_update() to update the wqe and qp's psn values.
But sometimes the response is received before the requester
had time to update the wqe in which case the completer
acts on errornous wqe values.
This fix updates the wqe and qp before actually sending
the request and rolls back when xmit fails.
Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
When handling ack for atomic opcodes like "fetch&add"
or "cmp&swp", the method send_atomic_ack() saves the ack
before sending it, in case it gets lost and never reach the
requester. In which case the method duplicate_request()
will need to find it using the duplicated request.psn.
But send_atomic_ack() used a wrong psn value and thus
the above ack was never found.
This fix uses the ack.psn to locate the ack in case
its needed.
This fix also copies the ack packet to the skb's control buffer
since duplicate_request() will need it when calling rxe_xmit_packet()
Fixes: 8700e3e7c485 ("Soft RoCE driver") Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Jack Morgenstein [Mon, 12 Sep 2016 16:16:20 +0000 (19:16 +0300)]
IB/mlx4: Use correct subnet-prefix in QP1 mads under SR-IOV
When sending QP1 MAD packets which use a GRH, the source GID
(which consists of the 64-bit subnet prefix, and the 64 bit port GUID)
must be included in the packet GRH.
For SR-IOV, a GID cache is used, since the source GID needs to be the
slave's source GID, and not the Hypervisor's GID. This cache also
included a subnet_prefix. Unfortunately, the subnet_prefix field in
the cache was never initialized (to the default subnet prefix 0xfe80::0).
As a result, this field remained all zeroes. Therefore, when SR-IOV
was active, all QP1 packets which included a GRH had a source GID
subnet prefix of all-zeroes.
However, the subnet-prefix should initially be 0xfe80::0 (the default
subnet prefix). In addition, if OpenSM modifies a port's subnet prefix,
the new subnet prefix must be used in the GRH when sending QP1 packets.
To fix this we now initialize the subnet prefix in the SR-IOV GID cache
to the default subnet prefix. We update the cached value if/when OpenSM
modifies the port's subnet prefix. We take this cached value when sending
QP1 packets when SR-IOV is active.
Note that the value is stored as an atomic64. This eliminates any need
for locking when the subnet prefix is being updated.
Note also that we depend on the FW generating the "port management change"
event for tracking subnet-prefix changes performed by OpenSM. If running
early FW (before 2.9.4630), subnet prefix changes will not be tracked (but
the default subnet prefix still will be stored in the cache; therefore
users who do not modify the subnet prefix will not have a problem).
IF there is a need for such tracking also for early FW, we will add that
capability in a subsequent patch.
Fixes: 1ffeb2eb8be9 ("IB/mlx4: SR-IOV IB context objects and proxy/tunnel SQP support") Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Mon, 12 Sep 2016 16:16:18 +0000 (19:16 +0300)]
IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV
Because of an incorrect bit-masking done on the join state bits, when
handling a join request we failed to detect a difference between the
group join state and the request join state when joining as send only
full member (0x8). This caused the MC join request not to be sent.
This issue is relevant only when SRIOV is enabled and SM supports
send only full member.
This fix separates scope bits and join states bits a nibble each.
Fixes: b9c5d6a64358 ('IB/mlx4: Add multicast group (MCG) paravirtualization for SR-IOV') Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Alex Vesker [Mon, 12 Sep 2016 06:55:28 +0000 (09:55 +0300)]
IB/ipoib: Don't allow MC joins during light MC flush
This fix solves a race between light flush and on the fly joins.
Light flush doesn't set the device to down and unset IPOIB_OPER_UP
flag, this means that if while flushing we have a MC join in progress
and the QP was attached to BC MGID we can have a mismatches when
re-attaching a QP to the BC MGID.
The light flush would set the broadcast group to NULL causing an on
the fly join to rejoin and reattach to the BC MCG as well as adding
the BC MGID to the multicast list. The flush process would later on
remove the BC MGID and detach it from the QP. On the next flush
the BC MGID is present in the multicast list but not found when trying
to detach it because of the previous double attach and single detach.
Fixes: ee1e2c82c245 ("IPoIB: Refresh paths instead of flushing them on SM change events") Signed-off-by: Alex Vesker <valex@mellanox.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Doug Ledford <dledford@redhat.com>
Merge tag 'samsung-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux into fixes
Pull "ARM: exynos: Fixes for v4.8, secound round" from Krzysztof Kozłowski:
1. A recent change in populating irqchip devices from Device Tree
broke Suspend to RAM on Exynos boards due to lack of probing of
PMU (Power Management Unit) driver. Multiple drivers attach to
the PMU's DT node: irqchip, clock controller and PMU platform
driver for handling suspend. The new irqchip code marked the
PMU's DT node as OF_POPULATED but we need to attach to this
node also PMU platform driver.
2. Add Javier as additional reviewer for Exynos patches.
* tag 'samsung-fixes-4.8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/krzk/linux:
ARM: EXYNOS: Clear OF_POPULATED flag from PMU node in IRQ init callback
MAINTAINERS: Add myself as reviewer for Samsung Exynos support
Alan Stern [Fri, 16 Sep 2016 14:24:26 +0000 (10:24 -0400)]
USB: change bInterval default to 10 ms
Some full-speed mceusb infrared transceivers contain invalid endpoint
descriptors for their interrupt endpoints, with bInterval set to 0.
In the past they have worked out okay with the mceusb driver, because
the driver sets the bInterval field in the descriptor to 1,
overwriting whatever value may have been there before. However, this
approach was never sanctioned by the USB core, and in fact it does not
work with xHCI controllers, because they use the bInterval value that
was present when the configuration was installed.
Currently usbcore uses 32 ms as the default interval if the value in
the endpoint descriptor is invalid. It turns out that these IR
transceivers don't work properly unless the interval is set to 10 ms
or below. To work around this mceusb problem, this patch changes the
endpoint-descriptor parsing routine, making the default interval value
be 10 ms rather than 32 ms.
Tony Lindgren [Fri, 16 Sep 2016 14:24:44 +0000 (09:24 -0500)]
usb: musb: Fix tusb6010 compile error on blackfin
We have CONFIG_BLACKFIN ifdef redefining all musb registers in
musb_regs.h and tusb6010.h is never included causing a build
error with blackfin-allmodconfig and COMPILE_TEST.
Let's fix the issue by not building tusb6010 if CONFIG_BLACKFIN
is selected.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Bin Liu <b-liu@ti.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Matt Fleming [Wed, 24 Aug 2016 13:12:08 +0000 (14:12 +0100)]
perf/x86/amd: Make HW_CACHE_REFERENCES and HW_CACHE_MISSES measure L2
While the Intel PMU monitors the LLC when perf enables the
HW_CACHE_REFERENCES and HW_CACHE_MISSES events, these events monitor
L1 instruction cache fetches (0x0080) and instruction cache misses
(0x0081) on the AMD PMU.
This is extremely confusing when monitoring the same workload across
Intel and AMD machines, since parameters like,
$ perf stat -e cache-references,cache-misses
measure completely different things.
Instead, make the AMD PMU measure instruction/data cache and TLB fill
requests to the L2 and instruction/data cache and TLB misses in the L2
when HW_CACHE_REFERENCES and HW_CACHE_MISSES are enabled,
respectively. That way the events measure unified caches on both
platforms.
Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1472044328-21302-1-git-send-email-matt@codeblueprint.co.uk Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel/pt: Do validate the size of a kernel address filter
Right now, the kernel address filters in PT are prone to integer overflow
that may happen in adding filter's size to its offset to obtain the end
of the range. Such an overflow would also throw a #GP in the PT event
configuration path.
Fix this by explicitly validating the result of this calculation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org # v4.7 Cc: stable@vger.kernel.org#v4.7 Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160915151352.21306-4-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
The kernel_ip() filter is used mostly by the DS/LBR code to look at the
branch addresses, but Intel PT also uses it to validate the address
filter offsets for kernel addresses, for which it is not sufficient:
supplying something in bits 64:48 that's not a sign extension of the lower
address bits (like 0xf00d000000000000) throws a #GP.
This patch adds address validation for the user supplied kernel filters.
Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org # v4.7 Cc: stable@vger.kernel.org#v4.7 Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160915151352.21306-3-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
perf/x86/intel/pt: Fix an off-by-one in address filter configuration
PT address filter configuration requires that a range is specified by
its first and last address, but at the moment we're obtaining the end
of the range by adding user specified size to its start, which is off
by one from what it actually needs to be.
Fix this and make sure that zero-sized filters don't pass the filter
validation.
Reported-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: stable@vger.kernel.org # v4.7 Cc: stable@vger.kernel.org#v4.7 Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160915151352.21306-2-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Paul E. McKenney [Tue, 26 Apr 2016 17:42:25 +0000 (10:42 -0700)]
PM / runtime: Use _rcuidle for runtime suspend tracepoints
Further testing with false negatives suppressed by commit 293e2421fe25
("rcu: Remove superfluous versions of rcu_read_lock_sched_held()")
identified a few more unprotected uses of RCU from the idle loop.
Because RCU actively ignores idle-loop code (for energy-efficiency
reasons, among other things), using RCU from the idle loop can result
in too-short grace periods, in turn resulting in arbitrary misbehavior.
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 0
RCU used illegally from extended quiescent state!
1 lock held by swapper/0/0:
#0: (&(&dev->power.lock)->rlock){-.-...}, at: [<c052ee24>] __pm_runtime_suspend+0x54/0x84
stack backtrace:
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.6.0-rc5-next-20160426+ #1112
Hardware name: Generic OMAP36xx (Flattened Device Tree)
[<c0110308>] (unwind_backtrace) from [<c010c3a8>] (show_stack+0x10/0x14)
[<c010c3a8>] (show_stack) from [<c047fec8>] (dump_stack+0xb0/0xe4)
[<c047fec8>] (dump_stack) from [<c052d7b4>] (rpm_suspend+0x604/0x7e4)
[<c052d7b4>] (rpm_suspend) from [<c052ee34>] (__pm_runtime_suspend+0x64/0x84)
[<c052ee34>] (__pm_runtime_suspend) from [<c04bf3bc>] (omap2_gpio_prepare_for_idle+0x5c/0x70)
[<c04bf3bc>] (omap2_gpio_prepare_for_idle) from [<c01255e8>] (omap_sram_idle+0x140/0x244)
[<c01255e8>] (omap_sram_idle) from [<c0126b48>] (omap3_enter_idle_bm+0xfc/0x1ec)
[<c0126b48>] (omap3_enter_idle_bm) from [<c0601db8>] (cpuidle_enter_state+0x80/0x3d4)
[<c0601db8>] (cpuidle_enter_state) from [<c0183c74>] (cpu_startup_entry+0x198/0x3a0)
[<c0183c74>] (cpu_startup_entry) from [<c0b00c0c>] (start_kernel+0x354/0x3c8)
[<c0b00c0c>] (start_kernel) from [<8000807c>] (0x8000807c)
Reported-by: Tony Lindgren <tony@atomide.com> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Tested-by: Tony Lindgren <tony@atomide.com> Tested-by: Guenter Roeck <linux@roeck-us.net>
[ rjw: Subject ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This ensures that do_mmap() won't implicitly make AIO memory mappings
executable if the READ_IMPLIES_EXEC personality flag is set. Such
behavior is problematic because the security_mmap_file LSM hook doesn't
catch this case, potentially permitting an attacker to bypass a W^X
policy enforced by SELinux.
Darrick J. Wong [Thu, 15 Sep 2016 03:20:44 +0000 (20:20 -0700)]
vfs: cap dedupe request structure size at PAGE_SIZE
Kirill A Shutemov reports that the kernel doesn't try to cap dest_count
in any way, and uses the number to allocate kernel memory. This causes
high order allocation warnings in the kernel log if someone passes in a
big enough value. We should clamp the allocation at PAGE_SIZE to avoid
stressing the VM.
The two existing users of the dedupe ioctl never send more than 120
requests, so we can safely clamp dest_range at PAGE_SIZE, because with
4k pages we can handle up to 127 dedupe candidates. Given the max
extent length of 16MB, we can end up doing 2GB of IO which is plenty.
[ Note: the "offsetof()" can't overflow, because 'count' is just a
16-bit integer. That's not obvious in the limited context of the
patch, so I'm noting it here because it made me go look. - Linus ]
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A set of fixes for the current series in the realm of block.
Like the previous pull request, the meat of it are fixes for the nvme
fabrics/target code. Outside of that, just one fix from Gabriel for
not doing a queue suspend if we didn't get the admin queue setup in
the first place"
* 'for-linus' of git://git.kernel.dk/linux-block:
nvme-rdma: add back dependency on CONFIG_BLOCK
nvme-rdma: fix null pointer dereference on req->mr
nvme-rdma: use ib_client API to detect device removal
nvme-rdma: add DELETING queue flag
nvme/quirk: Add a delay before checking device ready for memblaze device
nvme: Don't suspend admin queue that wasn't created
nvme-rdma: destroy nvme queue rdma resources on connect failure
nvme_rdma: keep a ref on the ctrl during delete/flush
iw_cxgb4: block module unload until all ep resources are released
iw_cxgb4: call dev_put() on l2t allocation failure
Al Viro [Thu, 15 Sep 2016 01:35:29 +0000 (02:35 +0100)]
fix minor infoleak in get_user_ex()
get_user_ex(x, ptr) should zero x on failure. It's not a lot of a leak
(at most we are leaking uninitialized 64bit value off the kernel stack,
and in a fairly constrained situation, at that), but the fix is trivial,
so...
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
[ This sat in different branch from the uaccess fixes since mid-August ] Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Paolo Bonzini [Wed, 14 Sep 2016 21:39:12 +0000 (23:39 +0200)]
kvm: x86: correctly reset dest_map->vector when restoring LAPIC state
When userspace sends KVM_SET_LAPIC, KVM schedules a check between
the vCPU's IRR and ISR and the IOAPIC redirection table, in order
to re-establish the IOAPIC's dest_map (the list of CPUs servicing
the real-time clock interrupt with the corresponding vectors).
However, __rtc_irq_eoi_tracking_restore_one was forgetting to
set dest_map->vectors. Because of this, the IOAPIC did not process
the real-time clock interrupt EOI, ioapic->rtc_status.pending_eoi
got stuck at a non-zero value, and further RTC interrupts were
reported to userspace as coalesced.
perf/x86/intel: Don't disable "intel_bts" around "intel" event batching
At the moment, intel_bts events get disabled from intel PMU's disable
callback, which includes event scheduling transactions of said PMU,
which have nothing to do with intel_bts events.
We do want to keep intel_bts events off inside the PMI handler to
avoid filling up their buffer too soon.
This patch moves intel_bts enabling/disabling directly to the PMI
handler.
Reported-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160915082233.11065-1-alexander.shishkin@linux.intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
Michael Ellerman [Thu, 15 Sep 2016 07:03:06 +0000 (17:03 +1000)]
powerpc/powernv/pci: Fix missed TCE invalidations that should fallback to OPAL
In commit f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE
invalidations"), we added logic to fallback to OPAL for doing TCE
invalidations if we can't do it in Linux.
Ben sent a v2 of the patch, containing these additional call sites, but
I had already applied v1 and didn't notice. So fix them now.
Fixes: f0228c413011 ("powerpc/powernv/pci: Fallback to OPAL for TCE invalidations") Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
powerpc/powernv: Detach from PE on releasing PCI device
The PCI hotplug can be part of EEH error recovery. The @pdn and
the device's PE number aren't removed and added afterwords. The
PE number in @pdn should be set to an invalid one. Otherwise, the
PE's device count is decreased on removing devices while failing
to be increased on adding devices. It leads to unbalanced PE's
device count and make normal PCI hotplug path broken.
Merge tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci
Pull PCI fixes from Bjorn Helgaas:
"Here are two changes for v4.8. The first fixes a "[Firmware Bug]: reg
0x10: invalid BAR (can't size)" warning on Haswell, and the second
fixes a problem in some new runtime suspend functionality we merged
for v4.8. Summary:
Enumeration:
Mark Haswell Power Control Unit as having non-compliant BARs (Bjorn Helgaas)
Power management:
Fix bridge_d3 update on device removal (Lukas Wunner)"
* tag 'pci-v4.8-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
PCI: Fix bridge_d3 update on device removal
PCI: Mark Haswell Power Control Unit as having non-compliant BARs
The ARM architected timer specification mandates that the interrupt
associated with each timer is level triggered (which corresponds to
the "counter >= comparator" condition).
A number of DTs are being remarkably creative, declaring the interrupt
to be edge triggered. A quick look at the TRM for the corresponding ARM
CPUs clearly shows that this is wrong, and I've corrected those.
For non-ARM designs (and in the absence of a publicly available TRM),
I've made them active low as well, which can't be completely wrong
as the GIC cannot disinguish between level low and level high.
The respective maintainers are of course welcome to prove me wrong.
While I was at it, I took the liberty to fix a couple of related issue,
such as some spurious affinity bits on ThunderX, and their complete
absence on ls1043a (both of which seem to be related to copy-pasting
from other DTs).
Acked-by: Duc Dang <dhdang@apm.com> Acked-by: Carlo Caione <carlo@endlessm.com> Acked-by: Michal Simek <michal.simek@xilinx.com> Acked-by: Krzysztof Kozlowski <k.kozlowski@samsung.com> Acked-by: Dinh Nguyen <dinguyen@opensource.altera.com> Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Merge branch 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull uaccess fixes from Al Viro:
"Fixes for broken uaccess primitives - mostly lack of proper zeroing
in copy_from_user()/get_user()/__get_user(), but for several
architectures there's more (broken clear_user() on frv and
strncpy_from_user() on hexagon)"
* 'uaccess-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (28 commits)
avr32: fix copy_from_user()
microblaze: fix __get_user()
microblaze: fix copy_from_user()
m32r: fix __get_user()
blackfin: fix copy_from_user()
sparc32: fix copy_from_user()
sh: fix copy_from_user()
sh64: failing __get_user() should zero
score: fix copy_from_user() and friends
score: fix __get_user/get_user
s390: get_user() should zero on failure
ppc32: fix copy_from_user()
parisc: fix copy_from_user()
openrisc: fix copy_from_user()
nios2: fix __get_user()
nios2: copy_from_user() should zero the tail of destination
mn10300: copy_from_user() should zero on access_ok() failure...
mn10300: failing __get_user() and get_user() should zero
mips: copy_from_user() must zero the destination on access_ok() failure
ARC: uaccess: get_user to zero out dest in cause of fault
...
Commit 88e957d6e47f ("xen: introduce xen_vcpu_id mapping") broke SMP
ARM guests on Xen. When FIFO-based event channels are in use (this is
the default), evtchn_fifo_alloc_control_block() is called on
CPU_UP_PREPARE event and this happens before we set up xen_vcpu_id
mapping in xen_starting_cpu. Temporary fix the issue by setting direct
Linux CPU id <-> Xen vCPU id mapping for all possible CPUs at boot. We
don't currently support kexec/kdump on Xen/ARM so these ids always
match.
In future, we have several ways to solve the issue, e.g.:
- Eliminate all hypercalls from CPU_UP_PREPARE, do them from the
starting CPU. This can probably be done for both x86 and ARM and, if
done, will allow us to get Xen's idea of vCPU id from CPUID/MPIDR on
the starting CPU directly, no messing with ACPI/device tree
required.
- Save vCPU id information from ACPI/device tree on ARM and use it to
initialize xen_vcpu_id mapping. This is the same trick we currently
do on x86.
Paul Burton [Wed, 14 Sep 2016 10:00:26 +0000 (11:00 +0100)]
cpu/hotplug: Include linux/types.h in linux/cpuhotplug.h
The linux/cpuhotplug.h header makes use of the bool type, but wasn't
including linux/types.h to ensure that type has been defined. Fix this
by including linux/types.h in preparation for including
linux/cpuhotplug.h in a file that doesn't do so already.
Signed-off-by: Paul Burton <paul.burton@imgtec.com> Cc: linux-mips@linux-mips.org Cc: Richard Cochran <rcochran@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: Anna-Maria Gleixner <anna-maria@linutronix.de> Link: http://lkml.kernel.org/r/20160914100027.20945-1-paul.burton@imgtec.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Peter Ujfalusi [Wed, 14 Sep 2016 11:21:54 +0000 (14:21 +0300)]
mmc: omap: Initialize dma_slave_config to avoid random data in it's fields
It is wrong to use uninitialized dma_slave_config and configure only
certain fields as the DMAengine driver might look at non initialized
(random data) fields and tries to interpret it.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Peter Ujfalusi [Wed, 14 Sep 2016 11:22:07 +0000 (14:22 +0300)]
mmc: omap_hsmmc: Initialize dma_slave_config to avoid random data
It is wrong to use uninitialized dma_slave_config and configure only
certain fields as the DMAengine driver might look at non initialized
(random data) fields and tries to interpret it.
Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Ville Syrjälä [Tue, 13 Sep 2016 09:22:19 +0000 (12:22 +0300)]
drm/i915: Ignore OpRegion panel type except on select machines
Turns out
commit a05628195a0d ("drm/i915: Get panel_type from OpRegion panel
details") has regressed quite a few machines. So it looks like we
can't use the panel type from OpRegion on all systems, and yet we
absolutely must use it on some specific systems.
Despite trying, I was unable to find any automagic way to determine
if the OpRegion panel type is respectable or not. The only glimmer
of hope I had was bit 8 in the SCIC response, but that turned out to
not work either (it was always 0 on both types of systems).
So, to fix the regressions without breaking the machine we know to need
the OpRegion panel type, let's just add a quirk for this. Only specific
machines known to require the OpRegion panel type will therefore use
it. Everyone else will fall bck to the VBT panel type.
The only known machine so far is a "Conrac GmbH IX45GM2". The PCI
subsystem ID on this machine is just a generic 8086:2a42, so of no use.
Instead we'll go with a DMI match.
I suspect we can now also revert
commit aeddda06c1a7 ("drm/i915: Ignore panel type from OpRegion on SKL")
but let's leave that to a separate patch.
v2: Do the DMI match in the opregion code directly, as dev_priv->quirks
gets populated too late
Cc: Rob Kramer <rob@solution-space.com> Cc: Martin van Es <martin@mrvanes.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Dave Airlie <airlied@linux.ie> Cc: Marco Krüger <krgsch@gmail.com> Cc: Sean Greenslade <sean@seangreenslade.com> Cc: Trudy Tective <bertslany@gmail.com> Cc: Robin Müller <rm1990@gmx.de> Cc: Alexander Kobel <a-kobel@a-kobel.de> Cc: Alexey Shumitsky <alexey.shumitsky@gmail.com> Cc: Emil Andersen Lauridsen <mine809@gmail.com> Cc: oceans112@gmail.com Cc: James Hogan <james@albanarts.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Cc: stable@vger.kernel.org
References: https://lists.freedesktop.org/archives/intel-gfx/2016-August/105545.html
References: https://lists.freedesktop.org/archives/dri-devel/2016-August/116888.html
References: https://lists.freedesktop.org/archives/intel-gfx/2016-June/098826.html
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94825
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97060
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97443
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97363 Fixes: a05628195a0d ("drm/i915: Get panel_type from OpRegion panel details") Tested-by: Marco Krüger <krgsch@gmail.com> Tested-by: Alexey Shumitsky <alexey.shumitsky@gmail.com> Tested-by: Sean Greenslade <sean@seangreenslade.com> Tested-by: Emil Andersen Lauridsen <mine809@gmail.com> Tested-by: Robin Müller <rm1990@gmx.de> Tested-by: oceans112@gmail.com Tested-by: Rob Kramer <rob@solution-space.com> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1473758539-21565-1-git-send-email-ville.syrjala@linux.intel.com
References: http://patchwork.freedesktop.org/patch/msgid/1473602239-15855-1-git-send-email-adrienverge@gmail.com Acked-by: Jani Nikula <jani.nikula@intel.com>
(cherry picked from commit c8ebfad7a063fe665417fa0eeb0da7cfe987d8ed) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
There are panels that needs 4 idle frames before entering PSR,
but VBT is unproperly set.
Also lately it was identified that idle frame count calculated at HW
can be off by 1, what makes the minimum of 2, at least.
Without the current vbt+1 we are with the risk of having HW calculating
0 idle frames and entering PSR when it shouldn't. Regardless the lack
of link training.
[Jani: there is some disagreement on the explanation, but the commit
regresses so revert it is.]
References: http://marc.info/?i=20160904191153.GA2328@light.dominikbrodowski.net Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Jani Nikula <jani.nikula@intel.com> Cc: Daniel Vetter <daniel.vetter@intel.com> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Fixes: 1c80c25fb622 ("drm/i915/psr: Make idle_frames sensible again") Cc: drm-intel-fixes@lists.freedesktop.org # v4.8-rc1+ Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: http://patchwork.freedesktop.org/patch/msgid/1473295351-8766-1-git-send-email-rodrigo.vivi@intel.com
(cherry picked from commit 40918e0bb81be02f507a941f8b2741f0dc1771b0) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Chris Wilson [Thu, 25 Aug 2016 07:23:14 +0000 (08:23 +0100)]
drm/i915: Restore lost "Initialized i915" welcome message
A side effect of removing the midlayer from driver loading was the loss
of a useful message announcing to userspace that i915 had successfully
started, e.g.:
[drm] Initialized i915 1.6.0 20160425 for 0000:00:02.0 on minor 0
Reported-by: Timo Aaltonen <tjaalton@ubuntu.com> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Fixes: 8f460e2c78f2 ("drm/i915: Demidlayer driver loading") Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: drm-intel-fixes@lists.freedesktop.org Link: http://patchwork.freedesktop.org/patch/msgid/20160825072314.17402-1-chris@chris-wilson.co.uk Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
(cherry picked from commit bc5ca47c0af4f949ba889e666b7da65569e36093) Signed-off-by: Jani Nikula <jani.nikula@intel.com>
The PE for root bus (root PE) can be removed because of PCI hot
remove in EEH recovery path for fenced PHB error. We need update
@phb->root_pe_populated accordingly so that the root PE can be
populated again in forthcoming PCI hot add path. Also, the PE
shouldn't be destroyed as it's global and reserved resource.
Al Viro [Fri, 9 Sep 2016 23:28:23 +0000 (19:28 -0400)]
avr32: fix copy_from_user()
really ugly, but apparently avr32 compilers turns access_ok() into
something so bad that they want it in assembler. Left that way,
zeroing added in inline wrapper.
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Mon, 22 Aug 2016 03:33:47 +0000 (23:33 -0400)]
sh64: failing __get_user() should zero
It could be done in exception-handling bits in __get_user_b() et.al.,
but the surgery involved would take more knowledge of sh64 details
than I have or _want_ to have.
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Al Viro [Sat, 20 Aug 2016 21:05:21 +0000 (17:05 -0400)]
openrisc: fix copy_from_user()
... that should zero on faults. Also remove the <censored> helpful
logics wrt range truncation copied from ppc32. Where it had ever
been needed only in case of copy_from_user() *and* had not been merged
into the mainline until a month after the need had disappeared.
A decade before openrisc went into mainline, I might add...
Cc: stable@vger.kernel.org Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>