Certain registers that pcie-designware-ep tries to write to are read-only
registers. However, these registers can become read/write if we first
enable the DBI_RO_WR_EN bit. Set/unset the DBI_RO_WR_EN bit before/after
writing these registers.
Previously, dw_pcie_ep_set_msi() wrote all bits in the Message Control
register, thus overwriting the PCI_MSI_FLAGS_64BIT bit.
By clearing the PCI_MSI_FLAGS_64BIT bit, we break MSI
on systems where the RC has set a 64 bit MSI address.
Fix dw_pcie_ep_set_msi() so that it only sets MMC bits.
Currently we deal with single codec and suspend codec callbacks for
all S3, S4 and runtime PM handling. But it turned out that we want
distinguish the call patterns sometimes, e.g. for applying some init
sequence only at probing and restoring from hibernate.
This patch slightly modifies the common PM callbacks for HD-audio
codec and stores the currently processed PM event in power_state of
the codec's device.power field, which is currently unused. The codec
callback can take a look at this event value and judges which purpose
it's being called.
Tetsuo Handa had reported he saw an incorrect "downgrading a read lock"
warning right after a previous lockdep warning. It is likely that the
previous warning turned off lock debugging causing the lockdep to have
inconsistency states leading to the lock downgrade warning.
Fix that by add a check for debug_locks at the beginning of
__lock_downgrade().
Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Reported-by: syzbot+53383ae265fb161ef488@syzkaller.appspotmail.com Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Will Deacon <will.deacon@arm.com> Link: https://lkml.kernel.org/r/1547093005-26085-1-git-send-email-longman@redhat.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When the ORC unwinder is invoked for an oops caused by IP==0,
it currently has no idea what to do because there is no debug information
for the stack frame of NULL.
But if RIP is NULL, it is very likely that the last successfully executed
instruction was an indirect CALL/JMP, and it is possible to unwind out in
the same way as for the first instruction of a normal function. Hardcode
a corresponding ORC entry.
With an artificially-added NULL call in prctl_set_seccomp(), before this
patch, the trace is:
prctl_set_seccomp() still doesn't show up in the trace because for some
reason, tail call optimization is only disabled in builds that use the
frame pointer unwinder.
Signed-off-by: Jann Horn <jannh@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: syzbot <syzbot+ca95b2b7aef9e7cbd6ab@syzkaller.appspotmail.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Masahiro Yamada <yamada.masahiro@socionext.com> Cc: Michal Marek <michal.lkml@markovi.net> Cc: linux-kbuild@vger.kernel.org Link: https://lkml.kernel.org/r/20190301031201.7416-2-jannh@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When the frame unwinder is invoked for an oops caused by a call to NULL, it
currently skips the parent function because BP still points to the parent's
stack frame; the (nonexistent) current function only has the first half of
a stack frame, and BP doesn't point to it yet.
Add a special case for IP==0 that calculates a fake BP from SP, then uses
the real BP for the next frame.
Note that this handles first_frame specially: Return information about the
parent function as long as the saved IP is >=first_frame, even if the fake
BP points below it.
With an artificially-added NULL call in prctl_set_seccomp(), before this
patch, the trace is:
Delay the drm_modeset_acquire_init() until after we check for an
allocation failure so that we can return immediately upon error without
having to unwind.
WARNING: lock held when returning to user space!
4.20.0+ #174 Not tainted
------------------------------------------------
syz-executor556/8153 is leaving the kernel with locks still held!
1 lock held by syz-executor556/8153:
#0: 000000005100c85c (crtc_ww_class_acquire){+.+.}, at:
set_property_atomic+0xb3/0x330 drivers/gpu/drm/drm_mode_object.c:462
Reported-by: syzbot+6ea337c427f5083ebdf2@syzkaller.appspotmail.com Fixes: 144a7999d633 ("drm: Handle properties in the core for atomic drivers") Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Cc: Sean Paul <sean@poorly.run> Cc: David Airlie <airlied@linux.ie> Cc: <stable@vger.kernel.org> # v4.14+ Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181230122842.21917-1-chris@chris-wilson.co.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
It could use ioctl to set hci uart proto, but there is
a use-after-free issue when hci_uart_register_dev() fail in
hci_uart_set_proto(), see stack above, fix this by setting
HCI_UART_PROTO_READY bit only when hci_uart_register_dev()
return success.
Reported-by: syzbot+899a33dc0fa0dbaf06a6@syzkaller.appspotmail.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Jeremy Cline <jcline@redhat.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The hci_dev struct hdev is referenced in work queues and timers started
by open() in some protocols. This creates a race between the
initialization function and the work or timer which can result hdev
being dereferenced while it is still null.
The syzbot report contains a reliable reproducer which causes a null
pointer dereference of hdev in hci_uart_write_work() by making the
memory allocation for hdev fail.
To fix this, ensure hdev is valid from before calling a protocol's
open() until after calling a protocol's close().
When releasing socket, it is possible to enter hci_sock_release() and
hci_sock_dev_event(HCI_DEV_UNREG) at the same time in different thread.
The reference count of hdev should be decremented only once from one of
them but if storing hdev to local variable in hci_sock_release() before
detached from socket and setting to NULL in hci_sock_dev_event(),
hci_dev_put(hdev) is unexpectedly called twice. This is resolved by
referencing hdev from socket after bt_sock_unlink() in
hci_sock_release().
h4_recv_buf() callers store the return value to socket buffer and
recursively pass the buffer to h4_recv_buf() without protection. So,
ERR_PTR returned from h4_recv_buf() can be dereferenced, if called again
before setting the socket buffer to NULL from previous error. Check if
skb is ERR_PTR in h4_recv_buf().
Control events can leak kernel memory since they do not fully zero the
event. The same code is present in both v4l2-ctrls.c and uvc_ctrl.c, so
fix both.
It appears that all other event code is properly zeroing the structure,
it's these two places.
All indirect buffers get by ext4_find_shared() should be released no
mater the branch should be freed or not. But now, we forget to release
the lower depth indirect buffers when removing space from the same
higher depth indirect block. It will lead to buffer leak and futher
more, it may lead to quota information corruption when using old quota,
consider the following case.
- Create and mount an empty ext4 filesystem without extent and quota
features,
- quotacheck and enable the user & group quota,
- Create some files and write some data to them, and then punch hole
to some files of them, it may trigger the buffer leak problem
mentioned above.
- Disable quota and run quotacheck again, it will create two new
aquota files and write the checked quota information to them, which
probably may reuse the freed indirect block(the buffer and page
cache was not freed) as data block.
- Enable quota again, it will invoke
vfs_load_quota_inode()->invalidate_bdev() to try to clean unused
buffers and pagecache. Unfortunately, because of the buffer of quota
data block is still referenced, quota code cannot read the up to date
quota info from the device and lead to quota information corruption.
This problem can be reproduced by xfstests generic/231 on ext3 file
system or ext4 file system without extent and quota features.
This patch fix this problem by releasing the missing indirect buffers,
in ext4_ind_remove_space().
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Ext4 needs to serialize unaligned direct AIO because the zeroing of
partial blocks of two competing unaligned AIOs can result in data
corruption.
However it decides not to serialize if the potentially unaligned aio is
past i_size with the rationale that no pending writes are possible past
i_size. Unfortunately if the i_size is not block aligned and the second
unaligned write lands past i_size, but still into the same block, it has
the potential of corrupting the previous unaligned write to the same
block.
Without this patch the 512B range from 40960 up to the start of the
second unaligned write (41472) is going to be zeroed overwriting the data
written by the first write. This is a data corruption.
We see the following NULL pointer dereference while running xfstests
generic/475:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
PGD 8000000c84bad067 P4D 8000000c84bad067 PUD c84e62067 PMD 0
Oops: 0000 [#1] SMP PTI
CPU: 7 PID: 9886 Comm: fsstress Kdump: loaded Not tainted 5.0.0-rc8 #10
RIP: 0010:ext4_do_update_inode+0x4ec/0x760
...
Call Trace:
? jbd2_journal_get_write_access+0x42/0x50
? __ext4_journal_get_write_access+0x2c/0x70
? ext4_truncate+0x186/0x3f0
ext4_mark_iloc_dirty+0x61/0x80
ext4_mark_inode_dirty+0x62/0x1b0
ext4_truncate+0x186/0x3f0
? unmap_mapping_pages+0x56/0x100
ext4_setattr+0x817/0x8b0
notify_change+0x1df/0x430
do_truncate+0x5e/0x90
? generic_permission+0x12b/0x1a0
This is triggered because the NULL pointer handle->h_transaction was
dereferenced in function ext4_update_inode_fsync_trans().
I found that the h_transaction was set to NULL in jbd2__journal_restart
but failed to attached to a new transaction while the journal is aborted.
Fix this by checking the handle before updating the inode.
Fixes: b436b9bef84d ("ext4: Wait for proper transaction commit on fsync") Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Commit 46e831abe864 ("drm/i915/lpe: Mark LPE audio runtime pm as
"no callbacks"") broke runtime PM with lpe audio. We can no longer
runtime suspend the GPU since the sysfs power/control for the
lpe-audio device no longer exists and the device is considered
always active. We can fix this by not marking the device as
active.
Cc: Chris Wilson <chris@chris-wilson.co.uk> Cc: Takashi Iwai <tiwai@suse.de> Cc: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com> Fixes: 46e831abe864 ("drm/i915/lpe: Mark LPE audio runtime pm as "no callbacks"") Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20181024154825.18185-1-ville.syrjala@linux.intel.com Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Acked-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Objtool uses over 512k of stack, thanks to the hash table embedded in
the objtool_file struct. This causes an unnecessarily large stack
allocation and breaks users with low stack limits.
Move the struct off the stack.
Fixes: 042ba73fe7eb ("objtool: Add several performance improvements") Reported-by: Vassili Karpov <moosotc@gmail.com> Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/df92dcbc4b84b02ffa252f46876df125fb56e2d7.1552954176.git.jpoimboe@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Since commit 4d99e4136580 ("perf machine: Workaround missing maps for
x86 PTI entry trampolines"), perf tools has been creating more than one
kernel map, however 'perf probe' assumed there could be only one.
Fix by using machine__kernel_map() to get the main kernel map.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Tested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jiufei Xue <jiufei.xue@linux.alibaba.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Cc: Xu Yu <xuyu@linux.alibaba.com> Fixes: 4d99e4136580 ("perf machine: Workaround missing maps for x86 PTI entry trampolines") Fixes: d83212d5dd67 ("kallsyms, x86: Export addresses of PTI entry trampolines") Link: http://lkml.kernel.org/r/2ed432de-e904-85d2-5c36-5897ddc5b23b@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The futex code requires that the user space addresses of futexes are 32bit
aligned. sys_futex() checks this in futex_get_keys() but the robust list
code has no alignment check in place.
As a consequence the kernel crashes on architectures with strict alignment
requirements in handle_futex_death() when trying to cmpxchg() on an
unaligned futex address which was retrieved from the robust list.
[ tglx: Rewrote changelog, proper sizeof() based alignement check and add
comment ]
The event pool used for queueing commands is destroyed fairly early in the
ibmvscsi_remove() code path. Since, this happens prior to the call so
scsi_remove_host() it is possible for further calls to queuecommand to be
processed which manifest as a panic due to a NULL pointer dereference as
seen here:
PANIC: "Unable to handle kernel paging request for data at address
0x00000000"
The kernel buffer log is overfilled with this log:
[11261.952732] ibmvscsi: found no event struct in pool!
This patch reorders the operations during host teardown. Start by calling
the SRP transport and Scsi_Host remove functions to flush any outstanding
work and set the host offline. LLDD teardown follows including destruction
of the event pool, freeing the Command Response Queue (CRQ), and unmapping
any persistent buffers. The event pool destruction is protected by the
scsi_host lock, and the pool is purged prior of any requests for which we
never received a response. Finally, move the removal of the scsi host from
our global list to the end so that the host is easily locatable for
debugging purposes during teardown.
Cc: <stable@vger.kernel.org> # v2.6.12+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
For each ibmvscsi host created during a probe or destroyed during a remove
we either add or remove that host to/from the global ibmvscsi_head
list. This runs the risk of concurrent modification.
This patch adds a simple spinlock around the list modification calls to
prevent concurrent updates as is done similarly in the ibmvfc driver and
ipr driver.
Fixes: 32d6e4b6e4ea ("scsi: ibmvscsi: add vscsi hosts to global list_head") Cc: <stable@vger.kernel.org> # v4.10+ Signed-off-by: Tyrel Datwyler <tyreld@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Insert Branch instruction instead of NOP to make sure assembler don't
patch code in forbidden slot. In jump label function, it might
be possible to patch Control Transfer Instructions(CTIs) into
forbidden slot, which will generate Reserved Instruction exception
in MIPS release 6.
Signed-off-by: Archer Yan <ayan@wavecomp.com> Reviewed-by: Paul Burton <paul.burton@mips.com>
[paul.burton@mips.com:
- Add MIPS prefix to subject.
- Mark for stable from v4.0, which introduced r6 support, onwards.] Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Timekeeping IRQs from CS5536 MFGPT are routed to i8259, which then
triggers the "cascade" IRQ on MIPS CPU. Without IRQF_NO_SUSPEND in
cascade_irqaction, MFGPT interrupts will be masked in suspend mode,
and the machine would be unable to resume once suspended.
Previously, MIPS IRQs were not disabled properly, so the original
code appeared to work. Commit a3e6c1eff5 ("MIPS: IRQ: Fix disable_irq on
CPU IRQs") uncovers the bug. To fix it, add IRQF_NO_SUSPEND to
cascade_irqaction.
This commit is functionally identical to 0add9c2f1cff ("MIPS:
Loongson-3: Add IRQF_NO_SUSPEND to Cascade irqaction"), but it forgot
to apply the same fix to Loongson2.
Signed-off-by: Yifeng Li <tomli@tomli.me> Signed-off-by: Paul Burton <paul.burton@mips.com> Cc: linux-mips@vger.kernel.org Cc: Jiaxun Yang <jiaxun.yang@flygoat.com> Cc: Huacai Chen <chenhc@lemote.com> Cc: Ralf Baechle <ralf@linux-mips.org> Cc: James Hogan <jhogan@kernel.org> Cc: linux-kernel@vger.kernel.org Cc: stable@vger.kernel.org # v3.19+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Because map updates are distributed lazily, an OSD may not know about
the new blacklist for quite some time after "osd blacklist add" command
is completed. This makes it possible for a blacklisted but still alive
client to overwrite a post-blacklist update, resulting in data
corruption.
Waiting for latest osdmap in ceph_monc_blacklist_add() and thus using
the post-blacklist epoch for all post-blacklist requests ensures that
all such requests "wait" for the blacklist to come into force on their
respective OSDs.
Cc: stable@vger.kernel.org Fixes: 6305a3b41515 ("libceph: support for blacklisting clients") Signed-off-by: Ilya Dryomov <idryomov@gmail.com> Reviewed-by: Jason Dillaman <dillaman@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Take into account that sg->offset can be bigger than PAGE_SIZE when
setting segment sg->dma_address. Otherwise sg->dma_address will point
at diffrent page, what makes DMA not possible with erros like this:
When calling vmw_fb_set_par(), the mode stored in par->set_mode gets free'd
twice. The first free is in vmw_fb_kms_detach(), the second is near the
end of vmw_fb_set_par() under the name of 'old_mode'. The mode-setting code
only works correctly if the mode doesn't actually change. Removing
'old_mode' in favor of using par->set_mode directly fixes the problem.
Cc: <stable@vger.kernel.org> Fixes: a278724aa23c ("drm/vmwgfx: Implement fbdev on kms v2") Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Deepak Rawat <drawat@vmware.com> Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
clang points out several instances of mismatched types in this drivers,
all coming from a single declaration:
drivers/mmc/host/pxamci.c:193:15: error: implicit conversion from enumeration type 'enum dma_transfer_direction' to
different enumeration type 'enum dma_data_direction' [-Werror,-Wenum-conversion]
direction = DMA_DEV_TO_MEM;
~ ^~~~~~~~~~~~~~
drivers/mmc/host/pxamci.c:212:62: error: implicit conversion from enumeration type 'enum dma_data_direction' to
different enumeration type 'enum dma_transfer_direction' [-Werror,-Wenum-conversion]
tx = dmaengine_prep_slave_sg(chan, data->sg, host->dma_len, direction,
The behavior is correct, so this must be a simply typo from
dma_data_direction and dma_transfer_direction being similarly named
types with a similar purpose.
Fixes: 6464b7140951 ("mmc: pxamci: switch over to dmaengine use") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Nathan Chancellor <natechancellor@gmail.com> Acked-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: stable@vger.kernel.org Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
In 'commit 752f66a75aba ("bcache: use REQ_PRIO to indicate bio for
metadata")' REQ_META is replaced by REQ_PRIO to indicate metadata bio.
This assumption is not always correct, e.g. XFS uses REQ_META to mark
metadata bio other than REQ_PRIO. This is why Nix noticed that bcache
does not cache metadata for XFS after the above commit.
Thanks to Dave Chinner, he explains the difference between REQ_META and
REQ_PRIO from view of file system developer. Here I quote part of his
explanation from mailing list,
REQ_META is used for metadata. REQ_PRIO is used to communicate to
the lower layers that the submitter considers this IO to be more
important that non REQ_PRIO IO and so dispatch should be expedited.
IOWs, if the filesystem considers metadata IO to be more important
that user data IO, then it will use REQ_PRIO | REQ_META rather than
just REQ_META.
Then it seems bios with REQ_META or REQ_PRIO should both be cached for
performance optimation, because they are all probably low I/O latency
demand by upper layer (e.g. file system).
So in this patch, when we want to decide whether to bypass the cache,
REQ_META and REQ_PRIO are both checked. Then both metadata and
high priority I/O requests will be handled properly.
Reported-by: Nix <nix@esperi.org.uk> Signed-off-by: Coly Li <colyli@suse.de> Reviewed-by: Andre Noll <maan@tuebingen.mpg.de> Tested-by: Nix <nix@esperi.org.uk> Cc: stable@vger.kernel.org Cc: Dave Chinner <david@fromorbit.com> Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The networking maintainer keeps a public list of the patches being
queued up for the next round of stable releases. Be sure to check there
before asking for a patch to be applied so that you do not waste
people's time.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: David S. Miller <davem@davemloft.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
We're unintentionally limiting the number of slots per nfsv4.1 session
to 10. Often more than 10 simultaneous RPCs are needed for the best
performance.
This calculation was meant to prevent any one client from using up more
than a third of the limit we set for total memory use across all clients
and sessions. Instead, it's limiting the client to a third of the
maximum for a single session.
Fix this.
Reported-by: Chris Tracy <ctracy@engr.scu.edu> Cc: stable@vger.kernel.org Fixes: de766e570413 "nfsd: give out fewer session slots as limit approaches" Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
There is a potential NULL pointer dereference in case devm_kzalloc()
fails and returns NULL.
Fix this by adding a NULL check on *lookup*
This bug was detected with the help of Coccinelle.
Fixes: b2e63555592f ("i2c: gpio: Convert to use descriptors") Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Commit 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api
only on Hotplug", 2017-07-21) added two calls to opal_slw_set_reg()
inside pnv_cpu_offline(), with the aim of changing the LPCR value in
the SLW image to disable wakeups from the decrementer while a CPU is
offline. However, pnv_cpu_offline() gets called each time a secondary
CPU thread is woken up to participate in running a KVM guest, that is,
not just when a CPU is offlined.
Since opal_slw_set_reg() is a very slow operation (with observed
execution times around 20 milliseconds), this means that an offline
secondary CPU can often be busy doing the opal_slw_set_reg() call
when the primary CPU wants to grab all the secondary threads so that
it can run a KVM guest. This leads to messages like "KVM: couldn't
grab CPU n" being printed and guest execution failing.
There is no need to reprogram the SLW image on every KVM guest entry
and exit. So that we do it only when a CPU is really transitioning
between online and offline, this moves the calls to
pnv_program_cpu_hotplug_lpcr() into pnv_smp_cpu_kill_self().
Fixes: 24be85a23d1f ("powerpc/powernv: Clear PECE1 in LPCR via stop-api only on Hotplug") Cc: stable@vger.kernel.org # v4.14+ Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
According to the ov5640 specification (2.7 power up sequence), host can
access the sensor's registers 20ms after reset. Trying to access them
before leads to undefined behavior and result in sporadic initialization
errors.
Signed-off-by: Loic Poulain <loic.poulain@linaro.org> Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Cc: Adam Ford <aford173@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
LTP testcase mtest06 [1] can trigger a crash on s390x running 5.0.0-rc8.
This is a stress test, where one thread mmaps/writes/munmaps memory area
and other thread is trying to read from it:
page_table_free() is called with NULL mm parameter, but because "0" is a
valid address on s390 (see S390_lowcore), it keeps going until it
eventually crashes in lockdep's lock_acquire. This crash is
reproducible at least since 4.14.
Problem is that "vmf->vma" used in do_fault() can become stale. Because
mmap_sem may be released, other threads can come in, call munmap() and
cause "vma" be returned to kmem cache, and get zeroed/re-initialized and
re-used:
handle_mm_fault |
__handle_mm_fault |
do_fault |
vma = vmf->vma |
do_read_fault |
__do_fault |
vma->vm_ops->fault(vmf); |
mmap_sem is released |
|
| do_munmap()
| remove_vma_list()
| remove_vma()
| vm_area_free()
| # vma is released
| ...
| # same vma is allocated
| # from kmem cache
| do_mmap()
| vm_area_alloc()
| memset(vma, 0, ...)
|
pte_free(vma->vm_mm, ...); |
page_table_free |
spin_lock_bh(&mm->context.lock);|
<crash> |
Cache mm_struct to avoid using potentially stale "vma".
This commit fixes the issue that USB-DMAC hangs silently after system
resumes on R-Car Gen3 hence renesas_usbhs will not work correctly
when using USB-DMAC for bulk transfer e.g. ethernet or serial
gadgets.
The issue can be reproduced by these steps:
1. modprobe g_serial
2. Suspend and resume system.
3. connect a usb cable to host side
4. Transfer data from Host to Target
5. cat /dev/ttyGS0 (Target side)
6. echo "test" > /dev/ttyACM0 (Host side)
The 'cat' will not result anything. However, system still can work
normally.
Currently, USB-DMAC driver does not have system sleep callbacks hence
this driver relies on the PM core to force runtime suspend/resume to
suspend and reinitialize USB-DMAC during system resume. After
the commit 17218e0092f8 ("PM / genpd: Stop/start devices without
pm_runtime_force_suspend/resume()"), PM core will not force
runtime suspend/resume anymore so this issue happens.
To solve this, make system suspend resume explicit by using
pm_runtime_force_{suspend,resume}() as the system sleep callbacks.
SET_NOIRQ_SYSTEM_SLEEP_PM_OPS() is used to make sure USB-DMAC
suspended after and initialized before renesas_usbhs."
While do swap between two inode, they swap i_data without update
quota information. Also, swap_inode_boot_loader can do "revert"
somtimes, so update the quota while all operations has been finished.
While do swap, we should make sure there has no new dirty page since we
should swap i_data between two inode:
1.We should lock i_mmap_sem with write to avoid new pagecache from mmap
read/write;
2.Change filemap_flush to filemap_write_and_wait and move them to the
space protected by inode lock to avoid new pagecache from buffer read/write.
Before really do swap between inode and boot inode, something need to
check to avoid invalid or not permitted operation, like does this inode
has inline data. But the condition check should be protected by inode
lock to avoid change while swapping. Also some other condition will not
change between swapping, but there has no problem to do this under inode
lock.
Using the irq_gc_lock/irq_gc_unlock functions in the suspend and
resume functions creates the opportunity for a deadlock during
suspend, resume, and shutdown. Using the irq_gc_lock_irqsave/
irq_gc_unlock_irqrestore variants prevents this possible deadlock.
Cc: stable@vger.kernel.org Fixes: 7f646e92766e2 ("irqchip: brcmstb-l2: Add Broadcom Set Top Box Level-2 interrupt controller") Signed-off-by: Doug Berger <opendmb@gmail.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
[maz: tidied up $SUBJECT] Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The Allwinner A64 SoC is known[1] to have an unstable architectural
timer, which manifests itself most obviously in the time jumping forward
a multiple of 95 years[2][3]. This coincides with 2^56 cycles at a
timer frequency of 24 MHz, implying that the time went slightly backward
(and this was interpreted by the kernel as it jumping forward and
wrapping around past the epoch).
Investigation revealed instability in the low bits of CNTVCT at the
point a high bit rolls over. This leads to power-of-two cycle forward
and backward jumps. (Testing shows that forward jumps are about twice as
likely as backward jumps.) Since the counter value returns to normal
after an indeterminate read, each "jump" really consists of both a
forward and backward jump from the software perspective.
Unless the kernel is trapping CNTVCT reads, a userspace program is able
to read the register in a loop faster than it changes. A test program
running on all 4 CPU cores that reported jumps larger than 100 ms was
run for 13.6 hours and reported the following:
This works out to a jump larger than 100 ms about every 5.5 seconds on
each CPU core.
The largest jump (almost an hour!) was the following sequence of reads:
0x0000007fffffffff → 0x00000093feffffff → 0x0000008000000000
Note that the middle bits don't necessarily all read as all zeroes or
all ones during the anomalous behavior; however the low 10 bits checked
by the function in this patch have never been observed with any other
value.
Also note that smaller jumps are much more common, with backward jumps
of 2048 (2^11) cycles observed over 400 times per second on each core.
(Of course, this is partially explained by lower bits rolling over more
frequently.) Any one of these could have caused the 95 year time skip.
Similar anomalies were observed while reading CNTPCT (after patching the
kernel to allow reads from userspace). However, the CNTPCT jumps are
much less frequent, and only small jumps were observed. The same program
as before (except now reading CNTPCT) observed after 72 hours:
Further investigation showed that the instability in CNTPCT/CNTVCT also
affected the respective timer's TVAL register. The following values were
observed immediately after writing CNVT_TVAL to 0x10000000:
I was also twice able to reproduce the issue covered by Allwinner's
workaround[4], that writing to TVAL sometimes fails, and both CVAL and
TVAL are left with entirely bogus values. One was the following values:
CNTVCT | CNTV_TVAL | CNTV_CVAL
--------------------+------------+--------------------------------------
0x000000d4a2d6014c | 0x8fbd5721 | 0x000000d132935fff (615s in the past) Reviewed-by: Marc Zyngier <marc.zyngier@arm.com>
========================================================================
Because the CPU can read the CNTPCT/CNTVCT registers faster than they
change, performing two reads of the register and comparing the high bits
(like other workarounds) is not a workable solution. And because the
timer can jump both forward and backward, no pair of reads can
distinguish a good value from a bad one. The only way to guarantee a
good value from consecutive reads would be to read _three_ times, and
take the middle value only if the three values are 1) each unique and
2) increasing. This takes at minimum 3 counter cycles (125 ns), or more
if an anomaly is detected.
However, since there is a distinct pattern to the bad values, we can
optimize the common case (1022/1024 of the time) to a single read by
simply ignoring values that match the error pattern. This still takes no
more than 3 cycles in the worst case, and requires much less code. As an
additional safety check, we still limit the loop iteration to the number
of max-frequency (1.2 GHz) CPU cycles in three 24 MHz counter periods.
For the TVAL registers, the simple solution is to not use them. Instead,
read or write the CVAL and calculate the TVAL value in software.
Although the manufacturer is aware of at least part of the erratum[4],
there is no official name for it. For now, use the kernel-internal name
"UNKNOWN1".
Acked-by: Maxime Ripard <maxime.ripard@bootlin.com> Tested-by: Andre Przywara <andre.przywara@arm.com> Signed-off-by: Samuel Holland <samuel@sholland.org> Cc: stable@vger.kernel.org Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The original purpose of the code I fix is to replace max_discard with
max_trim if max_trim is less than max_discard. When max_discard is 0
we should replace max_discard with max_trim as well, because
max_discard equals 0 happens only when the max_do_calc_max_discard
process is overflowed, so if mmc_can_trim(card) is true, max_discard
should be replaced by an available max_trim.
However, in the original code, there are two lines of code interfere
the right process.
1) if (max_discard && mmc_can_trim(card))
when max_discard is 0, it skips the process checking if max_discard
needs to be replaced with max_trim.
2) if (max_trim < max_discard)
the condition is false when max_discard is 0. it also skips the process
that replaces max_discard with max_trim, in fact, we should replace the
0-valued max_discard with max_trim.
Commit 11189c1089da "acpi/nfit: Fix command-supported detection" broke
ND_CMD_CALL for bus-level commands. The "func = cmd" assumption is only
valid for:
Since for_each_cpu(cpu, mask) added by commit 2d3854a37e8b767a
("cpumask: introduce new API, without changing anything") did not
evaluate the mask argument if NR_CPUS == 1 due to CONFIG_SMP=n,
lru_add_drain_all() is hitting WARN_ON() at __flush_work() added by
commit 4d43d395fed12463 ("workqueue: Try to catch flush_work() without
INIT_WORK().") by unconditionally calling flush_work() [1].
Workaround this issue by using CONFIG_SMP=n specific lru_add_drain_all
implementation. There is no real need to defer the implementation to
the workqueue as the draining is going to happen on the local cpu. So
alias lru_add_drain_all to lru_add_drain which does all the necessary
work.
The assumption that the maximum size of a syn packet is 128 bytes
is wrong. Tunneling headers were not accounted for.
Allocate buffers large enough for mtu.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
We assume in the bcm_sf2 driver that the DSA master network device
supports ethtool_ops::{get,set}_wol operations, which is not a given.
Avoid de-referencing potentially non-existent function pointers and
check them as we should.
Fixes: 96e65d7f3f88 ("net: dsa: bcm_sf2: add support for Wake-on-LAN") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Installing the appropriate non-IOMMU DMA ops in arm_iommu_detch_device()
serves the case where IOMMU-aware drivers choose to control their own
mapping but still make DMA API calls, however it also affects the case
when the arch code itself tears down the mapping upon driver unbinding,
where the ops now get left in place and can inhibit arch_setup_dma_ops()
on subsequent re-probe attempts.
Fix the latter case by making sure that arch_teardown_dma_ops() cleans
up whenever the ops were automatically installed by its counterpart.
Reported-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Reported-by: Marek Szyprowski <m.szyprowski@samsung.com> Fixes: 1874619a7df4 "ARM: dma-mapping: Set proper DMA ops in arm_iommu_detach_device()" Tested-by: Tobias Jakobi <tjakobi@math.uni-bielefeld.de> Tested-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The value of ->num_ports comes from bcm_sf2_sw_probe() and it is less
than or equal to DSA_MAX_PORTS. The ds->ports[] array is used inside
the dsa_is_user_port() and dsa_is_cpu_port() functions. The ds->ports[]
array is allocated in dsa_switch_alloc() and it has ds->num_ports
elements so this leads to a static checker warning about a potential out
of bounds read.
Fixes: 8cfa94984c9c ("net: dsa: bcm_sf2: add suspend/resume callbacks") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When requeue, if RQF_DONTPREP, rq has contained some driver
specific data, so insert it to hctx dispatch list to avoid any
merge. Take scsi as example, here is the trace event log (no
io scheduler, because RQF_STARTED would prevent merging),
(32768 + 8) was requeued by scsi_queue_insert and had RQF_DONTPREP.
Then it was merged with (32776 + 8) and issued. Due to RQF_DONTPREP,
the sdb only contained the part of (32768 + 8), then only that part
was completed. The lucky thing was that scsi_io_completion detected
it and requeued the remaining part. So we didn't get corrupted data.
However, the requeue of (32776 + 8) is not expected.
If a driver does any significant activity in its ibss_join method,
then it will very well expect that to be called during restart,
before any stations are added. Do that.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
This patch moves clk_get_rate() call from trigger() to hw_params()
callback to avoid calling sleeping clk API from atomic context
and prevent deadlock as indicated below.
Before this change clk_get_rate() was being called with same
spinlock held as the one passed to the clk API when registering
clocks exposed by the I2S driver.
[ 82.109780] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:908
[ 82.117009] in_atomic(): 1, irqs_disabled(): 128, pid: 1554, name: speaker-test
[ 82.124235] 3 locks held by speaker-test/1554:
[ 82.128653] #0: cc8c5328 (snd_pcm_link_rwlock){...-}, at: snd_pcm_stream_lock_irq+0x20/0x38
[ 82.137058] #1: ec9eda17 (&(&substream->self_group.lock)->rlock){..-.}, at: snd_pcm_ioctl+0x900/0x1268
[ 82.146417] #2: 6ac279bf (&(&pri_dai->spinlock)->rlock){..-.}, at: i2s_trigger+0x64/0x6d4
[ 82.154650] irq event stamp: 8144
[ 82.157949] hardirqs last enabled at (8143): [<c0a0f574>] _raw_read_unlock_irq+0x24/0x5c
[ 82.166089] hardirqs last disabled at (8144): [<c0a0f6a8>] _raw_read_lock_irq+0x18/0x58
[ 82.174063] softirqs last enabled at (8004): [<c01024e4>] __do_softirq+0x3a4/0x66c
[ 82.181688] softirqs last disabled at (7997): [<c012d730>] irq_exit+0x140/0x168
[ 82.188964] Preemption disabled at:
[ 82.188967] [<00000000>] (null)
[ 82.195728] CPU: 6 PID: 1554 Comm: speaker-test Not tainted 5.0.0-rc5-00192-ga6e6caca8f03 #191
[ 82.204302] Hardware name: SAMSUNG EXYNOS (Flattened Device Tree)
[ 82.210376] [<c0111a54>] (unwind_backtrace) from [<c010d8f4>] (show_stack+0x10/0x14)
[ 82.218084] [<c010d8f4>] (show_stack) from [<c09ef004>] (dump_stack+0x90/0xc8)
[ 82.225278] [<c09ef004>] (dump_stack) from [<c0152980>] (___might_sleep+0x22c/0x2c8)
[ 82.232990] [<c0152980>] (___might_sleep) from [<c0a0a2e4>] (__mutex_lock+0x28/0xa3c)
[ 82.240788] [<c0a0a2e4>] (__mutex_lock) from [<c0a0ad80>] (mutex_lock_nested+0x1c/0x24)
[ 82.248763] [<c0a0ad80>] (mutex_lock_nested) from [<c04923dc>] (clk_prepare_lock+0x78/0xec)
[ 82.257079] [<c04923dc>] (clk_prepare_lock) from [<c049538c>] (clk_core_get_rate+0xc/0x5c)
[ 82.265309] [<c049538c>] (clk_core_get_rate) from [<c0766b18>] (i2s_trigger+0x490/0x6d4)
[ 82.273369] [<c0766b18>] (i2s_trigger) from [<c074fec4>] (soc_pcm_trigger+0x100/0x140)
[ 82.281254] [<c074fec4>] (soc_pcm_trigger) from [<c07378a0>] (snd_pcm_do_start+0x2c/0x30)
[ 82.289400] [<c07378a0>] (snd_pcm_do_start) from [<c07376cc>] (snd_pcm_action_single+0x38/0x78)
[ 82.298065] [<c07376cc>] (snd_pcm_action_single) from [<c073a450>] (snd_pcm_ioctl+0x910/0x1268)
[ 82.306734] [<c073a450>] (snd_pcm_ioctl) from [<c0292344>] (do_vfs_ioctl+0x90/0x9ec)
[ 82.314443] [<c0292344>] (do_vfs_ioctl) from [<c0292cd4>] (ksys_ioctl+0x34/0x60)
[ 82.321808] [<c0292cd4>] (ksys_ioctl) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
[ 82.329431] Exception stack(0xeb875fa8 to 0xeb875ff0)
[ 82.334459] 5fa0: 00033c18b6e31000000000040000414200033d8000033d80
[ 82.342605] 5fc0: 00033c18b6e3100000008000000000360000800000000000beea38a800008000
[ 82.350748] 5fe0: b6e3142cbeea384cb6da9a30b6c9212c
[ 82.355789]
[ 82.357245] ======================================================
[ 82.363397] WARNING: possible circular locking dependency detected
[ 82.369551] 5.0.0-rc5-00192-ga6e6caca8f03 #191 Tainted: G W
[ 82.376395] ------------------------------------------------------
[ 82.382548] speaker-test/1554 is trying to acquire lock:
[ 82.387834] 6d2007f4 (prepare_lock){+.+.}, at: clk_prepare_lock+0x78/0xec
[ 82.394593]
[ 82.394593] but task is already holding lock:
[ 82.400398] 6ac279bf (&(&pri_dai->spinlock)->rlock){..-.}, at: i2s_trigger+0x64/0x6d4
[ 82.408197]
[ 82.408197] which lock already depends on the new lock.
[ 82.416343]
[ 82.416343] the existing dependency chain (in reverse order) is:
[ 82.423795]
[ 82.423795] -> #1 (&(&pri_dai->spinlock)->rlock){..-.}:
[ 82.430472] clk_mux_set_parent+0x34/0xb8
[ 82.434975] clk_core_set_parent_nolock+0x1c4/0x52c
[ 82.440347] clk_set_parent+0x38/0x6c
[ 82.444509] of_clk_set_defaults+0xc8/0x308
[ 82.449186] of_clk_add_provider+0x84/0xd0
[ 82.453779] samsung_i2s_probe+0x408/0x5f8
[ 82.458376] platform_drv_probe+0x48/0x98
[ 82.462879] really_probe+0x224/0x3f4
[ 82.467037] driver_probe_device+0x70/0x1c4
[ 82.471716] bus_for_each_drv+0x44/0x8c
[ 82.476049] __device_attach+0xa0/0x138
[ 82.480382] bus_probe_device+0x88/0x90
[ 82.484715] deferred_probe_work_func+0x6c/0xbc
[ 82.489741] process_one_work+0x200/0x740
[ 82.494246] worker_thread+0x2c/0x4c8
[ 82.498408] kthread+0x128/0x164
[ 82.502131] ret_from_fork+0x14/0x20
[ 82.506204] (null)
[ 82.508976]
[ 82.508976] -> #0 (prepare_lock){+.+.}:
[ 82.514264] __mutex_lock+0x60/0xa3c
[ 82.518336] mutex_lock_nested+0x1c/0x24
[ 82.522756] clk_prepare_lock+0x78/0xec
[ 82.527088] clk_core_get_rate+0xc/0x5c
[ 82.531421] i2s_trigger+0x490/0x6d4
[ 82.535494] soc_pcm_trigger+0x100/0x140
[ 82.539913] snd_pcm_do_start+0x2c/0x30
[ 82.544246] snd_pcm_action_single+0x38/0x78
[ 82.549012] snd_pcm_ioctl+0x910/0x1268
[ 82.553345] do_vfs_ioctl+0x90/0x9ec
[ 82.557417] ksys_ioctl+0x34/0x60
[ 82.561229] ret_fast_syscall+0x0/0x28
[ 82.565477] 0xbeea384c
[ 82.568421]
[ 82.568421] other info that might help us debug this:
[ 82.568421]
[ 82.576394] Possible unsafe locking scenario:
[ 82.576394]
[ 82.582285] CPU0 CPU1
[ 82.586792] ---- ----
[ 82.591297] lock(&(&pri_dai->spinlock)->rlock);
[ 82.595977] lock(prepare_lock);
[ 82.601782] lock(&(&pri_dai->spinlock)->rlock);
[ 82.608975] lock(prepare_lock);
[ 82.612268]
[ 82.612268] *** DEADLOCK ***
Fixes: 647d04f8e07a ("ASoC: samsung: i2s: Ensure the RCLK rate is properly determined") Reported-by: Krzysztof Kozłowski <krzk@kernel.org> Signed-off-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Mark Brown <broonie@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Failing to properly reset system registers is pretty bad. But not
quite as bad as bringing the whole machine down... So warn loudly,
but slightly more gracefully.
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The current kvm_psci_vcpu_on implementation will directly try to
manipulate the state of the VCPU to reset it. However, since this is
not done on the thread that runs the VCPU, we can end up in a strangely
corrupted state when the source and target VCPUs are running at the same
time.
Fix this by factoring out all reset logic from the PSCI implementation
and forwarding the required information along with a request to the
target VCPU.
Reviewed-by: Andrew Jones <drjones@redhat.com> Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Signed-off-by: Christoffer Dall <christoffer.dall@arm.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Roland reports the following issue and provides a root cause analysis:
"On a v4.19 i.MX6 system with IMA and CONFIG_DMA_API_DEBUG enabled, a
warning is generated when accessing files on a filesystem for which IMA
measurement is enabled:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 1 at kernel/dma/debug.c:1181 check_for_stack.part.9+0xd0/0x120
caam_jr 2101000.jr0: DMA-API: device driver maps memory from stack [addr=b668049e]
Modules linked in:
CPU: 0 PID: 1 Comm: switch_root Not tainted 4.19.0-20181214-1 #2
Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
Backtrace:
[<c010efb8>] (dump_backtrace) from [<c010f2d0>] (show_stack+0x20/0x24)
[<c010f2b0>] (show_stack) from [<c08b04f4>] (dump_stack+0xa0/0xcc)
[<c08b0454>] (dump_stack) from [<c012b610>] (__warn+0xf0/0x108)
[<c012b520>] (__warn) from [<c012b680>] (warn_slowpath_fmt+0x58/0x74)
[<c012b62c>] (warn_slowpath_fmt) from [<c0199acc>] (check_for_stack.part.9+0xd0/0x120)
[<c01999fc>] (check_for_stack.part.9) from [<c019a040>] (debug_dma_map_page+0x144/0x174)
[<c0199efc>] (debug_dma_map_page) from [<c065f7f4>] (ahash_final_ctx+0x5b4/0xcf0)
[<c065f240>] (ahash_final_ctx) from [<c065b3c4>] (ahash_final+0x1c/0x20)
[<c065b3a8>] (ahash_final) from [<c03fe278>] (crypto_ahash_op+0x38/0x80)
[<c03fe240>] (crypto_ahash_op) from [<c03fe2e0>] (crypto_ahash_final+0x20/0x24)
[<c03fe2c0>] (crypto_ahash_final) from [<c03f19a8>] (ima_calc_file_hash+0x29c/0xa40)
[<c03f170c>] (ima_calc_file_hash) from [<c03f2b24>] (ima_collect_measurement+0x1dc/0x240)
[<c03f2948>] (ima_collect_measurement) from [<c03f0a60>] (process_measurement+0x4c4/0x6b8)
[<c03f059c>] (process_measurement) from [<c03f0cdc>] (ima_file_check+0x88/0xa4)
[<c03f0c54>] (ima_file_check) from [<c02d8adc>] (path_openat+0x5d8/0x1364)
[<c02d8504>] (path_openat) from [<c02dad24>] (do_filp_open+0x84/0xf0)
[<c02daca0>] (do_filp_open) from [<c02cf50c>] (do_open_execat+0x84/0x1b0)
[<c02cf488>] (do_open_execat) from [<c02d1058>] (__do_execve_file+0x43c/0x890)
[<c02d0c1c>] (__do_execve_file) from [<c02d1770>] (sys_execve+0x44/0x4c)
[<c02d172c>] (sys_execve) from [<c0101000>] (ret_fast_syscall+0x0/0x28)
---[ end trace 3455789a10e3aefd ]---
The cause is that the struct ahash_request *req is created as a
stack-local variable up in the stack (presumably somewhere in the IMA
implementation), then passed down into the CAAM driver, which tries to
dma_single_map the req->result (indirectly via map_seq_out_ptr_result)
in order to make that buffer available for the CAAM to store the result
of the following hash operation.
The calling code doesn't know how req will be used by the CAAM driver,
and there could be other such occurrences where stack memory is passed
down to the CAAM driver. Therefore we should rather fix this issue in
the CAAM driver where the requirements are known."
Fix this problem by:
-instructing the crypto engine to write the final hash in state->caam_ctx
-subsequently memcpy-ing the final hash into req->result
Cc: <stable@vger.kernel.org> # v4.19+ Reported-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Horia Geantă <horia.geanta@nxp.com> Tested-by: Roland Hieber <rhi@pengutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The fix to make WARN work in the early boot code created a problem
on older machines without EDAT-1. The setup_lowcore_dat_on function
uses the pointer from lowcore_ptr[0] to set the DAT bit in the new
PSWs. That does not work if the kernel page table is set up with
4K pages as the prefix address maps to absolute zero.
To make this work the PSWs need to be changed with via address 0 in
form of the S390_lowcore definition.
Regarding segments with a limit==0xffffffff, the SDM officially states:
When the effective limit is FFFFFFFFH (4 GBytes), these accesses may
or may not cause the indicated exceptions. Behavior is
implementation-specific and may vary from one execution to another.
In practice, all CPUs that support VMX ignore limit checks for "flat
segments", i.e. an expand-up data or code segment with base=0 and
limit=0xffffffff. This is subtly different than wrapping the effective
address calculation based on the address size, as the flat segment
behavior also applies to accesses that would wrap the 4g boundary, e.g.
a 4-byte access starting at 0xffffffff will access linear addresses
0xffffffff, 0x0, 0x1 and 0x2.
Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The address size of an instruction affects the effective address, not
the virtual/linear address. The final address may still be truncated,
e.g. to 32-bits outside of long mode, but that happens irrespective of
the address size, e.g. a 32-bit address size can yield a 64-bit virtual
address when using FS/GS with a non-zero base.
Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The VMCS.EXIT_QUALIFCATION field reports the displacements of memory
operands for various instructions, including VMX instructions, as a
naturally sized unsigned value, but masks the value by the addr size,
e.g. given a ModRM encoded as -0x28(%ebp), the -0x28 displacement is
reported as 0xffffffd8 for a 32-bit address size. Despite some weird
wording regarding sign extension, the SDM explicitly states that bits
beyond the instructions address size are undefined:
In all cases, bits of this field beyond the instruction’s address
size are undefined.
Failure to sign extend the displacement results in KVM incorrectly
treating a negative displacement as a large positive displacement when
the address size of the VMX instruction is smaller than KVM's native
size, e.g. a 32-bit address size on a 64-bit KVM.
The very original decoding, added by commit 064aea774768 ("KVM: nVMX:
Decoding memory operands of VMX instructions"), sort of modeled sign
extension by truncating the final virtual/linear address for a 32-bit
address size. I.e. it messed up the effective address but made it work
by adjusting the final address.
When segmentation checks were added, the truncation logic was kept
as-is and no sign extension logic was introduced. In other words, it
kept calculating the wrong effective address while mostly generating
the correct virtual/linear address. As the effective address is what's
used in the segment limit checks, this results in KVM incorreclty
injecting #GP/#SS faults due to non-existent segment violations when
a nested VMM uses negative displacements with an address size smaller
than KVM's native address size.
Using the -0x28(%ebp) example, an EBP value of 0x1000 will result in
KVM using 0x100000fd8 as the effective address when checking for a
segment limit violation. This causes a 100% failure rate when running
a 32-bit KVM build as L1 on top of a 64-bit KVM L0.
Fixes: f9eb4af67c9d ("KVM: nVMX: VMX instructions: add checks for #GP/#SS exceptions") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When installing new memslots, KVM sets bit 0 of the generation number to
indicate that an update is in-progress. Until the update is complete,
there are no guarantees as to whether a vCPU will see the old or the new
memslots. Explicity prevent caching MMIO accesses so as to avoid using
an access cached from the old memslots after the new memslots have been
installed.
Note that it is unclear whether or not disabling caching during the
update window is strictly necessary as there is no definitive
documentation as to what ordering guarantees KVM provides with respect
to updating memslots. That being said, the MMIO spte code does not
allow reusing sptes created while an update is in-progress, and the
associated documentation explicitly states:
We do not want to use an MMIO sptes created with an odd generation
number, ... If KVM is unlucky and creates an MMIO spte while the
low bit is 1, the next access to the spte will always be a cache miss.
At the very least, disabling the per-vCPU MMIO cache during updates will
make its behavior consistent with the MMIO spte behavior and
documentation.
Fixes: 56f17dd3fbc4 ("kvm: x86: fix stale mmio cache bug") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The check to detect a wrap of the MMIO generation explicitly looks for a
generation number of zero. Now that unique memslots generation numbers
are assigned to each address space, only address space 0 will get a
generation number of exactly zero when wrapping. E.g. when address
space 1 goes from 0x7fffe to 0x80002, the MMIO generation number will
wrap to 0x2. Adjust the MMIO generation to strip the address space
modifier prior to checking for a wrap.
Fixes: 4bd518f1598d ("KVM: use separate generations for each address space") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
kvm_arch_memslots_updated() is at this point in time an x86-specific
hook for handling MMIO generation wraparound. x86 stashes 19 bits of
the memslots generation number in its MMIO sptes in order to avoid
full page fault walks for repeat faults on emulated MMIO addresses.
Because only 19 bits are used, wrapping the MMIO generation number is
possible, if unlikely. kvm_arch_memslots_updated() alerts x86 that
the generation has changed so that it can invalidate all MMIO sptes in
case the effective MMIO generation has wrapped so as to avoid using a
stale spte, e.g. a (very) old spte that was created with generation==0.
Given that the purpose of kvm_arch_memslots_updated() is to prevent
consuming stale entries, it needs to be called before the new generation
is propagated to memslots. Invalidating the MMIO sptes after updating
memslots means that there is a window where a vCPU could dereference
the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
spte that was created with (pre-wrap) generation==0.
Fixes: e59dbe09f8e6 ("KVM: Introduce kvm_arch_memslots_updated()") Cc: <stable@vger.kernel.org> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Move upstream stream off to just after receiving the last EOF completion
and disabling the CSI (and thus before disabling the IDMA channel) in
csi_stop(). For symmetry also move upstream stream on to beginning of
csi_start().
Doing this makes csi_s_stream() more symmetric with prp_s_stream() which
will require the same change to fix a hard lockup.
Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com> Cc: stable@vger.kernel.org # for 4.13 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Disable the CSI immediately after receiving the last EOF before stream
off (and thus before disabling the IDMA channel). Do this by moving the
wait for EOF completion into a new function csi_idmac_wait_last_eof().
This fixes a complete system hard lockup on the SabreAuto when streaming
from the ADV7180, by repeatedly sending a stream off immediately followed
by stream on:
while true; do v4l2-ctl -d4 --stream-mmap --stream-count=3; done
Eventually this either causes the system lockup or EOF timeouts at all
subsequent stream on, until a system reset.
The lockup occurs when disabling the IDMA channel at stream off. Disabling
the CSI before disabling the IDMA channel appears to be a reliable fix for
the hard lockup.
Fixes: 4a34ec8e470cb ("[media] media: imx: Add CSI subdev driver") Reported-by: Gaël PORTAY <gael.portay@collabora.com> Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com> Cc: stable@vger.kernel.org # for 4.13 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Add a linear pipeline logic for the stream control. It's created by
walking backwards on the entity graph. When the stream starts it will
simply loop through the pipeline calling the respective process_frame
function of each entity.
Fixes: f2fe89061d797 ("vimc: Virtual Media Controller core, capture
and sensor")
Cc: stable@vger.kernel.org # for v4.20 Signed-off-by: Lucas A. M. Magalhães <lucmaga@gmail.com> Acked-by: Helen Koike <helen.koike@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl>
[hverkuil-cisco@xs4all.nl: fixed small space-after-tab issue in the patch] Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The UVC video driver converts the timestamp from hardware specific unit
to one known by the kernel at the time when the buffer is dequeued. This
is fine in general, but the streamoff operation consists of the
following steps (among other things):
1. uvc_video_clock_cleanup --- the hardware clock sample array is
released and the pointer to the array is set to NULL,
2. buffers in active state are returned to the user and
3. buf_finish callback is called on buffers that are prepared.
buf_finish includes calling uvc_video_clock_update that accesses the
hardware clock sample array.
The above is serialised by a queue specific mutex. Address the problem
by skipping the clock conversion if the hardware clock sample array is
already released.
Upstream must be stopped immediately after receiving the last EOF and
before disabling the IDMA channel. This can be accomplished by moving
upstream stream off to just after receiving the last EOF completion in
prp_stop(). For symmetry also move upstream stream on to end of
prp_start().
This fixes a complete system hard lockup on the SabreAuto when streaming
from the ADV7180, by repeatedly sending a stream off immediately followed
by stream on:
while true; do v4l2-ctl -d1 --stream-mmap --stream-count=3; done
Eventually this either causes the system lockup or EOF timeouts at all
subsequent stream on, until a system reset.
The lockup occurs when disabling the IDMA channel at stream off. Stopping
the video data stream entering the IDMA channel before disabling the
channel itself appears to be a reliable fix for the hard lockup.
Fixes: f0d9c8924e2c3 ("[media] media: imx: Add IC subdev drivers") Reported-by: Gaël PORTAY <gael.portay@collabora.com> Tested-by: Gaël PORTAY <gael.portay@collabora.com> Signed-off-by: Steve Longerbeam <slongerbeam@gmail.com> Cc: stable@vger.kernel.org # for 4.13 and up Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The rcu_gp_kthread_wake() function is invoked when it might be necessary
to wake the RCU grace-period kthread. Because self-wakeups are normally
a useless waste of CPU cycles, if rcu_gp_kthread_wake() is invoked from
this kthread, it naturally refuses to do the wakeup.
Unfortunately, natural though it might be, this heuristic fails when
rcu_gp_kthread_wake() is invoked from an interrupt or softirq handler
that interrupted the grace-period kthread just after the final check of
the wait-event condition but just before the schedule() call. In this
case, a wakeup is required, even though the call to rcu_gp_kthread_wake()
is within the RCU grace-period kthread's context. Failing to provide
this wakeup can result in grace periods failing to start, which in turn
results in out-of-memory conditions.
This race window is quite narrow, but it actually did happen during real
testing. It would of course need to be fixed even if it was strictly
theoretical in nature.
This patch does not Cc stable because it does not apply cleanly to
earlier kernel versions.
Fixes: 48a7639ce80c ("rcu: Make callers awaken grace-period kthread") Reported-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "Zhang, Jun" <jun.zhang@intel.com> Co-developed-by: "He, Bo" <bo.he@intel.com> Co-developed-by: "xiao, jin" <jin.xiao@intel.com> Co-developed-by: Bai, Jie A <jie.a.bai@intel.com>
Signed-off: "Zhang, Jun" <jun.zhang@intel.com>
Signed-off: "He, Bo" <bo.he@intel.com>
Signed-off: "xiao, jin" <jin.xiao@intel.com>
Signed-off: Bai, Jie A <jie.a.bai@intel.com> Signed-off-by: "Zhang, Jun" <jun.zhang@intel.com>
[ paulmck: Switch from !in_softirq() to "!in_interrupt() &&
!in_serving_softirq() to avoid redundant wakeups and to also handle the
interrupt-handler scenario as well as the softirq-handler scenario that
actually occurred in testing. ] Signed-off-by: Paul E. McKenney <paulmck@linux.ibm.com> Link: https://lkml.kernel.org/r/CD6925E8781EFD4D8E11882D20FC406D52A11F61@SHSMSX104.ccr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The send() callback should never return length as it does not in every
driver except tpm_crb in the success case. The reason is that the main
transmit functionality only cares about whether the transmit was
successful or not and ignores the count completely.
Suggested-by: Stefan Berger <stefanb@linux.ibm.com> Cc: stable@vger.kernel.org Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Stefan Berger <stefanb@linux.ibm.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Tested-by: Alexander Steffen <Alexander.Steffen@infineon.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The current approach to read first 6 bytes from the response and then tail
of the response, can cause the 2nd memcpy_fromio() to do an unaligned read
(e.g. read 32-bit word from address aligned to a 16-bits), depending on how
memcpy_fromio() is implemented. If this happens, the read will fail and the
memory controller will fill the read with 1's.
This was triggered by 170d13ca3a2f, which should be probably refined to
check and react to the address alignment. Before that commit, on x86
memcpy_fromio() turned out to be memcpy(). By a luck GCC has done the right
thing (from tpm_crb's perspective) for us so far, but we should not rely on
that. Thus, it makes sense to fix this also in tpm_crb, not least because
the fix can be then backported to stable kernels and make them more robust
when compiled in differing environments.
Cc: stable@vger.kernel.org Cc: James Morris <jmorris@namei.org> Cc: Tomas Winkler <tomas.winkler@intel.com> Cc: Jerry Snitselaar <jsnitsel@redhat.com> Fixes: 30fc8d138e91 ("tpm: TPM 2.0 CRB Interface") Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Reviewed-by: Jerry Snitselaar <jsnitsel@redhat.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
Adrian Hunter [Wed, 7 Mar 2018 14:02:21 +0000 (16:02 +0200)]
perf intel-pt: Fix overlap detection to identify consecutive buffers correctly
BugLink: https://bugs.launchpad.net/bugs/1837952
Overlap detection was not not updating the buffer's 'consecutive' flag.
Marking buffers consecutive has the advantage that decoding begins from
the start of the buffer instead of the first PSB. Fix overlap detection
to identify consecutive buffers correctly.
CYC packet timestamp calculation depends upon CBR which was being
cleared upon overflow (OVF). That can cause errors due to failing to
synchronize with sideband events. Even if a CBR change has been lost,
the old CBR is still a better estimate than zero. So remove the clearing
of CBR.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/20190206103947.15750-4-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
The .orc_unwind section is a packed array of 6-byte structs. It's
currently aligned to 6 bytes, which is causing warnings in the LLD
linker.
Six isn't a power of two, so it's not a valid alignment value. The
actual alignment doesn't matter much because it's an array of packed
structs. An alignment of two is sufficient. In reality it always gets
aligned to four bytes because it comes immediately after the
4-byte-aligned .orc_unwind_ip section.
If wakeup_source_add() is called right after wakeup_source_remove()
for the same wakeup source, timer_setup() may be called for a
potentially scheduled timer which is incorrect.
To avoid that, move the wakeup source timer cancellation from
wakeup_source_drop() to wakeup_source_remove().
Moreover, make wakeup_source_remove() clear the timer function after
canceling the timer to let wakeup_source_not_registered() treat
unregistered wakeup sources in the same way as the ones that have
never been registered.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.4+ <stable@vger.kernel.org> # 4.4+
[ rjw: Subject, changelog, merged two patches together ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If we have to retransmit a request, we should ensure that we reinitialise
the sequence results structure, since in the event of a signal
we need to treat the request as if it had not been sent.
Commit 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before
nfsd startup" is trying to fix a NULL dereference issue, but it
mistakenly checks if the nfsd server is started. So fix it.
Fixes: 62a063b8e7d1 "nfsd4: fix crash on writing v4_end_grace before nfsd startup" Cc: stable@vger.kernel.org Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Signed-off-by: J. Bruce Fields <bfields@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
If the result of an NFSv3 readdir{,plus} request results in the
"offset" on one entry having to be split across 2 pages, and is sized
so that the next directory entry doesn't fit in the requested size,
then memory corruption can happen.
When encode_entry() is called after encoding the last entry that fits,
it notices that ->offset and ->offset1 are set, and so stores the
offset value in the two pages as required. It clears ->offset1 but
*does not* clear ->offset.
Normally this omission doesn't matter as encode_entry_baggage() will
be called, and will set ->offset to a suitable value (not on a page
boundary).
But in the case where cd->buflen < elen and nfserr_toosmall is
returned, ->offset is not reset.
This means that nfsd3proc_readdirplus will see ->offset with a value 4
bytes before the end of a page, and ->offset1 set to NULL.
It will try to write 8bytes to ->offset.
If we are lucky, the next page will be read-only, and the system will
BUG: unable to handle kernel paging request at...
If we are unlucky, some innocent page will have the first 4 bytes
corrupted.
nfsd3proc_readdir() doesn't even check for ->offset1, it just blindly
writes 8 bytes to the offset wherever it is.
Fix this by clearing ->offset after it is used, and copying the
->offset handling code from nfsd3_proc_readdirplus into
nfsd3_proc_readdir.
(Note that the commit hash in the Fixes tag is from the 'history'
tree - this bug predates git).
When we fail to add the request to the I/O queue, we currently leave it
to the caller to free the failed request. However since some of the
requests that fail are actually created by nfs_pageio_add_request()
itself, and are not passed back the caller, this leads to a leakage
issue, which can again cause page locks to leak.
This commit addresses the leakage by freeing the created requests on
error, using desc->pg_completion_ops->error_cleanup()
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Fixes: a7d42ddb30997 ("nfs: add mirroring support to pgio layer") Cc: stable@vger.kernel.org # v4.0: c18b96a1b862: nfs: clean up rest of reqs Cc: stable@vger.kernel.org # v4.0: d600ad1f2bdb: NFS41: pop some layoutget Cc: stable@vger.kernel.org # v4.0+ Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>
When using dm-integrity underneath md-raid, some tests with raid
auto-correction trigger large amounts of integrity failures - and all
these failures print an error message. These messages can bring the
system to a halt if the system is using serial console.
Fix this by limiting the rate of error messages - it improves the speed
of raid recovery and avoids the hang.
A dm-raid array with devices larger than 4GB won't assemble on
a 32 bit host since _check_data_dev_sectors() was added in 4.16.
This is because to_sector() treats its argument as an "unsigned long"
which is 32bits (4GB) on a 32bit host. Using "unsigned long long"
is more correct.
Kernels as early as 4.2 can have other problems due to to_sector()
being used on the size of a device.
Fixes: 0cf4503174c1 ("dm raid: add support for the MD RAID0 personality")
cc: stable@vger.kernel.org (v4.2+) Reported-and-tested-by: Guillaume Perréal <gperreal@free.fr> Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Khalid Elmously <khalid.elmously@canonical.com>