Prior to commit 55b9b50811ca ("perf script: Support -F brstack,dso and
brstacksym,dso"), we were printing a space before the brstack data. It
seems that this space was important. Without it, parsing is difficult.
Very sorry for the mistake.
Notice here how the "ip" and "brstack" run together:
Since "block: support large requests in blk_rq_map_user_iov" we
started to call it with partially drained iter; that works fine
on the write side, but reads create a copy of iter for completion
time. And that needs to take the possibility of ->iov_iter != 0
into account...
we need to take care of failure exit as well - pages already
in bio should be dropped by analogue of bio_unmap_pages(),
since their refcounts had been bumped only once per reference
in bio.
bio_map_user_iov and bio_unmap_user do unbalanced pages refcounting if
IO vector has small consecutive buffers belonging to the same page.
bio_add_pc_page merges them into one, but the page reference is never
dropped.
In the code added to function submit_page_section by commit b1058b981,
sdio->bio can currently be NULL when calling dio_bio_submit. This then
leads to a NULL pointer access in dio_bio_submit, so check for a NULL
bio in submit_page_section before trying to submit it instead.
Fixes xfstest generic/250 on gfs2.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
It turns out that Tegra20 has a bug in the implementation of the MSI
target address register (which is worked around by the existence of the
struct tegra_pcie_soc.msi_base_shift parameter) that restricts the MSI
target memory to the lower 32 bits of physical memory on that particular
generation. The offending patch causes a regression on TrimSlice, which
is a Tegra20-based device and has a PCI network interface card.
An initial, simpler fix was to change the MSI target address for Tegra20
only, but it was pointed out that the offending commit also prevents the
use of 32-bit only MSI capable devices, even on later chips. Technically
this was never guaranteed to work with the prior code in the first place
because the allocated page could have resided beyond the 4 GiB boundary,
but it is still possible that this could've introduced a regression.
The proper fix that was settled on is to select a fixed address within
the lowest 32 bits of physical address space that is otherwise unused,
but testing of that patch has provided mixed results that are not fully
understood yet.
Given all of the above and the relative urgency to get this fixed in
v4.13, revert the offending commit until a universal fix is found.
Fixes: d7bd554f27c9 ("PCI: tegra: Do not allocate MSI target memory") Reported-by: Tomasz Maciej Nowak <tmn505@gmail.com> Reported-by: Erik Faye-Lund <kusmabite@gmail.com> Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
struct pci_host_bridge gained hooks to map/swizzle IRQs, so that the IRQ
mapping can be done automatically by PCI core code through the
pci_assign_irq() function instead of resorting to arch-specific
implementation callbacks to carry out the same task which force PCI host
bridge drivers implementation to implement per-arch kludges to carry out a
task that is inherently architecture agnostic.
Commit 769b461fc0c0 ("arm64: PCI: Drop DT IRQ allocation from
pcibios_alloc_irq()") was assuming all PCI host controller drivers had been
converted to use ->map_irq(), but that wasn't the case: pci-aardvark had
not been converted. Due to this, it broke the support for legacy PCI
interrupts when using the pci-aardvark driver (used on Marvell Armada 3720
platforms).
In order to fix this, we make sure the ->map_irq and ->swizzle_irq fields
of pci_host_bridge are properly filled in.
Fixes: 769b461fc0c0 ("arm64: PCI: Drop DT IRQ allocation from pcibios_alloc_irq()") Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
BUG: KASAN: use-after-free in usb_composite_overwrite_options+...
[libcomposite] at addr ...
Read of size 1 by task ...
when some driver is un-bound and then bound again.
For example, this happens with FunctionFS driver when "ffs-test"
test application is run several times in a row.
If the driver has empty manufacturer ID string in initial static data,
it is then replaced with generated string. After driver unbinding
the generated string is freed, but the driver data still keep that
pointer. And if the driver is then bound again, that pointer
is re-used for string emptiness check.
The fix is to clean up the driver string data upon its unbinding
to drop the pointer to freed memory.
Fixes: cc2683c318a5 ("usb: gadget: Provide a default implementation of default manufacturer string") Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kmemleak checking configuration reports a memory leak in
usb_os_desc_prepare_interf_dir function when rndis function
instance is freed and then allocated again. For example, this
happens with FunctionFS driver with RNDIS function enabled
when "ffs-test" test application is run several times in a row.
The data for intermediate "os_desc" group for interface directories
is allocated as a single VLA chunk and (after a change of default
groups handling) is not ever freed and actually not stored anywhere
besides inside a list of default groups of a parent group.
The fix is to make usb_os_desc_prepare_interf_dir function return
a pointer to allocated data (as a pointer to the first VLA item)
instead of (an unused) integer and to make the caller component
(currently the only one is RNDIS function) responsible for storing
the pointer and freeing the memory when appropriate.
Fixes: 1ae1602de028 ("configfs: switch ->default groups to a linked list") Signed-off-by: Andrew Gabbasov <andrew_gabbasov@mentor.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
crtc_state_is_legacy_gamma also checks for CTM, which was missing from
intel_color_check. By using the same condition for commit and check
we reduce the chance of mismatches.
This was spotted by KASAN while trying to rework kms_color igt test.
While technically CHV isn't DDI, we do look at the VBT based DDI port
info for HDMI DDC pin and DP AUX channel. (We call these "alternate",
but they're really just something that aren't platform defaults.)
In commit e4ab73a13291 ("drm/i915: Respect alternate_ddc_pin for all DDI
ports") Ville writes, "IIRC there may be CHV system that might actually
need this."
I'm not sure why there couldn't be even more platforms that need this,
but start conservative, and parse the info for CHV in addition to DDI.
intel_crtc->config->cpu_transcoder isn't yet filled out when
intel_crtc_mode_get() gets called during output probing, so we should
not use it there. Instead intel_crtc_mode_get() figures out the correct
transcoder on its own, and that's what we should use.
If the BIOS boots LVDS on pipe B, intel_crtc_mode_get() would actually
end up reading the timings from pipe A instead (since PIPE_A==0),
which clearly isn't what we want.
It looks to me like this may have been broken by
commit eccb140bca67 ("drm/i915: hw state readout&check support for cpu_transcoder")
as that one removed the early initialization of cpu_transcoder from
intel_crtc_init().
Cc: dri-devel@lists.freedesktop.org Cc: Rob Kramer <rob@solution-space.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reported-by: Rob Kramer <rob@solution-space.com> Fixes: eccb140bca67 ("drm/i915: hw state readout&check support for cpu_transcoder")
References: https://lists.freedesktop.org/archives/dri-devel/2016-April/104142.html Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk> Link: https://patchwork.freedesktop.org/patch/msgid/1459525046-19425-1-git-send-email-ville.syrjala@linux.intel.com
(cherry picked from commit e30a154b5262b967b133b06ac40777e651045898) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Kernel stores the time in jiffies at which the eDP panel is turned
off. This should be obtained after the panel is off (after the
wait_panel_off). When we next attempt to turn the panel on, we use the
difference between the timestamp at which we want to turn the panel on
and timestamp at which panel was turned off to ensure that this is equal
to panel power cycle delay and if not we wait for the remaining
time. Not waiting for the panel power cycle delay can cause the panel to
not turn on giving rise to AUX timeouts for the attempted AUX
transactions.
v2:
* Separate lines for bugzilla (Jani Nikula)
* Suggested by tag (Daniel Vetter)
Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Cc: Jani Nikula <jani.nikula@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101518
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101144 Suggested-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Manasi Navare <manasi.d.navare@intel.com> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Jani Nikula <jani.nikula@linux.intel.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/1507135706-17147-1-git-send-email-manasi.d.navare@intel.com
(cherry picked from commit cbacf02e7796fea02e5c6e46c90ed7cbe9e6f2c0) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
While line6_probe() may kick off URB for a control MIDI endpoint, the
function doesn't clean up it properly at its error path. This results
in a leftover URB action that is eventually triggered later and causes
an Oops like:
general protection fault: 0000 [#1] PREEMPT SMP KASAN
CPU: 1 PID: 0 Comm: swapper/1 Not tainted
RIP: 0010:usb_fill_bulk_urb ./include/linux/usb.h:1619
RIP: 0010:line6_start_listen+0x3fe/0x9e0 sound/usb/line6/driver.c:76
Call Trace:
<IRQ>
line6_data_received+0x1f7/0x470 sound/usb/line6/driver.c:326
__usb_hcd_giveback_urb+0x2e0/0x650 drivers/usb/core/hcd.c:1779
usb_hcd_giveback_urb+0x337/0x420 drivers/usb/core/hcd.c:1845
dummy_timer+0xba9/0x39f0 drivers/usb/gadget/udc/dummy_hcd.c:1965
call_timer_fn+0x2a2/0x940 kernel/time/timer.c:1281
....
Since the whole clean-up procedure is done in line6_disconnect()
callback, we can simply call it in the error path instead of
open-coding the whole again. It'll fix such an issue automagically.
The error path in podhd_init() tries to clear the pending timer, while
the timer object is initialized at the end of init sequence, thus it
may hit the uninitialized object, as spotted by syzkaller:
INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
CPU: 1 PID: 1845 Comm: kworker/1:2 Not tainted 4.14.0-rc2-42613-g1488251d1a98 #238
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
Workqueue: usb_hub_wq hub_event
Call Trace:
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
register_lock_class+0x6c4/0x1a00 kernel/locking/lockdep.c:769
__lock_acquire+0x27e/0x4550 kernel/locking/lockdep.c:3385
lock_acquire+0x259/0x620 kernel/locking/lockdep.c:4002
del_timer_sync+0x12c/0x280 kernel/time/timer.c:1237
podhd_disconnect+0x8c/0x160 sound/usb/line6/podhd.c:299
line6_probe+0x844/0x1310 sound/usb/line6/driver.c:783
podhd_probe+0x64/0x70 sound/usb/line6/podhd.c:474
....
For addressing it, assure the initializations of timer and work by
moving them to the beginning of podhd_init().
Fixes: 790869dacc3d ("ALSA: line6: Add support for POD X3") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When podhd_init() failed with the acquiring a ctrl i/f, the line6
helper still calls the disconnect callback that eventually calls again
usb_driver_release_interface() with the NULL intf.
Put the proper NULL check before calling it for avoiding an Oops.
Fixes: fc90172ba283 ("ALSA: line6: Claim pod x3 usb data interface") Reported-by: Andrey Konovalov <andreyknvl@google.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
caiaq driver doesn't kill the URB properly at its error path during
the probe, which may lead to a use-after-free error later. This patch
addresses it.
Reported-by: Johan Hovold <johan@kernel.org> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: Takashi Iwai <tiwai@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The event handler in the virmidi sequencer code takes a read-lock for
the linked list traverse, while it's calling snd_seq_dump_var_event()
in the loop. The latter function may expand the user-space data
depending on the event type. It eventually invokes copy_from_user(),
which might be a potential dead-lock.
The sequencer core guarantees that the user-space data is passed only
with atomic=0 argument, but snd_virmidi_dev_receive_event() ignores it
and always takes read-lock(). For avoiding the problem above, this
patch introduces rwsem for non-atomic case, while keeping rwlock for
atomic case.
Also while we're at it: the superfluous irq flags is dropped in
snd_virmidi_input_open().
There is a potential race window opened at creating and deleting a
port via ioctl, as spotted by fuzzing. snd_seq_create_port() creates
a port object and returns its pointer, but it doesn't take the
refcount, thus it can be deleted immediately by another thread.
Meanwhile, snd_seq_ioctl_create_port() still calls the function
snd_seq_system_client_ev_port_start() with the created port object
that is being deleted, and this triggers use-after-free like:
We may fix this in a few different ways, and in this patch, it's fixed
simply by taking the refcount properly at snd_seq_create_port() and
letting the caller unref the object after use. Also, there is another
potential use-after-free by sprintf() call in snd_seq_create_port(),
and this is moved inside the lock.
USB-audio driver may leave a stray URB for the mixer interrupt when it
exits by some error during probe. This leads to a use-after-free
error as spotted by syzkaller like:
==================================================================
BUG: KASAN: use-after-free in snd_usb_mixer_interrupt+0x604/0x6f0
Call Trace:
<IRQ>
__dump_stack lib/dump_stack.c:16
dump_stack+0x292/0x395 lib/dump_stack.c:52
print_address_description+0x78/0x280 mm/kasan/report.c:252
kasan_report_error mm/kasan/report.c:351
kasan_report+0x23d/0x350 mm/kasan/report.c:409
__asan_report_load8_noabort+0x19/0x20 mm/kasan/report.c:430
snd_usb_mixer_interrupt+0x604/0x6f0 sound/usb/mixer.c:2490
__usb_hcd_giveback_urb+0x2e0/0x650 drivers/usb/core/hcd.c:1779
....
Actually such a URB is killed properly at disconnection when the
device gets probed successfully, and what we need is to apply it for
the error-path, too.
In this patch, we apply snd_usb_mixer_disconnect() at releasing.
Also introduce a new flag, disconnected, to struct usb_mixer_interface
for not performing the disconnection procedure twice.
When using FAT on a block device which supports rw_page, we can hit
BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we
call clean_buffers() after unlocking the page we've written. Introduce
a new clean_page_buffers() which cleans all buffers associated with a
page and call it from within bdev_write_page().
[akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.org Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Reported-by: Toshi Kani <toshi.kani@hpe.com> Reported-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: Toshi Kani <toshi.kani@hpe.com> Acked-by: Johannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
This reverts commits 5d17a73a2ebe ("vmalloc: back off when the current
task is killed") and 171012f56127 ("mm: don't warn when vmalloc() fails
due to a fatal signal").
Commit 5d17a73a2ebe ("vmalloc: back off when the current task is
killed") made all vmalloc allocations from a signal-killed task fail.
We have seen crashes in the tty driver from this, where a killed task
exiting tries to switch back to N_TTY, fails n_tty_open because of the
vmalloc failing, and later crashes when dereferencing tty->disc_data.
Arguably, relying on a vmalloc() call to succeed in order to properly
exit a task is not the most robust way of doing things. There will be a
follow-up patch to the tty code to fall back to the N_NULL ldisc.
But the justification to make that vmalloc() call fail like this isn't
convincing, either. The patch mentions an OOM victim exhausting the
memory reserves and thus deadlocking the machine. But the OOM killer is
only one, improbable source of fatal signals. It doesn't make sense to
fail allocations preemptively with plenty of memory in most cases.
The patch doesn't mention real-life instances where vmalloc sites would
exhaust memory, which makes it sound more like a theoretical issue to
begin with. But just in case, the OOM access to memory reserves has
been restricted on the allocator side in cd04ae1e2dc8 ("mm, oom: do not
rely on TIF_MEMDIE for memory reserves access"), which should take care
of any theoretical concerns on that front.
Revert this patch, and the follow-up that suppresses the allocation
warnings when we fail the allocations due to a signal.
Link: http://lkml.kernel.org/r/20171004185906.GB2136@cmpxchg.org Fixes: 171012f56127 ("mm: don't warn when vmalloc() fails due to a fatal signal") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Vlastimil Babka <vbabka@suse.cz> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Alan Cox <alan@llwyncelyn.cymru> Cc: Christoph Hellwig <hch@lst.de> Cc: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Deletion of subdevice will remove device properties associated to parent
when they share the same firmware node after commit 478573c93abd (driver
core: Don't leak secondary fwnode on device removal). This was observed
with a driver adding subdevice that driver wasn't able to read device
properties after rmmod/modprobe cycle.
Parent device will have its primary firmware node pointing to an ACPI
node and secondary firmware node point to device properties.
ACPI_COMPANION_SET() call in parent probe will set the subdevice's
firmware node to point to the same 'struct fwnode_handle' and the
associated secondary firmware node, i.e. the device properties as the
parent.
When subdevice is deleted in parent remove that will remove those
device properties and attempt to read device properties in next
parent probe call will fail.
Fix this by tracking the owner device of device properties and delete
them only when owner device is being deleted.
Fixes: 478573c93abd (driver core: Don't leak secondary fwnode on device removal) Signed-off-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The commit 79d2c8bede2c93f943 ("pinctrl/amd: save pin registers over
suspend/resume") caused the following compilation errors:
drivers/pinctrl/pinctrl-amd.c: In function ‘amd_gpio_should_save’:
drivers/pinctrl/pinctrl-amd.c:741:8: error: ‘const struct pin_desc’ has no member named ‘mux_owner’
if (pd->mux_owner || pd->gpio_owner ||
^
drivers/pinctrl/pinctrl-amd.c:741:25: error: ‘const struct pin_desc’ has no member named ‘gpio_owner’
if (pd->mux_owner || pd->gpio_owner ||
We need to enable CONFIG_PINMUX for this driver as well.
Fixes: 79d2c8bede2c93f943 ("pinctrl/amd: save pin registers over suspend/resume") Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The DREQE bit of the DnFIFOSEL should be set to 1 after the DE bit of
USB-DMAC on R-Car SoCs is set to 1 after the USB-DMAC received a
zero-length packet. Otherwise, a transfer completion interruption
of USB-DMAC doesn't happen. Even if the driver changes the sequence,
normal operations (transmit/receive without zero-length packet) will
not cause any side-effects. So, this patch fixes the sequence anyway.
When KVM emulates an exit from L2 to L1, it loads L1 CR4 into the
guest CR4. Before this CR4 loading, the guest CR4 refers to L2
CR4. Because these two CR4's are in different levels of guest, we
should vmx_set_cr4() rather than kvm_set_cr4() here. The latter, which
is used to handle guest writes to its CR4, checks the guest change to
CR4 and may fail if the change is invalid.
The failure may cause trouble. Consider we start
a L1 guest with non-zero L1 PCID in use,
(i.e. L1 CR4.PCIDE == 1 && L1 CR3.PCID != 0)
and
a L2 guest with L2 PCID disabled,
(i.e. L2 CR4.PCIDE == 0)
and following events may happen:
1. If kvm_set_cr4() is used in load_vmcs12_host_state() to load L1 CR4
into guest CR4 (in VMCS01) for L2 to L1 exit, it will fail because
of PCID check. As a result, the guest CR4 recorded in L0 KVM (i.e.
vcpu->arch.cr4) is left to the value of L2 CR4.
2. Later, if L1 attempts to change its CR4, e.g., clearing VMXE bit,
kvm_set_cr4() in L0 KVM will think L1 also wants to enable PCID,
because the wrong L2 CR4 is used by L0 KVM as L1 CR4. As L1
CR3.PCID != 0, L0 KVM will inject GP to L1 guest.
Fixes: 4704d0befb072 ("KVM: nVMX: Exiting from L2 to L1") Cc: qemu-stable@nongnu.org Signed-off-by: Haozhong Zhang <haozhong.zhang@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
is_last_gpte() is not equivalent to the pseudo-code given in commit 6bb69c9b69c31 ("KVM: MMU: simplify last_pte_bitmap") because an incorrect
value of last_nonleaf_level may override the result even if level == 1.
It is critical for is_last_gpte() to return true on level == 1 to
terminate page walks. Otherwise memory corruption may occur as level
is used as an index to various data structures throughout the page
walking code. Even though the actual bug would be wherever the MMU is
initialized (as in the previous patch), be defensive and ensure here
that is_last_gpte() returns the correct value.
This patch is also enough to fix CVE-2017-12188.
Fixes: 6bb69c9b69c315200ddc2bc79aee14c0184cf5b2 Cc: Andy Honig <ahonig@google.com> Signed-off-by: Ladi Prosek <lprosek@redhat.com>
[Panic if walk_addr_generic gets an incorrect level; this is a serious
bug and it's not worth a WARN_ON where the recovery path might hide
further exploitable issues; suggested by Andrew Honig. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The shash ahash digest adaptor function may crash if given a
zero-length input together with a null SG list. This is because
it tries to read the SG list before looking at the length.
This patch fixes it by checking the length first.
Reported-by: Stephan Müller<smueller@chronox.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Tested-by: Stephan Müller <smueller@chronox.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The skcipher walk interface doesn't handle zero-length input
properly as the old blkcipher walk interface did. This is due
to the fact that the length check is done too late.
This patch moves the length check forward so that it does the
right thing.
The hid descriptor identifies the length and type of subordinate
descriptors for a device. If the received hid descriptor is smaller than
the size of the struct hid_descriptor, it is possible to cause
out-of-bounds.
In addition, if bNumDescriptors of the hid descriptor have an incorrect
value, this can also cause out-of-bounds while approaching hdesc->desc[n].
So check the size of hid descriptor and bNumDescriptors.
BUG: KASAN: slab-out-of-bounds in usbhid_parse+0x9b1/0xa20
Read of size 1 at addr ffff88006c5f8edf by task kworker/1:2/1261
Michael Sterrett reports a NULL pointer dereference on NFSv3 mounts when
CONFIG_NFS_V4 is not set because the NFS UOC rpc_wait_queue has not been
initialized. Move the initialization of the queue out of the CONFIG_NFS_V4
conditional setion.
Fixes: 7d6ddf88c4db ("NFS: Add an iocounter wait function for async RPC tasks") Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When looking for unused xbar_out lane we should also protect the set_bit()
call with the same mutex to protect against concurrent threads picking the
same ID.
Fixes: ec9bfa1e1a796 ("dmaengine: ti-dma-crossbar: dra7: Use bitops instead of idr") Signed-off-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Signed-off-by: Vinod Koul <vinod.koul@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Memory to Memory transfers does not have any special alignment needs
regarding to acnt array size, but if one of the areas are in memory mapped
regions (like PCIe memory), we need to make sure that the acnt array size
is aligned with the mem copy parameters.
Before "dmaengine: edma: Optimize memcpy operation" change the memcpy was set
up in a different way: acnt == number of bytes in a word based on
__ffs((src | dest | len), bcnt and ccnt for looping the necessary number of
words to comlete the trasnfer.
Instead of reverting the commit we can fix it to make sure that the ACNT size
is aligned to the traswnfer.
This patch fixes a regression caused by the new changes
in the "run wake" handlers.
The mei devices that support D0i3 are no longer receiving an interrupt
after entering runtime suspend state and will stall.
pci_dev_run_wake function now returns "true" for some devices
(including mei) for which it used to return "false",
arguably incorrectly as "run wake" used to mean that
wakeup signals can be generated for a device in
the working state of the system, so it could not be enabled
or disabled before too.
MEI maps runtime suspend/resume to its own defined
power gating (PG) states, (D0i3 or other depending on generation),
hence we need to go around the native PCI runtime service which
eventually brings the device into D3cold/hot state,
but the mei devices cannot wake up from D3 unlike from D0i3/PG state,
which keeps irq running.
To get around PCI device native runtime pm,
MEI uses runtime pm domain handlers which take precedence.
Signed-off-by: Alexander Usyskin <alexander.usyskin@intel.com> Signed-off-by: Tomas Winkler <tomas.winkler@intel.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Compiling ebpf_jit.c with gcc 4.9 results in a (likely spurious)
compiler warning, as gcc has detected that the variable "target" may be
used uninitialised. Since -Werror is active, this is treated as an error
and causes a kernel build failure whenever CONFIG_MIPS_EBPF_JIT is
enabled.
arch/mips/net/ebpf_jit.c: In function 'build_one_insn':
arch/mips/net/ebpf_jit.c:1118:80: error: 'target' may be used
uninitialized in this function [-Werror=maybe-uninitialized]
emit_instr(ctx, j, target);
^
cc1: all warnings being treated as errors
Fix this by initialising "target" to 0. If it really is used
uninitialised this would result in a jump to 0 and a detectable run time
failure.
Signed-off-by: Matt Redfearn <matt.redfearn@imgtec.com> Fixes: b6bd53f9c4e8 ("MIPS: Add missing file for eBPF JIT.") Cc: James Hogan <james.hogan@imgtec.com> Cc: David Daney <david.daney@cavium.com> Cc: David S. Miller <davem@davemloft.net> Cc: Colin Ian King <colin.king@canonical.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org
Patchwork: https://patchwork.linux-mips.org/patch/17375/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The FPU emulator includes 2 calls to pr_err() which are triggered by
invalid instruction encodings for MIPSr6 cmp.cond.fmt instructions.
These cases are not kernel errors, merely invalid instructions which are
already handled by delivering a SIGILL which will provide notification
that something failed in cases where that makes sense.
In cases where that SIGILL is somewhat expected & being handled, for
example when crashme happens to generate one of the affected bad
encodings, the message is printed with no useful context about what
triggered it & spams the kernel log for no good reason.
Remove the pr_err() calls to make crashme run silently & treat the bad
encodings the same way we do others, with a SIGILL & no further kernel
log output.
Signed-off-by: Paul Burton <paul.burton@imgtec.com> Fixes: f8c3c6717a71 ("MIPS: math-emu: Add support for the CMP.condn.fmt R6 instruction") Cc: linux-mips@linux-mips.org
Patchwork: https://patchwork.linux-mips.org/patch/17253/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The dummy-hcd driver calls the gadget driver's disconnect callback
under the wrong conditions. It should invoke the callback when Vbus
power is turned off, but instead it does so when the D+ pullup is
turned off.
This can cause a deadlock in the composite core when a gadget driver
is unregistered:
This is obviously bad as all of the reuseport/reuse logic was reversed,
which caused weird problems like allowing an ipv4 bind conflict if we
opened an ipv4 only socket on a port followed by an ipv6 only socket on
the same port.
Fixes: b9470c27607b ("inet: kill smallest_size and smallest_port") Reported-by: Cole Robinson <crobinso@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit fbed24bcc69d3e48c5402c371f19f5c7688871e5) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Josef Bacik [Mon, 16 Oct 2017 11:46:25 +0000 (13:46 +0200)]
net: use inet6_rcv_saddr to compare sockets
BugLink: http://bugs.launchpad.net/bugs/1722702
In ipv6_rcv_saddr_equal() we need to use inet6_rcv_saddr(sk) for the
ipv6 compare with the fast socket information to make sure we're doing
the proper comparisons.
Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") Reported-and-tested-by: Cole Robinson <crobinso@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit 7a56673b58f2414679e926bba80309a037a4fd35) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Josef Bacik [Mon, 16 Oct 2017 11:46:24 +0000 (13:46 +0200)]
net: set tb->fast_sk_family
BugLink: http://bugs.launchpad.net/bugs/1722702
We need to set the tb->fast_sk_family properly so we can use the proper
comparison function for all subsequent reuseport bind requests.
Fixes: 637bc8bbe6c0 ("inet: reset tb->fastreuseport when adding a reuseport sk") Reported-and-tested-by: Cole Robinson <crobinso@redhat.com> Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry-picked from commit cbb2fb5c72f48d3029c144be0f0e61da1c7bccf7) Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Seth Forshee <seth.forshee@canonical.com Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Michael Neuling [Fri, 6 Oct 2017 17:26:06 +0000 (13:26 -0400)]
powerpc/64s: Add workaround for P9 vector CI load issue
BugLink: http://bugs.launchpad.net/bugs/1721070
POWER9 DD2.1 and earlier has an issue where some cache inhibited
vector load will return bad data. The workaround is two part, one
firmware/microcode part triggers HMI interrupts when hitting such
loads, the other part is this patch which then emulates the
instructions in Linux.
The affected instructions are limited to lxvd2x, lxvw4x, lxvb16x and
lxvh8x.
When an instruction triggers the HMI, all threads in the core will be
sent to the HMI handler, not just the one running the vector load.
In general, these spurious HMIs are detected by the emulation code and
we just return back to the running process. Unfortunately, if a
spurious interrupt occurs on a vector load that's to normal memory we
have no way to detect that it's spurious (unless we walk the page
tables, which is very expensive). In this case we emulate the load but
we need do so using a vector load itself to ensure 128bit atomicity is
preserved.
Some additional debugfs emulated instruction counters are added also.
Signed-off-by: Michael Neuling <mikey@neuling.org> Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
[mpe: Switch CONFIG_PPC_BOOK3S_64 to CONFIG_VSX to unbreak the build] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from linux-next commit 5080332c2c893118dbc18755f35c8b0131cf0fc4) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
powerpc/mce: Move 64-bit machine check code into mce.c
BugLink: http://bugs.launchpad.net/bugs/1721070
We already have mce.c which is built for 64bit and contains other parts
of the machine check code, so move these bits in there too.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ccd3cd361341b71ae2afa596f6b470fcb32a916e) Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com> Acked-by: Colin Ian King <colin.king@canonical.com> Acked-by: Kamal Mostafa <kamal@canonical.com> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code")
introduced init_cpu_capacity_callback and init_cpu_capacity_notifier
which are referenced from initcall and are missing __init{,data}
annotations resulting the below section mismatch build warnings.
"WARNING: vmlinux.o(.text+0xbab790): Section mismatch in reference from
the function init_cpu_capacity_callback() to the variable .init.text:$x
The function init_cpu_capacity_callback() references the variable
__init $x. This is often because init_cpu_capacity_callback lacks a
__init annotation or the annotation of $x is wrong."
This patch fixes the above build warnings by adding the required annotations.
Fixes: 2ef7a2953c81 ("arm, arm64: factorize common cpu capacity default code") Cc: Juri Lelli <juri.lelli@arm.com> Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The commit bc044e8db796 ("udp: perform source validation for
mcast early demux") does not take into account that broadcast packets
lands in the same code path and they need different checks for the
source address - notably, zero source address are valid for bcast
and invalid for mcast.
As a result, 2nd and later broadcast packets with 0 source address
landing to the same socket are dropped. This breaks dhcp servers.
Since we don't have stringent performance requirements for ingress
broadcast traffic, fix it by disabling UDP early demux such traffic.
Reported-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Fixes: bc044e8db796 ("udp: perform source validation for mcast early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The UDP early demux can leverate the rx dst cache even for
multicast unconnected sockets.
In such scenario the ipv4 source address is validated only on
the first packet in the given flow. After that, when we fetch
the dst entry from the socket rx cache, we stop enforcing
the rp_filter and we even start accepting any kind of martian
addresses.
Disabling the dst cache for unconnected multicast socket will
cause large performace regression, nearly reducing by half the
max ingress tput.
Instead we factor out a route helper to completely validate an
skb source address for multicast packets and we call it from
the UDP early demux for mcast packets landing on unconnected
sockets, after successful fetching the related cached dst entry.
This still gives a measurable, but limited performance
regression:
The above figures are on top of current net tree.
Applying the net-next commit 6e617de84e87 ("net: avoid a full
fib lookup when rp_filter is disabled.") the delta with
rp_filter == 0 will decrease even more.
Fixes: 421b3885bf6d ("udp: ipv4: Add udp early demux") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit 6edfa11cb396 ("clk: samsung: Add enable/disable operation for
PLL36XX clocks") added enable/disable operations to PLL clocks. Prior that
VPLL and EPPL clocks were always enabled because the enable bit was never
touched. Those clocks have to be enabled during suspend/resume cycle,
because otherwise board fails to enter sleep mode. This patch enables them
unconditionally before entering system suspend state. System restore
function will set them to the previous state saved in the register cache
done before that unconditional enable.
Fixes: 6edfa11cb396 ("clk: samsung: Add enable/disable operation for PLL36XX clocks") Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Reviewed-by: Chanwoo Choi <cw00.choi@samsung.com> Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org> Acked-by: Sylwester Nawrocki <s.nawrocki@samsung.com> Signed-off-by: Stephen Boyd <sboyd@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Define a policy for packet pattern attributes in order to fix a
potential read over the end of the buffer during nla_get_u32()
of the NL80211_PKTPAT_OFFSET attribute.
Note that the data there can always be read due to SKB allocation
(with alignment and struct skb_shared_info at the end), but the
data might be uninitialized. This could be used to leak some data
from uninitialized vmalloc() memory, but most drivers don't allow
an offset (so you'd just get -EINVAL if the data is non-zero) or
just allow it with a fixed value - 100 or 128 bytes, so anything
above that would get -EINVAL. With brcmfmac the limit is 1500 so
(at least) one byte could be obtained.
Signed-off-by: Peng Xu <pxu@qti.qualcomm.com> Signed-off-by: Jouni Malinen <jouni@qca.qualcomm.com>
[rewrite description based on SKB allocation knowledge] Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Currently, NVMe PCI host driver is programming CMB dma address as
I/O SQs addresses. This results in failures on systems where 1:1
outbound mapping is not used (example Broadcom iProc SOCs) because
CMB BAR will be progammed with PCI bus address but NVMe PCI EP will
try to access CMB using dma address.
To have CMB working on systems without 1:1 outbound mapping, we
program PCI bus address for I/O SQs instead of dma address. This
approach will work on systems with/without 1:1 outbound mapping.
Based on a report and previous patch from Abhishek Shah.
Fixes: 8ffaadf7 ("NVMe: Use CMB for the IO SQes if available") Reported-by: Abhishek Shah <abhishek.shah@broadcom.com> Tested-by: Abhishek Shah <abhishek.shah@broadcom.com> Reviewed-by: Keith Busch <keith.busch@intel.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When under memory-pressure it is possible that the mempool which backs
the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
emergency buffers - in case it can't get a regular allocation. These
buffers are preallocated and once they are also used, they are
re-supplied with old finished requests from the same request_queue (see
mempool_free()).
The bug is, when re-supplying the emergency pool, the old requests are
not again ran through the callback mempool_t->alloc(), and thus also not
through the callback bsg_init_rq(). Thus we skip initialization, and
while the sense-buffer still should be good, scsi_request->cmd might
have become to be an invalid pointer in the meantime. When the request
is initialized in bsg.c, and the user's CDB is larger than BLK_MAX_CDB,
bsg will replace it with a custom allocated buffer, which is freed when
the user's command is finished, thus it dangles afterwards. When next a
command is sent by the user that has a smaller/similar CDB as
BLK_MAX_CDB, bsg will assume that scsi_request->cmd is backed by
scsi_request->__cmd, will not make a custom allocation, and write into
undefined memory.
Fix this by splitting bsg_init_rq() into two functions:
- bsg_init_rq() is changed to only do the allocation of the
sense-buffer, which is used to back the bsg job's reply buffer. This
pointer should never change during the lifetime of a scsi_request, so
it doesn't need re-initialization.
- bsg_initialize_rq() is a new function that makes use of
'struct request_queue's initialize_rq_fn callback (which was
introduced in v4.12). This is always called before the request is
given out via blk_get_request(). This function does the remaining
initialization that was previously done in bsg_init_rq(), and will
also do it when the request is taken from the emergency-pool of the
backing mempool.
Fixes: 50b4d485528d ("bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer") Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The hardware state readout oopses after several warnings when trying to
use HDMI on port A, if such a combination is configured in VBT. Filter
the combo out already at the VBT parsing phase.
v2: also ignore DVI (Ville)
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102889 Cc: Imre Deak <imre.deak@intel.com> Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Daniel Drake <dan@reactivated.net> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170921141920.18172-1-jani.nikula@intel.com
(cherry picked from commit d27ffc1d00327c29b3aa97f941b42f0949f9e99f) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
drm_edid_to_eld() initializes the connector ELD to zero, overwriting the
ELD connector type initialized in intel_audio_codec_enable(). If
userspace does getconnector and thus get_modes after modeset, a
subsequent audio component i915_audio_component_get_eld() call will
receive an ELD without the connector type properly set. It's fine for
HDMI, but screws up audio for DP.
Always set the ELD connector type at intel_connector_update_modes()
based on the connector type. We can drop the connector type update from
intel_audio_codec_enable().
Credits to Joseph Nuzman <jnuzman@gmail.com> for figuring this out.
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Cc: Joseph Nuzman <jnuzman@gmail.com> Reported-by: Joseph Nuzman <jnuzman@gmail.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101583 Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com> Tested-by: Joseph Nuzman <jnuzman@gmail.com> Signed-off-by: Jani Nikula <jani.nikula@intel.com> Link: https://patchwork.freedesktop.org/patch/msgid/20170919153813.29808-1-jani.nikula@intel.com
(cherry picked from commit d81fb7fd9436e81fda67e5bc8ed0713aa28d3db2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The driver was not properly configuring firmware with regard to the
type of scan. It always performed an active scan even when user-space
was requesting for passive scan, ie. the scan request was done without
any SSIDs specified.
Reported-by: Huang, Jiangyang <Jiangyang.Huang@itron.com> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Upon handling the firmware notification for scans the length was
checked properly and may result in corrupting kernel heap memory
due to buffer overruns. This fix addresses CVE-2017-0786.
Cc: Kevin Cernekee <cernekee@chromium.org> Reviewed-by: Hante Meuleman <hante.meuleman@broadcom.com> Reviewed-by: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Reviewed-by: Franky Lin <franky.lin@broadcom.com> Signed-off-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
A user may lower the max_sectors_kb setting in sysfs to accommodate
certain workloads. Previously we would always set the max I/O size to
either the block layer default or the optional preferred I/O size
reported by the device.
Keep the current heuristics for the initial setting of max_sectors_kb.
For subsequent invocations, only update the current queue limit if it
exceeds the capabilities of the hardware.
Reported-by: Don Brace <don.brace@microsemi.com> Reviewed-by: Martin Wilck <mwilck@suse.com> Tested-by: Don Brace <don.brace@microsemi.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
"A MAXIMUM UNMAP LBA COUNT field set to a non-zero value indicates the
maximum number of LBAs that may be unmapped by an UNMAP command"
"A MAXIMUM WRITE SAME LENGTH field set to a non-zero value indicates
the maximum number of contiguous logical blocks that the device server
allows to be unmapped or written in a single WRITE SAME command."
Despite the spec being clear on the topic, some devices incorrectly
expect WRITE SAME commands with the UNMAP bit set to be limited to the
value reported in MAXIMUM UNMAP LBA COUNT in the Block Limits VPD.
Implement a blacklist option that can be used to accommodate devices
with this behavior.
Reported-by: Bill Kuzeja <William.Kuzeja@stratus.com> Reported-by: Ewan D. Milne <emilne@redhat.com> Reviewed-by: Ewan D. Milne <emilne@redhat.com> Tested-by: Laurence Oberman <loberman@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The MCAST_FILTER_CMD can get quite large when we have many mcast
addresses to set (we support up to 255). So the command should be
send as NOCOPY to prevent a warning caused by too-long commands:
WARNING: CPU: 0 PID: 9700 at /root/iwlwifi/stack-dev/drivers/net/wireless/intel/iwlwifi/pcie/tx.c:1550 iwl_pcie_enqueue_hcmd+0x8c7/0xb40 [iwlwifi]
Command MCAST_FILTER_CMD (0x1d0) is too large (328 bytes)
This fixes: https://bugzilla.kernel.org/show_bug.cgi?id=196743
Currently, in PREEMPT_COUNT=n kernel, kvm_async_pf_task_wait() could call
schedule() to reschedule in some cases. This could result in
accidentally ending the current RCU read-side critical section early,
causing random memory corruption in the guest, or otherwise preempting
the currently running task inside between preempt_disable and
preempt_enable.
The difficulty to handle this well is because we don't know whether an
async PF delivered in a preemptible section or RCU read-side critical section
for PREEMPT_COUNT=n, since preempt_disable()/enable() and rcu_read_lock/unlock()
are both no-ops in that case.
To cure this, we treat any async PF interrupting a kernel context as one
that cannot be preempted, preventing kvm_async_pf_task_wait() from choosing
the schedule() path in that case.
To do so, a second parameter for kvm_async_pf_task_wait() is introduced,
so that we know whether it's called from a context interrupting the
kernel, and the parameter is set properly in all the callsites.
Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Wanpeng Li <wanpeng.li@hotmail.com> Signed-off-by: Boqun Feng <boqun.feng@gmail.com> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
In KVM's XICS-on-XIVE emulation, kvmppc_xive_get_xive() returns the
value of state->guest_server as "server". However, this value is not
set by it's counterpart kvmppc_xive_set_xive(). When the guest uses
this interface to migrate interrupts away from a CPU that is going
offline, it sees all interrupts as belonging to CPU 0, so they are
left assigned to (now) offline CPUs.
This patch removes the guest_server field from the state, and returns
act_server in it's place (that is, the CPU actually handling the
interrupt, which may differ from the one requested).
Fixes: 5af50993850a ("KVM: PPC: Book3S HV: Native usage of the XIVE interrupt controller") Signed-off-by: Sam Bobroff <sam.bobroff@au1.ibm.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The size of struct dm_name_list is different on 32-bit and 64-bit
kernels (so "(nl + 1)" differs between 32-bit and 64-bit kernels).
This mismatch caused some harmless difference in padding when using 32-bit
or 64-bit kernel. Commit 23d70c5e52dd ("dm ioctl: report event number in
DM_LIST_DEVICES") added reporting event number in the output of
DM_LIST_DEVICES_CMD. This difference in padding makes it impossible for
userspace to determine the location of the event number (the location
would be different when running on 32-bit and 64-bit kernels).
Fix the padding by using offsetof(struct dm_name_list, name) instead of
sizeof(struct dm_name_list) to determine the location of entries.
Also, the ioctl version number is incremented to 37 so that userspace
can use the version number to determine that the event number is present
and correctly located.
In addition, a global event is now raised when a DM device is created,
removed, renamed or when table is swapped, so that the user can monitor
for device changes.
Reported-by: Eugene Syromiatnikov <esyr@redhat.com> Fixes: 23d70c5e52dd ("dm ioctl: report event number in DM_LIST_DEVICES") Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
If a crypt mapping uses optional sector_size feature, additional
restrictions to mapped device segment size must be applied in
constructor, otherwise the device activation will fail later.
Fixes: 8f0009a225 ("dm crypt: optionally support larger encryption sector size") Signed-off-by: Milan Broz <gmazyland@gmail.com> Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
When CONFIG_KASAN is enabled, the "--param asan-stack=1" causes rather large
stack frames in some functions. This goes unnoticed normally because
CONFIG_FRAME_WARN is disabled with CONFIG_KASAN by default as of commit 3f181b4d8652 ("lib/Kconfig.debug: disable -Wframe-larger-than warnings with
KASAN=y").
The kernelci.org build bot however has the warning enabled and that led
me to investigate it a little further, as every build produces these warnings:
net/wireless/nl80211.c:4389:1: warning: the frame size of 2240 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1895:1: warning: the frame size of 3776 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/wireless/nl80211.c:1410:1: warning: the frame size of 2208 bytes is larger than 2048 bytes [-Wframe-larger-than=]
net/bridge/br_netlink.c:1282:1: warning: the frame size of 2544 bytes is larger than 2048 bytes [-Wframe-larger-than=]
Most of this problem is now solved in gcc-8, which can consolidate
the stack slots for the inline function arguments. On older compilers
we can add a workaround by declaring a local variable in each function
to pass the inline function argument.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Inlining these functions creates lots of stack variables that each take
64 bytes when KASAN is enabled, leading to this warning about potential
stack overflow:
drivers/net/ethernet/rocker/rocker_ofdpa.c: In function 'ofdpa_cmd_flow_tbl_add':
drivers/net/ethernet/rocker/rocker_ofdpa.c:621:1: error: the frame size of 2752 bytes is larger than 1536 bytes [-Werror=frame-larger-than=]
gcc-8 can now consolidate the stack slots itself, but on older versions
we get the same behavior by using a temporary variable that holds a
copy of the inline function argument.
Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Jean-Denis Girard noticed commit c821e7f3 "pass bytes to
btrfs_bio_alloc" (https://patchwork.kernel.org/patch/9763081/)
introduces a regression on 32 bit machines.
When CONFIG_LBDAF is _not_ defined (CONFIG_LBDAF == Support for large
(2TB+) block devices and files) sector_t is 32 bit on 32bit machines.
In the function submit_extent_page, 'sector' (which is sector_t type) is
multiplied by 512 to convert it from sectors to bytes, leading to an
overflow when the disk is bigger than 4GB (!).
The wacom_get_hdev_data function is used to find and return a reference to
the "other half" of a Wacom device (i.e., the touch device associated with
a pen, or vice-versa). To ensure these references are properly accounted
for, the function is supposed to automatically increment the refcount before
returning. This was not done, however, for devices which have pen & touch
on different interfaces of the same USB device. This can lead to a WARNING
("refcount_t: underflow; use-after-free") when removing the module or device
as we call kref_put() more times than kref_get(). Triggering an "actual" use-
after-free would be difficult since both devices will disappear nearly-
simultaneously. To silence this warning and prevent the potential error, we
need to increment the refcount for all cases within wacom_get_hdev_data.
Fixes: 41372d5d40 ("HID: wacom: Augment 'oVid' and 'oPid' with heuristics for HID_GENERIC") Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The tool ID information sent in ABS_MISC is expected to be reset to 0
when a tool leaves proximity. Not doing this can cause problems if a
tool is removed and then re-introduced. Kernel event filtering will
prevent the (identical) ABS_MISC event from being sent when the tool
re-enters proxmity. This can cause userspace to not properly set the
tool ID.
Fixes: f85c9dc678 ("HID: wacom: generic: Support tool ID and additional tool types") Signed-off-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The latest generation of pro devices (MobileStudio Pro, 2nd-gen Intuos
Pro, Cintiq Pro) send a serial number of '0' whenever the pen is too far
away for reliable communication. Userspace defines that a serial number
of '0' is invalid, so we need to be careful not to actually forward
this value. Additionally, since EMR ISDv4 devices do not support serial
numbers or tool IDs, we'd like to not send these events if they aren't
necessary.
The existing code achieves these goals by adding a check for a non-zero
serial number within the wacom_wac_pen_report function. The MSC_SERIAL
and ABS_MISC events are only sent if the serial number is non-zero. This
code fails, however when the pen for a pro device leaves proximity. When
the pen leaves prox and the tablet sends a serial of 0, wacom_wac_pen_event
dutifully clears the serial number. When wacom_wac_pen_report is called,
it does not send either the MSC_SERIAL of the exiting tool nor an ABS_MISC
event.
This patch prevents the wacom_wac_pen_event function from clearing an
already-set serial number. This ensures that we have the serial number
handy when exiting proximity, but requires us to manually clear it
afterwards to ensure the driver does not send stale data (e.g. when
switching between AES pens that report a serial nubmer of 0 for the
first few fully in-proximity packets).
Fixes: f85c9dc678 ("HID: wacom: generic: Support tool ID and additional tool types") Signed-off-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The MobileStudio Pro, Cintiq Pro, and 2nd-gen Intuos Pro devices use a
different coordinate system for their touchring and pen twist than prior
devices. Prior devices had zero aligned to the tablet's left and would
increase clockwise. Userspace expects data from the kernel to be in this
old coordinate space, so adjustments are necessary.
While the coordinate system for pen twist is formally defined by the HID
standard, no such definition existed for the touchring at the time these
tablets were introduced. Future tablets are expected to report touchring
data using the same "zero-up clockwise-increasing" coordinate system
defined for twist.
Fixes: 50066a042d ("HID: wacom: generic: Add support for height, tilt, and twist usages") Fixes: 4922cd26f0 ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface") Fixes: 60a2218698 ("HID: wacom: generic: add support for touchring") Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Reviewed-by: Ping Cheng <ping.cheng@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The wacom driver's IRQ handler for Bluetooth reports from the 2nd-gen
Intuos Pro does not correctly process negative numbers. Values for
tilt and rotation (which can go negative) are instead interpreted as
unsigned and so jump to very large values when the data should be
negative. This commit properly casts the data to ensure we report
negative numbers when necessary.
Fixes: 4922cd2 ("HID: wacom: Support 2nd-gen Intuos Pro's Bluetooth classic interface") Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit a50aac7193f1 introduces 'led.groups' and adds EKR support
for these groups. However, unlike the other devices with LEDs,
the EKR's LEDs are read-only and we shouldn't attempt to control
them in wacom_led_control().
See bug: https://sourceforge.net/p/linuxwacom/bugs/342/
The buffer allocation is not currently accounting for an extra byte for
the report id. This can cause an out of bounds access in function
i2c_hid_set_or_send_report() with reportID > 15.
So it looks like that suspend/resume has actually always been broken on
hid-rmi. The fact it worked was a rather silly coincidence that was
relying on the HID device to already be opened upon resume. This means
that so long as anything was reading the /dev/input/eventX node for for
an RMI device, it would suspend and resume correctly. As well, if
nothing happened to be keeping the HID device away it would shut off,
then the RMI driver would get confused on resume when it stopped
responding and explode.
So, call hid_hw_open() in rmi_post_resume() so we make sure that the
device is alive before we try talking to it.
This fixes RMI device suspend/resume over HID.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=196851
[jkosina@suse.cz: removed useless hunk that was zero-initializing 'ret'] Signed-off-by: Lyude <lyude@redhat.com> Cc: Andrew Duggan <aduggan@synaptics.com> Reviewed-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
We trap and emulate some instructions (e.g, mrs, deprecated instructions)
for the userspace. However the handlers for these are registered as
late_initcalls and the userspace could be up and running from the initramfs
by that time (with populate_rootfs, which is a rootfs_initcall()). This
could cause problems for the early applications ending up in failure
like :
Enforcing exclusive ownership on upper/work dirs caused a docker
regression: https://github.com/moby/moby/issues/34672.
Euan spotted the regression and pointed to the offending commit.
Vivek has brought the regression to my attention and provided this
reproducer:
Terminal 1:
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
merged/
Terminal 2:
unshare -m
Terminal 1:
umount merged
mount -t overlay -o workdir=work,lowerdir=lower,upperdir=upper none
merged/
mount: /root/overlay-testing/merged: none already mounted or mount point
busy
To fix the regression, I replaced the error with an alarming warning.
With index feature enabled, mount does fail, but logs a suggestion to
override exclusive dir protection by disabling index.
Note that index=off mount does take the inuse locks, so a concurrent
index=off will issue the warning and a concurrent index=on mount will fail.
index dentry was not released when breaking out of the loop
due to index verification error.
Fixes: 415543d5c64f ("ovl: cleanup bad and stale index entries on mount") Signed-off-by: Amir Goldstein <amir73il@gmail.com> Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The trampoline allocated by function tracer was overwriten by function_graph
tracer, and caused a memory leak. The save_global_trampoline should have
saved the previous trampoline in register_ftrace_graph() and restored it in
unregister_ftrace_graph(). But as it is implemented, save_global_trampoline was
only used in unregister_ftrace_graph as default value 0, and it overwrote the
previous trampoline's value. Causing the previous allocated trampoline to be
lost.
[
Looking further into this, I found that this was left over from when the
function and function graph tracers shared the same ftrace_ops. But in
commit 5f151b2401 ("ftrace: Fix function_profiler and function tracer
together"), the two were separated, and the save_global_trampoline no
longer was necessary (and it may have been broken back then too).
-- Steven Rostedt
]
Link: http://lkml.kernel.org/r/20170912021454.5976-1-shuwang@redhat.com Fixes: 5f151b2401 ("ftrace: Fix function_profiler and function tracer together") Signed-off-by: Shu Wang <shuwang@redhat.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Commit f4757af ("staging: panel: Fix single-open policy race condition")
introduced in 3.19-rc1 attempted to fix a race condition on the open, but
failed to properly do it and used to exit without restoring the semaphore.
This results in -EBUSY being returned after the first open error until
the module is reloaded or the system restarted (ie: consecutive to a
dual open resulting in -EBUSY or to a permission error).
[ Note for stable maintainers: the code moved from drivers/misc/panel.c
to drivers/auxdisplay/{charlcd,panel}.c during 4.12. The patch easily
applies there (modulo the renamed atomic counter) but I can provide a
tested backport if desired. ]
For reasons unknown, the stm_source removal path uses device_destroy()
to kill the underlying device object. Because device_destroy() uses
devt to look for the device to destroy and the fact that stm_source
devices don't have one (or all have the same one), it just picks the
first device in the class, which may well be the wrong one.
That is, loading stm_console and stm_heartbeat and then removing both
will die in dereferencing a freed object.
Since this should have been device_unregister() in the first place,
use it instead of device_destroy().
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com> Fixes: 7bd1d4093c2 ("stm class: Introduce an abstraction for System Trace Module devices") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Due to commit 54a66265d675 ("Drivers: hv: vmbus: Fix rescind handling"),
we need this patch to resolve the below deadlock:
after we get the mutex in vmbus_hvsock_device_unregister() and call
vmbus_device_unregister() -> device_unregister() -> ... -> device_release()
-> vmbus_device_release(), we'll get a deadlock, because
vmbus_device_release() tries to get the same mutex.
Signed-off-by: Dexuan Cui <decui@microsoft.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Till recently the expected length of bytes read by the
daemon did depend on the context. It was either hv_start_fcopy or
hv_do_fcopy. The daemon had a buffer size of two pages, which was much
larger than needed.
Now the expected length of bytes read by the
daemon changed slightly. For START_FILE_COPY it is still the size of
hv_start_fcopy. But for WRITE_TO_FILE and the other operations it is as
large as the buffer that arrived via vmbus. In case of WRITE_TO_FILE
that is slightly larger than a struct hv_do_fcopy. Since the buffer in
the daemon was still larger everything was fine.
Currently, the daemon reads only what is actually needed.
The new buffer layout is as large as a struct hv_do_fcopy, for the
WRITE_TO_FILE operation. Since the kernel expects a slightly larger
size, hvt_op_read will return -EINVAL because the daemon will read
slightly less than expected. Address this by restoring the expected
buffer size in case of WRITE_TO_FILE.
Fixes: 'c7e490fc23eb ("Drivers: hv: fcopy: convert to hv_utils_transport")' Fixes: '3f2baa8a7d2e ("Tools: hv: update buffer handling in hv_fcopy_daemon")' Signed-off-by: Olaf Hering <olaf@aepfle.de> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
The cgroup_taskset structure within the larger cgroup_mgctx structure
is supposed to be used once and then discarded. That is not really the
case in the hotplug code path:
In this case, the cgroup_migrate() function is called multiple time
with the same cgroup_mgctx structure to transfer the tasks from
one cgroup to another one-by-one. The second time cgroup_migrate()
is called, the cgroup_taskset will be in an incorrect state and so
may cause the system to panic. For example,
To allow reuse of the cgroup_mgctx structure, some fields in that
structure are now re-initialized at the end of cgroup_migrate_execute()
function call so that the structure can be reused again in a later
iteration without causing problem.
This bug was introduced in the commit e595cd706982 ("group: track
migration context in cgroup_mgctx") in 4.11. This commit moves the
cgroup_taskset initialization out of cgroup_migrate(). The commit 10467270fb3 ("cgroup: don't call migration methods if there are no
tasks to migrate") helped, but did not completely resolve the problem.
When printing the driver_override parameter when it is 4095 and 4094 bytes
long, the printing code would access invalid memory because we need count+1
bytes for printing.
Reject driver_override values of these lengths in driver_override_store().
This is in close analogy to commit 4efe874aace5 ("PCI: Don't read past the
end of sysfs "driver_override" buffer") from Sasha Levin.
As raw_cpu_generic_read() is a plain read from a raw_cpu_ptr() address,
it's possible (albeit unlikely) that the compiler will split the access
across multiple instructions.
In this_cpu_generic_read() we disable preemption but not interrupts
before calling raw_cpu_generic_read(). Thus, an interrupt could be taken
in the middle of the split load instructions. If a this_cpu_write() or
RMW this_cpu_*() op is made to the same variable in the interrupt
handling path, this_cpu_read() will return a torn value.
For native word types, we can avoid tearing using READ_ONCE(), but this
won't work in all cases (e.g. 64-bit types on most 32-bit platforms).
This patch reworks this_cpu_generic_read() to use READ_ONCE() where
possible, otherwise falling back to disabling interrupts.
Currently it's possible that on returning from the signal handler
through the restore_tm_sigcontexts() code path (e.g. from a signal
caught due to a `trap` instruction executed in the middle of an HTM
block, or a deliberately constructed sigframe) an illegal TM state
(like TS=10 TM=0, i.e. "T0") is set in SRR1 and when `rfid` sets
implicitly the MSR register from SRR1 register on return to userspace
it causes a TM Bad Thing exception.
That illegal state can be set (a) by a malicious user that disables
the TM bit by tweaking the bits in uc_mcontext before returning from
the signal handler or (b) by a sufficient number of context switches
occurring such that the load_tm counter overflows and TM is disabled
whilst in the signal handler.
This commit fixes the illegal TM state by ensuring that TM bit is
always enabled before we return from restore_tm_sigcontexts(). A small
comment correction is made as well.
When using transactional memory (TM), the CPU can be in one of six
states as far as TM is concerned, encoded in the Machine State
Register (MSR). Certain state transitions are illegal and if attempted
trigger a "TM Bad Thing" type program check exception.
If we ever hit one of these exceptions it's treated as a bug, ie. we
oops, and kill the process and/or panic, depending on configuration.
One case where we can trigger a TM Bad Thing, is when returning to
userspace after a system call or interrupt, using RFID. When this
happens the CPU first restores the user register state, in particular
r1 (the stack pointer) and then attempts to update the MSR. However
the MSR update is not allowed and so we take the program check with
the user register state, but the kernel MSR.
This tricks the exception entry code into thinking we have a bad
kernel stack pointer, because the MSR says we're coming from the
kernel, but r1 is pointing to userspace.
To avoid this we instead always switch to the emergency stack if we
take a TM Bad Thing from the kernel. That way none of the user
register values are used, other than for printing in the oops message.