]> git.proxmox.com Git - mirror_ubuntu-bionic-kernel.git/log
mirror_ubuntu-bionic-kernel.git
6 years agodrm: Allow determining if current task is output poll worker
Lukas Wunner [Wed, 14 Feb 2018 05:41:25 +0000 (06:41 +0100)]
drm: Allow determining if current task is output poll worker

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 25c058ccaf2ebbc3e250ec1e199e161f91fe27d4 upstream.

Introduce a helper to determine if the current task is an output poll
worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for the output poll worker
to finish and the worker in turn calls a ->detect callback which waits
for runtime suspend to finish.  The ->detect callback is invoked from
multiple call sites and waiting for runtime suspend to finish is the
correct thing to do except if it's executing in the context of the
worker.

v2: Expand kerneldoc to specifically mention deadlock between
    output poll worker and autosuspend worker as use case. (Lyude)

Cc: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/3549ce32e7f1467102e70d3e9cbf70c46bfe108e.1518593424.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoworkqueue: Allow retrieval of current task's work struct
Lukas Wunner [Sun, 11 Feb 2018 09:38:28 +0000 (10:38 +0100)]
workqueue: Allow retrieval of current task's work struct

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 27d4ee03078aba88c5e07dcc4917e8d01d046f38 upstream.

Introduce a helper to retrieve the current task's work struct if it is
a workqueue worker.

This allows us to fix a long-standing deadlock in several DRM drivers
wherein the ->runtime_suspend callback waits for a specific worker to
finish and that worker in turn calls a function which waits for runtime
suspend to finish.  That function is invoked from multiple call sites
and waiting for runtime suspend to finish is the correct thing to do
except if it's executing in the context of the worker.

Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Cc: Dave Airlie <airlied@redhat.com>
Cc: Ben Skeggs <bskeggs@redhat.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Lyude Paul <lyude@redhat.com>
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Link: https://patchwork.freedesktop.org/patch/msgid/2d8f603074131eb87e588d2b803a71765bd3a2fd.1518338788.git.lukas@wunner.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Always call to intel_display_set_init_power() in resume_early.
Maarten Lankhorst [Tue, 16 Jan 2018 15:53:24 +0000 (16:53 +0100)]
drm/i915: Always call to intel_display_set_init_power() in resume_early.

BugLink: http://bugs.launchpad.net/bugs/1756100
commit d13a8479f3584613b6aacbb793eae64578b8f69a upstream.

intel_power_domains_init_hw() calls set_init_power, but when using
runtime power management this call is skipped. This prevents hw readout
from taking place.

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104172
Link: https://patchwork.freedesktop.org/patch/msgid/20180116155324.75120-1-maarten.lankhorst@linux.intel.com
Fixes: bc87229f323e ("drm/i915/skl: enable PC9/10 power states during suspend-to-idle")
Cc: Nivedita Swaminathan <nivedita.swaminathan@intel.com>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Patrik Jakobsson <patrik.jakobsson@linux.intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v4.5+
Reviewed-by: Imre Deak <imre.deak@intel.com>
(cherry picked from commit ac25dfed15d470d7f23dd817e965b54aa3f94a1e)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoscsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS
himanshu.madhani@cavium.com [Mon, 12 Feb 2018 18:28:14 +0000 (10:28 -0800)]
scsi: qla2xxx: Fix NULL pointer crash due to active timer for ABTS

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 1514839b366417934e2f1328edb50ed1e8a719f5 upstream.

This patch fixes NULL pointer crash due to active timer running for abort
IOCB.

From crash dump analysis it was discoverd that get_next_timer_interrupt()
encountered a corrupted entry on the timer list.

 #9 [ffff95e1f6f0fd40] page_fault at ffffffff914fe8f8
    [exception RIP: get_next_timer_interrupt+440]
    RIP: ffffffff90ea3088  RSP: ffff95e1f6f0fdf0  RFLAGS: 00010013
    RAX: ffff95e1f6451028  RBX: 000218e2389e5f40  RCX: 00000001232ad600
    RDX: 0000000000000001  RSI: ffff95e1f6f0fdf0  RDI: 0000000001232ad6
    RBP: ffff95e1f6f0fe40   R8: ffff95e1f6451188   R9: 0000000000000001
    R10: 0000000000000016  R11: 0000000000000016  R12: 00000001232ad5f6
    R13: ffff95e1f6450000  R14: ffff95e1f6f0fdf8  R15: ffff95e1f6f0fe10
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018

Looking at the assembly of get_next_timer_interrupt(), address came
from %r8 (ffff95e1f6451188) which is pointing to list_head with single
entry at ffff95e5ff621178.

 0xffffffff90ea307a <get_next_timer_interrupt+426>:      mov    (%r8),%rdx
 0xffffffff90ea307d <get_next_timer_interrupt+429>:      cmp    %r8,%rdx
 0xffffffff90ea3080 <get_next_timer_interrupt+432>:      je     0xffffffff90ea30a7 <get_next_timer_interrupt+471>
 0xffffffff90ea3082 <get_next_timer_interrupt+434>:      nopw   0x0(%rax,%rax,1)
 0xffffffff90ea3088 <get_next_timer_interrupt+440>:      testb  $0x1,0x18(%rdx)

 crash> rd ffff95e1f6451188 10
 ffff95e1f6451188:  ffff95e5ff621178 ffff95e5ff621178   x.b.....x.b.....
 ffff95e1f6451198:  ffff95e1f6451198 ffff95e1f6451198   ..E.......E.....
 ffff95e1f64511a8:  ffff95e1f64511a8 ffff95e1f64511a8   ..E.......E.....
 ffff95e1f64511b8:  ffff95e77cf509a0 ffff95e77cf509a0   ...|.......|....
 ffff95e1f64511c8:  ffff95e1f64511c8 ffff95e1f64511c8   ..E.......E.....

 crash> rd ffff95e5ff621178 10
 ffff95e5ff621178:  0000000000000001 ffff95e15936aa00   ..........6Y....
 ffff95e5ff621188:  0000000000000000 00000000ffffffff   ................
 ffff95e5ff621198:  00000000000000a0 0000000000000010   ................
 ffff95e5ff6211a8:  ffff95e5ff621198 000000000000000c   ..b.............
 ffff95e5ff6211b8:  00000f5800000000 ffff95e751f8d720   ....X... ..Q....

 ffff95e5ff621178 belongs to freed mempool object at ffff95e5ff621080.

 CACHE            NAME                 OBJSIZE  ALLOCATED     TOTAL  SLABS  SSIZE
 ffff95dc7fd74d00 mnt_cache                384      19785     24948    594    16k
   SLAB              MEMORY            NODE  TOTAL  ALLOCATED  FREE
   ffffdc5dabfd8800  ffff95e5ff620000     1     42         29    13
   FREE / [ALLOCATED]
    ffff95e5ff621080  (cpu 6 cache)

Examining the contents of that memory reveals a pointer to a constant string
in the driver, "abort\0", which is set by qla24xx_async_abort_cmd().

 crash> rd ffffffffc059277c 20
 ffffffffc059277c:  6e490074726f6261 0074707572726574   abort.Interrupt.
 ffffffffc059278c:  00676e696c6c6f50 6920726576697244   Polling.Driver i
 ffffffffc059279c:  646f6d207325206e 6974736554000a65   n %s mode..Testi
 ffffffffc05927ac:  636976656420676e 786c252074612065   ng device at %lx
 ffffffffc05927bc:  6b63656843000a2e 646f727020676e69   ...Checking prod
 ffffffffc05927cc:  6f20444920746375 0a2e706968632066   uct ID of chip..
 ffffffffc05927dc:  5120646e756f4600 204130303232414c   .Found QLA2200A
 ffffffffc05927ec:  43000a2e70696843 20676e696b636568   Chip...Checking
 ffffffffc05927fc:  65786f626c69616d 6c636e69000a2e73   mailboxes...incl
 ffffffffc059280c:  756e696c2f656475 616d2d616d642f78   ude/linux/dma-ma

 crash> struct -ox srb_iocb
 struct srb_iocb {
           union {
               struct {...} logio;
               struct {...} els_logo;
               struct {...} tmf;
               struct {...} fxiocb;
               struct {...} abt;
               struct ct_arg ctarg;
               struct {...} mbx;
               struct {...} nack;
    [0x0 ] } u;
    [0xb8] struct timer_list timer;
    [0x108] void (*timeout)(void *);
 }
 SIZE: 0x110

 crash> ! bc
 ibase=16
 obase=10
 B8+40
 F8

The object is a srb_t, and at offset 0xf8 within that structure
(i.e. ffff95e5ff621080 + f8 -> ffff95e5ff621178) is a struct timer_list.

Cc: <stable@vger.kernel.org> #4.4+
Fixes: 4440e46d5db7 ("[SCSI] qla2xxx: Add IOCB Abort command asynchronous handling.")
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoscsi: core: Avoid that ATA error handling can trigger a kernel hang or oops
Bart Van Assche [Thu, 22 Feb 2018 19:30:20 +0000 (11:30 -0800)]
scsi: core: Avoid that ATA error handling can trigger a kernel hang or oops

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 3be8828fc507cdafe7040a3dcf361a2bcd8e305b upstream.

Avoid that the recently introduced call_rcu() call in the SCSI core
triggers a double call_rcu() call.

Reported-by: Natanael Copa <ncopa@alpinelinux.org>
Reported-by: Damien Le Moal <damien.lemoal@wdc.com>
References: https://bugzilla.kernel.org/show_bug.cgi?id=198861
Fixes: 3bd6f43f5cb3 ("scsi: core: Ensure that the SCSI error handler gets woken up")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Reviewed-by: Damien Le Moal <damien.lemoal@wdc.com>
Tested-by: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Natanael Copa <ncopa@alpinelinux.org>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: Alexandre Oliva <oliva@gnu.org>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Hannes Reinecke <hare@suse.com>
Cc: Johannes Thumshirn <jthumshirn@suse.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915/perf: fix perf stream opening lock
Lionel Landwerlin [Thu, 1 Mar 2018 11:06:13 +0000 (11:06 +0000)]
drm/i915/perf: fix perf stream opening lock

BugLink: http://bugs.launchpad.net/bugs/1756100
commit f616f2830c1ed79245cfeca900f7e8a3b3c08c06 upstream.

We're seeing on CI that some contexts don't have the programmed OA
period timer that directs the OA unit on how often to write reports.

The issue is that we're not holding the drm lock from when we edit the
context images down to when we set the exclusive_stream variable. This
leaves a window for the deferred context allocation to call
i915_oa_init_reg_state() that will not program the expected OA timer
value, because we haven't set the exclusive_stream yet.

v2: Drop need_lock from gen8_configure_all_contexts() (Matt)

Signed-off-by: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
Reviewed-by: Matthew Auld <matthew.auld@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Fixes: 701f8231a2f ("drm/i915/perf: prune OA configs")
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=102254
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103715
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103755
Link: https://patchwork.freedesktop.org/patch/msgid/20180301110613.1737-1-lionel.g.landwerlin@intel.com
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: intel-gfx@lists.freedesktop.org
Cc: <stable@vger.kernel.org> # v4.14+
(cherry picked from commit 41d3fdcd15d5ecf29cc73e8b79c2327ebb54b960)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Try EDID bitbanging on HDMI after failed read
Stefan Brüns [Sun, 31 Dec 2017 22:34:54 +0000 (23:34 +0100)]
drm/i915: Try EDID bitbanging on HDMI after failed read

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 90024a5951029685acc5396258f1b0de9b23cf4a upstream.

The ACK/NACK implementation as found in e.g. the G965 has the falling
clock edge and the release of the data line after the ACK for the received
byte happen at the same time.

This is conformant with the I2C specification, which allows a zero hold
time, see footnote [3]: "A device must internally provide a hold time of
at least 300 ns for the SDA signal (with respect to the V IH(min) of the
SCL signal) to bridge the undefined region of the falling edge of SCL."

Some HDMI-to-VGA converters apparently fail to adhere to this requirement
and latch SDA at the falling clock edge, so instead of an ACK
sometimes a NACK is read and the slave (i.e. the EDID ROM) ends the
transfer.

The bitbanging releases the data line for the ACK only 1/4 bit time after
the falling clock edge, so a slave will see the correct value no matter
if it samples at the rising or the falling clock edge or in the center.

Fallback to bitbanging is already done for the CRT connector.

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=92685
Signed-off-by: Stefan Brüns <stefan.bruens@rwth-aachen.de>
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Link: https://patchwork.freedesktop.org/patch/msgid/a39f080b-81a5-4c93-b3f7-7cb0a58daca3@rwthex-w2-a.rwth-ad.de
(cherry picked from commit cfb926e148e99acc02351d72e8b85e32b5f786ef)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Update watermark state correctly in sanitize_watermarks
Maarten Lankhorst [Fri, 10 Nov 2017 11:34:53 +0000 (12:34 +0100)]
drm/i915: Update watermark state correctly in sanitize_watermarks

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 556fe36d09da5f82879e92bafa0371b4b79f7d6f upstream.

We no longer use intel_crtc->wm.active for watermarks any more,
which was incorrect. But this uncovered a bug in sanitize_watermarks(),
which meant that we wrote the correct watermarks, but the next
update would still use the wrong hw watermarks for calculating.
This caused all further updates to fail with -EINVAL and the
log would reveal an error like the one below:

[   10.043902] [drm:ilk_validate_wm_level.part.8 [i915]] Sprite WM0 too large 56 (max 0)
[   10.043960] [drm:ilk_validate_pipe_wm [i915]] LP0 watermark invalid
[   10.044030] [drm:intel_crtc_atomic_check [i915]] No valid intermediate pipe watermarks are possible

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Fixes: b6b178a77210 ("drm/i915: Calculate ironlake intermediate watermarks correctly, v2.")
Cc: stable@vger.kernel.org #v4.8+
Link: https://patchwork.freedesktop.org/patch/msgid/20171110113503.16253-1-maarten.lankhorst@linux.intel.com
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Disable DC states around GMBUS on GLK
Ville Syrjälä [Fri, 8 Dec 2017 21:37:36 +0000 (23:37 +0200)]
drm/i915: Disable DC states around GMBUS on GLK

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 156961ae7bdf6feb72778e8da83d321b273343fd upstream.

Prevent the DMC from destroying GMBUS transfers on GLK. GMBUS
lives in PG1 so DC off is all we need.

Cc: stable@vger.kernel.org
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20171208213739.16388-1-ville.syrjala@linux.intel.com
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Clear the in-use marker on execbuf failure
Chris Wilson [Mon, 19 Feb 2018 14:01:44 +0000 (14:01 +0000)]
drm/i915: Clear the in-use marker on execbuf failure

BugLink: http://bugs.launchpad.net/bugs/1756100
commit e659d14ed48096f87a678e7ebbdf286a817b4d0e upstream.

If we fail to unbind the vma (due to a signal on an active buffer that
needs to be moved for the next execbuf), then we need to clear the
persistent tracking state we setup for this execbuf.

Fixes: c7c6e46f913b ("drm/i915: Convert execbuf to use struct-of-array packing for critical fields")
Testcase: igt/gem_fenced_exec_thrash/no-spare-fences-busy*
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: <stable@vger.kernel.org> # v4.14+
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180219140144.24004-1-chris@chris-wilson.co.uk
(cherry picked from commit ed2f3532321083cf40e4da4e36234880e0136136)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Fix rsvd2 mask when out-fence is returned
Daniele Ceraolo Spurio [Wed, 14 Feb 2018 19:18:25 +0000 (11:18 -0800)]
drm/i915: Fix rsvd2 mask when out-fence is returned

BugLink: http://bugs.launchpad.net/bugs/1756100
commit b1b13780ab06ef8c770dd9cbe31dac549a11630e upstream.

GENMASK_ULL wants the high bit of the mask first. The current value
cancels the in-fence when an out-fence is returned.

Fixes: fec0445caa273 ("drm/i915: Support explicit fencing for execbuf")
Testcase: igt/gem_exec_fence/keep-in-fence*
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Link: https://patchwork.freedesktop.org/patch/msgid/20180214191827.8465-1-daniele.ceraolospurio@intel.com
Cc: <stable@vger.kernel.org> # v4.12+
(cherry picked from commit b6a88e4a804cf5a71159906e16df2c1fc7196f92)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915/audio: fix check for av_enc_map overflow
Jani Nikula [Wed, 14 Feb 2018 17:38:40 +0000 (19:38 +0200)]
drm/i915/audio: fix check for av_enc_map overflow

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 72a6d72c2cd03bba7b70117b63dea83d2de88057 upstream.

Turns out -1 >= ARRAY_SIZE() is always true. Move the bounds check where
we know pipe >= 0 and next to the array indexing where it makes most
sense.

Fixes: 9965db26ac05 ("drm/i915: Check for fused or unused pipes")
Fixes: 0b7029b7e43f ("drm/i915: Check for fused or unused pipes")
Cc: <stable@vger.kernel.org> # v4.10+
Cc: Mika Kahola <mika.kahola@intel.com>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: intel-gfx@lists.freedesktop.org
Reviewed-by: Dhinakaran Pandiyan <dhinakaran.pandiyan@intel.com>
Reviewed-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20180214173840.25360-1-jani.nikula@intel.com
(cherry picked from commit cdb3db8542d854bd678d60cd28861b042e191672)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agodrm/i915: Check for fused or unused pipes
Mika Kahola [Mon, 18 Dec 2017 08:04:03 +0000 (10:04 +0200)]
drm/i915: Check for fused or unused pipes

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 9965db26ac0548648309f506dc155a92daa2158f upstream.

We may have fused or unused pipes in our system. Let's check that the pipe
in question is within limits of accessible pipes. In case, that we are not
able to access the pipe, we return early with a warning.

v2: Rephrasing of the commit message (Jani)

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=103206
Reported-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Jaswinder Singh Rajput <jaswinder@perfectintelligent.com>
Suggested-by: Jani Nikula <jani.nikula@intel.com>
Reviewed-by: Jani Nikula <jani.nikula@intel.com>
Signed-off-by: Mika Kahola <mika.kahola@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/1513584243-12607-1-git-send-email-mika.kahola@intel.com
(cherry picked from commit 0b7029b7e43fda1304c181a3ade0b429b9edcd9d)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: <stable@vger.kernel.org> # v4.10+
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoregulator: stm32-vrefbuf: fix check on ready flag
Fabrice Gasnier [Thu, 8 Feb 2018 13:43:05 +0000 (14:43 +0100)]
regulator: stm32-vrefbuf: fix check on ready flag

BugLink: http://bugs.launchpad.net/bugs/1756100
commit f63248fac563125fd5a2f0bc780ce7a299872cab upstream.

stm32_vrefbuf_enable() wrongly checks VRR bit: 0 stands for not ready,
1 for ready. It currently checks the opposite.
This makes enable routine to exit immediately without waiting for ready
flag.

Fixes: 0cdbf481e927 ("regulator: Add support for stm32-vrefbuf")
Signed-off-by: Fabrice Gasnier <fabrice.gasnier@st.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonet/smc: fix NULL pointer dereference on sock_create_kern() error path
Davide Caratti [Wed, 28 Feb 2018 11:44:09 +0000 (12:44 +0100)]
net/smc: fix NULL pointer dereference on sock_create_kern() error path

BugLink: http://bugs.launchpad.net/bugs/1756100
commit a5dcb73b96a9d21431048bdaac02d9e96f386da3 upstream.

when sock_create_kern(..., a) returns an error, 'a' might not be a valid
pointer, so it shouldn't be dereferenced to read a->sk->sk_sndbuf and
and a->sk->sk_rcvbuf; not doing that caused the following crash:

general protection fault: 0000 [#1] SMP KASAN
Dumping ftrace buffer:
    (ftrace buffer empty)
Modules linked in:
CPU: 0 PID: 4254 Comm: syzkaller919713 Not tainted 4.16.0-rc1+ #18
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
RIP: 0010:smc_create+0x14e/0x300 net/smc/af_smc.c:1410
RSP: 0018:ffff8801b06afbc8 EFLAGS: 00010202
RAX: dffffc0000000000 RBX: ffff8801b63457c0 RCX: ffffffff85a3e746
RDX: 0000000000000004 RSI: 00000000ffffffff RDI: 0000000000000020
RBP: ffff8801b06afbf0 R08: 00000000000007c0 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
R13: ffff8801b6345c08 R14: 00000000ffffffe9 R15: ffffffff8695ced0
FS:  0000000001afb880(0000) GS:ffff8801db200000(0000)
knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020000040 CR3: 00000001b0721004 CR4: 00000000001606f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
  __sock_create+0x4d4/0x850 net/socket.c:1285
  sock_create net/socket.c:1325 [inline]
  SYSC_socketpair net/socket.c:1409 [inline]
  SyS_socketpair+0x1c0/0x6f0 net/socket.c:1366
  do_syscall_64+0x282/0x940 arch/x86/entry/common.c:287
  entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x4404b9
RSP: 002b:00007fff44ab6908 EFLAGS: 00000246 ORIG_RAX: 0000000000000035
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00000000004404b9
RDX: 0000000000000000 RSI: 0000000000000001 RDI: 000000000000002b
RBP: 00007fff44ab6910 R08: 0000000000000002 R09: 00007fff44003031
R10: 0000000020000040 R11: 0000000000000246 R12: ffffffffffffffff
R13: 0000000000000006 R14: 0000000000000000 R15: 0000000000000000
Code: 48 c1 ea 03 80 3c 02 00 0f 85 b3 01 00 00 4c 8b a3 48 04 00 00 48
b8
00 00 00 00 00 fc ff df 49 8d 7c 24 20 48 89 fa 48 c1 ea 03 <80> 3c 02
00
0f 85 82 01 00 00 4d 8b 7c 24 20 48 b8 00 00 00 00
RIP: smc_create+0x14e/0x300 net/smc/af_smc.c:1410 RSP: ffff8801b06afbc8

Fixes: cd6851f30386 smc: remote memory buffers (RMBs)
Reported-and-tested-by: syzbot+aa0227369be2dcc26ebe@syzkaller.appspotmail.com
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agomac80211_hwsim: don't use WQ_MEM_RECLAIM
Johannes Berg [Wed, 24 Jan 2018 07:40:51 +0000 (08:40 +0100)]
mac80211_hwsim: don't use WQ_MEM_RECLAIM

BugLink: http://bugs.launchpad.net/bugs/1756100
commit ce162bfbc0b601841886965baba14877127c7c7c upstream.

We're obviously not part of a memory reclaim path, so don't set the flag.

This also causes a warning in check_flush_dependency() since we end up
in a code path that flushes a non-reclaim workqueue, and we shouldn't do
that if we were really part of reclaim.

Reported-by: syzbot+41cdaf4232c50e658934@syzkaller.appspotmail.com
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoIB/uverbs: Improve lockdep_check
Jason Gunthorpe [Tue, 13 Feb 2018 10:18:38 +0000 (12:18 +0200)]
IB/uverbs: Improve lockdep_check

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 104f268d439b3c21c83708e52946a4d8d37f3d0f upstream.

This is really being used as an assert that the expected usecnt
is being held and implicitly that the usecnt is valid. Rename it to
assert_uverbs_usecnt and tighten the checks to only accept valid
values of usecnt (eg 0 and < -1 are invalid).

The tigher checkes make the assertion cover more cases and is more
likely to find bugs via syzkaller/etc.

Fixes: 3832125624b7 ("IB/core: Add support for idr types")
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: cpumap: use GFP_KERNEL instead of GFP_ATOMIC in __cpu_map_entry_alloc()
Jason Wang [Wed, 14 Feb 2018 14:17:34 +0000 (22:17 +0800)]
bpf: cpumap: use GFP_KERNEL instead of GFP_ATOMIC in __cpu_map_entry_alloc()

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 7fc17e909edfb9bf421ee04e981d3d474175c7c7 upstream.

There're several implications after commit 0bf7800f1799 ("ptr_ring:
try vmalloc() when kmalloc() fails") with the using of vmalloc() since
can't allow GFP_ATOMIC but mandate GFP_KERNEL. This will lead a WARN
since cpumap try to call with GFP_ATOMIC. Fortunately, entry
allocation of cpumap can only be done through syscall path which means
GFP_ATOMIC is not necessary, so fixing this by replacing GFP_ATOMIC
with GFP_KERNEL.

Reported-by: syzbot+1a240cdb1f4cc88819df@syzkaller.appspotmail.com
Fixes: 0bf7800f1799 ("ptr_ring: try vmalloc() when kmalloc() fails")
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: akpm@linux-foundation.org
Cc: dhowells@redhat.com
Cc: hannes@cmpxchg.org
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoRDMA/mlx5: Fix integer overflow while resizing CQ
Leon Romanovsky [Wed, 7 Mar 2018 13:29:09 +0000 (15:29 +0200)]
RDMA/mlx5: Fix integer overflow while resizing CQ

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 28e9091e3119933c38933cb8fc48d5618eb784c8 upstream.

The user can provide very large cqe_size which will cause to integer
overflow as it can be seen in the following UBSAN warning:

=======================================================================
UBSAN: Undefined behaviour in drivers/infiniband/hw/mlx5/cq.c:1192:53
signed integer overflow:
64870 * 65536 cannot be represented in type 'int'
CPU: 0 PID: 267 Comm: syzkaller605279 Not tainted 4.15.0+ #90 Hardware
name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
Call Trace:
 dump_stack+0xde/0x164
 ? dma_virt_map_sg+0x22c/0x22c
 ubsan_epilogue+0xe/0x81
 handle_overflow+0x1f3/0x251
 ? __ubsan_handle_negate_overflow+0x19b/0x19b
 ? lock_acquire+0x440/0x440
 mlx5_ib_resize_cq+0x17e7/0x1e40
 ? cyc2ns_read_end+0x10/0x10
 ? native_read_msr_safe+0x6c/0x9b
 ? cyc2ns_read_end+0x10/0x10
 ? mlx5_ib_modify_cq+0x220/0x220
 ? sched_clock_cpu+0x18/0x200
 ? lookup_get_idr_uobject+0x200/0x200
 ? rdma_lookup_get_uobject+0x145/0x2f0
 ib_uverbs_resize_cq+0x207/0x3e0
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ib_uverbs_write+0x7f9/0xef0
 ? cyc2ns_read_end+0x10/0x10
 ? print_irqtrace_events+0x280/0x280
 ? ib_uverbs_ex_create_cq+0x250/0x250
 ? uverbs_devnode+0x110/0x110
 ? sched_clock_cpu+0x18/0x200
 ? do_raw_spin_trylock+0x100/0x100
 ? __lru_cache_add+0x16e/0x290
 __vfs_write+0x10d/0x700
 ? uverbs_devnode+0x110/0x110
 ? kernel_read+0x170/0x170
 ? sched_clock_cpu+0x18/0x200
 ? security_file_permission+0x93/0x260
 vfs_write+0x1b0/0x550
 SyS_write+0xc7/0x1a0
 ? SyS_read+0x1a0/0x1a0
 ? trace_hardirqs_on_thunk+0x1a/0x1c
 entry_SYSCALL_64_fastpath+0x1e/0x8b
RIP: 0033:0x433549
RSP: 002b:00007ffe63bd1ea8 EFLAGS: 00000217
=======================================================================

Cc: syzkaller <syzkaller@googlegroups.com>
Cc: <stable@vger.kernel.org> # 3.13
Fixes: bde51583f49b ("IB/mlx5: Add support for resize CQ")
Reported-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoRDMA/ucma: Check that user doesn't overflow QP state
Leon Romanovsky [Wed, 7 Mar 2018 16:49:16 +0000 (18:49 +0200)]
RDMA/ucma: Check that user doesn't overflow QP state

BugLink: http://bugs.launchpad.net/bugs/1756100
commit a5880b84430316e3e1c1f5d23aa32ec6000cc717 upstream.

The QP state is limited and declared in enum ib_qp_state,
but ucma user was able to supply any possible (u32) value.

Reported-by: syzbot+0df1ab766f8924b1edba@syzkaller.appspotmail.com
Fixes: 75216638572f ("RDMA/cma: Export rdma cm interface to userspace")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoRDMA/ucma: Limit possible option size
Leon Romanovsky [Wed, 7 Mar 2018 12:49:09 +0000 (14:49 +0200)]
RDMA/ucma: Limit possible option size

BugLink: http://bugs.launchpad.net/bugs/1756100
commit 6a21dfc0d0db7b7e0acedce67ca533a6eb19283c upstream.

Users of ucma are supposed to provide size of option level,
in most paths it is supposed to be equal to u8 or u16, but
it is not the case for the IB path record, where it can be
multiple of struct ib_path_rec_data.

This patch takes simplest possible approach and prevents providing
values more than possible to allocate.

Reported-by: syzbot+a38b0e9f694c379ca7ce@syzkaller.appspotmail.com
Fixes: 7ce86409adcd ("RDMA/ucma: Allow user space to set service type")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoRevert "UBUNTU: SAUCE: ALSA: hda/realtek - Add support headset mode for DELL WYSE"
Thadeu Lima de Souza Cascardo [Thu, 15 Mar 2018 14:30:23 +0000 (11:30 -0300)]
Revert "UBUNTU: SAUCE: ALSA: hda/realtek - Add support headset mode for DELL WYSE"

BugLink: http://bugs.launchpad.net/bugs/1756100
This reverts commit 4127709ed096728061d252323ceb5ae947b84079.

It has been applied to v4.15.10, where some lines have changed positions.
Preferring the upstream stable version to avoid future conflicts.

Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonfp: flower: prioritize stats updates
Pieter Jansen van Vuuren [Thu, 15 Mar 2018 19:50:24 +0000 (15:50 -0400)]
nfp: flower: prioritize stats updates

BugLink: http://bugs.launchpad.net/bugs/1752061
Previously it was possible to interrupt processing stats updates because
they were handled in a work queue. Interrupting the stats updates could
lead to a situation where we backup the control message queue. This patch
moves the stats update processing out of the work queue to be processed as
soon as hardware sends a request.

Reported-by: Louis Peens <louis.peens@netronome.com>
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 01c15e93a78cfcf45cc32d07aa38bdc84250f569)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agonvme-pci: Fix EEH failure on ppc
Wen Xiong [Wed, 14 Mar 2018 22:03:07 +0000 (18:03 -0400)]
nvme-pci: Fix EEH failure on ppc

BugLink: http://bugs.launchpad.net/bugs/1753371
Triggering PPC EEH detection and handling requires a memory mapped read
failure. The NVMe driver removed the periodic health check MMIO, so
there's no early detection mechanism to trigger the recovery. Instead,
the detection now happens when the nvme driver handles an IO timeout
event. This takes the pci channel offline, so we do not want the driver
to proceed with escalating its own recovery efforts that may conflict
with the EEH handler.

This patch ensures the driver will observe the channel was set to offline
after a failed MMIO read and resets the IO timer so the EEH handler has
a chance to recover the device.

Signed-off-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
[updated change log]
Signed-off-by: Keith Busch <keith.busch@intel.com>
(cherry picked from commit 651438bb0af5213f1f70d66e75bf11d08cb5537a)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agowatchdog: sbsa: use 32-bit read for WCV
Jayachandran C [Tue, 13 Mar 2018 22:51:31 +0000 (16:51 -0600)]
watchdog: sbsa: use 32-bit read for WCV

BugLink: https://bugs.launchpad.net/bugs/1755595
According to SBSA spec v3.1 section 5.3:
  All registers are 32 bits in size and should be accessed using
  32-bit reads and writes. If an access size other than 32 bits
  is used then the results are IMPLEMENTATION DEFINED.
  [...]
  The Generic Watchdog is little-endian

The current code uses readq to read the watchdog compare register
which does a 64-bit access. This fails on ThunderX2 which does not
implement 64-bit access to this register.

Fix this by using lo_hi_readq() that does two 32-bit reads.

Signed-off-by: Jayachandran C <jnair@caviumnetworks.com>
Reviewed-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Wim Van Sebroeck <wim@iguana.be>
(cherry picked from commit 93ac3deb7c220cbcec032a967220a1f109d58431)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoUBUNTU: d-i: Add netsec to nic-modules
dann frazier [Tue, 13 Mar 2018 17:31:32 +0000 (11:31 -0600)]
UBUNTU: d-i: Add netsec to nic-modules

Needed for netboot installs on the SynQuacer boards.

Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoKVM: s390: add vcpu stat counters for many instruction
Christian Borntraeger [Tue, 23 Jan 2018 12:28:40 +0000 (13:28 +0100)]
KVM: s390: add vcpu stat counters for many instruction

BugLink: http://bugs.launchpad.net/bugs/1755132
The overall instruction counter is larger than the sum of the
single counters. We should try to catch all instruction handlers
to make this match the summary counter.
Let us add sck,tb,sske,iske,rrbe,tb,tpi,tsch,lpsw,pswe....
and remove other unused ones.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Acked-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
(cherry picked from commit a37cb07a30f0a181bc45c6970e486ac2992e9cde)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoKVM: s390: diagnoses are instructions as well
Christian Borntraeger [Wed, 24 Jan 2018 11:27:01 +0000 (12:27 +0100)]
KVM: s390: diagnoses are instructions as well

BugLink: http://bugs.launchpad.net/bugs/1755132
Make the diagnose counters also appear as instruction counters.

Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.vnet.ibm.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
(cherry picked from commit 866c138c32547b690e4cab3cd209c763508a95ab)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoCIFS: dump IPC tcon in debug proc file
Aurelien Aptel [Fri, 9 Mar 2018 08:33:54 +0000 (09:33 +0100)]
CIFS: dump IPC tcon in debug proc file

BugLink: http://bugs.launchpad.net/bugs/1747572
dump it as first share with an "IPC: " prefix.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 02cf5905e35df7e08691b6becda167858486da9a)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoCIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl
Aurelien Aptel [Fri, 9 Mar 2018 08:33:53 +0000 (09:33 +0100)]
CIFS: use tcon_ipc instead of use_ipc parameter of SMB2_ioctl

BugLink: http://bugs.launchpad.net/bugs/1747572
Since IPC now has a tcon object, the caller can just pass it. This
allows domain-based DFS requests to work with smb2+.

Link: https://bugzilla.samba.org/show_bug.cgi?id=12917
Fixes: 9d49640a21bf ("CIFS: implement get_dfs_refer for SMB2+")
Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(cherry picked from commit 63a83b861c47dba9e0f46b98423723a6a3d97fb1)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoCIFS: make IPC a regular tcon
Aurelien Aptel [Fri, 9 Mar 2018 08:33:52 +0000 (09:33 +0100)]
CIFS: make IPC a regular tcon

BugLink: http://bugs.launchpad.net/bugs/1747572
* Remove ses->ipc_tid.
* Make IPC$ regular tcon.
* Add a direct pointer to it in ses->tcon_ipc.
* Distinguish PIPE tcon from IPC tcon by adding a tcon->pipe flag. All
  IPC tcons are pipes but not all pipes are IPC.
* All TreeConnect functions now cannot take a NULL tcon object.

The IPC tcon has the same lifetime as the session it belongs to. It is
created when the session is created and destroyed when the session is
destroyed.

Since no mounts directly refer to the IPC tcon, its refcount should
always be set to initialisation value (1). Thus we make sure
cifs_put_tcon() skips it.

If the mount request resulting in a new session being created requires
encryption, try to require it too for IPC.

* set SERVER_NAME_LENGTH to serverName actual size

The maximum length of an ipv6 string representation is defined in
INET6_ADDRSTRLEN as 45+1 for null but lets keep what we know works.

Signed-off-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com>
(back ported from commit b327a717e506980399464e304e363f94f95eb7a1)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoi2c: octeon: Prevent error message on bus error
Jan Glauber [Wed, 7 Mar 2018 16:11:33 +0000 (17:11 +0100)]
i2c: octeon: Prevent error message on bus error

BugLink: https://bugs.launchpad.net/bugs/1754076
The error message:

[Fri Feb 16 13:42:13 2018] i2c-thunderx 0000:01:09.4: unhandled state: 0

is mis-leading as state 0 (bus error) is not an unknown state.

Return -EIO as before but avoid printing the message. Also rename
STAT_ERROR to STATE_BUS_ERROR.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
Signed-off-by: Wolfram Sang <wsa@the-dreams.de>
(cherry picked from commit 7c4246797b84e55e2dfaaf8a18033de9df7c18c1)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Acked-by: Kleber Sacilotto de Souza <kleber.souza@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
6 years agoscsi: qla2xxx: Fix memory corruption during hba reset test
Quinn Tran [Tue, 13 Mar 2018 15:31:45 +0000 (11:31 -0400)]
scsi: qla2xxx: Fix memory corruption during hba reset test

BugLink: http://bugs.launchpad.net/bugs/1750441
This patch fixes memory corrpution while performing HBA Reset test.

Following stack trace is seen:

[  466.397219] BUG: unable to handle kernel NULL pointer dereference at 0000000000000020
[  466.433669] IP: [<ffffffffc06f5dd0>] qlt_free_session_done+0x260/0x5f0 [qla2xxx]
[  466.467731] PGD 0
[  466.476718] Oops: 0000 [#1] SMP

Signed-off-by: Quinn Tran <quinn.tran@cavium.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com>
Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 2ce87cc5b269510de9ca1185ca8a6e10ec78c069)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler
Paul Mackerras [Tue, 13 Mar 2018 15:15:45 +0000 (11:15 -0400)]
KVM: PPC: Book3S HV: Fix handling of large pages in radix page fault handler

BugLink: http://bugs.launchpad.net/bugs/1752236
This fixes several bugs in the radix page fault handler relating to
the way large pages in the memory backing the guest were handled.
First, the check for large pages only checked for explicit huge pages
and missed transparent huge pages.  Then the check that the addresses
(host virtual vs. guest physical) had appropriate alignment was
wrong, meaning that the code never put a large page in the partition
scoped radix tree; it was always demoted to a small page.

Fixing this exposed bugs in kvmppc_create_pte().  We were never
invalidating a 2MB PTE, which meant that if a page was initially
faulted in without write permission and the guest then attempted
to store to it, we would never update the PTE to have write permission.
If we find a valid 2MB PTE in the PMD, we need to clear it and
do a TLB invalidation before installing either the new 2MB PTE or
a pointer to a page table page.

This also corrects an assumption that get_user_pages_fast would set
the _PAGE_DIRTY bit if we are writing, which is not true.  Instead we
mark the page dirty explicitly with set_page_dirty_lock().  This
also means we don't need the dirty bit set on the host PTE when
providing write access on a read fault.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit c3856aeb29402e94ad9b3879030165cc6a4fdc56)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoUBUNTU: SAUCE: Fix ARC hit rate (LP: #1755158)
Colin Ian King [Tue, 13 Mar 2018 14:13:18 +0000 (14:13 +0000)]
UBUNTU: SAUCE: Fix ARC hit rate (LP: #1755158)

BugLink: http://bugs.launchpad.net/bugs/1755158
Upstream ZFS fix, commit 0873bb6337452e3e028e40f5dad945b30deab185,
Fixes issue that can impact ARC hit rate especially with a small ARC

When the compressed ARC feature was added in commit d3c2ae1
the method of reference counting in the ARC was modified. As
part of this accounting change the arc_buf_add_ref() function
was removed entirely.

This would have be fine but the arc_buf_add_ref() function
served a second undocumented purpose of updating the ARC access
information when taking a hold on a dbuf. Without this logic
in place a cached dbuf would not migrate its associated
arc_buf_hdr_t to the MFU list. This would negatively impact
the ARC hit rate, particularly on systems with a small ARC.

Reviewed-by: Giuseppe Di Natale <dinatale2@llnl.gov>
Reviewed-by: Tony Hutter <hutter2@llnl.gov>
Reviewed-by: Tim Chase <tim@chase2k.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #6171
Closes #6852
Closes #6989
(Backported from upstream ZFS commit 0873bb6337452e3e028e40f5dad945b30deab185)
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoLinux 4.15.9
Greg Kroah-Hartman [Sun, 11 Mar 2018 15:25:15 +0000 (16:25 +0100)]
Linux 4.15.9

BugLink: http://bugs.launchpad.net/bugs/1755275
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: x86: fix backward migration with async_PF
Radim Krčmář [Thu, 1 Feb 2018 21:16:21 +0000 (22:16 +0100)]
KVM: x86: fix backward migration with async_PF

BugLink: http://bugs.launchpad.net/bugs/1755275
commit fe2a3027e74e40a3ece3a4c1e4e51403090a907a upstream.

Guests on new hypersiors might set KVM_ASYNC_PF_DELIVERY_AS_PF_VMEXIT
bit when enabling async_PF, but this bit is reserved on old hypervisors,
which results in a failure upon migration.

To avoid breaking different cases, we are checking for CPUID feature bit
before enabling the feature and nothing else.

Fixes: 52a5c155cf79 ("KVM: async_pf: Let guest support delivery of async_pf from guest mode")
Cc: <stable@vger.kernel.org>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
[jwang: port to 4.14]
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoscsi: mpt3sas: wait for and flush running commands on shutdown/unload
Sreekanth Reddy [Fri, 16 Feb 2018 22:39:58 +0000 (20:39 -0200)]
scsi: mpt3sas: wait for and flush running commands on shutdown/unload

BugLink: http://bugs.launchpad.net/bugs/1755275
commit c666d3be99c000bb889a33353e9be0fa5808d3de upstream.

This patch finishes all outstanding SCSI IO commands (but not other commands,
e.g., task management) in the shutdown and unload paths.

It first waits for the commands to complete (this is done after setting
'ioc->remove_host = 1 ', which prevents new commands to be queued) then it
flushes commands that might still be running.

This avoids triggering error handling (e.g., abort command) for all commands
possibly completed by the adapter after interrupts disabled.

[mauricfo: introduced something in commit message.]

Signed-off-by: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
[mauricfo: backport to linux-4.15.y (a few updates to context lines)]
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoscsi: mpt3sas: fix oops in error handlers after shutdown/unload
Mauricio Faria de Oliveira [Fri, 16 Feb 2018 22:39:57 +0000 (20:39 -0200)]
scsi: mpt3sas: fix oops in error handlers after shutdown/unload

BugLink: http://bugs.launchpad.net/bugs/1755275
commit 9ff549ffb4fb4cc9a4b24d1de9dc3e68287797c4 upstream.

This patch adds checks for 'ioc->remove_host' in the SCSI error handlers, so
not to access pointers/resources potentially freed in the PCI shutdown/module
unload path.  The error handlers may be invoked after shutdown/unload,
depending on other components.

This problem was observed with kexec on a system with a mpt3sas based adapter
and an infiniband adapter which takes long enough to shutdown:

The mpt3sas driver finished shutting down / disabled interrupt handling, thus
some commands have not finished and timed out.

Since the system was still running (waiting for the infiniband adapter to
shutdown), the scsi error handler for task abort of mpt3sas was invoked, and
hit an oops -- either in scsih_abort() because 'ioc->scsi_lookup' was NULL
without commit dbec4c9040ed ("scsi: mpt3sas: lockless command submission"), or
later up in scsih_host_reset() (with or without that commit), because it
eventually called mpt3sas_base_get_iocstate().

After the above commit, the oops in scsih_abort() does not occur anymore
(_scsih_scsi_lookup_find_by_scmd() is no longer called), but that commit is
too big and out of the scope of linux-stable, where this patch might help, so
still go for the changes.

Also, this might help to prevent similar errors in the future, in case code
changes and possibly tries to access freed stuff.

Note the fix in scsih_host_reset() is still important anyway.

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Sreekanth Reddy <Sreekanth.Reddy@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf, ppc64: fix out of bounds access in tail call
Daniel Borkmann [Thu, 8 Mar 2018 12:16:49 +0000 (13:16 +0100)]
bpf, ppc64: fix out of bounds access in tail call

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit d269176e766c71c998cb75b4ea8cbc321cc0019d ]

While working on 16338a9b3ac3 ("bpf, arm64: fix out of bounds access in
tail call") I noticed that ppc64 JIT is partially affected as well. While
the bound checking is correctly performed as unsigned comparison, the
register with the index value however, is never truncated into 32 bit
space, so e.g. a index value of 0x100000000ULL with a map of 1 element
would pass with PPC_CMPLW() whereas we later on continue with the full
64 bit register value. Therefore, as we do in interpreter and other JITs
truncate the value to 32 bit initially in order to fix access.

Fixes: ce0761419fae ("powerpc/bpf: Implement support for tail calls")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: allow xadd only on aligned memory
Daniel Borkmann [Thu, 8 Mar 2018 12:16:48 +0000 (13:16 +0100)]
bpf: allow xadd only on aligned memory

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit ca36960211eb228bcbc7aaebfa0d027368a94c60 ]

The requirements around atomic_add() / atomic64_add() resp. their
JIT implementations differ across architectures. E.g. while x86_64
seems just fine with BPF's xadd on unaligned memory, on arm64 it
triggers via interpreter but also JIT the following crash:

  [  830.864985] Unable to handle kernel paging request at virtual address ffff8097d7ed6703
  [...]
  [  830.916161] Internal error: Oops: 96000021 [#1] SMP
  [  830.984755] CPU: 37 PID: 2788 Comm: test_verifier Not tainted 4.16.0-rc2+ #8
  [  830.991790] Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.29 07/17/2017
  [  830.998998] pstate: 80400005 (Nzcv daif +PAN -UAO)
  [  831.003793] pc : __ll_sc_atomic_add+0x4/0x18
  [  831.008055] lr : ___bpf_prog_run+0x1198/0x1588
  [  831.012485] sp : ffff00001ccabc20
  [  831.015786] x29: ffff00001ccabc20 x28: ffff8017d56a0f00
  [  831.021087] x27: 0000000000000001 x26: 0000000000000000
  [  831.026387] x25: 000000c168d9db98 x24: 0000000000000000
  [  831.031686] x23: ffff000008203878 x22: ffff000009488000
  [  831.036986] x21: ffff000008b14e28 x20: ffff00001ccabcb0
  [  831.042286] x19: ffff0000097b5080 x18: 0000000000000a03
  [  831.047585] x17: 0000000000000000 x16: 0000000000000000
  [  831.052885] x15: 0000ffffaeca8000 x14: 0000000000000000
  [  831.058184] x13: 0000000000000000 x12: 0000000000000000
  [  831.063484] x11: 0000000000000001 x10: 0000000000000000
  [  831.068783] x9 : 0000000000000000 x8 : 0000000000000000
  [  831.074083] x7 : 0000000000000000 x6 : 000580d428000000
  [  831.079383] x5 : 0000000000000018 x4 : 0000000000000000
  [  831.084682] x3 : ffff00001ccabcb0 x2 : 0000000000000001
  [  831.089982] x1 : ffff8097d7ed6703 x0 : 0000000000000001
  [  831.095282] Process test_verifier (pid: 2788, stack limit = 0x0000000018370044)
  [  831.102577] Call trace:
  [  831.105012]  __ll_sc_atomic_add+0x4/0x18
  [  831.108923]  __bpf_prog_run32+0x4c/0x70
  [  831.112748]  bpf_test_run+0x78/0xf8
  [  831.116224]  bpf_prog_test_run_xdp+0xb4/0x120
  [  831.120567]  SyS_bpf+0x77c/0x1110
  [  831.123873]  el0_svc_naked+0x30/0x34
  [  831.127437] Code: 97fffe97 17ffffec 00000000 f9800031 (885f7c31)

Reason for this is because memory is required to be aligned. In
case of BPF, we always enforce alignment in terms of stack access,
but not when accessing map values or packet data when the underlying
arch (e.g. arm64) has CONFIG_HAVE_EFFICIENT_UNALIGNED_ACCESS set.

xadd on packet data that is local to us anyway is just wrong, so
forbid this case entirely. The only place where xadd makes sense in
fact are map values; xadd on stack is wrong as well, but it's been
around for much longer. Specifically enforce strict alignment in case
of xadd, so that we handle this case generically and avoid such crashes
in the first place.

Fixes: 17a5267067f3 ("bpf: verifier (add verifier core)")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: add schedule points in percpu arrays management
Eric Dumazet [Thu, 8 Mar 2018 12:16:47 +0000 (13:16 +0100)]
bpf: add schedule points in percpu arrays management

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit 32fff239de37ef226d5b66329dd133f64d63b22d ]

syszbot managed to trigger RCU detected stalls in
bpf_array_free_percpu()

It takes time to allocate a huge percpu map, but even more time to free
it.

Since we run in process context, use cond_resched() to yield cpu if
needed.

Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf, arm64: fix out of bounds access in tail call
Daniel Borkmann [Thu, 8 Mar 2018 12:16:46 +0000 (13:16 +0100)]
bpf, arm64: fix out of bounds access in tail call

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit 16338a9b3ac30740d49f5dfed81bac0ffa53b9c7 ]

I recently noticed a crash on arm64 when feeding a bogus index
into BPF tail call helper. The crash would not occur when the
interpreter is used, but only in case of JIT. Output looks as
follows:

  [  347.007486] Unable to handle kernel paging request at virtual address fffb850e96492510
  [...]
  [  347.043065] [fffb850e96492510] address between user and kernel address ranges
  [  347.050205] Internal error: Oops: 96000004 [#1] SMP
  [...]
  [  347.190829] x13: 0000000000000000 x12: 0000000000000000
  [  347.196128] x11: fffc047ebe782800 x10: ffff808fd7d0fd10
  [  347.201427] x9 : 0000000000000000 x8 : 0000000000000000
  [  347.206726] x7 : 0000000000000000 x6 : 001c991738000000
  [  347.212025] x5 : 0000000000000018 x4 : 000000000000ba5a
  [  347.217325] x3 : 00000000000329c4 x2 : ffff808fd7cf0500
  [  347.222625] x1 : ffff808fd7d0fc00 x0 : ffff808fd7cf0500
  [  347.227926] Process test_verifier (pid: 4548, stack limit = 0x000000007467fa61)
  [  347.235221] Call trace:
  [  347.237656]  0xffff000002f3a4fc
  [  347.240784]  bpf_test_run+0x78/0xf8
  [  347.244260]  bpf_prog_test_run_skb+0x148/0x230
  [  347.248694]  SyS_bpf+0x77c/0x1110
  [  347.251999]  el0_svc_naked+0x30/0x34
  [  347.255564] Code: 9100075a d280220a 8b0a002a d37df04b (f86b694b)
  [...]

In this case the index used in BPF r3 is the same as in r1
at the time of the call, meaning we fed a pointer as index;
here, it had the value 0xffff808fd7cf0500 which sits in x2.

While I found tail calls to be working in general (also for
hitting the error cases), I noticed the following in the code
emission:

  # bpftool p d j i 988
  [...]
  38:   ldr     w10, [x1,x10]
  3c:   cmp     w2, w10
  40:   b.ge    0x000000000000007c              <-- signed cmp
  44:   mov     x10, #0x20                      // #32
  48:   cmp     x26, x10
  4c:   b.gt    0x000000000000007c
  50:   add     x26, x26, #0x1
  54:   mov     x10, #0x110                     // #272
  58:   add     x10, x1, x10
  5c:   lsl     x11, x2, #3
  60:   ldr     x11, [x10,x11]                  <-- faulting insn (f86b694b)
  64:   cbz     x11, 0x000000000000007c
  [...]

Meaning, the tests passed because commit ddb55992b04d ("arm64:
bpf: implement bpf_tail_call() helper") was using signed compares
instead of unsigned which as a result had the test wrongly passing.

Change this but also the tail call count test both into unsigned
and cap the index as u32. Latter we did as well in 90caccdd8cc0
("bpf: fix bpf_tail_call() x64 JIT") and is needed in addition here,
too. Tested on HiSilicon Hi1616.

Result after patch:

  # bpftool p d j i 268
  [...]
  38: ldr w10, [x1,x10]
  3c: add w2, w2, #0x0
  40: cmp w2, w10
  44: b.cs 0x0000000000000080
  48: mov x10, #0x20                   // #32
  4c: cmp x26, x10
  50: b.hi 0x0000000000000080
  54: add x26, x26, #0x1
  58: mov x10, #0x110                  // #272
  5c: add x10, x1, x10
  60: lsl x11, x2, #3
  64: ldr x11, [x10,x11]
  68: cbz x11, 0x0000000000000080
  [...]

Fixes: ddb55992b04d ("arm64: bpf: implement bpf_tail_call() helper")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf, x64: implement retpoline for tail call
Daniel Borkmann [Thu, 8 Mar 2018 12:16:45 +0000 (13:16 +0100)]
bpf, x64: implement retpoline for tail call

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit a493a87f38cfa48caaa95c9347be2d914c6fdf29 ]

Implement a retpoline [0] for the BPF tail call JIT'ing that converts
the indirect jump via jmp %rax that is used to make the long jump into
another JITed BPF image. Since this is subject to speculative execution,
we need to control the transient instruction sequence here as well
when CONFIG_RETPOLINE is set, and direct it into a pause + lfence loop.
The latter aligns also with what gcc / clang emits (e.g. [1]).

JIT dump after patch:

  # bpftool p d x i 1
   0: (18) r2 = map[id:1]
   2: (b7) r3 = 0
   3: (85) call bpf_tail_call#12
   4: (b7) r0 = 2
   5: (95) exit

With CONFIG_RETPOLINE:

  # bpftool p d j i 1
  [...]
  33: cmp    %edx,0x24(%rsi)
  36: jbe    0x0000000000000072  |*
  38: mov    0x24(%rbp),%eax
  3e: cmp    $0x20,%eax
  41: ja     0x0000000000000072  |
  43: add    $0x1,%eax
  46: mov    %eax,0x24(%rbp)
  4c: mov    0x90(%rsi,%rdx,8),%rax
  54: test   %rax,%rax
  57: je     0x0000000000000072  |
  59: mov    0x28(%rax),%rax
  5d: add    $0x25,%rax
  61: callq  0x000000000000006d  |+
  66: pause                      |
  68: lfence                     |
  6b: jmp    0x0000000000000066  |
  6d: mov    %rax,(%rsp)         |
  71: retq                       |
  72: mov    $0x2,%eax
  [...]

  * relative fall-through jumps in error case
  + retpoline for indirect jump

Without CONFIG_RETPOLINE:

  # bpftool p d j i 1
  [...]
  33: cmp    %edx,0x24(%rsi)
  36: jbe    0x0000000000000063  |*
  38: mov    0x24(%rbp),%eax
  3e: cmp    $0x20,%eax
  41: ja     0x0000000000000063  |
  43: add    $0x1,%eax
  46: mov    %eax,0x24(%rbp)
  4c: mov    0x90(%rsi,%rdx,8),%rax
  54: test   %rax,%rax
  57: je     0x0000000000000063  |
  59: mov    0x28(%rax),%rax
  5d: add    $0x25,%rax
  61: jmpq   *%rax               |-
  63: mov    $0x2,%eax
  [...]

  * relative fall-through jumps in error case
  - plain indirect jump as before

  [0] https://support.google.com/faqs/answer/7625886
  [1] https://github.com/gcc-mirror/gcc/commit/a31e654fa107be968b802786d747e962c2fcdb2b

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: fix rcu lockdep warning for lpm_trie map_free callback
Yonghong Song [Thu, 8 Mar 2018 12:16:44 +0000 (13:16 +0100)]
bpf: fix rcu lockdep warning for lpm_trie map_free callback

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit 6c5f61023c5b0edb0c8a64c902fe97c6453b1852 ]

Commit 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
fixed a memory leak and removed unnecessary locks in map_free callback function.
Unfortrunately, it introduced a lockdep warning. When lockdep checking is turned on,
running tools/testing/selftests/bpf/test_lpm_map will have:

  [   98.294321] =============================
  [   98.294807] WARNING: suspicious RCU usage
  [   98.295359] 4.16.0-rc2+ #193 Not tainted
  [   98.295907] -----------------------------
  [   98.296486] /home/yhs/work/bpf/kernel/bpf/lpm_trie.c:572 suspicious rcu_dereference_check() usage!
  [   98.297657]
  [   98.297657] other info that might help us debug this:
  [   98.297657]
  [   98.298663]
  [   98.298663] rcu_scheduler_active = 2, debug_locks = 1
  [   98.299536] 2 locks held by kworker/2:1/54:
  [   98.300152]  #0:  ((wq_completion)"events"){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0
  [   98.301381]  #1:  ((work_completion)(&map->work)){+.+.}, at: [<00000000196bc1f0>] process_one_work+0x157/0x5c0

Since actual trie tree removal happens only after no other
accesses to the tree are possible, replacing
  rcu_dereference_protected(*slot, lockdep_is_held(&trie->lock))
with
  rcu_dereference_protected(*slot, 1)
fixed the issue.

Fixes: 9a3efb6b661f ("bpf: fix memory leak in lpm_trie map_free callback function")
Reported-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Acked-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: fix memory leak in lpm_trie map_free callback function
Yonghong Song [Thu, 8 Mar 2018 12:16:43 +0000 (13:16 +0100)]
bpf: fix memory leak in lpm_trie map_free callback function

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit 9a3efb6b661f71d5675369ace9257833f0e78ef3 ]

There is a memory leak happening in lpm_trie map_free callback
function trie_free. The trie structure itself does not get freed.

Also, trie_free function did not do synchronize_rcu before freeing
various data structures. This is incorrect as some rcu_read_lock
region(s) for lookup, update, delete or get_next_key may not complete yet.
The fix is to add synchronize_rcu in the beginning of trie_free.
The useless spin_lock is removed from this function as well.

Fixes: b95a5c4db09b ("bpf: add a longest prefix match trie map implementation")
Reported-by: Mathieu Malaterre <malat@debian.org>
Reported-by: Alexei Starovoitov <ast@kernel.org>
Tested-by: Mathieu Malaterre <malat@debian.org>
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agobpf: fix mlock precharge on arraymaps
Daniel Borkmann [Thu, 8 Mar 2018 12:16:42 +0000 (13:16 +0100)]
bpf: fix mlock precharge on arraymaps

BugLink: http://bugs.launchpad.net/bugs/1755275
[ upstream commit 9c2d63b843a5c8a8d0559cc067b5398aa5ec3ffc ]

syzkaller recently triggered OOM during percpu map allocation;
while there is work in progress by Dennis Zhou to add __GFP_NORETRY
semantics for percpu allocator under pressure, there seems also a
missing bpf_map_precharge_memlock() check in array map allocation.

Given today the actual bpf_map_charge_memlock() happens after the
find_and_alloc_map() in syscall path, the bpf_map_precharge_memlock()
is there to bail out early before we go and do the map setup work
when we find that we hit the limits anyway. Therefore add this for
array map as well.

Fixes: 6c9059817432 ("bpf: pre-allocate hash map elements")
Fixes: a10423b87a7e ("bpf: introduce BPF_MAP_TYPE_PERCPU_ARRAY map")
Reported-by: syzbot+adb03f3f0bb57ce3acda@syzkaller.appspotmail.com
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Dennis Zhou <dennisszhou@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoLinux 4.15.8
Greg Kroah-Hartman [Fri, 9 Mar 2018 06:47:50 +0000 (22:47 -0800)]
Linux 4.15.8

BugLink: http://bugs.launchpad.net/bugs/1755179
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoplatform/x86: dell-laptop: fix kbd_get_state's request value
Laszlo Toth [Tue, 13 Feb 2018 20:43:43 +0000 (21:43 +0100)]
platform/x86: dell-laptop: fix kbd_get_state's request value

BugLink: http://bugs.launchpad.net/bugs/1755179
commit eca39e7f0cdb9bde4003a29149fa695e876c6f73 upstream.

Commit 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap
rather than globally")
broke one request, changed it back to the original value.

Tested on a Dell E6540, backlight came back.

Fixes: 9862b43624a5 ("platform/x86: dell-laptop: Allocate buffer on heap rather than globally")
Signed-off-by: Laszlo Toth <laszlth@gmail.com>
Reviewed-by: Pali Rohár <pali.rohar@gmail.com>
Reviewed-by: Mario Limonciello <mario.limonciello@dell.com>
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agomd: only allow remove_and_add_spares when no sync_thread running.
NeilBrown [Fri, 2 Feb 2018 22:19:30 +0000 (09:19 +1100)]
md: only allow remove_and_add_spares when no sync_thread running.

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 39772f0a7be3b3dc26c74ea13fe7847fd1522c8b upstream.

The locking protocols in md assume that a device will
never be removed from an array during resync/recovery/reshape.
When that isn't happening, rcu or reconfig_mutex is needed
to protect an rdev pointer while taking a refcount.  When
it is happening, that protection isn't needed.

Unfortunately there are cases were remove_and_add_spares() is
called when recovery might be happening: is state_store(),
slot_store() and hot_remove_disk().
In each case, this is just an optimization, to try to expedite
removal from the personality so the device can be removed from
the array.  If resync etc is happening, we just have to wait
for md_check_recover to find a suitable time to call
remove_and_add_spares().

This optimization and not essential so it doesn't
matter if it fails.
So change remove_and_add_spares() to abort early if
resync/recovery/reshape is happening, unless it is called
from md_check_recovery() as part of a newly started recovery.
The parameter "this" is only NULL when called from
md_check_recovery() so when it is NULL, there is no need to abort.

As this can result in a NULL dereference, the fix is suitable
for -stable.

cc: yuyufen <yuyufen@huawei.com>
Cc: Tomasz Majchrzak <tomasz.majchrzak@intel.com>
Fixes: 8430e7e0af9a ("md: disconnect device from personality before trying to remove it.")
Cc: stable@ver.kernel.org (v4.8+)
Signed-off-by: NeilBrown <neilb@suse.com>
Signed-off-by: Shaohua Li <sh.li@alibaba-inc.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agopowerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID
Nicholas Piggin [Wed, 7 Feb 2018 01:20:02 +0000 (11:20 +1000)]
powerpc/64s/radix: Boot-time NULL pointer protection using a guard-PID

BugLink: http://bugs.launchpad.net/bugs/1755179
commit eeb715c3e995fbdda0cc05e61216c6c5609bce66 upstream.

This change restores and formalises the behaviour that access to NULL
or other user addresses by the kernel during boot should fault rather
than succeed and modify memory. This was inadvertently broken when
fixing another bug, because it was previously not well defined and
only worked by chance.

powerpc/64s/radix uses high address bits to select an address space
"quadrant", which determines which PID and LPID are used to translate
the rest of the address (effective PID, effective LPID). The kernel
mapping at 0xC... selects quadrant 3, which uses PID=0 and LPID=0. So
the kernel page tables are installed in the PID 0 process table entry.

An address at 0x0... selects quadrant 0, which uses PID=PIDR for
translating the rest of the address (that is, it uses the value of the
PIDR register as the effective PID). If PIDR=0, then the translation
is performed with the PID 0 process table entry page tables. This is
the kernel mapping, so we effectively get another copy of the kernel
address space at 0. A NULL pointer access will access physical memory
address 0.

To prevent duplicating the kernel address space in quadrant 0, this
patch allocates a guard PID containing no translations, and
initializes PIDR with this during boot, before the MMU is switched on.
Any kernel access to quadrant 0 will use this guard PID for
translation and find no valid mappings, and therefore fault.

After boot, this PID will be switchd away to user context PIDs, but
those contain user mappings (and usually NULL pointer protection)
rather than kernel mapping, which is much safer (and by design). It
may be in future this is tightened further, which the guard PID could
be used for.

Commit 371b8044 ("powerpc/64s: Initialize ISAv3 MMU registers before
setting partition table"), introduced this problem because it zeroes
PIDR at boot. However previously the value was inherited from firmware
or kexec, which is not robust and can be zero (e.g., mambo).

Fixes: 371b80447ff3 ("powerpc/64s: Initialize ISAv3 MMU registers before setting partition table")
Cc: stable@vger.kernel.org # v4.15+
Reported-by: Florian Weimer <fweimer@redhat.com>
Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
[mauricfo: backport to v4.15.7 (context line updates only) and re-test]
Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: dts: LogicPD Torpedo: Fix I2C1 pinmux
Adam Ford [Thu, 25 Jan 2018 20:10:37 +0000 (14:10 -0600)]
ARM: dts: LogicPD Torpedo: Fix I2C1 pinmux

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 74402055a2d3ec998a1ded599e86185a27d9bbf4 upstream.

The pinmuxing was missing for I2C1 which was causing intermittent issues
with the PMIC which is connected to I2C1.  The bootloader did not quite
configure the I2C1 either, so when running at 2.6MHz, it was generating
errors at time.

This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz

Fixes: 687c27676151 ("ARM: dts: Add minimal support for LogicPD Torpedo
DM3730 devkit")

Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux
Adam Ford [Sat, 27 Jan 2018 21:27:05 +0000 (15:27 -0600)]
ARM: dts: LogicPD SOM-LV: Fix I2C1 pinmux

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 84c7efd607e7fb6933920322086db64654f669b2 upstream.

The pinmuxing was missing for I2C1 which was causing intermittent issues
with the PMIC which is connected to I2C1.  The bootloader did not quite
configure the I2C1 either, so when running at 2.6MHz, it was generating
errors at times.

This correctly sets the I2C1 pinmuxing so it can operate at 2.6MHz

Fixes: ab8dd3aed011 ("ARM: DTS: Add minimal Support for Logic PD DM3730
SOM-LV")

Signed-off-by: Adam Ford <aford173@gmail.com>
Signed-off-by: Tony Lindgren <tony@atomide.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530
Kai Heng Feng [Mon, 5 Feb 2018 05:19:24 +0000 (13:19 +0800)]
ACPI / bus: Parse tables as term_list for Dell XPS 9570 and Precision M5530

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 36904703aeeeb6cd31993f1353c8325006229f9a upstream.

The i2c touchpad on Dell XPS 9570 and Precision M5530 doesn't work out
of box.

The touchpad relies on its _INI method to update its _HID value from
XXXX0000 to SYNA2393.

Also, the _STA relies on value of I2CN to report correct status.

Set acpi_gbl_parse_table_as_term_list so the value of I2CN can be
correctly set up, and _INI can get run. The ACPI table in this machine
is designed to get parsed this way.

Also, change the quirk table to a more generic name.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=198515
Signed-off-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM/x86: remove WARN_ON() for when vm_munmap() fails
Eric Biggers [Thu, 1 Feb 2018 01:30:21 +0000 (17:30 -0800)]
KVM/x86: remove WARN_ON() for when vm_munmap() fails

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 103c763c72dd2df3e8c91f2d7ec88f98ed391111 upstream.

On x86, special KVM memslots such as the TSS region have anonymous
memory mappings created on behalf of userspace, and these mappings are
removed when the VM is destroyed.

It is however possible for removing these mappings via vm_munmap() to
fail.  This can most easily happen if the thread receives SIGKILL while
it's waiting to acquire ->mmap_sem.   This triggers the 'WARN_ON(r < 0)'
in __x86_set_memory_region().  syzkaller was able to hit this, using
'exit()' to send the SIGKILL.  Note that while the vm_munmap() failure
results in the mapping not being removed immediately, it is not leaked
forever but rather will be freed when the process exits.

It's not really possible to handle this failure properly, so almost
every other caller of vm_munmap() doesn't check the return value.  It's
a limitation of having the kernel manage these mappings rather than
userspace.

So just remove the WARN_ON() so that users can't spam the kernel log
with this warning.

Fixes: f0d648bdf0a5 ("KVM: x86: map/unmap private slots in __x86_set_memory_region")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: x86: fix vcpu initialization with userspace lapic
Radim Krčmář [Thu, 1 Mar 2018 14:24:25 +0000 (15:24 +0100)]
KVM: x86: fix vcpu initialization with userspace lapic

BugLink: http://bugs.launchpad.net/bugs/1755179
commit b7e31be385584afe7f073130e8e570d53c95f7fe upstream.

Moving the code around broke this rare configuration.
Use this opportunity to finally call lapic reset from vcpu reset.

Reported-by: syzbot+fb7a33a4b6c35007a72b@syzkaller.appspotmail.com
Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Fixes: 0b2e9904c159 ("KVM: x86: move LAPIC initialization after VMCS creation")
Cc: stable@vger.kernel.org
Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as...
Paolo Bonzini [Thu, 22 Feb 2018 15:43:18 +0000 (16:43 +0100)]
KVM/VMX: Optimize vmx_vcpu_run() and svm_vcpu_run() by marking the RDMSR path as unlikely()

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 946fbbc13dce68902f64515b610eeb2a6c3d7a64 upstream.

vmx_vcpu_run() and svm_vcpu_run() are large functions, and giving
branch hints to the compiler can actually make a substantial cycle
difference by keeping the fast path contiguous in memory.

With this optimization, the retpoline-guest/retpoline-host case is
about 50 cycles faster.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/20180222154318.20361-3-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: x86: move LAPIC initialization after VMCS creation
Paolo Bonzini [Fri, 23 Feb 2018 22:29:32 +0000 (23:29 +0100)]
KVM: x86: move LAPIC initialization after VMCS creation

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 0b2e9904c15963e715d33e5f3f1387f17d19333a upstream.

The initial reset of the local APIC is performed before the VMCS has been
created, but it tries to do a vmwrite:

 vmwrite error: reg 810 value 4a00 (err 18944)
 CPU: 54 PID: 38652 Comm: qemu-kvm Tainted: G        W I      4.16.0-0.rc2.git0.1.fc28.x86_64 #1
 Hardware name: Intel Corporation S2600CW/S2600CW, BIOS SE5C610.86B.01.01.0003.090520141303 09/05/2014
 Call Trace:
  vmx_set_rvi [kvm_intel]
  vmx_hwapic_irr_update [kvm_intel]
  kvm_lapic_reset [kvm]
  kvm_create_lapic [kvm]
  kvm_arch_vcpu_init [kvm]
  kvm_vcpu_init [kvm]
  vmx_create_vcpu [kvm_intel]
  kvm_vm_ioctl [kvm]

Move it later, after the VMCS has been created.

Fixes: 4191db26b714 ("KVM: x86: Update APICv on APIC reset")
Cc: stable@vger.kernel.org
Cc: Liran Alon <liran.alon@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM/x86: Remove indirect MSR op calls from SPEC_CTRL
Paolo Bonzini [Thu, 22 Feb 2018 15:43:17 +0000 (16:43 +0100)]
KVM/x86: Remove indirect MSR op calls from SPEC_CTRL

BugLink: http://bugs.launchpad.net/bugs/1755179
commit ecb586bd29c99fb4de599dec388658e74388daad upstream.

Having a paravirt indirect call in the IBRS restore path is not a
good idea, since we are trying to protect from speculative execution
of bogus indirect branch targets.  It is also slower, so use
native_wrmsrl() on the vmentry path too.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Jim Mattson <jmattson@google.com>
Cc: David Woodhouse <dwmw@amazon.co.uk>
Cc: KarimAllah Ahmed <karahmed@amazon.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: kvm@vger.kernel.org
Cc: stable@vger.kernel.org
Fixes: d28b387fb74da95d69d2615732f50cceb38e9a4d
Link: http://lkml.kernel.org/r/20180222154318.20361-2-pbonzini@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: mmu: Fix overlap between public and private memslots
Wanpeng Li [Tue, 13 Feb 2018 14:36:00 +0000 (15:36 +0100)]
KVM: mmu: Fix overlap between public and private memslots

BugLink: http://bugs.launchpad.net/bugs/1755179
commit b28676bb8ae4569cced423dc2a88f7cb319d5379 upstream.

Reported by syzkaller:

    pte_list_remove: ffff9714eb1f8078 0->BUG
    ------------[ cut here ]------------
    kernel BUG at arch/x86/kvm/mmu.c:1157!
    invalid opcode: 0000 [#1] SMP
    RIP: 0010:pte_list_remove+0x11b/0x120 [kvm]
    Call Trace:
     drop_spte+0x83/0xb0 [kvm]
     mmu_page_zap_pte+0xcc/0xe0 [kvm]
     kvm_mmu_prepare_zap_page+0x81/0x4a0 [kvm]
     kvm_mmu_invalidate_zap_all_pages+0x159/0x220 [kvm]
     kvm_arch_flush_shadow_all+0xe/0x10 [kvm]
     kvm_mmu_notifier_release+0x6c/0xa0 [kvm]
     ? kvm_mmu_notifier_release+0x5/0xa0 [kvm]
     __mmu_notifier_release+0x79/0x110
     ? __mmu_notifier_release+0x5/0x110
     exit_mmap+0x15a/0x170
     ? do_exit+0x281/0xcb0
     mmput+0x66/0x160
     do_exit+0x2c9/0xcb0
     ? __context_tracking_exit.part.5+0x4a/0x150
     do_group_exit+0x50/0xd0
     SyS_exit_group+0x14/0x20
     do_syscall_64+0x73/0x1f0
     entry_SYSCALL64_slow_path+0x25/0x25

The reason is that when creates new memslot, there is no guarantee for new
memslot not overlap with private memslots. This can be triggered by the
following program:

   #include <fcntl.h>
   #include <pthread.h>
   #include <setjmp.h>
   #include <signal.h>
   #include <stddef.h>
   #include <stdint.h>
   #include <stdio.h>
   #include <stdlib.h>
   #include <string.h>
   #include <sys/ioctl.h>
   #include <sys/stat.h>
   #include <sys/syscall.h>
   #include <sys/types.h>
   #include <unistd.h>
   #include <linux/kvm.h>

   long r[16];

   int main()
   {
void *p = valloc(0x4000);

r[2] = open("/dev/kvm", 0);
r[3] = ioctl(r[2], KVM_CREATE_VM, 0x0ul);

uint64_t addr = 0xf000;
ioctl(r[3], KVM_SET_IDENTITY_MAP_ADDR, &addr);
r[6] = ioctl(r[3], KVM_CREATE_VCPU, 0x0ul);
ioctl(r[3], KVM_SET_TSS_ADDR, 0x0ul);
ioctl(r[6], KVM_RUN, 0);
ioctl(r[6], KVM_RUN, 0);

struct kvm_userspace_memory_region mr = {
.slot = 0,
.flags = KVM_MEM_LOG_DIRTY_PAGES,
.guest_phys_addr = 0xf000,
.memory_size = 0x4000,
.userspace_addr = (uintptr_t) p
};
ioctl(r[3], KVM_SET_USER_MEMORY_REGION, &mr);
return 0;
   }

This patch fixes the bug by not adding a new memslot even if it
overlaps with private memslots.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Eric Biggers <ebiggers3@gmail.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: X86: Fix SMRAM accessing even if VM is shutdown
Wanpeng Li [Thu, 8 Feb 2018 07:32:45 +0000 (15:32 +0800)]
KVM: X86: Fix SMRAM accessing even if VM is shutdown

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 95e057e25892eaa48cad1e2d637b80d0f1a4fac5 upstream.

Reported by syzkaller:

   WARNING: CPU: 6 PID: 2434 at arch/x86/kvm/vmx.c:6660 handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   CPU: 6 PID: 2434 Comm: repro_test Not tainted 4.15.0+ #4
   RIP: 0010:handle_ept_misconfig+0x54/0x1e0 [kvm_intel]
   Call Trace:
    vmx_handle_exit+0xbd/0xe20 [kvm_intel]
    kvm_arch_vcpu_ioctl_run+0xdaf/0x1d50 [kvm]
    kvm_vcpu_ioctl+0x3e9/0x720 [kvm]
    do_vfs_ioctl+0xa4/0x6a0
    SyS_ioctl+0x79/0x90
    entry_SYSCALL_64_fastpath+0x25/0x9c

The testcase creates a first thread to issue KVM_SMI ioctl, and then creates
a second thread to mmap and operate on the same vCPU.  This triggers a race
condition when running the testcase with multiple threads. Sometimes one thread
exits with a triple fault while another thread mmaps and operates on the same
vCPU.  Because CS=0x3000/IP=0x8000 is not mapped, accessing the SMI handler
results in an EPT misconfig. This patch fixes it by returning RET_PF_EMULATE
in kvm_handle_bad_page(), which will go on to cause an emulation failure and an
exit with KVM_EXIT_INTERNAL_ERROR.

Reported-by: syzbot+c1d9517cab094dae65e446c0c5b4de6c40f4dc58@syzkaller.appspotmail.com
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: stable@vger.kernel.org
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: kvm: fix building with gcc-8
Arnd Bergmann [Fri, 2 Feb 2018 15:07:34 +0000 (16:07 +0100)]
ARM: kvm: fix building with gcc-8

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 67870eb1204223598ea6d8a4467b482e9f5875b5 upstream.

In banked-sr.c, we use a top-level '__asm__(".arch_extension virt")'
statement to allow compilation of a multi-CPU kernel for ARMv6
and older ARMv7-A that don't normally support access to the banked
registers.

This is considered to be a programming error by the gcc developers
and will no longer work in gcc-8, where we now get a build error:

/tmp/cc4Qy7GR.s:34: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_usr'
/tmp/cc4Qy7GR.s:41: Error: Banked registers are not available with this architecture. -- `mrs r3,ELR_hyp'
/tmp/cc4Qy7GR.s:55: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_svc'
/tmp/cc4Qy7GR.s:62: Error: Banked registers are not available with this architecture. -- `mrs r3,LR_svc'
/tmp/cc4Qy7GR.s:69: Error: Banked registers are not available with this architecture. -- `mrs r3,SPSR_svc'
/tmp/cc4Qy7GR.s:76: Error: Banked registers are not available with this architecture. -- `mrs r3,SP_abt'

Passign the '-march-armv7ve' flag to gcc works, and is ok here, because
we know the functions won't ever be called on pre-ARMv7VE machines.
Unfortunately, older compiler versions (4.8 and earlier) do not understand
that flag, so we still need to keep the asm around.

Backporting to stable kernels (4.6+) is needed to allow those to be built
with future compilers as well.

Link: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84129
Fixes: 33280b4cd1dc ("ARM: KVM: Add banked registers save/restore")
Cc: stable@vger.kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Christoffer Dall <christoffer.dall@linaro.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: mvebu: Fix broken PL310_ERRATA_753970 selects
Ulf Magnusson [Mon, 5 Feb 2018 01:21:13 +0000 (02:21 +0100)]
ARM: mvebu: Fix broken PL310_ERRATA_753970 selects

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 8aa36a8dcde3183d84db7b0d622ffddcebb61077 upstream.

The MACH_ARMADA_375 and MACH_ARMADA_38X boards select ARM_ERRATA_753970,
but it was renamed to PL310_ERRATA_753970 by commit fa0ce4035d48 ("ARM:
7162/1: errata: tidy up Kconfig options for PL310 errata workarounds").

Fix the selects to use the new name.

Discovered with the
https://github.com/ulfalizer/Kconfiglib/blob/master/examples/list_undefined.py
script.
Fixes: fa0ce4035d48 ("ARM: 7162/1: errata: tidy up Kconfig options for
PL310 errata workarounds"
cc: stable@vger.kernel.org
Signed-off-by: Ulf Magnusson <ulfalizer@gmail.com>
Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: dts: rockchip: Remove 1.8 GHz operation point from phycore som
Daniel Schultz [Tue, 13 Feb 2018 09:44:32 +0000 (10:44 +0100)]
ARM: dts: rockchip: Remove 1.8 GHz operation point from phycore som

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 5ce0bad4ccd04c8a989e94d3c89e4e796ac22e48 upstream.

Rockchip recommends to run the CPU cores only with operations points of
1.6 GHz or lower.

Removed the cpu0 node with too high operation points and use the default
values instead.

Fixes: 903d31e34628 ("ARM: dts: rockchip: Add support for phyCORE-RK3288 SoM")
Cc: stable@vger.kernel.org
Signed-off-by: Daniel Schultz <d.schultz@phytec.de>
Signed-off-by: Heiko Stuebner <heiko@sntech.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoARM: orion: fix orion_ge00_switch_board_info initialization
Arnd Bergmann [Wed, 21 Feb 2018 12:18:49 +0000 (13:18 +0100)]
ARM: orion: fix orion_ge00_switch_board_info initialization

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 8337d083507b9827dfb36d545538b7789df834fd upstream.

A section type mismatch warning shows up when building with LTO,
since orion_ge00_mvmdio_bus_name was put in __initconst but not marked
const itself:

include/linux/of.h: In function 'spear_setup_of_timer':
arch/arm/mach-spear/time.c:207:34: error: 'timer_of_match' causes a section type conflict with 'orion_ge00_mvmdio_bus_name'
 static const struct of_device_id timer_of_match[] __initconst = {
                                  ^
arch/arm/plat-orion/common.c:475:32: note: 'orion_ge00_mvmdio_bus_name' was declared here
 static __initconst const char *orion_ge00_mvmdio_bus_name = "orion-mii";
                                ^

As pointed out by Andrew Lunn, it should in fact be 'const' but not
'__initconst' because the string is never copied but may be accessed
after the init sections are freed. To fix that, I get rid of the
extra symbol and rewrite the initialization in a simpler way that
assigns both the bus_id and modalias statically.

I spotted another theoretical bug in the same place, where d->netdev[i]
may be an out of bounds access, this can be fixed by moving the device
assignment into the loop.

Cc: stable@vger.kernel.org
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agox86/mm: Fix {pmd,pud}_{set,clear}_flags()
Jan Beulich [Mon, 19 Feb 2018 14:48:11 +0000 (07:48 -0700)]
x86/mm: Fix {pmd,pud}_{set,clear}_flags()

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 842cef9113c2120f74f645111ded1e020193d84c upstream.

Just like pte_{set,clear}_flags() their PMD and PUD counterparts should
not do any address translation. This was outright wrong under Xen
(causing a dead boot with no useful output on "suitable" systems), and
produced needlessly more complicated code (even if just slightly) when
paravirt was enabled.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Juergen Gross <jgross@suse.com>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Brian Gerst <brgerst@gmail.com>
Cc: Denys Vlasenko <dvlasenk@redhat.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/5A8AF1BB02000078001A91C3@prv-mh.provo.novell.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonospec: Allow index argument to have const-qualified type
Rasmus Villemoes [Fri, 16 Feb 2018 21:20:48 +0000 (13:20 -0800)]
nospec: Allow index argument to have const-qualified type

BugLink: http://bugs.launchpad.net/bugs/1755179
commit b98c6a160a057d5686a8c54c79cc6c8c94a7d0c8 upstream.

The last expression in a statement expression need not be a bare
variable, quoting gcc docs

  The last thing in the compound statement should be an expression
  followed by a semicolon; the value of this subexpression serves as the
  value of the entire construct.

and we already use that in e.g. the min/max macros which end with a
ternary expression.

This way, we can allow index to have const-qualified type, which will in
some cases avoid the need for introducing a local copy of index of
non-const qualified type. That, in turn, can prevent readers not
familiar with the internals of array_index_nospec from wondering about
the seemingly redundant extra variable, and I think that's worthwhile
considering how confusing the whole _nospec business is.

The expression _i&_mask has type unsigned long (since that is the type
of _mask, and the BUILD_BUG_ONs guarantee that _i will get promoted to
that), so in order not to change the type of the whole expression, add
a cast back to typeof(_i).

Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Will Deacon <will.deacon@arm.com>
Cc: linux-arch@vger.kernel.org
Cc: stable@vger.kernel.org
Link: http://lkml.kernel.org/r/151881604837.17395.10812767547837568328.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: s390: consider epoch index on TOD clock syncs
David Hildenbrand [Wed, 7 Feb 2018 11:46:45 +0000 (12:46 +0100)]
KVM: s390: consider epoch index on TOD clock syncs

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 1575767ef3cf5326701d2ae3075b7732cbc855e4 upstream.

For now, we don't take care of over/underflows. Especially underflows
are critical:

Assume the epoch is currently 0 and we get a sync request for delta=1,
meaning the TOD is moved forward by 1 and we have to fix it up by
subtracting 1 from the epoch. Right now, this will leave the epoch
index untouched, resulting in epoch=-1, epoch_idx=0, which is wrong.

We have to take care of over and underflows, also for the VSIE case. So
let's factor out calculation into a separate function.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-5-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
[use u8 for idx]
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: s390: consider epoch index on hotplugged CPUs
David Hildenbrand [Wed, 7 Feb 2018 11:46:44 +0000 (12:46 +0100)]
KVM: s390: consider epoch index on hotplugged CPUs

BugLink: http://bugs.launchpad.net/bugs/1755179
commit d16b52cb9cdb6f06dea8ab2f0a428e7d7f0b0a81 upstream.

We must copy both, the epoch and the epoch_idx.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-4-david@redhat.com>
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: s390: provide only a single function for setting the tod (fix SCK)
David Hildenbrand [Wed, 7 Feb 2018 11:46:43 +0000 (12:46 +0100)]
KVM: s390: provide only a single function for setting the tod (fix SCK)

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 0e7def5fb0dc53ddbb9f62a497d15f1e11ccdc36 upstream.

Right now, SET CLOCK called in the guest does not properly take care of
the epoch index, as the call goes via the old kvm_s390_set_tod_clock()
interface. So the epoch index is neither reset to 0, if required, nor
properly set to e.g. 0xff on negative values.

Fix this by providing a single kvm_s390_set_tod_clock() function. Move
Multiple-epoch facility handling into it.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-3-david@redhat.com>
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoKVM: s390: take care of clock-comparator sign control
David Hildenbrand [Wed, 7 Feb 2018 11:46:42 +0000 (12:46 +0100)]
KVM: s390: take care of clock-comparator sign control

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 5fe01793dd953ab947fababe8abaf5ed5258c8df upstream.

Missed when enabling the Multiple-epoch facility. If the facility is
installed and the control is set, a sign based comaprison has to be
performed.

Right now we would inject wrong interrupts and ignore interrupt
conditions. Also the sleep time is calculated in a wrong way.

Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20180207114647.6220-2-david@redhat.com>
Fixes: 8fa1696ea781 ("KVM: s390: Multiple Epoch Facility support")
Cc: stable@vger.kernel.org
Reviewed-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoEDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL
Anna Karbownik [Thu, 22 Feb 2018 15:18:13 +0000 (16:18 +0100)]
EDAC, sb_edac: Fix out of bound writes during DIMM configuration on KNL

BugLink: http://bugs.launchpad.net/bugs/1755179
commit bf8486709ac7fad99e4040dea73fe466c57a4ae1 upstream.

Commit

  3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")

decreased NUM_CHANNELS from 8 to 4, but this is not enough for Knights
Landing which supports up to 6 channels.

This caused out-of-bounds writes to pvt->mirror_mode and pvt->tolm
variables which don't pay critical role on KNL code path, so the memory
corruption wasn't causing any visible driver failures.

The easiest way of fixing it is to change NUM_CHANNELS to 6. Do that.

An alternative solution would be to restructure the KNL part of the
driver to 2MC/3channel representation.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Anna Karbownik <anna.karbownik@intel.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Tony Luck <tony.luck@intel.com>
Cc: jim.m.snow@intel.com
Cc: krzysztof.paliswiat@intel.com
Cc: lukasz.odzioba@intel.com
Cc: qiuxu.zhuo@intel.com
Cc: linux-edac <linux-edac@vger.kernel.org>
Cc: <stable@vger.kernel.org>
Fixes: 3286d3eb906c ("EDAC, sb_edac: Drop NUM_CHANNELS from 8 back to 4")
Link: http://lkml.kernel.org/r/1519312693-4789-1-git-send-email-anna.karbownik@intel.com
[ Massage commit message. ]
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agomedia: m88ds3103: don't call a non-initalized function
Mauro Carvalho Chehab [Sat, 10 Feb 2018 11:14:10 +0000 (06:14 -0500)]
media: m88ds3103: don't call a non-initalized function

BugLink: http://bugs.launchpad.net/bugs/1755179
commit b9c97c67fd19262c002d94ced2bfb513083e161e upstream.

If m88d3103 chip ID is not recognized, the device is not initialized.

However, it returns from probe without any error, causing this OOPS:

[    7.689289] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[    7.689297] pgd = 7b0bd7a7
[    7.689302] [00000000] *pgd=00000000
[    7.689318] Internal error: Oops: 80000005 [#1] SMP ARM
[    7.689322] Modules linked in: dvb_usb_dvbsky(+) m88ds3103 dvb_usb_v2 dvb_core videobuf2_vmalloc videobuf2_memops videobuf2_core crc32_arm_ce videodev media
[    7.689358] CPU: 3 PID: 197 Comm: systemd-udevd Not tainted 4.15.0-mcc+ #23
[    7.689361] Hardware name: BCM2835
[    7.689367] PC is at 0x0
[    7.689382] LR is at m88ds3103_attach+0x194/0x1d0 [m88ds3103]
[    7.689386] pc : [<00000000>]    lr : [<bf0ae1ec>]    psr: 60000013
[    7.689391] sp : ed8e5c20  ip : ed8c1e00  fp : ed8945c0
[    7.689395] r10: ed894000  r9 : ed894378  r8 : eda736c0
[    7.689400] r7 : ed894070  r6 : ed8e5c44  r5 : bf0bb040  r4 : eda77600
[    7.689405] r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : eda77600
[    7.689412] Flags: nZCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment none
[    7.689417] Control: 10c5383d  Table: 2d8e806a  DAC: 00000051
[    7.689423] Process systemd-udevd (pid: 197, stack limit = 0xe9dbfb63)
[    7.689428] Stack: (0xed8e5c20 to 0xed8e6000)
[    7.689439] 5c20: ed853a80 eda73640 ed894000 ed8942c0 ed853a80 bf0b9e98 ed894070 bf0b9f10
[    7.689449] 5c40: 00000000 00000000 bf08c17c c08dfc50 00000000 00000000 00000000 00000000
[    7.689459] 5c60: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.689468] 5c80: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.689479] 5ca0: 00000000 00000000 ed8945c0 ed8942c0 ed894000 ed894830 bf0b9e98 00000000
[    7.689490] 5cc0: ed894378 bf0a3cb4 bf0bc3b0 0000533b ed920540 00000000 00000034 bf0a6434
[    7.689500] 5ce0: ee952070 ed826600 bf0a7038 bf0a2dd8 00000001 bf0a6768 bf0a2f90 ed8943c0
[    7.689511] 5d00: 00000000 c08eca68 ed826620 ed826620 00000000 ee952070 bf0bc034 ee952000
[    7.689521] 5d20: ed826600 bf0bb080 ffffffed c0aa9e9c c0aa9dac ed826620 c16edf6c c168c2c8
[    7.689531] 5d40: c16edf70 00000000 bf0bc034 0000000d 00000000 c08e268c bf0bb080 ed826600
[    7.689541] 5d60: bf0bc034 ed826654 ed826620 bf0bc034 c164c8bc 00000000 00000001 00000000
[    7.689553] 5d80: 00000028 c08e2948 00000000 bf0bc034 c08e2848 c08e0778 ee9f0a58 ed88bab4
[    7.689563] 5da0: bf0bc034 ed90ba80 c168c1f0 c08e1934 bf0bb3bc c17045ac bf0bc034 c164c8bc
[    7.689574] 5dc0: bf0bc034 bf0bb3bc ed91f564 c08e34ec bf0bc000 c164c8bc bf0bc034 c0aa8dc4
[    7.689584] 5de0: ffffe000 00000000 bf0bf000 ed91f600 ed91f564 c03021e4 00000001 00000000
[    7.689595] 5e00: c166e040 8040003f ed853a80 bf0bc448 00000000 c1678174 ed853a80 f0f22000
[    7.689605] 5e20: f0f21fff 8040003f 014000c0 ed91e700 ed91e700 c16d8e68 00000001 ed91e6c0
[    7.689615] 5e40: bf0bc400 00000001 bf0bc400 ed91f564 00000001 00000000 00000028 c03c9a24
[    7.689625] 5e60: 00000001 c03c8c94 ed8e5f50 ed8e5f50 00000001 bf0bc400 ed91f540 c03c8cb0
[    7.689637] 5e80: bf0bc40c 00007fff bf0bc400 c03c60b0 00000000 bf0bc448 00000028 c0e09684
[    7.689647] 5ea0: 00000002 bf0bc530 c1234bf8 bf0bc5dc bf0bc514 c10ebbe8 ffffe000 bf000000
[    7.689657] 5ec0: 00011538 00000000 ed8e5f48 00000000 00000000 00000000 00000000 00000000
[    7.689666] 5ee0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[    7.689676] 5f00: 00000000 00000000 7fffffff 00000000 00000013 b6e55a18 0000017b c0309104
[    7.689686] 5f20: ed8e4000 00000000 00510af0 c03c9430 7fffffff 00000000 00000003 00000000
[    7.689697] 5f40: 00000000 f0f0f000 00011538 00000000 f0f107b0 f0f0f000 00011538 f0f1fdb8
[    7.689707] 5f60: f0f1fbe8 f0f1b974 00004000 000041e0 bf0bc3d0 00000001 00000000 000024c4
[    7.689717] 5f80: 0000002d 0000002e 00000019 00000000 00000010 00000000 16894000 00000000
[    7.689727] 5fa0: 00000000 c0308f20 16894000 00000000 00000013 b6e55a18 00000000 b6e5652c
[    7.689737] 5fc0: 16894000 00000000 00000000 0000017b 00020000 00508110 00000000 00510af0
[    7.689748] 5fe0: bef68948 bef68938 b6e4d3d0 b6d32590 60000010 00000013 00000000 00000000
[    7.689790] [<bf0ae1ec>] (m88ds3103_attach [m88ds3103]) from [<bf0b9f10>] (dvbsky_s960c_attach+0x78/0x280 [dvb_usb_dvbsky])
[    7.689821] [<bf0b9f10>] (dvbsky_s960c_attach [dvb_usb_dvbsky]) from [<bf0a3cb4>] (dvb_usbv2_probe+0xa3c/0x1024 [dvb_usb_v2])
[    7.689849] [<bf0a3cb4>] (dvb_usbv2_probe [dvb_usb_v2]) from [<c0aa9e9c>] (usb_probe_interface+0xf0/0x2a8)
[    7.689869] [<c0aa9e9c>] (usb_probe_interface) from [<c08e268c>] (driver_probe_device+0x2f8/0x4b4)
[    7.689881] [<c08e268c>] (driver_probe_device) from [<c08e2948>] (__driver_attach+0x100/0x11c)
[    7.689895] [<c08e2948>] (__driver_attach) from [<c08e0778>] (bus_for_each_dev+0x4c/0x9c)
[    7.689909] [<c08e0778>] (bus_for_each_dev) from [<c08e1934>] (bus_add_driver+0x1c0/0x264)
[    7.689919] [<c08e1934>] (bus_add_driver) from [<c08e34ec>] (driver_register+0x78/0xf4)
[    7.689931] [<c08e34ec>] (driver_register) from [<c0aa8dc4>] (usb_register_driver+0x70/0x134)
[    7.689946] [<c0aa8dc4>] (usb_register_driver) from [<c03021e4>] (do_one_initcall+0x44/0x168)
[    7.689963] [<c03021e4>] (do_one_initcall) from [<c03c9a24>] (do_init_module+0x64/0x1f4)
[    7.689979] [<c03c9a24>] (do_init_module) from [<c03c8cb0>] (load_module+0x20a0/0x25c8)
[    7.689993] [<c03c8cb0>] (load_module) from [<c03c9430>] (SyS_finit_module+0xb4/0xec)
[    7.690007] [<c03c9430>] (SyS_finit_module) from [<c0308f20>] (ret_fast_syscall+0x0/0x54)
[    7.690018] Code: bad PC value

This may happen on normal circumstances, if, for some reason, the demod
hangs and start returning an invalid chip ID:

[   10.394395] m88ds3103 3-0068: Unknown device. Chip_id=00

So, change the logic to cause probe to fail with -ENODEV, preventing
the OOPS.

Detected while testing DVB MMAP patches on Raspberry Pi 3 with
DVBSky S960CI.

Cc: stable@vger.kernel.org
Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoblk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch
Ming Lei [Fri, 23 Feb 2018 15:36:56 +0000 (23:36 +0800)]
blk-mq: don't call io sched's .requeue_request when requeueing rq to ->dispatch

BugLink: http://bugs.launchpad.net/bugs/1755179
commit 105976f517791aed3b11f8f53b308a2069d42055 upstream.

__blk_mq_requeue_request() covers two cases:

- one is that the requeued request is added to hctx->dispatch, such as
blk_mq_dispatch_rq_list()

- another case is that the request is requeued to io scheduler, such as
blk_mq_requeue_request().

We should call io sched's .requeue_request callback only for the 2nd
case.

Cc: Paolo Valente <paolo.valente@linaro.org>
Cc: Omar Sandoval <osandov@fb.com>
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers")
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
Acked-by: Paolo Valente <paolo.valente@linaro.org>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotcp: revert F-RTO extension to detect more spurious timeouts
Yuchung Cheng [Tue, 27 Feb 2018 22:15:02 +0000 (14:15 -0800)]
tcp: revert F-RTO extension to detect more spurious timeouts

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit fc68e171d376c322e6777a3d7ac2f0278b68b17f ]

This reverts commit 89fe18e44f7ee5ab1c90d0dff5835acee7751427.

While the patch could detect more spurious timeouts, it could cause
poor TCP performance on broken middle-boxes that modifies TCP packets
(e.g. receive window, SACK options). Since the performance gain is
much smaller compared to the potential loss. The best solution is
to fully revert the change.

Fixes: 89fe18e44f7e ("tcp: extend F-RTO to catch more spurious timeouts")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotcp: revert F-RTO middle-box workaround
Yuchung Cheng [Tue, 27 Feb 2018 22:15:01 +0000 (14:15 -0800)]
tcp: revert F-RTO middle-box workaround

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d4131f09770d9b7471c9da65e6ecd2477746ac5c ]

This reverts commit cc663f4d4c97b7297fb45135ab23cfd508b35a77. While fixing
some broken middle-boxes that modifies receive window fields, it does not
address middle-boxes that strip off SACK options. The best solution is
to fully revert this patch and the root F-RTO enhancement.

Fixes: cc663f4d4c97 ("tcp: restrict F-RTO to work-around broken middle-boxes")
Reported-by: Teodor Milkov <tm@del.bg>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix IPA command submission race
Julian Wiedmann [Tue, 27 Feb 2018 17:58:17 +0000 (18:58 +0100)]
s390/qeth: fix IPA command submission race

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d22ffb5a712f9211ffd104c38fc17cbfb1b5e2b0 ]

If multiple IPA commands are build & sent out concurrently,
fill_ipacmd_header() may assign a seqno value to a command that's
different from what send_control_data() later assigns to this command's
reply.
This is due to other commands passing through send_control_data(),
and incrementing card->seqno.ipa along the way.

So one IPA command has no reply that's waiting for its seqno, while some
other IPA command has multiple reply objects waiting for it.
Only one of those waiting replies wins, and the other(s) times out and
triggers a recovery via send_ipa_cmd().

Fix this by making sure that the same seqno value is assigned to
a command and its reply object.
Do so immediately before submitting the command & while holding the
irq_pending "lock", to produce nicely ascending seqnos.

As a side effect, *all* IPA commands now use a reply object that's
waiting for its actual seqno. Previously, early IPA commands that were
submitted while the card was still DOWN used the "catch-all" IDX seqno.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix IP address lookup for L3 devices
Julian Wiedmann [Tue, 27 Feb 2018 17:58:16 +0000 (18:58 +0100)]
s390/qeth: fix IP address lookup for L3 devices

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit c5c48c58b259bb8f0482398370ee539d7a12df3e ]

Current code ("qeth_l3_ip_from_hash()") matches a queried address object
against objects in the IP table by IP address, Mask/Prefix Length and
MAC address ("qeth_l3_ipaddrs_is_equal()"). But what callers actually
require is either
a) "is this IP address registered" (ie. match by IP address only),
before adding a new address.
b) or "is this address object registered" (ie. match all relevant
   attributes), before deleting an address.

Right now
1. the ADD path is too strict in its lookup, and eg. doesn't detect
conflicts between an existing NORMAL address and a new VIPA address
(because the NORMAL address will have mask != 0, while VIPA has
a mask == 0),
2. the DELETE path is not strict enough, and eg. allows del_rxip() to
delete a VIPA address as long as the IP address matches.

Fix all this by adding helpers (_addr_match_ip() and _addr_match_all())
that do the appropriate checking.

Note that the ADD path for NORMAL addresses is special, as qeth keeps
track of how many times such an address is in use (and there is no
immediate way of returning errors to the caller). So when a requested
NORMAL address _fully_ matches an existing one, it's not considered a
conflict and we merely increment the refcount.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agoRevert "s390/qeth: fix using of ref counter for rxip addresses"
Julian Wiedmann [Tue, 27 Feb 2018 17:58:15 +0000 (18:58 +0100)]
Revert "s390/qeth: fix using of ref counter for rxip addresses"

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 4964c66fd49b2e2342da35358f2ff74614bcbaee ]

This reverts commit cb816192d986f7596009dedcf2201fe2e5bc2aa7.

The issue this attempted to fix never actually occurs.
l3_add_rxip() checks (via l3_ip_from_hash()) if the requested address
was previously added to the card. If so, it returns -EEXIST and doesn't
call l3_add_ip().
As a result, the "address exists" path in l3_add_ip() is never taken
for rxip addresses, and this patch had no effect.

Fixes: cb816192d986 ("s390/qeth: fix using of ref counter for rxip addresses")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix double-free on IP add/remove race
Julian Wiedmann [Tue, 27 Feb 2018 17:58:14 +0000 (18:58 +0100)]
s390/qeth: fix double-free on IP add/remove race

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 14d066c3531a87f727968cacd85bd95c75f59843 ]

Registering an IPv4 address with the HW takes quite a while, so we
temporarily drop the ip_htable lock. Any concurrent add/remove of the
same IP adjusts the IP's use count, and (on remove) is then blocked by
addr->in_progress.
After the register call has completed, we check the use count for
concurrently attempted add/remove calls - and possibly straight-away
deregister the IP again. This happens via l3_delete_ip(), which
1) looks up the queried IP in the htable (getting a reference to the
   *same* queried object),
2) deregisters the IP from the HW, and
3) frees the IP object.

The caller in l3_add_ip() then does a second free on the same object.

For this case, skip all the extra checks and lookups in l3_delete_ip()
and just deregister & free the IP object ourselves.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix IP removal on offline cards
Julian Wiedmann [Tue, 27 Feb 2018 17:58:13 +0000 (18:58 +0100)]
s390/qeth: fix IP removal on offline cards

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 98d823ab1fbdcb13abc25b420f9bb71bade42056 ]

If the HW is not reachable, then none of the IPs in qeth's internal
table has been registered with the HW yet. So when deleting such an IP,
there's no need to stage it for deregistration - just drop it from
the table.

This fixes the "add-delete-add" scenario on an offline card, where the
the second "add" merely increments the IP's use count. But as the IP is
still set to DISP_ADDR_DELETE from the previous "delete" step,
l3_recover_ip() won't register it with the HW when the card goes online.

Fixes: 5f78e29ceebf ("qeth: optimize IP handling in rx_mode callback")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix overestimated count of buffer elements
Julian Wiedmann [Tue, 27 Feb 2018 17:58:12 +0000 (18:58 +0100)]
s390/qeth: fix overestimated count of buffer elements

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 12472af89632beb1ed8dea29d4efe208ca05b06a ]

qeth_get_elements_for_range() doesn't know how to handle a 0-length
range (ie. start == end), and returns 1 when it should return 0.
Such ranges occur on TSO skbs, where the L2/L3/L4 headers (and thus all
of the skb's linear data) are skipped when mapping the skb into regular
buffer elements.

This overestimation may cause several performance-related issues:
1. sub-optimal IO buffer selection, where the next buffer gets selected
   even though the skb would actually still fit into the current buffer.
2. forced linearization, if the element count for a non-linear skb
   exceeds QETH_MAX_BUFFER_ELEMENTS.

Rather than modifying qeth_get_elements_for_range() and adding overhead
to every caller, fix up those callers that are in risk of passing a
0-length range.

Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix SETIP command handling
Julian Wiedmann [Fri, 9 Feb 2018 10:03:50 +0000 (11:03 +0100)]
s390/qeth: fix SETIP command handling

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 1c5b2216fbb973a9410e0b06389740b5c1289171 ]

send_control_data() applies some special handling to SETIP v4 IPA
commands. But current code parses *all* command types for the SETIP
command code. Limit the command code check to IPA commands.

Fixes: 5b54e16f1a54 ("qeth: do not spin for SETIP ip assist command")
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agos390/qeth: fix underestimated count of buffer elements
Ursula Braun [Fri, 9 Feb 2018 10:03:49 +0000 (11:03 +0100)]
s390/qeth: fix underestimated count of buffer elements

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 89271c65edd599207dd982007900506283c90ae3 ]

For a memory range/skb where the last byte falls onto a page boundary
(ie. 'end' is of the form xxx...xxx001), the PFN_UP() part of the
calculation currently doesn't round up to the next PFN due to an
off-by-one error.
Thus qeth believes that the skb occupies one page less than it
actually does, and may select a IO buffer that doesn't have enough spare
buffer elements to fit all of the skb's data.
HW detects this as a malformed buffer descriptor, and raises an
exception which then triggers device recovery.

Fixes: 2863c61334aa ("qeth: refactor calculation of SBALE count")
Signed-off-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agol2tp: fix tunnel lookup use-after-free race
James Chapman [Fri, 23 Feb 2018 17:45:47 +0000 (17:45 +0000)]
l2tp: fix tunnel lookup use-after-free race

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 28f5bfb819195ad9c2eb9486babe7b0e4efe925f ]

l2tp_tunnel_get walks the tunnel list to find a matching tunnel
instance and if a match is found, its refcount is increased before
returning the tunnel pointer. But when tunnel objects are destroyed,
they are on the tunnel list after their refcount hits zero. Fix this
by moving the code that removes the tunnel from the tunnel list from
the tunnel socket destructor into in the l2tp_tunnel_delete path,
before the tunnel refcount is decremented.

refcount_t: increment on 0; use-after-free.
WARNING: CPU: 3 PID: 13507 at lib/refcount.c:153 refcount_inc+0x47/0x50
Modules linked in:
CPU: 3 PID: 13507 Comm: syzbot_6e6a5ec8 Not tainted 4.16.0-rc2+ #36
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:refcount_inc+0x47/0x50
RSP: 0018:ffff8800136ffb20 EFLAGS: 00010286
RAX: dffffc0000000008 RBX: ffff880017068e68 RCX: ffffffff814d3333
RDX: 0000000000000000 RSI: ffff88001a59f6d8 RDI: ffff88001a59f6d8
RBP: ffff8800136ffb28 R08: 0000000000000000 R09: 0000000000000000
R10: ffff8800136ffab0 R11: 0000000000000000 R12: ffff880017068e50
R13: 0000000000000000 R14: ffff8800174da800 R15: 0000000000000004
FS:  00007f403ab1e700(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000205fafd2 CR3: 0000000016770000 CR4: 00000000000006e0
Call Trace:
 l2tp_tunnel_get+0x2dd/0x4e0
 pppol2tp_connect+0x428/0x13c0
 ? pppol2tp_session_create+0x170/0x170
 ? __might_fault+0x115/0x1d0
 ? lock_downgrade+0x860/0x860
 ? __might_fault+0xe5/0x1d0
 ? security_socket_connect+0x8e/0xc0
 SYSC_connect+0x1b6/0x310
 ? SYSC_bind+0x280/0x280
 ? __do_page_fault+0x5d1/0xca0
 ? up_read+0x1f/0x40
 ? __do_page_fault+0x3c8/0xca0
 SyS_connect+0x29/0x30
 ? SyS_accept+0x40/0x40
 do_syscall_64+0x1e0/0x730
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f403a42f259
RSP: 002b:00007f403ab1dee8 EFLAGS: 00000296 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 00000000205fafe4 RCX: 00007f403a42f259
RDX: 000000000000002e RSI: 00000000205fafd2 RDI: 0000000000000004
RBP: 00007f403ab1df20 R08: 00007f403ab1e700 R09: 0000000000000000
R10: 00007f403ab1e700 R11: 0000000000000296 R12: 0000000000000000
R13: 00007ffc81906cbf R14: 0000000000000000 R15: 00007f403ab2b040
Code: 3b ff 5b 5d c3 e8 ca 5f 3b ff 80 3d 49 8e 66 04 00 75 ea e8 bc 5f 3b ff 48 c7 c7 60 69 64 85 c6 05 34 8e 66 04 01 e8 59 49 15 ff <0f> 0b eb ce 0f 1f 44 00 00 55 48 89 e5 41 56 41 55 41 54 53 49

Fixes: f8ccac0e44934 ("l2tp: put tunnel socket release on a workqueue")
Reported-and-tested-by: syzbot+19c09769f14b48810113@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+347bd5acde002e353a36@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+6e6a5ec8de31a94cd015@syzkaller.appspotmail.com
Reported-and-tested-by: syzbot+9df43faf09bd400f2993@syzkaller.appspotmail.com
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agol2tp: fix race in pppol2tp_release with session object destroy
James Chapman [Fri, 23 Feb 2018 17:45:46 +0000 (17:45 +0000)]
l2tp: fix race in pppol2tp_release with session object destroy

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d02ba2a6110c530a32926af8ad441111774d2893 ]

pppol2tp_release uses call_rcu to put the final ref on its socket. But
the session object doesn't hold a ref on the session socket so may be
freed while the pppol2tp_put_sk RCU callback is scheduled. Fix this by
having the session hold a ref on its socket until the session is
destroyed. It is this ref that is dropped via call_rcu.

Sessions are also deleted via l2tp_tunnel_closeall. This must now also put
the final ref via call_rcu. So move the call_rcu call site into
pppol2tp_session_close so that this happens in both destroy paths. A
common destroy path should really be implemented, perhaps with
l2tp_tunnel_closeall calling l2tp_session_delete like pppol2tp_release
does, but this will be looked at later.

ODEBUG: activate active (active state 1) object type: rcu_head hint:           (null)
WARNING: CPU: 3 PID: 13407 at lib/debugobjects.c:291 debug_print_object+0x166/0x220
Modules linked in:
CPU: 3 PID: 13407 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #38
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:debug_print_object+0x166/0x220
RSP: 0018:ffff880013647a00 EFLAGS: 00010082
RAX: dffffc0000000008 RBX: 0000000000000003 RCX: ffffffff814d3333
RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffff88001a59f6d0
RBP: ffff880013647a40 R08: 0000000000000000 R09: 0000000000000001
R10: ffff8800136479a8 R11: 0000000000000000 R12: 0000000000000001
R13: ffffffff86161420 R14: ffffffff85648b60 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88001a580000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000006022000 CR4: 00000000000006e0
Call Trace:
 debug_object_activate+0x38b/0x530
 ? debug_object_assert_init+0x3b0/0x3b0
 ? __mutex_unlock_slowpath+0x85/0x8b0
 ? pppol2tp_session_destruct+0x110/0x110
 __call_rcu.constprop.66+0x39/0x890
 ? __call_rcu.constprop.66+0x39/0x890
 call_rcu_sched+0x17/0x20
 pppol2tp_release+0x2c7/0x440
 ? fcntl_setlk+0xca0/0xca0
 ? sock_alloc_file+0x340/0x340
 sock_release+0x92/0x1e0
 sock_close+0x1b/0x20
 __fput+0x296/0x6e0
 ____fput+0x1a/0x20
 task_work_run+0x127/0x1a0
 do_exit+0x7f9/0x2ce0
 ? SYSC_connect+0x212/0x310
 ? mm_update_next_owner+0x690/0x690
 ? up_read+0x1f/0x40
 ? __do_page_fault+0x3c8/0xca0
 do_group_exit+0x10d/0x330
 ? do_group_exit+0x330/0x330
 SyS_exit_group+0x22/0x30
 do_syscall_64+0x1e0/0x730
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7f362e471259
RSP: 002b:00007ffe389abe08 EFLAGS: 00000202 ORIG_RAX: 00000000000000e7
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f362e471259
RDX: 00007f362e471259 RSI: 000000000000002e RDI: 0000000000000000
RBP: 00007ffe389abe30 R08: 0000000000000000 R09: 00007f362e944270
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffe389abf50 R14: 0000000000000000 R15: 0000000000000000
Code: 8d 3c dd a0 8f 64 85 48 89 fa 48 c1 ea 03 80 3c 02 00 75 7b 48 8b 14 dd a0 8f 64 85 4c 89 f6 48 c7 c7 20 85 64 85 e
8 2a 55 14 ff <0f> 0b 83 05 ad 2a 68 04 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41

Fixes: ee40fb2e1eb5b ("l2tp: protect sock pointer of struct pppol2tp_session with RCU")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agol2tp: fix races with tunnel socket close
James Chapman [Fri, 23 Feb 2018 17:45:45 +0000 (17:45 +0000)]
l2tp: fix races with tunnel socket close

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d00fa9adc528c1b0e64d532556764852df8bd7b9 ]

The tunnel socket tunnel->sock (struct sock) is accessed when
preparing a new ppp session on a tunnel at pppol2tp_session_init. If
the socket is closed by a thread while another is creating a new
session, the threads race. In pppol2tp_connect, the tunnel object may
be created if the pppol2tp socket is associated with the special
session_id 0 and the tunnel socket is looked up using the provided
fd. When handling this, pppol2tp_connect cannot sock_hold the tunnel
socket to prevent it being destroyed during pppol2tp_connect since
this may itself may race with the socket being destroyed. Doing
sockfd_lookup in pppol2tp_connect isn't sufficient to prevent
tunnel->sock going away either because a given tunnel socket fd may be
reused between calls to pppol2tp_connect. Instead, have
l2tp_tunnel_create sock_hold the tunnel socket before it does
sockfd_put. This ensures that the tunnel's socket is always extant
while the tunnel object exists. Hold a ref on the socket until the
tunnel is destroyed and ensure that all tunnel destroy paths go
through a common function (l2tp_tunnel_delete) since this will do the
final sock_put to release the tunnel socket.

Since the tunnel's socket is now guaranteed to exist if the tunnel
exists, we no longer need to use sockfd_lookup via l2tp_sock_to_tunnel
to derive the tunnel from the socket since this is always
sk_user_data.

Also, sessions no longer sock_hold the tunnel socket since sessions
already hold a tunnel ref and the tunnel sock will not be freed until
the tunnel is freed. Removing these sock_holds in
l2tp_session_register avoids a possible sock leak in the
pppol2tp_connect error path if l2tp_session_register succeeds but
attaching a ppp channel fails. The pppol2tp_connect error path could
have been fixed instead and have the sock ref dropped when the session
is freed, but doing a sock_put of the tunnel socket when the session
is freed would require a new session_free callback. It is simpler to
just remove the sock_hold of the tunnel socket in
l2tp_session_register, now that the tunnel socket lifetime is
guaranteed.

Finally, some init code in l2tp_tunnel_create is reordered to ensure
that the new tunnel object's refcount is set and the tunnel socket ref
is taken before the tunnel socket destructor callbacks are set.

kasan: CONFIG_KASAN_INLINE enabled
kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 4360 Comm: syzbot_19c09769 Not tainted 4.16.0-rc2+ #34
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
RIP: 0010:pppol2tp_session_init+0x1d6/0x500
RSP: 0018:ffff88001377fb40 EFLAGS: 00010212
RAX: dffffc0000000000 RBX: ffff88001636a940 RCX: ffffffff84836c1d
RDX: 0000000000000045 RSI: 0000000055976744 RDI: 0000000000000228
RBP: ffff88001377fb60 R08: ffffffff84836bc8 R09: 0000000000000002
R10: ffff88001377fab8 R11: 0000000000000001 R12: 0000000000000000
R13: ffff88001636aac8 R14: ffff8800160f81c0 R15: 1ffff100026eff76
FS:  00007ffb3ea66700(0000) GS:ffff88001a400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020e77000 CR3: 0000000016261000 CR4: 00000000000006f0
Call Trace:
 pppol2tp_connect+0xd18/0x13c0
 ? pppol2tp_session_create+0x170/0x170
 ? __might_fault+0x115/0x1d0
 ? lock_downgrade+0x860/0x860
 ? __might_fault+0xe5/0x1d0
 ? security_socket_connect+0x8e/0xc0
 SYSC_connect+0x1b6/0x310
 ? SYSC_bind+0x280/0x280
 ? __do_page_fault+0x5d1/0xca0
 ? up_read+0x1f/0x40
 ? __do_page_fault+0x3c8/0xca0
 SyS_connect+0x29/0x30
 ? SyS_accept+0x40/0x40
 do_syscall_64+0x1e0/0x730
 ? trace_hardirqs_off_thunk+0x1a/0x1c
 entry_SYSCALL_64_after_hwframe+0x42/0xb7
RIP: 0033:0x7ffb3e376259
RSP: 002b:00007ffeda4f6508 EFLAGS: 00000202 ORIG_RAX: 000000000000002a
RAX: ffffffffffffffda RBX: 0000000020e77012 RCX: 00007ffb3e376259
RDX: 000000000000002e RSI: 0000000020e77000 RDI: 0000000000000004
RBP: 00007ffeda4f6540 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000400b60
R13: 00007ffeda4f6660 R14: 0000000000000000 R15: 0000000000000000
Code: 80 3d b0 ff 06 02 00 0f 84 07 02 00 00 e8 13 d6 db fc 49 8d bc 24 28 02 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 f
a 48 c1 ea 03 <80> 3c 02 00 0f 85 ed 02 00 00 4d 8b a4 24 28 02 00 00 e8 13 16

Fixes: 80d84ef3ff1dd ("l2tp: prevent l2tp_tunnel_delete racing with userspace close")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agol2tp: don't use inet_shutdown on ppp session destroy
James Chapman [Fri, 23 Feb 2018 17:45:44 +0000 (17:45 +0000)]
l2tp: don't use inet_shutdown on ppp session destroy

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 225eb26489d05c679a4c4197ffcb81c81e9dcaf4 ]

Previously, if a ppp session was closed, we called inet_shutdown to mark
the socket as unconnected such that userspace would get errors and
then close the socket. This could race with userspace closing the
socket. Instead, leave userspace to close the socket in its own time
(our session will be detached anyway).

BUG: KASAN: use-after-free in inet_shutdown+0x5d/0x1c0
Read of size 4 at addr ffff880010ea3ac0 by task syzbot_347bd5ac/8296

CPU: 3 PID: 8296 Comm: syzbot_347bd5ac Not tainted 4.16.0-rc1+ #91
Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Call Trace:
 dump_stack+0x101/0x157
 ? inet_shutdown+0x5d/0x1c0
 print_address_description+0x78/0x260
 ? inet_shutdown+0x5d/0x1c0
 kasan_report+0x240/0x360
 __asan_load4+0x78/0x80
 inet_shutdown+0x5d/0x1c0
 ? pppol2tp_show+0x80/0x80
 pppol2tp_session_close+0x68/0xb0
 l2tp_tunnel_closeall+0x199/0x210
 ? udp_v6_flush_pending_frames+0x90/0x90
 l2tp_udp_encap_destroy+0x6b/0xc0
 ? l2tp_tunnel_del_work+0x2e0/0x2e0
 udpv6_destroy_sock+0x8c/0x90
 sk_common_release+0x47/0x190
 udp_lib_close+0x15/0x20
 inet_release+0x85/0xd0
 inet6_release+0x43/0x60
 sock_release+0x53/0x100
 ? sock_alloc_file+0x260/0x260
 sock_close+0x1b/0x20
 __fput+0x19f/0x380
 ____fput+0x1a/0x20
 task_work_run+0xd2/0x110
 exit_to_usermode_loop+0x18d/0x190
 do_syscall_64+0x389/0x3b0
 entry_SYSCALL_64_after_hwframe+0x26/0x9b
RIP: 0033:0x7fe240a45259
RSP: 002b:00007fe241132df8 EFLAGS: 00000297 ORIG_RAX: 0000000000000003
RAX: 0000000000000000 RBX: 0000000000000000 RCX: 00007fe240a45259
RDX: 00007fe240a45259 RSI: 0000000000000000 RDI: 00000000000000a5
RBP: 00007fe241132e20 R08: 00007fe241133700 R09: 0000000000000000
R10: 00007fe241133700 R11: 0000000000000297 R12: 0000000000000000
R13: 00007ffc49aff84f R14: 0000000000000000 R15: 00007fe241141040

Allocated by task 8331:
 save_stack+0x43/0xd0
 kasan_kmalloc+0xad/0xe0
 kasan_slab_alloc+0x12/0x20
 kmem_cache_alloc+0x144/0x3e0
 sock_alloc_inode+0x22/0x130
 alloc_inode+0x3d/0xf0
 new_inode_pseudo+0x1c/0x90
 sock_alloc+0x30/0x110
 __sock_create+0xaa/0x4c0
 SyS_socket+0xbe/0x130
 do_syscall_64+0x128/0x3b0
 entry_SYSCALL_64_after_hwframe+0x26/0x9b

Freed by task 8314:
 save_stack+0x43/0xd0
 __kasan_slab_free+0x11a/0x170
 kasan_slab_free+0xe/0x10
 kmem_cache_free+0x88/0x2b0
 sock_destroy_inode+0x49/0x50
 destroy_inode+0x77/0xb0
 evict+0x285/0x340
 iput+0x429/0x530
 dentry_unlink_inode+0x28c/0x2c0
 __dentry_kill+0x1e3/0x2f0
 dput.part.21+0x500/0x560
 dput+0x24/0x30
 __fput+0x2aa/0x380
 ____fput+0x1a/0x20
 task_work_run+0xd2/0x110
 exit_to_usermode_loop+0x18d/0x190
 do_syscall_64+0x389/0x3b0
 entry_SYSCALL_64_after_hwframe+0x26/0x9b

Fixes: fd558d186df2c ("l2tp: Split pppol2tp patch into separate l2tp and ppp parts")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agol2tp: don't use inet_shutdown on tunnel destroy
James Chapman [Fri, 23 Feb 2018 17:45:43 +0000 (17:45 +0000)]
l2tp: don't use inet_shutdown on tunnel destroy

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 76a6abdb2513ad4ea0ded55d2c66160491f2e848 ]

Previously, if a tunnel was closed, we called inet_shutdown to mark
the socket as unconnected such that userspace would get errors and
then close the socket. This could race with userspace closing the
socket. Instead, leave userspace to close the socket in its own time
(our tunnel will be detached anyway).

BUG: unable to handle kernel NULL pointer dereference at 00000000000000a0
IP: __lock_acquire+0x263/0x1630
PGD 0 P4D 0
Oops: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 2 PID: 42 Comm: kworker/u8:2 Not tainted 4.15.0-rc7+ #129
Workqueue: l2tp l2tp_tunnel_del_work
RIP: 0010:__lock_acquire+0x263/0x1630
RSP: 0018:ffff88001a37fc70 EFLAGS: 00010002
RAX: 0000000000000001 RBX: 0000000000000088 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffff88001a37fd18 R08: 0000000000000001 R09: 0000000000000000
R10: 0000000000000000 R11: 00000000000076fd R12: 00000000000000a0
R13: ffff88001a3722c0 R14: 0000000000000001 R15: 0000000000000000
FS:  0000000000000000(0000) GS:ffff88001ad00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00000000000000a0 CR3: 000000001730b000 CR4: 00000000000006e0
Call Trace:
 ? __lock_acquire+0xc77/0x1630
 ? console_trylock+0x11/0xa0
 lock_acquire+0x117/0x230
 ? lock_sock_nested+0x3a/0xa0
 _raw_spin_lock_bh+0x3a/0x50
 ? lock_sock_nested+0x3a/0xa0
 lock_sock_nested+0x3a/0xa0
 inet_shutdown+0x33/0xf0
 l2tp_tunnel_del_work+0x60/0xef
 process_one_work+0x1ea/0x5f0
 ? process_one_work+0x162/0x5f0
 worker_thread+0x48/0x3e0
 ? trace_hardirqs_on+0xd/0x10
 kthread+0x108/0x140
 ? process_one_work+0x5f0/0x5f0
 ? kthread_stop+0x2a0/0x2a0
 ret_from_fork+0x24/0x30
Code: 00 41 81 ff ff 1f 00 00 0f 87 7a 13 00 00 45 85 f6 49 8b 85
68 08 00 00 0f 84 ae 03 00 00 c7 44 24 18 00 00 00 00 e9 f0 00 00 00 <49> 81 3c
24 80 93 3f 83 b8 00 00 00 00 44 0f 44 c0 83 fe 01 0f
RIP: __lock_acquire+0x263/0x1630 RSP: ffff88001a37fc70
CR2: 00000000000000a0

Fixes: 309795f4bec2d ("l2tp: Add netlink control API for L2TP")
Signed-off-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotcp: tracepoint: only call trace_tcp_send_reset with full socket
Song Liu [Wed, 7 Feb 2018 04:50:23 +0000 (20:50 -0800)]
tcp: tracepoint: only call trace_tcp_send_reset with full socket

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 5c487bb9adddbc1d23433e09d2548759375c2b52 ]

tracepoint tcp_send_reset requires a full socket to work. However, it
may be called when in TCP_TIME_WAIT:

        case TCP_TW_RST:
                tcp_v6_send_reset(sk, skb);
                inet_twsk_deschedule_put(inet_twsk(sk));
                goto discard_it;

To avoid this problem, this patch checks the socket with sk_fullsock()
before calling trace_tcp_send_reset().

Fixes: c24b14c46bb8 ("tcp: add tracepoint trace_tcp_send_reset")
Signed-off-by: Song Liu <songliubraving@fb.com>
Reviewed-by: Lawrence Brakmo <brakmo@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonet: phy: Restore phy_resume() locking assumption
Andrew Lunn [Tue, 27 Feb 2018 00:56:06 +0000 (01:56 +0100)]
net: phy: Restore phy_resume() locking assumption

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 9c2c2e62df3fa30fb13fbeb7512a4eede729383b ]

commit f5e64032a799 ("net: phy: fix resume handling") changes the
locking semantics for phy_resume() such that the caller now needs to
hold the phy mutex. Not all call sites were adopted to this new
semantic, resulting in warnings from the added
WARN_ON(!mutex_is_locked(&phydev->lock)).  Rather than change the
semantics, add a __phy_resume() and restore the old behavior of
phy_resume().

Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Fixes: f5e64032a799 ("net: phy: fix resume handling")
Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonet/mlx5: Fix error handling when adding flow rules
Vlad Buslov [Tue, 6 Feb 2018 08:52:19 +0000 (10:52 +0200)]
net/mlx5: Fix error handling when adding flow rules

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 9238e380e823a39983ee8d6b6ee8d1a9c4ba8a65 ]

If building match list or adding existing fg fails when
node is locked, function returned without unlocking it.
This happened if node version changed or adding existing fg
returned with EAGAIN after jumping to search_again_locked label.

Fixes: bd71b08ec2ee ("net/mlx5: Support multiple updates of steering rules in parallel")
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agocxgb4: fix trailing zero in CIM LA dump
Rahul Lakkireddy [Thu, 15 Feb 2018 12:50:01 +0000 (18:20 +0530)]
cxgb4: fix trailing zero in CIM LA dump

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit e6f02a4d57cc438099bc8abfba43ba1400d77b38 ]

Set correct size of the CIM LA dump for T6.

Fixes: 27887bc7cb7f ("cxgb4: collect hardware LA dumps")
Signed-off-by: Rahul Lakkireddy <rahul.lakkireddy@chelsio.com>
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agovirtio-net: disable NAPI only when enabled during XDP set
Jason Wang [Wed, 28 Feb 2018 10:20:04 +0000 (18:20 +0800)]
virtio-net: disable NAPI only when enabled during XDP set

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 4e09ff5362843dff3accfa84c805c7f3a99de9cd ]

We try to disable NAPI to prevent a single XDP TX queue being used by
multiple cpus. But we don't check if device is up (NAPI is enabled),
this could result stall because of infinite wait in
napi_disable(). Fixing this by checking device state through
netif_running() before.

Fixes: 4941d472bf95b ("virtio-net: do not reset during XDP set")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotuntap: disable preemption during XDP processing
Jason Wang [Sat, 24 Feb 2018 03:32:25 +0000 (11:32 +0800)]
tuntap: disable preemption during XDP processing

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 23e43f07f896f8578318cfcc9466f1e8b8ab21b6 ]

Except for tuntap, all other drivers' XDP was implemented at NAPI
poll() routine in a bh. This guarantees all XDP operation were done at
the same CPU which is required by e.g BFP_MAP_TYPE_PERCPU_ARRAY. But
for tuntap, we do it in process context and we try to protect XDP
processing by RCU reader lock. This is insufficient since
CONFIG_PREEMPT_RCU can preempt the RCU reader critical section which
breaks the assumption that all XDP were processed in the same CPU.

Fixing this by simply disabling preemption during XDP processing.

Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotuntap: correctly add the missing XDP flush
Jason Wang [Sat, 24 Feb 2018 03:32:26 +0000 (11:32 +0800)]
tuntap: correctly add the missing XDP flush

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit 1bb4f2e868a2891ab8bc668b8173d6ccb8c4ce6f ]

We don't flush batched XDP packets through xdp_do_flush_map(), this
will cause packets stall at TX queue. Consider we don't do XDP on NAPI
poll(), the only possible fix is to call xdp_do_flush_map()
immediately after xdp_do_redirect().

Note, this in fact won't try to batch packets through devmap, we could
address in the future.

Reported-by: Christoffer Dall <christoffer.dall@linaro.org>
Fixes: 761876c857cb ("tap: XDP support")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agotcp: purge write queue upon RST
Soheil Hassas Yeganeh [Tue, 27 Feb 2018 23:32:18 +0000 (18:32 -0500)]
tcp: purge write queue upon RST

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit a27fd7a8ed3856faaf5a2ff1c8c5f00c0667aaa0 ]

When the connection is reset, there is no point in
keeping the packets on the write queue until the connection
is closed.

RFC 793 (page 70) and RFC 793-bis (page 64) both suggest
purging the write queue upon RST:
https://tools.ietf.org/html/draft-ietf-tcpm-rfc793bis-07

Moreover, this is essential for a correct MSG_ZEROCOPY
implementation, because userspace cannot call close(fd)
before receiving zerocopy signals even when the connection
is reset.

Fixes: f214f915e7db ("tcp: enable MSG_ZEROCOPY")
Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agonetlink: put module reference if dump start fails
Jason A. Donenfeld [Wed, 21 Feb 2018 03:41:59 +0000 (04:41 +0100)]
netlink: put module reference if dump start fails

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit b87b6194be631c94785fe93398651e804ed43e28 ]

Before, if cb->start() failed, the module reference would never be put,
because cb->cb_running is intentionally false at this point. Users are
generally annoyed by this because they can no longer unload modules that
leak references. Also, it may be possible to tediously wrap a reference
counter back to zero, especially since module.c still uses atomic_inc
instead of refcount_inc.

This patch expands the error path to simply call module_put if
cb->start() fails.

Fixes: 41c87425a1ac ("netlink: do not set cb_running if dump's start() errs")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agomlxsw: spectrum_router: Do not unconditionally clear route offload indication
Ido Schimmel [Fri, 16 Feb 2018 23:30:44 +0000 (00:30 +0100)]
mlxsw: spectrum_router: Do not unconditionally clear route offload indication

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d1c95af366961101819f07e3c64d44f3be7f0367 ]

When mlxsw replaces (or deletes) a route it removes the offload
indication from the replaced route. This is problematic for IPv4 routes,
as the offload indication is stored in the fib_info which is usually
shared between multiple routes.

Instead of unconditionally clearing the offload indication, only clear
it if no other route is using the fib_info.

Fixes: 3984d1a89fe7 ("mlxsw: spectrum_router: Provide offload indication using nexthop flags")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Reported-by: Alexander Petrovskiy <alexpe@mellanox.com>
Tested-by: Alexander Petrovskiy <alexpe@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
6 years agocls_u32: fix use after free in u32_destroy_key()
Paolo Abeni [Mon, 5 Feb 2018 21:23:01 +0000 (22:23 +0100)]
cls_u32: fix use after free in u32_destroy_key()

BugLink: http://bugs.launchpad.net/bugs/1755179
[ Upstream commit d7cdee5ea8d28ae1b6922deb0c1badaa3aa0ef8c ]

Li Shuang reported an Oops with cls_u32 due to an use-after-free
in u32_destroy_key(). The use-after-free can be triggered with:

dev=lo
tc qdisc add dev $dev root handle 1: htb default 10
tc filter add dev $dev parent 1: prio 5 handle 1: protocol ip u32 divisor 256
tc filter add dev $dev protocol ip parent 1: prio 5 u32 ht 800:: match ip dst\
 10.0.0.0/8 hashkey mask 0x0000ff00 at 16 link 1:
tc qdisc del dev $dev root

Which causes the following kasan splat:

 ==================================================================
 BUG: KASAN: use-after-free in u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
 Read of size 4 at addr ffff881b83dae618 by task kworker/u48:5/571

 CPU: 17 PID: 571 Comm: kworker/u48:5 Not tainted 4.15.0+ #87
 Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
 Workqueue: tc_filter_workqueue u32_delete_key_freepf_work [cls_u32]
 Call Trace:
  dump_stack+0xd6/0x182
  ? dma_virt_map_sg+0x22e/0x22e
  print_address_description+0x73/0x290
  kasan_report+0x277/0x360
  ? u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
  u32_destroy_key.constprop.21+0x117/0x140 [cls_u32]
  u32_delete_key_freepf_work+0x1c/0x30 [cls_u32]
  process_one_work+0xae0/0x1c80
  ? sched_clock+0x5/0x10
  ? pwq_dec_nr_in_flight+0x3c0/0x3c0
  ? _raw_spin_unlock_irq+0x29/0x40
  ? trace_hardirqs_on_caller+0x381/0x570
  ? _raw_spin_unlock_irq+0x29/0x40
  ? finish_task_switch+0x1e5/0x760
  ? finish_task_switch+0x208/0x760
  ? preempt_notifier_dec+0x20/0x20
  ? __schedule+0x839/0x1ee0
  ? check_noncircular+0x20/0x20
  ? firmware_map_remove+0x73/0x73
  ? find_held_lock+0x39/0x1c0
  ? worker_thread+0x434/0x1820
  ? lock_contended+0xee0/0xee0
  ? lock_release+0x1100/0x1100
  ? init_rescuer.part.16+0x150/0x150
  ? retint_kernel+0x10/0x10
  worker_thread+0x216/0x1820
  ? process_one_work+0x1c80/0x1c80
  ? lock_acquire+0x1a5/0x540
  ? lock_downgrade+0x6b0/0x6b0
  ? sched_clock+0x5/0x10
  ? lock_release+0x1100/0x1100
  ? compat_start_thread+0x80/0x80
  ? do_raw_spin_trylock+0x190/0x190
  ? _raw_spin_unlock_irq+0x29/0x40
  ? trace_hardirqs_on_caller+0x381/0x570
  ? _raw_spin_unlock_irq+0x29/0x40
  ? finish_task_switch+0x1e5/0x760
  ? finish_task_switch+0x208/0x760
  ? preempt_notifier_dec+0x20/0x20
  ? __schedule+0x839/0x1ee0
  ? kmem_cache_alloc_trace+0x143/0x320
  ? firmware_map_remove+0x73/0x73
  ? sched_clock+0x5/0x10
  ? sched_clock_cpu+0x18/0x170
  ? find_held_lock+0x39/0x1c0
  ? schedule+0xf3/0x3b0
  ? lock_downgrade+0x6b0/0x6b0
  ? __schedule+0x1ee0/0x1ee0
  ? do_wait_intr_irq+0x340/0x340
  ? do_raw_spin_trylock+0x190/0x190
  ? _raw_spin_unlock_irqrestore+0x32/0x60
  ? process_one_work+0x1c80/0x1c80
  ? process_one_work+0x1c80/0x1c80
  kthread+0x312/0x3d0
  ? kthread_create_worker_on_cpu+0xc0/0xc0
  ret_from_fork+0x3a/0x50

 Allocated by task 1688:
  kasan_kmalloc+0xa0/0xd0
  __kmalloc+0x162/0x380
  u32_change+0x1220/0x3c9e [cls_u32]
  tc_ctl_tfilter+0x1ba6/0x2f80
  rtnetlink_rcv_msg+0x4f0/0x9d0
  netlink_rcv_skb+0x124/0x320
  netlink_unicast+0x430/0x600
  netlink_sendmsg+0x8fa/0xd60
  sock_sendmsg+0xb1/0xe0
  ___sys_sendmsg+0x678/0x980
  __sys_sendmsg+0xc4/0x210
  do_syscall_64+0x232/0x7f0
  return_from_SYSCALL_64+0x0/0x75

 Freed by task 112:
  kasan_slab_free+0x71/0xc0
  kfree+0x114/0x320
  rcu_process_callbacks+0xc3f/0x1600
  __do_softirq+0x2bf/0xc06

 The buggy address belongs to the object at ffff881b83dae600
  which belongs to the cache kmalloc-4096 of size 4096
 The buggy address is located 24 bytes inside of
  4096-byte region [ffff881b83dae600ffff881b83daf600)
 The buggy address belongs to the page:
 page:ffffea006e0f6a00 count:1 mapcount:0 mapping:          (null) index:0x0 compound_mapcount: 0
 flags: 0x17ffffc0008100(slab|head)
 raw: 0017ffffc0008100 0000000000000000 0000000000000000 0000000100070007
 raw: dead000000000100 dead000000000200 ffff880187c0e600 0000000000000000
 page dumped because: kasan: bad access detected

 Memory state around the buggy address:
  ffff881b83dae500: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
  ffff881b83dae580: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
 >ffff881b83dae600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                             ^
  ffff881b83dae680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
  ffff881b83dae700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 ==================================================================

The problem is that the htnode is freed before the linked knodes and the
latter will try to access the first at u32_destroy_key() time.
This change addresses the issue using the htnode refcnt to guarantee
the correct free order. While at it also add a RCU annotation,
to keep sparse happy.

v1 -> v2: use rtnl_derefence() instead of RCU read locks
v2 -> v3:
  - don't check refcnt in u32_destroy_hnode()
  - cleaned-up u32_destroy() implementation
  - cleaned-up code comment
v3 -> v4:
  - dropped unneeded comment

Reported-by: Li Shuang <shuali@redhat.com>
Fixes: c0d378ef1266 ("net_sched: use tcf_queue_work() in u32 filter")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>