git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log

UBUNTU: SAUCE: mm: remove gup_flags FOLL_WRITE games from __get_user_pages()

This is an ancient bug that was actually attrempted to be fixed once
(badly) by me eleven years ago in commit 4ceb5db9757a ("Fix
get_user_pages() race for write access") but that was then undone due to
problems on s390 by commit f33ea7f404e5 ("fix get_user_pages bug").

In the meantime, the s390 situation has long been fixed, and we can once
more try to fix it by checking the pte_dirty() bit properly (and do it
better). Also, the VM has become more scalable, and what was a purely
theoretical race back then has become easier to trigger.

To fix it, we introduce a new internal FOLL_COW flag to mark the "yes,
we already did a COW" rather than play racy games with FOLL_WRITE that
is very fundamental, and then use the pte dirty flag to validate that
the FOLL_COW flag is still valid.

Reported-and-tested-by: Phil "not Paul" Oester <kernel@linuxace.com>
Cc: Michal Hocko <mhocko@kernel.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Willy Tarreau <w@1wt.eu>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Greg Thelen <gthelen@google.com>
Cc: stable@vger.kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
CVE-2016-5195

[ saf: Adjust context for missing FOLL_REMOTE ]
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Andy Whitcroft <andy.whitcroft@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>

UBUNTU: Ubuntu-4.4.0-43.63

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: (no-up) If zone is so small that watermarks are the same, stop zone balance.

BugLink: http://bugs.launchpad.net/bugs/1518457
On an AWS t2.micro instance (Xeon E5-2670, 991MiB of memory).
Occasionally (about once a day), kswapd0 falls into a busy loop and
spins on 100% CPU usage indefinitely. Reject to do the zone balance
when the memory is too small.

Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
Tested-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Acked-by: Leann Ogasawara <leann.ogasawara@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: Ubuntu-4.4.0-42.62

Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: SAUCE: net: add recursion limit to GRO

BugLink: http://bugs.launchpad.net/bugs/1631287
Currently, GRO can do unlimited recursion through the gro_receive
handlers.  This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem.  Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.

This patch adds a recursion counter to the GRO layer to prevent stack
overflow.  When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally.

Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.")
Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Reviewed-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Steve Beattie <steve.beattie@canonical.com>
[ saf: backported to xenial
  - Fix up gro_receive callback invocation in udb_gro_receive()
  - Adjust context ]
CVE-2016-7039
Acked-by: Kamal Mostafa <kamal@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

gro: Allow tunnel stacking in the case of FOU/GUE

BugLink: http://bugs.launchpad.net/bugs/1631287
This patch should fix the issues seen with a recent fix to prevent
tunnel-in-tunnel frames from being generated with GRO. The fix itself is
correct for now as long as we do not add any devices that support
NETIF_F_GSO_GRE_CSUM. When such a device is added it could have the
potential to mess things up due to the fact that the outer transport header
points to the outer UDP header and not the GRE header as would be expected.

Fixes: fac8e0f579695 ("tunnels: Don't apply GRO to multiple layers of encapsulation.")
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c3483384ee511ee2af40b4076366cd82a6a47b86)
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Steve Beattie <steve.beattie@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

tunnels: Don't apply GRO to multiple layers of encapsulation.

BugLink: http://bugs.launchpad.net/bugs/1631287
When drivers express support for TSO of encapsulated packets, they
only mean that they can do it for one layer of encapsulation.
Supporting additional levels would mean updating, at a minimum,
more IP length fields and they are unaware of this.

No encapsulation device expresses support for handling offloaded
encapsulated packets, so we won't generate these types of frames
in the transmit path. However, GRO doesn't have a check for
multiple levels of encapsulation and will attempt to build them.

UDP tunnel GRO actually does prevent this situation but it only
handles multiple UDP tunnels stacked on top of each other. This
generalizes that solution to prevent any kind of tunnel stacking
that would cause problems.

Fixes: bf5a755f ("net-gre-gro: Add GRE support to the GRO stack")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
(backported from commit fac8e0f579695a3ecbc4d3cac369139d7f819971)
Acked-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Steve Beattie <steve.beattie@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: Ubuntu-4.4.0-41.61

Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: (fix) NVMe: Don't unmap controller registers on reset

BugLink: http://bugs.launchpad.net/bugs/1626894
Commit b00a726a9fd82ddd4c10344e46f0d371e1674303 upstream.

Fix nvme_pci_enable missing initializer accidentally omitted from Xenial
backport of 30d6592 "NVMe: Don't unmap controller registers on reset"
(and homogenize comment lines with the 4.4.y-stable tree).

Signed-off-by: Kamal Mostafa <kamal@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Stefan Bader <stefan.bader@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: Ubuntu-4.4.0-40.60

Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: Fix regression which breaks DFS mounting

BugLink: http://bugs.launchpad.net/bugs/1626112
Patch a6b5058 results in -EREMOTE returned by is_path_accessible() in
cifs_mount() to be ignored which breaks DFS mounting.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Reviewed-by: Aurelien Aptel <aaptel@suse.com>
CC: Stable <stable@vger.kernel.org>
Signed-off-by: Steve French <smfrench@gmail.com>
(cherry picked from commit de5233745cd59cf5853d963ad216067788a87594
git://git.samba.org/sfrench/cifs-2.6.git)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

Compare prepaths when comparing superblocks

BugLink: http://bugs.launchpad.net/bugs/1626112
The patch
fs/cifs: make share unaccessible at root level mountable
makes use of prepaths when any component of the underlying path is
inaccessible.

When mounting 2 separate shares having different prepaths but are other
wise similar in other respects, we end up sharing superblocks when we
shouldn't be doing so.

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
(cherry picked from commit c1d8b24d18192764fe82067ec6aa8d4c3bf094e0)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

Fix memory leaks in cifs_do_mount()

BugLink: http://bugs.launchpad.net/bugs/1626112
Fix memory leaks introduced by the patch
fs/cifs: make share unaccessible at root level mountable

Also move allocation of cifs_sb->prepath to cifs_setup_cifs_sb().

Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
Tested-by: Aurelien Aptel <aaptel@suse.com>
Signed-off-by: Steve French <smfrench@gmail.com>
(backported from commit 4214ebf4654798309364d0c678b799e402f38288)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: i915_bpo: drm/i915/backlight: setup backlight pwm alternate increment on backlight enable

BugLink: http://bugs.launchpad.net/bugs/1625932
Backlight enable is supposed to do a full setup of the backlight. We
were missing the PWM alternate increment bit in the south chicken
registers on lpt+ pch. This potentially caused a PWM frequency change
when the chicken register value was lost e.g. on suspend.

v2 by Jani, rebase on the patch caching alt increment

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=97486
References: https://bugs.freedesktop.org/show_bug.cgi?id=67454
Cc: Cooper Chiou <cooper.chiou@intel.com>
Cc: Wei Shun Chen <wei.shun.chang@intel.com>
Cc: Gary C Wang <gary.c.wang@intel.com>
Cc: stable@vger.kernel.org # v4.4+ 32b421e79e6b drm/i915/backlight: setup and cache...
Cc: stable@vger.kernel.org # v4.4+
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Shawn Lee <shawn.c.lee@intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/8265f5935bd31c039ddfc82819d26c2ca1ae9cba.1474281249.git.jani.nikula@intel.com
(cherry-picked from drm-intel-next-queued commit e29aff05f239f8d)
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: i915_bpo: drm/i915/backlight: setup and cache pwm alternate increment value

BugLink: http://bugs.launchpad.net/bugs/1625932
This will also be needed later on when setting up the alternate
increment in backlight enable.

Cc: Shawn Lee <shawn.c.lee@intel.com>
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Signed-off-by: Jani Nikula <jani.nikula@intel.com>
Link: http://patchwork.freedesktop.org/patch/msgid/9984b20bc59aee90b83caf59ce91f3fb122c9627.1474281249.git.jani.nikula@intel.com
(cherry-picked from drm-intel-next-queued commit 32b421e79e6b546)
Signed-off-by: AceLan Kao <acelan.kao@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: Start new release

Ignore: yes
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: Ubuntu-4.4.0-39.59

Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>

net: thunderx: Fix for issues with multiple CQEs posted for a TSO packet

BugLink: http://bugs.launchpad.net/bugs/1624569
On ThunderX 88xx pass 2.x chips when TSO is offloaded to HW,
HW posts a CQE for every TSO segment transmitted. Current code
does handles this, but is prone to issues when segment sizes are
small resulting in SW processing too many CQEs and also at times
frees a SKB which is not yet transmitted.

This patch handles the errata in a different way and eliminates issues
with earlier approach, TSO packet is submitted to HW with post_cqe=0,
so that no CQE is posted upon completion of transmission of TSO packet
but a additional HDR + IMMEDIATE descriptors are added to SQ due to
which a CQE is posted and will have required info to be used while
cleanup in napi. This way only one CQE is posted for a TSO packet.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7ceb8a1319ec64954459d474dd4a8c3c60ff0999)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

net: thunderx: Fix for HW issue while padding TSO packet

BugLink: http://bugs.launchpad.net/bugs/1623627
There is a issue in HW where-in while sending GSO sized pkts
as part of TSO, if pkt len falls below configured min packet
size i.e 60, NIC will zero PAD packet and also updates IP total length.
Hence set this value to lessthan min pkt size of MAC + IP + TCP
headers, BGX will anyway do the padding to transmit 64 byte pkt
including FCS.

Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57e81d44b0e1aa4dcb479ff8de8fc34cf635d0e8)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

tcp: fix use after free in tcp_xmit_retransmit_queue()

When tcp_sendmsg() allocates a fresh and empty skb, it puts it at the
tail of the write queue using tcp_add_write_queue_tail()

Then it attempts to copy user data into this fresh skb.

If the copy fails, we undo the work and remove the fresh skb.

Unfortunately, this undo lacks the change done to tp->highest_sack and
we can leave a dangling pointer (to a freed skb)

Later, tcp_xmit_retransmit_queue() can dereference this pointer and
access freed memory. For regular kernels where memory is not unmapped,
this might cause SACK bugs because tcp_highest_sack_seq() is buggy,
returning garbage instead of tp->snd_nxt, but with various debug
features like CONFIG_DEBUG_PAGEALLOC, this can crash the kernel.

This bug was found by Marco Grassi thanks to syzkaller.

Fixes: 6859d49475d4 ("[TCP]: Abstract tp->highest_sack accessing & point to next skb")
Reported-by: Marco Grassi <marco.gra@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Cc: Yuchung Cheng <ycheng@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
CVE-2016-6828
(cherry picked from commit bb1fceca22492109be12640d49f5ea5a544c6bb4)
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: (no-up) ALSA: usb-audio: Add quirk for sennheiser officerunner

BugLink: http://bugs.launchpad.net/bugs/1622763
This USB headset doesn't seem to support sample rate polling, similar to this issue:

https://bugzilla.kernel.org/show_bug.cgi?id=95961

Every time something goes to interact with the device (playing a sound
file, opening the sound panel, opening web audio/video), a 10 second pause
is encountered, where dmesg prints out two messages:

usb 2-1.2: 2:1: cannot get freq at ep 0x83
usb 2-1.2: 2:1: cannot get freq at ep 0x83

Once the sound is playing, everything is fine. These sample rate polls
don't seem to keep happening. After waiting for maybe 30 seconds after
sound is playing, future interactions will again trigger the pause.

This 10 second pause can introduce other subtle problems. For instance,
google hangouts will sometimes timeout waiting for the sound device to
respond, and the browser tab will crash or not fully load as a result. The
sound panel often also will not display the device in the list of choices,
and you will have to close it out. Sometimes restart pulse audio to get it
to recognize the headset again.

Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

EDAC, ie31200_edac: Add Skylake support

BugLink: http://bugs.launchpad.net/bugs/1619766
Skylake adjusts some register locations, but otherwise follows the
existing model quite closely. I was able to verify that the 'ce_count'
increments when 'bad dimms' are used. The accounting of 'ce_count' and
'ue_count' is the primary functionality of interest for us. Tested on
Intel(R) Xeon(R) CPU E3-1260L v5 @ 2.90GHz.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Tony Luck <tony.luck@intel.com>
Cc: linux-edac <linux-edac@vger.kernel.org>
Link: http://lkml.kernel.org/r/1462547927-22679-1-git-send-email-jbaron@akamai.com
Signed-off-by: Borislav Petkov <bp@suse.de>
(cherry picked from commit 953dee9bbd245f5515173126b9cc8b1a2c340797)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: nvme: Don't suspend admin queue that wasn't created

Pending 4.8-rc merge.
BugLink: http://bugs.launchpad.net/bugs/1602724
This fixes a regression in my previous commit c21377f8366c ("nvme:
Suspend all queues before deletion"), which provoked an Oops in the
removal path when removing a device that became IO incapable very early
at probe (i.e. after a failed EEH recovery).

Turns out, if the error occurred very early at the probe path, before
even configuring the admin queue, we might try to suspend the
uninitialized admin queue, accessing bad memory.

Fixes: c21377f8366c ("nvme: Suspend all queues before deletion")
Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Reviewed-by: Jay Freyensee <james_p_freyensee@linux.intel.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

blk-mq: don't overwrite rq->mq_ctx

BugLink: http://bugs.launchpad.net/bugs/1620317
We do this in a few places, if the CPU is offline. This isn't allowed,
though, since on multi queue hardware, we can't just move a request
from one software queue to another, if they map to different hardware
queues. The request and tag isn't valid on another hardware queue.

This can happen if plugging races with CPU offlining. But it does
no harm, since it can only happen in the window where we are
currently busy freezing the queue and flushing IO, in preparation
for redoing the software <-> hardware queue mappings.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit e57690fe009b2ab0cee8a57f53be634540e49c9d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

blk-mq: improve warning for running a queue on the wrong CPU

BugLink: http://bugs.launchpad.net/bugs/1620317
__blk_mq_run_hw_queue() currently warns if we are running the queue on a
CPU that isn't set in its mask. However, this can happen if a CPU is
being offlined, and the workqueue handling will place the work on CPU0
instead. Improve the warning so that it only triggers if the batch cpu
in the hardware queue is currently online. If it triggers for that
case, then it's indicative of a flow problem in blk-mq, so we want to
retain it for that case.

Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 0e87e58bf60edb6bb28e493c7a143f41b091a5e5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

blk-mq: Allow timeouts to run while queue is freezing

BugLink: http://bugs.launchpad.net/bugs/1620317
In case a submitted request gets stuck for some reason, the block layer
can prevent the request starvation by starting the scheduled timeout work.
If this stuck request occurs at the same time another thread has started
a queue freeze, the blk_mq_timeout_work will not be able to acquire the
queue reference and will return silently, thus not issuing the timeout.
But since the request is already holding a q_usage_counter reference and
is unable to complete, it will never release its reference, preventing
the queue from completing the freeze started by first thread.  This puts
the request_queue in a hung state, forever waiting for the freeze
completion.

This was observed while running IO to a NVMe device at the same time we
toggled the CPU hotplug code. Eventually, once a request got stuck
requiring a timeout during a queue freeze, we saw the CPU Hotplug
notification code get stuck inside blk_mq_freeze_queue_wait, as shown in
the trace below.

[c000000deaf13690] [c000000deaf13738] 0xc000000deaf13738 (unreliable)
[c000000deaf13860] [c000000000015ce8] __switch_to+0x1f8/0x350
[c000000deaf138b0] [c000000000ade0e4] __schedule+0x314/0x990
[c000000deaf13940] [c000000000ade7a8] schedule+0x48/0xc0
[c000000deaf13970] [c0000000005492a4] blk_mq_freeze_queue_wait+0x74/0x110
[c000000deaf139e0] [c00000000054b6a8] blk_mq_queue_reinit_notify+0x1a8/0x2e0
[c000000deaf13a40] [c0000000000e7878] notifier_call_chain+0x98/0x100
[c000000deaf13a90] [c0000000000b8e08] cpu_notify_nofail+0x48/0xa0
[c000000deaf13ac0] [c0000000000b92f0] _cpu_down+0x2a0/0x400
[c000000deaf13b90] [c0000000000b94a8] cpu_down+0x58/0xa0
[c000000deaf13bc0] [c0000000006d5dcc] cpu_subsys_offline+0x2c/0x50
[c000000deaf13bf0] [c0000000006cd244] device_offline+0x104/0x140
[c000000deaf13c30] [c0000000006cd40c] online_store+0x6c/0xc0
[c000000deaf13c80] [c0000000006c8c78] dev_attr_store+0x68/0xa0
[c000000deaf13cc0] [c0000000003974d0] sysfs_kf_write+0x80/0xb0
[c000000deaf13d00] [c0000000003963e8] kernfs_fop_write+0x188/0x200
[c000000deaf13d50] [c0000000002e0f6c] __vfs_write+0x6c/0xe0
[c000000deaf13d90] [c0000000002e1ca0] vfs_write+0xc0/0x230
[c000000deaf13de0] [c0000000002e2cdc] SyS_write+0x6c/0x110
[c000000deaf13e30] [c000000000009204] system_call+0x38/0xb4

The fix is to allow the timeout work to execute in the window between
dropping the initial refcount reference and the release of the last
reference, which actually marks the freeze completion.  This can be
achieved with percpu_refcount_tryget, which does not require the counter
to be alive.  This way the timeout work can do it's job and terminate a
stuck request even during a freeze, returning its reference and avoiding
the deadlock.

Allowing the timeout to run is just a part of the fix, since for some
devices, we might get stuck again inside the device driver's timeout
handler, should it attempt to allocate a new request in that path -
which is a quite common action for Abort commands, which need to be sent
after a timeout.  In NVMe, for instance, we call blk_mq_alloc_request
from inside the timeout handler, which will fail during a freeze, since
it also tries to acquire a queue reference.

I considered a similar change to blk_mq_alloc_request as a generic
solution for further device driver hangs, but we can't do that, since it
would allow new requests to disturb the freeze process.  I thought about
creating a new function in the block layer to support unfreezable
requests for these occasions, but after working on it for a while, I
feel like this should be handled in a per-driver basis.  I'm now
experimenting with changes to the NVMe timeout path, but I'm open to
suggestions of ways to make this generic.

Signed-off-by: Gabriel Krisman Bertazi <krisman@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Cc: Keith Busch <keith.busch@intel.com>
Cc: linux-nvme@lists.infradead.org
Cc: linux-block@vger.kernel.org
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@fb.com>
(cherry picked from commit 71f79fb3179e69b0c1448a2101a866d871c66e7f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

btrfs: bugfix: handle FS_IOC32_{GETFLAGS, SETFLAGS, GETVERSION} in btrfs_ioctl

BugLink: http://bugs.launchpad.net/bugs/1619918
32-bit ioctl uses these rather than the regular FS_IOC_* versions. They can
be handled in btrfs using the same code. Without this, 32-bit {ch,ls}attr
fail.

Signed-off-by: Luke Dashjr <luke-jr+git@utopios.org>
Cc: stable@vger.kernel.org
Reviewed-by: Josef Bacik <jbacik@fb.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
(back ported from commit 4c63c2454eff996c5e27991221106eb511f7db38)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
fs/btrfs/ctree.h

Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

drm/radeon/dp: add back special handling for NUTMEG

BugLink: http://bugs.launchpad.net/bugs/1600092
When I fixed the dp rate selection in:
092c96a8ab9d1bd60ada2ed385cc364ce084180e
drm/radeon: fix dp link rate selection (v2)
I accidently dropped the special handling for NUTMEG
DP bridge chips. They require a fixed link rate.

Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Ken Wang <Qingqing.Wang@amd.com>
Reviewed-by: Harry Wentland <harry.wentland@amd.com>
Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
Cc: stable@vger.kernel.org
(cherry picked from commit c8213a638f65bf487c10593c216525952cca3690)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

qed: add MODULE_FIRMWARE()

BugLink: http://bugs.launchpad.net/bugs/1623187
Module is using a binary firmware file and so should be marked as such.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d43d3f0f393b21ee14c0487d5757edae194c4848)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

ixgbevf: Use mac_ops instead of trying to identify NIC type

BugLink: http://bugs.launchpad.net/bugs/1616677
This change makes it so that we can just use function pointers instead of
having to identify if a given VF is running on a Linux or Windows PF. By
doing this we can avoid having to pull too much information out of the
lower layers and can instead just make use of the mac_ops pointers since
they should differ between the two types of VFs anyway.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 2f8214fe6811a246265629d81af2313695c63f4d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: Change the relaxed order settings in VF driver for sparc

BugLink: http://bugs.launchpad.net/bugs/1616677
We noticed performance issues with VF interface on sparc compared
to PF. Setting the RX to IXGBE_DCA_RXCTRL_DATA_WRO_EN brings it
on far with PF. Also this matches to the default sparc setting in
PF driver.

Signed-off-by: Babu Moger <babu.moger@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 33b0eb15961393d8c60e7c4bddd23da53cd1c2e4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: Support Windows hosts (Hyper-V)

BugLink: http://bugs.launchpad.net/bugs/1616677
On Hyper-V, the VF/PF communication is a via software mediated path
as opposed to the hardware mailbox. Make the necessary
adjustments to support Hyper-V.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit c6d45171d706c2b5efa3d5ee7a8260c14b6367c0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: Add the device ID's presented while running on Hyper-V

BugLink: http://bugs.launchpad.net/bugs/1616677
Intel SR-IOV cards present different ID when running on Hyper-V.
Add the device IDs presented while running on Hyper-V.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit b4363fbd8df2be23439e15a53b4040897228c481)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: Move API negotiation function into mac_ops

BugLink: http://bugs.launchpad.net/bugs/1616677
This patch moves API negotiation into mac_ops. The general idea here is
that with HyperV on the way we need to make certain that anything that will
have different versions between HyperV and a standard VF needs to be
abstracted enough so that we can have a separate function between the two
so we can avoid changes in one breaking something in the other.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 7921f4dc4c36e736d7a5b45dfa7b6a755a4fc012)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: make use of BIT() macro to avoid shift of signed values

BugLink: http://bugs.launchpad.net/bugs/1616677
Also cleanup a case where we're bit shifting a value into place, and use
an unsigned constant. Make use of the unsigned postfix in places where
BIT() macro is not appropriate.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8d055cc0c8be92cd6a77193460117f0ab0a05286)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: add support for per-queue ethtool stats

BugLink: http://bugs.launchpad.net/bugs/1616677
Implement per-queue statistics for packets, bytes and busy poll
specific counters.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit a02a5a53418a6039893f5d5a9373cf18080fded2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: refactor ethtool stats handling

BugLink: http://bugs.launchpad.net/bugs/1616677
This brings the logic closer to how we handle the stats in ixgbe and it
sets us up for introducing per-queue stats.

Use IXGBEVF_STAT and IXGBEVF_NETDEV_STAT for accessing the driver and
netdev stats respectively. This way we don't have to calculate the
stats based on register values which could lead to the counters not
being initialized properly when the interface is down.

IXGBEVF_QUEUE_STATS_LEN is set to include the number of queues.

Also some defines were renamed to use the IXGBEVF prefix.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d72d6c19b583afc09ace22baf80b29b11139a8f3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbe/ixgbevf: Add support for bulk free in Tx cleanup & cleanup boolean logic

BugLink: http://bugs.launchpad.net/bugs/1616677
This patch enables bulk free in Tx cleanup for ixgbevf and cleans up the
boolean logic in the polling routines for ixgbe and ixgbevf in the hopes of
avoiding any mix-ups similar to what occurred with i40e and i40evf.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 8220bbc12d39175964cb56e100fabcedd59c48da)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: Add support for generic Tx checksums

BugLink: http://bugs.launchpad.net/bugs/1616677
This patch adds support for generic Tx checksums to the ixgbevf driver.  It
turns out this is actually pretty easy after going over the datasheet as we
were doing a number of steps we didn't need to.

In order to perform a Tx checksum for an L4 header we need to fill in the
following fields in the Tx descriptor:
  MACLEN (maximum of 127), retrieved from:
skb_network_offset()
  IPLEN  (maximum of 511), retrieved from:
skb_checksum_start_offset() - skb_network_offset()
  TUCMD.L4T indicates offset and if checksum or crc32c, based on:
skb->csum_offset

The added advantage to doing this is that we can support inner checksum
offloads for tunnels and MPLS while still being able to transparently
insert VLAN tags.

I also took the opportunity to clean-up many of the feature flag
configuration bits to make them a bit more consistent between drivers.  In
the case of the VF drivers this meant adding support for SCTP CRCs, and
inner checksum offloads for MPLS and various tunnel types.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit cb2b3edbece804d9836647c1ca51282ad384d425)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: use bit operations for setting and checking resets

BugLink: http://bugs.launchpad.net/bugs/1616677
Move the reset flags to adapter->state in order to make use of bit
operations.

This is an alternative patch to the one previously submitted by
John Greene.

Suggested-by: Alexander Duyck <aduyck@mirantis.com>
Reported-by: Scott Otto <otts62@yahoo.com>
Reported-by: John Greene <jogreene@redhat.com>
Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Phil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit d5dd7c3fa4dbff70fc25acf54acb63cf971fd6e9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: fix error code path when setting MAC address

BugLink: http://bugs.launchpad.net/bugs/1616677
Return error when a MAC address change is rejected by the PF.

This will prevent the user from modifying the MAC address when
that operation is not permitted.

Signed-off-by: Emil Tantilov <emil.s.tantilov@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 32ca68683532ab629d16cede1102b36ae5346f40)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

ixgbevf: call ndo_stop() instead of dev_close() when running offline selftest

BugLink: http://bugs.launchpad.net/bugs/1616677
Calling dev_close() causes IFF_UP to be cleared which will remove the
interfaces routes and some addresses. That's probably not what the user
intended when running the offline selftest. Besides this does not happen
if the interface is brought down before the test, so the current
behaviour is inconsistent.
Instead call the net_device_ops ndo_stop function directly and avoid
touching IFF_UP at all.

Signed-off-by: Stefan Assmann <sassmann@kpanic.de>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
(cherry picked from commit 324d086709978fce1671ba04087bf90865b04398)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Use the new virt_xx barrier code

BugLink: http://bugs.launchpad.net/bugs/1616677
Use the virt_xx barriers that have been defined for use in virtual machines.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit dcd0eeca4454d5c5a60aa3467581e0c6d7461b96)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/hv/ring_buffer.c
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: fix bonding devices check in netvsc_netdev_event()

BugLink: http://bugs.launchpad.net/bugs/1616677
Bonding driver sets IFF_BONDING on both master (the bonding device) and
slave (the real NIC) devices and in netvsc_netdev_event() we want to skip
master devices only. Currently, there is an uncertainty when a slave
interface is removed: if bonding module comes first in netdev_chain it
clears IFF_BONDING flag on the netdev and netvsc_netdev_event() correctly
handles NETDEV_UNREGISTER event, but in case netvsc comes first on the
chain it sees the device with IFF_BONDING still attached and skips it. As
we still hold vf_netdev pointer to the device we crash on the next inject.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0dbff144a1e7310e2f8b7a957352c4be9aeb38e4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: protect module refcount by checking net_device_ctx->vf_netdev

BugLink: http://bugs.launchpad.net/bugs/1616677
We're not guaranteed to see NETDEV_REGISTER/NETDEV_UNREGISTER notifications
only once per VF but we increase/decrease module refcount unconditionally.
Check vf_netdev to make sure we don't take/release it twice. We presume
that only one VF per netvsc device may exist.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0f20d795f78d182c4b743d880a5e8dc4d39892fe)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: reset vf_inject on VF removal

BugLink: http://bugs.launchpad.net/bugs/1616677
We reset vf_inject on VF going down (netvsc_vf_down()) but we don't on
VF removal (netvsc_unregister_vf()) so vf_inject stays 'true' while
vf_netdev is already NULL and we're trying to inject packets into NULL
net device in netvsc_recv_callback() causing kernel to crash.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 57c1826b991244d2144eb6e3d5d1b13a53cbea63)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: avoid deadlocks between rtnl lock and vf_use_cnt wait

BugLink: http://bugs.launchpad.net/bugs/1616677
Here is a deadlock scenario:
- netvsc_vf_up() schedules netvsc_notify_peers() work and quits.
- netvsc_vf_down() runs before netvsc_notify_peers() gets executed. As it
  is being executed from netdev notifier chain we hold rtnl lock when we
  get here.
- we enter while (atomic_read(&net_device_ctx->vf_use_cnt) != 0) loop and
  wait till netvsc_notify_peers() drops vf_use_cnt.
- netvsc_notify_peers() starts on some other CPU but netdev_notify_peers()
  will hang on rtnl_lock().
- deadlock!

Instead of introducing additional synchronization I suggest we drop
gwrk.dwrk completely and call NETDEV_NOTIFY_PEERS directly. As we're
acting under rtnl lock this is legitimate.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d072218f214929194db06069564495b6b9fff34a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: don't lose VF information

BugLink: http://bugs.launchpad.net/bugs/1616677
struct netvsc_device is not suitable for storing VF information as this
structure is being destroyed on MTU change / set channel operation (see
rndis_filter_device_remove()). Move all VF related stuff to struct
net_device_context which is persistent.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f9a7da9130ef0143eb900794c7863dc5c9051fbc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: Fix VF register on bonding devices

BugLink: http://bugs.launchpad.net/bugs/1616677
Added a condition to avoid bonding devices with same MAC registering
as VF.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e2b9f1f7af1dfe20df8e68849ebb4bbafed5727a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

PCI: hv: Fix interrupt cleanup path

BugLink: http://bugs.launchpad.net/bugs/1616677
SR-IOV disabled from the host causes a memory leak. pci-hyperv usually
first receives a PCI_EJECT notification and then proceeds to delete the
hpdev list entry in hv_eject_device_work(). Later in hv_msi_free() since
the device is no longer on the device list hpdev is NULL and hv_msi_free
returns without freeing int_desc as part of hv_int_desc_free().

Signed-off-by: Cathy Avery <cavery@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 0c6e617f656ec259162a41c0849e3a7557c99d95)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

x86/kernel: Audit and remove any unnecessary uses of module.h

BugLink: http://bugs.launchpad.net/bugs/1616677
Historically a lot of these existed because we did not have
a distinction between what was modular code and what was providing
support to modules via EXPORT_SYMBOL and friends.  That changed
when we forked out support for the latter into the export.h file.

This means we should be able to reduce the usage of module.h
in code that is obj-y Makefile or bool Kconfig.  The advantage
in doing so is that module.h itself sources about 15 other headers;
adding significantly to what we feed cpp, and it can obscure what
headers we are effectively using.

Since module.h was the source for init.h (for __init) and for
export.h (for EXPORT_SYMBOL) we consider each obj-y/bool instance
for the presence of either and replace as needed.  Build testing
revealed some implicit header usage that was fixed up accordingly.

Note that some bool/obj-y instances remain since module.h is
the header for some exception table entry stuff, and for things
like __init_or_module (code that is tossed when MODULES=n).

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20160714001901.31603-4-paul.gortmaker@windriver.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit 186f43608a5c827f8284fe4559225b4dccaa49ef)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

netvsc: Use the new in-place consumption APIs in the rx path

BugLink: http://bugs.launchpad.net/bugs/1616677
Use the new APIs for eliminating a copy on the receive path. These new APIs also
help in minimizing the number of memory barriers we end up issuing (in the
ringbuffer code) since we can better control when we want to expose the ring
state to the host.

The patch is being resent to address earlier email issues.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 99a50bb11cad44cd1d478256d2388e7afce982ac)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

PCI: hv: Handle all pending messages in hv_pci_onchannelcallback()

BugLink: http://bugs.launchpad.net/bugs/1616677
When we have an interrupt from the host we have a bit set in event page
indicating there are messages for the particular channel.  We need to read
them all as we won't get signaled for what was on the queue before we
cleared the bit in vmbus_on_event().  This applies to all Hyper-V drivers
and the pass-through driver should do the same.

I did not meet any bugs; the issue was found by code inspection.  We don't
have many events going through hv_pci_onchannelcallback(), which explains
why nobody reported the issue before.

While on it, fix handling non-zero vmbus_recvpacket_raw() return values by
dropping out.  If the return value is not zero, it is wrong to inspect
buffer or bytes_recvd as these may contain invalid data.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 837d741ea2e6bb23da9cad1667776fc6f0cb67dd)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

PCI: hv: Don't leak buffer in hv_pci_onchannelcallback()

BugLink: http://bugs.launchpad.net/bugs/1616677
We don't free buffer on several code paths in hv_pci_onchannelcallback(),
put kfree() to the end of the function to fix the issue. Direct { kfree();
return; } can now be replaced with a simple 'break';

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: Jake Oshins <jakeo@microsoft.com>
(cherry picked from commit 60fcdac8136b4275da42d6edf9ddb10439350289)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

netvsc: get rid of completion timeouts

BugLink: http://bugs.launchpad.net/bugs/1616677
I'm hitting 5 second timeout in rndis_filter_set_rss_param() while setting
RSS parameters for the device. When this happens we end up returning
-ETIMEDOUT from the function and rndis_filter_device_add() falls back to
setting

        net_device->max_chn = 1;
        net_device->num_chn = 1;
        net_device->num_sc_offered = 0;

but after a moment the rndis request succeeds and subchannels start to
appear. netvsc_sc_open() does unconditional nvscdev->num_sc_offered-- and
it becomes U32_MAX-1. Consequent rndis_filter_device_remove() will hang
while waiting for all U32_MAX-1 subchannels to appear and this is not
going to happen.

The immediate issue could be solved by adding num_sc_offered > 0 check to
netvsc_sc_open() but we're getting out of sync with the host and it's not
easy to adjust things later, e.g. in this particular case we'll be creating
queues without a user request for it and races are expected. Same applies
to other parts of the driver which have the same completion timeout.

Following the trend in drivers/hv/* code I suggest we remove all these
timeouts completely. As a guest we can always trust the host we're running
on and if the host screws things up there is no easy way to recover anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 5362855aba7159aab8f7c6573eb675d9da317914)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: pass struct net_device to rndis_filter_set_offload_params()

BugLink: http://bugs.launchpad.net/bugs/1616677
The only caller rndis_filter_device_add() has 'struct net_device' pointer
already.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 426d95417eebc0698e878708a8a44980e8b7570d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: pass struct net_device to rndis_filter_set_device_mac()

BugLink: http://bugs.launchpad.net/bugs/1616677
We unpack 'struct net_device' in netvsc_set_mac_addr() to get to
'struct hv_device' pointer which we use in rndis_filter_set_device_mac()
to get back to 'struct net_device'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e834da9a40edd3117ef0a9b2a73d845fe6b622a8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: pass struct netvsc_device to rndis_filter_{open, close}()

BugLink: http://bugs.launchpad.net/bugs/1616677
Both rndis_filter_open()/rndis_filter_close() use struct hv_device to
reach to struct netvsc_device only and all callers have it already.
While on it, rename net_device to nvdev in rndis_filter_open() as
net_device is misleading.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2f5fa6c869e8f8c340dd05a2817eecbcea382c35)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: introduce {net, hv}_device_to_netvsc_device() helpers

BugLink: http://bugs.launchpad.net/bugs/1616677
Make it easier to get 'struct netvsc_device' from 'struct net_device' and
'struct hv_device' by introducing inline helpers.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 2625466d6d92f056da970bee990c862c54188819)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: remove redundant assignment in netvsc_recv_callback()

BugLink: http://bugs.launchpad.net/bugs/1616677
net_device_ctx is assigned in the very beginning of the function and 'net'
pointer doesn't change.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4baa994dc93287321c1b712d98cb6f721652891b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: Fix VF register on vlan devices

BugLink: http://bugs.launchpad.net/bugs/1616677
Added a condition to avoid vlan devices with same MAC registering
as VF.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit cb2911fed61497e4d0383355f1c865fcdaa94061)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: set nvdev link after populating chn_table

BugLink: http://bugs.launchpad.net/bugs/1616677
Crash in netvsc_send() is observed when netvsc device is re-created on
mtu change/set channels. The crash is caused by dereferencing of NULL
channel pointer which comes from chn_table. The root cause is a mixture
of two facts:
- we set nvdev pointer in net_device_context in alloc_net_device()
before we populate chn_table.
- we populate chn_table[0] only.

The issue could be papered over by checking channel != NULL in
netvsc_send() but populating the whole chn_table and writing the
nvdev pointer afterwards seems more appropriate.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 88098834827025cc04c15f1b4b0d9bbef3cf55af)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: synchronize netvsc_change_mtu()/netvsc_set_channels() with netvsc_remove()

BugLink: http://bugs.launchpad.net/bugs/1616677
When netvsc device is removed during mtu change or channels setup we get
into troubles as both paths are trying to remove the device. Synchronize
them with start_remove flag and rtnl lock.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6da7225f5a95ba68e3c6225c4051182bef30eed4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: get rid of struct net_device pointer in struct netvsc_device

BugLink: http://bugs.launchpad.net/bugs/1616677
Simplify netvsvc pointer graph by getting rid of the redundant ndev
pointer. We can always get a pointer to struct net_device from somewhere
else.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0a1275ca5128b84ffffc149960969ed351ae00eb)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: untangle the pointer mess

BugLink: http://bugs.launchpad.net/bugs/1616677
We have the following structures keeping netvsc adapter state:
- struct net_device
- struct net_device_context
- struct netvsc_device
- struct rndis_device
- struct hv_device
and there are pointers/dependencies between them:
- struct net_device_context is contained in struct net_device
- struct hv_device has driver_data pointer which points to
  'struct net_device' OR 'struct netvsc_device' depending on driver's
  state (!).
- struct net_device_context has a pointer to 'struct hv_device'.
- struct netvsc_device has pointers to 'struct hv_device' and
  'struct net_device_context'.
- struct rndis_device has a pointer to 'struct netvsc_device'.

Different functions get different structures as parameters and use these
pointers for traveling. The problem is (in addition to keeping in mind
this complex graph) that some of these structures (struct netvsc_device
and struct rndis_device) are being removed and re-created on mtu change
(as we implement it as re-creation of hyper-v device) so our travel using
these pointers is dangerous.

Simplify this to a the following:
- add struct netvsc_device pointer to struct net_device_context (which is
  a part of struct net_device and thus never disappears)
- remove struct hv_device and struct net_device_context pointers from
  struct netvsc_device
- replace pointer to 'struct netvsc_device' with pointer to
  'struct net_device'.
- always keep 'struct net_device' in hv_device driver_data.

We'll end up with the following 'circular' structure:

net_device:
[net_device_context] -> netvsc_device -> rndis_device -> net_device
                      -> hv_device -> net_device

On MTU change we'll be removing the 'netvsc_device -> rndis_device'
branch and re-creating it making the synchronization easier.

There is one additional redundant pointer left, it is struct net_device
link in struct netvsc_device, it is going to be removed in a separate
commit.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3d541ac5a92af708d0085925d136f875f3a58d57)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: use start_remove flag to protect netvsc_link_change()

BugLink: http://bugs.launchpad.net/bugs/1616677
netvsc_link_change() can race with netvsc_change_mtu() or
netvsc_set_channels() as these functions destroy struct netvsc_device and
rndis filter. Use start_remove flag for syncronization. As
netvsc_change_mtu()/netvsc_set_channels() are called with rtnl lock held
we need to take it before checking start_remove value in
netvsc_link_change().

Reported-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1bdcec8a5f05445752a0639edd603ac09ae6c553)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: move start_remove flag to net_device_context

BugLink: http://bugs.launchpad.net/bugs/1616677
struct netvsc_device is destroyed on mtu change so keeping the
protection flag there is not a good idea. Move it to struct
net_device_context which is preserved.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f580aec4bfd7babe51f086e599400027def08ed8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

tools: hv: lsvmbus: add pci pass-through UUID

BugLink: http://bugs.launchpad.net/bugs/1616677
lsvmbus keeps its own copy of all VMBus UUIDs, add PCIe pass-through
device there to not report 'Unknown' for such devices.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 552beb4930ef3889d42a049eb9ba3533b4cbe0f6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: balloon: reset host_specified_ha_region

BugLink: http://bugs.launchpad.net/bugs/1616677
We set host_specified_ha_region = true on certain request but this is a
global state which stays 'true' forever. We need to reset it when we
receive a request where ha_region is not specified. I did not see any
real issues, the bug was found by code inspection.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d19a55d6ed5bf0ffe553df2d8bf91d054ddf2d76)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: balloon: don't crash when memory is added in non-sorted order

BugLink: http://bugs.launchpad.net/bugs/1616677
When we iterate through all HA regions in handle_pg_range() we have an
assumption that all these regions are sorted in the list and the
'start_pfn >= has->end_pfn' check is enough to find the proper region.
Unfortunately it's not the case with WS2016 where host can hot-add regions
in a different order. We end up modifying the wrong HA region and crashing
later on pages online. Modify the check to make sure we found the region
we were searching for while iterating. Fix the same check in pfn_covered()
as well.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 77c0c9735bc0ba5898e637a3a20d6bcb50e3f67d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: handle various crash scenarios

BugLink: http://bugs.launchpad.net/bugs/1616677
Kdump keeps biting. Turns out CHANNELMSG_UNLOAD_RESPONSE is always
delivered to the CPU which was used for initial contact or to CPU0
depending on host version. vmbus_wait_for_unload() doesn't account for
the fact that in case we're crashing on some other CPU we won't get the
CHANNELMSG_UNLOAD_RESPONSE message and our wait on the current CPU will
never end.

Do the following:
1) Check for completion_done() in the loop. In case interrupt handler is
   still alive we'll get the confirmation we need.

2) Read message pages for all CPUs message page as we're unsure where
   CHANNELMSG_UNLOAD_RESPONSE is going to be delivered to. We can race with
   still-alive interrupt handler doing the same, add cmpxchg() to
   vmbus_signal_eom() to not lose CHANNELMSG_UNLOAD_RESPONSE message.

3) Cleanup message pages on all CPUs. This is required (at least for the
   current CPU as we're clearing CPU0 messages now but we may want to bring
   up additional CPUs on crash) as new messages won't be delivered till we
   consume what's pending. On boot we'll place message pages somewhere else
   and we won't be able to read stale messages.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cd95aad5579371ac332507fc946008217fc37e6c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

drivers:hv: Separate out frame buffer logic when picking MMIO range

BugLink: http://bugs.launchpad.net/bugs/1616677
Simplify the logic that picks MMIO ranges by pulling out the
logic related to trying to lay frame buffer claim on top of where
the firmware placed the frame buffer.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ea37a6b8a0b9fbe3f85b4b9da3206c28f1de6f8e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

drivers:hv: Track allocations of children of hv_vmbus in private resource tree

BugLink: http://bugs.launchpad.net/bugs/1616677
This patch changes vmbus_allocate_mmio() and vmbus_free_mmio() so
that when child paravirtual devices allocate memory-mapped I/O
space, they allocate it privately from a resource tree pointed
at by hyperv_mmio and also by the public resource tree
iomem_resource.  This allows the region to be marked as "busy"
in the private tree, but a "bridge window" in the public tree,
guaranteeing that no two bridge windows will overlap each other
but while also allowing the PCI device children of the bridge
windows to overlap that window.

One might conclude that this belongs in the pnp layer, rather
than in this driver.  Rafael Wysocki, the maintainter of the
pnp layer, has previously asked that we not modify the pnp layer
as it is considered deprecated.  This patch is thus essentially
a workaround.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit be000f93e5d71f5d43dd722f8eb110b069f9d8a2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

drivers:hv: Make a function to free mmio regions through vmbus

BugLink: http://bugs.launchpad.net/bugs/1616677
This patch introduces a function that reverses everything
done by vmbus_allocate_mmio(). Existing code just called
release_mem_region(). Future patches in this series
require a more complex sequence of actions, so this function
is introduced to wrap those actions.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit 97fb77dc87582300fa3c141b63699f853576cab1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/hv/vmbus_drv.c
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

drivers:hv: Lock access to hyperv_mmio resource tree

BugLink: http://bugs.launchpad.net/bugs/1616677
In existing code, this tree of resources is created
in single-threaded code and never modified after it is
created, and thus needs no locking. This patch introduces
a semaphore for tree access, as other patches in this
series introduce run-time modifications of this resource
tree which can happen on multiple threads.

Signed-off-by: Jake Oshins <jakeo@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit e16dad6bfe1437aaee565f875a6713ca7ce81bdf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/hv/vmbus_drv.c
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Implement APIs to support "in place" consumption of vmbus packets

BugLink: http://bugs.launchpad.net/bugs/1616677
Implement APIs for in-place consumption of vmbus packets. Currently, each
packet is copied and processed one at a time and as part of processing
each packet we potentially may signal the host (if it is waiting for
room to produce a packet).

These APIs help batched in-place processing of vmbus packets.
We also optimize host signaling by having a separate API to signal
the end of in-place consumption. With netvsc using these APIs,
on an iperf run on average I see about 20X reduction in checks to
signal the host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ab028db41ca9174caab7f9e3fc0a2e7f4a418410)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Move some ring buffer functions to hyperv.h

BugLink: http://bugs.launchpad.net/bugs/1616677
In preparation for implementing APIs for in-place consumption of VMBUS
packets, movve some ring buffer functionality into hyperv.h

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit 687f32e6d9bd1d63c5e557e877809eb446f1a6e8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
drivers/hv/ring_buffer.c
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

asm-generic: implement virt_xxx memory barriers

BugLink: http://bugs.launchpad.net/bugs/1616677
Guests running within virtual machines might be affected by SMP effects even if
the guest itself is compiled without SMP support. This is an artifact of
interfacing with an SMP host while running an UP kernel. Using mandatory
barriers for this use-case would be possible but is often suboptimal.

In particular, virtio uses a bunch of confusing ifdefs to work around
this, while xen just uses the mandatory barriers.

To better handle this case, low-level virt_mb() etc macros are made available.
These are implemented trivially using the low-level __smp_xxx macros,
the purpose of these wrappers is to annotate those specific cases.

These have the same effect as smp_mb() etc when SMP is enabled, but generate
identical code for SMP and non-SMP systems. For example, virtual machine guests
should use virt_mb() rather than smp_mb() when synchronizing against a
(possibly SMP) host.

Suggested-by: David Miller <davem@davemloft.net>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
(cherry picked from commit 6a65d26385bf487926a0616650927303058551e3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

x86: define __smp_xxx

BugLink: http://bugs.launchpad.net/bugs/1616677
This defines __smp_xxx barriers for x86,
for use by virtualization.

smp_xxx barriers are removed as they are
defined correctly by asm-generic/barriers.h

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 1638fb72070f8faf2ac0787fafbb839d0c859d5b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

asm-generic: add __smp_xxx wrappers

BugLink: http://bugs.launchpad.net/bugs/1616677
On !SMP, most architectures define their
barriers as compiler barriers.
On SMP, most need an actual barrier.

Make it possible to remove the code duplication for
!SMP by defining low-level __smp_xxx barriers
which do not depend on the value of SMP, then
use them from asm-generic conditionally.

Besides reducing code duplication, these low level APIs will also be
useful for virtualization, where a barrier is sometimes needed even if
!SMP since we might be talking to another kernel on the same SMP system.

Both virtio and Xen drivers will benefit.

The smp_xxx variants should use __smp_XXX ones or barrier() depending on
SMP, identically for all architectures.

We keep ifndef guards around them for now - once/if all
architectures are converted to use the generic
code, we'll be able to remove these.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
(cherry picked from commit a9e4252a9b147043142282ebb65da94dcb951e2a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

x86: reuse asm-generic/barrier.h

BugLink: http://bugs.launchpad.net/bugs/1616677
As on most architectures, on x86 read_barrier_depends and
smp_read_barrier_depends are empty. Drop the local definitions and pull
the generic ones from asm-generic/barrier.h instead: they are identical.

This is in preparation to refactoring this code area.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
(cherry picked from commit 300b06d4555305dc227748674f75970f2f84c224)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

asm-generic: guard smp_store_release/load_acquire

BugLink: http://bugs.launchpad.net/bugs/1616677
Allow architectures to override smp_store_release
and smp_load_acquire by guarding the defines
in asm-generic/barrier.h with ifndef directives.

This is in preparation to reusing asm-generic/barrier.h
on architectures which have their own definition
of these macros.

Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
(cherry picked from commit 57f7c0370f386d5e0960e25d2c3ceb0b8e8c489d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

lcoking/barriers, arch: Use smp barriers in smp_store_release()

BugLink: http://bugs.launchpad.net/bugs/1616677
With commit b92b8b35a2e ("locking/arch: Rename set_mb() to smp_store_mb()")
it was made clear that the context of this call (and thus set_mb)
is strictly for CPU ordering, as opposed to IO. As such all archs
should use the smp variant of mb(), respecting the semantics and
saving a mandatory barrier on UP.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: <linux-arch@vger.kernel.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Tony Luck <tony.luck@intel.com>
Cc: dave@stgolabs.net
Link: http://lkml.kernel.org/r/1445975631-17047-3-git-send-email-dave@stgolabs.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
(cherry picked from commit 5a1b26d7c629915446222ebe77d16567c98426ff)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Export the vmbus_set_event() API

BugLink: http://bugs.launchpad.net/bugs/1616677
In preparation for moving some ring buffer functionality out of the
vmbus driver, export the API for signaling the host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5cc472477f928fb8584eb8e08245c9cf9002d74a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Use READ_ONCE() to read variables that are volatile

BugLink: http://bugs.launchpad.net/bugs/1616677
Use the READ_ONCE macro to access variabes that can change asynchronously.
This is the recommended mechanism for dealing with "unsafe" compiler
optimizations.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d45faaeedb762a3965a0246cf831e55045dd6ef8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

Drivers: hv: vmbus: Introduce functions for estimating room in the ring buffer

BugLink: http://bugs.launchpad.net/bugs/1616677
Introduce separate functions for estimating how much can be read from
and written to the ring buffer.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a6341f000024cdf1ec14dc26743a409a17378db5)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: Fix the list processing for network change event

BugLink: http://bugs.launchpad.net/bugs/1616677
RNDIS_STATUS_NETWORK_CHANGE event is handled as two "half events" --
media disconnect & connect. The second half should be added to the list
head, not to the tail. So all events are processed in normal order.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 15cfd40771e18a4e9b788c64c9db2606f958b93d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

hv_netvsc: Implement support for VF drivers on Hyper-V

BugLink: http://bugs.launchpad.net/bugs/1616677
Support VF drivers on Hyper-V. On Hyper-V, each VF instance presented to
the guest has an associated synthetic interface that shares the MAC address
with the VF instance. Typically these are bonded together to support
live migration. By default, the host delivers all the incoming packets
on the synthetic interface. Once the VF is up, we need to explicitly switch
the data path on the host to divert traffic onto the VF interface. Even after
switching the data path, broadcast and multicast packets are always delivered
on the synthetic interface and these will have to be injected back onto the
VF interface (if VF is up).
This patch implements the necessary support in netvsc to support Linux
VF drivers.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 84bf9cefb162b197da20a0f4388929f4b5ba5db4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Kamal Mostafa <kamal@canonical.com>

UBUNTU: SAUCE: i2c: i801: Add support for Kaby Lake PCH-H

BugLink: http://bugs.launchpad.net/bugs/1622469
Intel Kaby Lake PCH-H has the same legacy SMBus host controller than Intel
Sunrisepoint PCH. It also has same iTCO watchdog on the bus.

Add Kaby Lake PCH-H PCI ID to the list of supported devices.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Phidias Chiang <phidias.chiang@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>

mfd: lpss: Add Intel Kaby Lake PCH-H PCI IDs

BugLink: http://bugs.launchpad.net/bugs/1622469
Intel Kaby Lake PCH-H has the same LPSS than Intel Sunrisepoint. Add the new
IDs to the list of supported devices.

Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
(cherry picked from linux-next commit a6a576b78e09cccee78add5abf5a40311b06f8ce)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>

Linux 4.4.21

BugLink: http://bugs.launchpad.net/bugs/1624037
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>

lib/mpi: mpi_write_sgl(): fix skipping of leading zero limbs

BugLink: http://bugs.launchpad.net/bugs/1624037
commit f2d1362ff7d266b3d2b1c764d6c2ef4a3b457f23 upstream.

Currently, if the number of leading zeros is greater than fits into a
complete limb, mpi_write_sgl() skips them by iterating over them limb-wise.

However, it fails to adjust its internal leading zeros tracking variable,
lzeros, accordingly: it does a

  p -= sizeof(alimb);
  continue;

which should really have been a

  lzeros -= sizeof(alimb);
  continue;

Since lzeros never decreases if its initial value >= sizeof(alimb), nothing
gets copied by mpi_write_sgl() in that case.

Instead of skipping the high order zero limbs within the loop as shown
above, fix the issue by adjusting the copying loop's bounds.

Fixes: 2d4d1eea540b ("lib/mpi: Add mpi sgl helpers")
Signed-off-by: Nicolai Stange <nicstange@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Kamal Mostafa <kamal@canonical.com>