git.proxmox.com Git - mirror_ubuntu-artful-kernel.git/log

KVM: x86: emulate FXSAVE and FXRSTOR

BugLink: http://bugs.launchpad.net/bugs/1658091
commit 283c95d0e3891b64087706b344a4b545d04a6e62 upstream.

Internal errors were reported on 16 bit fxsave and fxrstor with ipxe.
Old Intels don't have unrestricted_guest, so we have to emulate them.

The patch takes advantage of the hardware implementation.

AMD and Intel differ in saving and restoring other fields in first 32
bytes.  A test wrote 0xff to the fxsave area, 0 to upper bits of MCSXR
in the fxsave area, executed fxrstor, rewrote the fxsave area to 0xee,
and executed fxsave:

  Intel (Nehalem):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff ff ff 00 00
    ff ff ff ff ff ff 00 00 ff ff 00 00 ff ff 00 00
  Intel (Haswell -- deprecated FPU CS and FPU DS):
    7f 1f 7f 7f ff 00 ff 07 ff ff ff ff 00 00 00 00
    ff ff ff ff 00 00 00 00 ff ff 00 00 ff ff 00 00
  AMD (Opteron 2300-series):
    7f 1f 7f 7f ff 00 ee ee ee ee ee ee ee ee ee ee
    ee ee ee ee ee ee ee ee ff ff 00 00 ff ff 02 00

fxsave/fxrstor will only be emulated on early Intels, so KVM can't do
much to improve the situation.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

KVM: x86: add asm_safe wrapper

BugLink: http://bugs.launchpad.net/bugs/1658091
commit aabba3c6abd50b05b1fc2c6ec44244aa6bcda576 upstream.

Move the existing exception handling for inline assembly into a macro
and switch its return values to X86EMUL type.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

KVM: x86: add Align16 instruction flag

BugLink: http://bugs.launchpad.net/bugs/1658091
commit d3fe959f81024072068e9ed86b39c2acfd7462a9 upstream.

Needed for FXSAVE and FXRSTOR.

Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

KVM: x86: flush pending lapic jump label updates on module unload

BugLink: http://bugs.launchpad.net/bugs/1658091
commit cef84c302fe051744b983a92764d3fcca933415d upstream.

KVM's lapic emulation uses static_key_deferred (apic_{hw,sw}_disabled).
These are implemented with delayed_work structs which can still be
pending when the KVM module is unloaded. We've seen this cause kernel
panics when the kvm_intel module is quickly reloaded.

Use the new static_key_deferred_flush() API to flush pending updates on
module unload.

Signed-off-by: David Matlack <dmatlack@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

jump_labels: API for flushing deferred jump label updates

BugLink: http://bugs.launchpad.net/bugs/1658091
commit b6416e61012429e0277bd15a229222fd17afc1c1 upstream.

Modules that use static_key_deferred need a way to synchronize with
any delayed work that is still pending when the module is unloaded.
Introduce static_key_deferred_flush() which flushes any pending
jump label updates.

Signed-off-by: David Matlack <dmatlack@google.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

KVM: eventfd: fix NULL deref irqbypass consumer

BugLink: http://bugs.launchpad.net/bugs/1658091
commit 4f3dbdf47e150016aacd734e663347fcaa768303 upstream.

Reported syzkaller:

    BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
    IP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    PGD 0

    Oops: 0002 [#1] SMP
    CPU: 1 PID: 125 Comm: kworker/1:1 Not tainted 4.9.0+ #1
    Workqueue: kvm-irqfd-cleanup irqfd_shutdown [kvm]
    task: ffff9bbe0dfbb900 task.stack: ffffb61802014000
    RIP: 0010:irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass]
    Call Trace:
     irqfd_shutdown+0x66/0xa0 [kvm]
     process_one_work+0x16b/0x480
     worker_thread+0x4b/0x500
     kthread+0x101/0x140
     ? process_one_work+0x480/0x480
     ? kthread_create_on_node+0x60/0x60
     ret_from_fork+0x25/0x30
    RIP: irq_bypass_unregister_consumer+0x9d/0xb70 [irqbypass] RSP: ffffb61802017e20
    CR2: 0000000000000008

The syzkaller folks reported a NULL pointer dereference that due to
unregister an consumer which fails registration before. The syzkaller
creates two VMs w/ an equal eventfd occasionally. So the second VM
fails to register an irqbypass consumer. It will make irqfd as inactive
and queue an workqueue work to shutdown irqfd and unregister the irqbypass
consumer when eventfd is closed. However, the second consumer has been
initialized though it fails registration. So the token(same as the first
VM's) is taken to unregister the consumer through the workqueue, the
consumer of the first VM is found and unregistered, then NULL deref incurred
in the path of deleting consumer from the consumers list.

This patch fixes it by making irq_bypass_register/unregister_consumer()
looks for the consumer entry based on consumer pointer itself instead of
token matching.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Suggested-by: Alex Williamson <alex.williamson@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Alex Williamson <alex.williamson@redhat.com>
Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

KVM: x86: fix emulation of "MOV SS, null selector"

BugLink: http://bugs.launchpad.net/bugs/1658091
commit 33ab91103b3415e12457e3104f0e4517ce12d0f3 upstream.

This is CVE-2017-2583. On Intel this causes a failed vmentry because
SS's type is neither 3 nor 7 (even though the manual says this check is
only done for usable SS, and the dmesg splat says that SS is unusable!).
On AMD it's worse: svm.c is confused and sets CPL to 0 in the vmcb.

The fix fabricates a data segment descriptor when SS is set to a null
selector, so that CPL and SS.DPL are set correctly in the VMCS/vmcb.
Furthermore, only allow setting SS to a NULL selector if SS.RPL < 3;
this in turn ensures CPL < 3 because RPL must be equal to CPL.

Thanks to Andy Lutomirski and Willy Tarreau for help in analyzing
the bug and deciphering the manuals.

Reported-by: Xiaohan Zhang <zhangxiaohan1@huawei.com>
Fixes: 79d5b4c3cd809c770d4bf9812635647016c56011
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

mm/hugetlb.c: fix reservation race when freeing surplus pages

BugLink: http://bugs.launchpad.net/bugs/1658091
commit e5bbc8a6c992901058bc09e2ce01d16c111ff047 upstream.

return_unused_surplus_pages() decrements the global reservation count,
and frees any unused surplus pages that were backing the reservation.

Commit 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in
return_unused_surplus_pages()") added a call to cond_resched_lock in the
loop freeing the pages.

As a result, the hugetlb_lock could be dropped, and someone else could
use the pages that will be freed in subsequent iterations of the loop.
This could result in inconsistent global hugetlb page state, application
api failures (such as mmap) failures or application crashes.

When dropping the lock in return_unused_surplus_pages, make sure that
the global reservation count (resv_huge_pages) remains sufficiently
large to prevent someone else from claiming pages about to be freed.

Analyzed by Paul Cassella.

Fixes: 7848a4bf51b3 ("mm/hugetlb.c: add cond_resched_lock() in return_unused_surplus_pages()")
Link: http://lkml.kernel.org/r/1483991767-6879-1-git-send-email-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Paul Cassella <cassella@cray.com>
Suggested-by: Michal Hocko <mhocko@kernel.org>
Cc: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
Cc: Hillf Danton <hillf.zj@alibaba-inc.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ocfs2: fix crash caused by stale lvb with fsdlm plugin

BugLink: http://bugs.launchpad.net/bugs/1658091
commit e7ee2c089e94067d68475990bdeed211c8852917 upstream.

The crash happens rather often when we reset some cluster nodes while
nodes contend fiercely to do truncate and append.

The crash backtrace is below:

   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover_grant 1 locks on 971 resources
   dlm: C21CBDA5E0774F4BA5A9D4F317717495: dlm_recover 9 generation 5 done: 4 ms
   ocfs2: Begin replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: End replay journal (node 318952601, slot 2) on device (253,18)
   ocfs2: Beginning quota recovery on device (253,18) for slot 2
   ocfs2: Finishing quota recovery on device (253,18) for slot 2
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: bug expression: le64_to_cpu(fe->i_size) != i_size_read(inode)
   (truncate,30154,1):ocfs2_truncate_file:470 ERROR: Inode 290321, inode i_size = 732 != di i_size = 937, i_flags = 0x1
   ------------[ cut here ]------------
   kernel BUG at /usr/src/linux/fs/ocfs2/file.c:470!
   invalid opcode: 0000 [#1] SMP
   Modules linked in: ocfs2_stack_user(OEN) ocfs2(OEN) ocfs2_nodemanager ocfs2_stackglue(OEN) quota_tree dlm(OEN) configfs fuse sd_mod    iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi af_packet iscsi_ibft iscsi_boot_sysfs softdog xfs libcrc32c ppdev parport_pc pcspkr parport      joydev virtio_balloon virtio_net i2c_piix4 acpi_cpufreq button processor ext4 crc16 jbd2 mbcache ata_generic cirrus virtio_blk ata_piix               drm_kms_helper ahci syscopyarea libahci sysfillrect sysimgblt fb_sys_fops ttm floppy libata drm virtio_pci virtio_ring uhci_hcd virtio ehci_hcd       usbcore serio_raw usb_common sg dm_multipath dm_mod scsi_dh_rdac scsi_dh_emc scsi_dh_alua scsi_mod autofs4
   Supported: No, Unsupported modules are loaded
   CPU: 1 PID: 30154 Comm: truncate Tainted: G           OE   N  4.4.21-69-default #1
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.1-0-g4adadbd-20151112_172657-sheep25 04/01/2014
   task: ffff88004ff6d240 ti: ffff880074e68000 task.ti: ffff880074e68000
   RIP: 0010:[<ffffffffa05c8c30>]  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]
   RSP: 0018:ffff880074e6bd50  EFLAGS: 00010282
   RAX: 0000000000000074 RBX: 000000000000029e RCX: 0000000000000000
   RDX: 0000000000000001 RSI: 0000000000000246 RDI: 0000000000000246
   RBP: ffff880074e6bda8 R08: 000000003675dc7a R09: ffffffff82013414
   R10: 0000000000034c50 R11: 0000000000000000 R12: ffff88003aab3448
   R13: 00000000000002dc R14: 0000000000046e11 R15: 0000000000000020
   FS:  00007f839f965700(0000) GS:ffff88007fc80000(0000) knlGS:0000000000000000
   CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
   CR2: 00007f839f97e000 CR3: 0000000036723000 CR4: 00000000000006e0
   Call Trace:
     ocfs2_setattr+0x698/0xa90 [ocfs2]
     notify_change+0x1ae/0x380
     do_truncate+0x5e/0x90
     do_sys_ftruncate.constprop.11+0x108/0x160
     entry_SYSCALL_64_fastpath+0x12/0x6d
   Code: 24 28 ba d6 01 00 00 48 c7 c6 30 43 62 a0 8b 41 2c 89 44 24 08 48 8b 41 20 48 c7 c1 78 a3 62 a0 48 89 04 24 31 c0 e8 a0 97 f9 ff <0f> 0b 3d 00 fe ff ff 0f 84 ab fd ff ff 83 f8 fc 0f 84 a2 fd ff
   RIP  [<ffffffffa05c8c30>] ocfs2_truncate_file+0x640/0x6c0 [ocfs2]

It's because ocfs2_inode_lock() get us stale LVB in which the i_size is
not equal to the disk i_size.  We mistakenly trust the LVB because the
underlaying fsdlm dlm_lock() doesn't set lkb_sbflags with
DLM_SBF_VALNOTVALID properly for us.  But, why?

The current code tries to downconvert lock without DLM_LKF_VALBLK flag
to tell o2cb don't update RSB's LVB if it's a PR->NULL conversion, even
if the lock resource type needs LVB.  This is not the right way for
fsdlm.

The fsdlm plugin behaves different on DLM_LKF_VALBLK, it depends on
DLM_LKF_VALBLK to decide if we care about the LVB in the LKB.  If
DLM_LKF_VALBLK is not set, fsdlm will skip recovering RSB's LVB from
this lkb and set the right DLM_SBF_VALNOTVALID appropriately when node
failure happens.

The following diagram briefly illustrates how this crash happens:

RSB1 is inode metadata lock resource with LOCK_TYPE_USES_LVB;

The 1st round:

             Node1                                    Node2
RSB1: PR
                                                  RSB1(master): NULL->EX
ocfs2_downconvert_lock(PR->NULL, set_lvb==0)
  ocfs2_dlm_lock(no DLM_LKF_VALBLK)

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

dlm_lock(no DLM_LKF_VALBLK)
  convert_lock(overwrite lkb->lkb_exflags
               with no DLM_LKF_VALBLK)

RSB1: NULL                                        RSB1: EX
                                                  reset Node2
dlm_recover_rsbs()
  recover_lvb()

/* The LVB is not trustable if the node with EX fails and
* no lock >= PR is left. We should set RSB_VALNOTVALID for RSB1.
*/

if(!(kb_exflags & DLM_LKF_VALBLK)) /* This means we miss the chance to
           return;                   * to invalid the LVB here.
                                     */

The 2nd round:

         Node 1                                Node2
RSB1(become master from recovery)

ocfs2_setattr()
  ocfs2_inode_lock(NULL->EX)
    /* dlm_lock() return the stale lvb without setting DLM_SBF_VALNOTVALID */
    ocfs2_meta_lvb_is_trustable() return 1 /* so we don't refresh inode from disk */
  ocfs2_truncate_file()
      mlog_bug_on_msg(disk isize != i_size_read(inode))  /* crash! */

The fix is quite straightforward.  We keep to set DLM_LKF_VALBLK flag
for dlm_lock() if the lock resource type needs LVB and the fsdlm plugin
is uesed.

Link: http://lkml.kernel.org/r/1481275846-6604-1-git-send-email-zren@suse.com
Signed-off-by: Eric Ren <zren@suse.com>
Reviewed-by: Joseph Qi <jiangqi903@gmail.com>
Cc: Mark Fasheh <mfasheh@versity.com>
Cc: Joel Becker <jlbec@evilplan.org>
Cc: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

mm: fix devm_memremap_pages crash, use mem_hotplug_{begin, done}

BugLink: http://bugs.launchpad.net/bugs/1658091
commit f931ab479dd24cf7a2c6e2df19778406892591fb upstream.

Both arch_add_memory() and arch_remove_memory() expect a single threaded
context.

For example, arch/x86/mm/init_64.c::kernel_physical_mapping_init() does
not hold any locks over this check and branch:

    if (pgd_val(*pgd)) {
     pud = (pud_t *)pgd_page_vaddr(*pgd);
     paddr_last = phys_pud_init(pud, __pa(vaddr),
        __pa(vaddr_end),
        page_size_mask);
     continue;
    }

    pud = alloc_low_page();
    paddr_last = phys_pud_init(pud, __pa(vaddr), __pa(vaddr_end),
        page_size_mask);

The result is that two threads calling devm_memremap_pages()
simultaneously can end up colliding on pgd initialization.  This leads
to crash signatures like the following where the loser of the race
initializes the wrong pgd entry:

    BUG: unable to handle kernel paging request at ffff888ebfff0000
    IP: memcpy_erms+0x6/0x10
    PGD 2f8e8fc067 PUD 0 /* <---- Invalid PUD */
    Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
    CPU: 54 PID: 3818 Comm: systemd-udevd Not tainted 4.6.7+ #13
    task: ffff882fac290040 ti: ffff882f887a4000 task.ti: ffff882f887a4000
    RIP: memcpy_erms+0x6/0x10
    [..]
    Call Trace:
      ? pmem_do_bvec+0x205/0x370 [nd_pmem]
      ? blk_queue_enter+0x3a/0x280
      pmem_rw_page+0x38/0x80 [nd_pmem]
      bdev_read_page+0x84/0xb0

Hold the standard memory hotplug mutex over calls to
arch_{add,remove}_memory().

Fixes: 41e94a851304 ("add devm_memremap_pages")
Link: http://lkml.kernel.org/r/148357647831.9498.12606007370121652979.stgit@dwillia2-desk3.amr.corp.intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Cc: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

selftests: do not require bash for the generated test

BugLink: http://bugs.launchpad.net/bugs/1658091
commit a2b1e8a20c992b01eeb76de00d4f534cbe9f3822 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

selftests: do not require bash to run netsocktests testcase

BugLink: http://bugs.launchpad.net/bugs/1658091
commit 3659f98b5375d195f1870c3e508fe51e52206839 upstream.

Nothing in this minimal script seems to require bash. We often run these
tests on embedded devices where the only shell available is the busybox
ash. Use sh instead.

Signed-off-by: Rolf Eike Beer <eb@emlix.com>
Signed-off-by: Shuah Khan <shuahkh@osg.samsung.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Input: i8042 - add Pegatron touchpad to noloop table

BugLink: http://bugs.launchpad.net/bugs/1658091
commit 41c567a5d7d1a986763e58c3394782813c3bcb03 upstream.

Avoid AUX loopback in Pegatron C15B touchpad, so input subsystem is able
to recognize a Synaptics touchpad in the AUX port.

Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=93791
(Touchpad is not detected on DNS 0801480 notebook (PEGATRON C15B))

Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Marcos Paulo de Souza <marcos.souza.org@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Input: xpad - use correct product id for x360w controllers

BugLink: http://bugs.launchpad.net/bugs/1658091
commit b6fc513da50c5dbc457a8ad6b58b046a6a68fd9d upstream.

currently the controllers get the same product id as the wireless
receiver. However the controllers actually have their own product id.

The patch makes the driver expose the same product id as the windows
driver.

This improves compatibility when running applications with WINE.

see https://github.com/paroj/xpad/issues/54

Signed-off-by: Pavel Rojtberg <rojtberg@gmail.com>
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: [Config] CONFIG_SND_SOC_INTEL_BYTCR_RT5660_MACH=m, CONFIG_SND_SOC_RT5660=m

BugLink: http://bugs.launchpad.net/bugs/1657674
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: (no-up) ASoC: Intel: Support machine driver for RT5660 on Baytrail

BugLink: http://bugs.launchpad.net/bugs/1657674
Dell Caracalla IoT gateways sport an RT5660 codec with HID 10EC3277.
This patch adds the machine driver required to support RT5660 codec on
Baytrail based systems.

Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: (no-up) ASoC: rt5660: Add ACPI support

BugLink: http://bugs.launchpad.net/bugs/1657674
On Dell IoT Gateways, RT5660 codec is available with ACPI ID 10EC3277.
Also, GPIO's are only available by index, so we register mappings to allow
machine drivers to access them by name.

Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ASoC: Intel: Atom: flip logic for gain Switch

BugLink: http://bugs.launchpad.net/bugs/1657674
The upstreamed code modified the control names from Mute to
Switch without changing the logic. To get audio working the Switch
needs to be off which isn't aligned with normal ALSA conventions.

Inverting the logic now so that Switch Off means mute and Switch On
means active audio using the specific volume setting.

Signed-off-by: Sebastien Guiriec <sebastien.guiriec@intel.com>
Signed-off-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 940a5a014d50e15269b5d197ab571d1ca9971c43)
Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ASoC: rt5660: enable MCLK detection

BugLink: http://bugs.launchpad.net/bugs/1657674
There is a power saving mechanism in rt5660. It will turn off some
unused power when MCLK is not present. We call that "MCLK detection"
and it should be enabled by default.

Signed-off-by: Bard Liao <bardliao@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit d01580c394f58dff234a03f0f2f32c051a556f15)
Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ASoC: rt5660: add rt5660 codec driver

BugLink: http://bugs.launchpad.net/bugs/1657674
This is the initial codec driver for rt5660

Signed-off-by: Oder Chiou <oder_chiou@realtek.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
(cherry picked from commit 2b26dd4c1fc5f83bc088f4a053120ca03817045e)
Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: [Config] Enable CONFIG_VEN_RSI_* configs

BugLink: http://bugs.launchpad.net/bugs/1657682
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: RS9113: Comment out IDs from upstream driver

BugLink: http://bugs.launchpad.net/bugs/1657682
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: RS9113: Use vendor driver to support WLAN/BT card on Caracalla HW only

BugLink: http://bugs.launchpad.net/bugs/1657682
This patch limits the RS9113 driver from vendor to be used for the WLAN/BT
card mounted on Dell Caracalla IoT gateways.

Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: Redpine RS9113 WLAN/BT driver ver. 0.9.7

BugLink: http://bugs.launchpad.net/bugs/1657682
This is the beta2 release for RS9113 driver from Redpine

Signed-off-by: Darren Wu <darren.wu@canonical.com>
Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: Separate Redpine RS9113 WLAN/BT vendor and kernel drivers

BugLink: http://bugs.launchpad.net/bugs/1657682
The driver submitted by vendor Redpine for RS9113 WLAN/BT exports same
symbols as their existing old driver in the kernel. This patch tries to
resolve this conflict.

Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

UBUNTU: SAUCE: Support Redpine RS9113 WLAN/BT

BugLink: http://bugs.launchpad.net/bugs/1657682
This patch adds support for RS9113 WLAN-BT cards found on Dell Caracalla
IoT gateways.
Vendor release: RS9113.NB0.NL.GNU.LNX.0.9.3 on 12/01/2016

Signed-off-by: Shrirang Bagul <shrirang.bagul@canonical.com>
Acked-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

netvsc: add rcu_read locking to netvsc callback

BugLink: http://bugs.launchpad.net/bugs/1657540
The receive callback (in tasklet context) is using RCU to get reference
to associated VF network device but this is not safe. RCU read lock
needs to be held. Found by running with full lockdep debugging
enabled.

Fixes: f207c10d9823 ("hv_netvsc: use RCU to protect vf_netdev")
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0719e72ccb801829a3d735d187ca8417f0930459)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

tools: hv: Add a script to help bonding synthetic and VF NICs

BugLink: http://bugs.launchpad.net/bugs/1650059
This script helps to create bonding network devices based on synthetic NIC
(the virtual network adapter usually provided by Hyper-V) and the matching
VF NIC (SRIOV virtual function). So the synthetic NIC and VF NIC can
function as one network device, and fail over to the synthetic NIC if VF is
down.

Mayjor distros (RHEL, Ubuntu, SLES) supported by Hyper-V are supported by
this script.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from linux-next commit 178cd55f086629cf0bad9c66c793a7e2bcc3abb6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: Base host signaling strictly on the ring state

BugLink: http://bugs.launchpad.net/bugs/1650059
One of the factors that can result in the host concluding that a given
guest in mounting a DOS attack is if the guest generates interrupts
to the host when the host is not expecting it. If these "spurious"
interrupts reach a certain rate, the host can throttle the guest to
minimize the impact. The host computation of the "expected number
of interrupts" is strictly based on the ring transitions. Until
the host logic is fixed, base the guest logic to interrupt solely
on the ring state.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 74198eb4a42c4a3c4fbef08fa01a291a282f7c2e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

netvsc: reduce maximum GSO size

BugLink: http://bugs.launchpad.net/bugs/1650059
Hyper-V (and Azure) support using NVGRE which requires some extra space
for encapsulation headers. Because of this the largest allowed TSO
packet is reduced.

For older releases, hard code a fixed reduced value. For next release,
there is a better solution which uses result of host offload
negotiation.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit a50af86dd49ee1851d1ccf06dd0019c05b95e297)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

vmbus: make sysfs names consistent with PCI

BugLink: http://bugs.launchpad.net/bugs/1650059
In commit 9a56e5d6a0ba ("Drivers: hv: make VMBus bus ids persistent")
the name of vmbus devices in sysfs changed to be (in 4.9-rc1):
/sys/bus/vmbus/vmbus-6aebe374-9ba0-11e6-933c-00259086b36b

The prefix ("vmbus-") is redundant and differs from how PCI is
represented in sysfs. Therefore simplify to:
/sys/bus/vmbus/6aebe374-9ba0-11e6-933c-00259086b36b

Please merge this before 4.9 is released and the old format
has to live forever.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f6b2db084b65b9dc0f910bc48d5f77c0e5166dc6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Revert "hv_netvsc: report vmbus name in ethtool"

BugLink: http://bugs.launchpad.net/bugs/1650059
This reverts commit e3f74b841d48
("hv_netvsc: report vmbus name in ethtool")'
because of problem introduced by commit f9a56e5d6a0ba
("Drivers: hv: make VMBus bus ids persistent").
This changed the format of the vmbus name and this new format is too
long to fit in the bus_info field of ethtool.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e934f684856393a0b2aecc7fdd8357a48b79c535)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

net/hyperv: avoid uninitialized variable

BugLink: http://bugs.launchpad.net/bugs/1650059
The hdr_offset variable is only if we deal with a TCP or UDP packet,
but as the check surrounding its usage tests for skb_is_gso()
instead, the compiler has no idea if the variable is initialized
or not at that point:

drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’:
drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized]

This adds an additional check for the transport type, which
tells the compiler that this path cannot happen. Since the
get_net_transport_info() function should always be inlined
here, I don't expect this to result in additional runtime
checks.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 52ccd6318481a467d8ad54ea40f60c61e957d58f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

netvsc: Remove mistaken udp.h inclusion.

BugLink: http://bugs.launchpad.net/bugs/1650059
Based upon v2 of Stephen's patch.

Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3f2b0a5a3583a2c6c3aeb2d59e7f775d43faa89c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

netvsc: fix checksum on UDP IPV6

BugLink: http://bugs.launchpad.net/bugs/1650059
The software calculation of UDP checksum in Netvsc driver was
only handling IPv4 case. By using skb_checksum_help() instead
all protocols can be handled. Rearrange code to eliminate goto
and look like other drivers.

This is a temporary solution; recent versions of Window Server etc
do support UDP checksum offload, just need to do the appropriate negotiation
with host to validate before using. This will be done in later patch.

Please queue this for -stable as well.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ad19bc8a95baee4588e9ec4481297d97c0bec765)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: get rid of id in struct vmbus_channel

BugLink: http://bugs.launchpad.net/bugs/1650059
The auto incremented counter is not being used anymore, get rid of it.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e7fca5d860aeeb1e606448f5191cea8d925cc7a3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: make VMBus bus ids persistent

BugLink: http://bugs.launchpad.net/bugs/1650059
Some tools use bus ids to identify devices and they count on the fact
that these ids are persistent across reboot. This may be not true for
VMBus as we use auto incremented counter from alloc_channel() as such
id. Switch to using if_instance from channel offer, this id is supposed
to be persistent.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b294809dbfa7e50229d00253d43f9a56e5d6a0ba)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: fix comments

BugLink: http://bugs.launchpad.net/bugs/1650059
Typo's and spelling errors. Also remove old comment from staging era.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c6a77ff82fb849534748719f37f3f9086d78ed39)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: count multicast packets received

BugLink: http://bugs.launchpad.net/bugs/1650059
Useful for debugging issues with multicast and SR-IOV to keep track
of number of received multicast packets.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f7ad75b753f386454f50044fd69edad767b69ce8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: remove VF in flight counters

BugLink: http://bugs.launchpad.net/bugs/1650059
Since VF reference is now protected by RCU, no longer need the VF usage
counter and can use device flags to see whether to inject or not.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9cbcc4280645f0e7e19e6a0da443ec7e69cecf40)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: use RCU to protect vf_netdev

BugLink: http://bugs.launchpad.net/bugs/1650059
The vf_netdev pointer in the netvsc device context can simply be protected
by RCU because network device destruction is already RCU synchronized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit f207c10d982388fa42710922ad1c0c9d3ba9a87b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: improve VF device matching

BugLink: http://bugs.launchpad.net/bugs/1650059
The code to associate netvsc and VF devices can be made less error prone
by using a better matching algorithms.

On registration, use the permanent address which avoids any possible
issues caused by device MAC address being changed. For all other callbacks,
search by the netdevice pointer value to ensure getting the correct
network device.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e8ff40d4bff1f3b6a588e29ed1fbdfd943642856)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: simplify callback event code

BugLink: http://bugs.launchpad.net/bugs/1650059
The callback handler for netlink events can be simplified:
* Consolidate check for netlink callback events about this driver itself.
* Ignore non-Ethernet devices.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit ee837a137304290a1ae26980c73a367f7afef54f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: dev hold/put reference to VF

BugLink: http://bugs.launchpad.net/bugs/1650059
The netvsc driver holds a pointer to the virtual function network device if
managing SR-IOV association. In order to ensure that the VF network device
does not disappear, it should be using dev_hold/dev_put to get a reference
count.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 07d0f0008c783d2a2fce8497000938db15fd7aa1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: use consume_skb

BugLink: http://bugs.launchpad.net/bugs/1650059
Packets that are transmitted in normal path should use consume_skb
instead of kfree_skb. This allows for better tracing of packet drops.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 17db4bcef3c3c45b95b3b3d8577f725df1b2c0a0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Revert "hv_netvsc: make inline functions static"

BugLink: http://bugs.launchpad.net/bugs/1650059
These functions are used by other code misc-next tree.

This reverts commit 30d1de08c87ddde6f73936c3350e7e153988fe02.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3a8963acc70e69606729404713cfa9a03b58b18c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: hv_util: Avoid dynamic allocation in time synch

BugLink: http://bugs.launchpad.net/bugs/1650059
Under stress, we have seen allocation failure in time synch code. Avoid
this dynamic allocation.

Signed-off-by: Vivek Yadav <vyadav@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3ba1eb17b633b419737b65429fa7124ce87c5efc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: Support TimeSync version 4.0 protocol samples.

BugLink: http://bugs.launchpad.net/bugs/1650059
This enables support for more accurate TimeSync v4 samples when hosted
under Windows Server 2016 and newer hosts.

The new time samples include a "vmreferencetime" field that represents
the guest's TSC value when the host generated its time sample. This value
lets the guest calculate the latency in receiving the time sample. The
latency is added to the sample host time prior to updating the clock.

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8e1d260738ca89bc7c87444f95f04a026d12b496)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: Use TimeSync samples to adjust the clock after boot.

BugLink: http://bugs.launchpad.net/bugs/1650059
Only the first 50 samples after boot were being used to discipline the
clock. After the first 50 samples, any samples from the host were ignored
and the guest clock would eventually drift from the host clock.

This patch allows TimeSync-enabled guests to continuously synchronize the
clock with the host clock, even after the first 50 samples.

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 2e338f7e034f12066c78999deff8894c337ae23b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: Rename version definitions to reflect protocol version.

BugLink: http://bugs.launchpad.net/bugs/1650059
Different Windows host versions may reuse the same protocol version when
negotiating the TimeSync, Shutdown, and Heartbeat protocols. We should only
refer to the protocol version to avoid conflating the two concepts.

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit abeda47ebb20997aa3c5b4cac7db70f8bea62d7d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: suppress some "hv_vmbus: Unknown GUID" warnings

BugLink: http://bugs.launchpad.net/bugs/1650059
Some VMBus devices are not needed by Linux guest[1][2], and, VMBus channels
of Hyper-V Sockets don't really mean usual synthetic devices, so let's
suppress the warnings for them.

[1] https://support.microsoft.com/en-us/kb/2925727
[2] https://msdn.microsoft.com/en-us/library/jj980180(v=winembedded.81).aspx

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 0f98829a99850836cf7c2cc9fbf1d7ce0f795780)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Driver: hv: vmbus: Make mmio resource local

BugLink: http://bugs.launchpad.net/bugs/1650059
This fixes a sparse warning because hyperv_mmio resources
are only used in this one file and should be static.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e2e808413425d82c8f162933a30304d10a5df231)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Revert "Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw()"

BugLink: http://bugs.launchpad.net/bugs/1650059
To deal with the merge conflict between net-next and char-misc trees,
revert commit bb08d431a914 from char-misc tree. This commit can be rebased
and applied once net-next picks up char-misc changes.

Here is the commit log of the reverted patch:
"With wrap around mappings in place we can always provide drivers with
direct links to packets on the ring buffer, even when they wrap around.
Do the required updates to get_next_pkt_raw()/put_pkt_raw()"

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 65a532f3d50a266bcc5e9c7efc636006565e8e7e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Handle hv_pci_generic_compl() error case

BugLink: http://bugs.launchpad.net/bugs/1650059
'completion_status' is used in some places, e.g.,
hv_pci_protocol_negotiation(), so we should make sure it's initialized in
error case too, though the error is unlikely here.

[bhelgaas: fix changelog typo and nearby whitespace]
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
(cherry picked from commit a5b45b7b952822aa5fbe842d2ee497c7c9dd7f55)

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Handle vmbus_sendpacket() failure in hv_compose_msi_msg()

BugLink: http://bugs.launchpad.net/bugs/1650059
Handle vmbus_sendpacket() failure in hv_compose_msi_msg().

I happened to find this when reading the code. I didn't get a real issue
however.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
(cherry picked from commit 665e2245eb46a217926d45031b83f1212a1bb344)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Remove the unused 'wrk' in struct hv_pcibus_device

BugLink: http://bugs.launchpad.net/bugs/1650059
Remove the unused 'wrk' member in struct hv_pcibus_device.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
(cherry picked from commit 617ceb62eaa1a180e8af1be9903d960c3a0b2ebc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Use pci_function_description[0] in struct definitions

BugLink: http://bugs.launchpad.net/bugs/1650059
The 2 structs can use a zero-length array here, because dynamic memory of
the correct size is allocated in hv_pci_devices_present() and we don't need
this extra element.

No functional change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
(cherry picked from commit 7d0f8eec976ae5a364af84fd8e92f6da9ec05a41)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Use zero-length array in struct pci_packet

BugLink: http://bugs.launchpad.net/bugs/1650059
Use zero-length array in struct pci_packet and rename struct pci_message's
field "message_type" to "type". This makes the code more readable.

No functionality change.

Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Acked-by: KY Srinivasan <kys@microsoft.com>
CC: Jake Oshins <jakeo@microsoft.com>
CC: Haiyang Zhang <haiyangz@microsoft.com>
CC: Vitaly Kuznetsov <vkuznets@redhat.com>
(cherry picked from commit 0c6045d8c0eff0f7784f310407ccad44f622aa40)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: Check VSS daemon is listening before a hot backup

BugLink: http://bugs.launchpad.net/bugs/1650059
Hyper-V host will send a VSS_OP_HOT_BACKUP request to check if guest is
ready for a live backup/snapshot. The driver should respond to the check
only if the daemon is running and listening to requests. This allows the
host to fallback to standard snapshots in case the VSS daemon is not
running.

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit db886e4d24c2b3d334be2cc1bd1bd05d547eb4c4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: Continue to poll VSS channel after handling requests.

BugLink: http://bugs.launchpad.net/bugs/1650059
Multiple VSS_OP_HOT_BACKUP requests may arrive in quick succession, even
though the host only signals once. The driver wass handling the first
request while ignoring the others in the ring buffer. We should poll the
VSS channel after handling a request to continue processing other requests.

Signed-off-by: Alex Ng <alexng@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 497af84b81b98b27e9ba7aebb8a373412e328497)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: Introduce a policy for controlling channel affinity

BugLink: http://bugs.launchpad.net/bugs/1650059
Introduce a mechanism to control how channels will be affinitized. We will
support two policies:

1. HV_BALANCED: All performance critical channels will be dstributed
evenly amongst all the available NUMA nodes. Once the Node is assigned,
we will assign the CPU based on a simple round robin scheme.

2. HV_LOCALIZED: Only the primary channels are distributed across all
NUMA nodes. Sub-channels will be in the same NUMA node as the primary
channel. This is the current behaviour.

The default policy will be the HV_BALANCED as it can minimize the remote
memory access on NUMA machines with applications that span NUMA nodes.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 509879bdb30b8e12bd0b3cb0bc8429f01478df4b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: ring_buffer: count on wrap around mappings in get_next_pkt_raw()

BugLink: http://bugs.launchpad.net/bugs/1650059
With wrap around mappings in place we can always provide drivers with
direct links to packets on the ring buffer, even when they wrap around.
Do the required updates to get_next_pkt_raw()/put_pkt_raw()

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(back ported from commit bb08d431a914984dee278e0e0482a2e6d620a482)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
include/linux/hyperv.h
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: ring_buffer: use wrap around mappings in hv_copy{from, to}_ringbuffer()

BugLink: http://bugs.launchpad.net/bugs/1650059
With wrap around mappings for ring buffers we can always use a single
memcpy() to do the job.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit f24f0b495b17df33c03f3b758b1461385e9f0e50)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: ring_buffer: wrap around mappings for ring buffers

BugLink: http://bugs.launchpad.net/bugs/1650059
Make it possible to always use a single memcpy() or to provide a direct
link to a packet on the ring buffer by creating virtual mapping for two
copies of the ring buffer with vmap(). Utilize currently empty
hv_ringbuffer_cleanup() to do the unmap.

While on it, replace sizeof(struct hv_ring_buffer) check
in hv_ringbuffer_init() with BUILD_BUG_ON() as it is a compile time check.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 9988ce685676cebe0b14dc128f00e1ae9cd1a4fa)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: cleanup vmbus_open() for wrap around mappings

BugLink: http://bugs.launchpad.net/bugs/1650059
In preparation for doing wrap around mappings for ring buffers cleanup
vmbus_open() function:
- check that ring sizes are PAGE_SIZE aligned (they are for all in-kernel
  drivers now);
- kfree(open_info) on error only after we kzalloc() it (not an issue as it
  is valid to call kfree(NULL);
- rename poorly named labels;
- use alloc_pages() instead of __get_free_pages() as we need struct page
  pointer for future.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Tested-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 98f531b10d23e3c28e8d34c0e88822a81231b3c2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: balloon: Use available memory value in pressure report

BugLink: http://bugs.launchpad.net/bugs/1650059
Reports for available memory should use the si_mem_available() value.
The previous freeram value does not include available page cache memory.

Signed-off-by: Alex Ng <alexng@messages.microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit b605c2d913589c448d4a6887262bb8e99da12009)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: balloon: replace ha_region_mutex with spinlock

BugLink: http://bugs.launchpad.net/bugs/1650059
lockdep reports possible circular locking dependency when udev is used
for memory onlining:

systemd-udevd/3996 is trying to acquire lock:
((memory_chain).rwsem){++++.+}, at: [<ffffffff810d137e>] __blocking_notifier_call_chain+0x4e/0xc0

but task is already holding lock:
(&dm_device.ha_region_mutex){+.+.+.}, at: [<ffffffffa015382e>] hv_memory_notifier+0x5e/0xc0 [hv_balloon]
...

which is probably a false positive because we take and release
ha_region_mutex from memory notifier chain depending on the arg. No real
deadlocks were reported so far (though I'm not really sure about
preemptible kernels...) but we don't really need to hold the mutex
for so long. We use it to protect ha_region_list (and its members) and the
num_pages_onlined counter. None of these operations require us to sleep
and nothing is slow, switch to using spinlock with interrupts disabled.

While on it, replace list_for_each -> list_for_each_entry as we actually
need entries in all these cases, drop meaningless list_empty() checks.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit eece30b9f0046cee810a2c7caa2247f3f8dc85e2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: balloon: don't wait for ol_waitevent when memhp_auto_online is enabled

BugLink: http://bugs.launchpad.net/bugs/1650059
With the recently introduced in-kernel memory onlining
(MEMORY_HOTPLUG_DEFAULT_ONLINE) these is no point in waiting for pages
to come online in the driver and we can get rid of the waiting.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit a132c54cbcb8c239a64370b9cd9468908398bc6e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: balloon: account for gaps in hot add regions

BugLink: http://bugs.launchpad.net/bugs/1650059
I'm observing the following hot add requests from the WS2012 host:

hot_add_req: start_pfn = 0x108200 count = 330752
hot_add_req: start_pfn = 0x158e00 count = 193536
hot_add_req: start_pfn = 0x188400 count = 239616

As the host doesn't specify hot add regions we're trying to create
128Mb-aligned region covering the first request, we create the 0x108000 -
0x160000 region and we add 0x108000 - 0x158e00 memory. The second request
passes the pfn_covered() check, we enlarge the region to 0x108000 -
0x190000 and add 0x158e00 - 0x188200 memory. The problem emerges with the
third request as it starts at 0x188400 so there is a 0x200 gap which is
not covered. As the end of our region is 0x190000 now it again passes the
pfn_covered() check were we just adjust the covered_end_pfn and make it
0x188400 instead of 0x188200 which means that we'll try to online
0x188200-0x188400 pages but these pages were never assigned to us and we
crash.

We can't react to such requests by creating new hot add regions as it may
happen that the whole suggested range falls into the previously identified
128Mb-aligned area so we'll end up adding nothing or create intersecting
regions and our current logic doesn't allow that. Instead, create a list of
such 'gaps' and check for them in the page online callback.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit cb7a5724c7e1bfb5766ad1c3beba14cc715991cf)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: balloon: keep track of where ha_region starts

BugLink: http://bugs.launchpad.net/bugs/1650059
Windows 2012 (non-R2) does not specify hot add region in hot add requests
and the logic in hot_add_req() is trying to find a 128Mb-aligned region
covering the request. It may also happen that host's requests are not 128Mb
aligned and the created ha_region will start before the first specified
PFN. We can't online these non-present pages but we don't remember the real
start of the region.

This is a regression introduced by the commit 5abbbb75d733 ("Drivers: hv:
hv_balloon: don't lose memory when onlining order is not natural"). While
the idea of keeping the 'moving window' was wrong (as there is no guarantee
that hot add requests come ordered) we should still keep track of
covered_start_pfn. This is not a revert, the logic is different.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7cf3b79ec85ee1a5bbaaf936bb1d050dc652983b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Tools: hv: kvp: ensure kvp device fd is closed on exec

BugLink: http://bugs.launchpad.net/bugs/1650059
KVP daemon does fork()/exec() (with popen()) so we need to close our fds
to avoid sharing them with child processes. The immediate implication of
not doing so I see is SELinux complaining about 'ip' trying to access
'/dev/vmbus/hv_kvp'.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 26840437cbd6d3625ea6ab34e17cd34bb810c861)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: Implement a mechanism to tag the channel for low latency

BugLink: http://bugs.launchpad.net/bugs/1650059
On Hyper-V, performance critical channels use the monitor
mechanism to signal the host when the guest posts mesages
for the host. This mechanism minimizes the hypervisor intercepts
and also makes the host more efficient in that each time the
host is woken up, it processes a batch of messages as opposed to
just one. The goal here is improve the throughput and this is at
the expense of increased latency.
Implement a mechanism to let the client driver decide if latency
is important.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3724287c0ec472815ebe5ae3790f77965c6aa557)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: Reduce the delay between retries in vmbus_post_msg()

BugLink: http://bugs.launchpad.net/bugs/1650059
The current delay between retries is unnecessarily high and is negatively
affecting the time it takes to boot the system.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8de0d7e951826d7592e0ba1da655b175c4aa0923)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: Enable explicit signaling policy for NIC channels

BugLink: http://bugs.launchpad.net/bugs/1650059
For synthetic NIC channels, enable explicit signaling policy as netvsc wants to
explicitly control when the host is to be signaled.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ccef9bcc02ee63ac171ea9f0d51e04b3e55b3a12)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: vmbus: fix the race when querying & updating the percpu list

BugLink: http://bugs.launchpad.net/bugs/1650059
There is a rare race when we remove an entry from the global list
hv_context.percpu_list[cpu] in hv_process_channel_removal() ->
percpu_channel_deq() -> list_del(): at this time, if vmbus_on_event() ->
process_chn_event() -> pcpu_relid2channel() is trying to query the list,
we can get the kernel fault.

Similarly, we also have the issue in the code path: vmbus_process_offer() ->
percpu_channel_enq().

We can resolve the issue by disabling the tasklet when updating the list.

The patch also moves vmbus_release_relid() to a later place where
the channel has been removed from the per-cpu and the global lists.

Reported-by: Rolf Neugebauer <rolf.neugebauer@docker.com>
Signed-off-by: Dexuan Cui <decui@microsoft.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 638fea33aee858cc665297a76f0039e95a28ce0c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: utils: fix a race on userspace daemons registration

BugLink: http://bugs.launchpad.net/bugs/1650059
Background: userspace daemons registration protocol for Hyper-V utilities
drivers has two steps:
1) daemon writes its own version to kernel
2) kernel reads it and replies with module version
at this point we consider the handshake procedure being completed and we
do hv_poll_channel() transitioning the utility device to HVUTIL_READY
state. At this point we're ready to handle messages from kernel.

When hvutil_transport is in HVUTIL_TRANSPORT_CHARDEV mode we have a
single buffer for outgoing message. hvutil_transport_send() puts to this
buffer and till the buffer is cleared with hvt_op_read() returns -EFAULT
to all consequent calls. Host<->guest protocol guarantees there is no more
than one request at a time and we will not get new requests till we reply
to the previous one so this single message buffer is enough.

Now to the race. When we finish negotiation procedure and send kernel
module version to userspace with hvutil_transport_send() it goes into the
above mentioned buffer and if the daemon is slow enough to read it from
there we can get a collision when a request from the host comes, we won't
be able to put anything to the buffer so the request will be lost. To
solve the issue we need to know when the negotiation is really done (when
the version message is read by the daemon) and transition to HVUTIL_READY
state after this happens. Implement a callback on read to support this.
Old style netlink communication is not affected by the change, we don't
really know when these messages are delivered but we don't have a single
message buffer there.

Reported-by: Barry Davis <barry_davis@stormagic.com>
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e0fa3e5e7df61eb2c339c9f0067c202c0cdeec2c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: get rid of timeout in vmbus_open()

BugLink: http://bugs.launchpad.net/bugs/1650059
vmbus_teardown_gpadl() can result in infinite wait when it is called on 5
second timeout in vmbus_open(). The issue is caused by the fact that gpadl
teardown operation won't ever succeed for an opened channel and the timeout
isn't always enough. As a guest, we can always trust the host to respond to
our request (and there is nothing we can do if it doesn't).

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 396e287fa2ff46e83ae016cdcb300c3faa3b02f6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: don't leak memory in vmbus_establish_gpadl()

BugLink: http://bugs.launchpad.net/bugs/1650059
In some cases create_gpadl_header() allocates submessages but we never
free them.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7cc80c98070ccc7940fc28811c92cca0a681015d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

Drivers: hv: get rid of redundant messagecount in create_gpadl_header()

BugLink: http://bugs.launchpad.net/bugs/1650059
We use messagecount only once in vmbus_establish_gpadl() to check if
it is safe to iterate through the submsglist. We can just initialize
the list header in all cases in create_gpadl_header() instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 4d63763296ab7865a98bc29cc7d77145815ef89f)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: add ethtool statistics for tx packet issues

BugLink: http://bugs.launchpad.net/bugs/1650059
Printing console messages is not helpful when system is out of memory;
and can be disastrous with netconsole. Instead keep statistics
of these anomalous conditions.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4323b47cf8edfe95bd58e20965667e71121c866e)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: report vmbus name in ethtool

BugLink: http://bugs.launchpad.net/bugs/1650059
Make netvsc on vmbus behave more like PCI.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e3f74b841d482e962b9f5a907eeb25eeeb09aa60)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: make variable local

BugLink: http://bugs.launchpad.net/bugs/1650059
The variable m_ret is only used in one basic block.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 6c4c137e5035e0e17fa40c223fa0a3167e0f65fa)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: make netvsc_destroy_buf void

BugLink: http://bugs.launchpad.net/bugs/1650059
No caller checks the return value.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7a2a0a84fda33062200decf5e201b182591bf0ec)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: refactor completion function

BugLink: http://bugs.launchpad.net/bugs/1650059
Break the different cases, code is cleaner if broken up

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit bc304dd3b4444b84082c483534a8dcac7a22cb9d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: rearrange start_xmit

BugLink: http://bugs.launchpad.net/bugs/1650059
Rearrange the transmit routine to eliminate goto's and unnecessary
boolean variables. Use standard functions to test for vlan tag.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 0ab05141f97de2ac954c267c27b256ebd241e762)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: init completion during alloc

BugLink: http://bugs.launchpad.net/bugs/1650059
Move initialization to allocate where other fields are initialized.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit fd612602d6a7919982779fda914bd521e5778593)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: make device_remove void

BugLink: http://bugs.launchpad.net/bugs/1650059
Always returns 0 and no callers check.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e08f3ea586d4145e36c77f0dd1602374b5d7e928)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: use ARRAY_SIZE() for NDIS versions

BugLink: http://bugs.launchpad.net/bugs/1650059
Don't hard code size of array of NDIS versions.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e5a78fad4f1eb49064e4c35d0a4263424b12124c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: make inline functions static

BugLink: http://bugs.launchpad.net/bugs/1650059
Several new functions were introduced into hyperv.h but only used in one file.
Move them and let compiler decide on inline.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 30d1de08c87ddde6f73936c3350e7e153988fe02)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: style cleanups

BugLink: http://bugs.launchpad.net/bugs/1650059
Fix most of the complaints about the style of the code.
Things like extra blank lines and return statements.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 796cc88c32c1bd1f833d596448ac785a8736e57c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: use kcalloc

BugLink: http://bugs.launchpad.net/bugs/1650059
Better to use kcalloc rather than kzalloc and multiply for an array.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit e53a9c2a5a8cd0a3a8b3c0f9b7a3ad9bc6a28867)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: make RSS hash key static

BugLink: http://bugs.launchpad.net/bugs/1650059
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 9477386687354f2aa8f4843170b7093c6dd1eb37)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: fix rtnl locking in callback

BugLink: http://bugs.launchpad.net/bugs/1650059
The function get_netvsc_net_device had conditional locking. This was
unnecessary, incorrect, but harmless. It was unnecessary since the
code is only called from netlink netdev event callback where RTNL
is always acquired before the callbacks are run. It was incorrect
because of use of trylock and then continuing.
Fix by replacing with proper assertion.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 8737caafd16790c654f1fb8564abcf6e1f3ffe19)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

PCI: hv: Use list_move_tail() instead of list_del() + list_add_tail()

BugLink: http://bugs.launchpad.net/bugs/1650059
Use list_move_tail() instead of list_del() + list_add_tail(). No
functional change intended

Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
(cherry picked from commit 4f1cb01a7892582d18483986fbc268cdef1b1dee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: Implement batching of receive completions

BugLink: http://bugs.launchpad.net/bugs/1650059
The existing code uses busy retry when unable to send out receive
completions due to full ring buffer. It also gives up retrying after limit
is reached, and causes receive buffer slots not being recycled.
This patch implements batching of receive completions. It also prevents
dropping receive completions due to full ring buffer.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c0b558e5a393b77d2fe53335b5e07ca0e77178f8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: Add handler for physical link speed change

BugLink: http://bugs.launchpad.net/bugs/1650059
On Hyper-V host 2016 and later, VMs gets an event message of the physical
link speed when vSwitch is changed. This patch handles this message, so
the updated link speed can be reported by ethtool.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7f5d5af0b2f859f09b3dcb16a00b800fa48d9288)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

hv_netvsc: Add query for initial physical link speed

BugLink: http://bugs.launchpad.net/bugs/1650059
The physical link speed value will be reported by ethtool command.
The real speed is available from Windows 2016 host or later.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit b37879e6ca7079943542048da37f00a386c30cee)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

memory-hotplug: add automatic onlining policy for the newly added memory

BugLink: http://bugs.launchpad.net/bugs/1650059
Currently, all newly added memory blocks remain in 'offline' state
unless someone onlines them, some linux distributions carry special udev
rules like:

  SUBSYSTEM=="memory", ACTION=="add", ATTR{state}=="offline", ATTR{state}="online"

to make this happen automatically.  This is not a great solution for
virtual machines where memory hotplug is being used to address high
memory pressure situations as such onlining is slow and a userspace
process doing this (udev) has a chance of being killed by the OOM killer
as it will probably require to allocate some memory.

Introduce default policy for the newly added memory blocks in
/sys/devices/system/memory/auto_online_blocks file with two possible
values: "offline" which preserves the current behavior and "online"
which causes all newly added memory blocks to go online as soon as
they're added.  The default is "offline".

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Reviewed-by: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Daniel Kiper <daniel.kiper@oracle.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Tang Chen <tangchen@cn.fujitsu.com>
Cc: David Vrabel <david.vrabel@citrix.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: Xishi Qiu <qiuxishi@huawei.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: Igor Mammedov <imammedo@redhat.com>
Cc: Kay Sievers <kay@vrfy.org>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 31bc3858ea3ebcc3157b3f5f0e624c5962f5a7a6)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Brad Figg <brad.figg@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ibmveth: calculate gso_segs for large packets

BugLink: http://bugs.launchpad.net/bugs/1655420
Include calculations to compute the number of segments
that comprise an aggregated large packet.

Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 94acf164dc8f1184e8d0737be7125134c2701dbe)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>

ibmveth: set correct gso_size and gso_type

BugLink: http://bugs.launchpad.net/bugs/1655420
This patch is based on an earlier one submitted
by Jon Maxwell with the following commit message:

"We recently encountered a bug where a few customers using ibmveth on the
same LPAR hit an issue where a TCP session hung when large receive was
enabled. Closer analysis revealed that the session was stuck because the
one side was advertising a zero window repeatedly.

We narrowed this down to the fact the ibmveth driver did not set gso_size
which is translated by TCP into the MSS later up the stack. The MSS is
used to calculate the TCP window size and as that was abnormally large,
it was calculating a zero window, even although the sockets receive buffer
was completely empty."

We rely on the Virtual I/O Server partition in a pseries
environment to provide the MSS through the TCP header checksum
field. The stipulation is that users should not disable checksum
offloading if rx packet aggregation is enabled through VIOS.

Some firmware offerings provide the MSS in the RX buffer.
This is signalled by a bit in the RX queue descriptor.

Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Reviewed-by: Pradeep Satyanarayana <pradeeps@linux.vnet.ibm.com>
Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Reviewed-by: Jonathan Maxwell <jmaxwell37@gmail.com>
Reviewed-by: David Dai <zdai@us.ibm.com>
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 7b5967389f5a8dfb9d32843830f5e2717e20995d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Acked-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
Acked-by: Seth Forshee <seth.forshee@canonical.com>
Signed-off-by: Luis Henriques <luis.henriques@canonical.com>