git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log
7 years ago  Drivers: hv: restore hypervcall page cleanup before kexec
Vitaly Kuznetsov [Sat, 28 Jan 2017 19:37:14 +0000 (12:37 -0700)]
Drivers: hv: restore hypervcall page cleanup before kexec

BugLink: http://bugs.launchpad.net/bugs/1676635
We need to clean up the hypercall page before doing kexec/kdump, or the new
kernel may crash if it tries to use it. Reuse the now-empty hv_cleanup
function, renaming it to hyperv_cleanup and moving it to the arch specific
code.

Fixes: 8730046c1498 ("Drivers: hv vmbus: Move Hypercall page setup out of common code")
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d6f3609d2b4c6d0eec01f398cb685e50da3e6013)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv_util: switch to using timespec64
Vitaly Kuznetsov [Sat, 28 Jan 2017 19:37:13 +0000 (12:37 -0700)]
hv_util: switch to using timespec64

BugLink: http://bugs.launchpad.net/bugs/1676635
do_settimeofday() is deprecated, use do_settimeofday64() instead.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Acked-by: John Stultz <john.stultz@linaro.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 17244623a4c0f68d3f02c9c74d9b6ae259425826)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Cleanup hyperv_vmbus.h
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:59 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Cleanup hyperv_vmbus.h

BugLink: http://bugs.launchpad.net/bugs/1676635
Get rid of all unused definitions.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8e27a236312c4ab6dc8dbd303552b771d3569cf1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Define an APIs to manage interrupt state
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:58 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Define an APIs to manage interrupt state

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of cleaning up architecture specific code, define APIs
to manage interrupt state.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 37e11d5c7052a5ca55ef807731c75218ea341b4c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Define an API to retrieve virtual processor index
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:57 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Define an API to retrieve virtual processor index

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of cleaning up architecture specific code, define an API
to retrieve the virtual processor index.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 7297ff0ca9db7e2d830841035b95d8b94b529142)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Define APIs to manipulate the synthetic interrupt controller
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:56 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Define APIs to manipulate the synthetic interrupt controller

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of cleaning up architecture specific code, define APIs
to manipulate the interrupt controller state.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 06d1d98a839f196e94cb726008fb2118e430f356)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Define APIs to manipulate the event page
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:55 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Define APIs to manipulate the event page

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of cleaning up architecture specific code, define APIs
to manipulate the event page.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8e307bf82d76ab02e95a00d132d926f04db6ccab)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Define APIs to manipulate the message page
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:54 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Define APIs to manipulate the message page

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of cleaning up architecture specific code, define APIs
to manipulate the message page.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 155e4a2f28a59e5344dfa7c5d003161fe59a5bf2)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Get rid of an unsused variable
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:53 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Get rid of an unsused variable

BugLink: http://bugs.launchpad.net/bugs/1676635
The version variable is initialized but never used;
get rid of it.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d383877db60bcc7fd02d1051a90e078d731dfb59)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: util: Use hv_get_current_tick() to get current tick
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:52 +0000 (11:51 -0700)]
Drivers: hv: util: Use hv_get_current_tick() to get current tick

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to interact with Hyper-V in an instruction set
architecture independent way, use the new API to get the current
tick.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 305f7549c9298247723c255baddb7a54b4e63050)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Restructure the clockevents code
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:51 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Restructure the clockevents code

BugLink: http://bugs.launchpad.net/bugs/1676635
Move the relevant code that programs the hypervisor to an architecture
specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d5116b4091ecca271c249ede43a49c1245920558)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the code to signal end of message
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:50 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Move the code to signal end of message

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
code for signaling end of message.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e810e48c0c9a1a1ebb90cfe966bce6dc80ce08e7)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the check for hypercall page setup
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:49 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Move the check for hypercall page setup

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
check for detecting if the hypercall page is set up.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 73638cddaad861a5ebb2b119d8b318d4bded8f8d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the crash notification function
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:48 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Move the crash notification function

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
crash notification function.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d058fa7e98ff01a4b4750a2210fc19906db3cbe1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the extracting of Hypervisor version information
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:47 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Move the extracting of Hypervisor version information

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code,
extract hypervisor version information in an architecture specific
file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8de8af7e0873c4fdac2205327dff922819e16657)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code
K. Y. Srinivasan [Thu, 19 Jan 2017 18:51:46 +0000 (11:51 -0700)]
Drivers: hv: vmbus: Consolidate all Hyper-V specific clocksource code

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code,
consolidate all Hyper-V specific clocksource code into an architecture
specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 63ed4e0c67df332681ebfef6eca6852da28d6300)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move Hypercall invocation code out of common code
K. Y. Srinivasan [Wed, 18 Jan 2017 23:45:03 +0000 (16:45 -0700)]
Drivers: hv: vmbus: Move Hypercall invocation code out of common code

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
hypercall invocation code to an architecture specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 6ab42a66d2cc10afefea9f9e5d9a5ad5a836d254)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv vmbus: Move Hypercall page setup out of common code
K. Y. Srinivasan [Wed, 18 Jan 2017 23:45:02 +0000 (16:45 -0700)]
Drivers: hv vmbus: Move Hypercall page setup out of common code

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
hypercall page setup to an architecture specific file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 8730046c1498e8fb8c9a124789893944e8ce8220)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the definition of generate_guest_id()
K. Y. Srinivasan [Wed, 18 Jan 2017 23:45:01 +0000 (16:45 -0700)]
Drivers: hv: vmbus: Move the definition of generate_guest_id()

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
definition of generate_guest_id() to an x86 specific header file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 352c9624242d5836ad8a960826183011367871a4)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Move the definition of hv_x64_msr_hypercall_contents
K. Y. Srinivasan [Wed, 18 Jan 2017 23:45:00 +0000 (16:45 -0700)]
Drivers: hv: vmbus: Move the definition of hv_x64_msr_hypercall_contents

BugLink: http://bugs.launchpad.net/bugs/1676635
As part of the effort to separate out architecture specific code, move the
definition of hv_x64_msr_hypercall_contents to an x86 specific header file.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3f646ed70ccd1c4e5c1263d2922247d28c8e08f0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: util: Backup: Fix a rescind processing issue
K. Y. Srinivasan [Fri, 23 Dec 2016 00:54:03 +0000 (16:54 -0800)]
Drivers: hv: util: Backup: Fix a rescind processing issue

BugLink: http://bugs.launchpad.net/bugs/1676635
VSS may use a char device to support the communication between
the user level daemon and the driver. When the VSS channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new VSS offer from the host. Implement this logic.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit d77044d142e960f7b5f814a91ecb8bcf86aa552c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: util: Fcopy: Fix a rescind processing issue
K. Y. Srinivasan [Fri, 23 Dec 2016 00:54:02 +0000 (16:54 -0800)]
Drivers: hv: util: Fcopy: Fix a rescind processing issue

BugLink: http://bugs.launchpad.net/bugs/1676635
Fcopy may use a char device to support the communication between
the user level daemon and the driver. When the Fcopy channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new Fcopy offer from the host. Implement this logic.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 20951c7535b5e6af46bc37b7142105f716df739c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: util: kvp: Fix a rescind processing issue
K. Y. Srinivasan [Fri, 23 Dec 2016 00:54:01 +0000 (16:54 -0800)]
Drivers: hv: util: kvp: Fix a rescind processing issue

BugLink: http://bugs.launchpad.net/bugs/1676635
KVP may use a char device to support the communication between
the user level daemon and the driver. When the KVP channel is rescinded
we need to make sure that the char device is fully cleaned up before
we can process a new KVP offer from the host. Implement this logic.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 5a66fecbf6aa528e375cbebccb1061cc58d80c84)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Fix a rescind handling bug
K. Y. Srinivasan [Fri, 23 Dec 2016 00:54:00 +0000 (16:54 -0800)]
Drivers: hv: vmbus: Fix a rescind handling bug

BugLink: http://bugs.launchpad.net/bugs/1676635
The host can rescind a channel that has been offered to the
guest and once the channel is rescinded, the host does not
respond to any requests on that channel. Deal with the case where
the guest may be blocked waiting for a response from the host.

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit ccb61f8a99e6c29df4fb96a65dad4fad740d5be9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv: make CPU offlining prevention fine-grained
Vitaly Kuznetsov [Wed, 7 Dec 2016 22:53:12 +0000 (14:53 -0800)]
hv: make CPU offlining prevention fine-grained

BugLink: http://bugs.launchpad.net/bugs/1676635
Since commit e513229b4c38 ("Drivers: hv: vmbus: prevent cpu offlining on
newer hypervisors") CPU offlining has been disabled. It is still true that we
can't offline CPUs which have VMBus channels bound to them, but we may have
'free' CPUs (e.g. we booted with the maxcpus= parameter and onlined CPUs after
VMBus was initialized); these CPUs may be disabled without issues.

In the future, we may even allow offlining CPUs which have only sub-channels
assigned to them by closing these sub-channels. All devices will continue
to work.
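
The per-CPU check described above can be sketched in plain C. This is a
simplified userspace model, not the kernel code: struct chan_model, its
target_cpu field and cpu_offline_check() are hypothetical names, and
returning -EBUSY for a CPU that still has channels is an assumption.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMBus channel: only the CPU it is bound to. */
struct chan_model {
	unsigned int target_cpu;
};

/*
 * Return 0 if no channel is bound to @cpu (offlining may proceed),
 * or -EBUSY (assumed error code) if at least one channel is bound to it.
 */
static int cpu_offline_check(const struct chan_model *chans, size_t nr_chans,
			     unsigned int cpu)
{
	size_t i;

	for (i = 0; i < nr_chans; i++) {
		if (chans[i].target_cpu == cpu)
			return -EBUSY;
	}
	return 0;
}
```

With channels bound to CPUs 0 and 2, this model refuses to offline CPU 0 or
CPU 2 but lets a 'free' CPU 1 go.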

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 523b94087078f7f5ac10b7d9cd04277927031c39)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv: switch to cpuhp state machine for synic init/cleanup
Vitaly Kuznetsov [Wed, 7 Dec 2016 22:53:11 +0000 (14:53 -0800)]
hv: switch to cpuhp state machine for synic init/cleanup

BugLink: http://bugs.launchpad.net/bugs/1676635
To make it possible to online/offline CPUs, switch to the cpuhp infrastructure
for doing hv_synic_init()/hv_synic_cleanup().

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 76d36ab79820430f73c584673aef10ba2446fced)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Prevent sending data on a rescinded channel
K. Y. Srinivasan [Wed, 7 Dec 2016 09:16:28 +0000 (01:16 -0800)]
Drivers: hv: vmbus: Prevent sending data on a rescinded channel

BugLink: http://bugs.launchpad.net/bugs/1676635
After a channel is rescinded, the host does not read from it.
Fail writes to a channel that has already been rescinded. If we permitted
writes on a rescinded channel, the host would not respond, and we would be
unable to unload vmbus drivers, which cannot have any outstanding requests
to the host at the point they are unloaded.
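
A minimal sketch of the write-side check, as a userspace model rather than
the driver code. struct chan_state, chan_write() and the bytes_queued
counter are hypothetical, and -ENODEV is an assumed error code.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a channel; only the fields the check needs. */
struct chan_state {
	bool rescind;		/* set once the host has rescinded the offer */
	size_t bytes_queued;	/* bookkeeping for this sketch only */
};

/* Refuse to queue data on a rescinded channel; -ENODEV is an assumed code. */
static int chan_write(struct chan_state *ch, size_t len)
{
	if (ch->rescind)
		return -ENODEV;

	ch->bytes_queued += len;
	return 0;
}
```

Failing early keeps the number of outstanding requests at zero once the
rescind has been seen, which is what lets the driver unload.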

Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <Stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit e7e97dd8b77ee7366f2f8c70a033bf5fa05ec2e0)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv: don't reset hv_context.tsc_page on crash
Vitaly Kuznetsov [Wed, 7 Dec 2016 09:16:27 +0000 (01:16 -0800)]
hv: don't reset hv_context.tsc_page on crash

BugLink: http://bugs.launchpad.net/bugs/1676635
It may happen that secondary CPUs are still alive and resetting
hv_context.tsc_page will cause a subsequent crash in read_hv_clock_tsc()
as we don't check for it being not NULL there. It is safe to leave the
pointer set as we're not freeing this page anyway.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 56ef6718a1d8d77745033c5291e025ce18504159)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv: init percpu_list in hv_synic_alloc()
Vitaly Kuznetsov [Wed, 7 Dec 2016 09:16:26 +0000 (01:16 -0800)]
hv: init percpu_list in hv_synic_alloc()

BugLink: http://bugs.launchpad.net/bugs/1676635
Initializing hv_context.percpu_list in hv_synic_alloc() helps to prevent a
crash in percpu_channel_enq() when not all CPUs were online during
initialization and it naturally belongs there.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 3c7630d35009e6635e5b58d62de554fd5b6db5df)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  hv: allocate synic pages for all present CPUs
Vitaly Kuznetsov [Wed, 7 Dec 2016 09:16:25 +0000 (01:16 -0800)]
hv: allocate synic pages for all present CPUs

BugLink: http://bugs.launchpad.net/bugs/1676635
It may happen that not all CPUs are online when we do hv_synic_alloc() and
in case more CPUs come online later we may try accessing these allocated
structures.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit 421b8f20d3c381b215f988b42428f56fc3b82405)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()
Vitaly Kuznetsov [Wed, 7 Dec 2016 09:16:24 +0000 (01:16 -0800)]
Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()

BugLink: http://bugs.launchpad.net/bugs/1676635
DoS protection conditions were altered in WS2016 and now it's easy to get
-EAGAIN returned from vmbus_post_msg() (e.g. when we try changing MTU on a
netvsc device in a loop). None of the vmbus_post_msg() callers retry the
operation, and we usually end up with a non-functional device or a crash.

While the host's DoS protection conditions are unknown to me, my tests show
that it can take up to 10 seconds before the message is sent, so doing
udelay() is not an option; we really need to sleep. Almost all vmbus_post_msg()
callers are ready to sleep but there is one special case:
vmbus_initiate_unload() which can be called from interrupt/NMI context and
we can't sleep there. I'm also not sure about the lonely
vmbus_send_tl_connect_request() which has no in-tree users but its external
users are most likely waiting for the host to reply so sleeping there is
also appropriate.
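
The retry policy described above can be sketched as a userspace model. The
retry cap, the backoff ceiling, struct post_ops and the fake_* test doubles
are all assumptions made for illustration; the message does not state the
actual limits. The only properties taken from the text are that -EAGAIN is
retried and that the wait sleeps when the caller may sleep and busy-waits
otherwise.

```c
#include <errno.h>
#include <stdbool.h>

#define MAX_RETRIES	100	/* assumed retry cap */
#define MAX_DELAY_USEC	256000	/* assumed backoff ceiling */

/* Hypothetical hooks standing in for the hypercall and delay primitives. */
struct post_ops {
	int (*post)(void *ctx);				/* 0 or -EAGAIN */
	void (*sleep_usec)(void *ctx, unsigned int usec);	/* may sleep */
	void (*spin_usec)(void *ctx, unsigned int usec);	/* busy-wait */
};

/* Retry on -EAGAIN with a doubling delay; never sleep if !can_sleep. */
static int post_msg_with_retry(const struct post_ops *ops, void *ctx,
			       bool can_sleep)
{
	unsigned int usec = 1;
	int ret = -EAGAIN;
	int retries;

	for (retries = 0; retries < MAX_RETRIES; retries++) {
		ret = ops->post(ctx);
		if (ret != -EAGAIN)
			return ret;

		if (can_sleep)
			ops->sleep_usec(ctx, usec);
		else
			ops->spin_usec(ctx, usec);

		if (usec < MAX_DELAY_USEC)
			usec *= 2;
	}
	return ret;
}

/* Test double: fails a fixed number of times and records the waits. */
struct fake_host {
	int failures_left;
	unsigned long slept_usec;
	unsigned long spun_usec;
};

static int fake_post(void *ctx)
{
	struct fake_host *h = ctx;

	if (h->failures_left > 0) {
		h->failures_left--;
		return -EAGAIN;
	}
	return 0;
}

static void fake_sleep(void *ctx, unsigned int usec)
{
	((struct fake_host *)ctx)->slept_usec += usec;
}

static void fake_spin(void *ctx, unsigned int usec)
{
	((struct fake_host *)ctx)->spun_usec += usec;
}

static const struct post_ops fake_ops = {
	.post = fake_post,
	.sleep_usec = fake_sleep,
	.spin_usec = fake_spin,
};
```

Passing can_sleep = false models the vmbus_initiate_unload() case: the loop
still backs off, but only through the busy-wait hook.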

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
(cherry picked from commit c0bb03924f1a80e7f65900e36c8e6b3dc167c5f8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "UBUNTU: SAUCE: (no-up) hv: Supply vendor ID and package ABI"
Tim Gardner [Tue, 28 Mar 2017 20:23:25 +0000 (14:23 -0600)]
Revert "UBUNTU: SAUCE: (no-up) hv: Supply vendor ID and package ABI"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 2400c988c0b5da90b7035bfce63f1105e66b3423.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "drivers: hv: Turn off write permission on the hypercall page"
Tim Gardner [Tue, 28 Mar 2017 20:22:06 +0000 (14:22 -0600)]
Revert "drivers: hv: Turn off write permission on the hypercall page"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 71a5a0559d132a6bb20e63e8e9c62fbd22666137.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: util: Backup: Fix a rescind processing issue"
Tim Gardner [Tue, 28 Mar 2017 20:21:56 +0000 (14:21 -0600)]
Revert "Drivers: hv: util: Backup: Fix a rescind processing issue"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 8da12e10a191c62830c277f35a4fa5403eb1bcd2.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: util: Fcopy: Fix a rescind processing issue"
Tim Gardner [Tue, 28 Mar 2017 20:21:46 +0000 (14:21 -0600)]
Revert "Drivers: hv: util: Fcopy: Fix a rescind processing issue"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit c9d4b38c5c386cab269664832fdcd9d6b878f998.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: util: kvp: Fix a rescind processing issue"
Tim Gardner [Tue, 28 Mar 2017 20:21:37 +0000 (14:21 -0600)]
Revert "Drivers: hv: util: kvp: Fix a rescind processing issue"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit ca0b5897e11ebcd15770561849f45f2c7a980d85.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: vmbus: Fix a rescind handling bug"
Tim Gardner [Tue, 28 Mar 2017 20:21:25 +0000 (14:21 -0600)]
Revert "Drivers: hv: vmbus: Fix a rescind handling bug"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 1b7d44c16f61522ee0c7b79d6f666a89c3244a5a.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: vmbus: Prevent sending data on a rescinded channel"
Tim Gardner [Tue, 28 Mar 2017 20:21:15 +0000 (14:21 -0600)]
Revert "Drivers: hv: vmbus: Prevent sending data on a rescinded channel"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 81afb2c5dfd49aab0f6a3240c83d975416b53245.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "hv: init percpu_list in hv_synic_alloc()"
Tim Gardner [Tue, 28 Mar 2017 20:21:04 +0000 (14:21 -0600)]
Revert "hv: init percpu_list in hv_synic_alloc()"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit db60c8d6cc34f9966be31c574ec20d577a6730a2.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "hv: allocate synic pages for all present CPUs"
Tim Gardner [Tue, 28 Mar 2017 20:20:54 +0000 (14:20 -0600)]
Revert "hv: allocate synic pages for all present CPUs"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 9b66ff22466f0f566fe688a0e77d03a4e7fb11de.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()"
Tim Gardner [Tue, 28 Mar 2017 20:20:42 +0000 (14:20 -0600)]
Revert "Drivers: hv: vmbus: Raise retry/wait limits in vmbus_post_msg()"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit 816725f684dd5d018c4314f79797d0ea8eccdd9b.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  Revert "hv: don't reset hv_context.tsc_page on crash"
Tim Gardner [Tue, 28 Mar 2017 20:20:30 +0000 (14:20 -0600)]
Revert "hv: don't reset hv_context.tsc_page on crash"

BugLink: http://bugs.launchpad.net/bugs/1676635
This reverts commit e7a2222fc8a0d23d0e6020f04cccc63ff545f9bf.

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  powerpc/64: Use optimized checksum routines on little-endian
Paul Mackerras [Thu, 3 Nov 2016 05:15:42 +0000 (16:15 +1100)]
powerpc/64: Use optimized checksum routines on little-endian

BugLink: http://bugs.launchpad.net/bugs/1670247
Currently we have optimized hand-coded assembly checksum routines for
big-endian 64-bit systems, but for little-endian we use the generic C
routines. This modifies the optimized routines to work for
little-endian. With this, we no longer need to enable
CONFIG_GENERIC_CSUM. This also fixes a couple of comments in
checksum_64.S so they accurately reflect what the associated instruction
does.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Use the more common __BIG_ENDIAN__]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit d4fde568a34a93897dfb9ae64cfe9dda9d5c908c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold
Paul Mackerras [Thu, 3 Nov 2016 05:10:55 +0000 (16:10 +1100)]
powerpc/64: Fix checksum folding in csum_tcpudp_nofold and ip_fast_csum_nofold

BugLink: http://bugs.launchpad.net/bugs/1670247
These functions compute an IP checksum by computing a 64-bit sum and
folding it to 32 bits (the "nofold" in their names refers to folding
down to 16 bits).  However, doing (u32) (s + (s >> 32)) is not
sufficient to fold a 64-bit sum to 32 bits correctly.  The addition
can produce a carry out from bit 31, which needs to be added in to
the sum to produce the correct result.

To fix this, we copy the from64to32() function from lib/checksum.c
and use that.
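
The difference between the two folds can be shown in standalone C. This is
a sketch using <stdint.h> types instead of the kernel's u32/u64;
fold_naive() is a hypothetical name for the expression quoted above, and
from64to32() is modelled on the two-step fold the message refers to.

```c
#include <stdint.h>

/* The insufficient fold: a carry out of bit 31 is silently dropped. */
static inline uint32_t fold_naive(uint64_t s)
{
	return (uint32_t)(s + (s >> 32));
}

/* Two-step fold: add the halves, then add the resulting carry back in. */
static inline uint32_t from64to32(uint64_t x)
{
	/* 32-bit low half + 32-bit high half gives up to 33 bits */
	x = (x & 0xffffffff) + (x >> 32);
	/* fold the possible carry (bit 32) back into the low 32 bits */
	x = (x & 0xffffffff) + (x >> 32);
	return (uint32_t)x;
}
```

For s = 0x00000001ffffffff the halves are 0xffffffff and 1. Their sum
carries out of bit 31, so fold_naive() yields 0 while from64to32() yields
1, the correct end-around-carry result.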

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit b492f7e4e07a28e706db26cf4943bb0911435426)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  scsi: storvsc: Workaround for virtual DVD SCSI version
Stephen Hemminger [Tue, 28 Mar 2017 16:40:17 +0000 (12:40 -0400)]
scsi: storvsc: Workaround for virtual DVD SCSI version

BugLink: http://bugs.launchpad.net/bugs/1674635
Hyper-V host emulation of SCSI for virtual DVD device reports SCSI
version 0 (UNKNOWN) but is still capable of supporting REPORTLUN.

Without this patch, a GEN2 Linux guest on Hyper-V will not boot 4.11
successfully with a virtual DVD ROM device. What happens is that the SCSI
scan process falls back to doing sequential probing by INQUIRY. But the
storvsc driver has a previous workaround that masks/blocks all error
reports from INQUIRY (or MODE_SENSE) commands. This workaround causes
the scan to then populate a full set of bogus LUNs on the target and
then sends the kernel spinning off into a death spiral doing block reads on
the non-existent LUNs.

By setting the correct blacklist flags, the target with the DVD device
is scanned with REPORTLUN and that works correctly.

This patch needs to go into current 4.11; it is safe but not necessary in
older kernels.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Reviewed-by: K. Y. Srinivasan <kys@microsoft.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit f1c635b439a5c01776fe3a25b1e2dc546ea82e6f)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  powerpc/powernv: Remove separate entry for OPAL real mode calls
Benjamin Herrenschmidt [Tue, 28 Mar 2017 16:54:45 +0000 (13:54 -0300)]
powerpc/powernv: Remove separate entry for OPAL real mode calls

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
All entry points already read the MSR so they can easily do
the right thing.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ab9bad0ead9ab179ace09988a3f1cfca122eb7c2)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  powerpc/powernv: Initialise nest mmu
Alistair Popple [Tue, 28 Mar 2017 16:54:44 +0000 (13:54 -0300)]
powerpc/powernv: Initialise nest mmu

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
POWER9 contains an off core mmu called the nest mmu (NMMU). This is
used by other hardware units on the chip to translate virtual
addresses into real addresses. The unit attempting an address
translation provides the majority of the context required for the
translation request except for the base address of the partition table
(i.e. the PTCR), which needs to be programmed into the NMMU.

This patch adds a call to OPAL to set the PTCR for the nest mmu in
opal_init().

Signed-off-by: Alistair Popple <alistair@popple.id.au>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 1d0761d2557d1540727723e4f05395d53321d555)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend
Li Zhong [Tue, 28 Mar 2017 16:54:43 +0000 (13:54 -0300)]
KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This patch improves the code that takes the lock twice to check the resend
flag and do the actual resending. It checks the resend flag locklessly and
adds a boolean parameter check_resend to icp_[rm_]deliver_irq(), so the
resend flag can be checked under the lock when doing the delivery.

We need to make sure that when we clear the ics's bit in the icp's
resend_map, we don't miss the resend flag of the irqs that set the bit. This
can be ordered through the barrier in test_and_clear_bit() and a newly added
wmb between setting the irq's resend flag and the icp's resend_map.

Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit 21acd0e4df04f02176e773468658c3cebff096bb)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years ago  powerpc: Update to new option-vector-5 format for CAS
Suraj Jitindar Singh [Tue, 28 Mar 2017 16:54:42 +0000 (13:54 -0300)]
powerpc: Update to new option-vector-5 format for CAS

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
On POWER9 the ibm,client-architecture-support (CAS) negotiation process
has been updated to change how the host to guest negotiation is done for
the new hash/radix mmu as well as the nest mmu, process tables and guest
translation shootdown (GTSE).

This is documented in the unreleased PAPR ACR "CAS option vector
additions for P9".

The host tells the guest which options it supports in
ibm,arch-vec-5-platform-support. The guest then chooses a subset of these
to request in the CAS call and these are agreed to in the
ibm,architecture-vec-5 property of the chosen node.

Thus we read ibm,arch-vec-5-platform-support and make our selection before
calling CAS. We then parse the ibm,architecture-vec-5 property of the
chosen node to check whether we should run as hash or radix.

ibm,arch-vec-5-platform-support format:

index value pairs: <index, val> ... <index, val>

index: Option vector 5 byte number
val:   Some representation of supported values
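
The pair layout above could be walked roughly as in the following sketch.
It is not the kernel's parser. It assumes that each index and each val
occupies exactly one byte (the message only says "some representation"),
and the names ov5_walk_pairs, ov5_lookup and ov5_record are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: invoked once per <index, val> pair. */
typedef void (*ov5_pair_fn)(uint8_t index, uint8_t val, void *ctx);

/*
 * Walk a property laid out as <index, val> ... <index, val>.
 * Assumption: every index and every val occupies one byte.
 * Returns the number of complete pairs visited; a trailing odd
 * byte is ignored.
 */
static size_t ov5_walk_pairs(const uint8_t *prop, size_t len,
			     ov5_pair_fn fn, void *ctx)
{
	size_t i, pairs = 0;

	for (i = 0; i + 1 < len; i += 2) {
		if (fn)
			fn(prop[i], prop[i + 1], ctx);
		pairs++;
	}
	return pairs;
}

/* Example callback state: remember the val seen for one byte number. */
struct ov5_lookup {
	uint8_t wanted_index;
	int found;
	uint8_t val;
};

static void ov5_record(uint8_t index, uint8_t val, void *ctx)
{
	struct ov5_lookup *l = ctx;

	if (index == l->wanted_index) {
		l->found = 1;
		l->val = val;
	}
}
```

A caller would fill in a struct ov5_lookup with the option vector 5 byte
number it cares about, pass ov5_record as the callback, and then decide
what to request in the CAS call based on the recorded val.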

Signed-off-by: Suraj Jitindar Singh <sjitindarsingh@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
[mpe: Don't print about unknown options, be consistent with OV5_FEAT]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 014d02cbf16b3106dc8e93281d2a9c189751ed5e)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: Invalidate process table caching after setting process table
Paul Mackerras [Tue, 28 Mar 2017 16:54:41 +0000 (13:54 -0300)]
powerpc/64: Invalidate process table caching after setting process table

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
The POWER9 MMU reads and caches entries from the process table.
When we kexec from one kernel to another, the second kernel sets
its process table pointer but doesn't currently do anything to
make the CPU invalidate any cached entries from the old process table.
This adds a tlbie (TLB invalidate entry) instruction with parameters
to invalidate caching of the process table after the new process
table is installed.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 7a70d7288c926ae88e0c773fbb506aa374e99c2d)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book 3S: Fix error return in kvm_vm_ioctl_create_spapr_tce()
Wei Yongjun [Tue, 28 Mar 2017 16:54:40 +0000 (13:54 -0300)]
KVM: PPC: Book 3S: Fix error return in kvm_vm_ioctl_create_spapr_tce()

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
Fix to return error code -ENOMEM from the memory alloc error handling
case instead of 0, as done elsewhere in this function.
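
The class of bug being fixed can be illustrated with a small userspace
sketch. create_table() and its fail_alloc parameter are hypothetical and
only stand in for a function with a shared error exit; this is not the
code of kvm_vm_ioctl_create_spapr_tce().

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Hypothetical stand-in for a function that allocates a table.
 * 'fail_alloc' lets the caller simulate an allocation failure.
 */
static long create_table(size_t entries, int fail_alloc, void **out)
{
	long ret = 0;
	void *tbl;

	*out = NULL;

	tbl = fail_alloc ? NULL : calloc(entries, sizeof(long));
	if (!tbl) {
		/* Without this assignment the error exit would return 0. */
		ret = -ENOMEM;
		goto fail;
	}

	*out = tbl;
	return 0;

fail:
	return ret;
}
```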

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit 5982f0849e08fe4e4e7df5e345c4539ce9780b1b)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Don't try to signal cpu -1
Paul Mackerras [Tue, 28 Mar 2017 16:54:39 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Don't try to signal cpu -1

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
If the target vcpu for kvmppc_fast_vcpu_kick_hv() is not running on
any CPU, then we will have vcpu->arch.thread_cpu == -1, and as it
happens, kvmppc_fast_vcpu_kick_hv will call kvmppc_ipi_thread with
-1 as the cpu argument.  Although this is not meaningful, in the past,
before commit 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs
to other cores on POWER9", 2016-11-18), it was harmless because CPU
-1 is not in the same core as any real CPU thread.  On a POWER9,
however, we don't do the "same core" check, so we were trying to
do a msgsnd to thread -1, which is invalid.  To avoid this, we add
a check to see that vcpu->arch.thread_cpu is >= 0 before calling
kvmppc_ipi_thread() with it.  Since vcpu->arch.thread_cpu can change
asynchronously, we use READ_ONCE to ensure that the value we check is
the same value that we use as the argument to kvmppc_ipi_thread().
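
The read-once-then-check pattern described above can be sketched in
userspace as follows. READ_INT_ONCE is only an approximation of the
kernel's READ_ONCE(), and struct vcpu_model, ipi_thread_model() and
kick_vcpu_model() are hypothetical stand-ins, not the KVM code.

```c
#include <assert.h>

/* Userspace approximation of the kernel's READ_ONCE() for an int. */
#define READ_INT_ONCE(x) (*(const volatile int *)&(x))

/* Hypothetical stand-in for a field that another CPU may update. */
struct vcpu_model {
	int thread_cpu;		/* -1 when the vcpu is not running anywhere */
};

static int last_ipi_target = -2;	/* records calls for the example */

static void ipi_thread_model(int cpu)
{
	assert(cpu >= 0);		/* -1 must never reach this point */
	last_ipi_target = cpu;
}

/* Returns 1 if an IPI was sent, 0 if the vcpu was not on any CPU. */
static int kick_vcpu_model(struct vcpu_model *vcpu)
{
	/* One load: the value tested is the value passed on. */
	int cpu = READ_INT_ONCE(vcpu->thread_cpu);

	if (cpu < 0)
		return 0;
	ipi_thread_model(cpu);
	return 1;
}
```

Reading the field into a local first matters because testing the field
and then re-reading it for the call would allow the value to become -1
between the two accesses.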

Fixes: 1704a81ccebc ("KVM: PPC: Book3S HV: Use msgsnd for IPIs to other cores on POWER9")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit 3deda5e50c893be38c1b6b3a73f8f8fb5560baa4)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9
Paul Mackerras [Tue, 28 Mar 2017 16:54:38 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Don't use ASDR for real-mode HPT faults on POWER9

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
In HPT mode on POWER9, the ASDR register is supposed to record
segment information for hypervisor page faults.  It turns out that
POWER9 DD1 does not record the page size information in the ASDR
for faults in guest real mode.  We have the necessary information
in memory already, so by moving the checks for real mode that already
existed, we can use the in-memory copy.  Since a load is likely to
be faster than reading an SPR, we do this unconditionally (not just
for POWER9 DD1).

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit 4e5acdc23a3dcbd6ad6dc93a9783dd9c838987c8)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Fix software walk of guest process page tables
Paul Mackerras [Tue, 28 Mar 2017 16:54:37 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Fix software walk of guest process page tables

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This fixes some bugs in the code that walks the guest's page tables.
These bugs cause MMIO emulation to fail whenever the guest is in
virtual mode (MMU on), leading to the guest hanging if it tried to
access a virtio device.

The first bug was that when reading the guest's process table, we were
using the whole of arch->process_table, not just the field that contains
the process table base address.  The second bug was that the mask used
when reading the process table entry to get the radix tree base address,
RPDB_MASK, had the wrong value.

Fixes: 9e04ba69beec ("KVM: PPC: Book3S HV: Add basic infrastructure for radix guests")
Fixes: e99833448c5f ("powerpc/mm/radix: Add partition table format & callback")
Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
(cherry picked from commit 70cd4c10b290dd77fff6dc702a9a2c8c679df121)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: CONFIG_RELOCATABLE support for hmi interrupts
Nicholas Piggin [Tue, 28 Mar 2017 16:54:36 +0000 (13:54 -0300)]
powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
The branch from hmi_exception_early to hmi_exception_realmode must use
a "relocatable-style" branch, because it is branching from unrelocated
exception code to beyond __end_interrupts.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 2337d207288f163e10bd8d4d7eeb0c1c75046a0c)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Enable radix guest support
Paul Mackerras [Tue, 28 Mar 2017 16:54:35 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Enable radix guest support

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds a few last pieces of the support for radix guests:

* Implement the backends for the KVM_PPC_CONFIGURE_V3_MMU and
  KVM_PPC_GET_RMMU_INFO ioctls for radix guests

* On POWER9, allow secondary threads to be on/off-lined while guests
  are running.

* Set up LPCR and the partition table entry for radix guests.

* Don't allocate the rmap array in the kvm_memory_slot structure
  on radix.

* Don't try to initialize the HPT for radix guests, since they don't
  have an HPT.

* Take out the code that prevents the HV KVM module from
  initializing on radix hosts.

At this stage, we only support radix guests if the host is running
in radix mode, and only support HPT guests if the host is running in
HPT mode.  Thus a guest cannot switch from one mode to the other,
which enables some simplifications.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8cf4ecc0ca9bd9bdc9b4ca0a99f7445a1e74afed)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1
Paul Mackerras [Tue, 28 Mar 2017 16:54:34 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Invalidate ERAT on guest entry/exit for POWER9 DD1

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
On POWER9 DD1, we need to invalidate the ERAT (effective to real
address translation cache) when changing the PIDR register, which
we do as part of guest entry and exit.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit f11f6f79b606fb54bb388d0ea652ed889b2fdf86)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Allow guest exit path to have MMU on
Paul Mackerras [Tue, 28 Mar 2017 16:54:33 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Allow guest exit path to have MMU on

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
If we allow LPCR[AIL] to be set for radix guests, then interrupts from
the guest to the host can be delivered by the hardware with relocation
on, and thus the code path starting at kvmppc_interrupt_hv can be
executed in virtual mode (MMU on) for radix guests (previously it was
only ever executed in real mode).

Most of the code is indifferent to whether the MMU is on or off, but
the calls to OPAL that use the real-mode OPAL entry code need to
be switched to use the virtual-mode code instead.  The affected
calls are the calls to the OPAL XICS emulation functions in
kvmppc_read_one_intr() and related functions.  We test the MSR[IR]
bit to detect whether we are in real or virtual mode, and call the
opal_rm_* or opal_* function as appropriate.
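
The mode test can be sketched as below. The bit position used for MSR[IR]
(bit 5 counting from the least significant bit) is stated here as an
assumption, and call_rm() and call_virt() are hypothetical stand-ins for
an opal_rm_*() and opal_*() pair.

```c
#include <stdint.h>

/* MSR[IR] (instruction relocate); assumed to be bit 5 from the LSB. */
#define MSR_IR_MODEL (UINT64_C(1) << 5)

enum opal_entry { OPAL_ENTRY_REAL, OPAL_ENTRY_VIRT };

/* Hypothetical stand-ins for an opal_rm_*() / opal_*() pair. */
static enum opal_entry call_rm(void)   { return OPAL_ENTRY_REAL; }
static enum opal_entry call_virt(void) { return OPAL_ENTRY_VIRT; }

/* Pick the entry point that matches the current translation mode. */
static enum opal_entry call_opal_for_mode(uint64_t msr)
{
	if (msr & MSR_IR_MODEL)
		return call_virt();	/* MMU on: virtual-mode entry */
	return call_rm();		/* MMU off: real-mode entry */
}
```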

The other place that depends on the MMU being off is the optimization
where the guest exit code jumps to the external interrupt vector or
hypervisor doorbell interrupt vector, or returns to its caller (which
is __kvmppc_vcore_entry).  If the MMU is on and we are returning to
the caller, then we don't need to use an rfid instruction since the
MMU is already on; a simple blr suffices.  If there is an external
or hypervisor doorbell interrupt to handle, we branch to the
relocation-on version of the interrupt vector.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 53af3ba2e8195f504d6a3a0667ccb5e7d4c57599)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement
Paul Mackerras [Tue, 28 Mar 2017 16:54:32 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Invalidate TLB on radix guest vcpu movement

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
With radix, the guest can do TLB invalidations itself using the tlbie
(global) and tlbiel (local) TLB invalidation instructions.  Linux guests
use local TLB invalidations for translations that have only ever been
accessed on one vcpu.  However, that doesn't mean that the translations
have only been accessed on one physical cpu (pcpu) since vcpus can move
around from one pcpu to another.  Thus a tlbiel might leave behind stale
TLB entries on a pcpu where the vcpu previously ran, and if that task
then moves back to that previous pcpu, it could see those stale TLB
entries and thus access memory incorrectly.  The usual symptom of this
is random segfaults in userspace programs in the guest.

To cope with this, we detect when a vcpu is about to start executing on
a thread in a core that is a different core from the last time it
executed.  If that is the case, then we mark the core as needing a
TLB flush and then send an interrupt to any thread in the core that is
currently running a vcpu from the same guest.  This will get those vcpus
out of the guest, and the first one to re-enter the guest will do the
TLB flush.  The reason for interrupting the vcpus executing on the old
core is to cope with the following scenario:

CPU 0                   CPU 1                   CPU 4
(core 0)                (core 0)                (core 1)

VCPU 0 runs task X      VCPU 1 runs
core 0 TLB gets
entries from task X
VCPU 0 moves to CPU 4
                                                VCPU 0 runs task X
                                                Unmap pages of task X
                                                tlbiel

                        (still VCPU 1)          task X moves to VCPU 1
                        task X runs
                        task X sees stale TLB
                        entries

That is, as soon as the VCPU starts executing on the new core, it
could unmap and tlbiel some page table entries, and then the task
could migrate to one of the VCPUs running on the old core and
potentially see stale TLB entries.

Since the TLB is shared between all the threads in a core, we only
use the bit of kvm->arch.need_tlb_flush corresponding to the first
thread in the core.  To ensure that we don't have a window where we
can miss a flush, this moves the clearing of the bit from before the
actual flush to after it.  This way, two threads might both do the
flush, but we prevent the situation where one thread can enter the
guest before the flush is finished.
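
The bookkeeping in the paragraph above can be modelled as follows. This is
a single-threaded sketch, so it shows the bit selection and the
flush-then-clear ordering but not the race itself. The value of
THREADS_PER_CORE_MODEL and all names are assumptions made for the example,
and the model only handles CPU numbers below 64.

```c
#include <stdint.h>

#define THREADS_PER_CORE_MODEL 4	/* assumption for the example */

/* Bitmap with one bit per CPU thread; only first-thread bits are used. */
struct guest_model {
	uint64_t need_tlb_flush;
	unsigned int flushes_done;
};

static unsigned int first_thread_of_core(unsigned int cpu)
{
	return cpu & ~(THREADS_PER_CORE_MODEL - 1u);
}

/* Mark the core containing 'cpu' as needing a TLB flush. */
static void mark_core_needs_flush(struct guest_model *g, unsigned int cpu)
{
	g->need_tlb_flush |= UINT64_C(1) << first_thread_of_core(cpu);
}

/* Guest-entry path: flush first, clear the bit afterwards. */
static void flush_if_needed(struct guest_model *g, unsigned int cpu)
{
	uint64_t bit = UINT64_C(1) << first_thread_of_core(cpu);

	if (!(g->need_tlb_flush & bit))
		return;
	g->flushes_done++;		/* stands in for the real TLB flush */
	g->need_tlb_flush &= ~bit;	/* cleared only once the flush is done */
}
```

With real concurrency, two threads of the same core may both see the bit
set and both flush. That is the accepted cost of clearing the bit late.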

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit a29ebeaf5575d03eef178bb87c425a1e46cae1ca)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode
Paul Mackerras [Tue, 28 Mar 2017 16:54:31 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Make HPT-specific hypercalls return error in radix mode

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
If the guest is in radix mode, then it doesn't have a hashed page
table (HPT), so all of the hypercalls that manipulate the HPT can't
work and should return an error.  This adds checks to make them
return H_FUNCTION ("function not supported").
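
The shape of such a check is sketched below. struct kvm_model and
h_hpt_op_model() are hypothetical, and the numeric values chosen for the
return codes (0 for success, -2 for H_FUNCTION) are assumptions made for
the example rather than something this message states.

```c
#include <stdbool.h>

#define H_SUCCESS_MODEL   0L
#define H_FUNCTION_MODEL (-2L)	/* "function not supported" (assumed value) */

struct kvm_model {
	bool radix;
};

/* Hypothetical HPT-only hypercall handler. */
static long h_hpt_op_model(const struct kvm_model *kvm)
{
	if (kvm->radix)
		return H_FUNCTION_MODEL;	/* no HPT to operate on */
	/* ... HPT manipulation would go here ... */
	return H_SUCCESS_MODEL;
}
```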

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 65dae5403a162fe6ef7cd8b2835de9d23c303891)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Implement dirty page logging for radix guests
Paul Mackerras [Tue, 28 Mar 2017 16:54:30 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Implement dirty page logging for radix guests

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds code to keep track of dirty pages when requested (that is,
when memslot->dirty_bitmap is non-NULL) for radix guests.  We use the
dirty bits in the PTEs in the second-level (partition-scoped) page
tables, together with a bitmap of pages that were dirty when their
PTE was invalidated (e.g., when the page was paged out).  This bitmap
is stored in the first half of the memslot->dirty_bitmap area, and
kvm_vm_ioctl_get_dirty_log_hv() now uses the second half for the
bitmap that gets returned to userspace.
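
One way to picture the two-halves layout is the sketch below. It is only a
model: pte_dirty is a hypothetical bitmap standing in for the dirty bits
held in the PTEs, and the way the two sources are merged here is an
assumption, not a description of kvm_vm_ioctl_get_dirty_log_hv().

```c
#include <stddef.h>

/*
 * Model of a dirty_bitmap area split into two halves of 'nwords' words:
 *   area[0 .. nwords-1]          pages dirty when their PTE was invalidated
 *   area[nwords .. 2*nwords-1]   snapshot handed back to the caller
 * 'pte_dirty' is a hypothetical bitmap standing in for PTE dirty bits.
 */
static void harvest_dirty(unsigned long *area, unsigned long *pte_dirty,
			  size_t nwords)
{
	unsigned long *saved = area;
	unsigned long *out = area + nwords;
	size_t i;

	for (i = 0; i < nwords; i++) {
		out[i] = saved[i] | pte_dirty[i];
		saved[i] = 0;		/* consumed by this snapshot */
		pte_dirty[i] = 0;	/* stands in for clearing the PTE bit */
	}
}
```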

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 8f7b79b8379a85fb8dd0c3f42d9f452ec5552161)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: MMU notifier callbacks for radix guests
Paul Mackerras [Tue, 28 Mar 2017 16:54:29 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: MMU notifier callbacks for radix guests

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adapts our implementations of the MMU notifier callbacks
(unmap_hva, unmap_hva_range, age_hva, test_age_hva, set_spte_hva)
to call radix functions when the guest is using radix.  These
implementations are much simpler than for HPT guests because we
have only one PTE to deal with, so we don't need to traverse
rmap chains.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 01756099e0a5f431bbada9693d566269acfb51f9)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Page table construction and page faults for radix guests
Paul Mackerras [Tue, 28 Mar 2017 16:54:28 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Page table construction and page faults for radix guests

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds the code to construct the second-level ("partition-scoped" in
architecturese) page tables for guests using the radix MMU.  Apart from
the PGD level, which is allocated when the guest is created, the rest
of the tree is all constructed in response to hypervisor page faults.

As well as hypervisor page faults for missing pages, we also get faults
for reference/change (RC) bits needing to be set, as well as various
other error conditions.  For now, we only set the R or C bit in the
guest page table if the same bit is set in the host PTE for the
backing page.

This code can take advantage of the guest being backed with either
transparent or ordinary 2MB huge pages, and insert 2MB page entries
into the guest page tables.  There is no support for 1GB huge pages
yet.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 5a319350a46572d073042a3194676099dd2c135d)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests
Paul Mackerras [Tue, 28 Mar 2017 16:54:27 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Modify guest entry/exit paths to handle radix guests

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds code to branch around the parts that radix guests don't
need - clearing and loading the SLB with the guest SLB contents,
saving the guest SLB contents on exit, and restoring the host SLB
contents.

Since the host is now using radix, we need to save and restore the
host value for the PID register.

On hypervisor data/instruction storage interrupts, we don't do the
guest HPT lookup on radix, but just save the guest physical address
for the fault (from the ASDR register) in the vcpu struct.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit f4c51f841d2ac7d36cacb84efbc383190861f87c)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Add basic infrastructure for radix guests
Paul Mackerras [Tue, 28 Mar 2017 16:54:26 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Add basic infrastructure for radix guests

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds a field in struct kvm_arch and an inline helper to
indicate whether a guest is a radix guest or not, plus a new file
to contain the radix MMU code, which currently contains just a
translate function which knows how to traverse the guest page
tables to translate an address.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 9e04ba69beec372ddf857c700ff922e95f50b0d0)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9
Paul Mackerras [Tue, 28 Mar 2017 16:54:25 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Use ASDR for HPT guests on POWER9

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
POWER9 adds a register called ASDR (Access Segment Descriptor
Register), which is set by hypervisor data/instruction storage
interrupts to contain the segment descriptor for the address
being accessed, assuming the guest is using HPT translation.
(For radix guests, it contains the guest real address of the
access.)

Thus, for HPT guests on POWER9, we can use this register rather
than looking up the SLB with the slbfee. instruction.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ef8c640cb9cc865a461827b698fcc55b0ecaa600)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Set process table for HPT guests on POWER9
Paul Mackerras [Tue, 28 Mar 2017 16:54:24 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Set process table for HPT guests on POWER9

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds the implementation of the KVM_PPC_CONFIGURE_V3_MMU ioctl
for HPT guests on POWER9.  With this, we can return 1 for the
KVM_CAP_PPC_MMU_HASH_V3 capability.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 468808bd35c4aa3cf7d9fde0ebb010270038734b)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU
Paul Mackerras [Tue, 28 Mar 2017 16:54:23 +0000 (13:54 -0300)]
KVM: PPC: Book3S HV: Add userspace interfaces for POWER9 MMU

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds two capabilities and two ioctls to allow userspace to
find out about and configure the POWER9 MMU in a guest.  The two
capabilities tell userspace whether KVM can support a guest using
the radix MMU, or using the hashed page table (HPT) MMU with a
process table and segment tables.  (Note that the MMUs in the
POWER9 processor cores do not use the process and segment tables
when in HPT mode, but the nest MMU does).

The KVM_PPC_CONFIGURE_V3_MMU ioctl allows userspace to specify
whether a guest will use the radix MMU or the HPT MMU, and to
specify the size and location (in guest space) of the process
table.

The KVM_PPC_GET_RMMU_INFO ioctl gives userspace information about
the radix MMU.  It returns a list of supported radix tree geometries
(base page size and number of bits indexed at each level of the
radix tree) and the encoding used to specify the various page
sizes for the TLB invalidate entry instruction.

Initially, both capabilities return 0 and the ioctls return -EINVAL,
until the necessary infrastructure for them to operate correctly
is added.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c92701322711682de89b2bd0f32affad040b6e86)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: Allow for relocation-on interrupts from guest to host
Paul Mackerras [Tue, 28 Mar 2017 16:54:22 +0000 (13:54 -0300)]
powerpc/64: Allow for relocation-on interrupts from guest to host

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
With host and guest both using radix translation, it is feasible
for the host to take interrupts that come from the guest with
relocation on, and that is in fact what the POWER9 hardware will
do when LPCR[AIL] = 3.  All such interrupts use HSRR0/1 not SRR0/1
except for system call with LEV=1 (hcall).

Therefore this adds the KVM tests to the _HV variants of the
relocation-on interrupt handlers, and adds the KVM test to the
relocation-on system call entry point.

We also instantiate the relocation-on versions of the hypervisor
data storage and instruction interrupt handlers, since these can
occur with relocation on in radix guests.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit bc3551257af837fc603d295e59f9e32953525b98)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: Make type of partition table flush depend on partition type
Paul Mackerras [Tue, 28 Mar 2017 16:54:21 +0000 (13:54 -0300)]
powerpc/64: Make type of partition table flush depend on partition type

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
When changing a partition table entry on POWER9, we do a particular
form of the tlbie instruction which flushes all TLBs and caches of
the partition table for a given logical partition ID (LPID).
This instruction has a field in the instruction word, labelled R
(radix), which should be 1 if the partition was previously a radix
partition and 0 if it was an HPT partition.  This implements that
logic.
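
The decision can be sketched as follows. The sketch assumes that the top
bit of the first doubleword of a partition table entry marks a radix
partition. That flag, struct patb_entry_model and set_entry_get_flush_r()
are illustrative names, not the kernel's.

```c
#include <stdint.h>

/* Assumed flag: top bit of the first doubleword marks a radix partition. */
#define PATB_HR_MODEL (UINT64_C(1) << 63)

struct patb_entry_model {
	uint64_t dw0;
	uint64_t dw1;
};

/*
 * Install a new entry and return the R value to use for the flush.
 * R reflects what the partition was before the update, not after.
 */
static unsigned int set_entry_get_flush_r(struct patb_entry_model *e,
					  uint64_t dw0, uint64_t dw1)
{
	unsigned int r = (e->dw0 & PATB_HR_MODEL) ? 1 : 0;

	e->dw0 = dw0;
	e->dw1 = dw1;
	return r;
}
```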

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 16ed141677c5a1a796408e74ccd0a6f6554c3f21)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: Export pgtable_cache and pgtable_cache_add for KVM
Paul Mackerras [Tue, 28 Mar 2017 16:54:20 +0000 (13:54 -0300)]
powerpc/64: Export pgtable_cache and pgtable_cache_add for KVM

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This exports the pgtable_cache array and the pgtable_cache_add
function so that HV KVM can use them for allocating radix page
tables for guests.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit ba9b399aee6fb70cbe988f0750d6dd9f6677293b)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agopowerpc/64: More definitions for POWER9
Paul Mackerras [Tue, 28 Mar 2017 16:54:19 +0000 (13:54 -0300)]
powerpc/64: More definitions for POWER9

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This adds definitions for bits in the DSISR register which are used
by POWER9 for various translation-related exception conditions, and
for some more bits in the partition table entry that will be needed
by KVM.

Signed-off-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit dbcbfee0c81c7938e40d7d6bc659a5191f490b50)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts
Nicholas Piggin [Tue, 28 Mar 2017 16:54:18 +0000 (13:54 -0300)]
KVM: PPC: Book3S: 64-bit CONFIG_RELOCATABLE support for interrupts

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
64-bit Book3S exception handlers must find the dynamic kernel base
to add to the target address when branching beyond __end_interrupts,
in order to support a kernel running at a non-0 physical address.

Support this in KVM by branching with CTR, similarly to regular
interrupt handlers. The guest CTR is saved in HSTATE_SCRATCH1 and
restored after the branch.

Without this, the host kernel hangs and crashes randomly when it is
running at a non-0 address and a KVM guest is started.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit a97a65d53d9f53b6897dc1b2aed381bc1707136b)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S: Move 64-bit KVM interrupt handler out from alt section
Nicholas Piggin [Tue, 28 Mar 2017 16:54:17 +0000 (13:54 -0300)]
KVM: PPC: Book3S: Move 64-bit KVM interrupt handler out from alt section

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
A subsequent patch to make KVM handlers relocation-safe makes them
unusable from within alt section "else" cases (due to the way fixed
addresses are taken from within fixed section head code).

Stop open-coding the KVM handlers, and add them both as normal. A more
optimal fix may be to allow some level of alternate feature patching in
the exception macros themselves, but for now this will do.

The TRAMP_KVM handlers must be moved to the "virt" fixed section area
(name is arbitrary) in order to be closer to .text and avoid the dreaded
"relocation truncated to fit" error.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit 7ede531773ea69fa56b02a873ed83ce3507eb8d5)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoKVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV
Nicholas Piggin [Tue, 28 Mar 2017 16:54:16 +0000 (13:54 -0300)]
KVM: PPC: Book3S: Change interrupt call to reduce scratch space use on HV

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
Change the calling convention to put the trap number together with
CR in two halves of r12, which frees up HSTATE_SCRATCH2 in the HV
handler.

The 64-bit PR handler entry translates the calling convention back
to match the previous call convention (i.e., shared with 32-bit), for
simplicity.
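
Packing two 32-bit values into one 64-bit register can be sketched as
below. Which half holds which value is an assumption here (CR in the upper
half, trap number in the lower half), and the helper names are
hypothetical.

```c
#include <stdint.h>

/* Assumption: CR in the upper 32 bits, trap number in the lower 32 bits. */
static uint64_t pack_trap_cr(uint32_t trap, uint32_t cr)
{
	return ((uint64_t)cr << 32) | trap;
}

static uint32_t unpack_trap(uint64_t r12)
{
	return (uint32_t)r12;
}

static uint32_t unpack_cr(uint64_t r12)
{
	return (uint32_t)(r12 >> 32);
}
```

A handler that wants the old convention back only needs the two unpack
steps, which is what makes translating between the conventions cheap.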

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Acked-by: Paul Mackerras <paulus@ozlabs.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit d3918e7fd4a27564f93ec46d0359a9739c5deb8d)
Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoRevert "KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend"
Breno Leitao [Tue, 28 Mar 2017 16:54:15 +0000 (13:54 -0300)]
Revert "KVM: PPC: Book 3S: XICS: Don't lock twice when checking for resend"

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This reverts commit 21acd0e4df04f02176e773468658c3cebff096bb.

Reverting this commit now, to apply other commits, and then
add this commit back on top of the new commits.

Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoRevert "powerpc/powernv: Initialise nest mmu"
Breno Leitao [Tue, 28 Mar 2017 16:54:14 +0000 (13:54 -0300)]
Revert "powerpc/powernv: Initialise nest mmu"

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This reverts commit 4f8a759561214a906844708f65e868aed7b90d5a.

This is being reverted temporarily in order to cherry pick another
patchset. This patch will be added on top of this new patchset.

Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoRevert "powerpc: Update to new option-vector-5 format for CAS"
Breno Leitao [Tue, 28 Mar 2017 16:54:13 +0000 (13:54 -0300)]
Revert "powerpc: Update to new option-vector-5 format for CAS"

BugLink: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1675806
This reverts commit ba46da7c1cc57d83f6af66bfe8f10516151c7923.

Reverting this commit now, to apply other commits, and then
add this commit back on top of the new commits.

Signed-off-by: Breno Leitao <breno.leitao@gmail.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agonet/mlx4_core: Avoid delays during VF driver device shutdown
Jack Morgenstein [Tue, 28 Mar 2017 15:55:32 +0000 (11:55 -0400)]
net/mlx4_core: Avoid delays during VF driver device shutdown

BugLink: http://bugs.launchpad.net/bugs/1672785
Some Hypervisors detach VFs from VMs by instantly causing an FLR event
to be generated for a VF.

In the mlx4 case, this will cause that VF's comm channel to be disabled
before the VM has an opportunity to invoke the VF device's "shutdown"
method.

For such Hypervisors, there is a race condition between the VF's
shutdown method and its internal-error detection/reset thread.

The internal-error detection/reset thread (which runs every 5 seconds) also
detects a disabled comm channel. If the internal-error detection/reset
flow wins the race, we still get delays (while that flow tries repeatedly
to detect comm-channel recovery).

The cited commit fixed the command timeout problem when the
internal-error detection/reset flow loses the race.

This commit avoids the unneeded delays when the internal-error
detection/reset flow wins.

Fixes: d585df1c5ccf ("net/mlx4_core: Avoid command timeouts during VF driver device shutdown")
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Reported-by: Simon Xiao <sixiao@microsoft.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4cbe4dac82e423ecc9a0ba46af24a860853259f4)
Signed-off-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agomlx4: reduce OOM risk on arches with large pages
Eric Dumazet [Sat, 18 Feb 2017 18:34:18 +0000 (10:34 -0800)]
mlx4: reduce OOM risk on arches with large pages

BugLink: http://bugs.launchpad.net/bugs/1676858
Since mlx4 NICs are used on PowerPC with 64K pages, we need to adapt
the MLX4_EN_ALLOC_PREFER_ORDER definition.

Otherwise, a fragment sitting in an out of order TCP queue can hold
0.5 Mbytes and it is a serious OOM risk.
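The arithmetic behind the 0.5 Mbytes figure can be sketched as follows. This is only an illustration, not the driver's code: the helper names, the 32 KiB target and the maximum order of 3 are assumptions made for the example.

```c
#include <stddef.h>

/* Bytes in an order-n page allocation: page_size * 2^order. */
static size_t alloc_bytes(size_t page_size, unsigned int order)
{
    return page_size << order;
}

/* Smallest order whose allocation covers `bytes`. */
static unsigned int order_for(size_t page_size, size_t bytes)
{
    unsigned int order = 0;

    while (alloc_bytes(page_size, order) < bytes)
        order++;
    return order;
}

/*
 * Hypothetical policy: aim for a fixed byte target, but never exceed
 * `max_order`. On large-page systems this yields a lower order.
 */
static unsigned int preferred_order(size_t page_size, size_t target_bytes,
                                    unsigned int max_order)
{
    unsigned int order = order_for(page_size, target_bytes);

    return order < max_order ? order : max_order;
}
```

With 4 KiB pages an order-3 allocation is 32 KiB, while with 64 KiB pages the same order is 512 KiB, which matches the 0.5 Mbytes mentioned above. Capping by a byte target instead of a fixed order gives order 0 on 64 KiB pages.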

Fixes: 51151a16a60f ("mlx4: allow order-0 memory allocations in RX path")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 3608b13ccc51d06e499dfe12b27f134de1286e28)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agointel_th: pci: Add Gemini Lake support
Alexander Shishkin [Thu, 30 Jun 2016 13:10:51 +0000 (16:10 +0300)]
intel_th: pci: Add Gemini Lake support

BugLink: http://bugs.launchpad.net/bugs/1645963
This adds Intel(R) Trace Hub PCI ID for Gemini Lake SOC.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
(cherry picked from commit 340837f985c2cb87ca0868d4aa9ce42b0fab3a21)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agointel_th: pci: Add Denverton SOC support
Alexander Shishkin [Tue, 8 Sep 2015 11:03:55 +0000 (14:03 +0300)]
intel_th: pci: Add Denverton SOC support

BugLink: http://bugs.launchpad.net/bugs/1645963
This adds Intel(R) Trace Hub PCI ID for Denverton SOC.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
(cherry picked from commit 5118ccd34780f4637a9360be580f41f4c1feab48)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agonet/mlx5: E-Switch, Don't allow changing inline mode when flows are configured
Roi Dayan [Tue, 21 Mar 2017 13:59:14 +0000 (15:59 +0200)]
net/mlx5: E-Switch, Don't allow changing inline mode when flows are configured

BugLink: http://bugs.launchpad.net/bugs/1676388
Changing the eswitch inline mode can potentially cause already configured
flows not to match the policy. E.g. set policy L4, add some L4 rules,
set policy to L2 --> bad! Hence we disallow it.

Keep track of how many offloaded rules are now set and refuse
inline mode changes if this isn't zero.
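A minimal sketch of the counting approach described above, assuming a simplified e-switch structure. All names here (struct eswitch, esw_set_inline_mode, the enum values, the choice of -EBUSY) are hypothetical and do not reflect the actual mlx5 code.

```c
#include <errno.h>

/* Hypothetical inline modes, for illustration only. */
enum inline_mode { INLINE_NONE, INLINE_L2, INLINE_L3, INLINE_L4 };

struct eswitch {
    enum inline_mode inline_mode;
    unsigned int num_flows;     /* offloaded rules currently installed */
};

static void esw_add_flow(struct eswitch *esw)
{
    esw->num_flows++;
}

static void esw_del_flow(struct eswitch *esw)
{
    if (esw->num_flows > 0)
        esw->num_flows--;
}

/* Refuse a mode change while any offloaded rule exists. */
static int esw_set_inline_mode(struct eswitch *esw, enum inline_mode mode)
{
    if (esw->num_flows > 0)
        return -EBUSY;
    esw->inline_mode = mode;
    return 0;
}
```

In this sketch a mode change succeeds only when the counter is zero, so rules installed under one policy can never outlive that policy.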

Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 375f51e2b5b7b9a42b3139aea519cbb1bfc5d6ef)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agonet/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch
Or Gerlitz [Tue, 21 Mar 2017 13:59:13 +0000 (15:59 +0200)]
net/mlx5e: Change the TC offload rule add/del code path to be per NIC or E-Switch

BugLink: http://bugs.launchpad.net/bugs/1676388
Refactor the code that adds/deletes TC rules to have a handler per NIC/E-switch
offloading use case, and push the latter into the e-switch code. This provides
better separation and is to be used in a downstream patch for applying a fix.

Fixes: bffaa916588e ("net/mlx5: E-Switch, Add control for inline mode")
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit d85cdccbb3fe9a632ec9d0f4e4526c8c84fc3523)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agodevlink: allow to fillup eswitch attrs even if mode_get op does not exist
Jiri Pirko [Thu, 9 Feb 2017 14:54:36 +0000 (15:54 +0100)]
devlink: allow to fillup eswitch attrs even if mode_get op does not exist

BugLink: http://bugs.launchpad.net/bugs/1676388
Even when mode_get op is not present, other eswitch attrs need to be
filled-up.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 4456f61cfd2a589c4368fe0b9080b646b9bd470d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agodevlink: use nla_put_failure goto label instead of out
Jiri Pirko [Thu, 9 Feb 2017 14:54:35 +0000 (15:54 +0100)]
devlink: use nla_put_failure goto label instead of out

BugLink: http://bugs.launchpad.net/bugs/1676388
Be aligned with the rest of the code and use label named nla_put_failure.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1a6aa36b6f92b1a2f2e6789f6785372d4d6ddca9)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agodevlink: rename devlink_eswitch_fill to devlink_nl_eswitch_fill
Jiri Pirko [Thu, 9 Feb 2017 14:54:34 +0000 (15:54 +0100)]
devlink: rename devlink_eswitch_fill to devlink_nl_eswitch_fill

BugLink: http://bugs.launchpad.net/bugs/1676388
Be aligned with the rest of the file and name the helper function
accordingly.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 21e3d2dd4a19f842e7d134c341eb584970ff3b32)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agodevlink: fix the name of eswitch commands
Jiri Pirko [Thu, 9 Feb 2017 14:54:33 +0000 (15:54 +0100)]
devlink: fix the name of eswitch commands

BugLink: http://bugs.launchpad.net/bugs/1676388
The eswitch_[gs]et command is supposed to be similar to port_[gs]et
command - for multiple eswitch attributes. However, when it was introduced
by 08f4b5918b2d ("net/devlink: Add E-Switch mode control") it was wrongly
named with the word "mode" in it. So fix this now, and keep the original
enum value existing but obsolete.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit adf200f31c000d707e4afe238ed1d1199e0cce7c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agonet/mlx5e: Avoid wrong identification of rules on deletion
Or Gerlitz [Fri, 10 Mar 2017 12:33:04 +0000 (14:33 +0200)]
net/mlx5e: Avoid wrong identification of rules on deletion

BugLink: http://bugs.launchpad.net/bugs/1676388
When deleting offloaded TC flows, we must correctly identify E-switch
rules. The current check could get this wrong w.r.t. rules set on the
PF, since it's possible to set NIC rules on the PF, switch to SRIOV
offloads mode and then attempt to delete a NIC rule.

To solve that, we add a flags field to offloaded rules, set it on
creation time and use that over the code where currently needed.
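The difference between checking the current device mode and checking a per-rule flag can be sketched like this. The types, flag names and helpers are hypothetical and only illustrate the idea, not the mlx5e implementation.

```c
#include <stdint.h>

/* Hypothetical flag bits recorded on each offloaded rule. */
#define FLOW_FLAG_ESWITCH (1u << 0)
#define FLOW_FLAG_NIC     (1u << 1)

enum device_mode { MODE_LEGACY, MODE_SRIOV_OFFLOADS };

struct tc_flow {
    uint8_t flags;      /* set once, when the rule is created */
};

/* Record the rule type from the device mode at creation time. */
static struct tc_flow flow_create(enum device_mode mode_at_creation)
{
    struct tc_flow flow;

    flow.flags = (mode_at_creation == MODE_SRIOV_OFFLOADS)
                     ? FLOW_FLAG_ESWITCH : FLOW_FLAG_NIC;
    return flow;
}

/* Fragile check: looks only at the mode at deletion time. */
static int naive_is_eswitch(enum device_mode current_mode)
{
    return current_mode == MODE_SRIOV_OFFLOADS;
}

/* Robust check: looks at what the rule was created as. */
static int flow_is_eswitch(const struct tc_flow *flow)
{
    return (flow->flags & FLOW_FLAG_ESWITCH) != 0;
}
```

A rule created as a NIC rule stays a NIC rule in this sketch even after the device switches to SRIOV offloads mode, whereas the mode-based check would misclassify it.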

Fixes: 8b32580df1cb ('net/mlx5e: Add TC vlan action for SRIOV offloads')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 65ba8fb7d5c6803ec236bb8d6650465fed7f9769)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoUBUNTU: [Debian] add rprovides for spl-modules and zfs-modules
Tim Gardner [Mon, 27 Mar 2017 16:45:25 +0000 (10:45 -0600)]
UBUNTU: [Debian] add rprovides for spl-modules and zfs-modules

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoUBUNTU: SAUCE: efi: arm-stub: Round up FDT allocation to mapping size
Ard Biesheuvel [Wed, 22 Mar 2017 15:22:13 +0000 (10:22 -0500)]
UBUNTU: SAUCE: efi: arm-stub: Round up FDT allocation to mapping size

The FDT is mapped via a fixmap entry that is at least 2 MB in size and
2 MB aligned on 4 KB page size kernels.

On UEFI systems, the FDT allocation may share this 2 MB block with a
reserved region, or another memory region that we should never map,
unless we account for this in the size of the allocation (the alignment
is already 2 MB).

So instead of taking guesses at the needed space, simply allocate 2 MB
immediately. The allocation will be recorded as EFI_LOADER_DATA, and
the kernel only memblock_reserve()'s the actual size of the FDT, so the
unused space will be released to the kernel.
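As a rough illustration of why a full 2 MB, 2 MB aligned allocation avoids sharing a mapping block, consider the sketch below. The helper names and the SZ_2M constant are defined locally for the example and are not taken from the stub code.

```c
#include <stdint.h>

#define SZ_2M (2ULL * 1024 * 1024)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Index of the 2 MB block that contains a given address. */
static uint64_t block_of(uint64_t addr)
{
    return addr / SZ_2M;
}

/*
 * An allocation owns its block exclusively when it starts on a block
 * boundary and its size is a whole number of blocks.
 */
static int owns_whole_blocks(uint64_t base, uint64_t size)
{
    return (base % SZ_2M) == 0 && size != 0 && (size % SZ_2M) == 0;
}
```

A small FDT rounded up with round_up_pow2(size, SZ_2M) at an aligned base satisfies owns_whole_blocks(), so no other region can fall inside the block that gets mapped.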

BugLink: http://bugs.launchpad.net/bugs/1675046
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-By: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoUBUNTU: SAUCE: efi: arm-stub: Correct FDT and initrd allocation rules for arm64
Ard Biesheuvel [Wed, 22 Mar 2017 15:22:04 +0000 (10:22 -0500)]
UBUNTU: SAUCE: efi: arm-stub: Correct FDT and initrd allocation rules for arm64

On arm64, we have made some changes over the past year to the way the
kernel itself is allocated and to how it deals with the initrd and FDT.
This patch brings the allocation logic in the EFI stub in line with that,
which is necessary because the introduction of KASLR has created the
possibility for the initrd to be allocated in a place where the kernel
may not be able to map it. (This is mostly a theoretical scenario, since
it only affects systems where the physical memory footprint exceeds the
size of the linear mapping.)

Since we know the kernel itself will be covered by the linear mapping,
choose a suitably sized window (i.e., based on the size of the linear
region) covering the kernel when allocating memory for the initrd.

The FDT may be anywhere in memory on arm64 now that we map it via the
fixmap, so we can lift the address restriction there completely.

BugLink: http://bugs.launchpad.net/bugs/1675046
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Reviewed-By: Jeffrey Hugo <jhugo@codeaurora.org>
Signed-off-by: Manoj Iyer <manoj.iyer@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoscsi: aacraid: Fix potential null access
Raghava Aditya Renukunta [Tue, 14 Mar 2017 16:20:19 +0000 (09:20 -0700)]
scsi: aacraid: Fix potential null access

BugLink: http://bugs.launchpad.net/bugs/1675872
Currently, the command thread fails to return ioctl commands for older
controller versions, since it returns when all the fibs have been
allocated. Another issue is that even if not all the fibs have been
allocated, the correctly allocated fibs are neither updated nor freed.

Fixes: 113156bcea9ef1e6 (scsi: aacraid: Reworked aac_command_thread)
Reported-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from linux-next commit e498520edec6655e93ac5e768b04f4fd2299fe4d)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
7 years agoscsi: aacraid: Fix typo in blink status
Raghava Aditya Renukunta [Thu, 2 Mar 2017 17:21:33 +0000 (09:21 -0800)]
scsi: aacraid: Fix typo in blink status

BugLink: http://bugs.launchpad.net/bugs/1675872
The return status of the adapter check on KERNEL_PANIC is supposed to be
the upper 16 bits of the OMR status register.
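Extracting the upper 16 bits of a status word is a plain shift. The sketch below assumes a 32-bit register value and uses a hypothetical helper name; it does not reproduce the driver's exact expression.

```c
#include <stdint.h>

/* Return the upper 16 bits of a 32-bit status word. */
static uint16_t status_upper16(uint32_t status)
{
    return (uint16_t)(status >> 16);
}
```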

Fixes: c421530bf848604e (scsi: aacraid: Reorder Adpater status check)
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Raghava Aditya Renukunta <RaghavaAditya.Renukunta@microsemi.com>
Reviewed-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit 934767c56b0d9dbb95a40e9e6e4d9dcdc3a165ad)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
7 years agoscsi: aacraid: remove redundant zero check on ret
Colin Ian King [Fri, 24 Feb 2017 14:43:30 +0000 (14:43 +0000)]
scsi: aacraid: remove redundant zero check on ret

BugLink: http://bugs.launchpad.net/bugs/1675872
The check for ret being zero is redundant as a few statements earlier we
break out of the while loop if ret is non-zero.  Thus we can remove the
zero check and also the dead-code non-zero case too.

Detected by CoverityScan, CID#1411632 ("Logically Dead Code")

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Dave Carroll <david.carroll@microsemi.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
(cherry picked from commit fbdab3e7fd547e1ce558db1521659707bdf02cc6)
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
7 years agoext4: lock the xattr block before checksuming it
Theodore Ts'o [Sat, 25 Mar 2017 21:22:47 +0000 (17:22 -0400)]
ext4: lock the xattr block before checksuming it

BugLink: http://bugs.launchpad.net/bugs/1658633
We must lock the xattr block before calculating or verifying the
checksum in order to avoid spurious checksum failures.

https://bugzilla.kernel.org/show_bug.cgi?id=193661

Reported-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Cc: stable@vger.kernel.org
(cherry picked from commit dac7a4b4b1f664934e8b713f529b629f67db313c)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Signed-off-by: Colin King <colin.king@canonical.com>
7 years agonet/mlx5e: Avoid supporting udp tunnel port ndo for VF reps
Paul Blakey [Tue, 21 Mar 2017 13:59:16 +0000 (15:59 +0200)]
net/mlx5e: Avoid supporting udp tunnel port ndo for VF reps

BugLink: http://bugs.launchpad.net/bugs/1676388
This was added to allow the TC offloading code to identify offloading
encap/decap vxlan rules.

The VF reps are effectively related to the same mlx5 PCI device as the
PF. Since the kernel invokes the (say) delete ndo for each netdev, the
FW erred on multiple vxlan dst port deletes when the port was deleted
from the system.

We fix that by keeping the registration to be carried out only by the
PF. Since the PF serves as the uplink device, the VF reps will look
up a port there and realize if they are ok to offload that.

Tested:
 <SETUP VFS>
 <SETUP switchdev mode to have representors>
 ip link add vxlan1 type vxlan id 44 dev ens5f0 dstport 9999
 ip link set vxlan1 up
 ip link del dev vxlan1

Fixes: 4a25730eb202 ('net/mlx5e: Add ndo_udp_tunnel_add to VF representors')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 1ad9a00ae0efc2e9337148d6c382fad3d27bf99a)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agonet/mlx5: Fix create autogroup prev initializer
Paul Blakey [Fri, 10 Mar 2017 12:33:01 +0000 (14:33 +0200)]
net/mlx5: Fix create autogroup prev initializer

BugLink: http://bugs.launchpad.net/bugs/1676388
The autogroups list is a list of non overlapping group boundaries
sorted by their start index. If the autogroups list wasn't empty
and an empty group slot was found at the start of the list,
the new group was added to the end of the list instead of the
beginning, as the prev initializer was incorrect.
When this was repeated, it caused multiple groups to have
overlapping boundaries.

Fixed that by correctly initializing the prev pointer to the
start of the list.
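The effect of the prev initializer can be sketched with a simplified sorted list. This is an illustration under assumed types: struct group, the sentinel head and insert_group() are hypothetical and far simpler than the mlx5 flow-steering code.

```c
#include <stddef.h>

struct group {
    unsigned int start;     /* first index covered by the group */
    unsigned int size;      /* number of indices covered */
    struct group *next;
};

/*
 * Place `grp` (with grp->size set) in the first gap that fits, keeping
 * the list sorted by start index. `head` is a sentinel node.
 * Returns 0 on success, -1 if nothing below `limit` fits.
 */
static int insert_group(struct group *head, struct group *grp,
                        unsigned int limit)
{
    /* prev starts at the list head, so a gap at index 0 inserts first. */
    struct group *prev = head;
    unsigned int candidate = 0;
    struct group *g;

    for (g = head->next; g != NULL; g = g->next) {
        if (candidate + grp->size <= g->start)
            break;          /* the gap before g is large enough */
        candidate = g->start + g->size;
        prev = g;
    }
    if (candidate + grp->size > limit)
        return -1;

    grp->start = candidate;
    grp->next = prev->next;
    prev->next = grp;
    return 0;
}
```

If prev were instead initialized to the last element, a group that belongs in the gap at index 0 would be linked at the tail, and the next search would hand out the same gap again, producing overlapping boundaries.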

Fixes: eccec8da3b4e ('net/mlx5: Keep autogroups list ordered')
Signed-off-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit af36370569eb37420e1e78a2e60c277b781fcd00)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
7 years agoUBUNTU: SAUCE: arm64: arch_timer: Add check for unknown erratum
dann frazier [Fri, 24 Mar 2017 19:53:05 +0000 (13:53 -0600)]
UBUNTU: SAUCE: arm64: arch_timer: Add check for unknown erratum

BugLink: https://bugs.launchpad.net/bugs/1675509
If an unknown erratum type is passed into arch_timer_check_ool_workaround(),
we would call arch_timer_iterate_errata with a NULL match_fn and trigger
an Oops. This does not look possible with the existing code (all types are
accounted for), but emit an error and return if it happens to come to pass
in the future.
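The defensive check can be sketched as a switch with a default branch that bails out before the match function is used. The enum values, matcher functions and check_workaround() below are hypothetical stand-ins, not the arch_timer code.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical erratum match types, for illustration only. */
enum match_type { MATCH_DT, MATCH_LOCAL_CAP, MATCH_ACPI_OEM };

typedef int (*match_fn_t)(const void *arg);

/* Trivial stand-in matchers: report a match when arg is non-NULL. */
static int match_dt(const void *arg)        { return arg != NULL; }
static int match_local_cap(const void *arg) { return arg != NULL; }
static int match_acpi_oem(const void *arg)  { return arg != NULL; }

/*
 * Pick a matcher by type. An unknown type is reported and rejected,
 * so the function pointer is never called while still NULL.
 */
static int check_workaround(int type, const void *arg)
{
    match_fn_t match_fn = NULL;

    switch (type) {
    case MATCH_DT:
        match_fn = match_dt;
        break;
    case MATCH_LOCAL_CAP:
        match_fn = match_local_cap;
        break;
    case MATCH_ACPI_OEM:
        match_fn = match_acpi_oem;
        break;
    default:
        fprintf(stderr, "unknown erratum match type %d\n", type);
        return -1;
    }
    return match_fn(arg);
}
```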

Reported-by: Seth Forshee <seth.foreshee@canonical.com>
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>
7 years agoUBUNTU: SAUCE: arm64: arch_timer: Add HISILICON_ERRATUM_161010101 ACPI matching data
Marc Zyngier [Tue, 21 Feb 2017 15:04:27 +0000 (15:04 +0000)]
UBUNTU: SAUCE: arm64: arch_timer: Add HISILICON_ERRATUM_161010101 ACPI matching data

BugLink: https://bugs.launchpad.net/bugs/1675509
In order to deal with ACPI enabled platforms suffering from the
HISILICON_ERRATUM_161010101, let's add the required OEM data that
allow the workaround to be enabled.

Tested-by: dann frazier <dann.frazier@canonical.com>
Tested-by: Hanjun Guo <hanjun.guo@linaro.org>
Reviewed-by: Hanjun Guo <hanjun.guo@linaro.org>
Signed-off-by: Marc Zyngier <marc.zyngier@arm.com>
(cherry picked from commit 68fedd693bbdf8601667302ee12e677e50fe06d8
 in the timers/errata-rework branch of
 git://git.kernel.org/pub/scm/linux/kernel/git/maz/arm-platforms.git)
Signed-off-by: dann frazier <dann.frazier@canonical.com>
Signed-off-by: Seth Forshee <seth.forshee@canonical.com>