Andra Paraschiv [Mon, 21 Sep 2020 12:17:32 +0000 (15:17 +0300)]
MAINTAINERS: Add entry for the Nitro Enclaves driver
Add entry in the MAINTAINERS file for the Nitro Enclaves files such as
the documentation, the header files, the driver itself and the user
space sample.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Update the location of the documentation, as it has been moved to the
"virt" directory.
v7 -> v8
* No changes.
v6 -> v7
* No changes.
v5 -> v6
* No changes.
v4 -> v5
* No changes.
v3 -> v4
* No changes.
v2 -> v3
* Update file entries to be in alphabetical order.
Andra Paraschiv [Mon, 21 Sep 2020 12:17:31 +0000 (15:17 +0300)]
nitro_enclaves: Add overview documentation
Add documentation on the overview of Nitro Enclaves. Include it in the
virtualization specific directory.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Move the Nitro Enclaves documentation to the "virt" directory and add
an entry for it in the corresponding index file.
v7 -> v8
* Add info about the primary / parent VM CID value.
* Update reference link for huge pages.
* Add reference link for the x86 boot protocol.
* Add license mention and update doc title / chapter formatting.
v6 -> v7
* No changes.
v5 -> v6
* No changes.
v4 -> v5
* No changes.
v3 -> v4
* Update doc type from .txt to .rst.
* Update documentation based on the changes from v4.
Andra Paraschiv [Mon, 21 Sep 2020 12:17:30 +0000 (15:17 +0300)]
nitro_enclaves: Add sample for ioctl interface usage
Add a user space sample for the usage of the ioctl interface provided by
the Nitro Enclaves driver.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Track NE custom error codes for invalid page size, invalid flags and
enclave CID.
* Update the heartbeat logic to have a listener fd first, then start the
enclave and then accept connection to get the heartbeat.
* Update the reference link to the hugetlb documentation.
v6 -> v7
* Track POLLNVAL as poll event in addition to POLLHUP.
v5 -> v6
* Remove "rc" mentioning when printing errno string.
* Remove the ioctl to query API version.
* Include usage info for NUMA-aware hugetlb configuration.
* Update documentation to kernel-doc format.
* Add logic for enclave image loading.
v4 -> v5
* Print enclave vCPU ids when they are created.
* Update logic to map the modified vCPU ioctl call.
* Add check for the path to the enclave image to be less than PATH_MAX.
* Update the ioctl calls error checking logic to match the NE specific
error codes.
v3 -> v4
* Update usage details to match the updates in v4.
* Update NE ioctl interface usage.
v2 -> v3
* Remove the include directory to use the uapi from the kernel.
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* New in v2.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-17-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:28 +0000 (15:17 +0300)]
nitro_enclaves: Add Kconfig for the Nitro Enclaves driver
Add kernel config entry for Nitro Enclaves, including dependencies.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* No changes.
v6 -> v7
* Remove, for now, the dependency on ARM64 arch. x86 is currently
supported, with Arm to come afterwards. The NE kernel driver can be
built for aarch64 arch.
v5 -> v6
* No changes.
v4 -> v5
* Add arch dependency for Arm / x86.
v3 -> v4
* Add PCI and SMP dependencies.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Update path to Kconfig to match the drivers/virt/nitro_enclaves
directory.
* Update help in Kconfig.
Andra Paraschiv [Mon, 21 Sep 2020 12:17:27 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for terminating an enclave
An enclave is associated with an fd that is returned after the enclave
creation logic is completed. This enclave fd is further used to setup
enclave resources. Once the enclave needs to be terminated, the enclave
fd is closed.
Add logic for enclave termination, that is mapped to the enclave fd
release callback. Free the internal enclave info used for bookkeeping.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Remove the pci_dev_put() call as the NE misc device parent field is
used now to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update documentation to kernel-doc format.
* Use directly put_page() instead of unpin_user_pages(), to match the
get_user_pages() calls.
v4 -> v5
* Release the reference to the NE PCI device on enclave fd release.
* Adapt the logic to cpumask enclave vCPU ids and CPU cores.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add early exit in release() if there was a slot alloc error in the fd
creation path.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-14-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:26 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for starting an enclave
After all the enclave resources are set, the enclave is ready for
beginning to run.
Add ioctl command logic for starting an enclave after all its resources,
memory regions and CPUs, have been set.
The enclave start information includes the local channel addressing -
vsock CID - and the flags associated with the enclave.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* Add check for invalid enclave CID value e.g. well-known CIDs and
parent VM CID.
* Add custom error code for incorrect flag in enclave start info and
invalid enclave CID.
v6 -> v7
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Check for invalid enclave start flags.
* Update documentation to kernel-doc format.
v4 -> v5
* Add early exit on enclave start ioctl function call error.
* Move sanity checks in the enclave start ioctl function, outside of the
switch-case block.
* Remove log on copy_from_user() / copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update the naming for the ioctl command from metadata to info.
* Check for minimum enclave memory size.
* Add log pattern for NE.
* Check if enclave state is init when starting an enclave.
* Remove the BUG_ON calls.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-13-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:25 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for setting an enclave memory region
Another resource that is being set for an enclave is memory. User space
memory regions, that need to be backed by contiguous memory regions,
are associated with the enclave.
One solution for allocating / reserving contiguous memory regions, that
is used for integration, is hugetlbfs. The user space process that is
associated with the enclave passes to the driver these memory regions.
The enclave memory regions need to be from the same NUMA node as the
enclave CPUs.
Add ioctl command logic for setting user space memory region for an
enclave.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* Add early check, while getting user pages, to be multiple of 2 MiB for
the pages that back the user space memory region.
* Add custom error code for incorrect user space memory region flag.
* Include in a separate function the sanity checks for each page of the
user space memory region.
v6 -> v7
* Update check for duplicate user space memory regions to cover
additional possible scenarios.
v5 -> v6
* Check for max number of pages allocated for the internal data
structure for pages.
* Check for invalid memory region flags.
* Check for aligned physical memory regions.
* Update documentation to kernel-doc format.
* Check for duplicate user space memory regions.
* Use directly put_page() instead of unpin_user_pages(), to match the
get_user_pages() calls.
v4 -> v5
* Add early exit on set memory region ioctl function call error.
* Remove log on copy_from_user() failure.
* Exit without unpinning the pages on NE PCI dev request failure as
memory regions from the user space range may have already been added.
* Add check for the memory region user space address to be 2 MiB
aligned.
* Update logic to not have a hardcoded check for 2 MiB memory regions.
v3 -> v4
* Check enclave memory regions are from the same NUMA node as the
enclave CPUs.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave max memory regions is reached when setting an enclave
memory region.
* Check if enclave state is init when setting an enclave memory region.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-12-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:24 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for getting the enclave image load info
Before setting the memory regions for the enclave, the enclave image
needs to be placed in memory. After the memory regions are set, this
memory cannot be used anymore by the VM, being carved out.
Add ioctl command logic to get the offset in enclave memory where to
place the enclave image. Then the user space tooling copies the enclave
image in the memory using the given memory offset.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Add custom error code for incorrect enclave image load info flag.
v6 -> v7
* No changes.
v5 -> v6
* Check for invalid enclave image load flags.
v4 -> v5
* Check for the enclave not being started when invoking this ioctl call.
* Remove log on copy_from_user() / copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Set enclave image load offset based on flags.
* Update the naming for the ioctl command from metadata to info.
Andra Paraschiv [Mon, 21 Sep 2020 12:17:23 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for setting an enclave vCPU
An enclave, before being started, has its resources set. One of its
resources is CPU.
A NE CPU pool is set and enclave CPUs are chosen from it. Offline the
CPUs from the NE CPU pool during the pool setup and online them back
during the NE CPU pool teardown. The CPU offline is necessary so that
there would not be more vCPUs than physical CPUs available to the
primary / parent VM. In that case the CPUs would be overcommitted and
would change the initial configuration of the primary / parent VM of
having dedicated vCPUs to physical CPUs.
The enclave CPUs need to be full cores and from the same NUMA node. CPU
0 and its siblings have to remain available to the primary / parent VM.
Add ioctl command logic for setting an enclave vCPU.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Check for error return value when setting the kernel parameter string.
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
* Calculate the number of threads per core and not use smp_num_siblings
that is x86 specific.
v5 -> v6
* Check CPUs are from the same NUMA node before going through CPU
siblings during the NE CPU pool setup.
* Update documentation to kernel-doc format.
v4 -> v5
* Set empty string in case of invalid NE CPU pool.
* Clear NE CPU pool mask on pool setup failure.
* Setup NE CPU cores out of the NE CPU pool.
* Early exit on NE CPU pool setup if enclave(s) already running.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
* Add check for maximum vCPU id possible before looking into the CPU
pool.
* Remove log on copy_from_user() / copy_to_user() failure and on admin
capability check for setting the NE CPU pool.
* Update the ioctl call to not create a file descriptor for the vCPU.
* Split the CPU pool usage logic in 2 separate functions - one to get a
CPU from the pool and the other to check the given CPU is available in
the pool.
v3 -> v4
* Setup the NE CPU pool at runtime via a sysfs file for the kernel
parameter.
* Check enclave CPUs to be from the same NUMA node.
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open, ioctl and release.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
* Check if enclave state is init when setting enclave vCPU.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-10-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:22 +0000 (15:17 +0300)]
nitro_enclaves: Add logic for creating an enclave VM
Add ioctl command logic for enclave VM creation. It triggers a slot
allocation. The enclave resources will be associated with this slot and
it will be used as an identifier for triggering enclave run.
Return a file descriptor, namely enclave fd. This is further used by the
associated user space enclave process to set enclave resources and
trigger enclave termination.
The poll function is implemented in order to notify the enclave process
when an enclave exits without a specific enclave termination command
trigger e.g. when an enclave crashes.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE PCI device.
v7 -> v8
* No changes.
v6 -> v7
* Use the NE misc device parent field to get the NE PCI device.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update the code base to init the ioctl function in this patch.
* Update documentation to kernel-doc format.
v4 -> v5
* Release the reference to the NE PCI device on create VM error.
* Close enclave fd on copy_to_user() failure; rename fd to enclave fd
while at it.
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
* Remove log on copy_to_user() failure.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update the NE ioctl call to match the decoupling from the KVM API.
* Add metadata for the NUMA node for the enclave memory and CPUs.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Update kzfree() calls to kfree().
* Remove file ops that do nothing for now - open.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Remove the BUG_ON calls.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-9-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:21 +0000 (15:17 +0300)]
nitro_enclaves: Init misc device providing the ioctl interface
The Nitro Enclaves driver provides an ioctl interface to the user space
for enclave lifetime management e.g. enclave creation / termination and
setting enclave resources such as memory and CPU.
This ioctl interface is mapped to a Nitro Enclaves misc device.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the ne_devs data structure to get the refs for the NE misc device
in the NE PCI device driver logic.
v7 -> v8
* Add define for the CID of the primary / parent VM.
* Update the NE PCI driver shutdown logic to include misc device
deregister.
v6 -> v7
* Set the NE PCI device the parent of the NE misc device to be able to
use it in the ioctl logic.
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Remove the ioctl to query API version.
* Update documentation to kernel-doc format.
v4 -> v5
* Update the size of the NE CPU pool string from 4096 to 512 chars.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Remove the NE CPU pool init during kernel module loading, as the CPU
pool is now setup at runtime, via a sysfs file for the kernel
parameter.
* Add minimum enclave memory size definition.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
* Remove the WARN_ON calls.
* Remove linux/bug and linux/kvm_host includes that are not needed.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
* Remove file ops that do nothing for now - open and release.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
* Update ne_cpu_pool data structure to include the global mutex.
* Update NE misc device mode to 0660.
* Check if the CPU siblings are included in the NE CPU pool, as full CPU
cores are given for the enclave(s).
In addition to the replies sent by the Nitro Enclaves PCI device in
response to command requests, out-of-band enclave events can happen e.g.
an enclave crashes. In this case, the Nitro Enclaves driver needs to be
aware of the event and notify the corresponding user space process that
abstracts the enclave.
Register an MSI-X interrupt vector to be used for this kind of
out-of-band events. The interrupt notifies that the state of an enclave
changed and the driver logic scans the state of each running enclave to
identify for which this notification is intended.
Create an workqueue to handle the out-of-band events. Notify user space
enclave process that is using a polling mechanism on the enclave fd.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Use the reference to the pdev directly from the ne_pci_dev instead of
the one from the enclave data structure.
v7 -> v8
* No changes.
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
v1 -> v2
* Add log pattern for NE.
* Update goto labels to match their purpose.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-7-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
The Nitro Enclaves PCI device exposes a MMIO space that this driver
uses to submit command requests and to receive command replies e.g. for
enclave creation / termination or setting enclave resources.
Add logic for handling PCI device command requests based on the given
command type.
Register an MSI-X interrupt vector for command reply notifications to
handle this type of communication events.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Update function signature for submit request and retrive reply
functions as they only returned 0, no error code.
* Include command type value in the error logs of ne_do_request().
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Return IRQ_NONE when interrupts are not handled.
v2 -> v3
* Remove the WARN_ON calls.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
v1 -> v2
* Add log pattern for NE.
* Remove the BUG_ON calls.
* Update goto labels to match their purpose.
* Add fix for kbuild report:
https://lore.kernel.org/lkml/202004231644.xTmN4Z1z%25lkp@intel.com/
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-6-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:18 +0000 (15:17 +0300)]
nitro_enclaves: Init PCI device driver
The Nitro Enclaves PCI device is used by the kernel driver as a means of
communication with the hypervisor on the host where the primary VM and
the enclaves run. It handles requests with regard to enclave lifetime.
Setup the PCI device driver and add support for MSI-X interrupts.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Init the reference to the ne_pci_dev in the ne_devs data structure.
v7 -> v8
* Add NE PCI driver shutdown logic.
v6 -> v7
* No changes.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Remove sanity checks for situations that shouldn't happen, only if
buggy system or broken logic at all.
v3 -> v4
* Use dev_err instead of custom NE log pattern.
* Update NE PCI driver name to "nitro_enclaves".
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
* Remove the WARN_ON calls.
* Remove linux/bug include that is not needed.
* Update static calls sanity checks.
* Remove "ratelimited" from the logs that are not in the ioctl call
paths.
* Update kzfree() calls to kfree().
v1 -> v2
* Add log pattern for NE.
* Update PCI device setup functions to receive PCI device data structure and
then get private data from it inside the functions logic.
* Remove the BUG_ON calls.
* Add teardown function for MSI-X setup.
* Update goto labels to match their purpose.
* Implement TODO for NE PCI device disable state check.
* Update function name for NE PCI device probe / remove.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Alexandru Ciobotaru <alcioa@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-5-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:17 +0000 (15:17 +0300)]
nitro_enclaves: Define enclave info for internal bookkeeping
The Nitro Enclaves driver keeps an internal info per each enclave.
This is needed to be able to manage enclave resources state, enclave
notifications and have a reference of the PCI device that handles
command requests for enclave lifetime management.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Add data structure to keep references to both Nitro Enclaves misc and
PCI devices.
v7 -> v8
* No changes.
v6 -> v7
* Update the naming and add more comments to make more clear the logic
of handling full CPU cores and dedicating them to the enclave.
v5 -> v6
* Update documentation to kernel-doc format.
* Include in the enclave memory region data structure the user space
address and size for duplicate user space memory regions checks.
v4 -> v5
* Include enclave cores field in the enclave metadata.
* Update the vCPU ids data structure to be a cpumask instead of a list.
v3 -> v4
* Add NUMA node field for an enclave metadata as the enclave memory and
CPUs need to be from the same NUMA node.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Add enclave memory regions and vcpus count for enclave bookkeeping.
* Update ne_state comments to reflect NE_START_ENCLAVE ioctl naming
update.
Reviewed-by: Alexander Graf <graf@amazon.com> Signed-off-by: Alexandru-Catalin Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-4-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Andra Paraschiv [Mon, 21 Sep 2020 12:17:16 +0000 (15:17 +0300)]
nitro_enclaves: Define the PCI device interface
The Nitro Enclaves (NE) driver communicates with a new PCI device, that
is exposed to a virtual machine (VM) and handles commands meant for
handling enclaves lifetime e.g. creation, termination, setting memory
regions. The communication with the PCI device is handled using a MMIO
space and MSI-X interrupts.
This device communicates with the hypervisor on the host, where the VM
that spawned the enclave itself runs, e.g. to launch a VM that is used
for the enclave.
Define the MMIO space of the NE PCI device, the commands that are
provided by this device. Add an internal data structure used as private
data for the PCI device driver and the function for the PCI device
command requests handling.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* Fix indent for the NE PCI device command types enum.
v7 -> v8
* No changes.
v6 -> v7
* Update the documentation to include references to the NE PCI device id
and MMIO bar.
v5 -> v6
* Update documentation to kernel-doc format.
v4 -> v5
* Add a TODO for including flags in the request to the NE PCI device to
set a memory region for an enclave. It is not used for now.
v3 -> v4
* Remove the "packed" attribute and include padding in the NE data
structures.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
Andra Paraschiv [Mon, 21 Sep 2020 12:17:15 +0000 (15:17 +0300)]
nitro_enclaves: Add ioctl interface definition
The Nitro Enclaves driver handles the enclave lifetime management. This
includes enclave creation, termination and setting up its resources such
as memory and CPU.
An enclave runs alongside the VM that spawned it. It is abstracted as a
process running in the VM that launched it. The process interacts with
the NE driver, that exposes an ioctl interface for creating an enclave
and setting up its resources.
Changelog
v9 -> v10
* Update commit message to include the changelog before the SoB tag(s).
v8 -> v9
* No changes.
v7 -> v8
* Add NE custom error codes for user space memory regions not backed by
pages multiple of 2 MiB, invalid flags and enclave CID.
* Add max flag value for enclave image load info.
v6 -> v7
* Clarify in the ioctls documentation that the return value is -1 and
errno is set on failure.
* Update the error code value for NE_ERR_INVALID_MEM_REGION_SIZE as it
gets in user space as value 25 (ENOTTY) instead of 515. Update the
NE custom error codes values range to not be the same as the ones
defined in include/linux/errno.h, although these are not propagated
to user space.
v5 -> v6
* Fix typo in the description about the NE CPU pool.
* Update documentation to kernel-doc format.
* Remove the ioctl to query API version.
v4 -> v5
* Add more details about the ioctl calls usage e.g. error codes, file
descriptors used.
* Update the ioctl to set an enclave vCPU to not return a file
descriptor.
* Add specific NE error codes.
v3 -> v4
* Decouple NE ioctl interface from KVM API.
* Add NE API version and the corresponding ioctl call.
* Add enclave / image load flags options.
v2 -> v3
* Remove the GPL additional wording as SPDX-License-Identifier is
already in place.
v1 -> v2
* Add ioctl for getting enclave image load metadata.
* Update NE_ENCLAVE_START ioctl name to NE_START_ENCLAVE.
* Add entry in Documentation/userspace-api/ioctl/ioctl-number.rst for NE
ioctls.
* Update NE ioctls definition based on the updated ioctl range for major
and minor.
Reviewed-by: Alexander Graf <graf@amazon.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Alexandru Vasile <lexnv@amazon.com> Signed-off-by: Andra Paraschiv <andraprs@amazon.com> Link: https://lore.kernel.org/r/20200921121732.44291-2-andraprs@amazon.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Lang Dai [Mon, 14 Sep 2020 03:26:41 +0000 (11:26 +0800)]
uio: free uio id after uio file node is freed
uio_register_device() do two things.
1) get an uio id from a global pool, e.g. the id is <A>
2) create file nodes like /sys/class/uio/uio<A>
uio_unregister_device() do two things.
1) free the uio id <A> and return it to the global pool
2) free the file node /sys/class/uio/uio<A>
There is a situation is that one worker is calling uio_unregister_device(),
and another worker is calling uio_register_device().
If the two workers are X and Y, they go as below sequence,
1) X free the uio id <AAA>
2) Y get an uio id <AAA>
3) Y create file node /sys/class/uio/uio<AAA>
4) X free the file note /sys/class/uio/uio<AAA>
Then it will failed at the 3rd step and cause the phenomenon we saw as it
is creating a duplicated file node.
We don't need to specify any ranges when allocating IDs so we can switch
to ida_alloc() and ida_free() instead of the ida_simple_ counterparts.
ida_simple_get(ida, 0, 0, gfp) is equivalent to
ida_alloc_range(ida, 0, UINT_MAX, gfp) which is equivalent to
ida_alloc(ida, gfp). Note: IDR will never actually allocate an ID
larger than INT_MAX.
Mike Leach [Wed, 16 Sep 2020 19:17:37 +0000 (13:17 -0600)]
coresight: etm4x: Fix number of resources check for ETM 4.3 and above
The initialisation code checks TRCIDR4 to determine the number of resource
selectors available on the system. Since ETM v 4.3, the value 0 has a
different meaning. This patch takes into account this change.
Jonathan Zhou [Wed, 16 Sep 2020 19:17:36 +0000 (13:17 -0600)]
coresight: etm4x: Fix mis-usage of nr_resource in sysfs interface
The member @nr_resource represents how many resource selector pairs,
and the pair 0 is always implemented and reserved.
So let's multiply by 2 when resetting the selector configuration.
And also update the validation of the input @idx.
Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Jonathan Zhou <jonathan.zhouwen@huawei.com>
[Fixed typographical error in changelog, added stable] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-16-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
coresight: Make sysfs functional on topologies with per core sink
Coresight driver assumes sink is common across all the ETMs,
and tries to build a path between ETM and the first enabled
sink found using bus based search. This breaks sysFS usage
on implementations that has multiple per core sinks in
enabled state.
To fix this, coresight_get_enabled_sink API is updated to
do a connection based search starting from the given source,
instead of bus based search.
With sink selection using sysfs depecrated for perf interface,
provision for reset is removed as well in this API.
Jonathan Zhou [Wed, 16 Sep 2020 19:17:32 +0000 (13:17 -0600)]
coresight: etm4x: Fix issues on trcseqevr access
The TRCSEQEVR(3) is reserved, using '@nrseqstate - 1' instead to avoid
accessing the reserved register.
Fixes: f188b5e76aae ("coresight: etm4x: Save/restore state across CPU low power states") Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Mike Leach <mike.leach@linaro.org> Cc: Shaokun Zhang <zhangshaokun@hisilicon.com> Signed-off-by: Jonathan Zhou <jonathan.zhouwen@huawei.com>
[Fixed capital letter in title] Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-12-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
coresight: etm4x: Handle unreachable sink in perf mode
If the specified/hinted sink is not reachable from a subset of the CPUs,
we could end up unable to trace the event on those CPUs. This
is the best effort we could do until we support 1:1 configurations.
Fail gracefully in such cases avoiding a WARN_ON, which can be easily
triggered by the user on certain platforms (Arm N1SDP), with the following
trace paths :
Even though we don't support using separate sinks for the ETMs yet (e.g,
for 1:1 configurations), we should at least honor the user's choice and
handle the limitations gracefully, by simply skipping the tracing on ETMs
which can't reach the requested sink.
Fixes: f9d81a657bb8 ("coresight: perf: Allow tracing on hotplugged CPUs") Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Reported-by: Jeremy Linton <jeremy.linton@arm.com> Tested-by: Jeremy Linton <jeremy.linton@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-11-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
coresight: cti: Write regsiters directly in cti_enable_hw()
Deadlock as below is triggered by one CPU holds drvdata->spinlock
and calls cti_enable_hw(). Smp_call_function_single() is called
in cti_enable_hw() and tries to let another CPU write CTI registers.
That CPU is trying to get drvdata->spinlock in cti_cpu_pm_notify()
and doesn't response to IPI from smp_call_function_single().
This change write CTI registers directly in cti_enable_hw().
Config->hw_powered has been checked to be true with spinlock holded.
CTI is powered and can be programmed until spinlock is released.
Moving from using an address filter to trace the default "all addresses"
range to no filtering to acheive the same result, has caused the perf
filtering of kernel/user address spaces from not working unless an
explicit address filter was used.
This is due to the original code using a side-effect of the address
filtering rather than setting the global TRCVICTLR exception level
filtering.
The use of the mode sysfs file is also similarly affected.
A helper function is added to fix both instances.
Fixes: ae2041510d5d ("coresight: etmv4: Update default filter and initialisation") Reported-by: Leo Yan <leo.yan@linaro.org> Tested-by: Leo Yan <leo.yan@linaro.org> Reviewed-by: Leo Yan <leo.yan@linaro.org> Signed-off-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-8-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
coresight: cti: remove pm_runtime_get_sync() from CPU hotplug
Below BUG is triggered by call pm_runtime_get_sync() in
cti_cpuhp_enable_hw(). It's in CPU hotplug callback with interrupt
disabled. Pm_runtime_get_sync() calls clock driver to enable clock
which could sleep. Remove pm_runtime_get_sync() in cti_cpuhp_enable_hw()
since pm_runtime_get_sync() is called in cti_enabld and pm_runtime_put()
is called in cti_disabled. No need to increase pm count when CPU gets
online since it's not decreased when CPU is offline.
coresight: cti: disclaim device only when it's claimed
Coresight_claim_device() is called in cti_starting_cpu() only
when CTI is enabled while coresight_disclaim_device() is called
uncontionally in cti_dying_cpu(). This triggered below WARNING.
Only call disclaim device when CTI device is enabled to fix it.
coresight: etm4x: Fix etm4_count race by moving cpuhp callbacks to init
etm4_count keeps track of number of ETMv4 registered and on some systems,
a race is observed on etm4_count variable which can lead to multiple calls
to cpuhp_setup_state_nocalls_cpuslocked(). This function internally calls
cpuhp_store_callbacks() which prevents multiple registrations of callbacks
for a given state and due to this race, it returns -EBUSY leading to ETM
probe failures like below.
coresight-etm4x: probe of 7040000.etm failed with error -16
This race can easily be triggered with async probe by setting probe type
as PROBE_PREFER_ASYNCHRONOUS and with ETM power management property
"arm,coresight-loses-context-with-cpu".
Prevent this race by moving cpuhp callbacks to etm driver init since the
cpuhp callbacks doesn't have to depend on the etm4_count and can be once
setup during driver init. Similarly we move cpu_pm notifier registration
to driver init and completely remove etm4_count usage. Also now we can
use non cpuslocked version of cpuhp callbacks with this movement.
Fixes: 9b6a3f3633a5 ("coresight: etmv4: Fix CPU power management setup in probe() function") Fixes: 58eb457be028 ("hwtracing/coresight-etm4x: Convert to hotplug state machine") Suggested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Stephen Boyd <swboyd@chromium.org> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org> Signed-off-by: Mathieu Poirier <mathieu.poirier@linaro.org> Link: https://lore.kernel.org/r/20200916191737.4001561-2-mathieu.poirier@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Thu, 10 Sep 2020 15:12:21 +0000 (16:12 +0100)]
binder: remove redundant assignment to pointer n
The pointer n is being initialized with a value that is
never read and it is being updated later with a new value. The
initialization is redundant and can be removed.
Acked-by: Todd Kjos <tkjos@google.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200910151221.751464-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Colin Ian King [Mon, 14 Sep 2020 13:56:46 +0000 (14:56 +0100)]
misc: hisi_hikey_usb: fix return of uninitialized ret status variable
Currently the return value from ret is uninitialized so the function
hisi_hikey_usb_parse_kirin970 is returning a garbage value when
succeeding. Since ret is not used anywhere else in the function, remove
it and just return 0 success at the end of the function.
Fixes: d210a0023590 ("misc: hisi_hikey_usb: add support for Hikey 970") Reviewed-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Colin Ian King <colin.king@canonical.com> Link: https://lore.kernel.org/r/20200914135646.99334-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Merge tag 'fpga-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga into char-misc-next
Moritz writes:
Here is the first set of changes for the 5.10-rc1 merge window.
Xilinx:
- Luca's changes clean up the xilinx-spi driver and add better
diagnostics on errors.
Core:
- I cleaned up a stray comment.
- Richard's change marks FPGA manager tasks un-interruptible.
- Tom has agreed to help out as Reviewer in the FPGA Manager subsystem.
DFL:
- Xu's changes add a new bus that is the first part of a series to support
adding devices via DFL (the other parts are still under review)
All patches have been reviewed on the mailing list, and have been in the
last few linux-next releases (as part of my for-next branch) without issues.
Signed-off-by: Moritz Fischer <mdf@kernel.org>
* tag 'fpga-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/mdf/linux-fpga:
fpga: dfl: create a dfl bus type to support DFL devices
fpga: fpga-region: Cleanup an outdated comment
fpga: dfl: map feature mmio resources in their own feature drivers
fpga manager: xilinx-spi: provide better diagnostics on programming failure
fpga manager: xilinx-spi: add error checking after gpiod_get_value()
fpga manager: xilinx-spi: fix write_complete timeout handling
fpga manager: xilinx-spi: remove final dot from dev_err() strings
fpga manager: xilinx-spi: remove stray comment
fpga: dfl: change data type of feature id to u16
MAINTAINERS: Add Tom Rix as fpga reviewer
fpga: stratix10-soc: make FPGA task un-interruptible
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A collection of fixes I've been accruing over the last few weeks, none
of them have been severe enough to warrant flushing the queue but it's
been long enough now that it's a good idea to send them in.
A handful of them are fixups for QSPI DT/bindings/compatibles, some
smaller fixes for system DMA clock control and TMU interrupts on i.MX,
a handful of fixes for OMAP, including a fix for DSI (display) on
omap5"
Merge tag 'usb-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb
Pull USB/Thunderbolt fixes from Greg KH:
"Here are some small USB and Thunderbolt driver fixes for 5.9-rc5.
Nothing huge, just a number of bugfixes and new device ids for
problems reported:
- new USB serial driver ids
- bug fixes for syzbot reported problems
- typec driver fixes
- thunderbolt driver fixes
- revert of reported broken commit
All of these have been in linux-next with no reported issues"
* tag 'usb-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb:
usb: typec: intel_pmc_mux: Do not configure SBU and HSL Orientation in Alternate modes
usb: typec: intel_pmc_mux: Do not configure Altmode HPD High
usb: core: fix slab-out-of-bounds Read in read_descriptors
Revert "usb: dwc3: meson-g12a: fix shared reset control use"
usb: typec: ucsi: acpi: Check the _DEP dependencies
usb: typec: intel_pmc_mux: Un-register the USB role switch
usb: Fix out of sync data toggle if a configured device is reconfigured
USB: serial: option: support dynamic Quectel USB compositions
USB: serial: option: add support for SIM7070/SIM7080/SIM7090 modules
thunderbolt: Use maximum USB3 link rate when reclaiming if link is not up
thunderbolt: Disable ports that are not implemented
USB: serial: ftdi_sio: add IDs for Xsens Mti USB converter
Merge tag 'staging-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging
Pull staging/IIO driver fixes from Greg KH:
"Here are a number of staging and IIO driver fixes for 5.9-rc5.
The majority of these are IIO driver fixes, to resolve a timestamp
issue that was recently found to affect a bunch of IIO drivers.
The other fixes in here are:
- small IIO driver fixes
- greybus driver fix
- counter driver fix (came in through the IIO fixes tree)
All of these have been in linux-next for a while with no reported
issues"
* tag 'staging-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: (23 commits)
iio: adc: mcp3422: fix locking on error path
iio: adc: mcp3422: fix locking scope
iio: adc: meson-saradc: Use the parent device to look up the calib data
iio:adc:max1118 Fix alignment of timestamp and data leak issues
iio:adc:ina2xx Fix timestamp alignment issue.
iio:adc:ti-adc084s021 Fix alignment and data leak issues.
iio:adc:ti-adc081c Fix alignment and data leak issues
iio:magnetometer:ak8975 Fix alignment and data leak issues.
iio:light:ltr501 Fix timestamp alignment issue.
iio:light:max44000 Fix timestamp alignment and prevent data leak.
iio:chemical:ccs811: Fix timestamp alignment and prevent data leak.
iio:proximity:mb1232: Fix timestamp alignment and prevent data leak.
iio:accel:mma7455: Fix timestamp alignment and prevent data leak.
iio:accel:bmc150-accel: Fix timestamp alignment and prevent data leak.
iio:accel:mma8452: Fix timestamp alignment and prevent data leak.
iio: accel: kxsd9: Fix alignment of local buffer.
iio: adc: rockchip_saradc: select IIO_TRIGGERED_BUFFER
iio: adc: ti-ads1015: fix conversion when CONFIG_PM is not set
counter: microchip-tcb-capture: check the correct variable
iio: cros_ec: Set Gyroscope default frequency to 25Hz
...
Merge tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core fixes from Greg KH:
"Here are some small driver core and debugfs fixes for 5.9-rc5
Included in here are:
- firmware loader memory leak fix
- firmware loader testing fixes for non-EFI systems
- device link locking fixes found by lockdep
- kobject_del() bugfix that has been affecting some callers
- debugfs minor fix
All of these have been in linux-next for a while with no reported
issues"
* tag 'driver-core-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
test_firmware: Test platform fw loading on non-EFI systems
PM: <linux/device.h>: fix @em_pd kernel-doc warning
kobject: Drop unneeded conditional in __kobject_del()
driver core: Fix device_pm_lock() locking for device links
MAINTAINERS: Add the security document to SECURITY CONTACT
driver code: print symbolic error code
debugfs: Fix module state check condition
kobject: Restore old behaviour of kobject_del(NULL)
firmware_loader: fix memory leak for paged buffer
Olof Johansson [Sun, 13 Sep 2020 15:57:37 +0000 (08:57 -0700)]
Merge tag 'arm-soc/for-5.9/devicetree-fixes' of https://github.com/Broadcom/stblinux into arm/fixes
This pull request contains Broadcom ARM-based SoCs Device Tree fixes for
5.9, please pull the following:
- Florian fixes the Broadcom QSPI controller binding such that the most
specific compatible string is the left most one, and all existing
in-tree users are updated as well.
Olof Johansson [Sun, 13 Sep 2020 15:56:03 +0000 (08:56 -0700)]
Merge tag 'imx-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux into arm/fixes
i.MX fixes for 5.9, round 2:
- Fix the misspelling of 'interrupts' property in i.MX8MQ TMU DT node.
- Correct 'ahb' clock for i.MX8MP SDMA1 in device tree.
- Fix pad QSPI1B_SCLK mux mode for UART3 on i.MX6SX.
* tag 'imx-fixes-5.9-2' of git://git.kernel.org/pub/scm/linux/kernel/git/shawnguo/linux:
ARM: dts: imx6sx: fix the pad QSPI1B_SCLK mux mode for uart3
arm64: dts: imx8mp: correct sdma1 clk setting
arm64: dts: imx8mq: Fix TMU interrupt property
Merge tag 'char-misc-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char / misc driver fixes from Greg KH:
"Here are a number of small driver fixes for 5.9-rc5
Included in here are:
- habanalabs driver fixes
- interconnect driver fixes
- soundwire driver fixes
- dyndbg fixes for reported issues, and then reverts to fix it all up
to a sane state.
- phy driver fixes
All of these have been in linux-next for a while with no reported
issues"
* tag 'char-misc-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc:
Revert "dyndbg: accept query terms like file=bar and module=foo"
Revert "dyndbg: fix problem parsing format="foo bar""
scripts/tags.sh: exclude tools directory from tags generation
video: fbdev: fix OOB read in vga_8planes_imageblit()
dyndbg: fix problem parsing format="foo bar"
dyndbg: refine export, rename to dynamic_debug_exec_queries()
dyndbg: give %3u width in pr-format, cosmetic only
interconnect: qcom: Fix small BW votes being truncated to zero
soundwire: fix double free of dangling pointer
interconnect: Show bandwidth for disabled paths as zero in debugfs
habanalabs: fix report of RAZWI initiator coordinates
habanalabs: prevent user buff overflow
phy: omap-usb2-phy: disable PHY charger detect
phy: qcom-qmp: Use correct values for ipq8074 PCIe Gen2 PHY init
soundwire: bus: fix typo in comment on INTSTAT registers
phy: qualcomm: fix return value check in qcom_ipq806x_usb_phy_probe()
phy: qualcomm: fix platform_no_drv_owner.cocci warnings
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"A bit on the bigger side, mostly due to me being on vacation, then
busy, then on parental leave, but there's nothing worrisome.
ARM:
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced (dirty
logging, for example)
- Fix tracing output of 64bit values
x86:
- nSVM state restore fixes
- Async page fault fixes
- Lots of small fixes everywhere"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (25 commits)
KVM: emulator: more strict rsm checks.
KVM: nSVM: more strict SMM checks when returning to nested guest
SVM: nSVM: setup nested msr permission bitmap on nested state load
SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
x86/kvm: don't forget to ACK async PF IRQ
x86/kvm: properly use DEFINE_IDTENTRY_SYSVEC() macro
KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
KVM: SVM: avoid emulation with stale next_rip
KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
KVM: SVM: Periodically schedule when unregistering regions on destroy
KVM: MIPS: Change the definition of kvm type
kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
KVM: nVMX: Fix the update value of nested load IA32_PERF_GLOBAL_CTRL control
KVM: fix memory leak in kvm_io_bus_unregister_dev()
KVM: Check the allocation of pv cpu mask
KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
KVM: arm64: Update page shift if stage 2 block mapping not supported
KVM: arm64: Fix address truncation in traces
KVM: arm64: Do not try to map PUDs when they are folded into PMD
arm64/x86: KVM: Introduce steal-time cap
...
Merge tag 'for-linus' of git://github.com/openrisc/linux
Pull OpenRISC fixes from Stafford Horne:
"Fixes for compile issues pointed out by kbuild and one bug I found in
initrd with the 5.9 patches"
* tag 'for-linus' of git://github.com/openrisc/linux:
openrisc: Fix issue with get_user for 64-bit values
openrisc: Fix cache API compile issue when not inlining
openrisc: Reserve memblock for initrd
Merge tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull seccomp fixes from Kees Cook:
"This fixes a rare race condition in seccomp when using TSYNC and
USER_NOTIF together where a memory allocation would not get freed
(found by syzkaller, fixed by Tycho).
Additionally updates Tycho's MAINTAINERS and .mailmap entries for his
new address"
* tag 'seccomp-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
seccomp: don't leave dangling ->notif if file allocation fails
mailmap, MAINTAINERS: move to tycho.pizza
seccomp: don't leak memory when filter install races
Merge tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm
Pull libnvdimm fix from Vishal Verma:
"Fix detection of dax support for block devices.
Previous fixes in this area, which only affected printing of debug
messages, had an incorrect condition for detection of dax. This fix
should finally do the right thing"
* tag 'libnvdimm-fix-v5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/nvdimm/nvdimm:
dax: fix detection of dax support for non-persistent memory block devices
Merge tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
"A few more fixes:
- regression fix for a crash after failed snapshot creation
- one more lockep fix: use nofs allocation when allocating missing
device
- fix reloc tree leak on degraded mount
- make some extent buffer alignment checks less strict to mount
filesystems created by btrfs-convert"
* tag 'for-5.9-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: fix NULL pointer dereference after failure to create snapshot
btrfs: free data reloc tree on failed mount
btrfs: require only sector size alignment for parent eb bytenr
btrfs: fix lockdep splat in add_missing_dev
Maxim Levitsky [Thu, 27 Aug 2020 16:27:20 +0000 (19:27 +0300)]
KVM: nSVM: more strict SMM checks when returning to nested guest
* check that guest is 64 bit guest, otherwise the SVM related fields
in the smm state area are not defined
* If the SMM area indicates that SMM interrupted a running guest,
check that EFER.SVME which is also saved in this area is set, otherwise
the guest might have tampered with SMM save area, and so indicate
emulation failure which should triple fault the guest.
* Check that that guest CPUID supports SVM (due to the same issue as above)
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-4-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Maxim Levitsky [Thu, 27 Aug 2020 16:27:18 +0000 (19:27 +0300)]
SVM: nSVM: correctly restore GIF on vmexit from nesting after migration
Currently code in svm_set_nested_state copies the current vmcb control
area to L1 control area (hsave->control), under assumption that
it mostly reflects the defaults that kvm choose, and later qemu
overrides these defaults with L2 state using standard KVM interfaces,
like KVM_SET_REGS.
However nested GIF (which is AMD specific thing) is by default is true,
and it is copied to hsave area as such.
This alone is not a big deal since on VMexit, GIF is always set to false,
regardless of what it was on VM entry. However in nested_svm_vmexit we
were first were setting GIF to false, but then we overwrite the control
fields with value from the hsave area. (including the nested GIF field
itself if GIF virtualization is enabled).
Now on normal vm entry this is not a problem, since GIF is usually false
prior to normal vm entry, and this is the value that copied to hsave,
and then restored, but this is not always the case when the nested state
is loaded as explained above.
To fix this issue, move svm_set_gif after we restore the L1 control
state in nested_svm_vmexit, so that even with wrong GIF in the
saved L1 control area, we still clear GIF as the spec says.
Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com>
Message-Id: <20200827162720.278690-2-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
openrisc: Fix issue with get_user for 64-bit values
A build failure was raised by kbuild with the following error.
drivers/android/binder.c: Assembler messages:
drivers/android/binder.c:3861: Error: unrecognized keyword/register name `l.lwz ?ap,4(r24)'
drivers/android/binder.c:3866: Error: unrecognized keyword/register name `l.addi ?ap,r0,0'
The issue is with 64-bit get_user() calls on openrisc. I traced this to
a problem where in the internally in the get_user macros there is a cast
to long __gu_val this causes GCC to think the get_user call is 32-bit.
This binder code is really long and GCC allocates register r30, which
triggers the issue. The 64-bit get_user asm tries to get the 64-bit pair
register, which for r30 overflows the general register names and returns
the dummy register ?ap.
The fix here is to move the temporary variables into the asm macros. We
use a 32-bit __gu_tmp for 32-bit and smaller macro and a 64-bit tmp in
the 64-bit macro. The cast in the 64-bit macro has a trick of casting
through __typeof__((x)-(x)) which avoids the below warning. This was
barrowed from riscv.
arch/openrisc/include/asm/uaccess.h:240:8: warning: cast to pointer from integer of different size
I tested this in a small unit test to check reading between 64-bit and
32-bit pointers to 64-bit and 32-bit values in all combinations. Also I
ran make C=1 to confirm no new sparse warnings came up. It all looks
clean to me.
Merge commit 26d05b368a5c0 ("Merge branch 'kvm-async-pf-int' into HEAD")
tried to adapt the new interrupt based async PF mechanism to the newly
introduced IDTENTRY magic but unfortunately it missed the fact that
DEFINE_IDTENTRY_SYSVEC() doesn't call ack_APIC_irq() on its own and
all DEFINE_IDTENTRY_SYSVEC() users have to call it manually.
As the result all multi-CPU KVM guest hang on boot when
KVM_FEATURE_ASYNC_PF_INT is present. The breakage went unnoticed because no
KVM userspace (e.g. QEMU) currently set it (and thus async PF mechanism
is currently disabled) but we're about to change that.
Wanpeng Li [Wed, 19 Aug 2020 08:55:27 +0000 (16:55 +0800)]
KVM: VMX: Don't freeze guest when event delivery causes an APIC-access exit
According to SDM 27.2.4, Event delivery causes an APIC-access VM exit.
Don't report internal error and freeze guest when event delivery causes
an APIC-access exit, it is handleable and the event will be re-injected
during the next vmentry.
Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
Message-Id: <1597827327-25055-2-git-send-email-wanpengli@tencent.com> Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Wanpeng Li [Sat, 12 Sep 2020 06:16:39 +0000 (02:16 -0400)]
KVM: SVM: avoid emulation with stale next_rip
svm->next_rip is reset in svm_vcpu_run() only after calling
svm_exit_handlers_fastpath(), which will cause SVM's
skip_emulated_instruction() to write a stale RIP.
We can move svm_exit_handlers_fastpath towards the end of
svm_vcpu_run(). To align VMX with SVM, keep svm_complete_interrupts()
close as well.
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Paul K. <kronenpj@kronenpj.dyndns.org> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com>
[Also move vmcb_mark_all_clean before any possible write to the VMCB.
- Paolo]
Merge branch 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux
Pull i2c updates from Wolfram Sang:
"Usual driver bugfixes for the I2C subsystem"
* 'i2c/for-current' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux:
i2c: algo: pca: Reapply i2c bus settings after reset
i2c: npcm7xx: Fix timeout calculation
misc: eeprom: at24: register nvmem only after eeprom is ready to use
Merge tag 'pm-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"These fix three pieces of documentation and add new CPU IDs to the
Intel RAPL power capping driver.
Specifics:
- Add CPU IDs of the TigerLake Desktop, RocketLake and AlderLake
chips to the Intel RAPL power capping driver (Zhang Rui).
- Add the missing energy model performance domain item to the struct
device kerneldoc comment (Randy Dunlap).
- Fix the struct powercap_control_type kerneldoc comment to match the
actual definition of that structure and add missing item to the
struct powercap_zone_ops kerneldoc comment (Amit Kucheria)"
* tag 'pm-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
powercap: make documentation reflect code
PM: <linux/device.h>: fix @em_pd kernel-doc warning
powercap/intel_rapl: add support for AlderLake
powercap/intel_rapl: add support for RocketLake
powercap/intel_rapl: add support for TigerLake Desktop
Merge tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- Fix a regression in bdev partition locking (Christoph)
- NVMe pull request from Christoph:
- cancel async events before freeing them (David Milburn)
- revert a broken race fix (James Smart)
- fix command processing during resets (Sagi Grimberg)
- Fix a kyber crash with requeued flushes (Omar)
- Fix __bio_try_merge_page() same_page error for no merging (Ritesh)
* tag 'block-5.9-2020-09-11' of git://git.kernel.dk/linux-block:
block: Set same_page to false in __bio_try_merge_page if ret is false
nvme-fabrics: allow to queue requests for live queues
block: only call sched requeue_request() for scheduled requests
nvme-tcp: cancel async events before freeing event struct
nvme-rdma: cancel async events before freeing event struct
nvme-fc: cancel async events before freeing event struct
nvme: Revert: Fix controller creation races with teardown flow
block: restore a specific error code in bdev_del_partition
Merge tag 'spi-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi
Pull spi fixes from Mark Brown:
"There's some driver specific fixes here plus one core fix for memory
leaks that could be triggered by a potential race condition when
cleaning up after we have split transfers to fit into what the
controller can support"
* tag 'spi-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi:
spi: stm32: fix pm_runtime_get_sync() error checking
spi: Fix memory leak on splited transfers
spi: spi-cadence-quadspi: Fix mapping of buffers for DMA reads
spi: stm32: Rate-limit the 'Communication suspended' message
spi: spi-loopback-test: Fix out-of-bounds read
spi: spi-cadence-quadspi: Populate get_name() interface
MAINTAINERS: add myself as maintainer for spi-fsl-dspi driver
Merge tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"The biggest set of fixes here is those from Michał Mirosław fixing
some locking issues with coupled regulators that are triggered in
cases where a coupled regulator is used by a device involved in
fs_reclaim like eMMC storage.
These are relatively serious for the affected systems, though the
circumstances where they trigger are very rare"
* tag 'regulator-fix-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: pwm: Fix machine constraints application
regulator: core: Fix slab-out-of-bounds in regulator_unlock_recursive()
regulator: remove superfluous lock in regulator_resolve_coupling()
regulator: cleanup regulator_ena_gpio_free()
regulator: plug of_node leak in regulator_register()'s error path
regulator: push allocation in set_consumer_device_supply() out of lock
regulator: push allocations in create_regulator() outside of lock
regulator: push allocation in regulator_ena_gpio_request() out of lock
regulator: push allocation in regulator_init_coupling() outside of lock
regulator: fix spelling mistake "Cant" -> "Can't"
regulator: cros-ec-regulator: Add NULL test for devm_kmemdup call
KVM: x86: always allow writing '0' to MSR_KVM_ASYNC_PF_EN
Even without in-kernel LAPIC we should allow writing '0' to
MSR_KVM_ASYNC_PF_EN as we're not enabling the mechanism. In
particular, QEMU with 'kernel-irqchip=off' fails to start
a guest with
qemu-system-x86_64: error: failed to set MSR 0x4b564d02 to 0x0
Fixes: 9d3c447c72fb2 ("KVM: X86: Fix async pf caused null-ptr-deref") Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200911093147.484565-1-vkuznets@redhat.com>
[Actually commit the version proposed by Sean Christopherson. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
David Rientjes [Tue, 25 Aug 2020 19:56:28 +0000 (12:56 -0700)]
KVM: SVM: Periodically schedule when unregistering regions on destroy
There may be many encrypted regions that need to be unregistered when a
SEV VM is destroyed. This can lead to soft lockups. For example, on a
host running 4.15:
Although the CLFLUSH is no longer issued on every encrypted region to be
unregistered, there are no other changes that can prevent soft lockups for
very large SEV VMs in the latest kernel.
Periodically schedule if necessary. This still holds kvm->lock across the
resched, but since this only happens when the VM is destroyed this is
assumed to be acceptable.
Signed-off-by: David Rientjes <rientjes@google.com>
Message-Id: <alpine.DEB.2.23.453.2008251255240.2987727@chino.kir.corp.google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
In Documentation/virt/kvm/api.rst it is said that "You probably want to
use 0 as machine type", which implies that type 0 be the "automatic" or
"default" type. And, in user-space libvirt use the null-machine (with
type 0) to detect the kvm capability, which returns "KVM not supported"
on a VZ platform.
I try to fix it in QEMU but it is ugly:
https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
And Thomas Huth suggests me to change the definition of kvm type:
https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
Since VZ and TE cannot co-exists, using type 0 on a TE platform will
still return success (so old user-space tools have no problems on new
kernels); the advantage is that using type 0 on a VZ platform will not
return failure. So, the only problem is "new user-space tools use type
2 on old kernels", but if we treat this as a kernel bug, we can backport
this patch to old stable kernels.
Signed-off-by: Huacai Chen <chenhc@lemote.com>
Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Merge tag 'mmc-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- sdio: Restore ~20% performance drop for SDHCI drivers, by using
mmc_pre_req() and mmc_post_req() for SDIO requests.
MMC host:
- sdhci-of-esdhc: Fix support for erratum eSDHC7
- mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset
- sdhci-msm: Use retries to fix tuning
- sdhci-acpi: Fix resume for eMMC HS400 mode"
* tag 'mmc-v5.9-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdio: Use mmc_pre_req() / mmc_post_req()
mmc: sdhci-of-esdhc: Don't walk device-tree on every interrupt
mmc: mmc_spi: Allow the driver to be built when CONFIG_HAS_DMA is unset
mmc: sdhci-msm: Add retries when all tuning phases are found valid
mmc: sdhci-acpi: Clear amd_sdhci_host on reset
Lai Jiangshan [Wed, 2 Sep 2020 13:54:21 +0000 (21:54 +0800)]
kvm x86/mmu: use KVM_REQ_MMU_SYNC to sync when needed
When kvm_mmu_get_page() gets a page with unsynced children, the spt
pagetable is unsynchronized with the guest pagetable. But the
guest might not issue a "flush" operation on it when the pagetable
entry is changed from zero or other cases. The hypervisor has the
responsibility to synchronize the pagetables.
KVM behaved as above for many years, But commit 8c8560b83390
("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes")
inadvertently included a line of code to change it without giving any
reason in the changelog. It is clear that the commit's intention was to
change KVM_REQ_TLB_FLUSH -> KVM_REQ_TLB_FLUSH_CURRENT, so we don't
needlessly flush other contexts; however, one of the hunks changed
a nearby KVM_REQ_MMU_SYNC instead. This patch changes it back.
Link: https://lore.kernel.org/lkml/20200320212833.3507-26-sean.j.christopherson@intel.com/ Cc: Sean Christopherson <sean.j.christopherson@intel.com> Cc: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Lai Jiangshan <laijs@linux.alibaba.com>
Message-Id: <20200902135421.31158-1-jiangshanlai@gmail.com>
fixes: 8c8560b83390 ("KVM: x86/mmu: Use KVM_REQ_TLB_FLUSH_CURRENT for MMU specific flushes") Cc: stable@vger.kernel.org Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: fix memory leak in kvm_io_bus_unregister_dev()
when kmalloc() fails in kvm_io_bus_unregister_dev(), before removing
the bus, we should iterate over all other devices linked to it and call
kvm_iodevice_destructor() for them
Fixes: 90db10434b16 ("KVM: kvm_io_bus_unregister_dev() should never fail") Cc: stable@vger.kernel.org Reported-and-tested-by: syzbot+f196caa45793d6374707@syzkaller.appspotmail.com Link: https://syzkaller.appspot.com/bug?extid=f196caa45793d6374707 Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com> Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>
Message-Id: <20200907185535.233114-1-rkovhaev@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Haiwei Li [Tue, 1 Sep 2020 11:41:37 +0000 (19:41 +0800)]
KVM: Check the allocation of pv cpu mask
check the allocation of per-cpu __pv_cpu_mask. Initialize ops only when
successful.
Signed-off-by: Haiwei Li <lihaiwei@tencent.com>
Message-Id: <d59f05df-e6d3-3d31-a036-cc25a2b2f33f@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Peter Shier [Thu, 20 Aug 2020 23:05:45 +0000 (16:05 -0700)]
KVM: nVMX: Update VMCS02 when L2 PAE PDPTE updates detected
When L2 uses PAE, L0 intercepts of L2 writes to CR0/CR3/CR4 call
load_pdptrs to read the possibly updated PDPTEs from the guest
physical address referenced by CR3. It loads them into
vcpu->arch.walk_mmu->pdptrs and sets VCPU_EXREG_PDPTR in
vcpu->arch.regs_dirty.
At the subsequent assumed reentry into L2, the mmu will call
vmx_load_mmu_pgd which calls ept_load_pdptrs. ept_load_pdptrs sees
VCPU_EXREG_PDPTR set in vcpu->arch.regs_dirty and loads
VMCS02.GUEST_PDPTRn from vcpu->arch.walk_mmu->pdptrs[]. This all works
if the L2 CRn write intercept always resumes L2.
The resume path calls vmx_check_nested_events which checks for
exceptions, MTF, and expired VMX preemption timers. If
vmx_check_nested_events finds any of these conditions pending it will
reflect the corresponding exit into L1. Live migration at this point
would also cause a missed immediate reentry into L2.
After L1 exits, vmx_vcpu_run calls vmx_register_cache_reset which
clears VCPU_EXREG_PDPTR in vcpu->arch.regs_dirty. When L2 next
resumes, ept_load_pdptrs finds VCPU_EXREG_PDPTR clear in
vcpu->arch.regs_dirty and does not load VMCS02.GUEST_PDPTRn from
vcpu->arch.walk_mmu->pdptrs[]. prepare_vmcs02 will then load
VMCS02.GUEST_PDPTRn from vmcs12->pdptr0/1/2/3 which contain the stale
values stored at last L2 exit. A repro of this bug showed L2 entering
triple fault immediately due to the bad VMCS02.GUEST_PDPTRn values.
When L2 is in PAE paging mode add a call to ept_load_pdptrs before
leaving L2. This will update VMCS02.GUEST_PDPTRn if they are dirty in
vcpu->arch.walk_mmu->pdptrs[].
Tested:
kvm-unit-tests with new directed test: vmx_mtf_pdpte_test.
Verified that test fails without the fix.
Also ran Google internal VMM with an Ubuntu 16.04 4.4.0-83 guest running a
custom hypervisor with a 32-bit Windows XP L2 guest using PAE. Prior to fix
would repro readily. Ran 14 simultaneous L2s for 140 iterations with no
failures.
Signed-off-by: Peter Shier <pshier@google.com> Reviewed-by: Jim Mattson <jmattson@google.com>
Message-Id: <20200820230545.2411347-1-pshier@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Fri, 11 Sep 2020 17:12:11 +0000 (13:12 -0400)]
Merge tag 'kvmarm-fixes-5.9-1' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for Linux 5.9, take #1
- Multiple stolen time fixes, with a new capability to match x86
- Fix for hugetlbfs mappings when PUD and PMD are the same level
- Fix for hugetlbfs mappings when PTE mappings are enforced
(dirty logging, for example)
- Fix tracing output of 64bit values
Merge tag 'drm-fixes-2020-09-11' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular fixes, not much a major amount. One thing though is Laurent
fixed some Kconfig issues, and I'm carrying the rapidio kconfig change
so the drm one for xlnx driver works. He hadn't got a response from
rapidio maintainers.
Otherwise, virtio, sun4i, tve200, ingenic have some fixes, one audio
fix for i915 and a core docs fix.
kconfig:
- rapidio/xlnx kconfig fix
core:
- Documentation fix
i915:
- audio regression fix
virtio:
- Fix double free in virtio
- Fix virtio unblank
- Remove output->enabled from virtio, as it should use crtc_state
sun4i:
- Add missing put_device in sun4i, and other fixes
- Handle sun4i alpha on lowest plane correctly
tv200:
- Fix tve200 enable/disable
ingenic
- Small ingenic fixes"
* tag 'drm-fixes-2020-09-11' of git://anongit.freedesktop.org/drm/drm:
drm/i915: fix regression leading to display audio probe failure on GLK
drm: xlnx: dpsub: Fix DMADEVICES Kconfig dependency
rapidio: Replace 'select' DMAENGINES 'with depends on'
drm/virtio: drop virtio_gpu_output->enabled
drm/sun4i: backend: Disable alpha on the lowest plane on the A20
drm/sun4i: backend: Support alpha property on lowest plane
drm/sun4i: Fix DE2 YVU handling
drm/tve200: Stabilize enable/disable
dma-buf: fence-chain: Document missing dma_fence_chain_init() parameter in kerneldoc
dma-buf: Fix kerneldoc of dma_buf_set_name()
drm/virtio: fix unblank
Documentation: fix dma-buf.rst underline length warning
drm/sun4i: Fix dsi dcs long write function
drm/ingenic: Fix driver not probing when IPU port is missing
drm/ingenic: Fix leak of device_node pointer
drm/sun4i: add missing put_device() call in sun8i_r40_tcon_tv_set_mux()
drm/virtio: Revert "drm/virtio: Call the right shmem helpers"
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"A number of driver bug fixes and a few recent regressions:
- Several bug fixes for bnxt_re. Crashing, incorrect data reported,
and corruption on new HW
- Memory leak and crash in rxe
- Fix sysfs corruption in rxe if the netdev name is too long
- Fix a crash on error unwind in the new cq_pool code
- Fix kobject panics in rtrs by working device lifetime properly
- Fix a data corruption bug in iser target related to misaligned
buffers"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/isert: Fix unaligned immediate-data handling
RDMA/rtrs-srv: Set .release function for rtrs srv device during device init
RDMA/bnxt_re: Remove set but not used variable 'qplib_ctx'
RDMA/core: Fix reported speed and width
RDMA/core: Fix unsafe linked list traversal after failing to allocate CQ
RDMA/bnxt_re: Remove the qp from list only if the qp destroy succeeds
RDMA/bnxt_re: Fix driver crash on unaligned PSN entry address
RDMA/bnxt_re: Restrict the max_gids to 256
RDMA/bnxt_re: Static NQ depth allocation
RDMA/bnxt_re: Fix the qp table indexing
RDMA/bnxt_re: Do not report transparent vlan from QP1
RDMA/mlx4: Read pkey table length instead of hardcoded value
RDMA/rxe: Fix panic when calling kmem_cache_create()
RDMA/rxe: Fix memleak in rxe_mem_init_user
RDMA/rxe: Fix the parent sysfs read when the interface has 15 chars
RDMA/rtrs-srv: Replace device_register with device_initialize and device_add
Using gcov to collect coverage data for kernels compiled with GCC 10.1
causes random malfunctions and kernel crashes. This is the result of a
changed GCOV_COUNTERS value in GCC 10.1 that causes a mismatch between
the layout of the gcov_info structure created by GCC profiling code and
the related structure used by the kernel.
Fix this by updating the in-kernel GCOV_COUNTERS value. Also re-enable
config GCOV_KERNEL for use with GCC 10.
Reported-by: Colin Ian King <colin.king@canonical.com> Reported-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Leon Romanovsky <leonro@nvidia.com> Tested-and-Acked-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* powercap:
powercap: make documentation reflect code
powercap/intel_rapl: add support for AlderLake
powercap/intel_rapl: add support for RocketLake
powercap/intel_rapl: add support for TigerLake Desktop
Merge tag 'fsi-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi into char-misc-next
Joel writes:
FSI changes for 5.10
- Misc code cleanups. Thanks to Colin, Xu and Rikard
- Features for the ASPEED FSI master
* Detect connection type and routing for Tacoma
* Run at full speed (200MHz) by default
* Set bus speed with a parameter
* CFAM reset GPIO
* 23 bit addressing
- Core features
* Disable unused links
* Set LBUS ownership
* tag 'fsi-for-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/joel/fsi:
fsi: aspeed: Support CFAM reset GPIO
fsi: aspeed: Add module param for bus divisor
fsi: aspeed: Run the bus at maximum speed
fsi: aspeed: Support cabled FSI
dt-bindings: fsi: Document gpios
fsi: scom: Constify scom_ids
fsi: sbefifo: Constify sbefifo_ids
fsi: master: Constify hub_master_ids
fsi: master: Remove link enable read-back
fsi: core: Set slave local bus ownership during init
fsi: core: Disable link when slave init fails
fsi: master: Add boolean parameter to link_enable function
fsi: fsi-occ: fix return value check in occ_probe()
fsi: aspeed: Enable 23-bit addressing
fsi: master-ast-cf: fix spelling mistake "firwmare" -> "firmware"
Dave Airlie [Thu, 10 Sep 2020 23:48:43 +0000 (09:48 +1000)]
Merge tag 'drm-misc-fixes-2020-09-09' of git://anongit.freedesktop.org/drm/drm-misc into drm-fixes
drm-misc-fixes for v5.9-rc5:
- Fix double free in virtio.
- Add missing put_device in sun4i, and other fixes.
- Small ingenic fixes.
- Handle sun4i alpha on lowest plane correctly.
- Remove output->enabled from virtio, as it should use crtc_state.
- Fix tve200 enable/disable.
- Documentation fix.
- Fix virtio unblank.
Merge tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs
Pull f2fs fixes from Jaegeuk Kim:
"Small bug fixes for:
- SMR drive fix
- infinite loop when building free node ids
- EOF at DIO read"
* tag 'f2fs-for-5.9-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/jaegeuk/f2fs:
f2fs: Return EOF on unaligned end of file DIO read
f2fs: fix indefinite loop scanning for free nid
f2fs: Fix type of section block count variables