git.proxmox.com Git - mirror

tests/acpi: q35: Allow addition of a CXL test.

Add exceptions for the DSDT and the new CEDT tables
specific to a new CXL test in the following patch.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-37-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

i386/pc: Enable CXL fixed memory windows

Add the CFMWs memory regions to the memorymap and adjust the
PCI window to avoid hitting the same memory.

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20220429144110.25167-36-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/component Add a dumb HDM decoder handler

Add a trivial handler for now to cover the root bridge
where we could do some error checking in future.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-35-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

cxl/cxl-host: Add memops for CFMWS region.

These memops perform interleave decoding, walking down the
CXL topology from CFMWS described host interleave
decoder via CXL host bridge HDM decoders, through the CXL
root ports and finally call CXL type 3 specific read and write
functions.

Note that, whilst functional the current implementation does
not support:
* switches
* multiple HDM decoders at a given level.
* unaligned accesses across the interleave boundaries

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Message-Id: <20220429144110.25167-34-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

mem/cxl_type3: Add read and write functions for associated hostmem.

Once a read or write reaches a CXL type 3 device, the HDM decoders
on the device are used to establish the Device Physical Address
which should be accessed.  These functions peform the required maths
and then use a device specific address space to access the
hostmem->mr to fullfil the actual operation.  Note that failed writes
are silent, but failed reads return poison.  Note this is based
loosely on:

https://lore.kernel.org/qemu-devel/20200817161853.593247-6-f4bug@amsat.org/
[RFC PATCH 0/9] hw/misc: Add support for interleaved memory accesses

Only lightly tested so far.  More complex test cases yet to be written.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-33-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

CXL/cxl_component: Add cxl_get_hb_cstate()

Accessor to get hold of the cxl state for a CXL host bridge
without exposing the internals of the implementation.

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-32-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

pci/pcie_port: Add pci_find_port_by_pn()

Simple function to search a PCIBus to find a port by
it's port number.

CXL interleave decoding uses the port number as a target
so it is necessary to locate the port when doing interleave
decoding.

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-31-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pci-host/gpex-acpi: Add support for dsdt construction for pxb-cxl

This adds code to instantiate the slightly extended ACPI root port
description in DSDT as per the CXL 2.0 specification.

Basically a cut and paste job from the i386/pc code.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-30-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi/cxl: Introduce CFMWS structures in CEDT

The CEDT CXL Fixed Window Memory Window Structures (CFMWs)
define regions of the host phyiscal address map which
(via an impdef means) are configured such that they have
a particular interleave setup across one or more CXL Host Bridges.

Reported-by: Alison Schofield <alison.schofield@intel.com>
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-29-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/host: Add support for CXL Fixed Memory Windows.

The concept of these is introduced in [1] in terms of the
description the CEDT ACPI table. The principal is more general.
Unlike once traffic hits the CXL root bridges, the host system
memory address routing is implementation defined and effectively
static once observable by standard / generic system software.
Each CXL Fixed Memory Windows (CFMW) is a region of PA space
which has fixed system dependent routing configured so that
accesses can be routed to the CXL devices below a set of target
root bridges. The accesses may be interleaved across multiple
root bridges.

For QEMU we could have fully specified these regions in terms
of a base PA + size, but as the absolute address does not matter
it is simpler to let individual platforms place the memory regions.

ExampleS:
-cxl-fixed-memory-window targets.0=cxl.0,size=128G
-cxl-fixed-memory-window targets.0=cxl.1,size=128G
-cxl-fixed-memory-window targets.0=cxl0,targets.1=cxl.1,size=256G,interleave-granularity=2k

Specifies
* 2x 128G regions not interleaved across root bridges, one for each of
  the root bridges with ids cxl.0 and cxl.1
* 256G region interleaved across root bridges with ids cxl.0 and cxl.1
with a 2k interleave granularity.

When system software enumerates the devices below a given root bridge
it can then decide which CFMW to use. If non interleave is desired
(or possible) it can use the appropriate CFMW for the root bridge in
question.  If there are suitable devices to interleave across the
two root bridges then it may use the 3rd CFMS.

A number of other designs were considered but the following constraints
made it hard to adapt existing QEMU approaches to this particular problem.
1) The size must be known before a specific architecture / board brings
   up it's PA memory map.  We need to set up an appropriate region.
2) Using links to the host bridges provides a clean command line interface
   but these links cannot be established until command line devices have
   been added.

Hence the two step process used here of first establishing the size,
interleave-ways and granularity + caching the ids of the host bridges
and then, once available finding the actual host bridges so they can
be used later to support interleave decoding.

[1] CXL 2.0 ECN: CEDT CFMWS & QTG DSM (computeexpresslink.org / specifications)

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Markus Armbruster <armbru@redhat.com> # QAPI Schema
Message-Id: <20220429144110.25167-28-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/component: Add utils for interleave parameter encoding/decoding

Both registers and the CFMWS entries in CDAT use simple encodings
for the number of interleave ways and the interleave granularity.
Introduce simple conversion functions to/from the unencoded
number / size. So far the iw decode has not been needed so is
it not implemented.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-27-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi/cxl: Create the CEDT (9.14.1)

The CXL Early Discovery Table is defined in the CXL 2.0 specification as
a way for the OS to get CXL specific information from the system
firmware.

CXL 2.0 specification adds an _HID, ACPI0016, for CXL capable host
bridges, with a _CID of PNP0A08 (PCIe host bridge). CXL aware software
is able to use this initiate the proper _OSC method, and get the _UID
which is referenced by the CEDT. Therefore the existence of an ACPI0016
device allows a CXL aware driver perform the necessary actions. For a
CXL capable OS, this works. For a CXL unaware OS, this works.

CEDT awaremess requires more. The motivation for ACPI0017 is to provide
the possibility of having a Linux CXL module that can work on a legacy
Linux kernel. Linux core PCI/ACPI which won't be built as a module,
will see the _CID of PNP0A08 and bind a driver to it. If we later loaded
a driver for ACPI0016, Linux won't be able to bind it to the hardware
because it has already bound the PNP0A08 driver. The ACPI0017 device is
an opportunity to have an object to bind a driver will be used by a
Linux driver to walk the CXL topology and do everything that we would
have preferred to do with ACPI0016.

There is another motivation for an ACPI0017 device which isn't
implemented here. An operating system needs an attach point for a
non-volatile region provider that understands cross-hostbridge
interleaving. Since QEMU emulation doesn't support interleaving yet,
this is more important on the OS side, for now.

As of CXL 2.0 spec, only 1 sub structure is defined, the CXL Host Bridge
Structure (CHBS) which is primarily useful for telling the OS exactly
where the MMIO for the host bridge is.

Link: https://lore.kernel.org/linux-cxl/20210115034911.nkgpzc756d6qmjpl@intel.com/T/#t
Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-26-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

acpi/cxl: Add _OSC implementation (9.14.2)

CXL 2.0 specification adds 2 new dwords to the existing _OSC definition
from PCIe. The new dwords are accessed with a new uuid. This
implementation supports what is in the specification.

iasl -d decodes the result of this patch as:

Name (SUPP, Zero)
Name (CTRL, Zero)
Name (SUPC, Zero)
Name (CTRC, Zero)
Method (_OSC, 4, NotSerialized)  // _OSC: Operating System Capabilities
{
    CreateDWordField (Arg3, Zero, CDW1)
    If (((Arg0 == ToUUID ("33db4d5b-1ff7-401c-9657-7441c03dd766") /* PCI Host Bridge Device */) || (Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") /* Unknown UUID */)))
    {
        CreateDWordField (Arg3, 0x04, CDW2)
        CreateDWordField (Arg3, 0x08, CDW3)
        Local0 = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
        Local0 &= 0x1F
        If ((Arg1 != One))
        {
            CDW1 |= 0x08
        }

        If ((CDW3 != Local0))
        {
            CDW1 |= 0x10
        }

        SUPP = CDW2 /* \_SB_.PC0C._OSC.CDW2 */
        CTRL = CDW3 /* \_SB_.PC0C._OSC.CDW3 */
        CDW3 = Local0
        If ((Arg0 == ToUUID ("68f2d50b-c469-4d8a-bd3d-941a103fd3fc") /* Unknown UUID */))
        {
            CreateDWordField (Arg3, 0x0C, CDW4)
            CreateDWordField (Arg3, 0x10, CDW5)
            SUPC = CDW4 /* \_SB_.PC0C._OSC.CDW4 */
            CTRC = CDW5 /* \_SB_.PC0C._OSC.CDW5 */
            CDW5 |= One
        }

        Return (Arg3)
    }
    Else
    {
        CDW1 |= 0x04
        Return (Arg3)
    }

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-25-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/component: Implement host bridge MMIO (8.2.5, table 142)

CXL host bridges themselves may have MMIO. Since host bridges don't have
a BAR they are treated as special for MMIO. This patch includes
i386/pc support.
Also hook up the device reset now that we have have the MMIO
space in which the results are visible.

Note that we duplicate the PCI express case for the aml_build but
the implementations will diverge when the CXL specific _OSC is
introduced.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-24-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qtests/cxl: Add initial root port and CXL type3 tests

At this stage we can boot configurations with host bridges,
root ports and type 3 memory devices, so add appropriate
tests.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-23-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Implement get/set Label Storage Area (LSA)

Implement get and set handlers for the Label Storage Area
used to hold data describing persistent memory configuration
so that it can be ensured it is seen in the same configuration
after reboot.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-22-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Plumb real Label Storage Area (LSA) sizing

This should introduce no change. Subsequent work will make use of this
new class member.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-21-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Add some trivial commands

GET_FW_INFO and GET_PARTITION_INFO, for this emulation, is equivalent to
info already returned in the IDENTIFY command. To have a more robust
implementation, add those.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-20-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Implement MMIO HDM decoding (8.2.5.12)

A device's volatile and persistent memory are known Host Defined Memory
(HDM) regions. The mechanism by which the device is programmed to claim
the addresses associated with those regions is through dedicated logic
known as the HDM decoder. In order to allow the OS to properly program
the HDMs, the HDM decoders must be modeled.

There are two ways the HDM decoders can be implemented, the legacy
mechanism is through the PCIe DVSEC programming from CXL 1.1 (8.1.3.8),
and MMIO is found in 8.2.5.12 of the spec. For now, 8.1.3.8 is not
implemented.

Much of CXL device logic is implemented in cxl-utils. The HDM decoder
however is implemented directly by the device implementation.
Whilst the implementation currently does no validity checks on the
encoder set up, future work will add sanity checking specific to
the type of cxl component.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Co-developed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-19-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Add a memory device (8.2.8.5)

A CXL memory device (AKA Type 3) is a CXL component that contains some
combination of volatile and persistent memory. It also implements the
previously defined mailbox interface as well as the memory device
firmware interface.

Although the memory device is configured like a normal PCIe device, the
memory traffic is on an entirely separate bus conceptually (using the
same physical wires as PCIe, but different protocol).

Once the CXL topology is fully configure and address decoders committed,
the guest physical address for the memory device is part of a larger
window which is owned by the platform. The creation of these windows
is later in this series.

The following example will create a 256M device in a 512M window:
-object "memory-backend-file,id=cxl-mem1,share,mem-path=cxl-type3,size=512M"
-device "cxl-type3,bus=rp0,memdev=cxl-mem1,id=cxl-pmem0"

Note: Dropped PCDIMM info interfaces for now. They can be added if
appropriate at a later date.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Message-Id: <20220429144110.25167-18-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/rp: Add a root port

This adds just enough of a root port implementation to be able to
enumerate root ports (creating the required DVSEC entries). What's not
here yet is the MMIO nor the ability to write some of the DVSEC entries.

This can be added with the qemu commandline by adding a rootport to a
specific CXL host bridge. For example:
-device cxl-rp,id=rp0,bus="cxl.0",addr=0.0,chassis=4

Like the host bridge patch, the ACPI tables aren't generated at this
point and so system software cannot use it.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-17-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

qtest/cxl: Introduce initial test for pxb-cxl only.

Initial test with just pxb-cxl. Other tests will be added
alongside functionality.

Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-16-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pxb: Allow creation of a CXL PXB (host bridge)

This works like adding a typical pxb device, except the name is
'pxb-cxl' instead of 'pxb-pcie'. An example command line would be as
follows:
-device pxb-cxl,id=cxl.0,bus="pcie.0",bus_nr=1

A CXL PXB is backward compatible with PCIe. What this means in practice
is that an operating system that is unaware of CXL should still be able
to enumerate this topology as if it were PCIe.

One can create multiple CXL PXB host bridges, but a host bridge can only
be connected to the main root bus. Host bridges cannot appear elsewhere
in the topology.

Note that as of this patch, the ACPI tables needed for the host bridge
(specifically, an ACPI object in _SB named ACPI0016 and the CEDT) aren't
created. So while this patch internally creates it, it cannot be
properly used by an operating system or other system software.

Also necessary is to add an exception to scripts/device-crash-test
similar to that for exiting pxb as both must created on a PCIexpress
host bus.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan.Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-15-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

cxl: Machine level control on whether CXL support is enabled

There are going to be some potential overheads to CXL enablement,
for example the host bridge region reserved in memory maps.
Add a machine level control so that CXL is disabled by default.

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-14-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pci/cxl: Create a CXL bus type

The easiest way to differentiate a CXL bus, and a PCIE bus is using a
flag. A CXL bus, in hardware, is backward compatible with PCIE, and
therefore the code tries pretty hard to keep them in sync as much as
possible.

The other way to implement this would be to try to cast the bus to the
correct type. This is less code and useful for debugging via simply
looking at the flags.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-13-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pxb: Use a type for realizing expanders

This opens up the possibility for more types of expanders (other than
PCI and PCIe). We'll need this to create a CXL expander.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-12-Jonathan.Cameron@huawei.com>

hw/cxl/device: Add log commands (8.2.9.4) + CEL

CXL specification provides for the ability to obtain logs from the
device. Logs are either spec defined, like the "Command Effects Log"
(CEL), or vendor specific. UUIDs are defined for all log types.

The CEL is a mechanism to provide information to the host about which
commands are supported. It is useful both to determine which spec'd
optional commands are supported, as well as provide a list of vendor
specified commands that might be used. The CEL is already created as
part of mailbox initialization, but here it is now exported to hosts
that use these log commands.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-11-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Timestamp implementation (8.2.9.3)

Errata F4 to CXL 2.0 clarified the meaning of the timer as the
sum of the value set with the timestamp set command and the number
of nano seconds since it was last set.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-10-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Add cheap EVENTS implementation (8.2.9.1)

Using the previously implemented stubbed helpers, it is now possible to
easily add the missing, required commands to the implementation.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-9-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Add memory device utilities

Memory devices implement extra capabilities on top of CXL devices. This
adds support for that.

A large part of memory devices is the mailbox/command interface. All of
the mailbox handling is done in the mailbox-utils library. Longer term,
new CXL devices that are being emulated may want to handle commands
differently, and therefore would need a mechanism to opt in/out of the
specific generic handlers. As such, this is considered sufficient for
now, but may need more depth in the future.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-8-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Implement basic mailbox (8.2.8.4)

This is the beginning of implementing mailbox support for CXL 2.0
devices. The implementation recognizes when the doorbell is rung,
handles the command/payload, clears the doorbell while returning error
codes and data.

Generally the mailbox mechanism is designed to permit communication
between the host OS and the firmware running on the device. For our
purposes, we emulate both the firmware, implemented primarily in
cxl-mailbox-utils.c, and the hardware.

No commands are implemented yet.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-7-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Implement the CAP array (8.2.8.1-2)

This implements all device MMIO up to the first capability. That
includes the CXL Device Capabilities Array Register, as well as all of
the CXL Device Capability Header Registers. The latter are filled in as
they are implemented in the following patches.

Endianness and alignment are managed by softmmu memory core.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-6-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/cxl/device: Introduce a CXL device (8.2.8)

A CXL device is a type of CXL component. Conceptually, a CXL device
would be a leaf node in a CXL topology. From an emulation perspective,
CXL devices are the most complex and so the actual implementation is
reserved for discrete commits.

This new device type is specifically catered towards the eventual
implementation of a Type3 CXL.mem device, 8.2.8.5 in the CXL 2.0
specification.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Message-Id: <20220429144110.25167-5-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

MAINTAINERS: Add entry for Compute Express Link Emulation

The CXL emulation will be jointly maintained by Ben Widawsky
and Jonathan Cameron. Broken out as a separate patch
to improve visibility.

Signed-off-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220429144110.25167-4-Jonathan.Cameron@huawei.com>

hw/cxl/component: Introduce CXL components (8.1.x, 8.2.5)

A CXL 2.0 component is any entity in the CXL topology. All components
have a analogous function in PCIe. Except for the CXL host bridge, all
have a PCIe config space that is accessible via the common PCIe
mechanisms. CXL components are enumerated via DVSEC fields in the
extended PCIe header space. CXL components will minimally implement some
subset of CXL.mem and CXL.cache registers defined in 8.2.5 of the CXL
2.0 specification. Two headers and a utility library are introduced to
support the minimum functionality needed to enumerate components.

The cxl_pci header manages bits associated with PCI, specifically the
DVSEC and related fields. The cxl_component.h variant has data
structures and APIs that are useful for drivers implementing any of the
CXL 2.0 components. The library takes care of making use of the DVSEC
bits and the CXL.[mem|cache] registers. Per spec, the registers are
little endian.

None of the mechanisms required to enumerate a CXL capable hostbridge
are introduced at this point.

Note that the CXL.mem and CXL.cache registers used are always 4B wide.
It's possible in the future that this constraint will not hold.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Message-Id: <20220429144110.25167-3-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

hw/pci/cxl: Add a CXL component type (interface)

A CXL component is a hardware entity that implements CXL component
registers from the CXL 2.0 spec (8.2.3). Currently these represent 3
general types.
1. Host Bridge
2. Ports (root, upstream, downstream)
3. Devices (memory, other)

A CXL component can be conceptually thought of as a PCIe device with
extra functionality when enumerated and enabled. For this reason, CXL
does here, and will continue to add on to existing PCI code paths.

Host bridges will typically need to be handled specially and so they can
implement this newly introduced interface or not. All other components
should implement this interface. Implementing this interface allows the
core PCI code to treat these devices as special where appropriate.

Signed-off-by: Ben Widawsky <ben.widawsky@intel.com>
Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Adam Manzanares <a.manzanares@samsung.com>
Message-Id: <20220429144110.25167-2-Jonathan.Cameron@huawei.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>

intel-iommu: correct the value used for error_setg_errno()

error_setg_errno() expects a normal errno value, not a negated
one, so we should use ENOTSUP instead of -ENOSUP.

Fixes: Coverity CID 1487174
Fixes: ("intel_iommu: support snoop control")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220401022824.9337-1-jasowang@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Peter Xu <peterx@redhat.com>

virtio: fix feature negotiation for ACCESS_PLATFORM

Unlike most virtio features ACCESS_PLATFORM is considered mandatory by
QEMU, i.e. the driver must accept it if offered by the device. The
virtio specification says that the driver SHOULD accept the
ACCESS_PLATFORM feature if offered, and that the device MAY fail to
operate if ACCESS_PLATFORM was offered but not negotiated.

While a SHOULD ain't exactly a MUST, we are certainly allowed to fail
the device when the driver fences ACCESS_PLATFORM. With commit
2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM") we already made the
decision to do so whenever the get_dma_as() callback is implemented (by
the bus), which in practice means for the entirety of virtio-pci.

That means, if the device needs to translate I/O addresses, then
ACCESS_PLATFORM is mandatory. The aforementioned commit tells us in the
commit message that this is for security reasons. More precisely if we
were to allow a less then trusted driver (e.g. an user-space driver, or
a nested guest) to make the device bypass the IOMMU by not negotiating
ACCESS_PLATFORM, then the guest kernel would have no ability to
control/police (by programming the IOMMU) what pieces of guest memory
the driver may manipulate using the device. Which would break security
assumptions within the guest.

If ACCESS_PLATFORM is offered not because we want the device to utilize
an IOMMU and do address translation, but because the device does not
have access to the entire guest RAM, and needs the driver to grant
access to the bits it needs access to (e.g. confidential guest support),
we still require the guest to have the corresponding logic and to accept
ACCESS_PLATFORM. If the driver does not accept ACCESS_PLATFORM, then
things are bound to go wrong, and we may see failures much less graceful
than failing the device because the driver didn't negotiate
ACCESS_PLATFORM.

So let us make ACCESS_PLATFORM mandatory for the driver regardless
of whether the get_dma_as() callback is implemented or not.

Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Fixes: 2943b53f68 ("virtio: force VIRTIO_F_IOMMU_PLATFORM")
Message-Id: <20220307112939.2780117-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>

Merge tag 'for-upstream' of https://gitlab.com/bonzini/qemu into staging

* small cleanups for pc-bios/optionrom Makefiles
* checkpatch: fix g_malloc check
* fix mremap() and RDMA detection
* confine igd-passthrough-isa-bridge to Xen-enabled builds
* cover PCI in arm-virt machine qtests
* add -M boot and -M mem compound properties
* bump SLIRP submodule
* support CFI with system libslirp (>= 4.7)
* clean up CoQueue wakeup functions
* fix vhost-vsock regression
* fix --disable-vnc compilation
* other minor bugfixes

# -----BEGIN PGP SIGNATURE-----
#
# iQFIBAABCAAyFiEE8TM4V0tmI4mGbHaCv/vSX3jHroMFAmJ8/KMUHHBib256aW5p
# QHJlZGhhdC5jb20ACgkQv/vSX3jHroNTTAf9Et1C8iZn+OlZi99wMEeMy8a4mIE5
# CpkBpFphhkBvt3AH7XNsCyL4Gea4QgsI7nOIEVUwvW7gPf85PiBUX8mjrIVg3x1k
# bmMEwMKSTYPmDieAnYBP9zCqZQXNYP8L8WxVs2jFY2GXZ2ZogODYFbvCY4yEEB72
# UR6uIvQRdpiB6BEj8UZ+5i+sDtb0zxqrjzUz8T/PJC9/2JSNgi+sAWWQoQT3PPU7
# R7z2nmEa1VeVLPP6mUHvJKhBltVXF+LyIjQHvo+Tp9tSqp9JwXfFBNQ5W/MFes2D
# skF47N7PdgKRH9Dp4r0j+MqBwoAq86+ao+MKsbQ1Gb91HhoCWt/MrVrVyg==
# =1E6P
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 May 2022 05:25:07 AM PDT
# gpg:                using RSA key F13338574B662389866C7682BFFBD25F78C7AE83
# gpg:                issuer "pbonzini@redhat.com"
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>" [undefined]
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* tag 'for-upstream' of https://gitlab.com/bonzini/qemu: (27 commits)
  vmxcap: add tertiary execution controls
  vl: make machine type deprecation a warning
  meson: link libpng independent of vnc
  vhost-backend: do not depend on CONFIG_VHOST_VSOCK
  coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_all
  coroutine-lock: introduce qemu_co_queue_enter_all
  coroutine-lock: qemu_co_queue_next is a coroutine-only qemu_co_enter_next
  net: slirp: allow CFI with libslirp >= 4.7
  net: slirp: add support for CFI-friendly timer API
  net: slirp: switch to slirp_new
  net: slirp: introduce a wrapper struct for QemuTimer
  slirp: bump submodule past 4.7 release
  machine: move more memory validation to Machine object
  machine: make memory-backend a link property
  machine: add mem compound property
  machine: add boot compound property
  machine: use QAPI struct for boot configuration
  tests/qtest/libqos: Add generic pci host bridge in arm-virt machine
  tests/qtest/libqos: Skip hotplug tests if pci root bus is not hotpluggable
  tests/qtest/libqos/pci: Introduce pio_limit
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'for-upstream' of git://repo.or.cz/qemu/kevin into staging

Block layer patches

- coroutine: Fix crashes due to too large pool batch size
- fdc: Prevent end-of-track overrun
- nbd: MULTI_CONN for shared writable exports
- iotests test runner improvements

# -----BEGIN PGP SIGNATURE-----
#
# iQJFBAABCAAvFiEE3D3rFZqa+V09dFb+fwmycsiPL9YFAmJ9KCkRHGt3b2xmQHJl
# ZGhhdC5jb20ACgkQfwmycsiPL9ZtSRAAmYDFBPqxfutpFXM7kIKwL6COXJC12MOx
# Tmu8cDiGB/jNChdi3kl6I5h5njzo3U0ZlL/Ign6EzHoeoXLAPSeUWmuRsARwsZ+A
# rL61gf6yrMjAo45FZuIS0GlMDk8BauRwPl9qPWeqQcrtOMYpxwZfyFGmcMpQgAOI
# MSC1I8p3FA7oJhGpKIHDPOjaZA97Lm2rLnDIwZ4f0YgssbybFBcFCXOQbhpsVhLy
# Tjp/L+qRUtna9xBsPHQvHZW0kITQbCQPdX+oVqqUmwzSvuHqfXKe1YppyPjBt/S0
# H7nxtx4HOgP0lP5Kea+wbIRAk9Da5uaOW8hlMWRLShEKv1iTUenQSKteBB6CD03t
# GD9ze1kGoR9b6szw795BXxZxcWii0cn359lIVHeKR/U8zDuz5w3zhyl0klK8xeJy
# nj+JErLwQ7BD8kNR+7WAfXTF3tk2dQao1AvsBjn087KjMiJ/Mg8HY4K2zrjBUrHL
# DLTyAIjzct3BWJDZ02fb5jb8pHmIP3JO6m9Zvjm7ibP65BqJOwIXUTFpbgnrOg45
# oFLDV4JgC4Hh4GEtdm+UhQE51A0VVW5pDaqWTdWkCcuk3QgxUdM3Wm3SW6pw1Gvb
# T0X0j5RgF/k3YrW576R/VIy6z4YPbzAtiG4O/zSlsujHoDcVNWnxApgSB/unaDh8
# LNkFPGEMeSs=
# =JmTm
# -----END PGP SIGNATURE-----
# gpg: Signature made Thu 12 May 2022 08:30:49 AM PDT
# gpg:                using RSA key DC3DEB159A9AF95D3D7456FE7F09B272C88F2FD6
# gpg:                issuer "kwolf@redhat.com"
# gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" [full]

* tag 'for-upstream' of git://repo.or.cz/qemu/kevin:
  qemu-iotests: inline common.config into common.rc
  nbd/server: Allow MULTI_CONN for shared writable exports
  qemu-nbd: Pass max connections to blockdev layer
  tests/qtest/fdc-test: Add a regression test for CVE-2021-3507
  hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)
  .gitlab-ci.d: export meson testlog.txt as an artifact
  tests/qemu-iotests: print intent to run a test in TAP mode
  iotests/testrunner: Flush after run_test()
  coroutine: Revert to constant batch size
  coroutine: Rename qemu_coroutine_inc/dec_pool_size()

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

qemu-iotests: inline common.config into common.rc

common.rc has some complicated logic to find the common.config that
dates back to xfstests and is completely unnecessary now. Just include
the contents of the file.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220505094723.732116-1-pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

vmxcap: add tertiary execution controls

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vl: make machine type deprecation a warning

error_report should generally be followed by a failure; if we can proceed
anyway, that is just a warning and should be communicated properly to
the user with warn_report.

Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220511175043.27327-1-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

nbd/server: Allow MULTI_CONN for shared writable exports

According to the NBD spec, a server that advertises
NBD_FLAG_CAN_MULTI_CONN promises that multiple client connections will
not see any cache inconsistencies: when properly separated by a single
flush, actions performed by one client will be visible to another
client, regardless of which client did the flush.

We always satisfy these conditions in qemu - even when we support
multiple clients, ALL clients go through a single point of reference
into the block layer, with no local caching.  The effect of one client
is instantly visible to the next client.  Even if our backend were a
network device, we argue that any multi-path caching effects that
would cause inconsistencies in back-to-back actions not seeing the
effect of previous actions would be a bug in that backend, and not the
fault of caching in qemu.  As such, it is safe to unconditionally
advertise CAN_MULTI_CONN for any qemu NBD server situation that
supports parallel clients.

Note, however, that we don't want to advertise CAN_MULTI_CONN when we
know that a second client cannot connect (for historical reasons,
qemu-nbd defaults to a single connection while nbd-server-add and QMP
commands default to unlimited connections; but we already have
existing means to let either style of NBD server creation alter those
defaults).  This is visible by no longer advertising MULTI_CONN for
'qemu-nbd -r' without -e, as in the iotest nbd-qemu-allocation.

The harder part of this patch is setting up an iotest to demonstrate
behavior of multiple NBD clients to a single server.  It might be
possible with parallel qemu-io processes, but I found it easier to do
in python with the help of libnbd, and help from Nir and Vladimir in
writing the test.

Signed-off-by: Eric Blake <eblake@redhat.com>
Suggested-by: Nir Soffer <nsoffer@redhat.com>
Suggested-by: Vladimir Sementsov-Ogievskiy <v.sementsov-og@mail.ru>
Message-Id: <20220512004924.417153-3-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

qemu-nbd: Pass max connections to blockdev layer

The next patch wants to adjust whether the NBD server code advertises
MULTI_CONN based on whether it is known if the server limits to
exactly one client. For a server started by QMP, this information is
obtained through nbd_server_start (which can support more than one
export); but for qemu-nbd (which supports exactly one export), it is
controlled only by the command-line option -e/--shared. Since we
already have a hook function used by qemu-nbd, it's easiest to just
alter its signature to fit our needs.

Signed-off-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220512004924.417153-2-eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qtest/fdc-test: Add a regression test for CVE-2021-3507

Add the reproducer from https://gitlab.com/qemu-project/qemu/-/issues/339

Without the previous commit, when running 'make check-qtest-i386'
with QEMU configured with '--enable-sanitizers' we get:

  ==4028352==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x619000062a00 at pc 0x5626d03c491a bp 0x7ffdb4199410 sp 0x7ffdb4198bc0
  READ of size 786432 at 0x619000062a00 thread T0
      #0 0x5626d03c4919 in __asan_memcpy (qemu-system-i386+0x1e65919)
      #1 0x5626d1c023cc in flatview_write_continue softmmu/physmem.c:2787:13
      #2 0x5626d1bf0c0f in flatview_write softmmu/physmem.c:2822:14
      #3 0x5626d1bf0798 in address_space_write softmmu/physmem.c:2914:18
      #4 0x5626d1bf0f37 in address_space_rw softmmu/physmem.c:2924:16
      #5 0x5626d1bf14c8 in cpu_physical_memory_rw softmmu/physmem.c:2933:5
      #6 0x5626d0bd5649 in cpu_physical_memory_write include/exec/cpu-common.h:82:5
      #7 0x5626d0bd0a07 in i8257_dma_write_memory hw/dma/i8257.c:452:9
      #8 0x5626d09f825d in fdctrl_transfer_handler hw/block/fdc.c:1616:13
      #9 0x5626d0a048b4 in fdctrl_start_transfer hw/block/fdc.c:1539:13
      #10 0x5626d09f4c3e in fdctrl_write_data hw/block/fdc.c:2266:13
      #11 0x5626d09f22f7 in fdctrl_write hw/block/fdc.c:829:9
      #12 0x5626d1c20bc5 in portio_write softmmu/ioport.c:207:17

  0x619000062a00 is located 0 bytes to the right of 512-byte region [0x619000062800,0x619000062a00)
  allocated by thread T0 here:
      #0 0x5626d03c66ec in posix_memalign (qemu-system-i386+0x1e676ec)
      #1 0x5626d2b988d4 in qemu_try_memalign util/oslib-posix.c:210:11
      #2 0x5626d2b98b0c in qemu_memalign util/oslib-posix.c:226:27
      #3 0x5626d09fbaf0 in fdctrl_realize_common hw/block/fdc.c:2341:20
      #4 0x5626d0a150ed in isabus_fdc_realize hw/block/fdc-isa.c:113:5
      #5 0x5626d2367935 in device_set_realized hw/core/qdev.c:531:13

  SUMMARY: AddressSanitizer: heap-buffer-overflow (qemu-system-i386+0x1e65919) in __asan_memcpy
  Shadow bytes around the buggy address:
    0x0c32800044f0: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004510: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004520: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
    0x0c3280004530: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  =>0x0c3280004540:[fa]fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004550: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004560: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004570: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004580: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
    0x0c3280004590: fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd fd
  Shadow byte legend (one shadow byte represents 8 application bytes):
    Addressable:           00
    Heap left redzone:       fa
    Freed heap region:       fd
  ==4028352==ABORTING

[ kwolf: Added snapshot=on to prevent write file lock failure ]

Reported-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Alexander Bulekov <alxndr@bu.edu>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

hw/block/fdc: Prevent end-of-track overrun (CVE-2021-3507)

Per the 82078 datasheet, if the end-of-track (EOT byte in
the FIFO) is more than the number of sectors per side, the
command is terminated unsuccessfully:

* 5.2.5 DATA TRANSFER TERMINATION

  The 82078 supports terminal count explicitly through
  the TC pin and implicitly through the underrun/over-
  run and end-of-track (EOT) functions. For full sector
  transfers, the EOT parameter can define the last
  sector to be transferred in a single or multisector
  transfer. If the last sector to be transferred is a par-
  tial sector, the host can stop transferring the data in
  mid-sector, and the 82078 will continue to complete
  the sector as if a hardware TC was received. The
  only difference between these implicit functions and
  TC is that they return "abnormal termination" result
  status. Such status indications can be ignored if they
  were expected.

* 6.1.3 READ TRACK

  This command terminates when the EOT specified
  number of sectors have been read. If the 82078
  does not find an I D Address Mark on the diskette
  after the second· occurrence of a pulse on the
  INDX# pin, then it sets the IC code in Status Regis-
  ter 0 to "01" (Abnormal termination), sets the MA bit
  in Status Register 1 to "1", and terminates the com-
  mand.

* 6.1.6 VERIFY

  Refer to Table 6-6 and Table 6-7 for information
  concerning the values of MT and EC versus SC and
  EOT value.

* Table 6·6. Result Phase Table

* Table 6-7. Verify Command Result Phase Table

Fix by aborting the transfer when EOT > # Sectors Per Side.

Cc: qemu-stable@nongnu.org
Cc: Hervé Poussineau <hpoussin@reactos.org>
Fixes: baca51faff0 ("floppy driver: disk geometry auto detect")
Reported-by: Alexander Bulekov <alxndr@bu.edu>
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/339
Signed-off-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-Id: <20211118115733.4038610-2-philmd@redhat.com>
Reviewed-by: Hanna Reitz <hreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

meson: link libpng independent of vnc

Currently png support is dependent on vnc for linking object file to
libpng. This commit makes the parameter independent of vnc as it breaks
system emulator with --disable-vnc unless --disable-png is added.

Fixes: 9a0a119a38 ("Added parameter to take screenshot with screendump as PNG", 2022-04-27)
Signed-off-by: Kshitij Suri <kshitij.suri@nutanix.com>
Message-Id: <20220510161932.228481-1-kshitij.suri@nutanix.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

vhost-backend: do not depend on CONFIG_VHOST_VSOCK

The vsock callbacks .vhost_vsock_set_guest_cid and
.vhost_vsock_set_running are the only ones to be conditional
on #ifdef CONFIG_VHOST_VSOCK. This is different from any other
device-dependent callbacks like .vhost_scsi_set_endpoint, and it
also broke when CONFIG_VHOST_VSOCK was changed to a per-target
symbol.

It would be possible to also use the CONFIG_DEVICES include, but
really there is no reason for most virtio files to be per-target
so just remove the #ifdef to fix the issue.

Reported-by: Dov Murik <dovmurik@linux.ibm.com>
Fixes: 9972ae314f ("build: move vhost-vsock configuration to Kconfig")
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

coroutine-lock: qemu_co_queue_restart_all is a coroutine-only qemu_co_enter_all

qemu_co_queue_restart_all is basically the same as qemu_co_enter_all
but without a QemuLockable argument. That's perfectly fine, but only as
long as the function is marked coroutine_fn. If used outside coroutine
context, qemu_co_queue_wait will attempt to take the lock and that
is just broken: if you are calling qemu_co_queue_restart_all outside
coroutine context, the lock is going to be a QemuMutex which cannot be
taken twice by the same thread.

The patch adds the marker to qemu_co_queue_restart_all and to its sole
non-coroutine_fn caller; it then reimplements the function in terms of
qemu_co_enter_all_impl, to remove duplicated code and to clarify that the
latter also works in coroutine context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220427130830.150180-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

coroutine-lock: introduce qemu_co_queue_enter_all

Because qemu_co_queue_restart_all does not release the lock, it should
be used only in coroutine context. Introduce a new function that,
like qemu_co_enter_next, does release the lock, and use it whenever
qemu_co_queue_restart_all was used outside coroutine context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220427130830.150180-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

coroutine-lock: qemu_co_queue_next is a coroutine-only qemu_co_enter_next

qemu_co_queue_next is basically the same as qemu_co_enter_next but
without a QemuLockable argument. That's perfectly fine, but only
as long as the function is marked coroutine_fn. If used outside
coroutine context, qemu_co_queue_wait will attempt to take the lock
and that is just broken: if you are calling qemu_co_queue_next outside
coroutine context, the lock is going to be a QemuMutex which cannot be
taken twice by the same thread.

The patch adds the marker and reimplements qemu_co_queue_next in terms of
qemu_co_enter_next_impl, to remove duplicated code and to clarify that the
latter also works in coroutine context.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-Id: <20220427130830.150180-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

net: slirp: allow CFI with libslirp >= 4.7

slirp 4.7 introduces a new CFI-friendly timer callback that does
not pass function pointers within libslirp as callbacks for timers.
Check the version number and, if it is new enough, allow using CFI
even with a system libslirp.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

net: slirp: add support for CFI-friendly timer API

libslirp 4.7 introduces a CFI-friendly version of the .timer_new callback.
The new callback replaces the function pointer with an enum; invoking the
callback is done with a new function slirp_handle_timer.

Support the new API so that CFI can be made compatible with using a system
libslirp.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

net: slirp: switch to slirp_new

Replace slirp_init with slirp_new, so that a more recent cfg.version
can be specified. The function appeared in version 4.1.0.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

net: slirp: introduce a wrapper struct for QemuTimer

This struct will be extended in the next few patches to support the
new slirp_handle_timer() call. For that we need to store an additional
"int" for each SLIRP timer, in addition to the cb_opaque.

Reviewed-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

slirp: bump submodule past 4.7 release

Version 4.7 of slirp provides a new timer API that works better with CFI,
together with several other improvements:

* Allow disabling the internal DHCP server !22
* Support Unix sockets in hostfwd !103
* IPv6 DNS proxying support !110
* bootp: add support for UEFI HTTP boot !111

and bugfixes.

The submodule update also includes 2 commits to fix warnings in the
Win32 build.

Reviewed-by: Marc-André Lureau <malureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

machine: move more memory validation to Machine object

This allows setting memory properties without going through
vl.c, and have them validated just the same.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-6-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

machine: make memory-backend a link property

Handle HostMemoryBackend creation and setting of ms->ram entirely in
machine_run_board_init.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-5-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

machine: add mem compound property

Make -m syntactic sugar for a compound property "-machine
mem.{size,max-size,slots}". The new property does not have
the magic conversion to megabytes of unsuffixed arguments,
and also does not understand that "0" means the default size
(you have to leave it out to get the default). This means
that we need to convert the QemuOpts by hand to a QDict.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-4-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

machine: add boot compound property

Make -boot syntactic sugar for a compound property "-machine boot.{order,menu,...}".
machine_boot_parse is replaced by the setter for the property.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-3-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

machine: use QAPI struct for boot configuration

As part of converting -boot to a property with a QAPI type, define
the struct and use it throughout QEMU to access boot configuration.
machine_boot_parse takes care of doing the QemuOpts->QAPI conversion by
hand, for now.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220414165300.555321-2-pbonzini@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

.gitlab-ci.d: export meson testlog.txt as an artifact

When running 'make check' we only get a summary of progress on the
console. Fortunately meson/ninja have saved the raw test output to a
logfile. Exposing this log will make it easier to debug failures that
happen in CI.

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220509124134.867431-3-berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qemu-iotests: print intent to run a test in TAP mode

When running I/O tests using TAP output mode, we get a single TAP test
with a sub-test reported for each I/O test that is run. The output looks
something like this:

1..123
ok qcow2 011
ok qcow2 012
ok qcow2 013
ok qcow2 217
...

If everything runs or fails normally this is fine, but periodically we
have been seeing the test harness abort early before all 123 tests have
been run, just leaving a fairly useless message like

TAP parsing error: Too few tests run (expected 123, got 107)

we have no idea which tests were running at the time the test harness
abruptly exited. This change causes us to print a message about our
intent to run each test, so we have a record of what is active at the
time the harness exits abnormally.

1..123
# running qcow2 011
ok qcow2 011
# running qcow2 012
ok qcow2 012
# running qcow2 013
ok qcow2 013
# running qcow2 217
ok qcow2 217
...

Signed-off-by: Daniel P. Berrangé <berrange@redhat.com>
Message-Id: <20220509124134.867431-2-berrange@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

iotests/testrunner: Flush after run_test()

When stdout is not a terminal, the buffer may not be flushed at each end
of line, so we should flush after each test is done.  This is especially
apparent when run by check-block, in two ways:

First, when running make check-block -jX with X > 1, progress indication
was missing, even though testrunner.py does theoretically print each
test's status once it has been run, even in multi-processing mode.
Flushing after each test restores this progress indication.

Second, sometimes make check-block failed altogether, with an error
message that "too few tests [were] run".  I presume that's because one
worker process in the job pool did not get to flush its stdout before
the main process exited, and so meson did not get to see that worker's
test results.  In any case, by flushing at the end of run_test(), the
problem has disappeared for me.

Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220506134215.10086-1-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

coroutine: Revert to constant batch size

Commit 4c41c69e changed the way the coroutine pool is sized because for
virtio-blk devices with a large queue size and heavy I/O, it was just
too small and caused coroutines to be deleted and reallocated soon
afterwards. The change made the size dynamic based on the number of
queues and the queue size of virtio-blk devices.

There are two important numbers here: Slightly simplified, when a
coroutine terminates, it is generally stored in the global release pool
up to a certain pool size, and if the pool is full, it is freed.
Conversely, when allocating a new coroutine, the coroutines in the
release pool are reused if the pool already has reached a certain
minimum size (the batch size), otherwise we allocate new coroutines.

The problem after commit 4c41c69e is that it not only increases the
maximum pool size (which is the intended effect), but also the batch
size for reusing coroutines (which is a bug). It means that in cases
with many devices and/or a large queue size (which defaults to the
number of vcpus for virtio-blk-pci), many thousand coroutines could be
sitting in the release pool without being reused.

This is not only a waste of memory and allocations, but it actually
makes the QEMU process likely to hit the vm.max_map_count limit on Linux
because each coroutine requires two mappings (its stack and the guard
page for the stack), causing it to abort() in qemu_alloc_stack() because
when the limit is hit, mprotect() starts to fail with ENOMEM.

In order to fix the problem, change the batch size back to 64 to avoid
uselessly accumulating coroutines in the release pool, but keep the
dynamic maximum pool size so that coroutines aren't freed too early
in heavy I/O scenarios.

Note that this fix doesn't strictly make it impossible to hit the limit,
but this would only happen if most of the coroutines are actually in use
at the same time, not just sitting in a pool. This is the same behaviour
as we already had before commit 4c41c69e. Fully preventing this would
require allowing qemu_coroutine_create() to return an error, but it
doesn't seem to be a scenario that people hit in practice.

Cc: qemu-stable@nongnu.org
Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=2079938
Fixes: 4c41c69e05fe28c0f95f8abd2ebf407e95a4f04b
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220510151020.105528-3-kwolf@redhat.com>
Tested-by: Hiroki Narukawa <hnarukaw@yahoo-corp.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

coroutine: Rename qemu_coroutine_inc/dec_pool_size()

It's true that these functions currently affect the batch size in which
coroutines are reused (i.e. moved from the global release pool to the
allocation pool of a specific thread), but this is a bug and will be
fixed in a separate patch.

In fact, the comment in the header file already just promises that it
influences the pool size, so reflect this in the name of the functions.
As a nice side effect, the shorter function name makes some line
wrapping unnecessary.

Cc: qemu-stable@nongnu.org
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20220510151020.105528-2-kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>

tests/qtest/libqos: Add generic pci host bridge in arm-virt machine

Up to now the virt-machine node contains a virtio-mmio node.
However no driver produces any PCI interface node. Hence, PCI
tests cannot be run with aarch64 binary.

Add a GPEX driver node that produces a pci interface node. This latter
then can be consumed by all the pci tests. One of the first motivation
was to be able to run the virtio-iommu-pci tests.

We still face an issue with pci hotplug tests as hotplug cannot happen
on the pcie root bus and require a generic root port. This will be
addressed later on.

We force cpu=max along with aarch64/virt machine as some PCI tests
require high MMIO regions to be available.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Message-Id: <20220504152025.1785704-4-eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests/qtest/libqos: Skip hotplug tests if pci root bus is not hotpluggable

ARM does not not support hotplug on pcie.0. Add a flag on the bus
which tells if devices can be hotplugged and skip hotplug tests
if the bus cannot be hotplugged. This is a temporary solution to
enable the other pci tests on aarch64.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Acked-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220504152025.1785704-3-eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

tests/qtest/libqos/pci: Introduce pio_limit

At the moment the IO space limit is hardcoded to
QPCI_PIO_LIMIT = 0x10000. When accesses are performed to a bar,
the base address of this latter is compared against the limit
to decide whether we perform an IO or a memory access.

On ARM, we cannot keep this PIO limit as the arm-virt machine
uses [0x3eff0000, 0x3f000000 ] for the IO space map and we
are mandated to allocate at 0x0.

Add a new flag in QPCIBar indicating whether it is an IO bar
or a memory bar. This flag is set on QPCIBar allocation and
provisionned based on the BAR configuration. Then the new flag
is used in access functions and in iomap() function.

Signed-off-by: Eric Auger <eric.auger@redhat.com>
Reviewed-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Message-Id: <20220504152025.1785704-2-eric.auger@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/xen/xen_pt: Resolve igd_passthrough_isa_bridge_create() indirection

Now that igd_passthrough_isa_bridge_create() is implemented within the
xen context it may use Xen* data types directly and become
xen_igd_passthrough_isa_bridge_create(). This resolves an indirection.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20220326165825.30794-3-shentey@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

hw/xen/xen_pt: Confine igd-passthrough-isa-bridge to XEN

igd-passthrough-isa-bridge is only requested in xen_pt but was
implemented in pc_piix.c. This caused xen_pt to dependend on i386/pc
which is hereby resolved.

Signed-off-by: Bernhard Beschow <shentey@gmail.com>
Acked-by: Anthony PERARD <anthony.perard@citrix.com>
Message-Id: <20220326165825.30794-2-shentey@gmail.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

meson: Make mremap() detecting works correctly

Without this (at least in Fedora 35) it don't detect mremap()
correctly.

Signed-off-by: Juan Quintela <quintela@redhat.com>
Message-Id: <20220502131119.2345-1-quintela@redhat.com>
[Also switch the LEGACY_RDMA_REG_MR test to cc.links, otherwise
Debian fails to build. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

checkpatch: fix g_malloc check

Use the string equality operator "eq", and ensure that $1 is defined by
using "(try|)" instead of "(try)?". The alternative "((?:try)?)" is
longer and less readable.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

target/i386: do not consult nonexistent host leaves

When cache_info_passthrough is requested, QEMU passes the host values
of the cache information CPUID leaves down to the guest.  However,
it blindly assumes that the CPUID leaf exists on the host, and this
cannot be guaranteed: for example, KVM has recently started to
synthesize AMD leaves up to 0x80000021 in order to provide accurate
CPU bug information to guests.

Querying a nonexistent host leaf fills the output arguments of
host_cpuid with data that (albeit deterministic) is nonsensical
as cache information, namely the data in the highest Intel CPUID
leaf.  If said highest leaf is not ECX-dependent, this can even
cause an infinite loop when kvm_arch_init_vcpu prepares the input
to KVM_SET_CPUID2.  The infinite loop is only terminated by an
abort() when the array gets full.

Reported-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

pc-bios/optionrom: compile with -Wno-array-bounds

Avoids the following bogus warning:

pvh_main.c: In function ‘pvh_load_kernel’:
pvh_main.c:101:42: warning: array subscript 0 is outside array bounds of ‘uint16_t[0]’ {aka ‘short unsigned int[]’} [-Warray-bounds]
101 | uint32_t ebda_paddr = ((uint32_t)*((uint16_t *)EBDA_BASE_ADDR)) << 4;
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Cc: qemu-stable@nongnu.org
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

pc-bios/optionrom: detect -fno-pie

Do not rely on the detection that was done in the configure script,
since in the future we may want to cross-compile this file.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>

Merge tag 'pull-misc-2022-05-11' of git://repo.or.cz/qemu/armbru into staging

Miscellaneous patches patches for 2022-05-11

# -----BEGIN PGP SIGNATURE-----
#
# iQJGBAABCAAwFiEENUvIs9frKmtoZ05fOHC0AOuRhlMFAmJ7zwISHGFybWJydUBy
# ZWRoYXQuY29tAAoJEDhwtADrkYZThuAQAJdSuj5fpY8EXxhuS3Rc8uHPrz6lP+nZ
# kwxKPOldwFdmkXRJ8qrjcc/BXxiJU3pxmSRvFZ8miCFMrb4Vd16sUzD6PeKb1jr8
# JsrvXcsaWn4f/p0v0WraamwSQeZUMjqsZPgZut93qfJoKmgTaxoZnR+ZDHFKoQJS
# qBrHL/5+RPxSugLa6IEpSQwy80jd0tMBaG/e8V+JxzgFM5jzOExwXtfUujzS92Lr
# NgapnbEZrpqErBC1xhpetQ8Q5I4r0kkLj4Exm/ClNtIM2GByJxI8x2DE+NJZNDnm
# g/tvVKUhEl6cOywQRajAJ/LrhUpVSkz6wsczv35rhRS+1FoCb+PRKr42SxZGI2rB
# tZLYt4ouoSGk2pYiudoIBKsIR1Svu7Cmg4YzOL9yvqF0BS3cRDvPgm3QFvoeErjL
# EML7b41zLdIkbvujsJ7HJqVL44QmMSu13PcLUtDvLh+ivpL9wIUQn3ji+rfsgqh+
# RYw4niJ9JO3N3/VwEhlymc9kRSTgZ6rdIWPrtQ5ACwTADAv30++opxAlksE6mo0m
# TYrqyTG2FHGOKm+5Q4Lyx1heHJDUAE3dlRIhGt8KqD6UKlpSfIVIUU2ztjZK4JQ5
# n85LOLZkE9ejbvbpnLX8hgKfouVKKYwFagc/ZA649cIXvC8YDxdOwvhjEVCxa+V5
# dQbpQsekXf9G
# =jOTx
# -----END PGP SIGNATURE-----
# gpg: Signature made Wed 11 May 2022 07:58:10 AM PDT
# gpg:                using RSA key 354BC8B3D7EB2A6B68674E5F3870B400EB918653
# gpg:                issuer "armbru@redhat.com"
# gpg: Good signature from "Markus Armbruster <armbru@redhat.com>" [undefined]
# gpg:                 aka "Markus Armbruster <armbru@pond.sub.org>" [undefined]
# gpg: WARNING: This key is not certified with a trusted signature!
# gpg:          There is no indication that the signature belongs to the owner.
# Primary key fingerprint: 354B C8B3 D7EB 2A6B 6867  4E5F 3870 B400 EB91 8653

* tag 'pull-misc-2022-05-11' of git://repo.or.cz/qemu/armbru:
  Clean up decorations and whitespace around header guards
  Normalize header guard symbol definition
  Clean up ill-advised or unusual header guards
  Clean up header guards that don't match their file name

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Clean up decorations and whitespace around header guards

Cleaned up with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-5-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Normalize header guard symbol definition

We commonly define the header guard symbol without an explicit value.
Normalize the exceptions.

Done with scripts/clean-header-guards.pl.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-4-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Clean up ill-advised or unusual header guards

Leading underscores are ill-advised because such identifiers are
reserved.  Trailing underscores are merely ugly.  Strip both.

Our header guards commonly end in _H.  Normalize the exceptions.

Macros should be ALL_CAPS.  Normalize the exception.

Done with scripts/clean-header-guards.pl.

include/hw/xen/interface/ and tools/virtiofsd/ left alone, because
these were imported from Xen and libfuse respectively.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-3-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

Clean up header guards that don't match their file name

Header guard symbols should match their file name to make guard
collisions less likely.

Cleaned up with scripts/clean-header-guards.pl, followed by some
renaming of new guard symbols picked by the script to better ones.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Message-Id: <20220506134911.2856099-2-armbru@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
[Change to generated file ebpf/rss.bpf.skeleton.h backed out]

Merge tag 'block-pull-request' of https://gitlab.com/stefanha/qemu into staging

Pull request

- Add new thread-pool-min/thread-pool-max parameters to control the thread pool
  used for async I/O.

- Fix virtio-scsi IOThread 100% CPU consumption QEMU 7.0 regression.

# -----BEGIN PGP SIGNATURE-----
#
# iQEzBAABCAAdFiEEhpWov9P5fNqsNXdanKSrs4Grc8gFAmJ5DqgACgkQnKSrs4Gr
# c8iAqAf/WEJzEso0Hu3UUYJi2lAXpLxWPjoNBlPdQlKIJ/I0zQIF0P7GeCifF+0l
# iMjgBv0ofyAuV47gaTJlVrAR75+hJ/IXNDhnu3UuvNWfVOqvksgw6kuHkMo9A2hC
# 4tIHEU9J8jbQSSdQTaZR8Zj4FX1/zcxMBAXT3YO3De6zo78RatBTuNP4dsZzt8bI
# Qs1a4A0p2ScNXK8EcF4QwAWfoxu9OPPzN52DBCNxcIcnn0SUab4NbDxzpRV4ZhDP
# 08WoafI5O+2Kb36QysJN01LqajHrClG/fozrPzBLq5aZUK3xewJGB1hEdGTLkkmz
# NJNBg5Ldszwj4PDZ1dFU3/03aigb3g==
# =t5eR
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 09 May 2022 05:52:56 AM PDT
# gpg:                using RSA key 8695A8BFD3F97CDAAC35775A9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>" [full]
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>" [full]

* tag 'block-pull-request' of https://gitlab.com/stefanha/qemu:
  virtio-scsi: move request-related items from .h to .c
  virtio-scsi: clean up virtio_scsi_handle_cmd_vq()
  virtio-scsi: clean up virtio_scsi_handle_ctrl_vq()
  virtio-scsi: clean up virtio_scsi_handle_event_vq()
  virtio-scsi: don't waste CPU polling the event virtqueue
  virtio-scsi: fix ctrl and event handler functions in dataplane mode
  util/event-loop-base: Introduce options to set the thread pool size
  util/main-loop: Introduce the main loop into QOM
  Introduce event-loop-base abstract class

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Merge tag 'pull-target-arm-20220509' of https://git.linaro.org/people/pmaydell/qemu-arm into staging

target-arm queue:
* MAINTAINERS/.mailmap: update email for Leif Lindholm
* hw/arm: add version information to sbsa-ref machine DT
* Enable new features for -cpu max:
   FEAT_Debugv8p2, FEAT_Debugv8p4, FEAT_RAS (minimal version only),
   FEAT_IESB, FEAT_CSV2, FEAT_CSV2_2, FEAT_CSV3, FEAT_DGH
* Emulate Cortex-A76
* Emulate Neoverse-N1
* Fix the virt board default NUMA topology

# -----BEGIN PGP SIGNATURE-----
#
# iQJNBAABCAA3FiEE4aXFk81BneKOgxXPPCUl7RQ2DN4FAmJ5AbsZHHBldGVyLm1h
# eWRlbGxAbGluYXJvLm9yZwAKCRA8JSXtFDYM3vyFEACZZ6tRVJYB6YpIzI7rho9x
# hVQIMTc4D5lmVetJnbLdLazifIy60oIOtSKV3Y3oj5DLMcsf6NITrPaFPWNRX3Nm
# mcbTCT5FGj8i7b1CkpEylLwvRQbIaoz2GnJPckdYelxxAq1uJNog3fmoG8nVtJ1F
# HfXVCVkZGQyiyr6Y2/zn3vpdp9n6/4RymN8ugizkcgIRII87DKV+DNDalw613JG4
# 5xxBOGkYzo5DZM8TgL8Ylmb5Jy9XY0EN1xpkyHFOg6gi0B3UZTxHq5SvK6NFoZLJ
# ogyhmMh6IjEfhUIDCtWG9VCoPyWpOXAFoh7D7akFVB4g2SIvBvcuGzFxCAsh5q3K
# s+9CgNX1SZpJQkT1jLjQlNzoUhh8lNc7QvhPWVrbAj3scc+1xVnS5MJsokEV21Cx
# /bp3mFwCL+Q4gjsMKx1nKSvxLv8xlxRtIilmlfj+wvpkenIfIwHYjbvItJTlAy1L
# +arx8fqImNQorxO6oMjOuAlSbNnDKup5qvwGghyu/qz/YEnGQVzN6gI324Km081L
# 1u31H/B3C2rj3qMsYMp5yOqgprXi1D5c6wfYIpLD/C4UfHgIlRiprawZPDM7fAhX
# vxhUhhj3e9OgkbC9yqd6SUR2Uk3YaQlp319LyoZa3VKSvjBTciFsMXXnIV1UitYp
# BGtz8+FypPVkYH7zQB9c7Q==
# =ey1m
# -----END PGP SIGNATURE-----
# gpg: Signature made Mon 09 May 2022 04:57:47 AM PDT
# gpg:                using RSA key E1A5C593CD419DE28E8315CF3C2525ED14360CDE
# gpg:                issuer "peter.maydell@linaro.org"
# gpg: Good signature from "Peter Maydell <peter.maydell@linaro.org>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@gmail.com>" [full]
# gpg:                 aka "Peter Maydell <pmaydell@chiark.greenend.org.uk>" [full]

* tag 'pull-target-arm-20220509' of https://git.linaro.org/people/pmaydell/qemu-arm: (32 commits)
  hw/acpi/aml-build: Use existing CPU topology to build PPTT table
  hw/arm/virt: Fix CPU's default NUMA node ID
  qtest/numa-test: Correct CPU and NUMA association in aarch64_numa_cpu()
  hw/arm/virt: Consider SMP configuration in CPU topology
  qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()
  qapi/machine.json: Add cluster-id
  hw/arm: add versioning to sbsa-ref machine DT
  target/arm: Define neoverse-n1
  target/arm: Define cortex-a76
  target/arm: Enable FEAT_DGH for -cpu max
  target/arm: Enable FEAT_CSV3 for -cpu max
  target/arm: Enable FEAT_CSV2_2 for -cpu max
  target/arm: Enable FEAT_CSV2 for -cpu max
  target/arm: Enable FEAT_IESB for -cpu max
  target/arm: Enable FEAT_RAS for -cpu max
  target/arm: Implement ESB instruction
  target/arm: Implement virtual SError exceptions
  target/arm: Enable SCR and HCR bits for RAS
  target/arm: Add minimal RAS registers
  target/arm: Enable FEAT_Debugv8p4 for -cpu max
  ...

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

hw/acpi/aml-build: Use existing CPU topology to build PPTT table

When the PPTT table is built, the CPU topology is re-calculated, but
it's unecessary because the CPU topology has been populated in
virt_possible_cpu_arch_ids() on arm/virt machine.

This reworks build_pptt() to avoid by reusing the existing IDs in
ms->possible_cpus. Currently, the only user of build_pptt() is
arm/virt machine.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Tested-by: Yanan Wang <wangyanan55@huawei.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Message-id: 20220503140304.855514-7-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: Fix CPU's default NUMA node ID

When CPU-to-NUMA association isn't explicitly provided by users,
the default one is given by mc->get_default_cpu_node_id(). However,
the CPU topology isn't fully considered in the default association
and this causes CPU topology broken warnings on booting Linux guest.

For example, the following warning messages are observed when the
Linux guest is booted with the following command lines.

  /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
  -accel kvm -machine virt,gic-version=host               \
  -cpu host                                               \
  -smp 6,sockets=2,cores=3,threads=1                      \
  -m 1024M,slots=16,maxmem=64G                            \
  -object memory-backend-ram,id=mem0,size=128M            \
  -object memory-backend-ram,id=mem1,size=128M            \
  -object memory-backend-ram,id=mem2,size=128M            \
  -object memory-backend-ram,id=mem3,size=128M            \
  -object memory-backend-ram,id=mem4,size=128M            \
  -object memory-backend-ram,id=mem4,size=384M            \
  -numa node,nodeid=0,memdev=mem0                         \
  -numa node,nodeid=1,memdev=mem1                         \
  -numa node,nodeid=2,memdev=mem2                         \
  -numa node,nodeid=3,memdev=mem3                         \
  -numa node,nodeid=4,memdev=mem4                         \
  -numa node,nodeid=5,memdev=mem5
         :
  alternatives: patching kernel code
  BUG: arch topology borken
  the CLS domain not a subset of the MC domain
  <the above error log repeats>
  BUG: arch topology borken
  the DIE domain not a subset of the NODE domain

With current implementation of mc->get_default_cpu_node_id(),
CPU#0 to CPU#5 are associated with NODE#0 to NODE#5 separately.
That's incorrect because CPU#0/1/2 should be associated with same
NUMA node because they're seated in same socket.

This fixes the issue by considering the socket ID when the default
CPU-to-NUMA association is provided in virt_possible_cpu_arch_ids().
With this applied, no more CPU topology broken warnings are seen
from the Linux guest. The 6 CPUs are associated with NODE#0/1, but
there are no CPUs associated with NODE#2/3/4/5.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Message-id: 20220503140304.855514-6-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qtest/numa-test: Correct CPU and NUMA association in aarch64_numa_cpu()

In aarch64_numa_cpu(), the CPU and NUMA association is something
like below. Two threads in the same core/cluster/socket are
associated with two individual NUMA nodes, which is unreal as
Igor Mammedov mentioned. We don't expect the association to break
NUMA-to-socket boundary, which matches with the real world.

    NUMA-node  socket  cluster   core   thread
    ------------------------------------------
        0       0        0        0      0
        1       0        0        0      1

This corrects the topology for CPUs and their association with
NUMA nodes. After this patch is applied, the CPU and NUMA
association becomes something like below, which looks real.
Besides, socket/cluster/core/thread IDs are all checked when
the NUMA node IDs are verified. It helps to check if the CPU
topology is properly populated or not.

    NUMA-node  socket  cluster   core   thread
    ------------------------------------------
       0        1        0        0       0
       1        0        0        0       0

Suggested-by: Igor Mammedov <imammedo@redhat.com>
Signed-off-by: Gavin Shan <gshan@redhat.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20220503140304.855514-5-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm/virt: Consider SMP configuration in CPU topology

Currently, the SMP configuration isn't considered when the CPU
topology is populated. In this case, it's impossible to provide
the default CPU-to-NUMA mapping or association based on the socket
ID of the given CPU.

This takes account of SMP configuration when the CPU topology
is populated. The die ID for the given CPU isn't assigned since
it's not supported on arm/virt machine. Besides, the used SMP
configuration in qtest/numa-test/aarch64_numa_cpu() is corrcted
to avoid testing failure

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20220503140304.855514-4-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qtest/numa-test: Specify CPU topology in aarch64_numa_cpu()

The CPU topology isn't enabled on arm/virt machine yet, but we're
going to do it in next patch. After the CPU topology is enabled by
next patch, "thread-id=1" becomes invalid because the CPU core is
preferred on arm/virt machine. It means these two CPUs have 0/1
as their core IDs, but their thread IDs are all 0. It will trigger
test failure as the following message indicates:

  [14/21 qemu:qtest+qtest-aarch64 / qtest-aarch64/numa-test  ERROR
  1.48s   killed by signal 6 SIGABRT
  >>> G_TEST_DBUS_DAEMON=/home/gavin/sandbox/qemu.main/tests/dbus-vmstate-daemon.sh \
      QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon         \
      QTEST_QEMU_BINARY=./qemu-system-aarch64                                       \
      QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=83                                  \
      /home/gavin/sandbox/qemu.main/build/tests/qtest/numa-test --tap -k
  ――――――――――――――――――――――――――――――――――――――――――――――
  stderr:
  qemu-system-aarch64: -numa cpu,node-id=0,thread-id=1: no match found

This fixes the issue by providing comprehensive SMP configurations
in aarch64_numa_cpu(). The SMP configurations aren't used before
the CPU topology is enabled in next patch.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Message-id: 20220503140304.855514-3-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

qapi/machine.json: Add cluster-id

This adds cluster-id in CPU instance properties, which will be used
by arm/virt machine. Besides, the cluster-id is also verified or
dumped in various spots:

  * hw/core/machine.c::machine_set_cpu_numa_node() to associate
    CPU with its NUMA node.

  * hw/core/machine.c::machine_numa_finish_cpu_init() to record
    CPU slots with no NUMA mapping set.

  * hw/core/machine-hmp-cmds.c::hmp_hotpluggable_cpus() to dump
    cluster-id.

Signed-off-by: Gavin Shan <gshan@redhat.com>
Reviewed-by: Yanan Wang <wangyanan55@huawei.com>
Acked-by: Igor Mammedov <imammedo@redhat.com>
Message-id: 20220503140304.855514-2-gshan@redhat.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

hw/arm: add versioning to sbsa-ref machine DT

The sbsa-ref machine is continuously evolving. Some of the changes we
want to make in the near future, to align with real components (e.g.
the GIC-700), will break compatibility for existing firmware.

Introduce two new properties to the DT generated on machine generation:
- machine-version-major
  To be incremented when a platform change makes the machine
  incompatible with existing firmware.
- machine-version-minor
  To be incremented when functionality is added to the machine
  without causing incompatibility with existing firmware.
  to be reset to 0 when machine-version-major is incremented.

This versioning scheme is *neither*:
- A QEMU versioned machine type; a given version of QEMU will emulate
  a given version of the platform.
- A reflection of level of SBSA (now SystemReady SR) support provided.

The version will increment on guest-visible functional changes only,
akin to a revision ID register found on a physical platform.

These properties are both introduced with the value 0.
(Hence, a machine where the DT is lacking these nodes is equivalent
to version 0.0.)

Signed-off-by: Leif Lindholm <quic_llindhol@quicinc.com>
Message-id: 20220505113947.75714-1-quic_llindhol@quicinc.com
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: Radoslaw Biernacki <rad@semihalf.com>
Cc: Cédric Le Goater <clg@kaod.org>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Define neoverse-n1

Enable the n1 for virt and sbsa board use.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-25-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Define cortex-a76

Enable the a76 for virt and sbsa board use.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-24-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_DGH for -cpu max

This extension concerns not merging memory access, which TCG does
not implement. Thus we can trivially enable this feature.
Add a comment to handle_hint for the DGH instruction, but no code.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-23-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_CSV3 for -cpu max

This extension concerns cache speculation, which TCG does
not implement. Thus we can trivially enable this feature.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-22-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_CSV2_2 for -cpu max

There is no branch prediction in TCG, therefore there is no
need to actually include the context number into the predictor.
Therefore all we need to do is add the state for SCXTNUM_ELx.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-21-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_CSV2 for -cpu max

This extension concerns branch speculation, which TCG does
not implement. Thus we can trivially enable this feature.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-20-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_IESB for -cpu max

This feature is AArch64 only, and applies to physical SErrors,
which QEMU does not implement, thus the feature is a nop.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-19-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Enable FEAT_RAS for -cpu max

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-18-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>

target/arm: Implement ESB instruction

Check for and defer any pending virtual SError.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20220506180242.216785-17-richard.henderson@linaro.org
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>