Use dev_dbg() instead of dev_err()/dev_warn() to avoid flooding the
host with prints, when the guest driver is misbehaving.
In this way, prints can be dynamically enabled when the vDPA block
simulator is used to validate a driver.
Suggested-by: Jason Wang <jasowang@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220630153221.83371-2-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:39:02 +0000 (14:39 +0800)]
virtio_net: support set_ringparam
Support set_ringparam based on virtio queue reset.
Users can use ethtool -G eth0 <ring_num> to modify the ring size of
virtio-net.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-43-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:39:01 +0000 (14:39 +0800)]
virtio_net: support tx queue resize
This patch implements the resize function of the tx queues.
Based on this function, it is possible to modify the ring num of the
queue.
Inludes fixup:
virtio_net: fix for stuck when change tx ring size with dev down
When dev is set to DOWN state, napi has been disabled, if we modify the
ring size at this time, we should not call napi_disable() again, which
will cause stuck.
And all operations are under the protection of rtnl_lock, so there is no
need to consider concurrency issues.
Message-Id: <20220801063902.129329-42-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220811080258.79398-3-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:39:00 +0000 (14:39 +0800)]
virtio_net: support rx queue resize
This patch implements the resize function of the rx queues.
Based on this function, it is possible to modify the ring num of the
queue.
Includes fixup:
virtio_net: fix for stuck when change rx ring size with dev down
When dev is set to DOWN state, napi has been disabled, if we modify the
ring size at this time, we should not call napi_disable() again, which
will cause stuck.
And all operations are under the protection of rtnl_lock, so there is no
need to consider concurrency issues.
Message-Id: <20220801063902.129329-41-xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220811080258.79398-2-xuanzhuo@linux.alibaba.com> Reported-by: Kangjie Xu <kangjie.xu@linux.alibaba.com> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:59 +0000 (14:38 +0800)]
virtio_net: split free_unused_bufs()
This patch separates two functions for freeing sq buf and rq buf from
free_unused_bufs().
When supporting the enable/disable tx/rq queue in the future, it is
necessary to support separate recovery of a sq buf or a rq buf.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-40-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:58 +0000 (14:38 +0800)]
virtio_net: get ringparam by virtqueue_get_vring_max_size()
Use virtqueue_get_vring_max_size() in virtnet_get_ringparam() to set
tx,rx_max_pending.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-39-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Call virtnet_update_settings() once before calling init_vqs() to update
speed.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-38-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:56 +0000 (14:38 +0800)]
virtio: add helper virtio_find_vqs_ctx_size()
Introduce helper virtio_find_vqs_ctx_size() to call find_vqs and specify
the maximum size of each vq ring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-37-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:55 +0000 (14:38 +0800)]
virtio_mmio: support the arg sizes of find_vqs()
Virtio MMIO support the new parameter sizes of find_vqs().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-36-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:54 +0000 (14:38 +0800)]
virtio_pci: support the arg sizes of find_vqs()
Virtio PCI supports new parameter sizes of find_vqs().
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-35-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:53 +0000 (14:38 +0800)]
virtio: find_vqs() add arg sizes
find_vqs() adds a new parameter sizes to specify the size of each vq
vring.
NULL as sizes means that all queues in find_vqs() use the maximum size.
A value in the array is 0, which means that the corresponding queue uses
the maximum size.
In the split scenario, the meaning of size is the largest size, because
it may be limited by memory, the virtio core will try a smaller size.
And the size is power of 2.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-34-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:52 +0000 (14:38 +0800)]
virtio_pci: support VIRTIO_F_RING_RESET
This patch implements virtio pci support for QUEUE RESET.
Performing reset on a queue is divided into these steps:
1. notify the device to reset the queue
2. recycle the buffer submitted
3. reset the vring (may re-alloc)
4. mmap vring to device, and enable the queue
This patch implements virtio_reset_vq(), virtio_enable_resetq() in the
pci scenario.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-33-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:51 +0000 (14:38 +0800)]
virtio_pci: extract the logic of active vq for modern pci
Introduce vp_active_vq() to configure vring to backend after vq attach
vring. And configure vq vector if necessary.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-32-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-31-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-30-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:48 +0000 (14:38 +0800)]
virtio_ring: struct virtqueue introduce reset
Introduce a new member reset to the structure virtqueue to determine
whether the current vq is in the reset state. Subsequent patches will
use it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-29-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
This feature indicates that the driver can reset a queue individually.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-28-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:46 +0000 (14:38 +0800)]
virtio: allow to unbreak/break virtqueue individually
This patch allows the new introduced
__virtqueue_break()/__virtqueue_unbreak() to break/unbreak the
virtqueue.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-27-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Add queue_notify_data in struct virtio_pci_common_cfg, which comes from
here https://github.com/oasis-tcs/virtio-spec/issues/89
In order not to affect the API, add a dedicated structure struct
virtio_pci_modern_common_cfg to virtio_pci_modern.h.
Since I want to add queue_reset after queue_notify_data, I submitted
this patch first.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-26-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:44 +0000 (14:38 +0800)]
virtio_ring: introduce virtqueue_resize()
Introduce virtqueue_resize() to implement the resize of vring.
Based on these, the driver can dynamically adjust the size of the vring.
For example: ethtool -G.
virtqueue_resize() implements resize based on the vq reset function. In
case of failure to allocate a new vring, it will give up resize and use
the original vring.
During this process, if the re-enable reset vq fails, the vq can no
longer be used. Although the probability of this situation is not high.
The parameter recycle is used to recycle the buffer that is no longer
used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-25-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Only after the new vring is successfully allocated based on the new num,
we will release the old vring. In any case, an error is returned,
indicating that the vring still points to the old vring.
In the case of an error, re-initialize(by virtqueue_reinit_packed()) the
virtqueue to ensure that the vring can be used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-24-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.
Subsequent patches will call this function after reset vq to
reinitialize vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-23-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:41 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of attach vring
Separate the logic of attach vring, the subsequent patch will call it
separately.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-22-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:40 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.
This function completes the variable initialization of packed vring. It
together with the logic of atatch constitutes the initialization of
vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-21-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:39 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of alloc state and extra
Separate the logic for alloc desc_state and desc_extra, which will
be called separately by subsequent patches.
Use struct vring_packed to pass desc_state, desc_extra.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-20-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:38 +0000 (14:38 +0800)]
virtio_ring: packed: extract the logic of alloc queue
Separate the logic of packed to create vring queue.
This feature is required for subsequent virtuqueue reset vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-19-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:37 +0000 (14:38 +0800)]
virtio_ring: packed: introduce vring_free_packed
Free the structure struct vring_vritqueue_packed.
Subsequent patches require it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-18-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Only after the new vring is successfully allocated based on the new num,
we will release the old vring. In any case, an error is returned,
indicating that the vring still points to the old vring.
In the case of an error, re-initialize(virtqueue_reinit_split()) the
virtqueue to ensure that the vring can be used.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-17-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
In vring_alloc_queue_split() save vring_align, may_reduce_num to
structure vring_virtqueue_split. Used to create a new vring when
implementing resize.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-16-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Introduce a function to initialize vq without allocating new ring,
desc_state, desc_extra.
Subsequent patches will call this function after reset vq to
reinitialize vq.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-15-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:33 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of attach vring
Separate the logic of attach vring, subsequent patches will call it
separately.
virtqueue_vring_init_split() completes the initialization of other
variables of vring split. We can directly use
vq->split = *vring_split to complete attach.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-14-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:32 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of vring init
Separate the logic of initializing vring, and subsequent patches will
call it separately.
This function completes the variable initialization of split vring. It
together with the logic of atatch constitutes the initialization of
vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-13-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:31 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of alloc state and extra
Separate the logic of creating desc_state, desc_extra, and subsequent
patches will call it independently.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-12-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:30 +0000 (14:38 +0800)]
virtio_ring: split: extract the logic of alloc queue
Separate the logic of split to create vring queue.
This feature is required for subsequent virtuqueue reset vring.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-11-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:29 +0000 (14:38 +0800)]
virtio_ring: split: introduce vring_free_split()
Free the structure struct vring_vritqueue_split.
Subsequent patches require it.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-10-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
The purpose of this is to pass more information into
__vring_new_virtqueue() to make the code simpler and the structure
cleaner.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-9-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:27 +0000 (14:38 +0800)]
virtio_ring: split: stop __vring_new_virtqueue as export symbol
There is currently only one place to reference __vring_new_virtqueue()
directly from the outside of virtio core. And here vring_new_virtqueue()
can be used instead.
Subsequent patches will modify __vring_new_virtqueue, so stop it as an
export symbol for now.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-8-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:26 +0000 (14:38 +0800)]
virtio_ring: introduce virtqueue_init()
Separate the logic of virtqueue initialization. These variables should
be reset during reset.
This logic can be called independently when implementing resize/reset
later.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-7-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:25 +0000 (14:38 +0800)]
virtio_ring: split vring_virtqueue
Separate the two inline structures(split and packed) from the structure
vring_virtqueue.
In this way, we can use these two structures later to pass parameters
and retain temporary variables.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:24 +0000 (14:38 +0800)]
virtio_ring: extract the logic of freeing vring
Introduce vring_free() to free the vring of vq.
Subsequent patches will use vring_free() alone.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-5-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:23 +0000 (14:38 +0800)]
virtio_ring: update the document of the virtqueue_detach_unused_buf for queue reset
Added documentation for virtqueue_detach_unused_buf, allowing it to be
called on queue reset.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-4-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:22 +0000 (14:38 +0800)]
virtio: struct virtio_config_ops add callbacks for queue_reset
reset can be divided into the following four steps (example):
1. transport: notify the device to reset the queue
2. vring: recycle the buffer submitted
3. vring: reset/resize the vring (may re-alloc)
4. transport: mmap vring to device, and enable the queue
In order to support queue reset, add two callbacks in struct
virtio_config_ops to implement steps 1 and 4.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-3-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Mon, 1 Aug 2022 06:38:21 +0000 (14:38 +0800)]
virtio: record the maximum queue num supported by the device.
virtio-net can display the maximum (supported by hardware) ring size in
ethtool -g eth0.
When the subsequent patch implements vring reset, it can judge whether
the ring size passed by the driver is legal based on this.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220801063902.129329-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
drivers/virtio: Clarify CONFIG_VIRTIO_MEM for unsupported architectures
Let's make it clearer that simply unlocking CONFIG_VIRTIO_MEM on an
architecture is most probably not sufficient to have it working as
expected.
Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Gavin Shan <gshan@redhat.com> Signed-off-by: David Hildenbrand <david@redhat.com>
Message-Id: <20220610094737.65254-1-david@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Some systems want to set the interrupt of virtio_mmio device
as a wakeup source. On such systems, we'll use the existence
of the "wakeup-source" property as a signal of requirement.
Signed-off-by: Minghao Xue <quic_mingxue@quicinc.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Message-Id: <1654851507-13891-2-git-send-email-quic_mingxue@quicinc.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Robin Murphy [Wed, 8 Jun 2022 11:48:26 +0000 (12:48 +0100)]
vdpa: Use device_iommu_capable()
Use the new interface to check the capability for our device
specifically.
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Message-Id: <548e316fa282ce513fabb991a4c4d92258062eb5.1654688822.git.robin.murphy@arm.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
This option doesn't really work and breaks too many drivers.
Not yet sure what's the right thing to do, for now
let's make sure randconfig isn't broken by this.
Fixes: c346dae4f3fb ("virtio: disable notification hardening by default") Cc: "Jason Wang" <jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Acked-by: Jason Wang <jasowang@redhat.com>
Jason Wang [Tue, 28 Jun 2022 08:34:30 +0000 (16:34 +0800)]
virtio_pmem: set device ready in probe()
The NVDIMM region could be available before the virtio_device_ready()
that is called by virtio_dev_probe(). This means the driver tries to
use device before DRIVER_OK which violates the spec, fixing this by
set device ready before the nvdimm_pmem_region_create().
Note that this means the virtio_pmem_host_ack() could be triggered
before the creation of the nd region, this is safe since the pmem_lock
has been initialized and whether or not any available buffer is added
before is validated by virtio_pmem_host_ack().
Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver") Acked-by: Pankaj Gupta <pankaj.gupta@amd.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-2-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Jason Wang [Tue, 28 Jun 2022 08:34:29 +0000 (16:34 +0800)]
virtio_pmem: initialize provider_data through nd_region_desc
We used to initialize the provider_data manually after
nvdimm_pemm_region_create(). This seems to be racy if the flush is
issued before the initialization of provider_data[1]. Fixing this by
initializing the provider_data through nd_region_desc to make sure the
provider_data is ready after the pmem is created.
Fixes 6e84200c0a29 ("virtio-pmem: Add virtio pmem driver") Acked-by: Pankaj Gupta <pankaj.gupta@amd.com> Reviewed-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220628083430.61856-1-jasowang@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
vringh: iterate on iotlb_translate to handle large translations
iotlb_translate() can return -ENOBUFS if the bio_vec is not big enough
to contain all the ranges for translation.
This can happen for example if the VMM maps a large bounce buffer,
without using hugepages, that requires more than 16 ranges to translate
the addresses.
To handle this case, let's extend iotlb_translate() to also return the
number of bytes successfully translated.
In copy_from_iotlb()/copy_to_iotlb() loops by calling iotlb_translate()
several times until we complete the translation.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220624075656.13997-1-sgarzare@redhat.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Fri, 24 Jun 2022 02:55:45 +0000 (10:55 +0800)]
virtio_ring: remove the arg vq of vring_alloc_desc_extra()
The parameter vq of vring_alloc_desc_extra() is useless. This patch
removes this parameter.
Subsequent patches will call this function to avoid passing useless
arguments.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Acked-by: Jason Wang <jasowang@redhat.com>
Message-Id: <20220624025621.128843-6-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Xuan Zhuo [Fri, 24 Jun 2022 02:55:41 +0000 (10:55 +0800)]
remoteproc: rename len of rpoc_vring to num
Rename the member len in the structure rpoc_vring to num. And remove 'in
bytes' from the comment of it. This is misleading. Because this actually
refers to the size of the virtio vring to be created. The unit is not
bytes.
Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Message-Id: <20220624025621.128843-2-xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Merge tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Update the 'mitigations=' kernel param documentation
- Check the IBPB feature flag before enabling IBPB in firmware calls
because cloud vendors' fantasy when it comes to creating guest
configurations is unlimited
- Unexport sev_es_ghcb_hv_call() before 5.19 releases now that HyperV
doesn't need it anymore
- Remove dead CONFIG_* items
* tag 'x86_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
docs/kernel-parameters: Update descriptions for "mitigations=" param with retbleed
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Revert "x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV"
x86/configs: Update configs in x86_debug.config
Merge tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking fix from Borislav Petkov:
- Avoid rwsem lockups in certain situations when handling the handoff
bit
* tag 'locking_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
Merge tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras
Pull EDAC fixes from Borislav Petkov:
- Relax the condition under which the DIMM label in ghes_edac is set in
order to accomodate an HPE BIOS which sets only the device but not
the bank
- Two forgotten fixes to synopsys_edac when handling error interrupts
* tag 'edac_urgent_for_v5.19' of git://git.kernel.org/pub/scm/linux/kernel/git/ras/ras:
EDAC/ghes: Set the DIMM label unconditionally
EDAC/synopsys: Re-enable the error interrupts on v3 hw
EDAC/synopsys: Use the correct register to disable the error interrupt on v3 hw
Waiman Long [Wed, 22 Jun 2022 20:04:19 +0000 (16:04 -0400)]
locking/rwsem: Allow slowpath writer to ignore handoff bit if not set by first waiter
With commit d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
consistent"), the writer that sets the handoff bit can be interrupted
out without clearing the bit if the wait queue isn't empty. This disables
reader and writer optimistic lock spinning and stealing.
Now if a non-first writer in the queue is somehow woken up or a new
waiter enters the slowpath, it can't acquire the lock. This is not the
case before commit d257cc8cb8d5 as the writer that set the handoff bit
will clear it when exiting out via the out_nolock path. This is less
efficient as the busy rwsem stays in an unlock state for a longer time.
In some cases, this new behavior may cause lockups as shown in [1] and
[2].
This patch allows a non-first writer to ignore the handoff bit if it
is not originally set or initiated by the first waiter. This patch is
shown to be effective in fixing the lockup problem reported in [1].
Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent") Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: John Donnelly <john.p.donnelly@oracle.com> Tested-by: Mel Gorman <mgorman@techsingularity.net> Link: https://lore.kernel.org/r/20220622200419.778799-1-longman@redhat.com
Merge tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"Two hotfixes, both cc:stable"
* tag 'mm-hotfixes-stable-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
mm/hmm: fault non-owner device private entries
page_alloc: fix invalid watermark check on a negative value
Merge tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm
Pull more drm fixes from Dave Airlie:
"Maxime had the dog^Wmailing list server eat his homework^Wmisc pull
request.
Two more small fixes, one in nouveau svm code and the other in
simpledrm.
nouveau:
- page migration fix
simpledrm:
- fix mode_valid return value"
* tag 'drm-fixes-2022-07-30' of git://anongit.freedesktop.org/drm/drm:
nouveau/svm: Fix to migrate all requested pages
drm/simpledrm: Fix return type of simpledrm_simple_display_pipe_mode_valid()
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Four fixes, three in drivers.
The two biggest fixes are ufs and the remaining driver and core fix
are small and obvious (and the core fix is low risk)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: ufs: core: Fix a race condition related to device management
scsi: core: Fix warning in scsi_alloc_sgtables()
scsi: ufs: host: Hold reference returned by of_parse_phandle()
scsi: mpt3sas: Stop fw fault watchdog work item during system shutdown
Ralph Campbell [Mon, 25 Jul 2022 18:36:14 +0000 (11:36 -0700)]
mm/hmm: fault non-owner device private entries
If hmm_range_fault() is called with the HMM_PFN_REQ_FAULT flag and a
device private PTE is found, the hmm_range::dev_private_owner page is used
to determine if the device private page should not be faulted in.
However, if the device private page is not owned by the caller,
hmm_range_fault() returns an error instead of calling migrate_to_ram() to
fault in the page.
For example, if a page is migrated to GPU private memory and a RDMA fault
capable NIC tries to read the migrated page, without this patch it will
get an error. With this patch, the page will be migrated back to system
memory and the NIC will be able to read the data.
Jaewon Kim [Mon, 25 Jul 2022 09:52:12 +0000 (18:52 +0900)]
page_alloc: fix invalid watermark check on a negative value
There was a report that a task is waiting at the
throttle_direct_reclaim. The pgscan_direct_throttle in vmstat was
increasing.
This is a bug where zone_watermark_fast returns true even when the free
is very low. The commit f27ce0e14088 ("page_alloc: consider highatomic
reserve in watermark fast") changed the watermark fast to consider
highatomic reserve. But it did not handle a negative value case which
can be happened when reserved_highatomic pageblock is bigger than the
actual free.
If watermark is considered as ok for the negative value, allocating
contexts for order-0 will consume all free pages without direct reclaim,
and finally free page may become depleted except highatomic free.
Then allocating contexts may fall into throttle_direct_reclaim. This
symptom may easily happen in a system where wmark min is low and other
reclaimers like kswapd does not make free pages quickly.
Handle the negative case by using MIN.
Link: https://lkml.kernel.org/r/20220725095212.25388-1-jaewon31.kim@samsung.com Fixes: f27ce0e14088 ("page_alloc: consider highatomic reserve in watermark fast") Signed-off-by: Jaewon Kim <jaewon31.kim@samsung.com> Reported-by: GyeongHwan Hong <gh21.hong@samsung.com> Acked-by: Mel Gorman <mgorman@techsingularity.net> Cc: Minchan Kim <minchan@kernel.org> Cc: Baoquan He <bhe@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Yong-Taek Lee <ytk.lee@samsung.com> Cc: <stable@vger.kerenl.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Merge tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tools fixes from Arnaldo Carvalho de Melo:
- Fix addresses for bss symbols, describing variables used in resolving
data access in tools such as 'perf c2c' and 'perf mem'.
- Skip symbols if SHF_ALLOC flag is not set, a technique used for
listing deprecated symbols, its addresses are zeros, so not useful.
- Remove undefined behavior from bpf_perf_object__next() when dealing
with an empty bpf_objects_list list.
- Make a ARM CoreSight disasm script work with both python2 and
python3.
- Sync x86's cpufeatures header with with the kernel sources.
* tag 'perf-tools-fixes-for-v5.19-2022-07-29' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
perf bpf: Remove undefined behavior from bpf_perf_object__next()
perf symbol: Skip symbols if SHF_ALLOC flag is not set
perf symbol: Correct address for bss symbols
perf scripts python: Let script to be python2 compliant
tools headers cpufeatures: Sync with the kernel sources
Merge tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fix from Rafael Wysocki:
"Make some false positive RCU splats resulting from a recent intel_idle
driver change go away (Waiman Long)"
* tag 'pm-5.19-rc9' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
intel_idle: Fix false positive RCU splats due to incorrect hardirqs state
Lai Jiangshan [Fri, 29 Jul 2022 09:44:38 +0000 (17:44 +0800)]
workqueue: Avoid a false warning in unbind_workers()
Doing set_cpus_allowed_ptr() with wq_unbound_cpumask can be possible
fails and trigger the false warning.
Use cpu_possible_mask instead when wq_unbound_cpumask has no active CPUs.
It is very easy to trigger the warning:
Set wq_unbound_cpumask to a small set of CPUs.
Offline all the CPUs of wq_unbound_cpumask.
Offline an extra CPU and trigger the warning.
Fixes: 10a5a651e3af ("workqueue: Restrict kworker in the offline CPU pool running on housekeeping CPUs") Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Tejun Heo <tj@kernel.org>
Merge tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
- Fix cache size calculation, stack protection attributes, ptrace's
fpr_set and "ROM Size" in boardinfo
- Some cleanups and improvements of assembly
- Some cleanups of unused code and useless code
* tag 'loongarch-fixes-5.19-5' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: Fix wrong "ROM Size" of boardinfo
LoongArch: Fix missing fcsr in ptrace's fpr_set
LoongArch: Fix shared cache size calculation
LoongArch: Disable executable stack by default
LoongArch: Remove unused variables
LoongArch: Remove clock setting during cpu hotplug stage
LoongArch: Remove useless header compiler.h
LoongArch: Remove several syntactic sugar macros for branches
LoongArch: Re-tab the assembly files
LoongArch: Simplify "BGT foo, zero" with BGTZ
LoongArch: Simplify "BLT foo, zero" with BLTZ
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
LoongArch: Use the "move" pseudo-instruction where applicable
LoongArch: Use the "jr" pseudo-instruction where applicable
LoongArch: Use ABI names of registers where appropriate
Merge tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
- Re-enable the new amdgpu display engine for powerpc, as long as the
compiler is correctly configured.
- Disable stack variable initialisation in prom_init to fix GCC 12
allmodconfig.
Thanks to Dan Horák and Sudip Mukherjee.
* tag 'powerpc-5.19-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
drm/amdgpu: Re-enable DCN for 64-bit powerpc
powerpc/64s: Disable stack variable initialisation for prom_init
Tiezhu Yang [Thu, 21 Jul 2022 09:53:01 +0000 (17:53 +0800)]
LoongArch: Fix wrong "ROM Size" of boardinfo
We can see the "ROM Size" is different in the following outputs:
[root@linux loongson]# cat /sys/firmware/loongson/boardinfo
BIOS Information
Vendor : Loongson
Version : vUDK2018-LoongArch-V2.0.pre-beta8
ROM Size : 63 KB
Release Date : 06/15/2022
Board Information
Manufacturer : Loongson
Board Name : Loongson-LS3A5000-7A1000-1w-A2101
Family : LOONGSON64
[root@linux loongson]# dmidecode | head -11
...
Handle 0x0000, DMI type 0, 26 bytes
BIOS Information
Vendor: Loongson
Version: vUDK2018-LoongArch-V2.0.pre-beta8
Release Date: 06/15/2022
ROM Size: 4 MB
According to "BIOS Information (Type 0) structure" in the SMBIOS
Reference Specification [1], it shows 64K * (n+1) is the size of
the physical device containing the BIOS if the size is less than
16M.
Additionally, we can see the related code in dmidecode [2]:
u64 s = { .l = (code1 + 1) << 6 };
So the output of dmidecode is correct, the output of boardinfo
is wrong, fix it.
By the way, at present no need to consider the size is 16M or
greater on LoongArch, because it is usually 4M or 8M which is
enough to use.
Current calculation of shared cache size is from the node (die) scope,
but we hope 'lscpu' to show the shared cache size of the whole package
for multi-die chips (e.g., Loongson-3C5000L, which contains 4 dies in
one package). So fix it by multiplying nodes_per_package.
Bibo Mao [Wed, 20 Jul 2022 07:21:51 +0000 (15:21 +0800)]
LoongArch: Remove clock setting during cpu hotplug stage
On physical machine we can save power by disabling clock of hot removed
cpu. However as different platforms require different methods to
configure clocks, the code is platform-specific, and probably belongs to
firmware/pmu or cpu regulator, rather than generic arch/loongarch code.
Also, there is no such register on QEMU virt machine since the
clock/frequency regulation is not emulated.
This patch removes the hard-coded clock register accesses in generic
LoongArch cpu hotplug flow.
Reviewed-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Bibo Mao <maobibo@loongson.cn> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:22 +0000 (23:57 +0800)]
LoongArch: Re-tab the assembly files
Reflow the *.S files for better stylistic consistency, namely hard tabs
after mnemonic position, and vertical alignment of the first operand
with hard tabs. Tab width is obviously 8. Some pre-existing intra-block
vertical alignments are preserved.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:21 +0000 (23:57 +0800)]
LoongArch: Simplify "BGT foo, zero" with BGTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:20 +0000 (23:57 +0800)]
LoongArch: Simplify "BLT foo, zero" with BLTZ
Support for the syntactic sugar is present in upstream binutils port
from the beginning. Use it for shorter lines and better consistency.
Generated code should be identical.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:19 +0000 (23:57 +0800)]
LoongArch: Simplify "BEQ/BNE foo, zero" with BEQZ/BNEZ
While B{EQ,NE}Z and B{EQ,NE} are different instructions, and the vastly
expanded range for branch destination does not really matter in the few
cases touched, use the B{EQ,NE}Z where possible for shorter lines and
better consistency (e.g. some places used "BEQ foo, zero", while some
used "BEQ zero, foo").
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:18 +0000 (23:57 +0800)]
LoongArch: Use the "move" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:17 +0000 (23:57 +0800)]
LoongArch: Use the "jr" pseudo-instruction where applicable
Some of the assembly code in the LoongArch port likely originated
from a time when the assembler did not support pseudo-instructions like
"move" or "jr", so the desugared form was used and readability suffers
(to a minor degree) as a result.
As the upstream toolchain supports these pseudo-instructions from the
beginning, migrate the existing few usages to them for better
readability.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
WANG Xuerui [Tue, 26 Jul 2022 15:57:16 +0000 (23:57 +0800)]
LoongArch: Use ABI names of registers where appropriate
Some of the assembly in the LoongArch port seem to come from a
prehistoric time, when the assembler didn't even have support for the
ABI names we all come to know and love, thus used raw register numbers
which hampered readability.
The usages are found with a regex match inside arch/loongarch, then
manually adjusted for those non-definitions.
Signed-off-by: WANG Xuerui <git@xen0n.name> Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
When offset is larger than the size of the bit array, we should not
attempt to access the array as we can perform an access beyond the
end of the array. Fix this by changing the pre-condition.
Using "cmp r2, r1; bhs ..." covers us for the size == 0 case, since
this will always take the branch when r1 is zero, irrespective of
the value of r2. This means we can fix this bug without adding any
additional code!
Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
x86/bugs: Do not enable IBPB at firmware entry when IBPB is not available
Some cloud hypervisors do not provide IBPB on very recent CPU processors,
including AMD processors affected by Retbleed.
Using IBPB before firmware calls on such systems would cause a GPF at boot
like the one below. Do not enable such calls when IBPB support is not
present.
Users may request that pages from an OpenCL SVM allocation be migrated
to the GPU with clEnqueueSVMMigrateMem(). In Nouveau this will call into
nouveau_dmem_migrate_vma() to do the migration. If the total range to be
migrated exceeds SG_MAX_SINGLE_ALLOC the pages will be migrated in
chunks of size SG_MAX_SINGLE_ALLOC. However a typo in updating the
starting address means that only the first chunk will get migrated.
Fix the calculation so that the entire range will get migrated if
possible.
Signed-off-by: Alistair Popple <apopple@nvidia.com> Fixes: e3d8b0890469 ("drm/nouveau/svm: map pages after migration") Reviewed-by: Ralph Campbell <rcampbell@nvidia.com> Reviewed-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Lyude Paul <lyude@redhat.com> Link: https://patchwork.freedesktop.org/patch/msgid/20220720062745.960701-1-apopple@nvidia.com Cc: <stable@vger.kernel.org> # v5.8+
* tag 'net-5.19-final' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (67 commits)
stmmac: dwmac-mediatek: fix resource leak in probe
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
net: ping6: Fix memleak in ipv6_renew_options().
net/funeth: Fix fun_xdp_tx() and XDP packet reclaim
sctp: leave the err path free in sctp_stream_init to sctp_stream_free
sfc: disable softirqs for ptp TX
ptp: ocp: Select CRC16 in the Kconfig.
tcp: md5: fix IPv4-mapped support
virtio-net: fix the race between refill work and close
mptcp: Do not return EINPROGRESS when subflow creation succeeds
Bluetooth: L2CAP: Fix use-after-free caused by l2cap_chan_put
Bluetooth: Always set event mask on suspend
Bluetooth: mgmt: Fix double free on error path
wifi: mac80211: do not abuse fq.lock in ieee80211_do_stop()
ice: do not setup vlan for loopback VSI
ice: check (DD | EOF) bits on Rx descriptor rather than (EOP | RS)
ice: Fix VSIs unable to share unicast MAC
ice: Fix tunnel checksum offload with fragmented traffic
ice: Fix max VLANs available for VF
netfilter: nft_queue: only allow supported familes and hooks
...
Ziyang Xuan [Thu, 28 Jul 2022 01:33:07 +0000 (09:33 +0800)]
ipv6/addrconf: fix a null-ptr-deref bug for ip6_ptr
Change net device's MTU to smaller than IPV6_MIN_MTU or unregister
device while matching route. That may trigger null-ptr-deref bug
for ip6_ptr probability as following.
=========================================================
BUG: KASAN: null-ptr-deref in find_match.part.0+0x70/0x134
Read of size 4 at addr 0000000000000308 by task ping6/263
Reproducer as following:
Firstly, prepare conditions:
$ip netns add ns1
$ip netns add ns2
$ip link add veth1 type veth peer name veth2
$ip link set veth1 netns ns1
$ip link set veth2 netns ns2
$ip netns exec ns1 ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1
$ip netns exec ns2 ip -6 addr add 2001:0db8:0:f101::2/64 dev veth2
$ip netns exec ns1 ifconfig veth1 up
$ip netns exec ns2 ifconfig veth2 up
$ip netns exec ns1 ip -6 route add 2000::/64 dev veth1 metric 1
$ip netns exec ns2 ip -6 route add 2001::/64 dev veth2 metric 1
Secondly, execute the following two commands in two ssh windows
respectively:
$ip netns exec ns1 sh
$while true; do ip -6 addr add 2001:0db8:0:f101::1/64 dev veth1; ip -6 route add 2000::/64 dev veth1 metric 1; ping6 2000::2; done
$ip netns exec ns1 sh
$while true; do ip link set veth1 mtu 1000; ip link set veth1 mtu 1500; sleep 5; done
It is because ip6_ptr has been assigned to NULL in addrconf_ifdown() firstly,
then ip6_ignore_linkdown() accesses ip6_ptr directly without NULL check.
When we close ping6 sockets, some resources are left unfreed because
pingv6_prot is missing sk->sk_prot->destroy(). As reported by
syzbot [0], just three syscalls leak 96 bytes and easily cause OOM.
struct ipv6_sr_hdr *hdr;
char data[24] = {0};
int fd;
To fix memory leaks, let's add a destroy function.
Note the socket() syscall checks if the GID is within the range of
net.ipv4.ping_group_range. The default value is [1, 0] so that no
GID meets the condition (1 <= GID <= 0). Thus, the local DoS does
not succeed until we change the default value. However, at least
Ubuntu/Fedora/RHEL loosen it.
watch_queue: Fix missing locking in add_watch_to_object()
If a watch is being added to a queue, it needs to guard against
interference from addition of a new watch, manual removal of a watch and
removal of a watch due to some other queue being destroyed.
KEYCTL_WATCH_KEY guards against this for the same {key,queue} pair by
holding the key->sem writelocked and by holding refs on both the key and
the queue - but that doesn't prevent interaction from other {key,queue}
pairs.
While add_watch_to_object() does take the spinlock on the event queue,
it doesn't take the lock on the source's watch list. The assumption was
that the caller would prevent that (say by taking key->sem) - but that
doesn't prevent interference from the destruction of another queue.
Fix this by locking the watcher list in add_watch_to_object().
Fixes: c73be61cede5 ("pipe: Add general notification queue support") Reported-by: syzbot+03d7b43290037d1f87ca@syzkaller.appspotmail.com Signed-off-by: David Howells <dhowells@redhat.com>
cc: keyrings@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>