fw_devlink could only detect a single and simple cycle because it relied
mainly on device link cycle detection code that only checked for cycles
between devices. The expectation was that the firmware wouldn't have
complicated cycles and multiple cycles between devices. That expectation
has been proven to be wrong.
And this is without including parent child dependencies or nodes in the
cycle that are just firmware nodes that'll never have a struct device
created for them.
The proper way to treat these devices it to not force any probe ordering
between them, while still enforce dependencies between node in the
cycles (A, B and C) and their consumers.
So this patch goes all out and just deals with all types of cycles. It
does this by:
1. Following dependencies across device links, parent-child and fwnode
links.
2. When it find cycles, it mark the device links and fwnode links as
such instead of just deleting them or making the indistinguishable
from proxy SYNC_STATE_ONLY device links.
This way, when new nodes get added, we can immediately find and mark any
new cycles whether the new node is a device or firmware node.
Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-9-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
fw_devlink shouldn't defer the probe of a device to wait on a supplier
that'll never have a struct device or will never be probed by a driver.
We currently check if a supplier falls into this category, but don't
check its ancestors. We need to check the ancestors too because if the
ancestor will never probe, then the supplier will never probe either.
Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-3-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 3fb16866b51d ("driver core: fw_devlink: Make cycle detection more robust") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
To improve detection and handling of dependency cycles, we need to be
able to mark fwnode links as being part of cycles. fwnode links marked
as being part of a cycle should not block their consumers from probing.
Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-7-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When a device X is bound successfully to a driver, if it has a child
firmware node Y that doesn't have a struct device created by then, we
delete fwnode links where the child firmware node Y is the supplier. We
did this to avoid blocking the consumers of the child firmware node Y
from deferring probe indefinitely.
While that a step in the right direction, it's better to make the
consumers of the child firmware node Y to be consumers of the device X
because device X is probably implementing whatever functionality is
represented by child firmware node Y. By doing this, we capture the
device dependencies more accurately and ensure better
probe/suspend/resume ordering.
Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-2-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Stable-dep-of: 6a6dfdf8b3ff ("driver core: fw_devlink: Allow marking a fwnode link as being part of a cycle") Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
fw_devlink uses DL_FLAG_SYNC_STATE_ONLY device link flag for two
purposes:
1. To allow a parent device to proxy its child device's dependency on a
supplier so that the supplier doesn't get its sync_state() callback
before the child device/consumer can be added and probed. In this
usage scenario, we need to ignore cycles for ensure correctness of
sync_state() callbacks.
2. When there are dependency cycles in firmware, we don't know which of
those dependencies are valid. So, we have to ignore them all wrt
probe ordering while still making sure the sync_state() callbacks
come correctly.
However, when detecting dependency cycles, there can be multiple
dependency cycles between two devices that we need to detect. For
example:
A -> B -> A and A -> C -> B -> A.
To detect multiple cycles correct, we need to be able to differentiate
DL_FLAG_SYNC_STATE_ONLY device links used for (1) vs (2) above.
To allow this differentiation, add a DL_FLAG_CYCLE that can be use to
mark use case (2). We can then use the DL_FLAG_CYCLE to decide which
DL_FLAG_SYNC_STATE_ONLY device links to follow when looking for
dependency cycles.
Fixes: 2de9d8e0d2fe ("driver core: fw_devlink: Improve handling of cyclic dependencies") Signed-off-by: Saravana Kannan <saravanak@google.com> Tested-by: Colin Foster <colin.foster@in-advantage.com> Tested-by: Sudeep Holla <sudeep.holla@arm.com> Tested-by: Douglas Anderson <dianders@chromium.org> Tested-by: Geert Uytterhoeven <geert+renesas@glider.be> Tested-by: Luca Weiss <luca.weiss@fairphone.com> # qcom/sm7225-fairphone-fp4 Link: https://lore.kernel.org/r/20230207014207.1678715-6-saravanak@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
There maybe pending USR interrupt before requesting irq, however
uart_add_one_port has not executed, so there will be kernel panic:
[ 0.795668] Unable to handle kernel NULL pointer dereference at virtual addre
ss 0000000000000080
[ 0.802701] Mem abort info:
[ 0.805367] ESR = 0x0000000096000004
[ 0.808950] EC = 0x25: DABT (current EL), IL = 32 bits
[ 0.814033] SET = 0, FnV = 0
[ 0.816950] EA = 0, S1PTW = 0
[ 0.819950] FSC = 0x04: level 0 translation fault
[ 0.824617] Data abort info:
[ 0.827367] ISV = 0, ISS = 0x00000004
[ 0.831033] CM = 0, WnR = 0
[ 0.833866] [0000000000000080] user address but active_mm is swapper
[ 0.839951] Internal error: Oops: 0000000096000004 [#1] PREEMPT SMP
[ 0.845953] Modules linked in:
[ 0.848869] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.1.1+g56321e101aca #1
[ 0.855617] Hardware name: Freescale i.MX8MP EVK (DT)
[ 0.860452] pstate: 000000c5 (nzcv daIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[ 0.867117] pc : __imx_uart_rxint.constprop.0+0x11c/0x2c0
[ 0.872283] lr : imx_uart_int+0xf8/0x1ec
The issue only happends in the inmate linux when Jailhouse hypervisor
enabled. The test procedure is:
while true; do
jailhouse enable imx8mp.cell
jailhouse cell linux xxxx
sleep 10
jailhouse cell destroy 1
jailhouse disable
sleep 5
done
And during the upper test, press keys to the 2nd linux console.
When `jailhouse cell destroy 1`, the 2nd linux has no chance to put
the uart to a quiese state, so USR1/2 may has pending interrupts. Then
when `jailhosue cell linux xx` to start 2nd linux again, the issue
trigger.
In order to disable irqs before requesting them, both UCR1 and UCR2 irqs
should be disabled, so here fix that, disable the Ageing Timer interrupt
in UCR2 as UCR1 does.
Fixes: 8a61f0c70ae6 ("serial: imx: Disable irqs before requesting them") Suggested-by: Sherry Sun <sherry.sun@nxp.com> Reviewed-by: Sherry Sun <sherry.sun@nxp.com> Signed-off-by: Peng Fan <peng.fan@nxp.com> Acked-by: Jason Liu <jason.hui.liu@nxp.com> Link: https://lore.kernel.org/r/20230206013016.29352-1-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The irdma driver can use a maximum number of msix vectors equal
to num_online_cpus() + 1 and the kernel warning stack below is shown
if that number is exceeded.
The kernel throws a warning as the driver tries to update the affinity
hint with a CPU mask greater than the max CPU IDs. Fix this by capping
the MSIX vectors to num_online_cpus() + 1.
To update the I/O pins, the registers are read/modified/written. The
read operation incorrectly always read the first register. Although
wrong, there wasn't any impact as all the output pins are always
written, and the inputs are read only anyway.
Fixes: 2d53139f3162 ("Add support for using a MAX3421E chip as a host driver.") Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz> Link: https://lore.kernel.org/r/20230207033337.18112-1-mark.tomlinson@alliedtelesis.co.nz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If get_ep_from_tid() fails to lookup non-NULL value for ep, ep is
dereferenced later regardless of whether it is empty.
This patch adds a simple sanity check to fix the issue.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 944661dd97f4 ("RDMA/iw_cxgb4: atomically lookup ep and get a reference") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230202184850.29882-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The rn5t618 power driver fails to register
a cooling device because POWER_SUPPLY_PROP_CHARGE_CONTROL_LIMIT_MAX
is missing but availability is not checked before registering
cooling device. After improved error checking in the thermal
code, the registration of the power supply fails entirely.
Checking for availability of _MAX before registering cooling device
fixes the rn5t618 problem. But the whole logic feels questionable.
First, the logic is inverted here:
the code tells: max_current = max_cooling but
0 = max_cooling, so there needs to be some inversion
in the code which cannot be found. Comparing with other
cooling devices, it can be found that value for fan speed is not
inverted, value for cpufreq cooling is inverted (similar situation
as here lowest frequency = max cooling)
Second, analyzing usage of _MAX: it is seems that maximum capabilities
of charging controller are specified and not of the battery. Probably
there is not too much mismatch in the drivers actually implementing
that. So nothing has exploded yet. So there is no easy and safe way
to specifify a max cooling value now.
Conclusion for now (as a regression fix) just remove the cooling device
registration and do it properly later on.
Fixes: e49a1e1ee078 ("thermal/core: fix error code in __thermal_cooling_device_register()") Fixes: 952aeeb3ee28 ("power_supply: Register power supply for thermal cooling device") Signed-off-by: Andreas Kemnade <andreas@kemnade.info> Signed-off-by: Sebastian Reichel <sebastian.reichel@collabora.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Some devices like ZBT WE1326 and ZBT WF3526-P and some Netgear models need
to delay phy port initialization after calling the mt7621_pcie_init_port()
driver function to get into reliable boots for both warm and hard resets.
The delay required to detect the ports seems to be in the range [75-100]
milliseconds.
If the ports are not detected the controller is not functional.
There is no datasheet or something similar to really understand why this
extra delay is needed only for these devices and it is not for most of
the boards that are built on mt7621 SoC.
This issue has been reported by openWRT community and the complete
discussion is in [0]. The 100 milliseconds delay has been tested in all
devices to validate it.
Add the extra 100 milliseconds delay to fix the issue.
Remove the temporary @mask_, this may cause build warning when use clang
compiler for powerpc, but can't reproduce it when compile for arm64.
the build warning is caused by:
"warning: result of comparison of constant 18446744073709551615 with
expression of type (aka 'unsigned long') is always false
[-Wtautological-constant-out-of-range-compare]"
More information provided in below lore link.
After removing @mask_, there is a "CHECK:MACRO_ARG_REUSE" when run
checkpatch.pl, but due to @mask is constant, no reuse problem will happen.
During enumeration or composition switch,a userspace process
agnostic of the conventions of configs can try to create function
symlinks even after the UDC is bound to current config which is
not correct. Potentially it can create duplicates within the
current config.
Prevent this by adding a check if udc_name already exists, then bail
out of cfg_link.
both the lists corresponds to the same function instance ffs.a
but the usb_function* pointer is different because in step 3
ffs_alloc has created a new reference to usb_function* for
ffs.a and added it to cfg_list.
Step4:
Now a composition switch involving <ffs.b,ffs.a> is executed.
the composition switch will involve 3 things:
1. unlinking the previous functions existing
2. creating new symlinks
3. writing UDC
However, the composition switch is generally taken care by
userspace process which creates the symlinks in its own
nomenclature(X*) and removes only those.
So it won't be able to remove Y1 which user had created
by own.
Due to this the new symlinks cannot be created for ffs.a
since the entry already exists in CFG->FUNC_LIST.
The state of the CFG->FUNC_LIST is as follows:
CFG->FUNC_LIST: <ffs.a>
Fixes: 88af8bbe4ef7 ("usb: gadget: the start of the configfs interface") Signed-off-by: Krishna Kurapati PSSNV <quic_kriskura@quicinc.com> Signed-off-by: Udipto Goswami <quic_ugoswami@quicinc.com> Link: https://lore.kernel.org/r/20230201132308.31523-1-quic_ugoswami@quicinc.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
This function only calls mtk_otg_switch_init() when the ->port_mode
is MUSB_OTG so the clean up code should only call mtk_otg_switch_exit()
for that mode.
Fixes: 0990366bab3c ("usb: musb: Add support for MediaTek musb controller") Signed-off-by: Dan Carpenter <error27@gmail.com> Link: https://lore.kernel.org/r/Y8/3TqpqiSr0RxFH@kili Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
ip_dev_find() may return NULL and assign it to pdev which is
dereferenced later.
Fix this by checking the return value of ip_dev_find() for NULL
similar to the way it is done with other instances of said function.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: 1cab775c3e75 ("RDMA/cxgb4: Fix LE hash collision bug for passive open connection") Signed-off-by: Nikita Zhandarovich <n.zhandarovich@fintech.ru> Link: https://lore.kernel.org/r/20230201172103.17261-1-n.zhandarovich@fintech.ru Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The RXWATER value must be greater than 0 according to the LPUART
reference manual. And when the number of datawords in the receive
FIFO is greater than RXWATER, an interrupt or a DMA request is
generated, so no need to set the different value for lpuart interrupt
case and dma case. Here delete the wrong RXWATER setting for dma case
directly.
Fixes: 42b68768e51b ("serial: fsl_lpuart: DMA support for 32-bit variant") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20230130064449.9564-4-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Original busy loop with retries count in mpf_poll_status() is not too
reliable, as it takes different times on different systems. Replace it
with read_poll_timeout() macro.
While at it, fix polling stop condition to met function's original
intention declared in the comment. The issue with original polling stop
condition is that it stops if any of mask bits is set, while intention
was to stop if all mask bits is set. This was not noticible because only
MPF_STATUS_READY is passed as mask argument and it is BIT(1).
As spi-summary doc says:
> I/O buffers use the usual Linux rules, and must be DMA-safe.
> You'd normally allocate them from the heap or free page pool.
> Don't use the stack, or anything that's declared "static".
Replace spi_write() with spi_write_then_read(), which is dma-safe for
on-stack buffers. Use cacheline aligned buffers for transfers used in
spi_sync_transfer().
Although everything works OK with stack-located I/O buffers, better
follow the doc to be safe.
Interleaved DMA transfer support was added by 85e7518f42c8 ("dmaengine:
dw-edma: Add device_prep_interleave_dma() support"), but depending on the
selected channel, either source or destination address are left
uninitialized which was obviously wrong.
Initialize the destination address of the eDMA burst descriptors for
DEV_TO_MEM interleaved operations and the source address for MEM_TO_DEV
operations.
struct iommu_ioas_copy, struct iommu_option and struct iommu_vfio_ioas are
missed in ucmd_buffer. Although they are smaller than the size of
ucmd_buffer, it is safer to list them in ucmd_buffer explicitly.
Fixes: aad37e71d5c4 ("iommufd: IOCTLs for the io_pagetable") Fixes: d624d6652a65 ("iommufd: vfio container FD ioctl compatibility") Link: https://lore.kernel.org/r/20230120122040.280219-1-yi.l.liu@intel.com Signed-off-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The normal call sequence of using transport class is:
Add path:
transport_setup_device()
transport_setup_classdev() // call sas_host_setup() here
transport_add_device() // if fails, need call transport_destroy_device()
transport_configure_device()
Remove path:
transport_remove_device()
transport_remove_classdev // call sas_host_remove() here
transport_destroy_device()
If transport_add_device() fails, need call transport_destroy_device()
to free memory, but in this case, ->remove() is not called, and the
resources allocated in ->setup() are leaked. So fix these leaks by
calling ->remove() in transport_add_class_device() if it returns error.
Current some drivers(like iscsi) call transport_register_device()
failed, they don't call transport_destroy_device() to release the
memory allocated in transport_setup_device(), because they don't
know what was done, it should be internal thing to release the
resource in register function. So fix this leak by calling destroy
function inside register function.
When calling kobject_add() failed in device_add(), it will call
cleanup_glue_dir() to free resource. But in kobject_add(),
dev->kobj.parent has been set to NULL. This will cause resource leak.
The process is as follows:
device_add()
get_device_parent()
class_dir_create_and_add()
kobject_add() //kobject_get()
...
dev->kobj.parent = kobj;
...
kobject_add() //failed, but set dev->kobj.parent = NULL
...
glue_dir = get_glue_dir(dev) //glue_dir = NULL, and goto
//"Error" label
...
cleanup_glue_dir() //becaues glue_dir is NULL, not call
//kobject_put()
The preceding problem may cause insmod mac80211_hwsim.ko to failed.
sysfs: cannot create duplicate filename '/devices/virtual/mac80211_hwsim'
Call Trace:
<TASK>
dump_stack_lvl+0x8e/0xd1
sysfs_warn_dup.cold+0x1c/0x29
sysfs_create_dir_ns+0x224/0x280
kobject_add_internal+0x2aa/0x880
kobject_add+0x135/0x1a0
get_device_parent+0x3d7/0x590
device_add+0x2aa/0x1cb0
device_create_groups_vargs+0x1eb/0x260
device_create+0xdc/0x110
mac80211_hwsim_new_radio+0x31e/0x4790 [mac80211_hwsim]
init_mac80211_hwsim+0x48d/0x1000 [mac80211_hwsim]
do_one_initcall+0x10f/0x630
do_init_module+0x19f/0x5e0
load_module+0x64b7/0x6eb0
__do_sys_finit_module+0x140/0x200
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
</TASK>
kobject_add_internal failed for mac80211_hwsim with -EEXIST, don't try to
register things with the same name in the same directory.
Fixes: cebf8fd16900 ("driver core: fix race between creating/querying glue dir and its cleanup") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20221123012042.335252-1-shaozhengchao@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
GUID_INIT() is for internal guid_t type and shouldn't be used
for the uuid_le. I.o.w. relying on the implementation details
is layering violation. Use correct macros to initialize uuid_le.
Fixes: 64e9bbdd9588 ("misc/mei/hdcp: Client driver for HDCP application") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20221228160500.21220-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
GUID_INIT() is for internal guid_t type and shouldn't be used
for the uuid_le. I.o.w. relying on the implementation details
is layering violation. Use correct macros to initialize uuid_le.
Fixes: c2004ce99ed7 ("mei: pxp: export pavp client to me client bus") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Tomas Winkler <tomas.winkler@intel.com> Link: https://lore.kernel.org/r/20221228160558.21311-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The call to get_user_pages_fast() in vmci_host_setup_notify() can return
NULL context->notify_page causing a GPF. To avoid GPF check if
context->notify_page == NULL and return error if so.
general protection fault, probably for non-canonical address
0xe0009d1000000060: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: maybe wild-memory-access in range [0x0005088000000300-
0x0005088000000307]
CPU: 2 PID: 26180 Comm: repro_34802241 Not tainted 6.1.0-rc4 #1
Hardware name: Red Hat KVM, BIOS 1.15.0-2.module+el8.6.0 04/01/2014
RIP: 0010:vmci_ctx_check_signal_notify+0x91/0xe0
Call Trace:
<TASK>
vmci_host_unlocked_ioctl+0x362/0x1f40
__x64_sys_ioctl+0x1a1/0x230
do_syscall_64+0x3a/0x90
entry_SYSCALL_64_after_hwframe+0x63/0xcd
As comment of pci_get_class() says, it returns a pci_device with its
refcount increased and decreased the refcount for the input parameter
@from if it is not NULL.
If we break the loop in applicom_init() with 'dev' not NULL, we need to
call pci_dev_put() to decrease the refcount. Add the missing
pci_dev_put() to avoid refcount leak.
A problem about idt_89hpesx create debugfs failed is triggered with the
following log given:
[ 4973.269647] debugfs: Directory 'idt_csr' with parent '/' already present!
The reason is that idt_init() returns i2c_add_driver() directly without
checking its return value, if i2c_add_driver() failed, it returns without
destroy the newly created debugfs, resulting the debugfs of idt_csr can
never be created later.
The function monitor_card() is a timer handler that runs in an
atomic context, but it calls usleep_range() that can sleep.
As a result, the sleep-in-atomic-context bugs will happen.
The process is shown below:
Add the missing clk_disable_unprepare() before return from
tegra_uart_hw_init() in the error handling path.
When request_irq() fails in tegra_uart_startup(), 'tup->uart_clk'
has been enabled, fix it by adding clk_disable_unprepare().
Fixes: cc9ca4d95846 ("serial: tegra: Only print FIFO error message when an error occurs") Fixes: d781ec21bae6 ("serial: tegra: report clk rate errors") Signed-off-by: Yi Yang <yiyang13@huawei.com> Link: https://lore.kernel.org/r/20221126020852.113378-1-yiyang13@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
UARTBAUD_RDMAE and UARTBAUD_TDMAE are enabled in lpuart32_startup(), but
lpuart32_shutdown() not disable them, only free the dma ring buffer and
release the dma channels, so here disable the Rx/Tx DMA first in
lpuart32_shutdown().
Fixes: 42b68768e51b ("serial: fsl_lpuart: DMA support for 32-bit variant") Signed-off-by: Sherry Sun <sherry.sun@nxp.com> Link: https://lore.kernel.org/r/20221125101953.18753-3-sherry.sun@nxp.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The PTT device can only support the devices on the same PCIe core,
within BDF range [lower_bdf, upper_bdf]. It's not correct to assume
the devices on the root bus are from the same PCIe core, there are
cases that root ports from different PCIe core are sharing the same
bus. So check when initializing the filters list.
Fixes: ff0de066b463 ("hwtracing: hisi_ptt: Add trace function support for HiSilicon PCIe Tune and Trace device") Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230112112201.16283-1-yangyicong@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
8e4bfbe644a6 ("PCI: endpoint: pci-epf-vntb: fix error handle in
epf_ntb_mw_bar_init()") added a "num_mws" parameter to
epf_ntb_mw_bar_clear() but failed to add kernel-doc for num_mws.
Add kernel-doc for num_mws on epf_ntb_mw_bar_clear().
Fixes: 8e4bfbe644a6 ("PCI: endpoint: pci-epf-vntb: fix error handle in epf_ntb_mw_bar_init()") Link: https://lore.kernel.org/r/20230103024907.293853-1-yangyingliang@huawei.com Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
switchtec_dev_read() didn't handle copy_to_user() errors correctly: it
assigned "rc = -EFAULT", but actually returned either "size", -ENXIO, or
-EBADMSG instead.
Update the failure cases to unlock mrpc_mutex and return -EFAULT directly.
The sysfs link name "virtfn%u" constructed by pci_iov_sysfs_link() requires
17 bytes to contain the longest possible string. Increase VIRTFN_ID_LEN to
accommodate that.
Found by Linux Verification Center (linuxtesting.org) with SVACE.
[bhelgaas: commit log, comment at #define] Fixes: dd7cc44d0bce ("PCI: add SR-IOV API for Physical Function driver") Link: https://lore.kernel.org/r/20221218033347.23743-1-gremlin@altlinux.org Signed-off-by: Alexey V. Vissarionov <gremlin@altlinux.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When acpi_dev_get_memory_resources() fails, the reference count is
left bumped. Drop it as it's done in the other error paths.
Fixes: 43d596e32276 ("usb: typec: intel_pmc_mux: Check the port status before connect") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Heikki Krogerus <heikki.krogerus@linux.intel.com> Link: https://lore.kernel.org/r/20230102202933.15968-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In commit 6746eae4bbad ("coresight: cti: Fix hang in cti_disable_hw()")
PM runtime calls are removed from cti_enable_hw/cti_disable_hw. When
enabling CTI by writing enable sysfs node, clock for accessing CTI
register won't be enabled. Device will crash due to register access
issue. Add PM runtime call in enable_store to fix this issue.
Fixes: 6746eae4bbad ("coresight: cti: Fix hang in cti_disable_hw()") Signed-off-by: Mao Jinlong <quic_jinlmao@quicinc.com>
[Change to only call pm_runtime_put if a disable happened] Tested-by: Jinlong Mao <quic_jinlmao@quicinc.com> Signed-off-by: James Clark <james.clark@arm.com> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230110110736.2709917-3-james.clark@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Writing 0 to the enable control repeatedly results in a negative value
for enable_req_count. After this, writing 1 to the enable control
appears to not work until the count returns to positive.
Change it so that it's impossible for enable_req_count to be < 0.
Return an error to indicate that the disable request was invalid.
Fixes: 835d722ba10a ("coresight: cti: Initial CoreSight CTI Driver") Tested-by: Jinlong Mao <quic_jinlmao@quicinc.com> Signed-off-by: James Clark <james.clark@arm.com> Reviewed-by: Mike Leach <mike.leach@linaro.org> Signed-off-by: Suzuki K Poulose <suzuki.poulose@arm.com> Link: https://lore.kernel.org/r/20230110110736.2709917-2-james.clark@arm.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Move the control mapping to uvc_ctrl.c. This way we do not have
references to UVC controls or V4L2 controls in uvc_driver.c.
This also fixes a bug introduced in commit 382075604a68 ("media:
uvcvideo: Limit power line control for Quanta UVC Webcam"). The
offending commit caused the power line control menu entries to have
incorrect indices compared to the V4L2_CID_POWER_LINE_FREQUENCY_*
enumeration. Now that the limited mapping reuses the correct menu_info
array, the indices correctly map to the V4L2 control specification.
Fixes: 382075604a68 ("media: uvcvideo: Limit power line control for Quanta UVC Webcam") Signed-off-by: Ricardo Ribalda <ribalda@chromium.org> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Check for inactive controls in uvc_ctrl_is_accessible().
Use the new value for the master_id controls if present, otherwise
use the existing value to determine if it is OK to set the control.
Doing this here avoids attempting to set an inactive control, which
will return an error from the USB device, which returns an invalid
errorcode.
This fixes:
warn: v4l2-test-controls.cpp(483): s_ctrl returned EIO
 warn: v4l2-test-controls.cpp(483): s_ctrl returned EIO
test VIDIOC_G/S_CTRL: OK
 warn: v4l2-test-controls.cpp(739): s_ext_ctrls returned EIO
 warn: v4l2-test-controls.cpp(739): s_ext_ctrls returned EIO
 warn: v4l2-test-controls.cpp(816): s_ext_ctrls returned EIO
test VIDIOC_G/S/TRY_EXT_CTRLS: OK
Tested with:
v4l2-ctl -c auto_exposure=1
OK
v4l2-ctl -c exposure_time_absolute=251
OK
v4l2-ctl -c auto_exposure=3
OK
v4l2-ctl -c exposure_time_absolute=251
VIDIOC_S_EXT_CTRLS: failed: Input/output error
exposure_time_absolute: Input/output error
ERROR
v4l2-ctl -c auto_exposure=3,exposure_time_absolute=251,auto_exposure=1
v4l2-ctl -C auto_exposure,exposure_time_absolute Â
auto_exposure: 1
exposure_time_absolute: 251
Just memcmp() with ELFMAG - that's the normal way to do it in userland
code, which that thing is. Besides, that has the benefit of actually
building - str_has_prefix() is *NOT* present in <string.h>.
Fixes: 5f14596e55de "alpha: Replace strncmp with str_has_prefix" Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In kobject_get_path(), if kobj->name is changed between calls
get_kobj_path_length() and fill_kobj_path() and the length becomes
longer, then fill_kobj_path() will have an out-of-bounds bug.
The actual current problem occurs when the ixgbe probe.
In ixgbe_mii_bus_init(), if the length of netdev->dev.kobj.name
length becomes longer, out-of-bounds will occur.
cpu0 cpu1
ixgbe_probe
register_netdev(netdev)
netdev_register_kobject
device_add
kobject_uevent // Sending ADD events
systemd-udevd // rename netdev
dev_change_name
device_rename
kobject_rename
ixgbe_mii_bus_init |
mdiobus_register |
__mdiobus_register |
device_register |
device_add |
kobject_uevent |
kobject_get_path |
len = get_kobj_path_length // old name |
path = kzalloc(len, gfp_mask); |
kobj->name = name;
/* name length becomes
* longer
*/
fill_kobj_path /* kobj path length is
* longer than path,
* resulting in out of
* bounds when filling path
*/
This is the kasan report:
==================================================================
BUG: KASAN: slab-out-of-bounds in fill_kobj_path+0x50/0xc0
Write of size 7 at addr ff1100090573d1fd by task kworker/28:1/673
I got the following null-ptr-deref report while doing fault injection test:
BUG: kernel NULL pointer dereference, address: 0000000000000058
CPU: 2 PID: 278 Comm: 37-i2c-ds2482 Tainted: G B W N 6.1.0-rc3+
RIP: 0010:klist_put+0x2d/0xd0
Call Trace:
<TASK>
klist_remove+0xf1/0x1c0
device_release_driver_internal+0x196/0x210
bus_remove_device+0x1bd/0x240
device_add+0xd3d/0x1100
w1_add_master_device+0x476/0x490 [wire]
ds2482_probe+0x303/0x3e0 [ds2482]
This is how it happened:
w1_alloc_dev()
// The dev->driver is set to w1_master_driver.
memcpy(&dev->dev, device, sizeof(struct device));
device_add()
bus_add_device()
dpm_sysfs_add() // It fails, calls bus_remove_device.
// error path
bus_remove_device()
// The dev->driver is not null, but driver is not bound.
__device_release_driver()
klist_remove(&dev->p->knode_driver) <-- It causes null-ptr-deref.
// normal path
bus_probe_device() // It's not called yet.
device_bind_driver()
If dev->driver is set, in the error path after calling bus_add_device()
in device_add(), bus_remove_device() is called, then the device will be
detached from driver. But device_bind_driver() is not called yet, so it
causes null-ptr-deref while access the 'knode_driver'. To fix this, set
dev->driver to null in the error path before calling bus_remove_device().
Fixes: 57eee3d23e88 ("Driver core: Call device_pm_add() after bus_add_device() in device_add()") Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Link: https://lore.kernel.org/r/20221205034904.2077765-1-yangyingliang@huawei.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The command FIFOs in the Cadence IP can be configured during design
up to 32 entries, and the code in cadence_master.c was assuming the
full 32-entry FIFO. But all current Intel implementations use an 8-entry
FIFO.
Up to now the longest message used was 6 entries so this wasn't
causing any problem. But future Cirrus Logic codecs have downloadable
firmware or tuning blobs. It is more efficient for the codec driver to
issue long transfers that can take advantage of any queuing in the
Soundwire controller and avoid the overhead of repeatedly writing the
page registers.
Make i2c_dw_clk_rate() to return u32 instead of unsigned long, as the
function return the value of get_clk_rate_khz() which returns u32.
Fixes: b33af11de236 ("i2c: designware: Do not require clock when SSCN and FFCN are provided") Signed-off-by: Hanna Hawa <hhhawa@amazon.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Jarkko Nikula <jarkko.nikula@linux.intel.com> Signed-off-by: Wolfram Sang <wsa@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When request_irq(ires1->start) failed in w5300_hw_probe(), irq
ires->start has not been freed, and on the clean_up3 error path,
we also need to free ires1->start irq, fix it.
In addition, We should add free_irq in fusb300_remove(), and give
the lables a proper name so that they can be understood easily,
so add free_irq in fusb300_remove(), and update clean_up3 to
err_alloc_request.
Instead of hardcoding IRQ trigger type to IRQF_TRIGGER_RAISING,
let's respect the settings specified in the firmware description.
To be compatible with the older firmware descriptions, if trigger
type is not set up there, we'll set it to default (raising edge).
Fixes: 388be4883952 ("staging:iio: tsl2563 abi fixes and interrupt handling") Fixes: bdab1001738f ("staging:iio:light:tsl2563 remove old style event registration.") Signed-off-by: Ferry Toth <ftoth@exalondelft.nl> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20221207190348.9347-1-andriy.shevchenko@linux.intel.com Signed-off-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
rdma_user_mmap_entry_get_pgoff() takes the reference.
Add missing rdma_user_mmap_entry_put() to release the reference.
Fixes: 0045e0d3f42e ("RDMA/hns: Support direct wqe of userspace") Signed-off-by: Miaoqian Lin <linmq006@gmail.com>
Acked-by Haoyue Xu <xuhaoyue1@hisilicon.com> Link: https://lore.kernel.org/r/20221223072900.802728-1-linmq006@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The HiSilicon DMA Engine is only present on HiSilicon SoCs. Hence add a
dependency on ARCH_HISI, to prevent asking the user about this driver
when configuring a kernel without HiSilicon SoC support.
rdma_user_mmap_entry_get() take reference, we should release it when not
need anymore, add the missing rdma_user_mmap_entry_put() in the error
path to fix it.
Fixes: 155055771704 ("RDMA/erdma: Add verbs implementation") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Link: https://lore.kernel.org/r/20221220121139.1540564-1-linmq006@gmail.com Acked-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Leon Romanovsky <leon@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
On DSA/IAX 1.0, TC-A and TC-B in GRPCFG are set as 1 to have best
performance and cannot be changed through sysfs knobs unless override
option is given.
The same values should be set on DSA 2.0 as well.
Fixes: ea7c8f598c32 ("dmaengine: idxd: restore traffic class defaults after wq reset") Fixes: ade8a86b512c ("dmaengine: idxd: Set defaults for GRPCFG traffic class") Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/20221209172141.562648-1-fenghua.yu@intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
`req` is allocated in pcf50633_adc_async_read(), but
adc_enqueue_request() could fail to insert the `req` into queue.
We need to check the return value and free it in the case of failure.
Fixes: 08c3e06a5eb2 ("mfd: PCF50633 adc driver") Signed-off-by: Qiheng Lin <linqiheng@huawei.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221208061555.8776-1-linqiheng@huawei.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The cs5535-mfd driver uses CPU-specific data that is not available
for ARCH=um builds, so don't allow it to be built for UML.
Prevents these build errors:
In file included from ../arch/x86/include/asm/olpc.h:7,
from ../drivers/mfd/cs5535-mfd.c:17:
../arch/x86/include/asm/geode.h: In function ‘is_geode_gx’:
../arch/x86/include/asm/geode.h:16:31: error: ‘struct cpuinfo_um’ has no member named ‘x86_vendor’
16 | return ((boot_cpu_data.x86_vendor == X86_VENDOR_NSC) &&
../arch/x86/include/asm/geode.h:16:46: error: ‘X86_VENDOR_NSC’ undeclared (first use in this function); did you mean ‘X86_VENDOR_ANY’?
16 | return ((boot_cpu_data.x86_vendor == X86_VENDOR_NSC) &&
../arch/x86/include/asm/geode.h:17:31: error: ‘struct cpuinfo_um’ has no member named ‘x86’
17 | (boot_cpu_data.x86 == 5) &&
../arch/x86/include/asm/geode.h:18:31: error: ‘struct cpuinfo_um’ has no member named ‘x86_model’
18 | (boot_cpu_data.x86_model == 5));
../arch/x86/include/asm/geode.h: In function ‘is_geode_lx’:
../arch/x86/include/asm/geode.h:23:31: error: ‘struct cpuinfo_um’ has no member named ‘x86_vendor’
23 | return ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
../arch/x86/include/asm/geode.h:23:46: error: ‘X86_VENDOR_AMD’ undeclared (first use in this function); did you mean ‘X86_VENDOR_ANY’?
23 | return ((boot_cpu_data.x86_vendor == X86_VENDOR_AMD) &&
../arch/x86/include/asm/geode.h:24:31: error: ‘struct cpuinfo_um’ has no member named ‘x86’
24 | (boot_cpu_data.x86 == 5) &&
../arch/x86/include/asm/geode.h:25:31: error: ‘struct cpuinfo_um’ has no member named ‘x86_model’
25 | (boot_cpu_data.x86_model == 10));
Fixes: 68f5d3f3b654 ("um: add PCI over virtio emulation driver") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221201012541.11809-1-rdunlap@infradead.org Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently proc_dobool expects a (bool *) in table->data, but sizeof(int)
in table->maxsize, because it uses do_proc_dointvec() directly.
This is unsafe for at least two reasons:
1. A sysctl table definition may use { .data = &variable, .maxsize =
sizeof(variable) }, not realizing that this makes the sysctl unusable
(see the Fixes: tag) and that they need to use the completely
counterintuitive sizeof(int) instead.
2. proc_dobool() will currently try to parse an array of values if given
.maxsize >= 2*sizeof(int), but will try to write values of type bool
by offsets of sizeof(int), so it will not work correctly with neither
an (int *) nor a (bool *). There is no .maxsize validation to prevent
this.
Fix this by:
1. Constraining proc_dobool() to allow only one value and .maxsize ==
sizeof(bool).
2. Wrapping the original struct ctl_table in a temporary one with .data
pointing to a local int variable and .maxsize set to sizeof(int) and
passing this one to proc_dointvec(), converting the value to/from
bool as needed (using proc_dou8vec_minmax() as an example).
3. Extending sysctl_check_table() to enforce proc_dobool() expectations.
4. Fixing the proc_dobool() docstring (it was just copy-pasted from
proc_douintvec, apparently...).
5. Converting all existing proc_dobool() users to set .maxsize to
sizeof(bool) instead of sizeof(int).
Fixes: 83efeeeb3d04 ("tty: Allow TIOCSTI to be disabled") Fixes: a2071573d634 ("sysctl: introduce new proc handler proc_dobool") Signed-off-by: Ondrej Mosnacek <omosnace@redhat.com> Acked-by: Kees Cook <keescook@chromium.org> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Fix kprobe probepoint testcase to ignore __pfx_* prefix symbols. Those are
introduced by commit b341b20d648b ("x86: Add prefix symbols for function
padding") for identifying PADDING_BYTES of NOPs. Since kprobe events can
not probe these prefix symbols, this testcase has to skip those symbols.
Link: https://lore.kernel.org/all/167309835609.640500.9664678940260305746.stgit@devnote3/ Fixes: b341b20d648b ("x86: Add prefix symbols for function padding") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Acked-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
A lot of the tsan helpers are already excempt from the UACCESS warnings,
but some more functions were added that need the same thing:
kernel/kcsan/core.o: warning: objtool: __tsan_volatile_read16+0x0: call to __tsan_unaligned_read16() with UACCESS enabled
kernel/kcsan/core.o: warning: objtool: __tsan_volatile_write16+0x0: call to __tsan_unaligned_write16() with UACCESS enabled
vmlinux.o: warning: objtool: __tsan_unaligned_volatile_read16+0x4: call to __tsan_unaligned_read16() with UACCESS enabled
vmlinux.o: warning: objtool: __tsan_unaligned_volatile_write16+0x4: call to __tsan_unaligned_write16() with UACCESS enabled
As Marco points out, these functions don't even call each other
explicitly but instead gcc (but not clang) notices the functions
being identical and turns one symbol into a direct branch to the
other.
Link: https://lkml.kernel.org/r/20230215130058.3836177-4-arnd@kernel.org Fixes: 75d75b7a4d54 ("kcsan: Support distinguishing volatile accesses") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Kuan-Ying Lee <Kuan-Ying.Lee@mediatek.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
[command]# ./perf test 98 -vv
98: perf all metrics test :
--- start ---
test child forked, pid 13262
Testing BRU_STALL_CPI
Testing COMPLETION_STALL_CPI
----
Testing TOTAL_LOCAL_NODE_PUMPS_P23
Metric 'TOTAL_LOCAL_NODE_PUMPS_P23' not printed in:
Error:
Invalid event (hv_24x7/PM_PB_LNS_PUMP23,chip=3/) in per-thread mode, enable system wide with '-a'.
Testing TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01
Metric 'TOTAL_LOCAL_NODE_PUMPS_RETRIES_P01' not printed in:
Error:
Invalid event (hv_24x7/PM_PB_RTY_LNS_PUMP01,chip=3/) in per-thread mode, enable system wide with '-a'.
----
Based on above logs, we could see some of the hv-24x7 metric events
fails, and logs suggest to run the metric event with -a option. This
change happened after the commit a4b8cfcabb1d90ec ("perf stat: Delay
metric parsing"), which delayed the metric parsing phase and now before
metric parsing phase perf tool identifies, whether target is system-wide
or not. With this change, perf_event_open will fails with workload
monitoring for uncore events as expected.
The perf all metric test case fails as some of the hv-24x7 metric events
may need bigger workload with system wide monitoring to get the data.
Fix this issue by changing current system wide check from true workload
to sleep 0.01 workload.
When --overwrite and --max-size options of perf record are used
together, a segmentation fault occurs. The following is an example:
# perf record -e sched:sched* --overwrite --max-size 1K -a -- sleep 1
[ perf record: Woken up 1 times to write data ]
perf: Segmentation fault
Obtained 12 stack frames.
./perf/perf(+0x197673) [0x55f99710b673]
/lib/x86_64-linux-gnu/libc.so.6(+0x3ef0f) [0x7fa45f3cff0f]
./perf/perf(+0x8eb40) [0x55f997002b40]
./perf/perf(+0x1f6882) [0x55f99716a882]
./perf/perf(+0x794c2) [0x55f996fed4c2]
./perf/perf(+0x7b7c7) [0x55f996fef7c7]
./perf/perf(+0x9074b) [0x55f99700474b]
./perf/perf(+0x12e23c) [0x55f9970a223c]
./perf/perf(+0x12e54a) [0x55f9970a254a]
./perf/perf(+0x7db60) [0x55f996ff1b60]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe6) [0x7fa45f3b2c86]
./perf/perf(+0x7dfe9) [0x55f996ff1fe9]
Segmentation fault (core dumped)
backtrace of the core file is as follows:
(gdb) bt
#0 record__bytes_written (rec=0x55f99755a200 <record>) at builtin-record.c:234
#1 record__output_max_size_exceeded (rec=0x55f99755a200 <record>) at builtin-record.c:242
#2 record__write (map=0x0, size=12816, bf=0x55f9978da2e0, rec=0x55f99755a200 <record>) at builtin-record.c:263
#3 process_synthesized_event (tool=tool@entry=0x55f99755a200 <record>, event=event@entry=0x55f9978da2e0, sample=sample@entry=0x0, machine=machine@entry=0x55f997893658) at builtin-record.c:618
#4 0x000055f99716a883 in __perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=0x55f9978928b0, machine=machine@entry=0x55f997893658,
from=from@entry=0) at util/synthetic-events.c:1895
#5 0x000055f99716a91f in perf_event__synthesize_id_index (tool=tool@entry=0x55f99755a200 <record>, process=process@entry=0x55f997002aa0 <process_synthesized_event>, evlist=<optimized out>, machine=machine@entry=0x55f997893658)
at util/synthetic-events.c:1905
#6 0x000055f996fed4c3 in record__synthesize (tail=tail@entry=true, rec=0x55f99755a200 <record>) at builtin-record.c:1997
#7 0x000055f996fef7c8 in __cmd_record (argc=argc@entry=2, argv=argv@entry=0x7ffc67551260, rec=0x55f99755a200 <record>) at builtin-record.c:2802
#8 0x000055f99700474c in cmd_record (argc=<optimized out>, argv=0x7ffc67551260) at builtin-record.c:4258
#9 0x000055f9970a223d in run_builtin (p=0x55f997564d88 <commands+264>, argc=10, argv=0x7ffc67551260) at perf.c:330
#10 0x000055f9970a254b in handle_internal_command (argc=10, argv=0x7ffc67551260) at perf.c:384
#11 0x000055f996ff1b61 in run_argv (argcp=<synthetic pointer>, argv=<synthetic pointer>) at perf.c:428
#12 main (argc=<optimized out>, argv=0x7ffc67551260) at perf.c:562
The reason is that record__bytes_written accesses the freed memory rec->thread_data,
The process is as follows:
__cmd_record
-> record__free_thread_data
-> zfree(&rec->thread_data) // free rec->thread_data
-> record__synthesize
-> perf_event__synthesize_id_index
-> process_synthesized_event
-> record__write
-> record__bytes_written // access rec->thread_data
We add a member variable "thread_bytes_written" in the struct "record"
to save the data size written by the threads.
Fixes: 6d57581659f72299 ("perf record: Add support for limit perf output file size") Signed-off-by: Yang Jihong <yangjihong1@huawei.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Jiwei Sun <jiwei.sun@windriver.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/CAM9d7ci_TRrqBQVQNW8=GwakUr7SsZpYxaaty-S4bxF8zJWyqw@mail.gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Rather than trying to guess which implementation of "echo" to run with
support for "-ne" options, use "printf" instead of "echo -ne". It
handles escape characters as a standard feature and it is widespread
among modern shells.
Reported-by: "kernelci.org bot" <bot@kernelci.org> Suggested-by: David Laight <David.Laight@ACULAB.COM> Fixes: 3297a4df805d ("kselftests: Enable the echo command to print newlines in Makefile") Fixes: 79c16b1120fe ("selftests: find echo binary to use -ne options") Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com> Reviewed-by: Guenter Roeck <groeck@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Since commit a1d6cd88c897 ("selftests/ftrace: event_triggers: wait
longer for test_event_enable") introduced bash specific "=="
comparation operator, that test will fail when we run it on a
posix-shell. `checkbashisms` warned it as below.
possible bashism in ftrace/func_event_triggers.tc line 45 (should be 'b = a'):
if [ "$e" == $val ]; then
This replaces it with "=".
Fixes: a1d6cd88c897 ("selftests/ftrace: event_triggers: wait longer for test_event_enable") Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Find the actual echo binary using $(which echo) and use it for
formatted output with -ne. On some systems, the default echo command
doesn't handle the -e option and the output looks like this (arm64
build):
-ne Emit Tests for alsa
-ne Emit Tests for amd-pstate
-ne Emit Tests for arm64
This is for example the case with the KernelCI Docker images
e.g. kernelci/gcc-10:x86-kselftest-kernelci. With the actual echo
binary (e.g. in /bin/echo), the output is formatted as expected (x86
build this time):
Emit Tests for alsa
Emit Tests for amd-pstate
Skipping non-existent dir: arm64
Only the install target is using "echo -ne" so keep the $ECHO variable
local to it.
Reported-by: "kernelci.org bot" <bot@kernelci.org> Fixes: 3297a4df805d ("kselftests: Enable the echo command to print newlines in Makefile") Signed-off-by: Guillaume Tucker <guillaume.tucker@collabora.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
When doing randconfig builds for sparc32 with COMPILE_TEST, some
(non-Sparc) drivers cause kconfig warnings with the Kconfig symbols PM,
PM_GENERIC_DOMAINS, or PM_GENERIC_DOMAINS_OF.
This is due to arch/sparc/Kconfig not using the PM Kconfig for
Sparc32:
if SPARC64
source "kernel/power/Kconfig"
endif
Arnd suggested adding "|| COMPILE_TEST" to the conditional,
instead of trying to track down every driver that selects
any of these PM symbols.
Fixes the following kconfig warnings:
WARNING: unmet direct dependencies detected for PM
Depends on [n]: SPARC64 [=n]
Selected by [y]:
- SUN20I_PPU [=y] && (ARCH_SUNXI || COMPILE_TEST [=y])
WARNING: unmet direct dependencies detected for PM
Depends on [n]: SPARC64 [=n]
Selected by [y]:
- SUN20I_PPU [=y] && (ARCH_SUNXI || COMPILE_TEST [=y])
The added perf_stat_merge_counters combines uncore counters. When
metrics are enabled, the counts are merged into a metric_leader via the
stat-shadow saved_value logic. As the leader now is passed an aggregated
count, it leads to all counters being added together twice and counts
appearing approximately doubled in metrics.
This change disables the saved_value merging of counts for evsels that
are merged. It is recommended that later changes remove the saved_value
entirely as the two layers of aggregation in the code is confusing.
Fixes: 942c5593393d9418 ("perf stat: Add perf_stat_merge_counters()") Reported-by: Perry Taylor <perry.taylor@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Florian Fischer <florian.fischer@muhq.space> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com> Link: https://lore.kernel.org/r/20230209064447.83733-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
On aarch64 CPU related events are not under event_source/devices/cpu/events,
they're under event_source/devices/armv8_pmuv3_0/events on my machine.
Using current auto-complete script will generate below error:
[root@localhost bin]# perf stat -e
ls: cannot access '/sys/bus/event_source/devices/cpu/events': No such file or directory
Fix this by not testing /sys/bus/event_source/devices/cpu/events on
aarch64 machine.
Fixes: 74cd5815d9af6e6c ("perf tool: Improve bash command line auto-complete for multiple events with comma") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: linux-arm-kernel@lists.infradead.org Cc: linuxarm@huawei.com Cc: prime.zeng@hisilicon.com Link: https://lore.kernel.org/r/20230207035057.43394-1-yangyicong@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Enabling verbose option provided debug logs, which says debuginfo
needs to be installed. Snippet of verbose logs:
<<>>
42.3: BPF prologue generation :
--- start ---
test child forked, pid 28218
<<>>
Rebuild with CONFIG_DEBUG_INFO=y, or install an appropriate debuginfo
package.
bpf_probe: failed to convert perf probe events
Failed to add events selected by BPF
test child finished with -1
---- end ----
BPF filter subtest 3: FAILED!
<<>>
Here the subtest "BPF prologue generation" failed and logs shows
debuginfo is needed. After installing kernel-debuginfo package, testcase
passes.
The "BPF prologue generation" subtest failed because, the do_test()
returns TEST_FAIL without checking the error type returned by
parse_events_load_bpf_obj().
parse_events_load_bpf_obj() can also return error of type -ENODATA
incase kernel-debuginfo package is not installed. Fix this by adding
check for -ENODATA error.
The current display code for perf stat iterates given cpus and build the
aggr map to collect the event data for the aggregation mode.
But uncore events have their own cpu maps and it won't guarantee that
it'd match to the aggr map. For example, per-package uncore events
would generate a single value for each socket. When user asks per-core
aggregation mode, the output would contain 0 values for other cores.
Thus it needs to check the uncore PMU's cpumask and if it matches to the
current aggregation id.
Before:
$ sudo ./perf stat -a --per-core -e power/energy-pkg/ sleep 1
When it processes AUXTRACE_INFO, it calls to auxtrace_queue_data() to
collect AUXTRACE data first. That won't work with pipe since it needs
lseek() to read the scattered aux data.
$ perf record -o- -e intel_pt// true | perf report -i- --itrace=i100
# To display the perf.data header info, please use --header/--header-only options.
#
0x4118 [0xa0]: failed to process type: 70
Error:
failed to process sample
For the pipe mode, it can handle the aux data as it gets. But there's
no guarantee it can get the aux data in time. So the following warning
will be shown at the beginning:
WARNING: Intel PT with pipe mode is not recommended.
The output cannot relied upon. In particular,
time stamps and the order of events may be incorrect.
Fixes: dbd134322e74f19d ("perf intel-pt: Add support for decoding AUX area samples") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-3-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In copy_bytes(), it reads the data from the (input) fd and writes it to
the output file. But it does with the read(2) unconditionally which
caused a problem of mixing buffered vs unbuffered I/O together.
You can see the problem when using pipes.
$ perf record -e intel_pt// -o- true | perf inject -b > /dev/null
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.000 MB - ]
0x45c0 [0x30]: failed to process type: 71
It should use perf_data__read() to honor the 'use_stdio' setting.
Fixes: 601366678c93618f ("perf data: Allow to use stdio functions for pipe mode") Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: https://lore.kernel.org/r/20230131023350.1903992-2-namhyung@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
If we register a "leds-gpio" platform device for GPIO pins that do not
exist we get a -EPROBE_DEFER and the probe will be tried again later.
If there is no driver to provide that pin we will poll forever and also
create a lot of log messages.
So check if that GPIO driver is configured, if so it will come up
eventually. If not, we exit our probe function early and do not even
bother registering the "leds-gpio". This method was chosen over "Kconfig
depends" since this way we can add support for more devices and GPIO
backends more easily without "depends":ing on all GPIO backends.
Fixes: a6c80bec3c93 ("leds: simatic-ipc-leds-gpio: Add GPIO version of Siemens driver") Signed-off-by: Henning Schild <henning.schild@siemens.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221007153323.1326-1-henning.schild@siemens.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Clang complains that devm_add_action() takes a parameter with a wrong type:
warning: cast from 'void (*)(struct mutex *)' to 'void (*)(void *)' converts to incompatible function type [-Wcast-function-type-strict]
err = devm_add_action(dev, (void (*)(void *))mutex_destroy, &is31->lock);
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
It appears that the commit e1af5c815586 ("leds: is31fl319x: Fix devm vs.
non-devm ordering") missed two things:
- whilst the commit mentions devm_add_action_or_reset() the actual change
utilised devm_add_action() call by mistake
- strictly speaking the parameter is not compatible by type
Fix both issues by switching to devm_add_action_or_reset() and adding a
wrapper for mutex_destroy() call.
Reported-by: kernel test robot <lkp@intel.com> Fixes: e1af5c815586 ("leds: is31fl319x: Fix devm vs. non-devm ordering") Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Tested-by: Vincent Knecht <vincent.knecht@mailoo.org> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221228093238.82713-1-andriy.shevchenko@linux.intel.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
class_find_device_by_of_node() calls class_find_device(), it will take
the reference, use the put_device() to drop the reference when not need
anymore.
Fixes: 699a8c7c4bd3 ("leds: Add of_led_get() and led_put()") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Lee Jones <lee@kernel.org> Link: https://lore.kernel.org/r/20221220121807.1543790-1-linmq006@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The LLVM template is first echo-ed into command_out and then
command_out executed. The echo surrounds the template with double
quotes, however, the template itself may contain quotes. This is
generally innocuous but in tools/perf/tests/bpf-script-test-prologue.c
we see:
...
SEC("func=null_lseek file->f_mode offset orig")
...
where the first double quote ends the double quote of the echo, then
the > redirects output into a file called f_mode.
To avoid this inadvertent behavior substitute redirects and similar
characters to be ASCII control codes, then substitute the output in
the echo back again.
Fixes: 5eab5a7ee032acaa ("perf llvm: Display eBPF compiling command in debug output") Signed-off-by: Ian Rogers <irogers@google.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: bpf@vger.kernel.org Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: llvm@lists.linux.dev Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Rix <trix@redhat.com> Link: https://lore.kernel.org/r/20230105082609.344538-1-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
The ->writepage() and ->writepages() operations are supposed to write
entire pages. However, on filesystems with a block size smaller than
PAGE_SIZE, __gfs2_jdata_writepage() only adds the first block to the
current transaction instead of adding the entire page. Fix that.
Fixes: 18ec7d5c3f43 ("[GFS2] Make journaled data files identical to normal files on disk") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
In smb2_reconnect_server, we allocate a dummy tcon for
calling reconnect for just the session. This should be
allocated using tconInfoAlloc, and not kmalloc.
Fixes: 3663c9045f51 ("cifs: check reconnects for channels of active tcons too") Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Signed-off-by: Steve French <stfrench@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Currently, we're only memcpy'ing the first __be32. Ensure we copy into
both words.
Fixes: 91d2e9b56cf5 ("NFSD: Clean up the nfsd_net::nfssvc_boot field") Reported-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>
Most of the time, NFSv4 clients issue a COMMIT before the final CLOSE of
an open stateid, so with NFSv4, the fsync in the nfsd_file_free path is
usually a no-op and doesn't block.
We have a customer running knfsd over very slow storage (XFS over Ceph
RBD). They were using the "async" export option because performance was
more important than data integrity for this application. That export
option turns NFSv4 COMMIT calls into no-ops. Due to the fsync in this
codepath however, their final CLOSE calls would still stall (since a
CLOSE effectively became a COMMIT).
I think this fsync is not strictly necessary. We only use that result to
reset the write verifier. Instead of fsync'ing all of the data when we
free an nfsd_file, we can just check for writeback errors when one is
acquired and when it is freed.
If the client never comes back, then it'll never see the error anyway
and there is no point in resetting it. If an error occurs after the
nfsd_file is removed from the cache but before the inode is evicted,
then it will reset the write verifier on the next nfsd_file_acquire,
(since there will be an unseen error).
The only exception here is if something else opens and fsyncs the file
during that window. Given that local applications work with this
limitation today, I don't see that as an issue.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2166658 Fixes: ac3a2585f018 ("nfsd: rework refcounting in filecache") Reported-and-tested-by: Pierguido Lambri <plambri@redhat.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Andrea Righi <andrea.righi@canonical.com>