In ipa_endpoint_replenish(), if an error occurs when attempting to
replenish a receive buffer, we just quit and try again later. In
that case we increment the backlog count to reflect that the attempt
was unsuccessful. Then, if the add_one flag was true we increment
the backlog again.
This second increment is not included in the backlog local variable
though, and its value determines whether delayed work should be
scheduled. This is a bug.
Fix this by determining whether 1 or 2 should be added to the
backlog before adding it in a atomic_add_return() call.
Reviewed-by: Matthias Kaehlcke <mka@chromium.org> Fixes: 84f9bd12d46db ("soc: qcom: ipa: IPA endpoints") Signed-off-by: Alex Elder <elder@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In Linux bonding scenario, one packet is copied to several copies and sent
by all slave device of bond0 in mode 3(broadcast mode). The mode 3 xmit
function bond_xmit_broadcast() only ueses the last slave device's tx result
as the final result. In this case, if the last slave device is down, then
it always return NET_XMIT_DROP, even though the other slave devices xmit
success. It may cause the tx statistics error, and cause the application
(e.g. scp) consider the network is unreachable.
For example, use the following command to configure server A.
echo 3 > /sys/class/net/bond0/bonding/mode
ifconfig bond0 up
ifenslave bond0 eth0 eth1
ifconfig bond0 192.168.1.125
ifconfig eth0 up
ifconfig eth1 down
The slave device eth0 and eth1 are connected to server B(192.168.1.107).
Run the ping 192.168.1.107 -c 3 -i 0.2 command, the following information
is displayed.
PING 192.168.1.107 (192.168.1.107) 56(84) bytes of data.
64 bytes from 192.168.1.107: icmp_seq=1 ttl=64 time=0.077 ms
64 bytes from 192.168.1.107: icmp_seq=2 ttl=64 time=0.056 ms
64 bytes from 192.168.1.107: icmp_seq=3 ttl=64 time=0.051 ms
192.168.1.107 ping statistics
0 packets transmitted, 3 received
Actually, the slave device eth0 of the bond successfully sends three
ICMP packets, but the result shows that 0 packets are transmitted.
Also if we use scp command to get remote files, the command end with the
following printings.
ssh_exchange_identification: read: Connection timed out
So this patch modifies the bond_xmit_broadcast to return NET_XMIT_SUCCESS
if one slave device in the bond sends packets successfully. If all slave
devices send packets fail, the discarded packets stats is increased. The
skb is released when there is no slave device in the bond or the last slave
device is down.
Fixes: ae46f184bc1f ("bonding: propagate transmit status") Signed-off-by: Jie Wang <wangjie125@huawei.com> Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Don't forget to release the device in sock_timestamping_bind_phc() after
it was used to get the vclock indices.
Fixes: d463126e23f1 ("net: sock: extend SO_TIMESTAMPING for PHC binding") Signed-off-by: Miroslav Lichvar <mlichvar@redhat.com> Cc: Yangbo Lu <yangbo.lu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
These properties aren't documented nor implemented in the driver.
Drop them.
Fixes warnings as:
$ make dtbs_check DT_SCHEMA_FILES=Documentation/devicetree/bindings/display/msm/gpu.yaml
...
arch/arm64/boot/dts/qcom/msm8996-mtp.dt.yaml: gpu@b00000: 'qcom,gpu-quirk-fault-detect-mask', 'qcom,gpu-quirk-two-pass-use-wfi' do not match any of the regexes: 'pinctrl-[0-9]+'
From schema: Documentation/devicetree/bindings/display/msm/gpu.yaml
...
Fixes: 69cc3114ab0f ("arm64: dts: Add Adreno GPU definitions") Signed-off-by: David Heidelberg <david@ixit.cz> Signed-off-by: Bjorn Andersson <bjorn.andersson@linaro.org> Link: https://lore.kernel.org/r/20211030100413.28370-1-david@ixit.cz Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
DEVLINK_CMD_HEALTH_REPORTER_DUMP_GET command doesn't have .doit callback
and has no use in internal_flags at all. Remove this misleading assignment.
Fixes: e44ef4e4516c ("devlink: Hang reporter's dump method on a dumpit cb") Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Because of commit bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms
lookup across powerpc ABIv1 and ABIv2"), in ppc64 ABIv1, our perf
command eliminates the need to use the prefix "." at the symbol name.
But when the command "perf probe -a schedule" is executed on ppc64
ABIv1, it obtains two symbol address information through /proc/kallsyms,
for example:
The symbol "D schedule" is not a function symbol, and perf will print:
"p:probe/schedule _text+13958584"Failed to write event: Invalid argument
Therefore, when searching symbols from map and adding probe point for
them, a symbol type check is added. If the type of symbol is not a
function, skip it.
Fixes: bf794bf52a80c627 ("powerpc/kprobes: Fix kallsyms lookup across powerpc ABIv1 and ABIv2") Signed-off-by: Zechuan Chen <chenzechuan1@huawei.com> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jianlin Lv <Jianlin.Lv@arm.com> Cc: Jin Yao <yao.jin@linux.intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@linux.ibm.com> Cc: Yang Jihong <yangjihong1@huawei.com> Link: https://lore.kernel.org/r/20211228111338.218602-1-chenzechuan1@huawei.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
It's possible to link against libopencsd_c_api without having
libstdc++.so available, only libstdc++.so.6.0.28 (or whatever version is
in use) needs to be available. The same holds true for libopencsd.so.
When -lstdc++ (or -lopencsd) is explicitly passed to the linker however
the .so file must be available.
So wrap adding the dependencies into a check for static linking that
actually requires adding them all. The same construct is already used
for some other tests in the same file to reduce dependencies in the
dynamic linking case.
Fixes: 573cf5c9a152 ("perf build: Add missing -lstdc++ when linking with libopencsd") Reviewed-by: James Clark <james.clark@arm.com> Signed-off-by: Uwe Kleine-König <uwe@kleine-koenig.org> Cc: Adrian Bunk <bunk@debian.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Branislav Rankov <branislav.rankov@arm.com> Cc: Diederik de Haas <didi.debian@cknow.org> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/all/20211203210544.1137935-1-uwe@kleine-koenig.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The hardware channel next descriptor view structure contains just
fields of 32 bits, while dma_addr_t can be of type u64 or u32
depending on CONFIG_ARCH_DMA_ADDR_T_64BIT. Force u32 to comply with
what the hardware expects.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-11-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Since tx_submit can be called from a hard IRQ, xfers_list must be
protected with a lock to avoid concurency on the list's elements.
Since at_xdmac_handle_cyclic() is called from a tasklet, spin_lock_irq
is enough to protect from a hard IRQ.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-8-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Cyclic channels must too call issue_pending in order to start a transfer.
Start the transfer in issue_pending regardless of the type of channel.
This wrongly worked before, because in the past the transfer was started
at tx_submit level when only a desc in the transfer list.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-3-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
tx_submit is supposed to push the current transaction descriptor to a
pending queue, waiting for issue_pending() to be called. issue_pending()
must start the transfer, not tx_submit(), thus remove
at_xdmac_start_xfer() from at_xdmac_tx_submit(). Clients of at_xdmac that
assume that tx_submit() starts the transfer must be updated and call
dma_async_issue_pending() if they miss to call it (one example is
atmel_serial).
As the at_xdmac_start_xfer() is now called only from
at_xdmac_advance_work() when !at_xdmac_chan_is_enabled(), the
at_xdmac_chan_is_enabled() check is no longer needed in
at_xdmac_start_xfer(), thus remove it.
Fixes: e1f7c9eee707 ("dmaengine: at_xdmac: creation of the atmel eXtended DMA Controller driver") Signed-off-by: Tudor Ambarus <tudor.ambarus@microchip.com> Link: https://lore.kernel.org/r/20211215110115.191749-2-tudor.ambarus@microchip.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Using grep -C with perf script -D can give erroneous results as grep loses
lines due to non-printable characters, for example, below the 0020, 0060
and 0070 lines are missing:
0 0 0x450 [0x98]: PERF_RECORD_AUXTRACE_INFO type: 1
PMU Type 8
Time Shift 31
perf's isprint() is a custom implementation from the kernel, but the
kernel's _ctype appears to include characters from Latin-1 Supplement which
is not compatible with, for example, UTF-8. Fix by checking also isascii().
Fixes: 3052ba56bcb58904 ("tools perf: Move from sane_ctype.h obtained from git to the Linux's original") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@redhat.com> Link: http://lore.kernel.org/lkml/20220112085057.277205-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Mask the ECN bits before calling ip_route_output_ports(). The tos
variable might be passed directly from an IPv4 header, so it may have
the last ECN bit set. This interferes with the route lookup process as
ip_route_output_key_hash() interpretes this bit specially (to restrict
the route scope).
Found by code inspection, compile tested only.
Fixes: 804c2f3e36ef ("libcxgb,iw_cxgb4,cxgbit: add cxgb_find_route()") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Mask the ECN bits before initialising ->flowi4_tos. The tunnel key may
have the last ECN bit set, which will interfere with the route lookup
process as ip_route_output_key_hash() interpretes this bit specially
(to restrict the route scope).
Found by code inspection, compile tested only.
Fixes: 962924fa2b7a ("ip_gre: Refactor collect metatdata mode tunnel xmit to ip_md_tunnel_xmit") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Restore ndev->cur_num_vqs to the original value in case change_num_qps()
fails.
Fixes: 52893733f2c5 ("vdpa/mlx5: Add multiqueue support") Reviewed-by: Si-Wei Liu<si-wei.liu@oracle.com> Acked-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20220105114646.577224-10-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Similar to commit 94e2238969e8 ("xfrm4: strip ECN bits from tos field"),
clear the ECN bits from iph->tos when setting ->flowi4_tos.
This ensures that the last bit of ->flowi4_tos is cleared, so
ip_route_output_key_hash() isn't going to restrict the scope of the
route lookup.
Use ~INET_ECN_MASK instead of IPTOS_RT_MASK, because we have no reason
to clear the high order bits.
Found by code inspection, compile tested only.
Fixes: 4da3089f2b58 ("[IPSEC]: Use TOS when doing tunnel lookups") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Contrary to what was stated before, the hardware hasn't changed
the bits here yet. In any case, the new CSR is also directly
(lower 16 bits) connected to UREG_DOORBELL_TO_ISR6, so if it
still changes the changes would be there. Adjust the code and
comments accordingly.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Fixes: 6c0795f1a524 ("iwlwifi: implement Bz NMI behaviour") Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Link: https://lore.kernel.org/r/iwlwifi.20211210090244.75b6207536e3.I7d170a48a9096e6b7269c3a9f447c326f929b171@changeid Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When under stress, cleanup_net() can have to dismantle
netns in big numbers. ops_exit_list() currently calls
many helpers [1] that have no schedule point, and we can
end up with soft lockups, particularly on hosts
with many cpus.
Even for moderate amount of netns processed by cleanup_net()
this patch avoids latency spikes.
[1] Some of these helpers like fib_sync_up() and fib_sync_down_dev()
are very slow because net/ipv4/fib_semantics.c uses host-wide hash tables,
and ifindex is used as the only input of two hash functions.
ifindexes tend to be the same for all netns (lo.ifindex==1 per instance)
This will be fixed in a separate patch.
Fixes: 72ad937abd0a ("net: Add support for batching network namespace cleanups") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Both fields can be read/written without synchronization,
add proper accessors and documentation.
Fixes: d5dd88794a13 ("inet: fix various use-after-free in defrags units") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In the function bacct_add_task the code reading task->exit_code was
introduced in commit f3cef7a99469 ("[PATCH] csa: basic accounting over
taskstats"), and it is not entirely clear what the taskstats interface
is trying to return as only returning the exit_code of the first task
in a process doesn't make a lot of sense.
As best as I can figure the intent is to return task->exit_code after
a task exits. The field is returned with per task fields, so the
exit_code of the entire process is not wanted. Only the value of the
first task is returned so this is not a useful way to get the per task
ptrace stop code. The ordinary case of returning this value is
returning after a task exits, which also precludes use for getting
a ptrace value.
It is common to for the first task of a process to also be the last
task of a process so this field may have done something reasonable by
accident in testing.
Make ac_exitcode a reliable per task value by always returning it for
every exited task.
Setting ac_exitcode in a sensible mannter makes it possible to continue
to provide this value going forward.
Cc: Balbir Singh <bsingharora@gmail.com> Fixes: f3cef7a99469 ("[PATCH] csa: basic accounting over taskstats") Link: https://lkml.kernel.org/r/20220103213312.9144-5-ebiederm@xmission.com Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
A recently added error path does not mark ring unused when exiting on
OOM, which will lead to BUG on the next entry in debug builds.
TODO: refactor code so we have START_USE and END_USE in the same function.
Fixes: fc6d70f40b3d ("virtio_ring: check desc == NULL when using indirect with packed") Cc: "Xuan Zhuo" <xuanzhuo@linux.alibaba.com> Cc: Jiasheng Jiang <jiasheng@iscas.ac.cn> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Remove overriding of virtio_version_1_0 which forced the virtqueue
object to version 1.
Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for supported mlx5 devices") Signed-off-by: Eli Cohen <elic@nvidia.com> Link: https://lore.kernel.org/r/20211230142024.142979-1-elic@nvidia.com Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Acked-by: Jason Wang <jasowang@redhat.com> Reviewed-by: Si-Wei Liu <si-wei.liu@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
With the latest stable kernel versions the rtc on the PXA based
Zaurus does not work, when booting I see the following kernel messages:
pxa-rtc pxa-rtc: failed to find rtc clock source
pxa-rtc pxa-rtc: Unable to init SA1100 RTC sub-device
pxa-rtc: probe of pxa-rtc failed with error -2
hctosys: unable to open rtc device (rtc0)
I think this is because commit f2997775b111 ("rtc: sa1100: fix possible
race condition") moved the allocation of the rtc_device struct out of
sa1100_rtc_init and into sa1100_rtc_probe. This means that pxa_rtc_probe
also needs to do allocation for the rtc_device struct, otherwise
sa1100_rtc_init will try to dereference a null pointer. This patch adds
that allocation by copying how sa1100_rtc_probe in
drivers/rtc/rtc-sa1100.c does it; after the IRQs are set up a managed
rtc_device is allocated.
I've tested this patch with `qemu-system-arm -machine akita` and with a
real Zaurus SL-C1000 applied to 4.19, 5.4, and 5.10.
Signed-off-by: Laurence de Bruxelles <lfdebrux@gmail.com> Fixes: f2997775b111 ("rtc: sa1100: fix possible race condition") Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com> Link: https://lore.kernel.org/r/20220101154149.12026-1-lfdebrux@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Unfortunately details of USB HID transport bled into HID core and
handling of numbered/unnumbered reports is quite a mess, with
hid_report_len() calculating the length according to USB rules,
and hid_hw_raw_request() adding report ID to the buffer for both
numbered and unnumbered reports.
Untangling it all requres a lot of changes in HID, so for now let's
handle this in the driver.
[jkosina@suse.cz: microoptimize field->report->id to report->id] Fixes: 14c9c014babe ("HID: add vivaldi HID driver") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Tested-by: Stephen Boyd <swboyd@chromium.org> # CoachZ Signed-off-by: Jiri Kosina <jkosina@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Fixes: d8fb32f4790f ("um: Add support for host CPU flags and alignment") Signed-off-by: Johannes Berg <johannes.berg@intel.com> Acked-By: anton.ivanov@cambridgegreys.com Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
find_first_bit() and find_first_zero_bit() are not protected with
ifdefs as other functions in find.h. It causes build errors on some
platforms if CONFIG_GENERIC_FIND_FIRST_BIT is enabled.
Signed-off-by: Yury Norov <yury.norov@gmail.com> Fixes: 2cc7b6a44ac2 ("lib: add fast path for find_first_*_bit() and find_last_bit()") Reported-by: kernel test robot <lkp@intel.com> Tested-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
With previous changes to make the driver handle the TX ring size more
correctly, the default TX ring size of 64 appears to significantly
bottleneck TX performance to around 600 Mbps on a 1 Gbps link on ZynqMP.
Increasing this to 128 seems to bring performance up to near line rate and
shouldn't cause excess bufferbloat (this driver doesn't yet support modern
byte-based queue management).
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Network driver documentation indicates we should be avoiding returning
NETDEV_TX_BUSY from ndo_start_xmit in normal cases, since it requires
the packets to be requeued. Instead the queue should be stopped after
a packet is added to the TX ring when there may not be enough room for an
additional one. Also, when TX ring entries are completed, we should only
wake the queue if we know there is room for another full maximally
fragmented packet.
Print a warning if there is insufficient space at the start of start_xmit,
since this should no longer happen.
Combined with increasing the default TX ring size (in a subsequent
patch), this appears to recover the TX performance lost by previous changes
to actually manage the TX ring state properly.
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The check for the number of available TX ring slots was off by 1 since a
slot is required for the skb header as well as each fragment. This could
result in overwriting a TX ring slot that was still in use.
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The check for whether a TX ring slot was available was incorrect,
since a slot which had been loaded with transmit data but the device had
not started transmitting would be treated as available, potentially
causing non-transmitted slots to be overwritten. The control field in
the descriptor should be checked, rather than the status field (which may
only be updated when the device completes the entry).
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The driver will not work properly if the TX ring size is set to below
MAX_SKB_FRAGS + 1 since it needs to hold at least one full maximally
fragmented packet in the TX ring. Limit setting the ring size to below
this value.
Fixes: 8b09ca823ffb4 ("net: axienet: Make RX/TX ring sizes configurable") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This driver was missing some required memory barriers:
Use dma_rmb to ensure we see all updates to the descriptor after we see
that an entry has been completed.
Use wmb and rmb to avoid stale descriptor status between the TX path and
TX complete IRQ path.
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In some cases where the Xilinx Ethernet core was used in 1000Base-X or
SGMII modes, which use the internal PCS/PMA PHY, and the MGT
transceiver clock source for the PCS was not running at the time the
FPGA logic was loaded, the core would come up in a state where the
PCS could not be found on the MDIO bus. To fix this, the Ethernet core
(including the PCS) should be reset after enabling the clocks, prior to
attempting to access the PCS using of_mdio_find_device.
Fixes: 1a02556086fc (net: axienet: Properly handle PCS/PMA PHY for 1000BaseX mode) Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
When resetting the device, wait for the PhyRstCmplt bit to be set
in the interrupt status register before continuing initialization, to
ensure that the core is actually ready. When using an external PHY, this
also ensures we do not start trying to access the PHY while it is still
in reset. The PHY reset is initiated by the core reset which is
triggered just above, but remains asserted for 5ms after the core is
reset according to the documentation.
The MgtRdy bit could also be waited for, but unfortunately when using
7-series devices, the bit does not appear to work as documented (it
seems to behave as some sort of link state indication and not just an
indication the transceiver is ready) so it can't really be relied on for
this purpose.
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The previous timeout of 1ms was too short to handle some cases where the
core is reset just after the input clocks were started, which will
be introduced in an upcoming patch. Increase the timeout to 50ms. Also
simplify the reset timeout checking to use read_poll_timeout.
Fixes: 8a3b7a252dca9 ("drivers/net/ethernet/xilinx: added Xilinx AXI Ethernet driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
A hung_task is observed when removing SMC-R devices. Suppose that
a link group has two active links(lnk_A, lnk_B) associated with two
different SMC-R devices(dev_A, dev_B). When dev_A is removed, the
link group will be removed from smc_lgr_list and added into
lgr_linkdown_list. lnk_A will be cleared and smcibdev(A)->lnk_cnt
will reach to zero. However, when dev_B is removed then, the link
group can't be found in smc_lgr_list and lnk_B won't be cleared,
making smcibdev->lnk_cnt never reaches zero, which causes a hung_task.
This patch fixes this issue by restoring the implementation of
smc_smcr_terminate_all() to what it was before commit 349d43127dac
("net/smc: fix kernel panic caused by race of smc_sock"). The original
implementation also satisfies the intention that make sure QP destroy
earlier than CQ destroy because we will always wait for smcibdev->lnk_cnt
reaches zero, which guarantees QP has been destroyed.
Fixes: 349d43127dac ("net/smc: fix kernel panic caused by race of smc_sock") Signed-off-by: Wen Gu <guwen@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
GPIO nodes are not themselves busses, so passing rockchip_bank_match
here is wrong. Passing NULL instead uses the standard bus match table
which is more appropriate.
devm_of_platform_populate() shows that this is the normal way to call
of_platform_populate() from a device driver, so in order to match that
more closely also add the pinctrl device as the parent for the newly
created GPIO controllers.
Specifically, using the wrong match here can break dynamic GPIO hogs as
marking the GPIO bank as a bus means that of_platform_notify() will set
OF_POPULATED on new child nodes and if this happens before
of_gpio_notify() is called then the new hog will be skipped as
OF_POPULATED is already set.
Fixes: 9ce9a02039de ("pinctrl/rockchip: drop the gpio related codes") Signed-off-by: John Keeping <john@metanate.com> Link: https://lore.kernel.org/r/20211126151352.1509583-1-john@metanate.com Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The call to of_clk_add_hw_provider was not undone on remove or on probe
failure, which could cause an oops on a subsequent attempt to retrieve
clocks for the removed device. Switch to the devm version of the
function to avoid this issue.
Fixes: 3044a860fd09 ("clk: Add Si5341/Si5340 driver") Signed-off-by: Robert Hancock <robert.hancock@calian.com> Link: https://lore.kernel.org/r/20220112203816.1784610-1-robert.hancock@calian.com Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
We don't want vendors to be enabling this part of the clk code and
shipping it to customers. Exposing the ability to change clk frequencies
and parents via debugfs is potentially damaging to the system if folks
don't know what they're doing. Emit a strong warning so that the message
is clear: don't enable this outside of development systems.
Fixes: 37215da5553e ("clk: Add support for setting clk_rate via debugfs") Cc: Geert Uytterhoeven <geert+renesas@glider.be> Link: https://lore.kernel.org/r/20211210014237.2130300-1-sboyd@kernel.org Signed-off-by: Stephen Boyd <sboyd@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
wait_for_unix_gc() reads unix_tot_inflight & gc_in_progress
without synchronization.
Adds READ_ONCE()/WRITE_ONCE() and their associated comments
to better document the intent.
BUG: KCSAN: data-race in unix_inflight / wait_for_unix_gc
write to 0xffffffff86e2b7c0 of 4 bytes by task 9380 on cpu 0:
unix_inflight+0x1e8/0x260 net/unix/scm.c:63
unix_attach_fds+0x10c/0x1e0 net/unix/scm.c:121
unix_scm_to_skb net/unix/af_unix.c:1674 [inline]
unix_dgram_sendmsg+0x679/0x16b0 net/unix/af_unix.c:1817
unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
__do_sys_sendmmsg net/socket.c:2578 [inline]
__se_sys_sendmmsg net/socket.c:2575 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
read to 0xffffffff86e2b7c0 of 4 bytes by task 9375 on cpu 1:
wait_for_unix_gc+0x24/0x160 net/unix/garbage.c:196
unix_dgram_sendmsg+0x8e/0x16b0 net/unix/af_unix.c:1772
unix_seqpacket_sendmsg+0xcc/0x110 net/unix/af_unix.c:2258
sock_sendmsg_nosec net/socket.c:704 [inline]
sock_sendmsg net/socket.c:724 [inline]
____sys_sendmsg+0x39a/0x510 net/socket.c:2409
___sys_sendmsg net/socket.c:2463 [inline]
__sys_sendmmsg+0x267/0x4c0 net/socket.c:2549
__do_sys_sendmmsg net/socket.c:2578 [inline]
__se_sys_sendmmsg net/socket.c:2575 [inline]
__x64_sys_sendmmsg+0x53/0x60 net/socket.c:2575
do_syscall_x64 arch/x86/entry/common.c:50 [inline]
do_syscall_64+0x44/0xd0 arch/x86/entry/common.c:80
entry_SYSCALL_64_after_hwframe+0x44/0xae
value changed: 0x00000002 -> 0x00000004
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 9375 Comm: syz-executor.1 Not tainted 5.16.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Fixes: 9915672d4127 ("af_unix: limit unix_tot_inflight") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Link: https://lore.kernel.org/r/20220114164328.2038499-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If kstrtoint() fails then "lfs_num" is uninitialized and the warning
doesn't make any sense. Just delete it.
Fixes: 8ec8015a3168 ("crypto: octeontx2 - add support to process the crypto request") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
There is a potential deadlock between writeback process and a process
performing write_begin() or write_cache_pages() while trying to write
same compress file, but not compressable, as below:
Since there is no compress process, it is no longer necessary to hold
locks on every pages in cluster within f2fs_write_raw_pages().
This patch changes f2fs_write_raw_pages() to release all locks first
and then perform write same as the non-compress file in
f2fs_write_cache_pages().
Fixes: 4c8ff7095bef ("f2fs: support data compression") Signed-off-by: Hyeong-Jun Kim <hj514.kim@samsung.com> Signed-off-by: Sungjong Seo <sj1557.seo@samsung.com> Signed-off-by: Youngjin Gil <youngjin.gil@samsung.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
F2FS-fs (loop11): mismatched blkaddr 5765 (source_blkaddr 1) in seg 3
kernel BUG at fs/f2fs/gc.c:1042!
do_garbage_collect+0x90f/0xa80 [f2fs]
f2fs_gc+0x294/0x12a0 [f2fs]
f2fs_balance_fs+0x2c5/0x7d0 [f2fs]
f2fs_create+0x239/0xd90 [f2fs]
lookup_open+0x45e/0xa90
open_last_lookups+0x203/0x670
path_openat+0xae/0x490
do_filp_open+0xbc/0x160
do_sys_openat2+0x2f1/0x500
do_sys_open+0x5e/0xa0
__x64_sys_openat+0x28/0x40
Previously, f2fs tries to catch data inconcistency exception in between
SSA and SIT table during GC, however once the exception is caught, it will
call f2fs_bug_on to hang kernel, it's not needed, instead, let's set
SBI_NEED_FSCK flag and skip migrating current block.
Fixes: bbf9f7d90f21 ("f2fs: Fix indefinite loop in f2fs_gc()") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Since compress inode not a regular file, generic_error_remove_page in
f2fs_invalidate_compress_pages will always be failed, set compress
inode as a regular file to fix it.
Fixes: 6ce19aff0b8c ("f2fs: compress: add compress_inode to cache compressed blocks") Signed-off-by: Fengnan Chang <changfengnan@vivo.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Before the driver had screen targets support we had to disable explicit
bringup of its infrastructure because it was breaking screen objects
support.
Since the implementation of screen targets landed there hasn't been a
reason to explicitly disable it and the options were never used.
Remove of all that unused code.
Old versions of the svga device used to export virtual vram, handling of
which was optimized on top of transparent hugepages support. Only very
old devices (OpenGL 2.1 support and earlier) used this code and at this
point performance differences are negligible.
Because the code requires very old hardware versions to run it has
been largely untested and unused for a long time.
Furthermore removal of the ttm hugepages support in:
commit 0d979509539e ("drm/ttm: remove ttm_bo_vm_insert_huge()")
broke the coherency mode in vmwgfx when running with hugepages.
Fixes: 0d979509539e ("drm/ttm: remove ttm_bo_vm_insert_huge()") Signed-off-by: Zack Rusin <zackr@vmware.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Christian König <christian.koenig@amd.com> Cc: Daniel Vetter <daniel.vetter@ffwll.ch> Reviewed-by: Martin Krastev <krastevm@vmware.com> Reviewed-by: Maaz Mombasawala <mombasawalam@vmware.com> Link: https://patchwork.freedesktop.org/patch/msgid/20211215184147.3688785-2-zack@kde.org
(cherry picked from commit 49d535d64d52945e2c874f380705675e20a02b6a) Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
It does not make sense to have an (empty) chosen node in an SoC-specific
.dtsi, as chosen is meant for system-specific configuration.
It is already provided in microchip-mpfs-icicle-kit.dts anyway.
Fixes: 0fa6107eca4186ad ("RISC-V: Initial DTS for Microchip ICICLE board") Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Conor Dooley <conor.dooley@microchip.com> Tested-by: Conor Dooley <conor.dooley@microchip.com> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
kobject_init_and_add() takes reference even when it fails.
According to the doc of kobject_init_and_add():
If this function returns an error, kobject_put() must be called to
properly clean up the memory associated with the object.
Fix memory leak by calling kobject_put().
Fixes: 73f368cf679b ("Kobject: change drivers/parisc/pdc_stable.c to use kobject_init_and_add") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Helge Deller <deller@gmx.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Once an MDIO read transaction is initiated, we must read back the data
register within 16 MDC cycles after the transaction completes. Outside
of this window, reads may return corrupt data.
Therefore, disable local interrupts in the critical section, to
maximize the probability that we can satisfy this requirement.
Fixes: d55ad2967d89 ("powerpc/mpc85xx: Create dts components for the FSL QorIQ DPAA FMan") Signed-off-by: Tobias Waldekranz <tobias@waldekranz.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Mask the ECN bits before calling mlx5e_route_lookup_ipv4_get(). The
tunnel key might have the last ECN bit set. This interferes with the
route lookup process as ip_route_output_key_hash() interpretes this bit
specially (to restrict the route scope).
Found by code inspection, compile tested only.
Fixes: c7b9038d8af6 ("net/mlx5e: TC preparation refactoring for routing update event") Fixes: 9a941117fb76 ("net/mlx5e: Maximize ip tunnel key usage on the TC offloading path") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
net/ipv4/fib_semantics.c uses an hash table of 256 slots,
keyed by device ifindexes: fib_info_devhash[DEVINDEX_HASHSIZE]
Problem is that with network namespaces, devices tend
to use the same ifindex.
lo device for instance has a fixed ifindex of one,
for all network namespaces.
This means that hosts with thousands of netns spend
a lot of time looking at some hash buckets with thousands
of elements, notably at netns dismantle.
Simply add a per netns perturbation (net_hash_mix())
to spread elements more uniformely.
Also change fib_devindex_hashfn() to use more entropy.
Fixes: aa79e66eee5d ("net: Make ifindex generation per-net namespace") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
In the past, free_fib_info() was supposed to be called
under RTNL protection.
This eventually was no longer the case.
Instead of enforcing RTNL it seems we simply can
move fib_info_cnt changes to occur when fib_info_lock
is held.
v2: David Laight suggested to update fib_info_cnt
only when an entry is added/deleted to/from the hash table,
as fib_info_cnt is used to make sure hash table size
is optimal.
BUG: KCSAN: data-race in fib_create_info / free_fib_info
read to 0xffffffff86e243a0 of 4 bytes by task 31505 on cpu 1:
free_fib_info+0x35/0x80 net/ipv4/fib_semantics.c:252
fib_info_put include/net/ip_fib.h:575 [inline]
nsim_fib4_rt_destroy drivers/net/netdevsim/fib.c:294 [inline]
nsim_fib4_rt_replace drivers/net/netdevsim/fib.c:403 [inline]
nsim_fib4_rt_insert drivers/net/netdevsim/fib.c:431 [inline]
nsim_fib4_event drivers/net/netdevsim/fib.c:461 [inline]
nsim_fib_event drivers/net/netdevsim/fib.c:881 [inline]
nsim_fib_event_work+0x15ca/0x2cf0 drivers/net/netdevsim/fib.c:1477
process_one_work+0x3fc/0x980 kernel/workqueue.c:2298
process_scheduled_works kernel/workqueue.c:2361 [inline]
worker_thread+0x7df/0xa70 kernel/workqueue.c:2447
kthread+0x2c7/0x2e0 kernel/kthread.c:327
ret_from_fork+0x1f/0x30
value changed: 0x00000d2d -> 0x00000d2e
Reported by Kernel Concurrency Sanitizer on:
CPU: 1 PID: 31505 Comm: kworker/1:21 Not tainted 5.16.0-rc6-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: events nsim_fib_event_work
Fixes: 48bb9eb47b27 ("netdevsim: fib: Add dummy implementation for FIB offload") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Cc: David Laight <David.Laight@ACULAB.COM> Cc: Ido Schimmel <idosch@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
A previous patch preventing "attr->sample_period" values from being
overridden in pfm events changed a related behaviour in arm-spe.
Before said patch:
perf record -c 10000 -e arm_spe_0// -- sleep 1
Would yield an SPE event with period=10000. After the patch, the period
in "-c 10000" was being ignored because the arm-spe code initializes
sample_period to a non-zero value.
This patch restores the previous behaviour for non-libpfm4 events.
Fixes: ae5dcc8abe31 (“perf record: Prevent override of attr->sample_period for libpfm4 events”) Reported-by: Chase Conklin <chase.conklin@arm.com> Signed-off-by: German Gomez <german.gomez@arm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220118144054.2541-1-german.gomez@arm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Similar as with other pointer types where we use ldimm64, clear the register
content to zero first, and then populate the PTR_TO_FUNC type and subprogno
number. Currently this is not done, and leads to reuse of stale register
tracking data.
Given for special ldimm64 cases we always clear the register offset, make it
common for all cases, so it won't be forgotten in future.
Fixes: 69c087ba6225 ("bpf: Add bpf_for_each_map_elem() helper") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
We noticed our tc ebpf tools can't start after we upgrade our in-house kernel
version from 4.19 to 5.10. That is because of the behaviour change in bpffs
caused by commit d2935de7e4fd ("vfs: Convert bpf to use the new mount API").
In our tc ebpf tools, we do strict environment check. If the environment is
not matched, we won't allow to start the ebpf progs. One of the check is whether
bpffs is properly mounted. The mount information of bpffs in kernel-4.19 and
kernel-5.10 are as follows:
- kernel 4.19
$ mount -t bpf bpffs /sys/fs/bpf
$ mount -t bpf
bpffs on /sys/fs/bpf type bpf (rw,relatime)
- kernel 5.10
$ mount -t bpf bpffs /sys/fs/bpf
$ mount -t bpf
none on /sys/fs/bpf type bpf (rw,relatime)
The device name in kernel-5.10 is displayed as none instead of bpffs, then our
environment check fails. Currently we modify the tools to adopt to the kernel
behaviour change, but I think we'd better change the kernel code to keep the
behavior consistent.
After this change, the mount information will be displayed the same with the
behavior in kernel-4.19, for example:
$ mount -t bpf bpffs /sys/fs/bpf
$ mount -t bpf
bpffs on /sys/fs/bpf type bpf (rw,relatime)
Fixes: d2935de7e4fd ("vfs: Convert bpf to use the new mount API") Suggested-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: David Howells <dhowells@redhat.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Link: https://lore.kernel.org/bpf/20220108134623.32467-1-laoar.shao@gmail.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The bpf_xdp_link_update() function didn't check the program type before
updating the program, which made it possible to install any program type as
an XDP program, which is obviously not good. Syzbot managed to trigger this
by swapping in an LWT program on the XDP hook which would crash in a helper
call.
Fix this by adding a check and bailing out if the types don't match.
Mixed indentation levels in the lists of options in bpftool's
documentation produces some unexpected results. For the "bpftool" man
page, it prints a warning:
$ make -C bpftool.8
GEN bpftool.8
<stdin>:26: (ERROR/3) Unexpected indentation.
For other pages, there is no warning, but it results in a line break
appearing in the option lists in the generated man pages.
RST paragraphs should have a uniform indentation level. Let's fix it.
Fixes: c07ba629df97 ("tools: bpftool: Update and synchronise option list in doc and help msg") Fixes: 8cc8c6357c8f ("tools: bpftool: Document and add bash completion for -L, -B options") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20211110114632.24537-5-quentin@isovalent.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Bpftool's Makefile, and the Makefile for its documentation, both include
scripts/utilities.mak, but they use none of the items defined in this
file. Remove the includes.
Accessing the crtc->state pointer from outside the modesetting context
is not allowed. We thus need to copy whatever we need from the KMS state
to our structure in order to access it.
In VC4, a number of users of that pointers have crept in over the years,
and the previous commits removed them all but the HVS channel a CRTC has
been assigned.
Let's move this channel in struct vc4_crtc at atomic_begin() time, drop
it from our private state structure, and remove our use of crtc->state
from our vblank handler entirely.
In some situation, we can end up being stuck on a non-blocking that went
through properly.
The situation that seems to trigger it reliably is to first start a
non-blocking commit, and then right after, and before we had any vblank
interrupt), start a blocking commit.
This will lead to the first commit workqueue to be scheduled, setup the
display, while the second commit is waiting for the first one to be
completed.
The vblank interrupt will then be raised, vc4_crtc_handle_vblank() will
run and will compare the active dlist in the HVS channel to the one
associated with the crtc->state.
However, at that point, the second commit is waiting using
drm_atomic_helper_wait_for_dependencies that occurs after
drm_atomic_helper_swap_state has been called, so crtc->state points to
the second commit state. vc4_crtc_handle_vblank() will compare the two
dlist addresses and since they don't match will ignore the interrupt.
The vblank event will never be reported, and the first and second commit
will wait for the first commit completion until they timeout.
The underlying reason is that it was never safe to do so. Indeed,
accessing the ->state pointer access synchronization is based on
ownership guarantees that can only occur within the functions and hooks
defined as part of the KMS framework, and obviously the irq handler
isn't one of them. The rework to move to generic helpers only uncovered
the underlying issue.
However, since the code path between
drm_atomic_helper_wait_for_dependencies() and
drm_atomic_helper_wait_for_vblanks() is serialised and we can't get two
commits in that path at the same time, we can work around this issue by
setting a variable associated to struct drm_crtc to the dlist we expect,
and then using it from the vc4_crtc_handle_vblank() function.
Since that state is shared with the modesetting path, we also need to
introduce a spinlock to protect the code shared between the interrupt
handler and the modesetting path, protecting only our new variable for
now.
Accessing the crtc->state pointer from outside the modesetting context
is not allowed. We thus need to copy whatever we need from the KMS state
to our structure in order to access it.
In VC4, a number of users of that pointers have crept in over the years,
the first one being whether or not the downstream controller of the
pixelvalve is our writeback controller.
Fortunately for us, Since commit 39fcb2808376 ("drm/vc4: txp: Turn the
TXP into a CRTC of its own") this is no longer something that can change
from one commit to the other and is hardcoded.
Let's set this flag in struct vc4_crtc if we happen to be the TXP, and
drop the flag from our private state structure.
We do test with inject error fault base on v4.19, after test some time we found
sync /dev/sda always failed.
[root@localhost] sync /dev/sda
sync: error syncing '/dev/sda': Input/output error
As 8d6996630c03 introduce 'fq->rq_status', this data only update when 'flush_rq'
reference count isn't zero. If flush request once failed and record error code
in 'fq->rq_status'. If there is no chance to update 'fq->rq_status',then do fsync
will always failed.
To address this issue reset 'fq->rq_status' after return error code to upper layer.
Fixes: 8d6996630c03("block: fix null pointer dereference in blk_mq_rq_timed_out()") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211129012659.1553733-1-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
A previous commit added this feature, but it inadvertently used the wrong
variable to show/store the setting from/to, victimized by copy/paste. Fix
it up so that the async_depth sysfs interface reads and writes from the
right setting.
Fixes: 07757588e507 ("block/mq-deadline: Reserve 25% of scheduler tags for synchronous requests") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215485 Reviewed-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
arch/powerpc/platforms/cell/pervasive.c:81:2: error: unannotated fall-through between switch labels
case SRR1_WAKEEE:
^
arch/powerpc/platforms/cell/pervasive.c:81:2: note: insert 'break;' to avoid fall-through
case SRR1_WAKEEE:
^
break;
1 error generated.
Clang is more pedantic than GCC, which does not warn when failing
through to a case that is just break or return. Clang's version is more
in line with the kernel's own stance in deprecated.rst. Add athe missing
break to silence the warning.
Fixes: 6e83985b0f6e ("powerpc/cbe: Do not process external or decremeter interrupts from sreset") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207110228.698956-1-anders.roxell@linaro.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The reverted commit had added a retry mechanism to the command entry
index allocation. The previous patch ensures that there is a free
command entry index once the command work handler holds the command
semaphore. Thus the retry mechanism is not needed.
Fixes: 410bd754cd73 ("net/mlx5: Add retry mechanism to the command entry index allocation") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
By the spec, wq size and group association is not changeable unless device
is disabled. Exclude clearing the shadow copy on wq disable/reset. This
allows wq type to be changed after disable to be re-enabled.
Move the size and group association to its own cleanup and only call it
during device disable.
Fixes: 0dcfe41e9a4c ("dmanegine: idxd: cleanup all device related bits after disabling device") Reported-by: Lucas Van <lucas.van@intel.com> Tested-by: Lucas Van <lucas.van@intel.com> Signed-off-by: Dave Jiang <dave.jiang@intel.com> Link: https://lore.kernel.org/r/163951291732.2987775.13576571320501115257.stgit@djiang5-desk3.ch.intel.com Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The function regulator_get() returns an error pointer. Use IS_ERR() to
validate the return value.
Link: https://lore.kernel.org/r/20211222070930.9449-1-linmq006@gmail.com Fixes: cf137b3ea49a ("scsi: ufs-mediatek: Support VA09 regulator operations") Signed-off-by: Miaoqian Lin <linmq006@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The SCSI debugfs code supports showing information about pending commands,
including translating SCSI command flags from numeric into text format.
Also convert the SCMD_LAST flag from numeric into text form.
Link: https://lore.kernel.org/r/20211129194609.3466071-4-bvanassche@acm.org Fixes: 8930a6c20791 ("scsi: core: add support for request batching") Signed-off-by: Bart Van Assche <bvanassche@acm.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
10bbffa3e88e attempted to fix the use of rotation duration as
advertising duration but it didn't change the if condition which still
uses the duration instead of the timeout.
Fixes: 10bbffa3e88e ("Bluetooth: Fix using advertising instance duration as timeout") Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
On boards with LAN9514 and no preconfigured MAC address we don't get an
ip address from DHCP after commit a049a30fc27c ("net: usb: Correct PHY handling
of smsc95xx") anymore. Adding an explicit reset before starting the phy
fixes the issue.
From: Gabriel Hojda <ghojda@yo2urs.ro> Fixes: a049a30fc27c ("net: usb: Correct PHY handling of smsc95xx") Signed-off-by: Gabriel Hojda <ghojda@yo2urs.ro> Signed-off-by: Markus Reichl <m.reichl@fivetechno.de> Tested-by: Alexander Stein <alexander.stein@ew.tq-group.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The config RANDOMIZE_SLAB does not exist, the authors probably intended to
refer to the config RANDOMIZE_BASE, which provides kernel address-space
randomization. They probably just confused SLAB with BASE (these two
four-letter words coincidentally share three common letters), as they also
point out the config SLAB_FREELIST_RANDOM as further randomization within
the same sentence.
Fix the reference of the config for kernel address-space randomization to
the config that provides that.
Fixes: 6e88559470f5 ("Documentation: Add section about CPU vulnerabilities for Spectre") Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Link: https://lore.kernel.org/r/20211230171940.27558-1-lukas.bulwahn@gmail.com Signed-off-by: Jonathan Corbet <corbet@lwn.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
This config was removed so remove all references to it.
Fixes: 76a3c92ec9e0 ("cifs: remove support for NTLM and weaker authentication algorithms") Signed-off-by: Alexandre Ghiti <alexandre.ghiti@canonical.com> Reviewed-by: Steve French <smfrench@gmail.com> Acked-by: Arnd Bergmann <arnd@arndb.de> [arch/arm/configs] Acked-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
The data node reference documentation was missing a package that must
contain the property values, instead property name and multiple values
being present in a single package. This is not aligned with the _DSD
spec.
Fix it by adding the package for the values.
Also add the missing "reg" properties to two numbered nodes.
Fixes: b10134a3643d ("ACPI: property: Document hierarchical data extension references") Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Currently the documentation states that channels must be configured before
running the dmatest. This has not been true since commit 6b41030fdc79
("dmaengine: dmatest: Restore default for channel"). Fix accordingly.
Fixes: 6b41030fdc79 ("dmaengine: dmatest: Restore default for channel") Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org> Link: https://lore.kernel.org/r/20211118100952.27268-3-daniel.thompson@linaro.org Signed-off-by: Vinod Koul <vkoul@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Fix grammar/wording in the help text for MEDIA_TEST_SUPPORT.
Fixes: 4b32216adb01 ("media: split test drivers from platform directory") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
Similarly to what we encountered with the detect hook with DRM, nothing
actually prevents any of the CEC callback from being run while the HDMI
output is disabled.
However, this is an issue since any register access to the controller
when it's powered down will result in a silent hang.
Let's make sure we run the runtime_pm hooks when the CEC adapter is
opened and closed by the userspace to avoid that issue.
Fixes: 15b4511a4af6 ("drm/vc4: add HDMI CEC support") Reviewed-by: Dave Stevenson <dave.stevenson@raspberrypi.com> Signed-off-by: Maxime Ripard <maxime@cerno.tech> Link: https://patchwork.freedesktop.org/patch/msgid/20210819135931.895976-6-maxime@cerno.tech Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
PHTW register is selected based on default bit rate from Table[1].
for the bit rates less than or equal to 250. Currently first
value of default bit rate which is greater than or equal to
the caculated mbps is selected. This selection can be further
improved by selecting the default bit rate which is nearest to
the calculated value.
The help text for GOOGLE_FIRMWARE states that it should only be
enabled when building a kernel for Google's own servers. However,
many of the drivers dependent on it are also useful on Chromebooks or
on any platform using coreboot.
Update the help text to reflect this double duty.
Fixes: d384d6f43d1e ("firmware: google memconsole: Add coreboot support") Reviewed-by: Julius Werner <jwerner@chromium.org> Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Link: https://lore.kernel.org/r/20180618225540.GD14131@decadent.org.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
OF: /soc/pinctrl@1000000: phandle pinctrl@1000000 needs 3, found 2
Which is a more helpful messages making DT debugging easier.
In this particular example the phandle name looks like duplicate of the
same node name. But note that the first node is the parent node
(it->parent), while the second is the phandle target (it->node). They
happen to be the same in the case that triggered this improvement. See
commit 72cb4c48a46a ("arm64: dts: qcom: ipq6018: Fix gpio-ranges
property").
The return value was never initialized so the cleanup code executed when
it isn't even necessary.
Just add proper error handling.
Fixes: ab50cb9df889 ("drm/radeon/radeon_kms: Fix a NULL pointer dereference in radeon_driver_open_kms()") Signed-off-by: Christian König <christian.koenig@amd.com> Tested-by: Jan Stancek <jstancek@redhat.com> Tested-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
qmi tries to allocate a large contiguous dma memory at first,
on the AMD Ryzen platform it fails, then retries with small slices.
So set flag GFP_NOWARN to avoid flooding dmesg.
Signed-off-by: Aaron Ma <aaron.ma@canonical.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20210823063258.37747-1-aaron.ma@canonical.com Cc: "Limonciello, Mario" <mario.limonciello@amd.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
If start_per_cpu_kthreads() called from osnoise_workload_start() returns
error, event hooks are left in broken state: unhook_irq_events() called
but unhook_thread_events() and unhook_softirq_events() not called, and
trace_osnoise_callback_enabled flag not cleared.
On the next tracer enable, hooks get not installed due to
trace_osnoise_callback_enabled flag.
And on the further tracer disable an attempt to remove non-installed
hooks happened, hitting a WARN_ON_ONCE() in tracepoint_remove_func().
Fix the error path by adding the missing part of cleanup.
While at this, introduce osnoise_unhook_events() to avoid code
duplication between this error path and normal tracer disable.
Link: https://lkml.kernel.org/r/20220109153459.3701773-1-nikita.yushchenko@virtuozzo.com Cc: stable@vger.kernel.org Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Acked-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Nikita Yushchenko <nikita.yushchenko@virtuozzo.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>
We probably want to remove the indirect block to extents migration
feature after a deprecation window, but until then, let's fix a
potential data loss problem caused by the fact that we put the
tmp_inode on the orphan list. In the unlikely case where we crash and
do a journal recovery, the data blocks belonging to the inode being
migrated are also represented in the tmp_inode on the orphan list ---
and so its data blocks will get marked unallocated, and available for
reuse.
Instead, stop putting the tmp_inode on the oprhan list. So in the
case where we crash while migrating the inode, we'll leak an inode,
which is not a disaster. It will be easily fixed the next time we run
fsck, and it's better than potentially having blocks getting claimed
by two different files, and losing data as a result.
Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Lukas Czerner <lczerner@redhat.com> Cc: stable@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Paolo Pisati <paolo.pisati@canonical.com>