git.proxmox.com Git - mirror_ubuntu-kernels.git/log

Merge branch 'net-refcount-address-dst_entry-reference-count-scalability-issues'

Thomas Gleixner says:

====================
net, refcount: Address dst_entry reference count scalability issues

This is version 3 of this series. Version 2 can be found here:

     https://lore.kernel.org/lkml/20230307125358.772287565@linutronix.de

Wangyang and Arjan reported a bottleneck in the networking code related to
struct dst_entry::__refcnt. Performance tanks massively when concurrency on
a dst_entry increases.

This happens when there are a large amount of connections to or from the
same IP address. The memtier benchmark when run on the same host as
memcached amplifies this massively. But even over real network connections
this issue can be observed at an obviously smaller scale (due to the
network bandwith limitations in my setup, i.e. 1Gb). How to reproduce:

  Run memcached with -t $N and memtier_benchmark with -t $M and --ratio=1:100
  on the same machine. localhost connections amplify the problem.

  Start with the defaults for $N and $M and increase them. Depending on
  your machine this will tank at some point. But even in reasonably small
  $N, $M scenarios the refcount operations and the resulting false sharing
  fallout becomes visible in perf top. At some point it becomes the
  dominating issue.

There are two factors which make this reference count a scalability issue:

   1) False sharing

      dst_entry:__refcnt is located at offset 64 of dst_entry, which puts
      it into a seperate cacheline vs. the read mostly members located at
      the beginning of the struct.

      That prevents false sharing vs. the struct members in the first 64
      bytes of the structure, but there is also

           dst_entry::lwtstate

      which is located after the reference count and in the same cache
      line. This member is read after a reference count has been acquired.

      The other problem is struct rtable, which embeds a struct dst_entry
      at offset 0. struct dst_entry has a size of 112 bytes, which means
      that the struct members of rtable which follow the dst member share
      the same cache line as dst_entry::__refcnt. Especially

         rtable::rt_genid

      is also read by the contexts which have a reference count acquired
      already.

      When dst_entry:__refcnt is incremented or decremented via an atomic
      operation these read accesses stall and contribute to the performance
      problem.

   2) atomic_inc_not_zero()

      A reference on dst_entry:__refcnt is acquired via
      atomic_inc_not_zero() and released via atomic_dec_return().

      atomic_inc_not_zero() is implemted via a atomic_try_cmpxchg() loop,
      which exposes O(N^2) behaviour under contention with N concurrent
      operations. Contention scalability is degrading with even a small
      amount of contenders and gets worse from there.

      Lightweight instrumentation exposed an average of 8!! retry loops per
      atomic_inc_not_zero() invocation in a inc()/dec() loop running
      concurrently on 112 CPUs.

      There is nothing which can be done to make atomic_inc_not_zero() more
      scalable.

The following series addresses these issues:

    1) Reorder and pad struct dst_entry to prevent the false sharing.

    2) Implement and use a reference count implementation which avoids the
       atomic_inc_not_zero() problem.

       It is slightly less performant in the case of the final 0 -> -1
       transition, but the deconstruction of these objects is a low
       frequency event. get()/put() pairs are in the hotpath and that's
       what this implementation optimizes for.

       The algorithm of this reference count is only suitable for RCU
       managed objects. Therefore it cannot replace the refcount_t
       algorithm, which is also based on atomic_inc_not_zero(), due to a
       subtle race condition related to the 0 -> -1 transition and the final
       verdict to mark the reference count dead. See details in patch 2/3.

       It might be just my lack of imagination which declares this to be
       impossible and I'd be happy to be proven wrong.

       As a bonus the new rcuref implementation provides underflow/overflow
       detection and mitigation while being performance wise on par with
       open coded atomic_inc_not_zero() / atomic_dec_return() pairs even in
       the non-contended case.

The combination of these two changes results in performance gains in micro
benchmarks and also localhost and networked memtier benchmarks talking to
memcached. It's hard to quantify the benchmark results as they depend
heavily on the micro-architecture and the number of concurrent operations.

The overall gain of both changes for localhost memtier ranges from 1.2X to
3.2X and from +2% to %5% range for networked operations on a 1Gb connection.

A micro benchmark which enforces maximized concurrency shows a gain between
1.2X and 4.7X!!!

Obviously this is focussed on a particular problem and therefore needs to
be discussed in detail. It also requires wider testing outside of the cases
which this is focussed on.

Though the false sharing issue is obvious and should be addressed
independent of the more focussed reference count changes.

The series is also available from git:

  git://git.kernel.org/pub/scm/linux/kernel/git/tglx/devel.git rcuref

Changes vs. V2:

  - Rename __refcnt to __rcuref (Linus)

  - Fix comments and changelogs (Mark, Qiuxu)

  - Fixup kernel doc of generated atomic_add_negative() variants

I want to say thanks to Wangyang who analyzed the issue and provided the
initial fix for the false sharing problem. Further thanks go to Arjan
Peter, Marc, Will and Borislav for valuable input and providing test
results on machines which I do not have access to, and to Linus and
Eric, Qiuxu and Mark for helpful feedback.
====================

Link: https://lore.kernel.org/r/20230323102649.764958589@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dst: Switch to rcuref_t reference counting

Under high contention dst_entry::__refcnt becomes a significant bottleneck.

atomic_inc_not_zero() is implemented with a cmpxchg() loop, which goes into
high retry rates on contention.

Switch the reference count to rcuref_t which results in a significant
performance gain. Rename the reference count member to __rcuref to reflect
the change.

The gain depends on the micro-architecture and the number of concurrent
operations and has been measured in the range of +25% to +130% with a
localhost memtier/memcached benchmark which amplifies the problem
massively.

Running the memtier/memcached benchmark over a real (1Gb) network
connection the conversion on top of the false sharing fix for struct
dst_entry::__refcnt results in a total gain in the 2%-5% range over the
upstream baseline.

Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/r/20230307125538.989175656@linutronix.de
Link: https://lore.kernel.org/r/20230323102800.215027837@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: dst: Prevent false sharing vs. dst_entry:: __refcnt

dst_entry::__refcnt is highly contended in scenarios where many connections
happen from and to the same IP. The reference count is an atomic_t, so the
reference count operations have to take the cache-line exclusive.

Aside of the unavoidable reference count contention there is another
significant problem which is caused by that: False sharing.

perf top identified two affected read accesses. dst_entry::lwtstate and
rtable::rt_genid.

dst_entry:__refcnt is located at offset 64 of dst_entry, which puts it into
a seperate cacheline vs. the read mostly members located at the beginning
of the struct.

That prevents false sharing vs. the struct members in the first 64
bytes of the structure, but there is also

dst_entry::lwtstate

which is located after the reference count and in the same cache line. This
member is read after a reference count has been acquired.

struct rtable embeds a struct dst_entry at offset 0. struct dst_entry has a
size of 112 bytes, which means that the struct members of rtable which
follow the dst member share the same cache line as dst_entry::__refcnt.
Especially

rtable::rt_genid

is also read by the contexts which have a reference count acquired
already.

When dst_entry:__refcnt is incremented or decremented via an atomic
operation these read accesses stall. This was found when analysing the
memtier benchmark in 1:100 mode, which amplifies the problem extremly.

Move the rt[6i]_uncached[_list] members out of struct rtable and struct
rt6_info into struct dst_entry to provide padding and move the lwtstate
member after that so it ends up in the same cache line.

The resulting improvement depends on the micro-architecture and the number
of CPUs. It ranges from +20% to +120% with a localhost memtier/memcached
benchmark.

[ tglx: Rearrange struct ]

Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/r/20230323102800.042297517@linutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'locking/rcuref' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pulling rcurefs from Peter for tglx's work.

Link: https://lore.kernel.org/all/20230328084534.GE4253@hirez.programming.kicks-ass.net/
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net/mlx5e: Fix build break on 32bit

The cited commit caused the following build break in mlx5 due to a change
in size of MAX_SKB_FRAGS.

error: format '%lu' expects argument of type 'long unsigned int',
but argument 7 has type 'unsigned int' [-Werror=format=]

Fix this by explicit casting.

Fixes: 3948b05950fd ("net: introduce a config option to tweak MAX_SKB_FRAGS")
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Link: https://lore.kernel.org/r/20230328200723.125122-1-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ethernet: ti: am65-cpsw: enable p0 host port rx_vlan_remap

By default, the tagged ingress packets to the switch from the host port
P0 get internal switch priority assigned equal to the DMA CPPI channel
number they came from, unless CPSW_P0_CONTROL_REG.RX_REMAP_VLAN is enabled.
This causes issues with applying QoS policies and mapping packets on
external port fifos, because the default configuration is vlan_aware and
DMA CPPI channels are shared between all external ports.

Hence enable CPSW_P0_CONTROL_REG.RX_REMAP_VLAN so packet will preserve
internal switch priority assigned following the VLAN(priority) tag no
matter through which DMA CPPI Channels packets enter the switch.

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20230327092103.3256118-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

net: ethernet: ti: am65-cpsw: add .ndo to set dma per-queue rate

Enable rate limiting TX DMA queues for CPSW interface by configuring the
rate in absolute Mb/s units per TX queue.

Example:
    ethtool -L eth0 tx 4

    echo 100 > /sys/class/net/eth0/queues/tx-0/tx_maxrate
    echo 200 > /sys/class/net/eth0/queues/tx-1/tx_maxrate
    echo 50 > /sys/class/net/eth0/queues/tx-2/tx_maxrate
    echo 30 > /sys/class/net/eth0/queues/tx-3/tx_maxrate

    # disable
    echo 0 > /sys/class/net/eth0/queues/tx-0/tx_maxrate

Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
Signed-off-by: Siddharth Vadapalli <s-vadapalli@ti.com>
Link: https://lore.kernel.org/r/20230327085758.3237155-1-s-vadapalli@ti.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

Merge branch 'allocate-multiple-skbuffs-on-tx'

Arseniy Krasnov says:

====================
allocate multiple skbuffs on tx

This adds small optimization for tx path: instead of allocating single
skbuff on every call to transport, allocate multiple skbuff's until
credit space allows, thus trying to send as much as possible data without
return to af_vsock.c.

Also this patchset includes second patch which adds check and return from
'virtio_transport_get_credit()' and 'virtio_transport_put_credit()' when
these functions are called with 0 argument. This is needed, because zero
argument makes both functions to behave as no-effect, but both of them
always tries to acquire spinlock. Moreover, first patch always calls
function 'virtio_transport_put_credit()' with zero argument in case of
successful packet transmission.
====================

Link: https://lore.kernel.org/r/b0d15942-65ba-3a32-ba8d-fed64332d8f6@sberdevices.ru
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio/vsock: check argument to avoid no effect call

Both of these functions have no effect when input argument is 0, so to
avoid useless spinlock access, check argument before it.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

virtio/vsock: allocate multiple skbuffs on tx

This adds small optimization for tx path: instead of allocating single
skbuff on every call to transport, allocate multiple skbuff's until
credit space allows, thus trying to send as much as possible data without
return to af_vsock.c.

Signed-off-by: Arseniy Krasnov <AVKrasnov@sberdevices.ru>
Reviewed-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>

atomics: Provide rcuref - scalable reference counting

atomic_t based reference counting, including refcount_t, uses
atomic_inc_not_zero() for acquiring a reference. atomic_inc_not_zero() is
implemented with a atomic_try_cmpxchg() loop. High contention of the
reference count leads to retry loops and scales badly. There is nothing to
improve on this implementation as the semantics have to be preserved.

Provide rcuref as a scalable alternative solution which is suitable for RCU
managed objects. Similar to refcount_t it comes with overflow and underflow
detection and mitigation.

rcuref treats the underlying atomic_t as an unsigned integer and partitions
this space into zones:

  0x00000000 - 0x7FFFFFFF valid zone (1 .. (INT_MAX + 1) references)
  0x80000000 - 0xBFFFFFFF saturation zone
  0xC0000000 - 0xFFFFFFFE dead zone
  0xFFFFFFFF    no reference

rcuref_get() unconditionally increments the reference count with
atomic_add_negative_relaxed(). rcuref_put() unconditionally decrements the
reference count with atomic_add_negative_release().

This unconditional increment avoids the inc_not_zero() problem, but
requires a more complex implementation on the put() side when the count
drops from 0 to -1.

When this transition is detected then it is attempted to mark the reference
count dead, by setting it to the midpoint of the dead zone with a single
atomic_cmpxchg_release() operation. This operation can fail due to a
concurrent rcuref_get() elevating the reference count from -1 to 0 again.

If the unconditional increment in rcuref_get() hits a reference count which
is marked dead (or saturated) it will detect it after the fact and bring
back the reference count to the midpoint of the respective zone. The zones
provide enough tolerance which makes it practically impossible to escape
from a zone.

The racy implementation of rcuref_put() requires to protect rcuref_put()
against a grace period ending in order to prevent a subtle use after
free. As RCU is the only mechanism which allows to protect against that, it
is not possible to fully replace the atomic_inc_not_zero() based
implementation of refcount_t with this scheme.

The final drop is slightly more expensive than the atomic_dec_return()
counterpart, but that's not the case which this is optimized for. The
optimization is on the high frequeunt get()/put() pairs and their
scalability.

The performance of an uncontended rcuref_get()/put() pair where the put()
is not dropping the last reference is still on par with the plain atomic
operations, while at the same time providing overflow and underflow
detection and mitigation.

The performance of rcuref compared to plain atomic_inc_not_zero() and
atomic_dec_return() based reference counting under contention:

-  Micro benchmark: All CPUs running a increment/decrement loop on an
    elevated reference count, which means the 0 to -1 transition never
    happens.

    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.3X to 4.7X

- Conversion of dst_entry::__refcnt to rcuref and testing with the
    localhost memtier/memcached benchmark. That benchmark shows the
    reference count contention prominently.

    The performance gain depends on microarchitecture and the number of
    CPUs and has been observed in the range of 1.1X to 2.6X over the
    previous fix for the false sharing issue vs. struct
    dst_entry::__refcnt.

    When memtier is run over a real 1Gb network connection, there is a
    small gain on top of the false sharing fix. The two changes combined
    result in a 2%-5% total gain for that networked test.

Reported-by: Wangyang Guo <wangyang.guo@intel.com>
Reported-by: Arjan Van De Ven <arjan.van.de.ven@intel.com>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230323102800.158429195@linutronix.de

atomics: Provide atomic_add_negative() variants

atomic_add_negative() does not provide the relaxed/acquire/release
variants.

Provide them in preparation for a new scalable reference count algorithm.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20230323102800.101763813@linutronix.de

Merge tag 'linux-can-next-for-6.4-20230327' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next

Marc Kleine-Budde says:

====================
pull-request: can-next 2023-03-27

The first 2 patches by Geert Uytterhoeven add transceiver support and
improve the error messages in the rcar_canfd driver.

Cai Huoqing contributes 3 patches which remove a redundant call to
pci_clear_master() in the c_can, ctucanfd and kvaser_pciefd driver.

Frank Jungclaus's patch replaces the struct esd_usb_msg with a union
in the esd_usb driver to improve readability.

Markus Schneider-Pargmann contributes 5 patches to improve the
performance in the m_can driver, especially for SPI attached
controllers like the tcan4x5x.

* tag 'linux-can-next-for-6.4-20230327' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next:
  can: m_can: Keep interrupts enabled during peripheral read
  can: m_can: Disable unused interrupts
  can: m_can: Remove double interrupt enable
  can: m_can: Always acknowledge all interrupts
  can: m_can: Remove repeated check for is_peripheral
  can: esd_usb: Improve code readability by means of replacing struct esd_usb_msg with a union
  can: kvaser_pciefd: Remove redundant pci_clear_master
  can: ctucanfd: Remove redundant pci_clear_master
  can: c_can: Remove redundant pci_clear_master
  can: rcar_canfd: Improve error messages
  can: rcar_canfd: Add transceiver support
====================

Link: https://lore.kernel.org/r/20230327073354.1003134-1-mkl@pengutronix.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge branch 'add-tx-push-buf-len-param-to-ethtool'

Shay Agroskin says:

====================
Add tx push buf len param to ethtool

This patchset adds a new sub-configuration to ethtool get/set queue
params (ethtool -g) called 'tx-push-buf-len'.

This configuration specifies the maximum number of bytes of a
transmitted packet a driver can push directly to the underlying
device ('push' mode). The motivation for pushing some of the bytes to
the device has the advantages of

- Allowing a smart device to take fast actions based on the packet's
  header
- Reducing latency for small packets that can be copied completely into
  the device

This new param is practically similar to tx-copybreak value that can be
set using ethtool's tunable but conceptually serves a different purpose.
While tx-copybreak is used to reduce the overhead of DMA mapping and
makes no sense to use if less than the whole segment gets copied,
tx-push-buf-len allows to improve performance by analyzing the packet's
data (usually headers) before performing the DMA operation.

The configuration can be queried and set using the commands:

    $ ethtool -g [interface]

    # ethtool -G [interface] tx-push-buf-len [number of bytes]

This patchset also adds support for the new configuration in ENA driver
for which this parameter ensures efficient resources management on the
device side.
====================

Link: https://lore.kernel.org/r/20230323163610.1281468-1-shayagr@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Advertise TX push support

LLQ is auto enabled by the device and disabling it isn't supported on
new ENA generations while on old ones causes sub-optimal performance.

This patch adds advertisement of push-mode when LLQ mode is used, but
rejects an attempt to modify it.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Add support to changing tx_push_buf_len

The ENA driver allows for two distinct values for the number of bytes
of the packet's payload that can be written directly to the device.

For a value of 224 the driver turns on Large LLQ Header mode in which
the first 224 of the packet's payload are written to the LLQ.

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Recalculate TX state variables every device reset

With the ability to modify LLQ entry size, the size of packet's
payload that can be written directly to the device changes.
This patch makes the driver recalculate this information every device
negotiation (also called device reset).

Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Add an option to configure large LLQ headers

Allow configuring the device with large LLQ headers. The Low Latency
Queue (LLQ) allows the driver to write the first N bytes of the packet,
along with the rest of the TX descriptors directly into device (N can be
either 96 or 224 for large LLQ headers configuration).

Having L4 TCP/UDP headers contained in the first 96 bytes of the packet
is required to get maximum performance from the device.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: David Arinzon <darinzon@amazon.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ena: Make few cosmetic preparations to support large LLQ

Move ena_calc_io_queue_size() implementation closer to the file's
beginning so that it can be later called from ena_device_init()
function without adding a function declaration.

Also add an empty line at some spots to separate logical blocks in
funcitons.

Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Michal Kubiak <michal.kubiak@intel.com>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ethtool: Add support for configuring tx_push_buf_len

This attribute, which is part of ethtool's ring param configuration
allows the user to specify the maximum number of the packet's payload
that can be written directly to the device.

Example usage:
# ethtool -G [interface] tx-push-buf-len [number of bytes]

Co-developed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

netlink: Add a macro to set policy message with format string

Similar to NL_SET_ERR_MSG_FMT, add a macro which sets netlink policy
error message with a format string.

Signed-off-by: Shay Agroskin <shayagr@amazon.com>
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

qed: remove unused num_ooo_add_to_peninsula variable

clang with W=1 reports
drivers/net/ethernet/qlogic/qed/qed_ll2.c:649:6: error: variable
  'num_ooo_add_to_peninsula' set but not used [-Werror,-Wunused-but-set-variable]
        u32 num_ooo_add_to_peninsula = 0, cid;
            ^
This variable is not used so remove it.

Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230326001733.1343274-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: introduce a config option to tweak MAX_SKB_FRAGS

Currently, MAX_SKB_FRAGS value is 17.

For standard tcp sendmsg() traffic, no big deal because tcp_sendmsg()
attempts order-3 allocations, stuffing 32768 bytes per frag.

But with zero copy, we use order-0 pages.

For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 segments per skb.

This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with frag list.

We have used MAX_SKB_FRAGS=45 value for years at Google before
we deployed 4K MTU, with no adverse effect, other than
a recent issue in mlx4, fixed in commit 26782aad00cc
("net/mlx4: MLX4_TX_BOUNCE_BUFFER_SIZE depends on MAX_SKB_FRAGS")

Back then, goal was to be able to receive full size (64KB) GRO
packets without the frag_list overhead.

Note that /proc/sys/net/core/max_skb_frags can also be used to limit
the number of fragments TCP can use in tx packets.

By default we keep the old/legacy value of 17 until we get
more coverage for the updated values.

Sizes of struct skb_shared_info on 64bit arches

MAX_SKB_FRAGS | sizeof(struct skb_shared_info):
==============================================
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768

This inflation might cause problems for drivers assuming they could pack
both the incoming packet (for MTU=1500) and skb_shared_info in half a page,
using build_skb().

v3: fix build error when CONFIG_NET=n
v2: fix two build errors assuming MAX_SKB_FRAGS was "unsigned long"

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20230323162842.1935061-1-eric.dumazet@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

dev_ioctl: fix a W=1 warning

This fixes the following warning when compiled with GCC 12.2.0 and W=1.

net/core/dev_ioctl.c:475: warning: Function parameter or member 'data'
not described in 'dev_ioctl'

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: bcm7xxx: use devm_clk_get_optional_enabled to simplify the code

Use devm_clk_get_optional_enabled to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: ynl: default to treating enums as flags for mask generation

I was a bit too optimistic in commit bf51d27704c9 ("tools: ynl: fix
get_mask utility routine"), not every mask we use is necessarily
coming from an enum of type "flags". We also allow flipping an
enum into flags on per-attribute basis. That's done by
the 'enum-as-flags' property of an attribute.

Restore this functionality, it's not currently used by any in-tree
family.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: tls: add a test for queuing data before setting the ULP

Other tests set up the connection fully on both ends before
communicating any data. Add a test which will queue up TLS
records to TCP before the TLS ULP is installed.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: ynl: Add missing types to encode/decode

While testing the tool I noticed we miss the u16 type on payload create.
On the code inspection it turned out we miss also u64 - add them.

We also miss the decoding of u16 despite the fact `NlAttr` class
supports it - add it.

Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sunhme-cleanups'

Sean Anderson says:

====================
net: sunhme: Probe/IRQ cleanups

Well, I've had these patches kicking around in my tree since last
October, so I guess I had better get around to posting them. This
series is mainly a cleanup/consolidation of the probe process, with
some interrupt changes as well. Some of these changes are SBUS- (AKA
SPARC-) specific, so this should really get some testing there as well
to ensure nothing breaks. I've CC'd a few SPARC mailing lists in hopes
that someone there can try this out. I also have an SBUS card I
ordered by mistake if anyone has a SPARC computer but lacks this card.

Changes in v4:
- Tweak variable order for yuletide
- Move uninitialized return to its own commit
- Use correct SBUS/PCI accessors
- Rework hme_version to set the default in pci/sbus_probe and override it (if
necessary) in common_probe

Changes in v3:
- Incorperate a fix from another series into this commit

Changes in v2:
- Move happy_meal_begin_auto_negotiation earlier and remove forward declaration
- Make some more includes common
- Clean up mac address init
- Inline error returns
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Consolidate common probe tasks

Most of the second half of the PCI/SBUS probe functions are the same.
Consolidate them into a common function.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Inline error returns

The err_out label used to have cleanup. Now that it just returns, inline it
everywhere.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Clean up mac address init

Clean up some oddities suggested during review.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Consolidate mac address initialization

The mac address initialization is braodly the same between PCI and SBUS,
and one was clearly copied from the other. Consolidate them. We still have
to have some ifdefs because pci_(un)map_rom is only implemented for PCI,
and idprom is only implemented for SPARC.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Switch SBUS to devres

The PCI half of this driver was converted in commit 914d9b2711dd ("sunhme:
switch to devres"). Do the same for the SBUS half.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Alphabetize includes

Alphabetize includes to make it clearer where to add new ones.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Unify IRQ requesting

Instead of registering one interrupt handler for all four SBUS Quattro
HMEs, let each HME register its own handler. To make this work, we don't
handle the IRQ if none of the status bits are set. This reduces the
complexity of the driver, and makes it easier to ensure things happen
before/after enabling IRQs.

I'm not really sure why we request IRQs in two different places (and leave
them running after removing the driver!). A lot of things in this driver
seem to just be crusty, and not necessarily intentional. I'm assuming
that's the case here as well.

This really needs to be tested by someone with an SBUS Quattro card.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Remove residual polling code

The sunhme driver never used the hardware MII polling feature. Even the
if-def'd out happy_meal_poll_start was removed by 2002 [1]. Remove the
various places in the driver which needlessly guard against MII interrupts
which will never be enabled.

[1] https://lwn.net/2002/0411/a/2.5.8-pre3.php3

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Just restart autonegotiation if we can't bring the link up

If we've tried regular autonegotiation and forcing the link mode, just
restart autonegotiation instead of reinitializing the whole NIC.

Signed-off-by: Sean Anderson <seanga2@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sunhme: Fix uninitialized return code

Fix an uninitialized return code if we never found a qfe slot. It would be
a bug if we ever got into this situation, but it's good to return something
tracable.

Fixes: acb3f35f920b ("sunhme: forward the error code from pci_enable_device()")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Sean Anderson <seanga2@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'octeon_ep-deferred-probe-and-mailbox'

Veerasenareddy Burru says:

====================
octeon_ep: deferred probe and mailbox

Implement Deferred probe, mailbox enhancements and heartbeat monitor.

v4 -> v5:
   - addressed review comments
     https://lore.kernel.org/all/20230323104703.GD36557@unreal/
     replaced atomic_inc() + atomic_read() with atomic_inc_return().

v3 -> v4:
   - addressed review comments on v3
     https://lore.kernel.org/all/20230214051422.13705-1-vburru@marvell.com/
   - 0004-xxx.patch v3 is split into 0004-xxx.patch and 0005-xxx.patch
     in v4.
   - API changes to accept function ID are moved to 0005-xxx.patch.
   - fixed rct violations.
   - reverted newly added changes that do not yet have use cases.

v2 -> v3:
   - removed SRIOV VF support changes from v2, as new drivers which use
     ndo_get_vf_xxx() and ndo_set_vf_xxx() are not accepted.
     https://lore.kernel.org/all/20221207200204.6819575a@kernel.org/

     Will implement VF representors and submit again.
   - 0007-xxx.patch and 0008-xxx.patch from v2 are removed and
     0009-xxx.patch in v2 is now 0007-xxx.patch in v3.
   - accordingly, changed title for cover letter.

v1 -> v2:
   - remove separate workqueue task to wait for firmware ready.
     instead defer probe when firmware is not ready.
Reported-by: Leon Romanovsky <leon@kernel.org>
   - This change has resulted in update of 0001-xxx.patch and
     all other patches in the patchset.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: add heartbeat monitor

Monitor periodic heartbeat messages from device firmware.
Presence of heartbeat indicates the device is active and running.
If the heartbeat is missed for configured interval indicates
firmware has crashed and device is unusable; in this case, PF driver
stops and uninitialize the device.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: function id in link info and stats mailbox commands

Update control mailbox API to include function id in get stats and
link info.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: support asynchronous notifications

Add asynchronous notification support to the control mailbox.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: include function id in mailbox commands

Extend control command structure to include vfid and
update APIs to accept VF ID.

Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: add separate mailbox command and response queues

Enhance control mailbox protocol to support following
- separate command and response queues
    * command queue to send control commands to firmware.
    * response queue to receive responses and notifications from
      firmware.
- variable size messages using scatter/gather

Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: control mailbox for multiple PFs

Add control mailbox support for multiple PFs.
Update control mbox base address calculation based on PF function link.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: poll for control messages

Poll for control messages until interrupts are enabled.
All the interrupts are enabled in ndo_open().
Add ability to listen for notifications from firmware before ndo_open().
Once interrupts are enabled, this polling is disabled and all the
messages are processed by bottom half of interrupt handler.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeon_ep: defer probe if firmware not ready

Defer probe if firmware is not ready for device usage.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Signed-off-by: Abhijit Ayarekar <aayarekar@marvell.com>
Signed-off-by: Satananda Burla <sburla@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bcm53134-support'

Álvaro Fernández Rojas says:

====================
net: dsa: b53: mdio: add support for BCM53134

This is based on the initial work from Paul Geurts that was sent to the
incorrect linux development lists and recipients.
I've modified it by removing BCM53134_DEVICE_ID from is531x5() and therefore
adding is53134() where needed.
I also added a separate RGMII handling block for is53134() since according to
Paul, BCM53134 doesn't support RGMII_CTRL_TIMING_SEL as opposed to is531x5().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: b53: mdio: add support for BCM53134

Add support for the BCM53134 Ethernet switch in the existing b53 dsa driver.
BCM53134 is very similar to the BCM58XX series.

Signed-off-by: Paul Geurts <paul.geurts@prodrive-technologies.com>
Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: dsa: b53: add BCM53134 support

BCM53134 are B53 switches connected by MDIO.

Signed-off-by: Álvaro Fernández Rojas <noltari@gmail.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

tools: ynl: add the Python requirements.txt file

It is a good practice to state explicitly which are the required Python
packages needed in a particular project to run it. The most commonly
used way is to store them in the `requirements.txt` file*.

*URL: https://pip.pypa.io/en/stable/reference/requirements-file-format/

Currently user needs to figure out himself that Python needs `PyYAML`
and `jsonschema` (and theirs requirements) packages to use the tool.
Add the `requirements.txt` for user convenience.

How to use it:
1) (optional) Create and activate empty virtual environment:
  python3.X -m venv venv3X
  source ./venv3X/bin/activate
2) Install all the required packages:
  pip install -r requirements.txt
    or
  python -m pip install -r requirements.txt
3) Run the script!

The `requirements.txt` file was tested for:
* Python 3.6
* Python 3.8
* Python 3.10

Signed-off-by: Michal Michalik <michal.michalik@intel.com>
Link: https://lore.kernel.org/r/20230323190802.32206-1-michal.michalik@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

mISDN: remove unused vpm_read_address and cpld_read_reg functions

clang with W=1 reports
drivers/isdn/hardware/mISDN/hfcmulti.c:667:1: error: unused function
'vpm_read_address' [-Werror,-Wunused-function]
vpm_read_address(struct hfc_multi *c)
^

drivers/isdn/hardware/mISDN/hfcmulti.c:643:1: error: unused function
'cpld_read_reg' [-Werror,-Wunused-function]
cpld_read_reg(struct hfc_multi *hc, unsigned char reg)
^

These functions are not used, so remove them.

Reported-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Tom Rix <trix@redhat.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230323161343.2633836-1-trix@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge patch series "can: m_can: Optimizations for m_can/tcan part 2"

Markus Schneider-Pargmann <msp@baylibre.com> says:

third version part 2, functionally I had to move from spin_lock to
spin_lock_irqsave because of an interrupt that was calling start_xmit,
see attached stack. This is tested on tcan455x but I don't have the
integrated hardware myself so any testing is appreciated.

The series implements many small and bigger throughput improvements.

Changes in v3:
- Remove parenthesis in error messages
- Use memcpy_and_pad for buffer copy in 'can: m_can: Write transmit
  header and data in one transaction'.
- Replace spin_lock with spin_lock_irqsave. I got a report of a
  interrupt that was calling start_xmit just after the netqueue was
  woken up before the locked region was exited. spin_lock_irqsave should
  fix this. I attached the full stack at the end of the mail if someone
  wants to know.
- Rebased to v6.3-rc1.
- Removed tcan4x5x patches from this series.

Changes in v2: https://lore.kernel.org/all/20230125195059.630377-1-msp@baylibre.com
- Rebased on v6.2-rc5
- Fixed missing/broken accounting for non peripheral m_can devices.

part 1:
v1 - https://lore.kernel.org/lkml/20221116205308.2996556-1-msp@baylibre.com
v2 - https://lore.kernel.org/lkml/20221206115728.1056014-1-msp@baylibre.com

part 2:
v1 - https://lore.kernel.org/lkml/20221221152537.751564-1-msp@baylibre.com
v2 - https://lore.kernel.org/lkml/20230125195059.630377-1-msp@baylibre.com

stack of calling start_xmit within locked region:
[  308.170171]  dump_backtrace+0x0/0x1a0
[  308.173841]  show_stack+0x18/0x70
[  308.177158]  sched_show_task+0x154/0x180
[  308.181084]  dump_cpu_task+0x44/0x54
[  308.184664]  rcu_dump_cpu_stacks+0xe8/0x12c
[  308.188846]  rcu_sched_clock_irq+0x9f4/0xd10
[  308.193118]  update_process_times+0x9c/0xec
[  308.197304]  tick_sched_handle+0x34/0x60
[  308.201231]  tick_sched_timer+0x4c/0xa4
[  308.205071]  __hrtimer_run_queues+0x138/0x1e0
[  308.209429]  hrtimer_interrupt+0xe8/0x244
[  308.213440]  arch_timer_handler_phys+0x38/0x50
[  308.217890]  handle_percpu_devid_irq+0x84/0x130
[  308.222422]  handle_domain_irq+0x60/0x90
[  308.226347]  gic_handle_irq+0x54/0x130
[  308.230099]  do_interrupt_handler+0x34/0x60
[  308.234286]  el1_interrupt+0x30/0x80
[  308.237861]  el1h_64_irq_handler+0x18/0x24
[  308.241958]  el1h_64_irq+0x78/0x7c
[  308.245360]  queued_spin_lock_slowpath+0xf4/0x390
[  308.250067]  m_can_start_tx+0x20/0xb0 [m_can]
[  308.254431]  m_can_start_xmit+0xd8/0x230 [m_can]
[  308.259054]  dev_hard_start_xmit+0xd4/0x15c
[  308.263241]  sch_direct_xmit+0xe8/0x370
[  308.267080]  __qdisc_run+0x118/0x650
[  308.270660]  net_tx_action+0x118/0x230
[  308.274409]  _stext+0x124/0x2a0
[  308.277549]  __irq_exit_rcu+0xe4/0x100
[  308.281302]  irq_exit+0x10/0x20
[  308.284444]  handle_domain_irq+0x64/0x90
[  308.288367]  gic_handle_irq+0x54/0x130
[  308.292119]  call_on_irq_stack+0x2c/0x54
[  308.296043]  do_interrupt_handler+0x54/0x60
[  308.300228]  el1_interrupt+0x30/0x80
[  308.303804]  el1h_64_irq_handler+0x18/0x24
[  308.307901]  el1h_64_irq+0x78/0x7c
[  308.311303]  __netif_schedule+0x78/0xa0
[  308.315138]  netif_tx_wake_queue+0x50/0x7c
[  308.319237]  m_can_isr+0x474/0x1710 [m_can]
[  308.323425]  irq_thread_fn+0x2c/0x9c
[  308.327005]  irq_thread+0x178/0x2c0
[  308.330497]  kthread+0x150/0x160
[  308.333727]  ret_from_fork+0x10/0x20

Link: https://lore.kernel.org/all/20230315110546.2518305-1-msp@baylibre.com
[mkl: apply patches 1...5 only, adjust message accordingly]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Keep interrupts enabled during peripheral read

Interrupts currently get disabled if the interrupt status shows new
received data. Non-peripheral chips handle receiving in a worker thread,
but peripheral chips are handling the receive process in the threaded
interrupt routine itself without scheduling it for a different worker.
So there is no need to disable interrupts for peripheral chips.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230315110546.2518305-6-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Disable unused interrupts

There are a number of interrupts that are not used by the driver at the
moment. Disable all of these.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230315110546.2518305-5-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Remove double interrupt enable

Interrupts are enabled a few lines further down as well. Remove this
second call to enable all interrupts.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230315110546.2518305-4-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Always acknowledge all interrupts

The code already exits the function on !ir before this condition. No
need to check again if anything is set as IR_ALL_INT is 0xffffffff.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230315110546.2518305-3-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: m_can: Remove repeated check for is_peripheral

Merge both if-blocks to fix this.

Signed-off-by: Markus Schneider-Pargmann <msp@baylibre.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/all/20230315110546.2518305-2-msp@baylibre.com
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: esd_usb: Improve code readability by means of replacing struct esd_usb_msg with a union

As suggested by Vincent Mailhol, declare struct esd_usb_msg as a union
instead of a struct. Then replace all msg->msg.something constructs,
that make use of esd_usb_msg, with simpler and prettier looking
msg->something variants.

Link: https://lore.kernel.org/all/CAMZ6RqKRzJwmMShVT9QKwiQ5LJaQupYqkPkKjhRBsP=12QYpfA@mail.gmail.com/
Suggested-by: Vincent MAILHOL <mailhol.vincent@wanadoo.fr>
Signed-off-by: Frank Jungclaus <frank.jungclaus@esd.eu>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/20230222163754.3711766-1-frank.jungclaus@esd.eu
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: remove redundant pci_clear_master()"

Cai Huoqing's series removes redundant pci_clear_master() to simplify
the code of multiple CAN drivers.

Link: https://lore.kernel.org/all/20230323113318.9473-1-cai.huoqing@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: kvaser_pciefd: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/all/20230323113318.9473-3-cai.huoqing@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: ctucanfd: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/all/20230323113318.9473-2-cai.huoqing@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Link: https://lore.kernel.org/all/20230323113318.9473-1-cai.huoqing@linux.dev
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge patch series "can: rcar_canfd: Add transceiver support"

Geert Uytterhoeven <geert+renesas@glider.be> says:

This patch series adds transceiver support to the Renesas R-Car CAN-FD
driver, and improves the printing of error messages, as requested by
Vincent.

Link: https://lore.kernel.org/all/cover.1679414936.git.geert+renesas@glider.be
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Improve error messages

Improve printed error messages:
  - Replace numerical error codes by mnemotechnic error codes, to
    improve the user experience in case of errors,
  - Drop parentheses around printed numbers, cfr.
    Documentation/process/coding-style.rst,
  - Drop printing of an error message in case of out-of-memory, as the
    core memory allocation code already takes care of this.

Suggested-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/4162cc46f72257ec191007675933985b6df394b9.1679414936.git.geert+renesas@glider.be
[mkl: use colon instead of comma]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: rcar_canfd: Add transceiver support

Add support for CAN transceivers described as PHYs.

While simple CAN transceivers can do without, this is needed for CAN
transceivers like NXP TJR1443 that need a configuration step (like
pulling standby or enable lines), and/or impose a bitrate limit.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Reviewed-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
Link: https://lore.kernel.org/all/1ce907572ac1d4e1733fa6ea7712250f2229cfcb.1679414936.git.geert+renesas@glider.be
[mkl: squash error message update from patch 2]
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Conflicts:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
  6e9d51b1a5cb ("net/mlx5e: Initialize link speed to zero")
  1bffcea42926 ("net/mlx5e: Add devlink hairpin queues parameters")
https://lore.kernel.org/all/20230324120623.4ebbc66f@canb.auug.org.au/
https://lore.kernel.org/all/20230321211135.47711-1-saeed@kernel.org/

Adjacent changes:

drivers/net/phy/phy.c
  323fe43cf9ae ("net: phy: Improved PHY error reporting in state machine")
  4203d84032e2 ("net: phy: Ensure state transitions are processed from phy_stop()")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Jakub Kicinski:
"Including fixes from bpf, wifi and bluetooth.

  Current release - regressions:

   - wifi: mt76: mt7915: add back 160MHz channel width support for
     MT7915

   - libbpf: revert poisoning of strlcpy, it broke uClibc-ng

  Current release - new code bugs:

   - bpf: improve the coverage of the "allow reads from uninit stack"
     feature to fix verification complexity problems

   - eth: am65-cpts: reset PPS genf adj settings on enable

  Previous releases - regressions:

   - wifi: mac80211: serialize ieee80211_handle_wake_tx_queue()

   - wifi: mt76: do not run mt76_unregister_device() on unregistered hw,
     fix null-deref

   - Bluetooth: btqcomsmd: fix command timeout after setting BD address

   - eth: igb: revert rtnl_lock() that causes a deadlock

   - dsa: mscc: ocelot: fix device specific statistics

  Previous releases - always broken:

   - xsk: add missing overflow check in xdp_umem_reg()

   - wifi: mac80211:
      - fix QoS on mesh interfaces
      - fix mesh path discovery based on unicast packets

   - Bluetooth:
      - ISO: fix timestamped HCI ISO data packet parsing
      - remove "Power-on" check from Mesh feature

   - usbnet: more fixes to drivers trusting packet length

   - wifi: iwlwifi: mvm: fix mvmtxq->stopped handling

   - Bluetooth: btintel: iterate only bluetooth device ACPI entries

   - eth: iavf: fix inverted Rx hash condition leading to disabled hash

   - eth: igc: fix the validation logic for taprio's gate list

   - dsa: tag_brcm: legacy: fix daisy-chained switches

  Misc:

   - bpf: adjust insufficient default bpf_jit_limit to account for
     growth of BPF use over the last 5 years

   - xdp: bpf_xdp_metadata() use EOPNOTSUPP as unique errno indicating
     no driver support"

* tag 'net-6.3-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (84 commits)
  Bluetooth: HCI: Fix global-out-of-bounds
  Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
  Bluetooth: L2CAP: Fix responding with wrong PDU type
  Bluetooth: btqcomsmd: Fix command timeout after setting BD address
  Bluetooth: btinel: Check ACPI handle for NULL before accessing
  net: mdio: thunder: Add missing fwnode_handle_put()
  net: dsa: mt7530: move setting ssc_delta to PHY_INTERFACE_MODE_TRGMII case
  net: dsa: mt7530: move lowering TRGMII driving to mt7530_setup()
  net: dsa: mt7530: move enabling disabling core clock to mt7530_pll_setup()
  net: asix: fix modprobe "sysfs: cannot create duplicate filename"
  gve: Cache link_speed value from device
  tools: ynl: Fix genlmsg header encoding formats
  net: enetc: fix aggregate RMON counters not showing the ranges
  Bluetooth: Remove "Power-on" check from Mesh feature
  Bluetooth: Fix race condition in hci_cmd_sync_clear
  Bluetooth: btintel: Iterate only bluetooth device ACPI entries
  Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
  Bluetooth: btusb: Remove detection of ISO packets over bulk
  Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
  ...

Merge tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux

Pull btrfs fixes from David Sterba:
"A few more fixes, the zoned accounting fix is spread across a few
  patches, preparatory and the actual fixes:

   - zoned mode:
      - fix accounting of unusable zone space
      - fix zone activation condition for DUP profile
      - preparatory patches

   - improved error handling of missing chunks

   - fix compiler warning"

* tag 'for-6.3-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
  btrfs: zoned: drop space_info->active_total_bytes
  btrfs: zoned: count fresh BG region as zone unusable
  btrfs: use temporary variable for space_info in btrfs_update_block_group
  btrfs: rename BTRFS_FS_NO_OVERCOMMIT to BTRFS_FS_ACTIVE_ZONE_TRACKING
  btrfs: zoned: fix btrfs_can_activate_zone() to support DUP profile
  btrfs: fix compiler warning on SPARC/PA-RISC handling fscrypt_setup_filename
  btrfs: handle missing chunk mapping more gracefully

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
"Four small fixes, three in drivers.

  The core fix adds a UFS device to an existing quirk to avoid a huge
  delay on boot"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: scsi_dh_alua: Fix memleak for 'qdata' in alua_activate()
  scsi: qla2xxx: Synchronize the IOCB count to be in order
  scsi: qla2xxx: Perform lockless command completion in abort path
  scsi: core: Add BLIST_SKIP_VPD_PAGES for SKhynix H28U74301AMR

net: phy: Improved PHY error reporting in state machine

When the PHY library calls phy_error() something bad has happened, and
we halt the PHY state machine. Calling phy_error() from the main state
machine however is not precise enough to know whether the issue is
reading the link status or starting auto-negotiation.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ism: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

isdn: mISDN: netjet: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: micrel: Add support for PTP_PF_EXTTS for lan8841

Extend the PTP programmable gpios to implement also PTP_PF_EXTTS
function. The pins can be configured to capture both of rising
and falling edge. Once the event is seen, then an interrupt is
generated and the LTC is saved in the registers.

This was tested using:
ts2phc -m -l 7 -s generic -f ts2phc.cfg

Where the configuration was the following:
[global]
ts2phc.pin_index 6

[eth2]

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: ec_bhf: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

ionic: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Acked-by: Shannon Nelson <shannon.nelson@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mana: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/fungible: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Acked-by: Dimitris Michailidis <dmichail@fungible.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cxgb4vf: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hisilicon: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: liquidio: Remove redundant pci_clear_master

Remove pci_clear_master to simplify the code,
the bus-mastering is also cleared in do_pci_disable_device,
like this:
./drivers/pci/pci.c:2197
static void do_pci_disable_device(struct pci_dev *dev)
{
u16 pci_command;

pci_read_config_word(dev, PCI_COMMAND, &pci_command);
if (pci_command & PCI_COMMAND_MASTER) {
pci_command &= ~PCI_COMMAND_MASTER;
pci_write_config_word(dev, PCI_COMMAND, pci_command);
}

pcibios_disable_device(dev);
}.
And dev->is_busmaster is set to 0 in pci_disable_device.

Signed-off-by: Cai Huoqing <cai.huoqing@linux.dev>
Signed-off-by: David S. Miller <davem@davemloft.net>

fix typos in net/sched/* files

This patch fixes typos in net/sched/* files.

Signed-off-by: Taichi Nishimura <awkrail01@gmail.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ethernet: broadcom/sb1250-mac: clean up after SIBYTE_BCM1x55 removal

With commit b984d7b56dfc ("MIPS: sibyte: Remove unused config option
SIBYTE_BCM1x55"), some #if's in the Broadcom SiByte SOC built-in Ethernet
driver can be simplified.

Simplify prepreprocessor conditions after config SIBYTE_BCM1x55 removal.

Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Steen Hegelund <Steen.Hegelund@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: prevent router_solicitations for team port

The issue fixed for bonding in commit c2edacf80e15 ("bonding / ipv6: no
addrconf for slaves separately from master") also exists in team driver.
However, we can't just disable ipv6 addrconf for team ports, as 'teamd'
will need it when nsns_ping watch is used in the user space.

Instead of preventing ipv6 addrconf, this patch only prevents RS packets
for team ports, as it did in commit b52e1cce31ca ("ipv6: Don't send rs
packets to the interface of ARPHRD_TUNNEL").

Note that we do not prevent DAD packets, to avoid the changes getting
intricate / hacky. Also, usually sysctl dad_transmits is set to 1 and
only 1 DAD packet will be sent, and by now no libteam user complains
about DAD packets on team ports, unlike RS packets.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'main' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Florian Westphal says:

====================
netfilter updates for net-next

This pull request contains changes for the *net-next* tree.

1. Change IPv6 stack to keep conntrack references until ipsec policy
   checks are done, like ipv4, from Madhu Koriginja.
   This update was missed when IPv6 NAT support was added 10 years ago.

2. get rid of old 'compat' structure layout in nf_nat_redirect
   core and move the conversion to the only user that needs the
   old layout for abi reasons. From Jeremy Sowden.

3. Compact some common code paths in nft_redir, also from Jeremy.

4. Time to remove the 'default y' knob so iptables 32bit compat interface
   isn't compiled in by default anymore, from myself.

5. Move ip(6)tables builtin icmp matches to the udptcp one.
   This has the advantage that icmp/icmpv6 match doesn't load the
   iptables/ip6tables modules anymore when iptables-nft is used.
   Also from myself.

* 'main' of ssh://gitolite.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
  netfilter: keep conntrack reference until IPsecv6 policy checks are done
  xtables: move icmp/icmpv6 logic to xt_tcpudp
  netfilter: xtables: disable 32bit compat interface by default
  netfilter: nft_masq: deduplicate eval call-backs
  netfilter: nft_redir: use `struct nf_nat_range2` throughout and deduplicate eval call-backs
====================

Link: https://lore.kernel.org/r/20230322210802.6743-1-fw@strlen.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: netdev: add note about Changes Requested and revising commit messages

One of the most commonly asked questions is "I answered all questions
and don't need to make any code changes, why was the patch not applied".
Document our time honored tradition of asking people to repost with
improved commit messages, to record the answers to reviewer questions.

Take this opportunity to also recommend a change log format.

Link: https://lore.kernel.org/r/20230322231202.265835-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

bnx2: remove deadcode in bnx2_init_cpus()

The load_cpu_fw function has no error return code
and always returns zero. Checking the value returned by
this function does not make sense.
Now checking the value of the return value is misleading when reading
the code. Path with error handling was deleted in 57579f7629a3
("bnx2: Use request_firmware()").
As a result, bnx2_init_cpus() will also return only zero
Therefore, it will be safe to change the type of functions
to void and remove checking to improving readability.

Found by Security Code and Linux Verification
Center (linuxtesting.org) with SVACE

Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Maxim Korotkov <korotkov.maxim.s@gmail.com>
Link: https://lore.kernel.org/r/20230322162843.3452-1-korotkov.maxim.s@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

net: ipa: add IPA v5.0 to ipa_version_string()

In the IPA device sysfs directory, the "version" file can be read to
find out what IPA version is implemented. The content of this file
is supplied by ipa_version_string(), which needs to be updated to
properly handle IPA v5.0.

Signed-off-by: Alex Elder <elder@linaro.org>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Link: https://lore.kernel.org/r/20230322144742.2203947-1-elder@linaro.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

ynl: allow to encode u8 attr

Playing with dpll netlink, I came across following issue:
$ sudo ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/dpll.yaml --do pin-set --json '{"id": 0, "pin-idx": 1, "pin-state": 1}'
Traceback (most recent call last):
  File "tools/net/ynl/cli.py", line 52, in <module>
    main()
  File "tools/net/ynl/cli.py", line 40, in main
    reply = ynl.do(args.do, attrs)
  File "tools/net/ynl/lib/ynl.py", line 520, in do
    return self._op(method, vals)
  File "tools/net/ynl/lib/ynl.py", line 476, in _op
    msg += self._add_attr(op.attr_set.name, name, value)
  File "tools/net/ynl/lib/ynl.py", line 344, in _add_attr
    raise Exception(f'Unknown type at {space} {name} {value} {attr["type"]}')
Exception: Unknown type at dpll pin-state 1 u8

I'm not that familiar with ynl code, but from a quick peek, I suspect
that couple other types are missing for both encoding and decoding.
Ignoring those here as I'm scratching my local itch only.

Fix the issue by adding u8 attr packing.

Signed-off-by: Jiri Pirko <jiri@nvidia.com>
Link: https://lore.kernel.org/r/20230322154242.1739136-1-jiri@resnulli.us
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

docs: networking: document NAPI

Add basic documentation about NAPI. We can stop linking to the ancient
doc on the LF wiki.

Link: https://lore.kernel.org/all/20230315223044.471002-1-kuba@kernel.org/
Reviewed-by: Bagas Sanjaya <bagasdotme@gmail.com>
Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
Acked-by: Pavel Pisa <pisa@cmp.felk.cvut.cz> # for ctucanfd-driver.rst
Reviewed-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Stephen Hemminger <stephen@networkplumber.org>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20230322053848.198452-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-03-23

We've added 8 non-merge commits during the last 13 day(s) which contain
a total of 21 files changed, 238 insertions(+), 161 deletions(-).

The main changes are:

1) Fix verification issues in some BPF programs due to their stack usage
   patterns, from Eduard Zingerman.

2) Fix to add missing overflow checks in xdp_umem_reg and return an error
   in such case, from Kal Conley.

3) Fix and undo poisoning of strlcpy in libbpf given it broke builds for
   libcs which provided the former like uClibc-ng, from Jesus Sanchez-Palencia.

4) Fix insufficient bpf_jit_limit default to avoid users running into hard
   to debug seccomp BPF errors, from Daniel Borkmann.

5) Fix driver return code when they don't support a bpf_xdp_metadata kfunc
   to make it unambiguous from other errors, from Jesper Dangaard Brouer.

6) Two BPF selftest fixes to address compilation errors from recent changes
   in kernel structures, from Alexei Starovoitov.

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  xdp: bpf_xdp_metadata use EOPNOTSUPP for no driver support
  bpf: Adjust insufficient default bpf_jit_limit
  xsk: Add missing overflow check in xdp_umem_reg
  selftests/bpf: Fix progs/test_deny_namespace.c issues.
  selftests/bpf: Fix progs/find_vma_fail1.c build error.
  libbpf: Revert poisoning of strlcpy
  selftests/bpf: Tests for uninitialized stack reads
  bpf: Allow reads from uninit stack
====================

Link: https://lore.kernel.org/r/20230323225221.6082-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'for-net-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth

Luiz Augusto von Dentz says:

====================
bluetooth pull request for net:

- Fix MGMT add advmon with RSSI command
- L2CAP: Fix responding with wrong PDU type
- Fix race condition in hci_cmd_sync_clear
- ISO: Fix timestamped HCI ISO data packet parsing
- HCI: Fix global-out-of-bounds
- hci_sync: Resume adv with no RPA when active scan

* tag 'for-net-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth:
  Bluetooth: HCI: Fix global-out-of-bounds
  Bluetooth: mgmt: Fix MGMT add advmon with RSSI command
  Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work
  Bluetooth: L2CAP: Fix responding with wrong PDU type
  Bluetooth: btqcomsmd: Fix command timeout after setting BD address
  Bluetooth: btinel: Check ACPI handle for NULL before accessing
  Bluetooth: Remove "Power-on" check from Mesh feature
  Bluetooth: Fix race condition in hci_cmd_sync_clear
  Bluetooth: btintel: Iterate only bluetooth device ACPI entries
  Bluetooth: ISO: fix timestamped HCI ISO data packet parsing
  Bluetooth: btusb: Remove detection of ISO packets over bulk
  Bluetooth: hci_core: Detect if an ACL packet is in fact an ISO packet
  Bluetooth: hci_sync: Resume adv with no RPA when active scan
====================

Link: https://lore.kernel.org/r/20230323202335.3380841-1-luiz.dentz@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'wireless-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless

Kalle Valo says:

====================
wireless fixes for v6.3

Third set of fixes for v6.3. mt76 has two kernel crash fixes and
adding back 160 MHz channel support for mt7915. mac80211 has fixes for
a race in transmit path and two mesh related fixes. iwlwifi also has
fixes for races.

* tag 'wireless-2023-03-23' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless:
  wifi: mac80211: fix mesh path discovery based on unicast packets
  wifi: mac80211: fix qos on mesh interfaces
  wifi: iwlwifi: mvm: protect TXQ list manipulation
  wifi: iwlwifi: mvm: fix mvmtxq->stopped handling
  wifi: mac80211: Serialize ieee80211_handle_wake_tx_queue()
  wifi: mwifiex: mark OF related data as maybe unused
  wifi: mt76: connac: do not check WED status for non-mmio devices
  wifi: mt76: mt7915: add back 160MHz channel width support for MT7915
  wifi: mt76: do not run mt76_unregister_device() on unregistered hw
====================

Link: https://lore.kernel.org/r/20230323110332.C4FE4C433D2@smtp.kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>

Merge tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2

Pull gfs2 fix from Andreas Gruenbacher:

- Reinstate commit 970343cd4904 ("GFS2: free disk inode which is
   deleted by remote node -V2") as reverting that commit could cause
   gfs2_put_super() to hang.

* tag 'gfs2-v6.3-rc3-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
  Reinstate "GFS2: free disk inode which is deleted by remote node -V2"

Bluetooth: HCI: Fix global-out-of-bounds

To loop a variable-length array, hci_init_stage_sync(stage) considers
that stage[i] is valid as long as stage[i-1].func is valid.
Thus, the last element of stage[].func should be intentionally invalid
as hci_init0[], le_init2[], and others did.
However, amp_init1[] and amp_init2[] have no invalid element, letting
hci_init_stage_sync() keep accessing amp_init1[] over its valid range.
This patch fixes this by adding {} in the last of amp_init1[] and
amp_init2[].

==================================================================
BUG: KASAN: global-out-of-bounds in hci_dev_open_sync (
/v6.2-bzimage/net/bluetooth/hci_sync.c:3154
/v6.2-bzimage/net/bluetooth/hci_sync.c:3343
/v6.2-bzimage/net/bluetooth/hci_sync.c:4418
/v6.2-bzimage/net/bluetooth/hci_sync.c:4609
/v6.2-bzimage/net/bluetooth/hci_sync.c:4689)
Read of size 8 at addr ffffffffaed1ab70 by task kworker/u5:0/1032
CPU: 0 PID: 1032 Comm: kworker/u5:0 Not tainted 6.2.0 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04
Workqueue: hci1 hci_power_on
Call Trace:
<TASK>
dump_stack_lvl (/v6.2-bzimage/lib/dump_stack.c:107 (discriminator 1))
print_report (/v6.2-bzimage/mm/kasan/report.c:307
  /v6.2-bzimage/mm/kasan/report.c:417)
? hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154
  /v6.2-bzimage/net/bluetooth/hci_sync.c:3343
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4418
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4609
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4689)
kasan_report (/v6.2-bzimage/mm/kasan/report.c:184
  /v6.2-bzimage/mm/kasan/report.c:519)
? hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154
  /v6.2-bzimage/net/bluetooth/hci_sync.c:3343
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4418
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4609
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4689)
hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:3154
  /v6.2-bzimage/net/bluetooth/hci_sync.c:3343
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4418
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4609
  /v6.2-bzimage/net/bluetooth/hci_sync.c:4689)
? __pfx_hci_dev_open_sync (/v6.2-bzimage/net/bluetooth/hci_sync.c:4635)
? mutex_lock (/v6.2-bzimage/./arch/x86/include/asm/atomic64_64.h:190
  /v6.2-bzimage/./include/linux/atomic/atomic-long.h:443
  /v6.2-bzimage/./include/linux/atomic/atomic-instrumented.h:1781
  /v6.2-bzimage/kernel/locking/mutex.c:171
  /v6.2-bzimage/kernel/locking/mutex.c:285)
? __pfx_mutex_lock (/v6.2-bzimage/kernel/locking/mutex.c:282)
hci_power_on (/v6.2-bzimage/net/bluetooth/hci_core.c:485
  /v6.2-bzimage/net/bluetooth/hci_core.c:984)
? __pfx_hci_power_on (/v6.2-bzimage/net/bluetooth/hci_core.c:969)
? read_word_at_a_time (/v6.2-bzimage/./include/asm-generic/rwonce.h:85)
? strscpy (/v6.2-bzimage/./arch/x86/include/asm/word-at-a-time.h:62
  /v6.2-bzimage/lib/string.c:161)
process_one_work (/v6.2-bzimage/kernel/workqueue.c:2294)
worker_thread (/v6.2-bzimage/./include/linux/list.h:292
  /v6.2-bzimage/kernel/workqueue.c:2437)
? __pfx_worker_thread (/v6.2-bzimage/kernel/workqueue.c:2379)
kthread (/v6.2-bzimage/kernel/kthread.c:376)
? __pfx_kthread (/v6.2-bzimage/kernel/kthread.c:331)
ret_from_fork (/v6.2-bzimage/arch/x86/entry/entry_64.S:314)
</TASK>
The buggy address belongs to the variable:
amp_init1+0x30/0x60
The buggy address belongs to the physical page:
page:000000003a157ec6 refcount:1 mapcount:0 mapping:0000000000000000 ia
flags: 0x200000000001000(reserved|node=0|zone=2)
raw: 0200000000001000 ffffea0005054688 ffffea0005054688 000000000000000
raw: 0000000000000000 0000000000000000 00000001ffffffff 000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffffffffaed1aa00: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 00 00 00 00
ffffffffaed1aa80: 00 00 00 00 f9 f9 f9 f9 00 00 00 00 00 00 00 00
>ffffffffaed1ab00: 00 f9 f9 f9 f9 f9 f9 f9 00 00 00 00 00 00 f9 f9
                                                             ^
ffffffffaed1ab80: f9 f9 f9 f9 00 00 00 00 f9 f9 f9 f9 00 00 00 f9
ffffffffaed1ac00: f9 f9 f9 f9 00 06 f9 f9 f9 f9 f9 f9 00 00 02 f9

This bug is found by FuzzBT, a modified version of Syzkaller.
Other contributors for this bug are Ruoyu Wu and Peng Hui.

Fixes: d0b137062b2d ("Bluetooth: hci_sync: Rework init stages")
Signed-off-by: Sungwoo Kim <iam@sung-woo.kim>
Reviewed-by: Simon Horman <simon.horman@corigine.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: mgmt: Fix MGMT add advmon with RSSI command

The MGMT command: MGMT_OP_ADD_ADV_PATTERNS_MONITOR_RSSI uses variable
length argument. This causes host not able to register advmon with rssi.

This patch has been locally tested by adding monitor with rssi via
btmgmt on a kernel 6.1 machine.

Reviewed-by: Archie Pusaka <apusaka@chromium.org>
Fixes: b338d91703fa ("Bluetooth: Implement support for Mesh")
Signed-off-by: Howard Chung <howardchung@google.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: btsdio: fix use after free bug in btsdio_remove due to unfinished work

In btsdio_probe, &data->work was bound with btsdio_work.In
btsdio_send_frame, it was started by schedule_work.

If we call btsdio_remove with an unfinished job, there may
be a race condition and cause UAF bug on hdev.

Fixes: ddbaf13e3609 ("[Bluetooth] Add generic driver for Bluetooth SDIO devices")
Signed-off-by: Zheng Wang <zyytlz.wz@163.com>
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>

Bluetooth: L2CAP: Fix responding with wrong PDU type

L2CAP_ECRED_CONN_REQ shall be responded with L2CAP_ECRED_CONN_RSP not
L2CAP_LE_CONN_RSP:

L2CAP LE EATT Server - Reject - run
  Listening for connections
  New client connection with handle 0x002a
  Sending L2CAP Request from client
  Client received response code 0x15
  Unexpected L2CAP response code (expected 0x18)
L2CAP LE EATT Server - Reject - test failed

> ACL Data RX: Handle 42 flags 0x02 dlen 26
      LE L2CAP: Enhanced Credit Connection Request (0x17) ident 1 len 18
        PSM: 39 (0x0027)
        MTU: 64
        MPS: 64
        Credits: 5
        Source CID: 65
        Source CID: 66
        Source CID: 67
        Source CID: 68
        Source CID: 69
< ACL Data TX: Handle 42 flags 0x00 dlen 16
      LE L2CAP: LE Connection Response (0x15) ident 1 len 8
        invalid size
        00 00 00 00 00 00 06 00

L2CAP LE EATT Server - Reject - run
  Listening for connections
  New client connection with handle 0x002a
  Sending L2CAP Request from client
  Client received response code 0x18
L2CAP LE EATT Server - Reject - test passed

Fixes: 15f02b910562 ("Bluetooth: L2CAP: Add initial code for Enhanced Credit Based Mode")
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>