Amit Cohen [Thu, 20 Oct 2022 15:20:04 +0000 (17:20 +0200)]
mlxsw: Add support for 800Gbps link modes
Add support for 800Gbps speed, link modes of 100Gbps per lane.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Thu, 20 Oct 2022 15:20:03 +0000 (17:20 +0200)]
ethtool: Add support for 800Gbps link modes
Add support for 800Gbps speed, link modes of 100Gbps per lane.
As mentioned in slide 21 in IEEE documentation [1], all adopted 802.3df
copper and optical PMDs baselines using 100G/lane will be supported.
Add the relevant PMDs which are mentioned in slide 5 in IEEE
documentation [1] and were approved on 10-2022 [2]:
BP - KR8
Cu Cable - CR8
MMF 50m - VR8
MMF 100m - SR8
SMF 500m - DR8
SMF 2km - DR8-2
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Petr Machata <petrm@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Oct 2022 09:37:43 +0000 (10:37 +0100)]
Merge branch 'sparx5-IS2-VCAP'
Steen Hegelund says:
====================
Add support for Sparx5 IS2 VCAP
This provides initial support for the Sparx5 VCAP functionality via the
'tc' traffic control userspace tool and its flower filter.
Overview:
=========
The supported flower filter keys and actions are:
- source and destination MAC address keys
- trap action
- pass action
The supported Sparx5 VCAPs are: IS2 (see below for more info)
The VCAP (Versatile Content-Aware Processor) feature is essentially a TCAM
with rules consisting of:
- Programmable key fields
- Programmable action fields
- A counter (which may be only one bit wide)
Besides this each VCAP has:
- A number of independent lookups
- A keyset configuration typically per port per lookup
VCAPs are used in many of the TSN features such as PSFP, PTP, FRER as well
as the general shaping, policing and access control, so it is an important
building block for these advanced features.
Functionality:
==============
When a frame is passed to a VCAP the VCAP will generate a set of keys
(keyset) based on the traffic type. If there is a rule created with this
keyset in the VCAP and the values of the keys matches the values in the
keyset of the frame, the rule is said to match and the actions in the rule
will be executed and the rule counter will be incremented. No more rules
will be examined in this VCAP lookup.
If there is no match in the current lookup the frame will be matched
against the next lookup (some VCAPs do the processing of the lookups in
parallel).
The Sparx5 SoC has 6 different VCAP types:
- IS0: Ingress Stage 0 (AKA CLM) mostly handles classification
- IS2: Ingress Stage 2 mostly handles access control
- IP6PFX: IPv6 prefix: Provides tables for IPV6 address management
- LPM: Longest Path Match for IP guarding and routing
- ES0: Egress Stage 0 is mostly used for CPU copying and multicast handling
- ES2: Egress Stage 2 is known as the rewriter and mostly updates tags
Design:
=======
The VCAP implementation provides switchcore independent handling of rules
and supports:
- Creating and deleting rules
- Updating and getting rules
The platform specific API implementation as well as the platform specific
model of the VCAP instances are attached to the VCAP API and a client can
then access rules via the API in a platform independent way, with the
limitations that each VCAP has in terms of is supported keys and actions.
The VCAP model is generated from information delivered by the designers of
the VCAP hardware.
These lookups are executed in parallel by the IS2 VCAP but the actions are
executed in series (the datasheet explains what happens if actions
overlap).
The functionality of TC flower as well as TC matchall filters will be
expanded in later submissions as well as the number of VCAPs supported.
This is current plan:
- add support for more TC flower filter keys and extend the Sparx5 port
keyset configuration
- support for TC protocol all
- debugfs support for inspecting rules
- TC flower filter statistics
- Sparx5 IS0 VCAP support and more TC keys and actions to support this
- add TC policer and drop action support (depends on the Sparx5 QoS support
upstreamed separately)
- Sparx5 ES0 VCAP support and more TC actions to support this
- TC flower template support
- TC matchall filter support for mirroring and policing ports
- TC flower filter mirror action support
- Sparx5 ES2 VCAP support
The LAN966x switchcore will also be updated to use the VCAP API as well as
future Microchip switches.
The LAN966x has 3 VCAPS (IS1, IS2 and ES0) and a slightly different keyset
and actionset portfolio than Sparx5.
Version History:
================
v3 Moved the sparx5_tc_flower_set_exterr function to the VCAP API and
renamed it.
Moved the sparx5_netbytes_copy function to the VCAP_API and renamed
it (thanks Horatiu Vultur).
Fixed indentation in the vcap_write_rule function.
Added a comment mentioning the typegroup table terminator in the
vcap_iter_skip_tg function.
v2 Made the KUNIT test model a superset of the real model to fix a
kernel robot build error.
v1 Initial version
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 20 Oct 2022 13:09:02 +0000 (15:09 +0200)]
net: microchip: sparx5: Writing rules to the IS2 VCAP
This adds rule encoding functionality to the VCAP API.
A rule consists of keys and actions in separate cache sections.
The maximum size of the keyset or actionset determines the size of the
rule.
The VCAP hardware need to be able to distinguish different rule sizes from
each other, and for that purpose some extra typegroup bits are added to the
rule when it is encoded.
The API provides a bit stream iterator that allows highlevel encoding
functionality to add key and action value bits independent of typegroup
bits.
This is handled by letting the concrete VCAP model provide the typegroup
table for the different rule sizes.
After the key and action values have been added to the encoding bit streams
the typegroup bits are set to their correct values just before the rule is
written to the VCAP hardware.
The key and action offsets provided in the VCAP model are the offset before
adding the typegroup bits.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Tested-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 20 Oct 2022 13:09:01 +0000 (15:09 +0200)]
net: microchip: sparx5: Adding basic rule management in VCAP API
This provides most of the rule handling needed to add a new rule to a VCAP.
To add a rule a client must follow these steps:
1) Allocate a new rule (provide an id or get one automatically assigned)
2) Add keys to the rule
3) Add actions to the rule
4) Optionally set a keyset on the rule
5) Optionally set an actionset on the rule
6) Validate the rule (this will add keyset and actionset if not specified
in the previous steps)
7) Add the rule (if the validation was successful)
8) Free the rule instance (a copy has been added to the VCAP)
The validation step will fail if there are no keysets with the requested
keys, or there are no actionsets with the requested actions.
The validation will also fail if the keyset is not configured for the port
for the requested protocol).
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Tested-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 20 Oct 2022 13:09:00 +0000 (15:09 +0200)]
net: microchip: sparx5: Adding port keyset config and callback interface
This provides a default port keyset configuration for the Sparx5 IS2 VCAP
where all ports and all lookups in IS2 use the same keyset (MAC_ETYPE) for
all types of traffic.
This means that no matter what frame type is received on any front port it
will generate the MAC_ETYPE keyset in the IS VCAP and any rule in the IS2
VCAP that uses this keyset will be matched against the keys in the
MAC_ETYPE keyset.
The callback interface used by the VCAP API is populated with Sparx5
specific handler functions that takes care of the actual reading and
writing to data to the Sparx5 IS2 VCAP instance.
A few functions are also added to the VCAP API to support addition of rule
fields such as the ingress port mask and the lookup bit.
The IS2 VCAP in Sparx5 is really divided in two instances with lookup 0
and 1 in the first instance and lookup 2 and 3 in the second instance.
The lookup bit selects lookup 0 or 3 in the respective instance when it is
set.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Tested-by: Casper Andersson <casper.casan@gmail.com> Reviewed-by: Casper Andersson <casper.casan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steen Hegelund [Thu, 20 Oct 2022 13:08:56 +0000 (15:08 +0200)]
net: microchip: sparx5: Adding initial VCAP API support
This provides the initial VCAP API framework and Sparx5 specific VCAP
implementation.
When the Sparx5 Switchdev driver is initialized it will also initialize its
VCAP module, and this hooks up the concrete Sparx5 VCAP model to the VCAP
API, so that the VCAP API knows what VCAP instances are available.
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yanguo Li [Thu, 20 Oct 2022 08:28:34 +0000 (09:28 +0100)]
nfp: flower: tunnel neigh support bond offload
Support hardware offload when tunnel neigh out port is bond.
These feature work with the nfp firmware. If the firmware
supports the NFP_FL_FEATS_TUNNEL_NEIGH_LAG feature, nfp driver
write the bond information to the firmware neighbor table or
do nothing for bond. when neighbor MAC changes, nfp driver
need to update the neighbor information too.
Signed-off-by: Yanguo Li <yanguo.li@corigine.com> Reviewed-by: Louis Peens <louis.peens@corigine.com> Signed-off-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is a follow-up series for commit d38afeec26ed ("tcp/udp: Call
inet6_destroy_sock() in IPv6 sk->sk_destruct().").
This series cleans up unnecessary inet6_destory_sock() calls in
sk->sk_prot->destroy() and call it from sk->sk_destruct() to make
sure we do not leak memory related to IPv6 specific-resources.
sctp: Call inet6_destroy_sock() via sk->sk_destruct().
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.
SCTP sets its own sk->sk_destruct() in the sctp_init_sock(), and
SCTPv6 socket reuses it as the init function.
To call inet6_sock_destruct() from SCTPv6 sk->sk_destruct(), we
set sctp_v6_destruct_sock() in a new init function.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dccp: Call inet6_destroy_sock() via sk->sk_destruct().
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.
DCCP sets its own sk->sk_destruct() in the dccp_init_sock(), and
DCCPv6 socket shares it by calling the same init function via
dccp_v6_init_sock().
To call inet6_sock_destruct() from DCCPv6 sk->sk_destruct(), we
export it and set dccp_v6_sk_destruct() in the init function.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Signed-off-by: David S. Miller <davem@davemloft.net>
inet6: Remove inet6_destroy_sock() in sk->sk_prot->destroy().
After commit d38afeec26ed ("tcp/udp: Call inet6_destroy_sock()
in IPv6 sk->sk_destruct()."), we call inet6_destroy_sock() in
sk->sk_destruct() by setting inet6_sock_destruct() to it to make
sure we do not leak inet6-specific resources.
Now we can remove unnecessary inet6_destroy_sock() calls in
sk->sk_prot->destroy().
DCCP and SCTP have their own sk->sk_destruct() function, so we
change them separately in the following patches.
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 24 Oct 2022 08:22:12 +0000 (09:22 +0100)]
Merge branch 'dpaa2-eth-AF_XDP-zc'
Ioana Ciornei says:
====================
net: dpaa2-eth: AF_XDP zero-copy support
This patch set adds support for AF_XDP zero-copy in the dpaa2-eth
driver. The support is available on the LX2160A SoC and its variants and
only on interfaces (DPNIs) with a maximum of 8 queues (HW limitations
are the root cause).
We are first implementing the .get_channels() callback since this a
dependency for further work.
Patches 2-3 are working on making the necessary changes for multiple
buffer pools on a single interface. By default, without an AF_XDP socket
attached, only a single buffer pool will be used and shared between all
the queues. The changes in the functions are made in this patch, but the
actual allocation and setup of a new BP is done in patch#10.
Patches 4-5 are improving the information exposed in debugfs. We are
exposing a new file to show which buffer pool is used by what channels
and how many buffers it currently has.
The 6th patch updates the dpni_set_pools() firmware API so that we are
capable of setting up a different buffer per queue in later patches.
In the 7th patch the generic dev_open/close APIs are used instead of the
dpaa2-eth internal ones.
Patches 8-9 are rearranging the existing code in dpaa2-eth.c in order to
create new functions which will be used in the XSK implementation in
dpaa2-xsk.c
Finally, the last 3 patches are adding the actual support for both the
Rx and Tx path of AF_XDP zero-copy and some associated tracepoints.
Details on the implementation can be found in the actual patch.
Changes in v2:
- 3/12: Export dpaa2_eth_allocate_dpbp/dpaa2_eth_free_dpbp in this
patch to avoid a build warning. The functions will be used in next
patches.
- 6/12: Use __le16 instead of u16 for the dpbp_id field.
- 12/12: Use xdp_buff->data_hard_start when tracing the BP seeding.
Changes in v3:
- 3/12: fix leaking of bp on error path
====================
Acked-by: Björn Töpel <bjorn@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Define the dpaa2_tx_xsk_fd and dpaa2_rx_xsk_fd trace events for the XSK
zero-copy Rx and Tx path. Also, define the dpaa2_eth_buf as an event
class so that both dpaa2_eth_buf_seed and dpaa2_xsk_buf_seed traces can
derive from the same class.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add support in dpaa2-eth for packet processing on the Tx path using
AF_XDP zero copy mode.
The newly added dpaa2_xsk_tx() function will handle enqueuing AF_XDP Tx
packets into the appropriate queue and update any necessary statistics.
On a more detailed note, the dpaa2_xsk_tx_build_fd() function handles
creating a Scatter-Gather frame descriptor with only one data buffer.
This is needed because otherwise we would need to impose a headroom in
the Tx buffer to store our software annotation structures.
This tactic is already used on the normal data path of the dpaa2-eth
driver, thus we are reusing the dpaa2_eth_sgt_get/dpaa2_eth_sgt_recycle
functions in order to allocate and recycle the Scatter-Gather table
buffers.
In case we have reached the maximum number of Tx XSK packets to be sent
in a NAPI cycle, we'll exit the dpaa2_eth_poll() and hope to be
rescheduled again.
On the XSK Tx confirmation path, we are just unmapping the SGT buffer
and recycle it for further use.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds the support for receiving packets via the AF_XDP
zero-copy mechanism in the dpaa2-eth driver. The support is available
only on the LX2160A SoC and variants because we are relying on the HW
capability to associate a buffer pool to a specific queue (QDBIN), only
available on newer WRIOP versions.
On the control path, the dpaa2_xsk_enable_pool() function is responsible
to allocate a buffer pool (BP), setup this new BP to be used only on the
requested queue and change the consume function to point to the XSK ZC
one.
We are forced to call dev_close() in order to change the queue to buffer
pool association (dpaa2_xsk_set_bp_per_qdbin) . This also works in our
favor since at dev_close() the buffer pools will be drained and at the
later dev_open() call they will be again seeded, this time with buffers
allocated from the XSK pool if needed.
On the data path, a new software annotation type is defined to be used
only for the XSK scenarios. This will enable us to pass keep necessary
information about a packet buffer between the moment in which it was
seeded and when it's received by the driver. In the XSK case, we are
keeping the associated xdp_buff.
Depending on the action returned by the BPF program, we will do the
following:
- XDP_PASS: copy the contents of the packet into a brand new skb,
recycle the initial buffer.
- XDP_TX: just enqueue the same frame descriptor back into the Tx path,
the buffer will get automatically released into the initial BP.
- XDP_REDIRECT: call xdp_do_redirect() and exit.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dpaa2-eth: create and export the dpaa2_eth_receive_skb() function
Carve out code from the dpaa2_eth_rx() function in order to create and
export the dpaa2_eth_receive_skb() function. Do this in order to reuse
this code also from the XSK path which will be introduced in a later
patch.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dpaa2-eth: create and export the dpaa2_eth_alloc_skb function
The dpaa2_eth_alloc_skb() function is added by moving code from the
dpaa2_eth_copybreak() previously defined function. What the new API does
is to allocate a new skb, copy the frame data from the passed FD to the
new skb and then return the skb.
Export this new function since we'll need the this functionality also
from the XSK code path.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Tue, 18 Oct 2022 14:18:56 +0000 (17:18 +0300)]
net: dpaa2-eth: use dev_close/open instead of the internal functions
Instead of calling the internal functions which implement .ndo_stop and
.ndo_open, we can simply call dev_close and dev_open, so that we keep
the code cleaner.
Also, in the next patches we'll use the same APIs from other files
without needing to export the internal functions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dpaa2-eth: update the dpni_set_pools() API to support per QDBIN pools
Update the dpni_set_pool() firmware API so that in the next patches we
can configure per Rx queue (per QDBIN) buffer pools.
This is a hard requirement of the AF_XDP, thus we need the newer API
version.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Tue, 18 Oct 2022 14:18:53 +0000 (17:18 +0300)]
net: dpaa2-eth: export the CH#<index> in the 'ch_stats' debug file
Just give out an index for each channel that we export into the debug
file in the form of CH#<index>. This is purely to help corelate each
channel information from one debugfs file to another one.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dpaa2-eth: add support for multiple buffer pools per DPNI
This patch allows the configuration of multiple buffer pools associated
with a single DPNI object, each distinct DPBP object not necessarily
shared among all queues.
The user can interogate both the number of buffer pools and the buffer
count in each buffer pool by using the .get_ethtool_stats() callback.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ioana Ciornei [Tue, 18 Oct 2022 14:18:51 +0000 (17:18 +0300)]
net: dpaa2-eth: rearrange variable in dpaa2_eth_get_ethtool_stats
Rearrange the variables in the dpaa2_eth_get_ethtool_stats() function so
that we adhere to the reverse Christmas tree rule.
Also, in the next patch we are adding more variables and I didn't know
where to place them with the current ordering.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dpaa2-eth: add support to query the number of queues through ethtool
The .get_channels() ethtool_ops callback is implemented and exports the
number of queues: Rx, Tx, Tx conf and Rx err.
The last two ones, Tx confirmation and Rx err, are counted as 'others'.
The .set_channels() callback is not implemented since the DPAA2
software/firmware architecture does not allow the dynamic
reconfiguration of the number of queues.
Signed-off-by: Robert-Ionut Alexa <robert-ionut.alexa@nxp.com> Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jeff Johnson [Tue, 18 Oct 2022 21:17:18 +0000 (14:17 -0700)]
net: ipa: Make QMI message rules const
Commit ff6d365898d4 ("soc: qcom: qmi: use const for struct
qmi_elem_info") allows QMI message encoding/decoding rules to be
const, so do that for IPA.
Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com> Reviewed-by: Alex Elder <elder@linaro.org> Reviewed-by: Sibi Sankar <quic_sibis@quicinc.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Kodanev [Wed, 19 Oct 2022 18:07:33 +0000 (21:07 +0300)]
sctp: remove unnecessary NULL check in sctp_association_init()
'&asoc->ulpq' passed to sctp_ulpq_init() as the first argument,
then sctp_qlpq_init() initializes it and eventually returns the
address of the struct member back. Therefore, in this case, the
return pointer cannot be NULL.
Moreover, it seems sctp_ulpq_init() has always been used only in
sctp_association_init(), so there's really no need to return ulpq
anymore.
* tag 'net-6.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (43 commits)
net: phy: dp83822: disable MDI crossover status change interrupt
net: sched: fix race condition in qdisc_graft()
net: hns: fix possible memory leak in hnae_ae_register()
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
sfc: include vport_id in filter spec hash and equal()
genetlink: fix kdoc warnings
selftests: add selftest for chaining of tc ingress handling to egress
net: Fix return value of qdisc ingress handling on success
net: sched: sfb: fix null pointer access issue when sfb_init() fails
Revert "net: sched: fq_codel: remove redundant resource cleanup in fq_codel_init()"
net: sched: cake: fix null pointer access issue when cake_init() fails
ethernet: marvell: octeontx2 Fix resource not freed after malloc
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
ionic: catch NULL pointer issue on reconfig
net: hsr: avoid possible NULL deref in skb_clone()
bnxt_en: fix memory leak in bnxt_nvm_test()
ip6mr: fix UAF issue in ip6mr_sk_done() when addrconf_init_net() failed
udp: Update reuse->has_conns under reuseport_lock.
net: ethernet: mediatek: ppe: Remove the unused function mtk_foe_entry_usable()
...
Linus Torvalds [Fri, 21 Oct 2022 00:00:54 +0000 (17:00 -0700)]
Merge tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper updates from Mike Snitzer:
- Fix dm-bufio to use test_bit_acquire to properly test_bit on arches
with weaker memory ordering.
- DM core replace DMWARN with DMERR or DMCRIT for fatal errors.
- Enable WQ_HIGHPRI on DM verity target's verify_wq.
- Add documentation for DM verity's try_verify_in_tasklet option.
- Various typo and redundant word fixes in code and/or comments.
* tag 'for-6.1/dm-changes-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm clone: Fix typo in block_device format specifier
dm: remove unnecessary assignment statement in alloc_dev()
dm verity: Add documentation for try_verify_in_tasklet option
dm cache: delete the redundant word 'each' in comment
dm raid: fix typo in analyse_superblocks code comment
dm verity: enable WQ_HIGHPRI on verify_wq
dm raid: delete the redundant word 'that' in comment
dm: change from DMWARN to DMERR or DMCRIT for fatal errors
dm bufio: use the acquire memory barrier when testing for B_READING
Kees Cook [Tue, 18 Oct 2022 09:28:27 +0000 (02:28 -0700)]
net: ipa: Proactively round up to kmalloc bucket size
Instead of discovering the kmalloc bucket size _after_ allocation, round
up proactively so the allocation is explicitly made for the full size,
allowing the compiler to correctly reason about the resulting size of
the buffer through the existing __alloc_size() hint.
Felix Riemann [Tue, 18 Oct 2022 10:47:54 +0000 (12:47 +0200)]
net: phy: dp83822: disable MDI crossover status change interrupt
If the cable is disconnected the PHY seems to toggle between MDI and
MDI-X modes. With the MDI crossover status interrupt active this causes
roughly 10 interrupts per second.
As the crossover status isn't checked by the driver, the interrupt can
be disabled to reduce the interrupt load.
Fixes: 87461f7a58ab ("net: phy: DP83822 initial driver submission") Signed-off-by: Felix Riemann <felix.riemann@sma.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://lore.kernel.org/r/20221018104755.30025-1-svc.sw.rte.linux@sma.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Eric Dumazet [Tue, 18 Oct 2022 20:32:58 +0000 (20:32 +0000)]
net: sched: fix race condition in qdisc_graft()
We had one syzbot report [1] in syzbot queue for a while.
I was waiting for more occurrences and/or a repro but
Dmitry Vyukov spotted the issue right away.
<quoting Dmitry>
qdisc_graft() drops reference to qdisc in notify_and_destroy
while it's still assigned to dev->qdisc
</quoting>
Indeed, RCU rules are clear when replacing a data structure.
The visible pointer (dev->qdisc in this case) must be updated
to the new object _before_ RCU grace period is started
(qdisc_put(old) in this case).
[1]
BUG: KASAN: use-after-free in __tcf_qdisc_find.part.0+0xa3a/0xac0 net/sched/cls_api.c:1066
Read of size 4 at addr ffff88802065e038 by task syz-executor.4/21027
The buggy address belongs to the object at ffff88802065e000
which belongs to the cache kmalloc-1k of size 1024
The buggy address is located 56 bytes inside of
1024-byte region [ffff88802065e000, ffff88802065e400)
Yang Yingliang [Tue, 18 Oct 2022 12:24:51 +0000 (20:24 +0800)]
net: hns: fix possible memory leak in hnae_ae_register()
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().
Yang Yingliang [Tue, 18 Oct 2022 13:16:07 +0000 (21:16 +0800)]
wwan_hwsim: fix possible memory leak in wwan_hwsim_dev_new()
Inject fault while probing module, if device_register() fails,
but the refcount of kobject is not decreased to 0, the name
allocated in dev_set_name() is leaked. Fix this by calling
put_device(), so that name can be freed in callback function
kobject_cleanup().
sfc: include vport_id in filter spec hash and equal()
Filters on different vports are qualified by different implicit MACs and/or
VLANs, so shouldn't be considered equal even if their other match fields
are identical.
Fixes: 7c460d9be610 ("sfc: Extend and abstract efx_filter_spec to cover Huntington/EF10") Co-developed-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansen-van-vuuren@amd.com> Reviewed-by: Martin Habets <habetsm.xilinx@gmail.com> Link: https://lore.kernel.org/r/20221018092841.32206-1-pieter.jansen-van-vuuren@amd.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Kees Cook [Tue, 18 Oct 2022 09:06:33 +0000 (02:06 -0700)]
openvswitch: Use kmalloc_size_roundup() to match ksize() usage
Round up allocations with kmalloc_size_roundup() so that openvswitch's
use of ksize() is always accurate and no special handling of the memory
is needed by KASAN, UBSAN_BOUNDS, nor FORTIFY_SOURCE.
1) Missing flowi uid field in nft_fib expression, from Guillaume Nault.
This is broken since the creation of the fib expression.
2) Relax sanity check to fix bogus EINVAL error when deleting elements
belonging set intervals. Broken since 6.0-rc.
* git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
netfilter: nf_tables: relax NFTA_SET_ELEM_KEY_END set flags requirements
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
====================
Jakub Kicinski [Tue, 18 Oct 2022 23:13:10 +0000 (16:13 -0700)]
genetlink: fix kdoc warnings
Address a bunch of kdoc warnings:
include/net/genetlink.h:81: warning: Function parameter or member 'module' not described in 'genl_family'
include/net/genetlink.h:243: warning: expecting prototype for struct genl_info. Prototype was for struct genl_dumpit_info instead
include/net/genetlink.h:419: warning: Function parameter or member 'net' not described in 'genlmsg_unicast'
include/net/genetlink.h:438: warning: expecting prototype for gennlmsg_data(). Prototype was for genlmsg_data() instead
include/net/genetlink.h:244: warning: Function parameter or member 'op' not described in 'genl_dumpit_info'
Jakub Kicinski [Wed, 19 Oct 2022 20:00:09 +0000 (13:00 -0700)]
Merge branch 'netlink-formatted-extacks'
Edward Cree says:
====================
netlink: formatted extacks
Currently, netlink extacks can only carry fixed string messages, which
is limiting when reporting failures in complex systems. This series
adds the ability to return printf-formatted messages, and uses it in
the sfc driver's TC offload code.
Formatted extack messages are limited in length to a fixed buffer size,
currently 80 characters. If the message exceeds this, the full message
will be logged (ratelimited) to the console and a truncated version
returned over netlink.
There is no change to the netlink uAPI; only internal kernel changes
are needed.
====================
Edward Cree [Tue, 18 Oct 2022 14:37:27 +0000 (15:37 +0100)]
netlink: add support for formatted extack messages
Include an 80-byte buffer in struct netlink_ext_ack that can be used
for scnprintf()ed messages. This does mean that the resulting string
can't be enumerated, translated etc. in the way NL_SET_ERR_MSG() was
designed to allow.
Signed-off-by: Edward Cree <ecree.xilinx@gmail.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
No need to use more than one SPI transfer for reads.
Use only one from now as ADIN1110/2111 does not tolerate
CS changes during reads.
The BCM2711/2708 SPI controllers worked fine, but the NXP
IMX8MM could not keep CS lowered during SPI bursts.
This change aims to make the ADIN1110/2111 driver compatible
with both SPI controllers, without any loss of bandwidth/other
capabilities.
Fixes: bc93e19d088b ("net: ethernet: adi: Add ADIN1110 support") Signed-off-by: Alexandru Tachici <alexandru.tachici@analog.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Tue, 18 Oct 2022 07:34:39 +0000 (10:34 +0300)]
selftests: add selftest for chaining of tc ingress handling to egress
This test runs a simple ingress tc setup between two veth pairs,
then adds a egress->ingress rule to test the chaining of tc ingress
pipeline to tc egress piepline.
Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Tue, 18 Oct 2022 07:34:38 +0000 (10:34 +0300)]
net: Fix return value of qdisc ingress handling on success
Currently qdisc ingress handling (sch_handle_ingress()) doesn't
set a return value and it is left to the old return value of
the caller (__netif_receive_skb_core()) which is RX drop, so if
the packet is consumed, caller will stop and return this value
as if the packet was dropped.
This causes a problem in the kernel tcp stack when having a
egress tc rule forwarding to a ingress tc rule.
The tcp stack sending packets on the device having the egress rule
will see the packets as not successfully transmitted (although they
actually were), will not advance it's internal state of sent data,
and packets returning on such tcp stream will be dropped by the tcp
stack with reason ack-of-unsent-data. See reproduction in [0] below.
Fix that by setting the return value to RX success if
the packet was handled successfully.
[0] Reproduction steps:
$ ip link add veth1 type veth peer name peer1
$ ip link add veth2 type veth peer name peer2
$ ifconfig peer1 5.5.5.6/24 up
$ ip netns add ns0
$ ip link set dev peer2 netns ns0
$ ip netns exec ns0 ifconfig peer2 5.5.5.5/24 up
$ ifconfig veth2 0 up
$ ifconfig veth1 0 up
#ingress forwarding veth1 <-> veth2
$ tc qdisc add dev veth2 ingress
$ tc qdisc add dev veth1 ingress
$ tc filter add dev veth2 ingress prio 1 proto all flower \
action mirred egress redirect dev veth1
$ tc filter add dev veth1 ingress prio 1 proto all flower \
action mirred egress redirect dev veth2
#steal packet from peer1 egress to veth2 ingress, bypassing the veth pipe
$ tc qdisc add dev peer1 clsact
$ tc filter add dev peer1 egress prio 20 proto ip flower \
action mirred ingress redirect dev veth1
#run iperf and see connection not running
$ iperf3 -s&
$ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
#delete egress rule, and run again, now should work
$ tc filter del dev peer1 egress
$ ip netns exec ns0 iperf3 -c 5.5.5.6 -i 1
Fixes: f697c3e8b35c ("[NET]: Avoid unnecessary cloning for ingress filtering") Signed-off-by: Paul Blakey <paulb@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 18 Oct 2022 06:40:01 +0000 (09:40 +0300)]
bridge: mcast: Simplify MDB entry creation
Before creating a new MDB entry, br_multicast_new_group() will call
br_mdb_ip_get() to see if one exists and return it if so.
Therefore, simply call br_multicast_new_group() and omit the call to
br_mdb_ip_get().
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 18 Oct 2022 06:40:00 +0000 (09:40 +0300)]
bridge: mcast: Use spin_lock() instead of spin_lock_bh()
IGMPv3 / MLDv2 Membership Reports are only processed from the data path
with softIRQ disabled, so there is no need to call spin_lock_bh(). Use
spin_lock() instead.
This is consistent with how other IGMP / MLD packets are processed.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The test group address is added and removed in v2reportleave_test().
There is no need to delete it again during cleanup as it results in the
following error message:
# bash -x ./bridge_igmp.sh
[...]
+ cleanup
+ pre_cleanup
[...]
+ ip address del dev swp4 239.10.10.10/32
RTNETLINK answers: Cannot assign requested address
+ h2_destroy
Solve by removing the unnecessary address deletion.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Tue, 18 Oct 2022 06:39:58 +0000 (09:39 +0300)]
selftests: bridge_vlan_mcast: Delete qdiscs during cleanup
The qdiscs are added during setup, but not deleted during cleanup,
resulting in the following error messages:
# ./bridge_vlan_mcast.sh
[...]
# ./bridge_vlan_mcast.sh
Error: Exclusivity flag on, cannot modify.
Error: Exclusivity flag on, cannot modify.
Solve by deleting the qdiscs during cleanup.
Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Oct 2022 12:47:09 +0000 (13:47 +0100)]
Merge branch 'qdisc-null-deref'
Zhengchao Shao says:
====================
net: fix null pointer access issue in qdisc
These three patches fix the same type of problem. Set the default qdisc,
and then construct an init failure scenario when the dev qdisc is
configured on mqprio to trigger the reset process. NULL pointer access
may occur during the reset process.
---
v2: for fq_codel, revert the patch
---
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is sfb, if the qdisc of dev_queue fails to be
inited during mqprio_init(), sfb_reset() is invoked to clear resources.
In this case, the q->qdisc is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
sfb_init()
tcf_block_get() --->failed, q->qdisc is NULL
...
qdisc_put()
...
sfb_reset()
qdisc_reset(q->qdisc) --->q->qdisc is NULL
ops = qdisc->ops
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000003: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000018-0x000000000000001f]
RIP: 0010:qdisc_reset+0x2b/0x6f0
Call Trace:
<TASK>
sfb_reset+0x37/0xd0
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f2164122d04
</TASK>
Fixes: e13e02a3c68d ("net_sched: SFB flow scheduler") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is fq_codel, if the qdisc of dev_queue fails to be
inited during mqprio_init(), fq_codel_reset() is invoked to clear
resources. In this case, the flow is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
fq_codel_init()
...
q->flows_cnt = 1024;
...
q->flows = kvcalloc(...) --->failed, q->flows is NULL
...
qdisc_put()
...
fq_codel_reset()
...
flow = q->flows + i --->q->flows is NULL
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
RIP: 0010:fq_codel_reset+0x14d/0x350
Call Trace:
<TASK>
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fd272b22d04
</TASK>
Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the default qdisc is cake, if the qdisc of dev_queue fails to be
inited during mqprio_init(), cake_reset() is invoked to clear
resources. In this case, the tins is NULL, and it will cause gpf issue.
The process is as follows:
qdisc_create_dflt()
cake_init()
q->tins = kvcalloc(...) --->failed, q->tins is NULL
...
qdisc_put()
...
cake_reset()
...
cake_dequeue_one()
b = &q->tins[...] --->q->tins is NULL
The following is the Call Trace information:
general protection fault, probably for non-canonical address
0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
RIP: 0010:cake_dequeue_one+0xc9/0x3c0
Call Trace:
<TASK>
cake_reset+0xb1/0x140
qdisc_reset+0xed/0x6f0
qdisc_destroy+0x82/0x4c0
qdisc_put+0x9e/0xb0
qdisc_create_dflt+0x2c3/0x4a0
mqprio_init+0xa71/0x1760
qdisc_create+0x3eb/0x1000
tc_modify_qdisc+0x408/0x1720
rtnetlink_rcv_msg+0x38e/0xac0
netlink_rcv_skb+0x12d/0x3a0
netlink_unicast+0x4a2/0x740
netlink_sendmsg+0x826/0xcc0
sock_sendmsg+0xc5/0x100
____sys_sendmsg+0x583/0x690
___sys_sendmsg+0xe8/0x160
__sys_sendmsg+0xbf/0x160
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7f89e5122d04
</TASK>
Fixes: 046f6fd5daef ("sched: Add Common Applications Kept Enhanced (cake) qdisc") Signed-off-by: Zhengchao Shao <shaozhengchao@huawei.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Oct 2022 12:25:09 +0000 (13:25 +0100)]
Merge branch 'dpaa-phylink'
Sean Anderson says:
====================
net: dpaa: Convert to phylink
This series converts the DPAA driver to phylink.
I have tried to maintain backwards compatibility with existing device
trees whereever possible. However, one area where I was unable to
achieve this was with QSGMII. Please refer to patch 2 for details.
All mac drivers have now been converted. I would greatly appreciate if
anyone has T-series or P-series boards they can test/debug this series
on. I only have an LS1046ARDB. Everything but QSGMII should work without
breakage; QSGMII needs patches 7 and 8. For this reason, the last 4
patches in this series should be applied together (and should not go
through separate trees).
Changes in v7:
- provide phylink_validate_mask_caps() helper
- Fix oops if memac_pcs_create returned -EPROBE_DEFER
- Fix using pcs-names instead of pcs-handle-names
- Fix not checking for -ENODATA when looking for sgmii pcs
- Fix 81-character line
- Simplify memac_validate with phylink_validate_mask_caps
Changes in v6:
- Remove unnecessary $ref from renesas,rzn1-a5psw
- Remove unnecessary type from pcs-handle-names
- Add maxItems to pcs-handle
- Fix 81-character line
- Fix uninitialized variable in dtsec_mac_config
Changes in v5:
- Add Lynx PCS binding
Changes in v4:
- Use pcs-handle-names instead of pcs-names, as discussed
- Don't fail if phy support was not compiled in
- Split off rate adaptation series
- Split off DPAA "preparation" series
- Split off Lynx 10G support
- t208x: Mark MAC1 and MAC2 as 10G
- Add XFI PCS for t208x MAC1/MAC2
Changes in v3:
- Expand pcs-handle to an array
- Add vendor prefix 'fsl,' to rgmii and mii properties.
- Set maxItems for pcs-names
- Remove phy-* properties from example because dt-schema complains and I
can't be bothered to figure out how to make it work.
- Add pcs-handle as a preferred version of pcsphy-handle
- Deprecate pcsphy-handle
- Remove mii/rmii properties
- Put the PCS mdiodev only after we are done with it (since the PCS
does not perform a get itself).
- Remove _return label from memac_initialization in favor of returning
directly
- Fix grabbing the default PCS not checking for -ENODATA from
of_property_match_string
- Set DTSEC_ECNTRL_R100M in dtsec_link_up instead of dtsec_mac_config
- Remove rmii/mii properties
- Replace 1000Base... with 1000BASE... to match IEEE capitalization
- Add compatibles for QSGMII PCSs
- Split arm and powerpcs dts updates
Changes in v2:
- Better document how we select which PCS to use in the default case
- Move PCS_LYNX dependency to fman Kconfig
- Remove unused variable slow_10g_if
- Restrict valid link modes based on the phy interface. This is easier
to set up, and mostly captures what I intended to do the first time.
We now have a custom validate which restricts half-duplex for some SoCs
for RGMII, but generally just uses the default phylink validate.
- Configure the SerDes in enable/disable
- Properly implement all ethtool ops and ioctls. These were mostly
stubbed out just enough to compile last time.
- Convert 10GEC and dTSEC as well
- Fix capitalization of mEMAC in commit messages
- Add nodes for QSGMII PCSs
- Add nodes for QSGMII PCSs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:41 +0000 (16:22 -0400)]
arm64: dts: layerscape: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs. The exact mapping of QSGMII to MACs depends on the SoC.
Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:40 +0000 (16:22 -0400)]
powerpc: dts: qoriq: Add nodes for QSGMII PCSs
Now that we actually read registers from QSGMII PCSs, it's important
that we have the correct address (instead of hoping that we're the MAC
with all the QSGMII PCSs on its bus). This adds nodes for the QSGMII
PCSs. They have the same addresses on all SoCs (e.g. if QSGMIIA is
present it's used for MACs 1 through 4).
Since the first QSGMII PCSs share an address with the SGMII and XFI
PCSs, we only add new nodes for PCSs 2-4. This avoids address conflicts
on the bus.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:39 +0000 (16:22 -0400)]
powerpc: dts: t208x: Mark MAC1 and MAC2 as 10G
On the T208X SoCs, MAC1 and MAC2 support XGMII. Add some new MAC dtsi
fragments, and mark the QMAN ports as 10G.
Fixes: da414bb923d9 ("powerpc/mpc85xx: Add FSL QorIQ DPAA FMan support to the SoC device tree(s)") Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:38 +0000 (16:22 -0400)]
net: dpaa: Convert to phylink
This converts DPAA to phylink. All macs are converted. This should work
with no device tree modifications (including those made in this series),
except for QSGMII (as noted previously).
The mEMAC configuration is one of the tricker areas. I have tried to
capture all the restrictions across the various models. Most of the time,
we assume that if the serdes supports a mode or the phy-interface-mode
specifies it, then we support it. The only place we can't do this is
(RG)MII, since there's no serdes. In that case, we rely on a (new)
devicetree property. There are also several cases where half-duplex is
broken. Unfortunately, only a single compatible is used for the MAC, so we
have to use the board compatible instead.
The 10GEC conversion is very straightforward, since it only supports XAUI.
There is generally nothing to configure.
The dTSEC conversion is broadly similar to mEMAC, but is simpler because we
don't support configuring the SerDes (though this can be easily added) and
we don't have multiple PCSs. From what I can tell, there's nothing
different in the driver or documentation between SGMII and 1000BASE-X
except for the advertising. Similarly, I couldn't find anything about
2500BASE-X. In both cases, I treat them like SGMII. These modes aren't used
by any in-tree boards. Similarly, despite being mentioned in the driver, I
couldn't find any documented SoCs which supported QSGMII. I have left it
unimplemented for now.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:37 +0000 (16:22 -0400)]
net: fman: memac: Use lynx pcs driver
Although not stated in the datasheet, as far as I can tell PCS for mEMACs
is a "Lynx." By reusing the existing driver, we can remove the PCS
management code from the memac driver. This requires calling some PCS
functions manually which phylink would usually do for us, but we will let
it do that soon.
One problem is that we don't actually have a PCS for QSGMII. We pretend
that each mEMAC's MDIO bus has four QSGMII PCSs, but this is not the case.
Only the "base" mEMAC's MDIO bus has the four QSGMII PCSs. This is not an
issue yet, because we never get the PCS state. However, it will be once the
conversion to phylink is complete, since the links will appear to never
come up. To get around this, we allow specifying multiple PCSs in pcsphy.
This breaks backwards compatibility with old device trees, but only for
QSGMII. IMO this is the only reasonable way to figure out what the actual
QSGMII PCS is.
Additionally, we now also support a separate XFI PCS. This can allow the
SerDes driver to set different addresses for the SGMII and XFI PCSs so they
can be accessed at the same time.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:36 +0000 (16:22 -0400)]
net: fman: memac: Add serdes support
This adds support for using a serdes which has to be configured. This is
primarly in preparation for phylink conversion, which will then change the
serdes mode dynamically.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phylink: provide phylink_validate_mask_caps() helper
Provide a helper that restricts the link modes according to the
phylink capabilities.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
[rebased on net-next/master and added documentation] Signed-off-by: Sean Anderson <sean.anderson@seco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
At the moment, mEMACs are configured almost completely based on the
phy-connection-type. That is, if the phy interface is RGMII, it assumed
that RGMII is supported. For some interfaces, it is assumed that the
RCW/bootloader has set up the SerDes properly. This is generally OK, but
restricts runtime reconfiguration. The actual link state is never
reported.
To address these shortcomings, the driver will need additional
information. First, it needs to know how to access the PCS/PMAs (in
order to configure them and get the link status). The SGMII PCS/PMA is
the only currently-described PCS/PMA. Add the XFI and QSGMII PCS/PMAs as
well. The XFI (and 10GBASE-KR) PCS/PMA is a c45 "phy" which sits on the
same MDIO bus as SGMII PCS/PMA. By default they will have conflicting
addresses, but they are also not enabled at the same time by default.
Therefore, we can let the XFI PCS/PMA be the default when
phy-connection-type is xgmii. This will allow for
backwards-compatibility.
QSGMII, however, cannot work with the current binding. This is because
the QSGMII PCS/PMAs are only present on one MAC's MDIO bus. At the
moment this is worked around by having every MAC write to the PCS/PMA
addresses (without checking if they are present). This only works if
each MAC has the same configuration, and only if we don't need to know
the status. Because the QSGMII PCS/PMA will typically be located on a
different MDIO bus than the MAC's SGMII PCS/PMA, there is no fallback
for the QSGMII PCS/PMA.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sean Anderson [Mon, 17 Oct 2022 20:22:33 +0000 (16:22 -0400)]
dt-bindings: net: Add Lynx PCS binding
This binding is fairly bare-bones for now, since the Lynx driver doesn't
parse any properties (or match based on the compatible). We just need it
in order to prevent the PCS nodes from having phy devices attached to
them. This is not really a problem, but it is a bit inefficient.
This binding is really for three separate PCSs (SGMII, QSGMII, and XFI).
However, the driver treats all of them the same. This works because the
SGMII and XFI devices typically use the same address, and the SerDes
driver (or RCW) muxes between them. The QSGMII PCSs have the same
register layout as the SGMII PCSs. To do things properly, we'd probably
do something like
Sean Anderson [Mon, 17 Oct 2022 20:22:32 +0000 (16:22 -0400)]
dt-bindings: net: Expand pcs-handle to an array
This allows multiple phandles to be specified for pcs-handle, such as
when multiple PCSs are present for a single MAC. To differentiate
between them, also add a pcs-handle-names property.
Signed-off-by: Sean Anderson <sean.anderson@seco.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Oct 2022 08:49:38 +0000 (09:49 +0100)]
Merge branch 'net-marvell-yaml'
Michał Grzelak says:
====================
net: further improvements to marvell,pp2.yaml
This patchset addresses problems with reg ranges and
additional $refs. It also limits phy-mode and aligns examples.
Best regards,
Michał
---
Changelog:
v4->v5
- drop '+' from all patternProperties
- restrict range of patternProperties to [0-2] in top level
- drop the $ref in patternProperties:'^...':properties:reg
- add patternProperties:'^...':properties:reg:maximum:2
- drop $ref in patternProperties:'^...':properties:phys
- add patternProperties:'^...':properties:phys:maxItems:1
- limit phy-mode to the subset found in dts files
- reflect the order of subnodes' properties in subnodes' required:
- restrict range of pattern to [0-2] in marvell,armada-7k-pp22 case
- restrict range of pattern to [0-1] in marvell,armada-375-pp2 case
- align to 4 spaces all examples:
- add specified maximum to allOf:if:then-else:properties:reg
v3->v4
- change commit message of first patch
- move allOf:$ref to patternProperties:'^...':$ref
- deprecate port-id in favour of reg
- move reg to front of properties list in patternProperties
- reflect the order of properties in required list in
patternProperties
- add unevaluatedProperties: false to patternProperties
- change unevaluated- to additionalProperties at top level
- add property phys: to ports subnode
- extend example binding with additional information about phys and sfp
- hook phys property to phy-consumer.yaml schema
v2->v3
- move 'reg:description' to 'allOf:if:then'
- change '#size-cells: true' and '#address-cells: true'
to '#size-cells: const: 0' and '#address-cells: const: 1'
- replace all occurences of pattern "^eth\{hex_num}*"
with "^(ethernet-)?port@[0-9]+$"
- add description in 'patternProperties:^...'
- add 'patternProperties:^...:interrupt-names:minItems: 1'
- add 'patternProperties:^...:reg:description'
- update 'patternProperties:^...:port-id:description'
- add 'patternProperties:^...:required: - reg'
- update '*:description:' to uppercase
- add 'allOf:then:required:marvell,system-controller'
- skip quotation marks from 'allOf:$ref'
- add 'else' schema to match 'allOf:if:then'
- restrict 'clocks' in 'allOf:if:then'
- restrict 'clock-names' in 'allOf:if:then'
- add #address-cells=<1>; #size-cells=<0>; in 'examples:'
- change every "ethX" to "ethernet-port@X" in 'examples:'
- add "reg" and comment in all ports in 'examples:'
- change /ethernet/eth0/phy-mode in examples://Armada-375
to "rgmii-id"
- replace each cpm_ with cp0_ in 'examples:'
- replace each _syscon0 with _clk0 in 'examples:'
- remove each eth0X label in 'examples:'
- update armada-375.dtsi and armada-cp11x.dtsi to match
marvell,pp2.yaml
v1->v2
- move 'properties' to the front of the file
- remove blank line after 'properties'
- move 'compatible' to the front of 'properties'
- move 'clocks', 'clock-names' and 'reg' definitions to 'properties'
- substitute all occurences of 'marvell,armada-7k-pp2' with
'marvell,armada-7k-pp22'
- add properties:#size-cells and properties:#address-cells
- specify list in 'interrupt-names'
- remove blank lines after 'patternProperties'
- remove '^interrupt' and '^#.*-cells$' patterns
- remove blank line after 'allOf'
- remove first 'if-then-else' block from 'allOf'
- negate the condition in allOf:if schema
- delete 'interrupt-controller' from section 'examples'
- delete '#interrupt-cells' from section 'examples'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Thu, 13 Oct 2022 14:37:47 +0000 (16:37 +0200)]
netfilter: rpfilter/fib: Set ->flowic_uid correctly for user namespaces.
Currently netfilter's rpfilter and fib modules implicitely initialise
->flowic_uid with 0. This is normally the root UID. However, this isn't
the case in user namespaces, where user ID 0 is mapped to a different
kernel UID. By initialising ->flowic_uid with sock_net_uid(), we get
the root UID of the user namespace, thus keeping the same behaviour
whether or not we're running in a user namepspace.
Note, this is similar to commit 8bcfd0925ef1 ("ipv4: add missing
initialization for flowi4_uid"), which fixed the rp_filter sysctl.
Fixes: 622ec2c9d524 ("net: core: add UID to flows, rules, and routes") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Brett Creeley [Mon, 17 Oct 2022 23:31:23 +0000 (16:31 -0700)]
ionic: catch NULL pointer issue on reconfig
It's possible that the driver will dereference a qcq that doesn't exist
when calling ionic_reconfigure_queues(), which causes a page fault BUG.
If a reduction in the number of queues is followed by a different
reconfig such as changing the ring size, the driver can hit a NULL
pointer when trying to clean up non-existent queues.
Fix this by checking to make sure both the qcqs array and qcq entry
exists bofore trying to use and free the entry.
Fixes: 101b40a0171f ("ionic: change queue count with no reset") Signed-off-by: Brett Creeley <brett@pensando.io> Signed-off-by: Shannon Nelson <snelson@pensando.io> Link: https://lore.kernel.org/r/20221017233123.15869-1-snelson@pensando.io Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We've added 33 non-merge commits during the last 14 day(s) which contain
a total of 31 files changed, 874 insertions(+), 538 deletions(-).
The main changes are:
1) Add RCU grace period chaining to BPF to wait for the completion
of access from both sleepable and non-sleepable BPF programs,
from Hou Tao & Paul E. McKenney.
2) Improve helper UAPI by explicitly defining BPF_FUNC_xxx integer
values. In the wild we have seen OS vendors doing buggy backports
where helper call numbers mismatched. This is an attempt to make
backports more foolproof, from Andrii Nakryiko.
3) Add libbpf *_opts API-variants for bpf_*_get_fd_by_id() functions,
from Roberto Sassu.
4) Fix libbpf's BTF dumper for structs with padding-only fields,
from Eduard Zingerman.
5) Fix various libbpf bugs which have been found from fuzzing with
malformed BPF object files, from Shung-Hsi Yu.
6) Clean up an unneeded check on existence of SSE2 in BPF x86-64 JIT,
from Jie Meng.
7) Fix various ASAN bugs in both libbpf and selftests when running
the BPF selftest suite on arm64, from Xu Kuohai.
8) Fix missing bpf_iter_vma_offset__destroy() call in BPF iter selftest
and use in-skeleton link pointer to remove an explicit bpf_link__destroy(),
from Jiri Olsa.
9) Fix BPF CI breakage by pointing to iptables-legacy instead of relying
on symlinked iptables which got upgraded to iptables-nft,
from Martin KaFai Lau.
10) Minor BPF selftest improvements all over the place, from various others.
* tag 'for-netdev' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (33 commits)
bpf/docs: Update README for most recent vmtest.sh
bpf: Use rcu_trace_implies_rcu_gp() for program array freeing
bpf: Use rcu_trace_implies_rcu_gp() in local storage map
bpf: Use rcu_trace_implies_rcu_gp() in bpf memory allocator
rcu-tasks: Provide rcu_trace_implies_rcu_gp()
selftests/bpf: Use sys_pidfd_open() helper when possible
libbpf: Fix null-pointer dereference in find_prog_by_sec_insn()
libbpf: Deal with section with no data gracefully
libbpf: Use elf_getshdrnum() instead of e_shnum
selftest/bpf: Fix error usage of ASSERT_OK in xdp_adjust_tail.c
selftests/bpf: Fix error failure of case test_xdp_adjust_tail_grow
selftest/bpf: Fix memory leak in kprobe_multi_test
selftests/bpf: Fix memory leak caused by not destroying skeleton
libbpf: Fix memory leak in parse_usdt_arg()
libbpf: Fix use-after-free in btf_dump_name_dups
selftests/bpf: S/iptables/iptables-legacy/ in the bpf_nf and xdp_synproxy test
selftests/bpf: Alphabetize DENYLISTs
selftests/bpf: Add tests for _opts variants of bpf_*_get_fd_by_id()
libbpf: Introduce bpf_link_get_fd_by_id_opts()
libbpf: Introduce bpf_btf_get_fd_by_id_opts()
...
====================