Eric Dumazet [Mon, 25 Jul 2022 20:05:54 +0000 (13:05 -0700)]
ip6mr: remove stray rcu_read_unlock() from ip6_mr_forward()
One rcu_read_unlock() should have been removed in blamed commit.
Fixes: 9b1c21d898fd ("ip6mr: do not acquire mrt_lock while calling ip6_mr_forward()") Reported-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Link: https://lore.kernel.org/r/20220725200554.2563581-1-eric.dumazet@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 26 Jul 2022 21:38:53 +0000 (14:38 -0700)]
Merge branch 'tls-rx-decrypt-from-the-tcp-queue'
Jakub Kicinski says:
====================
tls: rx: decrypt from the TCP queue
This is the final part of my TLS Rx rework. It switches from
strparser to decrypting data from skbs queued in TCP. We don't
need the full strparser for TLS, its needs are very basic.
This set gives us a small but measurable (6%) performance
improvement (continuous stream).
====================
Jakub Kicinski [Fri, 22 Jul 2022 23:50:33 +0000 (16:50 -0700)]
tls: rx: do not use the standard strparser
TLS is a relatively poor fit for strparser. We pause the input
every time a message is received, wait for a read which will
decrypt the message, start the parser, repeat. strparser is
built to delineate the messages, wrap them in individual skbs
and let them float off into the stack or a different socket.
TLS wants the data pages and nothing else. There's no need
for TLS to keep cloning (and occasionally skb_unclone()'ing)
the TCP rx queue.
This patch uses a pre-allocated skb and attaches the skbs
from the TCP rx queue to it as frags. TLS is careful never
to modify the input skb without CoW'ing / detaching it first.
Since we call TCP rx queue cleanup directly we also get back
the benefit of skb deferred free.
Overall this results in a 6% gain in my benchmarks.
Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Fri, 22 Jul 2022 23:50:32 +0000 (16:50 -0700)]
tls: rx: device: add input CoW helper
Wrap the remaining skb_cow_data() into a helper, so it's easier
to replace down the lane. The new version will change the skb
so make sure relevant pointers get reloaded after the call.
Jakub Kicinski [Fri, 22 Jul 2022 23:50:31 +0000 (16:50 -0700)]
tcp: allow tls to decrypt directly from the tcp rcv queue
Expose TCP rx queue accessor and cleanup, so that TLS can
decrypt directly from the TCP queue. The expectation
is that the caller can access the skb returned from
tcp_recv_skb() and up to inq bytes worth of data (some
of which may be in ->next skbs) and then call
tcp_read_done() when data has been consumed.
The socket lock must be held continuously across
those two operations.
Jakub Kicinski [Fri, 22 Jul 2022 23:50:30 +0000 (16:50 -0700)]
tls: rx: device: keep the zero copy status with offload
The non-zero-copy path assumes a full skb with decrypted contents.
This means the device offload would have to CoW the data. Try
to keep the zero-copy status instead, copy the data to user space.
Jakub Kicinski [Fri, 22 Jul 2022 23:50:29 +0000 (16:50 -0700)]
tls: rx: don't free the output in case of zero-copy
In the future we'll want to reuse the input skb in case of
zero-copy so we shouldn't always free darg.skb. Move the
freeing of darg.skb into the non-zc cases. All cases will
now free ctx->recv_pkt (inside let tls_rx_rec_done()).
Jakub Kicinski [Fri, 22 Jul 2022 23:50:28 +0000 (16:50 -0700)]
tls: rx: factor SW handling out of tls_rx_one_record()
After recent changes the SW side of tls_rx_one_record() can
be nicely encapsulated in its own function. Move the pad handling
as well. This will be useful for ->zc handling in tls_decrypt_device().
====================
Implement dev info and dev flash for line cards
This patchset implements two features:
1) "devlink dev info" is exposed for line card (patches 6-9)
2) "devlink dev flash" is implemented for line card gearbox
flashing (patch 10)
For every line card, "a nested" auxiliary device is created which
allows to bind the features mentioned above (patch 4).
The relationship between line card and its auxiliary dev devlink
is carried over extra line card netlink attribute (patches 3 and 5).
The first patch removes devlink_mutex from devlink_register/unregister()
which eliminates possible deadlock during devlink reload command. The
second patchset follows up with putting net pointer check into new
helper.
Examples:
$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
supported_types:
16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0
$ devlink dev info auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0:
versions:
fixed:
hw.revision 0
fw.psid MT_0000000749
running:
ini.version 4
fw 19.2010.1312
$ devlink dev flash auxiliary/mlxsw_core.lc.0 file mellanox/fw-AGB-rel-19_2010_1312-022-EVB.mfa2
====================
mlxsw: core_linecards: Introduce per line card auxiliary device
In order to be eventually able to expose line card gearbox version and
possibility to flash FW, model the line card as a separate device on
auxiliary bus.
Add the auxiliary device for provisioned line card in order to be able
to expose provisioned line card info over devlink dev info. When the
line card becomes active, there may be other additional info added to
the output.
net: devlink: introduce nested devlink entity for line card
For the purpose of exposing device info and allow flash update which is
going to be implemented in follow-up patches, introduce a possibility
for a line card to expose relation to nested devlink entity. The nested
devlink entity represents the line card.
Example:
$ devlink lc show pci/0000:01:00.0 lc 1
pci/0000:01:00.0:
lc 1 state active type 16x100G nested_devlink auxiliary/mlxsw_core.lc.0
supported_types:
16x100G
$ devlink dev show auxiliary/mlxsw_core.lc.0
auxiliary/mlxsw_core.lc.0
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: devlink: make sure that devlink_try_get() works with valid pointer during xarray iteration
Remove dependency on devlink_mutex during devlinks xarray iteration.
The reason is that devlink_register/unregister() functions taking
devlink_mutex would deadlock during devlink reload operation of devlink
instance which registers/unregisters nested devlink instances.
The devlinks xarray consistency is ensured internally by xarray.
There is a reference taken when working with devlink using
devlink_try_get(). But there is no guarantee that devlink pointer
picked during xarray iteration is not freed before devlink_try_get()
is called.
Make sure that devlink_try_get() works with valid pointer.
Achieve it by:
1) Splitting devlink_put() so the completion is sent only
after grace period. Completion unblocks the devlink_unregister()
routine, which is followed-up by devlink_free()
2) During devlinks xa_array iteration, get devlink pointer from xa_array
holding RCU read lock and taking reference using devlink_try_get()
before unlock.
Signed-off-by: Jiri Pirko <jiri@nvidia.com> Reviewed-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Paolo Abeni [Tue, 26 Jul 2022 10:51:54 +0000 (12:51 +0200)]
Merge branch 'octeontx2-minor-tc-fixes'
Subbaraya Sundeep says:
====================
Octeontx2 minor tc fixes
This patch set fixes two problems found in tc code
wrt to ratelimiting and when installing UDP/TCP filters.
Patch 1: CN10K has different register format compared to
CN9xx hence fixes that.
Patch 2: Check flow mask also before installing a src/dst
port filter, otherwise installing for one port installs for other one too.
====================
octeontx2-pf: Fix UDP/TCP src and dst port tc filters
Check the mask for non-zero value before installing tc filters
for L4 source and destination ports. Otherwise installing a
filter for source port installs destination port too and
vice-versa.
NIX_AF_TLXX_PIR/CIR register format has changed from OcteonTx2
to CN10K. CN10K supports larger burst size. Fix burst exponent
and burst mantissa configuration for CN10K.
Also fixed 'maxrate' from u32 to u64 since 'police.rate_bytes_ps'
passed by stack is also u64.
====================
Add MTU change with stmmac interface running
This series is to permit MTU change while the interface is running.
Major rework are needed to permit to allocate a new dma conf based on
the new MTU before applying it. This is to make sure there is enough
space to allocate all the DMA queue before releasing the stmmac driver.
This was tested with a simple way to stress the network while the
interface is running.
2 ssh connection to the device:
- One generating simple traffic with while true; do free; done
- The other making the mtu change with a delay of 1 second
The connection is correctly stopped and recovered after the MTU is changed.
The first 2 patch of this series are minor fixup that fix problems
presented while testing this. One fix a problem when we renable a queue
while we are generating a new dma conf. The other is a corner case that
was notice while stressing the driver and turning down the interface while
there was some traffic.
(this is a follow-up of a simpler patch that wanted to add the same
feature. It was suggested to first try to check if it was possible to
apply the new configuration. Posting as RFC as it does major rework for
the new concept of DMA conf)
====================
net: ethernet: stmicro: stmmac: permit MTU change with interface up
Remove the limitation where the interface needs to be down to change
MTU by releasing and opening the stmmac driver to set the new MTU.
Also call the set_filter function to correctly init the port.
This permits to remove the EBUSY error while the ethernet port is
running permitting a correct MTU change if for example a DSA request
a MTU change for a switch CPU port.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: stmicro: stmmac: generate stmmac dma conf before open
Rework the driver to generate the stmmac dma_conf before stmmac_open.
This permits a function to first check if it's possible to allocate a
new dma_config and then pass it directly to __stmmac_open and "open" the
interface with the new configuration.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: stmicro: stmmac: move dma conf to dedicated struct
Move dma buf conf to dedicated struct. This in preparation for code
rework that will permit to allocate separate dma_conf without affecting
the priv struct.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: stmicro: stmmac: first disable all queues and disconnect in release
Disable all queues and disconnect before tx_disable in stmmac_release to
prevent a corner case where packet may be still queued at the same time
tx_disable is called resulting in kernel panic if some packet still has
to be processed.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
net: ethernet: stmicro: stmmac: move queue reset to dedicated functions
Move queue reset to dedicated functions. This aside from a simple
cleanup is also required to allocate a dma conf without resetting the tx
queue while the device is temporarily detached as now the reset is not
part of the dma init function and can be done later in the code flow.
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Tue, 26 Jul 2022 01:20:51 +0000 (18:20 -0700)]
Merge tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next
Kalle Valo says:
====================
wireless-next patches for v5.20
Third set of patches for v5.20. MLO work continues and we have a lot
of stack changes due to that, including driver API changes. Not much
driver patches except on mt76.
Major changes:
cfg80211/mac80211
- more prepartion for Wi-Fi 7 Multi-Link Operation (MLO) support,
works with one link now
- align with IEEE Draft P802.11be_D2.0
- hardware timestamps for receive and transmit
mt76
- preparation for new chipset support
- ACPI SAR support
* tag 'wireless-next-2022-07-25' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (254 commits)
wifi: mac80211: fix link data leak
wifi: mac80211: mlme: fix disassoc with MLO
wifi: mac80211: add macros to loop over active links
wifi: mac80211: remove erroneous sband/link validation
wifi: mac80211: mlme: transmit assoc frame with address translation
wifi: mac80211: verify link addresses are different
wifi: mac80211: rx: track link in RX data
wifi: mac80211: optionally implement MLO multicast TX
wifi: mac80211: expand ieee80211_mgmt_tx() for MLO
wifi: nl80211: add MLO link ID to the NL80211_CMD_FRAME TX API
wifi: mac80211: report link ID to cfg80211 on mgmt RX
wifi: cfg80211: report link ID in NL80211_CMD_FRAME
wifi: mac80211: add hardware timestamps for RX and TX
wifi: cfg80211: add hardware timestamps to frame RX info
wifi: cfg80211/nl80211: move rx management data into a struct
wifi: cfg80211: add a function for reporting TX status with hardware timestamps
wifi: nl80211: add RX and TX timestamp attributes
wifi: ieee80211: add helper functions for detecting TM/FTM frames
wifi: mac80211_hwsim: handle links for wmediumd/virtio
wifi: mac80211: sta_info: fix link_sta insertion
...
====================
These properties are inherited from ethernet-controller.yaml.
This fixes the dt_binding_check warning:
imx8mm-tqma8mqml-mba8mx.dt.yaml: ethernet@30be0000: 'nvmem-cell-names',
'nvmem-cells' do not match any of the regexes: 'pinctrl-[0-9]+'
Lorenzo Bianconi [Fri, 22 Jul 2022 07:06:19 +0000 (09:06 +0200)]
net: ethernet: mtk-ppe: fix traffic offload with bridged wlan
A typical flow offload scenario for OpenWrt users is routed traffic
received by the wan interface that is redirected to a wlan device
belonging to the lan bridge. Current implementation fails to
fill wdma offload info in mtk_flow_get_wdma_info() since odev device is
the local bridge. Fix the issue running dev_fill_forward_path routine in
mtk_flow_get_wdma_info in order to identify the wlan device.
Tested-by: Paolo Valerio <pvalerio@redhat.com> Signed-off-by: Lorenzo Bianconi <lorenzo@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset includes various preparations required for Spectrum-2 PTP
support.
Most of the changes are non-functional (e.g., renaming, adding
registers). The only intentional user visible change is in patch #10
where the PHC time is initialized to zero in accordance with the
recommendation of the PTP maintainer.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The function mlxsw_sp_ptp_phc_adjfreq() configures MTUTC register to adjust
hardware frequency by a given value.
This configuration will be same for Spectrum-2. In preparation for
Spectrum-2 PTP support, rename the function to not be Spectrum-1 specific.
Later, it will be used for Spectrum-2 also.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Spectrum-1 and Spectrum-2 differ in their time stamping capabilities.
The former can be configured to time stamp only a subset of received PTP
events (e.g., only Sync), whereas the latter will time stamp all PTP
events or none.
In preparation for Spectrum-2 PTP support, rename the function that
parses the hardware time stamping configuration upon %SIOCSHWTSTAMP to
be Spectrum-1 specific.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 24 Jul 2022 08:03:27 +0000 (11:03 +0300)]
mlxsw: spectrum_ptp: Use 'struct mlxsw_sp_ptp_clock' per ASIC
Currently, there is one shared structure that holds the required
structures for PTP clock. Most of the existing fields are relevant only
for Spectrum-1 (cycles, timecounter, and more). Rename the structure to
be specific for Spectrum-1 and align the existing code. Add a common
structure which includes the structures which will be used also for
Spectrum-2. This structure will be returned from clock_init() operation,
as the definition is shared between all ASICs' operations.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 24 Jul 2022 08:03:26 +0000 (11:03 +0300)]
mlxsw: spectrum_ptp: Use 'struct mlxsw_sp_ptp_state' per ASIC
Currently, there is one shared structure that holds the required
structures and details for PTP. Most of the existing fields are relevant
only for Spectrum-1 (hash table, lock for hash table, delayed work, and
more). Rename the structure to be specific for Spectrum-1 and align the
existing code. Add a common structure which includes
'struct mlxsw_sp *mlxsw_sp' and will be returned from ptp_init()
operation, as the definition is shared between all ASICs' operations.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 24 Jul 2022 08:03:25 +0000 (11:03 +0300)]
mlxsw: pci: Simplify FRC clock reading
Currently, the reading of FRC values (high and low) is done using macro
which calls to a function. In addition, to calculate the offset of FRC,
a simple macro is used. This code can be simplified by adding an helper
function and calculating the offset explicitly instead of using an
additional macro for that.
Add the helper function and convert the existing code. This helper will be
used later to read UTC clock.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 24 Jul 2022 08:03:24 +0000 (11:03 +0300)]
mlxsw: spectrum_ptp: Initialize the clock to zero as part of initialization
As lately recommended in the mailing list[1], set the clock to zero time as
part of initialization.
The idea is that when the clock reads 'Jan 1, 1970', then it is clearly
wrong and user will not mistakenly think that the clock is set correctly.
If as part of initialization, the driver sets the clock, user might see
correct date and time (maybe with a small shift) and assume that there
is no need to sync the clock.
Fix the existing code of Spectrum-1 to set the 'timecounter' to zero.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: Rename 'read_frc_capable' bit to 'read_clock_capable'
Rename the 'read_frc_capable' bit to 'read_clock_capable' since now it can
be both the FRC and UTC clocks.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Sun, 24 Jul 2022 08:03:22 +0000 (11:03 +0300)]
mlxsw: resources: Add resource identifier for maximum number of FIDs
Add a resource identifier for maximum number of FIDs so that it could be
later used to query the information from firmware.
In Spectrum-2 and Spectrum-3, the correction field of PTP packets which are
sent as control packets is not updated at egress port. To overcome this
limitation, some packets will be sent as data packets. The header should
include FID, which is supposed to be 'Max FID + port - 1'. As preparation,
add the required resource, to be able to query the value from firmware
later.
Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Fix the shift of FID field in TX header
Currently, the field FID in TX header is defined, but is not used as it is
relevant only for data packets. mlxsw driver currently sends all
host-generated traffic as control packets and not as data packets.
In Spectrum-2 and Spectrum-3, the correction field of PTP packets which
are sent as control packets is not updated at egress port. To overcome this
limitation while adding support for PTP, some packets will be sent as data
packets.
Fix the wrong shift in the definition, to allow using the field later.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: Set time stamp type as part of config profile
The type of time stamp field in the CQE is configured via the
CONFIG_PROFILE command during driver initialization. Add the definition
of the relevant fields to the command's payload and set the type to UTC
for Spectrum-2 and above. This configuration can be done as part of the
preparations to PTP support, as the type of the time stamp will not break
any existing behavior.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: cmd: Add UTC related fields to query firmware command
Add UTC sec and nsec PCI BAR and offset to query firmware command for a
future use.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: pci_hw: Add 'time_stamp' and 'time_stamp_type' fields to CQEv2
The Completion Queue Element version 2 (CQEv2) includes various metadata
fields of packets.
Add 'time_stamp' and 'time_stamp_type' fields along with functions to
extract the seconds and nanoseconds for a future use.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Add Monitoring Time Precision Correction Port Configuration Register
In Spectrum-2, all the packets are time stamped, the MTPCPC register is
used to configure the types of packets that will adjust the correction
field and which port will trap PTP packets.
If ingress correction is set on a port for a given packet type, then
when such a packet is received via the port, the current time stamp is
subtracted from the correction field.
If egress correction is set on a port for a given packet type, then when
such a packet is transmitted via the port, the current time stamp is
added to the correction field.
Assuming the systems is configured correctly, the above means that the
correction field will contain the transient delay between the ports.
Add this register for a future use in order to support PTP in Spectrum-2.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: reg: Add MTUTC register's fields for supporting PTP in Spectrum-2
The MTUTC register configures the HW UTC counter.
Add the relevant fields and operations to support PTP in Spectrum-2 and
update mlxsw_reg_mtutc_pack() with the new fields for a future use.
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: Rename mlxsw_reg_mtptptp_pack() to mlxsw_reg_mtptpt_pack()
The right name of the register is MTPTPT, which refers to Monitoring
Precision Time Protocol Trap Register.
Therefore, rename the function mlxsw_reg_mtptptp_pack() to
mlxsw_reg_mtptpt_pack().
Signed-off-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: Amit Cohen <amcohen@nvidia.com> Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: macb: Update tsu clk usage in runtime suspend/resume for Versal
On Versal TSU clock cannot be disabled irrespective of whether PTP is
used. Hence introduce a new Versal config structure with a "need tsu"
caps flag and check the same in runtime_suspend/resume before cutting
off clocks.
More information on this for future reference:
This is an IP limitation on versions 1p11 and 1p12 when Qbv is enabled
(See designcfg1, bit 3). However it is better to rely on an SoC specific
check rather than the IP version because tsu clk property itself may not
represent actual HW tsu clock on some chip designs.
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Reviewed-by: Claudiu Beznea <claudiu.beznea@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@xilinx.com> Signed-off-by: Harini Katakam <harini.katakam@xilinx.com> Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 Jul 2022 09:38:58 +0000 (10:38 +0100)]
Merge branch 'mtk_eth_soc-xdp'
Lorenzo Bianconi says:
====================
mtk_eth_soc: add xdp support
Introduce XDP support for mtk_eth_soc driver if rx hwlro is not
enabled in the chipset (e.g. mt7986).
Supported XDP verdicts:
- XDP_PASS
- XDP_DROP
- XDP_REDIRECT
- XDP_TX
- ndo_xdp_xmit
Rely on page_pool allocator for single page buffers in order to keep
them dma mapped and add skb recycling support.
Matthias May [Thu, 21 Jul 2022 20:27:19 +0000 (22:27 +0200)]
ip_tunnels: allow VXLAN/GENEVE to inherit TOS/TTL from VLAN
The current code allows for VXLAN and GENEVE to inherit the TOS
respective the TTL when skb-protocol is ETH_P_IP or ETH_P_IPV6.
However when the payload is VLAN encapsulated, then this inheriting
does not work, because the visible skb-protocol is of type
ETH_P_8021Q or ETH_P_8021AD.
====================
net: usb: ax88179_178a: improvements and bug fixes
Power management was partially broken. There were two issues when dropping
into a sleep state.
1. Resume was not doing a fully HW restore. Only a partial restore. This
lead to a couple things being broken on resume. One of them being tcp rx.
2. wolopt was not being restored properly on resume.
Also did some general improvements and clean up to make it easier to fix
the issues mentioned above.
====================
- Check if wol is supported on reset instead of everytime get_wol
is called.
- Save wolopts in private data instead of relying on the HW to save it.
- Defer enabling WoL until suspend instead of enabling it everytime
set_wol is called.
Signed-off-by: Justin Chen <justinpopo6@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
We need more space to save WoL context. So lets allocate memory
for ax88179_data instead of using struct usbnet data field which
only supports 5 words. We continue to use the struct usbnet data
field for multicast filters. However since we no longer have the
private data stored there, we can shift it to the beginning.
Signed-off-by: Justin Chen <justinpopo6@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Jakub Kicinski [Sat, 23 Jul 2022 02:00:17 +0000 (19:00 -0700)]
Merge tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next
Luiz Augusto von Dentz says:
====================
bluetooth-next pull request for net-next:
- Add support for IM Networks PID 0x3568
- Add support for BCM4349B1
- Add support for CYW55572
- Add support for MT7922 VID/PID 0489/e0e2
- Add support for Realtek RTL8852C
- Initial support for Isochronous Channels/ISO sockets
- Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING quirk
* tag 'for-net-next-2022-07-22' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (58 commits)
Bluetooth: btusb: Detect if an ACL packet is in fact an ISO packet
Bluetooth: btusb: Add support for ISO packets
Bluetooth: ISO: Add broadcast support
Bluetooth: Add initial implementation of BIS connections
Bluetooth: Add BTPROTO_ISO socket type
Bluetooth: Add initial implementation of CIS connections
Bluetooth: hci_core: Introduce hci_recv_event_data
Bluetooth: Convert delayed discov_off to hci_sync
Bluetooth: Remove update_scan hci_request dependancy
Bluetooth: Remove dead code from hci_request.c
Bluetooth: btrtl: Fix typo in comment
Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
Bluetooth: mgmt: Fix using hci_conn_abort
Bluetooth: Use bt_status to convert from errno
Bluetooth: Add bt_status
Bluetooth: hci_sync: Split hci_dev_open_sync
Bluetooth: hci_sync: Refactor remove Adv Monitor
Bluetooth: hci_sync: Refactor add Adv Monitor
Bluetooth: hci_sync: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING
Bluetooth: btusb: Remove HCI_QUIRK_BROKEN_ERR_DATA_REPORTING for fake CSR
...
====================
This adds broadcast support for BTPROTO_ISO by extending the
sockaddr_iso with a new struct sockaddr_iso_bc where the socket user
can set the broadcast address when receiving, the SID and the BIS
indexes it wants to synchronize.
When using BTPROTO_ISO for broadcast the roles are:
Broadcaster -> uses connect with address set to BDADDR_ANY:
> tools/isotest -s 00:00:00:00:00:00
Broadcast Receiver -> uses listen with address set to broadcaster:
> tools/isotest -d 00:AA:01:00:00:00
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
This introduces a new socket type BTPROTO_ISO which can be enabled with
use of ISO Socket experiemental UUID, it can used to initiate/accept
connections and transfer packets between userspace and kernel similarly
to how BTPROTO_SCO works:
Central -> uses connect with address set to destination bdaddr:
> tools/isotest -s 00:AA:01:00:00:00
Peripheral -> uses listen:
> tools/isotest -d
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
We've added 73 non-merge commits during the last 12 day(s) which contain
a total of 88 files changed, 3458 insertions(+), 860 deletions(-).
The main changes are:
1) Implement BPF trampoline for arm64 JIT, from Xu Kuohai.
2) Add ksyscall/kretsyscall section support to libbpf to simplify tracing kernel
syscalls through kprobe mechanism, from Andrii Nakryiko.
3) Allow for livepatch (KLP) and BPF trampolines to attach to the same kernel
function, from Song Liu & Jiri Olsa.
4) Add new kfunc infrastructure for netfilter's CT e.g. to insert and change
entries, from Kumar Kartikeya Dwivedi & Lorenzo Bianconi.
5) Add a ksym BPF iterator to allow for more flexible and efficient interactions
with kernel symbols, from Alan Maguire.
6) Bug fixes in libbpf e.g. for uprobe binary path resolution, from Dan Carpenter.
7) Fix BPF subprog function names in stack traces, from Alexei Starovoitov.
8) libbpf support for writing custom perf event readers, from Jon Doron.
9) Switch to use SPDX tag for BPF helper man page, from Alejandro Colomar.
10) Fix xsk send-only sockets when in busy poll mode, from Maciej Fijalkowski.
11) Reparent BPF maps and their charging on memcg offlining, from Roman Gushchin.
12) Multiple follow-up fixes around BPF lsm cgroup infra, from Stanislav Fomichev.
13) Use bootstrap version of bpftool where possible to speed up builds, from Pu Lehui.
14) Cleanup BPF verifier's check_func_arg() handling, from Joanne Koong.
15) Make non-prealloced BPF map allocations low priority to play better with
memcg limits, from Yafang Shao.
16) Fix BPF test runner to reject zero-length data for skbs, from Zhengchao Shao.
17) Various smaller cleanups and improvements all over the place.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (73 commits)
bpf: Simplify bpf_prog_pack_[size|mask]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
bpf, x64: Allow to use caller address from stack
ftrace: Allow IPMODIFY and DIRECT ops on the same function
ftrace: Add modify_ftrace_direct_multi_nolock
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
bpf: Fix build error in case of !CONFIG_DEBUG_INFO_BTF
selftests/bpf: Fix test_verifier failed test in unprivileged mode
selftests/bpf: Add negative tests for new nf_conntrack kfuncs
selftests/bpf: Add tests for new nf_conntrack kfuncs
selftests/bpf: Add verifier tests for trusted kfunc args
net: netfilter: Add kfuncs to set and change CT status
net: netfilter: Add kfuncs to set and change CT timeout
net: netfilter: Add kfuncs to allocate and insert CT
net: netfilter: Deduplicate code in bpf_{xdp,skb}_ct_lookup
bpf: Add documentation for kfuncs
bpf: Add support for forcing kfunc args to be trusted
bpf: Switch to new kfunc flags infrastructure
tools/resolve_btfids: Add support for 8-byte BTF sets
bpf: Introduce 8-byte BTF set
...
====================
Pavel Begunkov [Thu, 21 Jul 2022 14:25:46 +0000 (15:25 +0100)]
net: fix uninitialised msghdr->sg_from_iter
Because of how struct msghdr is usually initialised some fields and
sg_from_iter in particular might be left out not initialised, so we
can't safely use it in __zerocopy_sg_from_iter().
For now use the callback only when there is ->msg_ubuf set relying on
the fact that they're used together and we properly zero ->msg_ubuf.
Fixes: ebe73a284f4de8 ("net: Allow custom iter handler in msghdr") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Message-Id: <ce8b68b41351488f79fd998b032b3c56e9b1cc6c.1658401817.git.asml.silence@gmail.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
This introduces hci_recv_event_data to make it simpler to access the
contents of last received event rather than having to pass its contents
to the likes of *_ind/*_cfm callbacks.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Song Liu [Wed, 13 Jul 2022 20:49:50 +0000 (13:49 -0700)]
bpf: Simplify bpf_prog_pack_[size|mask]
Simplify the logic that selects bpf_prog_pack_size, and always use
(PMD_SIZE * num_possible_nodes()). This is a good tradeoff, as most of
the performance benefit observed is from less direct map fragmentation [0].
Also, module_alloc(4MB) may not allocate 4MB aligned memory. Therefore,
we cannot use (ptr & bpf_prog_pack_mask) to find the correct address of
bpf_prog_pack. Fix this by checking the header address falls in the range
of pack->ptr and (pack->ptr + bpf_prog_pack_size).
Song Liu [Wed, 20 Jul 2022 00:21:26 +0000 (17:21 -0700)]
bpf: Support bpf_trampoline on functions with IPMODIFY (e.g. livepatch)
When tracing a function with IPMODIFY ftrace_ops (livepatch), the bpf
trampoline must follow the instruction pointer saved on stack. This needs
extra handling for bpf trampolines with BPF_TRAMP_F_CALL_ORIG flag.
Implement bpf_tramp_ftrace_ops_func and use it for the ftrace_ops used
by BPF trampoline. This enables tracing functions with livepatch.
This also requires moving bpf trampoline to *_ftrace_direct_mult APIs.
Jiri Olsa [Wed, 20 Jul 2022 00:21:25 +0000 (17:21 -0700)]
bpf, x64: Allow to use caller address from stack
Currently we call the original function by using the absolute address
given at the JIT generation. That's not usable when having trampoline
attached to multiple functions, or the target address changes dynamically
(in case of live patch). In such cases we need to take the return address
from the stack.
Adding support to retrieve the original function address from the stack
by adding new BPF_TRAMP_F_ORIG_STACK flag for arch_prepare_bpf_trampoline
function.
Basically we take the return address of the 'fentry' call:
function + 0: call fentry # stores 'function + 5' address on stack
function + 5: ...
The 'function + 5' address will be used as the address for the
original function to call.
Song Liu [Wed, 20 Jul 2022 00:21:24 +0000 (17:21 -0700)]
ftrace: Allow IPMODIFY and DIRECT ops on the same function
IPMODIFY (livepatch) and DIRECT (bpf trampoline) ops are both important
users of ftrace. It is necessary to allow them work on the same function
at the same time.
First, DIRECT ops no longer specify IPMODIFY flag. Instead, DIRECT flag is
handled together with IPMODIFY flag in __ftrace_hash_update_ipmodify().
Then, a callback function, ops_func, is added to ftrace_ops. This is used
by ftrace core code to understand whether the DIRECT ops can share with an
IPMODIFY ops. To share with IPMODIFY ops, the DIRECT ops need to implement
the callback function and adjust the direct trampoline accordingly.
If DIRECT ops is attached before the IPMODIFY ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_PEER on the DIRECT ops before registering the
IPMODIFY ops.
If IPMODIFY ops is attached before the DIRECT ops, ftrace core code calls
ENABLE_SHARE_IPMODIFY_SELF in __ftrace_hash_update_ipmodify. Owner of the
DIRECT ops may return 0 if the DIRECT trampoline can share with IPMODIFY,
so error code otherwise. The error code is propagated to
register_ftrace_direct_multi so that onwer of the DIRECT trampoline can
handle it properly.
For more details, please refer to comment before enum ftrace_ops_cmd.
Brian Gix [Thu, 21 Jul 2022 23:22:25 +0000 (16:22 -0700)]
Bluetooth: Convert delayed discov_off to hci_sync
The timed ending of Discoverability was handled in hci_requst.c, with
calls using the deprecated hci_req_add() mechanism. Converted to live
inside mgmt.c using the same delayed work queue, but with hci_sync
version of hci_update_discoverable().
Signed-off-by: Brian Gix <brian.gix@intel.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Bluetooth: MGMT: Fix holding hci_conn reference while command is queued
This removes the use of hci_conn_hold from Get Conn Info and Get Clock
Info since the callback can just do a lookup by address using the cmd
data and only then set cmd->user_data to pass to the complete callback.
Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
Avinash Dayanand [Tue, 21 Jun 2022 18:39:33 +0000 (14:39 -0400)]
iavf: Check for duplicate TC flower filter before parsing
Record of all the TC flower filters are kept for local book keeping, so
take advantage of that and check for duplicate filter even before sending
a request to the PF driver.
Signed-off-by: Avinash Dayanand <avinash.dayanand@intel.com> Signed-off-by: Jun Zhang <xuejun.zhang@intel.com> Tested-by: Bharathi Sreenivas <bharathi.sreenivas@intel.com> Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
bpf/selftests: Fix couldn't retrieve pinned program in xdp veth test
Before change:
selftests: bpf: test_xdp_veth.sh
Couldn't retrieve pinned program '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
selftests: xdp_veth [SKIP]
ok 20 selftests: bpf: test_xdp_veth.sh # SKIP
After change:
PING 10.1.1.33 (10.1.1.33) 56(84) bytes of data.
64 bytes from 10.1.1.33: icmp_seq=1 ttl=64 time=0.320 ms
--- 10.1.1.33 ping statistics ---
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.320/0.320/0.320/0.000 ms
selftests: xdp_veth [PASS]
For the test case, the following can be found:
ls /sys/fs/bpf/test_xdp_veth/progs/redirect_map_0
ls: cannot access '/sys/fs/bpf/test_xdp_veth/progs/redirect_map_0': No such file or directory
ls /sys/fs/bpf/test_xdp_veth/progs/
xdp_redirect_map_0 xdp_redirect_map_1 xdp_redirect_map_2