Jakub Kicinski [Tue, 17 Dec 2019 22:12:00 +0000 (14:12 -0800)]
nfp: pass packet pointer to nfp_net_parse_meta()
Make nfp_net_parse_meta() take a packet pointer and return
a drop/no drop decision. Right now it returns the end of
metadata and caller compares it to the packet pointer.
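The change in calling convention can be sketched in plain C. This is an illustration only: the function name and the toy length-prefixed metadata layout are assumptions, not the driver's actual format. The point is that the parser receives the packet pointer and returns a drop decision itself, so the caller no longer compares pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser (not the nfp driver's code). Walks the metadata
 * that sits directly in front of the packet data and returns true when
 * the packet must be dropped. Toy layout: each field is one length byte
 * followed by that many payload bytes. */
bool parse_meta_sketch(const uint8_t *meta, const uint8_t *pkt)
{
	assert(meta != NULL && pkt != NULL && meta <= pkt);

	while (meta < pkt) {
		size_t field_len = *meta++;

		if (field_len > (size_t)(pkt - meta))
			return true;	/* field overruns the metadata area: drop */
		meta += field_len;
	}
	return false;	/* metadata ended exactly at the packet pointer */
}
```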
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 20 Dec 2019 01:37:13 +0000 (17:37 -0800)]
Merge branch 'nfp-ipv6-tunnel'
John Hurley says:
====================
Add ipv6 tunnel support to NFP
The following patches add support for IPv6 tunnel offload to the NFP
driver.
Patches 1-2 do some code tidy up and prepare existing code for reuse in
IPv6 tunnels.
Patches 3-4 handle IPv6 tunnel decap (match) rules.
Patches 5-8 handle encap (action) rules.
Patch 9 adds IPv6 support to the merge and pre-tunnel rule functions.
v1->v2:
- fix compiler warning when building without CONFIG_IPV6 set -
Jakub Kicinski (patch 7)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:24 +0000 (21:57 +0000)]
nfp: flower: update flow merge code to support IPv6 tunnels
Both pre-tunnel match rules and flow merge functions parse compiled
match/action fields for validation.
Update these validation functions to include IPv6 match and action fields.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:23 +0000 (21:57 +0000)]
nfp: flower: support ipv6 tunnel keep-alive messages from fw
FW sends an update of IPv6 tunnels that are active in a given period. Use
this information to update the kernel table so that neighbour entries do
not time out when active on the NIC.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:22 +0000 (21:57 +0000)]
nfp: flower: handle notifiers for ipv6 route changes
A notifier is used to track route changes in the kernel. If a change is
made to a route that is offloaded to fw then an update is sent to the NIC.
The driver tracks all routes that are offloaded to determine if a kernel
change is of interest.
Extend the notifier to track IPv6 route changes and create a new list that
stores offloaded IPv6 routes. Modify the IPv4 route helper functions to
accept varying address lengths. This way, the same core functions can be
used to handle IPv4 and IPv6.
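A minimal sketch of the idea of length-agnostic route helpers follows. The structure and function names are hypothetical, and the fixed-size array stands in for whatever list the driver really keeps. The address length is stored with the list, so one comparison routine serves both families.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ROUTES 8
#define SKETCH_MAX_ADDR_LEN 16	/* large enough for an IPv6 address */

/* Hypothetical offloaded-route table; one instance per address family. */
struct route_list_sketch {
	size_t addr_len;	/* 4 for IPv4, 16 for IPv6 */
	size_t count;
	uint8_t addrs[SKETCH_MAX_ROUTES][SKETCH_MAX_ADDR_LEN];
};

/* Returns 1 if addr is already tracked in the list, 0 otherwise. */
int route_is_offloaded(const struct route_list_sketch *list, const void *addr)
{
	for (size_t i = 0; i < list->count; i++)
		if (!memcmp(list->addrs[i], addr, list->addr_len))
			return 1;
	return 0;
}

/* Tracks addr. Returns 0 on success (or if already present), -1 if full. */
int route_add(struct route_list_sketch *list, const void *addr)
{
	assert(list->addr_len <= SKETCH_MAX_ADDR_LEN);

	if (route_is_offloaded(list, addr))
		return 0;
	if (list->count == SKETCH_MAX_ROUTES)
		return -1;
	memcpy(list->addrs[list->count++], addr, list->addr_len);
	return 0;
}
```

With this shape, an IPv4 list is created with `addr_len = 4` and an IPv6 list with `addr_len = 16`; the lookup and insert code is shared.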
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:21 +0000 (21:57 +0000)]
nfp: flower: handle ipv6 tunnel no neigh request
When fw does not know the next hop for an IPv6 tunnel, it sends a request
to the driver.
Handle this request by doing a route lookup on the IPv6 address and
offloading the next hop to the fw neighbour table.
Similar functions already exist to handle IPv4 no neighbour requests. To
avoid confusion, suffix these functions with the _ipv4 tag. There is no
change in functionality with this.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:20 +0000 (21:57 +0000)]
nfp: flower: modify pre-tunnel and set tunnel action for ipv6
The IPv4 set tunnel action allows the setting of tunnel metadata such as
the TTL and ToS values. The pre-tunnel action includes the destination IP
address and is used to calculate the next hop from the neighbour
table.
Much of the IPv4 tunnel actions can be reused for IPv6 tunnels. Change the
names of associated functions and structs to remove the IPv4 identifier
and make minor modifications to support IPv6 tunnel actions.
Ensure the pre-tunnel action contains the IPv6 address along with an
identifying flag when an IPv6 tunnel action is required.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:19 +0000 (21:57 +0000)]
nfp: flower: offload list of IPv6 tunnel endpoint addresses
Fw requires a list of IPv6 addresses that are used as tunnel endpoints to
enable correct decap of tunneled packets.
Store a list of IPv6 endpoints used in rules with a ref counter to track
how many times it is in use. Offload the entire list any time a new IPv6
address is added or when an address is removed (ref count is 0).
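The reference-counting rule described above can be sketched as follows. All names are hypothetical and the firmware message is replaced by a counter; the sketch only shows when the full list would be re-sent: on the first reference to a new address and when the last reference is dropped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_MAX_ENDPOINTS 4

struct endpoint_sketch {
	uint8_t addr[16];
	unsigned int ref_count;	/* 0 means the slot is free */
};

struct endpoint_table_sketch {
	struct endpoint_sketch slots[SKETCH_MAX_ENDPOINTS];
	unsigned int offloads;	/* times the whole list was sent to fw */
};

/* Stand-in for sending the complete address list to firmware. */
static void offload_list(struct endpoint_table_sketch *t)
{
	t->offloads++;
}

/* Take a reference; offload only when the address is new. -1 if full. */
int endpoint_get(struct endpoint_table_sketch *t, const uint8_t addr[16])
{
	struct endpoint_sketch *free_slot = NULL;

	assert(t != NULL && addr != NULL);
	for (int i = 0; i < SKETCH_MAX_ENDPOINTS; i++) {
		struct endpoint_sketch *e = &t->slots[i];

		if (e->ref_count && !memcmp(e->addr, addr, 16)) {
			e->ref_count++;
			return 0;
		}
		if (!e->ref_count && !free_slot)
			free_slot = e;
	}
	if (!free_slot)
		return -1;
	memcpy(free_slot->addr, addr, 16);
	free_slot->ref_count = 1;
	offload_list(t);
	return 0;
}

/* Drop a reference; offload only when the last user goes away. */
void endpoint_put(struct endpoint_table_sketch *t, const uint8_t addr[16])
{
	assert(t != NULL && addr != NULL);
	for (int i = 0; i < SKETCH_MAX_ENDPOINTS; i++) {
		struct endpoint_sketch *e = &t->slots[i];

		if (e->ref_count && !memcmp(e->addr, addr, 16)) {
			if (--e->ref_count == 0)
				offload_list(t);
			return;
		}
	}
}
```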
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:18 +0000 (21:57 +0000)]
nfp: flower: compile match for IPv6 tunnels
IPv6 tunnel matches are now supported by firmware. Modify the NFP driver
to compile these match rules. IPv6 matches are handled similarly to IPv4
tunnels, the difference being the address length. The type of tunnel is
indicated by the same bitmap that is used in IPv4 with an extra bit
signifying that the IPv6 variation should be used.
Only compile IPv6 tunnel matches when the fw features symbol indicates
that they are compatible with the currently loaded fw.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:17 +0000 (21:57 +0000)]
nfp: flower: move udp tunnel key match compilation to helper function
IPv4 UDP and GRE tunnel match rule compile helpers share functions for
compiling fields such as IP addresses. However, they handle fields such
as tunnel IDs differently.
Create new helper functions for compiling GRE and UDP tunnel key data.
This is in preparation for supporting IPv6 tunnels where these new
functions can be reused.
This patch does not change functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 17 Dec 2019 21:57:16 +0000 (21:57 +0000)]
nfp: flower: pass flow rule pointer directly to match functions
In kernel 5.1, the flow offload API was introduced along with a helper
function to extract the flow_rule from the TC offload struct. Each of the
match helper functions is passed the offload struct and extracts the flow
rule to a local variable.
Simplify the code while also removing the extra compat and local variable
calls by extracting the rule once in the main match handler, and passing
a reference to the rule directly to each helper.
This patch does not change driver functionality.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Tue, 17 Dec 2019 21:06:19 +0000 (15:06 -0600)]
hdlcdrv: replace unnecessary assertion in hdlcdrv_register
In hdlcdrv_register, failure to register the driver causes a crash.
The three callers of hdlcdrv_register all pass valid pointers and
do not fail. The patch eliminates the unnecessary BUG_ON assertion.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 15:47:36 +0000 (15:47 +0000)]
net: mvpp2: cycle comphy to power it down
Presently, at boot time, the comphys are enabled. For firmware
compatibility reasons, the comphy driver does not power down the
comphys at boot. Consequently, the ethernet comphys are left active
until the network interfaces are brought through an up/down cycle.
If the port is never used, the port wastes power needlessly. Arrange
for the ethernet comphys to be cycled by the mvpp2 driver as if the
interface went through an up/down cycle during driver probe, thereby
powering them down.
This saves:
270mW per 10G SFP+ port on the Macchiatobin Single Shot (eth0/eth1)
370mW per 10G PHY port on the Macchiatobin Double Shot (eth0/eth1)
160mW on the SFP port on either Macchiatobin flavour (eth3)
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:50:29 +0000 (13:50 +0000)]
net: sfp: report error on failure to read sfp soft status
Report a rate-limited error if we fail to read the SFP soft status,
and preserve the current status in that case. This avoids I2C bus
errors from triggering a link flap.
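The behaviour on a failed read can be sketched like this. The types, the callback and the two example readers are hypothetical; a real driver would emit a rate-limited log message where the sketch only counts errors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reader callback: returns 0 and fills *status on success,
 * or a negative value when the I2C transfer fails. */
typedef int (*soft_read_fn)(uint8_t *status);

struct sfp_sketch {
	uint8_t state;		/* last known soft status bits */
	unsigned int errors;	/* a real driver would log, rate-limited */
};

/* Returns the state to act on: fresh on success, previous on failure. */
uint8_t soft_poll(struct sfp_sketch *sfp, soft_read_fn read)
{
	uint8_t status;

	assert(sfp != NULL && read != NULL);
	if (read(&status) < 0) {
		sfp->errors++;
		return sfp->state;	/* keep the old state: no link flap */
	}
	sfp->state = status;
	return sfp->state;
}

/* Example readers used to exercise the sketch. */
int read_ok_sketch(uint8_t *status)
{
	*status = 0x02;
	return 0;
}

int read_fail_sketch(uint8_t *status)
{
	(void)status;
	return -1;
}
```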
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 19 Dec 2019 20:52:34 +0000 (12:52 -0800)]
Merge branch 'phylib-consolidation'
Russell King says:
====================
phylib consolidation
Over the last few releases, there has been a push to clean up and
consolidate the phylib code. Some cases have been missed, and this
series catches those cases.
1. Remove redundant .aneg_done initialisers; calling genphy_aneg_done()
for clause 22 PHYs is the default when .aneg_done is not set.
2. Some PHY drivers manually set phydev->pause and phydev->asym_pause,
but we have a helper for this - phy_resolve_aneg_pause(), introduced
in 2d880b8709c0 ("net: phy: extract pause mode"). Use this in the
lxt, marvell and uPD60620 drivers.
Incidentally, this brings up the question whether marvell fiber mode
is correctly interpreting and advertising the pause parameters.
3. Add a genphy_check_and_restart_aneg() helper, which complements the
clause 45 version of this. This will be useful for PHY drivers that
open code this logic (e.g. marvell.c)
4. Add a genphy_read_status_fixed() helper to read the fixed-mode
status from a clause 22 PHY. lxt and marvell both contain copies
of this code, so convert them over.
5. Arrange marvell driver to use genphy_read_lpa() for copper mode.
This needs some rearrangement of the code in
marvell_read_status_page_an(), but preserves using the PHY specific
status register to derive the current negotiation results.
6. Simplify the marvell driver so we can use the
genphy_read_status_fixed() helper directly rather than
marvell_read_status_page_fixed().
7. Use positive logic in the marvell driver to determine the link
state, and get rid of the REGISTER_LINK_STATUS definition; we
already have a definition for this.
8. The marvell driver reads the PHY specific status register multiple
times when determining the status: once in marvell_update_link()
and again in marvell_read_status_page_an(). This is a waste;
rearrange to read the status register once, and pass its value into
marvell_read_status_page_an(). We preserve using
genphy_update_link() for the copper side.
9. The marvell driver was using private clause 37 definitions, but we
have clause 37 definitions in uapi/linux/mii.h. Use the generic
definitions.
10. Switch the marvell driver to use phy_modify_changed() to modify
the fiber advertisement.
11. Switch the marvell driver to use genphy_check_and_restart_aneg()
introduced above rather than open-coding this functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:52 +0000 (13:39 +0000)]
net: phy: marvell: use genphy_check_and_restart_aneg()
Use the helper to check and restart autonegotiation for the marvell
fiber page negotiation setting.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:47 +0000 (13:39 +0000)]
net: phy: marvell: use phy_modify_changed()
Use phy_modify_changed() to change the fiber advertisement register
rather than open coding this functionality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:42 +0000 (13:39 +0000)]
net: phy: marvell: use existing clause 37 definitions
Use existing clause 37 advertising/link partner definitions rather than
private ones for the advertisement registers.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:36 +0000 (13:39 +0000)]
net: phy: marvell: consolidate phy status reading
marvell_read_status_page_an() always reads the PHY status register, but
marvell_update_link() has already done this. Rather than wastefully
reading the register twice in quick succession, read it once in
marvell_read_status_page() and use the result for both.
This makes marvell_update_link() rather pointless, so move it into
marvell_read_status_page().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:31 +0000 (13:39 +0000)]
net: phy: marvell: use positive logic for link state
Rather than using negative logic:
	if (there is no link)
		set link = 0
	else
		set link = 1
use the more natural positive logic:
	if (there is link)
		set link = 1
	else
		set link = 0
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:26 +0000 (13:39 +0000)]
net: phy: marvell: initialise link partner state earlier
Move the initialisation of the link partner state earlier, inside
marvell_read_status_page(), so we don't have the same initialisation
scattered amongst the other files. This is in a similar place to
the genphy implementation, so would result in the same behaviour if
a PHY read error occurs.
This allows us to get rid of marvell_read_status_page_fixed(), which
became a pointless wrapper around genphy_read_status_fixed().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:21 +0000 (13:39 +0000)]
net: phy: marvell: rearrange to use genphy_read_lpa()
Rearrange the Marvell PHY driver to use genphy_read_lpa() rather than
open-coding this functionality.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:16 +0000 (13:39 +0000)]
net: phy: provide and use genphy_read_status_fixed()
There are two drivers and generic code which contain exactly the same
code to read the status of a PHY operating without autonegotiation
enabled. Rather than duplicate this code, provide a helper to read
this information.
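As a rough userspace sketch of what reading the fixed-mode status involves: with autonegotiation off, speed and duplex are taken straight from the forced bits in the clause 22 control register. The bit values match linux/mii.h; the struct and function names are illustrative, and the register read itself is left out.

```c
#include <assert.h>
#include <stdint.h>

/* Clause 22 basic mode control register bits (values as in linux/mii.h). */
#define BMCR_SPEED1000	0x0040
#define BMCR_FULLDPLX	0x0100
#define BMCR_SPEED100	0x2000

struct fixed_status_sketch {
	int speed;		/* Mbit/s */
	int full_duplex;	/* 1 = full, 0 = half */
};

/* Decode forced speed/duplex from an already-read BMCR value. */
struct fixed_status_sketch decode_fixed_status(uint16_t bmcr)
{
	struct fixed_status_sketch st;

	st.full_duplex = !!(bmcr & BMCR_FULLDPLX);
	if (bmcr & BMCR_SPEED1000)
		st.speed = 1000;
	else if (bmcr & BMCR_SPEED100)
		st.speed = 100;
	else
		st.speed = 10;
	return st;
}

/* Usage example: 100 Mbit/s full duplex forced in BMCR. */
void decode_fixed_status_example(void)
{
	struct fixed_status_sketch st =
		decode_fixed_status(BMCR_SPEED100 | BMCR_FULLDPLX);

	assert(st.speed == 100 && st.full_duplex == 1);
}
```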
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:11 +0000 (13:39 +0000)]
net: phy: add genphy_check_and_restart_aneg()
Add a helper for restarting autonegotiation, similar to the clause 45
variant. Use it in __genphy_config_aneg().
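The decision such a helper makes can be sketched in userspace C. This is an approximation under stated assumptions: the function name is made up, the BMCR value is passed in instead of being read from the PHY, and the actual restart is omitted. The bit values match linux/mii.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Clause 22 control register bits (values as in linux/mii.h). */
#define BMCR_ISOLATE	0x0400
#define BMCR_ANENABLE	0x1000

/* Decide whether autonegotiation must be restarted. Even when the
 * advertisement is unchanged, a restart is needed if autonegotiation
 * was never enabled or the PHY is currently isolated. */
bool aneg_needs_restart(uint16_t bmcr, bool advert_changed)
{
	if (advert_changed)
		return true;
	return !(bmcr & BMCR_ANENABLE) || (bmcr & BMCR_ISOLATE);
}

/* Usage example: aneg already running and nothing changed, so no restart. */
void aneg_needs_restart_example(void)
{
	assert(!aneg_needs_restart(BMCR_ANENABLE, false));
}
```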
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 17 Dec 2019 13:39:06 +0000 (13:39 +0000)]
net: phy: use phy_resolve_aneg_pause()
Several drivers code their own version of this, working from the LPA
register, after setting the ethtool link partner advertisement bitmask.
Use the generic function instead.
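For illustration, the kind of open-coded logic being replaced looks roughly like the sketch below, which derives the pause flags from the link partner ability register. The struct and function names are hypothetical; the LPA bit values match linux/mii.h. Clearing both flags in half duplex is an assumption of this sketch, since pause frames are only meaningful in full duplex.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Link partner ability bits (values as in linux/mii.h). */
#define LPA_PAUSE_CAP	0x0400
#define LPA_PAUSE_ASYM	0x0800

struct pause_sketch {
	int pause;
	int asym_pause;
};

/* Derive the link partner's pause capabilities from the LPA register. */
void resolve_pause_sketch(struct pause_sketch *p, uint16_t lpa, int full_duplex)
{
	assert(p != NULL);

	p->pause = 0;
	p->asym_pause = 0;
	if (full_duplex) {
		p->pause = !!(lpa & LPA_PAUSE_CAP);
		p->asym_pause = !!(lpa & LPA_PAUSE_ASYM);
	}
}
```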
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove initialisers that set .aneg_done to genphy_aneg_done - this is
the default for clause 22 PHYs, so the initialiser is redundant.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 18 Dec 2019 22:55:01 +0000 (23:55 +0100)]
net: stmmac: tc: Fix TAPRIO division operation
For ARCHs that don't support 64-bit division we need to use the
helpers.
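Some background, as a hedged sketch: in kernel code a plain `/` on a 64-bit operand can fail to link on 32-bit architectures, so helpers such as div_u64() or div_u64_rem() are used instead. Which helper this patch uses is not stated here. The function below is a userspace stand-in with the same shape as div_u64_rem() (64-bit dividend, 32-bit divisor), and the time-splitting example is an assumption chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Userspace stand-in shaped like the kernel's div_u64_rem(): 64-bit
 * dividend, 32-bit divisor, remainder returned through a pointer. */
uint64_t div_u64_rem_sketch(uint64_t dividend, uint32_t divisor, uint32_t *rem)
{
	assert(divisor != 0);
	*rem = (uint32_t)(dividend % divisor);
	return dividend / divisor;
}

/* Split a nanosecond count into whole seconds and leftover nanoseconds
 * by going through the helper instead of using '/' and '%' directly. */
void ns_to_sec_nsec(uint64_t ns, uint64_t *sec, uint32_t *nsec)
{
	*sec = div_u64_rem_sketch(ns, 1000000000u, nsec);
}
```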
Fixes: b60189e0392f ("net: stmmac: Integrate EST with TAPRIO scheduler API") Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Dec 2019 21:32:30 +0000 (13:32 -0800)]
Merge branch 'ETS-qdisc'
Petr Machata says:
====================
Add a new Qdisc, ETS
The IEEE standard 802.1Qaz (and 802.1Q-2014) specifies four principal
transmission selection algorithms: strict priority, credit-based shaper,
ETS (bandwidth sharing), and vendor-specific. All these have their
corresponding knobs in DCB. But DCB does not have interfaces to configure
RED and ECN, unlike Qdiscs.
In the Qdisc land, strict priority is implemented by PRIO. Credit-based
transmission selection algorithm can then be modeled by having e.g. TBF or
CBS Qdisc below some of the PRIO bands. ETS would then be modeled by
placing a DRR Qdisc under the last PRIO band.
The problem with this approach is that DRR on its own, as well as the
combination of PRIO and DRR, are tricky to configure and tricky to offload
to 802.1Qaz-compliant hardware. This is due to several reasons:
- Like any classful Qdisc, DRR supports adding classifiers to decide in which
class to enqueue packets. Unlike PRIO, however, there is no fallback in the
form of priomap. A way to achieve classification based on packet priority
is e.g. like this:
    # tc filter add dev swp1 root handle 1: \
        basic match 'meta(priority eq 0)' flowid 1:10
Expressing the priomap in this manner however forces drivers to deep dive
into the classifier block to parse the individual rules.
A possible solution would be to extend the classes with a "defmap" a la
split / defmap mechanism of CBQ, and introduce this as a last resort
classification. However, unlike priomap, this doesn't have the guarantee
of covering all priorities. Traffic whose priority is not covered is
dropped by DRR as unclassified. But ASICs tend to implement dropping in
the ACL block, not in scheduling pipelines. The need to treat these
configurations correctly (if only to decide to not offload at all)
complicates a driver.
It's not clear how to retrofit priomap with all its benefits to DRR
without changing it beyond recognition.
- The interplay between PRIO and DRR is also causing problems. 802.1Qaz has
all ETS TCs as a last resort. Switch ASICs that support ETS at all are
likely to handle ETS traffic this way as well. However, the Linux model
is more generic, allowing the DRR block in any band. Drivers would need
to be careful to handle this case correctly, otherwise the offloaded
model might not match the slow-path one.
In a similar vein, PRIO and DRR need to agree on the list of priorities
assigned to DRR. This is doubly problematic--the user needs to take care
to keep the two in sync, and the driver needs to watch for any holes in
DRR coverage and treat the traffic correctly, as discussed above.
Note that at the time that DRR Qdisc is added, it has no classes, and
thus any priorities assigned to that PRIO band are not covered. Thus this
case is surprisingly rather common, and needs to be handled gracefully by
the driver.
- Similarly due to DRR flexibility, when a Qdisc (such as RED) is attached
below it, it is not immediately clear which TC the class represents. This
is unlike PRIO with its straightforward classid scheme. When DRR is
combined with PRIO, the relationship between classes and TCs gets even
more murky.
This is a problem for users as well: the TC mapping is rather important
for (devlink) shared buffer configuration and (ethtool) counters.
So instead, this patch set introduces a new Qdisc, which is based on
802.1Qaz wording. It is PRIO-like in how it is configured, meaning one
needs to specify how many bands there are, how many are strict and how many
are ETS, quanta for the latter, and priomap.
The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.
The chosen interface makes the overall system both reasonably easy to
configure, and reasonably easy to offload. The extra code to support ETS in
mlxsw (which already supports PRIO) is about 150 lines, of which perhaps 20
lines is bona fide new business logic.
Credit-based shaping transmission selection algorithm can be configured by
adding a CBS Qdisc under one of the strict bands (TBF can be used to
similar effect). As a non-work-conserving Qdisc, CBS can't be
hooked under the ETS bands. This is detected and handled identically to DRR
Qdisc at runtime. Note that offloading CBS is not the subject of this patchset.
The patchset proceeds in four stages:
- Patches #1-#3 are cleanups.
- Patches #4 and #5 contain the new Qdisc.
- Patches #6 and #7 update mlxsw to offload the new Qdisc.
- Patches #8-#10 add selftests for ETS.
Examples:
- Add a Qdisc with 6 bands, 3 strict and 3 ETS with 45%-30%-25% weights:
- Use "bands" to specify number of bands explicitly. Underspecified bands
are implicitly ETS and their quantum is taken from MTU. The following
thus gives each band the same weight:
v2:
- This addresses points raised by David Miller.
- Patch #4:
- sch_ets.c: Add a comment with description of the Qdisc and the
dequeuing algorithm.
- Kconfig: Add a high-level description to the help blurb.
v1:
- No changes, first upstream submission after RFC.
v3 (internal):
- This addresses review from Jiri Pirko.
- Patch #3:
- Rename to _HR_ instead of to _HIERARCHY_.
- Patch #4:
- pkt_sched.h: Keep all the TCA_ETS_ constants in one enum.
- pkt_sched.h: Rename TCA_ETS_BANDS to _NBANDS, _STRICT to _NSTRICT,
_BAND_QUANTUM to _QUANTA_BAND and _PMAP_BAND to _PRIOMAP_BAND.
- sch_ets.c: Update to reflect the above changes. Add a new policy,
ets_class_policy, which is used when parsing class changes.
Currently that policy is the same as the quanta policy, but that
might change.
- sch_ets.c: Move MTU handling from ets_quantum_parse() to the one
caller that makes use of it.
- sch_ets.c: ets_qdisc_priomap_parse(): WARN_ON_ONCE on invalid
attribute instead of returning an extack.
- Patch #6:
- __mlxsw_sp_qdisc_ets_replace(): Pass the weights argument to this
function in this patch already. Drop the weight computation.
- mlxsw_sp_qdisc_prio_replace(): Rename "quanta" to "zeroes" and
pass for the abovementioned "weights".
- mlxsw_sp_qdisc_prio_graft(): Convert to a wrapper around
__mlxsw_sp_qdisc_ets_graft(), instead of invoking the latter
directly from mlxsw_sp_setup_tc_prio().
- Update to follow the _HIERARCHY_ -> _HR_ renaming.
- Patch #7:
- __mlxsw_sp_qdisc_ets_replace(): The "weights" argument passing and
weight computation removal are now done in a previous patch.
- mlxsw_sp_setup_tc_ets(): Drop case TC_ETS_REPLACE, which is handled
earlier in the function.
- Patch #3 (iproute2):
- Add an example output to the commit message.
- tc-ets.8: Fix output of two examples.
- tc-ets.8: Describe default values of "bands", "quanta".
- q_ets.c: A number of fixes in error messages.
- q_ets.c: Comment formatting: /*padding*/ -> /* padding */
- q_ets.c: parse_nbands: Move duplicate checking to callers.
- q_ets.c: Don't accept both "quantum" and "quanta" as equivalent.
v2 (internal):
- This addresses review from Ido Schimmel and comments from Alexander
Kushnarov.
- Patch #2:
- s/coment/comment in the commit message.
- Patch #4:
- sch_ets: ets_class_is_strict(), ets_class_id(): Constify an argument
- ets_class_find(): RXTify
- Patch #3 (iproute2):
- tc-ets.8: some spelling fixes
- tc-ets.8: add another example
- tc.8: add an ETS to "CLASSFUL QDISCS" section
v1 (internal):
- This addresses RFC reviews from Ido Schimmel and Roman Mashak, bugs found
by Alexander Petrovskiy and myself, and other improvements.
- Patch #2:
- Expand the explanation with an explicit example.
- Patch #4:
- Kconfig: s/sch_drr/sch_ets/
- sch_ets: Reorder includes to be in alphabetical order
- sch_ets: ets_quantum_parse(): Rename the return-pointer argument
from pquantum to quantum, and use it directly, not going through a
local temporary.
- sch_ets: ets_qdisc_quanta_parse(): Convert syntax of function
argument "quanta" from an array to a pointer.
- sch_ets: ets_qdisc_priomap_parse(): Likewise with "priomap".
- sch_ets: ets_qdisc_quanta_parse(), ets_qdisc_priomap_parse(): Invoke
__nla_validate_nested directly instead of nl80211_validate_nested().
- sch_ets: ets_qdisc_quanta_parse(): WARN_ON_ONCE on invalid attribute
instead of returning an extack.
- sch_ets: ets_qdisc_change(): Make the last band the default one for
unmentioned priomap priorities.
- sch_ets: Fix a panic when an offloaded child in a bandwidth-sharing
band notified its ETS parent.
- sch_ets: When ungrafting, add the newly-created invisible FIFO to
the Qdisc hash
- Patch #5:
- pkt_cls.h: Note that quantum=0 signifies a strict band.
- Fix error path handling when ets_offload_dump() fails.
- Patch #6:
- __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function arguments
"quanta" and "priomap" from arrays to pointers.
- Patch #7:
- __mlxsw_sp_qdisc_ets_replace(): Convert syntax of function argument
"weights" from an array to a pointer.
- Patch #9:
- mlxsw/sch_ets.sh: Add a comment explaining packet prioritization.
- Adjust the whole suite to allow testing of traffic classifiers
in addition to testing priomap.
- Patch #10:
- Add a number of new tests to test default priomap band, overlarge
number of bands, zeroes in quanta, and altogether missing quanta.
- Patch #1 (iproute2):
- State motivation for inclusion of this patch in the patchset in the
commit message.
- Patch #3 (iproute2):
- tc-ets.8: it is now December
- tc-ets.8: explain inactivity WRT using non-WC Qdiscs under ETS band
- tc-ets.8: s/flow/band in explanation of quantum
- tc-ets.8: explain what happens with priorities not covered by priomap
- tc-ets.8: default priomap band is now the last one
- q_ets.c: ets_parse_opt(): Remove unnecessary initialization of
priomap and quanta.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:22 +0000 (14:55 +0000)]
selftests: forwarding: sch_ets: Add test coverage for ETS Qdisc
This tests the newly-added ETS Qdisc. It runs two to three streams of
traffic, each with a different priority. ETS Qdisc is supposed to allocate
bandwidth according to the DRR algorithm and given weights. After running
the traffic for a while, counters are compared for each stream to check
that the expected ratio is in fact observed.
In order for the DRR process to kick in, a traffic bottleneck must exist in
the first place. In slow path, such a bottleneck can be implemented by
wrapping the ETS Qdisc inside a TBF or other shaper. This might however
make the configuration unoffloadable. Instead, on HW datapath, the
bottleneck would be set up by lowering port speed and configuring shared
buffer suitably.
Therefore the test is structured as a core component that implements the
testing, with two wrapper scripts that implement the details of slow-path
and fast-path configuration, respectively.
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:21 +0000 (14:55 +0000)]
selftests: forwarding: Move start_/stop_traffic from mlxsw to lib.sh
These two functions are used for starting several streams of traffic, and
then stopping them later. They will be handy for the test coverage of ETS
Qdisc. Move them from mlxsw-specific qos_lib.sh to the generic lib.sh.
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:19 +0000 (14:55 +0000)]
mlxsw: spectrum_qdisc: Support offloading of ETS Qdisc
Handle TC_SETUP_QDISC_ETS, add a new ops structure for the ETS Qdisc.
Invoke the extended prio handlers implemented in the previous patch. For
stats ops, invoke directly the prio callbacks, which are not sensitive to
differences between PRIO and ETS.
Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:17 +0000 (14:55 +0000)]
mlxsw: spectrum_qdisc: Generalize PRIO offload to support ETS
Thanks to the similarity between PRIO and ETS it is possible to simply
reuse most of the code for offloading PRIO Qdisc. Extract the common
functionality into separate functions, making the current PRIO handlers
thin API adapters.
Extend the new functions to pass quanta for individual bands, which allows
configuring a subset of bands as WRR. Invoke mlxsw_sp_port_ets_set() as
appropriate to de/configure WRR-ness and weight of individual bands.
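One way quanta could be turned into per-band percentage weights is sketched below. This is an assumption for illustration, not the mlxsw code: the function name is made up, a quantum of 0 is treated as a strict band, and rounding is done on running prefix sums so that the weights of the bandwidth-sharing bands always add up to 100.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert per-band quanta to percentage weights. Strict bands
 * (quantum 0) get weight 0. */
void quanta_to_weights(const uint32_t *quanta, unsigned int *weights,
		       size_t nbands)
{
	uint64_t q_sum = 0, q_psum = 0;
	unsigned int w_prev = 0;

	assert(quanta != NULL && weights != NULL);

	for (size_t i = 0; i < nbands; i++)
		q_sum += quanta[i];

	for (size_t i = 0; i < nbands; i++) {
		unsigned int w_psum;

		if (!quanta[i]) {
			weights[i] = 0;
			continue;
		}
		/* Round the cumulative share, then take the difference. */
		q_psum += quanta[i];
		w_psum = (unsigned int)((q_psum * 100 + q_sum / 2) / q_sum);
		weights[i] = w_psum - w_prev;
		w_prev = w_psum;
	}
}
```

For example, three strict bands followed by quanta of 4500, 3000 and 2500 yield weights of 45, 30 and 25 for the last three bands.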
Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:15 +0000 (14:55 +0000)]
net: sch_ets: Make the ETS qdisc offloadable
Add hooks at appropriate points to make it possible to offload the ETS
Qdisc.
Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:13 +0000 (14:55 +0000)]
net: sch_ets: Add a new Qdisc
Introduces a new Qdisc, which is based on 802.1Q-2014 wording. It is
PRIO-like in how it is configured, meaning one needs to specify how many
bands there are, how many are strict and how many are dwrr, quanta for the
latter, and priomap.
The new Qdisc operates like the PRIO / DRR combo would when configured as
per the standard. The strict classes, if any, are tried for traffic first.
When there's no traffic in any of the strict queues, the ETS ones (if any)
are treated in the same way as in DRR.
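The dequeue order described above can be sketched as a small simulation. It is a simplified model under stated assumptions, not the sch_ets.c implementation: all names are hypothetical, queues are fixed-size arrays of packet lengths, a quantum of 0 marks a strict band, and a band's deficit is not reset when it runs empty.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NBANDS 3
#define SKETCH_QLEN 8

struct band_sketch {
	unsigned int quantum;	/* 0 marks a strict band */
	unsigned int deficit;	/* DRR byte credit */
	unsigned int pkts[SKETCH_QLEN];	/* queued packet lengths, FIFO */
	unsigned int head, tail;
};

struct ets_sketch {
	struct band_sketch bands[SKETCH_NBANDS];
	unsigned int rr;	/* next bandwidth-sharing band to visit */
};

int band_enqueue(struct band_sketch *b, unsigned int len)
{
	if (b->tail == SKETCH_QLEN)
		return -1;
	b->pkts[b->tail++] = len;
	return 0;
}

static int band_empty(const struct band_sketch *b)
{
	return b->head == b->tail;
}

/* Returns the band a packet was taken from (length in *len), or -1. */
int ets_dequeue_sketch(struct ets_sketch *q, unsigned int *len)
{
	int backlog = 0;

	assert(q != NULL && len != NULL);

	/* 1. Strict bands are tried first; the lowest index wins. */
	for (int i = 0; i < SKETCH_NBANDS; i++) {
		struct band_sketch *b = &q->bands[i];

		if (!b->quantum && !band_empty(b)) {
			*len = b->pkts[b->head++];
			return i;
		}
	}

	/* 2. Nothing strict: is there any bandwidth-sharing backlog? */
	for (int i = 0; i < SKETCH_NBANDS; i++)
		if (q->bands[i].quantum && !band_empty(&q->bands[i]))
			backlog = 1;
	if (!backlog)
		return -1;

	/* 3. Deficit round robin over the bandwidth-sharing bands. */
	for (;;) {
		struct band_sketch *b = &q->bands[q->rr];

		if (b->quantum && !band_empty(b)) {
			unsigned int next = b->pkts[b->head];

			if (next <= b->deficit) {
				b->deficit -= next;
				b->head++;
				*len = next;
				return (int)q->rr;
			}
			b->deficit += b->quantum;	/* top up, move on */
		}
		q->rr = (q->rr + 1) % SKETCH_NBANDS;
	}
}
```

With one strict band and two sharing bands whose quanta are 2000 and 1000, equal-sized packets leave the sharing bands in a 2:1 ratio while both are backlogged, and any packet in the strict band is always sent first.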
Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
These enums want to be named MLXSW_REG_QEEC_HIERARCHY_, but due to a typo
lack the second H. That is confusing and complicates searching.
But actually the enumerators should be named _HR_, because that is how
their enum type is called. So rename them as appropriate.
Signed-off-by: Petr Machata <petrm@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:10 +0000 (14:55 +0000)]
mlxsw: spectrum_qdisc: Clarify a comment
Expand the comment at mlxsw_sp_qdisc_prio_graft() to make the problem that
this function is trying to handle clearer.
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Wed, 18 Dec 2019 14:55:08 +0000 (14:55 +0000)]
net: pkt_cls: Clarify a comment
The bit about negating HW backlog left me scratching my head. Clarify the
comment.
Signed-off-by: Petr Machata <petrm@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Turns out tin_quantum_prio isn't used anymore and is a leftover from a
previous implementation of diffserv tins. Since the variable isn't used
in any calculations it can be eliminated.
Drop variable and places where it was set. Rename remaining variable
and consolidate naming of intermediate variables that set it.
Signed-off-by: Kevin Darbyshire-Bryant <ldir@darbyshire-bryant.me.uk> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 18 Dec 2019 20:34:57 +0000 (12:34 -0800)]
Merge branch 's390-next'
Julian Wiedmann says:
====================
s390/qeth: features 2019-12-18
please apply the following patch series to your net-next tree.
Nothing major, just the usual mix of small improvements and cleanups.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Dec 2019 16:34:48 +0000 (17:34 +0100)]
s390/qeth: stop yielding the ip_lock during IPv4 registration
As commit df2a2a5225cc ("s390/qeth: convert IP table spinlock to mutex")
converted the ip_lock to a mutex, we no longer have to yield it while
the subsequent IO sleep-waits for completion.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Dec 2019 16:34:47 +0000 (17:34 +0100)]
s390/qeth: don't raise NETDEV_REBOOT event from L3 offline path
This is a leftover from back when a recovery action didn't go through
dev_close(), and was meant to shoot down all remaining af_iucv sockets
on the interface.
Now that the offline path always calls dev_close(), the
NETDEV_GOING_DOWN event from __dev_close_many() is sufficient and this
hack can be removed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Dec 2019 16:34:44 +0000 (17:34 +0100)]
s390/qeth: overhaul L3 IP address dump code
The current code that dumps the RXIP/VIPA/IPATO addresses via sysfs
first checks whether the buffer still provides sufficient space to hold
another formatted address.
But the maximum length of a formatted IPv4 address is 15 characters,
not 12. So we underestimate the max required length and if the buffer
was previously filled to _just_ the right level, a formatted address can
end up being truncated.
Revamp these code paths to use the _actually_ required length of the
formatted IP address, and while at it suppress a gratuitous newline.
Also use scnprintf() to format the output. In case of a truncation, this
would allow us to return the number of characters that were actually
written.
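The arithmetic behind "15, not 12" and the idea of checking the actually required length can be illustrated as follows. This is a hedged user-space sketch, not the driver code: `PAGE_SIZE`, `max_ipv4_text_len()` and `dump_addresses()` are hypothetical names, and the one-address-per-line layout is an assumption.

```python
PAGE_SIZE = 4096  # assumed size of the sysfs output buffer


def max_ipv4_text_len():
    # Four octets of up to three digits each, plus three separating dots.
    return 4 * 3 + 3


def dump_addresses(addrs, size=PAGE_SIZE):
    """Append whole 'addr\\n' entries while they still fit into `size`."""
    out = ""
    for addr in addrs:
        entry = addr + "\n"
        # Check the actual entry length instead of a fixed estimate,
        # so an entry is either written completely or not at all.
        if len(out) + len(entry) > size:
            break
        out += entry
    return out
```

A fixed estimate of 12 characters would accept "255.255.255.255" into a buffer with only 12 bytes left, which is the truncation case the commit describes.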
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julian Wiedmann [Wed, 18 Dec 2019 16:34:43 +0000 (17:34 +0100)]
s390/qeth: wake up all waiters from qeth_irq()
card->wait_q is shared by different users, for different wake-up
conditions. qeth_irq() can potentially trigger multiple of these
conditions:
1) A change to channel->irq_pending, which qeth_send_control_data() is
waiting for.
2) A change to card->state, which qeth_clear_channel() and
qeth_halt_channel() are waiting for.
As qeth_irq() does only a single wake_up(), we might fail to wake up
a second eligible waiter. Luckily all waiters are guarded with a
timeout, so this situation should recover on its own eventually.
To make things work robustly, add an additional wake_up() for changes
to channel->state. And extract a helper that updates
channel->irq_pending along with the needed wake_up().
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 18 Dec 2019 10:33:08 +0000 (11:33 +0100)]
net: stmmac: Add Frame Preemption support using TAPRIO API
Add support for Frame Preemption using the TAPRIO API. This works along
with the EST feature and allows selecting whether preemptable traffic is
sent during a specific queue's open time.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 18 Dec 2019 10:33:07 +0000 (11:33 +0100)]
net: stmmac: Integrate EST with TAPRIO scheduler API
Now that we have the EST code for XGMAC and QoS we can use it with the
TAPRIO scheduler. Integrate it into the main driver and use the API to
configure the EST feature.
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 18 Dec 2019 10:24:45 +0000 (11:24 +0100)]
net: stmmac: Always use TX coalesce timer value when rescheduling
When we have pending packets we re-arm the TX timer with a magic value.
This changes the re-arm of the timer from 10us to the user-defined
coalesce value. As we support different speeds, having a magic value of
10us can be either too short or too long depending on the speed, so we
let the user configure it. The default value of the timer is 1ms but it can
be reconfigured by ethtool.
Changes from v1:
- Reword commit message (Jakub)
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 18 Dec 2019 10:24:44 +0000 (11:24 +0100)]
net: stmmac: Let TX and RX interrupts be independently enabled/disabled
By using this mechanism we can get rid of the not so nice method of
scheduling TX NAPI when the RX was scheduled. No bandwidth reduction was
seen with this change.
Changes from v1:
- Remove useless comment (Jakub)
- Do not bind the TX clean to NAPI budget (Jakub)
Signed-off-by: Jose Abreu <Jose.Abreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Tue, 17 Dec 2019 13:32:18 +0000 (13:32 +0000)]
xen-netback: remove 'hotplug-status' once it has served its purpose
Removing the 'hotplug-status' node in netback_remove() is wrong; the script
may not have completed. Only remove the node once the watch has fired and
has been unregistered.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Durrant [Tue, 17 Dec 2019 13:32:17 +0000 (13:32 +0000)]
xen-netback: switch state to InitWait at the end of netback_probe()...
...as the comment above the function states.
The switch to Initialising at the start of the function is somewhat bogus
as the toolstack will have set that initial state anyway. To behave
correctly, a backend should switch to InitWait once it has set up all
xenstore values that may be required by an initialising frontend. This
patch calls backend_switch_state() to make the transition at the
appropriate point.
NOTE: backend_switch_state() ignores errors from xenbus_switch_state()
and so this patch removes an error path from netback_probe(). This
means a failure to change state at this stage (in the absence of
other failures) will leave the device instantiated. This is highly
unlikely to happen as a failure to change state would indicate a
failure to write to xenstore, and that will trigger other error
paths. Also, a 'stuck' device can still be cleaned up using 'unbind'
in any case.
Signed-off-by: Paul Durrant <pdurrant@amazon.com> Acked-by: Wei Liu <wei.liu@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4/chtls: fix ULD connection failures due to wrong TID base
Currently, the hardware TID index is assumed to start from index 0.
However, with the following changeset,
commit c21939998802 ("cxgb4: add support for high priority filters")
hardware TID index can start after the high priority region, which
has introduced a regression resulting in connection failures for
ULDs.
So, fix all related code to properly recalculate the TID start index
based on whether high priority filters are enabled or not.
Fixes: c21939998802 ("cxgb4: add support for high priority filters") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
cxgb4: fix missed high priority region calculation
commit c21939998802 ("cxgb4: add support for high priority filters")
has missed considering high priority region calculation in some code
paths. This patch fixes them.
Fixes: c21939998802 ("cxgb4: add support for high priority filters") Signed-off-by: Shahjada Abul Husain <shahjada@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Mon, 16 Dec 2019 18:32:47 +0000 (10:32 -0800)]
net: dsa: Make PHYLINK related function static again
Commit 77373d49de22 ("net: dsa: Move the phylink driver calls into
port.c") moved and exported a bunch of symbols, but they are not used
outside of net/dsa/port.c at the moment, so no reason to export them.
Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@gmail.com> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 16 Dec 2019 18:21:02 +0000 (19:21 +0100)]
tipc: don't send gap blocks in ACK messages
In the commit referred to below we eliminated sending of the 'gap'
indicator in regular ACK messages, reserving this to explicit NACK
ditto.
Unfortunately we also failed to eliminate building of the 'gap block'
area in ACK messages. This area is meant to report gaps in the
received packet sequence following the initial gap, so that lost
packets can be retransmitted earlier and received out-of-sequence
packets can be released earlier. However, the interpretation of those
blocks is dependent on a complete and correct sequence of gaps and
acks. Hence, when the initial gap indicator is missing a single gap
block will be interpreted as an acknowledgment of all preceding
packets. This may lead to packets being released prematurely from the
sender's transmit queue, with easily predictable consequences.
We now fix this by not building any gap block area if there is no
initial gap to report.
Fixes: commit 02288248b051 ("tipc: eliminate gap indicator from ACK messages") Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Dec 2019 21:55:23 +0000 (13:55 -0800)]
Merge branch 'stmmac-dwc-qos-ACPI-device-support'
Ajay Gupta says:
====================
net: stmmac: dwc-qos: ACPI device support
Version 3 of the patches has fixes for comments from Jakub Kicinski.
These two changes are needed to enable ACPI based devices to use stmmac
driver. First patch is to use generic device api (device_*) instead of
device tree based api (of_*). Second patch avoids clock and reset accesses
for Tegra ACPI based devices. ACPI interface will be used to access clock
and reset for Tegra ACPI devices in later patches.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net-next: stmmac: dwmac-mediatek: add more support for RMII
changes in v2:
PATCH 1/2 net-next: stmmac: mediatek: add more support for RMII
Per Andrew's comments, add the "rmii_internal" clock to the list of clocks.
PATCH 2/2 net-next: dt-binding: dwmac-mediatek: add more description for RMII
document the "rmii_internal" clock in dt-bindings
rewrite the sample dts in dt-bindings.
v1:
This series adds RMII support for the case where the MT2712 SoC provides the reference clock.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Biao Huang [Mon, 16 Dec 2019 05:39:57 +0000 (13:39 +0800)]
net-next: stmmac: mediatek: add more support for RMII
MT2712 SoC can provide the rmii reference clock, and the clock
will output from TXC pin only, which means ref_clk pin of external
PHY should connect to TXC pin in this case.
Add corresponding clock and timing settings.
Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
improve clause 45 support in phylink
These three patches improve the clause 45 support in phylink, fixing
some corner cases that have been noticed with the addition of SFP+
NBASE-T modules, but are actually a little more widespread than I
initially realised.
The first issue was spotted with a NBASE-T PHY on a SFP+ module plugged
into a mvneta platform. When these PHYs are not operating in USXGMII
mode, but are in a single-lane Serdes mode, they will switch between
one of several different PHY interface modes.
If we call the MAC validate() function with the current PHY interface
mode, we will restrict the supported and advertising masks to the link
modes that the current PHY interface mode supports. For example, if we
determine that we want to start the PHY with an interface mode of
2500BASE-X, then this setup will restrict the advertisement and
supported masks to 2.5G speed link modes.
What we actually want for these PHYs is to allow them to support any
link modes that the PHY supports _and_ the MAC is also capable of
supporting. Without knowing the details of the PHY interface modes that
may be used, we can do this by using PHY_INTERFACE_MODE_NA to validate
and restrict the link modes to any that the MAC supports.
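The effect of validating against the current interface mode versus PHY_INTERFACE_MODE_NA can be modelled with plain set intersections. The sketch below is illustrative only: the `MAC_MODES` table, the mode names and the `validate()` helper are hypothetical stand-ins, not the phylink API.

```python
# Hypothetical MAC capability table: interface mode -> link modes that the
# MAC validate() callback would allow for that mode.
MAC_MODES = {
    "2500base-x": {"2500baseT/Full"},
    "sgmii": {"10baseT/Full", "100baseT/Full", "1000baseT/Full"},
    "10gbase-r": {"10000baseT/Full"},
}
# "NA" stands in for PHY_INTERFACE_MODE_NA: everything the MAC can do.
MAC_MODES["NA"] = set().union(*MAC_MODES.values())


def validate(phy_supported, interface):
    """Restrict the PHY's link modes to what the MAC allows for `interface`."""
    return phy_supported & MAC_MODES[interface]
```

For a PHY that supports 100M through 10G, `validate(phy, "2500base-x")` leaves only the 2.5G mode, whereas `validate(phy, "NA")` keeps every mode that both the PHY and the MAC support.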
mvpp2 with the 88X3310 PHY avoids this problem, because the validate()
implementation allows all MAC supported speeds not only for
PHY_INTERFACE_MODE_NA, but also for XAUI and 10GKR modes.
The first patch addresses this; current MAC drivers should continue to
work as-is, but there will be a follow-on patch to fixup at least
mvpp2.
The second issue addresses a very similar problem that occurs when
trying to use ethtool to alter the advertisement mask - we call
the MAC validate() function with the current interface mode, the
current support and requested advertisement masks. This immediately
restricts the advertisement in the same way as the above.
This patch series addresses both issues, although the patches are not
in the above order.
v2: fix patch 3 missing 1G link modes for SGMII and RGMII interface
modes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix up the mvpp2 validate implementation to adopt the same behaviour as
mvneta:
- only allow the link modes that the specified PHY interface mode
supports with the exception of 1000base-X and 2500base-X.
- use the basex helper to deal with SFP modules that can be switched
between 1000base-X vs 2500base-X.
This gives consistent behaviour between mvneta and mvpp2.
This commit depends on "net: phylink: extend clause 45 PHY validation
workaround" so is not marked for backporting to stable kernels.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Acked-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit e45d1f5288b8 ("net: phylink: support Clause 45 PHYs on SFP+
modules") added a workaround to support clause 45 PHYs which
dynamically switch their interface mode on SFP+ modules. This was
implemented by validating the PHYs supported/advertising using
PHY_INTERFACE_MODE_NA, rather than the specific interface mode that
we attached the PHY with.
However, we already have a situation where phylink is used to connect
a Marvell 88X3310 PHY which also behaves in exactly the same way, but
which seemingly doesn't need this. The reason seems to be that the
mvpp2 driver sets a whole bunch of link modes for
PHY_INTERFACE_MODE_10GKR down to 10Mb/s, despite 10GBASE-R not actually
supporting anything but 10Gb/s speeds.
When testing with drivers that (correctly) take the mvneta approach,
where the validate() method only returns what can be supported /
advertised for the specified link mode, we find that Clause 45 PHYs do
not behave as we expect: their advertisement is restricted to what
the current link will support, rather than what the PHY supports
through its dynamic switching.
Extend this workaround to all such cases; if we have a Clause 45 PHY
attaching via any means, except in USXGMII, XAUI and RXAUI which are
all unable to support this dynamic switching or have other solutions
to it, then we need to validate using PHY_INTERFACE_MODE_NA.
This should allow mvpp2 to switch to a more conformant validate()
implementation.
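The rule stated above, which interface mode to validate against, reduces to a small decision function. This is a minimal sketch under assumptions: the lower-case mode strings and the name `validation_mode()` are hypothetical, and "NA" stands in for PHY_INTERFACE_MODE_NA.

```python
# Modes that, per the text above, cannot switch dynamically or have other
# solutions to the problem. The string names are illustrative only.
FIXED_C45_MODES = {"usxgmii", "xaui", "rxaui"}


def validation_mode(is_c45, interface):
    """Return the interface mode that validation should use."""
    if is_c45 and interface not in FIXED_C45_MODES:
        # Clause 45 PHY that may switch host interface modes dynamically.
        return "NA"
    return interface
```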
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
While testing ethtool with the Methode DM7052 module, it was noticed
that attempting to set the advertising mask results in the mask being
truncated to the support offered by the currently chosen PHY interface
mode.
When a PHY dynamically changes the PHY interface mode, limiting the
advertising mask in this way is not correct - if the PHY happened to
negotiate 10GBASE-T, and selected 10GBASE-R as the host interface, we
don't want to restrict the advertisement to just 10GBASE-* modes.
Rework setting the advertisement to take account of this; do not pass
the requested advertisement through phylink_validate(), but rely on
the advertisement restriction (supported mask) set when the PHY was
initially setup.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Dec 2019 03:22:22 +0000 (19:22 -0800)]
Merge branch 'WireGuard-CI-and-housekeeping'
Jason A. Donenfeld says:
====================
WireGuard CI and housekeeping
This is a collection of commits gathered during the last 1.5 weeks since
merging WireGuard. If you'd prefer, I can send tree pull requests
instead, but I figure it might be best for now to just send things as
full patch sets to netdev.
The first part of this adds in the CI test harness that we've been using
for quite some time with success. You can type `make` and get the
selftests running in a fresh VM immediately. This has been an
instrumental tool in developing WireGuard, and I think it'd benefit most
from being in-tree alongside the selftests that are already there. Once
this lands, I plan to get build.wireguard.com building wireguard-
linux.git and net-next.git on every single commit pushed, and do so on a
bunch of different architectures. As this migrates into Linus' tree
eventually and then into net.git, I'll get net.git building there too on
every commit. Future work with this involves generalizing it to include
more networking subsystem tests beyond just WireGuard, but one step at a
time. In the process of porting this to the tree, the builder uncovered
a mistake in the config menu file, which the second commit fixes.
The last three commits are small housekeeping things, fixing spelling
mistakes, replacing call_rcu with kfree_rcu, and removing an unused
include.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sun, 15 Dec 2019 21:08:04 +0000 (22:08 +0100)]
wireguard: allowedips: use kfree_rcu() instead of call_rcu()
The callback function of call_rcu() just calls a kfree(), so we
can use kfree_rcu() instead of call_rcu() + callback function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Sun, 15 Dec 2019 21:08:03 +0000 (22:08 +0100)]
wireguard: main: remove unused include <linux/version.h>
Remove <linux/version.h> from the includes for main.c, which is unused.
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
[Jason: reworded commit message] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Josh Soref [Sun, 15 Dec 2019 21:08:02 +0000 (22:08 +0100)]
wireguard: global: fix spelling mistakes in comments
This fixes two spelling errors in source code comments.
Signed-off-by: Josh Soref <jsoref@gmail.com>
[Jason: rewrote commit message] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: Kconfig: select parent dependency for crypto
This fixes the crypto selection submenu dependencies. Otherwise, we'd
wind up issuing warnings in which certain dependencies we also select
couldn't be satisfied. This condition was triggered by the addition of
the test suite autobuilder in the previous commit.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
wireguard: selftests: import harness makefile for test suite
WireGuard has been using this on build.wireguard.com for the last
several years with considerable success. It allows for very quick and
iterative development cycles, and supports several platforms.
To run the test suite on your current platform in QEMU:
$ make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
To run it with KASAN and such turned on:
$ DEBUG_KERNEL=yes make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
To run it emulated for another platform in QEMU:
$ ARCH=arm make -C tools/testing/selftests/wireguard/qemu -j$(nproc)
At the moment, we support aarch64_be, aarch64, arm, armeb, i686, m68k,
mips64, mips64el, mips, mipsel, powerpc64le, powerpc, and x86_64.
The system supports incremental rebuilding, so it should be very fast to
change a single file and then test it out and have immediate feedback.
This requires the right toolchain and QEMU to be installed beforehand.
I've had success with those from musl.cc.
This is tailored for WireGuard at the moment, though later projects
might generalize it for other network testing.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Sun, 15 Dec 2019 17:51:30 +0000 (11:51 -0600)]
net: caif: replace BUG_ON with recovery code
In caif_xmit, there is a crash if the ptr dev is NULL. However, by
returning the error to the callers, the error can be handled. The
patch fixes this issue.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Sun, 15 Dec 2019 16:14:51 +0000 (10:14 -0600)]
fore200e: Fix incorrect checks of NULL pointer dereference
In fore200e_send and fore200e_close, the pointers from the arguments
are dereferenced in the variable declaration block and then checked
for NULL. The patch fixes these issues by avoiding NULL pointer
dereferences.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Dec 2019 00:14:43 +0000 (16:14 -0800)]
Merge branch 'Simplify-IPv4-route-offload-API'
Ido Schimmel says:
====================
Simplify IPv4 route offload API
Motivation
==========
The aim of this patch set is to simplify the IPv4 route offload API by
making the stack a bit smarter about the notifications it is generating.
This allows driver authors to focus on programming the underlying device
instead of having to duplicate the IPv4 route insertion logic in their
driver, which is error-prone.
This is the first patch set out of a series of four. Subsequent patch
sets will simplify the IPv6 API, add offload/trap indication to routes
and add tests for all the code paths (including error paths). Available
here [1].
Details
=======
Today, whenever an IPv4 route is added or deleted a notification is sent
in the FIB notification chain and it is up to offload drivers to decide
if the route should be programmed to the hardware or not. This is not an
easy task as in hardware routes are keyed by {prefix, prefix length,
table id}, whereas the kernel can store multiple such routes that only
differ in metric / TOS / nexthop info.
This series makes sure that only routes that are actually used in the
data path are notified to offload drivers. This greatly simplifies the
work these drivers need to do, as they are now only concerned with
programming the hardware and do not need to replicate the IPv4 route
insertion logic and store multiple identical routes.
The route that is notified is the first FIB alias in the FIB node with
the given {prefix, prefix length, table ID}. In case the route is
deleted and there is another route with the same key, a replace
notification is emitted. Otherwise, a delete notification is emitted.
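The notification behaviour described above can be modelled in a few lines. This is a toy sketch, not the kernel implementation: `FibNode` and its methods are hypothetical, the ordering (highest TOS first, then lowest metric) is an assumption based on the surrounding text, and reporting a newly added first route as a "replace" event is a reading of the "replace / delete notifications" wording.

```python
class FibNode:
    """Toy model of one FIB node: one {prefix, prefix length, table ID}."""

    def __init__(self):
        self.aliases = []  # (tos, metric) tuples; the first is used for forwarding
        self.events = []   # notifications an offload driver would see

    @staticmethod
    def _order(alias):
        tos, metric = alias
        return (-tos, metric)  # assumed: highest TOS, then lowest metric

    def add(self, tos, metric):
        self.aliases.append((tos, metric))
        self.aliases.sort(key=self._order)
        # Only notify if the new route is the one used in the data path.
        if self.aliases[0] == (tos, metric):
            self.events.append(("replace", (tos, metric)))

    def delete(self, tos, metric):
        was_first = self.aliases[0] == (tos, metric)
        self.aliases.remove((tos, metric))
        if not was_first:
            return  # route was never notified, nothing to tell listeners
        if self.aliases:
            # Promote the next route with the same key.
            self.events.append(("replace", self.aliases[0]))
        else:
            self.events.append(("delete", (tos, metric)))
```

In this model a listener only ever tracks one route per key, which is the simplification the series aims for.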
The above means that in the case of multiple routes with the same key,
but different TOS, only the route with the highest TOS is notified.
While the kernel can route a packet based on its TOS, this is not
supported by any hardware devices I am familiar with. Moreover, this is
not supported by IPv6 nor by BIRD/FRR from what I could see. Offload
drivers should therefore use the presence of a non-zero TOS as an
indication to trap packets matching the route and let the kernel route
them instead. mlxsw has been doing it for the past two years.
Testing
=======
To ensure there is no degradation in route insertion rates, I averaged
the insertion rate of 512k routes (/24 and /32) over 50 runs. Did not
observe any degradation.
Functional tests are available here [1]. They rely on route trap
indication, which is only added in the last patch set.
In addition, I have been running syzkaller for the past week with all
four patch sets and debug options enabled. Did not observe any problems.
Patch set overview
==================
Patches #1-#8 gradually introduce the new FIB notifications
Patch #9 converts mlxsw to use the new notifications
Patch #10 converts the remaining listeners and removes the old
notifications
v2:
* Extend fib_find_alias() with another argument instead of introducing a
new function (David Ahern)
Ido Schimmel [Sat, 14 Dec 2019 15:53:15 +0000 (17:53 +0200)]
ipv4: Remove old route notifications and convert listeners
Unlike mlxsw, the other listeners to the FIB notification chain do not
require any special modifications as they never considered multiple
identical routes.
This patch removes the old route notifications and converts all the
listeners to use the new replace / delete notifications.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:13 +0000 (17:53 +0200)]
ipv4: Only Replay routes of interest to new listeners
When a new listener is registered to the FIB notification chain it
receives a dump of all the available routes in the system. Instead, make
sure to only replay the IPv4 routes that are actually used in the data
path and are of any interest to the new listener.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:12 +0000 (17:53 +0200)]
ipv4: Handle route deletion notification during flush
In a similar fashion to previous patch, when a route is deleted as part
of table flushing, promote the next route in the list, if one exists.
Otherwise, simply emit a delete notification.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:11 +0000 (17:53 +0200)]
ipv4: Handle route deletion notification
When a route is deleted we potentially need to promote the next route in
the FIB alias list (e.g., with a higher metric). In case we find such a
route, a replace notification is emitted. Otherwise, a delete
notification is emitted for the deleted route.
v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:10 +0000 (17:53 +0200)]
ipv4: Notify newly added route if should be offloaded
When a route is added, it should only be notified in case it is the
first route in the FIB alias list with the given {prefix, prefix length,
table ID}. Otherwise, it is not used in the data path and should not be
considered by switch drivers.
v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:09 +0000 (17:53 +0200)]
ipv4: Notify route if replacing currently offloaded one
When replacing a route, its replacement should only be notified in case
the replaced route is of any interest to listeners. In other words, if
the replaced route is currently used in the data path, which means it is
the first route in the FIB alias list with the given {prefix, prefix
length, table ID}.
v2:
* Convert to use fib_find_alias() instead of fib_find_first_alias()
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 14 Dec 2019 15:53:08 +0000 (17:53 +0200)]
ipv4: Extend FIB alias find function
Extend the function with another argument, 'find_first'. When set, the
function returns the first FIB alias with the matching {prefix, prefix
length, table ID}. The TOS and priority parameters are ignored. Current
callers are converted to pass 'false' in order to maintain existing
behavior.
This will be used by subsequent patches in the series.
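The effect of the new argument can be sketched as follows. This is a simplified model, not the kernel function: `find_alias()` is a hypothetical name, the list is assumed to hold only aliases that already match the {prefix, prefix length, table ID} key, and the ordering (descending TOS, then ascending priority) and the matching rule for the non-`find_first` case are assumptions.

```python
def find_alias(aliases, tos, priority, find_first=False):
    """Look up an alias in a list of (tos, priority) tuples for one key.

    The list is assumed to be sorted by descending TOS, then ascending
    priority.
    """
    for alias in aliases:
        if find_first:
            # TOS and priority are ignored: the first alias wins.
            return alias
        a_tos, a_prio = alias
        if a_tos > tos:
            continue
        if a_prio >= priority or a_tos < tos:
            return alias
    return None
```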
v2:
* New patch
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Suggested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 17 Dec 2019 00:12:25 +0000 (16:12 -0800)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: some optimizations related to work task
This series refactors the work task of the HNS3 ethernet driver.
[patch 1/5] uses a delayed workqueue to replace the timer for the
hclgevf_service task, making the code simpler.
[patch 2/5] & [patch 3/5] unify the current mailbox, reset and
service work into one.
[patch 4/5] allocates a private work queue with WQ_MEM_RECLAIM
for the HNS3 driver.
[patch 5/5] adds a new flag to indicate whether reset fails,
and prevents scheduling the service task to handle periodic work
while this flag is set.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Guojia Liao [Sat, 14 Dec 2019 02:06:41 +0000 (10:06 +0800)]
net: hns3: do not schedule the periodic task when reset fail
service_task is scheduled once per second to do some periodic
jobs. When reset fails, the device is not available,
so the periodic jobs do not need to be handled.
This patch adds the flag HCLGE_STATE_RST_FAIL/HCLGEVF_STATE_RST_FAIL
to indicate that reset has failed, and checks this flag before
scheduling the periodic task.
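The gating described here amounts to a flag check in front of the scheduling call. The model below is only a sketch: `ServiceTask`, `rst_fail` and `schedule_periodic()` are hypothetical names, with `rst_fail` standing in for HCLGE_STATE_RST_FAIL/HCLGEVF_STATE_RST_FAIL.

```python
class ServiceTask:
    """Toy model of a periodic task gated by a reset-failed flag."""

    def __init__(self):
        self.rst_fail = False  # stands in for the *_STATE_RST_FAIL bit
        self.scheduled = 0     # number of times the task was queued

    def schedule_periodic(self):
        # Device is unusable after a failed reset: skip periodic work.
        if self.rst_fail:
            return False
        self.scheduled += 1
        return True
```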
Signed-off-by: Guojia Liao <liaoguojia@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>