David Ahern [Sat, 8 Jun 2019 21:53:29 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __ip6_route_redirect
Add a hook in __ip6_route_redirect to handle a nexthop struct in a
fib6_info. Use nexthop_for_each_fib6_nh and fib6_nh_redirect_match
to call ip6_redirect_nh_match for each fib6_nh looking for a match.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:28 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in exception handling
Add a hook in rt6_flush_exceptions, rt6_remove_exception_rt,
rt6_update_exception_stamp_rt, and rt6_age_exceptions to handle
nexthop struct in a fib6_info.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:26 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_nlmsg_size
Add a hook in rt6_nlmsg_size to handle nexthop struct in a fib6_info.
rt6_nh_nlmsg_size is used to sum the space needed for all nexthops in
the fib entry.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:25 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in __find_rr_leaf
Add a hook in __find_rr_leaf to handle nexthop struct in a fib6_info.
nexthop_for_each_fib6_nh is used to walk each fib6_nh in a nexthop and
call find_match. On a match, use the fib6_nh saved in the callback arg
to setup fib6_result.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:24 +0000 (14:53 -0700)]
ipv6: Handle all fib6_nh in a nexthop in rt6_device_match
Add a hook in rt6_device_match to handle nexthop struct in a fib6_info.
The new rt6_nh_dev_match uses nexthop_for_each_fib6_nh to walk each
fib6_nh in a nexthop and call __rt6_device_match. On match,
rt6_nh_dev_match returns the fib6_nh and rt6_device_match uses it to
setup fib6_result.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sat, 8 Jun 2019 21:53:22 +0000 (14:53 -0700)]
nexthops: Add ipv6 helper to walk all fib6_nh in a nexthop struct
IPv6 has traditionally had a single fib6_nh per fib6_info. With
nexthops we can have multiple fib6_nh associated with a fib6_info.
Add a nexthop helper to invoke a callback for each fib6_nh in a
'struct nexthop'. If the callback returns non-0, the loop is
stopped and the return value passed to the caller.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
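The iteration contract described above (call a callback per fib6_nh, stop on the first non-0 return and hand that value back) can be sketched in userspace C. This is only an illustration: `struct nh_entry`, `struct nh_group`, `for_each_entry`, `struct match_arg` and `match_ifindex` are hypothetical stand-ins, not the kernel's `struct nexthop` or `nexthop_for_each_fib6_nh`.

```c
#include <stddef.h>

/* Hypothetical stand-in for a fib6_nh; not the kernel structure. */
struct nh_entry {
	int ifindex;
};

/* Hypothetical stand-in for a nexthop holding several entries. */
struct nh_group {
	size_t num;
	struct nh_entry *entries;
};

/*
 * Invoke cb on each entry. A non-zero return from cb stops the walk
 * and is passed back to the caller; 0 means every entry was visited.
 */
static int for_each_entry(struct nh_group *grp,
			  int (*cb)(struct nh_entry *nh, void *arg),
			  void *arg)
{
	size_t i;

	for (i = 0; i < grp->num; i++) {
		int rc = cb(&grp->entries[i], arg);

		if (rc)
			return rc;
	}
	return 0;
}

/* Callback argument: the ifindex to look for and the entry that matched. */
struct match_arg {
	int ifindex;
	struct nh_entry *match;
};

/* Example callback: save the first matching entry and stop the walk. */
static int match_ifindex(struct nh_entry *nh, void *arg)
{
	struct match_arg *m = arg;

	if (nh->ifindex != m->ifindex)
		return 0;
	m->match = nh;
	return 1;
}
```

Saving the matching entry in the callback argument mirrors how the later patches in this series describe using the saved fib6_nh to set up fib6_result.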
YueHaibing [Mon, 10 Jun 2019 15:19:08 +0000 (23:19 +0800)]
tcp: Make tcp_fastopen_alloc_ctx static
Fix sparse warning:
net/ipv4/tcp_fastopen.c:75:29: warning:
symbol 'tcp_fastopen_alloc_ctx' was not declared. Should it be static?
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Jason Baron <jbaron@akamai.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 10 Jun 2019 16:23:30 +0000 (18:23 +0200)]
r8169: remove callback hw_start from struct rtl_cfg_info
After the latest changes we don't need separate functions
rtl_hw_start_8168 and rtl_hw_start_8101 any longer. This allows us to
simplify the code. For this change we need to move rtl_hw_start() and
rtl_hw_start_8169(). rtl_hw_start_8169() is unchanged.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 10 Jun 2019 16:22:33 +0000 (18:22 +0200)]
r8169: rename CPCMD_QUIRK_MASK and apply it on all chip versions
CPCMD_QUIRK_MASK isn't specific to certain chip versions. The vendor
driver applies this mask to all 8168 versions. Therefore remove QUIRK
from the mask name and apply it on all chip versions.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 10 Jun 2019 16:21:50 +0000 (18:21 +0200)]
r8169: improve setting interrupt mask
So far several places in the code deal with setting the interrupt mask
for the respective chip versions. Improve this by having one function
for this only. In addition don't set RxFIFOOver for all 8101 chip
versions like in the vendor driver.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 10 Jun 2019 16:12:53 +0000 (09:12 -0700)]
Merge branch 'mvpp2-stats'
Maxime Chevallier says:
====================
net: mvpp2: Add extra ethtool stats
This series adds support for more ethtool counters in PPv2 :
- Per port counters, including one indicating the classifier drops
- Per RXQ and per TXQ counters
The first 2 patches perform some light rework and renaming, and the 3rd
adds the extra counters.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: Only clear the stat counters at port init
When first configuring a port on PPv2, we want to clear the internal
counters so that we don't get values from previous boot stages.
However, we can't really clear these counters when resetting the MAC,
since there are valid reasons to do so while the port is being used,
such as when reconfiguring the interface mode with the PHY.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mao Wenan [Sun, 9 Jun 2019 07:11:26 +0000 (15:11 +0800)]
ocelot: remove unused variable 'rc' in vcap_cmd()
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/mscc/ocelot_ace.c: In function ‘vcap_cmd’:
drivers/net/ethernet/mscc/ocelot_ace.c:108:6: warning: variable ‘rc’ set
but not used [-Wunused-but-set-variable]
int rc;
^
It's never used since introduction in commit b596229448dd ("net: mscc:
ocelot: Add support for tcam")
Signed-off-by: Mao Wenan <maowenan@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sun, 9 Jun 2019 00:58:51 +0000 (17:58 -0700)]
ipv6: tcp: send consistent autoflowlabel in TIME_WAIT state
In case autoflowlabel is in action, skb_get_hash_flowi6()
derives a non zero skb->hash to the flowlabel.
If skb->hash is zero, a flow dissection is performed.
Since all TCP skbs sent from ESTABLISH state inherit their
skb->hash from sk->sk_txhash, we better keep a copy
of sk->sk_txhash into the TIME_WAIT socket.
After this patch, ACK or RST packets sent on behalf of
a TIME_WAIT socket have the flowlabel that was previously
used by the flow.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
RGMII delays for SJA1105 DSA driver
This patchset configures the Tunable Delay Lines of the SJA1105 P/Q/R/S
switches. These add a programmable phase offset on the RGMII RX and TX
clock signals and get used by the driver for fixed-link interfaces that
use the rgmii-id, rgmii-txid or rgmii-rxid phy-modes.
Tested on a board where RGMII delays were already set up, by adding
MAC-side delays on the RGMII interface towards a BCM5464R PHY and
noticing that the MAC now reports SFD, preamble, FCS etc. errors.
Conflicts trivially in drivers/net/dsa/sja1105/sja1105_spi.c with
https://patchwork.ozlabs.org/project/netdev/list/?series=112614&state=*
which must be applied first.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 16:12:27 +0000 (19:12 +0300)]
net: dsa: sja1105: Remove duplicate rgmii_pad_mii_tx from regs
The pad_mii_tx registers point to the same memory region but were
unused. So convert to using these for RGMII I/O cell configuration, as
they bear a shorter name.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 13:53:56 +0000 (16:53 +0300)]
net: phy: broadcom: Add genphy_suspend and genphy_resume for BCM5464
This puts the quad PHY ports in power-down mode when the PHY transitions
to the PHY_HALTED state. It is likely that all the other PHYs support
the BMCR_PDOWN bit, but I only have the BCM5464R to test.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Rethink PHYLINK callbacks for SJA1105 DSA
This patchset implements phylink_mac_link_up and phylink_mac_link_down,
while also removing the code that was modifying the EGRESS and INGRESS
MAC settings for STP and replacing them with the "inhibit TX"
functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 13:03:44 +0000 (16:03 +0300)]
net: dsa: sja1105: Rethink the PHYLINK callbacks
The first fact that needs to be stated is that the per-MAC settings in
SJA1105 called EGRESS and INGRESS do *not* disable egress and ingress on
the MAC. They only prevent non-link-local traffic from being
sent/received on this port.
So instead of having .phylink_mac_config essentially mess with the STP
state and force it to DISABLED/BLOCKING (which also brings useless
complications in sja1105_static_config_reload), simply add the
.phylink_mac_link_down and .phylink_mac_link_up callbacks which inhibit
TX at the MAC level, while leaving RX essentially enabled.
Also stop trying to put the link down in .phylink_mac_config, which
is incorrect.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 13:03:43 +0000 (16:03 +0300)]
net: dsa: sja1105: Export the sja1105_inhibit_tx function
This will be used to stop egress traffic in .phylink_mac_link_up.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 13:03:42 +0000 (16:03 +0300)]
net: dsa: sja1105: Update some comments about PHYLIB
Since the driver is now using PHYLINK exclusively, it makes sense to
remove all references to it and replace them with PHYLINK.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 13:03:41 +0000 (16:03 +0300)]
net: dsa: sja1105: Use SPEED_{10, 100, 1000, UNKNOWN} macros
This is a cosmetic patch that replaces the link speed numbers used in
the driver with the corresponding ethtool macros.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Suggested-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/key/af_key.c:932:2-5: WARNING: Use BUG_ON instead of if condition
followed by BUG.
net/key/af_key.c:948:2-5: WARNING: Use BUG_ON instead of if condition
followed by BUG.
Signed-off-by: Hariprasad Kelam <hariprasad.kelam@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
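The transformation the warning asks for can be sketched as follows. The `BUG()` and `BUG_ON()` definitions here are userspace stand-ins written for this example, and `len_before`/`len_after` are hypothetical functions, not code from net/key/af_key.c.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Userspace stand-ins for the kernel macros, for illustration only. */
#define BUG() \
	do { \
		fprintf(stderr, "BUG at %s:%d\n", __FILE__, __LINE__); \
		abort(); \
	} while (0)
#define BUG_ON(cond) \
	do { \
		if (cond) \
			BUG(); \
	} while (0)

/* Before: an if condition followed by BUG(), as flagged by the warning. */
static size_t len_before(const char *p)
{
	if (!p)
		BUG();
	return strlen(p);
}

/* After: the same check expressed with BUG_ON(). */
static size_t len_after(const char *p)
{
	BUG_ON(!p);
	return strlen(p);
}
```

Both functions behave identically; the second form is only shorter.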
Eric Dumazet [Fri, 7 Jun 2019 19:23:48 +0000 (12:23 -0700)]
ipv6: tcp: fix potential NULL deref in tcp_v6_send_reset()
syzbot found a crash in tcp_v6_send_reset() caused by my latest
change.
The problem is that once an skb has been queued to the socket prequeue,
skb_dst(skb)->dev can no longer be used to reach the device.
Fortunately in this case the socket pointer is not NULL.
A similar issue has been fixed in commit 0f85feae6b71 ("tcp: fix
more NULL deref after prequeue changes"), I should have known better.
Fixes: 323a53c41292 ("ipv6: tcp: enable flowlabel reflection in some RST packets") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Avoid local_irq_save() and use napi_alloc_frag() where possible
The first two patches remove local_irq_save() around
`netdev_alloc_cache' which does not work on -RT. Besides helping -RT it
would benefit the users of the function since they can avoid disabling
interrupts and save a few cycles.
The remaining patches are from a time when I tried to remove
`netdev_alloc_cache' but then noticed that we still have non-NAPI
drivers using netdev_alloc_skb() and I dropped that idea. Using
napi_alloc_frag() over netdev_alloc_frag() would skip the unnecessary
local_bh_disable() around the allocation.
v1…v2:
- 1/7 + 2/7 now use "(in_irq() || irqs_disabled())" instead of just
"irqs_disabled()" to align with __dev_kfree_skb_any(). Pointed out
by Eric Dumazet.
- 6/7 has one typo fewer. Pointed out by Sergei Shtylyov.
- 3/7 + 4/7 added acks from Ioana Radulescu.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Based on review, `lock' is only acquired in hwbm_pool_add() which is
invoked via ->probe(), ->resume() and ->ndo_change_mtu(). Based on this
the lock can become a mutex and there is no need to disable interrupts
during the procedure.
Now that the lock is a mutex, hwbm_pool_add() no longer invokes
hwbm_pool_refill() in an atomic context so we can pass GFP_KERNEL to
hwbm_pool_refill() and remove the `gfp' argument from hwbm_pool_add().
Cc: Thomas Petazzoni <thomas.petazzoni@bootlin.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
tg3_alloc_rx_data() uses netdev_alloc_frag() for skb allocation. All
callers of tg3_alloc_rx_data() either hold tp->lock (which is held with
BH disabled) or run in NAPI context.
Use napi_alloc_frag() for skb allocations.
Cc: Siva Reddy Kallam <siva.kallam@broadcom.com> Cc: Prashant Sreedharan <prashant@broadcom.com> Cc: Michael Chan <mchan@broadcom.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
SKB allocation via bnx2x_frag_alloc() is always performed in NAPI
context. Preemptible context passes GFP_KERNEL and bnx2x_frag_alloc()
then uses __get_free_page() for the allocation.
Use napi_alloc_frag() for memory allocation.
Cc: Ariel Elior <aelior@marvell.com> Cc: Sudarsana Kalluru <skalluru@marvell.com> Cc: GR-everest-linux-l2@marvell.com Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The driver is using netdev_alloc_frag() for allocation in the
->ndo_start_xmit() path. That one is always invoked in a BH disabled
region so we could also use napi_alloc_frag().
Use napi_alloc_frag() for skb allocation.
Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Acked-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-eth: Remove preempt_disable() from seed_pool()
According to the comment, the preempt_disable() statement is required
due to synchronisation in napi_alloc_frag(). The awful truth is that
local_bh_disable() is required because otherwise the NAPI poll callback
can be invoked while the open function sets up buffers. This isn't
unlikely since the dpaa2 provides multiple devices.
The usage of napi_alloc_frag() has been removed in commit
27c874867c4e9 ("dpaa2-eth: Use a single page per Rx buffer")
which means that the comment is not accurate and the preempt_disable()
statement is not required.
Remove the outdated comment and the no longer required
preempt_disable().
Cc: Ioana Radulescu <ruxandra.radulescu@nxp.com> Acked-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Don't disable interrupts in __netdev_alloc_skb()
__netdev_alloc_skb() can be used from any context and is used by NAPI
and non-NAPI drivers. Non-NAPI drivers use it in interrupt context and
NAPI drivers use it during initial allocation (->ndo_open() or
->ndo_change_mtu()). Some NAPI drivers share the same function for the
initial allocation and the allocation in their NAPI callback.
The interrupts are disabled in order to ensure locked access from every
context to `netdev_alloc_cache'.
Let __netdev_alloc_skb() check if interrupts are disabled. If they are, use
`netdev_alloc_cache'. Otherwise disable BH and use `napi_alloc_cache.page'.
The IRQ check is cheaper compared to disabling & enabling interrupts and
memory allocation with disabled interrupts does not work on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Don't disable interrupts in napi_alloc_frag()
netdev_alloc_frag() can be used from any context and is used by NAPI
and non-NAPI drivers. Non-NAPI drivers use it in interrupt context
and NAPI drivers use it during initial allocation (->ndo_open() or
->ndo_change_mtu()). Some NAPI drivers share the same function for the
initial allocation and the allocation in their NAPI callback.
The interrupts are disabled in order to ensure locked access from every
context to `netdev_alloc_cache'.
Let netdev_alloc_frag() check if interrupts are disabled. If they are,
use `netdev_alloc_cache' otherwise disable BH and invoke
__napi_alloc_frag() for the allocation. The IRQ check is cheaper
compared to disabling & enabling interrupts and memory allocation with
disabled interrupts does not work on -RT.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
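The decision made by the two patches above can be sketched as a pure function. This is a userspace model only: `struct alloc_ctx`, `enum frag_cache` and `pick_cache` are hypothetical names, and the two booleans stand in for the kernel's in_irq() and irqs_disabled() checks.

```c
#include <stdbool.h>

/* Which per-CPU cache the allocation would be served from. */
enum frag_cache {
	CACHE_NETDEV,	/* models `netdev_alloc_cache' */
	CACHE_NAPI,	/* models the NAPI cache, used with BH disabled */
};

/* Hypothetical snapshot of the calling context. */
struct alloc_ctx {
	bool in_hardirq;	/* models in_irq() */
	bool irqs_off;		/* models irqs_disabled() */
};

/*
 * If we are in hard IRQ context or interrupts are already disabled,
 * nothing can interrupt us, so the netdev cache can be used without
 * an extra local_irq_save(). Otherwise the caller would disable BH
 * and use the NAPI cache instead.
 */
static enum frag_cache pick_cache(const struct alloc_ctx *c)
{
	if (c->in_hardirq || c->irqs_off)
		return CACHE_NETDEV;
	return CACHE_NAPI;
}
```

The point of the check is that testing the context is cheaper than disabling and re-enabling interrupts on every allocation.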
David S. Miller [Mon, 10 Jun 2019 02:25:59 +0000 (19:25 -0700)]
Merge branch 'SFP-polling-fixes'
Robert Hancock says:
====================
SFP polling fixes
This has an updated version of an earlier patch to ensure that SFP
operations are stopped during shutdown, and another patch suggested by
Russell King to address a potential concurrency issue with SFP state
checks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Fri, 7 Jun 2019 16:42:36 +0000 (10:42 -0600)]
net: sfp: add mutex to prevent concurrent state checks
sfp_check_state can potentially be called by both a threaded IRQ handler
and delayed work. If it is concurrently called, it could result in
incorrect state management. Add a st_mutex to protect the state - this
lock gets taken outside of code that checks and handles state changes, and
the existing sm_mutex nests inside of it.
Suggested-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
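The lock nesting described above can be sketched with POSIX mutexes. The names st_mutex and sm_mutex come from the commit message; `struct dev_state`, its other fields and `check_state` are hypothetical and do not reflect the real SFP driver code.

```c
#include <pthread.h>

/* Hypothetical device state guarded by two nested mutexes. */
struct dev_state {
	pthread_mutex_t st_mutex;	/* outer: serialises state checks */
	pthread_mutex_t sm_mutex;	/* inner: protects event handling */
	unsigned int state;
	unsigned int events;
};

/*
 * May be called from two contexts (e.g. an IRQ thread and delayed work).
 * st_mutex is always taken first and sm_mutex nests inside it, so the
 * lock order is the same on every path.
 */
static void check_state(struct dev_state *d, unsigned int new_state)
{
	unsigned int changed;

	pthread_mutex_lock(&d->st_mutex);
	changed = d->state ^ new_state;
	d->state = new_state;
	if (changed) {
		pthread_mutex_lock(&d->sm_mutex);
		d->events++;
		pthread_mutex_unlock(&d->sm_mutex);
	}
	pthread_mutex_unlock(&d->st_mutex);
}
```

Because the compare-and-update of `state` happens entirely under the outer lock, two concurrent callers cannot both observe the same old state.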
Robert Hancock [Fri, 7 Jun 2019 16:42:35 +0000 (10:42 -0600)]
net: sfp: Stop SFP polling and interrupt handling during shutdown
SFP device polling can cause problems during the shutdown process if the
parent devices of the network controller have been shut down already.
This problem was seen on the iMX6 platform with PCIe devices, where
accessing the device after the bus is shut down causes a hang.
Free any acquired GPIO interrupts and stop all delayed work in the SFP
driver during the shutdown process, so that we ensure that no pending
operations are still occurring after the SFP shutdown completes.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Fri, 7 Jun 2019 15:31:07 +0000 (18:31 +0300)]
nexthop: off by one in nexthop_mpath_select()
The nhg->nh_entries[] array is allocated in nexthop_grp_alloc() and it
has nhg->num_nh elements so this check should be >= instead of >.
Fixes: 430a049190de ("nexthop: Add support for nexthop groups") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
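The bounds check at issue can be sketched as below. `struct grp` and `select_entry` are hypothetical simplifications written for this example, not the kernel's nexthop_mpath_select().

```c
#include <stddef.h>

/* Hypothetical group with num_nh valid entries. */
struct grp {
	size_t num_nh;
	const int *entries;
};

/* Return a pointer to entry idx, or NULL if idx is out of range. */
static const int *select_entry(const struct grp *g, size_t idx)
{
	/*
	 * Valid indices are 0 .. num_nh - 1, so idx == num_nh must be
	 * rejected as well. Using '>' here would let idx == num_nh
	 * through and read one element past the end of the array.
	 */
	if (idx >= g->num_nh)
		return NULL;
	return &g->entries[idx];
}
```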
====================
bonding: clean up and standardize logging printks
This set improves a few somewhat terse bonding debug messages, fixes some
errors in others, and then standardizes the majority of them, using new
slave_* printk macros that wrap around netdev_* to ensure both master
and slave information is provided consistently, where relevant. This set
proves very useful in debugging issues on hosts with multiple bonds.
I've run an array of LNST tests over this set, creating and destroying
quite a few different bonds over the course of testing, fixed the little
gotchas here and there, and everything looks stable and reasonable to me,
but I can't guarantee I've tested every possible message and scenario to
catch every possible "slave could be NULL" case.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:32 +0000 (10:59 -0400)]
bonding/options: convert to using slave printk macros
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:31 +0000 (10:59 -0400)]
bonding/alb: convert to using slave printk macros
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:30 +0000 (10:59 -0400)]
bonding/802.3ad: convert to using slave printk macros
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:29 +0000 (10:59 -0400)]
bonding/main: convert to using slave printk macros
All of these printk instances benefit from having both master and slave
device information included, so convert to using a standardized macro
format and remove redundant information.
Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:28 +0000 (10:59 -0400)]
bonding: add slave_foo printk macros
Where possible, we generally want both the bond master and the relevant slave
information in message output. Standardize the format using new slave_*
printk macros.
Suggested-by: Joe Perches <joe@perches.com> CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
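The wrapping idea can be sketched in userspace with variadic macros. Everything here is a stand-in: `struct fake_netdev`, `fake_netdev_info` and `fake_slave_info` are hypothetical names, printf replaces the kernel's netdev_* helpers, and `##__VA_ARGS__` is a GNU C extension.

```c
#include <stdio.h>

/* Hypothetical stand-in for struct net_device. */
struct fake_netdev {
	const char *name;
};

/* Stand-in for a netdev_* helper: prefixes the device name. */
#define fake_netdev_info(dev, fmt, ...) \
	printf("%s: " fmt, (dev)->name, ##__VA_ARGS__)

/*
 * Stand-in for a slave_* macro: wraps the helper above so that both
 * the bond (master) name and the slave name appear in every message.
 */
#define fake_slave_info(bond_dev, slave_dev, fmt, ...) \
	fake_netdev_info(bond_dev, "(slave %s): " fmt, \
			 (slave_dev)->name, ##__VA_ARGS__)
```

With a bond named "bond0" and a slave named "eth1", `fake_slave_info(&bond, &slave, "link up\n")` prints `bond0: (slave eth1): link up`, so callers no longer need to format either device name by hand.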
Jarod Wilson [Fri, 7 Jun 2019 14:59:27 +0000 (10:59 -0400)]
bonding: fix error messages in bond_do_fail_over_mac
Passing the bond name again to debug output when referencing slave is wrong.
We're trying to set the bond's MAC to that of the new_active slave, so adjust
the error message slightly and pass in the slave's name, not the bond's.
Then we're trying to set the MAC on the old active slave, but putting the
new active slave's name in the output. While we're at it, clarify the
error messages so you know which one actually triggered.
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 7 Jun 2019 14:59:26 +0000 (10:59 -0400)]
bonding: improve event debug usability
Seeing bonding debug log data along the lines of "event: 5" is a bit spartan,
and often requires a lookup table if you don't remember what every event is.
Make use of netdev_cmd_to_name for an improved debugging experience, so for
the prior example, you'll see: "bond_netdev_event received NETDEV_REGISTER"
instead (both are prefixed with the device to which the event pertains).
CC: Jay Vosburgh <j.vosburgh@gmail.com> CC: Veaceslav Falico <vfalico@gmail.com> CC: Andy Gospodarek <andy@greyhouse.net> CC: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
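The number-to-name translation can be sketched as a small lookup. `event_to_name` is a hypothetical helper covering only a subset of events for illustration; it is not the kernel's netdev_cmd_to_name(), and the fallback string is an assumption of this sketch.

```c
/*
 * Map a netdev notifier event number to a readable name.
 * Only a few events are listed; anything else gets a fallback string.
 */
static const char *event_to_name(unsigned long event)
{
	switch (event) {
	case 1:
		return "NETDEV_UP";
	case 2:
		return "NETDEV_DOWN";
	case 5:
		return "NETDEV_REGISTER";
	case 6:
		return "NETDEV_UNREGISTER";
	}
	return "UNKNOWN_NETDEV_EVENT";
}
```

A debug line can then print the name directly instead of forcing the reader to look up what "event: 5" means.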
David S. Miller [Sun, 9 Jun 2019 20:20:59 +0000 (13:20 -0700)]
Merge branch 'hns3-next'
Huazhong Tan says:
====================
net: hns3: some code optimizations & cleanups & bugfixes
This patch-set includes code optimizations, cleanups and bugfixes for
the HNS3 ethernet controller driver.
[patch 1/12] logs more detailed error info for ROCE RAS errors.
[patch 2/12] fixes a wrong size issue for mailbox responding.
[patch 3/12] makes HW GRO handling compliant with SW one.
[patch 4/12] refactors hns3_get_new_int_gl.
[patch 5/12] adds handling for VF's over_8bd_nfe_err.
[patch 6/12 - 12/12] adds some code optimizations and cleanups to
make the code more readable and compliant with some static code
analysis tools; these modifications do not change the logic of
the code.
Change log:
V1->V2: fixes comment from David Miller.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Fri, 7 Jun 2019 02:03:13 +0000 (10:03 +0800)]
net: hns3: fix some coding style issues
This patch fixes some coding style issues reported by static code
analysis tools and code review: it modifies some comments, renames
some variables, logs some errors in more detail, and fixes some
alignment errors.
BTW, these cleanups do not change the logic of the code.
Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: HuiSong Li <lihuisong@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 7 Jun 2019 02:03:12 +0000 (10:03 +0800)]
net: hns3: some modifications to simplify and optimize code
This patch deletes some redundant code and refactors some bloated
functions.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In order to make it more readable, this patch modifies PF/VF's
RSS hash key configuring function.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 7 Jun 2019 02:03:10 +0000 (10:03 +0800)]
net: hns3: use macros instead of magic numbers
This patch adds some macros to replace magic numbers in several places.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 7 Jun 2019 02:03:09 +0000 (10:03 +0800)]
net: hns3: small changes for magic numbers
In order to improve readability, this patch uses macros to
replace some magic numbers, and adds some comments for some
others.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 7 Jun 2019 02:03:08 +0000 (10:03 +0800)]
net: hns3: delete the redundant user NIC codes
Since HNAE3_CLIENT_UNIC and HNAE3_DEV_UNIC are no longer used,
this patch removes the redundant code.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Fri, 7 Jun 2019 02:03:07 +0000 (10:03 +0800)]
net: hns3: trigger VF reset if a VF has an over_8bd_nfe_err
Previously, we triggered a PF reset whenever the NIC RAS error named
over_8bd_nfe_err occurred. But a VF may be the cause of that error, in
which case it is reasonable to trigger a VF reset instead of a PF reset.
This patch adds detection of vf_id when an over_8bd_nfe_err occurs: if
vf_id is 0, we trigger a PF reset. Otherwise, we trigger a VF reset
on the VF with the error.
Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 7 Jun 2019 02:03:06 +0000 (10:03 +0800)]
net: hns3: refactor hns3_get_new_int_gl function
This patch adds a new hns3_get_new_flow_lvl function to calculate
the packet flow level, which is used to decide the interrupt
coalescence parameter, in order to make the flow level calculation
code more readable and make future calculation adjustments easier.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 7 Jun 2019 02:03:05 +0000 (10:03 +0800)]
net: hns3: replace numa_node_id with numa_mem_id for buffer reusing
This patch replaces numa_node_id with numa_mem_id when checking
whether a buffer can be reused, because the buffer can still be reused
when it is from the nearest node and the local node has no memory
attached.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 7 Jun 2019 02:03:04 +0000 (10:03 +0800)]
net: hns3: make HW GRO handling compliant with SW GRO
Currently when a GRO packet is assembled by HW, the checksum is
modified to reflect the entire packet by HW and skb->ip_summed is
set to CHECKSUM_UNNECESSARY, which is not compliant with SW GRO.
This patch sets up the skb's network and transport headers, sets the
GRO packet's checksum according to the pseudo header and sets
skb->ip_summed to CHECKSUM_PARTIAL.
This patch also uses gso_size to distinguish a GRO packet from a
normal packet, uses eth_type_vlan to check the VLAN type and sets
SKB_GSO_TCP_FIXEDID according to BD info during HW GRO info
processing.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhongzhu Liu [Fri, 7 Jun 2019 02:03:03 +0000 (10:03 +0800)]
net: hns3: fix wrong size of mailbox responding data
According to the user manual, the maximum size of mailbox response
data is 8 bytes, so the macro HCLGE_MBX_MAX_RESP_DATA_SIZE
should be defined as 8 instead of 16.
Fixes: 9194d18b0577 ("net: hns3: fix the problem that the supported port is empty") Signed-off-by: Zhongzhu Liu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaofei Tan [Fri, 7 Jun 2019 02:03:02 +0000 (10:03 +0800)]
net: hns3: log detail error info of ROCEE ECC and AXI errors
This patch logs detailed error info of ROCEE ECC and AXI errors for
debugging purposes, and removes the unnecessary reset for ROCEE
overflow errors.
Signed-off-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: ethernet: ti: netcp: update and enable cpts support
The Keystone 2 66AK2HK/E/L 1G Ethernet Switch Subsystems contain the
Common Platform Time Sync (CPTS) module, which is in general compatible with
the CPTS module found on TI AM3/4/5 SoCs. So, the basic support for
Keystone 2 CPTS is available by default, but not documented and has never been
enabled in config files.
The Keystone 2 CPTS module supports also some additional features like time
sync reference (RFTCLK) clock selection through CPTS_RFTCLK_SEL register
(offset: x08) in CPTS module, which can be modelled as a multiplexer clock
(this was discussed some time ago [1]).
This series adds the missing binding documentation for the Keystone 2 66AK2HK/E/L
CPTS module and enables CPTS for TI Keystone 2 66AK2HK/E/L SoCs with the possibility
to select the CPTS reference clock.
Patch 1: adds the CPTS binding documentation. CPTS bindings are defined in the
way that allows CPTS properties to be grouped under "cpts" sub-node.
It also defines "cpts-refclk-mux" clock for CPTS RFTCLK selection.
Patches 2-3: implement CPTS properties grouping under "cpts" sub-node with
backward compatibility support.
Patch 4: adds support for time sync reference (RFTCLK) clock selection from DT
by adding support for "cpts-refclk-mux" multiplexer clock.
Patches 5-9: DT CPTS nodes update for TI Keystone 2 66AK2HK/E/L SoCs.
Patch 10: enables CPTS for TI Keystone 2 66AK2HK/E/L SoCs.
I grouped all patches in one series for better illustration of the changes,
but in general Patches 1-4 are netdev material (first) and the other patches
are platform specific.
Series can be found at:
git@git.ti.com:~gragst/ti-linux-kernel/gragsts-ti-linux-kernel.git
branch:
net-next-k2e-cpts-refclk
Changes in v2:
- do reverse christmas tree in cpts_of_mux_clk_setup()
- add ack from Richard Cochran
net: ethernet: ti: cpts: add support for ext rftclk selection
Some CPTS instances, which can be found on KeyStone 2 1G Ethernet Switch
Subsystems, can control an external multiplexer that selects one of up to
32 clocks as time sync reference (RFTCLK) clock. This feature can be
configured through CPTS_RFTCLK_SEL register (offset: x08) in CPTS module
and can be represented as multiplexer clock.
Hence, introduce support for an optional cpts-refclk-mux clock which, once
defined, allows the required CPTS RFTCLK to be selected by using the
assigned-clock-parents DT property in board files.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: ti: netcp_ethss: add support for child cpts node
Allow CPTS properties to be placed in the child "cpts" DT node. For backward
compatibility, fall back to reading CPTS DT properties from the parent node if
the "cpts" node is not present.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
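The child-node lookup with its backward-compatible fallback can be pictured with a toy device-tree model. This is a hedged Python sketch, not the netcp_ethss code: nodes are plain dicts, the helper names are hypothetical, and the property name and values are used only as examples.

```python
from typing import Any


def find_cpts_node(ethss_node: dict) -> dict:
    """Prefer the "cpts" child node; fall back to the parent if it is absent."""
    child = ethss_node.get("children", {}).get("cpts")
    return child if child is not None else ethss_node


def read_cpts_property(ethss_node: dict, name: str, default: Any = None) -> Any:
    """Read a CPTS property from wherever the CPTS description lives."""
    return find_cpts_node(ethss_node).get("properties", {}).get(name, default)


# New-style tree: CPTS properties grouped under a "cpts" sub-node.
new_style = {
    "properties": {},
    "children": {"cpts": {"properties": {"cpts_clock_mult": 4000}}},
}

# Old-style tree: CPTS properties sit directly in the parent node.
old_style = {"properties": {"cpts_clock_mult": 2500}}

new_value = read_cpts_property(new_style, "cpts_clock_mult")
old_value = read_cpts_property(old_style, "cpts_clock_mult")
```

Existing device trees keep working unchanged because the parent node is consulted only when no "cpts" child exists.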
net: ethernet: ti: cpts: use devm_get_clk_from_child
Use devm_get_clk_from_child() instead of devm_clk_get() and this way allow
CPTS DT properties to be grouped in a sub-node for better code readability
and maintenance. Fall back to devm_clk_get() if devm_get_clk_from_child()
fails, for backward compatibility.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Keystone 2 66AK2HK/E/L 1G Ethernet Switch Subsystems contain the
Common Platform Time Sync (CPTS) module which is in general compatible with
CPTS module found on "legacy" TI AM3/4/5 SoCs. So, the basic support for
Keystone 2 CPTS is available by default, but not documented.
The Keystone 2 CPTS module supports also some additional features like time
sync reference (RFTCLK) clock selection through CPTS_RFTCLK_SEL register
(offset: x08) in CPTS module, which is modelled as multiplexer clock.
This patch adds missed binding documentation for Keystone 2 66AK2HK/E/L
CPTS module.
Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Acked-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
PTP support for the SJA1105 DSA driver
This patchset adds the following:
- A timecounter/cyclecounter based PHC for the free-running
timestamping clock of this switch.
- A state machine implemented in the DSA tagger for SJA1105, which
keeps track of metadata follow-up Ethernet frames (the switch's way
of transmitting RX timestamps).
Clock manipulations on the actual hardware PTP clock will have to be
implemented anyway, for the TTEthernet block and the time-based ingress
policer.
v3 patchset can be found at:
https://lkml.org/lkml/2019/6/4/954
Changes from v3:
- Made it compile with the SJA1105 DSA driver and PTP driver as modules.
- Reworked/simplified/fixed some issues in 03/17
(dsa_8021q_remove_header) and added an ASCII image that
illustrates the transformation that is taking place.
- Removed a useless check for sja1105_is_link_local from 16/17 (RX
timestamping) which also made previous 08/17 patch ("Move
sja1105_is_link_local to include/linux") useless and therefore dropped.
v2 patchset can be found at:
https://lkml.org/lkml/2019/6/2/146
Changes from v2:
- Broke previous 09/10 patch (timestamping) into multiple smaller
patches.
- Every patch in the series compiles.
v1 patchset can be found at:
https://lkml.org/lkml/2019/5/28/1093
Changes from v1:
- Removed the addition of the DSA .can_timestamp callback.
- Waiting for meta frames is done completely inside the tagger, and all
frames emitted on RX are already partially timestamped.
- Added a global data structure for the tagger common to all ports.
- Made PTP work with ports in standalone mode, by limiting use of the
DMAC-mangling "incl_srcpt" mode only when ports are bridged, aka when
the DSA master is already promiscuous and can receive anything.
Also changed meta frames to be sent at the 01-80-C2-00-00-0E DMAC.
- Made some progress w.r.t. observed negative path delay. Apparently it
only appears when the delay mechanism is the delay request-response
(end-to-end) one. If peer delay is used (-P), the path delay is
positive and appears reasonable for a 1000Base-T link (485 ns in
steady state).
Vladimir Oltean [Sat, 8 Jun 2019 12:04:43 +0000 (15:04 +0300)]
net: dsa: sja1105: Expose PTP timestamping ioctls to userspace
This enables the PTP support towards userspace applications such as
linuxptp.
The switches can timestamp only trapped multicast MAC frames, and
therefore only the profiles of 1588 over L2 are supported.
TX timestamping can be enabled per port, but RX timestamping is enabled
globally. As long as RX timestamping is enabled, the switch will emit
metadata follow-up frames that will be processed by the tagger. It may
be a problem that linuxptp does not restore the RX timestamping settings
when exiting.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:42 +0000 (15:04 +0300)]
net: dsa: sja1105: Add a state machine for RX timestamping
Meta frame reception relies on the hardware keeping its promise that it
will send no other traffic towards the CPU port between a link-local
frame and a meta frame. Otherwise there is no other way to associate
the meta frame with the link-local frame it's holding a timestamp of.
The receive function is made stateful, and buffers a timestampable frame
until its meta frame arrives, then merges the two, drops the meta and
releases the link-local frame up the stack.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
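The buffering behaviour described above can be sketched as a small state machine. This is an illustrative Python model, not the tagger code: frames are plain dicts, the `kind` field and the class name are hypothetical, and dropping a link-local frame whose meta frame never arrived is an assumption of the sketch.

```python
class RxTimestampStateMachine:
    """Hold a timestampable frame until its meta frame arrives."""

    def __init__(self):
        self.pending = None  # link-local frame waiting for its timestamp

    def receive(self, frame: dict) -> list:
        """Return the frames that may be released up the stack."""
        if frame["kind"] == "meta":
            pending, self.pending = self.pending, None
            if pending is None:
                return []  # meta frame with nothing to annotate: drop it
            # Merge the two, then drop the meta frame itself.
            pending["rx_timestamp"] = frame["timestamp"]
            return [pending]
        if frame["kind"] == "link_local":
            # Assumption: an older frame still waiting here lost its meta
            # frame, so it is discarded in favour of the new one.
            self.pending = frame
            return []
        return [frame]  # regular traffic is not buffered
```

The model only works because of the hardware promise stated above: nothing else may be sent to the CPU port between a link-local frame and its meta frame.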
Vladimir Oltean [Sat, 8 Jun 2019 12:04:41 +0000 (15:04 +0300)]
net: dsa: sja1105: Increase priority of CPU-trapped frames
Without noticing any particular issue, this patch ensures that
management traffic is treated with the maximum priority on RX by the
switch. This is generally desirable, as the driver keeps a state
machine that waits for metadata follow-up frames as soon as a management
frame is received. Increasing the priority helps expedite the reception
(and further reconstruction) of the RX timestamp to the driver after the
MAC has generated it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:40 +0000 (15:04 +0300)]
net: dsa: sja1105: Add a global sja1105_tagger_data structure
This will be used to keep state for RX timestamping. It is global
because the switch serializes timestampable and meta frames when
trapping them towards the CPU port (lower port indices have higher
priority) and therefore having one state machine per port would create
unnecessary complications.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:39 +0000 (15:04 +0300)]
net: dsa: sja1105: Receive and decode meta frames
This adds support in the tagger for understanding the source port and
switch id of meta frames. Their timestamp is also extracted but not
used yet - this needs to be done in a state machine that modifies the
previously received timestampable frame - will be added in a follow-up
patch.
Also take the opportunity to:
- Remove a comment in sja1105_filter made obsolete by e8d67fa5696e
("net: dsa: sja1105: Don't store frame type in skb->cb")
- Reorder the checks in sja1105_filter to optimize for the most likely
scenario first: regular traffic.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:38 +0000 (15:04 +0300)]
net: dsa: sja1105: Make sja1105_is_link_local not match meta frames
Although meta frames are configured to be sent at SJA1105_META_DMAC
(01-80-C2-00-00-0E) which is a multicast MAC address that would also be
trapped by the switch to the CPU, were it to receive it on a front-panel
port, meta frames are conceptually not link-local frames, they only
carry their RX timestamps.
The choice of sending meta frames at a multicast DMAC is a pragmatic
one, to avoid installing an extra entry to the DSA master port's
multicast MAC filter.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:37 +0000 (15:04 +0300)]
net: dsa: sja1105: Add support for the AVB Parameters Table
This table is used to program the switch to emit "meta" follow-up
Ethernet frames (which contain partial RX timestamps) after each
link-local frame that was trapped to the CPU port through MAC filtering.
This includes PTP frames.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet payload will be decoded in a follow-up patch.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:35 +0000 (15:04 +0300)]
net: dsa: sja1105: Add logic for TX timestamping
On TX, timestamping is performed synchronously from the
port_deferred_xmit worker thread.
In management routes, the switch is requested to take egress timestamps
(again partial), which are reconstructed and appended to a clone of the
skb that was just sent. The cloning is done by DSA and we retrieve the
pointer from the structure that DSA keeps in skb->cb.
Then these clones are enqueued to the socket's error queue for
application-level processing.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
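Reconstructing a full timestamp from a partial one can be sketched like this. It is a hedged Python model, not the driver's implementation: the 24-bit width is an assumption chosen for illustration, as is the requirement that the full counter is read after the timestamp was taken and less than one wrap of the partial field later.

```python
def reconstruct_timestamp(now: int, partial: int, bits: int = 24) -> int:
    """Extend a partial timestamp using a later reading of the full counter.

    `now` is the full counter value read after the frame was stamped;
    `partial` holds only the low `bits` bits of the stamp.
    """
    mask = (1 << bits) - 1
    full = (now & ~mask) | (partial & mask)
    if (now & mask) < (partial & mask):
        # The low bits wrapped between the stamp and the reading of `now`,
        # so the stamp belongs to the previous wrap period.
        full -= 1 << bits
    return full
```

For example, with `now = 0x12000010` and a partial stamp of `0xFFFFF0`, the low bits of `now` are smaller than the stamp, so the stamp must come from the previous period and the function borrows one wrap.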
Vladimir Oltean [Sat, 8 Jun 2019 12:04:34 +0000 (15:04 +0300)]
net: dsa: sja1105: Add support for the PTP clock
The design of this PHC driver is influenced by the switch's behavior
w.r.t. timestamping. It exposes two PTP counters, one free-running
(PTPTSCLK) and the other offset- and frequency-corrected in hardware
through PTPCLKVAL, PTPCLKADD and PTPCLKRATE. The MACs can sample either
of these for frame timestamps.
However, the user manual warns that taking timestamps based on the
corrected clock is less than useful, as the switch can deliver corrupted
timestamps in a variety of circumstances.
Therefore, this PHC uses the free-running PTPTSCLK together with a
timecounter/cyclecounter structure that translates it into a software
time domain. Thus, the settime/adjtime and adjfine callbacks are
hardware no-ops.
The timestamps (introduced in a further patch) will also be translated
to the correct time domain before being handed over to the userspace PTP
stack.
The introduction of a second set of PHC operations that operate on the
hardware PTPCLKVAL/PTPCLKADD/PTPCLKRATE in the future is somewhat
unavoidable, as the TTEthernet core uses the corrected PTP time domain.
However, the free-running counter + timecounter structure combination
will suffice for now, as the resulting timestamps yield a sub-50 ns
synchronization offset in steady state using linuxptp.
For this patch, in absence of frame timestamping, the operations of the
switch PHC were tested by syncing it to the system time as a local slave
clock with:
phc2sys -s CLOCK_REALTIME -c swp2 -O 0 -m -S 0.01
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
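The idea of layering a software time domain on a free-running counter can be sketched as follows. This is an illustrative Python model in the spirit of a timecounter/cyclecounter pair, not the kernel structures: the class and method names are hypothetical, the 8 ns tick (expressed as `mult = 8 << 28`, `shift = 28`) is an assumed value, and counter wraparound is ignored for brevity.

```python
class SoftwareClock:
    """Translate a free-running cycle counter into adjustable nanoseconds."""

    def __init__(self, mult: int, shift: int, start_cycles: int = 0):
        self.mult = mult      # ns per cycle, scaled by 2**shift
        self.shift = shift
        self.last_cycles = start_cycles
        self.nsec = 0         # software time at last_cycles
        self.frac = 0         # sub-nanosecond remainder, scaled by 2**shift

    def read(self, cycles_now: int) -> int:
        """Advance software time by the cycles elapsed since the last read."""
        scaled = (cycles_now - self.last_cycles) * self.mult + self.frac
        self.nsec += scaled >> self.shift
        self.frac = scaled & ((1 << self.shift) - 1)
        self.last_cycles = cycles_now
        return self.nsec

    def adjtime(self, delta_ns: int) -> None:
        """Offset correction: only the software time moves."""
        self.nsec += delta_ns

    def adjfine(self, new_mult: int) -> None:
        """Frequency correction: only the software multiplier changes."""
        self.mult = new_mult


clock = SoftwareClock(mult=8 << 28, shift=28)
t0 = clock.read(1000)   # 1000 cycles at 8 ns each
clock.adjtime(-500)     # step the clock back without touching hardware
t1 = clock.read(1001)   # one more cycle
```

None of the adjustments write to the counter itself, which is what makes them hardware no-ops in this model.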
Vladimir Oltean [Sat, 8 Jun 2019 12:04:32 +0000 (15:04 +0300)]
net: dsa: sja1105: Limit use of incl_srcpt to bridge+vlan mode
The incl_srcpt setting makes the switch mangle the destination MACs of
multicast frames trapped to the CPU - a primitive tagging mechanism that
works even when we cannot use the 802.1Q software features.
The downside is that the two multicast MAC addresses that the switch
traps for L2 PTP (01-80-C2-00-00-0E and 01-1B-19-00-00-00) quickly turn
into a lot more, as the switch encodes the source port and switch id
into bytes 3 and 4 of the MAC. The resulting range of MAC addresses
would need to be installed manually into the DSA master port's multicast
MAC filter, and even then, most devices might not have a large enough
MAC filtering table.
As a result, only limit use of incl_srcpt to when it's strictly
necessary: when under a VLAN filtering bridge. This fixes PTP in
non-bridged mode (standalone ports). Otherwise, PTP frames, as well as
metadata follow-up frames holding RX timestamps won't be received
because they will be blocked by the master port's MAC filter.
Linuxptp doesn't help, because it only requests the addition of the
unmodified PTP MACs to the multicast filter.
This issue is not seen in bridged mode because the master port is put in
promiscuous mode when the slave ports are enslaved to a bridge.
Therefore, there is no downside to having the incl_srcpt mechanism
active there.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
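How two trapped addresses grow into many can be illustrated with a short model. This is a hedged Python sketch, not the tagger code: the function names are hypothetical, and the exact placement (source port at zero-based byte index 3, switch id at index 4) is an assumption made for the example.

```python
PTP_DMACS = [
    bytes.fromhex("0180C200000E"),
    bytes.fromhex("011B19000000"),
]


def mangle_dmac(dmac: bytes, source_port: int, switch_id: int) -> bytes:
    """Encode source port and switch id into the destination MAC."""
    out = bytearray(dmac)
    out[3] = source_port  # assumed position of the source port
    out[4] = switch_id    # assumed position of the switch id
    return bytes(out)


def demangle_dmac(dmac: bytes):
    """Recover (original DMAC, source port, switch id) on the receive side."""
    restored = bytes(dmac[:3]) + b"\x00\x00" + bytes(dmac[5:])
    return restored, dmac[3], dmac[4]


def addresses_to_filter(num_ports: int, num_switches: int) -> set:
    """Every DMAC the master's multicast filter would have to accept."""
    return {
        mangle_dmac(mac, port, switch)
        for mac in PTP_DMACS
        for port in range(num_ports)
        for switch in range(num_switches)
    }
```

For a single 5-port switch the two PTP addresses already become ten filter entries, and the count scales with every additional switch in the tree.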
Vladimir Oltean [Sat, 8 Jun 2019 12:04:31 +0000 (15:04 +0300)]
net: dsa: sja1105: Reverse TPID and TPID2
From reading the P/Q/R/S user manual, it appears that TPID is used by
the switch for detecting S-tags and TPID2 for C-tags. Their meaning is
not clear from the E/T manual.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:30 +0000 (15:04 +0300)]
net: dsa: sja1105: Move sja1105_change_tpid into sja1105_vlan_filtering
This is a cosmetic patch, precursor to making another change to the
General Parameters Table (incl_srcpt) which does not logically pertain
to the sja1105_change_tpid function name, but not putting it there would
otherwise create a need of resetting the switch twice.
So simply move the existing code into the .port_vlan_filtering callback,
where the incl_srcpt change will be added as well.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:29 +0000 (15:04 +0300)]
net: dsa: tag_8021q: Create helper function for removing VLAN header
This removes the existing implementation from tag_sja1105, which was
partially incorrect (it was not changing the MAC header offset, thereby
leaving it to point 4 bytes earlier than it should have).
This overwrites the VLAN tag by moving the Ethernet source and
destination MACs 4 bytes to the right. Then skb->data (assumed to be
pointing immediately after the EtherType) is temporarily pushed to the
beginning of the new Ethernet header, the new Ethernet header offset and
length are recorded, then skb->data is moved back to where it was.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
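The byte movement described above can be sketched on a raw frame. This is an illustrative Python model operating on a `bytearray`, not the skb manipulation in the helper itself: the function name is hypothetical, and the skb->data and MAC header offset bookkeeping is reduced to returning the frame from its new start.

```python
ETH_ALEN = 6    # length of one MAC address
VLAN_HLEN = 4   # TPID (2 bytes) + TCI (2 bytes)


def remove_vlan_header(frame: bytearray) -> bytearray:
    """Strip the 802.1Q tag that follows the destination and source MACs."""
    macs = frame[:2 * ETH_ALEN]
    # Move both MAC addresses 4 bytes to the right, overwriting the tag.
    frame[VLAN_HLEN:VLAN_HLEN + 2 * ETH_ALEN] = macs
    # The Ethernet header now begins 4 bytes later than before.
    return frame[VLAN_HLEN:]


dst = bytes.fromhex("0180C200000E")
src = bytes.fromhex("020000000001")
tagged = bytearray(dst + src + b"\x81\x00" + b"\x00\x64" + b"\x88\xf7" + b"ptp")
untagged = remove_vlan_header(tagged)
```

The result is `dst + src + EtherType + payload`, and its start is 4 bytes past the old frame start, which is the header offset the earlier implementation failed to update.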
Vladimir Oltean [Sat, 8 Jun 2019 12:04:28 +0000 (15:04 +0300)]
net: dsa: Add teardown callback for drivers
This is helpful for e.g. draining per-driver (not per-port) tagger
queues.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 8 Jun 2019 12:04:27 +0000 (15:04 +0300)]
net: dsa: Keep a pointer to the skb clone for TX timestamping
For drivers that use deferred_xmit for PTP frames (such as sja1105),
there is no need to perform matching between PTP frames and their egress
timestamps, since the sending process can be serialized.
In that case, it makes sense to have the pointer to the skb clone that
DSA made directly in the skb->cb. It will be used for pushing the egress
timestamp back in the application socket's error queue.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Free AF_PACKET po->rollover properly, from Willem de Bruijn.
2) Read SFP eeprom in max 16 byte increments to avoid problems with
some SFP modules, from Russell King.
3) Fix UDP socket lookup wrt. VRF, from Tim Beale.
4) Handle route invalidation properly in s390 qeth driver, from Julian
Wiedmann.
5) Memory leak on unload in RDS, from Zhu Yanjun.
6) sctp_process_init leak, from Neil Horman.
7) Fix fib_rules rule insertion semantic change that broke Android,
from Hangbin Liu.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (33 commits)
pktgen: do not sleep with the thread lock held.
net: mvpp2: Use strscpy to handle stat strings
net: rds: fix memory leak in rds_ib_flush_mr_pool
ipv6: fix EFAULT on sendto with icmpv6 and hdrincl
ipv6: use READ_ONCE() for inet->hdrincl as in ipv4
Revert "fib_rules: return 0 directly if an exactly same rule exists when NLM_F_EXCL not supplied"
net: aquantia: fix wol configuration not applied sometimes
ethtool: fix potential userspace buffer overflow
Fix memory leak in sctp_process_init
net: rds: fix memory leak when unload rds_rdma
ipv6: fix the check before getting the cookie in rt6_get_cookie
ipv4: not do cache for local delivery if bc_forwarding is enabled
s390/qeth: handle error when updating TX queue count
s390/qeth: fix VLAN attribute in bridge_hostnotify udev event
s390/qeth: check dst entry before use
s390/qeth: handle limited IPv4 broadcast in L3 TX path
net: fix indirect calls helpers for ptype list hooks.
net: ipvlan: Fix ipvlan device tso disabled while NETIF_F_IP_CSUM is set
udp: only choose unbound UDP socket for multicast when not in a VRF
net/tls: replace the sleeping lock around RX resync with a bit lock
...
Linus Torvalds [Fri, 7 Jun 2019 16:25:27 +0000 (09:25 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Things are looking pretty quiet here in RDMA, not too many bug fixes
rolling in right now. The usual driver bug fixes and fixes for a
couple of regressions introduced in 5.2:
- Fix a race on bootup with RDMA device renaming and srp. SRP also
needs to rename its internal sys files
- Fix a memory leak in hns
- Don't leak resources in efa on certain error unwinds
- Don't panic in certain error unwinds in ib_register_device
- Various small user visible bug fix patches for the hfi and efa
drivers
- Fix the 32 bit compilation break"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/efa: Remove MAYEXEC flag check from mmap flow
mlx5: avoid 64-bit division
IB/hfi1: Validate page aligned for a given virtual address
IB/{qib, hfi1, rdmavt}: Correct ibv_devinfo max_mr value
IB/hfi1: Insure freeze_work work_struct is canceled on shutdown
IB/rdmavt: Fix alloc_qpn() WARN_ON()
RDMA/core: Fix panic when port_data isn't initialized
RDMA/uverbs: Pass udata on uverbs error unwind
RDMA/core: Clear out the udata before error unwind
RDMA/hns: Fix PD memory leak for internal allocation
RDMA/srp: Rename SRP sysfs name after IB device rename trigger
Linus Torvalds [Fri, 7 Jun 2019 16:21:48 +0000 (09:21 -0700)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Will Deacon:
"Another round of mostly-benign fixes, the exception being a boot crash
on SVE2-capable CPUs (although I don't know where you'd find such a
thing, so maybe it's benign too).
We're in the process of resolving some big-endian ptrace breakage, so
I'll probably have some more for you next week.
Summary:
- Fix boot crash on platforms with SVE2 due to missing register
encoding
- Fix architected timer accessors when CONFIG_OPTIMIZE_INLINING=y
- Move cpu_logical_map into smp.h for use by upcoming irqchip drivers
- Trivial typo fix in comment
- Disable some useless, noisy warnings from GCC 9"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: Silence gcc warnings about arch ABI drift
ARM64: trivial: s/TIF_SECOMP/TIF_SECCOMP/ comment typo fix
arm64: arch_timer: mark functions as __always_inline
arm64: smp: Moved cpu_logical_map[] to smp.h
arm64: cpufeature: Fix missing ZFR0 in __read_sysreg_by_encoding()
This is a series of enhancements and bug fixes in order to get the mainline
version of this driver into a more generally usable state, including on
x86 or ARM platforms. It also converts the driver to use the phylink API
in order to provide support for SFP modules.
Changes since v4:
-Use reverse christmas tree variable order
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 6 Jun 2019 22:28:24 +0000 (16:28 -0600)]
net: axienet: convert to phylink API
Convert this driver to use the phylink API rather than the legacy PHY
API. This allows for better support for SFP modules connected using a
1000BaseX or SGMII interface.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 6 Jun 2019 22:28:23 +0000 (16:28 -0600)]
net: axienet: make use of axistream-connected attribute optional
Currently the axienet driver requires the use of a second devicetree
node, referenced by an axistream-connected attribute on the Ethernet
device node, which contains the resources for the AXI DMA block used by the
device. This setup is problematic for a use case we have where the Ethernet
and DMA cores are behind a PCIe to AXI bridge and the memory resources for
the nodes are injected into the platform devices using the multifunction
device subsystem - it's not easily possible for the driver to obtain the
platform-level resources from the linked device.
In order to simplify that usage model, and simplify the overall use of
this driver in general, allow for all of the resources to be kept on one
node where the resources are retrieved using platform device APIs rather
than device-tree-specific ones. The previous usage setup is still
supported if the axistream-connected attribute is specified.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
The axienet driver requires the use of an axistream-connected attribute,
but this isn't documented in the devicetree bindings. Document how this
attribute is supposed to be used, including the upcoming change to make
the usage of this attribute optional.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Hancock [Thu, 6 Jun 2019 22:28:21 +0000 (16:28 -0600)]
net: axienet: Fix MDIO bus parent node detection
This driver was previously using the parent node of the specified PHY
node as the device node to register the MDIO bus on. Andrew Lunn
pointed out this is wrong as the PHY node is potentially not even
underneath the MDIO bus for the current device instance. Find the MDIO
node explicitly by looking it up by name under the controller's device
node instead.
This could potentially break existing device trees if they don't use
"mdio" as the name for the MDIO bus, but I did not find any with various
searches and Xilinx's examples all use mdio as the name so it seems like
this should be relatively safe.
Signed-off-by: Robert Hancock <hancock@sedsystems.ca> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
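Why the parent-of-PHY heuristic is fragile can be shown with a toy device tree. This is a hedged Python sketch, not the axienet code: nodes are plain dicts, `make_node` and `find_child_by_name` are hypothetical helpers, and the layout (a PHY hanging off another controller's MDIO bus) is invented for the example.

```python
def make_node(name: str, parent: dict = None) -> dict:
    """Create a tree node and link it under its parent."""
    node = {"name": name, "parent": parent, "children": []}
    if parent is not None:
        parent["children"].append(node)
    return node


def find_child_by_name(node: dict, name: str):
    """Return the direct child with the given name, or None."""
    for child in node["children"]:
        if child["name"] == name:
            return child
    return None


root = make_node("soc")
eth0 = make_node("ethernet0", root)
eth1 = make_node("ethernet1", root)
mdio0 = make_node("mdio", eth0)
mdio1 = make_node("mdio", eth1)
# The PHY used by ethernet1 is wired to ethernet0's MDIO bus.
phy_for_eth1 = make_node("phy", mdio0)

old_lookup = phy_for_eth1["parent"]             # parent of the PHY node
new_lookup = find_child_by_name(eth1, "mdio")   # named child of the controller
```

The old lookup lands on the other controller's bus node, while the by-name lookup finds the bus that actually belongs to the device being probed; it returns `None` when no child is called "mdio", which is the compatibility risk noted above.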