====================
am65-cpsw: add taprio/EST offload support
AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined
in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
configuration. EST allows express queue traffic to be scheduled
(placed) on the wire at specific repeatable time intervals. In
Linux kernel, EST configuration is done through tc command and
the taprio scheduler in the net core implements a software only
scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
user indicate "flag 2" in the command which is then parsed by
taprio scheduler in net core and indicate that the command is to
be offloaded to h/w. taprio then offloads the command to the
driver by calling ndo_setup_tc() ndo ops. This patch implements
ndo_setup_tc() as well as other changes required to offload EST
configuration to CPSW h/w
For more details please refer patch 2/2.
This series is based on original work done by Ivan Khoronzhuk
<ivan.khoronzhuk@linaro.org> to add taprio offload support to
AM65 CPSW 2G.
1. Example configuration 3 Gates
ifconfig eth0 down
ethtool -L eth0 tx 3
ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off
ethtool --set-priv-flags eth0 p0-rx-ptype-rrobin off
ifconfig eth0 192.168.2.20
tc qdisc replace dev eth0 parent root handle 100 taprio \
num_tc 8 \
map 0 1 2 3 4 5 6 7 0 0 0 0 0 0 0 0 \
queues 1@0 1@1 1@2 1@3 1@4 1@5 1@6 1@7 \
base-time 0000 \
sched-entry S 80 125000 \
sched-entry S 40 125000 \
sched-entry S 20 125000 \
sched-entry S 10 125000 \
sched-entry S 08 125000 \
sched-entry S 04 125000 \
sched-entry S 02 125000 \
sched-entry S 01 125000 \
flags 2
Classify frames to particular priority using skbedit so that they land at
a specific queue in cpsw h/w which is Gated by the EST gate which opens based
on the sched-entry.
tc qdisc add dev eth0 clsact
In the below for example an iperf3 session with destination port 5007
will go through Q7.
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5007 0xffff action skbedit priority 7
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5006 0xffff action skbedit priority 6
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5005 0xffff action skbedit priority 5
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5004 0xffff action skbedit priority 4
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5003 0xffff action skbedit priority 3
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5002 0xffff action skbedit priority 2
tc filter add dev eth0 egress protocol ip prio 1 u32 match ip dport 5001 0xffff action skbedit priority 1
Testing was done by capturing frames at the PC using wireshark and checking for
the bust interval or cycle time of UDP frames with a specific port number.
Verified that the distance between first frame of a burst (cycle-time) is 1
milli second and burst duration is within 125 usec based on the received packet
timestamp shown in wireshark packet display.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Wed, 13 May 2020 13:26:15 +0000 (09:26 -0400)]
ethernet: ti: am65-cpsw-qos: add TAPRIO offload support
AM65 CPSW h/w supports Enhanced Scheduled Traffic (EST – defined
in P802.1Qbv/D2.2 that later got included in IEEE 802.1Q-2018)
configuration. EST allows express queue traffic to be scheduled
(placed) on the wire at specific repeatable time intervals. In
Linux kernel, EST configuration is done through tc command and
the taprio scheduler in the net core implements a software only
scheduler (SCH_TAPRIO). If the NIC is capable of EST configuration,
user indicate "flag 2" in the command which is then parsed by
taprio scheduler in net core and indicate that the command is to
be offloaded to h/w. taprio then offloads the command to the
driver by calling ndo_setup_tc() ndo ops. This patch implements
ndo_setup_tc() to offload EST configuration to CPSW h/w.
Currently driver supports only SetGateStates operation. EST
operates on a repeating time interval generated by the CPTS EST
function generator. Each Ethernet port has a global EST fetch
RAM that can be configured as 2 buffers, each of 64 locations
or one large buffer of 128 locations. In 2 buffer configuration,
a ping pong mechanism is used to hold the active schedule (oper)
in one buffer and new (admin) command in the other. Each 22-bit
fetch command consists of a 14-bit fetch count (14 MSB’s) and an
8-bit priority fetch allow (8 LSB’s) that will be applied for the
fetch count time in wireside clocks. Driver process each of the
sched-entry in the offload command and update the fetch RAM.
Driver configures duration in sched-entry into the fetch count
and Gate mask into the priority fetch bits of the RAM. Then
configures the CPTS EST function generator to activate the
schedule. Currently driver supports only 2 buffer configuration
which means driver supports a max cycle time of ~8 msec.
CPSW supports a configurable number of priority queues (up to 8)
and needs to be switched to this mode from the default round
robin mode before EST can be offloaded. User configures
these through ethtool commands (-L for changing number of
queues and --set-priv-flags to disable round robin mode).
Driver doesn't enable EST if pf_p0_rx_ptype_rrobin privat flag
is set. The flag is common for all ports, and so can't be just
overridden by taprio configuration w/o user involvement.
Command fails if pf_p0_rx_ptype_rrobin is already set in the
driver.
Scheds (commands) configuration depends on interface speed so
driver translates the duration to the fetch count based on
link speed. Each schedule can be constructed with several
command entries in fetch RAM depending on interval. For example
if each sched has timer interval < ~130us on 1000 Mb link then
each sched consumes one command and have 1:1 mapping. When
Ethernet link goes down, driver purge the configuration if link
is down for more than 1 second.
The patch allows to update the timer and scheds memory only if it's
really needed, and skip cases required the user to stop timer by
configuring only shceds memory.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Wed, 13 May 2020 13:26:14 +0000 (09:26 -0400)]
ethernet: ti: am65-cpts: add routines to support taprio offload
TAPRIO/EST offload support in CPSW2G requires EST scheduler
function enabled in CPTS. So this patch add a function to
set cycle time for EST scheduler. It also add a function for
getting time in ns of PHC clock for taprio qdisc configuration.
Mostly to verify if timer update is needed or to get actual
state of oper/admin schedule.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
FastLinQ devices as a complex systems may observe various hardware
level error conditions, both severe and recoverable.
Driver is able to detect and report this, but so far it only did
trace/dmesg based reporting.
Here we implement an extended hw error detection, service task
handler captures a dump for the later analysis.
I also resubmit a patch from Denis Bolotin on tx timeout handler,
addressing David's comment regarding recovery procedure as an extra
reaction on this event.
v2:
Removing the patch with ethtool dump and udev magic. Its quite isolated,
I'm working on devlink based logic for this separately.
Igor Russkikh [Thu, 14 May 2020 09:57:27 +0000 (12:57 +0300)]
net: qed: fix bad formatting
On some adjacent code, fix bad code formatting
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
MCP may signal driver about generic critical failure.
Driver has to collect mdump information (get_retain),
it pushes that to logs and triggers generic notification on
"hardware attention" event.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:25 +0000 (12:57 +0300)]
net: qed: introduce critical fan failure handler
Fan failure is sent by firmware, driver reacts on this error with
newly introduced notification path. It will collect dump and shut down
the device to prevent physical breakage
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Denis Bolotin [Thu, 14 May 2020 09:57:24 +0000 (12:57 +0300)]
net: qede: Implement ndo_tx_timeout
Upon tx timeout detection we do disable carrier and print TX queue
info on TX timeout. We then raise hw error condition and trigger
service task to handle this.
This handler will capture extra debug info and then optionally
trigger recovery procedure to try restore function.
Signed-off-by: Denis Bolotin <dbolotin@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:23 +0000 (12:57 +0300)]
net: qede: optional hw recovery procedure
Driver has an ability to initiate a recovery process as a reaction to
detected errors. But the codepath (recovery_process) was disabled and
never active.
Here we add ethtool private flag to allow user have the recovery
procedure activated.
We still do not enable this by default though, since in some configurations
this is not desirable. E.g. this may impact other PFs/VFs.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:22 +0000 (12:57 +0300)]
net: qed: attention clearing properties
On different hardware events we have to respond differently,
on some of hardware indications hw attention (error condition)
should be cleared by the driver to continue normal functioning.
Here we introduce attention clear flags, and put them on some
important events (in aeu_descs).
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:21 +0000 (12:57 +0300)]
net: qed: cleanup debug related declarations
Thats probably a legacy code had double declaration of some fields.
Cleanup this, removing copy and fixing references.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:20 +0000 (12:57 +0300)]
net: qed: critical err reporting to management firmware
On various critical errors, notification handler should also report
the err information into the management firmware.
MFW can interact with server/motherboard backend agents - these are
used by server manufacturers to monitor server HW health.
Thus, it is important for driver to report on any faulty conditions
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:19 +0000 (12:57 +0300)]
net: qed: invoke err notify on critical areas
In a number of critical places not only debug trace should be printed,
but the appropriate hw error condition should be raised and error
handling/recovery should start.
Introduce our new qed_hw_err_notify invocation in these places to
record and indicate critical error conditions in hardware.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:18 +0000 (12:57 +0300)]
net: qede: add hw err scheduled handler
qede (ethernet level driver) registers a callback handler.
This handler maintains eth dev state flags/bits to track error processing.
It implements in place processing part for nonsleeping context (WARN_ON
trigger), and a deferred (delayed work) part which triggers recovery
process for recoverable errors.
In later patches this atomic handler will come with more meat.
We introduce err_flags on ethdevice structure, its being used to record
error handling properties.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Thu, 14 May 2020 09:57:17 +0000 (12:57 +0300)]
net: qed: adding hw_err states and handling
Here we introduce qed device error tracking flags and error types.
qed_hw_err_notify is an entrace point to report errors.
It'll notify higher level drivers (qede/qedr/etc) to handle and recover
the error.
List of posible errors comes from hardware interfaces, but could be
extended in future.
Signed-off-by: Ariel Elior <ariel.elior@marvell.com> Signed-off-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:26 +0000 (20:41 +0800)]
net: hns3: remove unnecessary frag list checking in hns3_nic_net_xmit()
The skb_has_frag_list() in hns3_nic_net_xmit() is redundant, since
skb_walk_frags() includes this checking implicitly.
Reported-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:25 +0000 (20:41 +0800)]
net: hns3: remove some unused macros
There are some macros defined in hns3_enet.h, but not used in
anywhere.
Reported-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:24 +0000 (20:41 +0800)]
net: hns3: modify an incorrect error log in hclge_mbx_handler()
When handling HCLGE_MBX_GET_LINK_STATUS, PF will return the link
status to the VF, so the error log of hclge_get_link_info() is
incorrect.
Reported-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:23 +0000 (20:41 +0800)]
net: hns3: remove a duplicated printing in hclge_configure()
Since hclge_get_cfg() already has error print, so hclge_configure()
should not print error when calling hclge_get_cfg() fail.
Reported-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 14 May 2020 12:41:22 +0000 (20:41 +0800)]
net: hns3: modify some incorrect spelling
This patch modifies some incorrect spelling.
Reported-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thierry Reding [Thu, 14 May 2020 12:38:48 +0000 (14:38 +0200)]
r8152: Use MAC address from device tree if available
If a MAC address was passed via the device tree node for the r8152
device, use it and fall back to reading from EEPROM otherwise. This is
useful for devices where the r8152 EEPROM was not programmed with a
valid MAC address, or if users want to explicitly set a MAC address in
the bootloader and pass that to the kernel.
Signed-off-by: Thierry Reding <treding@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Thu, 14 May 2020 06:35:52 +0000 (09:35 +0300)]
selftests: fix flower parent qdisc
Flower tests used to create ingress filter with specified parent qdisc
"parent ffff:" but dump them on "ingress". With recent commit that fixed
tcm_parent handling in dump those are not considered same parent anymore,
which causes iproute2 tc to emit additional "parent ffff:" in first line of
filter dump output. The change in output causes filter match in tests to
fail.
Prevent parent qdisc output when dumping filters in flower tests by always
correctly specifying "ingress" parent both when creating and dumping
filters.
Fixes: a7df4870d79b ("net_sched: fix tcm_parent in tc filter dump") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
DENG Qingfang [Wed, 13 May 2020 15:37:17 +0000 (23:37 +0800)]
net: dsa: mt7530: set CPU port to fallback mode
Currently, setting a bridge's self PVID to other value and deleting
the default VID 1 renders untagged ports of that VLAN unable to talk to
the CPU port:
bridge vlan add dev br0 vid 2 pvid untagged self
bridge vlan del dev br0 vid 1 self
bridge vlan add dev sw0p0 vid 2 pvid untagged
bridge vlan del dev sw0p0 vid 1
# br0 cannot send untagged frames out of sw0p0 anymore
That is because the CPU port is set to security mode and its PVID is
still 1, and untagged frames are dropped due to VLAN member violation.
Set the CPU port to fallback mode so untagged frames can pass through.
Fixes: 83163f7dca56 ("net: dsa: mediatek: add VLAN support for MT7530") Signed-off-by: DENG Qingfang <dqfext@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvneta: speed down the PHY, if WoL used, to save energy
Some PHYs connected to this ethernet hardware support the WoL feature.
But when WoL is enabled and the machine is powered off, the PHY remains
waiting for a magic packet at max speed (i.e. 1Gbps), which is a waste of
energy.
Slow down the PHY speed before stopping the ethernet if WoL is enabled,
and save some energy while the machine is powered off or sleeping.
Tested using an Armada 370 based board (LS421DE) equipped with a Marvell 88E1518 PHY.
Signed-off-by: Daniel González Cabanelas <dgcbueu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 12 May 2020 17:13:55 +0000 (18:13 +0100)]
sfc: fix dereference of table before it is null checked
Currently pointer table is being dereferenced on a null check of
table->must_restore_filters before it is being null checked, leading
to a potential null pointer dereference issue. Fix this by null
checking table before dereferencing it when checking for a null
table->must_restore_filters.
Addresses-Coverity: ("Dereference before null check") Fixes: e4fe938cff04 ("sfc: move 'must restore' flags out of ef10-specific nic_data") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 20:38:07 +0000 (22:38 +0200)]
net: phy: at803x: add cable diagnostics support
The AR8031/AR8033 and the AR8035 support cable diagnostics. Adding
driver support is straightforward, so lets add it.
The PHY just do one pair at a time, so we have to start the test four
times. The cable_test_get_status() can block and therefore we can just
busy poll the test completion and continue with the next pair until we
are done.
The time delta counter seems to run at 125MHz which just gives us a
resolution of about 82.4cm per tick.
100m cable, A/B/C/D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: Open Circuit
Pair: Pair A, fault length: 107.94m
Pair: Pair B, result: Open Circuit
Pair: Pair B, fault length: 104.64m
Pair: Pair C, result: Open Circuit
Pair: Pair C, fault length: 105.47m
Pair: Pair D, result: Open Circuit
Pair: Pair D, fault length: 107.94m
1m cable, A/B connected, C shorted, D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short within Pair
Pair: Pair C, fault length: 0.82m
Pair: Pair D, result: Open Circuit
Pair: Pair D, fault length: 0.82m
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: set msg_control_is_user in do_ipv6_getsockopt
While do_ipv6_getsockopt does not call the high-level recvmsg helper,
the msghdr eventually ends up being passed to put_cmsg anyway, and thus
needs msg_control_is_user set to the proper value.
Fixes: 1f466e1f15cf ("net: cleanly handle kernel vs user buffers for ->msg_control") Reported-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: phy: broadcom: cable tester support
Add cable tester support for the Broadcom PHYs. Support for it was
developed on a BCM54140 Quad PHY which RDB register access.
If there is a link partner the results are not as good as with an open
cable. I guess we could retry if the measurement until all pairs had at
least one valid result.
changes since v1:
- added Reviewed-by: tags
- removed "div by 2" for cross shorts, just mention it in the commit
message. The results are inconclusive if the tests are repeated. So
just report the length as is for now.
- fixed typo in commit message
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:24 +0000 (18:35 +0200)]
net: phy: bcm54140: add cable diagnostics support
Use the generic cable tester functions from bcm-phy-lib to add cable
tester support.
100m cable, A/B/C/D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: Open Circuit
Pair: Pair B, result: Open Circuit
Pair: Pair C, result: Open Circuit
Pair: Pair D, result: Open Circuit
Pair: Pair A, fault length: 106.60m
Pair: Pair B, fault length: 103.32m
Pair: Pair C, fault length: 104.96m
Pair: Pair D, fault length: 106.60m
1m cable, A/B connected, pair C shorted, D open:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short within Pair
Pair: Pair D, result: Open Circuit
Pair: Pair C, fault length: 0.82m
Pair: Pair D, fault length: 1.64m
1m cable, A/B connected, pair C shorted with D:
Cable test started for device eth0.
Cable test completed for device eth0.
Pair: Pair A, result: OK
Pair: Pair B, result: OK
Pair: Pair C, result: Short to another pair
Pair: Pair D, result: Short to another pair
Pair: Pair C, fault length: 1.64m
Pair: Pair D, fault length: 1.64m
The granularity of the length measurement seems to be 82cm.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:23 +0000 (18:35 +0200)]
net: phy: broadcom: add cable test support
Most modern broadcom PHYs support ECD (enhanced cable diagnostics). Add
support for it in the bcm-phy-lib so they can easily be used in the PHY
driver.
There are two access methods for ECD: legacy by expansion registers and
via the new RDB registers which are exclusive. Provide functions in two
variants where the PHY driver can choose from. To keep things simple for
now, we just switch the register access to expansion registers in the
RDB variant for now. On the flipside, we have to keep a bus lock to
prevent any other non-legacy access on the PHY.
The results of the intra-pair tests are inconclusive (at least for the
BCM54140). Most of the times half the length is reported but sometimes
the length is correct.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:22 +0000 (18:35 +0200)]
net: phy: broadcom: add bcm_phy_modify_exp()
Add the convenience function to do a read-modify-write. This has the
additional benefit of saving one write to the selection register.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Walle [Wed, 13 May 2020 16:35:21 +0000 (18:35 +0200)]
net: phy: broadcom: add exp register access methods without buslock
Add helper to read and write expansion registers without taking the mdio
lock.
Please note, that this changes the semantics of the read and write.
Before there was no lock between selecting the expansion register and
the actual read/write. This may lead to access failures if there are
parallel accesses. Instead take the bus lock during the whole access
cycle.
Signed-off-by: Michael Walle <michael@walle.cc> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Wed, 13 May 2020 12:34:40 +0000 (14:34 +0200)]
net: phy: tja11xx: add cable-test support
Add initial cable testing support.
This PHY needs only 100usec for this test and it is recommended to run it
before the link is up. For now, provide at least ethtool support, so it
can be tested by more developers.
This patch was tested with TJA1102 PHY with following results:
- No cable, is detected as open
- 1m cable, with no connected other end and detected as open
- a 40m cable (out of spec, max lenght should be 15m) is detected as OK.
Current patch do not provide polarity test support. This test would
indicate not proper wire connection, where "+" wire of main phy is
connected to the "-" wire of the link partner.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Ethernet TX performance has been historically bad on Meson8b and
Meson8m2 SoCs because high packet loss was seen. I found out that this
was related (yet again) to the RGMII TX delay configuration.
In the process of discussing the big picture (and not just a single
patch) [0] with Andrew I discovered that the IP block behind the
dwmac-meson8b driver actually seems to support the configuration of the
RGMII RX delay (at least on the Meson8b SoC generation).
Since I sent the first RFC I got additional documentation from Jianxin
(many thanks!). Also I have discovered some more interesting details:
- Meson8b Odroid-C1 requires an RX delay (by either the PHY or the MAC)
Based on the vendor u-boot code (not upstream) I assume that it will
be the same for all Meson8b and Meson8m2 boards
- Khadas VIM2 seems to have the RX delay built into the PCB trace
length. When I enable the RX delay on the PHY or MAC I can't get any
data through. I expect that we will have the same situation on all
GXBB, GXM, AXG, G12A, G12B and SM1 boards. Further clarification is
needed here though (since I can't visually see these lengthened
traces on the PCB). This will be done before sending patches for
these boards.
Dependencies for this series:
There is a soft dependency for patch #2 on commit f22531438ff42c
"dt-bindings: net: dwmac: increase 'maxItems' for 'clocks',
'clock-names' properties" which is currently in Rob's -next tree.
That commit is needed to make the dt-bindings schema validation
pass for patch #2. That patch has been for ~4 weeks in Robs tree,
so I assume that is not going to be dropped.
Changes since RFC v2 at [2]:
- dropped $ref: /schemas/types.yaml#definitions/uint32 from the
"amlogic,rx-delay-ns" in patch #1 ("Don't need to define the
type when in standard units." says Rob - thanks, I learned
something new). Also use "default: 0" for for this property
instead of explaining it in the description text.
- added a note to the cover-letter about a hidden dependency for
dt-binding schema validation in patch #2
- Added Andrew's Reviewed-by to patches 1-7. Thank you again for
the quick and detailed reviews, I appreciate this!
- error out if the (optional) timing-adjustment clock is missing
but we're asked to enable the RGMII RX delay. The MAC won't
work in this specific case and either the RX delay has to be
provided by the PHY or the timing-adjustment clock has to be
added.
- dropped the dts patches (#9-11) which were only added to give
an overview how this is going to be used. those will be sent
separately
- dropped the RFC prefix
Changes since RFC v1 at [1]:
- add support for the timing adjustment clock input (dt-bindings and
in the driver) thanks to the input from the unnamed Ethernet engineer
at Amlogic. This is the missing link between the fclk_div2 clock and
the Ethernet controller on Meson8b (no traffic would flow if that
clock was disabled)
- add support fot the amlogic,rx-delay-ns property. The only supported
values so far are 0ns and 2ns. The registers seem to allow more
precise timing adjustments, but I could not make that work so far.
- add more register documentation (for the new RX delay bits) and
unified the placement of existing register documentation. Again,
thanks to Jianxin and the unnamed Ethernet engineer at Amlogic
- DO NOT MERGE: .dts patches to show the conversion of the Meson8b
and Meson8m2 boards to "rgmii-id". I didn't have time for all arm64
patches yet, but these will switch to phy-mode = "rgmii-txid" with
amlogic,rx-delay-ns = <0> (because the delay seems to be provided by
the PCB trace length).
net: stmmac: dwmac-meson8b: add support for the RX delay configuration
Configure the PRG_ETH0_ADJ_* bits to enable or disable the RX delay
based on the various RGMII PHY modes. For now the only supported RX
delay settings are:
- disabled, use for example for phy-mode "rgmii-id"
- 0ns - this is treated identical to "disabled", used for example on
boards where the PHY provides 2ns TX delay and the PCB trace length
already adds 2ns RX delay
- 2ns - for whenever the PHY cannot add the RX delay and the traces on
the PCB don't add any RX delay
Disabling the RX delay (in case u-boot enables it, which is the case
for example on Meson8b Odroid-C1) simply means that PRG_ETH0_ADJ_ENABLE,
PRG_ETH0_ADJ_SETUP, PRG_ETH0_ADJ_DELAY and PRG_ETH0_ADJ_SKEW should be
disabled (just disabling PRG_ETH0_ADJ_ENABLE may be enough, since that
disables the whole re-timing logic - but I find it makes more sense to
clear the other bits as well since they depend on that setting).
u-boot on Odroid-C1 uses the following steps to enable a 2ns RX delay:
- enabling enabling the timing adjustment clock
- enabling the timing adjustment logic by setting PRG_ETH0_ADJ_ENABLE
- setting the PRG_ETH0_ADJ_SETUP bit
The documentation for the PRG_ETH0_ADJ_DELAY and PRG_ETH0_ADJ_SKEW
registers indicates that we can even set different RX delays. However,
I could not find out how this works exactly, so for now we only support
a 2ns RX delay using the exact same way that Odroid-C1's u-boot does.
Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: dwmac-meson8b: Make the clock enabling code re-usable
The timing adjustment clock will need similar logic as the RGMII clock:
It has to be enabled in the driver conditionally and when the driver is
unloaded it should be disabled again. Extract the existing code for the
RGMII clock into a new function so it can be re-used.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: dwmac-meson8b: Fetch the "timing-adjustment" clock
The PRG_ETHERNET registers have a built-in timing adjustment circuit
which can provide the RX delay in RGMII mode. This is driven by an
external (to this IP, but internal to the SoC) clock input. Fetch this
clock as optional (even though it's there on all supported SoCs) since
we just learned about it and existing .dtbs don't specify it.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: dwmac-meson8b: Add the PRG_ETH0_ADJ_* bits
The PRG_ETH0_ADJ_* are used for applying the RGMII RX delay. The public
datasheets only have very limited description for these registers, but
Jianxin Pan provided more detailed documentation from an (unnamed)
Amlogic engineer. Add the PRG_ETH0_ADJ_* bits along with the improved
description.
Suggested-by: Jianxin Pan <jianxin.pan@amlogic.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: dwmac-meson8b: Move the documentation for the TX delay
Move the documentation for the TX delay above the PRG_ETH0_TXDLY_MASK
definition. Future commits will add more registers also with
documentation above their register bit definitions. Move the existing
comment so it will be consistent with the upcoming changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: dwmac-meson8b: use FIELD_PREP instead of open-coding it
Use FIELD_PREP() to shift a value to the correct offset based on a
bitmask instead of open-coding the logic.
No functional changes.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: dwmac-meson: Document the "timing-adjustment" clock
The PRG_ETHERNET registers can add an RX delay in RGMII mode. This
requires an internal re-timing circuit whose input clock is called
"timing adjustment clock". Document this clock input so the clock can be
enabled as needed.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: net: meson-dwmac: Add the amlogic,rx-delay-ns property
The PRG_ETHERNET registers on Meson8b and newer SoCs can add an RX
delay. Add a property with the known supported values so it can be
configured according to the board layout.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Martin Blumenstingl <martin.blumenstingl@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Here's a second attempt at a bluetooth-next pull request which
supercedes the one dated 2020-05-09. This should have the issues
discovered by Jakub fixed.
- Add support for Intel Typhoon Peak device (8087:0032)
- Add device tree bindings for Realtek RTL8723BS device
- Add device tree bindings for Qualcomm QCA9377 device
- Add support for experimental features configuration through mgmt
- Add driver hook to prevent wake from suspend
- Add support for waiting for L2CAP disconnection response
- Multiple fixes & cleanups to the btbcm driver
- Add support for LE scatternet topology for selected devices
- A few other smaller fixes & cleanups
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: dsa: felix: tc taprio and CBS offload support
This patch series support tc taprio and CBS hardware offload according
to IEEE 802.1Qbv and IEEE-802.1Qav on VSC9959.
v1->v2 changes:
- Move port_qos_map_init() function to be common felix codes.
- Keep const for dsa_switch_ops structs, add felix_port_setup_tc
function to call port_setup_tc of felix.info.
- fix code style for cbs_set, rename variables.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:10 +0000 (10:25 +0800)]
net: dsa: felix: add support Credit Based Shaper(CBS) for hardware offload
VSC9959 hardware support the Credit Based Shaper(CBS) which part
of the IEEE-802.1Qav. This patch support sch_cbs set for VSC9959.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:09 +0000 (10:25 +0800)]
net: dsa: felix: Configure Time-Aware Scheduler via taprio offload
Ocelot VSC9959 switch supports time-based egress shaping in hardware
according to IEEE 802.1Qbv. This patch add support for TAS configuration
on egress port of VSC9959 switch.
Felix driver is an instance of Ocelot family, with a DSA front-end. The
patch uses tc taprio hardware offload to setup TAS set function on felix
driver.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xiaoliang Yang [Wed, 13 May 2020 02:25:08 +0000 (10:25 +0800)]
net: dsa: felix: qos classified based on pcp
Set the default QoS Classification based on PCP and DEI of vlan tag,
after that, frames can be Classified to different Qos based on PCP tag.
If there is no vlan tag or vlan ignored, use port default Qos.
Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bluetooth: L2CAP: add support for waiting disconnection resp
Whenever we disconnect a L2CAP connection, we would immediately
report a disconnection event (EPOLLHUP) to the upper layer, without
waiting for the response of the other device.
This patch offers an option to wait until we receive a disconnection
response before reporting disconnection event, by using the "how"
parameter in l2cap_sock_shutdown(). Therefore, upper layer can opt
to wait for disconnection response by shutdown(sock, SHUT_WR).
This can be used to enforce proper disconnection order in HID,
where the disconnection of the interrupt channel must be complete
before attempting to disconnect the control channel.
Sonny Sasaka [Wed, 6 May 2020 19:55:03 +0000 (12:55 -0700)]
Bluetooth: Handle Inquiry Cancel error after Inquiry Complete
After sending Inquiry Cancel command to the controller, it is possible
that Inquiry Complete event comes before Inquiry Cancel command complete
event. In this case the Inquiry Cancel command will have status of
Command Disallowed since there is no Inquiry session to be cancelled.
This case should not be treated as error, otherwise we can reach an
inconsistent state.
Signed-off-by: Raghuram Hegde <raghuram.hegde@intel.com> Signed-off-by: Chethan T N <chethan.tumkur.narayan@intel.com> Signed-off-by: Amit K Bag <amit.k.bag@intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Implement the prevent_wake hook by checking device_may_wakeup on the usb
interface. This prevents the Bluetooth core from enabling scanning when
the device isn't expected to wake from suspend.
Bluetooth: Add hook for driver to prevent wake from suspend
Let drivers have a hook to disable configuring scanning during suspend.
Drivers should use the device_may_wakeup function call to determine
whether hci should be configured for wakeup.
For example, an implementation for btusb may look like the following:
Bluetooth: Modify LE window and interval for suspend
When a device is suspended, it doesn't need to be as responsive to
connection events. Increase the interval to 640ms (creating a duty cycle
of roughly 1.75%) so that passive scanning uses much less power (vs
previous duty cycle of 18.75%). The new window + interval combination
has been tested to work with HID devices (which are currently the only
devices capable of wake up).
Vladimir Oltean [Wed, 13 May 2020 00:23:27 +0000 (03:23 +0300)]
net: dsa: tag_sja1105: appease sparse checks for ethertype accessors
A comparison between a value from the packet and an integer constant
value needs to be done by converting the value from the packet from
net->host, or the constant from host->net. Not the other way around.
Even though it makes no practical difference, correct that.
Fixes: 38b5beeae7a4 ("net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Tue, 12 May 2020 17:36:23 +0000 (10:36 -0700)]
erspan: Check IFLA_GRE_ERSPAN_VER is set.
Add a check to make sure the IFLA_GRE_ERSPAN_VER is provided by users.
Fixes: f989d546a2d5 ("erspan: Add type I version 0 support.") Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: William Tu <u9012063@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Traffic support for dsa_8021q in vlan_filtering=1 mode
This series is an attempt to support as much as possible in terms of
traffic I/O from the network stack with the only dsa_8021q user thus
far, sja1105.
The hardware doesn't support pushing a second VLAN tag to packets that
are already tagged, so our only option is to combine the dsa_8021q with
the user tag into a single tag and decode that on the CPU.
The assumption is that there is a type of use cases for which 7 VLANs
per port are more than sufficient, and that there's another type of use
cases where the full 4096 entries are barely enough. Those use cases are
very different from one another, so I prefer trying to give both the
best experience by creating this best_effort_vlan_filtering knob to
select the mode in which they want to operate in.
v2 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200511135338.20263-1-olteanv@gmail.com/
v1 was submitted here:
https://patchwork.ozlabs.org/project/netdev/cover/20200510164255.19322-1-olteanv@gmail.com/
Changes in v3:
Patch 01/15:
- Rename again to configure_vlan_while_not_filtering, and add a helper
function for skipping VLAN configuration.
Patch 03/15:
- Remove sja1105_can_use_vlan_as_tags from driver code.
Patch 06/15:
- Adapt sja1105 driver to the second variable name change.
Patch 08/15:
- Provide an implementation of sja1105_can_use_vlan_as_tags as part of
the tagger and not as part of the switch driver. So we have to look at
the skb only, and not at the VLAN awareness state.
Changes in v2:
Patch 01/15:
- Rename variable from vlan_bridge_vtu to configure_vlans_while_disabled.
Patch 03/15:
- Be much more thorough, and make sure that things like virtual links
and FDB operations still work properly.
Patch 05/15:
- Free the vlan lists on teardown.
- Simplify sja1105_classify_vlan: only look at priv->expect_dsa_8021q.
- Keep vid 1 in the list of dsa_8021q VLANs, to make sure that untagged
packets transmitted from the stack, like PTP, continue to work in
VLAN-unaware mode.
Patch 06/15:
- Adapt to vlan_bridge_vtu variable name change.
Patch 11/15:
- In sja1105_best_effort_vlan_filtering_set, get the vlan_filtering
value of each port instead of just one time for port 0. Normally this
shouldn't matter, but it avoids issues when port 0 is disabled in
device tree.
Patch 14/14:
- Only do anything in sja1105_build_subvlans and in
sja1105_build_crosschip_subvlans when operating in
SJA1105_VLAN_BEST_EFFORT state. This avoids installing VLAN retagging
rules in unaware mode, which would cost us a penalty in terms of
usable frame memory.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:39 +0000 (20:20 +0300)]
docs: net: dsa: sja1105: document the best_effort_vlan_filtering option
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:38 +0000 (20:20 +0300)]
net: dsa: sja1105: implement VLAN retagging for dsa_8021q sub-VLANs
Expand the delta commit procedure for VLANs with additional logic for
treating bridge_vlans in the newly introduced operating mode,
SJA1105_VLAN_BEST_EFFORT.
For every bridge VLAN on every user port, a sub-VLAN index is calculated
and retagging rules are installed towards a dsa_8021q rx_vid that
encodes that sub-VLAN index. This way, the tagger can identify the
original VLANs.
Extra care is taken for VLANs to still work as intended in cross-chip
scenarios. Retagging may have unintended consequences for these because
a sub-VLAN encoding that works for the CPU does not make any sense for a
front-panel port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:37 +0000 (20:20 +0300)]
net: dsa: sja1105: implement a common frame memory partitioning function
There are 2 different features that require some reserved frame memory
space: VLAN retagging and virtual links. Create a central function that
modifies the static config and ensures frame memory is never
overcommitted.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:36 +0000 (20:20 +0300)]
net: dsa: sja1105: add packing ops for the Retagging Table
The Retagging Table is an optional feature that allows the switch to
match frames against a {ingress port, egress port, vid} rule and change
their VLAN ID. The retagged frames are by default clones of the original
ones (since the hardware-foreseen use case was to mirror traffic for
debugging purposes and to tag it with a special VLAN for this purpose),
but we can force the original frames to be dropped by removing the
pre-retagging VLAN from the port membership list of the egress port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:35 +0000 (20:20 +0300)]
net: dsa: sja1105: add a new best_effort_vlan_filtering devlink parameter
This devlink parameter enables the handling of DSA tags when enslaved to
a bridge with vlan_filtering=1. There are very good reasons to want
this, but there are also very good reasons for not enabling it by
default. So a devlink param named best_effort_vlan_filtering, currently
driver-specific and exported only by sja1105, is used to configure this.
In practice, this is perhaps the way that most users are going to use
the switch in. It assumes that no more than 7 VLANs are needed per port.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Create a subvlan_map as part of each port's tagger private structure.
This keeps reverse mappings of bridge-to-dsa_8021q VLAN retagging rules.
Note that as of this patch, this piece of code is never engaged, due to
the fact that the driver hasn't installed any retagging rule, so we'll
always see packets with a subvlan code of 0 (untagged).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:33 +0000 (20:20 +0300)]
net: dsa: tag_8021q: support up to 8 VLANs per port using sub-VLANs
For switches that support VLAN retagging, such as sja1105, we extend
dsa_8021q by encoding a "sub-VLAN" into the remaining 3 free bits in the
dsa_8021q tag.
A sub-VLAN is nothing more than a number in the range 0-7, which serves
as an index into a per-port driver lookup table. The sub-VLAN value of
zero means that traffic is untagged (this is also backwards-compatible
with dsa_8021q without retagging).
The switch should be configured to retag VLAN-tagged traffic that gets
transmitted towards the CPU port (and towards the CPU only). Example:
bridge vlan add dev sw1p0 vid 100
The switch retags frames received on port 0, going to the CPU, and
having VID 100, to the VID of 1104 (0x0450). In dsa_8021q language:
0x0450 means:
- DIR = 0b01: this is an RX VLAN
- SUBVLAN = 0b001: this is subvlan #1
- SWITCH_ID = 0b001: this is switch 1 (see the name "sw1p0")
- PORT = 0b0000: this is port 0 (see the name "sw1p0")
The driver also remembers the "1 -> 100" mapping. In the hotpath, if the
sub-VLAN from the tag encodes a non-untagged frame, this mapping is used
to create a VLAN hwaccel tag, with the value of 100.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:32 +0000 (20:20 +0300)]
net: dsa: sja1105: prepare tagger for handling DSA tags and VLAN simultaneously
In VLAN-unaware mode, sja1105 uses VLAN tags with a custom TPID of
0xdadb. While in the yet-to-be introduced best_effort_vlan_filtering
mode, it needs to work with normal VLAN TPID values.
A complication arises when we must transmit a VLAN-tagged packet to the
switch when it's in VLAN-aware mode. We need to construct a packet with
2 VLAN tags, and the switch will use the outer header for routing and
pop it on egress. But sadly, here the 2 hardware generations don't
behave the same:
- E/T switches won't pop an ETH_P_8021AD tag on egress, it seems
(packets will remain double-tagged).
- P/Q/R/S switches will drop a packet with 2 ETH_P_8021Q tags (it looks
like it tries to prevent VLAN hopping).
But looks like the reverse is also true:
- E/T switches have no problem popping the outer tag from packets with
2 ETH_P_8021Q tags.
- P/Q/R/S will have no problem popping a single tag even if that is
ETH_P_8021AD.
So it is clear that if we want the hardware to work with dsa_8021q
tagging in VLAN-aware mode, we need to send different TPIDs depending on
revision. Keep that information in priv->info->qinq_tpid.
The per-port tagger structure will hold an xmit_tpid value that depends
not only upon the qinq_tpid, but also upon the VLAN awareness state
itself (in case we must transmit using 0xdadb).
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:31 +0000 (20:20 +0300)]
net: dsa: sja1105: exit sja1105_vlan_filtering when called multiple times
VLAN filtering is a global property for sja1105, and that means that we
rely on the DSA core to not call us more than once.
But we need to introduce some per-port state for the tagger, namely the
xmit_tpid, and the best place to do that is where the xmit_tpid changes,
namely in sja1105_vlan_filtering. So at the moment, exit early from the
function to avoid unnecessarily resetting the switch for each port call.
Then we'll change the xmit_tpid prior to the early exit in the next
patch.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:30 +0000 (20:20 +0300)]
net: dsa: sja1105: allow VLAN configuration from the bridge in all states
Let the DSA core call our .port_vlan_add methods every time the bridge
layer requests so. We will deal internally with saving/restoring VLANs
depending on our VLAN awareness state.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:29 +0000 (20:20 +0300)]
net: dsa: sja1105: save/restore VLANs using a delta commit method
Managing the VLAN table that is present in hardware will become very
difficult once we add a third operating state
(best_effort_vlan_filtering). That is because correct cleanup (not too
little, not too much) becomes virtually impossible, when VLANs can be
added from the bridge layer, from dsa_8021q for basic tagging, for
cross-chip bridging, as well as retagging rules for sub-VLANs and
cross-chip sub-VLANs. So we need to rethink VLAN interaction with the
switch in a more scalable way.
In preparation for that, use the priv->expect_dsa_8021q boolean to
classify any VLAN request received through .port_vlan_add or
.port_vlan_del towards either one of 2 internal lists: bridge VLANs and
dsa_8021q VLANs.
Then, implement a central sja1105_build_vlan_table method that creates a
VLAN configuration from scratch based on the 2 lists of VLANs kept by
the driver, and based on the VLAN awareness state. Currently, if we are
VLAN-unaware, install the dsa_8021q VLANs, otherwise the bridge VLANs.
Then, implement a delta commit procedure that identifies which VLANs
from this new configuration are actually different from the config
previously committed to hardware. We apply the delta through the dynamic
configuration interface (we don't reset the switch). The result is that
the hardware should see the exact sequence of operations as before this
patch.
This also helps remove the "br" argument passed to
dsa_8021q_crosschip_bridge_join, which it was only using to figure out
whether it should commit the configuration back to us or not, based on
the VLAN awareness state of the bridge. We can simplify that, by always
allowing those VLANs inside of our dsa_8021q_vlans list, and committing
those to hardware when necessary.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:28 +0000 (20:20 +0300)]
net: dsa: sja1105: deny alterations of dsa_8021q VLANs from the bridge
At the moment, this can never happen. The 2 modes that we operate in do
not permit that:
- SJA1105_VLAN_UNAWARE: we are guarded from bridge VLANs added by the
user by the DSA core. We will later lift this restriction by setting
ds->vlan_bridge_vtu = true, and that is where we'll need it.
- SJA1105_VLAN_FILTERING_FULL: in this mode, dsa_8021q configuration is
disabled. So the user is free to add these VLANs in the 1024-3071
range.
The reason for the patch is that we'll introduce a third VLAN awareness
state, where both dsa_8021q as well as the bridge are going to call our
.port_vlan_add and .port_vlan_del methods.
For that, we need a good way to discriminate between the 2. The easiest
(and less intrusive way for upper layers) is to recognize the fact that
dsa_8021q configurations are always driven by our driver - we _know_
when a .port_vlan_add method will be called from dsa_8021q because _we_
initiated it.
So introduce an expect_dsa_8021q boolean which is only used, at the
moment, for blacklisting VLANs in range 1024-3071 in the modes when
dsa_8021q is active.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:27 +0000 (20:20 +0300)]
net: dsa: sja1105: keep the VLAN awareness state in a driver variable
Soon we'll add a third operating mode to the driver. Introduce a
vlan_state to make things more easy to manage, and use it where
applicable.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 12 May 2020 17:20:26 +0000 (20:20 +0300)]
net: dsa: tag_8021q: introduce a vid_is_dsa_8021q helper
This function returns a boolean denoting whether the VLAN passed as
argument is part of the 1024-3071 range that the dsa_8021q tagging
scheme uses.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Tue, 12 May 2020 17:20:25 +0000 (20:20 +0300)]
net: dsa: provide an option for drivers to always receive bridge VLANs
DSA assumes that a bridge which has vlan filtering disabled is not
vlan aware, and ignores all vlan configuration. However, the kernel
software bridge code allows configuration in this state.
This causes the kernel's idea of the bridge vlan state and the
hardware state to disagree, so "bridge vlan show" indicates a correct
configuration but the hardware lacks all configuration. Even worse,
enabling vlan filtering on a DSA bridge immediately blocks all traffic
which, given the output of "bridge vlan show", is very confusing.
Provide an option that drivers can set to indicate they want to receive
vlan configuration even when vlan filtering is disabled. At the very
least, this is safe for Marvell DSA bridges, which do not look up
ingress traffic in the VTU if the port is in 8021Q disabled state. It is
also safe for the Ocelot switch family. Whether this change is suitable
for all DSA bridges is not known.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 12 May 2020 13:24:58 +0000 (14:24 +0100)]
sfc: siena_check_caps() can be static
Reported-by: Jakub Kicinski <kuba@kernel.org> Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 12 May 2020 13:24:34 +0000 (14:24 +0100)]
sfc: actually wire up siena_check_caps()
Assign it to siena_a0_nic_type.check_caps function pointer.
Fixes: be904b855200 ("sfc: make capability checking a nic_type function") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kunihiko Hayashi [Tue, 12 May 2020 07:26:50 +0000 (16:26 +0900)]
dt-bindings: net: Convert UniPhier AVE4 controller to json-schema
Convert the UniPhier AVE4 controller binding to DT schema format.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 12 May 2020 19:19:30 +0000 (12:19 -0700)]
Merge branch 'ionic-updates'
Shannon Nelson says:
====================
ionic updates
This set of patches is a bunch of code cleanup, a little
documentation, longer tx sg lists, more ethtool stats,
and a couple more transceiver types.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:31 +0000 (17:59 -0700)]
ionic: shorter dev cmd wait time
Shorten our msleep time while polling for the dev command
request to finish. Yes, checkpatch.pl complains that the
msleep might actually go longer - that won't hurt, but we'll
take the shorter time if we can get it.
Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:29 +0000 (17:59 -0700)]
ionic: protect vf calls from fw reset
When going into a firmware upgrade cycle, we set the device as
not present to keep some user commands from trying to change
the driver while we're only half there. Unfortunately, the
ndo_vf_* calls don't check netif_device_present() so we need
to add a check in the callbacks.
Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
Shannon Nelson [Tue, 12 May 2020 00:59:27 +0000 (17:59 -0700)]
ionic: support longer tx sg lists
The version 1 Tx queues can use longer SG lists than the
original version 0 queues, but we need to check to see if the
firmware supports the v1 Tx queues. This implements the queue
type query for all queue types, and uses the information to
set up for using the longer Tx SG lists.
Because the Tx SG list can be longer, we need to limit the
max ring length to be sure we stay inside the boundaries of a
DMA allocation max size, so we lower the max Tx ring size.
The driver sets its highest known version in the Q_IDENTITY
command, and the FW returns the highest version that it knows,
bounded by the driver's version. The negotiated version number
is later used in the Q_INIT commands.
Signed-off-by: Shannon Nelson <snelson@pensando.io> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Mon, 11 May 2020 17:08:07 +0000 (10:08 -0700)]
checkpatch: warn about uses of ENOTSUPP
ENOTSUPP often feels like the right error code to use, but it's
in fact not a standard Unix error. E.g.:
$ python
>>> import errno
>>> errno.errorcode[errno.ENOTSUPP]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: module 'errno' has no attribute 'ENOTSUPP'
There were numerous commits converting the uses back to EOPNOTSUPP
but in some cases we are stuck with the high error code for backward
compatibility reasons.
Let's try prevent more ENOTSUPPs from getting into the kernel.
v2 (Joe):
- add a link to recent discussion,
- don't match when scanning files, not patches to avoid sudden
influx of conversion patches.
https://lore.kernel.org/netdev/20200511165319.2251678-1-kuba@kernel.org/
Suggested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Jakub Kicinski <kuba@kernel.org> Acked-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
improve msg_control kernel vs user pointer handling
this series replace the msg_control in the kernel msghdr structure
with an anonymous union and separate fields for kernel vs user
pointers. In addition to helping a bit with type safety and reducing
sparse warnings, this also allows to remove the set_fs() in
kernel_recvmsg, helping with an eventual entire removal of set_fs().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: cleanly handle kernel vs user buffers for ->msg_control
The msg_control field in struct msghdr can either contain a user
pointer when used with the recvmsg system call, or a kernel pointer
when used with sendmsg. To complicate things further kernel_recvmsg
can stuff a kernel pointer in and then use set_fs to make the uaccess
helpers accept it.
Replace it with a union of a kernel pointer msg_control field, and
a user pointer msg_control_user one, and allow kernel_recvmsg operate
on a proper kernel pointer using a bitfield to override the normal
choice of a user pointer for recvmsg.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a variant of CMSG_DATA that operates on user pointer to avoid
sparse warnings about casting to/from user pointers. Also fix up
CMSG_DATA to rely on the gcc extension that allows void pointer
arithmetics to cut down on the amount of casts.
Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
sfc: remove nic_data usage in common code
efx->nic_data should only be used from NIC-specific code (i.e. nic_type
functions and things they call), in files like ef10[_sriov].c and
siena.c. This series refactors several nic_data usages from common
code (mainly in mcdi_filters.c) into nic_type functions, in preparation
for the upcoming ef100 driver which will use those functions but have
its own struct layout for efx->nic_data distinct from ef10's.
After this series, one nic_data usage (in ptp.c) remains; it wasn't
clear to me how to fix it, and ef100 devices don't yet have PTP support
(so the initial ef100 driver will not call that code).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>