HI13X1_GMAC changed the offsets and bitmaps for
GE_TX_LOCAL_PAGE_REG registers in the same peripheral
device on different models of the hip04_eth. With the
default configuration, HI13X1_GMAC can also work without
any writes to the GE_TX_LOCAL_PAGE_REG register.
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hisilicon: Cleanup for cast to restricted __be32
This patch fixes the following warning from sparse:
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:533:23: warning: cast to restricted __be16
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
hip04_eth.c:534:23: warning: cast to restricted __be32
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes the following warning from sparse:
hip04_eth.c:468:25: warning: incorrect type in assignment
hip04_eth.c:468:25: expected unsigned int [usertype] send_addr
hip04_eth.c:468:25: got restricted __be32 [usertype]
hip04_eth.c:469:25: warning: incorrect type in assignment
hip04_eth.c:469:25: expected unsigned int [usertype] send_size
hip04_eth.c:469:25: got restricted __be32 [usertype]
hip04_eth.c:470:19: warning: incorrect type in assignment
hip04_eth.c:470:19: expected unsigned int [usertype] cfg
hip04_eth.c:470:19: got restricted __be32 [usertype]
hip04_eth.c:472:23: warning: incorrect type in assignment
hip04_eth.c:472:23: expected unsigned int [usertype] wb_addr
hip04_eth.c:472:23: got restricted __be32 [usertype]
Signed-off-by: Jiangfeng Xiao <xiaojiangfeng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jul 2019 21:17:59 +0000 (14:17 -0700)]
Merge branch 'stmmac-hash-table'
Biao Huang says:
====================
stmmac: fix out-of-boundary issue and add taller hash table support
Fix mac address out-of-boundary issue in net-next tree.
and resend the patch which was discussed in
https://lore.kernel.org/patchwork/patch/1082117
but with no further progress.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: stmmac: add support for hash table size 128/256 in dwmac4
1. get hash table size in hw feature reigster, and add support
for taller hash table(128/256) in dwmac4.
2. only clear GMAC_PACKET_FILTER bits used in this function,
to avoid side effect to functions of other bits.
stmmac selftests output log with flow control on:
ethtool -t eth0
The test result is PASS
The test extra info:
1. MAC Loopback 0
2. PHY Loopback -95
3. MMC Counters 0
4. EEE -95
5. Hash Filter MC 0
6. Perfect Filter UC 0
7. MC Filter 0
8. UC Filter 0
9. Flow Control 0
Signed-off-by: Biao Huang <biao.huang@mediatek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tc-testing: Add plugin for simple traffic generation
This series supersedes the previous submission that included a patch for test
case verification using JSON output. It adds a new tdc plugin, scapyPlugin, as
a way to send traffic to test tc filters and actions.
The first patch makes a change to the TdcPlugin module that will allow tdc
plugins to examine the test case currently being executed, so plugins can
play a more active role in testing by accepting information or commands from
the test case. This is required for scapyPlugin to work.
The second patch adds scapyPlugin itself, and an example test case file to
demonstrate how the scapy block works in the test cases.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lucas Bates [Tue, 9 Jul 2019 01:34:27 +0000 (21:34 -0400)]
tc-testing: introduce scapyPlugin for basic traffic
The scapyPlugin allows for simple traffic generation in tdc to
test various tc features. It was tested with scapy v2.4.2, but
should work with any successive version.
In order to use the plugin's functionality, scapy must be
installed. This can be done with:
pip3 install scapy
or to install 2.4.2:
pip3 install scapy==2.4.2
If the plugin is unable to import the scapy module, it will
terminate the tdc run.
The plugin makes use of a new key in the test case data, 'scapy'.
This block contains three other elements: 'iface', 'count', and
'packet':
* iface is the name of the device on the host machine from which
the packet(s) will be sent. Values contained within tdc_config.py's
NAMES dict can be used here - this is useful if paired with
nsPlugin
* count is the number of copies of this packet to be sent
* packet is a string detailing the different layers of the packet
to be sent. If a property isn't explicitly set, scapy will set
default values for you.
Layers in the packet info are separated by slashes. For info about
common TCP and IP properties, see:
https://blogs.sans.org/pen-testing/files/2016/04/ScapyCheatSheet_v0.2.pdf
Caution is advised when running tests using the scapy functionality,
since the plugin blindly sends the packet as defined in the test case
data.
See creating-testcases/scapy-example.json for sample test cases;
the first test is intended to pass while the second is intended to
fail.
Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lucas Bates [Tue, 9 Jul 2019 01:34:26 +0000 (21:34 -0400)]
tc-testing: Allow tdc plugins to see test case data
Instead of only passing the test case name and ID, pass the
entire current test case down to the plugins. This change
allows plugins to start accepting commands and directives
from the test cases themselves, for greater flexibility
in testing.
Signed-off-by: Lucas Bates <lucasb@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jul 2019 20:03:04 +0000 (13:03 -0700)]
Merge branch 'Armada-8040-SoC-in-orion-mdio-hang'
Josua Mayer says:
====================
Fix hang of Armada 8040 SoC in orion-mdio
With a modular kernel as configured by Debian a hang was observed with
the Armada 8040 SoC in the Clearfog GT and Macchiatobin boards.
The 8040 SoC actually requires four clocks to be enabled for the mdio
interface to function. All 4 clocks are already specified in
armada-cp110.dtsi. It has however been missed that the orion-mdio driver
only supports enabling up to three clocks.
This patch-set allows the orion-mdio driver to handle four clocks and
adds a warning when more clocks are specified to prevent this particular
oversight in the future.
Changes since v1:
- fixed condition for priting the warning (Andrew Lunn)
- rephrased commit description for deferred probing (Andrew Lunn)
- fixed compiler warnings (kbuild test robot)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvmdio: defer probe of orion-mdio if a clock is not ready
Defer probing of the orion-mdio interface when getting a clock returns
EPROBE_DEFER. This avoids locking up the Armada 8k SoC when mdio is used
before all clocks have been enabled.
Signed-off-by: Josua Mayer <josua@solid-run.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvmdio: print warning when orion-mdio has too many clocks
Print a warning when device tree specifies more than the maximum of four
clocks supported by orion-mdio. Because reading from mdio can lock up
the Armada 8k when a required clock is not initialized, it is important
to notify the user when a specified clock is ignored.
Signed-off-by: Josua Mayer <josua@solid-run.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvmdio: allow up to four clocks to be specified for orion-mdio
Allow up to four clocks to be specified and enabled for the orion-mdio
interface, which are required by the Armada 8k and defined in
armada-cp110.dtsi.
Fixes a hang in probing the mvmdio driver that was encountered on the
Clearfog GT 8K with all drivers built as modules, but also affects other
boards such as the MacchiatoBIN.
Cc: stable@vger.kernel.org Fixes: 96cb43423822 ("net: mvmdio: allow up to three clocks to be specified for orion-mdio") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Josua Mayer <josua@solid-run.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dt-bindings: allow up to four clocks for orion-mdio
Armada 8040 needs four clocks to be enabled for MDIO accesses to work.
Update the binding to allow the extra clock to be specified.
Cc: stable@vger.kernel.org Fixes: 6d6a331f44a1 ("dt-bindings: allow up to three clocks for orion-mdio") Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Josua Mayer <josua@solid-run.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: netsec: start using buffers if page_pool registration succeeded
The current driver starts using page_pool buffers before calling
xdp_rxq_info_reg_mem_model(). Start using the buffers after the
registration succeeded, so we won't have to call
page_pool_request_shutdown() in case of failure
Fixes: 5c67bf0ec4d0 ("net: netsec: Use page_pool API") Signed-off-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 9 Jul 2019 08:03:00 +0000 (10:03 +0200)]
net: stmmac: Introducing support for Page Pool
Mapping and unmapping DMA region is an high bottleneck in stmmac driver,
specially in the RX path.
This commit introduces support for Page Pool API and uses it in all RX
queues. With this change, we get more stable troughput and some increase
of banwidth with iperf:
- MAC1000 - 950 Mbps
- XGMAC: 9.22 Gbps
Changes from v3:
- Use page_pool_destroy() (Ilias)
Changes from v2:
- Uncoditionally call page_pool_free() (Jesper)
Changes from v1:
- Use page_pool_get_dma_addr() (Jesper)
- Add a comment (Jesper)
- Add page_pool_free() call (Jesper)
- Reintroduce sync_single_for_device (Arnd / Ilias)
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 9 Jul 2019 08:02:59 +0000 (10:02 +0200)]
net: stmmac: Fix descriptors address being in > 32 bits address space
Commit a993db88d17d ("net: stmmac: Enable support for > 32 Bits
addressing in XGMAC"), introduced support for > 32 bits addressing in
XGMAC but the conversion of descriptors to dma_addr_t was left out.
As some devices assing coherent memory in regions > 32 bits we need to
set lower and upper value of descriptors address when initializing DMA
channels.
Luckly, this was working for me because I was assigning CMA to < 4GB
address space for performance reasons.
Fixes: a993db88d17d ("net: stmmac: Enable support for > 32 Bits addressing in XGMAC") Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Tue, 9 Jul 2019 08:02:58 +0000 (10:02 +0200)]
net: stmmac: Implement RX Coalesce Frames setting
Add support for coalescing RX path by specifying number of frames which
don't need to have interrupt on completion bit set.
This is only available when RX Watchdog is enabled.
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Tue, 9 Jul 2019 07:50:07 +0000 (03:50 -0400)]
bnxt_en: Add page_pool_destroy() during RX ring cleanup.
Add page_pool_destroy() in bnxt_free_rx_rings() during normal RX ring
cleanup, as Ilias has informed us that the following commit has been
merged:
1da4bbeffe41 ("net: core: page_pool: add user refcnt and reintroduce page_pool_destroy")
The special error handling code to call page_pool_free() can now be
removed. bnxt_free_rx_rings() will always be called during normal
shutdown or any error paths.
Fixes: 322b87ca55f2 ("bnxt_en: add page_pool support") Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org> Cc: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Acked-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series add connection tracking capabilities in tc sw datapath.
It does so via a new tc action, called act_ct, and new tc flower classifier matching
on conntrack state, mark and label.
Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_state +trk+new \
action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 1 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_0
The pattern used in the design here closely resembles OvS, as the plan is to also offload
OvS conntrack rules to tc. OvS datapath rules uses it's recirculation mechanism to send
specific packets to conntrack, and return with the new conntrack state (ct_state) on some other recirc_id
to be matched again (we use goto chain for this).
Paul Blakey [Tue, 9 Jul 2019 07:30:51 +0000 (10:30 +0300)]
tc-tests: Add tc action ct tests
Add 13 tests ensuring the command line is doing what is supposed to do.
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Retreives connection tracking zone, mark, label, and state from
a SKB.
Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Tue, 9 Jul 2019 07:30:48 +0000 (10:30 +0300)]
net/sched: Introduce action ct
Allow sending a packet to conntrack module for connection tracking.
The packet will be marked with conntrack connection's state, and
any metadata such as conntrack mark and label. This state metadata
can later be matched against with tc classifers, for example with the
flower classifier as below.
In addition to committing new connections the user can optionally
specific a zone to track within, set a mark/label and configure nat
with an address range and port range.
Usage is as follows:
$ tc qdisc add dev ens1f0_0 ingress
$ tc qdisc add dev ens1f0_1 ingress
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 2
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_state +trk+new \
action ct zone 2 commit mark 0xbb nat src addr 5.5.5.7 pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_0 ingress \
prio 1 chain 2 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 0 proto ip \
flower ip_proto tcp ct_state -trk \
action ct zone 2 pipe \
action goto chain 1
$ tc filter add dev ens1f0_1 ingress \
prio 1 chain 1 proto ip \
flower ct_zone 2 ct_mark 0xbb ct_state +trk+est \
action ct nat pipe \
action mirred egress redirect dev ens1f0_0
Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Yossi Kuperman <yossiku@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com>
Changelog:
V5->V6:
Added CONFIG_NF_DEFRAG_IPV6 in handle fragments ipv6 case
V4->V5:
Reordered nf_conntrack_put() in tcf_ct_skb_nfct_cached()
V3->V4:
Added strict_start_type for act_ct policy
V2->V3:
Fixed david's comments: Removed extra newline after rcu in tcf_ct_params , and indent of break in act_ct.c
V1->V2:
Fixed parsing of ranges TCA_CT_NAT_IPV6_MAX as 'else' case overwritten ipv4 max
Refactored NAT_PORT_MIN_MAX range handling as well
Added ipv4/ipv6 defragmentation
Removed extra skb pull push of nw offset in exectute nat
Refactored tcf_ct_skb_network_trim after pull
Removed TCA_ACT_CT define
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
devlink: Introduce PCI PF, VF ports and attributes
This patchset carry forwards the work initiated in [1] and discussion
futher concluded at [2].
To improve visibility of representor netdevice, its association with
PF or VF, physical port, two new devlink port flavours are added as
PCI PF and PCI VF ports.
A sample eswitch view can be seen below, which will be futher extended to
mdev subdevices of a PCI function in future.
Patch-1 moves physical port's attribute to new structure
Patch-2 enhances netlink response to consider port flavour
Patch-3,4 extends devlink port attributes and port flavour
Patch-5 extends mlx5 driver to register devlink ports for PF, VF and
physical link.
Changelog:
v5->v6:
- Fixed port flavour check order for PCI PF vs other flavours in
netlink response.
- Changed 'physical' to 'phys'.
v4->v5:
- Split first patch to two patches to handle netlink response in
separate patch.
- Corrected typo 'otwerwise' to 'otherwise' in patches 3 and 4.
v3->v4:
- Addressed comments from Jiri.
- Split first patch to two patches.
- Renamed phys_port to physical to be consistent with pci_pf.
- Removed port_number from __devlink_port_attrs_set and moved
assignment to caller function.
- Used capital letter while moving old comment to new structure.
- Removed helper function is_devlink_phy_port_num_supported().
v2->v3:
- Made port_number and split_port_number applicable only to
physical port flavours.
v1->v2:
- Updated new APIs and mlx5 driver to drop port_number for PF, VF
attributes
- Updated port_number comment for its usage
- Limited putting port_number to physical ports
====================
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx5e: Register devlink ports for physical link, PCI PF, VFs
Register devlink port of physical port, PCI PF and PCI VF flavour
for each PF, VF when a given devlink instance is in switchdev mode.
Implement ndo_get_devlink_port callback API to make use of registered
devlink ports.
This eliminates ndo_get_phys_port_name() and ndo_get_port_parent_id()
callbacks. Hence, remove them.
An example output with 2 VFs, without a PF and single uplink port is
below.
$devlink port show
pci/0000:06:00.0/65535: type eth netdev ens2f0 flavour physical
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Reviewed-by: Roi Dayan <roid@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
devlink: Introduce PCI VF port flavour and port attribute
In an eswitch, PCI VF may have port which is normally represented using
a representor netdevice.
To have better visibility of eswitch port, its association with VF,
and its representor netdevice, introduce a PCI VF port flavour.
When devlink port flavour is PCI VF, fill up PCI VF attributes of
the port.
Extend port name creation using PCI PF and VF number scheme on best
effort basis, so that vendor drivers can skip defining their own scheme.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
pci/0000:05:00.0/1: type eth netdev eth1 flavour pcivf pfnum 0 vfnum 0
pci/0000:05:00.0/2: type eth netdev eth2 flavour pcivf pfnum 0 vfnum 1
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
devlink: Introduce PCI PF port flavour and port attribute
In an eswitch, PCI PF may have port which is normally represented
using a representor netdevice.
To have better visibility of eswitch port, its association with
PF and a representor netdevice, introduce a PCI PF port
flavour and port attriute.
When devlink port flavour is PCI PF, fill up PCI PF attributes of the
port.
Extend port name creation using PCI PF number on best effort basis.
So that vendor drivers can skip defining their own scheme.
$ devlink port show
pci/0000:05:00.0/0: type eth netdev eth0 flavour pcipf pfnum 0
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
devlink: Return physical port fields only for applicable port flavours
Physical port number and split group fields are applicable only to
physical port flavours such as PHYSICAL, CPU and DSA.
Hence limit returning those values in netlink response to such port
flavours.
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To support additional devlink port flavours and to support few common
and few different port attributes, move physical port attributes to a
different structure.
Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Parav Pandit <parav@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: tls: fixes for initial TLS support
This series brings various fixes to nfp tls offload recently added
to net-next.
First 4 patches revolve around device mailbox communication, trying
to make it more reliable. Next patch fixes statistical counter.
Patch 6 improves the TX resync if device communication failed.
Patch 7 makes sure we remove keys from memory after talking to FW.
Patch 8 adds missing tls context initialization, we fill in the
context information from various places based on the configuration
and looks like we missed the init in the case of where TX is
offloaded, but RX wasn't initialized yet. Patches 9 and 10 make
the nfp driver undo TLS state changes if we need to drop the
frame (e.g. due to DMA mapping error).
Last but not least TLS fallback should not adjust socket memory
after skb_orphan_partial(). This code will go away once we forbid
orphaning of skbs in need of crypto, but that's "real" -next
material, so lets do a quick fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:18 +0000 (19:53 -0700)]
net/tls: fix socket wmem accounting on fallback with netem
netem runs skb_orphan_partial() which "disconnects" the skb
from normal TCP write memory accounting. We should not adjust
sk->sk_wmem_alloc on the fallback path for such skbs.
Fixes: e8f69799810c ("net/tls: Add generic NIC offload infrastructure") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:17 +0000 (19:53 -0700)]
nfp: tls: undo TLS sequence tracking when dropping the frame
If driver has to drop the TLS frame it needs to undo the TCP
sequence tracking changes, otherwise device will receive
segments out of order and drop them.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:16 +0000 (19:53 -0700)]
nfp: tls: avoid one of the ifdefs for TLS
Move the #ifdef CONFIG_TLS_DEVICE a little so we can eliminate
the other one.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:15 +0000 (19:53 -0700)]
net/tls: add missing prot info init
Turns out TLS_TX in HW offload mode does not initialize tls_prot_info.
Since commit 9cd81988cce1 ("net/tls: use version from prot") we actually
use this field on the datapath. Luckily we always compare it to TLS 1.3,
and assume 1.2 otherwise. So since zero is not equal to 1.3, everything
worked fine.
Fixes: 9cd81988cce1 ("net/tls: use version from prot") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:14 +0000 (19:53 -0700)]
nfp: tls: don't leave key material in freed FW cmsg skbs
Make sure the contents of the skb which carried key material
to the FW is cleared.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce a return code for the tls_dev_resync callback.
When the driver TX resync fails, kernel can retry the resync again
until it succeeds. This prevents drivers from attempting to offload
TLS packets if the connection is known to be out of sync.
We don't worry about the RX resync since they will be retried naturally
as more encrypted records get received.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:12 +0000 (19:53 -0700)]
nfp: tls: count TSO segments separately for the TLS offload
Count the number of successfully submitted TLS segments,
not skbs. This will make it easier to compare the TLS
encryption count against other counters.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Increase the batch limit to consume small message bursts more
effectively. Practically, the effect on the 'add' messages is not
significant since the mailbox is sized such that the 'add' messages are
still limited to the same order of magnitude that it was originally set
for.
Furthermore, increase the queue size limit to 1024 entries. This further
improves the handling of bursts of small control messages.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:10 +0000 (19:53 -0700)]
nfp: tls: use unique connection ids instead of 4-tuple for TX
Connection 4 tuple reuse is slightly problematic - TLS socket
and context do not get destroyed until all the associated skbs
left the system and all references are released. This leads
to stale connection entry in the device preventing addition
of new one if the 4 tuple is reused quickly enough.
Instead of using read 4 tuple as the key use a unique ID.
Set the protocol to TCP and port to 0 to ensure no collisions
with real connections.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:09 +0000 (19:53 -0700)]
nfp: tls: move setting ipver_vlan to a helper
Long lines are ugly. No functional changes.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Tue, 9 Jul 2019 02:53:08 +0000 (19:53 -0700)]
nfp: tls: ignore queue limits for delete commands
We need to do our best not to drop delete commands, otherwise
we will have stale entries in the connection table. Ignore
the control message queue limits for delete commands.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 8 Jul 2019 16:59:40 +0000 (00:59 +0800)]
sctp: remove rcu_read_lock from sctp_bind_addr_state
sctp_bind_addr_state() is called either in packet rcv path or
by sctp_copy_local_addr_list(), which are under rcu_read_lock.
So there's no need to call it again in sctp_bind_addr_state().
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 8 Jul 2019 16:57:04 +0000 (00:57 +0800)]
sctp: remove reconf_enable from asoc
asoc's reconf support is actually decided by the 4-shakehand negotiation,
not something that users can set by sockopt. asoc->peer.reconf_capable is
working for this. So remove it from asoc.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
linkmode_mod_bit is introduced as a helper function to set/clear
bits in a linkmode.
Replace the if else code structure with a call to the helper
linkmode_mod_bit.
Signed-off-by: Fuqian Huang <huangfq.daxian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jul 2019 02:50:13 +0000 (19:50 -0700)]
Merge branch 'Add-MPLS-actions-to-TC'
John Hurley says:
====================
Add MPLS actions to TC
This patchset introduces a new TC action module that allows the
manipulation of the MPLS headers of packets. The code impliments
functionality including push, pop, and modify.
Also included are tests for the new funtionality. Note that these will
require iproute2 changes to be submitted soon.
NOTE: these patches are applied to net-next along with the patch:
[PATCH net 1/1] net: openvswitch: fix csum updates for MPLS actions
This patch has been accepted into net but, at time of posting, is not yet
in net-next.
v6-v7:
- add extra tests for setting max/min and exceeding range of fields -
patch 5 (Roman Mashak)
v5-v6:
- add CONFIG_NET_ACT_MPLS to tc-testing config file - patch 5
(Davide Caratti)
v4-v5:
- move mpls_hdr() call to after skb_ensure_writable - patch 3
(Willem de Bruijn)
- move mpls_dec_ttl to helper - patch 4 (Willem de Bruijn)
- add iproute2 usage example to commit msg - patch 4 (David Ahern)
- align label validation with mpls core code - patch 4 (David Ahern)
- improve extack message for no proto in mpls pop - patch 4 (David Ahern)
v3-v4:
- refactor and reuse OvS code (Cong Wang)
- use csum API rather than skb_post*rscum to update skb->csum (Cong Wang)
- remove unnecessary warning (Cong Wang)
- add comments to uapi attributes (David Ahern)
- set strict type policy check for TCA_MPLS_UNSPEC (David Ahern)
- expand/improve extack messages (David Ahern)
- add option to manually set BOS
v2-v3:
- remove a few unnecessary line breaks (Jiri Pirko)
- retract hw offload patch from set (resubmit with driver changes) (Jiri)
v1->v2:
- ensure TCA_ID_MPLS does not conflict with TCA_ID_CTINFO (Davide Caratti)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 7 Jul 2019 14:01:58 +0000 (15:01 +0100)]
tc-tests: actions: add MPLS tests
Add a new series of selftests to verify the functionality of act_mpls in
TC.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 7 Jul 2019 14:01:57 +0000 (15:01 +0100)]
net: sched: add mpls manipulation actions to TC
Currently, TC offers the ability to match on the MPLS fields of a packet
through the use of the flow_dissector_key_mpls struct. However, as yet, TC
actions do not allow the modification or manipulation of such fields.
Add a new module that registers TC action ops to allow manipulation of
MPLS. This includes the ability to push and pop headers as well as modify
the contents of new or existing headers. A further action to decrement the
TTL field of an MPLS header is also provided with a new helper added to
support this.
Examples of the usage of the new action with flower rules to push and pop
MPLS labels are:
tc filter add dev eth0 protocol ip parent ffff: flower \
action mpls push protocol mpls_uc label 123 \
action mirred egress redirect dev eth1
tc filter add dev eth0 protocol mpls_uc parent ffff: flower \
action mpls pop protocol ipv4 \
action mirred egress redirect dev eth1
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 7 Jul 2019 14:01:56 +0000 (15:01 +0100)]
net: core: add MPLS update core helper and use in OvS
Open vSwitch allows the updating of an existing MPLS header on a packet.
In preparation for supporting similar functionality in TC, move this to a
common skb helper function.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 7 Jul 2019 14:01:55 +0000 (15:01 +0100)]
net: core: move pop MPLS functionality from OvS to core helper
Open vSwitch provides code to pop an MPLS header to a packet. In
preparation for supporting this in TC, move the pop code to an skb helper
that can be reused.
Remove the, now unused, update_ethertype static function from OvS.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Sun, 7 Jul 2019 14:01:54 +0000 (15:01 +0100)]
net: core: move push MPLS functionality from OvS to core helper
Open vSwitch provides code to push an MPLS header to a packet. In
preparation for supporting this in TC, move the push code to an skb helper
that can be reused.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv6: elide flowlabel check if no exclusive leases exist
Processes can request ipv6 flowlabels with cmsg IPV6_FLOWINFO.
If not set, by default an autogenerated flowlabel is selected.
Explicit flowlabels require a control operation per label plus a
datapath check on every connection (every datagram if unconnected).
This is particularly expensive on unconnected sockets multiplexing
many flows, such as QUIC.
In the common case, where no lease is exclusive, the check can be
safely elided, as both lease request and check trivially succeed.
Indeed, autoflowlabel does the same even with exclusive leases.
Elide the check if no process has requested an exclusive lease.
fl6_sock_lookup previously returns either a reference to a lease or
NULL to denote failure. Modify to return a real error and update
all callers. On return NULL, they can use the label and will elide
the atomic_dec in fl6_sock_release.
This is an optimization. Robust applications still have to revert to
requesting leases if the fast path fails due to an exclusive lease.
Changes RFC->v1:
- use static_key_false_deferred to rate limit jump label operations
- call static_key_deferred_flush to stop timers on exit
- move decrement out of RCU context
- defer optimization also if opt data is associated with a lease
- updated all fp6_sock_lookup callers, not just udp
Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: Reset bytes_acked and bytes_received when disconnecting
If an app is playing tricks to reuse a socket via tcp_disconnect(),
bytes_acked/received needs to be reset to 0. Otherwise tcp_info will
report the sum of the current and the old connection..
Cc: Eric Dumazet <edumazet@google.com> Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Fixes: bdd1f9edacb5 ("tcp: add tcpi_bytes_received to tcp_info") Signed-off-by: Christoph Paasch <cpaasch@apple.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vincent Bernat [Sat, 6 Jul 2019 21:01:08 +0000 (23:01 +0200)]
bonding: fix value exported by Netlink for peer_notif_delay
IFLA_BOND_PEER_NOTIF_DELAY was set to the value of downdelay instead
of peer_notif_delay. After this change, the correct value is exported.
Fixes: 07a4ddec3ce9 ("bonding: add an option to specify a delay between peer notifications") Signed-off-by: Vincent Bernat <vincent@bernat.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 5 Jul 2019 19:14:16 +0000 (20:14 +0100)]
coallocate socket_wq with socket itself
socket->wq is assign-once, set when we are initializing both
struct socket it's in and struct socket_wq it points to. As the
matter of fact, the only reason for separate allocation was the
ability to RCU-delay freeing of socket_wq. RCU-delaying the
freeing of socket itself gets rid of that need, so we can just
fold struct socket_wq into the end of struct socket and simplify
the life both for sock_alloc_inode() (one allocation instead of
two) and for tun/tap oddballs, where we used to embed struct socket
and struct socket_wq into the same structure (now - embedding just
the struct socket).
Note that reference to struct socket_wq in struct sock does remain
a reference - that's unchanged.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Al Viro [Fri, 5 Jul 2019 19:13:22 +0000 (20:13 +0100)]
sockfs: switch to ->free_inode()
we do have an RCU-delayed part there already (freeing the wq),
so it's not like the pipe situation; moreover, it might be
worth considering coallocating wq with the rest of struct sock_alloc.
->sk_wq in struct sock would remain a pointer as it is, but
the object it normally points to would be coallocated with
struct socket...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) Lots of libbpf improvements: i) addition of new APIs to attach BPF
programs to tracing entities such as {k,u}probes or tracepoints,
ii) improve specification of BTF-defined maps by eliminating the
need for data initialization for some of the members, iii) addition
of a high-level API for setting up and polling perf buffers for
BPF event output helpers, all from Andrii.
2) Add "prog run" subcommand to bpftool in order to test-run programs
through the kernel testing infrastructure of BPF, from Quentin.
3) Improve verifier for BPF sockaddr programs to support 8-byte stores
for user_ip6 and msg_src_ip6 members given clang tends to generate
such stores, from Stanislav.
4) Enable the new BPF JIT zero-extension optimization for further
riscv64 ALU ops, from Luke.
5) Fix a bpftool json JIT dump crash on powerpc, from Jiri.
6) Fix an AF_XDP race in generic XDP's receive path, from Ilya.
7) Various smaller fixes from Ilya, Yue and Arnd.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Unlike driver mode, generic xdp receive could be triggered
by different threads on different CPU cores at the same time
leading to the fill and rx queue breakage. For example, this
could happen while sending packets from two processes to the
first interface of veth pair while the second part of it is
open with AF_XDP socket.
Need to take a lock for each generic receive to avoid race.
Fixes: c497176cb2e4 ("xsk: add Rx receive functions and poll support") Signed-off-by: Ilya Maximets <i.maximets@samsung.com> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> Tested-by: William Tu <u9012063@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
David S. Miller [Mon, 8 Jul 2019 23:37:30 +0000 (16:37 -0700)]
Merge branch 'mp-inner-L3'
Stephen Suryaputra says:
====================
net: Multipath hashing on inner L3
This series extends commit 363887a2cdfe ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") to include support when the
outer L3 is IPv6 and to consider the case where the inner L3 is
different version from the outer L3, such as IPv6 tunneled by IPv4 GRE
or vice versa. It also includes kselftest scripts to test the use cases.
v2: Clarify the commit messages in the commits in this series to use the
term tunneled by IPv4 GRE or by IPv6 GRE so that it's clear which
one is the inner and which one is the outer (per David Miller).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: forwarding: Test multipath hashing on inner IP pkts for GRE tunnel
Add selftest scripts for multipath hashing on inner IP pkts when there
is a single GRE tunnel but there are multiple underlay routes to reach
the other end of the tunnel.
Four cases are covered in these scripts:
- IPv4 inner, IPv4 outer
- IPv6 inner, IPv4 outer
- IPv4 inner, IPv6 outer
- IPv6 inner, IPv6 outer
Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Make the same support as commit 363887a2cdfe ("ipv4: Support multipath
hashing on inner IP pkts for GRE tunnel") for outer IPv6. The hashing
considers both IPv4 and IPv6 pkts when they are tunneled by IPv6 GRE.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ipv4: Multipath hashing on inner L3 needs to consider inner IPv6 pkts
Commit 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts
for GRE tunnel") supports multipath policy value of 2, Layer 3 or inner
Layer 3 if present, but it only considers inner IPv4. There is a use
case of IPv6 is tunneled by IPv4 GRE, thus add the ability to hash on
inner IPv6 addresses.
Fixes: 363887a2cdfe ("ipv4: Support multipath hashing on inner IP pkts for GRE tunnel") Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wen Yang [Sat, 6 Jul 2019 04:23:41 +0000 (12:23 +0800)]
net: pasemi: fix an use-after-free in pasemi_mac_phy_init()
The phy_dn variable is still being used in of_phy_connect() after the
of_node_put() call, which may result in use-after-free.
Fixes: 1dd2d06c0459 ("net: Rework pasemi_mac driver to use of_mdio infrastructure") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: netdev@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Wen Yang [Sat, 6 Jul 2019 03:38:41 +0000 (11:38 +0800)]
net: axienet: fix a potential double free in axienet_probe()
There is a possible use-after-free issue in the axienet_probe():
1701: np = of_parse_phandle(pdev->dev.of_node, "axistream-connected", 0);
1702: if (np) {
...
1787: of_node_put(np); ---> released here
1788: lp->eth_irq = platform_get_irq(pdev, 0);
1789: } else {
...
1801: }
1802: if (IS_ERR(lp->dma_regs)) {
...
1805: of_node_put(np); ---> double released here
1806: goto free_netdev;
1807: }
We solve this problem by removing the unnecessary of_node_put().
Fixes: 28ef9ebdb64c ("net: axienet: make use of axistream-connected attribute optional") Signed-off-by: Wen Yang <wen.yang99@zte.com.cn> Cc: Anirudha Sarangi <anirudh@xilinx.com> Cc: John Linn <John.Linn@xilinx.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Michal Simek <michal.simek@xilinx.com> Cc: Robert Hancock <hancock@sedsystems.ca> Cc: netdev@vger.kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Reviewed-by: Robert Hancock <hancock@sedsystems.ca> Signed-off-by: David S. Miller <davem@davemloft.net>
DWMAC4 is capable to support clause 45 mdio communication.
This patch enable the feature on stmmac_mdio_write() and
stmmac_mdio_read() by following phy_write_mmd() and
phy_read_mmd() mdiobus read write implementation format.
Reviewed-by: Li, Yifan <yifan2.li@intel.com> Signed-off-by: Kweh Hock Leong <hock.leong.kweh@intel.com> Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com> Signed-off-by: Voon Weifeng <weifeng.voon@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
In this release cycle the number of NIC drivers using page_pool
will likely reach 4 drivers. It is about time to add a maintainer
entry. Add myself and Ilias.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Ilias Apalodimas <ilias.apalodimas@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jul 2019 22:50:06 +0000 (15:50 -0700)]
Merge branch 'mvpp2-cls-ether'
Maxime Chevallier says:
====================
net: mvpp2: Add classification based on the ETHER flow
This series adds support for classification of the ETHER flow in the
mvpp2 driver.
The first patch allows detecting when a user specifies a flow_type that
isn't supported by the driver, while the second adds support for this
flow_type by adding the mapping between the ETHER_FLOW enum value and
the relevant classifier flow entries.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Users can specify classification actions based on the 'ether' flow type.
In that case, this will apply to all ethernet traffic, superseeding
flows such as 'udp4' or 'tcp6'.
Add support for this flow type in the PPv2 classifier, by mapping the
ETHER_FLOW value to the corresponding entries in the classifier.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: txring_overwrite: fix incorrect test of mmap() return value
If mmap() fails it returns MAP_FAILED, which is defined as ((void *) -1).
The current if-statement incorrectly tests if *ring is NULL.
Fixes: 358be656406d ("selftests/net: add txring_overwrite") Signed-off-by: Frank de Brabander <debrabander@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jul 2019 22:35:17 +0000 (15:35 -0700)]
Merge branch 'vsock-virtio-fixes'
Stefano Garzarella says:
====================
vsock/virtio: several fixes in the .probe() and .remove()
During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
before registering the driver", Stefan pointed out some possible issues
in the .probe() and .remove() callbacks of the virtio-vsock driver.
This series tries to solve these issues:
- Patch 1 adds RCU critical sections to avoid use-after-free of
'the_virtio_vsock' pointer.
- Patch 2 stops workers before to call vdev->config->reset(vdev) to
be sure that no one is accessing the device.
- Patch 3 moves the works flush at the end of the .remove() to avoid
use-after-free of 'vsock' object.
v3:
- Patch 1: use rcu_dereference_protected() to get the_virtio_vosck value in
the virtio_vsock_probe() [Jason]
Before this series the guest crashes in a few second. After this series the
test runs (~12h) without issues.
Tested on an SMP guest (-smp 4 -monitor tcp:127.0.0.1:1234,server,nowait)
with these scripts to stress the .probe()/.remove() path:
vsock/virtio: fix flush of works during the .remove()
This patch moves the flush of works after vdev->config->del_vqs(vdev),
because we need to be sure that no workers run before to free the
'vsock' object.
Since we stopped the workers using the [tx|rx|event]_run flags,
we are sure no one is accessing the device while we are calling
vdev->config->reset(vdev), so we can safely move the workers' flush.
Before the vdev->config->del_vqs(vdev), workers can be scheduled
by VQ callbacks, so we must flush them after del_vqs(), to avoid
use-after-free of 'vsock' object.
Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Before to call vdev->config->reset(vdev) we need to be sure that
no one is accessing the device, for this reason, we add new variables
in the struct virtio_vsock to stop the workers during the .remove().
This patch also add few comments before vdev->config->reset(vdev)
and vdev->config->del_vqs(vdev).
Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Suggested-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vsock/virtio: use RCU to avoid use-after-free on the_virtio_vsock
Some callbacks used by the upper layers can run while we are in the
.remove(). A potential use-after-free can happen, because we free
the_virtio_vsock without knowing if the callbacks are over or not.
To solve this issue we move the assignment of the_virtio_vsock at the
end of .probe(), when we finished all the initialization, and at the
beginning of .remove(), before to release resources.
For the same reason, we do the same also for the vdev->priv.
We use RCU to be sure that all callbacks that use the_virtio_vsock
ended before freeing it. This is not required for callbacks that
use vdev->priv, because after the vdev->config->del_vqs() we are sure
that they are ended and will no longer be invoked.
We also take the mutex during the .remove() to avoid that .probe() can
run while we are resetting the device.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jul 2019 22:30:13 +0000 (15:30 -0700)]
Merge branch 'b53-docs'
Benedikt Spranger says:
====================
Document the configuration of b53
this is the third round to document the configuration of a b53 supported
switch.
v3..v2:
- fix a typo
- improve b53 configuration in DSA_TAG_PROTO_NONE showcase.
- grade up from RFC to patch for mainline inclusion.
v1..v2:
- split out generic parts of the configuration.
- target comments by Andrew Lunn and Florian Fainelli.
- make changes visible to build system
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Document the different needs of documentation for the b53 driver.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Document DSA tagged and VLAN based switch configuration by showcases.
Signed-off-by: Benedikt Spranger <b.spranger@linutronix.de> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Mon, 8 Jul 2019 21:53:04 +0000 (17:53 -0400)]
bnxt_en: add page_pool support
This removes contention over page allocation for XDP_REDIRECT actions by
adding page_pool support per queue for the driver. The performance for
XDP_REDIRECT actions scales linearly with the number of cores performing
redirect actions when using the page pools instead of the standard page
allocator.
v2: Fix up the error path from XDP registration, noted by Ilias Apalodimas.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Mon, 8 Jul 2019 21:53:03 +0000 (17:53 -0400)]
bnxt_en: optimized XDP_REDIRECT support
This adds basic support for XDP_REDIRECT in the bnxt_en driver. Next
patch adds the more optimized page pool support.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Mon, 8 Jul 2019 21:53:02 +0000 (17:53 -0400)]
bnxt_en: Refactor __bnxt_xmit_xdp().
__bnxt_xmit_xdp() is used by XDP_TX and ethtool loopback packet transmit.
Refactor it so that it can be re-used by the XDP_REDIRECT logic.
Restructure the TX interrupt handler logic to cleanly separate XDP_TX
logic in preparation for XDP_REDIRECT.
Acked-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Gospodarek [Mon, 8 Jul 2019 21:53:01 +0000 (17:53 -0400)]
bnxt_en: rename some xdp functions
Renaming bnxt_xmit_xdp to __bnxt_xmit_xdp to get ready for XDP_REDIRECT
support and reduce confusion/namespace collision.
Signed-off-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jul 2019 21:58:04 +0000 (14:58 -0700)]
Merge branch 'cpsw-Add-XDP-support'
Ivan Khoronzhuk says:
====================
net: ethernet: ti: cpsw: Add XDP support
This patchset adds XDP support for TI cpsw driver and base it on
page_pool allocator. It was verified on af_xdp socket drop,
af_xdp l2f, ebpf XDP_DROP, XDP_REDIRECT, XDP_PASS, XDP_TX.
It was verified with following configs enabled:
CONFIG_JIT=y
CONFIG_BPFILTER=y
CONFIG_BPF_SYSCALL=y
CONFIG_XDP_SOCKETS=y
CONFIG_BPF_EVENTS=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_JIT=y
CONFIG_CGROUP_BPF=y
Link on previous v7:
https://lkml.org/lkml/2019/7/4/715
Also regular tests with iperf2 were done in order to verify impact on
regular netstack performance, compared with base commit:
https://pastebin.com/JSMT0iZ4
v8..v9:
- fix warnings on arm64 caused by typos in type casting
v7..v8:
- corrected dma calculation based on headroom instead of hard start
- minor comment changes
v6..v7:
- rolled back to v4 solution but with small modification
- picked up patch:
https://www.spinics.net/lists/netdev/msg583145.html
- added changes related to netsec fix and cpsw
v5..v6:
- do changes that is rx_dev while redirect/flush cycle is kept the same
- dropped net: ethernet: ti: davinci_cpdma: return handler status
- other changes desc in patches
v4..v5:
- added two plreliminary patches:
net: ethernet: ti: davinci_cpdma: allow desc split while down
net: ethernet: ti: cpsw_ethtool: allow res split while down
- added xdp alocator refcnt on xdp level, avoiding page pool refcnt
- moved flush status as separate argument for cpdma_chan_process
- reworked cpsw code according to last changes to allocator
- added missed statistic counter
v3..v4:
- added page pool user counter
- use same pool for ndevs in dual mac
- restructured page pool create/destroy according to the last changes in API
v2..v3:
- each rxq and ndev has its own page pool
v1..v2:
- combined xdp_xmit functions
- used page allocation w/o refcnt juggle
- unmapped page for skb netstack
- moved rxq/page pool allocation to open/close pair
- added several preliminary patches:
net: page_pool: add helper function to retrieve dma addresses
net: page_pool: add helper function to unmap dma addresses
net: ethernet: ti: cpsw: use cpsw as drv data
net: ethernet: ti: cpsw_ethtool: simplify slave loops
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Mon, 8 Jul 2019 21:34:32 +0000 (00:34 +0300)]
net: ethernet: ti: cpsw: add XDP support
Add XDP support based on rx page_pool allocator, one frame per page.
Page pool allocator is used with assumption that only one rx_handler
is running simultaneously. DMA map/unmap is reused from page pool
despite there is no need to map whole page.
Due to specific of cpsw, the same TX/RX handler can be used by 2
network devices, so special fields in buffer are added to identify
an interface the frame is destined to. Thus XDP works for both
interfaces, that allows to test xdp redirect between two interfaces
easily. Also, each rx queue have own page pools, but common for both
netdevs.
XDP prog is common for all channels till appropriate changes are added
in XDP infrastructure. Also, once page_pool recycling becomes part of
skb netstack some simplifications can be added, like removing
page_pool_release_page() before skb receive.
In order to keep rx_dev while redirect, that can be somehow used in
future, do flush in rx_handler, that allows to keep rx dev the same
while redirect. It allows to conform with tracing rx_dev pointed
by Jesper.
Also, there is probability, that XDP generic code can be extended to
support multi ndev drivers like this one, using same rx queue for
several ndevs, based on switchdev for instance or else. In this case,
driver can be modified like exposed here:
https://lkml.org/lkml/2019/7/3/243
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Mon, 8 Jul 2019 21:34:31 +0000 (00:34 +0300)]
net: ethernet: ti: cpsw_ethtool: allow res split while down
That's possible to set channel num while interfaces are down. When
interface gets up it should resplit budget. This resplit can happen
after phy is up but only if speed is changed, so should be set before
this, for this allow it to happen while changing number of channels,
when interfaces are down.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Mon, 8 Jul 2019 21:34:30 +0000 (00:34 +0300)]
net: ethernet: ti: davinci_cpdma: allow desc split while down
That's possible to set ring params while interfaces are down. When
interface gets up it uses number of descs to fill rx queue and on
later on changes to create rx pools. Usually, this resplit can happen
after phy is up, but it can be needed before this, so allow it to
happen while setting number of rx descs, when interfaces are down.
Also, if no dependency on intf state, move it to cpdma layer, where
it should be.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>