Setup TPID's for vlan0 and vlan1 for Tx VLAN insertion offloads.
Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Krzysztof Kanas [Sun, 2 Dec 2018 12:47:45 +0000 (18:17 +0530)]
octeontx2-af: Add support for Tx packet marking
NIX_AF_MARK_FORMAT(0..127)_CTL register enables an SW defined
means to mark/insert various data in the packet based on
final packet color from traffic shaping HW.
0..127 works as an index to choose the algorithm. On success,
the mailbox returns the index to the client.
Add NIX_MARK_FORMAT_CFG mailbox which reserves mark format based on
tuple (offset, y_mask, y_val, r_mask, r_val)
If the tuple is requested again for mark format that was already
reserved, then it will be reused. If not it will reserve a new entry
if space is available.
Also on AF init commonly used marker format such as VLAN DEI, IPv4
ECN, IPv4 DSCP are reserved for AF consumers.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vamsi Attunuru [Sun, 2 Dec 2018 12:47:44 +0000 (18:17 +0530)]
octeontx2-af: Enable RSS with promiscuous mode
This patch adds support for enabling RSS in promiscuous mode
if RSS is already requested by the AF client.
Signed-off-by: Vamsi Attunuru <vamsi.attunuru@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jerin Jacob [Sun, 2 Dec 2018 12:47:42 +0000 (18:17 +0530)]
octeontx2-af: Enable inner IPv4 checksum and its error code
This patch enables the inner IPv4 checksum and
defines the error code for Rx inner and outer checksum errors.
Setting ERRCODE as 1 so that CQE descriptor can be embedded
valid checksum error code and the driver can interpret
checksum error as ERRLEV = LID + 1 and ERRCODE = 1.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: Allow freeing single TLx Tx schedule queue
The default behavior was to free all the TLx Tx schedule
queues. This patch adds support for freeing a single Tx
schedule queue if TXSCHQ_FREE_ALL flag is not set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
octeontx2-af: Restrict TL1 allocation and configuration
TL1 is the root node in the scheduling hierarchy and
it is a global resource with a limited number.
This patch introduces restriction and validation on
the allocation of the TL1 nodes for the effective resource
sharing across the AF consumers.
- Limit TL1 allocation to 2 per lmac.
One could be for the normal link and one for IEEE802.3br
express link (Express Send DMA).
Effectively all the VF's of an RVU PF(lmac) share the two TL1 schqs.
- TL1 cannot be freed once allocated.
- Allow VF's to only apply default config to TL1 if not
already applied. PF's can always overwrite the TL1 config.
- Consider NIX_AQ_INSTOP_WRITE while validating txschq
when sq.ena is set.
Signed-off-by: Krzysztof Kanas <kkanas@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jerin Jacob [Sun, 2 Dec 2018 12:47:39 +0000 (18:17 +0530)]
octeontx2-af: Add support for runtime RSS algo index reservation
Introduced reserve_flowkey_alg_idx()to reserve RSS algorithm index,
it would internally use set_flowkey_fields() to generate fields
based on the flow key dynamically.
On AF driver init, it would reserve a predefined set RSS algo indexes,
which will be available all the time for all the AF driver consumers.
The leftover algo indexes can be reserved at runtime through
exiting nix_rss_flowkey_cfg mailbox message.
The NIX_FLOW_KEY_TYPE_PORT is removed from predefined a set of RSS flow
type as it is not used by any consumer.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jerin Jacob [Sun, 2 Dec 2018 12:47:38 +0000 (18:17 +0530)]
octeontx2-af: Add support for dynamic flow cfg to RSS field generation
Introduce state-based algorithm to convert the flow_key value
to RSS algo field used by NIX_AF_RX_FLOW_KEY_ALGX_FIELDX register.
The outer `for loop` goes over _all_ protocol field and the following
variables depict the state machine forward progress logic.
a) keyoff_marker - Enabled when hash byte length needs to be accounted
in field->key_offset update.
b) field_marker - Enabled when a new field needs to be selected.
c) group_member - Enabled when a protocol is part of a group.
This would remove the existing hard coding and enable to add
new protocol support seamlessly.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: Nithin Dabilpuram <ndabilpuram@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jerin Jacob [Sun, 2 Dec 2018 12:47:37 +0000 (18:17 +0530)]
octeontx2-af: Add response for RSS flow key cfg message
Added response for nix_rss_flowkey_cfg message to return
selected RSS algorithm index.
The FLOW_KEY_TYPE* definition is part of the mbox message and
it will be used by the other consumers of AF driver hence moving to mbox.h.
Also renamed FLOW_* definitions to NIX_FLOW_* to avoid global
name space collisions, as we have various coming from
include/uapi/linux/pkt_cls.h for example.
Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Sun, 2 Dec 2018 12:47:36 +0000 (18:17 +0530)]
octeontx2-af: Skip NIXLF check for bcast MCE entry
At the time of initial broadcast packet replication table init,
NIXLFs are not yet attached to PF_FUNCs. Hence skipped checking
NIXLF while submitting MCE entry init instruction to NIX admin queue.
Also did a minor cleanup while installing bcast match entry in
packet parser unit i.e NPC.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: Jerin Jacob <jerinj@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Dec 2018 23:58:33 +0000 (15:58 -0800)]
Merge branch 'udp-msg_zerocopy'
Willem de Bruijn says:
====================
udp msg_zerocopy
Enable MSG_ZEROCOPY for udp sockets
Patch 1/3 is the main patch, a rework of RFC patch
http://patchwork.ozlabs.org/patch/899630/
more details in the patch commit message
Patch 2/3 is an optimization to remove a branch from the UDP hot path
and refcount_inc/refcount_dec_and_test pair when zerocopy is used.
This used to be included in the first patch in v2.
Patch 3/3 runs the already existing udp zerocopy tests
as part of kselftest
See also recent Linux Plumbers presentation
https://linuxplumbersconf.org/event/2/contributions/106/attachments/104/128/willemdebruijn-lpc2018-udpgso-presentation-20181113.pdf
Changes:
v1 -> v2
- Fixup reverse christmas tree violation
v2 -> v3
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case
(thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
v3 -> v4
- Move skb_zcopy_set below the only kfree_skb that might cause
a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that
cacheline access in one place
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Nov 2018 20:32:41 +0000 (15:32 -0500)]
selftests: extend zerocopy tests to udp
Both msg_zerocopy and udpgso_bench have udp zerocopy variants.
Exercise these as part of the standard kselftest run.
With udp, msg_zerocopy has no control channel. Ensure that the
receiver exits after the sender by accounting for the initial
delay in starting them (in msg_zerocopy.sh).
Signed-off-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Nov 2018 20:32:40 +0000 (15:32 -0500)]
udp: elide zerocopy operation in hot path
With MSG_ZEROCOPY, each skb holds a reference to a struct ubuf_info.
Release of its last reference triggers a completion notification.
The TCP stack in tcp_sendmsg_locked holds an extra ref independent of
the skbs, because it can build, send and free skbs within its loop,
possibly reaching refcount zero and freeing the ubuf_info too soon.
The UDP stack currently also takes this extra ref, but does not need
it as all skbs are sent after return from __ip(6)_append_data.
Avoid the extra refcount_inc and refcount_dec_and_test, and generally
the sock_zerocopy_put in the common path, by passing the initial
reference to the first skb.
This approach is taken instead of initializing the refcount to 0, as
that would generate error "refcount_t: increment on 0" on the
next skb_zcopy_set.
Changes
v3 -> v4
- Move skb_zcopy_set below the only kfree_skb that might cause
a premature uarg destroy before skb_zerocopy_put_abort
- Move the entire skb_shinfo assignment block, to keep that
cacheline access in one place
Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Willem de Bruijn [Fri, 30 Nov 2018 20:32:39 +0000 (15:32 -0500)]
udp: msg_zerocopy
Extend zerocopy to udp sockets. Allow setting sockopt SO_ZEROCOPY and
interpret flag MSG_ZEROCOPY.
This patch was previously part of the zerocopy RFC patchsets. Zerocopy
is not effective at small MTU. With segmentation offload building
larger datagrams, the benefit of page flipping outweights the cost of
generating a completion notification.
tools/testing/selftests/net/msg_zerocopy.sh after applying follow-on
test patch and making skb_orphan_frags_rx same as skb_orphan_frags:
Changes
v1 -> v2
- Fixup reverse christmas tree violation
v2 -> v3
- Split refcount avoidance optimization into separate patch
- Fix refcount leak on error in fragmented case
(thanks to Paolo Abeni for pointing this one out!)
- Fix refcount inc on zero
- Test sock_flag SOCK_ZEROCOPY directly in __ip_append_data.
This is needed since commit 5cf4a8532c99 ("tcp: really ignore
MSG_ZEROCOPY if no SO_ZEROCOPY") did the same for tcp.
Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 3 Dec 2018 23:44:27 +0000 (15:44 -0800)]
Merge tag 'wireless-drivers-next-for-davem-2018-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.21
First set of patches for 4.21. Most notable here is support for
Quantenna's QSR1000/QSR2000 chipsets and more flexible ways to provide
nvram files for brcmfmac.
Major changes:
brcmfmac
* add support for first trying to get a board specific nvram file
* add support for getting nvram contents from EFI variables
qtnfmac
* use single PCIe driver for all platforms and rename
Kconfig option CONFIG_QTNFMAC_PEARL_PCIE to CONFIG_QTNFMAC_PCIE
* add support for QSR1000/QSR2000 (Topaz) family of chipsets
ath10k
* add support for WCN3990 firmware crash recovery
* add firmware memory dump support for QCA4019
wil6210
* add firmware error recovery while in AP mode
ath9k
* remove experimental notice from dynack feature
iwlwifi
* PCI IDs for some new 9000-series cards
* improve antenna usage on connection problems
* new firmware debugging infrastructure
* some more work on 802.11ax
* improve support for multiple RF modules with 22000 devices
cordic
* move cordic macros and defines to a public header file
* convert brcmsmac and b43 to fully use cordic library
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
davinci_emac: read the MAC address from nvmem
This series is part of a bigger series that aims at removing the platform
data structure from the at24 EEPROM driver[1].
We provide a generalized version of of_get_nvmem_mac_address(), switch the
only user of the of_ variant to using it, remove the previous
implementation and use the new routine in the davinci_emac driver.
All DaVinci boards still supported in board files now define nvmem
cells containing the MAC address. We want to stop using the setup
callback from at24 so the MAC address for those users will no longer
be provided over platform data. If we didn't get a valid MAC in pdata,
try nvmem before resorting to a random MAC.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We've switched all users to nvmem_get_mac_address(). Remove the now
dead code.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Reviewed-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: cadence: switch to using nvmem_get_mac_address()
We now have a generalized helper routine to read the MAC address from
nvmem which takes struct device as argument. The nvmem subsystem will
then try device tree first before all other potential providers.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Acked-by: Nicolas Ferre <nicolas.ferre@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We already have of_get_nvmem_mac_address() but some non-DT systems want
to read the MAC address from NVMEM too. Implement a generalized routine
that takes struct device as argument.
Signed-off-by: Bartosz Golaszewski <bgolaszewski@baylibre.com> Signed-off-by: David S. Miller <davem@davemloft.net>
NeilBrown [Thu, 29 Nov 2018 23:26:50 +0000 (10:26 +1100)]
rhashtable: detect when object movement between tables might have invalidated a lookup
Some users of rhashtables might need to move an object from one table
to another - this appears to be the reason for the incomplete usage
of NULLS markers.
To support these, we store a unique NULLS_MARKER at the end of
each chain, and when a search fails to find a match, we check
if the NULLS marker found was the expected one. If not, the search
may not have examined all objects in the target bucket, so it is
repeated.
The unique NULLS_MARKER is derived from the address of the
head of the chain. As this cannot be derived at load-time the
static rhnull in rht_bucket_nested() needs to be initialised
at run time.
Any caller of a lookup function must still be prepared for the
possibility that the object returned is in a different table - it
might have been there for some time.
Note that this does NOT provide support for other uses of
NULLS_MARKERs such as allocating with SLAB_TYPESAFE_BY_RCU or changing
the key of an object and re-inserting it in the same table.
These could only be done safely if new objects were inserted
at the *start* of a hash chain, and that is not currently the case.
Signed-off-by: NeilBrown <neilb@suse.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Thu, 29 Nov 2018 16:40:59 +0000 (16:40 +0000)]
net: hns3: Adds support to dump(using ethool-d) PCIe regs in HNS3 PF driver
This patch adds support to dump PF PCIe registers using ethtool -d
for HNS3 PF Driver.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Thu, 29 Nov 2018 16:40:58 +0000 (16:40 +0000)]
net: hns3: Support "ethtool -d" for HNS3 VF driver
This patch adds "ethtool -d" support for HNS3 VF Driver.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Tue, 27 Nov 2018 21:30:14 +0000 (22:30 +0100)]
net: phy: improve generic EEE ethtool functions
So far the two functions consider neither member eee_enabled nor
eee_active. Therefore network drivers have to do this in some kind
of glue code. I think this can be avoided.
Getting EEE parameters:
When not advertising any EEE mode, we can't consider EEE to be enabled.
Therefore interpret "EEE enabled" as "we advertise at least one EEE
mode". It's similar with "EEE active": interpret it as "EEE modes
advertised by both link partner have at least one mode in common".
Setting EEE parameters:
If eee_enabled isn't set, don't advertise any EEE mode and restart
aneg if needed to switch off EEE. If eee_enabled is set and
data->advertised is empty (e.g. because EEE was disabled), advertise
everything we support as default. This way EEE can easily switched
on/off by doing ethtool --set-eee <if> eee on/off, w/o any additional
parameters.
The changes to both functions shouldn't break any existing user.
Once the changes have been applied, at least some users can be
simplified.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
v4 -> v5:
- move test script to its own patch (6/6)
- add schematic for test script
- apply David Ahern comments to the test script
v3 -> v4:
- rename vxlan_is_in_l3mdev_chain to netdev_is_upper master
- move it to net/core/dev.c
- make it return bool instead of int
- check if remote_ifindex is zero before resolving the l3mdev
- add testing script
v2 -> v3:
- fix build when CONFIG_NET_IPV6 is off
- fix build "unused l3mdev_master_upper_ifindex_by_index" build error with some
configs
v1 -> v2:
- move vxlan_get_l3mdev from vxlan driver to l3mdev driver as
l3mdev_master_upper_ifindex_by_index
- vxlan: rename variables named l3mdev_ifindex to ifindex
v0 -> v1:
- fix typos
We are trying to isolate the VXLAN traffic from different VMs with VRF as shown
in the schemas below:
ip link add green type vrf table 1
ip link set green up
ip link add eth0.2030 link eth0 type vlan id 2030
ip link set eth0.2030 master green
ip addr add 10.0.0.1/24 dev eth0.2030
ip link set eth0.2030 up
ip link add blue type vrf table 2
ip link set blue up
ip link add br-blue type bridge
ip link set br-blue master blue
ip link set br-blue up
ip link add vxlan-blue type vxlan id 2 local 10.0.0.1 dev eth0.2030 \
port 4789
ip link set vxlan-blue master br-blue
ip link set vxlan-blue up
ip link set tap-blue master br-blue
ip link set tap-blue up
ip link add red type vrf table 3
ip link set red up
ip link add br-red type bridge
ip link set br-red master red
ip link set br-red up
ip link add vxlan-red type vxlan id 3 local 10.0.0.1 dev eth0.2030 \
port 4789
ip link set vxlan-red master br-red
ip link set vxlan-red up
ip link set tap-red master br-red
ip link set tap-red up
We faced some issue in the datapath, here are the details:
* Egress traffic:
The vxlan packets are sent directly to the default VRF because it's where the
socket is bound, therefore the traffic has a default route via eth0. the
workaround is to force this traffic to VRF green with ip rules.
* Ingress traffic:
When receiving the traffic on eth0.2030 the vxlan socket is unreachable from
VRF green. The workaround is to enable *udp_l3mdev_accept* sysctl, but
this breaks isolation between overlay and underlay: packets sent from
blue or red by e.g. a guest VM will be accepted by the socket, allowing
injection of VXLAN packets from the overlay.
This patch series fixes the issues describe above by allowing VXLAN socket to be
bound to a specific VRF device therefore looking up in the correct table.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexis Bauvin [Mon, 3 Dec 2018 09:54:41 +0000 (10:54 +0100)]
test/net: Add script for VXLAN underlay in a VRF
This script tests the support of a VXLAN underlay in a non-default VRF.
It does so by simulating two hypervisors and two VMs, an extended L2
between the VMs with the hypervisors as VTEPs with the underlay in a
VRF, and finally by pinging the two VMs.
It also tests that moving the underlay from a VRF to another works when
down/up the VXLAN interface.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Reviewed-by: David Ahern <dsahern@gmail.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexis Bauvin [Mon, 3 Dec 2018 09:54:40 +0000 (10:54 +0100)]
vxlan: add support for underlay in non-default VRF
Creating a VXLAN device with is underlay in the non-default VRF makes
egress route lookup fail or incorrect since it will resolve in the
default VRF, and ingress fail because the socket listens in the default
VRF.
This patch binds the underlying UDP tunnel socket to the l3mdev of the
lower device of the VXLAN device. This will listen in the proper VRF and
output traffic from said l3mdev, matching l3mdev routing rules and
looking up the correct routing table.
When the VXLAN device does not have a lower device, or the lower device
is in the default VRF, the socket will not be bound to any interface,
keeping the previous behaviour.
The underlay l3mdev is deduced from the VXLAN lower device
(IFLA_VXLAN_LINK).
Alexis Bauvin [Mon, 3 Dec 2018 09:54:39 +0000 (10:54 +0100)]
l3mdev: add function to retreive upper master
Existing functions to retreive the l3mdev of a device did not walk the
master chain to find the upper master. This patch adds a function to
find the l3mdev, even indirect through e.g. a bridge:
Alexis Bauvin [Mon, 3 Dec 2018 09:54:38 +0000 (10:54 +0100)]
udp_tunnel: add config option to bind to a device
UDP tunnel sockets are always opened unbound to a specific device. This
patch allow the socket to be bound on a custom device, which
incidentally makes UDP tunnels VRF-aware if binding to an l3mdev.
Signed-off-by: Alexis Bauvin <abauvin@scaleway.com> Reviewed-by: Amine Kherbouche <akherbouche@scaleway.com> Tested-by: Amine Kherbouche <akherbouche@scaleway.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, drivers do not have the ability to control the firmware
loading policy and they always use their own fixed policy. This prevents
drivers from running the device with a different firmware version for
testing and/or debugging purposes. For example, testing a firmware bug
fix.
For these situations, the new devlink generic parameter,
'fw_load_policy', gives the ability to control this option and allows
drivers to run with a different firmware version than required by the
driver.
Patch #1 adds the new parameter to devlink. The other two patches, #2
and #3, add support for this parameter in the mlxsw driver.
Example:
# Query the devlink parameters supported by the device
$ devlink dev param show
pci/0000:03:00.0:
name fw_load_policy type generic
values:
cmode driverinit value driver
# Flash new firmware using ethtool
$ ethtool -f swp1 mellanox/mlxsw_spectrum-13.1703.4.mfa2
# Toggle parameter
$ devlink dev param set pci/0000:03:00.0 name fw_load_policy value flash cmode driverinit
# devlink reset
$ devlink dev reload pci/0000:03:00.0
# Query firmware version to show changes took affect
$ ethtool -i swp1
driver: mlxsw_spectrum
version: 1.0
firmware-version: 13.1703.4
expansion-rom-version:
bus-info: 0000:03:00.0
supports-statistics: yes
supports-test: no
supports-eeprom-access: no
supports-register-dump: no
supports-priv-flags: no
iproute2 patches available here:
https://github.com/tshalom/iproute2-next
v2:
* Change 'fw_version_check' to 'fw_load_policy' with values 'driver' and
'flash' (Jakub)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Mon, 3 Dec 2018 07:59:02 +0000 (07:59 +0000)]
mlxsw: spectrum: Load firmware version based on devlink parameter
Load firmware version based on 'fw_load_policy' devlink parameter. The
driver supports these two options:
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_DRIVER (0)
Default, load firmware version preferred by the driver
* DEVLINK_PARAM_FW_LOAD_POLICY_VALUE_FLASH (1)
Load firmware currently stored in flash
The second option, 'flash', allow the device to run with different firmware
version than preferred by the driver for testing and/or debugging purposes.
For example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Mon, 3 Dec 2018 07:59:01 +0000 (07:59 +0000)]
mlxsw: core: Reset firmware after flash during driver initialization
After flashing new firmware during the driver initialization flow (reload
or not), the driver should do a firmware reset when it gets -EAGAIN in
order to load the new one.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Mon, 3 Dec 2018 07:58:59 +0000 (07:58 +0000)]
devlink: Add 'fw_load_policy' generic parameter
Many drivers load the device's firmware image during the initialization
flow either from the flash or from the disk. Currently this option is not
controlled by the user and the driver decides from where to load the
firmware image.
'fw_load_policy' gives the ability to control this option which allows the
user to choose between different loading policies supported by the driver.
This parameter can be useful while testing and/or debugging the device. For
example, testing a firmware bug fix.
Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 3 Dec 2018 07:04:57 +0000 (08:04 +0100)]
net: phy: don't allow __set_phy_supported to add unsupported modes
Currently __set_phy_supported allows to add modes w/o checking whether
the PHY supports them. This is wrong, it should never add modes but
only remove modes we don't want to support.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Use memset to initialize the object to take compiler instrumentation out
of the equation.
Fixes: e58ba4544c77 ("net: usb: aqc111: Add support for wake on LAN by MAGIC packet") Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
YueHaibing [Thu, 29 Nov 2018 02:36:32 +0000 (02:36 +0000)]
net: qualcomm: rmnet: Remove set but not used variable 'cmd'
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c: In function 'rmnet_map_do_flow_control':
drivers/net/ethernet/qualcomm/rmnet/rmnet_map_command.c:23:36: warning:
variable 'cmd' set but not used [-Wunused-but-set-variable]
struct rmnet_map_control_command *cmd;
'cmd' not used anymore now, should also be removed.
Signed-off-by: YueHaibing <yuehaibing@huawei.com> Acked-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Wed, 28 Nov 2018 18:12:56 +0000 (19:12 +0100)]
tun: implement carrier change
The userspace may need to control the carrier state.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Didier Pallard <didier.pallard@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Wed, 28 Nov 2018 16:37:53 +0000 (17:37 +0100)]
net: reorder flowi_common fields to avoid holes
the flowi* structures are used and memsetted by server functions
in critical path. Currently flowi_common has a couple of holes that
we can eliminate reordering the struct fields. As a side effect,
both flowi4 and flowi6 shrink by 8 bytes.
====================
mlxsw: Add VxLAN support with VLAN-aware bridges
Commit 53e50a6ec24d ("Merge branch 'mlxsw-Add-VxLAN-support'") added
mlxsw support for VxLAN when the VxLAN device was enslaved to
VLAN-unaware bridges. This patchset extends mlxsw to also support VxLAN
with VLAN-aware bridges.
With VLAN-aware bridges, the VxLAN device's VNI is mapped to the VLAN
that is configured as 'pvid untagged' on the corresponding bridge port.
To prevent ambiguity, mlxsw forbids configurations in which the same
VLAN is configured as 'pvid untagged' on multiple VxLAN devices.
Patches #1-#2 add the necessary APIs in mlxsw and the bridge driver.
Patches #3-#4 perform small refactoring in order to prepare mlxsw for
VLAN-aware support.
Patch #5 finally enables the enslavement of VxLAN devices to a
VLAN-aware bridge. Among other things, it extends mlxsw to handle
switchdev notifications about VLAN add / delete on a VxLAN device
enslaved to an offloaded VLAN-aware bridge.
Patches #6-#8 add selftests to test the new functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:07:08 +0000 (20:07 +0000)]
selftests: forwarding: Add VxLAN test with a VLAN-aware bridge
The test is very similar to its VLAN-unaware counterpart
(vxlan_bridge_1d.sh), but instead of using multiple VLAN-unaware
bridges, a single VLAN-aware bridge is used with multiple VLANs.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:07:06 +0000 (20:07 +0000)]
selftests: mlxsw: Add a test for VxLAN configuration with a VLAN-aware bridge
Extend the existing VLAN-unaware tests with their VLAN-aware
counterparts. This includes sanitization of invalid configuration and
offload indication on the local route performing decapsulation and the
FDB entries perform encapsulation.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:07:04 +0000 (20:07 +0000)]
mlxsw: spectrum: Enable VxLAN enslavement to VLAN-aware bridges
Commit 1c30d1836aeb ("mlxsw: spectrum: Enable VxLAN enslavement to
bridges") enabled the enslavement of VxLAN devices to bridges that have
mlxsw ports (or their upper) as slaves. This patch extends mlxsw to also
support VLAN-aware bridges.
The patch is similar in nature to mentioned commit, but there is one
major difference. With VLAN-aware bridges, the VxLAN device's VNI is
mapped to the VLAN that is configured as PVID and egress untagged on the
bridge port.
Therefore, the driver is extended to listen to VLAN configuration on
VxLAN devices of interest and enable / disable NVE encapsulation on the
corresponding 802.1Q FIDs.
To prevent ambiguity, the driver makes sure that a given VLAN is not
configured as PVID and egress untagged on multiple VxLAN devices. This
sanitization takes place both when a port is enslaved to a bridge with
existing VxLAN devices and when a VLAN is added to / removed from a
VxLAN device of interest.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:07:02 +0000 (20:07 +0000)]
mlxsw: spectrum_switchdev: Prepare function for VLAN-aware bridges
The vxlan_join() function resolves the FID on which the VNI should be
set and then sets the VNI. Currently, the FID is simply resolved
according to the ifindex of the bridge device to which the VxLAN device
is enslaved. This works because only VLAN-unaware bridges are supported.
With VLAN-aware bridges the FID would need to be resolved based on the
VLAN to which the VNI is mapped to.
Add the VLAN ID to the argument list of the function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:07:01 +0000 (20:07 +0000)]
mlxsw: spectrum_switchdev: Unify VxLAN leave function
The function mlxsw_sp_bridge_vxlan_leave() is currently split between
VLAN-aware and VLAN-unaware bridges, but actually both types can use the
same function.
The function needs to resolve the FID that corresponds to the VxLAN
device and disable NVE encapsulation on it. Instead of looking up the
FID differently for VLAN-aware and VLAN-unaware bridges, we can always
use the VxLAN's device VNI.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:06:59 +0000 (20:06 +0000)]
mlxsw: spectrum_fid: Add API to lookup 802.1Q FIDs without creating them
In a similar fashion to commit 564c6d727aca ("mlxsw: spectrum_fid: Add
APIs to lookup FID without creating it"), add a corresponding API to
lookup 802.1Q FIDs.
This is a prerequisite to VxLAN support with VLAN-aware bridges and will
allow us to resolve a 802.1Q FID by its VLAN when an FDB entry is added
on the bridge port of the VxLAN device.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Wed, 28 Nov 2018 20:06:58 +0000 (20:06 +0000)]
net: bridge: Extend br_vlan_get_pvid() for bridge ports
Currently, the function only works for the bridge device itself, but
subsequent patches will need to be able to query the PVID of a given
bridge port, so extend the function.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Doorbell Overflow
If sufficient CPU cores will send doorbells at a sufficiently high rate, they
can cause an overflow in the doorbell queue block message fifo. When fill level
reaches maximum, the device stops accepting all doorbells from that PF until a
recovery procedure has taken place.
Doorbell Overflow Recovery
The recovery procedure basically means resending the last doorbell for every
doorbelling entity. A doorbelling entity is anything which may send doorbells:
L2 tx ring, rdma sq/rq/cq, light l2, vf l2 tx ring, spq, etc. This relies on
the design assumption that all doorbells are aggregative, so last doorbell
carries the information of all previous doorbells.
APIs
All doorbelling entities need to register with the mechanism before sending
doorbells. The registration entails providing the doorbell address the entity
would be using, and a virtual address where last doorbell data can be found.
Typically fastpath structures already have this construct.
Executing the recovery procedure
Handling the attentions, iterating over all the registered entities and
resending their doorbells, is all handled within qed core module.
Relevance
All doorbelling entities in all protocols need to register with the mechanism,
via the new APIs. Technically this is quite simple (just call the API). Some
protocol fastpath implementation may not have the doorbell data stored anywhere
(compute it from scratch every time) and will have to add such a place.
This is rare and is also better practice (save some cycles on the fastpath).
Performance Penalty
No performance penalty should incur as a result of this feature. If anything
performance can improve by avoiding recalcualtion of doorbell data everytime
doorbell is sent (in some flows).
Add the database used to register doorbelling entities, and APIs for adding
and deleting entries, and logic for traversing the database and doorbelling
once on behalf of all entities.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:07 +0000 (18:16 +0200)]
qede: Register l2 queues with doorbell overflow recovery mechanism
All L2 queues funnel through this flow, so this would cover the
regular RSS queues, as well queues created for VFs, mqos queues,
xdp queues, etc.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:06 +0000 (18:16 +0200)]
qed: Expose the doorbell overflow recovery mechanism to the protocol drivers
Most of the doorbelling entities are outside of the core module.
L2 queues, Roce queues, iscsi and fcoe all need to register.
Make the APIs available for these drivers.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:05 +0000 (18:16 +0200)]
qed: Register light L2 queues with doorbell overflow recovery mechanism
Light L2 queues are doorbelling entities. Modify the implementation
to keep the doorbell data necessary for doorbelling in well known
location instead of recomputing every time. Register the LL2 queue
with doorbell recovery mechanism.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:04 +0000 (18:16 +0200)]
qed: Register slowpath queue doorbell with doorbell overflow recovery mechanism
Slow path queue is a doorbelling entity. Register it with the overflow mechanism.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:03 +0000 (18:16 +0200)]
qed: Use the doorbell overflow recovery mechanism in case of doorbell overflow
In case of an attention from the doorbell queue block, analyze the HW
indications. In case of a doorbell overflow, execute a doorbell recovery.
Since there can be spurious indications (race conditions between multiple PFs),
schedule a periodic task for checking whether a doorbell overflow may have been
missed. After a set time with no indications, terminate the periodic task.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ariel Elior [Wed, 28 Nov 2018 16:16:02 +0000 (18:16 +0200)]
qed: Add doorbell overflow recovery mechanism
Add the database used to register doorbelling entities, and APIs for adding
and deleting entries, and logic for traversing the database and doorbelling
once on behalf of all entities.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com> Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
rtnetlink: avoid a warning in rtnl_newlink()
I've been hoping for some time that someone more competent would fix
the stack frame size warning in rtnl_newlink(), but looks like I'll
have to take a stab at it myself :) That's the only warning I see
in most of my builds.
First patch refactors away a somewhat surprising if (1) code block.
Reindentation will most likely cause cherry-pick problems but OTOH
rtnl_newlink() doesn't seem to be changed often, so perhaps we can
risk it in the name of cleaner code?
Second patch fixes the warning in simplest possible way. I was
pondering if there is any more clever solution, but I can't see it..
rtnl_newlink() is quite long with a lot of possible execution paths
so doing memory allocations half way through leads to very ugly
results.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:32:31 +0000 (22:32 -0800)]
rtnetlink: avoid frame size warning in rtnl_newlink()
Standard kernel compilation produces the following warning:
net/core/rtnetlink.c: In function ‘rtnl_newlink’:
net/core/rtnetlink.c:3232:1: warning: the frame size of 1288 bytes is larger than 1024 bytes [-Wframe-larger-than=]
}
^
This should not really be an issue, as rtnl_newlink() stack is
generally quite shallow.
Fix the warning by allocating attributes with kmalloc() in a wrapper
and passing it down to rtnl_newlink(), avoiding complexities on error
paths.
Alternatively we could kmalloc() some structure within rtnl_newlink(),
slave attributes look like a good candidate. In practice it adds to
already rather high complexity and length of the function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:32:30 +0000 (22:32 -0800)]
rtnetlink: remove a level of indentation in rtnl_newlink()
rtnl_newlink() used to create VLAs based on link kind. Since
commit ccf8dbcd062a ("rtnetlink: Remove VLA usage") statically
sized array is created on the stack, so there is no more use
for a separate code block that used to be the VLA's live range.
While at it christmas tree the variables. Note that there is
a goto-based retry so to be on the safe side the variables can
no longer be initialized in place. It doesn't seem to matter,
logically, but why make the code harder to read..
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: update TX path to enable repr offloads
This set starts with three micro optimizations to the TX path.
The improvement is measurable, but below 1% of CPU utilization.
Patches 4 - 9 add basic TX offloads to representor devices, like
checksum offload or TSO, and remove the unnecessary TX lock and
Qdisc (our representors are software constructs on top of the PF).
The last 2 patches add more info to error messages - id of command
which failed and exact location of incorrect TLVs, very useful for
debugging.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:58 +0000 (22:24 -0800)]
nfp: report more info when reconfiguration fails
FW reconfiguration timeouts are a common indicator of FW trouble.
To make debugging easier print requested update and control word
when reconfiguration fails.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:57 +0000 (22:24 -0800)]
nfp: add offset to all TLV parsing errors
When troubleshooting incorrect FW capabilities it's useful to know
where the faulty TLV is located. Add offset to all errors messages.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:56 +0000 (22:24 -0800)]
nfp: add offloads on representors
FW/HW can generally support the standard networking offloads
on representors without any trouble. Add the ability for FW
to advertise which features should be available on representors.
Because representors are muxed on top of the vNIC we need to listen
on feature changes of their lower devices, and update their features
appropriately.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:55 +0000 (22:24 -0800)]
nfp: add locking around representor changes
Up until now we never needed to keep a networking locks around
representors accesses, we only accessed them when device was
reconfigured (under nfp pf->lock) or on fast path (under RCU).
Now we want to be able to iterate over all representors during
notifications, so make sure representor assignment is done
under RTNL lock.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:54 +0000 (22:24 -0800)]
nfp: run don't require Qdiscs on representor netdevs
Our representors are software devices built on top of the PF
vNIC, the queuing should only happen at the vNIC netdevice.
Allow representors to run qdisc-less.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:53 +0000 (22:24 -0800)]
nfp: run representor TX locklessly
Our representors are software devices built on top of the PF
vNIC, the only state they have are per-cpu stats, so make
the TX run locklessly.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:52 +0000 (22:24 -0800)]
nfp: avoid oversized TSO headers with metadata prepend
In preparation for TSO over representors make sure the port id
prepend will always fit in the frame. The current max header
length is 255, which is ample, so assume worst case scenario
of 8 byte prepend and save ourselves the conditionals.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:51 +0000 (22:24 -0800)]
nfp: correct descriptor offsets in presence of metadata
The TSO-related offsets in the descriptor should not include
the length of the prepended metadata. Adjust them. Note that
this could not have caused issues in the past as we don't
support TSO with metadata prepend as of this patch.
Signed-off-by: Michael Rapson <michael.rapson@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:50 +0000 (22:24 -0800)]
nfp: move queue variable init
nd_q is only used at the very end of nfp_net_tx(), there is no need
to initialize it early.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:49 +0000 (22:24 -0800)]
nfp: move temporary variables in nfp_net_tx_complete()
Move temporary variables in scope of the loop in nfp_net_tx_complete(),
and add a temp for txbuf software structure. This saves us 0.2% of CPU.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 28 Nov 2018 06:24:48 +0000 (22:24 -0800)]
nfp: copy only the relevant part of the TX descriptor for frags
Chained descriptors for fragments need to duplicate all the descriptor
fields of the skb head, so we copy the descriptor and then modify the
relevant fields. This is wasteful, because the top half of the descriptor
will get overwritten entirely while the bottom half is not modified at all.
Copy only the bottom half. This saves us 0.3% of CPU in a GSO test.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tcp: take a bit more care of backlog stress
While working on the SACK compression issue Jean-Louis Dupond
reported, we found that his linux box was suffering very hard
from tail drops on the socket backlog queue.
First patch hints the compiler about sack flows being the norm.
Second patch changes non-sack code in preparation of the ack
compression.
Third patch fixes tcp_space() to take backlog into account.
Fourth patch is attempting coalescing when a new packet must
be added to the backlog queue. Cooking bigger skbs helps
to keep backlog list smaller and speeds its handling when
user thread finally releases the socket lock.
v3: Neal/Yuchung feedback addressed :
Do not aggregate if any skb has URG bit set.
Do not aggregate if the skbs have different ECE/CWR bits
v2: added feedback from Neal : tcp: take care of compressed acks in tcp_add_reno_sack()
added : tcp: hint compiler about sack flows
added : tcp: make tcp_space() aware of socket backlog
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 27 Nov 2018 22:42:03 +0000 (14:42 -0800)]
tcp: implement coalescing on backlog queue
In case GRO is not as efficient as it should be or disabled,
we might have a user thread trapped in __release_sock() while
softirq handler flood packets up to the point we have to drop.
This patch balances work done from user thread and softirq,
to give more chances to __release_sock() to complete its work
before new packets are added the the backlog.
This also helps if we receive many ACK packets, since GRO
does not aggregate them.
This patch brings ~60% throughput increase on a receiver
without GRO, but the spectacular gain is really on
1000x release_sock() latency reduction I have measured.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Neal Cardwell <ncardwell@google.com> Cc: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 27 Nov 2018 22:42:02 +0000 (14:42 -0800)]
tcp: make tcp_space() aware of socket backlog
Jean-Louis Dupond reported poor iscsi TCP receive performance
that we tracked to backlog drops.
Apparently we fail to send window updates reflecting the
fact that we are under stress.
Note that we might lack a proper window increase when
backlog is fully processed, since __release_sock() clears
sk->sk_backlog.len _after_ all skbs have been processed.
This should not matter in practice. If we had a significant
load through socket backlog, we are in a dangerous
situation.
Reported-by: Jean-Louis Dupond <jean-louis@dupond.be> Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Tested-by: Jean-Louis Dupond<jean-louis@dupond.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 27 Nov 2018 22:42:01 +0000 (14:42 -0800)]
tcp: take care of compressed acks in tcp_add_reno_sack()
Neal pointed out that non sack flows might suffer from ACK compression
added in the following patch ("tcp: implement coalescing on backlog queue")
Instead of tweaking tcp_add_backlog() we can take into
account how many ACK were coalesced, this information
will be available in skb_shinfo(skb)->gso_segs
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 27 Nov 2018 22:42:00 +0000 (14:42 -0800)]
tcp: hint compiler about sack flows
Tell the compiler that most TCP flows are using SACK these days.
There is no need to add the unlikely() clause in tcp_is_reno(),
the compiler is able to infer it.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Trace events are already present for the receive entry points, to indicate
how the reception entered the stack.
This patch adds the corresponding exit trace events that will bound the
reception such that all events occurring between the entry and the exit
can be considered as part of the reception context. This greatly helps
for dependency and root cause analyses.
Without this, it is not possible with tracepoint instrumentation to
determine whether a sched_wakeup event following a netif_receive_skb
event is the result of the packet reception or a simple coincidence after
further processing by the thread. It is possible using other mechanisms
like kretprobes, but considering the "entry" points are already present,
it would be good to add the matching exit events.
In addition to linking packets with wakeups, the entry/exit event pair
can also be used to perform network stack latency analyses.
Signed-off-by: Geneviève Bastien <gbastien@versatic.net> CC: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> CC: Steven Rostedt <rostedt@goodmis.org> CC: Ingo Molnar <mingo@redhat.com> CC: David S. Miller <davem@davemloft.net> Reviewed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> (tracing side) Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Tue, 27 Nov 2018 15:40:59 +0000 (15:40 +0000)]
net/flow_dissector: correct comments on enum flow_dissector_key_id
There are no such structs flow_dissector_key_flow_vlan or
flow_dissector_key_flow_tags, the actual structs used are struct
flow_dissector_key_vlan and struct flow_dissector_key_tags. So correct the
comments against FLOW_DISSECTOR_KEY_VLAN, FLOW_DISSECTOR_KEY_FLOW_LABEL and
FLOW_DISSECTOR_KEY_CVLAN to refer to those.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Tue, 27 Nov 2018 09:29:06 +0000 (14:59 +0530)]
cxgb4: number of VFs supported is not always 16
Total number of VFs supported by PF is used to determine the last
byte of VF's mac address. Number of VFs supported is not always
16, use the variable nvfs to get the number of VFs supported
rather than hard coding it to 16.
Signed-off-by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net-next* tree.
(Getting out bit earlier this time to pull in a dependency from bpf.)
The main changes are:
1) Add libbpf ABI versioning and document API naming conventions
as well as ABI versioning process, from Andrey.
2) Add a new sk_msg_pop_data() helper for sk_msg based BPF
programs that is used in conjunction with sk_msg_push_data()
for adding / removing meta data to the msg data, from John.
3) Optimize convert_bpf_ld_abs() for 0 offset and fix various
lib and testsuite build failures on 32 bit, from David.
4) Make BPF prog dump for !JIT identical to how we dump subprogs
when JIT is in use, from Yonghong.
5) Rename btf_get_from_id() to make it more conform with libbpf
API naming conventions, from Martin.
6) Add a missing BPF kselftest config item, from Naresh.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yonghong Song [Thu, 29 Nov 2018 23:31:45 +0000 (15:31 -0800)]
tools/bpf: make libbpf _GNU_SOURCE friendly
During porting libbpf to bcc, I got some warnings like below:
...
[ 2%] Building C object src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf.c.o
/home/yhs/work/bcc2/src/cc/libbpf/src/libbpf.c:12:0:
warning: "_GNU_SOURCE" redefined [enabled by default]
#define _GNU_SOURCE
...
[ 3%] Building C object src/cc/CMakeFiles/bpf-shared.dir/libbpf/src/libbpf_errno.c.o
/home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c: In function ‘libbpf_strerror’:
/home/yhs/work/bcc2/src/cc/libbpf/src/libbpf_errno.c:45:7:
warning: assignment makes integer from pointer without a cast [enabled by default]
ret = strerror_r(err, buf, size);
...
bcc is built with _GNU_SOURCE defined and this caused the above warning.
This patch intends to make libpf _GNU_SOURCE friendly by
. define _GNU_SOURCE in libbpf.c unless it is not defined
. undefine _GNU_SOURCE as non-gnu version of strerror_r is expected.
Signed-off-by: Yonghong Song <yhs@fb.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Eric Dumazet [Thu, 29 Nov 2018 15:56:20 +0000 (07:56 -0800)]
tcp: remove loop to compute wscale
We can remove the loop and conditional branches
and compute wscale efficiently thanks to ilog2()
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qede - Add a statistic for a case where driver drops tx packet due to memory allocation failure.
skb_linearization can fail due to memory allocation failure.
In such a case, the driver will drop the packet. In such a case
The driver used to print an error message.
This patch replaces this error message by a dedicated statistic.
Signed-off-by: Michael Shteinbok <michael.shteinbok@cavium.com> Signed-off-by: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Nov 2018 18:34:46 +0000 (10:34 -0800)]
Merge branch 'ave-suspend-resume'
Kunihiko Hayashi says:
====================
Add suspend/resume support for AVE ethernet driver
This series adds support for suspend/resume to AVE ethernet driver.
And to avoid the error that wol state of phy hardware is enabled by default,
this sets initial wol state to disabled and add preservation the state in
suspend/resume sequence.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kunihiko Hayashi [Thu, 29 Nov 2018 08:06:32 +0000 (17:06 +0900)]
net: ethernet: ave: Set initial wol state to disabled
If wol state of phy hardware is enabled after reset, phy_ethtool_get_wol()
returns that wol.wolopts is true.
However, since net_device.wol_enabled is zero and this doesn't apply wol
state until calling ethtool_set_wol(), so mdio_bus_phy_may_suspend()
returns true, that is, it's in a state where phy can suspend even though
wol state is enabled.
In this inconsistency, phy_suspend() returns -EBUSY, and at last,
suspend sequence fails with the following message:
dpm_run_callback(): mdio_bus_phy_suspend+0x0/0x58 returns -16
PM: Device 65000000.ethernet-ffffffff:01 failed to suspend: error -16
PM: Some devices failed to suspend, or early wake event detected
In order to fix the above issue, this patch forces to set initial wol state
to disabled as default.
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lyude Paul [Sat, 24 Nov 2018 22:57:05 +0000 (17:57 -0500)]
brcmfmac: Fix out of bounds memory access during fw load
I ended up tracking down some rather nasty issues with f2fs (and other
filesystem modules) constantly crashing on my kernel down to a
combination of out of bounds memory accesses, one of which was coming
from brcmfmac during module load:
[ 30.891382] brcmfmac: brcmf_fw_alloc_request: using brcm/brcmfmac4356-sdio for chip BCM4356/2
[ 30.894437] ==================================================================
[ 30.901581] BUG: KASAN: global-out-of-bounds in brcmf_fw_alloc_request+0x42c/0x480 [brcmfmac]
[ 30.909935] Read of size 1 at addr ffff2000024865df by task kworker/6:2/387
[ 30.916805]
[ 30.918261] CPU: 6 PID: 387 Comm: kworker/6:2 Tainted: G O 4.20.0-rc3Lyude-Test+ #19
[ 30.927251] Hardware name: amlogic khadas-vim2/khadas-vim2, BIOS 2018.07-rc2-armbian 09/11/2018
[ 30.935964] Workqueue: events brcmf_driver_register [brcmfmac]
[ 30.941641] Call trace:
[ 30.944058] dump_backtrace+0x0/0x3e8
[ 30.947676] show_stack+0x14/0x20
[ 30.950968] dump_stack+0x130/0x1c4
[ 30.954406] print_address_description+0x60/0x25c
[ 30.959066] kasan_report+0x1b4/0x368
[ 30.962683] __asan_report_load1_noabort+0x18/0x20
[ 30.967547] brcmf_fw_alloc_request+0x42c/0x480 [brcmfmac]
[ 30.967639] brcmf_sdio_probe+0x163c/0x2050 [brcmfmac]
[ 30.978035] brcmf_ops_sdio_probe+0x598/0xa08 [brcmfmac]
[ 30.983254] sdio_bus_probe+0x190/0x398
[ 30.983270] really_probe+0x2a0/0xa70
[ 30.983296] driver_probe_device+0x1b4/0x2d8
[ 30.994901] __driver_attach+0x200/0x280
[ 30.994914] bus_for_each_dev+0x10c/0x1a8
[ 30.994925] driver_attach+0x38/0x50
[ 30.994935] bus_add_driver+0x330/0x608
[ 30.994953] driver_register+0x140/0x388
[ 31.013965] sdio_register_driver+0x74/0xa0
[ 31.014076] brcmf_sdio_register+0x14/0x60 [brcmfmac]
[ 31.023177] brcmf_driver_register+0xc/0x18 [brcmfmac]
[ 31.023209] process_one_work+0x654/0x1080
[ 31.032266] worker_thread+0x4f0/0x1308
[ 31.032286] kthread+0x2a8/0x320
[ 31.039254] ret_from_fork+0x10/0x1c
[ 31.039269]
[ 31.044226] The buggy address belongs to the variable:
[ 31.044351] brcmf_firmware_path+0x11f/0xfffffffffffd3b40 [brcmfmac]
[ 31.055601]
[ 31.057031] Memory state around the buggy address:
[ 31.061800] ffff200002486480: 04 fa fa fa fa fa fa fa 00 00 00 00 00 00 00 00
[ 31.068983] ffff200002486500: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.068993] >ffff200002486580: 00 00 00 00 00 00 00 00 fa fa fa fa 00 00 00 00
[ 31.068999] ^
[ 31.069017] ffff200002486600: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
[ 31.096521] ffff200002486680: 00 00 00 00 00 00 00 00 00 00 00 00 fa fa fa fa
[ 31.096528] ==================================================================
[ 31.096533] Disabling lock debugging due to kernel taint
It appears that when trying to determine the length of the string in the
alternate firmware path, we make the mistake of not handling the case
where the firmware path is empty correctly. Since strlen(mp_path) can
return 0, we'll end up accessing mp_path[-1] when the firmware_path
isn't provided through the module arguments.
So, fix this by just setting the end char to '\0' by default, and only
changing it if we have a non-zero length. Additionally, use strnlen()
with BRCMF_FW_ALTPATH_LEN instead of strlen() just to be extra safe.
Fixes: 2baa3aaee27f ("brcmfmac: introduce brcmf_fw_alloc_request() function") Cc: Hante Meuleman <hante.meuleman@broadcom.com> Cc: Pieter-Paul Giesberts <pieter-paul.giesberts@broadcom.com> Cc: Franky Lin <franky.lin@broadcom.com> Cc: Arend van Spriel <arend.vanspriel@broadcom.com> Cc: Kalle Valo <kvalo@codeaurora.org> Cc: Arend Van Spriel <arend.vanspriel@broadcom.com> Cc: Himanshu Jha <himanshujha199640@gmail.com> Cc: Dan Haab <dhaab@luxul.com> Cc: Jia-Shyr Chuang <saint.chuang@cypress.com> Cc: Ian Molton <ian@mnementh.co.uk> Cc: <stable@vger.kernel.org> # v4.17+ Signed-off-by: Lyude Paul <lyude@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Hans de Goede [Fri, 23 Nov 2018 09:11:48 +0000 (10:11 +0100)]
brcmfmac: Call brcmf_dmi_probe before brcmf_of_probe
ARM systems with UEFI may have both devicetree (of) and DMI data in this
case we end up setting brcmf_mp_device.board_type twice.
In this case we should prefer the devicetree data, because:
1) The devicerree data is more reliable
2) Some ARM systems (e.g. the Raspberry Pi 3 models) support both UEFI and
classic uboot booting, the devicetree data is always there, so using it
makes sure we ask for the same nvram file independent of how we booted.
This commit moves the brcmf_dmi_probe call to before the brcmf_of_probe
call, so that the latter can override the value of the first if both are
set.
Fixes: bd1e82bb420a ("brcmfmac: Set board_type from DMI on x86 based ...") Cc: Peter Robinson <pbrobinson@gmail.com> Tested-and-reported-by: Peter Robinson <pbrobinson@gmail.com> Signed-off-by: Hans de Goede <hdegoede@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Dan Haab [Fri, 9 Nov 2018 16:38:55 +0000 (09:38 -0700)]
brcmfmac: support STA info struct v7
The newest firmwares provide STA info using v7 of the struct. As v7
isn't backward compatible, a union is needed.
Even though brcmfmac does not use any of the new info it's important to
provide the proper struct buffer. Without this change new firmwares will
fallback to the very limited v3 instead of something in between such as
v4.
Signed-off-by: Dan Haab <dan.haab@luxul.com> Reviewed-by: Rafał Miłecki <rafal@milecki.pl> Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Larry Finger [Mon, 19 Nov 2018 18:01:24 +0000 (20:01 +0200)]
b43: Fix error in cordic routine
The cordic routine for calculating sines and cosines that was added in
commit 6f98e62a9f1b ("b43: update cordic code to match current specs")
contains an error whereby a quantity declared u32 can in fact go negative.
This problem was detected by Priit Laes who is switching b43 to use the
routine in the library functions of the kernel.
Fixes: 986504540306 ("b43: make cordic common (LP-PHY and N-PHY need it)") Reported-by: Priit Laes <plaes@plaes.org> Cc: Rafał Miłecki <zajec5@gmail.com> Cc: Stable <stable@vger.kernel.org> # 2.6.34 Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Priit Laes <plaes@plaes.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Priit Laes [Mon, 19 Nov 2018 18:01:22 +0000 (20:01 +0200)]
lib: cordic: Move cordic macros and defines to header file
Now that these macros are in header file, we can eventually
clean up the duplicate macros present in the drivers that
utilize the same cordic algorithm implementation.
Also add CORDIC_ prefix to nonprefixed macros.
Reviewed-by: Arend van Spriel <arend.vanspriel@broadcom.com> Signed-off-by: Priit Laes <plaes@plaes.org> Acked-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Kalle Valo [Thu, 29 Nov 2018 09:00:00 +0000 (11:00 +0200)]
Merge tag 'iwlwifi-next-for-kalle-2018-11-23' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-next
second batch of iwlwifi patches intended for v4.21
* New FW debugging infrastructure;
* Some more work on 802.11ax;
* Improve support for multiple RF modules with 22000 devices;
* Remove an unused FW parameter;
* Other debugging improvements;
David S. Miller [Thu, 29 Nov 2018 07:18:16 +0000 (23:18 -0800)]
Merge tag 'linux-can-next-for-4.21-20181128' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can-next
Marc Kleine-Budde says:
====================
This is a pull request for net-next/master consisting of 18 patches.
The first patch is by Colin Ian King and fixes the spelling in the ucan
driver.
The next three patches target the xilinx driver. YueHaibing's patch
fixes the return type of ndo_start_xmit function. Two patches by
Shubhrajyoti Datta add support for the CAN FD 2.0 controllers.
Flavio Suligoi's patch for the sja1000 driver add support for the ASEM
CAN raw hardware.
Wolfram Sang's and Kuninori Morimoto's patches switch the rcar driver to
use SPDX license identifiers.
The remaining 111 patches improve the flexcan driver. Pankaj Bansal's
patch enables the driver in Kconfig on all architectures with IOMEM
support. The next four patches by me fix indention, add missing
parentheses and comments. Aisheng Dong's patches add self wake support
and document it in the DT bindings. The remaining patches by Pankaj
Bansal first fix the loopback support and prepare the driver for the
CAN-FD support needed for the LX2160A SoC. The actual CAN-FD support
will be added in a later patch series.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>