mlx5e_mpwqe_get_log_rq_size calculates the number of WQEs (N) based on
the requested number of frames in the RQ (F) and the number of packets
per WQE (P). It ensures that N is not less than the minimum number of
WQEs in an RQ (N_min). Arithmetically, it means that F / P >= N_min
should be true. This function deals with logarithms, so it should check
that log(F) - log(P) >= log(N_min). However, if F < P, this expression
will cause an unsigned underflow. Check log(F) >= log(P) + log(N_min)
instead.
If the channels fail to reopen after setting an XDP program, return the
error code instead of 0. A proper fix is still needed, as now any error
while reopening the channels brings the interface down. This patch only
adds error reporting.
Shay Agroskin [Thu, 14 Mar 2019 12:54:07 +0000 (14:54 +0200)]
net/mlx5e: XDP, Inline small packets into the TX MPWQE in XDP xmit flow
Upon high packet rate with multiple CPUs TX workloads, much of the HCA's
resources are spent on prefetching TX descriptors, thus affecting
transmission rates.
This patch comes to mitigate this problem by moving some workload to the
CPU and reducing the HW data prefetch overhead for small packets (<= 256B).
When forwarding packets with XDP, a packet that is smaller
than a certain size (set to ~256 bytes) would be sent inline within
its WQE TX descrptor (mem-copied), when the hardware tx queue is congested
beyond a pre-defined water-mark.
This is added to better utilize the HW resources (which now makes
one less packet data prefetch) and allow better scalability, on the
account of CPU usage (which now 'memcpy's the packet into the WQE).
To load balance between HW and CPU and get max packet rate, we use
watermarks to detect how much the HW is congested and move the work
loads back and forth between HW and CPU.
Performance:
Tested packet rate for UDP 64Byte multi-stream
over two dual port ConnectX-5 100Gbps NICs.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
* Tested with hyper-threading disabled
XDP_TX:
| | before | after | |
| 24 rings | 51Mpps | 116Mpps | +126% |
| 1 ring | 12Mpps | 12Mpps | same |
XDP_REDIRECT:
** Below is the transmit rate, not the redirection rate
which might be larger, and is not affected by this patch.
| | before | after | |
| 32 rings | 64Mpps | 92Mpps | +43% |
| 1 ring | 6.4Mpps | 6.4Mpps | same |
As we can see, feature significantly improves scaling, without
hurting single ring performance.
Tariq Toukan [Wed, 27 Feb 2019 10:06:08 +0000 (12:06 +0200)]
net/mlx5e: RX, Support multiple outstanding UMR posts
The buffers mapping of the Multi-Packet WQEs (of Striding RQ)
is done via UMR posts, one UMR WQE per an RX MPWQE.
A single MPWQE is capable of serving many incoming packets,
usually larger than the budget of a single napi cycle.
Hence, posting a single UMR WQE per napi cycle (and handling its
completion in the next cycle) works fine in many common cases,
but not always.
When an XDP program is loaded, every MPWQE is capable of serving less
packets, to satisfy the packet-per-page requirement.
Thus, for the same number of packets more MPWQEs (and UMR posts)
are needed (twice as much for the default MTU), giving less latency
room for the UMR completions.
In this patch, we add support for multiple outstanding UMR posts,
to allow faster gap closure between consuming MPWQEs and reposting
them back into the WQ.
For better SW and HW locality, we combine the UMR posts in bulks of
(at least) two.
This is expected to improve packet rate in high CPU scale.
Performance test:
As expected, huge improvement in large-scale (48 cores).
xdp_redirect_map, 64B UDP multi-stream.
Redirect from ConnectX-5 100Gbps to ConnectX-6 100Gbps.
CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz.
Before: Unstable, 7 to 30 Mpps
After: Stable, at 70.5 Mpps
Merge tag 'v5.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux into mlx5-next
Linux 5.1-rc1
We forgot to reset the branch last merge window thus mlx5-next is outdated
and still based on 5.0-rc2. This merge commit is needed to sync mlx5-next
branch with 5.1-rc1.
David Ahern [Mon, 22 Apr 2019 00:39:18 +0000 (17:39 -0700)]
ipv6: Restore RTF_ADDRCONF check in rt6_qualify_for_ecmp
The RTF_ADDRCONF flag filters out routes added by RA's in determining
which routes can be appended to an existing one to create a multipath
route. Restore the flag check and add a comment to document the RA piece.
Fixes: 4e54507ab1a9 ("ipv6: Simplify rt6_qualify_for_ecmp") Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Sun, 21 Apr 2019 15:49:01 +0000 (08:49 -0700)]
ipv6: Simplify rt6_qualify_for_ecmp
After commit c7a1ce397ada ("ipv6: Change addrconf_f6i_alloc to use
ip6_route_info_create"), the gateway is no longer filled in for fib6_nh
structs in a prefix route. Accordingly, the RTF_ADDRCONF flag check can
be dropped from the 'rt6_qualify_for_ecmp'.
Further, RTF_DYNAMIC is only set in rt6_info instances, so it can be
removed from the check as well.
This reduces rt6_qualify_for_ecmp and the mlxsw version to just checking
if the nexthop has a gateway which is the real indication of whether
entries can be coalesced into a multipath route.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Apr 2019 17:31:45 +0000 (10:31 -0700)]
Merge branch 'mlxsw-Small-routing-improvements'
Ido Schimmel says:
====================
mlxsw: Small routing improvements
Patch #1 switches the driver to use a unique and stable ECMP/LAG seed.
This allows for consistent behavior across reboots and avoids hash
polarization at the same time.
Patch #2 relaxes the FIB rule validation in the driver to allow the
installation of rules that direct locally generated traffic (iif=lo).
This does not result in a discrepancy between both data paths because
packets received by the device would never match such rules.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, mlxsw does not support policy-based routing (PBR) and
therefore forbids the installation of non-default FIB rules except for
the l3mdev rule which is used for VRFs.
Relax the check to allow the installation of FIB rules that would never
match packets received by the device. Specifically, if the iif is that
of the loopback netdev. This is useful for users that need to redirect
locally generated packets based on FIB rules.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Tested-by: Alexander Petrovskiy <alexpe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Pablo Cascón [Sat, 20 Apr 2019 00:20:09 +0000 (17:20 -0700)]
nfp: add SR-IOV trusted VF support
By default VFs are not trusted. Add ndo_set_vf_trust support to toggle
a new per-VF bit. Coupled with FW with this capability allows a
trusted VF to change its MAC even after being administratively set by
the PF. Also populate the trusted field on ndo_get_vf_config. Add the
same ndo to the representors.
Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 19 Apr 2019 03:05:47 +0000 (11:05 +0800)]
net: hns3: add function type check for debugfs help information
PF supports all debugfs command, but VF only supports part of
debugfs command. So VF should not show unsupported help information.
This patch adds a check for PF and PF to show the supportable help
information.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: add queue's statistics update to service task
This patch updates VF's TQP statistic info in the service task,
and adds a limitation to prevent update too frequently.
Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Weihang Li [Fri, 19 Apr 2019 03:05:44 +0000 (11:05 +0800)]
net: hns3: add support for dump ncl config by debugfs
This patch allow users to dump content of NCL_CONFIG by using debugfs
command.
Command format:
echo dump ncl_config <offset> <length> > cmd
It will print as follows:
hns3 0000:7d:00.0: offset | data
hns3 0000:7d:00.0: 0x0000 | 0x00000020
hns3 0000:7d:00.0: 0x0004 | 0x00000400
hns3 0000:7d:00.0: 0x0008 | 0x08020401
Signed-off-by: Weihang Li <liweihang@hisilicon.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 19 Apr 2019 03:05:43 +0000 (11:05 +0800)]
net: hns3: Add support for netif message level settings
This patch adds support for network interface message level
settings. The message level can be changed by module parameter
or ethtool.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: dump more information when tx timeout happens
Currently we just print few information when tx timeout happens.
In order to find out the cause of timeout, this patch prints more
information about the packet statistics, tqp registers and
napi state.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: fix loop condition of hns3_get_tx_timeo_queue_info()
In function hns3_get_tx_timeo_queue_info(), it should use
netdev->num_tx_queues, instead of netdve->real_num_tx_queues
as the loop limitation.
Fixes: 424eb834a9be ("net: hns3: Unified HNS3 {VF|PF} Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In current codes, tx_timeout_cnt is used before increased,
then we can see the tx_timeout_count is still 0 from the
print when tx timeout happens, e.g.
"hns3 0000:7d:00.3 eth3: tx_timeout count: 0, queue id: 0, SW_NTU:
0xa6, SW_NTC: 0xa4, HW_HEAD: 0xa4, HW_TAIL: 0xa6, INT: 0x1"
The tx_timeout_cnt should be updated before used.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 19 Apr 2019 03:05:39 +0000 (11:05 +0800)]
net: hns3: add some debug info for hclgevf_get_mbx_resp()
When wait for response timeout or response code not match, there
should be more information for debugging.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 19 Apr 2019 03:05:38 +0000 (11:05 +0800)]
net: hns3: add some debug information for hclge_check_event_cause
When received vector0 msix and other event interrupt, it should
print the value of the register for debugging.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 19 Apr 2019 03:05:37 +0000 (11:05 +0800)]
net: hns3: add reset statistics for VF
This patch adds some statistics for VF reset.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 19 Apr 2019 03:05:36 +0000 (11:05 +0800)]
net: hns3: add reset statistics info for PF
This patch adds statistics for PF's reset information,
also, provides a debugfs command to dump these statistics.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 19 Apr 2019 23:02:03 +0000 (16:02 -0700)]
tcp: properly reset skb->truesize for tx recycling
tcp sendmsg() and sendpage() normally advance skb->data_len
and skb->truesize by the payload added to an skb.
But sendmsg(fd, ..., MSG_ZEROCOPY) has to account for whole pages,
even if a single byte of payload is used in the page.
This means that we can not assume skb->truesize can be adjusted
by skb->data_len. We must instead overwrite its value.
Otherwise skb->truesize is too big and can hit socket sndbuf limit,
especially if the skb is recycled multiple times :/
Fixes: 472c2e07eef0 ("tcp: add one skb cache for tx") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Assume CONFIG_NET_DEVLINK is always enabled
Since commit f6b19b354d50 ("net: devlink: select NET_DEVLINK
from drivers") adds implicit select of NET_DEVLINK for
mlxsw, the code does not have to deal with the case
when CONFIG_NET_DEVLINK is not enabled. So remove the ifcase
and adjust Makefile.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: introduce new socket option TIPC_SOCK_RECVQ_USED
When using TIPC_SOCK_RECVQ_DEPTH for getsockopt(), it returns the
number of buffers in receive socket buffer which is not so helpful
for user space applications.
This commit introduces the new option TIPC_SOCK_RECVQ_USED which
returns the current allocated bytes of the receive socket buffer.
This helps user space applications dimension its buffer usage to
avoid buffer overload issue.
Signed-off-by: Tung Nguyen <tung.q.nguyen@dektech.com.au> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Thu, 18 Apr 2019 01:11:39 +0000 (03:11 +0200)]
net: dsa: mv88e6xxx: Only reconfigure MAC when something changes
phylink will call the mac_config() callback once per second when
polling a PHY or a fixed link. The MAC driver is not supposed to
reconfigure the MAC if nothing has changed.
Make the mv88e6xxx driver look at the current configuration of the
port, and return early if nothing has changed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The 'timeval' and 'timespec' data structures used for socket timestamps
are going to be redefined in user space based on 64-bit time_t in future
versions of the C library to deal with the y2038 overflow problem,
which breaks the ABI definition.
Unlike many modern ioctl commands, SIOCGSTAMP and SIOCGSTAMPNS do not
use the _IOR() macro to encode the size of the transferred data, so it
remains ambiguous whether the application uses the old or new layout.
The best workaround I could find is rather ugly: we redefine the command
code based on the size of the respective data structure with a ternary
operator. This lets it get evaluated as late as possible, hopefully after
that structure is visible to the caller. We cannot use an #ifdef here,
because inux/sockios.h might have been included before any libc header
that could determine the size of time_t.
The ioctl implementation now interprets the new command codes as always
referring to the 64-bit structure on all architectures, while the old
architecture specific command code still refers to the old architecture
specific layout. The new command number is only used when they are
actually different.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
ia64, parisc and sparc just use a copy of the generic version
of asm/sockios.h, and x86 is a redirect to the same file, so we
can just let the header file be generated.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
The SIOCGSTAMP/SIOCGSTAMPNS ioctl commands are implemented by many
socket protocol handlers, and all of those end up calling the same
sock_get_timestamp()/sock_get_timestampns() helper functions, which
results in a lot of duplicate code.
With the introduction of 64-bit time_t on 32-bit architectures, this
gets worse, as we then need four different ioctl commands in each
socket protocol implementation.
To simplify that, let's add a new .gettstamp() operation in
struct proto_ops, and move ioctl implementation into the common
sock_ioctl()/compat_sock_ioctl_trans() functions that these all go
through.
We can reuse the sock_get_timestamp() implementation, but generalize
it so it can deal with both native and compat mode, as well as
timeval and timespec structures.
Acked-by: Stefan Schmidt <stefan@datenfreihafen.org> Acked-by: Neil Horman <nhorman@tuxdriver.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Link: https://lore.kernel.org/lkml/CAK8P3a038aDQQotzua_QtKGhq8O9n+rdiz2=WDCp82ys8eUT+A@mail.gmail.com/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: support binding vlan dev link state to vlan member bridge ports
For vlan filtering on bridges, the bridge may also have vlan devices
as upper devices. For switches, these are used to provide L3 packet
processing for ports that are members of a given vlan.
While it is correct that the admin state for these vlan devices is
either set directly for the device or inherited from the lower device,
the link state is also transferred from the lower device. So this is
always up if the bridge is in admin up state and there is at least one
bridge port that is up, regardless of the vlan that the port is in.
The link state of the vlan device may need to track only the state of
the subset of ports that are also members of the corresponding vlan,
rather than that of all ports.
This series provides an optional vlan flag so that the link state of
the vlan device is only up if there is at least one bridge port that is
up AND is a member of the corresponding vlan.
v2:
- Address review comments from Nikolay Aleksandrov
in patches 3 & 4 and add patch 5 to address bridge link down due to STP
v3:
- Address review comment from Nikolay Aleksandrov
in patch 4 so as to remove unnecessary inline #ifdef
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Thu, 18 Apr 2019 17:35:35 +0000 (18:35 +0100)]
bridge: update vlan dev link state for bridge netdev changes
If vlan bridge binding is enabled, then the link state of a vlan device
that is an upper device of the bridge tracks the state of bridge ports
that are members of that vlan. But this can only be done when the link
state of the bridge is up. If it is down, then the link state of the
vlan devices must also be down. This is to maintain existing behavior
for when STP is enabled and there are no live ports, in which case the
link state for the bridge and any vlan devices is down.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Thu, 18 Apr 2019 17:35:34 +0000 (18:35 +0100)]
bridge: update vlan dev state when port added to or deleted from vlan
If vlan bridge binding is enabled, then the link state of a vlan device
that is an upper device of the bridge should track the state of bridge
ports that are members of that vlan. So if a bridge port becomes or
stops being a member of a vlan, then update the link state of the
vlan device if necessary.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Thu, 18 Apr 2019 17:35:33 +0000 (18:35 +0100)]
bridge: support binding vlan dev link state to vlan member bridge ports
In the case of vlan filtering on bridges, the bridge may also have the
corresponding vlan devices as upper devices. A vlan bridge binding mode
is added to allow the link state of the vlan device to track only the
state of the subset of bridge ports that are also members of the vlan,
rather than that of all bridge ports. This mode is set with a vlan flag
rather than a bridge sysfs so that the 8021q module is aware that it
should not set the link state for the vlan device.
If bridge vlan is configured, the bridge device event handling results
in the link state for an upper device being set, if it is a vlan device
with the vlan bridge binding mode enabled. This also sets a
vlan_bridge_binding flag so that subsequent UP/DOWN/CHANGE events for
the ports in that bridge result in a link state update of the vlan
device if required.
The link state of the vlan device is up if there is at least one bridge
port that is a vlan member that is admin & oper up, otherwise its oper
state is IF_OPER_LOWERLAYERDOWN.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Thu, 18 Apr 2019 17:35:32 +0000 (18:35 +0100)]
vlan: do not transfer link state in vlan bridge binding mode
In vlan bridge binding mode, the link state is no longer transferred
from the lower device. Instead it is set by the bridge module according
to the state of bridge ports that are members of the vlan.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Manning [Thu, 18 Apr 2019 17:35:31 +0000 (18:35 +0100)]
vlan: support binding link state to vlan member bridge ports
In the case of vlan filtering on bridges, the bridge may also have the
corresponding vlan devices as upper devices. Currently the link state
of vlan devices is transferred from the lower device. So this is up if
the bridge is in admin up state and there is at least one bridge port
that is up, regardless of the vlan that the port is a member of.
The link state of the vlan device may need to track only the state of
the subset of ports that are also members of the corresponding vlan,
rather than that of all ports.
Add a flag to specify a vlan bridge binding mode, by which the link
state is no longer automatically transferred from the lower device,
but is instead determined by the bridge ports that are members of the
vlan.
Signed-off-by: Mike Manning <mmanning@vyatta.att-mail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Thu, 18 Apr 2019 00:05:39 +0000 (01:05 +0100)]
nfp: flower: fix size_t compile warning
A recent addition to NFP introduced a function that formats a string with
a size_t variable. This is formatted with %ld which is fine on 64-bit
architectures but produces a compile warning on 32-bit architectures.
Fix this by using the z length modifier.
Fixes: a6156a6ab0f9 ("nfp: flower: handle merge hint messages") Signed-off-by: John Hurley <john.hurley@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset adds support for a PHY reset driven by a reset-controller.
Currently, only GPIO driven resets are supported by the PHY subsystem.
It also renames the reset-gpio from 'reset' to 'reset_gpio' to
better differentiate between resets wired to a GPIO and resets wired to
a reset-controller driven pin.
Some systems have the PHY reset-line wired to a pin controlled by a
reset-controller (eg. some Atheros AR9132 based boards). In case the
bootloader asserts reset before loading the kernel, we currently do not
have a clean way of deasserting reset to probe the PHY.
v3:
- add missing newline in mdio_bus.c
v2:
- fixed missed rename of "reset" in at803x.c
- move initial reset to mdio_device_reset
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 Apr 2019 20:51:59 +0000 (13:51 -0700)]
net: skb: remove unused asserts
We are discouraging the use of BUG() these days, remove the
unused ASSERT macros from skbuff.h.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 Apr 2019 20:51:58 +0000 (13:51 -0700)]
net: gemini: remove unnecessary assert
The driver does not advertize NETIF_F_FRAGLIST, the stack can't
pass skbs with frags lists to the xmit function.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 Apr 2019 20:51:57 +0000 (13:51 -0700)]
net/sched: taprio: fix build without 64bit div
Recent changes to taprio did not use the correct div64 helpers,
leading to:
net/sched/sch_taprio.o: In function `taprio_dequeue':
sch_taprio.c:(.text+0x34a): undefined reference to `__divdi3'
net/sched/sch_taprio.o: In function `advance_sched':
sch_taprio.c:(.text+0xa0b): undefined reference to `__divdi3'
net/sched/sch_taprio.o: In function `taprio_init':
sch_taprio.c:(.text+0x1450): undefined reference to `__divdi3'
/home/jkicinski/devel/linux/Makefile:1032: recipe for target 'vmlinux' failed
Use math64 helpers.
Fixes: 7b9eba7ba0c1 ("net/sched: taprio: fix picos_per_byte miscalculation") Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 Apr 2019 20:51:56 +0000 (13:51 -0700)]
sb1000: fix variable set but not used warnings
GCC 8 complains:
drivers/net/sb1000.c: In function ‘card_send_command’:
drivers/net/sb1000.c:319:14: warning: variable ‘x’ set but not used [-Wunused-but-set-variable]
int status, x;
^
drivers/net/sb1000.c: In function ‘sb1000_check_CRC’:
drivers/net/sb1000.c:493:6: warning: variable ‘crc’ set but not used [-Wunused-but-set-variable]
int crc, status;
^~~
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Wed, 17 Apr 2019 20:51:55 +0000 (13:51 -0700)]
l2tp: fix set but not used variable
GCC complains:
net/l2tp/l2tp_ppp.c: In function ‘pppol2tp_ioctl’:
net/l2tp/l2tp_ppp.c:1073:6: warning: variable ‘val’ set but not used [-Wunused-but-set-variable]
int val;
^~~
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP
message types use larger numeric values, a simple bitmask doesn't fit.
I use large bitmap. The input and output are the in form of list of
ranges. Set the default to rate limit all error messages but Packet Too
Big. For Packet Too Big, use ratemask instead of hard-coded.
There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
aren't called. This patch only adds them to icmpv6_echo_reply().
Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
that it is also acceptable to rate limit informational messages. Thus,
I removed the current hard-coded behavior of icmpv6_mask_allow() that
doesn't rate limit informational messages.
v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
isn't defined, expand the description in ip-sysctl.txt and remove
unnecessary conditional before kfree().
v3: Inline the bitmap instead of dynamically allocated. Still is a
pointer to it is needed because of the way proc_do_large_bitmap work.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In phy_device_create() we set phydev->autoneg = 1. This isn't changed
even if the PHY doesn't support autoneg. This seems to affect very
few PHY's, and they disable phydev->autoneg in their config_init
callback. So it's more of an improvement, therefore net-next.
The patch also wouldn't apply to older kernel versions because the
link mode bitmaps have been introduced recently.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Apr 2019 18:25:33 +0000 (11:25 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-04-18
This series contains updates to the ice driver only.
Anirudh fixes up code comments which had typos. Added support for DCB
into the ice driver, which required a bit of refactoring of the existing
code. Also fixed a potential race condition between closing and opening
the VSI for a MIB change event, so resolved this by grabbing the
rtnl_lock prior to closing. Added support to process LLDP MIB change
notifications. Added support for reporting DCB stats via ethtool.
Brett updates the calculation to increment ITR to use a direct
calculation instead of using estimations. This provides a more accurate
value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Thu, 28 Feb 2019 23:25:47 +0000 (15:25 -0800)]
ice: Calculate ITR increment based on direct calculation
Currently when calculating how much to increment ITR by inside of
ice_update_itr() we do some estimations and intermediate
calculations. Instead of doing estimations, just do the
calculation directly. This allows for a more accurate value and it
makes it easier for the next person to understand and update.
Also, remove the dividing the ITR value by 2 when latency
driven because the ITR values are already so low for 100Gbps
speed. This should help get to the desired ITR value faster.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function ice_pf_dcb_cfg (and related helpers)
which applies the DCB configuration obtained from the firmware. As
part of this, VSIs/netdevs are updated with traffic class information.
This patch requires a bit of a refactor of existing code.
1. For a MIB change event, the associated VSI is closed and brought up
again. The gap between closing and opening the VSI can cause a race
condition. Fix this by grabbing the rtnl_lock prior to closing the
VSI and then only free it after re-opening the VSI during a MIB
change event.
2. ice_sched_query_elem is used in ice_sched.c and with this patch, in
ice_dcb.c as well. However, ice_dcb.c is not built when CONFIG_DCB is
unset. This results in namespace warnings (ice_sched.o: Externally
defined symbols with no external references) when CONFIG_DCB is unset.
To avoid this move ice_sched_query_elem from ice_sched.c to
ice_common.c.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a new top level function ice_init_dcb (and
related lower level helper functions) which continues the DCB init
flow.
This function uses ice_get_dcb_cfg to get, parse and store the DCB
configuration. Once this is done, it sets itself up to be notified
by the firmware on LLDP MIB change events.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a skeleton for ice_init_pf_dcb, the top level
function for DCB initialization. Subsequent patches will add to this
DCB init flow.
In this patch, ice_init_pf_dcb checks if DCB is a supported capability.
If so, an admin queue call to start the LLDP and DCBx in firmware is
issued. If not, an error is reported. Note that we don't fail the driver
init if DCB init fails.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Capitalize abbreviations and spell out some that aren't obvious.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Larry Finger [Wed, 17 Apr 2019 00:35:47 +0000 (19:35 -0500)]
rtlwifi: rtl8188ee: Remove extraneous file
Somehow file drivers/net/wireless/realtek/rtlwifi/rtl8188ee/trx.c.rej was
incorporated into the sources. Obviously, it can be removed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
David Ahern [Wed, 17 Apr 2019 00:31:43 +0000 (17:31 -0700)]
net ipv6: Prevent neighbor add if protocol is disabled on device
Disabling IPv6 on an interface removes existing entries but nothing prevents
new entries from being manually added. To that end, add a new neigh_table
operation, allow_add, that is called on RTM_NEWNEIGH to see if neighbor
entries are allowed on a given device. If IPv6 is disabled on the device,
allow_add returns false and passes a message back to the user via extack.
$ echo 1 > /proc/sys/net/ipv6/conf/eth1/disable_ipv6
$ ip -6 neigh add fe80::4c88:bff:fe21:2704 dev eth1 lladdr de:ad:be:ef:01:01
Error: IPv6 is disabled on this device.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ipv6: Use fib6_result for fib_lookups
Add fib6_result as a single data structure to hold results from a fib
lookup. IPv6 currently has everything in 1 data structure - a fib6_info,
but with nexthop objects the fib6_nh can be in a nexthop or a nexthop
can be a blackhole which affects the fib6_type and flags (REJECT).
v2
- fixed 2 bugs in patch12:
i. checking return from fib6_table_lookup in fib6_lookup
ii. call to fib6_rule_saddr in fib6_rule_action_alt should use res->nh
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:11 +0000 (14:36 -0700)]
ipv6: Add fib6_type and fib6_flags to fib6_result
Add the fib6_flags and fib6_type to fib6_result. Update the lookup helpers
to set them and update post fib lookup users to use the version from the
result.
This allows nexthop objects to have blackhole nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:05 +0000 (14:36 -0700)]
ipv6: Pass fib6_result to rt6_insert_exception
Update rt6_insert_exception to take a fib6_result over a fib6_info.
Change ort to f6i from the fib6_result and rename to better reflect
what it references (a fib6_info).
Since this function is already getting changed, update the comments
to reference fib6_info variables rather than the older rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:00 +0000 (14:36 -0700)]
ipv6: Pass fib6_result to rt6_find_cached_rt
Simplify rt6_find_cached_rt for the fast path cases and pass fib6_result
to rt6_find_cached_rt. Rename the local return variable to ret to maintain
consisting with fib6_result name.
Update the comment in rt6_find_cached_rt to reference the new names in
a fib6_info vs the old name when fib entries were an rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:35:59 +0000 (14:35 -0700)]
ipv6: Rename fib6_multipath_select and pass fib6_result
Add 'struct fib6_result' to hold the fib entry and fib6_nh from a fib
lookup as separate entries, similar to what IPv4 now has with fib_result.
Rename fib6_multipath_select to fib6_select_path, pass fib6_result to
it, and set f6i and nh in the result once a path selection is done.
Call fib6_select_path unconditionally for path selection which means
moving the sibling and oif check to fib6_select_path. To handle the two
different call paths (2 only call multipath_select if flowi6_oif == 0 and
the other always calls it), add a new have_oif_match that controls the
sibling walk if relevant.
Update callers of fib6_multipath_select accordingly and have them use the
fib6_info and fib6_nh from the result.
This is needed for multipath nexthop objects where a single f6i can
point to multiple fib6_nh (similar to IPv4).
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: stop/wake TX queues based on their fill level
Current xmit code only stops the txq after attempting to fill an
IO buffer that hasn't been TX-completed yet. In many-connection
scenarios, this can result in frequent rejected TX attempts, requeuing
of skbs with NETDEV_TX_BUSY and extra overhead.
Now that we have a proper 1-to-1 relation between stack-side txqs and
our HW Queues, overhaul the stop/wake logic so that the xmit code
stops the txq as needed.
Given that we might map multiple skbs into a single buffer, it's crucial
to ensure that the queue always provides an _entirely_ empty IO buffer.
Otherwise large skbs (eg TSO) might not fit into the last available
buffer. So whenever qeth_do_send_packet() first utilizes an _empty_
buffer, it updates & checks the used_buffers count.
This now ensures that an skb passed to qeth_xmit() can always be mapped
into an IO buffer, so remove all of the -EBUSY roll-back handling in the
TX path. We preserve the minimal safety-checks ("Is this IO buffer
really available?"), just in case some nasty future bug ever attempts to
corrupt an in-use buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: add TX multiqueue support for OSA devices
This adds trivial support for multiple TX queues on OSA-style devices
(both real HW and z/VM NICs). For now we expose the driver's existing
QoS mechanism via .ndo_select_queue, and adjust the number of available
TX queues when qeth_update_from_chp_desc() detects that the
HW configuration has changed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: add TX multiqueue support for IQD devices
qeth has been supporting multiple HW Output Queues for a long time. But
rather than exposing those queues to the stack, it uses its own queue
selection logic in .ndo_start_xmit... with all the drawbacks that
entails.
Start off by switching IQD devices over to a proper mqs net_device,
and converting all the netdev_queue management code.
One oddity with IQD devices is the requirement to place all mcast
traffic on the _highest_ established HW queue. Doing so via
.ndo_select_queue seems straight-forward - but that won't work if only
some of the HW queues are active
(ie. when dev->real_num_tx_queues < dev->num_tx_queues), since
netdev_cap_txqueue() will not allow us to put skbs on the higher queues.
To make this work, we
1. let .ndo_select_queue() map all mcast traffic to netdev_queue 0, and
2. later re-map the netdev_queue and HW queue indices in
.ndo_start_xmit and the TX completion handler.
With this patch we default to a fixed set of 1 ucast and 1 mcast queue.
Support for dynamic reconfiguration is added at a later time.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
struct netdev_queue contains a counter for tx timeouts, which gets
updated by dev_watchdog(). So let's not attempt to maintain our own
statistics, in particular not by overloading the skb-error counter.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>