Jacob Keller [Tue, 25 Oct 2016 23:08:49 +0000 (16:08 -0700)]
i40e: remove second check of VLAN_N_VID in i40e_vlan_rx_add_vid
Replace a check of magic number 4095 with VLAN_N_VID. This
makes it obvious that a later check against VLAN_N_VID is
always true and can be removed.
Change-ID: I28998f127a61a529480ce63d8a07e266f6c63b7b Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Oct 2016 23:08:48 +0000 (16:08 -0700)]
i40e: remove error_param_int label from i40e_vc_config_promiscuous_mode_msg
This label is unnecessary, as are jumping to a block that checks aq_ret
and then immediately skipping it and returning. So just jump straight to
the error_param and remove this unnecessary label.
Also use goto error_param even in the last check for style consistency.
Change-ID: If487c7d10c4048e37c594e5eca167693aaed45f6 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 25 Oct 2016 23:08:47 +0000 (16:08 -0700)]
i40evf: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload. Previously we were performing no checks. This
should bring us up to parity with the i40e PF driver.
In addition the device only supports GSO as long as the MSS is 64 or
greater. We were not checking this so an MSS less than that was resulting
in Tx hangs.
Change-ID: If533553ec92fc6ba694eab6ac81fdaf3004f3592 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 25 Oct 2016 23:08:46 +0000 (16:08 -0700)]
i40e: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload. Previously we were just checking for the L4 tunnel
header length, however there are other fields we should be verifying as
there are multiple scenarios in which we cannot perform hardware offloads.
In addition the device only supports GSO as long as the MSS is 64 or
greater. We were not checking this so an MSS less than that was resulting
in Tx hangs.
Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
David S. Miller [Fri, 2 Dec 2016 18:52:02 +0000 (13:52 -0500)]
Merge branch 'mvneta-64bit'
Gregory CLEMENT says:
====================
Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
The Armada 37xx is a new ARMv8 SoC from Marvell using same network
controller as the older Armada 370/38x/XP SoCs. This series adapts the
driver in order to be able to use it on this new SoC. The main changes
are:
- 64-bits support: the first patches allow using the driver on a 64-bit
architecture.
- MBUS support: the mbus configuration is different on Armada 37xx
from the older SoCs.
- per cpu interrupt: Armada 37xx do not support per cpu interrupt for
the NETA IP, the non-per-CPU behavior was added back.
The first patch is an optimization in the rx path in swbm mode.
The second patch remove unnecessary allocation for HWBM.
The first item is solved by patches 4 and 5.
The 2 last items are solved by patch 6.
In patch 7 the dt support is added.
Beside Armada 37xx, this series have been again tested on Armada XP
and Armada 38x (with Hardware Buffer Management and with Software
Buffer Management).
This is the 6th version of the series:
- 1st version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html
Changelog:
v5 -> v6:
- Added Tested-by from Marcin Wojtas on the series
- Added Reviewed-by from Jisheng Zhang on patch 3
- Fix eth1 phy mode for Armada 3720 DB board on patch 7
v4 -> v5:
- remove unnecessary cast in patch 3
v3 -> v4:
- Adding new patch: "net: mvneta: do not allocate buffer in rxq init
with HWBM"
- Simplify the HWBM case in patch 3 as suggested by Marcin
v2 -> v3:
- Adding patch 1 "Optimize rx path for small frame"
- Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
line from patch 2 to patch 3 where rx_offset_correction is introduced.
- Move the memory allocation of the buf_virt_addr of the rxq to be
called by the probe function in order to avoid a memory leak.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 1 Dec 2016 17:03:09 +0000 (18:03 +0100)]
net: mvneta: Add network support for Armada 3700 SoC
Armada 3700 is a new ARMv8 SoC from Marvell using same network controller
as older Armada 370/38x/XP. There are however some differences that
needed taking into account when adding support for it:
* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
configuration for network controller has to be done on two levels:
global and per-port. The first one is inherited from the
bootloader. The latter can be opened in a default way, leaving
arbitration to the bus controller. Hence filled mbus_dram_target_info
structure is not needed
* make per-CPU operation optional - Recent patches adding RSS and XPS
support for Armada 38x/XP enabled per-CPU operation of the controller
by default. Contrary to older SoC's Armada 3700 SoC's network
controller is not capable of per-CPU processing due to interrupt lines'
connectivity. This patch restores non-per-CPU operation, which is now
optional and depends on neta_armada3700 flag value in mvneta_port
structure. In order not to complicate the code, separate interrupt
subroutine is implemented.
For now, on the Armada 3700, RSS is disabled as the current
implementation depend on the per cpu interrupts.
[gregory.clement@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:08 +0000 (18:03 +0100)]
net: mvneta: Only disable mvneta_bm for 64-bits
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 1 Dec 2016 17:03:07 +0000 (18:03 +0100)]
net: mvneta: Convert to be 64 bits compatible
Prepare the mvneta driver in order to be usable on the 64 bits platform
such as the Armada 3700.
[gregory.clement@free-electrons.com]: this patch was extract from a larger
one to ease review and maintenance.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:06 +0000 (18:03 +0100)]
net: mvneta: Use cacheable memory to store the rx buffer virtual address
Until now the virtual address of the received buffer were stored in the
cookie field of the rx descriptor. However, this field is 32-bits only
which prevents to use the driver on a 64-bits architecture.
With this patch the virtual address is stored in an array not shared with
the hardware (no more need to use the DMA API). Thanks to this, it is
possible to use cache contrary to the access of the rx descriptor member.
The change is done in the swbm path only because the hwbm uses the cookie
field, this also means that currently the hwbm is not usable in 64-bits.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:05 +0000 (18:03 +0100)]
net: mvneta: Do not allocate buffer in rxq init with HWBM
For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime
they are put into descriptors by hardware. There is no need to fill them
at this point.
Suggested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:04 +0000 (18:03 +0100)]
net: mvneta: Optimize rx path for small frame
For small frame reuse the phys_addr variable instead of accessing the
uncacheable value in the rx descriptor.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 18:46:21 +0000 (13:46 -0500)]
Merge branch 'bpf-support-for-sockets'
David Ahern says:
====================
net: Add bpf support for sockets
The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
program has support for it. Even for those programs that do support it,
the API requires processes to be started as root (CAP_NET_RAW) which
is not desirable from a general security perspective.
This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.
For example:
1. configure vrf (e.g., using ifupdown2)
auto eth0
iface eth0 inet dhcp
vrf mgmt
3. set shell into cgroup (e.g., can be done at login using pam)
echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs
At this point all commands run in the shell (e.g, apt) have sockets
automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'),
including processes not running as root.
This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.
This patchset also exports the socket family, type and protocol as
read-only allowing bpf filters to deny a process in a cgroup the ability
to open specific types of AF_INET or AF_INET6 sockets.
v7
- comments from Alexei
v6
- add export of socket family, type and protocol
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:08 +0000 (08:48 -0800)]
samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket
based family, protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:07 +0000 (08:48 -0800)]
samples/bpf: Update bpf loader for cgroup section names
Add support for section names starting with cgroup/skb and cgroup/sock.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:06 +0000 (08:48 -0800)]
bpf: Add support for reading socket family, type, protocol
Add socket family, type and protocol to bpf_sock allowing bpf programs
read-only access.
Add __sk_flags_offset[0] to struct sock before the bitfield to
programmtically determine the offset of the unsigned int containing
protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:05 +0000 (08:48 -0800)]
samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:04 +0000 (08:48 -0800)]
bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.
This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:03 +0000 (08:48 -0800)]
bpf: Refactor cgroups code in prep for new type
Code move and rename only; no functional change intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Thu, 1 Dec 2016 12:54:28 +0000 (18:24 +0530)]
net: thunderx: Fix transmit queue timeout issue
Transmit queue timeout issue is seen in two cases
- Due to a race condition btw setting stop_queue at xmit()
and checking for stopped_queue in NAPI poll routine, at times
transmission from a SQ comes to a halt. This is fixed
by using barriers and also added a check for SQ free descriptors,
incase SQ is stopped and there are only CQE_RX i.e no CQE_TX.
- Contrary to an assumption, a HW errata where HW doesn't stop transmission
even though there are not enough CQEs available for a CQE_TX is
not fixed in T88 pass 2.x. This results in a Qset error with
'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
RXQ's RED levels for CQ level such that there is always enough
space left for CQE_TXs.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 18:28:38 +0000 (13:28 -0500)]
Merge branch 'offloading-tc-rules-hw'
Hadar Hen Zion says:
====================
Offloading tc rules using underline Hardware device
This series adds flower classifier support in offloading tc rules when the
Software ingress device is different from the Hardware ingress device,
such as when dealing with IP tunnels
The first two patches are a small fixes to flower, checking the skip_hw flag
wasn't set before calling the Hardware offloading functions which will try to
offload the rule.
The next two patches are infrastructure patches, a preparation for the fourth
patch which is adding support in flower to offload rules when the ingress
device is not a Hardware device and therefore can't offload.
In this case ndo_setup_tc is called with the mirred (egress) device.
The last three patchs are adding mlx5e support to offload rules using the new
"egress_device" flag.
Thanks,
Hadar
Changes from v0:
- check if CONFIG_NET_CLS_ACT is defined befor calling tc_action_ops get_dev()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:40 +0000 (14:06 +0200)]
net/mlx5e: Support adding ingress tc rule when egress device flag is set
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.
In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.
Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:38 +0000 (14:06 +0200)]
net/mlx5e: Bring back representor's ndos that were accidentally removed
The VF Representor udp tunnel ndo entries were removed by mistake,
return them.
Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:37 +0000 (14:06 +0200)]
net/sched: cls_flower: Add offload support using egress Hardware device
In order to support hardware offloading when the device given by the tc
rule is different from the Hardware underline device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().
Flower caches the information about the mirred device and use it for
calling ndo_setup_tc in filter change, update stats and delete.
Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device will allow a resolution between the software ingress
device and the underline hardware device.
The resolution will take place inside the offloading driver using
'egress_device' flag added to tc_to_netdev struct which is provided to
the offloading driver.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:35 +0000 (14:06 +0200)]
net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions
Instead of providing many arguments to fl_hw_{replace/destroy}_filter
functions, just provide cls_fl_filter struct that includes all the relevant
args.
This patches doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:34 +0000 (14:06 +0200)]
net/sched: cls_flower: Try to offload only if skip_hw flag isn't set
Check skip_hw flag isn't set before calling
fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions.
Replace the call to tc_should_offload with tc_can_offload.
tc_can_offload only checks if the device supports offloading, the check for
skip_hw flag is done earlier in the flow.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:33 +0000 (14:06 +0200)]
net/sched: Add separate check for skip_hw flag
Creating a difference between two possible cases:
1. Not offloading tc rule since the user sets 'skip_hw' flag.
2. Not offloading tc rule since the device doesn't support offloading.
This patch doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: allow to turn tcp timestamp randomization off
Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."
Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: randomize tcp timestamp offsets for each connection
jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.
commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).
So only two items are left:
- add a tsoffset for request sockets
- extend the tcp isn generator to also return another 32bit number
in addition to the ISN.
Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.
Includes fixes from Eric Dumazet.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series introduces hardware offload iSCSI initiator driver for the
41000 Series Converged Network Adapters (579xx chip) by Qlogic. The overall
driver design includes a common module ('qed') and protocol specific
dependent modules ('qedi' for iSCSI).
This is an open iSCSI driver, modifications to open iSCSI user components
'iscsid', 'iscsiuio', etc. are required for the solution to work. The user
space changes are also in the process of being submitted.
The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is
enhanced with functionality required for the iSCSI support. This series
is based on:
net tree base: Merge of net and net-next as of 11/29/2016
Changes from RFC v2:
1. qedi patches are squashed into single patch to prevent krobot
warning.
2. Fixed 'hw_p_cpuq' incompatible pointer type.
3. Fixed sparse incompatible types in comparison expression.
4. Misc fixes with latest 'checkpatch --strict' option.
5. Remove int_mode option from MODULE_PARAM.
6. Prefix all MODULE_PARAM params with qedi_*.
7. Use CONFIG_QED_ISCSI instead of CONFIG_QEDI
8. Added bad task mem access fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 1 Dec 2016 08:21:07 +0000 (00:21 -0800)]
qed: Add iSCSI out of order packet handling.
This patch adds out of order packet handling for hardware offloaded
iSCSI. Out of order packet handling requires driver buffer allocation
and assistance.
Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Yuval Mintz [Thu, 1 Dec 2016 08:21:06 +0000 (00:21 -0800)]
qed: Add support for hardware offloaded iSCSI.
This adds the backbone required for the various HW initalizations
which are necessary for the iSCSI driver (qedi) for QLogic FastLinQ
4xxxx line of adapters - FW notification, resource initializations, etc.
Signed-off-by: Arun Easi <arun.easi@cavium.com> Signed-off-by: Yuval Mintz <yuval.mintz@cavium.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Wed, 30 Nov 2016 21:05:39 +0000 (22:05 +0100)]
net/mlx5e: skip loopback selftest with !CONFIG_INET
When CONFIG_INET is disabled, the new selftest results in a link
error:
drivers/net/ethernet/mellanox/mlx5/core/mlx5_core.o: In function `mlx5e_test_loopback':
en_selftest.c:(.text.mlx5e_test_loopback+0x2ec): undefined reference to `ip_send_check'
en_selftest.c:(.text.mlx5e_test_loopback+0x34c): undefined reference to `udp4_hwcsum'
This hides the specific test in that configuration.
Fixes: 0952da791c97 ("net/mlx5e: Add support for loopback selftest") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 30 Nov 2016 21:16:06 +0000 (22:16 +0100)]
bpf, xdp: drop rcu_read_lock from bpf_prog_run_xdp and move to caller
After 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
the rcu_read_lock() in bpf_prog_run_xdp() is superfluous, since callers
need to hold rcu_read_lock() already to make sure BPF program doesn't
get released in the background.
Thus, drop it from bpf_prog_run_xdp(), as it can otherwise be misleading.
Still keeping the bpf_prog_run_xdp() is useful as it allows for grepping
in XDP supported drivers and to keep the typecheck on the context intact.
For mlx4, this means we don't have a double rcu_read_lock() anymore. nfp can
just make use of bpf_prog_run_xdp(), too. For qede, just move rcu_read_lock()
out of the helper. When the driver gets atomic replace support, this will
move to call-sites eventually.
mlx5 needs actual fixing as it has the same issue as described already in 326fe02d1ed6 ("net/mlx4_en: protect ring->xdp_prog with rcu_read_lock"),
that is, we're under RCU bh at this time, BPF programs are released via
call_rcu(), and call_rcu() != call_rcu_bh(), so we need to properly mark
read side as programs can get xchg()'ed in mlx5e_xdp_set() without queue
reset.
Fixes: 86994156c736 ("net/mlx5e: XDP fast RX drop bpf programs support") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sock: reset sk_err for ICMP packets read from error queue
Only when ICMP packets are enqueued onto the error queue,
sk_err is also set. Before f5f99309fa74 (sock: do not set sk_err
in sock_dequeue_err_skb), a subsequent error queue read
would set sk_err to the next error on the queue, or 0 if empty.
As no error types other than ICMP set this field, sk_err should
not be modified upon dequeuing them.
Only for ICMP errors, reset the (racy) sk_err. Some applications,
like traceroute, rely on it and go into a futile busy POLLERR
loop otherwise.
In principle, sk_err has to be set while an ICMP error is queued.
Testing is_icmp_err_skb(skb_next) approximates this without
requiring a full queue walk. Applications that receive both ICMP
and other errors cannot rely on this legacy behavior, as other
errors do not set sk_err in the first place.
Fixes: f5f99309fa74 (sock: do not set sk_err in sock_dequeue_err_skb) Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: Willem de Bruijn <willemb@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Maciej Żenczykowski <maze@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 15:52:05 +0000 (10:52 -0500)]
Merge branch 'lwt-bpf'
Thomas Graf says:
====================
bpf: BPF for lightweight tunnel encapsulation
This series implements BPF program invocation from dst entries via the
lightweight tunnels infrastructure. The BPF program can be attached to
lwtunnel_input(), lwtunnel_output() or lwtunnel_xmit() and see an L3
skb as context. Programs attached to input and output are read-only.
Programs attached to lwtunnel_xmit() can modify and redirect, push headers
and redirect packets.
The facility can be used to:
- Collect statistics and generate sampling data for a subset of traffic
based on the dst utilized by the packet thus allowing to extend the
existing realms.
- Apply additional per route/dst filters to prohibit certain outgoing
or incoming packets based on BPF filters. In particular, this allows
to maintain per dst custom state across multiple packets in BPF maps
and apply filters based on statistics and behaviour observed over time.
- Attachment of L2 headers at transmit where resolving the L2 address
is not required.
- Possibly many more.
v3 -> v4:
- Bumped LWT_BPF_MAX_HEADROOM from 128 to 256 (Alexei)
- Renamed bpf_skb_push() helper to bpf_skb_change_head() to relate to
existing bpf_skb_change_tail() helper (Alexei/Daniel)
- Added check in __bpf_redirect_common() to verify that program added a
link header before redirecting to a l2 device. Adding the check to
lwt-bpf code was considered but dropped due to massive code required
due to retrieval of net_device via per-cpu redirect buffer. A test
case was added to cover the scenario when a program directs to an l2
device without adding an appropriate l2 header.
(Alexei)
- Prohibited access to tc_classid (Daniel)
- Collapsed bpf_verifier_ops instance for lwt in/out as they are
identical (Daniel)
- Some cosmetic changes
v2 -> v3:
- Added real world sample lwt_len_hist_kern.c which demonstrates how to
collect a histogram on packet sizes for all packets flowing through
a number of routes.
- Restricted output to be read-only. Since the header can no longer
be modified, the rerouting functionality has been removed again.
- Added test case which cover destructive modification of packet data.
v1 -> v2:
- Added new BPF_LWT_REROUTE return code for program to indicate
that new route lookup should be performed. Suggested by Tom.
- New sample to illustrate rerouting
- New patch 05: Recursion limit for lwtunnel_output for the case
when user creates circular dst redirection. Also resolves the
issue for ILA.
- Fix to ensure headroom for potential future L2 header is still
guaranteed
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Wed, 30 Nov 2016 16:10:11 +0000 (17:10 +0100)]
bpf: Add tests and samples for LWT-BPF
Adds a series of tests to verify the functionality of attaching
BPF programs at LWT hooks.
Also adds a sample which collects a histogram of packet sizes which
pass through an LWT hook.
$ ./lwt_len_hist.sh
Starting netserver with host 'IN(6)ADDR_ANY' port '12865' and family AF_UNSPEC
MIGRATED TCP STREAM TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 192.168.253.2 () port 0 AF_INET : demo
Recv Send Send
Socket Socket Message Elapsed
Size Size Size Time Throughput
bytes bytes bytes secs. 10^6bits/sec
Thomas Graf [Wed, 30 Nov 2016 16:10:10 +0000 (17:10 +0100)]
bpf: BPF for lightweight tunnel infrastructure
Registers new BPF program types which correspond to the LWT hooks:
- BPF_PROG_TYPE_LWT_IN => dst_input()
- BPF_PROG_TYPE_LWT_OUT => dst_output()
- BPF_PROG_TYPE_LWT_XMIT => lwtunnel_xmit()
The separate program types are required to differentiate between the
capabilities each LWT hook allows:
* Programs attached to dst_input() or dst_output() are restricted and
may only read the data of an skb. This prevent modification and
possible invalidation of already validated packet headers on receive
and the construction of illegal headers while the IP headers are
still being assembled.
* Programs attached to lwtunnel_xmit() are allowed to modify packet
content as well as prepending an L2 header via a newly introduced
helper bpf_skb_change_head(). This is safe as lwtunnel_xmit() is
invoked after the IP header has been assembled completely.
All BPF programs receive an skb with L3 headers attached and may return
one of the following error codes:
BPF_OK - Continue routing as per nexthop
BPF_DROP - Drop skb and return EPERM
BPF_REDIRECT - Redirect skb to device as per redirect() helper.
(Only valid in lwtunnel_xmit() context)
The return codes are binary compatible with their TC_ACT_
relatives to ease compatibility.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Wed, 30 Nov 2016 16:10:09 +0000 (17:10 +0100)]
route: Set lwtstate for local traffic and cached input dsts
A route on the output path hitting a RTN_LOCAL route will keep the dst
associated on its way through the loopback device. On the receive path,
the dst_input() call will thus invoke the input handler of the route
created in the output path. Thus, lwt redirection for input must be done
for dsts allocated in the otuput path as well.
Also, if a route is cached in the input path, the allocated dst should
respect lwtunnel configuration on the nexthop as well.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Wed, 30 Nov 2016 16:10:08 +0000 (17:10 +0100)]
route: Set orig_output when redirecting to lwt on locally generated traffic
orig_output for IPv4 was only set for dsts which hit an input route.
Set it consistently for locally generated traffic as well to allow
lwt to continue the dst_output() path as configured by the nexthop.
Fixes: 2536862311d ("lwt: Add support to redirect dst.input") Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The following series from Tariq and Roi, provides some critical fixes
and updates for the mlx5e driver.
From Tariq:
- Fix driver coherent memory huge allocation issues by fragmenting
completion queues, in a way that is transparent to the netdev driver by
providing a new buffer type "mlx5_frag_buf" with the same access API.
- Create UMR MKey per RQ to have better scalability.
From Roi:
- Some fixes for the encap-decap support and tc flower added lately to the
mlx5e driver.
v1->v2:
- Fix start index in error flow of mlx5_frag_buf_alloc_node, pointed out by Eric.
This series was generated against commit: 31ac1c19455f ("geneve: fix ip_hdr_len reserved for geneve6 tunnel.")
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 30 Nov 2016 15:59:43 +0000 (17:59 +0200)]
net/mlx5e: Remove flow encap entry in the correct place
Handling flow encap entry should be inside tc del flow
and is only relevant for offloaded eswitch TC rules.
Fixes: 11a457e9b6c1 ("net/mlx5e: Add basic TC tunnel set action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 30 Nov 2016 15:59:42 +0000 (17:59 +0200)]
net/mlx5e: Refactor tc del flow to accept mlx5e_tc_flow instance
Change the function that deletes offloaded TC rule to get
struct mlx5e_tc_flow instance which contains both the flow
handle and flow attributes. This is a cleanup needed for
downstream patches, it doesn't change any functionality.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 30 Nov 2016 15:59:41 +0000 (17:59 +0200)]
net/mlx5e: Correct cleanup order when deleting offloaded TC rules
According to the reverse unwinding principle, on delete time we should
first handle deletion of the steering rule and later handle the vlan
deletion from the eswitch.
Fixes: 8b32580df1cb ("net/mlx5e: Add TC vlan action for SRIOV offloads") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roi Dayan [Wed, 30 Nov 2016 15:59:40 +0000 (17:59 +0200)]
net/mlx5e: Remove redundant hashtable lookup in configure flower
We will never find a flow with the same cookie as cls_flower always
allocates a new flow and the cookie is the allocated memory address.
Fixes: e3a2b7ed018e ("net/mlx5e: Support offload cls_flower with drop action") Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 30 Nov 2016 15:59:39 +0000 (17:59 +0200)]
net/mlx5e: Create UMR MKey per RQ
In Striding RQ implementation, we used a single UMR
(User-Mode Memory Registration) memory key for all RQs.
When the product of RQs number*size gets high, we hit a
limitation of u16 field size in FW.
Here we move to using a UMR memory key per RQ, so we can
scale to any number of rings, with the maximum buffer
size in each.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 30 Nov 2016 15:59:38 +0000 (17:59 +0200)]
net/mlx5e: Move function mlx5e_create_umr_mkey
In next patch we are going to create a UMR MKey per RQ, we need
mlx5e_create_umr_mkey declared before mlx5e_create_rq.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Wed, 30 Nov 2016 15:59:37 +0000 (17:59 +0200)]
net/mlx5e: Implement Fragmented Work Queue (WQ)
Add new type of struct mlx5_frag_buf which is used to allocate fragmented
buffers rather than contiguous, and make the Completion Queues (CQs) use
it as they are big (default of 2MB per CQ in Striding RQ).
This fixes the failures of type:
"mlx5e_open_locked: mlx5e_open_channels failed, -12"
due to dma_zalloc_coherent insufficient contiguous coherent memory to
satisfy the driver's request when the user tries to setup more or larger
rings.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Neill Whillans [Wed, 30 Nov 2016 13:41:05 +0000 (13:41 +0000)]
net: ethernet: altera_tse: add support for SGMII PCS
Add support for the (optional) SGMII PCS functionality of the Altera
TSE MAC. If the phy-mode is set to 'sgmii' then we attempt to discover
and initialise the PCS so that the MAC can communicate to the PHY.
The PCS IP block provides a scratch register for testing presence of
the PCS, which is mapped into one of the two MDIO spaces present in
the MAC's register space. Once we have determined that the scratch
register is functioning, we attempt to initialise the PCS to
auto-negotiate an SGMII link with the PHY. There is no need to monitor
or manage the SGMII link beyond this, since the normal PHY MDIO will
then be used to monitor the media layer.
The Altera TSE MAC has only one way in which it can be configured with an
SGMII PCS, and as such, this patch only looks to the phy-mode to select
whether or not to attempt to initialise the PCS registers. During
initialisation, we report the PCS's equivalent of a PHY ID register.
This can be parameterised during the IP instantiation and is often left
as '0x00000000' which is not an error.
Signed-off-by: Neill Whillans <neill.whillans@codethink.co.uk> Reviewed-by: Daniel Silverstone <daniel.silverstone@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Stephen Agate [Wed, 30 Nov 2016 13:41:04 +0000 (13:41 +0000)]
net: phy: vitesse: add support for VSC8572
Add support for the Vitesse VSC8572 which is functionally equivalent to
the already supported VSC8574. As such, all the same handling functions
are used since the VSC8572 merely has half the number of phy blocks
internally.
Signed-off-by: Stephen Agate <stephen.agate@uk.thalesgroup.com> Signed-off-by: Neill Whillans <neill.whillans@codethink.co.uk> Reviewed-by: Daniel Silverstone <daniel.silverstone@codethink.co.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Thu, 1 Dec 2016 17:00:54 +0000 (17:00 +0000)]
sfc: fix debug message format string in efx_farch_handle_rx_not_ok
Defalconisation removed one of the string arguments, but missed the
corresponding %s.
Fixes: 5a6681e22c14 ("sfc: separate out SFC4000 ("Falcon") support into new sfc-falcon driver") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 1 Dec 2016 16:26:48 +0000 (11:26 -0500)]
Merge branch 'mdix_ctrl'
Raju Lakkaraju says:
====================
Adding PHY MDI(X) support
I updated all review comments which were given by Andrew and Florian.
This series add support for PHY MDI(X), and implement it for MSCC phys.
Tested on Beaglebone Black with VSC 8531 PHY.
Change set:
v1:
- Initial patch submit the WoL and MDI-X in single set of patches
v2:
- Split the mdi(x) as signal set of patches.
- Remove the out_unlock as suggested by Andrew.
- Add mdix_ctrl parameter in "phy_device" to handle the user configure
mdi(x). Proposed implementation accepted by Florian.
- phydev->mdix_ctrl initialize with ETH_TP_MDI_AUTO. Ethernet controller
never initialize this parameter.
- Fix the mdix changes in marvell and microchip driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Tue, 29 Nov 2016 09:46:48 +0000 (15:16 +0530)]
net: phy: Add mdi(x) support in Microsemi PHYs driver
To connect two ports of the same configuration (MDI to MDI or
MDI-X to MDI-X) with a 10/100/1000 Mbit/s connection, an
Ethernet crossover cable is needed to cross over the transmit
and receive signals in the cable, so that they are matched at
the connector level.
When connecting an MDI port to an MDI-X port a straight through
cable is used while to connect two MDI ports or two MDI-X ports
a crossover cable must be used. Conventionally MDI is used on end
devices while MDI-X is used on hubs and switches
Auto MDI-X automatically detects the required cable connection
type and configures the connection appropriately, removing the
need for crossover cables to interconnect switches or connecting
PCs peer-to-peer.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Raju Lakkaraju [Tue, 29 Nov 2016 09:46:46 +0000 (15:16 +0530)]
net: phy: add mdix_ctrl to hold the user configuration.
Add new parameter mdix_ctrl to hold the user configuration.
Existing mdix maintain the current status of MDI(X) crossover performed or
not.
mdix_ctrl can configure either ETH_TP_MDI or ETH_TP_MDI_X orETH_TP_MDI_AUTO.
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microsemi.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jes Sorensen [Tue, 29 Nov 2016 23:59:02 +0000 (18:59 -0500)]
rtl8xxxu: Work around issue with 8192eu and 8723bu devices not reconnecting
The H2C MEDIA_STATUS_RPT command for some reason causes 8192eu and
8723bu devices not being able to reconnect.
Reported-by: Barry Day <briselec@gmail.com> Cc: <stable@vger.kernel.org> #4.8+ Signed-off-by: Jes Sorensen <Jes.Sorensen@redhat.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
Lior David [Mon, 28 Nov 2016 11:49:03 +0000 (13:49 +0200)]
wil6210: align to latest auto generated wmi.h
Align to latest version of the auto generated wmi file
describing the interface with FW.
Signed-off-by: Lior David <qca_liord@qca.qualcomm.com> Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Lior David [Mon, 28 Nov 2016 11:49:02 +0000 (13:49 +0200)]
wil6210: add debugfs blobs for UCODE code and data
Added new areas to fw_mappings area for UCODE code
and data areas.
The new areas are only exposed through debugfs blobs,
and mainly needed to access UCODE logs.
The change does not affect crash dumps because the
newly added areas overlap with the "upper" area which
is already dumped.
Signed-off-by: Lior David <qca_liord@qca.qualcomm.com> Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Maya Erez [Mon, 28 Nov 2016 11:49:01 +0000 (13:49 +0200)]
wil6210: validate wil_pmc_alloc parameters
num_descriptors and descriptor_size needs to be
checked for:
1) not being negative values
2) no overflow occurs when these are multiplied
together as done in wil_pmc_read.
An overflow of two signed integers is undefined
behavior.
Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Lior David [Mon, 28 Nov 2016 11:49:00 +0000 (13:49 +0200)]
wil6210: delay remain on channel when scan is active
Currently it was possible to call remain_on_channel(ROC)
while scan was active and this caused a crash in the FW.
In order to fix this problem and make the behavior
consistent with other drivers, queue the ROC in case
a scan is active and try it again when scan is done.
As part of the fix, clean up some locking issues and
return error if scan is called while ROC is active.
Signed-off-by: Lior David <qca_liord@qca.qualcomm.com> Signed-off-by: Maya Erez <qca_merez@qca.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Anthony Romano [Mon, 28 Nov 2016 04:27:57 +0000 (20:27 -0800)]
ath9k_htc: don't use HZ for usb msg timeouts
The usb_*_msg() functions expect a timeout in msecs but are given HZ,
which is ticks per second. If HZ=100, firmware download often times out
when there is modest USB utilization and the device fails to initialize.
Replaces HZ in usb_*_msg timeouts with 1000 msec since HZ is one second
for timeouts in jiffies.
Signed-off-by: Anthony Romano <anthony.romano@coreos.com> Acked-by: Oleksij Rempel <linux@rempel-privat.de> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Bhumika Goyal [Sun, 27 Nov 2016 17:03:26 +0000 (22:33 +0530)]
ath9k: constify ath_bus_ops structure
Declare the structure ath_bus_ops as const as it is only passed as an
argument to the function ath9k_init_device. This argument is of type
const struct ath_bus_ops *, so ath_bus_ops structures with this property
can be declared as const.
Done using Coccinelle:
@r1 disable optional_qualifier @
identifier i;
position p;
@@
static struct ath_bus_ops i@p = {...};
@ok1@
identifier r1.i;
position p;
expression e1,e2;
@@
ath9k_init_device(e1,e2,&i@p)
@bad@
position p!={r1.p,ok1.p};
identifier r1.i;
@@
i@p
ath10k: fix Tx DMA alloc failure during continuous wifi down/up
With maximum number of vap's configured in a two radio supported
systems of ~256 Mb RAM, doing a continuous wifi down/up and
intermittent traffic streaming from the connected stations results
in failure to allocate contiguous memory for tx buffers. This results
in the disappearance of all VAP's and a manual reboot is needed as
this is not a crash (or) OOM(for OOM killer to be invoked). To address
this allocate contiguous memory for tx buffers one time and re-use them
until the modules are unloaded but this results in a slight increase in
memory footprint of ath10k when the wifi is down, but the modules are
still loaded. Also as of now we use a separate bool 'tx_mem_allocated'
to keep track of the one time memory allocation, as we cannot come up
with something like 'ath10k_tx_{register,unregister}' before
'ath10k_probe_fw' is called as 'ath10k_htt_tx_alloc_cont_frag_desc'
memory allocation is dependent on the hw_param 'continuous_frag_desc'
ath10k: fix soft lockup during firmware crash/hw-restart
During firmware crash (or) user requested manual restart
the system gets into a soft lock up state because of the
below root cause.
During user requested hardware restart / firmware crash
the system goes into a soft lockup state as 'napi_synchronize'
is called after 'napi_disable' (which sets 'NAPI_STATE_SCHED'
bit) and it sleeps into infinite loop as it waits for
'NAPI_STATE_SCHED' to be cleared. This condition is hit because
'ath10k_hif_stop' is called twice as below (resulting in calling
'napi_synchronize' after 'napi_disable')
Fix this by calling 'ath10k_halt' in ath10k_core_restart itself
as it makes more sense before informing mac80211 to restart h/w
Also remove 'ath10k_halt' in ath10k_start for the state of 'restarting'
Fixes: 3c97f5de1f28 ("ath10k: implement NAPI support") Cc: <stable@vger.kernel.org> # v4.9 Signed-off-by: Mohammed Shafi Shajakhan <mohammed@qti.qualcomm.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Zefir Kurtisi [Fri, 25 Nov 2016 15:51:58 +0000 (17:51 +0200)]
ath9k: feed only active spectral / dfs-detector
Radar pulse and spectral scan reports are provided by the HW
with the ATH9K_RXERR_PHY flag set. Those are forwarded to
the dfs-detector and spectral module for further processing.
For some older chips, the pre-conditions checked in those
modules are ambiguous, since ATH9K_PHYERR_RADAR is used to
tag both types. As a result, spectral frames are fed into
the dfs-detector and vice versa.
This could lead to a false radar detection on a non-DFS
channel (which is uncritical), but more relevant it causes
useless CPU load for processing invalid frames.
This commit ensures that the dfs-detector and spectral
collector are only fed when they are active.
Signed-off-by: Zefir Kurtisi <zefir.kurtisi@neratec.com> Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
Gao Feng [Wed, 30 Nov 2016 00:48:44 +0000 (08:48 +0800)]
driver: ipvlan: Remove useless member mtu_adj of struct ipvl_dev
The mtu_adj is initialized to zero when alloc mem, there is no any
assignment to mtu_adj. It is only used in ipvlan_adjust_mtu as one
right value.
So it is useless member of struct ipvl_dev, then remove it.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lorenzo Colitti [Tue, 29 Nov 2016 17:56:47 +0000 (02:56 +0900)]
net: ipv4: Don't crash if passing a null sk to ip_rt_update_pmtu.
Commit e2d118a1cb5e ("net: inet: Support UID-based routing in IP
protocols.") made __build_flow_key call sock_net(sk) to determine
the network namespace of the passed-in socket. This crashes if sk
is NULL.
Fix this by getting the network namespace from the skb instead.
Fixes: e2d118a1cb5e ("net: inet: Support UID-based routing in IP protocols.") Reported-by: Erez Shitrit <erezsh@dev.mellanox.co.il> Signed-off-by: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Josef Bacik [Tue, 29 Nov 2016 17:35:19 +0000 (12:35 -0500)]
bpf: add test for the verifier equal logic bug
This is a test to verify that
bpf: fix states equal logic for varlen access
actually fixed the problem. The problem was if the register we added to our map
register was UNKNOWN in both the false and true branches and the only thing that
changed was the range then we'd incorrectly assume that the true branch was
valid, which it really wasnt. This tests this case and properly fails without
my fix in place and passes with it in place.
Signed-off-by: Josef Bacik <jbacik@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 19:37:15 +0000 (14:37 -0500)]
Merge branch 'cpsw-per-channel-shaping'
Ivan Khoronzhuk says:
====================
cpsw: add per channel shaper configuration
This series is intended to allow user to set rate for per channel
shapers at cpdma level. This patchset doesn't have impact on performance.
The rate can be set with:
Ivan Khoronzhuk [Tue, 29 Nov 2016 15:00:50 +0000 (17:00 +0200)]
net: ethernet: ti: cpsw: optimize end of poll cycle
Check budget fullness only after it's updated and update
channel mask only once to keep budget balance between channels.
It's also needed for farther changes.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 29 Nov 2016 15:00:49 +0000 (17:00 +0200)]
net: ethernet: ti: cpsw: add .ndo to set per-queue rate
This patch allows to rate limit queues tx queues for cpsw interface.
The rate is set in absolute Mb/s units and cannot be more a speed
an interface is connected with.
Ivan Khoronzhuk [Tue, 29 Nov 2016 15:00:48 +0000 (17:00 +0200)]
net: ethernet: ti: davinci_cpdma: add set rate for a channel
The cpdma has 8 rate limited tx channels. This patch adds
ability for cpdma driver to use 8 tx h/w shapers. If at least one
channel is not rate limited then it must have higher number, this
is because the rate limited channels have to have higher priority
then not rate limited channels. The channel priority is set in low-hi
direction already, so that when a new channel is added with ethtool
and it doesn't have rate yet, it cannot affect on rate limited
channels. It can be useful for TSN streams and just in cases when
h/w rate limited channels are needed.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 29 Nov 2016 15:00:47 +0000 (17:00 +0200)]
net: ethernet: ti: davinci_cpdma: add weight function for channels
The weight of a channel is needed to split descriptors between
channels. The weight can depend on maximum rate of channels, maximum
rate of an interface or other reasons. The channel weight is in
percentage and is independent for rx and tx channels.
Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 30 Nov 2016 19:32:06 +0000 (14:32 -0500)]
Merge branch 'qed-XDP-support'
Yuval Mintz says:
====================
qed*: Add XDP support
This patch series is intended to add XDP to the qede driver, although
it contains quite a bit of cleanups, refactorings and infrastructure
changes as well.
The content of this series can be roughly divided into:
- Datapath improvements - mostly focused on having the datapath utilize
parameters which can be more tightly contained in cachelines.
Patches #1, #2, #8, #9 belong to this group.
- Refactoring - done mostly in favour of XDP. Patches #3, #4, #5, #9.
- Infrastructure changes - done in favour of XDP. Paches #6 and #7 belong
to this category [#7 being by far the biggest patch in the series].
- Actual XDP support - last two patches [#10, #11].
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 29 Nov 2016 14:47:10 +0000 (16:47 +0200)]
qede: Add support for XDP_TX
Add support for forwarding via XDP. Once the eBPF is attached,
driver would allocate & configure a designated transmission queue
meant solely for forwarding packets. Said queue would share the
receive-queue's interrupt line, and would have it's own Tx statistics.
Infrastructure changes required for this [spread-out through the code]:
- Determine the DMA direction of the receive buffers based on the presence
of the eBPF program.
- Turn the sw Tx ring into a union, as regular/XDP queues have different
needs for releasing resources after completion [regular requires the SKB,
XDP requires the transmitted page].
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 29 Nov 2016 14:47:08 +0000 (16:47 +0200)]
qede: Better utilize the qede_[rt]x_queue
Improve the cacheline usage of both queues by reordering -
This reduces the cachelines required for egress datapath processing
from 3 to 2 and those required by ingress datapath processing by 2.
It also changes a couple of datapath related functions that currently
require either the fastpath or the qede_dev, changing them to be based
on the tx/rx queue instead.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 29 Nov 2016 14:47:06 +0000 (16:47 +0200)]
qed*: Handle-based L2-queues.
The driver needs to maintain several FW/HW-indices for each one of
its queues. Currently, that mapping is done by the QED where it uses
an rx/tx array of so-called hw-cids, populating them whenever a new
queue is opened and clearing them upon destruction of said queues.
This maintenance is far from ideal - there's no real reason why
QED needs to maintain such a data-structure. It becomes even worse
when considering the fact that the PF's queues and its child VFs' queues
are all mapped into the same data-structure.
As a by-product, the set of parameters an interface needs to supply for
queue APIs is non-trivial, and some of the variables in the API
structures have different meaning depending on their exact place
in the configuration flow.
This patch re-organizes the way L2 queues are configured and maintained.
In short:
- Required parameters for queue init are now well-defined.
- Qed would allocate a queue-cid based on parameters.
Upon initialization success, it would return a handle to caller.
- Queue-handle would be maintained by entity requesting queue-init,
not necessarily qed.
- All further queue-APIs [update, destroy] would use the opaque
handle as reference for the queue instead of various indices.
The possible owners of such handles:
- PF queues [qede] - complete handles based on provided configuration.
- VF queues [qede] - fw-context-less handles, containing only relative
information; Only the PF-side would need the absolute indices
for configuration, so they're omitted here.
- VF queues [qed, PF-side] - complete handles based on VF initialization.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 29 Nov 2016 14:47:05 +0000 (16:47 +0200)]
qede: Revise state locking scheme
As qede utilizes an internal-reload sequence as result of various
configuration changes, the netif state wouldn't always accurately describe
the status of the configuration.
To compensate, we're storing an internal state of the device, which should
only be accessed under the qede_lock.
This patch fixes and improves several state/lock interactions:
- The internal state should only be checked while locked.
- While holding lock, it's preferable to check state rather than
the netdevice's state.
- The reload sequence is not 'atomic' - unload and subsequent load
are not in the same critical section.
This also add the 'locked' variant for the reload, which would later be
used by XDP - useful in the case where the correct sequence is 'lock,
check state and re-configure if good', instead of allowing the reload
itself to make the decision regarding the configurability of the device.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Tue, 29 Nov 2016 14:47:04 +0000 (16:47 +0200)]
qede: Refactor data-path Rx flow
Driver's NAPI poll is using a long sequence for processing ingress
packets, and it's going to get even longer once we do XDP.
Break down the main loop into a series of sub-functions to allow
better readability of the function.
While we're at it, correct the accounting of the NAPI budget -
currently we're counting only packets passed to the stack against
the budget, even in case those are actually aggregations.
After refactoring every CQE processed would be counted against the budget.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>