This change moves a multi-line register setting into a function,
which simplifies reading the flow of the enable function.
This also fixes a bug where the enable function was enabling
the interrupt twice while trying to update the two interrupt
throttle rate thresholds for Rx and Tx.
Change-ID: Ie308f9d0d48540204590cb9d7a5a7b1196f959bb
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 28 Sep 2015 18:16:50 +0000 (14:16 -0400)]
i40evf: don't give up
When the VF driver is unable to communicate with the PF, it just gives
up and never tries again. Aside from the obvious character flaw that
this shows, it's also a lousy user experience.
When PF communications fail, wait five seconds, and try again. And
again. Don't give up, little VF driver! Your prince will come!
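The retry policy described above can be modelled as a loop that never gives up: each failed attempt waits a fixed five seconds and tries again. This is a user-space sketch only; try_pf_contact() and the simulated clock are hypothetical stand-ins, not i40evf functions, and the real driver reschedules a work item rather than looping.

```c
#include <assert.h>
#include <stdbool.h>

#define RETRY_DELAY_SECS 5 /* delay quoted in the commit message */

/* Hypothetical stand-in for one attempt to talk to the PF:
 * it fails until attempt number `succeeds_on` is reached. */
static bool try_pf_contact(int attempt, int succeeds_on)
{
    return attempt >= succeeds_on;
}

/* Keep retrying until the PF answers. Instead of sleeping, each
 * delay is added to a simulated clock so the sketch runs instantly.
 * Returns the number of attempts; *elapsed receives the total wait. */
static int init_with_retry(int succeeds_on, int *elapsed)
{
    int attempt = 1;

    *elapsed = 0;
    while (!try_pf_contact(attempt, succeeds_on)) {
        *elapsed += RETRY_DELAY_SECS; /* wait five seconds... */
        attempt++;                    /* ...and try again */
    }
    return attempt;
}
```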
Change-ID: Ia1378a39879883563b8faffce819f375821f9585
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 29 Sep 2015 22:19:50 +0000 (15:19 -0700)]
i40e/i40evf: use napi_schedule_irqoff()
The i40e_intr and i40e/i40evf_msix_clean_rings functions run from hard
interrupt context or with interrupts already disabled in netpoll.
They can use napi_schedule_irqoff() instead of napi_schedule().
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Acquire the NVM before issuing an AQ read NVM command for X722.
We need to acquire the NVM before issuing an AQ read to the NVM;
otherwise we will get EBUSY from the FW. Also release it when done.
This fixes the two X722 issues with respect to EEPROM checksum
verification and reading NVM version info.
With this patch in place, the i40e driver will provide basic support
for X722 devices.
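The acquire/read/release ordering can be sketched against a toy firmware model that refuses reads with -EBUSY unless the NVM is owned. All names here (struct fw_model, nvm_acquire(), aq_read_nvm(), read_nvm_word()) are hypothetical and only illustrate the pattern; they are not the i40e admin-queue API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical firmware model: reads fail with -EBUSY unless
 * the NVM has been acquired first. */
struct fw_model {
    bool nvm_owned;
    uint16_t words[16];
};

static int nvm_acquire(struct fw_model *fw)
{
    if (fw->nvm_owned)
        return -EBUSY;
    fw->nvm_owned = true;
    return 0;
}

static void nvm_release(struct fw_model *fw)
{
    fw->nvm_owned = false;
}

static int aq_read_nvm(struct fw_model *fw, unsigned int off, uint16_t *val)
{
    if (!fw->nvm_owned)
        return -EBUSY;  /* the failure mode described in the commit */
    if (off >= sizeof(fw->words) / sizeof(fw->words[0]))
        return -EINVAL;
    *val = fw->words[off];
    return 0;
}

/* The fixed pattern: acquire, read, and always release when done,
 * even if the read itself failed. */
static int read_nvm_word(struct fw_model *fw, unsigned int off, uint16_t *val)
{
    int ret = nvm_acquire(fw);

    if (ret)
        return ret;
    ret = aq_read_nvm(fw, off, val);
    nvm_release(fw);
    return ret;
}
```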
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Acked-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a spinlock to be used for synchronizing
access to the VSI's MAC filter list.
This patch also synchronizes the execution of other code paths that
access the VSI's MAC filter list with the execution of
service_task:sync_vsi_filters.
In i40e_add_vsi(), copy out the LAA MAC address instead of cloning
the MAC filter entry, because only the MAC address is needed to remove
the MAC VLAN filter from FW/HW.
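A user-space sketch of the locking idea, assuming a C11 atomic_flag spinlock as a stand-in for the kernel's spinlock_t: every access to the filter list takes the same lock, and only the MAC address bytes are copied out. struct vsi, vsi_add_mac() and vsi_copy_mac() are hypothetical simplifications, not the i40e data structures.

```c
#include <assert.h>
#include <stdatomic.h>
#include <string.h>

#define MAX_FILTERS 8

/* Minimal spinlock on C11 atomic_flag (user-space stand-in). */
typedef struct { atomic_flag f; } spin_t;

static void spin_lock(spin_t *l)
{
    while (atomic_flag_test_and_set_explicit(&l->f, memory_order_acquire))
        ; /* busy-wait until the holder releases */
}

static void spin_unlock(spin_t *l)
{
    atomic_flag_clear_explicit(&l->f, memory_order_release);
}

/* Hypothetical VSI whose MAC filter list is guarded by its own lock. */
struct vsi {
    spin_t filter_lock;
    unsigned char macs[MAX_FILTERS][6];
    int count;
};

static void vsi_init(struct vsi *v)
{
    atomic_flag_clear(&v->filter_lock.f); /* start unlocked */
    v->count = 0;
}

static int vsi_add_mac(struct vsi *v, const unsigned char mac[6])
{
    int ret = -1;

    spin_lock(&v->filter_lock);
    if (v->count < MAX_FILTERS) {
        memcpy(v->macs[v->count++], mac, 6);
        ret = 0;
    }
    spin_unlock(&v->filter_lock);
    return ret;
}

/* Copy just the MAC address out under the lock, rather than
 * cloning a whole filter entry. */
static int vsi_copy_mac(struct vsi *v, int idx, unsigned char out[6])
{
    int ret = -1;

    spin_lock(&v->filter_lock);
    if (idx >= 0 && idx < v->count) {
        memcpy(out, v->macs[idx], 6);
        ret = 0;
    }
    spin_unlock(&v->filter_lock);
    return ret;
}
```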
Change-ID: I0e10ac7c715d44aa994239642aa4d57c998573a2
Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. Most relevantly, updates for the nfnetlink_log to integrate with
conntrack, fixes for cttimeout and improvements for nf_queue core, they are:
1) Remove useless ifdef around static inline function in IPVS, from
Eric W. Biederman.
2) Simplify the conntrack support for nfnetlink_queue: merge the
nfnetlink_queue_ct.c file into nfnetlink_queue_core.c, then rename it back
to nfnetlink_queue.c.
3) Use y2038 safe timestamp from nfnetlink_queue.
4) Get rid of dead function definition in nf_conntrack, from Flavio
Leitner.
5) Add conntrack support to nfnetlink_log.c, from Ken-ichirou MATSUZAWA.
This adds a new NETFILTER_NETLINK_GLUE_CT Kconfig switch that
controls enabling both nfqueue and nflog integration with conntrack.
The userspace application can request this via NFULNL_CFG_F_CONNTRACK
configuration flag.
6) Remove unused netns variables in IPVS, from Eric W. Biederman and
Simon Horman.
7) Don't put back the refcount on the cttimeout object from xt_CT on success.
8) Fix crash on cttimeout policy object removal. We have to flush out
the cttimeout extension area of the conntrack so that it does not refer
to a nonexistent object that was just removed.
9) Make sure RCU callbacks have completed before removing the
nfnetlink_cttimeout module.
10) Fix a compilation warning in br_netfilter when neither nf_defrag_ipv4
nor nf_defrag_ipv6 is enabled. Patch from Arnd Bergmann.
11) Autoload ctnetlink dependencies when NFULNL_CFG_F_CONNTRACK is
requested. Again from Ken-ichirou MATSUZAWA.
12) Don't use a pointer to the previous hook when reinjecting traffic via
nf_queue with the NF_REPEAT verdict, since it may already be gone. This
also avoids an endless loop if the userspace application keeps returning
NF_REPEAT.
13) A bunch of cleanups for netfilter IPv4 and IPv6 code from Ian Morris.
14) Consolidate logger instance existence check in nfulnl_recv_config().
15) Fix broken atomicity when applying configuration updates to logger
instances in nfnetlink_log.
16) Get rid of the .owner attribute in our hook object. We don't need
this anymore since we're dropping pending packets that have escaped
from the kernel when unregistering the hook. Patch from Florian Westphal.
17) Remove unnecessary rcu_read_lock() from nf_reinject code; we always
assume the RCU read-side lock from .call_rcu in nfnetlink. Also from Florian.
18) Use static inline functions instead of macros to define NF_HOOK() and
NF_HOOK_COND() when netfilter support is off, from Arnd Bergmann.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sowmini found a hang with rds-ping while testing RDS over TCP. It's
a corner case and doesn't always happen. The issue is not reproducible
with the IB transport. It's clear why we see it with RDS TCP:
this happens because the rds_send_xmit() chain wants to take
sock_lock, which is already taken by tcp_v4_rcv() on its
way to rds_tcp_data_ready(). Commit db6526dcb51b ("RDS: use
rds_send_xmit() state instead of RDS_LL_SEND_FULL")
tried to opportunistically finish the send request
in the same thread context.
But because of the above recursive lock hang with RDS TCP,
the send work from rds_send_pong() needs to be deferred to the
worker to avoid a lockup. Given that RDS ping is more of a connectivity
test than a performance-critical path, it should be OK even
for transports like IB.
Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Santosh Shilimkar <ssantosh@kernel.org>
Signed-off-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Add the missing rule to export the mpls iptunnel header needed by iproute2.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>
Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 10:00:51 +0000 (12:00 +0200)]
net: hix5hd2_gmac: avoid integer overload warning
BITS_RX_EN is an 'unsigned long' constant, so the ones' complement of that
has bits set that do not fit into a 32-bit variable on 64-bit architectures,
which causes a harmless gcc warning:
drivers/net/ethernet/hisilicon/hix5hd2_gmac.c: In function 'hix5hd2_port_disable':
drivers/net/ethernet/hisilicon/hix5hd2_gmac.c:374:2: warning: large integer implicitly truncated to unsigned type [-Woverflow]
writel_relaxed(~(BITS_RX_EN | BITS_TX_EN), priv->base + PORT_EN);
This adds a cast to (u32) to tell gcc that the code is indeed fine.
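The effect of the cast can be reproduced in user space. The bit positions below are assumptions chosen for illustration (the real BITS_RX_EN/BITS_TX_EN values live in hix5hd2_gmac.c); BIT() is defined as an unsigned long shift, as in the kernel. On an LP64 build the complement is 64 bits wide with the upper half set, and the explicit cast keeps only the low 32 bits that the register write uses.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed bit positions, for illustration only. */
#define BIT(n)      (1UL << (n))   /* unsigned long, as in the kernel */
#define BITS_RX_EN  BIT(1)
#define BITS_TX_EN  BIT(2)

/* ~(BITS_RX_EN | BITS_TX_EN) has type unsigned long; on 64-bit
 * targets its upper 32 bits are all ones. Passing it where a 32-bit
 * value is expected truncates it implicitly, which gcc warns about.
 * The explicit cast documents that only the low 32 bits matter. */
static uint32_t port_disable_mask(void)
{
    return (uint32_t)~(BITS_RX_EN | BITS_TX_EN);
}
```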
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 09:33:49 +0000 (11:33 +0200)]
net: hisilicon: add OF dependency
The HNS MDIO driver fails to build on older ARM machines that are not
yet converted to CONFIG_OF:
drivers/net/ethernet/hisilicon/hns_mdio.c: In function 'hns_mdio_bus_name':
drivers/net/ethernet/hisilicon/hns_mdio.c:405:14: error: 'OF_BAD_ADDR' undeclared (first use in this function)
u64 taddr = OF_BAD_ADDR;
^
drivers/net/ethernet/hisilicon/hns_mdio.c:405:14: note: each undeclared identifier is reported only once for each function it appears in
drivers/net/ethernet/hisilicon/hns_mdio.c:409:11: error: implicit declaration of function 'of_translate_address' [-Werror=implicit-function-declaration]
taddr = of_translate_address(np, addr);
^
This clarifies the dependency to ensure we don't attempt to build these
drivers without CONFIG_OF, but also adds a COMPILE_TEST alternative to
give us better build coverage testing.
Build-tested on x86 as well to ensure this actually works.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 16 Oct 2015 09:30:56 +0000 (11:30 +0200)]
net: hisilicon: include linux/vmalloc.h in dsaf
Some configurations fail to build the hns dsaf code because of
a missing header file:
ethernet/hisilicon/hns/hns_dsaf_main.c: In function 'hns_dsaf_init':
ethernet/hisilicon/hns/hns_dsaf_main.c:1096:2: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
priv->soft_mac_tbl = vzalloc(sizeof(*priv->soft_mac_tbl)
This adds the correct #include.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 02:57:12 +0000 (19:57 -0700)]
Merge branch 'hns-fixes'
yankejian says:
====================
net: hns: fixes two bugs in hns driver
This patchset fixes two bugs in the hns driver:
- fix a timeout when a pause frame is received from the connected ports
- allow settings to be changed with ethtool -s while the device link is down
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
lisheng [Fri, 16 Oct 2015 09:03:20 +0000 (17:03 +0800)]
net: hns: fixes a bug about timeout by pause frame
This patch fixes a bug that triggered a timeout sequence. When the connected
ports cannot accept packets at a higher speed, they send out pause frames
to the SoC's MAC. At that point, the driver resets the relevant part of
the SoC, which prevents packets from being sent out immediately.
This patch fixes the issue.
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Chenny Xu [Fri, 16 Oct 2015 09:03:19 +0000 (17:03 +0800)]
net: hns: fixes the issue by using ethtool -s
Before this patch, the hns driver only permitted the user to configure the
net device with ethtool -s when the device link was up. That is obviously
not ideal: it needs to be configurable whether the link is up or down. This
patch fixes the issue.
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: Yisen Zhuang <yisen.zhuang@huawei.com>
Signed-off-by: lisheng <lisheng011@huawei.com>
Signed-off-by: lipeng <lipeng321@huawei.com>
Signed-off-by: Chenny Xu <chenny.xu@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 19 Oct 2015 02:54:45 +0000 (19:54 -0700)]
Merge branch 'hsi-fixes'
huangdaode says:
====================
net: hisilicon fix some bugs in HNS drivers
This patchset fixes two bugs in the HNS driver: one removes the hnae sysfs
interface, according to the review comments from Arnd Bergmann <arnd@arndb.de>;
the other fixes a wrong mac_id judgement bug that was found during internal tests.
change log:
v3:
remove the hnae sysfs interface.
v2:
1) remove first bug fix, which is fixed in another patch submitted by
Arnd Bergmann <arnd@arndb.de>
2) change the code style according to Joe.
v1:
initial version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
huangdaode [Fri, 16 Oct 2015 03:54:17 +0000 (11:54 +0800)]
net: hisilicon fix a bug on Hisilicon Network Subsystem
This patch fixes the wrong judgement of mac_id when getting the port number.
Signed-off-by: huangdaode <huangdaode@hisilicon.com>
Signed-off-by: yankejian <yankejian@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
The bcm-phy-lib.c file, added as part of the commit
"net: phy: Add Broadcom phy library for common interfaces",
was missing the module license. This caused an issue
when the library was built as a module: "module license
'unspecified' taints kernel".
This patch fixes the issue by adding the module license,
author and description to the bcm-phy-lib.c file.
Fixes: a1cba5613edf5 ("net: phy: Add Broadcom phy library for common interfaces")
Signed-off-by: Arun Parameswaran <arunp@broadcom.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Li RongQing [Thu, 15 Oct 2015 08:54:36 +0000 (16:54 +0800)]
ipconfig: send Client-identifier in DHCP requests
A DHCP server may provide parameters to a client from a pool of IP
addresses and use a shared rootfs, or provide a specific set of
parameters for a specific client, usually using the MAC address to
identify each client individually. The DHCP protocol also specifies
a client-id field which can be used to determine the correct
parameters to supply when no MAC address is available. There is
currently no way to tell the kernel to supply a specific client-id;
only the userspace DHCP clients support this feature, but they cannot
be used when the network is needed before userspace is available,
such as when the root filesystem is on NFS.
This patch makes it possible to pass something like
"ip=dhcp,client_id_type,client_id_value" as a kernel parameter,
so that the kernel can identify itself to the server.
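As an illustration of what ends up on the wire, the sketch below parses the "dhcp,<type>,<value>" part of the parameter and encodes it as the DHCP Client-identifier option defined in RFC 2132 (option code 61, a length byte, a one-byte type, then the identifier). encode_client_id() is a hypothetical helper written for this example; it is not the kernel's ipconfig.c code and its validation rules are assumptions.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define DHCP_OPT_CLIENT_ID 61  /* RFC 2132 Client-identifier option */

/* Parse "dhcp,<type>,<value>" and emit [61][len][type][value...],
 * where len = 1 + strlen(value) and must fit in one byte.
 * Returns bytes written, or -1 on malformed input or a short buffer. */
static int encode_client_id(const char *arg, unsigned char *out, size_t outlen)
{
    const char *p, *comma;
    char *end;
    unsigned long type;
    size_t vlen;

    if (strncmp(arg, "dhcp,", 5) != 0)
        return -1;
    p = arg + 5;
    comma = strchr(p, ',');
    if (!comma)
        return -1;
    type = strtoul(p, &end, 0);          /* accepts decimal or 0x.. */
    if (end == p || end != comma || type > 255)
        return -1;
    vlen = strlen(comma + 1);
    if (vlen == 0 || vlen > 254 || outlen < vlen + 3)
        return -1;

    out[0] = DHCP_OPT_CLIENT_ID;
    out[1] = (unsigned char)(vlen + 1);  /* type byte + identifier */
    out[2] = (unsigned char)type;
    memcpy(out + 3, comma + 1, vlen);
    return (int)(vlen + 3);
}
```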
Signed-off-by: Li RongQing <roy.qing.li@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
This merge resolves conflicts with 75aec9df3a78 ("bridge: Remove
br_nf_push_frag_xmit_sk") as part of Eric Biederman's effort to improve
netns support in the network stack that reached upstream via David's
net-next tree.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Conflicts:
net/bridge/br_netfilter_hooks.c
Arnd Bergmann [Fri, 9 Oct 2015 18:45:42 +0000 (20:45 +0200)]
netfilter: turn NF_HOOK into an inline function
A recent change to the dst_output handling caused a new warning
when the call to NF_HOOK() is the only use of a local variable
passed as 'dev', and CONFIG_NETFILTER is disabled:
net/ipv6/ip6_output.c: In function 'ip6_output':
net/ipv6/ip6_output.c:135:21: warning: unused variable 'dev' [-Wunused-variable]
The reason for this is that the NF_HOOK macro in this case does
not reference the variable at all, and the call to dev_net(dev)
got removed from the ip6_output function. To avoid that warning now
and in the future, this changes the macro into an equivalent
inline function, which tells the compiler that the variable is
passed correctly but still unused.
The dn_forward function apparently had the same problem in
the past and added a local workaround that no longer works
with the inline function. In order to avoid a regression, we
have to also remove the #ifdef from decnet in the same patch.
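The difference between the two stub styles can be shown with simplified, hypothetical types (struct net, struct sk_buff and struct net_device below are minimal stand-ins, and the hook arguments are arbitrary). The macro drops 'dev' in the preprocessor, so a local that is only passed to it looks unused; the inline function receives 'dev' as a real parameter, so the caller's variable counts as used.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins for kernel types (hypothetical). */
struct net { int id; };
struct sk_buff { int len; };
struct net_device { const char *name; };

typedef int (*okfn_t)(struct net *, struct sk_buff *);

/* Macro stub: 'dev' vanishes during preprocessing, so a caller's
 * local used only here triggers -Wunused-variable. */
#define NF_HOOK_MACRO(pf, hook, net, skb, dev, okfn) (okfn)(net, skb)

/* Inline stub: same behaviour, but every argument is consumed by
 * a real parameter, so no unused-variable warning in the caller. */
static inline int nf_hook_stub(int pf, unsigned int hook, struct net *net,
                               struct sk_buff *skb, struct net_device *dev,
                               okfn_t okfn)
{
    (void)pf; (void)hook; (void)dev;  /* accepted but unused */
    return okfn(net, skb);
}

static int deliver(struct net *net, struct sk_buff *skb)
{
    return net->id + skb->len;
}

static int output_path(struct net *net, struct sk_buff *skb)
{
    struct net_device *dev = NULL;   /* only ever passed to the hook */

    return nf_hook_stub(0, 0, net, skb, dev, deliver);
}
```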
Fixes: ede2059dbaf9 ("dst: Pass net into dst->output")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Fri, 16 Oct 2015 14:15:31 +0000 (07:15 -0700)]
Merge branch 'mlxsw-spectrum'
Jiri Pirko says:
====================
mlxsw: Driver update, add initial support for Spectrum ASIC
Purpose of this patchset is to introduce initial support for Mellanox
Spectrum ASIC, including L2 bridge forwarding offload.
The only non-mlxsw patch in this patchset is the first one, introducing
a pre-change upper notifier. It is used in the last patch to ensure ports of
a single ASIC are not bridged into multiple bridges, as that scenario is
currently not supported by the driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 16 Oct 2015 12:01:37 +0000 (14:01 +0200)]
mlxsw: spectrum: Add initial support for Spectrum ASIC
Add support for new generation Mellanox Spectrum ASIC, 10/25/40/50 and
100Gb/s Ethernet Switch.
The initial driver implements bridge forwarding offload including
bridge internal VLAN support, FDB static entries, FDB learning and
HW ageing including their setup.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Elad Raz <eladr@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 16 Oct 2015 12:01:36 +0000 (14:01 +0200)]
mlxsw: reg: Add Switch Port VLAN MAC Learning register definition
Since we currently do not support the offloading of 802.1D bridges, we
need to be able to let the device know it should not learn MAC addresses
on specific {Port, VID} pairs.
Add the SPVMLR register, which controls the learning enablement of
{Port, VID} pairs.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 16 Oct 2015 12:01:25 +0000 (14:01 +0200)]
mlxsw: cmd: Introduce FID-offset flooding tables
Packets destined to offloaded netdevs will be classified to FIDs in the
device and flooded in the case of BUM (broadcast, unknown unicast and
multicast) traffic.
The flooding table used is of type FID-offset, which allows one to
create different flooding domains for different FIDs and specify the
offset in the flooding table for each FID (not necessarily equal to FID
or VID).
Add support for this flooding table type, by exposing the configuration
of the number of tables from this type and their size.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Fri, 16 Oct 2015 12:01:24 +0000 (14:01 +0200)]
mlxsw: cmd: Introduce per-FID flooding tables
In the newly introduced Spectrum switch ASIC, packets destined to
non-offloaded netdevs will be classified to special FIDs (vFIDs) in the
device and flooded to the CPU port.
The flooding table used is of type per-FID, which allows one to create
different flooding domains for different vFIDs.
While using a simple single-entry flood table is certainly sufficient at
this point, we do plan to offload 802.1D bridges involving VLAN
interfaces, thus making this change necessary.
Add support for this flooding table type, by exposing the configuration
of the number of tables from this type and their size.
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 16 Oct 2015 12:01:22 +0000 (14:01 +0200)]
net: introduce pre-change upper device notifier
This newly introduced netdevice notifier is called before the actual
upper-device change happens. It lets notifier handlers know that an
upper change is about to happen and react to it, including the
possibility to forbid the change. That is valuable for drivers, which
can check whether the upper device linkage is supported and forbid it
if it is not.
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 16 Oct 2015 13:41:10 +0000 (06:41 -0700)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
Intel Wired LAN Driver Updates 2015-10-16
This series contains updates to e1000, e1000e, igb, igbvf, ixgbe, ixgbevf,
i40e, i40evf and fm10k.
Alex Duyck fixes the polling routine for i40e/i40evf, where the NAPI budget
for receive cleanup was being rounded up to 1 but the netpoll call was
expecting no Rx to be processed because the budget passed was 0. He also
cleaned up the IN_NETPOLL flag, which was not adding any value because
receive cleanup was handled in NAPI, and added netpoll support for
i40evf as well.
Jesse updates all of our drivers to use napi_complete_done() instead of
napi_complete(), which allows us to use
/sys/class/net/ethX/gro_flush_timeout. Added ethtool support to control
and report the new Interrupt Limit register, since the XL710 hardware
has a different interrupt moderation design that can support a limit of
total interrupts per second per vector.
Shannon cleans up startup log entries to cut down their number by putting
a couple behind debug flags and combining others into a single line. He
also added support to enable/disable printing VEB statistics via ethtool.
Jingjing fixes a compile issue by adding const to functions that return
strings that are not going to be modified.
Greg Rose cleans up defines that were not used and were causing customer
confusion.
Greg Bowers adds support for setting a new bit in the Set Local LLDP MIB
admin queue command Type field.
Mitch fixes an issue where the vlan_features field was set to the same value
as the netdev features field, but before the features were actually
set up, leaving vlan_features empty. He resolves the issue by setting
up the netdev features first, then masking out the VLAN feature bits when
assigning vlan_features. He also fixes VF init timing, where in some
instances the VFs would fail to initialize the first time the driver was
loaded. To correct this, he increased the delay time for the init task and
made it wait longer before giving up.
v2: fix missing space in function header comment in patch 3, based on
feedback from Sergei Shtylyov.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Mitch Williams [Mon, 28 Sep 2015 18:12:43 +0000 (14:12 -0400)]
i40e: increase AQ work limit
With 64 VFs, we can easily overwhelm the AQ on the PF if we have too low
a limit on the number of AQ requests. This leads to ARQ overflow errors,
and occasionally VFs that fail to initialize.
Since we really only hit this condition on initial VF driver load, the
requests that we process are lightweight, so this extra work doesn't
cause problems for the PF driver.
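Why a low per-pass limit leads to ARQ overflow can be seen with a toy queue model: if requests arrive faster than one service pass drains them, the backlog grows until the queue is full. The queue capacity, burst size and limits below are illustrative assumptions, not the i40e values, and the functions are hypothetical.

```c
#include <assert.h>

/* Toy admin receive queue: bounded capacity, overflow counts drops. */
struct arq_model {
    int pending;
    int capacity;
    int overflows;
};

static void arq_post(struct arq_model *q, int n)
{
    q->pending += n;
    if (q->pending > q->capacity) {
        q->overflows += q->pending - q->capacity;  /* dropped requests */
        q->pending = q->capacity;
    }
}

/* One service-task pass: handle at most work_limit events. */
static int service_task(struct arq_model *q, int work_limit)
{
    int done = 0;

    while (q->pending > 0 && done < work_limit) {
        q->pending--;   /* "process" one lightweight request */
        done++;
    }
    return done;
}

/* `rounds` passes, each preceded by `burst` new requests. */
static int simulate(int burst, int rounds, int work_limit, int capacity)
{
    struct arq_model q = { 0, capacity, 0 };

    for (int r = 0; r < rounds; r++) {
        arq_post(&q, burst);
        service_task(&q, work_limit);
    }
    return q.overflows;
}
```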
Change-ID: I620221520d8af987df6ace9ba938ffaf22107681
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 28 Sep 2015 18:12:42 +0000 (14:12 -0400)]
i40evf: relax and stagger init timing a bit
On some devices, in some systems, in some configurations, the VFs would
fail to initialize the first time you loaded the driver.
To correct this, increase the delay time for the init task slightly, and
wait longer before giving up.
If we enable VFs and load the VF driver in the same kernel as the PF
driver, we can totally overwhelm the PF driver with AQ requests because
all of the instances try to initialize at the same time.
To help alleviate this, stagger the initial scheduling of the init task
using the PCIe function as a multiplier. We mask off the function to
only three bits so no instance has to wait too long.
With these two changes, initializing 128 VFs on a single device goes
from four minutes to just a few seconds.
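The staggering rule (PCIe function used as a multiplier, masked to three bits) can be sketched as below. The per-function step of 5 ms is an assumption for illustration; the commit does not state the value. A conventional PCI devfn keeps the function number in bits 2:0, so masking with 0x7 yields one of eight delay slots.

```c
#include <assert.h>

/* Assumed per-function delay step; not taken from the commit. */
#define INIT_DELAY_STEP_MS 5

/* devfn = (device << 3) | function. Masking with 0x7 keeps the
 * function bits, so instances spread over eight delay slots and no
 * instance waits longer than 7 steps. */
static unsigned int init_delay_ms(unsigned int devfn)
{
    return INIT_DELAY_STEP_MS * (devfn & 0x7);
}
```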
Change-ID: If3d8720c1c4e838ab36d8781d9ec295a62380936
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: Recognize 1000Base_T_Optical phy type when link is up
1000Base_T_Optical got added to the function that figures out what
is supported when link is down but not when link is up. Add it in there
too so that we display the correct information.
Change-ID: I85ebcdfa7c02d898c44c673b1500552a53c8042e
Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mitch Williams [Mon, 28 Sep 2015 18:12:40 +0000 (14:12 -0400)]
i40evf: correctly populate vlan_features
The vlan_features field was correctly being set to the same value as the
netdev features field. However, this was being done before the features
were actually being set up, leaving the vlan_features empty.
Also, after a reset, vlan_features will be incorrectly assigned the
previous netdev feature flags, which can contain VLAN feature bits. This
makes the VLAN code angry and will cause a stack dump.
To fix these issues, set up the netdev features first, then mask out the
VLAN feature bits when assigning vlan_features.
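The corrected ordering can be expressed in a few lines: compute the features first, then derive vlan_features from them with the VLAN bits cleared. The feature bit values and struct netdev_model are hypothetical stand-ins, not the kernel's NETIF_F_* flags or struct net_device.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical feature bits (not the kernel's NETIF_F_* values). */
#define F_SG          (1ULL << 0)
#define F_IP_CSUM     (1ULL << 1)
#define F_TSO         (1ULL << 2)
#define F_VLAN_TX     (1ULL << 3)   /* VLAN tag insertion */
#define F_VLAN_RX     (1ULL << 4)   /* VLAN tag stripping */
#define F_VLAN_FILTER (1ULL << 5)
#define F_VLAN_MASK   (F_VLAN_TX | F_VLAN_RX | F_VLAN_FILTER)

struct netdev_model {
    features_t features;
    features_t vlan_features;
};

/* Set up features first, then mask out VLAN bits for vlan_features.
 * Copying before setup leaves vlan_features empty; copying stale
 * flags after a reset could carry VLAN bits into vlan_features. */
static void setup_features(struct netdev_model *nd, features_t hw_caps)
{
    nd->features = hw_caps;
    nd->vlan_features = nd->features & ~F_VLAN_MASK;
}
```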
Change-ID: Ib0548869dc83cf6a841cb8697dd94c12359ba4d2
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
i40e: reset the invalid msg counter in vf when a valid msg is received
When the number of invalid messages from a VF exceeds the limit, the VF
is disabled. This happens if other VF drivers (like DPDK) send a message
through the driver's mailbox (aka virtchannel) interface, but the message
is not supported by the i40e PF driver, such as CONFIG_PROMISCUOUS_MODE.
This patch changes num_invalid_msgs in struct i40e_vf to count
consecutive invalid messages, and resets it when a valid message
is received.
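A sketch of the consecutive-count rule, assuming a hypothetical threshold of 10 (the real limit is defined in the i40e PF driver and may differ). Because any valid message resets the counter, a VF that interleaves occasional unsupported requests with normal traffic is never disabled; only an unbroken run of invalid messages trips the limit.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed threshold, for illustration only. */
#define MAX_INVALID_MSGS 10

struct vf_model {
    unsigned int num_invalid_msgs;   /* consecutive invalid messages */
    bool disabled;
};

static void vf_handle_msg(struct vf_model *vf, bool valid)
{
    if (vf->disabled)
        return;
    if (valid) {
        vf->num_invalid_msgs = 0;    /* a good message clears the run */
        return;
    }
    if (++vf->num_invalid_msgs > MAX_INVALID_MSGS)
        vf->disabled = true;
}
```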
Change-ID: Iaec42fd3dcdd281476b3518be23261dd46fc3718
Signed-off-by: Jingjing Wu <jingjing.wu@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The XL710 hardware has a different interrupt moderation design
that can support a limit of total interrupts per second per
vector, in addition to the "number of interrupts per second"
controls already established in the driver. This combination
of hardware features allows us to set very low default latency
settings but minimize the total CPU utilization by not
making too many interrupts, should the user desire.
The current driver implementation is still enabling the dynamic
moderation in the driver, and only using the rx/tx-usecs
limit in ethtool to limit the interrupt rate per second, by default.
The new code implemented in this patch
2) adds init/use of the new "Interrupt Limit" register
3) adds ethtool knob to control/report the limits above
Usage: ethtool -C ethx rx-usecs-high <value>, where <value> is the number
of microseconds N, limiting the rate to one interrupt every N microseconds
(1/N interrupts per microsecond), regardless of the rx-usecs or tx-usecs
values. Since there is a credit-based scheme in the hardware, rx-usecs and
tx-usecs can be configured for very low latency for short bursts, but once
the credits run out, the credit refill rate is limited by rx-usecs-high.
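The commit does not describe the XL710 credit accounting in detail, so the sketch below models it as a token bucket, which is an assumption: interrupts may fire as often as rx-usecs allows while credits remain, and credits come back at one per rx-usecs-high microseconds. It also shows the rate arithmetic: rx-usecs-high = N caps the sustained rate at 1,000,000 / N interrupts per second. struct intr_limiter and try_fire() are hypothetical.

```c
#include <assert.h>

/* Token-bucket model of a credit-based interrupt limit (assumed).
 * All times are in microseconds. */
struct intr_limiter {
    unsigned int usecs_low;     /* rx-usecs: min gap while credits last */
    unsigned int usecs_high;    /* rx-usecs-high: one credit per N usecs */
    unsigned int max_credits;   /* burst size */
    unsigned int credits;
    unsigned long refill_clock; /* time up to which credits were added */
    unsigned long last_fire;
    int fired_once;
};

/* Sustained ceiling for rx-usecs-high = N, in interrupts per second. */
static unsigned long max_rate_per_sec(unsigned int usecs_high)
{
    return 1000000UL / usecs_high;
}

/* Returns 1 if an interrupt may fire at time `now`, consuming a credit.
 * Assumes `now` never goes backwards. */
static int try_fire(struct intr_limiter *l, unsigned long now)
{
    unsigned long earned = (now - l->refill_clock) / l->usecs_high;

    if (earned) {
        l->refill_clock += earned * l->usecs_high;
        l->credits = (l->credits + earned > l->max_credits) ?
                     l->max_credits : l->credits + (unsigned int)earned;
    }
    if (l->fired_once && now - l->last_fire < l->usecs_low)
        return 0;               /* too soon for the low-latency gap */
    if (l->credits == 0)
        return 0;               /* burst used up: wait for refill */
    l->credits--;
    l->last_fire = now;
    l->fired_once = 1;
    return 1;
}
```

With rx-usecs 10, rx-usecs-high 100 and four credits, events every 10 us produce a burst of four interrupts, then one more only when a credit is refilled at 100 us.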
Change-ID: I3a1075d3296123b0f4f50623c779b027af5b188d
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adds support for setting a new bit in the Set Local LLDP MIB AQ command
Type field. When set to 1, the bit indicates to FW that Apps should be
treated as non-willing. When 0, FW behaves as before.
Change-ID: I0d2101c1606c59c7188d3e6a0c7810e0f205233a
Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 28 Sep 2015 18:12:34 +0000 (14:12 -0400)]
i40e: priv flag for controlling VEB stats
Add an ethtool priv flag to enable and disable printing
the VEB statistics.
Change-ID: I7654054a3a73b08aa8310d94ee8fce6219107dd8
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Greg Rose [Mon, 28 Sep 2015 18:12:33 +0000 (14:12 -0400)]
i40e: Removed unused defines
Two defines that are not used are causing customer confusion - remove
them.
Change-ID: Icef0325aca8e0f4fcdfc519e026bdd375e791200
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 28 Sep 2015 18:12:32 +0000 (14:12 -0400)]
i40e: remove read/write failed messages from nvmupdate
Allow the nvmupdate application to decide when a read or write error
should be exposed to the user. Since the application needs to use
write probes to find the ReadOnly sections on a potentially unknown NVM
version in the HW and read probes to check the status of the last write,
some error messages are expected, but need not be shown to the users.
The driver cannot tell ignorable errors from real ones, so it needs
to let the application make the decision.
Change-ID: I78fca8ab672bede11c10c820b83c26adfd536d03
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Shannon Nelson [Mon, 28 Sep 2015 18:12:30 +0000 (14:12 -0400)]
i40e: generate fewer startup messages
Cut down on the number of startup log entries by putting a couple behind
debug flags and combining a couple others into a single line.
Change-ID: I708089f086308f84d43f8b6f0e8a634a02d058fb
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
As per Eric Dumazet's previous patches
(see commit 24d2e4a50737 ("tg3: use napi_complete_done())):
Quoting verbatim:
Using napi_complete_done() instead of napi_complete() allows
us to use /sys/class/net/ethX/gro_flush_timeout
GRO layer can aggregate more packets if the flush is delayed a bit,
without having to set too big coalescing parameters that impact
latencies.
</end quote>
Tested
configuration: low latency via ethtool -C ethx adaptive-rx off
rx-usecs 10 adaptive-tx off tx-usecs 15
workload: streaming rx using netperf TCP_MAERTS
igb:
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.48 10^6bits/s over 1.000 seconds ending at 1440193171.589
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1176930056 1475.36 797726 16384.00 71905
MIGRATED TCP MAERTS TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 10.0.0.1 () port 0 AF_INET : demo
...
Interim result: 941.49 10^6bits/s over 0.997 seconds ending at 1440193142.763
Alignment Offset Bytes Bytes Recvs Bytes Sends
Local Remote Local Remote Xfered Per Per
Recv Send Recv Send Recv (avg) Send (avg)
8 8 0 0 1175182320 50476.00 23282 16384.00 71816
i40e:
Hard to test because the traffic is incoming so fast (24Gb/s) that GRO
always receives 87kB, even at the highest interrupt rate.
Other drivers were only compile tested.
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 24 Sep 2015 16:04:38 +0000 (09:04 -0700)]
i40evf: Add support for netpoll
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 24 Sep 2015 16:04:32 +0000 (09:04 -0700)]
i40e/i40evf: Drop useless "IN_NETPOLL" flag
The code in i40e and i40evf is using an "IN_NETPOLL" flag that has never
added any value, because the Rx clean-up is handled in NAPI.
As such, the flag was set, the queue was scheduled via NAPI, and then polled
from the netpoll controller, and if any Rx packets were processed, they were
processed in the wrong context.
In addition the flag itself just added an unneeded conditional to the
hot-path so it can safely be dropped and save us a few instructions.
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Thu, 24 Sep 2015 16:04:26 +0000 (09:04 -0700)]
i40e/i40evf: Fix handling of napi budget
The polling routine for i40e was rounding up the budget for Rx cleanup to
1. This is incorrect as the netpoll poll call is expecting no Rx to be
processed as the budget passed was 0.
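The budget rule can be sketched with a hypothetical ring model: Tx cleanup always runs, a budget of 0 (as netpoll passes) returns before any Rx work, and only a positive budget is split across rings with a floor of 1 per ring. Without the early return, that floor is exactly what turns a budget of 0 into Rx processing. None of these functions are the i40e implementation.

```c
#include <assert.h>

/* Hypothetical per-queue-pair state. */
struct ring_model { int rx_pending; int tx_pending; };

static int clean_rx(struct ring_model *r, int budget)
{
    int done = r->rx_pending < budget ? r->rx_pending : budget;

    r->rx_pending -= done;
    return done;
}

static void clean_tx(struct ring_model *r)
{
    r->tx_pending = 0;   /* Tx cleanup is not bounded by the budget */
}

static int napi_poll_model(struct ring_model *rings, int nrings, int budget)
{
    int work_done = 0, per_ring;

    for (int i = 0; i < nrings; i++)
        clean_tx(&rings[i]);

    /* netpoll may call with budget 0: Tx only, no Rx at all. */
    if (budget <= 0)
        return 0;

    /* Share the budget fairly, but never below 1 per ring. */
    per_ring = budget / nrings;
    if (per_ring < 1)
        per_ring = 1;

    for (int i = 0; i < nrings; i++)
        work_done += clean_rx(&rings[i], per_ring);
    return work_done;
}
```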
Signed-off-by: Alexander Duyck <aduyck@mirantis.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
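A minimal sketch of the corrected budget split, assuming a hypothetical helper `rx_budget_per_ring()` (not the i40e source): a zero budget, as passed by netpoll, yields no Rx work, while a positive budget is still spread across the ring pairs with a floor of one per ring.

```c
/* Hypothetical helper, not the i40e source: how many Rx descriptors each
 * ring pair may clean during one poll. */
static int rx_budget_per_ring(int budget, int num_ringpairs)
{
	int per_ring;

	/* netpoll calls the poll routine with budget == 0: do no Rx work. */
	if (budget <= 0 || num_ringpairs <= 0)
		return 0;

	/* Spread the budget over the rings, but give each at least one. */
	per_ring = budget / num_ringpairs;
	return per_ring > 0 ? per_ring : 1;
}
```

The "at least one" floor is only applied once the budget is known to be positive, which is the point of the fix.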
It looks like all of the code paths to fib_rebalance are under rtnl.
Fixes: 0e884c78ee19 ("ipv4: L3 hash-based multipath") Cc: Peter Nørlund <pch@ordbogen.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Wed, 14 Oct 2015 21:40:44 +0000 (14:40 -0700)]
bpf: Need to call bpf_prog_uncharge_memlock from bpf_prog_put
Currently, bpf_prog_uncharge_memlock is only called from __prog_put_rcu in
the bpf_prog_release path. It also needs to be called from bpf_prog_put to
get correct accounting.
Fixes: aaac3ba95e4c8b49 ("bpf: charge user for creation of BPF maps and programs") Signed-off-by: Tom Herbert <tom@herbertland.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
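The accounting invariant can be illustrated with a toy user-space model; `struct toy_prog`, `toy_prog_put()` and the global `user_locked_pages` counter are hypothetical stand-ins, not the kernel's BPF structures. Whichever path drops the last reference has to return the charge.

```c
/* Hypothetical stand-in for the per-user locked-memory charge. */
static long user_locked_pages;

struct toy_prog {
	int refcnt;   /* number of holders (fd, attachments, ...) */
	long pages;   /* pages charged when the program was loaded */
};

static void toy_prog_charge(struct toy_prog *p)
{
	user_locked_pages += p->pages;
}

static void toy_prog_uncharge(struct toy_prog *p)
{
	user_locked_pages -= p->pages;
}

/* Single put path: whoever drops the last reference uncharges,
 * so no release route can leak the accounting. */
static void toy_prog_put(struct toy_prog *p)
{
	if (--p->refcnt == 0)
		toy_prog_uncharge(p);
}
```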
David S. Miller [Fri, 16 Oct 2015 07:52:27 +0000 (00:52 -0700)]
Merge branch 'robust_listener'
Eric Dumazet says:
====================
tcp/dccp: make our listener code more robust
This patch series addresses request sockets leaks and listener dismantle
phase. This survives a stress test with listeners being added/removed
quite randomly.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Let's reduce the confusion about inet_csk_reqsk_queue_drop():
in many cases we also need to release the reference on the request socket,
so add a helper to do this, reducing code size and complexity.
Fixes: 4bdc3d66147b ("tcp/dccp: fix behavior of stale SYN_RECV request sockets") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
At the time of the above commit, tcp_req_err() and dccp_req_err()
were dead code, as SYN_RECV request sockets were not yet in the ehash table.
The real bug was fixed later in a different commit.
We need to revert it to avoid leaking a refcount on the request socket.
inet_csk_reqsk_queue_drop_and_put() will be added in a following
commit to make it clear that inet_csk_reqsk_queue_drop()
does not release the reference owned by the caller.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Thu, 15 Oct 2015 19:28:52 +0000 (21:28 +0200)]
drivers/net: get rid of unnecessary initializations in .get_drvinfo()
Many drivers needlessly initialize the n_priv_flags, n_stats, testinfo_len,
eedump_len and regdump_len fields in their .get_drvinfo() ethtool op.
This is not necessary, as these fields are filled in by ethtool_get_drvinfo().
v2: removed unused variable
v3: removed another unused variable
Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:46 +0000 (14:52 -0400)]
tipc: update node FSM when peer RESET message is received
The change made in the previous commit revealed a small flaw in the way
the node FSM is updated. When the function tipc_node_link_down() is
called for the last link to a node, we should check whether this was
caused by a local reset or by a received RESET message from the peer.
In the latter case, we can directly issue a PEER_LOST_CONTACT_EVT to
the node FSM, so that it is ready to re-establish contact. If this is
not done, the peer node will sometimes have to go through a second
establish cycle before the link becomes stable.
We fix this in this commit by conditionally issuing the mentioned
event in the function tipc_node_link_down(). We also move the LINK_RESET
FSM event away from the link_reset() function and into the caller
function, partly because it is easier to follow the code when state
changes are gathered at a limited number of locations, and partly
because there will be cases in future commits where we don't want the
link to go into RESET mode when link_reset() is called.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:45 +0000 (14:52 -0400)]
tipc: send out RESET immediately when link goes down
When a link is taken down because of a node local event, such as
disabling of a bearer or an interface, we currently leave it to the
peer node to discover the broken communication. The default time for
such failure discovery is 1.5-2 seconds.
If we instead allow the terminating link endpoint to send out a RESET
message at the moment it is reset, we can give the impression that
both endpoints are going down instantly. Since this is a very common
scenario, we find it worthwhile to make this small modification.
Apart from letting the link produce the said message, we also have to
ensure that the interface is able to transmit it before TIPC is
detached. We do this by performing the disabling of a bearer in three
steps:
1) Disable reception of TIPC packets from the interface in question.
2) Take down the links, while allowing them to send out a RESET message.
3) Disable transmission of TIPC packets on the interface.
Apart from this, we now have to react to the NETDEV_GOING_DOWN event,
instead of the NETDEV_DOWN event as currently, to ensure that such
transmission is possible during the teardown phase.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:44 +0000 (14:52 -0400)]
tipc: delay ESTABLISH state event when link is established
Link establishment, just like link teardown, is a non-atomic action, in
the sense that discovering that conditions are right to establish a link
and actually adding the link to one of the node's send slots are done
in two different lock contexts. The link FSM is designed to help bridge
the gap between the two contexts in a safe manner.
We have now discovered a weakness in the implementation of this FSM.
Because we directly let the link go from state LINK_ESTABLISHING to
state LINK_ESTABLISHED already in the first lock context, we are unable
to distinguish between a fully established link, i.e., a link that has
been added to its slot, and a link that has not yet reached the second
lock context. It may hence happen that a manual intervention, e.g., when
disabling an interface, causes the function tipc_node_link_down() to try
removing the link from the node slots, decrementing its active link
counter etc, although the link was never added there in the first place.
We solve this by delaying the actual state change until we reach the
second lock context, inside the function tipc_node_link_up(). This
makes it possible for potential callers of __tipc_node_link_down() to
know if they should proceed or not, and the problem is solved.
Unfortunately, the situation described above also has a second problem.
Since a tipc_node_link_up() call is necessarily pending once the node
lock has been released, we must defuse that call by setting the link
back from LINK_ESTABLISHING to LINK_RESET state. This forces us to make
a slight modification to the link FSM, which will now look as follows.
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:43 +0000 (14:52 -0400)]
tipc: disallow packet duplicates in link deferred queue
After the previous commits, we are guaranteed that no packets
of type LINK_PROTOCOL or with illegal sequence numbers will ever be
added to the link deferred queue. This makes it possible to
simplify the sorting algorithm in the function
tipc_skb_queue_sorted().
We also alter the function so that it drops a packet if one with
the same sequence number is already present in the queue. This is
necessary because we have identified weird packet sequences, involving
duplicate packets, where a legitimate in-sequence packet may advance to
the head of the queue without being detected and dequeued.
Finally, we make this function non-inline, since it will now be called
only in exceptional cases.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
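A minimal sketch of sorted insertion with duplicate dropping, assuming a simplified singly linked queue; `struct pkt`, `toy_seq_less()` and `deferred_queue_add()` are hypothetical (the real function operates on an `sk_buff_head`). Sequence numbers are 16 bits wide, so ordering uses modular comparison.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if a precedes b in 16-bit serial-number arithmetic. */
static bool toy_seq_less(uint16_t a, uint16_t b)
{
	uint16_t d = (uint16_t)(b - a);

	return d != 0 && d < 0x8000;
}

struct pkt {
	uint16_t seqno;
	struct pkt *next;
};

/* Insert p into *head, keeping the list sorted by seqno. Returns false,
 * leaving the list unchanged, if that seqno is already queued; the caller
 * would then free p. */
static bool deferred_queue_add(struct pkt **head, struct pkt *p)
{
	struct pkt **pos = head;

	while (*pos && toy_seq_less((*pos)->seqno, p->seqno))
		pos = &(*pos)->next;

	if (*pos && (*pos)->seqno == p->seqno)
		return false;          /* duplicate: drop it */

	p->next = *pos;
	*pos = p;
	return true;
}
```

Because duplicates are rejected at insertion time, the head of the queue is always a unique, lowest-numbered packet.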
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:42 +0000 (14:52 -0400)]
tipc: improve sequence number checking
The sequence number of an incoming packet is currently only checked
for less than, equality to, or bigger than the next expected number,
meaning that the receive window in practice becomes one half sequence
number cycle, or U16_MAX/2. This does not make sense, and may not even
be safe if there are extreme delays in the network. Any packet sent by
the peer during the ongoing cycle must belong inside its current send
window, or should otherwise be dropped if possible.
Since a link endpoint cannot know its peer's current send window, it
has to base this sanity check on a worst-case assumption, i.e., that
the peer is using a maximum sized window of 8191 packets. Using this
assumption, we now add a check that the sequence number is not bigger
than next_expected + TIPC_MAX_LINK_WIN. We also re-order the checks
done, so that the receive window test is performed before the gap test.
This way, we are guaranteed that no packet with an illegal sequence number
is ever added to the deferred queue.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
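A minimal sketch of the reordered checks, assuming a simplified classifier; `classify_seqno()`, `enum rcv_verdict` and `TOY_MAX_LINK_WIN` are hypothetical names, and only the 8191-packet worst-case window comes from the text. Computing the forward distance modulo 2^16 folds both the "older than expected" and the "beyond the window" cases into one comparison, which runs before the gap test.

```c
#include <stdint.h>

#define TOY_MAX_LINK_WIN 8191  /* worst-case peer send window, per the text */

enum rcv_verdict { RCV_DELIVER, RCV_DEFER, RCV_DROP };

/* Classify an arriving sequence number against the next expected one. */
static enum rcv_verdict classify_seqno(uint16_t seqno, uint16_t rcv_nxt)
{
	uint16_t ahead = (uint16_t)(seqno - rcv_nxt);

	/* Window test first: anything older than rcv_nxt, or more than
	 * TOY_MAX_LINK_WIN ahead of it, wraps to a large distance. */
	if (ahead > TOY_MAX_LINK_WIN)
		return RCV_DROP;

	/* Gap test second: only in-window packets reach the deferred queue. */
	if (ahead != 0)
		return RCV_DEFER;

	return RCV_DELIVER;
}
```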
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:41 +0000 (14:52 -0400)]
tipc: simplify tipc_link_rcv() reception loop
Currently, all packets received in tipc_link_rcv() are unconditionally
added to the packet deferred queue, whereafter that queue is walked and
all its buffers evaluated for delivery. This is both suboptimal
and makes the queue sorting function unnecessarily complex.
This commit changes the loop so that an arrived packet is evaluated
first, and added to the deferred queue only when a sequence number gap
is discovered. A non-empty deferred queue is walked until it is empty
or until its head's sequence number doesn't fit.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
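A simplified sketch of the new loop order, assuming plain unsigned sequence numbers without wraparound and a sorted array in place of the skb queue; `struct toy_link` and its helpers are hypothetical. An in-sequence packet is delivered directly, only a gap sends a packet to the deferred queue, and the queue is drained only while its head is the next expected number.

```c
#include <stddef.h>
#include <string.h>

#define TOY_DEFQ_MAX 16
#define TOY_LOG_MAX 64

struct toy_link {
	unsigned int rcv_nxt;                /* next expected seqno */
	unsigned int defq[TOY_DEFQ_MAX];     /* out-of-order seqnos, sorted */
	size_t defq_len;
	unsigned int delivered[TOY_LOG_MAX]; /* delivery log, for illustration */
	size_t n_delivered;
};

static void toy_deliver(struct toy_link *l, unsigned int seqno)
{
	if (l->n_delivered < TOY_LOG_MAX)
		l->delivered[l->n_delivered++] = seqno;
	l->rcv_nxt = seqno + 1;
}

/* Sorted insert into the deferred queue; duplicates and overflow are dropped. */
static void toy_defer(struct toy_link *l, unsigned int seqno)
{
	size_t i = 0;

	while (i < l->defq_len && l->defq[i] < seqno)
		i++;
	if (l->defq_len == TOY_DEFQ_MAX ||
	    (i < l->defq_len && l->defq[i] == seqno))
		return;
	memmove(&l->defq[i + 1], &l->defq[i],
		(l->defq_len - i) * sizeof(l->defq[0]));
	l->defq[i] = seqno;
	l->defq_len++;
}

static void toy_link_rcv(struct toy_link *l, unsigned int seqno)
{
	if (seqno < l->rcv_nxt)
		return;                  /* already delivered: drop */
	if (seqno != l->rcv_nxt) {
		toy_defer(l, seqno);     /* gap detected: park the packet */
		return;
	}
	toy_deliver(l, seqno);           /* common case: in sequence */

	/* Walk the deferred queue only while its head fits. */
	while (l->defq_len && l->defq[0] == l->rcv_nxt) {
		toy_deliver(l, l->defq[0]);
		l->defq_len--;
		memmove(&l->defq[0], &l->defq[1],
			l->defq_len * sizeof(l->defq[0]));
	}
}
```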
Jon Paul Maloy [Thu, 15 Oct 2015 18:52:40 +0000 (14:52 -0400)]
tipc: limit usage of temporary skb list during packet reception
During packet reception, the function tipc_link_rcv() adds its accepted
packets to a temporary buffer queue, before finally splicing this queue
into the lock protected input queue that will be delivered up to the
socket layer. The purpose is to reduce potential contention on the input
queue lock. However, since the vast majority of packets arrive in
sequence, they will anyway be added one by one to the input queue, and
the use of the temporary queue becomes a sub-optimization.
The only case where this queue makes sense is when unpacking buffers
from a bundle packet; here we want to avoid adding dozens of small
buffers individually to the lock-protected input queue in a tight
loop.
In this commit, we remove the general usage of the temporary queue,
and keep it only for the packet unbundling case.
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Insu Yun [Thu, 15 Oct 2015 16:24:09 +0000 (12:24 -0400)]
mlx4: correctly check failed allocation
When allocation fails, mlx4_alloc_cmd_mailbox returns ERR_PTR(-ENOMEM).
Since mlx4_alloc_cmd_mailbox never returns NULL, its result should be
checked with IS_ERR, not IS_ERR_OR_NULL.
Signed-off-by: Insu Yun <wuninsu@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
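To see why an error-pointer check is sufficient, here is a user-space sketch of the convention; `err_ptr()`, `is_err()`, `ptr_err()` and `toy_alloc_mailbox()` are hypothetical stand-ins written for illustration, not the kernel's `include/linux/err.h` or the mlx4 code. An allocator that never returns NULL only needs the error-pointer test; an extra NULL test adds nothing.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

/* Small negative errno values are encoded as pointers in the top page of
 * the address space, which never holds a valid object. */
static void *err_ptr(long error) { return (void *)(intptr_t)error; }
static long ptr_err(const void *ptr) { return (long)(intptr_t)ptr; }
static bool is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

struct toy_mailbox { int dummy; };

/* Hypothetical allocator with the contract described above: it returns
 * either a valid pointer or an encoded error, never NULL. */
static struct toy_mailbox *toy_alloc_mailbox(bool fail)
{
	struct toy_mailbox *m;

	if (fail)
		return err_ptr(-ENOMEM);
	m = malloc(sizeof(*m));
	return m ? m : err_ptr(-ENOMEM);
}
```

Note that the encoded error pointer is non-NULL, so a caller testing only for NULL would miss the failure entirely.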
Eric Dumazet [Thu, 15 Oct 2015 16:22:11 +0000 (09:22 -0700)]
bonding: support encapsulated ipv6 TSO
If using a sixtofour device on top of a bonding device,
skb segmentation of TCP traffic is done right before calling
bonding xmit, because bonding only enables TSO for IPv4.
This patch improves single flow performance by about 120% on my hosts,
because segmentation is deferred right before calling slave xmit.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:28 +0000 (17:43 +0200)]
mlxsw: Add trap group for control packets
Previously, we trapped flooded and control packets using the same trap
group. This can cause flooded packets to overflow the PCI bus and
prevent control packets (e.g. STP, LACP) from getting to the CPU.
Solve this by splitting the RX trap group into RX and control groups, which
allows us to configure a policer on the former, thereby preventing it from
overflowing the PCI bus.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 15 Oct 2015 15:43:27 +0000 (17:43 +0200)]
mlxsw: Simplify traps creation
The Host Trap Group Table (HTGT) register configures trap groups, which
are populated with trap IDs using the Host PacKet Trap (HPKT) register.
However, a trap ID can only be present inside one trap group (the last
configured).
Instead of passing both the trap group and ID to the function that
packs HPKT, pass only the trap ID and derive the trap group from it.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
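A minimal sketch of deriving the group from the trap ID, assuming a hypothetical set of trap IDs and a simplified register layout; `enum toy_trap_id`, `toy_trap_group_of()` and `struct toy_hpkt` are illustrative names, not the mlxsw definitions. Since each trap ID can live in only one group, a single mapping function keeps every caller consistent.

```c
enum toy_trap_id {
	TOY_TRAP_FLOOD_UC,
	TOY_TRAP_FLOOD_MC,
	TOY_TRAP_STP,
	TOY_TRAP_LACP,
	TOY_TRAP_LLDP,
};

enum toy_trap_group { TOY_GROUP_RX, TOY_GROUP_CTRL };

/* One place decides which group a trap ID belongs to. */
static enum toy_trap_group toy_trap_group_of(enum toy_trap_id id)
{
	switch (id) {
	case TOY_TRAP_STP:
	case TOY_TRAP_LACP:
	case TOY_TRAP_LLDP:
		return TOY_GROUP_CTRL;   /* control protocols */
	default:
		return TOY_GROUP_RX;     /* flooded and ordinary Rx traffic */
	}
}

struct toy_hpkt {
	enum toy_trap_id trap_id;
	enum toy_trap_group trap_group;
};

/* The packer takes only the trap ID; the group is derived from it. */
static void toy_hpkt_pack(struct toy_hpkt *reg, enum toy_trap_id id)
{
	reg->trap_id = id;
	reg->trap_group = toy_trap_group_of(id);
}
```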
Jiri Pirko [Thu, 15 Oct 2015 15:43:20 +0000 (17:43 +0200)]
mlxsw: pci: Limit number of entries being sent in single MAP_FA cmd
Firmware accepts only a limited number of mapping entries in a MAP_FA
command. To prevent overflow, introduce a limit and, if the number of
entries is bigger, call MAP_FA multiple times.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
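A minimal sketch of the chunking, assuming a per-command limit of 32 entries (the actual firmware limit is not stated here) and a stub in place of the real command; `toy_map_fw_area()` and `toy_cmd_map_fa()` are hypothetical.

```c
#include <stddef.h>

#define TOY_MAP_FA_MAX_ENTRIES 32  /* hypothetical firmware limit */

/* Stub command: records how many entries each call carried. */
static size_t toy_cmd_calls;
static size_t toy_cmd_sizes[16];

static int toy_cmd_map_fa(const unsigned long *entries, size_t n)
{
	(void)entries;
	if (n == 0 || n > TOY_MAP_FA_MAX_ENTRIES)
		return -1;               /* firmware would reject this */
	if (toy_cmd_calls < 16)
		toy_cmd_sizes[toy_cmd_calls] = n;
	toy_cmd_calls++;
	return 0;
}

/* Split an arbitrarily long mapping list into limit-sized MAP_FA calls. */
static int toy_map_fw_area(const unsigned long *entries, size_t count)
{
	while (count) {
		size_t chunk = count < TOY_MAP_FA_MAX_ENTRIES ?
			       count : TOY_MAP_FA_MAX_ENTRIES;
		int err = toy_cmd_map_fa(entries, chunk);

		if (err)
			return err;
		entries += chunk;
		count -= chunk;
	}
	return 0;
}
```

Stopping at the first failed chunk keeps the error path simple; a real driver might also need to unmap the chunks already sent.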
Lendacky, Thomas [Wed, 14 Oct 2015 17:37:32 +0000 (12:37 -0500)]
amd-xgbe: Use system workqueue for device restart
A previous patch switched from using the system workqueue to the device
workqueue for various operations. During a device restart the device
workqueue is flushed so the restart cannot use this workqueue or else
a deadlock results. Move the device restart back to using the system
workqueue.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>