Ido Schimmel [Sat, 3 Dec 2016 15:45:00 +0000 (16:45 +0100)]
mlxsw: core: Create an ordered workqueue for FIB offload
We're going to start processing FIB entries addition / deletion events
in deferred work. These work items must be processed in the order they
were submitted or otherwise we can have differences between the kernel's
FIB table and the device's.
Solve this by creating an ordered workqueue to which these work items
will be submitted to. Note that we can't simply convert the current
workqueue to be ordered, as EMADs re-transmissions are also processed in
deferred work.
Later on, we can migrate other work items to this workqueue, such as FDB
notification processing and nexthop resolution, since they all take the
same lock anyway.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Sat, 3 Dec 2016 15:44:58 +0000 (16:44 +0100)]
ipv4: fib: Export free_fib_info()
The FIB notification chain is going to be converted to an atomic chain,
which means switchdev drivers will have to offload FIB entries in
deferred work, as hardware operations entail sleeping.
However, while the work is queued fib info might be freed, so a
reference must be taken. To release the reference (and potentially free
the fib info) fib_info_put() will be called, which in turn calls
free_fib_info().
Export free_fib_info() so that modules will be able to invoke
fib_info_put().
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Sat, 3 Dec 2016 18:36:01 +0000 (10:36 -0800)]
act_mirred: fix a typo in get_dev
Fixes: 255cb30425c0 ("net/sched: act_mirred: Add new tc_action_ops get_dev()") Cc: Hadar Hen Zion <hadarh@mellanox.com> Cc: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Dec 2016 00:10:48 +0000 (19:10 -0500)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2016-12-02
This series contains updates to i40e and i40evf only.
Alex provides changes so that we are much more robust about defining what
we can and cannot offload in i40e and i40evf by doing additional checks
other than L4 tunnel header length.
Jake provides several fixes/changes, first cleaning up a label that is
unnecessary, as well as cleaned up the use of a "magic number". Clarified
the code by separating the global private flags and the regular private
flags per interface into two arrays, so that future additions will not
produce duplication and buggy code. Adds additional checks to protect
against NULL values for msix_entries and q_vectors pointers.
Michal adds Clause22 method for accessing registers for some external
PHYs.
Piotr adds additional protocol support for the admin queue discover
capabilities function.
Tushar Dave fixes a panic seen on SPARC, where writel() should not be
used to write directly to a memory address but only to a memory mapped
I/O address otherwise it causes data access exceptions.
Joe Perches separates out a section of code into its own function, to
help reduce i40evf_reset_task() a bit.
Alan fixes an issue by checking for NULL before dereferencing msix_entries
and returning early in the case where it is NULL within the i40evf_close()
code path.
Henry provides code cleanup to remove unreachable and redundant sections
of code. Fixed up an issue where new NICs were not identifying "unknown
PHYs" correctly.
Harshitha fixes a issue where the ethtool "Supported Link" modes list
backplane interfaces on X722 devices for 10 GbE with SFP+ and Cortina
retimer, where these interfaces should not be visible to the user since
they cannot use them.
Carolyn changes an X722 informational message so that it only appears
when extra messages are desired.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Yuchung Cheng [Sat, 3 Dec 2016 22:46:22 +0000 (14:46 -0800)]
tcp: fix the missing avr32 SOF_TIMESTAMPING_OPT_STATS
The commit of SOF_TIMESTAMPING_OPT_STATS didn't include the
new header for avr32, causing build to break. The patch fixes it.
Fixes: 1c885808e456 ("tcp: SOF_TIMESTAMPING_OPT_STATS option for SO_TIMESTAMPING") Reported-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Dec 2016 16:35:49 +0000 (17:35 +0100)]
udp: be less conservative with sock rmem accounting
Before commit 850cbaddb52d ("udp: use it's own memory accounting
schema"), the udp protocol allowed sk_rmem_alloc to grow beyond
the rcvbuf by the whole current packet's truesize. After said commit
we allow sk_rmem_alloc to exceed the rcvbuf only if the receive queue
is empty. As reported by Jesper this cause a performance regression
for some (small) values of rcvbuf.
This commit is intended to fix the regression restoring the old
handling of the rcvbuf limit.
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com> Fixes: 850cbaddb52d ("udp: use it's own memory accounting schema") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 2 Dec 2016 16:11:00 +0000 (08:11 -0800)]
net_sched: gen_estimator: account for timer drifts
Under heavy stress, timer used in estimators tend to slowly be delayed
by a few jiffies, leading to inaccuracies.
Lets remember what was the last scheduled jiffies so that we get more
precise estimations, without having to add a multiply/divide in the loop
to account for the drifts.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Edward Cree [Fri, 2 Dec 2016 15:51:33 +0000 (15:51 +0000)]
sfc: remove EFX_BUG_ON_PARANOID, use EFX_WARN_ON_[ONCE_]PARANOID instead
Logically, EFX_BUG_ON_PARANOID can never be correct. For, BUG_ON should
only be used if it is not possible to continue without potential harm;
and since the non-DEBUG driver will continue regardless (as the BUG_ON is
compiled out), clearly the BUG_ON cannot be needed in the DEBUG driver.
So, replace every EFX_BUG_ON_PARANOID with either an EFX_WARN_ON_PARANOID
or the newly defined EFX_WARN_ON_ONCE_PARANOID.
Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 3 Dec 2016 21:08:01 +0000 (16:08 -0500)]
Merge branch 'samples-bpf-automated-cgroup-tests'
Sargun Dhillon says:
====================
samples, bpf: Refactor; Add automated tests for cgroups
These two patches are around refactoring out some old, reusable code from the
existing test_current_task_under_cgroup_user test, and adding a new, automated
test.
There is some generic cgroupsv2 setup & cleanup code, given that most
environment still don't have it setup by default. With this code, we're able
to pretty easily add an automated test for future cgroupsv2 functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sargun Dhillon [Fri, 2 Dec 2016 10:42:32 +0000 (02:42 -0800)]
samples, bpf: Add automated test for cgroup filter attachments
This patch adds the sample program test_cgrp2_attach2. This program is
similar to test_cgrp2_attach, but it performs automated testing of the
cgroupv2 BPF attached filters. It runs the following checks:
* Simple filter attachment
* Application of filters to child cgroups
* Overriding filters on child cgroups
* Checking that this still works when the parent filter is removed
The filters that are used here are simply allow all / deny all filters, so
it isn't checking the actual functionality of the filters, but rather
the behaviour around detachment / attachment. If net_cls is enabled,
this test will fail.
Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sargun Dhillon [Fri, 2 Dec 2016 10:42:18 +0000 (02:42 -0800)]
samples, bpf: Refactor test_current_task_under_cgroup - separate out helpers
This patch modifies test_current_task_under_cgroup_user. The test has
several helpers around creating a temporary environment for cgroup
testing, and moving the current task around cgroups. This set of
helpers can then be used in other tests.
Signed-off-by: Sargun Dhillon <sargun@sargun.me> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
silence some of the clang compiler warnings like:
include/linux/fs.h:2693:9: warning: comparison of unsigned enum expression < 0 is always false
arch/x86/include/asm/processor.h:491:30: warning: taking address of packed member 'sp0' of class or structure 'x86_hw_tss' may result in an unaligned pointer value
include/linux/cgroup-defs.h:326:16: warning: field 'cgrp' with variable sized type 'struct cgroup' not at the end of a struct or class is a GNU extension
since they add too much noise to samples/bpf/ build.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Fri, 2 Dec 2016 01:21:32 +0000 (04:21 +0300)]
netns: fix net_generic() "id - 1" bloat
net_generic() function is both a) inline and b) used ~600 times.
It has the following code inside
...
ptr = ng->ptr[id - 1];
...
"id" is never compile time constant so compiler is forced to subtract 1.
And those decrements or LEA [r32 - 1] instructions add up.
We also start id'ing from 1 to catch bugs where pernet sybsystem id
is not initialized and 0. This is quite pointless idea (nothing will
work or immediate interference with first registered subsystem) in
general but it hints what needs to be done for code size reduction.
Namely, overlaying allocation of pointer array and fixed part of
structure in the beginning and using usual base-0 addressing.
Ids are just cookies, their exact values do not matter, so lets start
with 3 on x86_64.
Code size savings (oh boy): -4.2 KB
As usual, ignore the initial compiler stupidity part of the table.
This bug was introduced with commit dec827d174d7f76c457238800183ca864a639365
("[NETNS]: The generic per-net pointers.") in the glorious days of
chopping networking stack into containers proper 8.5 years ago (whee...)
How it didn't trigger for so long?
Well, you need quite specific set of conditions:
*) race window opens once per pernet subsystem addition
(read: modprobe or boot)
*) not every pernet subsystem is eligible (need ->id and ->size)
*) not every pernet subsystem is vulnerable (need incorrect or absense
of ordering of register_pernet_sybsys() and actually using net_generic())
*) to hide the bug even more, default is to preallocate 13 pointers which
is actually quite a lot. You need IPv6, netfilter, bridging etc together
loaded to trigger reallocation in the first place. Trimmed down
config are OK.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexey Dobriyan [Fri, 2 Dec 2016 00:59:06 +0000 (03:59 +0300)]
netlink: 2-clause nla_ok()
nla_ok() consists of 3 clauses:
1) int rem >= (int)sizeof(struct nlattr)
2) u16 nla_len >= sizeof(struct nlattr)
3) u16 nla_len <= int rem
The statement is that clause (1) is redundant.
What it does is ensuring that "rem" is a positive number,
so that in clause (3) positive number will be compared to positive number
with no problems.
However, "u16" fully fits into "int" and integers do not change value
when upcasting even to signed type. Negative integers will be rejected
by clause (3) just fine. Small positive integers will be rejected
by transitivity of comparison operator.
NOTE: all of the above DOES NOT apply to nlmsg_ok() where ->nlmsg_len is
u32(!), so 3 clauses AND A CAST TO INT are necessary.
Zhang Shengju [Fri, 2 Dec 2016 01:51:07 +0000 (09:51 +0800)]
staging: wilc1000: use reset to set mac header
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add
operation in set function.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Fri, 2 Dec 2016 01:51:06 +0000 (09:51 +0800)]
iwlwifi: use reset to set transport header
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add
operation in set function.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Fri, 2 Dec 2016 01:51:05 +0000 (09:51 +0800)]
mlx4: use reset to set mac header
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add
operation in set function.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Fri, 2 Dec 2016 01:51:04 +0000 (09:51 +0800)]
bnx2x: use reset to set network header
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add
operation in set function.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhang Shengju [Fri, 2 Dec 2016 01:51:03 +0000 (09:51 +0800)]
qede: use reset to set network header
Since offset is zero, it's not necessary to use set function. Reset
function is straightforward, and will remove the unnecessary add
operation in set function.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds get_pauseparam and set_pauseparam functions.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers: net: xgene: Add flow control initialization
This patch adds flow control/pause frame initialization and
advertising capabilities.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers: net: xgene: Add flow control configuration
This patch adds functions to configure mac, when flow control
and pause frame settings change.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch fixes RSS feature, for non-TCP/UDP packets.
Signed-off-by: Khuong Dinh <kdinh@apm.com> Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements ndo_change_mtu() callback function that
enables mtu change.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch adds support for jumbo frame, by allocating
additional buffer (page) pool and configuring the hardware.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers: net: xgene: Configure classifier with pagepool
This patch configures classifier with the pagepool information.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is a prepartion patch and adds xgene_enet_get_fpsel() helper
function to get buffer pool number.
Signed-off-by: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: Quan Nguyen <qnguyen@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As of commit 8f32b90981dcdb355516fb95953133f8d4e6b11d
("net: ethernet: ti: davinci_cpdma: add set rate for a channel") the
ARM allmodconfig builds would fail modpost with:
Since these weren't declared as static, it is assumed they were
meant to be shared outside the file, and that modular build testing
was simply overlooked.
Fixes: 8f32b90981dc ("net: ethernet: ti: davinci_cpdma: add set rate for a channel") Cc: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Cc: Mugunthan V N <mugunthanvnm@ti.com> Cc: Grygorii Strashko <grygorii.strashko@ti.com> Cc: linux-omap@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
this is a pull request of 4 patches for net-next/master.
There are two patches by Chris Paterson for the rcar_can and rcar_canfd
device tree binding documentation. And a patch by Geert Uytterhoeven
that corrects the order of interrupt specifiers.
The fourth patch by Colin Ian King fixes a spelling error in the
kvaser_usb driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Thu, 1 Dec 2016 15:19:41 +0000 (16:19 +0100)]
net: stmmac: unify mdio functions
stmmac_mdio_{read|write} and stmmac_mdio_{read|write}_gmac4 are not
enought different for being split.
The only differences between thoses two functions are shift/mask for
addr/reg/clk_csr.
This patch introduce a per platform set of variable for setting thoses
shift/mask and unify mdio read and write functions.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
LABBE Corentin [Thu, 1 Dec 2016 15:19:40 +0000 (16:19 +0100)]
net: stmmac: avoid Camelcase naming
This patch simply rename regValue to value, like it was named in other
mdio functions.
Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 28 Nov 2016 14:19:43 +0000 (15:19 +0100)]
irda: w83977af_ir: fix damaged whitespace
As David Miller pointed out for for the previous patch, the whitespace
in some functions looks rather odd. This was caused by commit 6329da5f258a
("obsolete config in kernel source: USE_INTERNAL_TIMER"), which removed
some conditions but did not reindent the code.
This fixes the indentation in the file and removes extraneous whitespace
at the end of the lines and before tabs.
There are many other minor coding style problems in the driver, but I'm
not touching those here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
1) In the MACB driver, a bug fix to properly initialize the
RX tail pointer properly overlapped with some changes
to support variable sized rings.
2) In XGBE we had a "CONFIG_PM" --> "CONFIG_PM_SLEEP" fix
overlapping with a reorganization of the driver to support
ACPI, OF, as well as PCI variants of the chip.
3) In 'net' we had several probe error path bug fixes to the
stmmac driver, meanwhile a lot of this code was cleaned up
and reorganized in 'net-next'.
4) The cls_flower classifier obtained a helper function in
'net-next' called __fl_delete() and this overlapped with
Daniel Borkamann's bug fix to use RCU for object destruction
in 'net'. It also overlapped with Jiri's change to guard
the rhashtable_remove_fast() call with a check against
tc_skip_sw().
5) In mlx4, a revert bug fix in 'net' overlapped with some
unrelated changes in 'net-next'.
6) In geneve, a stale header pointer after pskb_expand_head()
bug fix in 'net' overlapped with a large reorganization of
the same code in 'net-next'. Since the 'net-next' code no
longer had the bug in question, there was nothing to do
other than to simply take the 'net-next' hunks.
Signed-off-by: David S. Miller <davem@davemloft.net>
Carolyn Wyborny [Tue, 8 Nov 2016 21:05:12 +0000 (13:05 -0800)]
i40e: change message to only appear when extra debug info is wanted
This patch changes an X722 informational message so that it only
appears when extra messages are desired. Without this patch,
on X722 devices, this message appears at load, potentially causing
unnecessary alarm.
Change-ID: I94f7aae15dc5b2723cc9728c630c72538a3e670e Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 8 Nov 2016 21:05:11 +0000 (13:05 -0800)]
i40e/i40evf: replace for memcpy with single memcpy call in ethtool
memcpy replaced with single memcpy call in ethtool.
Change-ID: I3f5bef6bcc593412c56592c6459784db41575a0a Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 8 Nov 2016 21:05:10 +0000 (13:05 -0800)]
i40e: set broadcast promiscuous mode for each active VLAN
A previous workaround added to ensure receipt of all broadcast frames
incorrectly set the broadcast promiscuous mode unconditionally
regardless of active VLAN status.
Replace this partial workaround with a complete solution that sets the
broadcast promiscuous filters in i40e_sync_vsi_filters. This new method
sets the promiscuous mode based on when broadcast filters are added or
removed.
I40E_VLAN_ANY will request a broadcast filter for all VLANs, (as we're
in untagged mode) while a broadcast filter on a specific VLAN will only
request broadcast for that VLAN.
Thus, we restore addition of broadcast filter to the array, but we add
special handling for these such that they enable the broadcast
promiscuous mode instead of being sent as regular filters.
The end result is that we will correctly receive all broadcast packets
(even those with a *source* address equal to the broadcast address) but
will not receive packets for which we don't have an active VLAN filter.
Change-ID: I7d0585c5cec1a5bf55bf533b42e5e817d5db6a2d Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch fixes the problem where the ethtool Supported link
modes list backplane interfaces on X722 devices for 10GbE with
SFP+ and Cortina retimer. This patch fixes the problem by setting
and using a flag for this particular device since the backplane
interface is only between the internal PHY and the retimer and it
should not be seen by the user as they cannot use it.
Without this patch, the user wrongly thinks that backplane interfaces
are supported on their device when they actually are not.
Change-ID: I3882bc2928431d48a2db03a51a713a1f681a79e9 Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 8 Nov 2016 21:05:08 +0000 (13:05 -0800)]
i40evf: protect against NULL msix_entries and q_vectors pointers
Update the functions which free msix_entries and q_vectors so that they
are safe against NULL values. This allows calling code to not care
whether these have already been freed when disabling and freeing them.
Change-ID: I31bfd1c0da18023d971b618edc6fb049721f3298 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Henry Tieman [Tue, 8 Nov 2016 21:05:07 +0000 (13:05 -0800)]
i40e: Pass unknown PHY type for unknown PHYs
The PHY type value for unrecognized PHYs and cables was changed
based on firmware version number. Newer hardware use lower firmware
version numbers and this was causing some PHYs to be identified
as type 0x16 instead of 0xe (unknown).
Without this patch, newer card will incorrectly identify unknown
PHYs and cables.
This change adds hardware type to the check for firmware version
so the PHY type is reported correctly.
Change-ID: I0723cbfd263c76fc73ff1a5275d1639051376c9a Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Henry Tieman [Tue, 8 Nov 2016 21:05:06 +0000 (13:05 -0800)]
i40e: Remove unreachable code
The code at the end of i40e_read_phy_register_clause22() contained
unreachable code and redundant control statements.
This change removes the unreachable code. And deletes the redundant
goto statement and if statement.
Change-ID: I713032b1585396f40f903cbcfdea987abd874400 Signed-off-by: Henry Tieman <henry.w.tieman@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alan Brady [Tue, 8 Nov 2016 21:05:05 +0000 (13:05 -0800)]
i40evf: check for msix_entries null dereference
It is possible for msix_entries to be freed by a previous suspend/remove
before a VF is closed. This patch fixes the issue by checking for NULL
before dereferencing msix_entries and returning early in the case where
it is NULL within the i40evf_close code path. Without this patch it is
possible to trigger a kernel panic through NULL dereference.
Change-ID: I92a2746e82533a889e25f91578eac9abd0388ae2 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Joe Perches [Tue, 1 Nov 2016 22:35:14 +0000 (15:35 -0700)]
i40evf: Move some i40evf_reset_task code to separate function
The i40evf_reset_task function is a couple hundred lines and it has
a separable block that disables VF. Move that block to a new
i40evf_disable_vf function to shorten i40evf_reset_task a bit.
Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tushar Dave [Wed, 26 Oct 2016 17:49:27 +0000 (10:49 -0700)]
i40e: fix panic on SPARC while changing num of desc
On SPARC, writel() should not be used to write directly to memory
address but only to memory mapped I/O address otherwise it causes
data access exception.
Commit 147e81ec75689 ("i40e: Test memory before ethtool alloc
succeeds") introduced a code that uses memory address to fake the HW
tail address and attempt to write to that address using writel()
causes kernel panic on SPARC. The issue is reproduced while changing
number of descriptors using ethtool.
This change resolves the panic by using HW read-only memory mapped
I/O register to fake HW tail address instead memory address.
Signed-off-by: Tushar Dave <tushar.n.dave@oracle.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Raczynski [Tue, 25 Oct 2016 23:08:53 +0000 (16:08 -0700)]
i40e: Add protocols over MCTP to i40e_aq_discover_capabilities
Add logical_id to I40E_AQ_CAP_ID_MNG_MODE capability starting from major
version 2.
Change-ID: Idb29214b172ea5c70cbd45a99e6745c0215af7e4 Signed-off-by: Piotr Raczynski <piotr.raczynski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Oct 2016 23:08:52 +0000 (16:08 -0700)]
i40e: fix trivial typo in naming of i40e_sync_filters_subtask
A comment incorrectly referred to i40e_vsi_sync_filters_subtask which
does not actually exist. Reference the correct function instead.
Change-ID: I6bd805c605741ffb6fe34377259bb0d597edfafd Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Michal Kosiarz [Tue, 25 Oct 2016 23:08:51 +0000 (16:08 -0700)]
i40e: Add Clause22 implementation
Some external PHYs require Clause22 method for accessing registers.
This patch also adds some defines to support blink led on devices using
10CBaseT PHY.
Change-ID: I868a4326911900f6c89e7e522fda4968b0825f14 Signed-off-by: Michal Kosiarz <michal.kosiarz@intel.com> Signed-off-by: Matt Jared <matthew.a.jared@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Oct 2016 23:08:50 +0000 (16:08 -0700)]
i40e: avoid duplicate private flags definitions
Separate the global private flags and the regular private flags per
interface into two arrays. Future additions of private flags will not
need to be duplicated which may lead to buggy code. Also rename
"i40e_priv_flags_strings_gl" to "i40e_gl_priv_flags_strings" for
clarity, as it reads more naturally.
Change-ID: I68caef3c9954eb7da342d7f9d20f2873186f2758 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Oct 2016 23:08:49 +0000 (16:08 -0700)]
i40e: remove second check of VLAN_N_VID in i40e_vlan_rx_add_vid
Replace a check of magic number 4095 with VLAN_N_VID. This
makes it obvious that a later check against VLAN_N_VID is
always true and can be removed.
Change-ID: I28998f127a61a529480ce63d8a07e266f6c63b7b Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 25 Oct 2016 23:08:48 +0000 (16:08 -0700)]
i40e: remove error_param_int label from i40e_vc_config_promiscuous_mode_msg
This label is unnecessary, as are jumping to a block that checks aq_ret
and then immediately skipping it and returning. So just jump straight to
the error_param and remove this unnecessary label.
Also use goto error_param even in the last check for style consistency.
Change-ID: If487c7d10c4048e37c594e5eca167693aaed45f6 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 25 Oct 2016 23:08:47 +0000 (16:08 -0700)]
i40evf: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload. Previously we were performing no checks. This
should bring us up to parity with the i40e PF driver.
In addition the device only supports GSO as long as the MSS is 64 or
greater. We were not checking this so an MSS less than that was resulting
in Tx hangs.
Change-ID: If533553ec92fc6ba694eab6ac81fdaf3004f3592 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Alexander Duyck [Tue, 25 Oct 2016 23:08:46 +0000 (16:08 -0700)]
i40e: Be much more verbose about what we can and cannot offload
This change makes it so that we are much more robust about defining what we
can and cannot offload. Previously we were just checking for the L4 tunnel
header length, however there are other fields we should be verifying as
there are multiple scenarios in which we cannot perform hardware offloads.
In addition the device only supports GSO as long as the MSS is 64 or
greater. We were not checking this so an MSS less than that was resulting
in Tx hangs.
Change-ID: I5e2fd5f3075c73601b4b36327b771c64fcb6c31b Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Linus Torvalds [Fri, 2 Dec 2016 21:34:37 +0000 (13:34 -0800)]
Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
Pull ARM SoC fixes from Arnd Bergmann:
"This should be the last set of bugfixes for arm-soc in v4.9. None of
these are critical regressions, but it would be nice to still get them
merged.
- On the Juno platform, the idle latency was described wrong, leading
to suboptimal cpuidle tuning.
- Also on the same platform, PCI I/O space was set up incorrectly and
could not work.
- On the sti platform, a syntactically incorrect DT entry caused
warnings.
- The newly added 'gr8' platform has somewhat confusing file names,
which we rename for consistency"
* tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
arm64: dts: juno: fix cluster sleep state entry latency on all SoC versions
arm64: dts: juno: Correct PCI IO window
ARM: dts: STiH407-family: fix i2c nodes
ARM: gr8: Rename the DTSI and relevant DTS
1) Lots more phydev and probe error path leaks in various drivers by
Johan Hovold.
2) Fix race in packet_set_ring(), from Philip Pettersson.
3) Use after free in dccp_invalid_packet(), from Eric Dumazet.
4) Signnedness overflow in SO_{SND,RCV}BUFFORCE, also from Eric
Dumazet.
5) When tunneling between ipv4 and ipv6 we can be left with the wrong
skb->protocol value as we enter the IPSEC engine and this causes all
kinds of problems. Set it before the output path does any
dst_output() calls, from Eli Cooper.
6) bcmgenet uses wrong device struct pointer in DMA API calls, fix from
Florian Fainelli.
7) Various netfilter nat bug fixes from FLorian Westphal.
8) Fix memory leak in ipvlan_link_new(), from Gao Feng.
9) Locking fixes, particularly wrt. socket lookups, in l2tp from
Guillaume Nault.
10) Avoid invoking rhash teardowns in atomic context by moving netlink
cb->done() dump completion from a worker thread. Fix from Herbert
Xu.
11) Buffer refcount problems in tun and macvtap on errors, from Jason
Wang.
12) We don't set Kconfig symbol DEFAULT_TCP_CONG properly when the user
selects BBR. Fix from Julian Wollrath.
13) Fix deadlock in transmit path on altera TSE driver, from Lino
Sanfilippo.
14) Fix unbalanced reference counting in dsa_switch_tree, from Nikita
Yushchenko.
15) tc_tunnel_key needs to be properly exported to userspace via uapi,
fix from Roi Dayan.
16) rds_tcp_init_net() doesn't unregister notifier in error path, fix
from Sowmini Varadhan.
17) Stale packet header pointer access after pskb_expand_head() in
genenve driver, fix from Sabrina Dubroca.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (103 commits)
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
geneve: avoid use-after-free of skb->data
tipc: check minimum bearer MTU
net: renesas: ravb: unintialized return value
sh_eth: remove unchecked interrupts for RZ/A1
net: bcmgenet: Utilize correct struct device for all DMA operations
NET: usb: qmi_wwan: add support for Telit LE922A PID 0x1040
cdc_ether: Fix handling connection notification
ip6_offload: check segs for NULL in ipv6_gso_segment.
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
Revert: "ip6_tunnel: Update skb->protocol to ETH_P_IPV6 in ip6_tnl_xmit()"
ipv6: Set skb->protocol properly for local output
ipv4: Set skb->protocol properly for local output
packet: fix race condition in packet_set_ring
net: ethernet: altera: TSE: do not use tx queue lock in tx completion handler
net: ethernet: altera: TSE: Remove unneeded dma sync for tx buffers
net: ethernet: stmmac: fix of-node and fixed-link-phydev leaks
net: ethernet: stmmac: platform: fix outdated function header
net: ethernet: stmmac: dwmac-meson8b: fix probe error path
net: ethernet: stmmac: dwmac-generic: fix probe error path
...
Eric Dumazet [Fri, 2 Dec 2016 17:44:53 +0000 (09:44 -0800)]
net: avoid signed overflows for SO_{SND|RCV}BUFFORCE
CAP_NET_ADMIN users should not be allowed to set negative
sk_sndbuf or sk_rcvbuf values, as it can lead to various memory
corruptions, crashes, OOM...
Note that before commit 82981930125a ("net: cleanups in
sock_setsockopt()"), the bug was even more serious, since SO_SNDBUF
and SO_RCVBUF were vulnerable.
This needs to be backported to all known linux kernels.
Again, many thanks to syzkaller team for discovering this gem.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Fri, 2 Dec 2016 15:49:29 +0000 (16:49 +0100)]
geneve: avoid use-after-free of skb->data
geneve{,6}_build_skb can end up doing a pskb_expand_head(), which
makes the ip_hdr(skb) reference we stashed earlier stale. Since it's
only needed as an argument to ip_tunnel_ecn_encap(), move this
directly in the function call.
Fixes: 08399efc6319 ("geneve: ensure ECN info is handled properly in all tx/rx paths") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: John W. Linville <linville@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Fri, 2 Dec 2016 08:33:41 +0000 (09:33 +0100)]
tipc: check minimum bearer MTU
Qian Zhang (张谦) reported a potential socket buffer overflow in
tipc_msg_build() which is also known as CVE-2016-8632: due to
insufficient checks, a buffer overflow can occur if MTU is too short for
even tipc headers. As anyone can set device MTU in a user/net namespace,
this issue can be abused by a regular user.
As agreed in the discussion on Ben Hutchings' original patch, we should
check the MTU at the moment a bearer is attached rather than for each
processed packet. We also need to repeat the check when bearer MTU is
adjusted to new device MTU. UDP case also needs a check to avoid
overflow when calculating bearer MTU.
Fixes: b97bf3fd8f6a ("[TIPC] Initial merge") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Reported-by: Qian Zhang (张谦) <zhangqian-c@360.cn> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 1 Dec 2016 20:57:44 +0000 (23:57 +0300)]
net: renesas: ravb: unintialized return value
We want to set the other "err" variable here so that we can return it
later. My version of GCC misses this issue but I caught it with a
static checker.
Fixes: 9f70eb339f52 ("net: ethernet: renesas: ravb: fix fixed-link phydev leaks") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Reviewed-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Chris Brandt [Thu, 1 Dec 2016 18:32:14 +0000 (13:32 -0500)]
sh_eth: remove unchecked interrupts for RZ/A1
When streaming a lot of data and the RZ/A1 can't keep up, some status bits
will get set that are not being checked or cleared which cause the
following messages and the Ethernet driver to stop working. This
patch fixes that issue.
irq 21: nobody cared (try booting with the "irqpoll" option)
handlers:
[<c036b71c>] sh_eth_interrupt
Disabling IRQ #21
Fixes: db893473d313a4ad ("sh_eth: Add support for r7s72100") Signed-off-by: Chris Brandt <chris.brandt@renesas.com> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bcmgenet: Utilize correct struct device for all DMA operations
__bcmgenet_tx_reclaim() and bcmgenet_free_rx_buffers() are not using the
same struct device during unmap that was used for the map operation,
which makes DMA-API debugging warn about it. Fix this by always using
&priv->pdev->dev throughout the driver, using an identical device
reference for all map/unmap calls.
Fixes: 1c1008c793fa ("net: bcmgenet: add main driver file") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 18:52:02 +0000 (13:52 -0500)]
Merge branch 'mvneta-64bit'
Gregory CLEMENT says:
====================
Support Armada 37xx SoC (ARMv8 64-bits) in mvneta driver
The Armada 37xx is a new ARMv8 SoC from Marvell using same network
controller as the older Armada 370/38x/XP SoCs. This series adapts the
driver in order to be able to use it on this new SoC. The main changes
are:
- 64-bits support: the first patches allow using the driver on a 64-bit
architecture.
- MBUS support: the mbus configuration is different on Armada 37xx
from the older SoCs.
- per cpu interrupt: Armada 37xx do not support per cpu interrupt for
the NETA IP, the non-per-CPU behavior was added back.
The first patch is an optimization in the rx path in swbm mode.
The second patch remove unnecessary allocation for HWBM.
The first item is solved by patches 4 and 5.
The 2 last items are solved by patch 6.
In patch 7 the dt support is added.
Beside Armada 37xx, this series have been again tested on Armada XP
and Armada 38x (with Hardware Buffer Management and with Software
Buffer Management).
This is the 6th version of the series:
- 1st version:
http://lists.infradead.org/pipermail/linux-arm-kernel/2016-November/469588.html
Changelog:
v5 -> v6:
- Added Tested-by from Marcin Wojtas on the series
- Added Reviewed-by from Jisheng Zhang on patch 3
- Fix eth1 phy mode for Armada 3720 DB board on patch 7
v4 -> v5:
- remove unnecessary cast in patch 3
v3 -> v4:
- Adding new patch: "net: mvneta: do not allocate buffer in rxq init
with HWBM"
- Simplify the HWBM case in patch 3 as suggested by Marcin
v2 -> v3:
- Adding patch 1 "Optimize rx path for small frame"
- Fix the kbuild error by moving the "phys_addr += pp->rx_offset_correction;"
line from patch 2 to patch 3 where rx_offset_correction is introduced.
- Move the memory allocation of the buf_virt_addr of the rxq to be
called by the probe function in order to avoid a memory leak.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 1 Dec 2016 17:03:09 +0000 (18:03 +0100)]
net: mvneta: Add network support for Armada 3700 SoC
Armada 3700 is a new ARMv8 SoC from Marvell using same network controller
as older Armada 370/38x/XP. There are however some differences that
needed taking into account when adding support for it:
* open default MBUS window to 4GB of DRAM - Armada 3700 SoC's Mbus
configuration for network controller has to be done on two levels:
global and per-port. The first one is inherited from the
bootloader. The latter can be opened in a default way, leaving
arbitration to the bus controller. Hence filled mbus_dram_target_info
structure is not needed
* make per-CPU operation optional - Recent patches adding RSS and XPS
support for Armada 38x/XP enabled per-CPU operation of the controller
by default. Contrary to older SoC's Armada 3700 SoC's network
controller is not capable of per-CPU processing due to interrupt lines'
connectivity. This patch restores non-per-CPU operation, which is now
optional and depends on neta_armada3700 flag value in mvneta_port
structure. In order not to complicate the code, separate interrupt
subroutine is implemented.
For now, on the Armada 3700, RSS is disabled as the current
implementation depend on the per cpu interrupts.
[gregory.clement@free-electrons.com: extract from a larger patch, replace
some ifdef and port to net-next for v4.10]
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:08 +0000 (18:03 +0100)]
net: mvneta: Only disable mvneta_bm for 64-bits
Actually only the mvneta_bm support is not 64-bits compatible.
The mvneta code itself can run on 64-bits architecture.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 1 Dec 2016 17:03:07 +0000 (18:03 +0100)]
net: mvneta: Convert to be 64 bits compatible
Prepare the mvneta driver in order to be usable on the 64 bits platform
such as the Armada 3700.
[gregory.clement@free-electrons.com]: this patch was extract from a larger
one to ease review and maintenance.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:06 +0000 (18:03 +0100)]
net: mvneta: Use cacheable memory to store the rx buffer virtual address
Until now the virtual address of the received buffer were stored in the
cookie field of the rx descriptor. However, this field is 32-bits only
which prevents to use the driver on a 64-bits architecture.
With this patch the virtual address is stored in an array not shared with
the hardware (no more need to use the DMA API). Thanks to this, it is
possible to use cache contrary to the access of the rx descriptor member.
The change is done in the swbm path only because the hwbm uses the cookie
field, this also means that currently the hwbm is not usable in 64-bits.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Reviewed-by: Jisheng Zhang <jszhang@marvell.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:05 +0000 (18:03 +0100)]
net: mvneta: Do not allocate buffer in rxq init with HWBM
For HWBM all buffers are allocated in mvneta_bm_construct() and in runtime
they are put into descriptors by hardware. There is no need to fill them
at this point.
Suggested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gregory CLEMENT [Thu, 1 Dec 2016 17:03:04 +0000 (18:03 +0100)]
net: mvneta: Optimize rx path for small frame
For small frame reuse the phys_addr variable instead of accessing the
uncacheable value in the rx descriptor.
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com> Tested-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Dec 2016 18:48:50 +0000 (10:48 -0800)]
Fix up a couple of field names in the CREDITS file
Ozgur Karatas reported that the very first entry in the CREDITS file had
the wrong tag for name (M: instead of N: - it happened when moving the
entry from the MAINTAINERS file, where 'M:' stands for "Maintainer").
And when I went looking, I found a couple of other cases of wrong
tagging too.
David S. Miller [Fri, 2 Dec 2016 18:46:21 +0000 (13:46 -0500)]
Merge branch 'bpf-support-for-sockets'
David Ahern says:
====================
net: Add bpf support for sockets
The recently added VRF support in Linux leverages the bind-to-device
API for programs to specify an L3 domain for a socket. While
SO_BINDTODEVICE has been around for ages, not every ipv4/ipv6 capable
program has support for it. Even for those programs that do support it,
the API requires processes to be started as root (CAP_NET_RAW) which
is not desirable from a general security perspective.
This patch set leverages Daniel Mack's work to attach bpf programs to
a cgroup to provide a capability to set sk_bound_dev_if for all
AF_INET{6} sockets opened by a process in a cgroup when the sockets
are allocated.
For example:
1. configure vrf (e.g., using ifupdown2)
auto eth0
iface eth0 inet dhcp
vrf mgmt
3. set shell into cgroup (e.g., can be done at login using pam)
echo $$ >> /tmp/cgroupv2/mgmt/cgroup.procs
At this point all commands run in the shell (e.g, apt) have sockets
automatically bound to the VRF (see output of ss -ap 'dev == <vrf>'),
including processes not running as root.
This capability enables running any program in a VRF context and is key
to deploying Management VRF, a fundamental configuration for networking
gear, with any Linux OS installation.
This patchset also exports the socket family, type and protocol as
read-only allowing bpf filters to deny a process in a cgroup the ability
to open specific types of AF_INET or AF_INET6 sockets.
v7
- comments from Alexei
v6
- add export of socket family, type and protocol
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:08 +0000 (08:48 -0800)]
samples/bpf: add userspace example for prohibiting sockets
Add examples preventing a process in a cgroup from opening a socket
based family, protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:07 +0000 (08:48 -0800)]
samples/bpf: Update bpf loader for cgroup section names
Add support for section names starting with cgroup/skb and cgroup/sock.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:06 +0000 (08:48 -0800)]
bpf: Add support for reading socket family, type, protocol
Add socket family, type and protocol to bpf_sock allowing bpf programs
read-only access.
Add __sk_flags_offset[0] to struct sock before the bitfield to
programmtically determine the offset of the unsigned int containing
protocol and type.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:05 +0000 (08:48 -0800)]
samples: bpf: add userspace example for modifying sk_bound_dev_if
Add a simple program to demonstrate the ability to attach a bpf program
to a cgroup that sets sk_bound_dev_if for AF_INET{6} sockets when they
are created.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:04 +0000 (08:48 -0800)]
bpf: Add new cgroup attach type to enable sock modifications
Add new cgroup based program type, BPF_PROG_TYPE_CGROUP_SOCK. Similar to
BPF_PROG_TYPE_CGROUP_SKB programs can be attached to a cgroup and run
any time a process in the cgroup opens an AF_INET or AF_INET6 socket.
Currently only sk_bound_dev_if is exported to userspace for modification
by a bpf program.
This allows a cgroup to be configured such that AF_INET{6} sockets opened
by processes are automatically bound to a specific device. In turn, this
enables the running of programs that do not support SO_BINDTODEVICE in a
specific VRF context / L3 domain.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 1 Dec 2016 16:48:03 +0000 (08:48 -0800)]
bpf: Refactor cgroups code in prep for new type
Code move and rename only; no functional change intended.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling")
introduced a work-around in usbnet_cdc_status() for devices that exported
cdc carrier on twice on connect. Before the commit, this behavior caused
the link state to be incorrect. It was assumed that all CDC Ethernet
devices would either export this behavior, or send one off and then one on
notification (which seems to be the default behavior).
Unfortunately, it turns out multiple devices sends a connection
notification multiple times per second (via an interrupt), even when
connection state does not change. This has been observed with several
different USB LAN dongles (at least), for example 13b1:0041 (Linksys).
After bfe9b9d2df66, the link state has been set as down and then up for
each notification. This has caused a flood of Netlink NEWLINK messages and
syslog to be flooded with messages similar to:
cdc_ether 2-1:2.0 eth1: kevent 12 may have been dropped
This commit fixes the behavior by reverting usbnet_cdc_status() to how it
was before bfe9b9d2df66. The work-around has been moved to a separate
status-function which is only called when a known, affect device is
detected.
v1->v2:
* Do not open-code netif_carrier_ok() (thanks Henning Schild).
* Call netif_carrier_off() instead of usb_link_change(). This prevents
calling schedule_work() twice without giving the work queue a chance to be
processed (thanks Bjørn Mork).
Fixes: bfe9b9d2df66 ("cdc_ether: Improve ZTE MF823/831/910 handling") Reported-by: Henning Schild <henning.schild@siemens.com> Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Artem Savkov [Thu, 1 Dec 2016 13:06:04 +0000 (14:06 +0100)]
ip6_offload: check segs for NULL in ipv6_gso_segment.
segs needs to be checked for being NULL in ipv6_gso_segment() before calling
skb_shinfo(segs), otherwise kernel can run into a NULL-pointer dereference:
Sunil Goutham [Thu, 1 Dec 2016 12:54:28 +0000 (18:24 +0530)]
net: thunderx: Fix transmit queue timeout issue
Transmit queue timeout issue is seen in two cases
- Due to a race condition btw setting stop_queue at xmit()
and checking for stopped_queue in NAPI poll routine, at times
transmission from a SQ comes to a halt. This is fixed
by using barriers and also added a check for SQ free descriptors,
incase SQ is stopped and there are only CQE_RX i.e no CQE_TX.
- Contrary to an assumption, a HW errata where HW doesn't stop transmission
even though there are not enough CQEs available for a CQE_TX is
not fixed in T88 pass 2.x. This results in a Qset error with
'CQ_WR_FULL' stalling transmission. This is fixed by adjusting
RXQ's RED levels for CQ level such that there is always enough
space left for CQE_TXs.
Signed-off-by: Sunil Goutham <sgoutham@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RDS: TCP: unregister_netdevice_notifier() in error path of rds_tcp_init_net
If some error is encountered in rds_tcp_init_net, make sure to
unregister_netdevice_notifier(), else we could trigger a panic
later on, when the modprobe from a netns fails.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Dec 2016 18:28:38 +0000 (13:28 -0500)]
Merge branch 'offloading-tc-rules-hw'
Hadar Hen Zion says:
====================
Offloading tc rules using underline Hardware device
This series adds flower classifier support in offloading tc rules when the
Software ingress device is different from the Hardware ingress device,
such as when dealing with IP tunnels
The first two patches are a small fixes to flower, checking the skip_hw flag
wasn't set before calling the Hardware offloading functions which will try to
offload the rule.
The next two patches are infrastructure patches, a preparation for the fourth
patch which is adding support in flower to offload rules when the ingress
device is not a Hardware device and therefore can't offload.
In this case ndo_setup_tc is called with the mirred (egress) device.
The last three patchs are adding mlx5e support to offload rules using the new
"egress_device" flag.
Thanks,
Hadar
Changes from v0:
- check if CONFIG_NET_CLS_ACT is defined befor calling tc_action_ops get_dev()
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:40 +0000 (14:06 +0200)]
net/mlx5e: Support adding ingress tc rule when egress device flag is set
When ndo_setup_tc is called with an egress_dev flag set, it means that
the ndo call was executed on the mirred action (egress) device and not
on the ingress device.
In order to support this kind of ndo_setup_tc call, and insert the
correct decap rule to the hardware, the uplink device on the same eswitch
should be found.
Currently, we use this resolution between the mirred device and the
uplink on the same eswitch to offload vxlan shared device decap rules.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:38 +0000 (14:06 +0200)]
net/mlx5e: Bring back representor's ndos that were accidentally removed
The VF Representor udp tunnel ndo entries were removed by mistake,
return them.
Fixes: 370bad0f9a52 ('net/mlx5e: Support HW (offloaded) and SW counters for SRIOV switchdev mode') Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:37 +0000 (14:06 +0200)]
net/sched: cls_flower: Add offload support using egress Hardware device
In order to support hardware offloading when the device given by the tc
rule is different from the Hardware underline device, extract the mirred
(egress) device from the tc action when a filter is added, using the new
tc_action_ops, get_dev().
Flower caches the information about the mirred device and use it for
calling ndo_setup_tc in filter change, update stats and delete.
Calling ndo_setup_tc of the mirred (egress) device instead of the
ingress device will allow a resolution between the software ingress
device and the underline hardware device.
The resolution will take place inside the offloading driver using
'egress_device' flag added to tc_to_netdev struct which is provided to
the offloading driver.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:35 +0000 (14:06 +0200)]
net/sched: cls_flower: Provide a filter to replace/destroy hardware filter functions
Instead of providing many arguments to fl_hw_{replace/destroy}_filter
functions, just provide cls_fl_filter struct that includes all the relevant
args.
This patches doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:34 +0000 (14:06 +0200)]
net/sched: cls_flower: Try to offload only if skip_hw flag isn't set
Check skip_hw flag isn't set before calling
fl_hw_{replace/destroy}_filter and fl_hw_update_stats functions.
Replace the call to tc_should_offload with tc_can_offload.
tc_can_offload only checks if the device supports offloading, the check for
skip_hw flag is done earlier in the flow.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hadar Hen Zion [Thu, 1 Dec 2016 12:06:33 +0000 (14:06 +0200)]
net/sched: Add separate check for skip_hw flag
Creating a difference between two possible cases:
1. Not offloading tc rule since the user sets 'skip_hw' flag.
2. Not offloading tc rule since the device doesn't support offloading.
This patch doesn't add any new functionality.
Signed-off-by: Hadar Hen Zion <hadarh@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: allow to turn tcp timestamp randomization off
Eric says: "By looking at tcpdump, and TS val of xmit packets of multiple
flows, we can deduct the relative qdisc delays (think of fq pacing).
This should work even if we have one flow per remote peer."
Having random per flow (or host) offsets doesn't allow that anymore so add
a way to turn this off.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: randomize tcp timestamp offsets for each connection
jiffies based timestamps allow for easy inference of number of devices
behind NAT translators and also makes tracking of hosts simpler.
commit ceaa1fef65a7c2e ("tcp: adding a per-socket timestamp offset")
added the main infrastructure that is needed for per-connection ts
randomization, in particular writing/reading the on-wire tcp header
format takes the offset into account so rest of stack can use normal
tcp_time_stamp (jiffies).
So only two items are left:
- add a tsoffset for request sockets
- extend the tcp isn generator to also return another 32bit number
in addition to the ISN.
Re-use of ISN generator also means timestamps are still monotonically
increasing for same connection quadruple, i.e. PAWS will still work.
Includes fixes from Eric Dumazet.
Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series introduces hardware offload iSCSI initiator driver for the
41000 Series Converged Network Adapters (579xx chip) by Qlogic. The overall
driver design includes a common module ('qed') and protocol specific
dependent modules ('qedi' for iSCSI).
This is an open iSCSI driver, modifications to open iSCSI user components
'iscsid', 'iscsiuio', etc. are required for the solution to work. The user
space changes are also in the process of being submitted.
The 'qed' common module, under drivers/net/ethernet/qlogic/qed/, is
enhanced with functionality required for the iSCSI support. This series
is based on:
net tree base: Merge of net and net-next as of 11/29/2016
Changes from RFC v2:
1. qedi patches are squashed into single patch to prevent krobot
warning.
2. Fixed 'hw_p_cpuq' incompatible pointer type.
3. Fixed sparse incompatible types in comparison expression.
4. Misc fixes with latest 'checkpatch --strict' option.
5. Remove int_mode option from MODULE_PARAM.
6. Prefix all MODULE_PARAM params with qedi_*.
7. Use CONFIG_QED_ISCSI instead of CONFIG_QEDI
8. Added bad task mem access fix.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>