git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log

i40e/i40evf: Add ATR support for tunneled TCP/IPv4/IPv6 packets.

Without this, RSS would have done inner header load balancing. Now we can
get the benefits of ATR for tunneled packets to better align TX and RX
queues with the right core/interrupt.

Change-ID: I07d0e0a192faf28fdd33b2f04c32b2a82ff97ddd
Signed-off-by: Anjali Singhai Jain <anjali.singhai@intel.com>
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Disable offline diagnostics if VFs are enabled

Require the user to disable virtual functions before running the device
offline diagnostics. The offline diagnostics are intended to ensure
basic operation of the device - it is beyond the scope of the diagnostic
test to handle the additional complexity of bringing all the virtual
functions offline and then back online for each test run.

Change-ID: Ic0b854851a09fc85df0c9e82c220e45885457c30
Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Collect PFC XOFF RX stats even in single TC case

When PFC is enabled for any UP in a single TC configuration, the driver didn't
collect the PFC XOFF RX stats. Though a single TC with PFC enabled is not a
common scenario, do not prevent the driver from collecting stats if firmware
indicates that PFC is enabled.

Change-ID: Ie20bd58b07608b528f3c6d95894c9ae56b00077a
Signed-off-by: Neerav Parikh <neerav.parikh@intel.com>
Tested-by: Jim Young <james.m.young@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ixgbe: Allow flow director to use entire queue space

Flow director is exported to user space using the ethtool ntuple
support. However, currently it only supports steering traffic to a
subset of the queues in use by the hardware. This change allows
flow director to specify queues that have been assigned to virtual
functions by partitioning the ring_cookie into an 8-bit VF specifier
followed by a 32-bit queue index. At the moment we don't have any
ethernet drivers with more than 2^32 queues on a single function,
as far as I can tell, nor do I expect this to happen anytime
soon. This way the ring_cookie's normal use for specifying a queue
on a specific PCI function continues to work as expected.

CC: Alex Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Tested-by: Krishneil Singh <krishneil.k.singh@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

ethtool: Add helper routines to pass vf to rx_flow_spec

The ring_cookie is 64 bits wide, which is much larger than can be used
for actual queue index values. So provide some helper routines to
pack a VF index into the cookie. This is useful to steer packets to
a VF ring without having to know the queue layout of the device.
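A minimal user-space sketch of the packing, assuming the layout the ixgbe
change describes (VF number in the 8 bits above a 32-bit queue index). The
macro and helper names below are hypothetical, not the ethtool UAPI helpers.
A cookie with the VF bits clear keeps its usual meaning of a queue on this
function.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0..31 = queue index, bits 32..39 = VF number. */
#define RING_COOKIE_QUEUE_MASK 0x00000000FFFFFFFFULL
#define RING_COOKIE_VF_MASK    0x000000FF00000000ULL
#define RING_COOKIE_VF_SHIFT   32

/* Pack a VF number and a queue index into one 64-bit ring_cookie. */
static uint64_t ring_cookie_pack(uint8_t vf, uint32_t queue)
{
	return ((uint64_t)vf << RING_COOKIE_VF_SHIFT) | queue;
}

/* Extract the queue index from the low 32 bits. */
static uint32_t ring_cookie_queue(uint64_t cookie)
{
	return (uint32_t)(cookie & RING_COOKIE_QUEUE_MASK);
}

/* Extract the VF number from bits 32..39 (0 = plain queue cookie). */
static uint8_t ring_cookie_vf(uint64_t cookie)
{
	return (uint8_t)((cookie & RING_COOKIE_VF_MASK) >> RING_COOKIE_VF_SHIFT);
}
```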

CC: Alex Duyck <alexander.h.duyck@redhat.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

tcp/dccp: warn user for preferred ip_local_port_range

After commit 07f4c90062f8f ("tcp/dccp: try to not exhaust
ip_local_port_range in connect()") it is advised to have an even number
of ports described in /proc/sys/net/ipv4/ip_local_port_range

This means start/end values should have different parity.

Let's warn sysadmins of this, so that they can update their settings
if they want to.
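The parity rule boils down to a one-line test. This is an illustrative
sketch (function names and warning text are hypothetical), not the kernel's
sysctl handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* True when [low, high] holds an even number of ports, i.e. when the
 * start and end values have different parity. */
static bool port_range_parity_ok(int low, int high)
{
	return (low & 1) != (high & 1);
}

/* Hypothetical check mirroring the intent of the new warning. */
static void check_port_range(int low, int high)
{
	assert(low <= high);
	if (!port_range_parity_ok(low, high))
		fprintf(stderr,
			"ip_local_port_range %d-%d: start/end have the same parity, "
			"an even number of ports is preferred\n", low, high);
}
```

For example, 32768 - 60999 passes the check (28232 ports), while
32768 - 61000 would trigger the warning.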

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: connect() from bound sockets can be faster

__inet_hash_connect() does not use its third argument (port_offset)
if the socket was already bound to a source port.

No need to perform useless but expensive md5 computations.

Reported-by: Crestez Dan Leonard <cdleonard@gmail.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cxgb4-next'

Hariprasad Shenai says:

====================
cxgb4/cxgb4vf: Adds FL starvation support and cleanup

This patch series adds a debugfs entry to inject freelist starvation,
plus some function and argument cleanup.

This patch series has been created against net-next tree and includes
patches on cxgb4 and cxgb4vf driver.

We have included all the maintainers of respective drivers. Kindly review
the change and let us know in case of any review comments.

Thanks

V2:
Skipping patch "cxgb4: Add support for loopback between VI of same port".
This needs some major code change, since module param is not recommended.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4/cxgb4vf: function and argument name cleanup

This patch renames the 'fn' member of struct adapter to 'pf'.
A 'fn' usually stands for PCI function, which could be a PF or a VF.
However, the use of this particular variable is explicitly limited to PF
only. So, be specific about it in the variable name.

Also corrects the arguments passed to t4_ofld_eq_free, t4_ctrl_eq_free,
t4_eth_eq_free, t4_iq_free, t4_alloc_vi, t4_fw_hello, t4_wr_mbox and
t4_cfg_pfvf.

Also renames cxgb4_t4_bar2_sge_qregs to t4_bar2_sge_qregs, and renames
the existing t4_bar2_sge_qregs in the cxgb4vf driver to t4vf_bar2_sge_qregs
to avoid conflicts. Also fixes alignment for these functions.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add debugfs facility to inject FL starvation

Add a debugfs entry to inject freelist starvation, used only for debugging
purposes.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qla4xxx: add a missing include

vmalloc.h used to be included from include/net/inet_hashtables.h
but it is no longer the case.

Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()")
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Eric Dumazet <edumzet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'thunderx'

Aleksey Makarov says:

====================
Adding support for Cavium ThunderX network controller

This patchset adds support for the Cavium ThunderX network controller.

changes in v6:
* unused preprocessor symbols were removed
* reduce number of atomic operations in SQ maintenance
* support for TCP segmentation at driver level
* reset RBDR if fifo state is FAIL
* fixed an issue with link state mailbox message

changes in v5:
* __packed attributes were removed; we now rely on the C language ABI
* nic_dbg() -> netdev_dbg()
* fixes for a typo, constant spelling and using BIT_ULL
* use print_hex_dump()
* unnecessary conditions in a long if() chain were removed

changes in v4:
* the patch "pci: Add Cavium PCI vendor id" was attributed correctly
* a note that Cavium id is used in many drivers was added
* the license comments now match MODULE_LICENSE
* a comment explaining usage of writeq_relaxed()/readq_relaxed() was added

changes in v3:
* code cleanup
* issues discovered by reviewers were addressed

changes in v2:
* non-generic module parameters removed
* ethtool support added (nicvf_set_rxnfc())

v5: https://lkml.kernel.org/g/<1432344498-17131-1-git-send-email-aleksey.makarov@caviumnetworks.com>
v4: https://lkml.kernel.org/g/<1432000757-28700-1-git-send-email-aleksey.makarov@auriga.com>
v3: https://lkml.kernel.org/g/<1431747401-20847-1-git-send-email-aleksey.makarov@auriga.com>
v2: https://lkml.kernel.org/g/<1415596445-10061-1-git-send-email-rric@kernel.org>
v1: https://lkml.kernel.org/g/<20141030165434.GW20170@rric.localhost>
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: Adding support for Cavium ThunderX network controller

This patch adds support for the Cavium ThunderX network controller.
The driver is on the PCI bus and thus requires the Thunder PCIe host
controller driver to be enabled.

Signed-off-by: Maciej Czekaj <mjc@semihalf.com>
Signed-off-by: David Daney <david.daney@cavium.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Ganapatrao Kulkarni <ganapatrao.kulkarni@caviumnetworks.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: Tomasz Nowicki <tomasz.nowicki@linaro.org>
Signed-off-by: Robert Richter <rrichter@cavium.com>
Signed-off-by: Kamil Rytarowski <kamil@semihalf.com>
Signed-off-by: Thanneeru Srinivasulu <tsrinivasulu@caviumnetworks.com>
Signed-off-by: Sruthi Vangala <svangala@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pci: Add Cavium PCI vendor id

This vendor id will be used by network (vNIC), USB (xHCI),
SATA (AHCI), GPIO, I2C, MMC and maybe other drivers
for ThunderX SoC.

Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Signed-off-by: Sunil Goutham <sgoutham@cavium.com>
Signed-off-by: Aleksey Makarov <aleksey.makarov@caviumnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_bpf: add similarly conflicting jump test case only for classic

While 3b52960266a3 ("test_bpf: add more eBPF jump torture cases")
added the int3 bug test case only for eBPF, which needs exactly 11
passes to converge, here's a version for classic BPF with 11 passes,
and one that would need 70 passes on x86_64 to converge and be
successfully JITed. Effectively, all jumps are optimized out,
resulting in a JIT image of just 89 bytes (from originally max
BPF insns), only returning K.

Might be useful as a recipe for folks wanting to craft a test case
when backporting the fix in commit 3f7352bf21f8 ("x86: bpf_jit: fix
compilation of large bpf programs") while not having eBPF. The 2nd
one is delegated to the interpreter as the last pass still results
in shrinking; in other words, this one won't be JITed on x86_64.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'sfc-next'

Edward Cree says:

====================
sfc: add MCDI tracing

This patchset adds support for logging MCDI (Management-Controller-to-
Driver Interface) interactions between the sfc driver and a bound device,
to aid in debugging.
Solarflare has a tool to decode the resulting traces and will look to
open-source this if there is any external interest, but the protocol is
already detailed in drivers/net/ethernet/sfc/mcdi_pcol.h.
The logging buffer we allocate per MCDI context is a work area for
constructing each individual message before logging it with netif_info.
The reason the buffer is long-lived is simply to avoid the overhead of
allocating and freeing it every MCDI call, since MCDIs are already known
to be serialised for other reasons.

--
v4: remove patch #4, which has already been applied via sshah
v3: add some explanations to cover letter and patch #4
v2: avoid long lines in cover letter; fix multiline comment style
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add module parameter to enable MCDI logging on new functions

As many issues are encountered at probe time, where MCDI logging can't be
enabled through the sysfs node, this change adds a module parameter
'mcdi_logging_default', which defaults to false. When set to true, newly-
probed functions will have MCDI logging enabled. The setting can
subsequently be changed as normal through the sysfs node.

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add sysfs entry to control MCDI tracing

MCDI tracing is enabled per-function with a sysfs file
/sys/class/net/<NET_DEV>/device/mcdi_logging

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc: add tracing of MCDI commands

MCDI tracing is conditional on CONFIG_SFC_MCDI_LOGGING, which is enabled
by default.

Each MCDI command will produce a console line like
sfc dom:bus:dev:fn ifname: MCDI RPC REQ: xxxxxxxx [yyyyyyyy...]
where xxxxxxxx etc. are the raw MCDI payload in 32-bit hex chunks.
The response will then produce a similar line with "RESP" instead of "REQ",
and containing the MCDI response payload (if any).
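A rough sketch of producing such a line from a payload of 32-bit words,
assuming only the format shown above. mcdi_format_trace() is hypothetical;
the driver itself emits its lines through netif_info.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Format "MCDI RPC <dir>: xxxxxxxx yyyyyyyy ..." into buf, truncating to
 * fit. Returns the number of characters written (excluding the NUL). */
static size_t mcdi_format_trace(char *buf, size_t len, const char *dir,
				const uint32_t *payload, size_t ndwords)
{
	size_t off;
	int n;

	assert(buf != NULL && len > 0);
	n = snprintf(buf, len, "MCDI RPC %s:", dir);
	if (n < 0)
		return 0;
	off = (size_t)n < len ? (size_t)n : len - 1;

	/* One space-separated, zero-padded 32-bit hex chunk per word. */
	for (size_t i = 0; i < ndwords && off < len - 1; i++) {
		n = snprintf(buf + off, len - off, " %08x",
			     (unsigned int)payload[i]);
		if (n < 0)
			break;
		off += (size_t)n < len - off ? (size_t)n : len - off - 1;
	}
	return off;
}
```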

Signed-off-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: release lock after each bucket in vxlan_cleanup

We're seeing some softlockups from this function when there
are a lot of fdb entries on a vxlan device. Taking the lock for
each bucket instead of the whole table is enough to fix that.
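The locking pattern, sketched in user space with a pthread mutex standing
in for the kernel spinlock. The table layout and names are hypothetical and
much simpler than the real vxlan fdb hash.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUCKETS    4
#define MAX_ENTRIES 8

/* Hypothetical table: each bucket has its own lock and a small array of
 * "last used" timestamps (0 = empty slot). */
struct bucket {
	pthread_mutex_t lock;
	unsigned long used[MAX_ENTRIES];
};

static struct bucket table[NBUCKETS];

static void table_init(void)
{
	for (size_t b = 0; b < NBUCKETS; b++) {
		pthread_mutex_init(&table[b].lock, NULL);
		for (size_t i = 0; i < MAX_ENTRIES; i++)
			table[b].used[i] = 0;
	}
}

/* Expire entries last used before `deadline`. The lock is taken and
 * released per bucket, so other users can get in between buckets instead
 * of waiting for the whole table walk. Returns the number removed. */
static size_t table_cleanup(unsigned long deadline)
{
	size_t removed = 0;

	for (size_t b = 0; b < NBUCKETS; b++) {
		pthread_mutex_lock(&table[b].lock);
		for (size_t i = 0; i < MAX_ENTRIES; i++) {
			if (table[b].used[i] != 0 && table[b].used[i] < deadline) {
				table[b].used[i] = 0;
				removed++;
			}
		}
		pthread_mutex_unlock(&table[b].lock);
	}
	return removed;
}
```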

Signed-off-by: Sorin Dumitru <sdumitru@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp/dccp: try to not exhaust ip_local_port_range in connect()

A long-standing problem on busy servers is the tiny available TCP port
range (/proc/sys/net/ipv4/ip_local_port_range) and the default
sequential allocation of source ports in the connect() system call.

If a host has a lot of active TCP sessions, chances are
very high that all ports are in use by at least one flow,
and subsequent bind(0) attempts fail, or have to scan a big portion of
space to find a slot.

In this patch, I changed the starting point in __inet_hash_connect()
so that we try to favor even [1] ports, leaving odd ports for bind()
users.

We still perform a sequential search, so there is no guarantee, but
if connect() targets are very different, the end result is that we leave
more ports available to bind(), and we spread them all over the range,
lowering the time for both connect() and bind() to find a slot.

This strategy only works well if /proc/sys/net/ipv4/ip_local_port_range
spans an even number of ports, i.e. if start/end values have different
parity.

Therefore, the default /proc/sys/net/ipv4/ip_local_port_range was changed
to 32768 - 60999 (instead of 32768 - 61000).

This does not change security aspects; only some poor hashing
schemes could eventually be impacted by this change.

[1] : The odd/even property depends on ip_local_port_range values parity
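A simplified sketch of the "even slots first" search, assuming a plain busy
array instead of the kernel's bind hash and ignoring how the offset is
computed. pick_connect_port() is hypothetical and is not
__inet_hash_connect().

```c
#include <assert.h>

/* busy[i] != 0 means port (low + i) is already in use. Pass 0 walks even
 * slots (relative to low), pass 1 walks odd slots. Returns a port or -1. */
static int pick_connect_port(int low, int high, unsigned int offset,
			     const unsigned char *busy)
{
	int remaining;

	assert(low <= high);
	remaining = high - low + 1;	/* number of ports in the range */

	for (int pass = 0; pass < 2; pass++) {
		/* Even starting slot on pass 0, the next odd slot on pass 1. */
		int start = (int)((offset % (unsigned int)remaining) & ~1u) + pass;

		for (int i = 0; i < remaining; i += 2) {
			int slot = (start + i) % remaining;

			if (!busy[slot])
				return low + slot;
		}
	}
	return -1;	/* every port is taken */
}
```

With an even port count, the wrap-around in pass 0 stays on even slots;
with an odd count it spills into odd slots, which is why the range parity
matters.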

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ip_frag_next'

Florian Westphal says:

====================
net: force refragmentation for DF reassembled skbs

output path tests:

    if (skb->len > mtu) ip_fragment()

This breaks connectivity in one corner case:
If the skb was reassembled, but has the DF bit set and ..
.. its reassembled size is <= outdev mtu ..
.. we will forward a DF packet larger than what the sender
    transmitted on wire.

If a router later in the path can't forward this packet, it will send an
icmp error in response to an mtu that the original sender never exceeded.

This changes ipv4 defrag/output path to

a) force refragmentation for DF reassembled skbs and
b) set DF bit on all fragments when refragmenting if it was set on original
frags.

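tested via:

    from scapy.all import *

    dip = "10.23.42.2"
    payload = "A" * 1400
    # 1400-byte UDP payload with DF set, split into 1200-byte fragments
    packet = IP(dst=dip, id=12345, flags='DF')/UDP(sport=42, dport=42)/payload
    frags = fragment(packet, fragsize=1200)
    for frag in frags:  # avoid shadowing scapy's fragment()
        send(frag)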

Without this patch, we generate fragments without the DF bit set, based
on the outgoing device mtu, when fragmenting after forwarding, i.e.

IP (ttl 64, id 12345, offset 0, flags [+, DF], proto UDP (17), length 1204)
    192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
IP (ttl 64, id 12345, offset 1184, flags [DF], proto UDP (17), length 244)
    192.168.7.1 > 10.23.42.2: ip-proto-17

on ingress will either turn into

IP (ttl 63, id 12345, offset 0, flags [+], proto UDP (17), length 1396)
    192.168.7.1.42 > 10.23.42.2.42: UDP, length 1400
IP (ttl 63, id 12345, offset 1376, flags [none], proto UDP (17), length 52)

(mtu 1400: We strip df and send larger fragment), or

IP (ttl 63, id 12345, offset 0, flags [DF], proto UDP (17), length 1428)
    192.168.7.1.42 > 10.23.42.2.42: [udp sum ok] UDP, length 1400

if mtu is 1500. And in this case things break: a router with a smaller mtu
will send an icmp error, but the original sender only sent packets <= 1204
bytes.

With the patch, we keep the intent of such fragments and will emit
DF-fragments that won't exceed 1204 bytes in size.

Joint work with Hannes Frederic Sowa.

Changes since v2:
- split unrelated patches from series
- rework changelog of patch #2 to better illustrate breakage
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ip_fragment: don't forward defragmented DF packet

We currently always send fragments without DF bit set.

Thus, given following setup:

mtu1500 - mtu1500:1400 - mtu1400:1280 - mtu1280
A R1 R2 B

Where R1 and R2 run linux with netfilter defragmentation/conntrack
enabled, then if Host A sent a fragmented packet _with_ DF set to B, R1
will respond with icmp too big error if one of these fragments exceeded
1400 bytes.

However, if R1 receives fragment sizes 1200 and 100, it would
forward the reassembled packet without refragmenting, i.e.
R2 will send an icmp error in response to a packet that was never sent,
citing mtu that the original sender never exceeded.

The other minor issue is that a refragmentation on R1 will conceal the
MTU of R2-B since refragmentation does not set DF bit on the fragments.

This modifies ip_fragment so that we track largest fragment size seen
both for DF and non-DF packets, and set frag_max_size to the largest
value.

If the DF fragment size is larger than or equal to the non-DF one, we will
consider the packet a path mtu probe:
We set DF bit on the reassembled skb and also tag it with a new IPCB flag
to force refragmentation even if skb fits outdev mtu.

We will also set DF bit on each fragment in this case.
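The bookkeeping described above, as a hypothetical sketch. struct
frag_track and its helpers are illustrative names, not the kernel's
reassembly queue fields, and treating a datagram as a probe only when at
least one DF fragment was seen is an added assumption.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-datagram reassembly state. */
struct frag_track {
	unsigned int max_df;	/* largest fragment seen with DF set */
	unsigned int max_nodf;	/* largest fragment seen without DF */
};

/* Record one received fragment of `len` bytes. */
static void frag_track_add(struct frag_track *t, unsigned int len, bool df)
{
	if (df) {
		if (len > t->max_df)
			t->max_df = len;
	} else if (len > t->max_nodf) {
		t->max_nodf = len;
	}
}

/* After reassembly: return frag_max_size (largest fragment overall) and
 * report whether the datagram is a PMTU probe, i.e. whether DF must be
 * set and refragmentation forced even if the skb fits the outdev mtu. */
static unsigned int frag_track_finish(const struct frag_track *t,
				      bool *set_df_and_refrag)
{
	*set_df_and_refrag = t->max_df != 0 && t->max_df >= t->max_nodf;
	return t->max_df > t->max_nodf ? t->max_df : t->max_nodf;
}
```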

Joint work with Hannes Frederic Sowa.

Reported-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipv4: avoid repeated calls to ip_skb_dst_mtu helper

ip_skb_dst_mtu is a small inline helper, but it's called in several places.

size(1) output:
           text  data  bss    dec   hex  filename
before:   17061    44    0  17105  42d1  net/ipv4/ip_output.o
after:    16805    44    0  16849  41d1  net/ipv4/ip_output.o

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'phy_rgmii'

Florian Fainelli says:

====================
net: phy: phy_interface_is_rgmii helper

As you suggested, here is the helper function to avoid missing some RGMII
interface checks. Had to wait for net to be merged in net-next to avoid
submitting the same patch/commit.

Dan, you might want to rebase your dp83867 submission to use that helper
when this patchset gets merged into net-next, thanks!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Utilize phy_interface_is_rgmii

Update all open-coded tests for all 4 PHY_INTERFACE_MODE_RGMII* values
to use the newly introduced helper: phy_interface_is_rgmii.

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: Add phy_interface_is_rgmii helper

RGMII interfaces come in 4 different flavors that the PHY library needs
to care about: regular RGMII (no delays), RGMII with either RX or TX
delay, and both. In order to avoid errors from checking only one type
of RGMII interface and missing the other 3, introduce a convenience
function which tests for all values.
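A standalone sketch of such a helper. The enum below is a hypothetical
stand-in for the PHY library's interface modes, and a switch is used so the
check does not depend on how the enum values are ordered.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the PHY library's interface modes. */
enum phy_mode {
	MODE_MII,
	MODE_GMII,
	MODE_RGMII,		/* no internal delays */
	MODE_RGMII_ID,		/* RX and TX delay */
	MODE_RGMII_RXID,	/* RX delay only */
	MODE_RGMII_TXID,	/* TX delay only */
};

/* True for all four RGMII flavors, so callers cannot forget one. */
static bool mode_is_rgmii(enum phy_mode mode)
{
	switch (mode) {
	case MODE_RGMII:
	case MODE_RGMII_ID:
	case MODE_RGMII_RXID:
	case MODE_RGMII_TXID:
		return true;
	default:
		return false;
	}
}
```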

Suggested-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: Fix fib_trie.c build, missing linux/vmalloc.h include.

We used to get this indirectly, I suppose, but no longer do.

Either way, an explicit include should have been done in the
first place.

   net/ipv4/fib_trie.c: In function '__node_free_rcu':
>> net/ipv4/fib_trie.c:293:3: error: implicit declaration of function 'vfree' [-Werror=implicit-function-declaration]
      vfree(n);
      ^
   net/ipv4/fib_trie.c: In function 'tnode_alloc':
>> net/ipv4/fib_trie.c:312:3: error: implicit declaration of function 'vzalloc' [-Werror=implicit-function-declaration]
      return vzalloc(size);
      ^
>> net/ipv4/fib_trie.c:312:3: warning: return makes pointer from integer without a cast
   cc1: some warnings being treated as errors

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: tcp_tso_autosize() minimum is one packet

By making sure sk->sk_gso_max_segs minimal value is one,
and sysctl_tcp_min_tso_segs minimal value is one as well,
tcp_tso_autosize() will return a non zero value.

We can then revert 843925f33fcc293d80acf2c5c8a78adf3344d49b
("tcp: Do not apply TSO segment limit to non-TSO packets")
and save a few CPU cycles in the fast path.
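A sketch of the sizing logic, loosely modeled on tcp_tso_autosize(). The
parameters stand in for the socket's pacing rate, GSO limits, a header
reserve and sysctl_tcp_min_tso_segs; they are assumptions of this example,
not the kernel's signature. With both minimums at least one, the result can
never be zero.

```c
#include <assert.h>
#include <stdint.h>

/* Return the number of MSS-sized segments to put in one TSO packet. */
static uint32_t tso_autosize(uint64_t pacing_rate, uint32_t gso_max_size,
			     uint32_t header_reserve, uint32_t mss,
			     uint32_t min_tso_segs, uint32_t gso_max_segs)
{
	uint64_t bytes = pacing_rate >> 10;	/* about 1 ms worth of bytes */
	uint32_t cap;
	uint32_t segs;

	assert(mss > 0 && min_tso_segs >= 1 && gso_max_segs >= 1);
	assert(gso_max_size > header_reserve + 1);
	cap = gso_max_size - 1 - header_reserve;
	if (bytes > cap)
		bytes = cap;
	segs = (uint32_t)(bytes / mss);
	if (segs < min_tso_segs)
		segs = min_tso_segs;		/* never below the sysctl floor */
	return segs < gso_max_segs ? segs : gso_max_segs;
}
```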

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Neal Cardwell <ncardwell@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix/cleanup inet_ehash_locks_alloc()

If tcp ehash table is constrained to a very small number of buckets
(e.g. boot parameter thash_entries=128), then we can crash if the spinlock
array has more entries.

While we are at it, un-inline inet_ehash_locks_alloc() and make
following changes :

- Budget 2 cache lines per cpu worth of 'spinlocks'
- Try to kmalloc() the array to avoid extra TLB pressure.
(Most servers at Google allocate 8192 bytes for this hash table)
- Get rid of various #ifdef
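The sizing policy can be sketched as follows. The cache line size, lock
size and CPU count are passed in, and the "at least one lock per CPU" floor
and power-of-two rounding are assumptions of this sketch rather than details
stated above. The final cap is what prevents more locks than buckets.

```c
#include <assert.h>

/* Round v (> 0) up to the next power of two. */
static unsigned int roundup_pow2(unsigned int v)
{
	unsigned int p = 1;

	while (p < v)
		p <<= 1;
	return p;
}

/* About two cache lines of locks per CPU (at least one lock), rounded up
 * to a power of two, and never more locks than hash buckets. */
static unsigned int ehash_nlocks(unsigned int cache_line, unsigned int lock_size,
				 unsigned int ncpus, unsigned int nbuckets)
{
	unsigned int per_cpu;
	unsigned int n;

	assert(lock_size > 0 && ncpus > 0 && nbuckets > 0);
	per_cpu = 2 * cache_line / lock_size;
	if (per_cpu < 1)
		per_cpu = 1;
	n = roundup_pow2(per_cpu * ncpus);
	return n < nbuckets ? n : nbuckets;
}
```

With thash_entries=128 (128 buckets), 8 CPUs and 4-byte locks on 64-byte
cache lines, the sketch budgets 256 locks and then caps them at 128.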

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix bug in link protocol message create function

In commit dd3f9e70f59f43a5712eba9cf3ee4f1e6999540c
("tipc: add packet sequence number at instant of transmission") we
made a change with the consequence that packets in the link backlog
queue don't contain valid sequence numbers.

However, when we create a link protocol message, we still use the
sequence number of the first packet in the backlog, if there is any,
as "next_sent" indicator in the message. This may entail unnecessary
retransmissions or stale packet transmission when there is very low
traffic on the link.

This commit fixes this issue by only using the current value of
tipc_link::snd_nxt as indicator.

Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fix inet_proto_csum_replace4() sparse errors

make C=2 CF=-D__CHECK_ENDIAN__ net/core/utils.o
...
net/core/utils.c:307:72: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:307:72:    expected restricted __wsum [usertype] addend
net/core/utils.c:307:72:    got restricted __be32 [usertype] from
net/core/utils.c:308:34: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:308:34:    expected restricted __wsum [usertype] addend
net/core/utils.c:308:34:    got restricted __be32 [usertype] to
net/core/utils.c:310:70: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:310:70:    expected restricted __wsum [usertype] addend
net/core/utils.c:310:70:    got restricted __be32 [usertype] from
net/core/utils.c:310:77: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:310:77:    expected restricted __wsum [usertype] addend
net/core/utils.c:310:77:    got restricted __be32 [usertype] to
net/core/utils.c:312:72: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:312:72:    expected restricted __wsum [usertype] addend
net/core/utils.c:312:72:    got restricted __be32 [usertype] from
net/core/utils.c:313:35: warning: incorrect type in argument 2 (different base types)
net/core/utils.c:313:35:    expected restricted __wsum [usertype] addend
net/core/utils.c:313:35:    got restricted __be32 [usertype] to

Note we can use csum_replace4() helper
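For reference, the arithmetic behind an incremental checksum update such as
csum_replace4(), sketched in user space after RFC 1624 (eqn. 3). This is not
the kernel implementation; the word arrays stand in for packet data in a
consistent byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit one's-complement accumulator down to 16 bits. */
static uint16_t csum_fold16(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* Full Internet checksum over n 16-bit words (n small enough that the
 * 32-bit accumulator cannot overflow). */
static uint16_t csum_words(const uint16_t *w, size_t n)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += w[i];
	return (uint16_t)~csum_fold16(sum);
}

/* Update `check` when a 32-bit field changes from `from` to `to`, without
 * re-summing the data: HC' = ~(~HC + ~m + m'), one 16-bit half at a time. */
static uint16_t csum_update32(uint16_t check, uint32_t from, uint32_t to)
{
	uint32_t sum = (uint16_t)~check;

	sum += (uint16_t)~(from >> 16) + (uint16_t)~(from & 0xffff);
	sum += (to >> 16) + (to & 0xffff);
	return (uint16_t)~csum_fold16(sum);
}
```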

Fixes: 58e3cac5613aa ("net: optimise inet_proto_csum_replace4()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: remove a sparse error in secure_dccpv6_sequence_number()

make C=2 CF=-D__CHECK_ENDIAN__ net/core/secure_seq.o
net/core/secure_seq.c:157:50: warning: restricted __be32 degrades to integer

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: skip fdb add if the port shouldn't learn

Check in fdb_add_entry() whether the source port should learn; a similar
check is used in br_fdb_update().
Note that new fdb entries which are added manually or
as local ones are still permitted.
This patch has been tested by running traffic via a bridge port and
switching the port's state, also by manually adding/removing entries
from the bridge's fdb.

Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: remove one sparse error

net/core/pktgen.c:2672:43: warning: incorrect type in assignment (different base types)
net/core/pktgen.c:2672:43: expected unsigned short [unsigned] [short] [usertype] <noident>
net/core/pktgen.c:2672:43: got restricted __be16 [usertype] protocol

Let's use proper struct ethhdr instead of hard coding everything.
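A user-space sketch of the same idea using struct ethhdr from
<linux/if_ether.h>. fill_eth_header() is hypothetical and is not pktgen's
code. Because h_proto is a __be16, the EtherType goes through htons(),
which is exactly the kind of mismatch sparse flags.

```c
#include <arpa/inet.h>		/* htons() */
#include <assert.h>
#include <linux/if_ether.h>	/* struct ethhdr, ETH_ALEN, ETH_P_IP */
#include <string.h>

/* Fill an Ethernet header through named struct fields instead of
 * hand-written byte offsets. */
static void fill_eth_header(struct ethhdr *eth,
			    const unsigned char dst[ETH_ALEN],
			    const unsigned char src[ETH_ALEN],
			    unsigned short ethertype)
{
	memcpy(eth->h_dest, dst, ETH_ALEN);
	memcpy(eth->h_source, src, ETH_ALEN);
	eth->h_proto = htons(ethertype);	/* network byte order */
}
```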

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: ipv6_select_ident() returns a __be32

ipv6_select_ident() returns a 32bit value in network order.

Fixes: 286c2349f666 ("ipv6: Clean up ipv6_select_ident() and ip6_fragment()")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: kbuild test robot <fengguang.wu@intel.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'cpsw-cleanups'

Richard Cochran says:

====================
cpsw cleanups

While working on an out-of-tree customization, I noticed a few minor
problems in the cpsw code. This series cleans up the issues I found.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: cpsw: remove redundant calls disabling dma interrupts.

The function, cpsw_intr_disable, already calls cpdma_ctlr_int_ctrl. There
is no need to disable the dma interrupts twice. This patch removes the
extra calls.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cpsw: remove redundant calls enabling dma interrupts.

The function, cpsw_intr_enable, already calls cpdma_ctlr_int_ctrl. There
is no need to enable the dma interrupts twice. This patch removes the
extra call.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cpsw: remove two unused global functions

The functions, cpsw_ale_flush and cpsw_ale_set_ageout, have never been used
since they were first introduced. This patch removes the dead code.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cpsw: fix misplaced break statements.

Having the breaks too far to the left makes parsing the dense switch/case
block unnecessarily hard.

Signed-off-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'rocker-cleanups'

Simon Horman says:

====================
rocker: unused parameter and const cleanups

This series provides some minor though verbose cleanup of rocker.

The second patch depends on the first though it could be rebased.

I had previously asked for v2 to be put on hold while some bugs I had found
in the rocker driver were shaken out. That has now happened and the bugs
turned out to be unrelated.  Accordingly I am reposting the series.

* Changes v2 -> v3
  - Rebase and update for new variables and parameters that may be const

* Changes v1 -> v2
  - Found quite a few more variables and parameters to make const
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: mark parameters and local variables as const

Mark parameters and local variables as const where possible.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: remove unused rocker_port parameter from rocker_port_kfree

Remove unused rocker_port parameter from rocker_port_kfree.
Also remove the rocker_port parameter from callers of rocker_port_kfree
where that parameter is now unused.

Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: Scott Feldman <sfeldma@gmail.com>
Acked-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: use msecs_to_jiffies for conversion to jiffies

API compliance scanning with coccinelle flagged:
./net/irda/timer.c:63:35-37: use of msecs_to_jiffies probably perferable

Converting milliseconds to jiffies by "val * HZ / 1000" technically
is not a clean solution as it does not handle all corner cases correctly.
Changing the conversion to msecs_to_jiffies(val) makes it correct in all
cases. Further, the parentheses around the arithmetic expression were
dropped.

Patch was compile tested for x86_64_defconfig + CONFIG_IRDA=m

Patch is against 4.1-rc4 (localversion-next is -next-20150522)

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

neterion: s2io: Fix kernel doc formatting

These two uses seem to have had carriage returns removed.
Make these entries like all the others in this file.

Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

irda: irda-usb: use msecs_to_jiffies for conversions

API compliance scanning with coccinelle flagged:

Converting milliseconds to jiffies by "val * HZ / 1000" is technically
not a clean solution as it does not handle all corner cases correctly.
Changing the conversion to msecs_to_jiffies(val) makes it correct in all
cases.

In the current code:
mod_timer(&self->rx_defer_timer, jiffies + (10 * HZ / 1000));
for HZ < 100 (e.g. CONFIG_HZ == 64|32 on alpha) this effectively results
in no delay at all.
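The difference shows up with a simplified model. The round-up formula below
is an assumption standing in for msecs_to_jiffies(), whose real
implementation has several HZ-specific paths; both helper names are
hypothetical.

```c
#include <assert.h>

/* The open-coded conversion being replaced: truncates toward zero. */
static unsigned long naive_ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	return (unsigned long)ms * hz / 1000;
}

/* Round-up conversion: a non-zero delay never becomes 0 ticks. */
static unsigned long roundup_ms_to_jiffies(unsigned int ms, unsigned int hz)
{
	return ((unsigned long)ms * hz + 999) / 1000;
}
```

With HZ=64, 10 ms truncates to 0 jiffies (no delay) but rounds up to 1.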

Patch was compile tested for x86_64_defconfig (implies CONFIG_USB_IRDA=m)

Patch is against 4.1-rc4 (localversion-next is -next-20150522)

Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: allow setting hash_max + multicast_router if interface is down

Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.

Unfortunately, the sysfs / bridge code currently only allows configuring the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers, it should be safe to reconfigure
the default values after that, before things actually get active after
the bridge is set up.

Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: don't increase size when refragmenting forwarded ipv6 skbs

Since commit 6aafeef03b9d ("netfilter: push reasm skb through instead of
original frag skbs") we will end up sometimes re-fragmenting skbs
that we've reassembled.

ipv6 defrag preserves the original skbs using the skb frag list, i.e. as long
as the skb frag list is preserved there is no problem since we keep
original geometry of fragments intact.

However, in the rare case where the frag list is munged or skb
is linearized, we might send larger fragments than what we originally
received.

A router in the path might then send packet-too-big errors even if
sender never sent fragments exceeding the reported mtu:

mtu 1500 - 1500:1400 - 1400:1280 - 1280
A R1 R2 B

1 - A sends to B, fragment size 1400
2 - R2 sends pkttoobig error for 1280
3 - A sends to B, fragment size 1280
4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400.

Make sure ip6_fragment always caps the MTU at the largest fragment size seen
when a defragmented skb is forwarded.
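The capping rule can be expressed as a tiny decision function. This is a hedged sketch: refrag_mtu() and its frag_max_size parameter are hypothetical names standing for "the largest original fragment seen during reassembly", with 0 meaning the packet was never fragmented.

```c
#include <stdint.h>

/* Hypothetical helper: pick the MTU used when refragmenting a
 * forwarded, previously defragmented packet. */
static uint32_t refrag_mtu(uint32_t path_mtu, uint32_t frag_max_size)
{
	/* Never emit fragments larger than the sender originally produced. */
	if (frag_max_size != 0 && frag_max_size < path_mtu)
		return frag_max_size;
	return path_mtu;
}
```

In the diagram, R1's outgoing MTU is 1400 while A's fragments in step 3 are 1280 bytes, so refrag_mtu(1400, 1280) returns 1280 and R2 no longer sees oversized fragments.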

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

atm:he - Change 1 to true for bool type variable.

The variable irq_coalesce is of type bool,
so assign it the value true instead of 1.

Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net:xen-netback - Change 1 to true for bool type variable.

The variable separate_tx_rx_irq is of type bool, so assign it true
instead of 1.

Signed-off-by: Shailendra Verma <shailendra.capricorn@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6_route_sharing'

Martin KaFai Lau says:

====================
ipv6: Only create RTF_CACHE route after encountering pmtu exception

v4 -> v5:
- Patch 1 is new. Clean up the ipv6_select_ident() and ip6_fragment().

- Further simplify the newly added rt6_get_pcpu_route().  If there is a
  'prev' after cmpxchg, return prev instead of the newly created percpu
  clone.

v3 -> v4:
- Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle
  the iface down/unregister event.

- Remove rcu from the newly added rt6i_pcpu variable.  It is not needed
  because it is already protected by the existing reader/writer lock.

- Thanks to 'Julian Anastasov <ja@ssi.bg>' for testing the FLOWI_FLAG_KNOWN_NH
  patches.

v2 -> v3:
- Patches 5 to 7 are new.  They take care of cases where the daddr in
  skb is not the one used to do the route look-up.  There are also
  related changes to rt6_nexthop() since v2, which are in patch 2/9.
  Thanks to 'Julian Anastasov <ja@ssi.bg>' for pointing it out.

- Fix a few problems in __ip6_rt_update_pmtu(), such as setting the expire
  and mtu before inserting into the tree, and not calling dst_destroy() after
  a tree insertion failure.  Also update the rt6i_pmtu in fib6_add_rt2node().
  Thanks to 'Steffen Klassert <steffen.klassert@secunet.com>' for pointing
  it out.

- Merge ip6_pmtu_rt_cache_alloc() into ip6_rt_cache_alloc().

v1 -> v2:
- Move the /128 route bug fixes to another series (accepted).
- Create a function for checking (rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)).
- Avoid shuffling the skb network_header.  Instead, change the function
  signature to take iph instead of skb.

- Many thanks to 'Hannes Frederic Sowa <hannes@stressinduktion.org>' for
  reviewing v1 and v2 and giving advice.

--Martin

~~~ start: v1 compose message (with the out-dated parts removed) ~~~

This series avoids creating an RTF_CACHE route whenever we consult
the fib6 tree with a new destination.  Instead, it only creates an RTF_CACHE
route when we see a pmtu exception.

Out of all ipv6 RTF_CACHE routes that are created, the percentage that has a
different mtu is very small. On one of our end-user facing proxy servers,
only 1k out of 80k RTF_CACHE routes have a smaller MTU.  For our DC
traffic, there is no mtu exception.

A large fib6 tree has problems: 'ip -6 r show' takes a long time, and
gc may kick in too often.  Also, when a service has restarted and a lot
of new TCP connection requests come in, inserting a lot of RTF_CACHE routes
in a short time creates pressure on the tree, and this currently requires
a write lock.

The first few patches are prep work to remove the assumption that the
returned rt is always RTF_CACHE.

The patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception'
does the lazy RTF_CACHE route creation.

The following patches add percpu rt to compensate for the performance loss
after doing the lazy RTF_CACHE creation.

Here are some numbers from the udpflood test.  The udpflood tool has been
slightly modified to have a time limit instead of a count limit.

A /64 via gateway route is used for the test. Each udpflood uses 10000 dst
addresses.  The dst addresses of different udpflood processes do not overlap
with each other.

1                    16M                          15M
10                   61M                          61M
20                   65M                          62M
40                   88M                          83M

~~~ end: v1 compose message ~~~
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Create percpu rt6_info

After the patch
'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
we need to compensate for the performance hit (bouncing dst->__refcnt).
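The v5 notes in the merge message say that rt6_get_pcpu_route() returns 'prev' when cmpxchg finds an entry already installed, instead of the newly created clone. The install-or-reuse pattern can be sketched in user space with C11 atomics. Everything here is hypothetical (struct pcpu_route, install_pcpu_route()), and free() only stands in for whatever release the real code performs on the losing clone.

```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-CPU route clone. */
struct pcpu_route {
	int id;
};

/* Try to publish new_rt in an empty per-CPU slot. If another context
 * won the race, discard our clone and return the one already there. */
static struct pcpu_route *install_pcpu_route(_Atomic(struct pcpu_route *) *slot,
					     struct pcpu_route *new_rt)
{
	struct pcpu_route *prev = NULL;

	if (atomic_compare_exchange_strong(slot, &prev, new_rt))
		return new_rt;	/* slot was empty: our clone is now live */

	/* On failure, prev holds the existing entry; drop the loser. */
	free(new_rt);
	return prev;
}
```

Only the first successful install ever becomes visible, so concurrent lookups on the same CPU slot converge on a single clone without taking the tree's write lock.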

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Break up ip6_rt_copy()

This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and
ip6_rt_cache_alloc().

In a later patch, we need to create a percpu rt6_info copy. Hence,
refactor the common rt6_info init code into ip6_rt_copy_init().

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister

This patch keeps track of the DST_NOCACHE routes in a list and replaces its
dev with loopback during the iface down/unregister event.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set

This patch always creates an RTF_CACHE clone with DST_NOCACHE
when FLOWI_FLAG_KNOWN_NH is set, so that rt6i_dst is set to
fl6->daddr.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Julian Anastasov <ja@ssi.bg>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags

The neighbor look-up used to depend on the rt6i_gateway (if
there is a gateway) or the rt6i_dst (if it is a RTF_CACHE clone)
as the nexthop address. Note that rt6i_dst is set to fl6->daddr
for the RTF_CACHE clone where fl6->daddr is the one used to do
the route look-up.

Now, we only create an RTF_CACHE clone after encountering an exception.
When doing the neighbor look-up with a route that is neither a gateway
nor a RTF_CACHE clone, the daddr in skb will be used as the nexthop.

In some cases, the daddr in skb is not the one used to do
the route look-up. One example is in ip_vs_dr_xmit_v6() where the
real nexthop server address is different from the one in the skb.
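The nexthop selection described above can be sketched as a small function. This is a simplified model, not rt6_nexthop() itself: the SKETCH_RTF_* flag values, struct sketch_route and sketch_nexthop() are hypothetical, and addresses are reduced to integers.

```c
#include <stdint.h>

/* Hypothetical flag bits; the real RTF_* values are not reproduced. */
#define SKETCH_RTF_GATEWAY 0x1u
#define SKETCH_RTF_CACHE   0x2u

/* Hypothetical, simplified route with integer "addresses". */
struct sketch_route {
	unsigned int flags;
	uint64_t gateway;	/* valid when SKETCH_RTF_GATEWAY is set */
	uint64_t dst;		/* lookup daddr, set on RTF_CACHE clones */
};

/* Choose the address used for the neighbour look-up. */
static uint64_t sketch_nexthop(const struct sketch_route *rt, uint64_t skb_daddr)
{
	if (rt->flags & SKETCH_RTF_GATEWAY)
		return rt->gateway;
	if (rt->flags & SKETCH_RTF_CACHE)
		return rt->dst;
	/* Neither: fall back to the skb's daddr, which may differ from
	 * the address used for the route look-up (the IPVS DR case). */
	return skb_daddr;
}
```

The last branch is exactly where the IPVS example goes wrong: with no RTF_CACHE clone, the skb's daddr is used even though the look-up was done for the real server address.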

This patch follows the IPv4 approach and asks the
ip6_pol_route() callers to set FLOWI_FLAG_KNOWN_NH properly.

In the next patch, ip6_pol_route() will honor the FLOWI_FLAG_KNOWN_NH
and create a RTF_CACHE clone.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Acked-by: Julian Anastasov <ja@ssi.bg>
Tested-by: Julian Anastasov <ja@ssi.bg>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Add rt6_get_cookie() function

Instead of doing the rt6->rt6i_node check whenever we need
the route's cookie, refactor it into rt6_get_cookie().
This is prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.
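The refactoring centralizes a single null check. A minimal sketch, assuming a route's cookie is its fib6 node's serial number and 0 when the route is not in the tree; the sketch_* types and function are hypothetical, trimmed-down stand-ins, not the kernel structures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, trimmed-down stand-ins for fib6_node and rt6_info. */
struct sketch_fib6_node {
	uint32_t fn_sernum;
};

struct sketch_rt6_info {
	struct sketch_fib6_node *rt6i_node;	/* NULL if not in the tree */
};

/* One helper instead of open-coding the rt6i_node check at every
 * call site: the cookie is the node's serial number, or 0. */
static uint32_t sketch_rt6_get_cookie(const struct sketch_rt6_info *rt)
{
	return rt->rt6i_node ? rt->rt6i_node->fn_sernum : 0;
}
```

Keeping the check in one place means later changes (such as percpu clones that are not themselves in the tree) only need to touch this helper.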

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Only create RTF_CACHE routes after encountering pmtu exception

This patch creates RTF_CACHE routes only after encountering a pmtu
exception.

After ip6_rt_update_pmtu() has inserted the RTF_CACHE route into the fib6
tree, rt->rt6i_node->fn_sernum is bumped, which makes
ip6_dst_check() fail and triggers a relookup.
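The invalidation works by comparing a cookie saved at look-up time with the node's current serial number. This sketch is simplified and hypothetical: struct sketch_node and the sketch_* functions are illustrative only, and "bumped" is modeled as a plain increment rather than the kernel's own serial-number scheme.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a fib6 tree node with a serial number. */
struct sketch_node {
	uint32_t fn_sernum;
};

/* A cached dst is still valid only if the cookie recorded at
 * look-up time matches the node's current serial number. */
static bool sketch_dst_check(const struct sketch_node *node, uint32_t cookie)
{
	return node->fn_sernum == cookie;
}

/* Inserting a route (e.g. an RTF_CACHE clone after a pmtu
 * exception) bumps the serial number, invalidating old cookies. */
static void sketch_insert_route(struct sketch_node *node)
{
	node->fn_sernum++;
}
```

A socket holding a cached dst therefore notices the new RTF_CACHE route on its next check and redoes the look-up, picking up the reduced MTU.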

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Combine rt6_alloc_cow and rt6_alloc_clone

Prep work for creating RTF_CACHE routes on exception only. After this
patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))
is checked twice. This redundancy will be removed in a later patch.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST

When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
Also, rt6i_gateway is always set to the nexthop while the nexthop
could be a gateway or the rt6i_dst.addr.

After removing the rt6i_dst and rt6i_src dependency in the last patch,
we also need to stop the caller from depending on rt6i_gateway and
RTF_ANYCAST.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Remove external dependency on rt6i_dst and rt6i_src

This patch removes the assumptions that the returned rt is always
a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
destination and source address. The dst and src can be recovered from
the calling site.

We may consider renaming (rt6i_dst, rt6i_src) to
(rt6i_key_dst, rt6i_key_src) later.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Clean up ipv6_select_ident() and ip6_fragment()

This patch changes the ipv6_select_ident() signature to return a
fragment id instead of taking a whole frag_hdr as a param to
only set the frag_hdr->identification.

It also cleans up ip6_fragment() to obtain the fragment id at the
beginning, instead of using multiple "if" checks later to test whether
the fragment id has been generated.
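The shape of the new interface can be sketched as "return an id, let the caller stamp it". All names here are hypothetical, and the counter is only a placeholder: it does not reproduce how the kernel actually generates fragment identifiers.

```c
#include <stdint.h>

/* Hypothetical fragment header reduced to the one relevant field. */
struct sketch_frag_hdr {
	uint32_t identification;
};

/* Placeholder ID source; NOT the kernel's identifier generation. */
static uint32_t sketch_next_id = 1;

/* New-style helper: return an id instead of filling a frag_hdr. */
static uint32_t sketch_select_ident(void)
{
	return sketch_next_id++;
}

/* Obtain the fragment id once up front, then stamp every fragment,
 * so no later "has the id been generated yet?" checks are needed. */
static void sketch_fragment(struct sketch_frag_hdr *frags, int nfrags)
{
	uint32_t frag_id = sketch_select_ident();

	for (int i = 0; i < nfrags; i++)
		frags[i].identification = frag_id;
}
```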

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: Add PHY firmware support for T420-BT cards

Add support for flashing 10GBaseT adapter with BCM 84834 PHY and
Aquantia AQ1202 PHY.

Updating of the PHY firmware must happen before the INITIALIZE_CMD.

Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

test_bpf: add more eBPF jump torture cases

Add two more eBPF test cases for JITs. The second one revealed a
bug in the x86_64 JIT compiler, where only an int3-filled image from
the allocator was emitted and later wrongly set by the compiler as the
bpf_func program code, since the optimization pass boundary was exceeded
without actually emitting opcodes.

Interpreter:

  [   45.782892] test_bpf: #242 BPF_MAXINSNS: Very long jump backwards jited:0 11 PASS
  [   45.783062] test_bpf: #243 BPF_MAXINSNS: Edge hopping nuthouse jited:0 14705 PASS

After x86_64 JIT (fixed):

  [   80.495638] test_bpf: #242 BPF_MAXINSNS: Very long jump backwards jited:1 6 PASS
  [   80.495957] test_bpf: #243 BPF_MAXINSNS: Edge hopping nuthouse jited:1 17157 PASS

Reference: http://thread.gmane.org/gmane.linux.network/364729
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'amd-xgbe-next'

Tom Lendacky says:

====================
amd-xgbe: AMD XGBE driver updates 2015-05-22

The following patches are included in this driver update series:

- Retrieve and set an additional hardware feature setting
- Fix the initial mode/speed determination when auto-negotiation is
  disabled
- Add additional netif_dbg support to the driver

This patch series is based on net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add more netif_dbg output to the driver

Change more netdev_dbg statements over to netif_dbg and add some new
netif_dbg statements to the driver.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Fix initial mode when auto-negotiation is disabled

When the ethtool command is used to set the speed of the device while
the device is down, the check to set the initial mode may fail when
the device is brought up, causing failure to bring the device up.

Update the code to set the initial mode based on the desired speed if
auto-negotiation is disabled.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

amd-xgbe: Add setting of a missing hardware feature

The device private data structure contains all the defined hardware
features for the device. However, one of the features is not set. Even
though the feature is not currently used, set it to avoid future
issues where the feature is checked on the assumption that it has been
properly set.

Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip: reject too-big defragmented DF-skb when forwarding

Send an ICMP PMTU error if we find that the largest fragment of a DF skb
exceeded the output path MTU.

The ip output path will still catch this later on but we can avoid the
forward/postrouting hook traversal by rejecting right away.

This is what ipv6 already does.
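The early-reject test reduces to one comparison. This is a minimal sketch under stated assumptions: sketch_reject_too_big() is a hypothetical helper, df means the original fragments carried IP_DF, and frag_max_size is the largest original fragment seen during reassembly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decision helper for forwarding a reassembled skb. */
static bool sketch_reject_too_big(bool df, uint32_t frag_max_size, uint32_t out_mtu)
{
	/* If true, send ICMP "fragmentation needed" right away instead
	 * of first traversing the forward/postrouting hooks. */
	return df && frag_max_size > out_mtu;
}
```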

Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'af_unix_sendpage'

Hannes Frederic Sowa says:

====================
net: af_unix: zerocopy stream bits

This series implements zerocopy support for AF_UNIX SOCK_STREAM sockets.

Changelog in the specific patches. Thanks to all the reviewers!
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: af_unix: implement splice for stream af_unix sockets

unix_stream_recvmsg is refactored into unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
negligible; we mostly have to deal with a non-existing struct msghdr
argument.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: make skb_splice_bits more configureable

Prepare skb_splice_bits to be able to deal with AF_UNIX sockets.

AF_UNIX sockets don't use lock_sock/release_sock, so we have to
use a callback to make the locking and unlocking configurable.
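The callback approach can be sketched as a splice routine that delegates dropping and retaking its lock to the caller. Every name here is hypothetical and does not reflect the actual skb_splice_bits() signature; the default callback merely stands in for the lock_sock()/release_sock() pair, and an AF_UNIX caller would pass a callback using its own lock instead.

```c
/* Hypothetical socket stand-in that records lock transitions. */
struct sketch_sock {
	int locked;
	int lock_calls;
};

/* Callback type: the caller decides how to drop/retake its lock. */
typedef void (*sketch_relock_fn)(struct sketch_sock *sk, int lock);

/* Stand-in for a lock_sock()/release_sock() style caller. */
static void sketch_default_relock(struct sketch_sock *sk, int lock)
{
	sk->locked = lock;
	sk->lock_calls++;
}

/* Hypothetical splice step: the lock is dropped around the pipe
 * write and retaken afterwards, using the caller's scheme. */
static int sketch_splice_bits(struct sketch_sock *sk, sketch_relock_fn relock)
{
	relock(sk, 0);	/* release before the possibly blocking pipe write */
	/* ... hand pages to the pipe here ... */
	relock(sk, 1);	/* reacquire afterwards */
	return 0;
}
```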

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: af_unix: implement stream sendpage support

This patch implements sendpage support for AF_UNIX SOCK_STREAM
sockets. This is also required for a complete splice implementation.

The implementation is a bit tricky because we append to already existing
skbs and so have to hold unix_sk->readlock to protect the reading side
from either advancing UNIXCB.consumed or freeing the skb at the socket
receive tail.

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skbuff: add skb_append_pagefrags and use it

Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'wireless-drivers-next-for-davem-2015-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next

Kalle Valo says:

====================
ath10k:

* enable channel 144 on 5 GHz
* enable Adaptive Noise Immunity (ANI) by default
* add Wake on Wireless LAN (WOW) patterns support
* add basic Tunneled Direct Link Setup (TDLS) support
* add multi-channel support for QCA6174
* enable IBSS RSN support
* enable Bluetooth Coexistence whenever firmware supports it
* add more versatile way to set bitrates used by the firmware

ath9k:

* spectral scan: add support for multiple FFT frames per report

iwlwifi:

* major rework of the scan code (Luca)
* some work on the thermal code (Chaya Rachel)
* some work on the firmware debugging infrastructure

brcmfmac:

* SDIO suspend and resume fixes
* wiphy band info and changes in regulatory settings
* add support for BCM4324 SDIO and BCM4358 PCIe
* enable support of PCIe devices on router platforms (Hante)
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx4-next'

Or Gerlitz says:

====================
mlx4: Enable single ported VFs over IB ports

This series further enhances the support for mlx4 single ported VFs
introduced in 3.15 to work over IB ports too.

Just as a quick reminder, the ConnectX3 device family exposes one PCI device
which serves both ports.

This can be non-optimal under virtualization schemes where the admin
would like the VF to expose one interface to the VM, etc.

Since all the VF interaction with the firmware passes through the PF
driver, we can emulate to the VFs that they have one port, and further
create a set of VFs which act on port1 of the device and another set
which acts on port2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Enable single ported IB VFs

Remove the limitation that disallows configuring single ported VFs
in the presence of IB ports, after addressing the issues that
prevented that from working.

SMI (QP0) requests/responses are still not supported for single
ported IB VFs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Adjust the schedule queue port in reset-to-init too

It's legal for drivers to provide the QP port through the
QPC schedule-queue field on the reset-to-init QP state change.

Add adjusting of the schedule queue port in the SRIOV wrapper
for that operation too.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Adjust the schedule queue port for single ported IB VFs

Some VF driver flows set the schedule queue in the QP context
without setting either OPTPAR_SCHED_QUEUE or OPTPAR_PRIMARY_ADDR_PATH.

To allow such non-modified drivers to function as single ported
IB VFs, we must adjust the schedule queue port whenever it is set,
as is currently done for single ported Eth VFs.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Modify port values when generting EQEs for VFs

As part of enabling single ported VFs over IB ports, we need to handle
some of the flows for generating EQ events for VFs which don't come
into play under Eth ports.

This mainly includes port management events derived from changes of the
physical port (lid change, client re-register, down/up, etc), VF pkey table
changes and VF guid changes initiated by the IB driver.

(1) make sure that events are generated only for VFs sitting on
the relevant physical port (under the ALL_SLAVES flow).

(2) before generating the event, convert from physical (one or two)
to VF port (always equals one).
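Steps (1) and (2) above can be modeled as one filter-and-convert function. This is a hypothetical sketch, not the mlx4 code: struct sketch_vf and sketch_vf_event_port() are illustrative, and a single-ported VF is assumed to be bound to exactly one physical port.

```c
#include <stdbool.h>

/* Hypothetical model of a single-ported VF bound to physical port 1 or 2. */
struct sketch_vf {
	int phys_port;
};

/* (1) deliver only to VFs on the port the event came from;
 * (2) rewrite the port into the VF's view, which is always 1. */
static bool sketch_vf_event_port(const struct sketch_vf *vf, int event_phys_port,
				 int *vf_port)
{
	if (vf->phys_port != event_phys_port)
		return false;	/* event belongs to the other physical port */
	*vf_port = 1;
	return true;
}
```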

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

IB/mlx4: Convert slave port before building address-handle

When multiplexing a MAD sent from a VF, we should convert the port used
by the guest to send the packet to the actual physical port which will be
used to transmit the packet, before building the relevant address-handle (AH).

This is needed under VPI for single ported VFs, since the code that builds
the AH (mlx4_ib_query_ah()) makes decisions based on the input port. If we
use the port number provided by the guest, it might have a different protocol
than the port this packet has to be sent from, and hence the result could be
wrong.

So far, the conversion was done after the AH was built, and it worked for
single ported Eth VFs, which were not enabled under VPI. When adding support
for single ported IB VFs and VPI, we hit this problem.

Fixes: 449fc48866f7 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_core: Enhance the MAD_IFC wrapper to convert VF port to physical

Single port VFs always provide port = 1 (even if the actual physical
port used is port 2). As such, we need to convert the port provided
by the VF to the physical port before calling into the firmware.

It turns out that the Linux mlx4 VF RoCE driver maintains a copy of
the GID table and hence this change became critical only for single
ported IB VFs, but it could be needed for other RoCE VF drivers too.
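The reverse direction, from the VF's view to the physical port, is a one-line mapping. This is a hypothetical sketch; sketch_vf_port_to_phys() is illustrative and returns -1 for any port a single-ported VF could not legitimately supply.

```c
/* Hypothetical: translate the port a single-ported VF supplies
 * (always 1) into the physical port the VF is actually bound to. */
static int sketch_vf_port_to_phys(int vf_port, int bound_phys_port)
{
	if (vf_port != 1)
		return -1;	/* a single-ported VF has no other port */
	return bound_phys_port;
}
```

Applying this conversion before calling into the firmware ensures a VF bound to physical port 2 does not end up querying port 1.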

Fixes: 449fc48866f7 ('net/mlx4: Adapt code for N-Port VF')
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: Grammar s/an negative/a negative/

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Conflicts:
drivers/net/ethernet/cadence/macb.c
drivers/net/phy/phy.c
include/linux/skbuff.h
net/ipv4/tcp.c
net/switchdev/switchdev.c

Switchdev was a case of RTNH_F_{EXTERNAL --> OFFLOAD}
renaming overlapping with net-next changes of various
sorts.

phy.c was a case of two changes, one adding a local
variable to a function whilst the second was removing
one.

tcp.c overlapped a deadlock fix with the addition of new tcp_info
statistic values.

macb.c involved the addition of two Zynq device entries.

skbuff.h involved adding back ipv4_daddr to nf_bridge_info
whilst net-next changes put two other existing members of
that struct into a union.

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'pktgen-new-scripts'

Jesper Dangaard Brouer says:

====================
pktgen: cleanups and introducing new samples/pktgen scripts

v3:
- Aborted v2 send due it was not generating diff stat
   (this is a bug in stg-mail, if not in the root directory)

v2: address nitpicks from Cong Wang
- Remove useless cat's, but keep them for old pgset()
- Comment on: Due to pgctrl, cannot use exit code $? from grep
- Use arithmetic compare in pktgen_sample03_burst_single_flow.sh

This patchset is focused on making pktgen easier to use and better
documented. It contains a number of documentation updates and minor
changes to pktgen.  The major contribution is introduction of common
helper function for sample scripts.

Instead of the old pgset() function, three new shell functions for
configuring the different components of pktgen are introduced:
pg_ctrl(), pg_thread() and pg_set().

The new functions correspond to pktgen's different components.
* pg_ctrl()   control "pgctrl" (/proc/net/pktgen/pgctrl)
* pg_thread() control the kernel threads and binding to devices
* pg_set()    control setup of individual devices

Helpers also provide consistent parameter parsing across the sample
scripts.

Usage example:
./pktgen_sample01_simple.sh -i eth41 -m 00:12:C0:02:AC:5A -d 192.168.41.2

Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -t : ($THREADS)   threads to start
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -b : ($BURST)     HW level bursting of SKBs
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

These scripts are borrowed from:
https://github.com/netoptimizer/network-testing/tree/master/pktgen
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add benchmark script pktgen_bench_xmit_mode_netif_receive.sh

This script pktgen_bench_xmit_mode_netif_receive.sh is a benchmark
script which can be used for benchmarking part of the network stack,
either to improve performance or to catch regressions in that area.

The script was developed for benchmarking the ingress qdisc path, based
on an original idea by Alexei Starovoitov. It doesn't need any
hardware. This is achieved via the recently introduced stack inject
feature "xmit_mode netif_receive". See commit 62f64aed622b6 ("pktgen:
introduce xmit_mode '<start_xmit|netif_receive>'").

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add sample script pktgen_sample03_burst_single_flow.sh

Add the pktgen sample script pktgen_sample03_burst_single_flow.sh
that demonstrates how to achieve maximum performance.

If correctly tuned[1], single-CPU 10Gbit/s wirespeed with small packets is
possible[2], which is 14.88Mpps. The trick is to take advantage of the
"burst" feature introduced in commit 38b2cf2982dc73 ("net: pktgen:
packet bursting via skb->xmit_more").

[1] http://netoptimizer.blogspot.dk/2014/06/pktgen-for-network-overload-testing.html
[2] http://netoptimizer.blogspot.dk/2014/10/unlocked-10gbps-tx-wirespeed-smallest.html
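The 14.88Mpps figure follows from Ethernet framing overhead: a minimum 64-byte frame (including FCS) also occupies 8 bytes of preamble/SFD and a 12-byte inter-frame gap on the wire, i.e. 84 bytes or 672 bits per packet. A small helper (wire_pps() is just an illustrative name) makes the arithmetic explicit.

```c
#include <stdint.h>

/* Packets per second at a given line rate for a given frame size.
 * frame_bytes includes the 4-byte FCS; each frame additionally costs
 * 8 bytes of preamble+SFD and a 12-byte inter-frame gap on the wire. */
static uint64_t wire_pps(uint64_t bits_per_sec, uint64_t frame_bytes)
{
	uint64_t wire_bytes = frame_bytes + 8 + 12;

	return bits_per_sec / (wire_bytes * 8);
}
```

For 10Gbit/s and 64-byte frames this gives 10^10 / 672 = 14,880,952 packets per second; for 1500-byte frames it drops to 822,368, which is why small packets are the hard case.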

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add sample script pktgen_sample02_multiqueue.sh

Add the pktgen samples script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.

Specifically, notice the option "-t" that specifies how many
kernel threads to activate.  Also notice the flag QUEUE_MAP_CPU,
which causes the SKB TX queue to be mapped to the CPU running the
kernel thread.  For best scalability, people are also encouraged to
map NIC IRQs (/proc/irq/*/smp_affinity) to CPU numbers.

Usage example with "-t" 4 threads and help:
./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4

Usage: ./pktgen_sample02_multiqueue.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -t : ($THREADS)   threads to start
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -b : ($BURST)     HW level bursting of SKBs
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Removing pktgen.conf-2-1 and pktgen.conf-2-2 as these examples
should be covered now.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: add sample script pktgen_sample01_simple.sh

Add the first basic pktgen sample script pktgen_sample01_simple.sh,
which demonstrates a simple use of the helper functions.
Remove pktgen.conf-1-1, as that example should now be covered.

The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.

The script makes pktgen send with a single thread and a single interface,
and introduces flow variation via a random UDP source port.

Usage example and help:
./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2

Usage: ./pktgen_sample01_simple.sh [-vx] -i ethX
  -i : ($DEV)       output interface/device (required)
  -s : ($PKT_SIZE)  packet size
  -d : ($DEST_IP)   destination IP
  -m : ($DST_MAC)   destination MAC-addr
  -c : ($SKB_CLONE) SKB clones send before alloc new SKB
  -v : ($VERBOSE)   verbose
  -x : ($DEBUG)     debug

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: new pktgen helper functions for samples scripts

Prepare for removing the existing samples/pktgen/ scripts and
replacing them with easier-to-use samples.

This commit provides two helper shell files that can
be "included" by sourcing them, namely "functions.sh"
and "parameters.sh".

The parameters.sh file supports easy and consistent parameter
parsing across the sample scripts.  A usage example is printed on
errors.

The functions.sh file provides three new shell functions for
configuring the different components of pktgen: pg_ctrl(),
pg_thread() and pg_set().  A slightly improved version of the old
pgset() function is also provided for backwards compatibility.

The new functions correspond to pktgen's different components.
* pg_ctrl()   control "pgctrl" (/proc/net/pktgen/pgctrl)
* pg_thread() control the kernel threads and binding to devices
* pg_set()    control setup of individual devices

These changes are borrowed from:
https://github.com/netoptimizer/network-testing/tree/master/pktgen

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: make /proc/net/pktgen/pgctrl report fail on invalid input

Giving /proc/net/pktgen/pgctrl an invalid command just returns shell
success and prints a warning in dmesg. This is not very useful for
shell scripting, as it can only detect the error by parsing dmesg.

Instead return -EINVAL when the command is unknown, as this provides
userspace shell scripting a way of detecting this.

Also bump the version tag to 2.75, because (1) reading /proc/net/pktgen/pgctrl
outputs this version number, which allows detecting this small
semantic change, and (2) the pktgen version tag has not been
updated since 2010.
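The new behaviour can be modeled as a command handler that rejects unknown input. This is a hypothetical sketch, not the pktgen code: sketch_pgctrl_cmd() is illustrative, and it recognizes only the documented pgctrl commands start, stop and reset.

```c
#include <errno.h>
#include <string.h>

/* Hypothetical model of the pgctrl handler after this change: known
 * commands succeed, anything else fails with -EINVAL, so that
 * "echo cmd > /proc/net/pktgen/pgctrl" reports a failing write. */
static int sketch_pgctrl_cmd(const char *cmd)
{
	if (strcmp(cmd, "start") == 0 ||
	    strcmp(cmd, "stop") == 0 ||
	    strcmp(cmd, "reset") == 0)
		return 0;
	return -EINVAL;
}
```

Because the error now surfaces as a failed write(2), a shell script can test the exit status of the echo instead of parsing dmesg.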

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: document ability to add same device to several threads

The pktgen.txt documentation still claimed that adding the same device to
multiple threads was not supported, but it has been supported since 2008 via
commit e6fce5b916cd7 ("pktgen: multiqueue etc.").

Document this and describe the naming scheme dev@X, as the procfile name
still needs to be unique.

Fixes: e6fce5b916cd7 ("pktgen: multiqueue etc.")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: doc were missing several config options

The pktgen.txt documentation of the available config options was not
complete. Complete the list by adding the following.

Pgcontrol commands:
reset

Device commands:
burst
queue_map_min
queue_map_max
skb_priority
tos
traffic_class
node
spi
dst6_max
dst6_min
vlan_cfi
vlan_id
vlan_p
svlan_cfi
svlan_id
svlan_p

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: adjust spacing in proc file interface output

Too many spaces were introduced in commit 63adc6fb8ac0 ("pktgen: cleanup
checkpatch warnings"), thus misaligning "src_min:" to other columns.

Fixes: 63adc6fb8ac0 ("pktgen: cleanup checkpatch warnings")
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pktgen: remove obsolete "max_before_softirq" from pktgen doc

And clean up some whitespace in pktgen.txt.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"Radeon has two displayport fixes, one for a regression.

  i915 regression flicker fix needed so 4.0 can get fixed.

  A bunch of msm fixes and a bunch of exynos fixes; these two are
  probably a bit larger than I'd like, but most of them seem pretty
  good"

* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
  drm/radeon: fix error flag checking in native aux path
  drm/radeon: retry dcpd fetch
  drm/msm/mdp5: fix incorrect parameter for msm_framebuffer_iova()
  drm/exynos: dp: Lower level of EDID read success message
  drm/exynos: cleanup exynos_drm_plane
  drm/exynos: 'win' is always unsigned
  drm/exynos: mixer: don't dump registers under spinlock
  drm/exynos: Consolidate return statements in fimd_bind()
  drm/exynos: Constify exynos_drm_crtc_ops
  drm/exynos: Fix build breakage on !DRM_EXYNOS_FIMD
  drm/exynos: mixer: Constify platform_device_id
  drm/exynos: mixer: cleanup pixelformat handling
  drm/exynos: mixer: also allow NV21 for the video processor
  drm/exynos: mixer: remove buffer count handling in vp_video_buffer()
  drm/exynos: plane: honor buffer offset for dma_addr
  drm/exynos: fb: use drm_format_num_planes to get buffer count
  drm/i915: fix screen flickering
  drm/msm: fix locking inconsistencies in gpu->destroy()
  drm/msm/dsi: Simplify the code to get the number of read byte
  drm/msm: Attach assigned encoder to eDP and DSI connectors
  ...

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Don't leak ipvs->sysctl_tbl, from Tommi Rantala.

2) Fix neighbour table entry leak in rocker driver, from Ying Xue.

3) Do not emit bonding notifications for unregistered interfaces, from
    Nicolas Dichtel.

4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
    Fourcot.

5) Fix regression in ipv6 multicast filter test, from Henning Rogge.

6) do_replace() in various footables netfilter modules is missing a
    check for 0 counters in the data structure provided by the user.  Fix
    from Dave Jones, found with trinity.

7) Fix RCU bug in packet scheduler classifier module unloads, from
    Daniel Borkmann.

8) Avoid deadlock in tcp_get_info() by using u64_stats_sync.  From Eric
    Dumazet.

9) Input packet processing can race with inetdev_destroy() teardown;
    fix a potential OOPS in ip_error() by explicitly testing whether the
    inetdev is still attached.  From Eric W Biederman.

10) The MLDv2 parser in the bridge multicast code breaks out too early
    while parsing.  Fix from Thadeu Lima de Souza Cascardo.

11) Asking for settings on a non-zero PHYID doesn't work because we do
    not import the command structure from the user and use the PHYID
    provided there.  Fix from Arun Parameswaran.

12) Fix UDP checksums with IPv6 RAW sockets, from Vlad Yasevich.

13) Missing NF_TABLES dependencies for TPROXY etc. can cause build
    failures.  Fix from Florian Westphal.

14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
    from Jesper Dangaard Brouer.

15) If netlink autobind retry fails, we have to reset the socket's portid
    back to zero.  From Herbert Xu.

16) The VXLAN netns exit code unregisters using the wrong device, from
    John W Linville.

17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
    Dmitry Tunin and Wen-chien Jesse Sung.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
  bridge: fix lockdep splat
  net: core: 'ethtool' issue with querying phy settings
  bridge: fix parsing of MLDv2 reports
  ARM: zynq: DT: Use the zynq binding with macb
  net: macb: Disable half duplex gigabit on Zynq
  net: macb: Document zynq gem dt binding
  ipv4: fill in table id when replacing a route
  cdc_ncm: Fix tx_bytes statistics
  ipv4: Avoid crashing in ip_error
  tcp: fix a potential deadlock in tcp_get_info()
  net: sched: fix call_rcu() race on classifier module unloads
  net: phy: Make sure phy_start() always re-enables the phy interrupts
  ipv6: fix ECMP route replacement
  ipv6: do not delete previously existing ECMP routes if add fails
  Revert "netfilter: bridge: query conntrack about skb dnat"
  netfilter: ensure number of counters is >0 in do_replace()
  netfilter: nfnetlink_{log,queue}: Register pernet in first place
  tcp: don't over-send F-RTO probes
  tcp: only undo on partial ACKs in CA_Loss
  net/ipv6/udp: Fix ipv6 multicast socket filter regression
  ...

Merge branch 'for-linus' of git://git.kernel.dk/linux-block

Pull block fixes from Jens Axboe:
"Three small fixes that have been picked up over the last few weeks.
  Specifically:

   - Fix a memory corruption issue in NVMe with a malicious
     user-constructed request.  From Christoph.

   - Kill (now) unused blk_queue_bio(), dm was changed to not need this
     anymore.  From Mike Snitzer.

   - Always use blk_schedule_flush_plug() from the io_schedule() path
     when flushing a plug, fixing a !TASK_RUNNING warning with md.  From
     Shaohua"

* 'for-linus' of git://git.kernel.dk/linux-block:
  sched: always use blk_schedule_flush_plug in io_schedule_out
  nvme: fix kernel memory corruption with short INQUIRY buffers
  block: remove export for blk_queue_bio

Merge tag 'md/4.1-rc4-fixes' of git://neil.brown.name/md

Pull md bugfixes from Neil Brown:
"I have a few more raid5 bugfixes pending, but I want them to get a bit
  more review first.  In the meantime:

   - one fix for serious RAID0 data corruption, caused by a recent
     bugfix that wasn't reviewed properly.

   - one raid5 fix in new code (a couple more of those to come).

   - one little fix to stop static analysis from complaining about a
     silly rcu annotation"

* tag 'md/4.1-rc4-fixes' of git://neil.brown.name/md:
  md/bitmap: remove rcu annotation from pointer arithmetic.
  md/raid0: fix restore to sector variable in raid0_make_request
  raid5: fix broken async operation chain