David S. Miller [Mon, 25 May 2015 22:19:10 +0000 (18:19 -0400)]
Merge branch 'cpsw-cleanups'
Richard Cochran says:
====================
cpsw cleanups
While working on an out-of-tree customization, I noticed a few minor
problems in the cpsw code. This series cleans up the issues I found.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The function, cpsw_intr_disable, already calls cpdma_ctlr_int_ctrl. There
is no need to disable the dma interrupts twice. This patch removes the
extra calls.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The function, cpsw_intr_enable, already calls cpdma_ctlr_int_ctrl. There
is no need to enable the dma interrupts twice. This patch removes the
extra call.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 May 2015 22:17:09 +0000 (18:17 -0400)]
Merge branch 'rocker-cleanups'
Simon Horman says:
====================
rocker: unused parameter and const cleanups
This series provides some minor though verbose cleanup of rocker.
The second patch depends on the first though it could be rebased.
I had previously asked for v2 to be put on hold while some bugs I had found
in the rocker driver were shaken out. That has now happened and the bugs
turned out to be unrelated. Accordingly I am reposting the series.
* Changes v2 -> v3
- Rebase and update for new variables and parameters that may be const
* Changes v1 -> v2
- Found quite a few more variables and parameters to make const
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Mon, 25 May 2015 05:28:36 +0000 (14:28 +0900)]
rocker: mark parameters and local variables as const
Mark parameters and local variables as const where possible.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Horman [Mon, 25 May 2015 05:28:35 +0000 (14:28 +0900)]
rocker: remove unused rocker_port parameter from rocker_port_kfree
Remove unused rocker_port parameter from rocker_port_kfree.
Also remove the rocker_port parameter from callers of rocker_port_kfree
where the parameter it is now unused.
Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Scott Feldman <sfeldma@gmail.com> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
irda: use msecs_to_jiffies for conversion to jiffies
API compliance scanning with coccinelle flagged:
./net/irda/timer.c:63:35-37: use of msecs_to_jiffies probably perferable
Converting milliseconds to jiffies by "val * HZ / 1000" technically
is not a clean solution as it does not handle all corner cases correctly.
By changing the conversion to use msecs_to_jiffies(val) conversion is
correct in all cases. Further the () around the arithmetic expression
was dropped.
Patch was compile tested for x86_64_defconfig + CONFIG_IRDA=m
Patch is against 4.1-rc4 (localversion-next is -next-20150522)
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
irda: irda-usb: use msecs_to_jiffies for conversions
API compliance scanning with coccinelle flagged:
Converting milliseconds to jiffies by "val * HZ / 1000" is technically
is not a clean solution as it does not handle all corner cases correctly.
By changing the conversion to use msecs_to_jiffies(val) conversion is
correct in all cases.
in the current code:
mod_timer(&self->rx_defer_timer, jiffies + (10 * HZ / 1000));
for HZ < 100 (e.g. CONFIG_HZ == 64|32 in alpha) this effectively results
in no delay at all.
Patch was compile tested for x86_64_defconfig (implies CONFIG_USB_IRDA=m)
Patch is against 4.1-rc4 (localversion-next is -next-20150522)
Signed-off-by: Nicholas Mc Guire <hofrat@osadl.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Lüssing [Sat, 23 May 2015 01:12:34 +0000 (03:12 +0200)]
bridge: allow setting hash_max + multicast_router if interface is down
Network managers like netifd (used in OpenWRT for instance) try to
configure interface options after creation but before setting the
interface up.
Unfortunately the sysfs / bridge currently only allows to configure the
hash_max and multicast_router options when the bridge interface is up.
But since br_multicast_init() doesn't start any timers and only sets
default values and initializes timers it should be save to reconfigure
the default values after that, before things actually get active after
the bridge is set up.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 21 May 2015 22:44:16 +0000 (00:44 +0200)]
ipv6: don't increase size when refragmenting forwarded ipv6 skbs
since commit 6aafeef03b9d ("netfilter: push reasm skb through instead of
original frag skbs") we will end up sometimes re-fragmenting skbs
that we've reassembled.
ipv6 defrag preserves the original skbs using the skb frag list, i.e. as long
as the skb frag list is preserved there is no problem since we keep
original geometry of fragments intact.
However, in the rare case where the frag list is munged or skb
is linearized, we might send larger fragments than what we originally
received.
A router in the path might then send packet-too-big errors even if
sender never sent fragments exceeding the reported mtu:
mtu 1500 - 1500:1400 - 1400:1280 - 1280
A R1 R2 B
1 - A sends to B, fragment size 1400
2 - R2 sends pkttoobig error for 1280
3 - A sends to B, fragment size 1280
4 - R2 sends pkttoobig error for 1280 again because it sees fragments of size 1400.
make sure ip6_fragment always caps MTU at largest packet size seen
when defragmented skb is forwarded.
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 May 2015 17:25:35 +0000 (13:25 -0400)]
Merge branch 'ipv6_route_sharing'
Martin KaFai Lau says:
====================
ipv6: Only create RTF_CACHE route after encountering pmtu exception
v4 -> v5:
- Patch 1 is new. Clean up the ipv6_select_ident() and ip6_fragment().
- Further simplify the newly added rt6_get_pcpu_route(). If there is a
'prev' after cmpxchg, return prev instead of the newly created percpu
clone.
v3 -> v4:
- Patch 8 is new. It keeps track of the DST_NOCACHE routes in a list to handle
the iface down/unregister event.
- Remove rcu from the newly added rt6i_pcpu variable. It is not needed
because it has already been protected by the existing reader/writer lock.
- Thanks to 'Julian Anastasov <ja@ssi.bg>' for testing the FLOWI_FLAG_KNOWN_NH
patches.
v2 -> v3:
- Patch 5 to 7 are new. They take care of cases where the daddr in
skb is not the one used to do the route look-up. There is also
related changes to rt6_nexthop() since v2 which is in patch 2/9.
Thanks to 'Julian Anastasov <ja@ssi.bg>' for pointing it out.
- Fix a few problems in __ip6_rt_update_pmtu(), like setting the expire
and mtu before inserting to the tree and don't do dst_destroy() after
tree insertion failure. Also update the rt6i_pmtu in fib6_add_rt2node().
Thanks to 'Steffen Klassert <steffen.klassert@secunet.com>' for pointing
it out.
- Merge ip6_pmtu_rt_cache_alloc() into ip6_rt_cache_alloc().
v1 -> v2:
- Move the /128 route bug fixes to another series (accepted).
- Create a function for checking (rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY)).
- Avoid shuffling the skb network_header. Instead, change the function
signature to take iph instead of skb.
- Many Thanks to 'Hannes Frederic Sowa <hannes@stressinduktion.org>' on
reviewing v1 and v2 and giving advice.
--Martin
~~~ start: v1 compose message (with the out-dated parts removed) ~~~
This series is to avoid creating a RTF_CACHE route whenever we are consulting
the fib6 tree with a new destination. Instead, only create RTF_CACHE route
when we see a pmtu exception.
Out of all ipv6 RTF_CACHE routes that are created, the percentage that has a
different mtu is very small. In one of our end-user facing proxy server,
only 1k out of 80k RTF_CACHE routes have a smaller MTU. For our DC
traffic, there is no mtu exception.
A large fib6 tree has problems like, 'ip -6 r show' takes a long time.
gc may kick in too often. Also, when a service has restarted and a lot
of new TCP conn requests come in, it creates pressure on the tree by inserting
a lot of RTF_CACHE in a short time and it currently requires a write lock
to do that.
The first few patches are prep works to remove assumption that the
returned rt is always RTF_CACHE.
The patch 'ipv6: Only create RTF_CACHE routes after encountering pmtu exception'
do the lazy RTF_CACHE route creation.
The following patches added percpu rt to compensate the performance loss after
doing the RTF_CACHE lazy creation.
Here is some numbers of the udpflood test. The udpflood has been
slightly modified to have a time limit instead of count limit.
A /64 via gateway route is used for the test. Each udpflood uses 10000 dst
addresses. The dst addresses of different udpflood processes do not overlap
with each other.
Martin KaFai Lau [Sat, 23 May 2015 03:56:06 +0000 (20:56 -0700)]
ipv6: Create percpu rt6_info
After the patch
'ipv6: Only create RTF_CACHE routes after encountering pmtu exception',
we need to compensate the performance hit (bouncing dst->__refcnt).
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:05 +0000 (20:56 -0700)]
ipv6: Break up ip6_rt_copy()
This patch breaks up ip6_rt_copy() into ip6_rt_copy_init() and
ip6_rt_cache_alloc().
In the later patch, we need to create a percpu rt6_info copy. Hence,
refactor the common rt6_info init codes to ip6_rt_copy_init().
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:04 +0000 (20:56 -0700)]
ipv6: Keep track of DST_NOCACHE routes in case of iface down/unregister
This patch keeps track of the DST_NOCACHE routes in a list and replaces its
dev with loopback during the iface down/unregister event.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:03 +0000 (20:56 -0700)]
ipv6: Create RTF_CACHE clone when FLOWI_FLAG_KNOWN_NH is set
This patch always creates RTF_CACHE clone with DST_NOCACHE
when FLOWI_FLAG_KNOWN_NH is set so that the rt6i_dst is set to
the fl6->daddr.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:02 +0000 (20:56 -0700)]
ipv6: Set FLOWI_FLAG_KNOWN_NH at flowi6_flags
The neighbor look-up used to depend on the rt6i_gateway (if
there is a gateway) or the rt6i_dst (if it is a RTF_CACHE clone)
as the nexthop address. Note that rt6i_dst is set to fl6->daddr
for the RTF_CACHE clone where fl6->daddr is the one used to do
the route look-up.
Now, we only create RTF_CACHE clone after encountering exception.
When doing the neighbor look-up with a route that is neither a gateway
nor a RTF_CACHE clone, the daddr in skb will be used as the nexthop.
In some cases, the daddr in skb is not the one used to do
the route look-up. One example is in ip_vs_dr_xmit_v6() where the
real nexthop server address is different from the one in the skb.
This patch is going to follow the IPv4 approach and ask the
ip6_pol_route() callers to set the FLOWI_FLAG_KNOWN_NH properly.
In the next patch, ip6_pol_route() will honor the FLOWI_FLAG_KNOWN_NH
and create a RTF_CACHE clone.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Acked-by: Julian Anastasov <ja@ssi.bg> Tested-by: Julian Anastasov <ja@ssi.bg> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:01 +0000 (20:56 -0700)]
ipv6: Add rt6_get_cookie() function
Instead of doing the rt6->rt6i_node check whenever we need
to get the route's cookie. Refactor it into rt6_get_cookie().
It is a prep work to handle FLOWI_FLAG_KNOWN_NH and also
percpu rt6_info later.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:56:00 +0000 (20:56 -0700)]
ipv6: Only create RTF_CACHE routes after encountering pmtu exception
This patch creates a RTF_CACHE routes only after encountering a pmtu
exception.
After ip6_rt_update_pmtu() has inserted the RTF_CACHE route to the fib6
tree, the rt->rt6i_node->fn_sernum is bumped which will fail the
ip6_dst_check() and trigger a relookup.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:55:59 +0000 (20:55 -0700)]
ipv6: Combine rt6_alloc_cow and rt6_alloc_clone
A prep work for creating RTF_CACHE on exception only. After this
patch, the same condition (rt->rt6i_flags & (RTF_NONEXTHOP | RTF_GATEWAY))
is checked twice. This redundancy will be removed in the later patch.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:55:58 +0000 (20:55 -0700)]
ipv6: Remove external dependency on rt6i_gateway and RTF_ANYCAST
When creating a RTF_CACHE route, RTF_ANYCAST is set based on rt6i_dst.
Also, rt6i_gateway is always set to the nexthop while the nexthop
could be a gateway or the rt6i_dst.addr.
After removing the rt6i_dst and rt6i_src dependency in the last patch,
we also need to stop the caller from depending on rt6i_gateway and
RTF_ANYCAST.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:55:57 +0000 (20:55 -0700)]
ipv6: Remove external dependency on rt6i_dst and rt6i_src
This patch removes the assumptions that the returned rt is always
a RTF_CACHE entry with the rt6i_dst and rt6i_src containing the
destination and source address. The dst and src can be recovered from
the calling site.
We may consider to rename (rt6i_dst, rt6i_src) to
(rt6i_key_dst, rt6i_key_src) later.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin KaFai Lau [Sat, 23 May 2015 03:55:56 +0000 (20:55 -0700)]
ipv6: Clean up ipv6_select_ident() and ip6_fragment()
This patch changes the ipv6_select_ident() signature to return a
fragment id instead of taking a whole frag_hdr as a param to
only set the frag_hdr->identification.
It also cleans up ip6_fragment() to obtain the fragment id at the
beginning instead of using multiple "if" later to check fragment id
has been generated or not.
Signed-off-by: Martin KaFai Lau <kafai@fb.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 22 May 2015 23:10:07 +0000 (01:10 +0200)]
test_bpf: add more eBPF jump torture cases
Add two more eBPF test cases for JITs, i.e. the second one revealed a
bug in the x86_64 JIT compiler, where only an int3 filled image from
the allocator was emitted and later wrongly set by the compiler as the
bpf_func program code since optimization pass boundary was surpassed
w/o actually emitting opcodes.
Reference: http://thread.gmane.org/gmane.linux.network/364729 Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patches are included in this driver update series:
- Retrieve and set an additional hardware feature setting
- Fix the initial mode/speed determination when auto-negotiation is
disabled
- Add additional netif_dbg support to the driver
This patch series is based on net-next.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 22 May 2015 21:32:14 +0000 (16:32 -0500)]
amd-xgbe: Fix initial mode when auto-negotiation is disabled
When the ethtool command is used to set the speed of the device while
the device is down, the check to set the initial mode may fail when
the device is brought up, causing failure to bring the device up.
Update the code to set the initial mode based on the desired speed if
auto-negotiation is disabled.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lendacky, Thomas [Fri, 22 May 2015 21:32:09 +0000 (16:32 -0500)]
amd-xgbe: Add setting of a missing hardware feature
The device private data structure contains all the defined hardware
features for the device. However one of the features is not set. Even
though the feature is not currently used, set it to avoid future
issues of the feature being checked thinking it has been properly set.
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Thu, 21 May 2015 22:06:40 +0000 (00:06 +0200)]
ip: reject too-big defragmented DF-skb when forwarding
Send icmp pmtu error if we find that the largest fragment of df-skb
exceeded the output path mtu.
The ip output path will still catch this later on but we can avoid the
forward/postrouting hook traversal by rejecting right away.
This is what ipv6 already does.
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net: af_unix: implement splice for stream af_unix sockets
unix_stream_recvmsg is refactored to unix_stream_read_generic in this
patch and enhanced to deal with pipe splicing. The refactoring is
inneglible, we mostly have to deal with a non-existing struct msghdr
argument.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Prepare skb_splice_bits to be able to deal with AF_UNIX sockets.
AF_UNIX sockets don't use lock_sock/release_sock and thus we have to
use a callback to make the locking and unlocking configureable.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch implements sendpage support for AF_UNIX SOCK_STREAM
sockets. This is also required for a complete splice implementation.
The implementation is a bit tricky because we append to already existing
skbs and so have to hold unix_sk->readlock to protect the reading side
from either advancing UNIXCB.consumed or freeing the skb at the socket
receive tail.
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 May 2015 03:23:01 +0000 (23:23 -0400)]
Merge tag 'wireless-drivers-next-for-davem-2015-05-21' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
ath10k:
* enable channel 144 on 5 GHz
* enable Adaptive Noise Immunity (ANI) by default
* add Wake on Wireless LAN (WOW) patterns support
* add basic Tunneled Direct Link Setup (TDLS) support
* add multi-channel support for QCA6174
* enable IBSS RSN support
* enable Bluetooth Coexistance whenever firmware supports it
* add more versatile way to set bitrates used by the firmware
ath9k:
* spectral scan: add support for multiple FFT frames per report
iwlwifi:
* major rework of the scan code (Luca)
* some work on the thermal code (Chaya Rachel)
* some work on the firwmare debugging infrastructure
brcmfmac:
* SDIO suspend and resume fixes
* wiphy band info and changes in regulatory settings
* add support for BCM4324 SDIO and BCM4358 PCIe
* enable support of PCIe devices on router platforms (Hante)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 25 May 2015 03:05:10 +0000 (23:05 -0400)]
Merge branch 'mlx4-next'
Or Gerlitz says:
====================
mlx4: Enable single ported VFs over IB ports
This series further enhances the support for mlx4 single ported VFs
introduced in 3.15 to work over IB ports too.
Just as quick reminder, the ConnectX3 device family exposes one PCI device
which serves both ports.
This can be non-optimal under virtualization schemes where the admin
would like the VF to expose one interface to the VM, etc.
Since all the VF interaction with the firmware passes through the PF
driver, we can emulate to the VF they have one port, and further create
a set of the VFs which act on port1 of the device and another set which
acts on port2.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:10 +0000 (15:14 +0300)]
net/mlx4_core: Enable single ported IB VFs
Remove the limitation that disallows configuring single ported VFs
in the presence of IB ports, after addressing the issues that
prevented that to work.
SMI (QP0) requests/responses are still not supported for single
ported IB VFs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:09 +0000 (15:14 +0300)]
net/mlx4_core: Adjust the schedule queue port in reset-to-init too
It's legal for drivers to provide the QP port through the
QPC schedule-queue field on the reset-to-init QP state change.
Add adjusting of the schedule queue port in the SRIOV wrapper
for that operation too.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:08 +0000 (15:14 +0300)]
net/mlx4_core: Adjust the schedule queue port for single ported IB VFs
Some VF drivers flow set the schedule queue in the QP context but
without setting none of OPTPAR_SCHED_QUEUE or OPTPAR_PRIMARY_ADDR_PATH.
To allow for such non-modified drivers to function as single ported
IB VFs, we must adjust the schedule queue port whenever being set,
e.g as currently done for single ported Eth VFs.
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:07 +0000 (15:14 +0300)]
net/mlx4_core: Modify port values when generting EQEs for VFs
As part of enabling single ported VFs over IB ports we need to handle
some of the flows for generting EQ events for VFs which don't come
into play under Eth ports.
This mainly includes port management events derived from changes of the
phyiscal port (lid change, client re-register, down/up, etc), VF pkey table
changes and VF guid changes initiated by the IB driver.
(1) make sure that events are generated only for VFs sitting on
the relevant physical port (under the ALL_SLAVES flow).
(2) before generating the event, convert from physical (one or two)
to VF port (always equals one).
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:06 +0000 (15:14 +0300)]
IB/mlx4: Convert slave port before building address-handle
When multiplexling a MAD sent from VF, we should convert the port used
by the guest to send the packet to the actual physical port which will be
used to transmit the packet, before building the relevant address-handle (AH).
This is needed under VPI for single ported VFs, since the code that builds
the AH (mlx4_ib_query_ah()) makes decisions based on the input port. If we
use the port number provided by the guest, it might have different protocol
vs. the one this packat has to go from, and hence the result could be wrong.
So far, the conversion was done after the AH was built and it worked for
single ported Eth VFs which were not enabled under VPI. When adding support
for single ported IB VFs and VPI, we hit that.
Fixes: 449fc48866f7 ('net/mlx4: Adapt code for N-Port VF') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Thu, 21 May 2015 12:14:05 +0000 (15:14 +0300)]
net/mlx4_core: Enhance the MAD_IFC wrapper to convert VF port to physical
Single port VFs always provide port = 1 (even if the actual physical
port used is port 2). As such, we need to convert the port provided
by the VF to the physical port before calling into the firmware.
It turns out that the Linux mlx4 VF RoCE driver maintains a copy of
the GID table and hence this change became critical only for single
ported IB VFs, but it could be needed for other RoCE VF drivers too.
Fixes: 449fc48866f7 ('net/mlx4: Adapt code for N-Port VF') Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sat, 23 May 2015 03:59:23 +0000 (23:59 -0400)]
Merge branch 'pktgen-new-scripts'
Jesper Dangaard Brouer says:
====================
pktgen: cleanups and introducing new samples/pktgen scripts
v3:
- Aborted v2 send due it was not generating diff stat
(this is a bug in stg-mail, if not in the root directory)
v2: address nitpicks from Cong Wang
- Remove useless cat's, but keep them for old pgset()
- Comment on: Due to pgctrl, cannot use exit code $? from grep
- Use arithmetic compare in pktgen_sample03_burst_single_flow.sh
This patchset is focused on making pktgen easier to use and better
documented. It contains a number of documentation updates and minor
changes to pktgen. The major contribution is introduction of common
helper function for sample scripts.
Instead of the old pgset() function, three new shell functions for
configuring the different components of pktgen are introduced:
pg_ctrl(), pg_thread() and pg_set().
The new functions correspond to pktgens different components.
* pg_ctrl() control "pgctrl" (/proc/net/pktgen/pgctrl)
* pg_thread() control the kernel threads and binding to devices
* pg_set() control setup of individual devices
Helpers also provide consistent parameter parsing across the sample
scripts.
This script pktgen_bench_xmit_mode_netif_receive.sh is a benchmark
script, which can be used for benchmarking part of the network stack.
This can be used for performance improving or catching regression in
that area.
The script is developed for benchmarking ingress qdisc path, original
idea by Alexei Starovoitov. This script don't really need any
hardware. This is achieved via the recently introduced stack inject
feature "xmit_mode netif_receive". See commit 62f64aed622b6 ("pktgen:
introduce xmit_mode '<start_xmit|netif_receive>'").
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add the pktgen samples script pktgen_sample03_burst_single_flow.sh
that demonstrates how to acheive maximum performance.
If correctly tuned[1] single CPU 10Gbit/s wirespeed small pkts is
possible[2] which is 14.88Mpps. The trick is to take advantage of the
"burst" feature introduced in commit 38b2cf2982dc73 ("net: pktgen:
packet bursting via skb->xmit_more").
Add the pktgen samples script pktgen_sample02_multiqueue.sh that
demonstrates generating packets on multiqueue NICs.
Specifically notice the options "-t" that specifies how many
kernel threads to activate. Also notice the flag QUEUE_MAP_CPU,
which cause the SKB TX queue to be mapped to the CPU running the
kernel thread. For best scalability people are also encourage to
map NIC IRQ /proc/irq/*/smp_affinity to CPU number.
Usage example with "-t" 4 threads and help:
./pktgen_sample02_multiqueue.sh -i eth4 -m 00:1B:21:3C:9D:F8 -t 4
Add the first basic pktgen samples script pktgen_sample01_simple.sh,
which demonstrates the a simple use of the helper functions.
Removing pktgen.conf-1-1 as that example should be covered now.
The naming scheme pktgen_sampleNN, where NN is a number, should encourage
reading the samples in a specific order.
Script cause pktgen sending with a single thread and single interface,
and introduce flow variation via random UDP source port.
Usage example and help:
./pktgen_sample01_simple.sh -i eth4 -m 00:1B:21:3C:9D:F8 -d 192.168.8.2
pktgen: new pktgen helper functions for samples scripts
Preparing for removing existing samples/pktgen/ scripts, and
replacing these with easier to use samples.
This commit provides two helper shell files, that can
be "included" by shell source'ing. Namely "functions.sh"
and "parameters.sh".
The parameters.sh file support easy and consistant parameter
parsing across the sample scripts. Usage example is printed on
errors.
The functions.sh file provides, three new shell functions for
configuring the different components of pktgen: pg_ctrl(),
pg_thread() and pg_set(). A slightly improved version of the old
pgset() function is also provided for backwards compat.
The new functions correspond to pktgens different components.
* pg_ctrl() control "pgctrl" (/proc/net/pktgen/pgctrl)
* pg_thread() control the kernel threads and binding to devices
* pg_set() control setup of individual devices
These changes are borrowed from:
https://github.com/netoptimizer/network-testing/tree/master/pktgen
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pktgen: make /proc/net/pktgen/pgctrl report fail on invalid input
Giving /proc/net/pktgen/pgctrl an invalid command just returns shell
success and prints a warning in dmesg. This is not very useful for
shell scripting, as it can only detect the error by parsing dmesg.
Instead return -EINVAL when the command is unknown, as this provides
userspace shell scripting a way of detecting this.
Also bump version tag to 2.75, because (1) reading /proc/net/pktgen/pgctrl
output this version number which would allow to detect this small
semantic change, and (2) because the pktgen version tag have not been
updated since 2010.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
pktgen: document ability to add same device to several threads
The pktgen.txt documentation still claimed that adding same device to
multiple threads were not supported, but it have been since 2008 via
commit e6fce5b916cd7 ("pktgen: multiqueue etc.").
Document this and describe the naming scheme dev@X, as the procfile name
still need to be unique.
Fixes: e6fce5b916cd7 ("pktgen: multiqueue etc.") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 23 May 2015 00:34:24 +0000 (17:34 -0700)]
Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux
Pull drm fixes from Dave Airlie:
"Radeon has two displayport fixes, one for a regression.
i915 regression flicker fix needed so 4.0 can get fixed.
A bunch of msm fixes and a bunch of exynos fixes, these two are
probably a bit larger than I'd like, but most of them seems pretty
good"
* 'drm-fixes' of git://people.freedesktop.org/~airlied/linux: (29 commits)
drm/radeon: fix error flag checking in native aux path
drm/radeon: retry dcpd fetch
drm/msm/mdp5: fix incorrect parameter for msm_framebuffer_iova()
drm/exynos: dp: Lower level of EDID read success message
drm/exynos: cleanup exynos_drm_plane
drm/exynos: 'win' is always unsigned
drm/exynos: mixer: don't dump registers under spinlock
drm/exynos: Consolidate return statements in fimd_bind()
drm/exynos: Constify exynos_drm_crtc_ops
drm/exynos: Fix build breakage on !DRM_EXYNOS_FIMD
drm/exynos: mixer: Constify platform_device_id
drm/exynos: mixer: cleanup pixelformat handling
drm/exynos: mixer: also allow NV21 for the video processor
drm/exynos: mixer: remove buffer count handling in vp_video_buffer()
drm/exynos: plane: honor buffer offset for dma_addr
drm/exynos: fb: use drm_format_num_planes to get buffer count
drm/i915: fix screen flickering
drm/msm: fix locking inconsistencies in gpu->destroy()
drm/msm/dsi: Simplify the code to get the number of read byte
drm/msm: Attach assigned encoder to eDP and DSI connectors
...
1) Don't leak ipvs->sysctl_tbl, from Tommi Rentala.
2) Fix neighbour table entry leak in rocker driver, from Ying Xue.
3) Do not emit bonding notifications for unregistered interfaces, from
Nicolas Dichtel.
4) Set ipv6 flow label properly when in TIME_WAIT state, from Florent
Fourcot.
5) Fix regression in ipv6 multicast filter test, from Henning Rogge.
6) do_replace() in various footables netfilter modules is missing a
check for 0 counters in the datastructure provided by the user. Fix
from Dave Jones, and found with trinity.
7) Fix RCU bug in packet scheduler classifier module unloads, from
Daniel Borkmann.
8) Avoid deadlock in tcp_get_info() by using u64_sync. From Eric
Dumzaet.
9) Input packet processing can race with inetdev_destroy() teardown,
fix potential OOPS in ip_error() by explicitly testing whether the
inetdev is still attached. From Eric W Biederman.
10) MLDv2 parser in bridge multicast code breaks too early while
parsing. Fix from Thadeu Lima de Souza Cascardo.
11) Asking for settings on non-zero PHYID doesn't work because we do not
import the command structure from the user and use the PHYID
provided there. Fix from Arun Parameswaran.
12) Fix UDP checksums with IPV6 RAW sockets, from Vlad Yasevich.
13) Missing NF_TABLES depends for TPROXY etc can cause build failures,
fix from Florian Westphal.
14) Fix netfilter conntrack to handle RFC5961 challenge ACKs properly,
from Jesper Dangaard Brouer.
15) If netlink autobind retry fails, we have to reset the sockets portid
back to zero. From Herbert Xu.
16) VXLAN netns exit code unregisters using wrong device, from John W
Linville.
17) Add some USB device IDs to ath3k and btusb bluetooth drivers, from
Dmitry Tunin and Wen-chien Jesse Sung.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (44 commits)
bridge: fix lockdep splat
net: core: 'ethtool' issue with querying phy settings
bridge: fix parsing of MLDv2 reports
ARM: zynq: DT: Use the zynq binding with macb
net: macb: Disable half duplex gigabit on Zynq
net: macb: Document zynq gem dt binding
ipv4: fill in table id when replacing a route
cdc_ncm: Fix tx_bytes statistics
ipv4: Avoid crashing in ip_error
tcp: fix a potential deadlock in tcp_get_info()
net: sched: fix call_rcu() race on classifier module unloads
net: phy: Make sure phy_start() always re-enables the phy interrupts
ipv6: fix ECMP route replacement
ipv6: do not delete previously existing ECMP routes if add fails
Revert "netfilter: bridge: query conntrack about skb dnat"
netfilter: ensure number of counters is >0 in do_replace()
netfilter: nfnetlink_{log,queue}: Register pernet in first place
tcp: don't over-send F-RTO probes
tcp: only undo on partial ACKs in CA_Loss
net/ipv6/udp: Fix ipv6 multicast socket filter regression
...
Linus Torvalds [Fri, 22 May 2015 22:15:30 +0000 (15:15 -0700)]
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"Three small fixes that have been picked up the last few weeks.
Specifically:
- Fix a memory corruption issue in NVMe with malignant user
constructed request. From Christoph.
- Kill (now) unused blk_queue_bio(), dm was changed to not need this
anymore. From Mike Snitzer.
- Always use blk_schedule_flush_plug() from the io_schedule() path
when flushing a plug, fixing a !TASK_RUNNING warning with md. From
Shaohua"
* 'for-linus' of git://git.kernel.dk/linux-block:
sched: always use blk_schedule_flush_plug in io_schedule_out
nvme: fix kernel memory corruption with short INQUIRY buffers
block: remove export for blk_queue_bio
Linus Torvalds [Fri, 22 May 2015 21:49:55 +0000 (14:49 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input
Pull input fixes from Dmitry Torokhov:
"Updates for the input subsystem.
The main change is that we tell joydev not to touch "absolute mice",
such as VMware virtual mouse, as that produced bad result (cursor
stuck in upper right corner) with games"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
Input: smtpe-ts - wait 50mS until polling for pen-up
Input: smtpe-ts - use msecs_to_jiffies() instead of HZ
Input: joydev - don't classify the vmmouse as a joystick
Input: vmmouse - do not reference non-existing version of X driver
Input: alps - fix finger jumps on lifting 2 fingers on v7 touchpad
Input: elantech - fix semi-mt protocol for v3 HW
Input: sx8654 - fix memory allocation check
Problem here is that br_forward_delay_timer_expired() is a timer
handler, calling br_ifinfo_notify() which assumes either rcu_read_lock()
or RTNL are held.
Simplest fix seems to add rcu read lock section.
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Josh Boyer <jwboyer@fedoraproject.org> Reported-by: Dominick Grift <dac.override@gmail.com> Cc: Vlad Yasevich <vyasevich@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: core: 'ethtool' issue with querying phy settings
When trying to configure the settings for PHY1, using commands
like 'ethtool -s eth0 phyad 1 speed 100', the 'ethtool' seems to
modify other settings apart from the speed of the PHY1, in the
above case.
The ethtool seems to query the settings for PHY0, and use this
as the base to apply the new settings to the PHY1. This is
causing the other settings of the PHY 1 to be wrongly
configured.
The issue is caused by the '_ethtool_get_settings()' API, which
gets called because of the 'ETHTOOL_GSET' command, is clearing
the 'cmd' pointer (of type 'struct ethtool_cmd') by calling
memset. This clears all the parameters (if any) passed for the
'ETHTOOL_GSET' cmd. So the driver's callback is always invoked
with 'cmd->phy_address' as '0'.
The '_ethtool_get_settings()' is called from other files in the
'net/core'. So the fix is applied to the 'ethtool_get_settings()'
which is only called in the context of the 'ethtool'.
Signed-off-by: Arun Parameswaran <aparames@broadcom.com> Reviewed-by: Ray Jui <rjui@broadcom.com> Reviewed-by: Scott Branden <sbranden@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Holzheu [Fri, 22 May 2015 15:36:40 +0000 (08:36 -0700)]
test_bpf: Add backward jump test case
Currently the testsuite does not have a test case with a backward jump.
The s390x JIT (kernel 4.0) had a bug in that area.
So add one new test case for this now.
Signed-off-by: Michael Holzheu <holzheu@linux.vnet.ibm.com> Signed-off-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
When more than a multicast address is present in a MLDv2 report, all but
the first address is ignored, because the code breaks out of the loop if
there has not been an error adding that address.
This has caused failures when two guests connected through the bridge
tried to communicate using IPv6. Neighbor discoveries would not be
transmitted to the other guest when both used a link-local address and a
static address.
This only happens when there is a MLDv2 querier in the network.
The fix will only break out of the loop when there is a failure adding a
multicast address.
The mdb before the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
After the patch:
dev ovirtmgmt port vnet0 grp ff02::1:ff7d:6603 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff7d:6604 temp
dev ovirtmgmt port bond0.86 grp ff02::fb temp
dev ovirtmgmt port bond0.86 grp ff02::2 temp
dev ovirtmgmt port bond0.86 grp ff02::d temp
dev ovirtmgmt port vnet0 grp ff02::1:ff00:76 temp
dev ovirtmgmt port bond0.86 grp ff02::16 temp
dev ovirtmgmt port vnet1 grp ff02::1:ff00:77 temp
dev ovirtmgmt port bond0.86 grp ff02::1:ff00:def temp
dev ovirtmgmt port bond0.86 grp ff02::1:ffa1:40bf temp
Fixes: 08b202b67264 ("bridge br_multicast: IPv6 MLD support.") Reported-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@redhat.com> Tested-by: Rik Theys <Rik.Theys@esat.kuleuven.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Sullivan [Fri, 22 May 2015 14:22:10 +0000 (09:22 -0500)]
net: macb: Disable half duplex gigabit on Zynq
According to the Zynq TRM, gigabit half duplex is not supported. Add a
new cap and compatible string so Zynq can avoid advertising that mode.
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nathan Sullivan [Fri, 22 May 2015 14:22:09 +0000 (09:22 -0500)]
net: macb: Document zynq gem dt binding
Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Acked-by: Nicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michal Kubeček [Fri, 22 May 2015 11:40:09 +0000 (13:40 +0200)]
ipv4: fill in table id when replacing a route
When replacing an IPv4 route, tb_id member of the new fib_alias
structure is not set in the replace code path so that the new route is
ignored.
Fixes: 0ddcf43d5d4a ("ipv4: FIB Local/MAIN table collapse") Signed-off-by: Michal Kubecek <mkubecek@suse.cz> Acked-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 22 May 2015 11:15:22 +0000 (13:15 +0200)]
cdc_ncm: Fix tx_bytes statistics
The tx_curr_frame_payload field is u32. When we try to calculate a
small negative delta based on it, we end up with a positive integer
close to 2^32 instead. So the tx_bytes pointer increases by about
2^32 for every transmitted frame.
Fix by calculating the delta as a signed long.
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk> Reported-by: Florian Bruhin <me@the-compiler.org> Fixes: 7a1e890e2168 ("usbnet: Fix tx_bytes statistic running backward in cdc_ncm") Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contain Netfilter fixes for your net tree, they are:
1) Fix a race in nfnetlink_log and nfnetlink_queue that can lead to a crash.
This problem is due to wrong order in the per-net registration and netlink
socket events. Patch from Francesco Ruggeri.
2) Make sure that counters that userspace pass us are higher than 0 in all the
x_tables frontends. Discovered via Trinity, patch from Dave Jones.
3) Revert a patch for br_netfilter to rely on the conntrack status bits. This
breaks stateless IPv6 NAT transformations. Patch from Florian Westphal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_error does not check if in_dev is NULL before dereferencing it.
IThe following sequence of calls is possible:
CPU A CPU B
ip_rcv_finish
ip_route_input_noref()
ip_route_input_slow()
inetdev_destroy()
dst_input()
With the result that a network device can be destroyed while processing
an input packet.
A crash was triggered with only unicast packets in flight, and
forwarding enabled on the only network device. The error condition
was created by the removal of the network device.
As such it is likely the that error code was -EHOSTUNREACH, and the
action taken by ip_error (if in_dev had been accessible) would have
been to not increment any counters and to have tried and likely failed
to send an icmp error as the network device is going away.
Therefore handle this weird case by just dropping the packet if
!in_dev. It will result in dropping the packet sooner, and will not
result in an actual change of behavior.
Fixes: 251da4130115b ("ipv4: Cache ip_error() routes even when not forwarding.") Reported-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Tested-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Signed-off-by: Vittorio Gambaletta <linuxbugs@vittgam.net> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 22 May 2015 09:05:58 +0000 (11:05 +0200)]
flow_dissector: do not break if ports are not needed in flowlabel
This restored previous behaviour. If caller does not want ports to be
filled, we should not break.
Fixes: 06635a35d13d ("flow_dissect: use programable dissector in skb_flow_dissect and friends") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 22 May 2015 04:51:19 +0000 (21:51 -0700)]
tcp: fix a potential deadlock in tcp_get_info()
Taking socket spinlock in tcp_get_info() can deadlock, as
inet_diag_dump_icsk() holds the &hashinfo->ehash_locks[i],
while packet processing can use the reverse locking order.
We could avoid this locking for TCP_LISTEN states, but lockdep would
certainly get confused as all TCP sockets share same lockdep classes.
[ 523.722504] ======================================================
[ 523.728706] [ INFO: possible circular locking dependency detected ]
[ 523.734990] 4.1.0-dbg-DEV #1676 Not tainted
[ 523.739202] -------------------------------------------------------
[ 523.745474] ss/18032 is trying to acquire lock:
[ 523.750002] (slock-AF_INET){+.-...}, at: [<ffffffff81669d44>] tcp_get_info+0x2c4/0x360
[ 523.758129]
[ 523.758129] but task is already holding lock:
[ 523.763968] (&(&hashinfo->ehash_locks[i])->rlock){+.-...}, at: [<ffffffff816bcb75>] inet_diag_dump_icsk+0x1d5/0x6c0
[ 523.774661]
[ 523.774661] which lock already depends on the new lock.
[ 523.774661]
[ 523.782850]
[ 523.782850] the existing dependency chain (in reverse order) is:
[ 523.790326]
-> #1 (&(&hashinfo->ehash_locks[i])->rlock){+.-...}:
[ 523.796599] [<ffffffff811126bb>] lock_acquire+0xbb/0x270
[ 523.802565] [<ffffffff816f5868>] _raw_spin_lock+0x38/0x50
[ 523.808628] [<ffffffff81665af8>] __inet_hash_nolisten+0x78/0x110
[ 523.815273] [<ffffffff816819db>] tcp_v4_syn_recv_sock+0x24b/0x350
[ 523.822067] [<ffffffff81684d41>] tcp_check_req+0x3c1/0x500
[ 523.828199] [<ffffffff81682d09>] tcp_v4_do_rcv+0x239/0x3d0
[ 523.834331] [<ffffffff816842fe>] tcp_v4_rcv+0xa8e/0xc10
[ 523.840202] [<ffffffff81658fa3>] ip_local_deliver_finish+0x133/0x3e0
[ 523.847214] [<ffffffff81659a9a>] ip_local_deliver+0xaa/0xc0
[ 523.853440] [<ffffffff816593b8>] ip_rcv_finish+0x168/0x5c0
[ 523.859624] [<ffffffff81659db7>] ip_rcv+0x307/0x420
Lets use u64_sync infrastructure instead. As a bonus, 64bit
arches get optimized, as these are nop for them.
Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info") Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Airlie [Fri, 22 May 2015 03:31:03 +0000 (13:31 +1000)]
Merge tag 'drm-intel-fixes-2015-05-21' of git://anongit.freedesktop.org/drm-intel into drm-fixes
There's a stable backport from Ander [1] that combines this and a few
other commits to fix the flickering on v4.0, reported in [2] among
others. Having this upstream is obviously a requirement for stable.
* tag 'drm-intel-fixes-2015-05-21' of git://anongit.freedesktop.org/drm-intel:
drm/i915: fix screen flickering
Florian Westphal [Thu, 21 May 2015 00:26:24 +0000 (02:26 +0200)]
net: sched: pkt_cls: remove unused macros from uapi
Jamal points out that this header also contains kernel internal magic that
cannot be used from userspace for anything meaningful.
Lets remove what the kernel doesn't use anymore and wrap remainder with
__KERNEL__.
Suggested-by: Jamal Hadi Salim <jhs@mojatatu.com> Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: add tcpi_segs_in and tcpi_segs_out to tcp_info
This patch tracks the total number of inbound and outbound segments on a
TCP socket. One may use this number to have an idea on connection
quality when compared against the retransmissions.
RFC4898 named these : tcpEStatsPerfSegsIn and tcpEStatsPerfSegsOut
These are a 32bit field each and can be fetched both from TCP_INFO
getsockopt() if one has a handle on a TCP socket, or from inet_diag
netlink facility (iproute2/ss patch will follow)
Note that tp->segs_out was placed near tp->snd_nxt for good data
locality and minimal performance impact, while tp->segs_in was placed
near tp->bytes_received for the same reason.
Join work with Eric Dumazet.
Note that received SYN are accounted on the listener, but sent SYNACK
are not accounted.
Signed-off-by: Marcelo Ricardo Leitner <mleitner@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 20 May 2015 22:25:41 +0000 (00:25 +0200)]
ipv6: reject locally assigned nexthop addresses
ip -6 addr add dead::1/128 dev eth0
sleep 5
ip -6 route add default via dead::1/128
-> fails
ip -6 addr add dead::1/128 dev eth0
ip -6 route add default via dead::1/128
-> succeeds
reason is that if (nonsensensical) route above is added,
dead::1 is still subject to DAD, so the route lookup will
pick eth0 as outdev due to the prefix route that is added before
DAD work is started.
Add explicit test that checks if nexthop gateway is a local address.
Link: https://bugzilla.redhat.com/show_bug.cgi?id=1167969 Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
crypto: s390/ghash - Fix incorrect ghash icv buffer handling.
Multitheaded tests showed that the icv buffer in the current ghash
implementation is not handled correctly. A move of this working ghash
buffer value to the descriptor context fixed this. Code is tested and
verified with an multithreaded application via af_alg interface.
Cc: stable@vger.kernel.org Signed-off-by: Harald Freudenberger <freude@linux.vnet.ibm.com> Signed-off-by: Gerald Schaefer <geraldsc@linux.vnet.ibm.com> Reported-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Linus Torvalds [Fri, 22 May 2015 02:54:50 +0000 (19:54 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Martin Schwidefsky:
"Bug fixes.
Three for our crypto code, two for eBPF, and one memory management fix
to get machines with memory > 8TB working"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: correct return value of pmd_pfn
s390/crypto: fix stckf loop
s390/zcrypt: Fix invalid domain handling during ap module unload
s390/bpf: Fix gcov stack space problem
s390/zcrypt: fixed ap poll timer behavior
s390/bpf: Adjust ALU64_DIV/MOD to match interpreter change
Linus Torvalds [Fri, 22 May 2015 00:54:00 +0000 (17:54 -0700)]
Merge tag 'sound-4.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"This batch became slightly large, just because I've been on vacation
for the last two weeks. Nothing to scare much here, all
device-specific fixes, mostly small patches.
Majority of patches are for HD-audio, especially Dell machines. The
rest are small ASoC fixes for various codecs, and a USB-audio quirk.
One PCM fix is included to ease the faulty condition checks in the
case of two periods PCM buffers"
* tag 'sound-4.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: hda - Disable widget power-saving for ALC292 & co
ALSA: hda - Reduce verbs by node power-saves
ALSA: sound/atmel/ac97c.c: remove unused variable
ALSA: usb-audio: Add quirk for MS LifeCam Studio
ALSA: pcm: Modify double acknowledged interrupts check condition
ALSA: hda/realtek - ALC292 dock fix for Thinkpad L450
ALSA: hda - Add Conexant codecs CX20721, CX20722, CX20723 and CX20724
ALSA: hda - Fix headset mic and mic-in for a Dell desktop
ASoC: wm8994: correct BCLK DIV 348 to 384
ASoC: wm8960: fix "RINPUT3" audio route error
ASoC: dapm: Modify widget stream name according to prefix
ALSA: hda - Add headset mic quirk for Dell Inspiron 5548
ASoC: rt5645: Fix mask for setting RT5645_DMIC_2_DP_GPIO12 bit
ASoC: rt5645: Add ACPI match ID
ALSA: hda/realtek - Add ALC298 alias name for Dell
ASoC: uda1380: Avoid accessing i2c bus when codec is disabled
ALSA: hda/realtek - Fix typo for ALC286/ALC288
ASoC: mc13783: Fix wrong mask value used in mc13xxx_reg_rmw() calls
ALSA: hda - Add headphone quirk for Lifebook E752
ASoC: davinci-mcasp: Correct pm status check in suspend callback
Linus Torvalds [Fri, 22 May 2015 00:36:23 +0000 (17:36 -0700)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull infiniband/rdma fixes from Doug Ledford:
"This should hopefully be the last request for 4.1-rc for the RDMA
stack. It contains some late ocrdma fixes that I'm including because
they are small and self contained. It also contains two bug fixes
that are simple and easily verified.
Summary:
- a number of small, well contained bug fixes for ocrdma driver
- a simple fix for the connection negotiation sequence on IB
- fix for broken AF_IB address on UD queue pair support"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma:
IB/cma: Fix broken AF_IB UD support
ib/cm: Change reject message type when destroying cm_id
RDMA/ocrdma: Update ocrdma version number
RDMA/ocrdma: Fail connection for MTU lesser than 512
RDMA/ocrdma: Fix dmac resolution for link local address
RDMA/ocrdma: Prevent allocation of DPP PDs if FW doesnt support it
RDMA/ocrdma: Fix the request length for RDMA_QUERY_QP mailbox command to FW.
RDMA/ocrdma: Use VID 0 if PFC is enabled and vlan is not configured
RDMA/ocrdma: Fix QP state transition in destroy_qp
RDMA/ocrdma: Report EQ full fatal error
RDMA/ocrdma: Fix EQ destroy failure during driver unload
Linus Torvalds [Fri, 22 May 2015 00:23:11 +0000 (17:23 -0700)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid
Pull HID fixes from Jiri Kosina:
"Bugfixes for HID subsystem that should go in 4.1. Important
highlights:
- the patch that extended support for HID++ protocol for TK820
touchpad turns out to be causing regressions due to firmware
issues; patch reverting back to basic support from Benjamin
Tissoires
- Wacom driver can oops for devices that report non-touch data on
touch interfaces. Fix from Ping Cheng
- gpiolib is not mandatory for i2c-hid, so the driver shouldn't fail
if gpiolib is not enabled. Fix from Mika Westerberg"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/hid:
HID: wacom: fix an Oops caused by wacom_wac_finger_count_touches
HID: usbhid: Add HID_QUIRK_NOGET for Aten DVI KVM switch
HID: hid-sensor-hub: Fix debug lock warning
Revert "HID: logitech-hidpp: support combo keyboard touchpad TK820"
HID: i2c-hid: Do not fail probing if gpiolib is not enabled
Marek Vasut [Fri, 8 May 2015 05:25:49 +0000 (22:25 -0700)]
Input: smtpe-ts - wait 50mS until polling for pen-up
Wait a little bit longer, 50mS instead of 20mS, until the driver starts
polling for pen-up. The problematic behavior before this patch is applied
is as follows. The behavior was observed on the STMPE610QTR controller.
Upon a physical pen-down event, the touchscreen reports one set of x-y-p
coordinates and a pen-down event. After that, the pen-up polling is
triggered and since the controller is not ready yet, the polling mistakenly
detects a pen-up event while the physical state is still such that the pen
is down on the touch surface.
The pen-up handling flushes the controller FIFO, so after that, all the
samples in the controller are discarded. The controller becomes ready
shortly after this bogus pen-up handling and does generate again a pen-down
interrupt. This time, the controller contains x-y-p samples which all read
as zero. Since pressure value is zero, this set of samples is effectively
ignored by userland.
In the end, the driver just bounces between pen-down and bogus pen-up
handling, generating no useful results. Fix this by giving the controller a
bit more time before polling it for pen-up.
Marek Vasut [Fri, 8 May 2015 05:24:58 +0000 (22:24 -0700)]
Input: smtpe-ts - use msecs_to_jiffies() instead of HZ
Use msecs_to_jiffies(20) instead of plain (HZ / 50), as the former is much
more explicit about it's behavior. We want to schedule the task 20 mS from
now, so make it explicit in the code.
Linus Torvalds [Thu, 21 May 2015 23:57:50 +0000 (16:57 -0700)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Michael Turquette:
"The first set of clk fixes for 4.1 are all driver bugs, with the
exception of a single locking fix in the core code.
All driver fixes are for code that was merged recently. The Samsung
stuff is mostly fixes around suspend/resume, the Qualcomm fixes are
for invalid hardware configuration data and the Silicon Labs patches
are fixes following their move away from platform_data to Device Tree"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: si5351: Do not pass struct clk in platform_data
clk: si5351: Mention clock-names in the binding documentation
clk: add missing lock when call clk_core_enable in clk_set_parent
clk: exynos5420: Restore GATE_BUS_TOP on suspend
clk: qcom: Fix MSM8916 gfx3d_clk_src configuration
clk: qcom: Fix MSM8916 venus divider value
clk: exynos5433: Fix wrong PMS value of exynos5433_pll_rates
clk: exynos5433: Fix wrong parent clock of sclk_apollo clock
clk: exynos5433: Fix CLK_PCLK_MONOTONIC_CNT clk register assignment
clk: exynos5433: Fix wrong offset of PCLK_MSCL_SECURE_SMMU_JPEG
clk: Use CONFIG_ARCH_EXYNOS instead of CONFIG_ARCH_EXYNOS5433
Thomas Hellstrom [Wed, 20 May 2015 21:50:30 +0000 (14:50 -0700)]
Input: joydev - don't classify the vmmouse as a joystick
Joydev is currently thinking some absolute mice are joystick, and that
messes up games in VMware guests, as the cursor typically gets stuck in
the top left corner.
Try to detect the event signature of a VMmouse input device and back off
for such devices. We're still incorrectly detecting, for example, the
VMware absolute USB mouse as a joystick, but adding an event signature
matching also that device would be considerably more risky, so defer that
to a later merge window.
Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
David S. Miller [Thu, 21 May 2015 22:57:26 +0000 (18:57 -0400)]
Merge branch 'stmmac-probe-refactoring'
Joachim Eastwood says:
====================
stmmac: probe code refactoring and clean up part 1
This patch set refactor the code in stmmac_pci_probe and stmmac_pltfr_probe
and moves the common bits into stmmac_dvr_probe. Along the way some clean-
ups are applied to stmmac_pltfr_probe.
The code has been tested on the LPC18xx platform.
I am still working on more refactoring of the platform probe code, hence
part 1, but I need some more time on this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>