bridge: apply multicast snooping to IPv6 link-local, too
The multicast snooping code should have matured enough to be safely
applicable to IPv6 link-local multicast addresses (excluding the
link-local all nodes address, ff02::1), too.
Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: prevent flooding IPv6 packets that do not have a listener
Currently if there is no listener for a certain group then IPv6 packets
for that group are flooded on all ports, even though there might be no
host and router interested in it on a port.
With this commit they are only forwarded to ports with a multicast
router.
Just like commit bd4265fe36 ("bridge: Only flood unregistered groups
to routers") did for IPv4, let's do the same for IPv6 with the same
reasoning.
Signed-off-by: Linus Lüssing <linus.luessing@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:44 +0000 (00:19 +0200)]
net: ipv6: mld: document force_mld_version in ip-sysctl.txt
Document force_mld_version parameter in ip-sysctl.txt.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
We already have mld_{gq,ifc,dad}_start_timer() functions, so introduce
mld_{gq,ifc,dad}_stop_timer() functions to reduce code size and make it
more readable.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:42 +0000 (00:19 +0200)]
net: ipv6: mld: refactor query processing into v1/v2 functions
Make igmp6_event_query() a bit easier to read by refactoring code
parts into mld_process_v1() and mld_process_v2().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:41 +0000 (00:19 +0200)]
net: ipv6: mld: similarly to MLDv2 have min max_delay of 1
Similarly as we do in MLDv2 queries, set a forged MLDv1 query with
0 ms mld_maxdelay to minimum timer shot time of 1 jiffies. This is
eventually done in igmp6_group_queried() anyway, so we can simplify
a check there.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:40 +0000 (00:19 +0200)]
net: ipv6: mld: implement RFC3810 MLDv2 mode only
RFC3810, 10. Security Considerations says under subsection 10.1.
Query Message:
A forged Version 1 Query message will put MLDv2 listeners on that
link in MLDv1 Host Compatibility Mode. This scenario can be avoided
by providing MLDv2 hosts with a configuration option to ignore
Version 1 messages completely.
Hence, implement a MLDv2-only mode that will ignore MLDv1 traffic:
echo 2 > /proc/sys/net/ipv6/conf/ethX/force_mld_version or
echo 2 > /proc/sys/net/ipv6/conf/all/force_mld_version
Note that <all> device has a higher precedence as it was previously
also the case in the macro MLD_V1_SEEN() that would "short-circuit"
if condition on <all> case.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:39 +0000 (00:19 +0200)]
net: ipv6: mld: get rid of MLDV2_MRC and simplify calculation
Get rid of MLDV2_MRC and use our new macros for mantisse and
exponent to calculate Maximum Response Delay out of the Maximum
Response Code.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 22:19:38 +0000 (00:19 +0200)]
net: ipv6: mld: clean up MLD_V1_SEEN macro
Replace the macro with a function to make it more readable. GCC will
eventually decide whether to inline this or not (also, that's not
fast-path anyway).
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
The Maximum Response Delay used to calculate the Maximum Response
Code inserted into the periodic General Queries. Default value:
10000 (10 seconds) [...] The number of seconds represented by the
[Query Response Interval] must be less than the [Query Interval].
iii) RFC3810, 9.12. Older Version Querier Present Timeout [OVQPT] says:
The Older Version Querier Present Timeout is the time-out for
transitioning a host back to MLDv2 Host Compatibility Mode. When an
MLDv1 query is received, MLDv2 hosts set their Older Version Querier
Present Timer to [Older Version Querier Present Timeout].
This value MUST be ([Robustness Variable] times (the [Query Interval]
in the last Query received)) plus ([Query Response Interval]).
Having that said, we currently calculate [OVQPT] (here given as 'switchback'
variable) as ...
switchback = (idev->mc_qrv + 1) * max_delay
RFC3810, 9.12. says "the [Query Interval] in the last Query received". In
section "9.14. Configuring timers", it is said:
This section is meant to provide advice to network administrators on
how to tune these settings to their network. Ambitious router
implementations might tune these settings dynamically based upon
changing characteristics of the network. [...]
iv) RFC38010, 9.14.2. Query Interval:
The overall level of periodic MLD traffic is inversely proportional
to the Query Interval. A longer Query Interval results in a lower
overall level of MLD traffic. The value of the Query Interval MUST
be equal to or greater than the Maximum Response Delay used to
calculate the Maximum Response Code inserted in General Query
messages.
I assume that was why switchback is calculated as is (3 * max_delay), although
this setting seems to be meant for routers only to configure their [QI]
interval for non-default intervals. So usage here like this is clearly wrong.
Concluding, the current behaviour in IPv6's multicast code is not conform
to the RFC as switch back is calculated wrongly. That is, it has a too small
value, so MLDv2 hosts switch back again to MLDv2 way too early, i.e. ~30secs
instead of ~260secs on default.
Hence, introduce necessary helper functions and fix this up properly as it
should be.
Introduced in 06da92283 ("[IPV6]: Add MLDv2 support."). Credits to Hannes
Frederic Sowa who also had a hand in this as well. Also thanks to Hangbin Liu
who did initial testing.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Cc: David Stevens <dlstevens@us.ibm.com> Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Commit 1b7fdd2ab585("tcp: do not use cached RTT for RTT estimation")
removes important comments on how RTO is initialized and updated.
Hopefully this patch puts those information back.
Signed-off-by: Yuchung Cheng <ycheng@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
vxlan-udp-recv function lookup vxlan_sock struct on every packet
recv by using udp-port number. we can use sk->sk_user_data to
store vxlan_sock and avoid lookup.
I have open coded rcu-api to store and read vxlan_sock from
sk_user_data to avoid sparse warning as sk_user_data is not
__rcu pointer.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4
net: sctp: Fix data chunk fragmentation for MTU values which are not multiple of 4
Initially the problem was observed with ipsec, but later it became clear that
SCTP data chunk fragmentation algorithm has problems with MTU values which are
not multiple of 4. Test program was used which just transmits 2000 bytes long
packets to other host. tcpdump was used to observe re-fragmentation in IP layer
after SCTP already fragmented data chunks.
With MTU 1500:
12:54:34.082904 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1500)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 2366088589] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.082933 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 596)
10.151.38.153.39303 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 2366088590] [SID: 0] [SSEQ 1] [PPID 0x0]
12:54:34.090576 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39303: sctp (1) [SACK] [cum ack 2366088590] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
With MTU 1499:
13:02:49.955220 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 0, flags [+], proto SCTP (132), length 1492)
10.151.38.153.39084 > 10.151.24.91.54321: sctp[|sctp]
13:02:49.955249 IP (tos 0x2,ECT(0), ttl 64, id 48215, offset 1472, flags [none], proto SCTP (132), length 28)
10.151.38.153 > 10.151.24.91: ip-proto-132
13:02:49.955262 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.39084 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 404355346] [SID: 0] [SSEQ 1] [PPID 0x0]
13:02:49.956770 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.39084: sctp (1) [SACK] [cum ack 404355346] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
Here problem in data portion limit calculation leads to re-fragmentation in IP,
which is sub-optimal. The problem is max_data initial value, which doesn't take
into account the fact, that data chunk must be padded to 4-bytes boundary.
It's enough to correct max_data, because all later adjustments are correctly
aligned to 4-bytes boundary.
After the fix is applied, everything is fragmented correctly for uneven MTUs:
15:16:27.083881 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 1496)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (B) [TSN: 3077098183] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.083907 IP (tos 0x2,ECT(0), ttl 64, id 0, offset 0, flags [DF], proto SCTP (132), length 600)
10.151.38.153.53417 > 10.151.24.91.54321: sctp (1) [DATA] (E) [TSN: 3077098184] [SID: 0] [SSEQ 1] [PPID 0x0]
15:16:27.085640 IP (tos 0x2,ECT(0), ttl 63, id 0, offset 0, flags [DF], proto SCTP (132), length 48)
10.151.24.91.54321 > 10.151.38.153.53417: sctp (1) [SACK] [cum ack 3077098184] [a_rwnd 79920] [#gap acks 0] [#dup tsns 0]
The bug was there for years already, but
- is a performance issue, the packets are still transmitted
- doesn't show up with default MTU 1500, but possibly with ipsec (MTU 1438)
Signed-off-by: Alexander Sverdlin <alexander.sverdlin@nsn.com> Acked-by: Vlad Yasevich <vyasevich@gmail.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Julia Lawall [Mon, 2 Sep 2013 09:54:21 +0000 (11:54 +0200)]
drivers:net: delete premature free_irq
Free_irq is not needed if there has been no request_irq. Free_irq is
removed from both the probe and remove functions. The correct request_irq
and free_irq are found in the open and close functions.
A simplified version of the semantic match that finds this problem is as
follows: (http://coccinelle.lip6.fr/)
// <smpl>
@@
expression e;
@@
*e = platform_get_irq(...);
... when != request_irq(e,...)
*free_irq(e,...)
// </smpl>
Signed-off-by: Julia Lawall <Julia.Lawall@lip6.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Carlos O'Donell [Thu, 15 Aug 2013 09:28:10 +0000 (17:28 +0800)]
net: sync some IP headers with glibc
Solution:
=========
- Synchronize linux's `include/uapi/linux/in6.h'
with glibc's `inet/netinet/in.h'.
- Synchronize glibc's `inet/netinet/in.h with linux's
`include/uapi/linux/in6.h'.
- Allow including the headers in either other.
- First header included defines the structures and macros.
Details:
========
The kernel promises not to break the UAPI ABI so I don't
see why we can't just have the two userspace headers
coordinate?
If you include the kernel headers first you get those,
and if you include the glibc headers first you get those,
and the following patch arranges a coordination and
synchronization between the two.
Let's handle `include/uapi/linux/in6.h' from linux,
and `inet/netinet/in.h' from glibc and ensure they compile
in any order and preserve the required ABI.
These two patches pass the following compile tests:
cat >> test1.c <<EOF
int main (void) {
return 0;
}
EOF
gcc -c test1.c
cat >> test2.c <<EOF
int main (void) {
return 0;
}
EOF
gcc -c test2.c
One wrinkle is that the kernel has a different name for one of
the members in ipv6_mreq. In the kernel patch we create a macro
to cover the uses of the old name, and while that's not entirely
clean it's one of the best solutions (aside from an anonymous
union which has other issues).
I've reviewed the code and it looks to me like the ABI is
assured and everything matches on both sides.
Notes:
- You want netinet/in.h to include bits/in.h as early as possible,
but it needs in_addr so define in_addr early.
- You want bits/in.h included as early as possible so you can use
the linux specific code to define __USE_KERNEL_DEFS based on
the _UAPI_* macro definition and use those to cull in.h.
- glibc was missing IPPROTO_MH, added here.
Compile tested and inspected.
Reported-by: Thomas Backlund <tmb@mageia.org> Cc: Thomas Backlund <tmb@mageia.org> Cc: libc-alpha@sourceware.org Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org> Cc: David S. Miller <davem@davemloft.net> Tested-by: Cong Wang <amwang@redhat.com> Signed-off-by: Carlos O'Donell <carlos@redhat.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2013 16:40:37 +0000 (12:40 -0400)]
Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-next
Jeff Kirsher says:
====================
This series contains updates to igb only.
Todd provides a fix for igb to not look for a PBA in the iNVM on
devices that are flashless.
Akeem provides igb patches to add a new PHY id for i354, as well as
a couple of patches to implement the new PHY id. He also provides
several patches to correctly report the appropriate media type as
well as correctly report advertised/supported link for i354 devices.
Lastly Akeem implements a 1 second delay mechanism for i210 devices
to avoid erroneous link issue with the link partner.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
igb: Implementation to report advertised/supported link on i354 devices
This patch changes the way we report supported/advertised link for i354
devices, especially for 2.5 GB. Instead of reporting 2.5 GB for all i354
devices erroneously, check first, if it is 2.5 GB capable.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
igb: Get speed and duplex for 1G non_copper devices
This patch changes how we get speed/duplex for non_copper devices; it
now uses pcs register to get current speed and duplex instead of using
generic status register that we use to detect speed/duplex for copper
devices.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Phil Oester [Sun, 1 Sep 2013 15:32:21 +0000 (08:32 -0700)]
netfilter: xt_TCPMSS: correct return value in tcpmss_mangle_packet
In commit b396966c4 (netfilter: xt_TCPMSS: Fix missing fragmentation handling),
I attempted to add safe fragment handling to xt_TCPMSS. However, Andy Padavan
of Project N56U correctly points out that returning XT_CONTINUE in this
function does not work. The callers (tcpmss_tg[46]) expect to receive a value
of 0 in order to return XT_CONTINUE.
Signed-off-by: Phil Oester <kernel@linuxace.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
igb: Support to get 2_5G link status for appropriate media type
Since i354 2.5Gb devices are not Copper media type but SerDes, so this
patch changes the way we detect speed/duplex link info for this device.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
PHY Power Management does not exist for i354 device. So, there is no
need to read and write this register or clear go link Disconnect bit,
which could cause a lot of issues.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch implements downshift mechanism for M88E1543 PHY, so that
downshift is disabled first during link setup process, and later enabled
if we are master and downshift link is negotiated. Also cleaned up
return code implementation.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch changes PHY_ID for i354 device, now using M88E1543
instead of M88E1545.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
igb: Implementation of 1-sec delay for i210 devices
This patch adds 1 sec delay mechanism to i210 device family, in order
to avoid erroneous link issue with the link partner.
Signed-off-by: Akeem G Abodunrin <akeem.g.abodunrin@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
netfilter: SYNPROXY: let unrelated packets continue
Packets reaching SYNPROXY were default dropped, as they were most
likely invalid (given the recommended state matching). This
patch, changes SYNPROXY target to let packets, not consumed,
continue being processed by the stack.
This will be more in line other target modules. As it will allow
more flexible configurations of handling, logging or matching on
packets in INVALID states.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The reason is that the conntrack template is set to confirmed before adding
the extension and it is invalid to add extensions to already confirmed
conntracks. Fix by adding the extensions before setting the conntrack to
confirmed.
Reported-by: Jesper Dangaard Brouer <jesper.brouer@gmail.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This does filter SYN flags early, for packets in the UNTRACKED state,
but packets in the INVALID state with other TCP flags could still
reach the module, thus this stricter flag matching is still needed.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Patrick McHardy <kaber@trash.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
tcp_rcv_established() returns only one value namely 0. We change the return
value to void (as suggested by David Miller).
After commit 0c24604b (tcp: implement RFC 5961 4.2), we no longer send RSTs in
response to SYNs. We can remove the check and processing on the return value of
tcp_rcv_established().
We also fix jtcp_rcv_established() in tcp_probe.c to match that of
tcp_rcv_established().
Signed-off-by: Vijay Subramanian <subramanian.vijay@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Tue, 3 Sep 2013 16:24:02 +0000 (18:24 +0200)]
net: tcp_probe: adapt tbuf size for recent changes
With recent changes in tcp_probe module (e.g. f925d0a62d ("net: tcp_probe:
add IPv6 support")) we also need to take into account that tbuf needs to
be updated as format string will be further expanded. tbuf sits on the stack
in tcpprobe_read() function that is invoked when user space reads procfs
file /proc/net/tcpprobe, hence not fast path as in jtcp_rcv_established().
Having a size similarly as in sctp_probe module of 256 bytes is fully
sufficient for that, we need theoretical maximum of 252 bytes otherwise we
could get truncated.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 3 Sep 2013 09:13:47 +0000 (12:13 +0300)]
qlcnic: remove a stray semicolon
Just remove a small semicolon.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Himanshu Madhani <himanshu.madhani@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 3 Sep 2013 09:03:40 +0000 (12:03 +0300)]
x25: add a sanity check parsing X.25 facilities
This was found with a manual audit and I don't have a reproducer. We
limit ->calling_len and ->called_len when we get them from
copy_from_user() in x25_ioctl() so when they come from skb->data then
we should cap them there as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 3 Sep 2013 09:02:32 +0000 (12:02 +0300)]
caif: add a sanity check to the tty name
"tty->name" and "name" are a 64 character buffers. My static checker
complains because we add the "cf" on the front so it look like we are
copying a 66 character string into a 64 character buffer.
Also if the name is larger than IFNAMSIZ (16) it triggers a BUG_ON()
inside the call to alloc_netdev().
This is all under CAP_SYS_ADMIN so it's not a security fix, it just adds
a little robustness.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 23:54:04 +0000 (08:54 +0900)]
net: netx-eth: remove unnecessary casting
Casting from 'void *' is unnecessary, because casting from 'void *'
to any pointer type is automatic.
Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently we're linking upper devices to lower ones, which results in
upside-down relationship: upper devices seeing lower devices via its upper
lists.
Fix this by correctly linking lower devices to the upper ones.
CC: "David S. Miller" <davem@davemloft.net> CC: Eric Dumazet <edumazet@google.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Cong Wang <amwang@redhat.com> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Dichtel [Mon, 2 Sep 2013 13:34:57 +0000 (15:34 +0200)]
tunnels: harmonize cleanup done on skb on xmit path
The goal of this patch is to harmonize cleanup done on a skbuff on xmit path.
Before this patch, behaviors were different depending of the tunnel type.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: simplify bond_3ad_update_lacp_rate and use RTNL for sync
We can drop the use of bond->lock for mutual exclusion in
bond_3ad_update_lacp_rate and use RTNL in the sysfs store function
instead. This way we'll prevent races with mode change and interface
up/down as well as simplify update_lacp_rate by removing the check for
port->slave because it'll always be initialized (done while enslaving
with RTNL). This change will also help in the future removal of reader
bond->lock from bond_enslave.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch aims to remove a use of the bond->lock for mutual exclusion
which will later allow easier migration to RCU of the users of this
functionality. We use RTNL as a synchronizing mechanism since it's
always held when send_peer_notif is set, and when it is decremented from
the notifier function. We can also drop some locking, and fix the
leakage of the send_peer_notif counter.
Signed-off-by: Nikolay Aleksandrov <nikolay@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 2 Sep 2013 08:41:01 +0000 (16:41 +0800)]
vhost_net: correctly limit the max pending buffers
As Michael point out, We used to limit the max pending DMAs to get better cache
utilization. But it was not done correctly since it was one done when there's no
new buffers submitted from guest. Guest can easily exceeds the limitation by
keeping sending packets.
So this patch moves the check into main loop. Tests shows about 5%-10%
improvement on per cpu throughput for guest tx.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 2 Sep 2013 08:41:00 +0000 (16:41 +0800)]
vhost_net: poll vhost queue after marking DMA is done
We used to poll vhost queue before making DMA is done, this is racy if vhost
thread were waked up before marking DMA is done which can result the signal to
be missed. Fix this by always polling the vhost thread before DMA is done.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 2 Sep 2013 08:40:59 +0000 (16:40 +0800)]
vhost_net: determine whether or not to use zerocopy at one time
Currently, even if the packet length is smaller than VHOST_GOODCOPY_LEN, if
upend_idx != done_idx we still set zcopy_used to true and rollback this choice
later. This could be avoided by determining zerocopy once by checking all
conditions at one time before.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 2 Sep 2013 08:40:58 +0000 (16:40 +0800)]
vhost: switch to use vhost_add_used_n()
Let vhost_add_used() to use vhost_add_used_n() to reduce the code
duplication. To avoid the overhead brought by __copy_to_user(). We will use
put_user() when one used need to be added.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 2 Sep 2013 08:40:57 +0000 (16:40 +0800)]
vhost_net: use vhost_add_used_and_signal_n() in vhost_zerocopy_signal_used()
We tend to batch the used adding and signaling in vhost_zerocopy_callback()
which may result more than 100 used buffers to be updated in
vhost_zerocopy_signal_used() in some cases. So switch to use
vhost_add_used_and_signal_n() to avoid multiple calls to
vhost_add_used_and_signal(). Which means much less times of used index
updating and memory barriers.
2% performance improvement were seen on netperf TCP_RR test.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 08:12:41 +0000 (17:12 +0900)]
net: sunhme: use pci_{get,set}_drvdata()
Use the wrapper functions for getting and setting the driver data
using pci_dev instead of using dev_{get,set}_drvdata() with
&pdev->dev, so we can directly pass a struct pci_dev. This is
a purely cosmetic change.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 08:11:53 +0000 (17:11 +0900)]
net: tulip: use pci_{get,set}_drvdata()
Use the wrapper functions for getting and setting the driver data
using pci_dev instead of using dev_{get,set}_drvdata() with
&pdev->dev, so we can directly pass a struct pci_dev. This is
a purely cosmetic change.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 08:10:09 +0000 (17:10 +0900)]
net: mdio-octeon: use platform_{get,set}_drvdata()
Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 08:08:44 +0000 (17:08 +0900)]
net: sunhme: use platform_{get,set}_drvdata()
Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jingoo Han [Mon, 2 Sep 2013 08:06:52 +0000 (17:06 +0900)]
net: emac: use platform_{get,set}_drvdata()
Use the wrapper functions for getting and setting the driver data
using platform_device instead of using dev_{get,set}_drvdata()
with &pdev->dev, so we can directly pass a struct platform_device.
This is a purely cosmetic change.
Signed-off-by: Jingoo Han <jg1.han@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 30 Aug 2013 14:36:14 +0000 (15:36 +0100)]
net: fix comment typo for __skb_alloc_pages()
The name of the function in the comment is __skb_alloc_page() while we
are actually commenting __skb_alloc_pages(). Fix this typo and make it
a valid kernel doc comment.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Claudiu Manoil [Fri, 30 Aug 2013 12:01:15 +0000 (15:01 +0300)]
gianfar: Fix reported number of sent bytes to BQL
Fix the amount of sent bytes reported to BQL by reporting the
number of bytes on wire in the xmit routine, and recording that
value for each skb in order to be correctly confirmed on Tx
confirmation cleanup.
Reporting skb->len to BQL just before exiting xmit is not correct
due to possible insertions of TOE block and alignment bytes in the
skb->data, which are being stripped off by the controller before
transmission on wire. This led to mismatch of (incorrectly)
reported bytes to BQL b/w xmit and Tx confirmation, resulting in
Tx timeout firing, for the h/w tx timestamping acceleration case.
There's no easy way to obtain the number of bytes on wire in the Tx
confirmation routine, so skb->cb is used to convey that information
from xmit to Tx confirmation, for now (as proposed by Eric). Revived
the currently unused GFAR_CB() construct for that purpose.
Signed-off-by: Claudiu Manoil <claudiu.manoil@freescale.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Aloni [Fri, 30 Aug 2013 10:59:46 +0000 (13:59 +0300)]
netconsole: avoid a crash with multiple sysfs writers
When my 'ifup eth' script was fired multiple times and ran concurrent on
my laptop, for some obscure /etc scripting reason, it was revealed
that the store_enabled() function in netconsole doesn't handle it nicely,
as recorded by the Oops below (a syslog paste, but not mangled too much
to prevent from discerning the traceback).
On Linux 3.10.4, this patch seeks to remedy the problem, and it has been
running stable on my laptop for a few days.
Signed-off-by: Dan Aloni <alonid@postram.com> Signed-off-by: Neil Horman <nhorman@tuxdriver.com> CC: David S. Miller <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Kouei Abe [Fri, 30 Aug 2013 03:41:08 +0000 (12:41 +0900)]
sh_eth: Enable Rx descriptor word 0 shift for r8a7790
This corrects an oversight when r8a7790 support was added to sh_eth.
Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Kouei Abe [Fri, 30 Aug 2013 03:41:07 +0000 (12:41 +0900)]
sh_eth: Fix cache invalidation omission of receive buffer
Signed-off-by: Kouei Abe <kouei.abe.cp@renesas.com> Signed-off-by: Simon Horman <horms+renesas@verge.net.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 29 Aug 2013 21:38:57 +0000 (23:38 +0200)]
bonding: use rlb_client_info->vlan_id instead of ->tag
Store VID in ->vlan_id (if any), and remove the useless ->tag.
CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Veaceslav Falico [Thu, 29 Aug 2013 21:38:56 +0000 (23:38 +0200)]
bonding: remove bond_vlan_used()
We're using it currently to verify if we have vlans before getting the tag
from the skb we're about to send. It's useless because the vlan_get_tag()
verifies if the skb has the tag (and returns an error if not), and we can
receive tagged skbs only if we *already* have vlans.
Plus, the current RCUed implementation is kind of useless anyway - the we
can remove the last vlan in the moment we return from the function.
So remove the only usage of it and the whole function.
CC: Jay Vosburgh <fubar@us.ibm.com> CC: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2013 01:54:02 +0000 (21:54 -0400)]
Merge branch 'for-davem' of git://gitorious.org/linux-can/linux-can-next
Marc Kleine-Budde says:
====================
this is a pull request for net-next. There are two patches from Gerhard
Sittig, which improves the clock handling on mpc5121. Oliver Hartkopp
provides a patch that adds a per rule limitation of frame hops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2013 01:45:31 +0000 (21:45 -0400)]
Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless-next
John W. Linville says:
====================
Please accept this batch of updates intended for the 3.12 stream.
For the mac80211 bits, Johannes says this:
"This time I have various improvements all over the place: IBSS, mesh,
testmode, AP client powersave handling, one of the rare rfkill patches
and some code cleanup."
Also for mac80211:
"And I also have some more changes for -next, just a few small fixes and
improvements, nothing really stands out."
And for iwlwifi:
"This time I have some powersave work (notably uAPSD support), CQM
offloads, support for a new firmware API and various code cleanups."
Regarding the Bluetooth bits, Gustavo says:
"Patches to 3.12, here we have:
* implementation of a proper tty_port for RFCOMM devices, this fixes some
issues people were seeing lately in the kernel.
* Add voice_setting option for SCO, it is used for SCO Codec selection
* bugfixes, small improvements and clean ups"
For the NFC bits, Samuel says:
"With this one we have:
- A few pn533 improvements and minor fixes. Testing our pn533 driver
against Google's NCI stack triggered a few issues that we fixed now.
We also added Tx fragmentation support to this driver.
- More NFC secure element handling. We added a GET_SE netlink command
for getting all the discovered secure elements, and we defined 2
additional secure element netlink event (transaction and connectivity).
We also fixed a couple of typos and copy-paste bugs from the secure
element handling code.
- Firmware download support for the pn544 driver. This chipset can enter a
special mode where it's waiting for firmware blobs to replace the
already flashed one. We now support that mode."
With repect to the ath tree, Kalle says:
"New features in ath10k are rx/tx checsumming in hw and survey scan
implemented by Michal. Also he made fixes to different areas of the
driver, most notable being fixing the case when using two streams and
reducing the number of interface combinations to avoid firmware crashes.
Bartosz did a clean related to how we handle SoC power save in PCI
layer.
For ath6kl Mohammed and Vasanth sent each a patch to fix two infrequent
crashes."
I also pulled the wireless tree into wireless-next to support a
request from Johannes. On top of all that, there are the usual
sort of driver updates. The mwifiex, brcmfmac, brcmsmac, ath9k,
and rt2x00 drivers all get some attention, as does the bcma bus and
a few other random bits here and there.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 29 Aug 2013 16:54:49 +0000 (18:54 +0200)]
net: sctp: probe: allow more advanced ingress filtering by mark
This is a follow-up commit for commit b1dcdc68b1f4 ("net: tcp_probe:
allow more advanced ingress filtering by mark") that allows for
advanced SCTP probe module filtering based on skb mark (for a more
detailed description and advantages using mark, refer to b1dcdc68b1f4).
The current option to filter by a given port is still being preserved.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tim Gardner [Thu, 29 Aug 2013 12:38:47 +0000 (06:38 -0600)]
net: neighbour: Remove CONFIG_ARPD
This config option is superfluous in that it only guards a call
to neigh_app_ns(). Enabling CONFIG_ARPD by default has no
change in behavior. There will now be call to __neigh_notify()
for each ARP resolution, which has no impact unless there is a
user space daemon waiting to receive the notification, i.e.,
the case for which CONFIG_ARPD was designed anyways.
Suggested-by: Eric W. Biederman <ebiederm@xmission.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Gao feng <gaofeng@cn.fujitsu.com> Cc: Joe Perches <joe@perches.com> Cc: Veaceslav Falico <vfalico@redhat.com> Signed-off-by: Tim Gardner <tim.gardner@canonical.com> Reviewed-by: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Huth [Tue, 27 Aug 2013 15:09:02 +0000 (17:09 +0200)]
virtio-net: Set RXCSUM feature if GUEST_CSUM is available
If the VIRTIO_NET_F_GUEST_CSUM virtio feature is available, the guest
does not have to calculate the checksums on all received packets. This
is pretty much the same feature as RX checksum offloading on real
network cards, so the virtio-net driver should report this by setting
the NETIF_F_RXCSUM flag. When the user now runs "ethtool -k", he or she
can see whether the virtio-net interface has to calculate RX checksums
or not.
Signed-off-by: Thomas Huth <thuth@linux.vnet.ibm.com> Acked-by: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 4 Sep 2013 00:57:57 +0000 (20:57 -0400)]
Merge branch 'addr_assign_type'
Bjørn Mork says:
====================
net: set addr_assign_type when inheriting a dev_addr
Copying the dev_addr from a parent device is an operation
common to a number of drivers. The addr_assign_type should
be updated accordingly, either by reusing the value from
the source device or explicitly indicating that the address
is stolen by setting addr_assign_type to NET_ADDR_STOLEN.
This patch set adds a helper copying both the dev_addr and
the addr_assign_type, and use this helper in drivers which
don't currently set the addr_assign_type. Using NET_ADDR_STOLEN
might be more appropriate in some of these cases. Please
let me know, and I'll update the patch accordingly.
Changes in v2:
- assuming addr_len == ETH_ALEN to allow optimized memcpy
- dropped the vt6656 patch due to addr_len being unset in that driver
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 Aug 2013 16:08:52 +0000 (18:08 +0200)]
staging: vt6655: inherit addr_assign_type along with dev_addr
A device inheriting a random or set address should reflect this in
its addr_assign_type.
Cc: Forest Bond <forest@alittletooquiet.net> Signed-off-by: Bjørn Mork <bjorn@mork.no> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Bjørn Mork [Fri, 30 Aug 2013 16:08:44 +0000 (18:08 +0200)]
net: etherdevice: add address inherit helper
Some etherdevices inherit their address from a parent or
master device. The addr_assign_type should be updated along
with the address in these cases. Adding a helper function
to simplify this.
Signed-off-by: Bjørn Mork <bjorn@mork.no> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 2 Sep 2013 02:06:53 +0000 (10:06 +0800)]
net: make snmp_mib_free static inline
Fengguang reported:
net/built-in.o: In function `in6_dev_finish_destroy':
(.text+0x4ca7d): undefined reference to `snmp_mib_free'
this is due to snmp_mib_free() is defined when CONFIG_INET is enabled,
but in6_dev_finish_destroy() is now moved to core kernel.
I think snmp_mib_free() is small enough to be inlined, so just make it
static inline.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Mon, 2 Sep 2013 02:06:52 +0000 (10:06 +0800)]
vxlan: include net/ip6_checksum.h for csum_ipv6_magic()
Fengguang reported a compile warning:
drivers/net/vxlan.c: In function 'vxlan6_xmit_skb':
drivers/net/vxlan.c:1352:3: error: implicit declaration of function 'csum_ipv6_magic' [-Werror=implicit-function-declaration]
cc1: some warnings being treated as errors
this patch fixes it.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Cong Wang <amwang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>