Liping Zhang [Sun, 6 Nov 2016 06:40:01 +0000 (14:40 +0800)]
netfilter: nf_tables: fix oops when inserting an element into a verdict map
Dalegaard says:
The following ruleset, when loaded with 'nft -f bad.txt'
----snip----
flush ruleset
table ip inlinenat {
map sourcemap {
type ipv4_addr : verdict;
}
chain postrouting {
ip saddr vmap @sourcemap accept
}
}
add chain inlinenat test
add element inlinenat sourcemap { 100.123.10.2 : jump test }
----snip----
Nicolas Dichtel says:
After commit b87a2f9199ea ("netfilter: conntrack: add gc worker to
remove timed-out entries"), netlink conntrack deletion events may be
sent with a huge delay.
and indeed, this isn't optimal at all. Rationale here was to ensure that
we don't block other work items for too long, even if
nf_conntrack_htable_size is huge. But in order to have some guarantee
about maximum time period where a scan of the full conntrack table
completes we should always use a fixed slice size, so that once every
N scans the full table has been examined at least once.
We also need to balance this vs. the case where the system is either idle
(i.e., conntrack table (almost) empty) or very busy (i.e. eviction happens
from packet path).
So, after some discussion with Nicolas:
1. want hard guarantee that we scan entire table at least once every X s
-> need to scan fraction of table (get rid of upper bound)
2. don't want to eat cycles on idle or very busy system
-> increase interval if we did not evict any entries
3. don't want to block other worker items for too long
-> make fraction really small, and prefer small scan interval instead
4. want a reasonably short time until we detect a timed-out entry when
system went idle after a burst of traffic, while not doing scans
all the time.
-> Store next gc scan in worker, increasing delays when no eviction
happened and shrinking delay when we see timed out entries.
The old gc interval is turned into a max number, scans can now happen
every jiffy if stale entries are present.
Longest possible time period until an entry is evicted is now 2 minutes
in worst case (entry expires right after it was deemed 'not expired').
Reported-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
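The adaptive scan delay described in points 2 and 4 above can be sketched
roughly as follows. This is only an illustration, not the code from the
patch: the names gc_clamp() and gc_next_interval(), the bounds, and the
choice of doubling/halving are all assumptions made for the example.

```c
/* Hypothetical bounds, in scheduler ticks; not values from the patch. */
#define GC_INTERVAL_MIN 1u
#define GC_INTERVAL_MAX 1000u

/* Clamp v into [GC_INTERVAL_MIN, GC_INTERVAL_MAX]. */
unsigned int gc_clamp(unsigned int v)
{
    if (v < GC_INTERVAL_MIN)
        return GC_INTERVAL_MIN;
    if (v > GC_INTERVAL_MAX)
        return GC_INTERVAL_MAX;
    return v;
}

/*
 * Pick the delay before the next partial scan.
 * expired: number of timed-out entries found in the slice just scanned.
 */
unsigned int gc_next_interval(unsigned int cur, unsigned int expired)
{
    cur = gc_clamp(cur);
    if (expired == 0)
        return gc_clamp(cur * 2);   /* nothing evicted: back off */
    return gc_clamp(cur / 2);       /* stale entries seen: scan sooner */
}
```

With these assumed bounds, a worker that keeps finding stale entries
converges on scanning every tick, while an idle one drifts towards the
maximum interval.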
Florian Westphal [Sat, 29 Oct 2016 01:01:50 +0000 (03:01 +0200)]
netfilter: connmark: ignore skbs with magic untracked conntrack objects
The (percpu) untracked conntrack entries can end up with nonzero connmarks.
The 'untracked' conntrack objects are merely a way to distinguish INVALID
(i.e. protocol connection tracker says payload doesn't meet some
requirements or packet was never seen by the connection tracking code)
from packets that are intentionally not tracked (some icmpv6 types such as
neigh solicitation, or by using 'iptables -j CT --notrack' option).
Untracked conntrack objects are an implementation detail; we might as well
use an invalid magic address instead to tell INVALID and UNTRACKED apart.
Check skb->nfct for untracked dummy and behave as if skb->nfct is NULL.
Liping Zhang [Sat, 29 Oct 2016 14:03:05 +0000 (22:03 +0800)]
netfilter: nf_tables: destroy the set if fail to add transaction
When memory is exhausted, we fail to add the NFT_MSG_NEWSET
transaction. In that case, we should destroy the set before we free it.
Fixes: 958bee14d071 ("netfilter: nf_tables: use new transaction infrastructure to handle sets") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Building the ip_vs_sync code with CONFIG_OPTIMIZE_INLINING on x86
confuses the compiler to the point where it produces a rather
dubious warning message:
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
struct ip_vs_sync_conn_options opt;
^~~
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘opt.previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).init_seq’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
net/netfilter/ipvs/ip_vs_sync.c:1073:33: error: ‘*((void *)&opt+12).previous_delta’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
The problem appears to be a combination of a number of factors, including
the __builtin_bswap32 compiler builtin being slightly odd, having a large
amount of code inlined into a single function, and the way that some
functions only get partially inlined here.
I've spent way too much time trying to work out a way to improve the
code, but the best I've come up with is to add an explicit memset
right before the ip_vs_seq structure is first initialized here. When
the compiler works correctly, this has absolutely no effect, but in the
case that produces the warning, the warning disappears.
In the process of analysing this warning, I also noticed that
we use memcpy to copy the larger ip_vs_sync_conn_options structure
over two members of the ip_vs_conn structure. This works because
the layout is identical, but seems error-prone, so I'm changing
this in the process to directly copy the two members. This change
seemed to have no effect on the object code or the warning, but
it deals with the same data, so I kept the two changes together.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
This is now a fixed-size extension, so we don't need to pass a variable
alloc size. This (harmless) error results in allocating 32 instead of
the needed 16 bytes for this extension as the size gets passed twice.
Fixes: 23014011ba420 ("netfilter: conntrack: support a fixed size of 128 distinct labels") Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
John W. Linville [Tue, 25 Oct 2016 19:56:39 +0000 (15:56 -0400)]
netfilter: nf_tables: fix type mismatch with error return from nft_parse_u32_check
Commit 36b701fae12ac ("netfilter: nf_tables: validate maximum value of
u32 netlink attributes") introduced nft_parse_u32_check with a return
value of "unsigned int", yet on error it returns "-ERANGE".
This patch corrects the mismatch by changing the return value to "int",
which happens to match the actual users of nft_parse_u32_check already.
Note that commit 21a9e0f1568ea ("netfilter: nft_exthdr: fix error
handling in nft_exthdr_init()) attempted to address the issue, but
did not address the return type of nft_parse_u32_check.
Signed-off-by: John W. Linville <linville@tuxdriver.com> Cc: Laura Garcia Liebana <nevola@gmail.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 36b701fae12ac ("netfilter: nf_tables: validate maximum value...") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
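To illustrate the mismatch described above: a negative errno returned
through an unsigned return type turns into a large positive number, so a
caller testing for a negative result never sees the error. The two
functions below are simplified stand-ins with hypothetical names and
signatures, not the kernel's nft_parse_u32_check().

```c
#include <errno.h>
#include <stdint.h>

/* Buggy shape: negative errno returned through an unsigned type. */
unsigned int parse_u32_check_unsigned(uint32_t val, uint32_t max,
                                      uint32_t *dest)
{
    if (val > max)
        return -ERANGE;     /* converts to a huge positive value */
    *dest = val;
    return 0;
}

/* Fixed shape: signed return type, so callers can test for < 0. */
int parse_u32_check_signed(uint32_t val, uint32_t max, uint32_t *dest)
{
    if (val > max)
        return -ERANGE;
    *dest = val;
    return 0;
}
```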
Ulrich Weber [Mon, 24 Oct 2016 16:07:23 +0000 (18:07 +0200)]
netfilter: nf_conntrack_sip: extend request line validation
on SIP requests, so a fragmented TCP SIP packet from an allow header starting with
INVITE,NOTIFY,OPTIONS,REFER,REGISTER,UPDATE,SUBSCRIBE
Content-Length: 0
will not be interpreted as an INVITE request. Also, the Request-URI must start with an alphabetic character.
Liping Zhang [Sat, 22 Oct 2016 10:51:26 +0000 (18:51 +0800)]
netfilter: nf_tables: fix race when create new element in dynset
Packets may race when creating the new element in nft_hash_update:
       CPU0                 CPU1
  lookup_fast - fail   lookup_fast - fail
  new - ok             new - ok
  insert - ok          insert - fail(EEXIST)
So when the race happens, we reuse the existing element. Otherwise,
these *racing* packets will not be handled properly.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
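The "reuse the existing element on EEXIST" idea can be sketched with a
toy set. Everything here is hypothetical (the struct layout, SET_SLOTS,
and the function names), and the sketch is single-threaded: it only shows
the control flow of the losing side, not the locking or the hash table
the kernel actually uses.

```c
#include <errno.h>
#include <stddef.h>

#define SET_SLOTS 8     /* hypothetical capacity */

struct elem {
    int key;
    int used;
};

struct set {
    struct elem slot[SET_SLOTS];
};

/* Return the element for key, or NULL if it is not present. */
struct elem *set_lookup(struct set *s, int key)
{
    for (size_t i = 0; i < SET_SLOTS; i++)
        if (s->slot[i].used && s->slot[i].key == key)
            return &s->slot[i];
    return NULL;
}

/*
 * Insert key. On -EEXIST, *out points at the element that got there
 * first, so the caller can reuse it instead of failing.
 */
int set_insert(struct set *s, int key, struct elem **out)
{
    struct elem *e = set_lookup(s, key);

    if (e) {
        *out = e;
        return -EEXIST;
    }
    for (size_t i = 0; i < SET_SLOTS; i++) {
        if (!s->slot[i].used) {
            s->slot[i].used = 1;
            s->slot[i].key = key;
            *out = &s->slot[i];
            return 0;
        }
    }
    *out = NULL;
    return -ENOSPC;
}

/* Update path: treat -EEXIST as success and return the existing element. */
struct elem *set_update(struct set *s, int key)
{
    struct elem *e;
    int err = set_insert(s, key, &e);

    if (err == 0 || err == -EEXIST)
        return e;
    return NULL;
}
```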
Liping Zhang [Sat, 22 Oct 2016 10:51:25 +0000 (18:51 +0800)]
netfilter: nf_tables: fix *leak* when expr clone fail
When nft_expr_clone fails, a series of problems follows:
1. the module refcnt leaks: we call __module_get at the beginning but
forget to put it back if ops->clone returns an error
2. memory leaks: if the clone fails, we just return NULL and forget
to free the allocated element
3. set->nelems becomes incorrect when set->size is specified: if the
clone fails, we should decrease set->nelems
This patch fixes these problems. Fortunately, a clone failure can
only happen on the counter expression when memory is exhausted.
Fixes: 086f332167d6 ("netfilter: nf_tables: add clone interface to expression operations") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
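The three leaks listed above share one shape: resources taken before the
clone must be released in reverse order when the clone fails. A minimal
sketch of that unwinding pattern follows; struct owner, struct set_stats,
elem_new() and the clone_fails flag are invented for the example and do
not mirror the nf_tables code.

```c
#include <stdlib.h>

struct owner {
    int refcnt;             /* stands in for a module reference count */
};

struct set_stats {
    unsigned int nelems;    /* stands in for set->nelems */
};

/*
 * Allocate an element. clone_fails simulates the clone step failing.
 * On any failure every earlier step is undone before returning NULL.
 */
void *elem_new(struct owner *o, struct set_stats *s, int clone_fails)
{
    void *elem;

    s->nelems++;            /* reserve a slot in the set */
    o->refcnt++;            /* take a reference on the owner */

    elem = malloc(64);
    if (!elem)
        goto err_put;
    if (clone_fails)
        goto err_free;
    return elem;

err_free:
    free(elem);             /* do not leak the element */
err_put:
    o->refcnt--;            /* put the reference back */
    s->nelems--;            /* release the reserved slot */
    return NULL;
}
```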
Liping Zhang [Sat, 22 Oct 2016 10:51:24 +0000 (18:51 +0800)]
netfilter: nft_dynset: fix panic if NFT_SET_HASH is not enabled
When CONFIG_NFT_SET_HASH is not enabled and I input the following rule:
"nft add rule filter output flow table test {ip daddr counter }", kernel
panic happened on my system:
BUG: unable to handle kernel NULL pointer dereference at (null)
IP: [< (null)>] (null)
[...]
Call Trace:
[<ffffffffa0590466>] ? nft_dynset_eval+0x56/0x100 [nf_tables]
[<ffffffffa05851bb>] nft_do_chain+0xfb/0x4e0 [nf_tables]
[<ffffffffa0432f01>] ? nf_conntrack_tuple_taken+0x61/0x210 [nf_conntrack]
[<ffffffffa0459ea6>] ? get_unique_tuple+0x136/0x560 [nf_nat]
[<ffffffffa043bca1>] ? __nf_ct_ext_add_length+0x111/0x130 [nf_conntrack]
[<ffffffffa045a357>] ? nf_nat_setup_info+0x87/0x3b0 [nf_nat]
[<ffffffff81761e27>] ? ipt_do_table+0x327/0x610
[<ffffffffa045a6d7>] ? __nf_nat_alloc_null_binding+0x57/0x80 [nf_nat]
[<ffffffffa059f21f>] nft_ipv4_output+0xaf/0xd0 [nf_tables_ipv4]
[<ffffffff81702515>] nf_iterate+0x55/0x60
[<ffffffff81702593>] nf_hook_slow+0x73/0xd0
This happens because the rbtree set type does not implement ops->update.
To keep it simple, report -EOPNOTSUPP to user space in that case.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
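The fix amounts to checking an optional callback at setup time rather
than calling through a NULL pointer on the packet path. A small sketch of
that pattern, with a hypothetical struct set_ops and function names that
are not the kernel's:

```c
#include <errno.h>
#include <stddef.h>

struct set_ops {
    /* Optional callback; some set backends leave it unset. */
    int (*update)(int key);
};

/* Reject the configuration up front if the backend cannot update. */
int dynset_init(const struct set_ops *ops)
{
    if (ops->update == NULL)
        return -EOPNOTSUPP;
    return 0;
}

/* Example backend callback that does support updates. */
int hash_update(int key)
{
    (void)key;
    return 1;
}
```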
Mintz, Yuval [Mon, 24 Oct 2016 05:48:09 +0000 (08:48 +0300)]
MAINTAINERS: Update qlogic networking drivers
Following Cavium's acquisition of qlogic we need to update all the qlogic
drivers maintainer's entries to point to our new e-mail addresses,
as well as update some of the driver's maintainers as those are no longer
working for Cavium.
I would like to thank Sony Chacko and Rajesh Borundia for their support
and development of our various networking drivers.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Hyper-V netvsc driver was looking at the incorrect status bits
in the checksum info. It was setting the receive checksum unnecessary
flag based on the IP header checksum being correct. The checksum
flag in skb is about TCP and UDP checksum status. Because of this
bug, any packet received with bad TCP checksum would be passed
up the stack and to the application causing data corruption.
The problem is reproducible via netcat and netem.
This had a side effect of not doing receive checksum offload
on IPv6. The driver was also always doing checksum offload
independent of the checksum setting done via ethtool.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
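The distinction described above can be sketched as a small predicate: the
"checksum unnecessary" decision must depend on the TCP/UDP status bits
and on the user's offload setting, not on the IP header checksum bit. The
flag names and values below are made up for the example and are not the
Hyper-V definitions.

```c
/* Hypothetical status bits reported by the device for a received packet. */
#define RX_IP_CSUM_OK   0x1u
#define RX_TCP_CSUM_OK  0x2u
#define RX_UDP_CSUM_OK  0x4u

/*
 * Return 1 if the stack may skip L4 checksum verification.
 * rxcsum_enabled reflects the receive checksum setting chosen by the user.
 */
int csum_unnecessary(unsigned int flags, int rxcsum_enabled)
{
    if (!rxcsum_enabled)
        return 0;
    /* Only the L4 status counts; a good IP header checksum is not enough. */
    return (flags & (RX_TCP_CSUM_OK | RX_UDP_CSUM_OK)) != 0;
}
```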
Eric Dumazet [Mon, 24 Oct 2016 01:03:06 +0000 (18:03 -0700)]
udp: fix IP_CHECKSUM handling
First bug was added in commit ad6f939ab193 ("ip: Add offset parameter to
ip_cmsg_recv") : Tom missed that ipv4 udp messages could be received on
AF_INET6 socket. ip_cmsg_recv(msg, skb) should have been replaced by
ip_cmsg_recv_offset(msg, skb, sizeof(struct udphdr));
Then commit e6afc8ace6dd ("udp: remove headers from UDP packets before
queueing") forgot to adjust the offsets now UDP headers are pulled
before skb are put in receive queue.
Fixes: ad6f939ab193 ("ip: Add offset parameter to ip_cmsg_recv") Fixes: e6afc8ace6dd ("udp: remove headers from UDP packets before queueing") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Sam Kumar <samanthakumar@google.com> Cc: Willem de Bruijn <willemb@google.com> Tested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Sun, 23 Oct 2016 17:01:09 +0000 (01:01 +0800)]
sctp: fix the panic caused by route update
Commit 7303a1475008 ("sctp: identify chunks that need to be fragmented
at IP level") made the chunk be fragmented at IP level in the next round
if its size exceeds PMTU.
But there still is another case, PMTU can be updated if transport's dst
expires and transport's pmtu_pending is set in sctp_packet_transmit. If
the new PMTU is less than the chunk, the same issue with that commit can
be triggered.
So we should drop this packet and let it retransmit in another round
where it would be fragmented at IP level.
This patch fixes it by checking the chunk size after PMTU may have been
updated and dropping the packet if its size exceeds PMTU.
Fixes: 90017accff61 ("sctp: Add GSO support") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Neil Horman <nhorman@txudriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Slaby [Fri, 21 Oct 2016 12:13:24 +0000 (14:13 +0200)]
net: sctp, forbid negative length
Most getsockopt handlers in net/sctp/socket.c check len against
the size of some structure, like:
if (len < sizeof(int))
return -EINVAL;
At first glance, the check seems to be correct. But since len is int
and sizeof returns size_t, the int gets converted to the unsigned size_t. So
the test is false for negative lengths. Yes, (-1 < sizeof(long)) is
false.
Fix this in sctp by explicitly checking len < 0 before any getsockopt
handler is called.
Note that sctp_getsockopt_events already handled the negative case.
Since we added the < 0 check elsewhere, this one can be removed.
Signed-off-by: Jiri Slaby <jslaby@suse.cz> Cc: Vlad Yasevich <vyasevich@gmail.com> Cc: Neil Horman <nhorman@tuxdriver.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: linux-sctp@vger.kernel.org Cc: netdev@vger.kernel.org Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
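The conversion pitfall described above is easy to reproduce in user
space. In this sketch the function names are hypothetical; the buggy
variant repeats the comparison from the commit message, and the fixed
variant rejects negative lengths before the size comparison.

```c
#include <errno.h>
#include <stddef.h>

/* Buggy: len is converted to size_t, so -1 becomes a huge value. */
int len_check_buggy(int len)
{
    if (len < sizeof(int))
        return -EINVAL;
    return 0;               /* negative lengths end up here */
}

/* Fixed: reject negative lengths before the unsigned comparison. */
int len_check_fixed(int len)
{
    if (len < 0)
        return -EINVAL;
    if ((size_t)len < sizeof(int))
        return -EINVAL;
    return 0;
}
```

Some compilers can flag the first comparison when sign-comparison
warnings are enabled.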
Fabio Estevam [Fri, 21 Oct 2016 11:34:29 +0000 (09:34 -0200)]
net: fec: Call swap_buffer() prior to IP header alignment
Commit 3ac72b7b63d5 ("net: fec: align IP header in hardware") breaks
networking on mx28.
There is an erratum on mx28 (ENGR121613 - ENET big endian mode
not compatible with ARM little endian) that requires an additional
byte-swap operation to work around this problem.
So call swap_buffer() prior to performing the IP header alignment
to restore network functionality on mx28.
Fixes: 3ac72b7b63d5 ("net: fec: align IP header in hardware") Reported-and-tested-by: Henri Roosen <henri.roosen@ginzinger.com> Signed-off-by: Fabio Estevam <fabio.estevam@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Use the correct divisor value for PHC clock readings.
Time Sync (PTP) implementation uses the divisor/shift value for converting
the clock ticks to nanoseconds. The driver currently defines the shift value
as 1, which results in the nanoseconds value being calculated as half the
actual value. Hence the user application fails to synchronize the device
clock value with the PTP master device clock. The 'shift' value needs to be 0.
Signed-off-by: Sony.Chacko <Sony.Chacko@cavium.com> Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
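The halving effect follows from the usual multiply-and-shift conversion
from clock ticks to nanoseconds. The helper below is a stand-alone sketch
of that arithmetic; the function name and the multiplier of 1 used in the
example values are assumptions, not taken from the driver.

```c
#include <stdint.h>

/* Convert raw clock ticks to nanoseconds: (cycles * mult) >> shift. */
uint64_t cycles_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}
```

Assuming a multiplier of 1, cycles_to_ns(1000, 1, 1) yields 500 while
cycles_to_ns(1000, 1, 0) yields 1000, which matches the "half the actual
value" symptom.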
Manish Chopra [Fri, 21 Oct 2016 08:43:45 +0000 (04:43 -0400)]
qede: Fix incorrect usage of APIs for un-mapping DMA memory
Driver uses incorrect APIs to unmap DMA memory which were
mapped using dma_map_single(). This patch fixes it to use
appropriate APIs for un-mapping DMA memory.
Signed-off-by: Manish Chopra <manish.chopra@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Zero-out the buffer passed to dcbx_query() API
The qed_dcbx_query_params() implementation populates values in the input
buffer based on the dcbx mode and the current negotiated state/params,
so the caller of this API needs to memset the buffer to zero.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qede: Reconfigure rss indirection direction table when rss count is updated
Rx indirection table entries are in the range [0, (rss_count - 1)]. If
the user reduces the rss count, the table entries may not be in the correct
range. The table needs to be reconfigured with the new rss_count as a basis.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
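Why the table has to be rebuilt can be seen with a small sketch. The
round-robin fill used here is one common way to spread entries over the
queues and is an assumption for the example; the function names are
hypothetical and rss_count must be non-zero.

```c
#include <stddef.h>
#include <stdint.h>

/* Fill the table round-robin so every entry lies in [0, rss_count - 1]. */
void rss_indir_fill(uint16_t *table, size_t n, uint16_t rss_count)
{
    for (size_t i = 0; i < n; i++)
        table[i] = (uint16_t)(i % rss_count);
}

/* Return 1 if every entry refers to an existing queue, 0 otherwise. */
int rss_indir_valid(const uint16_t *table, size_t n, uint16_t rss_count)
{
    for (size_t i = 0; i < n; i++)
        if (table[i] >= rss_count)
            return 0;
    return 1;
}
```

A table filled for 8 queues fails the validity check once the count drops
to 4, and passes again after it is refilled with the new count.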
With the current default values for the Rx path, i.e., 8 queues of 8Kb entries
each with 4Kb size, the interface will consume 256Mb for Rx. The default values
cause the driver probe to fail when the system memory is low. Based on
the performance results, an rx-ring count value of 1Kb gives comparable
performance with an Rx coalesce timeout of 12 seconds. Update the default
values.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
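The 256Mb figure can be checked with a one-line calculation, reading
"8Kb entries" as 8192 ring entries and "4Kb size" as 4096-byte buffers
(that reading is an assumption, as is the helper name):

```c
#include <stdint.h>

/* Rx buffer memory in bytes: queues * entries per ring * bytes per buffer. */
uint64_t rx_footprint_bytes(uint64_t queues, uint64_t entries,
                            uint64_t buf_size)
{
    return queues * entries * buf_size;
}
```

Under that reading, 8 * 8192 * 4096 bytes is 256 MiB, and dropping the
ring to 1024 entries brings it down to 32 MiB.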
qede: Loopback implementation should ignore the normal traffic
During the execution of the loopback test, the driver may receive packets
which did not originate from this test; the loopback implementation needs
to skip those packets.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
RSS configuration is not supported for 100G adapters.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qede: get_channels() need to populate max tx/rx coalesce values
Recent changes in the kernel ethtool implementation require the driver
callback for get_channels() to populate the values for max tx/rx
coalesce fields.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Thu, 20 Oct 2016 21:19:46 +0000 (14:19 -0700)]
ipv4: use the right lock for ping_group_range
This reverts commit a681574c99be23e4d20b769bf0e543239c364af5
("ipv4: disable BH in set_ping_group_range()") because we never
read ping_group_range in BH context (unlike local_port_range).
Then, since we already have a lock for ping_group_range, those
using ip_local_ports.lock for ping_group_range are clearly typos.
We might consider sharing the same lock for both ping_group_range
and local_port_range to save space, but that should be for
net-next.
Fixes: a681574c99be ("ipv4: disable BH in set_ping_group_range()") Fixes: ba6b918ab234 ("ping: move ping_group_range out of CONFIG_SYSCTL") Cc: Eric Dumazet <edumazet@google.com> Cc: Eric Salo <salo@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 21:21:56 +0000 (14:21 -0700)]
net: dsa: bcm_sf2: Do not rely on kexec_in_progress
After discussing with Eric, it turns out that, while using
kexec_in_progress is a nice optimization, which prevents us from always
powering on the integrated PHY, let's just turn it on in the shutdown
path.
This removes a dependency on kexec_in_progress which, according to Eric,
should not be used by modules.
Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 21:21:55 +0000 (14:21 -0700)]
Revert "kexec: Export kexec_in_progress to modules"
This reverts commit 97dcaa0fcfd24daa9a36c212c1ad1d5a97759212. Based on
the review discussion with Eric, we will come up with a different fix
for the bcm_sf2 driver which does not make it rely on the
kexec_in_progress value.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Moore [Sat, 22 Oct 2016 01:49:14 +0000 (21:49 -0400)]
netns: revert "netns: avoid disabling irq for netns id"
This reverts commit bc51dddf98c9 ("netns: avoid disabling irq for
netns id") as it was found to cause problems with systems running
SELinux/audit, see the mailing list thread below:
Eventually we should be able to reintroduce this code once we have
rewritten the audit multicast code to queue messages much the same
way we do for unicast messages. A tracking issue for this can be
found below:
Reported-by: Stephen Smalley <sds@tycho.nsa.gov> Reported-by: Elad Raz <e@eladraz.com> Cc: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Similar to commit 87e9f0315952
("ipv4: fix a potential deadlock in mcast getsockopt() path")
this happens because we still have a case, ipv6_sock_mc_close(),
where we acquire sk_lock before rtnl_lock. Close this deadlock
with a similar solution, that is, always acquire the rtnl lock first.
Fixes: baf606d9c9b1 ("ipv4,ipv6: grab rtnl before locking the socket") Reported-by: Baozeng Ding <sploving1@gmail.com> Tested-by: Baozeng Ding <sploving1@gmail.com> Cc: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The following patchset contains Netfilter fixes for your net tree,
they are:
1) Fix compilation warning in xt_hashlimit on m68k 32-bits, from
Geert Uytterhoeven.
2) Fix wrong timeout in set elements added from packet path via
nft_dynset, from Anders K. Pedersen.
3) Remove obsolete nf_conntrack_events_retry_timeout sysctl
documentation, from Nicolas Dichtel.
4) Ensure proper initialization of log flags via xt_LOG, from
Liping Zhang.
5) Missing alias to autoload ipcomp, also from Liping Zhang.
6) Missing NFTA_HASH_OFFSET attribute validation, again from Liping.
7) Wrong integer type in the new nft_parse_u32_check() function,
from Dan Carpenter.
8) Another wrong integer type declaration in nft_exthdr_init, also
from Dan Carpenter.
9) Fix insufficient mode validation in nft_range.
10) Fix compilation warning in nft_range due to possible uninitialized
value, from Arnd Bergmann.
11) Zero nf_hook_ops allocated via xt_hook_alloc() in x_tables to
calm down kmemcheck, from Florian Westphal.
12) Schedule gc_worker() to run again if GC_MAX_EVICTS quota is reached,
from Nicolas Dichtel.
13) Fix nf_queue() after conversion to single-linked hook list, related
to incorrect bypass flag handling and incorrect hook point of
reinjection.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 21 Oct 2016 01:15:16 +0000 (18:15 -0700)]
kexec: Export kexec_in_progress to modules
The bcm_sf2 driver uses kexec_in_progress to know whether it can power
down an integrated PHY during shutdown, and can be built as a module.
Other modules may be using this in the future, so export it.
Fixes: 2399d6143f85 ("net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Thu, 20 Oct 2016 17:26:48 +0000 (10:26 -0700)]
ipv4: disable BH in set_ping_group_range()
In commit 4ee3bd4a8c746 ("ipv4: disable BH when changing ip local port
range") Cong added BH protection in set_local_port_range() but missed
that same fix was needed in set_ping_group_range()
Fixes: b8f1a55639e6 ("udp: Add function to make source port for UDP tunnels") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Eric Salo <salo@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 20 Oct 2016 16:32:19 +0000 (09:32 -0700)]
net: dsa: bcm_sf2: Prevent GPHY shutdown for kexec'd kernels
For a kernel that is being kexec'd we re-enable the integrated GPHY in
order for the subsequent MDIO bus scan to succeed and properly bind to
the bcm7xxx PHY driver. If we did not do that, the GPHY would be shut
down by the time the MDIO driver is probing the bus, and it would fail
to read the correct PHY OUI and therefore bind to an appropriate PHY
driver. Later on, this would cause DSA not to be able to successfully
attach to the PHY, and the interface would not be created at all.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Oct 2016 15:13:53 +0000 (17:13 +0200)]
bpf, test: fix ld_abs + vlan push/pop stress test
After commit 636c2628086e ("net: skbuff: Remove errornous length
validation in skb_vlan_pop()") mentioned test case stopped working,
throwing a -12 (ENOMEM) return code. The issue however is not due to 636c2628086e, but rather due to a buggy test case that got uncovered
from the change in behaviour in 636c2628086e.
The data_size of that test case for the skb was set to 1. In the
bpf_fill_ld_abs_vlan_push_pop() handler bpf insns are generated that
loop with: reading skb data, pushing 68 tags, reading skb data,
popping 68 tags, reading skb data, etc, in order to force a skb
expansion and thus trigger that JITs recache skb->data. Problem is
that initial data_size is too small.
Before 636c2628086e, the test silently bailed out, returning 0, due to the
skb->len < VLAN_ETH_HLEN check; now it throws an
error from the failing skb_ensure_writable(). Set at least a minimum of
ETH_HLEN as an initial length so that on first push of data, equivalent
pop will succeed.
Fixes: 4d9c5c53ac99 ("test_bpf: add bpf_skb_vlan_push/pop() tests") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Sabrina Dubroca [Thu, 20 Oct 2016 13:58:02 +0000 (15:58 +0200)]
net: add recursion limit to GRO
Currently, GRO can do unlimited recursion through the gro_receive
handlers. This was fixed for tunneling protocols by limiting tunnel GRO
to one level with encap_mark, but both VLAN and TEB still have this
problem. Thus, the kernel is vulnerable to a stack overflow, if we
receive a packet composed entirely of VLAN headers.
This patch adds a recursion counter to the GRO layer to prevent stack
overflow. When a gro_receive function hits the recursion limit, GRO is
aborted for this skb and it is processed normally. This recursion
counter is put in the GRO CB, but could be turned into a percpu counter
if we run out of space in the CB.
Thanks to Vladimír Beneš <vbenes@redhat.com> for the initial bug report.
Fixes: CVE-2016-7039 Fixes: 9b174d88c257 ("net: Add Transparent Ethernet Bridging GRO support.") Fixes: 66e5133f19e9 ("vlan: Add GRO support for non hardware accelerated vlan") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Reviewed-by: Jiri Benc <jbenc@redhat.com> Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Tom Herbert <tom@herbertland.com> Signed-off-by: David S. Miller <davem@davemloft.net>
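The recursion counter idea can be sketched outside the kernel with a toy
parser. Here a packet is reduced to an array of 16-bit type tags, where
0x8100 means "another nested header follows"; the names GRO_DEPTH_LIMIT
and parse_nested() and the limit value are assumptions for illustration,
not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define GRO_DEPTH_LIMIT 15      /* hypothetical nesting limit */
#define TAG_NESTED      0x8100  /* tag meaning "one more header inside" */

/*
 * Count nested headers recursively. Returns the nesting depth, or -1
 * when the limit is hit, in which case the caller would give up on
 * aggregation and process the packet normally.
 */
int parse_nested(const uint16_t *tags, size_t n, unsigned int depth)
{
    int inner;

    if (n == 0 || tags[0] != TAG_NESTED)
        return 0;               /* innermost payload reached */
    if (depth >= GRO_DEPTH_LIMIT)
        return -1;              /* too deep: stop recursing */

    inner = parse_nested(tags + 1, n - 1, depth + 1);
    return inner < 0 ? -1 : inner + 1;
}
```

Without the depth check, a packet made only of nested headers would drive
the recursion as deep as the packet is long.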
The check for an underflow of tmp_prefered_lft is always false
because tmp_prefered_lft is unsigned. The intention of the check
was to guard against racing with an update of the
temp_prefered_lft sysctl, potentially resulting in an underflow.
As suggested by David Miller, the best way to prevent the race is
by reading the sysctl variable using READ_ONCE.
Signed-off-by: Jiri Bohac <jbohac@suse.cz> Reported-by: Julia Lawall <julia.lawall@lip6.fr> Fixes: 76506a986dc3 ("IPv6: fix DESYNC_FACTOR") Signed-off-by: David S. Miller <davem@davemloft.net>
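Why the after-the-fact check could never fire: an unsigned subtraction
that "goes negative" wraps around to a very large value instead. The
sketch below shows the wrap and one way to test before subtracting. It is
not what the patch does (the patch reads the sysctl once with READ_ONCE);
the function names are hypothetical.

```c
/* Wraps around when b > a; the result is never "less than zero". */
unsigned long sub_wrapping(unsigned long a, unsigned long b)
{
    return a - b;
}

/* Compare first, so the subtraction cannot wrap. */
unsigned long sub_checked(unsigned long a, unsigned long b)
{
    return a > b ? a - b : 0;
}
```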
nf_queue handling is broken since e3b37f11e6e4 ("netfilter: replace
list_head with single linked list") for two reasons:
1) If the bypass flag is set on, there are no userspace listeners and
we still have more hook entries to iterate over, then jump to the
next hook. Otherwise accept the packet. On nf_reinject() path, the
okfn() needs to be invoked.
2) We should not re-enter the same hook on packet reinjection. If the
packet is accepted, we have to skip the current hook from where the
packet was enqueued, otherwise the packet gets enqueued over and
over again.
This restores the previous list_for_each_entry_continue() behaviour
happening from nf_iterate() that was dealing with these two cases.
This patch introduces a new nf_queue() wrapper function so this fix
becomes simpler.
Fixes: e3b37f11e6e4 ("netfilter: replace list_head with single linked list") Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
It makes sense to display the descriptors even if
DES0 is zero. This helps, for example, when rx
write-back descriptors need to be dumped to get
timestamp status.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre TORGUE <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Oct 2016 15:23:08 +0000 (11:23 -0400)]
Merge branch 'ncsi-fixes'
Gavin Shan says:
====================
net/ncsi: More bug fixes
This series fixes 2 issues that were found during NCSI's availability
testing on BCM5718 and improves HNCDSC AEN handler:
* PATCH[1] refactors the code so that minimal code change is put
to PATCH[2].
* PATCH[2] fixes the NCSI channel's stale link state before doing
failover.
* PATCH[3] chooses the hot channel, which was ever chosen as active
channel, when the available channels are all in link-down state.
* PATCH[4] improves Host Network Controller Driver Status Change
(HNCDSC) AEN handler
Changelog
=========
v2:
* Merged PATCH[v1 1/2] to PATCH[v2 1].
* Avoid if/else statements in ncsi_suspend_channel() as Joel suggested.
* Added comments to explain why we need to retrieve last link states in
ncsi_suspend_channel().
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:52 +0000 (11:45 +1100)]
net/ncsi: Improve HNCDSC AEN handler
This improves AEN handler for Host Network Controller Driver Status
Change (HNCDSC):
* The channel's lock should be held when accessing its state.
* Do failover when host driver isn't ready.
* Configure channel when host driver becomes ready.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:51 +0000 (11:45 +1100)]
net/ncsi: Choose hot channel as active one if necessary
The issue was found on BCM5718 which has two NCSI channels in one
package: C0 and C1. C0 is in link-up state while C1 is in link-down
state. C0 is chosen as active channel until unplugging and plugging
C0's cable: on unplugging C0's cable, an LSC (Link State Change) AEN
packet is received on C0 to report the link-down event. After that, C1 is
chosen as active channel. The LSC AEN for the link-up event is lost on C0
when plugging C0's cable back. We lose the network even though C0 is usable.
This resolves the issue by recording the (hot) channel that was ever
chosen as active one. The hot channel is chosen to be the active one
if none of the available channels is in link-up state. With this, C0 is still
the active one after unplugging C0's cable. The LSC AEN packet is received
on C0 when plugging its cable back.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
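The selection rule described above can be sketched in userspace C. This is only an illustration under assumptions: `struct channel` and `choose_active()` are hypothetical names and do not mirror the kernel's NCSI data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified channel record (not the kernel's ncsi_channel). */
struct channel {
    int id;
    bool link_up;
};

/*
 * Pick the channel to activate: the first channel in link-up state,
 * otherwise the "hot" channel (the one previously chosen as active).
 * Returns NULL only if no link is up and no hot channel was recorded.
 */
struct channel *choose_active(struct channel *chans, size_t n,
                              struct channel *hot)
{
    for (size_t i = 0; i < n; i++) {
        if (chans[i].link_up)
            return &chans[i];
    }
    return hot; /* all links down: stay on the hot channel */
}
```

With both channels in link-down state and C0 recorded as hot, the sketch keeps C0, which is the behaviour the commit message describes after C0's cable is unplugged.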
Gavin Shan [Thu, 20 Oct 2016 00:45:50 +0000 (11:45 +1100)]
net/ncsi: Fix stale link state of inactive channels on failover
The issue was found on BCM5718, which has two NCSI channels in one
package: C0 and C1. They are connected to different LANs, which
means both are in link-up state, and C0 is chosen as the active one
until BCM5718 is reset as described below.
Resetting BCM5718 results in an LSC (Link State Change) AEN packet
being received on C0, meaning the LSC AEN is missed on C1. When the
LSC AEN packet received on C0 reports link-down, it fails over to C1
because C1 is in link-up state as far as software can see. However, C1
is in link-down state in hardware. The link state is thus out of sync
between hardware and software, resulting in an inappropriate channel
(C1) being selected as the active one.
This resolves the issue by sending separate GLS (Get Link Status)
commands to all channels in the package before trying to fail over,
so that the latest link states of all channels in the package are
retrieved. With it, C0 (not C1) is selected as the active one, as
expected.
Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gavin Shan [Thu, 20 Oct 2016 00:45:49 +0000 (11:45 +1100)]
net/ncsi: Avoid if statements in ncsi_suspend_channel()
There are several if/else statements in the state machine implemented
by switch/case in ncsi_suspend_channel() to avoid duplicated code. They
make the code a bit hard to understand.
This drops the if/else statements in ncsi_suspend_channel() to improve
code readability, as Joel Stanley suggested. It also becomes easier to
add more states to the state machine without affecting current code.
No logical changes are introduced by this.
Suggested-by: Joel Stanley <joel@jms.id.au> Signed-off-by: Gavin Shan <gwshan@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paul Blakey [Wed, 19 Oct 2016 14:42:39 +0000 (17:42 +0300)]
net/sched: act_mirred: Use passed lastuse argument
The stats_update callback is called by NIC drivers doing hardware
offloading of the mirred action. Lastuse is passed as an argument
to specify when the stats were actually last updated, which is not
always the current time.
Fixes: 9798e6fe4f9b ('net: act_mirred: allow statistic updates from offloaded actions') Signed-off-by: Paul Blakey <paulb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
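A minimal userspace sketch of the idea follows, assuming a hypothetical `struct action_stats` and `stats_update()` helper that are not the kernel's types. The point is only that the timestamp supplied by the caller is recorded instead of the time of the call.

```c
#include <stdint.h>

/* Hypothetical counters for an offloaded action (illustration only). */
struct action_stats {
    uint64_t bytes;
    uint64_t packets;
    uint64_t lastuse; /* time of last real use, in caller-defined ticks */
};

/*
 * Fold in counters reported by hardware. The caller supplies lastuse
 * because the hardware may have seen the traffic some time before this
 * function runs; reading the current time here would overstate it.
 */
void stats_update(struct action_stats *s, uint64_t bytes,
                  uint64_t packets, uint64_t lastuse)
{
    s->bytes += bytes;
    s->packets += packets;
    s->lastuse = lastuse;
}
```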
Jiri Pirko [Thu, 20 Oct 2016 14:05:45 +0000 (16:05 +0200)]
mlxsw: pci: Fix reset wait for SwitchX2
SwitchX2 firmware does not implement reset done yet. Moreover,
busy-polling for the ready magic slows down the firmware, so the reset
takes longer than the defined timeout, causing initialization to fail.
So restore the previous behaviour and just sleep in this case.
Fixes: 233fa44bd67a ("mlxsw: pci: Implement reset done check") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Elad Raz [Thu, 20 Oct 2016 14:05:44 +0000 (16:05 +0200)]
mlxsw: switchx2: Fix ethernet port initialization
When creating an ethernet port fails, we must move the port to the
disabled state, otherwise putting the port in switch partition 0 (ETH)
or 1 (IB) will always fail.
Fixes: 31557f0f9755 ("mlxsw: Introduce Mellanox SwitchX-2 ASIC support") Signed-off-by: Elad Raz <eladr@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 20 Oct 2016 14:05:43 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Make mlxsw_sp_router_fib4_del return void and remove warn
The function return value is not checked anywhere. Also, the warning
causes a huge slowdown when removing a large number of FIB entries which
were not offloaded, because of an ordering issue. Ido is preparing
a patchset to fix the ordering issue, but that is definitely not
net tree material.
Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Thu, 20 Oct 2016 14:05:42 +0000 (16:05 +0200)]
mlxsw: spectrum_router: Use correct tree index for binding
By mistake, tree index 0 is passed to RALTB. It should be
MLXSW_SP_LPM_TREE_MIN.
Fixes: b45f64d16d45 ("mlxsw: spectrum_router: Use FIB notifications instead of switchdev calls") Reported-by: Yotam Gigi <yotamg@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Siverskog [Thu, 20 Oct 2016 07:05:09 +0000 (09:05 +0200)]
Bluetooth: btwilink: Fix probe return value
Probe functions should return 0 on success. This driver's probe
returns the value returned by hci_register_dev(), which is the hci
index. This works for systems with only one hci device (id = 0) but
for systems where the btwilink device ends up with an id larger than
0, things will start to fall apart.
Make the probe function return 0 on success.
Signed-off-by: Jacob Siverskog <jacob@teenage.engineering> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
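The return-value convention can be illustrated with a small userspace sketch. `register_dev_stub()` and `probe_sketch()` are hypothetical stand-ins, and the stub simply returns the index it is given, following the commit text's statement that the registration call returns the hci index.

```c
/*
 * Hypothetical stand-in for the registration call: returns the assigned
 * device index (>= 0) on success or a negative error code on failure.
 */
int register_dev_stub(int index_or_error)
{
    return index_or_error;
}

/* Probe following the convention: 0 on success, negative on error. */
int probe_sketch(int index_or_error)
{
    int ret = register_dev_stub(index_or_error);

    if (ret < 0)
        return ret; /* propagate the failure */
    return 0;       /* success, whatever index was assigned */
}
```

Returning `ret` directly would report index 3 as a non-zero probe result, which callers treat as something other than plain success.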
Commit 7086605a6ab5 ("stmmac: fix error check when init ptp")
breaks the procedure added by
commit efee95f42b5d ("ptp_clock: future-proofing drivers against PTP
subsystem becoming optional").
So this patch re-imports the logic added by the latter
commit: it makes sense to have stmmac_ptp_register
as a void function and, inside the main, stmmac_init_ptp can fail
if the capability is not supported by the HW.
Signed-off-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre TORGUE <alexandre.torgue@st.com> Cc: Rayagond Kokatanur <rayagond@vayavyalabs.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Nicolas Pitre <nico@linaro.org> Acked-by: Nicolas Pitre <nico@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Bluetooth: Fix append max 11 bytes of name to scan rsp data
Append a maximum of 10 + 1 bytes of name to scan response data.
The complete name is appended only if it exists and is <= 10 characters.
Otherwise, append the short name if it exists, or a shortened complete
name if not.
This makes sure the name is consistent across multiple advertising
instances.
Signed-off-by: Michał Narajowski <michal.narajowski@codecoup.pl> Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
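The selection order above can be sketched in plain C. The names `pick_adv_name()` and `ADV_NAME_LIMIT` are hypothetical, and the sketch models only the 10-character choice between names; it does not model the extra byte mentioned in the text or the real advertising data layout.

```c
#include <stddef.h>
#include <string.h>

#define ADV_NAME_LIMIT 10 /* character limit quoted in the commit text */

/* Copy at most ADV_NAME_LIMIT characters of src into out, NUL-terminated. */
static size_t copy_limited(char *out, const char *src)
{
    size_t len = strlen(src);

    if (len > ADV_NAME_LIMIT)
        len = ADV_NAME_LIMIT;
    memcpy(out, src, len);
    out[len] = '\0';
    return len;
}

/*
 * Choose the name to advertise. out must hold ADV_NAME_LIMIT + 1 bytes.
 * Order: complete name if it fits, else short name, else the complete
 * name shortened to the limit. Returns the number of characters written.
 */
size_t pick_adv_name(const char *complete, const char *short_name, char *out)
{
    int have_complete = complete && complete[0] != '\0';
    int have_short = short_name && short_name[0] != '\0';

    if (have_complete && strlen(complete) <= ADV_NAME_LIMIT)
        return copy_limited(out, complete);
    if (have_short)
        return copy_limited(out, short_name);
    if (have_complete)
        return copy_limited(out, complete); /* shortened complete name */
    out[0] = '\0';
    return 0;
}
```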
This warning is harmless; we copy 'uninitialized' data from the hook ops
but it will not be used.
Long term the structures keeping run-time data should be disentangled
from those only containing config-time data (such as where in the list
to insert a hook), but that's -next material.
Reported-by: Markus Trippelsdorf <markus@trippelsdorf.de> Suggested-by: Al Viro <viro@ZenIV.linux.org.uk> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Aaron Conole <aconole@bytheb.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Jiri Pirko [Tue, 18 Oct 2016 16:59:34 +0000 (18:59 +0200)]
rtnetlink: Add rtnexthop offload flag to compare mask
The offload flag is a status flag and should not be used by
FIB semantics for comparison.
Fixes: 37ed9493699c ("rtnetlink: add RTNH_F_EXTERNAL flag for fib offload") Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that the 8021q module propagates the call to
ndo_bridge_getlink() via switchdev ops, but the switch driver doesn't
recognize the netdev, as it's not offloaded.
While we can ignore calls being made to non-bridge ports inside the
driver, a better fix would be to push this check up to the switchdev
layer.
Note that these ndos can be called for non-bridged netdev, but this only
happens in certain PF drivers which don't call the corresponding
switchdev functions anyway.
Fixes: 99f44bb3527b ("mlxsw: spectrum: Enable L3 interfaces on top of bridge devices") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Tested-by: Tamir Winetroub <tamirw@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The problem is that netdev_all_lower_get_next_rcu() never advances the
iterator, thereby causing the loop over the lower adjacency list to run
forever.
Fix this by advancing the iterator, thereby avoiding the infinite loop.
Fixes: 7ce856aaaf13 ("mlxsw: spectrum: Add couple of lower device helper functions") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reported-by: Tamir Winetroub <tamirw@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
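The class of bug can be shown with a generic "get next" helper over a singly linked list. This is a hedged sketch: `struct node`, `get_next()` and `count_nodes()` are hypothetical names and are not the netdev adjacency list API.

```c
#include <stddef.h>

/* Hypothetical list node, used only for this illustration. */
struct node {
    int value;
    struct node *next;
};

/*
 * Return the current element and move *iter forward, or return NULL at
 * the end of the list. Without the assignment to *iter, a caller looping
 * on this helper would see the same element forever.
 */
struct node *get_next(struct node **iter)
{
    struct node *cur = *iter;

    if (!cur)
        return NULL;
    *iter = cur->next; /* advance the iterator */
    return cur;
}

/* Walk the list with the helper; terminates because get_next() advances. */
int count_nodes(struct node *head)
{
    struct node *iter = head;
    int n = 0;

    while (get_next(&iter))
        n++;
    return n;
}
```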
Eric Garver [Mon, 17 Oct 2016 20:30:12 +0000 (16:30 -0400)]
flow_dissector: Check skb for VLAN only if skb specified.
Fixes a panic when calling eth_get_headlen(). Noticed on i40e driver.
Fixes: d5709f7ab776 ("flow_dissector: For stripped vlan, get vlan info from skb->vlan_tci") Signed-off-by: Eric Garver <e@erig.me> Reviewed-by: Jakub Sitnicki <jkbs@redhat.com> Acked-by: Amir Vadai <amir@vadai.me> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Mon, 17 Oct 2016 15:17:51 +0000 (15:17 +0000)]
qed: Use list_move_tail instead of list_del/list_add_tail
Use list_move_tail() instead of list_del() + list_add_tail().
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:16:15 +0000 (00:16 +0200)]
rocker: fix maybe-uninitialized warning
In some rare configurations, we get a warning about the 'index' variable
being used without an initialization:
drivers/net/ethernet/rocker/rocker_ofdpa.c: In function ‘ofdpa_port_fib_ipv4.isra.16.constprop’:
drivers/net/ethernet/rocker/rocker_ofdpa.c:2425:92: warning: ‘index’ may be used uninitialized in this function [-Wmaybe-uninitialized]
This is a false positive; the logic is just a bit too complex for gcc
to follow here. Moving the initialization of 'index' a little further
down makes it clear to gcc that the function always returns an error
if it is not initialized.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 17 Oct 2016 22:16:09 +0000 (00:16 +0200)]
net/hyperv: avoid uninitialized variable
The hdr_offset variable is only initialized if we deal with a TCP or
UDP packet, but as the check surrounding its usage tests for
skb_is_gso() instead, the compiler has no idea if the variable is
initialized or not at that point:
drivers/net/hyperv/netvsc_drv.c: In function ‘netvsc_start_xmit’:
drivers/net/hyperv/netvsc_drv.c:494:42: error: ‘hdr_offset’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
This adds an additional check for the transport type, which
tells the compiler that this path cannot happen. Since the
get_net_transport_info() function should always be inlined
here, I don't expect this to result in additional runtime
checks.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
gcc found a reference to an uninitialized variable in the error handling
of bcm_enet_open, introduced by a recent cleanup:
drivers/net/ethernet/broadcom/bcm63xx_enet.c: In function 'bcm_enet_open'
drivers/net/ethernet/broadcom/bcm63xx_enet.c:1129:2: warning: 'phydev' may be used uninitialized in this function [-Wmaybe-uninitialized]
This makes the use of that variable conditional, so we only reference it
here after it has been used before. Unlike my normal patches, I have not
build-tested this one, as I don't currently have mips test in my
randconfig setup.
Fixes: 625eb8667d6f ("net: ethernet: broadcom: bcm63xx: use phydev from struct net_device") Cc: Philippe Reynes <tremyfr@gmail.com> Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Mon, 17 Oct 2016 20:28:10 +0000 (15:28 -0500)]
ibmvnic: Update MTU after device initialization
It is possible for the MTU to be changed during the initialization
process with the VNIC Server. Ensure that the net device is updated
to reflect the new MTU.
Signed-off-by: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: multicast: restore perm router ports on multicast enable
Satish reported a problem with the perm multicast router ports not getting
re-enabled after a certain series of events. In particular, if multicast
snooping has been disabled and the port goes to disabled state, it is
deleted from the router port list. If it then moves into a non-disabled
state, it is not re-added because mcast snooping is still disabled, and
enabling snooping later does nothing.
Here are the steps to reproduce. Set up br0 with snooping enabled and eth1
added as a perm router (multicast_router = 2):
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
<empty>
At this point we have mcast enabled and eth1 as a perm router (value = 2)
but it is not in the router list which is incorrect.
After this change:
1. $ echo 0 > /sys/class/net/br0/bridge/multicast_snooping
2. $ ip l set eth1 down
^ This step deletes the interface from the router list
3. $ ip l set eth1 up
^ This step does not add it again because mcast snooping is disabled
4. $ echo 1 > /sys/class/net/br0/bridge/multicast_snooping
5. $ bridge -d -s mdb show
router ports on br0: eth1
Note: we can directly do br_multicast_enable_port for all because the
querier timer already has checks for the port state and will simply
expire if it's in blocking/disabled. See the comment added by
commit 9aa66382163e7 ("bridge: multicast: add a comment to
br_port_state_selection about blocking state")
Fixes: 561f1103a2b7 ("bridge: Add multicast_snooping sysfs toggle") Reported-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The newly added nft_range_eval() function handles the two possible
nft range operations, but as the compiler warning points out,
any unexpected value would lead to the 'mismatch' variable being
used without being initialized:
net/netfilter/nft_range.c: In function 'nft_range_eval':
net/netfilter/nft_range.c:45:5: error: 'mismatch' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This removes the variable in question and instead moves the
condition into the switch itself, which is potentially more
efficient than adding a bogus 'default' clause as in my
first approach, and is nicer than using the 'uninitialized_var'
macro.
Fixes: 0f3cd9b36977 ("netfilter: nf_tables: add range expression") Link: http://patchwork.ozlabs.org/patch/677114/ Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
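A sketch of the "condition inside the switch" shape follows, under assumptions: the enum values, `range_mismatch()` and the integer operands are hypothetical and simplified, and are not the nft_range implementation. No intermediate 'mismatch' variable exists, so nothing can be read uninitialized when an unexpected operation value appears.

```c
#include <stdbool.h>

/* Hypothetical names for the two range operations. */
enum range_op {
    RANGE_EQ,  /* match when the value is inside [from, to] */
    RANGE_NEQ  /* match when the value is outside [from, to] */
};

/*
 * Return true when evaluation should stop (the rule does not match).
 * Each case decides directly; any other op value falls through to the
 * final return and is treated as "no mismatch".
 */
bool range_mismatch(enum range_op op, int value, int from, int to)
{
    switch (op) {
    case RANGE_EQ:
        if (value < from || value > to)
            return true;
        break;
    case RANGE_NEQ:
        if (value >= from && value <= to)
            return true;
        break;
    }
    return false;
}
```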
Tobias Klauser [Tue, 18 Oct 2016 09:22:54 +0000 (11:22 +0200)]
tcp: Remove unused but set variable
Remove the unused but set variable icsk in listening_get_next to fix the
following GCC warning when building with 'W=1':
net/ipv4/tcp_ipv4.c: In function ‘listening_get_next’:
net/ipv4/tcp_ipv4.c:1890:31: warning: variable ‘icsk’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Tue, 18 Oct 2016 08:51:25 +0000 (14:21 +0530)]
cxgb4: Fix number of queue sets crossing the limit
Do not let the number of offload queue sets exceed
MAX_OFLD_QSETS, which would otherwise crash the driver
on machines with more cores than MAX_OFLD_QSETS.
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
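The clamp itself is simple; a sketch in plain C, assuming a hypothetical helper `ofld_qsets()` and a placeholder limit `EXAMPLE_MAX_OFLD_QSETS` whose value is made up for illustration and is not the driver's constant:

```c
/* Placeholder limit for illustration; not the driver's actual value. */
#define EXAMPLE_MAX_OFLD_QSETS 16u

/*
 * Derive the number of offload queue sets from the core count, but never
 * exceed the size of the array that will hold them.
 */
unsigned int ofld_qsets(unsigned int online_cpus)
{
    if (online_cpus > EXAMPLE_MAX_OFLD_QSETS)
        return EXAMPLE_MAX_OFLD_QSETS;
    return online_cpus;
}
```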
Niklas Cassel [Tue, 18 Oct 2016 07:20:55 +0000 (09:20 +0200)]
dwc_eth_qos: enable flow control by default
Allow autoneg to enable flow control by default.
The behavior when autoneg is off has not changed.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Jesper Nilsson <jespern@axis.com> Acked-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Niklas Cassel [Tue, 18 Oct 2016 07:20:33 +0000 (09:20 +0200)]
dwc_eth_qos: do not clear pause flags from phy_device->supported
phy_device->supported is originally set by the PHY driver.
The ethernet driver should filter phy_device->supported to only contain
flags supported by the IP.
The IP supports setting rx and tx flow control independently,
therefore SUPPORTED_Pause and SUPPORTED_Asym_Pause should not be cleared.
If the flags are cleared, pause frames cannot be enabled (even if they
are supported by the PHY).
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Signed-off-by: Jesper Nilsson <jespern@axis.com> Acked-by: Lars Persson <larper@axis.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tobias Klauser [Tue, 18 Oct 2016 07:07:29 +0000 (09:07 +0200)]
net/hsr: Remove unused but set variable
Remove the unused but set variable master_dev in check_local_dest to fix
the following GCC warning when building with 'W=1':
net/hsr/hsr_forward.c: In function ‘check_local_dest’:
net/hsr/hsr_forward.c:303:21: warning: variable ‘master_dev’ set but not used [-Wunused-but-set-variable]
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 14:26:15 +0000 (10:26 -0400)]
Merge tag 'mac80211-for-davem-2016-10-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
This is relatively small, mostly to get the SG/crypto
from stack removal fix that crashes things when VMAP
stack is used in conjunction with software crypto.
Aside from that, we have:
* a fix for AP_VLAN usage with the nl80211 frame command
* two fixes (and two preparation patches) for A-MSDU, one
to discard group-addressed (multicast) and unexpected
4-address A-MSDUs, the other to validate A-MSDU inner
MAC addresses properly to prevent controlled port bypass
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Vecera [Tue, 18 Oct 2016 06:16:03 +0000 (08:16 +0200)]
bnx2: fix locking when netconsole is used
Functions bnx2_reg_rd_ind(), bnx2_reg_wr_ind() and bnx2_ctx_wr()
can be called with IRQs disabled when netconsole is enabled. So they
should use spin_{,un}lock_irq{save,restore} instead of _bh variants.
Cc: Sony Chacko <sony.chacko@qlogic.com> Cc: Dept-HSGLinuxNICDev@qlogic.com Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 17 Oct 2016 17:03:04 +0000 (13:03 -0400)]
Merge branch 'net-driver-autoload'
Javier Martinez Canillas says:
====================
net: Fix module autoload for several platform drivers
I noticed that module autoload doesn't work in a bunch of platform
drivers in the net subsystem, and this patch series contains the fixes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: bcm_sf2: Fix module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/dsa/bcm_sf2.ko | grep alias
alias: platform:brcm-sf2
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/dsa/b53/b53_mmap.ko | grep alias
$
net: hisilicon: Fix hns_mdio module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/hisilicon//hns_mdio.ko | grep alias
alias: platform:Hi-HNS_MDIO
alias: acpi*:HISI0141:*
net: qcom/emac: Fix module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/qualcomm/emac/qcom-emac.ko | grep alias
alias: platform:qcom-emac
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Timur Tabi <timur@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns: Fix hns_dsaf module autoload for OF registration
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/hisilicon/hns/hns_dsaf.ko | grep alias
alias: acpi*:HISI00B2:*
alias: acpi*:HISI00B1:*
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/aurora/nb8800.ko | grep alias
$
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Acked-by: Mans Rullgard <mans@mansr.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the driver is built as a module, autoload won't work because the module
alias information is not filled. So user-space can't match the registered
device with the corresponding module.
Export the module alias information using the MODULE_DEVICE_TABLE() macro.
Before this patch:
$ modinfo drivers/net/ethernet/ezchip/nps_enet.ko | grep alias
$
After this patch:
$ modinfo drivers/net/ethernet/ezchip/nps_enet.ko | grep alias
alias: of:N*T*Cezchip,nps-mgt-enetC*
alias: of:N*T*Cezchip,nps-mgt-enet
Signed-off-by: Javier Martinez Canillas <javier@osg.samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Use nft_parse_u32_check() to make sure we don't get a value that exceeds
the unsigned 8-bit integer range. Moreover, make sure this value is one
of the two supported range comparison modes.
Dan Carpenter [Wed, 12 Oct 2016 06:09:12 +0000 (09:09 +0300)]
netfilter: nft_exthdr: fix error handling in nft_exthdr_init()
"err" needs to be signed for the error handling to work.
Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
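Why the signedness matters can be shown with a small userspace sketch. `parse_stub()`, `stored_unsigned()` and `init_checked()` are hypothetical helpers and are not the nft_exthdr code.

```c
#include <errno.h>

/* Hypothetical parser: returns 0 on success or a negative error code. */
int parse_stub(int fail)
{
    return fail ? -ERANGE : 0;
}

/*
 * Buggy shape: the result is kept in an unsigned variable. A negative
 * error code becomes a large positive number, so a later "err < 0" test
 * can never be true and the failure goes unnoticed.
 */
unsigned int stored_unsigned(int fail)
{
    return (unsigned int)parse_stub(fail);
}

/* Fixed shape: keep the result signed so the error check works. */
int init_checked(int fail)
{
    int err = parse_stub(fail);

    if (err < 0)
        return err;
    return 0;
}
```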
Dan Carpenter [Wed, 12 Oct 2016 09:14:29 +0000 (12:14 +0300)]
netfilter: nf_tables: underflow in nft_parse_u32_check()
We don't want to allow negatives here.
Fixes: 36b701fae12a ('netfilter: nf_tables: validate maximum value of u32 netlink attributes') Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
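One way such an underflow can arise is sketched below, assuming hypothetical helpers `check_signed()` and `check_unsigned()` rather than the real function. The conversion of an out-of-range value to int is implementation-defined in C; the comment describes the common two's-complement result.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Buggy shape: the 32-bit attribute value is held in a signed int. On
 * common two's-complement targets 0xffffffff becomes -1, which passes
 * the "val > max" test even though it is far above the limit.
 */
int check_signed(uint32_t raw, int max)
{
    int val = (int)raw;

    return val > max ? -ERANGE : 0;
}

/* Fixed shape: compare as unsigned, so large values are rejected. */
int check_unsigned(uint32_t raw, uint32_t max)
{
    return raw > max ? -ERANGE : 0;
}
```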
Liping Zhang [Wed, 12 Oct 2016 13:09:22 +0000 (21:09 +0800)]
netfilter: xt_ipcomp: add "ip[6]t_ipcomp" module alias name
Otherwise, the user cannot add related rules if xt_ipcomp.ko is not loaded:
# iptables -A OUTPUT -p 108 -m ipcomp --ipcompspi 1
iptables: No chain/target/match by that name.
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>