bridge: mcast: give fast leave precedence over multicast router and querier
When fast leave is configured on a bridge port and an IGMP leave is
received for a group, the group is not deleted immediately if there is
a router detected or if multicast querier is configured.
Ideally the group should be deleted immediately when fast leave is
configured.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: Fix network header pointer for vlan tagged packets
There are several devices that can receive vlan tagged packets with
CHECKSUM_PARTIAL like tap, possibly veth and xennet.
When (multiple) vlan tagged packets with CHECKSUM_PARTIAL are forwarded
by bridge to a device with the IP_CSUM feature, they end up with checksum
error because before entering bridge, the network header is set to
ETH_HLEN (not including vlan header length) in __netif_receive_skb_core(),
get_rps_cpu(), or drivers' rx functions, and nobody fixes the pointer later.
Since the network header is exepected to be ETH_HLEN in flow-dissection
and hash-calculation in RPS in rx path, and since the header pointer fix
is needed only in tx path, set the appropriate network header on forwarding
packets.
Signed-off-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
tpacket_fill_skb() can return a negative value (-errno) which
is stored in tp_len variable. In that case the following
condition will be (but shouldn't be) true:
tp_len > dev->mtu + dev->hard_header_len
as dev->mtu and dev->hard_header_len are both unsigned.
That may lead to just returning an incorrect EMSGSIZE errno
to the user.
Fixes: 52f1454f629fa ("packet: allow to transmit +4 byte in TX_RING slot for VLAN case") Signed-off-by: Alexander Drozdov <al.drozdov@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Mon, 27 Jul 2015 09:33:50 +0000 (11:33 +0200)]
arp: filter NOARP neighbours for SIOCGARP
When arp is off on a device, and ioctl(SIOCGARP) is queried,
a buggy answer is given with MAC address of the device, instead
of the mac address of the destination/gateway.
We filter out NUD_NOARP neighbours for /proc/net/arp,
we must do the same for SIOCGARP ioctl.
lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# cat /proc/net/arp # check arp table is now 'empty'
IP address HW type Flags HW address Mask Device
lpaa23:~# ./arp 10.246.7.190
MAC=00:1a:11:c3:0d:7f // buggy answer before patch (this is eth0 mac)
After patch :
lpaa23:~# ip link set dev eth0 arp off
lpaa23:~# ./arp 10.246.7.190
ioctl(SIOCGARP) failed: No such device or address
Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Vytautas Valancius <valas@google.com> Cc: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ward [Sun, 26 Jul 2015 16:18:58 +0000 (12:18 -0400)]
net/ipv4: suppress NETDEV_UP notification on address lifetime update
This notification causes the FIB to be updated, which is not needed
because the address already exists, and more importantly it may undo
intentional changes that were made to the FIB after the address was
originally added. (As a point of comparison, when an address becomes
deprecated because its preferred lifetime expired, a notification on
this chain is not generated.)
The motivation for this commit is fixing an incompatibility between
DHCP clients which set and update the address lifetime according to
the lease, and a commercial VPN client which replaces kernel routes
in a way that outbound traffic is sent only through the tunnel (and
disconnects if any further route changes are detected via netlink).
Signed-off-by: David Ward <david.ward@ll.mit.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: stp: when using userspace stp stop kernel hello and hold timers
These should be handled only by the respective STP which is in control.
They become problematic for devices with limited resources with many
ports because the hold_timer is per port and fires each second and the
hello timer fires each 2 seconds even though it's global. While in
user-space STP mode these timers are completely unnecessary so it's better
to keep them off.
Also ensure that when the bridge is up these timers are started only when
running with kernel STP.
Signed-off-by: Satish Ashok <sashok@cumulusnetworks.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lars Westerhoff [Mon, 27 Jul 2015 22:32:21 +0000 (01:32 +0300)]
packet: missing dev_put() in packet_do_bind()
When binding a PF_PACKET socket, the use count of the bound interface is
always increased with dev_hold in dev_get_by_{index,name}. However,
when rebound with the same protocol and device as in the previous bind
the use count of the interface was not decreased. Ultimately, this
caused the deletion of the interface to fail with the following message:
unregister_netdevice: waiting for dummy0 to become free. Usage count = 1
This patch moves the dev_put out of the conditional part that was only
executed when either the protocol or device changed on a bind.
Fixes: 902fefb82ef7 ('packet: improve socket create/bind latency in some cases') Signed-off-by: Lars Westerhoff <lars.westerhoff@newtec.eu> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reviewed-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Mon, 27 Jul 2015 20:08:06 +0000 (13:08 -0700)]
fib_trie: Drop unnecessary calls to leaf_pull_suffix
It was reported that update_suffix was taking a long time on systems where
a large number of leaves were attached to a single node. As it turns out
fib_table_flush was calling update_suffix for each leaf that didn't have all
of the aliases stripped from it. As a result, on this large node removing
one leaf would result in us calling update_suffix for every other leaf on
the node.
The fix is to just remove the calls to leaf_pull_suffix since they are
redundant as we already have a call in resize that will go through and
update the suffix length for the node before we exit out of
fib_table_flush or fib_table_flush_external.
Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 25 Jul 2015 20:38:02 +0000 (22:38 +0200)]
net: fec: Ensure clocks are enabled while using mdio bus
When a switch is attached to the mdio bus, the mdio bus can be used
while the interface is not open. If the IPG clock is not enabled, MDIO
reads/writes will simply time out.
Add support for runtime PM to control this clock. Enable/disable this
clock using runtime PM, with open()/close() and mdio read()/write()
function triggering runtime PM operations. Since PM is optional, the
IPG clock is enabled at probe and is no longer modified by
fec_enet_clk_enable(), thus if PM is not enabled in the kernel, it is
guaranteed the clock is running when MDIO operations are performed.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Lucas Stach <l.stach@pengutronix.de> Cc: tyler.baker@linaro.org Cc: fabio.estevam@freescale.com Cc: shawn.guo@linaro.org Tested-by: Fabio Estevam <fabio.estevam@freescale.com> Tested-by: Tyler Baker <tyler.baker@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: netcp: Fixes SGMII reset on network interface shutdown
This patch asserts SGMII RTRESET, i.e. resetting the SGMII Tx/Rx
logic, during network interface shutdown to avoid having the
hardware wedge when shutting down with high incoming traffic rates.
This is cleared (brought out of RTRESET) when the interface is
brought back up.
Signed-off-by: WingMan Kwok <w-kwok2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Fri, 24 Jul 2015 18:24:02 +0000 (21:24 +0300)]
net/macb: suppress compiler warnings
This patch fixes the following warnings:
drivers/net/ethernet/cadence/macb.c: In function ‘macb_handle_link_change’:
drivers/net/ethernet/cadence/macb.c:266: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:267: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c:291: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_update_stats’:
drivers/net/ethernet/cadence/macb.c:1908: warning: comparison between signed and unsigned
drivers/net/ethernet/cadence/macb.c: In function ‘gem_get_ethtool_strings’:
drivers/net/ethernet/cadence/macb.c:1988: warning: comparison between signed and unsigned
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Fri, 24 Jul 2015 18:24:00 +0000 (21:24 +0300)]
net/macb: check if macb_config present
The commit 98b5a0f4a228 introduces jumbo frame support, but also it assumes
that macb_config present which is not always true.
The configuration without macb_config fails to boot.
Unable to handle kernel NULL pointer dereference at virtual address 00000010
ptbr = 90350000 pgd = 00000000
Oops: Kernel access of bad area, sig: 11 [#1]
FRAME_POINTER chip: 0x01f:0x1e82 rev 2
Modules linked in:
CPU: 0 PID: 1 Comm: swapper Not tainted 4.2.0-rc3-next-20150723+ #13
task: 91c26000 ti: 91c28000 task.ti: 91c28000
PC is at macb_probe+0x140/0x61c
Fixes: 98b5a0f4a228 (net: macb: Add support for jumbo frames) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Fri, 24 Jul 2015 18:23:59 +0000 (21:23 +0300)]
net/macb: improve big endian CPU support
The commit a50dad355a53 (net: macb: Add big endian CPU support) converted I/O
accessors to readl_relaxed() and writel_relaxed() and consequentially broke
MACB driver on AVR32 platforms such as ATNGW100.
This patch improves I/O access by checking endiannes first and use the
corresponding methods.
Fixes: a50dad355a53 (net: macb: Add big endian CPU support) Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
- Disable U1/U2 during initialization.
- Disable lpm when linking is on, and enable it when linking is off.
- Disable U1/U2 when enabling runtime suspend.
It is possible to let hw stop working, if the U1/U2 request occurs
during some situations. The patch is used to avoid it.
Signed-off-by: Hayes Wang <hayeswang@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Jul 2015 04:53:08 +0000 (21:53 -0700)]
Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth
Johan Hedberg says:
====================
pull request: bluetooth 2015-07-23
Here's another one-patch pull request for 4.2 which targets a potential
NULL pointer dereference in the LE Security Manager code that can be
triggered by using older user space tools. The issue has been there
since 4.0 so there's the appropriate "Cc: stable" in place.
Let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
niu: don't count tx error twice in case of headroom realloc fails
Fixes: a3138df9 ("[NIU]: Add Sun Neptune ethernet driver.") Signed-off-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
A second issue results in softlockup since the evictor may restart the
eviction loop for a (potentially) unlimited number of times while local
softirqs are disabled.
Frank reports that test system remained stable for 14 hours of testing
(before, crash occured within half an hour in their setup).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
inet: frags: remove INET_FRAG_EVICTED and use list_evictor for the test
We can simply remove the INET_FRAG_EVICTED flag to avoid all the flags
race conditions with the evictor and use a participation test for the
evictor list, when we're at that point (after inet_frag_kill) in the
timer there're 2 possible cases:
1. The evictor added the entry to its evictor list while the timer was
waiting for the chainlock
or
2. The timer unchained the entry and the evictor won't see it
In both cases we should be able to see list_evictor correctly due
to the sync on the chainlock.
Joint work with Florian Westphal.
Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
inet: frag: don't wait for timer deletion when evicting
Frank reports 'NMI watchdog: BUG: soft lockup' errors when
load is high. Instead of (potentially) unbounded restarts of the
eviction process, just skip to the next entry.
One caveat is that, when a netns is exiting, a timer may still be running
by the time inet_evict_bucket returns.
We use the frag memory accounting to wait for outstanding timers,
so that when we free the percpu counter we can be sure no running
timer will trip over it.
Reported-and-tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
commit 65ba1f1ec0eff ("inet: frags: fix a race between inet_evict_bucket
and inet_frag_kill") describes the bug, but the fix doesn't work reliably.
Problem is that ->flags member can be set on other cpu without chainlock
being held by that task, i.e. the RMW-Cycle can clear INET_FRAG_EVICTED
bit after we put the element on the evictor private list.
We can crash when walking the 'private' evictor list since an element can
be deleted from list underneath the evictor.
Join work with Nikolay Alexandrov.
Fixes: b13d3cbfb8e8 ("inet: frag: move eviction of queues to work queue") Reported-by: Johan Schuijt <johan@transip.nl> Tested-by: Frank Schreuder <fschreuder@transip.nl> Signed-off-by: Nikolay Alexandrov <nikolay@cumulusnetworks.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 22 Jul 2015 14:31:49 +0000 (16:31 +0200)]
net: sctp: stop spamming klog with rfc6458, 5.3.2. deprecation warnings
Back then when we added support for SCTP_SNDINFO/SCTP_RCVINFO from
RFC6458 5.3.4/5.3.5, we decided to add a deprecation warning for the
(as per RFC deprecated) SCTP_SNDRCV via commit bbbea41d5e53 ("net:
sctp: deprecate rfc6458, 5.3.2. SCTP_SNDRCV support"), see [1].
Imho, it was not a good idea, and we should just revert that message
for a couple of reasons:
1) It's uapi and therefore set in stone forever.
2) To be able to run on older and newer kernels, an SCTP application
would need to probe for both, SCTP_SNDRCV, but also SCTP_SNDINFO/
SCTP_RCVINFO support, so that on older kernels, it can make use
of SCTP_SNDRCV, and on newer kernels SCTP_SNDINFO/SCTP_RCVINFO.
In my (limited) experience, a lot of SCTP appliances are migrating
to newer kernels only ve(ee)ry slowly.
3) Some people don't have the chance to change their applications,
f.e. due to proprietary legacy stuff. So, they'll hit this warning
in fast path and are stuck with older kernels.
But i.e. due to point 1) I really fail to see the benefit of a warning.
So just revert that for now, the issue was reported up Jamal.
Reported-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Michael Tuexen <tuexen@fh-muenster.de> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 26 Jul 2015 23:29:26 +0000 (16:29 -0700)]
Merge branch 'mlx4-fixes'
Or Gerlitz says:
====================
mlx4 driver fixes, July 22nd 2015
Just few mlx4 fixes.. the fix related to propagating port state
changes to VF should go to -stable of >= 3.11, all the rest just
to 4.2-rc
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4_en: Remove BUG_ON assert when checking if ring is full
In mlx4_en_is_ring_empty we check if ring surpassed its size.
Since the prod and cons indicators are u32, there might be a state where
prod wrapped around and cons, making this assert false, although no
actual bug exists (other code segment can cope with this state).
Signed-off-by: Ido Shamay <idos@mellanox.com> Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Wed, 22 Jul 2015 13:53:48 +0000 (16:53 +0300)]
net/mlx4_core: Relieve cpu load average on the port sending flow
When a port is not attached, the FW requires a longer than usual time to
execute the SENSE_PORT command. In the command flow, the
wait_for_completion_timeout call used in mlx4_cmd_wait puts the kernel
thread into the uninterruptible state during this time. This, in turn,
due to the computation method, causes the CPU load average to increase.
Fix this by using wait_for_completion_interruptible_timeout() for the
SENSE_PORT command, which puts the thread in the interruptible state.
In this state, the thread does not contribute to the CPU load average.
Treat the interrupted case as if the SENSE_PORT command returned
port_type = NONE.
Fix suggested by Gideon Naim <gideonn@mellanox.com> and
Bart Van Assche <bart.vanassche@sandisk.com>.
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Wed, 22 Jul 2015 13:53:47 +0000 (16:53 +0300)]
net/mlx4_core: Fix wrong index in propagating port change event to VFs
The port-change event processing in procedure mlx4_eq_int() uses "slave"
as the vf_oper array index. Since the value of "slave" is the PF function
index, the result is that the PF link state is used for deciding to
propagate the event for all the VFs. The VF link state should be used,
so the VF function index should be used here.
Fixes: 948e306d7d64 ('net/mlx4: Add VF link state support') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Or Gerlitz [Wed, 22 Jul 2015 13:53:46 +0000 (16:53 +0300)]
net/mlx4_core: Use sink counter for the VF default as fallback
Some old PF drivers don't let VFs allocate counters, in that case, use
the sink counter so the VF can load and operate properly.
Fixes: 6de5f7f6a1fa ('net/mlx4_core: Allocate default counter per port') Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since slave_changelink support was added there have been a few race
conditions when using br_setport() since some of the port functions it
uses require the bridge lock. It is very easy to trigger a lockup due to
some internal spin_lock() usage without bh disabled, also it's possible to
get the bridge into an inconsistent state.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: 3ac636b8591c ("bridge: implement rtnl_link_ops->slave_changelink") Reviewed-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset contains 2 changes for the alternative routes,
one to add tb_id/fa_slen check needed after the recent
fib_trie optimizations for fib aliases and the second
change attempts to support alternative routes with TOS
requirement.
Sorry that I don't have access to the original
report from Hagen Paul Pfeifer. I hope he will see this
change.
The second change adds fa_default field to the
fib aliases (which can be many) and if the feature to
filter the alternative routes by TOS is not worth it,
this second patch can be scrapped.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
fib_select_default considers alternative routes only when
res->fi is for the first alias in res->fa_head. In the
common case this can happen only when the initial lookup
matches the first alias with highest TOS value. This
prevents the alternative routes to require specific TOS.
This patch solves the problem as follows:
- routes that require specific TOS should be returned by
fib_select_default only when TOS matches, as already done
in fib_table_lookup. This rule implies that depending on the
TOS we can have many different lists of alternative gateways
and we have to keep the last used gateway (fa_default) in first
alias for the TOS instead of using single tb_default value.
- as the aliases are ordered by many keys (TOS desc,
fib_priority asc), we restrict the possible results to
routes with matching TOS and lowest metric (fib_priority)
and routes that match any TOS, again with lowest metric.
For example, packet with TOS 8 can not use gw3 (not lowest
metric), gw4 (different TOS) and gw6 (not lowest metric),
all other gateways can be used:
tos 8 via gw1 metric 2 <--- res->fa_head and res->fi
tos 8 via gw2 metric 2
tos 8 via gw3 metric 3
tos 4 via gw4
tos 0 via gw5
tos 0 via gw6 metric 1
Reported-by: Hagen Paul Pfeifer <hagen@jauu.net> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
fib_trie starting from 4.1 can link fib aliases from
different prefixes in same list. Make sure the alternative
gateways are in same table and for same prefix (0) by
checking tb_id and fa_slen.
Fixes: 79e5ad2ceb00 ("fib_trie: Remove leaf_info") Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hedberg [Mon, 20 Jul 2015 17:31:25 +0000 (20:31 +0300)]
Bluetooth: Fix NULL pointer dereference in smp_conn_security
The l2cap_conn->smp pointer may be NULL for various valid reasons where SMP has
failed to initialize properly. One such scenario is when crypto support is
missing, another when the adapter has been powered on through a legacy method.
The smp_conn_security() function should have the appropriate check for this
situation to avoid NULL pointer dereferences.
1) Don't use shared bluetooth antenna in iwlwifi driver for management
frames, from Emmanuel Grumbach.
2) Fix device ID check in ath9k driver, from Felix Fietkau.
3) Off by one in xen-netback BUG checks, from Dan Carpenter.
4) Fix IFLA_VF_PORT netlink attribute validation, from Daniel Borkmann.
5) Fix races in setting peeked bit flag in SKBs during datagram
receive. If it's shared we have to clone it otherwise the value can
easily be corrupted. Fix from Herbert Xu.
7) Fix use after free in fq_codel and sfq packet schedulers, from WANG
Cong.
8) ipvlan bug fixes (memory leaks, missing rcu_dereference_bh, etc.)
from WANG Cong and Konstantin Khlebnikov.
9) Memory leak in act_bpf packet action, from Alexei Starovoitov.
10) ARM bpf JIT bug fixes from Nicolas Schichan.
11) Fix backwards compat of ANY_LAYOUT in virtio_net driver, from
Michael S Tsirkin.
12) Destruction of bond with different ARP header types not handled
correctly, fix from Nikolay Aleksandrov.
13) Revert GRO receive support in ipv6 SIT tunnel driver, causes
regressions because the GRO packets created cannot be processed
properly on the GSO side if we forward the frame. From Herbert Xu.
14) TCCR update race and other fixes to ravb driver from Sergei
Shtylyov.
15) Fix SKB leaks in caif_queue_rcv_skb(), from Eric Dumazet.
16) Fix panics on packet scheduler filter replace, from Daniel Borkmann.
17) Make sure AF_PACKET sees properly IP headers in defragmented frames
(via PACKET_FANOUT_FLAG_DEFRAG option), from Edward Hyunkoo Jee.
18) AF_NETLINK cannot hold mutex in RCU callback, fix from Florian
Westphal.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (84 commits)
ravb: fix ring memory allocation
net: phy: dp83867: Fix warning check for setting the internal delay
openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
netlink: don't hold mutex in rcu callback when releasing mmapd ring
ARM: net: fix vlan access instructions in ARM JIT.
ARM: net: handle negative offsets in BPF JIT.
ARM: net: fix condition for load_order > 0 when translating load instructions.
tcp: suppress a division by zero warning
drivers: net: cpsw: remove tx event processing in rx napi poll
inet: frags: fix defragmented packet's IP header for af_packet
net: mvneta: fix refilling for Rx DMA buffers
stmmac: fix setting of driver data in stmmac_dvr_probe
sched: cls_flow: fix panic on filter replace
sched: cls_flower: fix panic on filter replace
sched: cls_bpf: fix panic on filter replace
net/mdio: fix mdio_bus_match for c45 PHY
net: ratelimit warnings about dst entry refcount underflow or overflow
caif: fix leaks and race in caif_queue_rcv_skb()
qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
ravb: fix race updating TCCR
...
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull ARM64 fixes from Catalin Marinas:
- arm64 build fix following the move of the thread_struct to the end of
task_struct and the asm offsets becoming too large for the AArch64
ISA
- preparatory patch for moving irq_data struct members (applied now to
reduce dependency for the next merging window)
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
ARM64/irq: Use access helper irq_data_get_affinity_mask()
arm64: switch_to: calculate cpu context pointer using separate register
Joe Stringer [Wed, 22 Jul 2015 04:37:31 +0000 (21:37 -0700)]
netfilter: nf_conntrack: Support expectations in different zones
When zones were originally introduced, the expectation functions were
all extended to perform lookup using the zone. However, insertion was
not modified to check the zone. This means that two expectations which
are intended to apply for different connections that have the same tuple
but exist in different zones cannot both be tracked.
Fixes: 5d0aa2ccd4 (netfilter: nf_conntrack: add support for "conntrack zones") Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Will Deacon [Mon, 20 Jul 2015 14:14:53 +0000 (15:14 +0100)]
arm64: switch_to: calculate cpu context pointer using separate register
Commit 0c8c0f03e3a2 ("x86/fpu, sched: Dynamically allocate 'struct fpu'")
moved the thread_struct to the bottom of task_struct. As a result, the
offset is now too large to be used in an immediate add on arm64 with
some kernel configs:
arch/arm64/kernel/entry.S: Assembler messages:
arch/arm64/kernel/entry.S:588: Error: immediate out of range
arch/arm64/kernel/entry.S:597: Error: immediate out of range
This patch calculates the offset using an additional register instead of
an immediate offset.
Sergei Shtylyov [Tue, 21 Jul 2015 22:31:59 +0000 (01:31 +0300)]
ravb: fix ring memory allocation
The driver is written as if it can adapt to a low memory situation allocating
less RX skbs and TX aligned buffers than the respective RX/TX ring sizes. In
reality though the driver would malfunction in this case. Stop being overly
smart and just fail in such situation -- this is achieved by moving the memory
allocation from ravb_ring_format() to ravb_ring_init().
We leave dma_map_single() calls in place but make their failure non-fatal
by marking the corresponding RX descriptors with zero data size which should
prevent DMA to an invalid addresses.
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Murphy [Tue, 21 Jul 2015 17:06:45 +0000 (12:06 -0500)]
net: phy: dp83867: Fix warning check for setting the internal delay
Fix warning: logical ‘or’ of collectively exhaustive tests is always true
Change the internal delay check from an 'or' condition to an 'and'
condition.
Reported-by: David Binderman <dcb314@hotmail.com> Signed-off-by: Dan Murphy <dmurphy@ti.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
openvswitch: allocate nr_node_ids flow_stats instead of num_possible_nodes
Some architectures like POWER can have a NUMA node_possible_map that
contains sparse entries. This causes memory corruption with openvswitch
since it allocates flow_cache with a multiple of num_possible_nodes() and
assumes the node variable returned by for_each_node will index into
flow->stats[node].
Use nr_node_ids to allocate a maximal sparse array instead of
num_possible_nodes().
The crash was noticed after 3af229f2 was applied as it changed the
node_possible_map to match node_online_map on boot. Fixes: 3af229f2071f5b5cb31664be6109561fbe19c861 Signed-off-by: Chris J Arges <chris.j.arges@canonical.com> Acked-by: Pravin B Shelar <pshelar@nicira.com> Acked-by: Nishanth Aravamudan <nacc@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The socket should be dead at this point. It might be simpler to
add a netlink_release_ring() function which doesn't require
locking at all.
Reported-by: "Kirill A. Shutemov" <kirill@shutemov.name> Diagnosed-by: Cong Wang <cwang@twopensource.com> Suggested-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 22 Jul 2015 05:19:55 +0000 (22:19 -0700)]
Merge branch 'arm-bpf-fixes'
Nicolas Schichan says:
====================
BPF JIT fixes for ARM
These patches are fixing bugs in the ARM JIT and should probably find
their way to a stable kernel. All 60 test_bpf tests in Linux 4.1 release
are now passing OK (was 54 out of 60 before).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Tue, 21 Jul 2015 12:14:14 +0000 (14:14 +0200)]
ARM: net: fix vlan access instructions in ARM JIT.
This makes BPF_ANC | SKF_AD_VLAN_TAG and BPF_ANC | SKF_AD_VLAN_TAG_PRESENT
have the same behaviour as the in kernel VM and makes the test_bpf LD_VLAN_TAG
and LD_VLAN_TAG_PRESENT tests pass.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Tue, 21 Jul 2015 12:14:13 +0000 (14:14 +0200)]
ARM: net: handle negative offsets in BPF JIT.
Previously, the JIT would reject negative offsets known during code
generation and mishandle negative offsets provided at runtime.
Fix that by calling bpf_internal_load_pointer_neg_helper()
appropriately in the jit_get_skb_{b,h,w} slow path helpers and by forcing
the execution flow to the slow path helpers when the offset is
negative.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Nicolas Schichan [Tue, 21 Jul 2015 12:14:12 +0000 (14:14 +0200)]
ARM: net: fix condition for load_order > 0 when translating load instructions.
To check whether the load should take the fast path or not, the code
would check that (r_skb_hlen - load_order) is greater than the offset
of the access using an "Unsigned higher or same" condition. For
halfword accesses and an skb length of 1 at offset 0, that test is
valid, as we end up comparing 0xffffffff(-1) and 0, so the fast path
is taken and the filter allows the load to wrongly succeed. A similar
issue exists for word loads at offset 0 and an skb length of less than
4.
Fix that by using the condition "Signed greater than or equal"
condition for the fast path code for load orders greater than 0.
Signed-off-by: Nicolas Schichan <nschichan@freebox.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 22 Jul 2015 05:02:00 +0000 (07:02 +0200)]
tcp: suppress a division by zero warning
Andrew Morton reported following warning on one ARM build
with gcc-4.4 :
net/ipv4/inet_hashtables.c: In function 'inet_ehash_locks_alloc':
net/ipv4/inet_hashtables.c:617: warning: division by zero
Even guarded with a test on sizeof(spinlock_t), compiler does not
like current construct on a !CONFIG_SMP build.
Remove the warning by using a temporary variable.
Fixes: 095dc8e0c368 ("tcp: fix/cleanup inet_ehash_locks_alloc()") Reported-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kinglong Mee reports a memory leak with that patch, and Jan Kara confirms:
"Thanks for report! You are right that my patch introduces a race
between fsnotify kthread and fsnotify_destroy_group() which can result
in leaking inotify event on group destruction.
I haven't yet decided whether the right fix is not to queue events for
dying notification group (as that is pointless anyway) or whether we
should just fix the original problem differently... Whenever I look
at fsnotify code mark handling I get lost in the maze of locks, lists,
and subtle differences between how different notification systems
handle notification marks :( I'll think about it over night"
and after thinking about it, Jan says:
"OK, I have looked into the code some more and I found another
relatively simple way of fixing the original oops. It will be IMHO
better than trying to fixup this issue which has more potential for
breakage. I'll ask Linus to revert the fsnotify fix he already merged
and send a new fix"
Reported-by: Kinglong Mee <kinglongmee@gmail.com> Requested-by: Jan Kara <jack@suse.cz> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
David S. Miller [Tue, 21 Jul 2015 23:06:39 +0000 (16:06 -0700)]
Merge tag 'wireless-drivers-for-davem-2015-07-20' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers
Kalle Valo says:
====================
ath9k:
* fix device ID check for AR956x
iwlwifi:
* bug fixes specific for 8000 series
* fix a crash in time events
* fix a crash in PCIe transport
* fix BT Coex code that prevented association on certain
devices (3160).
* revert the new RBD allocation model because it introduced
a bug when running on weak VM setups.
* new device IDs
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'pinctrl-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl
Pull pin control fixes from Linus Walleij:
"Here are some overly ripe pin control fixes for the v4.2 series.
They got delayed because of various crap commits and having to clean
and rinse the patch stack a few times. Now they are however looking
good.
- some dead defines dropped from the Samsung driver, was targeted for
-rc2 but got delayed
- drop the strict mode from abx500, this was too strict
- fix the R-Car sparse IRQs code to work as intended
- fix the IRQ code for the pinctrl-single GPIO backend to not enforce
threaded IRQs
- clear the latched events/IRQs for the Broadcom BCM2835 driver
- fix up debugfs for the Freescale imx1 driver
- fix a typo bug in the Schmitt Trigger setup in the LPC18xx driver"
* tag 'pinctrl-v4.2-2' of git://git.kernel.org/pub/scm/linux/kernel/git/linusw/linux-pinctrl:
pinctrl: lpc18xx: fix schmitt trigger setup
Subject: pinctrl: imx1-core: Fix debug output in .pin_config_set callback
pinctrl: bcm2835: Clear the event latch register when disabling interrupts
pinctrl: single: ensure pcs irq will not be forced threaded
sh-pfc: fix sparse GPIOs for R-Car SoCs
pinctrl: abx500: remove strict mode
pinctrl: samsung: Remove old unused defines
Merge tag 'trace-v4.2-rc2-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing sample code fix from Steven Rostedt:
"He Kuang noticed that the sample code using the trace_event helper
function __get_dynamic_array_len() is broken.
This only changes the sample code, and I'm pushing this now instead of
later because I don't want others using the broken code as an example
when using it for real"
* tag 'trace-v4.2-rc2-fix2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Fix sample output of dynamic arrays
Mugunthan V N [Tue, 21 Jul 2015 10:30:42 +0000 (16:00 +0530)]
drivers: net: cpsw: remove tx event processing in rx napi poll
With commit c03abd84634d ("net: ethernet: cpsw: don't requests IRQs
we don't use") common isr and napi are separated into separate tx isr
and rx isr/napi, but still in rx napi tx events are handled. So removing
the tx event handling in rx napi.
Signed-off-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
inet: frags: fix defragmented packet's IP header for af_packet
When ip_frag_queue() computes positions, it assumes that the passed
sk_buff does not contain L2 headers.
However, when PACKET_FANOUT_FLAG_DEFRAG is used, IP reassembly
functions can be called on outgoing packets that contain L2 headers.
Also, IPv4 checksum is not corrected after reassembly.
Fixes: 7736d33f4262 ("packet: Add pre-defragmentation support for ipv4 fanouts.") Signed-off-by: Edward Hyunkoo Jee <edjee@google.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Willem de Bruijn <willemb@google.com> Cc: Jerry Chu <hkchu@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Simon Guinot [Sun, 19 Jul 2015 11:00:53 +0000 (13:00 +0200)]
net: mvneta: fix refilling for Rx DMA buffers
With the actual code, if a memory allocation error happens while
refilling a Rx descriptor, then the original Rx buffer is both passed
to the networking stack (in a SKB) and let in the Rx ring. This leads
to various kernel oops and crashes.
As a fix, this patch moves Rx descriptor refilling ahead of building
SKB with the associated Rx buffer. In case of a memory allocation
failure, data is dropped and the original DMA buffer is put back into
the Rx ring.
Signed-off-by: Simon Guinot <simon.guinot@sequanux.org> Fixes: c5aff18204da ("net: mvneta: driver for Marvell Armada 370/XP network unit") Cc: <stable@vger.kernel.org> # v3.8+ Tested-by: Yoann Sculo <yoann@sculo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Joachim Eastwood [Fri, 17 Jul 2015 21:48:17 +0000 (23:48 +0200)]
stmmac: fix setting of driver data in stmmac_dvr_probe
Commit 803f8fc46274b ("stmmac: move driver data setting into
stmmac_dvr_probe") mistakenly set priv and not priv->dev as
driver data. This meant that the remove, resume and suspend
callbacks that fetched and tried to use this data would most
likely explode. Fix the issue by using the correct variable.
Fixes: 803f8fc46274b ("stmmac: move driver data setting into stmmac_dvr_probe") Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 17 Jul 2015 20:38:45 +0000 (22:38 +0200)]
sched: cls_flow: fix panic on filter replace
The following test case causes a NULL pointer dereference in cls_flow:
tc filter add dev foo parent 1: handle 0x1 flow hash keys dst action ok
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
flow hash keys mark action drop
To be more precise, actually two different panics are fixed, the first
occurs because tcf_exts_init() is not called on the newly allocated
filter when we do a replace. And the second panic uncovered after that
happens since the arguments of list_replace_rcu() are swapped, the old
element needs to be the first argument and the new element the second.
Fixes: 70da9f0bf999 ("net: sched: cls_flow use RCU") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 17 Jul 2015 20:38:44 +0000 (22:38 +0200)]
sched: cls_flower: fix panic on filter replace
The following test case causes a NULL pointer dereference in cls_flower:
tc filter add dev foo parent 1: flower eth_type ipv4 action ok flowid 1:1
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
flower eth_type ipv6 action ok flowid 1:1
The problem is that commit 77b9900ef53a ("tc: introduce Flower classifier")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.
Fixes: 77b9900ef53a ("tc: introduce Flower classifier") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 17 Jul 2015 20:38:43 +0000 (22:38 +0200)]
sched: cls_bpf: fix panic on filter replace
The following test case causes a NULL pointer dereference in cls_bpf:
FOO="1,6 0 0 4294967295,"
tc filter add dev foo parent 1: bpf bytecode "$FOO" flowid 1:1 action ok
tc filter replace dev foo parent 1: pref 49152 handle 0x1 \
bpf bytecode "$FOO" flowid 1:1 action drop
The problem is that commit 1f947bf151e9 ("net: sched: rcu'ify cls_bpf")
accidentally swapped the arguments of list_replace_rcu(), the old
element needs to be the first argument and the new element the second.
Fixes: 1f947bf151e9 ("net: sched: rcu'ify cls_bpf") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: John Fastabend <john.r.fastabend@intel.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 21 Jul 2015 07:17:53 +0000 (00:17 -0700)]
Merge tag 'mac80211-for-davem-2015-07-17' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Some fixes for the current cycle:
1. Arik introduced an rtnl-locked regulatory API to be able
to differentiate between place do/don't have the RTNL;
this fixes missing locking in some of the code paths
2. Two small mesh bugfixes from Bob, one to avoid treating
a certain malformed over-the-air frame and one to avoid
sending a garbage field over the air.
3. A fix for powersave during WoWLAN suspend from Krishna Chaitanya.
4. A fix for a powersave vs. aggregation teardown race, from Michal.
5. Thomas reduced the loglevel of CRDA messages to avoid spamming
the kernel log with mostly irrelevant information.
6. Tom fixed a dangling debugfs directory pointer that could cause
crashes if subsequent addition of the same interface to debugfs
failed for some reason.
7. A fix from myself for a list corruption issue in mac80211 during
combined interface shutdown/removal - shut down interfaces first
and only then remove them to avoid that.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: ratelimit warnings about dst entry refcount underflow or overflow
Kernel generates a lot of warnings when dst entry reference counter
overflows and becomes negative. That bug was seen several times at
machines with outdated 3.10.y kernels. Most like it's already fixed
in upstream. Anyway that flood completely kills machine and makes
further debugging impossible.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 17 Jul 2015 08:19:23 +0000 (10:19 +0200)]
caif: fix leaks and race in caif_queue_rcv_skb()
1) If sk_filter() is applied, skb was leaked (not freed)
2) Testing SOCK_DEAD twice is racy :
packet could be freed while already queued.
3) Remove obsolete comment about caching skb->len
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qmi_wwan: add the second QMI/network interface for Sierra Wireless MC7305/MC7355
Sierra Wireless MC7305/MC7355 with USB ID 1199:9041 also provide a
second QMI/network interface like the MC73xx with USB ID 1199:68c0 on
USB interface #10 when used in the appropriate USB configuration.
Add the corresponding QMI_FIXED_INTF entry to the qmi_wwan driver.
Please note that the second QMI/network interface is not working for
early MC73xx firmware versions like 01.08.x as the device does not
respond to QMI messages on the second /dev/cdc-wdm port.
Signed-off-by: Reinhard Speyerer <rspmn@arcor.de> Signed-off-by: David S. Miller <davem@davemloft.net>
net: netcp: fix improper initialization in netcp_ndo_open()
The keystone qmss will raise interrupt when packet arrive at the
receive queue. Only control available to avoid interrupt from happening
is to keep the free descriptor queue (FDQ) empty in the receive side.
So the filling of descriptors into the FDQ has to happen after
request_irq() call is made as part of knav_queue_enable_notify(). So
move the function netcp_rxpool_refill() after this call.
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: correct the MAC address for "follow" fail_over_mac policy
The "follow" fail_over_mac policy is useful for multiport devices that
either become confused or incur a performance penalty when multiple
ports are programmed with the same MAC address, but the same MAC
address still may happened by this steps for this policy:
1) echo +eth0 > /sys/class/net/bond0/bonding/slaves
bond0 has the same mac address with eth0, it is MAC1.
2) echo +eth1 > /sys/class/net/bond0/bonding/slaves
eth1 is backup, eth1 has MAC2.
3) ifconfig eth0 down
eth1 became active slave, bond will swap MAC for eth0 and eth1,
so eth1 has MAC1, and eth0 has MAC2.
4) ifconfig eth1 down
there is no active slave, and eth1 still has MAC1, eth2 has MAC2.
5) ifconfig eth0 up
the eth0 became active slave again, the bond set eth0 to MAC1.
Something wrong here, then if you set eth1 up, the eth0 and eth1 will have the same
MAC address, it will break this policy for ACTIVE_BACKUP mode.
This patch will fix this problem by finding the old active slave and
swap them MAC address before change active slave.
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com> Tested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Mon, 20 Jul 2015 09:55:38 +0000 (17:55 +0800)]
Revert "sit: Add gro callbacks to sit_offload"
This patch reverts 19424e052fb44da2f00d1a868cbb51f3e9f4bbb5 ("sit:
Add gro callbacks to sit_offload") because it generates packets
that cannot be handled even by our own GSO.
Reported-by: Wolfgang Walter <linux@stwm.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
net: dsa: bcm_sf2: do not use indirect reads and writes for 7445E0
7445E0 contains an ECO which disconnected the internal SF2 pseudo-PHY which was
known to conflict with the external pseudo-PHY of BCM53125 switches. This
motivated the need to utilize the internal SF2 MDIO controller via indirect
register reads/writes to control external Broadcom switches due to this address
conflict (both responded at address 30d).
For 7445E0, the internal pseudo-PHY of the SF2 switch got disconnected, and as
a consequence this prevents the internal SF2 MDIO bus controller from reading
data (reads back everything as 0) since the MDI line is tied low.
Fix this by making the indirect register reads and writes conditional to
7445D0, on 7445E0 we can utilize the SWITCH_MDIO controller (backed by
mdio-unimac and not the DSA created slave MII bus).
We utilize of_machine_is_compatible() here since this is the only way for use
to differentiate between these two chips in a way that does not violate layers
or becomes (too) vendor-specific.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: correctly handle bonding type change on enslave failure
If the bond is enslaving a device with different type it will be setup
by it, but if after being setup the enslave fails the bond doesn't
switch back its type and also keeps pointers to foreign structures that can
be long gone. Thus revert back any type changes if the enslave failed and
the bond had to change its type.
Example:
Before patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default
link/loopback 16:54:78:34:bd:41 brd 00:00:00:00:00:00
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
20: bond0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode
DEFAULT group default qlen 1000
link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
(notice the MASTER flag is gone)
After patch:
$ echo lo > bond0/bonding/slaves
-bash: echo: write error: Cannot assign requested address
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 6e:66:94:f6:07:fc brd ff:ff:ff:ff:ff:ff
$ echo +eth1 > bond0/bonding/slaves
$ ip l sh bond0
21: bond0: <BROADCAST,MULTICAST,MASTER> mtu 1500 qdisc noop state DOWN
mode DEFAULT group default qlen 1000
link/ether 52:54:00:3f:47:69 brd ff:ff:ff:ff:ff:ff
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: e36b9d16c6a6 ("bonding: clean muticast addresses when device changes type") Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix destruction of bond with devices different from arphrd_ether
When the bonding is being unloaded and the netdevice notifier is
unregistered it executes NETDEV_UNREGISTER for each device which should
remove the bond's proc entry but if the device enslaved is not of
ARPHRD_ETHER type and is in front of the bonding, it may execute
bond_release_and_destroy() first which would release the last slave and
destroy the bond device leaving the proc entry and thus we will get the
following error (with dynamic debug on for bond_netdev_event to see the
events order):
[ 908.963051] eql: event: 9
[ 908.963052] eql: IFF_SLAVE
[ 908.963054] eql: event: 2
[ 908.963056] eql: IFF_SLAVE
[ 908.963058] eql: event: 6
[ 908.963059] eql: IFF_SLAVE
[ 908.963110] bond0: Releasing active interface eql
[ 908.976168] bond0: Destroying bond bond0
[ 908.976266] bond0 (unregistering): Released all slaves
[ 908.984097] ------------[ cut here ]------------
[ 908.984107] WARNING: CPU: 0 PID: 1787 at fs/proc/generic.c:575
remove_proc_entry+0x112/0x160()
[ 908.984110] remove_proc_entry: removing non-empty directory
'net/bonding', leaking at least 'bond0'
[ 908.984111] Modules linked in: bonding(-) eql(O) 9p nfsd auth_rpcgss
oid_registry nfs_acl nfs lockd grace fscache sunrpc crct10dif_pclmul
crc32_pclmul crc32c_intel ghash_clmulni_intel ppdev qxl drm_kms_helper
snd_hda_codec_generic aesni_intel ttm aes_x86_64 glue_helper pcspkr lrw
gf128mul ablk_helper cryptd snd_hda_intel virtio_console snd_hda_codec
psmouse serio_raw snd_hwdep snd_hda_core 9pnet_virtio 9pnet evdev joydev
drm virtio_balloon snd_pcm snd_timer snd soundcore i2c_piix4 i2c_core
pvpanic acpi_cpufreq parport_pc parport processor thermal_sys button
autofs4 ext4 crc16 mbcache jbd2 hid_generic usbhid hid sg sr_mod cdrom
ata_generic virtio_blk virtio_net floppy ata_piix e1000 libata ehci_pci
virtio_pci scsi_mod uhci_hcd ehci_hcd virtio_ring virtio usbcore
usb_common [last unloaded: bonding]
Thus remove the proc entry manually if bond_release_and_destroy() is
used. Because of the checks in bond_remove_proc_entry() it's not a
problem for a bond device to change namespaces (the bug fixed by the
Fixes commit) but since commit f9399814927ad ("bonding: Don't allow bond devices to change network
namespaces.") that can't happen anyway.
Reported-by: Carol Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Fixes: a64d49c3dd50 ("bonding: Manage /proc/net/bonding/ entries from
the netdev events") Tested-by: Carol L Soto <clsoto@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Wed, 15 Jul 2015 14:07:07 +0000 (10:07 -0400)]
net: dsa: mv88e6xxx: fix fid_mask when leaving bridge
The mv88e6xxx_priv_state structure contains an fid_mask, where 1 means
the FID is free to use, 0 means the FID is in use.
This patch fixes the bit clear in mv88e6xxx_leave_bridge() when
assigning a new FID to a port.
Example scenario: I have 7 ports, port 5 is CPU, port 6 is unused (no
PHY). After setting the ports 0, 1 and 2 in bridge br0, and ports 3 and
4 in bridge br1, I have the following fid_mask: 0b111110010110 (0xf96).
Indeed, br0 uses FID 0, and br1 uses FID 3.
After setting nomaster for port 0, I get the wrong fid_mask: 0b10 (0x2).
With this patch we correctly get 0b111110010100 (0xf94), meaning port 0
uses FID 1, br0 uses FID 0, and br1 uses FID 3.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: David S. Miller <davem@davemloft.net>
virtio_net: don't require ANY_LAYOUT with VERSION_1
ANY_LAYOUT is a compatibility feature. It's implied
for VERSION_1 devices, and non-transitional devices
might not offer it. Change code to behave accordingly.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Paolo Bonzini <pbonzini@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'ipvs-fixes-for-v4.2' of https://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:
====================
IPVS Fixes for v4.2
please consider this fix for v4.2.
For reasons that are not clear to me it is a bumper crop.
It seems to me that they are all relevant to stable.
Please let me know if you need my help to get the fixes into stable.
* ipvs: fix ipv6 route unreach panic
This problem appears to be present since IPv6 support was added to
IPVS in v2.6.28.
* ipvs: skb_orphan in case of forwarding
This appears to resolve a problem resulting from a side effect of 41063e9dd119 ("ipv4: Early TCP socket demux.") which was included in v3.6.
* ipvs: do not use random local source address for tunnels
This appears to resolve a problem introduced by 026ace060dfe ("ipvs: optimize dst usage for real server") in v3.10.
* ipvs: fix crash if scheduler is changed
This appears to resolve a problem introduced by ceec4c381681 ("ipvs: convert services to rcu") in v3.10.
Julian has provided backports of the fix:
* [PATCHv2 3.10.81] ipvs: fix crash if scheduler is changed
http://www.spinics.net/lists/lvs-devel/msg04008.html
* [PATCHv2 3.12.44,3.14.45,3.18.16,4.0.6] ipvs: fix crash if scheduler is changed
http://www.spinics.net/lists/lvs-devel/msg04007.html
Please let me know how you would like to handle guiding these
backports into stable.
* ipvs: fix crash with sync protocol v0 and FTP
This appears to resolve a problem introduced by 749c42b620a9 ("ipvs: reduce sync rate with time thresholds") in v3.5
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: fix netns dependencies with conntrack templates
Quoting Daniel Borkmann:
"When adding connection tracking template rules to a netns, f.e. to
configure netfilter zones, the kernel will endlessly busy-loop as soon
as we try to delete the given netns in case there's at least one
template present, which is problematic i.e. if there is such bravery that
the priviledged user inside the netns is assumed untrusted.
Minimal example:
ip netns add foo
ip netns exec foo iptables -t raw -A PREROUTING -d 1.2.3.4 -j CT --zone 1
ip netns del foo
What happens is that when nf_ct_iterate_cleanup() is being called from
nf_conntrack_cleanup_net_list() for a provided netns, we always end up
with a net->ct.count > 0 and thus jump back to i_see_dead_people. We
don't get a soft-lockup as we still have a schedule() point, but the
serving CPU spins on 100% from that point onwards.
Since templates are normally allocated with nf_conntrack_alloc(), we
also bump net->ct.count. The issue why they are not yet nf_ct_put() is
because the per netns .exit() handler from x_tables (which would eventually
invoke xt_CT's xt_ct_tg_destroy() that drops reference on info->ct) is
called in the dependency chain at a *later* point in time than the per
netns .exit() handler for the connection tracker.
This is clearly a chicken'n'egg problem: after the connection tracker
.exit() handler, we've teared down all the connection tracking
infrastructure already, so rightfully, xt_ct_tg_destroy() cannot be
invoked at a later point in time during the netns cleanup, as that would
lead to a use-after-free. At the same time, we cannot make x_tables depend
on the connection tracker module, so that the xt_ct_tg_destroy() would
be invoked earlier in the cleanup chain."
Daniel confirms this has to do with the order in which modules are loaded or
having compiled nf_conntrack as modules while x_tables built-in. So we have no
guarantees regarding the order in which netns callbacks are executed.
Fix this by allocating the templates through kmalloc() from the respective
SYNPROXY and CT targets, so they don't depend on the conntrack kmem cache.
Then, release then via nf_ct_tmpl_free() from destroy_conntrack(). This branch
is marked as unlikely since conntrack templates are rarely allocated and only
from the configuration plane path.
Note that templates are not kept in any list to avoid further dependencies with
nf_conntrack anymore, thus, the tmpl larval list is removed.
Reported-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Tested-by: Daniel Borkmann <daniel@iogearbox.net>
This causes some of the offsets used in entry.S to overflow their
instruction operand field. To fix this use aghi to create a
dedicated pointer for the thread_struct.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Migrate avr32 driver to the new 'set-state' interface provided by
clockevents core, the earlier 'set-mode' interface is marked obsolete
now.
This also enables us to implement callbacks for new states of clockevent
devices, for example: ONESHOT_STOPPED.
We want to call cpu_idle_poll_ctrl() in shutdown only if we were in
oneshot or resume state earlier. Create another variable to save this
information and check that in shutdown callback.
Joachim Eastwood [Tue, 14 Jul 2015 22:25:26 +0000 (00:25 +0200)]
pinctrl: lpc18xx: fix schmitt trigger setup
The param_val variable is what determines if schmitt
trigger is enabled on a pin or not. A typo here mean
that schmitt trigger was always enabled for standard
and i2c pins.
Signed-off-by: Joachim Eastwood <manabian@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Subject: pinctrl: imx1-core: Fix debug output in .pin_config_set callback
imx1_pinconf_set assumes that the array of pins in struct
imx1_pinctrl_soc_info can be indexed by pin id to get the
pinctrl_pin_desc for a pin. This used to be correct up to commit 607af165c047 which removed some entries from the array and so made it
wrong to access the array by pin id.
The result of this bug is a wrong pin name in the output for small pin
ids and an oops for the bigger ones.
This patch is the result of a discussion that includes patches by Markus
Pargmann and Chris Ruehl.
Fixes: 607af165c047 ("pinctrl: i.MX27: Remove nonexistent pad definitions") Cc: stable@vger.kernel.org Reported-by: Chris Ruehl <chris.ruehl@gtsys.com.hk> Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Reviewed-by: Markus Pargmann <mpa@pengutronix.de> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Jonathan Bell [Tue, 30 Jun 2015 11:35:39 +0000 (12:35 +0100)]
pinctrl: bcm2835: Clear the event latch register when disabling interrupts
It's possible to hit a race condition if interrupts are generated on a GPIO
pin when the IRQ line in question is being disabled.
If the interrupt is freed, bcm2835_gpio_irq_disable() is called which
disables the event generation sources (edge, level). If an event occurred
between the last disabling of hard IRQs and the write to the event
source registers, a bit would be set in the GPIO event detect register
(GPEDSn) which goes unacknowledged by bcm2835_gpio_irq_handler()
so Linux complains loudly.
There is no per-GPIO mask register, so when disabling GPIO interrupts
write 1 to the relevant bit in GPEDSn to clear out any stale events.
Signed-off-by: Jonathan Bell <jonathan@raspberrypi.org> Acked-by: Stephen Warren <swarren@wwwdotorg.org> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
pinctrl: single: ensure pcs irq will not be forced threaded
The PSC IRQ is requested using request_irq() API and as result it can
be forced to be threaded IRQ in RT-Kernel if PCS_QUIRK_HAS_SHARED_IRQ
is enabled for pinctrl domain.
As result, following 'possible irq lock inversion dependency' report
can be seen:
=========================================================
[ INFO: possible irq lock inversion dependency detected ] 3.14.43-rt42-00360-g96ff499-dirty #24 Not tainted
---------------------------------------------------------
irq/369-pinctrl/927 just changed the state of lock:
(&pcs->lock){+.....}, at: [<c0375b54>] pcs_irq_handle+0x48/0x9c
but this lock was taken by another, HARDIRQ-safe lock in the past:
(&irq_desc_lock_class){-.....}
and interrupts could create inverse lock ordering between them.
other info that might help us debug this:
Possible interrupt unsafe locking scenario: