]> git.proxmox.com Git - mirror_ubuntu-zesty-kernel.git/log
mirror_ubuntu-zesty-kernel.git
8 years agoext4: fix bh->b_state corruption
Jan Kara [Fri, 19 Feb 2016 05:18:25 +0000 (00:18 -0500)]
ext4: fix bh->b_state corruption

BugLink: http://bugs.launchpad.net/bugs/1553179
commit ed8ad83808f009ade97ebbf6519bc3a97fefbc0c upstream.

ext4 can update bh->b_state non-atomically in _ext4_get_block() and
ext4_da_get_block_prep(). Usually this is fine since bh is just a
temporary storage for mapping information on stack but in some cases it
can be fully living bh attached to a page. In such case non-atomic
update of bh->b_state can race with an atomic update which then gets
lost. Usually when we are mapping bh and thus updating bh->b_state
non-atomically, nobody else touches the bh and so things work out fine
but there is one case to especially worry about: ext4_finish_bio() uses
BH_Uptodate_Lock on the first bh in the page to synchronize handling of
PageWriteback state. So when blocksize < pagesize, we can be atomically
modifying bh->b_state of a buffer that actually isn't under IO and thus
can race e.g. with delalloc trying to map that buffer. The result is
that we can mistakenly set / clear BH_Uptodate_Lock bit resulting in the
corruption of PageWriteback state or missed unlock of BH_Uptodate_Lock.

Fix the problem by always updating bh->b_state bits atomically.

Reported-by: Nikolay Borisov <kernel@kyup.com>
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Nikolay Borisov <kernel@kyup.com>
[NB: Backported to 4.4.2]
Acked-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agosctp: Fix port hash table size computation
Neil Horman [Thu, 18 Feb 2016 21:10:57 +0000 (16:10 -0500)]
sctp: Fix port hash table size computation

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit d9749fb5942f51555dc9ce1ac0dbb1806960a975 ]

Dmitry Vyukov noted recently that the sctp_port_hashtable had an error in
its size computation, observing that the current method never guaranteed
that the hashsize (measured in number of entries) would be a power of two,
which the input hash function for that table requires.  The root cause of
the problem is that two values need to be computed (one, the allocation
order of the storage requries, as passed to __get_free_pages, and two the
number of entries for the hash table).  Both need to be ^2, but for
different reasons, and the existing code is simply computing one order
value, and using it as the basis for both, which is wrong (i.e. it assumes
that ((1<<order)*PAGE_SIZE)/sizeof(bucket) is still ^2 when its not).

To fix this, we change the logic slightly.  We start by computing a goal
allocation order (which is limited by the maximum size hash table we want
to support.  Then we attempt to allocate that size table, decreasing the
order until a successful allocation is made.  Then, with the resultant
successful order we compute the number of buckets that hash table supports,
which we then round down to the nearest power of two, giving us the number
of entries the table actually supports.

I've tested this locally here, using non-debug and spinlock-debug kernels,
and the number of entries in the hashtable consistently work out to be
powers of two in all cases.

Signed-off-by: Neil Horman <nhorman@tuxdriver.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
CC: Dmitry Vyukov <dvyukov@google.com>
CC: Vladislav Yasevich <vyasevich@gmail.com>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agounix_diag: fix incorrect sign extension in unix_lookup_by_ino
Dmitry V. Levin [Fri, 19 Feb 2016 01:27:48 +0000 (04:27 +0300)]
unix_diag: fix incorrect sign extension in unix_lookup_by_ino

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit b5f0549231ffb025337be5a625b0ff9f52b016f0 ]

The value passed by unix_diag_get_exact to unix_lookup_by_ino has type
__u32, but unix_lookup_by_ino's argument ino has type int, which is not
a problem yet.
However, when ino is compared with sock_i_ino return value of type
unsigned long, ino is sign extended to signed long, and this results
to incorrect comparison on 64-bit architectures for inode numbers
greater than INT_MAX.

This bug was found by strace test suite.

Fixes: 5d3cae8bc39d ("unix_diag: Dumping exact socket core")
Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotipc: unlock in error path
Insu Yun [Wed, 17 Feb 2016 16:47:35 +0000 (11:47 -0500)]
tipc: unlock in error path

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit b53ce3e7d407aa4196877a48b8601181162ab158 ]

tipc_bcast_unlock need to be unlocked in error path.

Signed-off-by: Insu Yun <wuninsu@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agortnl: RTM_GETNETCONF: fix wrong return value
Anton Protopopov [Wed, 17 Feb 2016 02:43:16 +0000 (21:43 -0500)]
rtnl: RTM_GETNETCONF: fix wrong return value

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit a97eb33ff225f34a8124774b3373fd244f0e83ce ]

An error response from a RTM_GETNETCONF request can return the positive
error value EINVAL in the struct nlmsgerr that can mislead userspace.

Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoIFF_NO_QUEUE: Fix for drivers not calling ether_setup()
Phil Sutter [Wed, 17 Feb 2016 14:37:43 +0000 (15:37 +0100)]
IFF_NO_QUEUE: Fix for drivers not calling ether_setup()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit a813104d923339144078939175faf4e66aca19b4 ]

My implementation around IFF_NO_QUEUE driver flag assumed that leaving
tx_queue_len untouched (specifically: not setting it to zero) by drivers
would make it possible to assign a regular qdisc to them without having
to worry about setting tx_queue_len to a useful value. This was only
partially true: I overlooked that some drivers don't call ether_setup()
and therefore not initialize tx_queue_len to the default value of 1000.
Consequently, removing the workarounds in place for that case in qdisc
implementations which cared about it (namely, pfifo, bfifo, gred, htb,
plug and sfb) leads to problems with these specific interface types and
qdiscs.

Luckily, there's already a sanitization point for drivers setting
tx_queue_len to zero, which can be reused to assign the fallback value
most qdisc implementations used, which is 1.

Fixes: 348e3435cbefa ("net: sched: drop all special handling of tx_queue_len == 0")
Tested-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotcp/dccp: fix another race at listener dismantle
Eric Dumazet [Thu, 18 Feb 2016 13:39:18 +0000 (05:39 -0800)]
tcp/dccp: fix another race at listener dismantle

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 7716682cc58e305e22207d5bb315f26af6b1e243 ]

Ilya reported following lockdep splat:

kernel: =========================
kernel: [ BUG: held lock freed! ]
kernel: 4.5.0-rc1-ceph-00026-g5e0a311 #1 Not tainted
kernel: -------------------------
kernel: swapper/5/0 is freeing memory
ffff880035c9d200-ffff880035c9dbff, with a lock still held there!
kernel: (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0
kernel: 4 locks held by swapper/5/0:
kernel: #0:  (rcu_read_lock){......}, at: [<ffffffff8169ef6b>]
netif_receive_skb_internal+0x4b/0x1f0
kernel: #1:  (rcu_read_lock){......}, at: [<ffffffff816e977f>]
ip_local_deliver_finish+0x3f/0x380
kernel: #2:  (slock-AF_INET){+.-...}, at: [<ffffffff81685ffb>]
sk_clone_lock+0x19b/0x440
kernel: #3:  (&(&queue->rskq_lock)->rlock){+.-...}, at:
[<ffffffff816f6a88>] inet_csk_reqsk_queue_add+0x28/0xa0

To properly fix this issue, inet_csk_reqsk_queue_add() needs
to return to its callers if the child as been queued
into accept queue.

We also need to make sure listener is still there before
calling sk->sk_data_ready(), by holding a reference on it,
since the reference carried by the child can disappear as
soon as the child is put on accept queue.

Reported-by: Ilya Dryomov <idryomov@gmail.com>
Fixes: ebb516af60e1 ("tcp/dccp: fix race at listener dismantle phase")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoroute: check and remove route cache when we get route
Xin Long [Thu, 18 Feb 2016 13:21:19 +0000 (21:21 +0800)]
route: check and remove route cache when we get route

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit deed49df7390d5239024199e249190328f1651e7 ]

Since the gc of ipv4 route was removed, the route cached would has
no chance to be removed, and even it has been timeout, it still could
be used, cause no code to check it's expires.

Fix this issue by checking  and removing route cache when we get route.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet_sched fix: reclassification needs to consider ether protocol changes
Jamal Hadi Salim [Thu, 18 Feb 2016 12:38:04 +0000 (07:38 -0500)]
net_sched fix: reclassification needs to consider ether protocol changes

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 619fe32640b4b01f370574d50344ae0f62689816 ]

actions could change the etherproto in particular with ethernet
tunnelled data. Typically such actions, after peeling the outer header,
will ask for the packet to be  reclassified. We then need to restart
the classification with the new proto header.

Example setup used to catch this:
sudo tc qdisc add dev $ETH ingress
sudo $TC filter add dev $ETH parent ffff: pref 1 protocol 802.1Q \
u32 match u32 0 0 flowid 1:1 \
action  vlan pop reclassify

Fixes: 3b3ae880266d ("net: sched: consolidate tc_classify{,_compat}")
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agopppoe: fix reference counting in PPPoE proxy
Guillaume Nault [Mon, 15 Feb 2016 16:01:10 +0000 (17:01 +0100)]
pppoe: fix reference counting in PPPoE proxy

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 29e73269aa4d36f92b35610c25f8b01c789b0dc8 ]

Drop reference on the relay_po socket when __pppoe_xmit() succeeds.
This is already handled correctly in the error path.

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agol2tp: Fix error creating L2TP tunnels
Mark Tomlinson [Mon, 15 Feb 2016 03:24:44 +0000 (16:24 +1300)]
l2tp: Fix error creating L2TP tunnels

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 853effc55b0f975abd6d318cca486a9c1b67e10f ]

A previous commit (33f72e6) added notification via netlink for tunnels
when created/modified/deleted. If the notification returned an error,
this error was returned from the tunnel function. If there were no
listeners, the error code ESRCH was returned, even though having no
listeners is not an error. Other calls to this and other similar
notification functions either ignore the error code, or filter ESRCH.
This patch checks for ESRCH and does not flag this as an error.

Reviewed-by: Hamish Martin <hamish.martin@alliedtelesis.co.nz>
Signed-off-by: Mark Tomlinson <mark.tomlinson@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet/mlx4_en: Avoid changing dev->features directly in run-time
Eugenia Emantayev [Wed, 17 Feb 2016 15:24:27 +0000 (17:24 +0200)]
net/mlx4_en: Avoid changing dev->features directly in run-time

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 925ab1aa9394bbaeac47ee5b65d3fdf0fb8135cf ]

It's forbidden to manually change dev->features in run-time. Currently, this is
done in the driver to make sure that GSO_UDP_TUNNEL is advertized only when
VXLAN tunnel is set. However, since the stack actually does features intersection
with hw_enc_features, we can safely revert to advertizing features early when
registering the netdevice.

Fixes: f4a1edd56120 ('net/mlx4_en: Advertize encapsulation offloads [...]')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet/mlx4_en: Count HW buffer overrun only once
Amir Vadai [Wed, 17 Feb 2016 15:24:22 +0000 (17:24 +0200)]
net/mlx4_en: Count HW buffer overrun only once

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 281e8b2fdf8e4ef366b899453cae50e09b577ada ]

RdropOvflw counts overrun of HW buffer, therefore should
be used for rx_fifo_errors only.

Currently RdropOvflw counter is mistakenly also set into
rx_missed_errors and rx_over_errors too, which makes the
device total dropped packets accounting to show wrong results.

Fix that. Use it for rx_fifo_errors only.

Fixes: c27a02cd94d6 ('mlx4_en: Add driver for Mellanox ConnectX 10GbE NIC')
Signed-off-by: Amir Vadai <amir@vadai.me>
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqmi_wwan: add "4G LTE usb-modem U901"
Bjørn Mork [Fri, 12 Feb 2016 15:42:14 +0000 (16:42 +0100)]
qmi_wwan: add "4G LTE usb-modem U901"

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit aac8d3c282e024c344c5b86dc1eab7af88bb9716 ]

Thomas reports:

T:  Bus=01 Lev=01 Prnt=01 Port=03 Cnt=01 Dev#=  4 Spd=480 MxCh= 0
D:  Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs=  1
P:  Vendor=05c6 ProdID=6001 Rev=00.00
S:  Manufacturer=USB Modem
S:  Product=USB Modem
S:  SerialNumber=1234567890ABCDEF
C:  #Ifs= 5 Cfg#= 1 Atr=e0 MxPwr=500mA
I:  If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option
I:  If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan
I:  If#= 4 Alt= 0 #EPs= 2 Cls=08(stor.) Sub=06 Prot=50 Driver=usb-storage

Reported-by: Thomas Schäfer <tschaefer@t-online.de>
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotcp: md5: release request socket instead of listener
Eric Dumazet [Fri, 12 Feb 2016 06:50:29 +0000 (22:50 -0800)]
tcp: md5: release request socket instead of listener

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 729235554d805c63e5e274fcc6a98e71015dd847 ]

If tcp_v4_inbound_md5_hash() returns an error, we must release
the refcount on the request socket, not on the listener.

The bug was added for IPv4 only.

Fixes: 079096f103fac ("tcp/dccp: install syn_recv requests into ehash table")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotipc: fix premature addition of node to lookup table
Jon Paul Maloy [Wed, 10 Feb 2016 21:14:57 +0000 (16:14 -0500)]
tipc: fix premature addition of node to lookup table

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit d5c91fb72f1652ea3026925240a0998a42ddb16b ]

In commit 5266698661401a ("tipc: let broadcast packet reception
use new link receive function") we introduced a new per-node
broadcast reception link instance. This link is created at the
moment the node itself is created. Unfortunately, the allocation
is done after the node instance has already been added to the node
lookup hash table. This creates a potential race condition, where
arriving broadcast packets are able to find and access the node
before it has been fully initialized, and before the above mentioned
link has been created. The result is occasional crashes in the function
tipc_bcast_rcv(), which is trying to access the not-yet existing link.

We fix this by deferring the addition of the node instance until after
it has been fully initialized in the function tipc_node_create().

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoaf_unix: Guard against other == sk in unix_dgram_sendmsg
Rainer Weikusat [Thu, 11 Feb 2016 19:37:27 +0000 (19:37 +0000)]
af_unix: Guard against other == sk in unix_dgram_sendmsg

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit a5527dda344fff0514b7989ef7a755729769daa1 ]

The unix_dgram_sendmsg routine use the following test

if (unlikely(unix_peer(other) != sk && unix_recvq_full(other))) {

to determine if sk and other are in an n:1 association (either
established via connect or by using sendto to send messages to an
unrelated socket identified by address). This isn't correct as the
specified address could have been bound to the sending socket itself or
because this socket could have been connected to itself by the time of
the unix_peer_get but disconnected before the unix_state_lock(other). In
both cases, the if-block would be entered despite other == sk which
might either block the sender unintentionally or lead to trying to unlock
the same spin lock twice for a non-blocking send. Add a other != sk
check to guard against this.

Fixes: 7d267278a9ec ("unix: avoid use-after-free in ep_remove_wait_queue")
Reported-By: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Tested-by: Philipp Hahn <pmhahn@pmhahn.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoaf_unix: Don't set err in unix_stream_read_generic unless there was an error
Rainer Weikusat [Mon, 8 Feb 2016 18:47:19 +0000 (18:47 +0000)]
af_unix: Don't set err in unix_stream_read_generic unless there was an error

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 1b92ee3d03af6643df395300ba7748f19ecdb0c5 ]

The present unix_stream_read_generic contains various code sequences of
the form

err = -EDISASTER;
if (<test>)
goto out;

This has the unfortunate side effect of possibly causing the error code
to bleed through to the final

out:
return copied ? : err;

and then to be wrongly returned if no data was copied because the caller
didn't supply a data buffer, as demonstrated by the program available at

http://pad.lv/1540731

Change it such that err is only set if an error condition was detected.

Fixes: 3822b5c2fc62 ("af_unix: Revert 'lock_interruptible' in stream receive code")
Reported-by: Joseph Salisbury <joseph.salisbury@canonical.com>
Signed-off-by: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoipv4: fix memory leaks in ip_cmsg_send() callers
Eric Dumazet [Thu, 4 Feb 2016 14:23:28 +0000 (06:23 -0800)]
ipv4: fix memory leaks in ip_cmsg_send() callers

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 919483096bfe75dda338e98d56da91a263746a0a ]

Dmitry reported memory leaks of IP options allocated in
ip_cmsg_send() when/if this function returns an error.

Callers are responsible for the freeing.

Many thanks to Dmitry for the report and diagnostic.

Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agobonding: Fix ARP monitor validation
Jay Vosburgh [Tue, 2 Feb 2016 21:35:56 +0000 (13:35 -0800)]
bonding: Fix ARP monitor validation

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 21a75f0915dde8674708b39abfcda113911c49b1 ]

The current logic in bond_arp_rcv will accept an incoming ARP for
validation if (a) the receiving slave is either "active" (which includes
the currently active slave, or the current ARP slave) or, (b) there is a
currently active slave, and it has received an ARP since it became active.
For case (b), the receiving slave isn't the currently active slave, and is
receiving the original broadcast ARP request, not an ARP reply from the
target.

This logic can fail if there is no currently active slave.  In
this situation, the ARP probe logic cycles through all slaves, assigning
each in turn as the "current_arp_slave" for one arp_interval, then setting
that one as "active," and sending an ARP probe from that slave.  The
current logic expects the ARP reply to arrive on the sending
current_arp_slave, however, due to switch FDB updating delays, the reply
may be directed to another slave.

This can arise if the bonding slaves and switch are working, but
the ARP target is not responding.  When the ARP target recovers, a
condition may result wherein the ARP target host replies faster than the
switch can update its forwarding table, causing each ARP reply to be sent
to the previous current_arp_slave.  This will never pass the logic in
bond_arp_rcv, as neither of the above conditions (a) or (b) are met.

Some experimentation on a LAN shows ARP reply round trips in the
200 usec range, but my available switches never update their FDB in less
than 4000 usec.

This patch changes the logic in bond_arp_rcv to additionally
accept an ARP reply for validation on any slave if there is a current ARP
slave and it sent an ARP probe during the previous arp_interval.

Fixes: aeea64ac717a ("bonding: don't trust arp requests unless active slave really works")
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <gospo@cumulusnetworks.com>
Signed-off-by: Jay Vosburgh <jay.vosburgh@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agobpf: fix branch offset adjustment on backjumps after patching ctx expansion
Daniel Borkmann [Wed, 10 Feb 2016 15:47:11 +0000 (16:47 +0100)]
bpf: fix branch offset adjustment on backjumps after patching ctx expansion

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit a1b14d27ed0965838350f1377ff97c93ee383492 ]

When ctx access is used, the kernel often needs to expand/rewrite
instructions, so after that patching, branch offsets have to be
adjusted for both forward and backward jumps in the new eBPF program,
but for backward jumps it fails to account the delta. Meaning, for
example, if the expansion happens exactly on the insn that sits at
the jump target, it doesn't fix up the back jump offset.

Analysis on what the check in adjust_branches() is currently doing:

  /* adjust offset of jmps if necessary */
  if (i < pos && i + insn->off + 1 > pos)
    insn->off += delta;
  else if (i > pos && i + insn->off + 1 < pos)
    insn->off -= delta;

First condition (forward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- i/insn            insns[1] <--- i/insn
  insns[2] <--- pos               insns[P] <--- pos
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- target_X          insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- target_X
                                  insns[5]

First case is if we cross pos-boundary and the jump instruction was
before pos. This is handeled correctly. I.e. if i == pos, then this
would mean our jump that we currently check was the patchlet itself
that we just injected. Since such patchlets are self-contained and
have no awareness of any insns before or after the patched one, the
delta is correctly not adjusted. Also, for the second condition in
case of i + insn->off + 1 == pos, means we jump to that newly patched
instruction, so no offset adjustment are needed. That part is correct.

Second condition (backward jumps):

  Before:                         After:

  insns[0]                        insns[0]
  insns[1] <--- target_X          insns[1] <--- target_X
  insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
  insns[3]                        insns[P]  `------| delta
  insns[4] <--- i/insn            insns[P]   `-----|
  insns[5]                        insns[3]
                                  insns[4] <--- i/insn
                                  insns[5]

Second interesting case is where we cross pos-boundary and the jump
instruction was after pos. Backward jump with i == pos would be
impossible and pose a bug somewhere in the patchlet, so the first
condition checking i > pos is okay only by itself. However, i +
insn->off + 1 < pos does not always work as intended to trigger the
adjustment. It works when jump targets would be far off where the
delta wouldn't matter. But, for example, where the fixed insn->off
before pointed to pos (target_Y), it now points to pos + delta, so
that additional room needs to be taken into account for the check.
This means that i) both tests here need to be adjusted into pos + delta,
and ii) for the second condition, the test needs to be <= as pos
itself can be a target in the backjump, too.

Fixes: 9bac3d6d548e ("bpf: allow extended BPF programs access skb fields")
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoflow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen
Alexander Duyck [Tue, 9 Feb 2016 10:49:54 +0000 (02:49 -0800)]
flow_dissector: Fix unaligned access in __skb_flow_dissector when used by eth_get_headlen

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 461547f3158978c180d74484d58e82be9b8e7357 ]

This patch fixes an issue with unaligned accesses when using
eth_get_headlen on a page that was DMA aligned instead of being IP aligned.
The fact is when trying to check the length we don't need to be looking at
the flow label so we can reorder the checks to first check if we are
supposed to gather the flow label and then make the call to actually get
it.

v2:  Updated path so that either STOP_AT_FLOW_LABEL or KEY_FLOW_LABEL can
     cause us to check for the flow label.

Reported-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet: Copy inner L3 and L4 headers as unaligned on GRE TEB
Alexander Duyck [Tue, 9 Feb 2016 14:14:43 +0000 (06:14 -0800)]
net: Copy inner L3 and L4 headers as unaligned on GRE TEB

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 78565208d73ca9b654fb9a6b142214d52eeedfd1 ]

This patch corrects the unaligned accesses seen on GRE TEB tunnels when
generating hash keys.  Specifically what this patch does is make it so that
we force the use of skb_copy_bits when the GRE inner headers will be
unaligned due to NET_IP_ALIGNED being a non-zero value.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Acked-by: Tom Herbert <tom@herbertland.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agosctp: translate network order to host order when users get a hmacid
Xin Long [Wed, 3 Feb 2016 15:33:30 +0000 (23:33 +0800)]
sctp: translate network order to host order when users get a hmacid

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 7a84bd46647ff181eb2659fdc99590e6f16e501d ]

Commit ed5a377d87dc ("sctp: translate host order to network order when
setting a hmacid") corrected the hmacid byte-order when setting a hmacid.
but the same issue also exists on getting a hmacid.

We fix it by changing hmacids to host order when users get them with
getsockopt.

Fixes: Commit ed5a377d87dc ("sctp: translate host order to network order when setting a hmacid")
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoenic: increment devcmd2 result ring in case of timeout
Sandeep Pillai [Wed, 3 Feb 2016 09:10:44 +0000 (14:40 +0530)]
enic: increment devcmd2 result ring in case of timeout

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit ca7f41a4957b872577807169bd7464b36aae9b9c ]

Firmware posts the devcmd result in result ring. In case of timeout, driver
does not increment the current result pointer and firmware could post the
result after timeout has occurred. During next devcmd, driver would be
reading the result of previous devcmd.

Fix this by incrementing result even in case of timeout.

Fixes: 373fb0873d43 ("enic: add devcmd2")
Signed-off-by: Sandeep Pillai <sanpilla@cisco.com>
Signed-off-by: Govindarajulu Varadarajan <_govind@gmx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs
Siva Reddy Kallam [Wed, 3 Feb 2016 08:39:38 +0000 (14:09 +0530)]
tg3: Fix for tg3 transmit queue 0 timed out when too many gso_segs

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit b7d987295c74500b733a0ba07f9a9bcc4074fa83 ]

tg3_tso_bug() can hit a condition where the entire tx ring is not big
enough to segment the GSO packet. For example, if MSS is very small,
gso_segs can exceed the tx ring size. When we hit the condition, it
will cause tx timeout.

tg3_tso_bug() is called to handle TSO and DMA hardware bugs.
For TSO bugs, if tg3_tso_bug() cannot succeed, we have to drop the packet.
For DMA bugs, we can still fall back to linearize the SKB and let the
hardware transmit the TSO packet.

This patch adds a function tg3_tso_bug_gso_check() to check if there
are enough tx descriptors for GSO before calling tg3_tso_bug().
The caller will then handle the error appropriately - drop or
lineraize the SKB.

v2: Corrected patch description to avoid confusion.

Signed-off-by: Siva Reddy Kallam <siva.kallam@broadcom.com>
Signed-off-by: Michael Chan <mchan@broadcom.com>
Acked-by: Prashant Sreedharan <prashant@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet:Add sysctl_max_skb_frags
Hans Westgaard Ry [Wed, 3 Feb 2016 08:26:57 +0000 (09:26 +0100)]
net:Add sysctl_max_skb_frags

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 5f74f82ea34c0da80ea0b49192bb5ea06e063593 ]

Devices may have limits on the number of fragments in an skb they support.
Current codebase uses a constant as maximum for number of fragments one
skb can hold and use.
When enabling scatter/gather and running traffic with many small messages
the codebase uses the maximum number of fragments and may thereby violate
the max for certain devices.
The patch introduces a global variable as max number of fragments.

Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotcp: do not drop syn_recv on all icmp reports
Eric Dumazet [Wed, 3 Feb 2016 03:31:12 +0000 (19:31 -0800)]
tcp: do not drop syn_recv on all icmp reports

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 9cf7490360bf2c46a16b7525f899e4970c5fc144 ]

Petr Novopashenniy reported that ICMP redirects on SYN_RECV sockets
were leading to RST.

This is of course incorrect.

A specific list of ICMP messages should be able to drop a SYN_RECV.

For instance, a REDIRECT on SYN_RECV shall be ignored, as we do
not hold a dst per SYN_RECV pseudo request.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111751
Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table")
Reported-by: Petr Novopashenniy <pety@rusnet.ru>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agounix: correctly track in-flight fds in sending process user_struct
Hannes Frederic Sowa [Wed, 3 Feb 2016 01:11:03 +0000 (02:11 +0100)]
unix: correctly track in-flight fds in sending process user_struct

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 415e3d3e90ce9e18727e8843ae343eda5a58fad6 ]

The commit referenced in the Fixes tag incorrectly accounted the number
of in-flight fds over a unix domain socket to the original opener
of the file-descriptor. This allows another process to arbitrary
deplete the original file-openers resource limit for the maximum of
open files. Instead the sending processes and its struct cred should
be credited.

To do so, we add a reference counted struct user_struct pointer to the
scm_fp_list and use it to account for the number of inflight unix fds.

Fixes: 712f4aad406bb1 ("unix: properly account for FDs passed over unix sockets")
Reported-by: David Herrmann <dh.herrmann@gmail.com>
Cc: David Herrmann <dh.herrmann@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoipv6: fix a lockdep splat
Eric Dumazet [Wed, 3 Feb 2016 01:55:01 +0000 (17:55 -0800)]
ipv6: fix a lockdep splat

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 44c3d0c1c0a880354e9de5d94175742e2c7c9683 ]

Silence lockdep false positive about rcu_dereference() being
used in the wrong context.

First one should use rcu_dereference_protected() as we own the spinlock.

Second one should be a normal assignation, as no barrier is needed.

Fixes: 18367681a10bd ("ipv6 flowlabel: Convert np->ipv6_fl_list to RCU.")
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoipv6: addrconf: Fix recursive spin lock call
subashab@codeaurora.org [Tue, 2 Feb 2016 02:11:10 +0000 (02:11 +0000)]
ipv6: addrconf: Fix recursive spin lock call

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 16186a82de1fdd868255448274e64ae2616e2640 ]

A rcu stall with the following backtrace was seen on a system with
forwarding, optimistic_dad and use_optimistic set. To reproduce,
set these flags and allow ipv6 autoconf.

This occurs because the device write_lock is acquired while already
holding the read_lock. Back trace below -

INFO: rcu_preempt self-detected stall on CPU { 1}  (t=2100 jiffies
 g=3992 c=3991 q=4471)
<6> Task dump for CPU 1:
<2> kworker/1:0     R  running task    12168    15   2 0x00000002
<2> Workqueue: ipv6_addrconf addrconf_dad_work
<6> Call trace:
<2> [<ffffffc000084da8>] el1_irq+0x68/0xdc
<2> [<ffffffc000cc4e0c>] _raw_write_lock_bh+0x20/0x30
<2> [<ffffffc000bc5dd8>] __ipv6_dev_ac_inc+0x64/0x1b4
<2> [<ffffffc000bcbd2c>] addrconf_join_anycast+0x9c/0xc4
<2> [<ffffffc000bcf9f0>] __ipv6_ifa_notify+0x160/0x29c
<2> [<ffffffc000bcfb7c>] ipv6_ifa_notify+0x50/0x70
<2> [<ffffffc000bd035c>] addrconf_dad_work+0x314/0x334
<2> [<ffffffc0000b64c8>] process_one_work+0x244/0x3fc
<2> [<ffffffc0000b7324>] worker_thread+0x2f8/0x418
<2> [<ffffffc0000bb40c>] kthread+0xe0/0xec

v2: do addrconf_dad_kick inside read lock and then acquire write
lock for ipv6_ifa_notify as suggested by Eric

Fixes: 7fd2561e4ebdd ("net: ipv6: Add a sysctl to make optimistic
addresses useful candidates")

Cc: Eric Dumazet <edumazet@google.com>
Cc: Erik Kline <ek@google.com>
Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoipv6/udp: use sticky pktinfo egress ifindex on connect()
Paolo Abeni [Fri, 29 Jan 2016 11:30:20 +0000 (12:30 +0100)]
ipv6/udp: use sticky pktinfo egress ifindex on connect()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 1cdda91871470f15e79375991bd2eddc6e86ddb1 ]

Currently, the egress interface index specified via IPV6_PKTINFO
is ignored by __ip6_datagram_connect(), so that RFC 3542 section 6.7
can be subverted when the user space application calls connect()
before sendmsg().
Fix it by initializing properly flowi6_oif in connect() before
performing the route lookup.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()
Paolo Abeni [Fri, 29 Jan 2016 11:30:19 +0000 (12:30 +0100)]
ipv6: enforce flowi6_oif usage in ip6_dst_lookup_tail()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 6f21c96a78b835259546d8f3fb4edff0f651d478 ]

The current implementation of ip6_dst_lookup_tail basically
ignore the egress ifindex match: if the saddr is set,
ip6_route_output() purposefully ignores flowi6_oif, due
to the commit d46a9d678e4c ("net: ipv6: Dont add RT6_LOOKUP_F_IFACE
flag if saddr set"), if the saddr is 'any' the first route lookup
in ip6_dst_lookup_tail fails, but upon failure a second lookup will
be performed with saddr set, thus ignoring the ifindex constraint.

This commit adds an output route lookup function variant, which
allows the caller to specify lookup flags, and modify
ip6_dst_lookup_tail() to enforce the ifindex match on the second
lookup via said helper.

ip6_route_output() becames now a static inline function build on
top of ip6_route_output_flags(); as a side effect, out-of-tree
modules need now a GPL license to access the output route lookup
functionality.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: David Ahern <dsa@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotcp: beware of alignments in tcp_get_info()
Eric Dumazet [Wed, 27 Jan 2016 18:52:43 +0000 (10:52 -0800)]
tcp: beware of alignments in tcp_get_info()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit ff5d749772018602c47509bdc0093ff72acd82ec ]

With some combinations of user provided flags in netlink command,
it is possible to call tcp_get_info() with a buffer that is not 8-bytes
aligned.

It does matter on some arches, so we need to use put_unaligned() to
store the u64 fields.

Current iproute2 package does not trigger this particular issue.

Fixes: 0df48c26d841 ("tcp: add tcpi_bytes_acked to tcp_info")
Fixes: 977cb0ecf82e ("tcp: add pacing_rate information into tcp_info")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoswitchdev: Require RTNL mutex to be held when sending FDB notifications
Ido Schimmel [Wed, 27 Jan 2016 14:16:43 +0000 (15:16 +0100)]
switchdev: Require RTNL mutex to be held when sending FDB notifications

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 4f2c6ae5c64c353fb1b0425e4747e5603feadba1 ]

When switchdev drivers process FDB notifications from the underlying
device they resolve the netdev to which the entry points to and notify
the bridge using the switchdev notifier.

However, since the RTNL mutex is not held there is nothing preventing
the netdev from disappearing in the middle, which will cause
br_switchdev_event() to dereference a non-existing netdev.

Make switchdev drivers hold the lock at the beginning of the
notification processing session and release it once it ends, after
notifying the bridge.

Also, remove switchdev_mutex and fdb_lock, as they are no longer needed
when RTNL mutex is held.

Fixes: 03bf0c281234 ("switchdev: introduce switchdev notifier")
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoinet: frag: Always orphan skbs inside ip_defrag()
Joe Stringer [Fri, 22 Jan 2016 23:49:12 +0000 (15:49 -0800)]
inet: frag: Always orphan skbs inside ip_defrag()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 8282f27449bf15548cb82c77b6e04ee0ab827bdc ]

Later parts of the stack (including fragmentation) expect that there is
never a socket attached to frag in a frag_list, however this invariant
was not enforced on all defrag paths. This could lead to the
BUG_ON(skb->sk) during ip_do_fragment(), as per the call stack at the
end of this commit message.

While the call could be added to openvswitch to fix this particular
error, the head and tail of the frags list are already orphaned
indirectly inside ip_defrag(), so it seems like the remaining fragments
should all be orphaned in all circumstances.

kernel BUG at net/ipv4/ip_output.c:586!
[...]
Call Trace:
 <IRQ>
 [<ffffffffa0205270>] ? do_output.isra.29+0x1b0/0x1b0 [openvswitch]
 [<ffffffffa02167a7>] ovs_fragment+0xcc/0x214 [openvswitch]
 [<ffffffff81667830>] ? dst_discard_out+0x20/0x20
 [<ffffffff81667810>] ? dst_ifdown+0x80/0x80
 [<ffffffffa0212072>] ? find_bucket.isra.2+0x62/0x70 [openvswitch]
 [<ffffffff810e0ba5>] ? mod_timer_pending+0x65/0x210
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa03205a2>] ? nf_conntrack_in+0x252/0x500 [nf_conntrack]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02051a3>] do_output.isra.29+0xe3/0x1b0 [openvswitch]
 [<ffffffffa0206411>] do_execute_actions+0xe11/0x11f0 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa0206822>] ovs_execute_actions+0x32/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffffa02068a2>] ovs_execute_actions+0xb2/0xd0 [openvswitch]
 [<ffffffffa020b505>] ovs_dp_process_packet+0x85/0x140 [openvswitch]
 [<ffffffffa0215019>] ? ovs_ct_get_labels+0x49/0x80 [openvswitch]
 [<ffffffffa0213a1d>] ovs_vport_receive+0x5d/0xa0 [openvswitch]
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffffa02148fc>] internal_dev_xmit+0x6c/0x140 [openvswitch]
 [<ffffffffa0214895>] ? internal_dev_xmit+0x5/0x140 [openvswitch]
 [<ffffffff81660299>] dev_hard_start_xmit+0x2b9/0x5e0
 [<ffffffff8165fc21>] ? netif_skb_features+0xd1/0x1f0
 [<ffffffff81660f20>] __dev_queue_xmit+0x800/0x930
 [<ffffffff81660770>] ? __dev_queue_xmit+0x50/0x930
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff81669876>] ? neigh_resolve_output+0x106/0x220
 [<ffffffff81661060>] dev_queue_xmit+0x10/0x20
 [<ffffffff816698e8>] neigh_resolve_output+0x178/0x220
 [<ffffffff816a8e6f>] ? ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8e6f>] ip_finish_output2+0x1ff/0x590
 [<ffffffff816a8cee>] ? ip_finish_output2+0x7e/0x590
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816df21a>] icmp_push_reply+0xea/0x120
 [<ffffffff816df93d>] icmp_reply.constprop.23+0x1ed/0x230
 [<ffffffff816df9ce>] icmp_echo.part.21+0x4e/0x50
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff810d5f9e>] ? rcu_read_lock_held+0x5e/0x70
 [<ffffffff816dfa06>] icmp_echo+0x36/0x70
 [<ffffffff816e0d11>] icmp_rcv+0x271/0x450
 [<ffffffff816a4ca7>] ip_local_deliver_finish+0x127/0x3a0
 [<ffffffff816a4bc1>] ? ip_local_deliver_finish+0x41/0x3a0
 [<ffffffff816a5160>] ip_local_deliver+0x60/0xd0
 [<ffffffff816a4b80>] ? ip_rcv_finish+0x560/0x560
 [<ffffffff816a46fd>] ip_rcv_finish+0xdd/0x560
 [<ffffffff816a5453>] ip_rcv+0x283/0x3e0
 [<ffffffff810b6302>] ? match_held_lock+0x192/0x200
 [<ffffffff816a4620>] ? inet_del_offload+0x40/0x40
 [<ffffffff8165d062>] __netif_receive_skb_core+0x392/0xae0
 [<ffffffff8165e68e>] ? process_backlog+0x8e/0x230
 [<ffffffff810b53f1>] ? mark_held_locks+0x71/0x90
 [<ffffffff8165d7c8>] __netif_receive_skb+0x18/0x60
 [<ffffffff8165e678>] process_backlog+0x78/0x230
 [<ffffffff8165e6dd>] ? process_backlog+0xdd/0x230
 [<ffffffff8165e355>] net_rx_action+0x155/0x400
 [<ffffffff8106b48c>] __do_softirq+0xcc/0x420
 [<ffffffff816a8e87>] ? ip_finish_output2+0x217/0x590
 [<ffffffff8178e78c>] do_softirq_own_stack+0x1c/0x30
 <EOI>
 [<ffffffff8106b88e>] do_softirq+0x4e/0x60
 [<ffffffff8106b948>] __local_bh_enable_ip+0xa8/0xb0
 [<ffffffff816a8eb0>] ip_finish_output2+0x240/0x590
 [<ffffffff816a9a31>] ? ip_do_fragment+0x831/0x8a0
 [<ffffffff816a9a31>] ip_do_fragment+0x831/0x8a0
 [<ffffffff816a8c70>] ? ip_copy_metadata+0x1b0/0x1b0
 [<ffffffff816a9ae3>] ip_fragment.constprop.49+0x43/0x80
 [<ffffffff816a9c9c>] ip_finish_output+0x17c/0x340
 [<ffffffff8169a6f4>] ? nf_hook_slow+0xe4/0x190
 [<ffffffff816ab4c0>] ip_output+0x70/0x110
 [<ffffffff816a9b20>] ? ip_fragment.constprop.49+0x80/0x80
 [<ffffffff816aa9f9>] ip_local_out+0x39/0x70
 [<ffffffff816abf89>] ip_send_skb+0x19/0x40
 [<ffffffff816abfe3>] ip_push_pending_frames+0x33/0x40
 [<ffffffff816d55d3>] raw_sendmsg+0x7d3/0xc30
 [<ffffffff810b732b>] ? __lock_acquire+0x3db/0x1b90
 [<ffffffff816e7557>] ? inet_sendmsg+0xc7/0x1d0
 [<ffffffff810b63c4>] ? __lock_is_held+0x54/0x70
 [<ffffffff816e759a>] inet_sendmsg+0x10a/0x1d0
 [<ffffffff816e7495>] ? inet_sendmsg+0x5/0x1d0
 [<ffffffff8163e398>] sock_sendmsg+0x38/0x50
 [<ffffffff8163ec5f>] ___sys_sendmsg+0x25f/0x270
 [<ffffffff811aadad>] ? handle_mm_fault+0x8dd/0x1320
 [<ffffffff8178c147>] ? _raw_spin_unlock+0x27/0x40
 [<ffffffff810529b2>] ? __do_page_fault+0x1e2/0x460
 [<ffffffff81204886>] ? __fget_light+0x66/0x90
 [<ffffffff8163f8e2>] __sys_sendmsg+0x42/0x80
 [<ffffffff8163f932>] SyS_sendmsg+0x12/0x20
 [<ffffffff8178cb17>] entry_SYSCALL_64_fastpath+0x12/0x6f
Code: 00 00 44 89 e0 e9 7c fb ff ff 4c 89 ff e8 e7 e7 ff ff 41 8b 9d 80 00 00 00 2b 5d d4 89 d8 c1 f8 03 0f b7 c0 e9 33 ff ff f
 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48
RIP  [<ffffffff816a9a92>] ip_do_fragment+0x892/0x8a0
 RSP <ffff88006d603170>

Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action")
Signed-off-by: Joe Stringer <joe@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotipc: fix connection abort during subscription cancel
Parthasarathy Bhuvaragan [Wed, 27 Jan 2016 10:35:59 +0000 (11:35 +0100)]
tipc: fix connection abort during subscription cancel

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 4d5cfcba2f6ec494d8810b9e3c0a7b06255c8067 ]

In 'commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing
to events")', we terminate the connection if the subscription
creation fails.
In the same commit, the subscription creation result was based on
the value of the subscription pointer (set in the function) instead
of the return code.

Unfortunately, the same function tipc_subscrp_create() handles
subscription cancel request. For a subscription cancellation request,
the subscription pointer cannot be set. Thus if a subscriber has
several subscriptions and cancels any of them, the connection is
terminated.

In this commit, we terminate the connection based on the return value
of tipc_subscrp_create().
Fixes: commit 7fe8097cef5f ("tipc: fix nullpointer bug when subscribing to events")
Reviewed-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet: dsa: fix mv88e6xxx switches
Russell King [Sun, 24 Jan 2016 09:22:05 +0000 (09:22 +0000)]
net: dsa: fix mv88e6xxx switches

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit db0e51afa481088e6396f11e02018d64113a6578 ]

Since commit 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del
ops"), the Marvell 88E6xxx switch has been unable to pass traffic
between ports - any received traffic is discarded by the switch.
Taking a port out of bridge mode and configuring a vlan on it also the
port to start passing traffic.

With the debugfs files re-instated to allow debug of this issue by
comparing the register settings between the working and non-working
case, the reason becomes clear:

     GLOBAL GLOBAL2 SERDES   0    1    2    3    4    5    6
- 7:  1111    707f    2001     2    2    2    2    2    0    2
+ 7:  1111    707f    2001     1    1    1    1    1    0    1

Register 7 for the ports is the default vlan tag register, and in the
non-working setup, it has been set to 2, despite vlan 2 not being
configured.  This causes the switch to drop all packets coming in to
these ports.  The working setup has the default vlan tag register set
to 1, which is the default vlan when none is configured.

Inspection of the code reveals why.  The code prior to this commit
was:

- for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
...
- if (!err && vlan->flags & BRIDGE_VLAN_INFO_PVID)
- err = ds->drv->port_pvid_set(ds, p->port, vid);

but the new code is:

+ for (vid = vlan->vid_begin; vid <= vlan->vid_end; ++vid) {
...
+ }
...
+ if (pvid)
+ err = _mv88e6xxx_port_pvid_set(ds, port, vid);

This causes the new code to always set the default vlan to one higher
than the old code.

Fix this.

Fixes: 76e398a62712 ("net: dsa: use switchdev obj for VLAN add/del ops")
Cc: <stable@vger.kernel.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agosctp: allow setting SCTP_SACK_IMMEDIATELY by the application
Marcelo Ricardo Leitner [Fri, 22 Jan 2016 20:29:49 +0000 (18:29 -0200)]
sctp: allow setting SCTP_SACK_IMMEDIATELY by the application

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 27f7ed2b11d42ab6d796e96533c2076ec220affc ]

This patch extends commit b93d6471748d ("sctp: implement the sender side
for SACK-IMMEDIATELY extension") as it didn't white list
SCTP_SACK_IMMEDIATELY on sctp_msghdr_parse(), causing it to be
understood as an invalid flag and returning -EINVAL to the application.

Note that the actual handling of the flag is already there in
sctp_datamsg_from_user().

https://tools.ietf.org/html/rfc7053#section-7

Fixes: b93d6471748d ("sctp: implement the sender side for SACK-IMMEDIATELY extension")
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agopptp: fix illegal memory access caused by multiple bind()s
Hannes Frederic Sowa [Fri, 22 Jan 2016 00:39:43 +0000 (01:39 +0100)]
pptp: fix illegal memory access caused by multiple bind()s

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 9a368aff9cb370298fa02feeffa861f2db497c18 ]

Several times already this has been reported as kasan reports caused by
syzkaller and trinity and people always looked at RCU races, but it is
much more simple. :)

In case we bind a pptp socket multiple times, we simply add it to
the callid_sock list but don't remove the old binding. Thus the old
socket stays in the bucket with unused call_id indexes and doesn't get
cleaned up. This causes various forms of kasan reports which were hard
to pinpoint.

Simply don't allow multiple binds and correct error handling in
pptp_bind. Also keep sk_state bits in place in pptp_connect.

Fixes: 00959ade36acad ("PPTP: PPP over IPv4 (Point-to-Point Tunneling Protocol)")
Cc: Dmitry Kozlov <xeb@mail.ru>
Cc: Sasha Levin <sasha.levin@oracle.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Cc: Dave Jones <davej@codemonkey.org.uk>
Reported-by: Dave Jones <davej@codemonkey.org.uk>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoaf_unix: fix struct pid memory leak
Eric Dumazet [Sun, 24 Jan 2016 21:53:50 +0000 (13:53 -0800)]
af_unix: fix struct pid memory leak

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit fa0dc04df259ba2df3ce1920e9690c7842f8fa4b ]

Dmitry reported a struct pid leak detected by a syzkaller program.

Bug happens in unix_stream_recvmsg() when we break the loop when a
signal is pending, without properly releasing scm.

Fixes: b3ca9b02b007 ("net: fix multithreaded signal handling in unix recv routines")
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Rainer Weikusat <rweikusat@mobileactivedefense.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotcp: fix NULL deref in tcp_v4_send_ack()
Eric Dumazet [Thu, 21 Jan 2016 16:02:54 +0000 (08:02 -0800)]
tcp: fix NULL deref in tcp_v4_send_ack()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit e62a123b8ef7c5dc4db2c16383d506860ad21b47 ]

Neal reported crashes with this stack trace :

 RIP: 0010:[<ffffffff8c57231b>] tcp_v4_send_ack+0x41/0x20f
...
 CR2: 0000000000000018 CR3: 000000044005c000 CR4: 00000000001427e0
...
  [<ffffffff8c57258e>] tcp_v4_reqsk_send_ack+0xa5/0xb4
  [<ffffffff8c1a7caa>] tcp_check_req+0x2ea/0x3e0
  [<ffffffff8c19e420>] tcp_rcv_state_process+0x850/0x2500
  [<ffffffff8c1a6d21>] tcp_v4_do_rcv+0x141/0x330
  [<ffffffff8c56cdb2>] sk_backlog_rcv+0x21/0x30
  [<ffffffff8c098bbd>] tcp_recvmsg+0x75d/0xf90
  [<ffffffff8c0a8700>] inet_recvmsg+0x80/0xa0
  [<ffffffff8c17623e>] sock_aio_read+0xee/0x110
  [<ffffffff8c066fcf>] do_sync_read+0x6f/0xa0
  [<ffffffff8c0673a1>] SyS_read+0x1e1/0x290
  [<ffffffff8c5ca262>] system_call_fastpath+0x16/0x1b

The problem here is the skb we provide to tcp_v4_send_ack() had to
be parked in the backlog of a new TCP fastopen child because this child
was owned by the user at the time an out of window packet arrived.

Before queuing a packet, TCP has to set skb->dev to NULL as the device
could disappear before packet is removed from the queue.

Fix this issue by using the net pointer provided by the socket (being a
timewait or a request socket).

IPv6 is immune to the bug : tcp_v6_send_response() already gets the net
pointer from the socket if provided.

Fixes: 168a8f58059a ("tcp: TCP Fast Open Server - main code path")
Reported-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Jerry Chu <hkchu@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agolwt: fix rx checksum setting for lwt devices tunneling over ipv6
Paolo Abeni [Wed, 17 Feb 2016 18:30:01 +0000 (19:30 +0100)]
lwt: fix rx checksum setting for lwt devices tunneling over ipv6

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit c868ee7063bdb53f3ef9eac7bcec84960980b471 ]

the commit 35e2d1152b22 ("tunnels: Allow IPv6 UDP checksums to be
correctly controlled.") changed the default xmit checksum setting
for lwt vxlan/geneve ipv6 tunnels, so that now the checksum is not
set into external UDP header.
This commit changes the rx checksum setting for both lwt vxlan/geneve
devices created by openvswitch accordingly, so that lwt over ipv6
tunnel pairs are again able to communicate with default values.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Jiri Benc <jbenc@redhat.com>
Acked-by: Jesse Gross <jesse@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotunnels: Allow IPv6 UDP checksums to be correctly controlled.
Jesse Gross [Thu, 21 Jan 2016 00:22:47 +0000 (16:22 -0800)]
tunnels: Allow IPv6 UDP checksums to be correctly controlled.

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 35e2d1152b22eae99c961affbe85374bef05a775 ]

When configuring checksums on UDP tunnels, the flags are different
for IPv4 vs. IPv6 (and reversed). However, when lightweight tunnels
are enabled the flags used are always the IPv4 versions, which are
ignored in the IPv6 code paths. This uses the correct IPv6 flags, so
checksums can be controlled appropriately.

Fixes: a725e514 ("vxlan: metadata based tunneling for IPv6")
Fixes: abe492b4 ("geneve: UDP checksum configuration via netlink")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet: dp83640: Fix tx timestamp overflow handling.
Manfred Rudigier [Wed, 20 Jan 2016 10:22:28 +0000 (11:22 +0100)]
net: dp83640: Fix tx timestamp overflow handling.

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 81e8f2e930fe76b9814c71b9d87c30760b5eb705 ]

PHY status frames are not reliable, the PHY may not be able to send them
during heavy receive traffic. This overflow condition is signaled by the
PHY in the next status frame, but the driver did not make use of it.
Instead it always reported wrong tx timestamps to user space after an
overflow happened because it assigned newly received tx timestamps to old
packets in the queue.

This commit fixes this issue by clearing the tx timestamp queue every time
an overflow happens, so that no timestamps are delivered for overflow
packets. This way time stamping will continue correctly after an overflow.

Signed-off-by: Manfred Rudigier <manfred.rudigier@omicron.at>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agogro: Make GRO aware of lightweight tunnels.
Jesse Gross [Thu, 21 Jan 2016 01:59:49 +0000 (17:59 -0800)]
gro: Make GRO aware of lightweight tunnels.

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit ce87fc6ce3f9f4488546187e3757cf666d9d4a2a ]

GRO is currently not aware of tunnel metadata generated by lightweight
tunnels and stored in the dst. This leads to two possible problems:
 * Incorrectly merging two frames that have different metadata.
 * Leaking of allocated metadata from merged frames.

This avoids those problems by comparing the tunnel information before
merging, similar to how we handle other metadata (such as vlan tags),
and releasing any state when we are done.

Reported-by: John <john.phillips5@hpe.com>
Fixes: 2e15ea39 ("ip_gre: Add support to collect tunnel metadata.")
Signed-off-by: Jesse Gross <jesse@kernel.org>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoaf_iucv: Validate socket address length in iucv_sock_bind()
Ursula Braun [Tue, 19 Jan 2016 09:41:33 +0000 (10:41 +0100)]
af_iucv: Validate socket address length in iucv_sock_bind()

BugLink: http://bugs.launchpad.net/bugs/1553179
[ Upstream commit 52a82e23b9f2a9e1d429c5207f8575784290d008 ]

Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Evgeny Cherkashin <Eugene.Crosser@ru.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agonet/mlx4_en: Choose time-stamping shift value according to HW frequency
Eugenia Emantayev [Wed, 17 Feb 2016 15:24:23 +0000 (17:24 +0200)]
net/mlx4_en: Choose time-stamping shift value according to HW frequency

BugLink: http://bugs.launchpad.net/bugs/1552627
Previously, the shift value used for time-stamping was constant and didn't
depend on the HW chip frequency. Change that to take the frequency into account
and calculate the maximal value in cycles per wraparound of ten seconds. This
time slot was chosen since it gives a good accuracy in time synchronization.

Algorithm for shift value calculation:
 * Round up the maximal value in cycles to nearest power of two

 * Calculate maximal multiplier by division of all 64 bits set
   to above result

 * Then, invert the function clocksource_khz2mult() to get the shift from
   maximal mult value

Fixes: ec693d47010e ('net/mlx4_en: Add HW timestamping (TS) support')
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit 31c128b66e5b28f468076e4f3ca3025c35342041)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agopowerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages
Russell Currey [Wed, 13 Jan 2016 01:04:32 +0000 (12:04 +1100)]
powerpc/powernv: Fix OPAL_CONSOLE_FLUSH prototype and usages

BugLink: http://bugs.launchpad.net/bugs/1552332
The recently added OPAL API call, OPAL_CONSOLE_FLUSH, originally took no
parameters and returned nothing.  The call was updated to accept the
terminal number to flush, and returned various values depending on the
state of the output buffer.

The prototype has been updated and its usage in the OPAL kmsg dumper has
been modified to support its new behaviour as an incremental flush.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit c88c5d43732a0356f99e5e4d1ad62ab1ea516b81)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agopowerpc/powernv: Add a kmsg_dumper that flushes console output on panic
Russell Currey [Fri, 27 Nov 2015 06:23:07 +0000 (17:23 +1100)]
powerpc/powernv: Add a kmsg_dumper that flushes console output on panic

BugLink: http://bugs.launchpad.net/bugs/1552332
On BMC machines, console output is controlled by the OPAL firmware and is
only flushed when its pollers are called.  When the kernel is in a panic
state, it no longer calls these pollers and thus console output does not
completely flush, causing some output from the panic to be lost.

Output is only actually lost when the kernel is configured to not power off
or reboot after panic (i.e. CONFIG_PANIC_TIMEOUT is set to 0) since OPAL
flushes the console buffer as part of its power down routines.  Before this
patch, however, only partial output would be printed during the timeout wait.

This patch adds a new kmsg_dumper which gets called at panic time to ensure
panic output is not lost.  It accomplishes this by calling OPAL_CONSOLE_FLUSH
in the OPAL API, and if that is not available, the pollers are called enough
times to (hopefully) completely flush the buffer.

The flushing mechanism will only affect output printed at and before the
kmsg_dump call in kernel/panic.c:panic().  As such, the "end Kernel panic"
message may still be truncated as follows:

>Call Trace:
>[c000000f1f603b00] [c0000000008e9458] dump_stack+0x90/0xbc (unreliable)
>[c000000f1f603b30] [c0000000008e7e78] panic+0xf8/0x2c4
>[c000000f1f603bc0] [c000000000be4860] mount_block_root+0x288/0x33c
>[c000000f1f603c80] [c000000000be4d14] prepare_namespace+0x1f4/0x254
>[c000000f1f603d00] [c000000000be43e8] kernel_init_freeable+0x318/0x350
>[c000000f1f603dc0] [c00000000000bd74] kernel_init+0x24/0x130
>[c000000f1f603e30] [c0000000000095b0] ret_from_kernel_thread+0x5c/0xac
>---[ end Kernel panic - not

This functionality is implemented as a kmsg_dumper as it seems to be the
most sensible way to introduce platform-specific functionality to the
panic function.

Signed-off-by: Russell Currey <ruscur@russell.cc>
Reviewed-by: Andrew Donnellan <andrew.donnellan@au1.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
(cherry picked from commit affddff69c55eb68969448f35f59054a370bc7c1)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity
Quinn Tran [Wed, 10 Feb 2016 23:59:14 +0000 (18:59 -0500)]
qla2xxx: use TARGET_SCF_USE_CPUID flag to indiate CPU Affinity

BugLink: http://bugs.launchpad.net/bugs/1541456
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: fb3269b ("qla2xxx: Add selective command queuing")
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit 5327c7dbd1a7fd980608f44789076a636e5ee5fc)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqla2xxx: Use ATIO type to send correct tmr response
Swapnil Nagle [Thu, 4 Feb 2016 16:45:17 +0000 (11:45 -0500)]
qla2xxx: Use ATIO type to send correct tmr response

BugLink: http://bugs.launchpad.net/bugs/1541456
The function value inside se_cmd can change if the TMR is cancelled.
Use original ATIO Type to correctly determine CTIO response.

Signed-off-by: Swapnil Nagle <swapnil.nagle@purestroage.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit d7236ac368212bd6fc8b45f050136ee53e6a6f2d)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqla2xxx: Fix stale pointer access.
Quinn Tran [Thu, 4 Feb 2016 16:45:16 +0000 (11:45 -0500)]
qla2xxx: Fix stale pointer access.

BugLink: http://bugs.launchpad.net/bugs/1541456
[ Upstream Commit 84e32a06f4f8756ce9ec3c8dc7e97896575f0771 ]

Commit 84e32a0 ("qla2xxx: Use pci_enable_msix_range() instead of
pci_enable_msix()") introduced a regression when target mode is enabled.
In qla24xx_enable_msix(), ha->max_rsp_queues was incorrectly set
to a value higher than the number of response queues allocated causing
an invalid dereference. Specifically here in qla2x00_init_rings():
    *rsp->in_ptr = 0;

Add additional check to make sure the pointer is valid. following
call stack will be seen

---- 8< ----
RIP: 0010:[<ffffffffa02ccadc>]  [<ffffffffa02ccadc>] qla2x00_init_rings+0xdc/0x320 [qla2xxx]
RSP: 0018:ffff880429447dd8  EFLAGS: 00010082
....
Call Trace:
[<ffffffffa02ceb40>] qla2x00_abort_isp+0x170/0x6b0 [qla2xxx]
[<ffffffffa02c6f77>] qla2x00_do_dpc+0x357/0x7f0 [qla2xxx]
[<ffffffffa02c6c20>] ? qla2x00_relogin+0x260/0x260 [qla2xxx]
[<ffffffff8107d2c9>] kthread+0xc9/0xe0
[<ffffffff8107d200>] ? flush_kthread_worker+0x90/0x90
[<ffffffff8172cc6f>] ret_from_fork+0x3f/0x70
[<ffffffff8107d200>] ? flush_kthread_worker+0x90/0x90
---- 8< ----

Cc: <stable@vger.kernel.org>
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit cb43285ff7039fe3c4b0bc476e6d6569c31104f3)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM
Quinn Tran [Tue, 8 Dec 2015 00:48:57 +0000 (19:48 -0500)]
qla2xxx: Fix TMR ABORT interaction issue between qla2xxx and TCM

BugLink: http://bugs.launchpad.net/bugs/1541456
During lun reset, TMR thread from TCM would issue abort
to qla driver.  At abort time, each command is in different
state.  Depending on the state, qla will use the TMR thread
to trigger a command free(cmd_kref--) if command is not
down at firmware.

Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit a07100e00ac42a4474530ce17b4978c9e06bde55)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoqla2xxx: Fix warning reported by static checker
Himanshu Madhani [Wed, 20 Jan 2016 23:42:58 +0000 (15:42 -0800)]
qla2xxx: Fix warning reported by static checker

BugLink: http://bugs.launchpad.net/bugs/1541456
This patch fixes following warning

drivers/scsi/qla2xxx/qla_target.c:3587 qlt_do_ctio_completion()
warn: impossible condition '(logged_out == 41) => (0-1 == 41)'

drivers/scsi/qla2xxx/qla_target.c
 3580                  case CTIO_PORT_LOGGED_OUT:
 3581                  case CTIO_PORT_UNAVAILABLE:
 3582                  {
 3583                          bool logged_out = (status & 0xFFFF);
 3584                          ql_dbg(ql_dbg_tgt_mgt, vha, 0xf059,
 3585                              "qla_target(%d): CTIO with %s status %x "
 3586                              "received (state %x, se_cmd %p)\n", vha->vp_idx,
 3587                              (logged_out == CTIO_PORT_LOGGED_OUT) ?
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                    Bool cannot equal 0x26.
 3588                              "PORT LOGGED OUT" : "PORT UNAVAILABLE",

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(cherry picked from commit dacb58221805bb72ec46a73826c9e59a587d7d68)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agotarget/transport: add flag to indicate CPU Affinity is observed
Quinn Tran [Wed, 10 Feb 2016 23:59:13 +0000 (18:59 -0500)]
target/transport: add flag to indicate CPU Affinity is observed

BugLink: http://bugs.launchpad.net/bugs/1552332
Signed-off-by: Quinn Tran <quinn.tran@qlogic.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bart.vanassche@sandisk.com>
Fixes: fb3269b ("qla2xxx: Add selective command queuing")
Signed-off-by: Himanshu Madhani <himanshu.madhani@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
(back ported from commit 9095adaab8c1d82707e4e9961b6ad79b62f3361b)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
Conflicts:
include/target/target_core_base.h

8 years agoUBUNTU: [Config] Add s390x zfcp to scsi-modules udeb
Tim Gardner [Thu, 3 Mar 2016 00:15:34 +0000 (17:15 -0700)]
UBUNTU: [Config] Add s390x zfcp to scsi-modules udeb

BugLink: http://bugs.launchpad.net/bugs/1552314
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: Start new release
Tim Gardner [Wed, 2 Mar 2016 15:08:47 +0000 (08:08 -0700)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: Ubuntu-4.4.0-10.25
Tim Gardner [Wed, 2 Mar 2016 14:07:41 +0000 (07:07 -0700)]
UBUNTU: Ubuntu-4.4.0-10.25

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agosecurity: let security modules use PTRACE_MODE_* with bitmasks
Jann Horn [Wed, 20 Jan 2016 23:00:01 +0000 (15:00 -0800)]
security: let security modules use PTRACE_MODE_* with bitmasks

BugLink: http://bugs.launchpad.net/bugs/1551894
It looks like smack and yama weren't aware that the ptrace mode
can have flags ORed into it - PTRACE_MODE_NOAUDIT until now, but
only for /proc/$pid/stat, and with the PTRACE_MODE_*CREDS patch,
all modes have flags ORed into them.

Signed-off-by: Jann Horn <jann@thejh.net>
Acked-by: Kees Cook <keescook@chromium.org>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Morris <james.l.morris@oracle.com>
Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Willy Tarreau <w@1wt.eu>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 3dfb7d8cdbc7ea0c2970450e60818bb3eefbad69)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: (noup) locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_...
Dan Streetman [Mon, 29 Feb 2016 19:52:25 +0000 (14:52 -0500)]
UBUNTU: SAUCE: (noup) locking/qspinlock: Move __ARCH_SPIN_LOCK_UNLOCKED to qspinlock_types.h

BugLink: http://bugs.launchpad.net/bugs/1545330
Move the __ARCH_SPIN_LOCK_UNLOCKED definition from qspinlock.h into
qspinlock_types.h.

The definition of __ARCH_SPIN_LOCK_UNLOCKED comes from the build arch's
include files; but on x86 when CONFIG_QUEUED_SPINLOCKS=y, it just
it's defined in asm-generic/qspinlock.h.  In most cases, this doesn't
matter because linux/spinlock.h includes asm/spinlock.h, which for x86
includes asm-generic/qspinlock.h.  However, any code that only includes
linux/mutex.h will break, because it only includes asm/spinlock_types.h.

For example, this breaks systemtap, which only includes mutex.h.

Signed-off-by: Dan Streetman <dan.streetman@canonical.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Waiman Long <Waiman.Long@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1455907767-17821-1-git-send-email-dan.streetman@canonical.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry-picked from commit b82e530290a0437522720becaf4abdf8ca4cb7d2 git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: (noup) cgroup: fix and restructure error handling in copy_cgroup_ns()
Tejun Heo [Mon, 29 Feb 2016 20:52:45 +0000 (20:52 +0000)]
UBUNTU: SAUCE: (noup) cgroup: fix and restructure error handling in copy_cgroup_ns()

copy_cgroup_ns()'s error handling was broken and the attempt to fix it
d22025570e2e ("cgroup: fix alloc_cgroup_ns() error handling in
copy_cgroup_ns()") was broken too in that it ended up trying an
ERR_PTR() value.

There's only one place where copy_cgroup_ns() needs to perform cleanup
after failure.  Simplify and fix the error handling by removing the
goto's.

(Ported from upstream patch for linux-next - Serge)

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Serge Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agommc: sdhci: Fix DMA descriptor with zero data length
Adrian Hunter [Thu, 26 Nov 2015 12:00:48 +0000 (14:00 +0200)]
mmc: sdhci: Fix DMA descriptor with zero data length

BugLink: http://bugs.launchpad.net/bugs/1520454
SDHCI has built-in DMA called ADMA2.  ADMA2 uses a descriptor
table to define DMA scatter-gather.  Each desciptor can specify
a data length up to 65536 bytes, however the length field is
only 16-bits so zero means 65536.  Consequently, putting zero
when the size is zero must not be allowed.  This patch fixes
one case where zero data length could be set inadvertently.

The problem happens because unaligned data gets split and the
code did not consider that the remaining aligned portion might
be zero length.  That case really only happens for SDIO because
SD and eMMC cards transfer blocks that are invariably sector-
aligned.  For SDIO, access to function registers is done by
data transfer (CMD53) when the register is bigger than 1 byte.
Generally registers are 4 bytes but 2-byte registers are possible.
So DMA of 4 bytes or less can happen.  When 32-bit DMA is used,
the data alignment must be 4, so 4-byte transfers won't casue a
problem, but a 2-byte transfer could.  However with the introduction
of 64-bit DMA, the data alignment for 64-bit DMA was made 8 bytes,
so all 4-byte transfers not on 8-byte boundaries get "split" into
a 4-byte chunk and a 0-byte chunk, thereby hitting the bug.

In fact, a closer look at the SDHCI specs indicates that only the
descriptor table requires 8-byte alignment for 64-bit DMA.  That
will be dealt with in a separate patch, but the potential for a
2-byte access remains, so this fix is needed anyway.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: stable@vger.kernel.org # v3.19+
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit 347ea32dc118326c4f2636928239a29d192cc9b8)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agommc: sdhci: 64-bit DMA actually has 4-byte alignment
Adrian Hunter [Thu, 26 Nov 2015 12:00:49 +0000 (14:00 +0200)]
mmc: sdhci: 64-bit DMA actually has 4-byte alignment

BugLink: http://bugs.launchpad.net/bugs/1520454
The version 3.00 SDHCI spec. was a bit unclear about the
required data alignment for 64-bit DMA, whereas the version
4.10 spec. uses different language and indicates that only
4-byte alignment is required rather than the 8-byte alignment
currently implemented.  That make no difference to SD and EMMC
which invariably transfer data in sector-aligned blocks.
However with SDIO, it results in using more DMA descriptors
than necessary.  Theoretically that slows DMA slightly although
DMA is not the limiting factor for throughput, so there is no
discernable impact on performance.  Nevertheless, the driver
should follw the spec unless there is good reason not to, so
this patch corrects the alignment criterion.

There is a more complicated criterion for the DMA descriptor
table itself.  However the table is allocated by dma_alloc_coherent()
which allocates pages (i.e. aligned to a page boundary).
For simplicity just check it is 8-byte aligned, but add a comment
that some Intel controllers actually require 8-byte alignment
even when using 32-bit DMA.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
(cherry picked from commit 04a5ae6fdd018af29675eb8b6c2550c87f471570)
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: Start new release
Tim Gardner [Mon, 29 Feb 2016 20:04:41 +0000 (13:04 -0700)]
UBUNTU: Start new release

Ignore: yes
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: Ubuntu-4.4.0-9.24
Tim Gardner [Mon, 29 Feb 2016 16:21:49 +0000 (09:21 -0700)]
UBUNTU: Ubuntu-4.4.0-9.24

Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: [Config] drop linux-image-3.0 provides
Andy Whitcroft [Fri, 26 Feb 2016 21:20:43 +0000 (21:20 +0000)]
UBUNTU: [Config] drop linux-image-3.0 provides

Signed-off-by: Andy Whitcroft <apw@canonical.com>
8 years agoUBUNTU: [Config] supply zfs dkms Provides: based on do_zfs
Andy Whitcroft [Fri, 26 Feb 2016 20:38:49 +0000 (20:38 +0000)]
UBUNTU: [Config] supply zfs dkms Provides: based on do_zfs

Signed-off-by: Andy Whitcroft <apw@canonical.com>
8 years agoUBUNTU: [Debian] supply zfs dkms Provides: based on do_zfs
Andy Whitcroft [Fri, 26 Feb 2016 20:38:49 +0000 (20:38 +0000)]
UBUNTU: [Debian] supply zfs dkms Provides: based on do_zfs

Signed-off-by: Andy Whitcroft <apw@canonical.com>
8 years agoUBUNTU: SAUCE: Move replacedby allocation into label_alloc
John Johansen [Sat, 6 Feb 2016 22:40:15 +0000 (14:40 -0800)]
UBUNTU: SAUCE: Move replacedby allocation into label_alloc

label_merge_insert - requires that a replacedby exists on either the
existing in tree label or on the label attempting to be added. If
we get into a situation where the in tree label is replacedby and
the replacedby doesn't exists an oops will occur.

Unfortunately we can not know if the intree label was created with or
without a replacedby so, we always have to create the replacedby. Instead
Make sure this is so be tying to the label_allocation.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: Fixup: __label_update() still doesn't handle some cases correctly.
John Johansen [Wed, 20 Jan 2016 19:18:21 +0000 (11:18 -0800)]
UBUNTU: SAUCE: Fixup: __label_update() still doesn't handle some cases correctly.

The old label needs to be removed, so call label_remove on it. This is
only needed by the inv path but that path shares code and removing
won't hurt the non-inv path.

Also the proxy redirect needs to be done at the insert or after to make
sure the redirect target is correct.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fix: audit "no_new_privs" case for exec failure
John Johansen [Tue, 26 Jan 2016 12:23:28 +0000 (04:23 -0800)]
UBUNTU: SAUCE: fix: audit "no_new_privs" case for exec failure

This also adds auditing of the targets name for Denials due to ptrace
restrictions.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: warning about aa_label_vec_find_or_create not being static
John Johansen [Fri, 5 Feb 2016 07:28:11 +0000 (23:28 -0800)]
UBUNTU: SAUCE: fixup: warning about aa_label_vec_find_or_create not being static

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: fix refcount race when finding a child profile
John Johansen [Thu, 17 Dec 2015 02:09:10 +0000 (18:09 -0800)]
UBUNTU: SAUCE: apparmor: fix refcount race when finding a child profile

When finding a child profile via an rcu critical section, the profile
may be put and scheduled for deletion after the child is found but
before its refcount is incremented.

Protect against this by repeating the lookup if the profiles refcount
is 0 and is one its way to deletion.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: cast poison values to remove warnings
John Johansen [Tue, 15 Dec 2015 13:01:53 +0000 (05:01 -0800)]
UBUNTU: SAUCE: fixup: cast poison values to remove warnings

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: get rid of unused var build warning
John Johansen [Tue, 15 Dec 2015 12:56:49 +0000 (04:56 -0800)]
UBUNTU: SAUCE: fixup: get rid of unused var build warning

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: 20/23 locking issue around in __label_update
John Johansen [Tue, 15 Dec 2015 12:45:56 +0000 (04:45 -0800)]
UBUNTU: SAUCE: fixup: 20/23 locking issue around in __label_update

aa_label_remove needs to take the ls write lock but, we are already
holding it so move to __aa_label_remove

also fix bug where the old label is not being forwarded to the new label
when it results in a new namespace due to an the old ns being removed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: make __share_replacedby private to get rid of build warning
John Johansen [Tue, 15 Dec 2015 12:43:30 +0000 (04:43 -0800)]
UBUNTU: SAUCE: fixup: make __share_replacedby private to get rid of build warning

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fix: replacedby forwarding is not being properly update when ns is...
John Johansen [Tue, 15 Dec 2015 12:42:12 +0000 (04:42 -0800)]
UBUNTU: SAUCE: fix: replacedby forwarding is not being properly update when ns is destroyed

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: fix log of apparmor audit message when kern_path() fails
John Johansen [Wed, 2 Dec 2015 11:33:02 +0000 (03:33 -0800)]
UBUNTU: SAUCE: apparmor: fix log of apparmor audit message when kern_path() fails

BugLink: http://bugs.launchpad.net/bugs/1482943
apparmor use kern_path() to lookup the path of the dev_name, and when
this fails apparmor emits a DENIED log message. However for bind and
move mounts the underlying code does a call to kern_path() regardless
of apparmor being present and so has the same failure.

In these cases when kern_path() fails apparmor is not responsible for
the mount failure as the kernel will fail the mount regarless of
apparmor's presence, so just return the error and don't log an apparmor
audit message.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: fixup: cleanup return handling of labels
John Johansen [Mon, 23 Nov 2015 21:51:42 +0000 (13:51 -0800)]
UBUNTU: SAUCE: fixup: cleanup return handling of labels

fixup of
[20/23] apparmor: Fix: refcount bug when inserting label update that transitions ns

may want to split into 2 patches

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: fix: ref count leak when profile sha1 hash is read
John Johansen [Wed, 18 Nov 2015 19:41:05 +0000 (11:41 -0800)]
UBUNTU: SAUCE: apparmor: fix: ref count leak when profile sha1 hash is read

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: query label file permission
John Johansen [Thu, 15 Oct 2015 22:33:30 +0000 (15:33 -0700)]
UBUNTU: SAUCE: apparmor: Fix: query label file permission

File permissions have not been updated to use the newer
compute_perms fn yet. So export the fn to compute the file
permissions and use it in query_label until file permissions
have been converted.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Don't remove label on rcu callback if the label has already...
John Johansen [Wed, 14 Oct 2015 12:08:27 +0000 (05:08 -0700)]
UBUNTU: SAUCE: apparmor: Don't remove label on rcu callback if the label has already been removed

Double calling remove on a label does a double update of the replacedby
which is extra work and messes up tracking stats.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: break circular refcount for label that is directly...
John Johansen [Wed, 14 Oct 2015 12:00:54 +0000 (05:00 -0700)]
UBUNTU: SAUCE: apparmor: Fix: break circular refcount for label that is directly freed.

There are a few cases when racing an update where a label can be allocated
with its replacedby, and end up being freed directly because it lost the
race and will not be used. However without breaking the circular ref
between the label and its replacedby, a double free of the label will
occur:
   label being freed
      ref count from label to replacedby is put,
         ref count from replacedby is put
    label is scheduled to be freed
         replacedby is freed
      label is freed
   rcu call back to free label triggers
      label is freed again

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: refcount bug when inserting label update that transitio...
John Johansen [Wed, 14 Oct 2015 11:56:02 +0000 (04:56 -0700)]
UBUNTU: SAUCE: apparmor: Fix: refcount bug when inserting label update that transitions ns

We already have a refcount on l, and aa_label_insert() returns another
refcount so we need to make sure to put the extra one.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: now that insert can force replacement use it instead...
John Johansen [Wed, 14 Oct 2015 11:52:44 +0000 (04:52 -0700)]
UBUNTU: SAUCE: apparmor: Fix: now that insert can force replacement use it instead of remove_and_insert

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor Fix: refcount bug in pivotroot mediation
John Johansen [Sun, 11 Oct 2015 00:43:23 +0000 (17:43 -0700)]
UBUNTU: SAUCE: apparmor Fix: refcount bug in pivotroot mediation

pivotroot medition may change the tasks current cred if the a transition
rule is defined. However aa_begin_current_label(), and
aa_end_current_label() define a critical section block where the tasks
cred label are not allowed to be updated. Specifically they do not take
a refcount on the tasks cred, but will return a refcounted label IF
there is an updated version of the label that can not be immediately
updated. The aa_end_current_label() fn detects whether the label used
has a refcount to put by comparing the label to the task's cred label,
and if its different putting label.

When the task cred's label is changed within this critical section,
the cred update will put the creds label reference, and then the
aa_begin_current_label() fn will detect the difference in the cred
and working label and subsequentially do an extra put on the label.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: ensure that repacedby sharing is done correctly
John Johansen [Tue, 13 Oct 2015 16:35:13 +0000 (09:35 -0700)]
UBUNTU: SAUCE: apparmor: ensure that repacedby sharing is done correctly

don't put the reference when sharing a replacedby until after all
over references are gotten, otherwise the label might drop to a 0
refcount and start being freed, even though it should be held by
the shared replacedby.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: update replacedby allocation to take a gfp parameter
John Johansen [Fri, 9 Oct 2015 20:30:05 +0000 (13:30 -0700)]
UBUNTU: SAUCE: apparmor: Fix: update replacedby allocation to take a gfp parameter

BugLink: http://bugs.launchpad.net/bugs/1448912
The allocation of a replacedby in the label merge path needs to be
done using GFP_ATOMIC.

specifically aa_label_merge() is called from a context in file.c that
passes GFP_ATOMIC to aa_label_merge() which then is doing GFP_KERNEL
for the replacedby allocation.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: convert replacedby update to be protected by the labels...
John Johansen [Thu, 8 Oct 2015 20:35:21 +0000 (13:35 -0700)]
UBUNTU: SAUCE: apparmor: Fix: convert replacedby update to be protected by the labelset lock

BugLink: http://bugs.launchpad.net/bugs/1448912
replacedby updates must be able to occur when in an rcu critical sections,
and when spin locks are held. As such it can not use a mutex lock to
protect its critical section. Since replacedby updates are accompanied by
labelset insertion and removals use the labelset write lock to protect
the update critical section.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: add required locking of __aa_update_replacedby on merge...
John Johansen [Wed, 7 Oct 2015 14:57:04 +0000 (07:57 -0700)]
UBUNTU: SAUCE: apparmor: Fix: add required locking of __aa_update_replacedby on merge path

BugLink: http://bugs.launchpad.net/bugs/1448912
__aa_update_replacedby needs the ns lock held, this is done for profile
load/replace/remove case and the label_update case but not when called
from the label merge paths.

NOTE: this is just a conceptal "fix", it can not be validly used as
      label_merge is called from atomic context and taking a mutex_lock
      may sleep.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: deadlock in aa_put_label() call chain
John Johansen [Wed, 7 Oct 2015 14:47:45 +0000 (07:47 -0700)]
UBUNTU: SAUCE: apparmor: Fix: deadlock in aa_put_label() call chain

BugLink: http://bugs.launchpad.net/bugs/1448912
When aa_put_label() is called from a fn that is holding the labelset
lock, it can result in a deadlock if the put count reaches 0 triggering
the kref callback, which tries to take the label set lock.

Rework so the label_kref callback deferrs removing the label from
the labelset until the rcu callback, ensuring the lock is not held
by the calling code.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: label_vec_merge insertion
John Johansen [Wed, 7 Oct 2015 00:00:35 +0000 (17:00 -0700)]
UBUNTU: SAUCE: apparmor: Fix: label_vec_merge insertion

BugLink: http://bugs.launchpad.net/bugs/1448912
label_vec_merge should only do the insertion after the vector is copied.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: ensure new labels resulting from merge have a replacedby
John Johansen [Sun, 4 Oct 2015 22:51:45 +0000 (15:51 -0700)]
UBUNTU: SAUCE: apparmor: Fix: ensure new labels resulting from merge have a replacedby

BugLink: http://bugs.launchpad.net/bugs/1448912
Basic profile labels always have a replacedby allocated and set but
the code used to create labels from merges without a replacedby and
let label_update allocate and set those labels replacedby structs.

While the label_merge fix addressed the race between label_merge and
label_update, it still left a bug where labels from merges race
label_update so that they remain permanently stale, because they
don't have proper replacedby information that should be updated during
their replacement.

Specifically a label from a merge will not have a replacedby if it has
never been through a label_update cycle, and the direct replacement
from the label_merge fix is NOT updating the replacedby to avoid doing
allocations under lock. This results in the old label being permanently
diconnected and its references never updating correctly.

To fix this create all labels that result from a merge with a replacedby.
This results in all labels inserted into the labelset having valid
replacedby structs. In the case that the insertion of a label results
in a replacement due to it creating an updated version of the label,
the old labels replacedby will be reused and the new one can be freed.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: refcount leak in aa_label_merge
John Johansen [Thu, 5 Nov 2015 04:17:01 +0000 (20:17 -0800)]
UBUNTU: SAUCE: apparmor: Fix: refcount leak in aa_label_merge

if aa_label_alloc() fails the refs taken on a and b are leaked.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: refcount race between locating in labelset and get
John Johansen [Wed, 30 Sep 2015 11:00:22 +0000 (04:00 -0700)]
UBUNTU: SAUCE: apparmor: Fix: refcount race between locating in labelset and get

BugLink: http://bugs.launchpad.net/bugs/1448912
The labelset does not hold a refcount on the labels its contains, all
lookups are done under lock. However in the window between finding a
label in the labelset and getting its reference, where the last label
reference can be put causing the label to begin its cleanup.

Ensure the any label in the set has valid reference before returning
its reference. We do this by getting its reference and failing on
that reference if the label has begun cleanup.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: Fix: label merge handling of marking unconfined and stale
John Johansen [Thu, 24 Sep 2015 12:22:24 +0000 (05:22 -0700)]
UBUNTU: SAUCE: apparmor: Fix: label merge handling of marking unconfined and stale

BugLink: http://bugs.launchpad.net/bugs/1448912
Fix a couple of bugs in label merge.
- the unconfined status may not be correctly set in the case of a stale
  profile
- if merge(A,B) == A' where A' is revised none stale version of A then
  the insertion of A' to replace A can fail.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: add underscores to indicate aa_label_next_not_in_set() use...
John Johansen [Tue, 25 Aug 2015 22:40:20 +0000 (15:40 -0700)]
UBUNTU: SAUCE: apparmor: add underscores to indicate aa_label_next_not_in_set() use needs locking

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>
8 years agoUBUNTU: SAUCE: apparmor: debug: POISON label and replaceby pointer on free
John Johansen [Fri, 17 Jul 2015 19:36:10 +0000 (12:36 -0700)]
UBUNTU: SAUCE: apparmor: debug: POISON label and replaceby pointer on free

And label and replacedby reference poisoning to make catching and
debugging label refcount errors easier.

Signed-off-by: John Johansen <john.johansen@canonical.com>
Signed-off-by: Tim Gardner <tim.gardner@canonical.com>