Several updates on the MAINTAINERS section for Netfilter:
1) Add Florian Westphal, he's been part of the coreteam since October 2012.
He's been dedicating tireless efforts to improve the Netfilter codebase,
fix bugs and push ongoing new developments ever since.
2) Add http://www.nftables.org/ URL, currently pointing to
http://www.netfilter.org.
3) Update project status from Supported to Maintained.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org> Acked-by: Florian Westphal <fw@strlen.de>
Merge tag 'ipvs-fixes-for-v4.11' of http://git.kernel.org/pub/scm/linux/kernel/git/horms/ipvs
Simon Horman says:
====================
IPVS Fixes for v4.11
I would also like it considered for stable.
* Explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
to avoid oops caused by IPVS accesing IPv6 routing code in such
circumstances.
====================
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Paolo Abeni [Thu, 20 Apr 2017 09:44:16 +0000 (11:44 +0200)]
ipvs: explicitly forbid ipv6 service/dest creation if ipv6 mod is disabled
When creating a new ipvs service, ipv6 addresses are always accepted
if CONFIG_IP_VS_IPV6 is enabled. On dest creation the address family
is not explicitly checked.
This allows the user-space to configure ipvs services even if the
system is booted with ipv6.disable=1. On specific configuration, ipvs
can try to call ipv6 routing code at setup time, causing the kernel to
oops due to fib6_rules_ops being NULL.
This change addresses the issue adding a check for the ipv6
module being enabled while validating ipv6 service operations and
adding the same validation for dest operations.
According to git history, this issue is apparently present since
the introduction of ipv6 support, and the oops can be triggered
since commit 09571c7ae30865ad ("IPVS: Add function to determine
if IPv6 address is local")
Fixes: 09571c7ae30865ad ("IPVS: Add function to determine if IPv6 address is local") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Simon Horman <horms@verge.net.au>
Dave Johnson [Mon, 24 Apr 2017 13:11:24 +0000 (09:11 -0400)]
netfilter: Wrong icmp6 checksum for ICMPV6_TIME_EXCEED in reverse SNATv6 path
When recalculating the outer ICMPv6 checksum for a reverse path NATv6
such as ICMPV6_TIME_EXCEED nf_nat_icmpv6_reply_translation() was
accessing data beyond the headlen of the skb for non-linear skb. This
resulted in incorrect ICMPv6 checksum as garbage data was used.
Patch replaces csum_partial() with skb_checksum() which supports
non-linear skbs similar to nf_nat_icmp_reply_translation() from ipv4.
Signed-off-by: Dave Johnson <dave-kernel@centerclick.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_dynset: continue to next expr if _OP_ADD succeeded
Currently, after adding the following nft rules:
# nft add set x target1 { type ipv4_addr \; flags timeout \;}
# nft add rule x y set add ip daddr timeout 1d @target1 counter
the counters will always be zero despite of the elements are added
to the dynamic set "target1" or not, as we will break the nft expr
traversal unconditionally:
# nft list ruleset
...
set target1 {
...
elements = { 8.8.8.8 expires 23h59m53s}
}
chain output {
...
set add ip daddr timeout 1d @target1 counter packets 0 bytes 0
^ ^
...
}
Since we add the elements to the set successfully, we should continue
to the next expression.
Additionally, if elements are added to "flow table" successfully, we
will _always_ continue to the next expr, even if the operation is
_OP_ADD. So it's better to keep them to be consistent.
Fixes: 22fe54d5fefc ("netfilter: nf_tables: add support for dynamic set updates") Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
bridge: ebtables: fix reception of frames DNAT-ed to bridge device/port
When trying to redirect bridged frames to the bridge device itself or
a bridge port (brouting) via the dnat target then this currently fails:
The ethernet destination of the frame is dnat'ed to the MAC address of
the bridge device or port just fine. However, the IP code drops it in
the beginning of ip_input.c/ip_rcv() as the dnat target left
the skb->pkt_type as PACKET_OTHERHOST.
Fixing this by resetting skb->pkt_type to an appropriate type after
dnat'ing.
Signed-off-by: Linus Lüssing <linus.luessing@c0d3.blue> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Peter Tirsek [Tue, 18 Apr 2017 17:39:58 +0000 (12:39 -0500)]
netfilter: xt_socket: Fix broken IPv6 handling
Commit 834184b1f3a4 ("netfilter: defrag: only register defrag
functionality if needed") used the outdated XT_SOCKET_HAVE_IPV6 macro
which was removed earlier in commit 8db4c5be88f6 ("netfilter: move
socket lookup infrastructure to nf_socket_ipv{4,6}.c"). With that macro
never being defined, the xt_socket match emits an "Unknown family 10"
warning when used with IPv6:
netfilter: ctnetlink: acquire ct->lock before operating nf_ct_seqadj
We should acquire the ct->lock before accessing or modifying the
nf_ct_seqadj, as another CPU may modify the nf_ct_seqadj at the same
time during its packet proccessing.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: make it safer when updating ct->status
After converting to use rcu for conntrack hash, one CPU may update
the ct->status via ctnetlink, while another CPU may process the
packets and update the ct->status.
So the non-atomic operation "ct->status |= status;" via ctnetlink
becomes unsafe, and this may clear the IPS_DYING_BIT bit set by
another CPU unexpectedly. For example:
CPU0 CPU1
ctnetlink_change_status __nf_conntrack_find_get
old = ct->status nf_ct_gc_expired
- nf_ct_kill
- test_and_set_bit(IPS_DYING_BIT
new = old | status; -
ct->status = new; <-- oops, _DYING_ is cleared!
Now using a series of atomic bit operation to solve the above issue.
Also note, user shouldn't set IPS_TEMPLATE, IPS_SEQ_ADJUST directly,
so make these two bits be unchangable too.
If we set the IPS_TEMPLATE_BIT, ct will be freed by nf_ct_tmpl_free,
but actually it is alloced by nf_conntrack_alloc.
If we set the IPS_SEQ_ADJUST_BIT, this may cause the NULL pointer
deference, as the nfct_seqadj(ct) maybe NULL.
Last, add some comments to describe the logic change due to the
commit a963d710f367 ("netfilter: ctnetlink: Fix regression in CTA_STATUS
processing"), which makes me feel a little confusing.
Fixes: 76507f69c44e ("[NETFILTER]: nf_conntrack: use RCU for conntrack hash") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: fix deadlock due to acquire _expect_lock twice
Currently, ctnetlink_change_conntrack is always protected by _expect_lock,
but this will cause a deadlock when deleting the helper from a conntrack,
as the _expect_lock will be acquired again by nf_ct_remove_expectations:
Since the operations are unrelated to nf_ct_expect, so we can drop the
_expect_lock. Also note, after removing the _expect_lock protection,
another CPU may invoke nf_conntrack_helper_unregister, so we should
use rcu_read_lock to protect __nf_conntrack_helper_find invoked by
ctnetlink_change_helper.
Fixes: ca7433df3a67 ("netfilter: conntrack: seperate expect locking from nf_conntrack_lock") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: ctnetlink: drop the incorrect cthelper module request
First, when creating a new ct, we will invoke request_module to try to
load the related inkernel cthelper. So there's no need to call the
request_module again when updating the ct helpinfo.
Second, ctnetlink_change_helper may be called with rcu_read_lock held,
i.e. rcu_read_lock -> nfqnl_recv_verdict -> nfqnl_ct_parse ->
ctnetlink_glue_parse -> ctnetlink_glue_parse_ct ->
ctnetlink_change_helper. But the request_module invocation may sleep,
so we can't call it with the rcu_read_lock held.
Remove it now.
Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nft_set_bitmap: free dummy elements when destroy the set
We forget to free dummy elements when deleting the set. So when I was
running nft-test.py, I saw many kmemleak warnings:
kmemleak: 1344 new suspected memory leaks ...
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff8800631345c8 (size 32):
comm "nft", pid 9075, jiffies 4295743309 (age 1354.815s)
hex dump (first 32 bytes):
f8 63 13 63 00 88 ff ff 88 79 13 63 00 88 ff ff .c.c.....y.c....
04 0c 00 00 00 00 00 00 00 00 00 00 08 03 00 00 ................
backtrace:
[<ffffffff819059da>] kmemleak_alloc+0x4a/0xa0
[<ffffffff81288174>] __kmalloc+0x164/0x310
[<ffffffffa061269d>] nft_set_elem_init+0x3d/0x1b0 [nf_tables]
[<ffffffffa06130da>] nft_add_set_elem+0x45a/0x8c0 [nf_tables]
[<ffffffffa0613645>] nf_tables_newsetelem+0x105/0x1d0 [nf_tables]
[<ffffffffa05fe6d4>] nfnetlink_rcv+0x414/0x770 [nfnetlink]
[<ffffffff817f0ca6>] netlink_unicast+0x1f6/0x310
[<ffffffff817f10c6>] netlink_sendmsg+0x306/0x3b0
...
Fixes: e920dde516088 ("netfilter: nft_set_bitmap: keep a list of dummy elements") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_ct_helper: permit cthelpers with different names via nfnetlink
cthelpers added via nfnetlink may have the same tuple, i.e. except for
the l3proto and l4proto, other fields are all zero. So even with the
different names, we will also fail to add them:
# nfct helper add ssdp inet udp
# nfct helper add tftp inet udp
nfct v1.4.3: netlink error: File exists
So in order to avoid unpredictable behaviour, we should:
1. cthelpers can be selected by nft ct helper obj or xt_CT target, so
report error if duplicated { name, l3proto, l4proto } tuple exist.
2. cthelpers can be selected by nf_ct_tuple_src_mask_cmp when
nf_ct_auto_assign_helper is enabled, so also report error if duplicated
{ l3proto, l4proto, src-port } tuple exist.
Also note, if the cthelper is added from userspace, then the src-port will
always be zero, it's invalid for nf_ct_auto_assign_helper, so there's no
need to check the second point listed above.
Fixes: 893e093c786c ("netfilter: nf_ct_helper: bail out on duplicated helpers") Signed-off-by: Liping Zhang <zlpnobody@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
openvswitch: Delete conntrack entry clashing with an expectation.
Conntrack helpers do not check for a potentially clashing conntrack
entry when creating a new expectation. Also, nf_conntrack_in() will
check expectations (via init_conntrack()) only if a conntrack entry
can not be found. The expectation for a packet which also matches an
existing conntrack entry will not be removed by conntrack, and is
currently handled inconsistently by OVS, as OVS expects the
expectation to be removed when the connection tracking entry matching
that expectation is confirmed.
It should be noted that normally an IP stack would not allow reuse of
a 5-tuple of an old (possibly lingering) connection for a new data
connection, so this is somewhat unlikely corner case. However, it is
possible that a misbehaving source could cause conntrack entries be
created that could then interfere with new related connections.
Fix this in the OVS module by deleting the clashing conntrack entry
after an expectation has been matched. This causes the following
nf_conntrack_in() call also find the expectation and remove it when
creating the new conntrack entry, as well as the forthcoming reply
direction packets to match the new related connection instead of the
old clashing conntrack entry.
Fixes: 7f8a436eaa2c ("openvswitch: Add conntrack action") Reported-by: Yang Song <yangsong@vmware.com> Signed-off-by: Jarno Rajahalme <jarno@ovn.org> Acked-by: Joe Stringer <joe@ovn.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
1. When nf_ct_timeout_ext_add failed in xt_ct_set_timeout, it should
free the timeout refcnt.
Now goto the err_put_timeout error handler instead of going ahead.
2. When the time policy is not found, we should call module_put.
Otherwise, the related cthelper module cannot be removed anymore.
It is easy to reproduce by typing the following command:
# iptables -t raw -A OUTPUT -p tcp -j CT --helper ftp --timeout xxx
2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu.
3) Fix out of bounds access in ipv6 segment routing code, from David
Lebrun.
4) Don't write into the header of cloned SKBs in smsc95xx driver, from
James Hughes.
5) Several other drivers have this bug too, fix them. From Eric
Dumazet.
6) Fix access to uninitialized data in TC action cookie code, from
Wolfgang Bumiller.
7) Fix double free in IPV6 segment routing, again from David Lebrun.
8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
9) Fix use after free in qrtr code, from Dan Carpenter.
10) Don't double-destroy devices in ip6mr code, from Nikolay
Aleksandrov.
11) Don't pass out-of-range TX queue indices into drivers, from Tushar
Dave.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
netpoll: Check for skb->queue_mapping
ip6mr: fix notification device destruction
bpf, doc: update bpf maintainers entry
net: qrtr: potential use after free in qrtr_sendmsg()
bpf: Fix values type used in test_maps
net: ipv6: RTF_PCPU should not be settable from userspace
gso: Validate assumption of frag_list segementation
kaweth: use skb_cow_head() to deal with cloned skbs
ch9200: use skb_cow_head() to deal with cloned skbs
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
ipv6: sr: fix double free of skb after handling invalid SRH
MAINTAINERS: Add "B:" field for networking.
net sched actions: allocate act cookie early
qed: Fix issue in populating the PFC config paramters.
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed: Fix sending an invalid PFC error mask to MFW.
qed: Fix possible error in populating max_tc field.
...
Tushar Dave [Thu, 20 Apr 2017 22:57:31 +0000 (15:57 -0700)]
netpoll: Check for skb->queue_mapping
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Apr 2017 15:27:58 +0000 (17:27 +0200)]
bpf, doc: update bpf maintainers entry
Add various related files that have been missing under
BPF entry covering essential parts of its infrastructure
and also add myself as co-maintainer.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 20 Apr 2017 10:21:30 +0000 (13:21 +0300)]
net: qrtr: potential use after free in qrtr_sendmsg()
If skb_pad() fails then it frees the skb so we should check for errors.
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Thu, 20 Apr 2017 19:20:16 +0000 (15:20 -0400)]
bpf: Fix values type used in test_maps
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Wed, 19 Apr 2017 18:26:07 +0000 (21:26 +0300)]
gso: Validate assumption of frag_list segementation
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:26 +0000 (09:59 -0700)]
kaweth: use skb_cow_head() to deal with cloned skbs
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:25 +0000 (09:59 -0700)]
ch9200: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:24 +0000 (09:59 -0700)]
lan78xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:23 +0000 (09:59 -0700)]
sr9700: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:22 +0000 (09:59 -0700)]
cx82310_eth: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:21 +0000 (09:59 -0700)]
smsc75xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Wed, 19 Apr 2017 14:10:19 +0000 (16:10 +0200)]
ipv6: sr: fix double free of skb after handling invalid SRH
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Just two fixes.
The first fixes kprobing a stdu, and is marked for stable as it's been
broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take
a certain exception when the kernel is running relocated (currently
only used for kdump), we checkstop the box.
Thanks to Ravi Bangoria"
* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A couple of last minute fixes for regressions in this cycle. More
specifically:
- Two patches from Andy, adjusting the NVMe APST quirks to avoid some
issues specific to one Toshiba drive, and some variant of Samsung
on two specific Dell laptops.
- A fix for mtip32xx, turning off mq scheduling on that device. We
have a real fix for this, but it's too late in the cycle.
Thankfully we already have a NO_SCHED flag we can apply here. A
prep patch for this is ensuring that we honor the NO_SCHED flag
when attempting to online switch schedulers, previsouly we only did
so for drive load time. From Ming.
- Fixing an oops in blk-mq polling with scheduling attached. This one
is easily reproducible, it would be a shame to release 4.11 with
that issue. From me.
I'd prefer not having to send in patches at this point in time, but
the above are all things that have regressed in this cycle and the
fixes are relatively straight forward"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix potential oops with polling and blk-mq scheduler
nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
nvme: Adjust the Samsung APST quirk
mtip32xx: pass BLK_MQ_F_NO_SCHED
block: respect BLK_MQ_F_NO_SCHED
Merge tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI build fix from Rafael Wysocki:
"This avoids a false-positive build warning from the compiler.
Specifics:
- Avoid a false-positive warning regarding a variable that may not be
initialized that started to trigger after a previous general build
fix (Arnd Bergmann)"
* tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / power: Avoid maybe-uninitialized warning
Merge tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- kmalloc sdio scratch buffer to make it DMA-friendly
MMC host:
- dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
- sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
cards"
* tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
mmc: sdio: fix alignment issue in struct sdio_func
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: prevent NR_ISOLATE_* stats from going negative
Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
Rabin Vincent [Thu, 20 Apr 2017 21:37:46 +0000 (14:37 -0700)]
mm: prevent NR_ISOLATE_* stats from going negative
Commit 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn
based migration") moved the dec_node_page_state() call (along with the
page_is_file_cache() call) to after putback_lru_page().
But page_is_file_cache() can change after putback_lru_page() is called,
so it should be called before putback_lru_page(), as it was before that
patch, to prevent NR_ISOLATE_* stats from going negative.
Without this fix, non-CONFIG_SMP kernels end up hanging in the
while(too_many_isolated()) { congestion_wait() } loop in
shrink_active_list() due to the negative stats.
While the patch worked great for userspace allocations, the fact that
softirq loses the per-cpu allocator caused problems. It needs to be
redone taking into account that a separate list is needed for hard/soft
IRQs or alternatively find a cheap way of detecting reentry due to an
interrupt. Both are possible but sufficiently tricky that it shouldn't
be rushed.
Jesper had one method for allowing softirqs but reported that the cost
was high enough that it performed similarly to a plain revert. His
figures for netperf TCP_STREAM were as follows
Baseline v4.10.0 : 60316 Mbit/s
Current 4.11.0-rc6: 47491 Mbit/s
Jesper's patch : 60662 Mbit/s
This patch : 60106 Mbit/s
As this is a regression, I wish to revert to noirq allocator for now and
go back to the drawing board.
blk-mq: fix potential oops with polling and blk-mq scheduler
If we have a scheduler attached, blk_mq_tag_to_rq() on the
scheduled tags will return NULL if a request is no longer
in flight. This is different than using the normal tags,
where it will always return the fixed request. Check for
this condition for polling, in case we happen to enter
polling for a completed request.
The request address remains valid, so this check and return
should be perfectly safe.
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Tested-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Andy Lutomirski [Thu, 20 Apr 2017 20:37:55 +0000 (13:37 -0700)]
nvme: Adjust the Samsung APST quirk
I got a couple more reports: the Samsung APST issues appears to
affect multiple 950-series devices in Dell XPS 15 9550 and Precision
5510 laptops. Change the quirk: rather than blacklisting the
firmware on the first problematic SSD that was reported, disable
APST on all 144d:a802 devices if they're installed in the two
affected Dell models. While we're at it, disable only the deepest
sleep state instead of all of them -- the reporters say that this is
sufficient to fix the problem.
(I have a device that appears to be entirely identical to one of the
affected devices, but I have a different Dell laptop, so it's not
the case that all Samsung devices with firmware BXW75D0Q are broken
under all circumstances.)
Samsung engineers have an affected system, and hopefully they'll
give us a better workaround some time soon. In the mean time, this
should minimize regressions.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.
CVE-2017-7979 Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies") Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix issue in populating the PFC config paramters.
Change ieee_setpfc() callback implementation to populate traffic class
count with the user provided value.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix sending an invalid PFC error mask to MFW.
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible error in populating max_tc field.
Some adapters may not publish the max_tc value. Populate the default
value for max_tc field in case the mfw didn't provide one.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Hughes [Wed, 19 Apr 2017 10:13:40 +0000 (11:13 +0100)]
smsc95xx: Use skb_cow_head to deal with cloned skbs
The driver was failing to check that the SKB wasn't cloned
before adding checksum data.
Replace existing handling to extend/copy the header buffer
with skb_cow_head.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sekhar Nori [Wed, 19 Apr 2017 08:38:24 +0000 (14:08 +0530)]
MAINTAINERS: update entry for TI's CPSW driver
Mugunthan V N, who was reviewing TI's CPSW driver patches is
not working for TI anymore and wont be reviewing patches for
that driver.
Drop Mugunthan as the maintiainer for this driver.
Grygorii continues to be a reviewer. Dave Miller applies the
patches directly and adding a maintainer is actually
misleading since get_maintainer.pl script stops suggesting
that Dave Miller be copied.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 18 Apr 2017 19:14:26 +0000 (22:14 +0300)]
dp83640: don't recieve time stamps twice
This patch is prompted by a static checker warning about a potential
use after free. The concern is that netif_rx_ni() can free "skb" and we
call it twice.
When I look at the commit that added this, it looks like some stray
lines were added accidentally. It doesn't make sense to me that we
would recieve the same data two times. I asked the author but never
recieved a response.
I can't test this code, but I'm pretty sure my patch is correct.
Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 18 Apr 2017 15:59:49 +0000 (17:59 +0200)]
ipv6: sr: fix out-of-bounds access in SRH validation
This patch fixes an out-of-bounds access in seg6_validate_srh() when the
trailing data is less than sizeof(struct sr6_tlv).
Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Tue, 18 Apr 2017 15:14:16 +0000 (11:14 -0400)]
selftests/net: Fixes psock_fanout CBPF test case
'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
psock_lib: harden socket filter used by psock tests"). That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW. But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.
Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame. Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.
Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.
Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock tests") Signed-off-by: 'Mike Maloney <maloneykernel@gmail.com>' Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Thu, 20 Apr 2017 19:32:16 +0000 (21:32 +0200)]
mac80211: reject ToDS broadcast data frames
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
--
Dave, I didn't want to send you a new pull request for a single
commit yet again - can you apply this one patch as is? Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull two more ftrace fixes from Steven Rostedt:
"While continuing my development, I uncovered two more small bugs.
One is a race condition when enabling the snapshot function probe
trigger. It enables the probe before allocating the snapshot, and if
the probe triggers first, it stops tracing with a warning that the
snapshot buffer was not allocated.
The seconds is that the snapshot file should show how to use it when
it is empty. But a bug fix from long ago broke the "is empty" test and
the snapshot file no longer displays the help message"
* tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Have ring_buffer_iter_empty() return true when empty
tracing: Allocate the snapshot buffer before enabling probe
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fix from Martin Schwidefsky:
"There is one more fix I would like to see in 4.11: The combination of
KVM, CMMA and heavy paging can cause data corruption, the fix is to
clear the _PAGE_UNUSED bit in set_pte_at()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix CMMA vs KSM vs others
Merge tag 'keys-fixes-20170419' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull keyrings fixes from David Howells:
(1) Disallow keyrings whose name begins with a '.' to be joined
[CVE-2016-9604].
(2) Change the name of the dead type to ".dead" to prevent user access
[CVE-2017-6951].
(3) Fix keyctl_set_reqkey_keyring() to not leak thread keyrings
[CVE-2017-7472]
* tag 'keys-fixes-20170419' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
KEYS: Change the name of the dead type to ".dead" to prevent user access
KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
David S. Miller [Thu, 20 Apr 2017 17:36:42 +0000 (13:36 -0400)]
Merge tag 'mac80211-for-davem-2017-04-18' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
A single fix, for the MU-MIMO monitor mode, that fixes
bad SKB accesses if the SKB was paged, which is the case
for the only driver supporting this - iwlwifi.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
Currently for DDR50 card, it need tuning in default. We meet tuning fail
issue for DDR50 card and some data CRC error when DDR50 sd card works.
This is because the default pad I/O drive strength can't make sure DDR50
card work stable. So increase the pad I/O drive strength for DDR50 card,
and use pins_100mhz.
This fixes DDR50 card support for IMX since DDR50 tuning was enabled from
commit 9faac7b95ea4 ("mmc: sdhci: enable tuning for DDR50")
Jason Gerecke [Wed, 19 Apr 2017 21:47:24 +0000 (14:47 -0700)]
HID: wacom: Override incorrect logical maximum contact identifier
It apears that devices designed around Wacom's G11 chipset (e.g. Lenovo
ThinkPad Yoga 260, Lenovo ThinkPad X1 Yoga, Dell XPS 12 9250, Dell Venue
8 Pro 5855, etc.) suffer from a common issue in their HID descriptors.
The logical maximum is not updated for the "Contact Identifier" usage,
leaving it as just "1" despite these devices being capable of tracking
far more touches.
Commit 60a221869803 began ignoring usages with out-of-range values,
causing problems for devices based on this chipset. Touches after
the first will have an out-of-range Contact Identifier, and ignoring
that usage will cause the kernel to incorrectly slot each finger's
events (along with all the knock-on userspace effects that entails).
This commit checks for these buggy descriptors and updates the maximum
where required. Prior chipsets have used "255" as the maximum (and the
G11, at least, doesn't seem to actually use IDs outside the range of
1..CONTACTMAX) so continue using this value.
Cc: stable@vger.kernel.org Fixes: 60a221869803 ("HID: wacom: generic: add support for touchring") Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
ring-buffer: Have ring_buffer_iter_empty() return true when empty
I noticed that reading the snapshot file when it is empty no longer gives a
status. It suppose to show the status of the snapshot buffer as well as how
to allocate and use it. For example:
># cat snapshot
# tracer: nop
#
#
# * Snapshot is allocated *
#
# Snapshot commands:
# echo 0 > snapshot : Clears and frees snapshot buffer
# echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
# Takes a snapshot of the main buffer.
# echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
# (Doesn't have to be '2' works with any number that
# is not a '0' or '1')
What happened was that it was using the ring_buffer_iter_empty() function to
see if it was empty, and if it was, it showed the status. But that function
was returning false when it was empty. The reason was that the iter header
page was on the reader page, and the reader page was empty, but so was the
buffer itself. The check only tested to see if the iter was on the commit
page, but the commit page was no longer pointing to the reader page, but as
all pages were empty, the buffer is also.
Cc: stable@vger.kernel.org Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd"
- one stm32f4 fix for a change that introduced the PLL_I2S and PLL_SAI
boards
- two Allwinner clk driver build fixes
- two Allwinner CPU clk driver fixes where we see random CPUFreq
crashes because the CPU's PLL locks up sometimes when we change the
rate
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: sunxi-ng: a33: gate then ungate PLL CPU clk after rate change
clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks
clk: sunxi-ng: fix build failure in ccu-sun9i-a80 driver
clk: sunxi-ng: fix build error without CONFIG_RESET_CONTROLLER
clk: stm32f4: fix: exclude values 0 and 1 for PLLQ
We are under rcu read lock protection at that point:
rcu_read_lock();
d = atomic_long_read(&ns->stashed);
if (!d)
goto slow;
dentry = (struct dentry *)d;
if (!lockref_get_not_dead(&dentry->d_lockref))
goto slow;
rcu_read_unlock();
but don't use a proper RCU API on the free path, therefore a parallel
__d_free() could free it at the same time. We need to mark the stashed
dentry with DCACHE_RCUACCESS so that __d_free() will be called after all
readers leave RCU.
Fixes: e149ed2b805f ("take the targets of /proc/*/ns/* symlinks to separate fs") Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Michal Hocko [Wed, 19 Apr 2017 07:52:46 +0000 (09:52 +0200)]
mm: make mm_percpu_wq non freezable
Geert has reported a freeze during PM resume and some additional
debugging has shown that the device_resume worker cannot make a forward
progress because it waits for an event which is stuck waiting in
drain_all_pages:
Tetsuo has properly noted that mm_percpu_wq is created as WQ_FREEZABLE
so it is frozen this early during resume so we are effectively
deadlocked. Fix this by dropping WQ_FREEZABLE when creating
mm_percpu_wq. We really want to have it operational all the time.
Fixes: ce612879ddc7 ("mm: move pcp and lru-pcp draining into single wq") Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org> Debugged-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Signed-off-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Merge tag 'backlight-for-v4.11' of git://git.linaro.org/people/daniel.thompson/linux
Pull backlight fix from Daniel Thompson:
"Normally pull requests for backlight come from Lee Jones (and will
continue to do so) but the bug fixed here is annoying for few people
so I'm providing a little holiday cover.
Fix a single bug in the PWM backlight driver and make it play nice
with a wider range of GPIO devices. This bug is a regression and was
independently discovered by Geert Uytterhoevan and Paul Kocialkowski
(and is tested by both)"
* tag 'backlight-for-v4.11' of git://git.linaro.org/people/daniel.thompson/linux:
backlight: pwm_bl: Fix GPIO out for unimplemented .get_direction()
gcc -O2 cannot always prove that the loop in acpi_power_get_inferred_state()
is enterered at least once, so it assumes that cur_state might not get
initialized:
drivers/acpi/power.c: In function 'acpi_power_get_inferred_state':
drivers/acpi/power.c:222:9: error: 'cur_state' may be used uninitialized in this function [-Werror=maybe-uninitialized]
This sets the variable to zero at the start of the loop, to ensure that
there is well-defined behavior even for an empty list. This gets rid of
the warning.
The warning first showed up when the -Os flag got removed in a bug fix
patch in linux-4.11-rc5.
I would suggest merging this addon patch on top of that bug fix to avoid
introducing a new warning in the stable kernels.
Fixes: 61b79e16c68d (ACPI: Fix incompatibility with mcount-based function graph tracing) Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Ming Lei [Sat, 15 Apr 2017 12:38:23 +0000 (20:38 +0800)]
mtip32xx: pass BLK_MQ_F_NO_SCHED
The recent introduced MQ IO scheduler breaks mtip32xx in the
following way.
mtip32xx use the 'request_index' passed to .init_request() as
hardware tag index for initializing hardware queue, and it
actually require that rq->tag is always same with 'request_index'
passed to .init_request(). Current blk-mq IO scheduler can't
guarantee this point, so this patch passes BLK_MQ_F_NO_SCHED
and at least make mtip32xx working.
This patch fixes the following strange hardware failure. The
issue can be triggered easily when doing I/O with mq-deadline
enabled.
backlight: pwm_bl: Fix GPIO out for unimplemented .get_direction()
Commit 7613c922315e308a ("backlight: pwm_bl: Move the checks for initial
power state to a separate function") not just moved some code, but made
slight changes in semantics.
If a gpiochip doesn't implement the optional .get_direction() callback,
gpiod_get_direction always returns -EINVAL, which is never equal to
GPIOF_DIR_IN, leading to the GPIO not being configured for output.
To avoid this, invert the test and check for not GPIOF_DIR_OUT instead,
like the original code did.
This restores the display on r8a7740/armadillo.
Fixes: 7613c922315e308a ("backlight: pwm_bl: Move the checks for initial power state to a separate function") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Reviewed-by: Peter Ujfalusi <peter.ujfalusi@ti.com> Acked-by: Philipp Zabel <p.zabel@pengutronix.de> Acked-by: Daniel Thompson <daniel.thompson@linaro.org> Signed-off-by: Daniel Thompson <daniel.thompson@linaro.org>
tracing: Allocate the snapshot buffer before enabling probe
Currently the snapshot trigger enables the probe and then allocates the
snapshot. If the probe triggers before the allocation, it could cause the
snapshot to fail and turn tracing off. It's best to allocate the snapshot
buffer first, and then enable the trigger. If something goes wrong in the
enabling of the trigger, the snapshot buffer is still allocated, but it can
also be freed by the user by writting zero into the snapshot buffer file.
Also add a check of the return status of alloc_snapshot().
Cc: stable@vger.kernel.org Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Jason Gerecke [Thu, 13 Apr 2017 15:39:49 +0000 (08:39 -0700)]
HID: wacom: Treat HID_DG_TOOLSERIALNUMBER as unsigned
Because HID_DG_TOOLSERIALNUMBER doesn't first cast the value recieved from HID
to an unsigned type, sign-extension rules can cause the value of
wacom_wac->serial[0] to inadvertently wind up with all 32 of its highest bits
set if the highest bit of "value" was set.
This can cause problems for Tablet PC devices which use AES sensors and the
xf86-input-wacom userspace driver. It is not uncommon for AES sensors to send a
serial number of '0' while the pen is entering or leaving proximity. The
xf86-input-wacom driver ignores events with a serial number of '0' since it
cannot match them up to an in-use tool. To ensure the xf86-input-wacom driver
does not ignore the final out-of-proximity event, the kernel does not send
MSC_SERIAL events when the value of wacom_wac->serial[0] is '0'. If the highest
bit of HID_DG_TOOLSERIALNUMBER is set by an in-prox pen which later leaves
proximity and sends a '0' for HID_DG_TOOLSERIALNUMBER, then only the lowest 32
bits of wacom_wac->serial[0] are actually cleared, causing the kernel to send
an MSC_SERIAL event. Since the 'input_event' function takes an 'int' as
argument, only those lowest (now-cleared) 32 bits of wacom_wac->serial[0] are
sent to userspace, causing xf86-input-wacom to ignore the event. If the event
was the final out-of-prox event, then xf86-input-wacom may remain in a state
where it believes the pen is in proximity and refuses to allow other devices
under its control (e.g. the touchscreen) to move the cursor.
It should be noted that EMR devices and devices which use both the
HID_DG_TOOLSERIALNUMBER and WACOM_HID_WD_SERIALHI usages (in that order) would
be immune to this issue. It appears only AES devices are affected.
Fixes: f85c9dc678a ("HID: wacom: generic: Support tool ID and additional tool types") Cc: stable@vger.kernel.org Signed-off-by: Jason Gerecke <jason.gerecke@wacom.com> Acked-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Sergei Shtylyov [Mon, 17 Apr 2017 12:55:22 +0000 (15:55 +0300)]
sh_eth: unmap DMA buffers when freeing rings
The DMA API debugging (when enabled) causes:
WARNING: CPU: 0 PID: 1445 at lib/dma-debug.c:519 add_dma_entry+0xe0/0x12c
DMA-API: exceeded 7 overlapping mappings of cacheline 0x01b2974d
to be printed after repeated initialization of the Ether device, e.g.
suspend/resume or 'ifconfig' up/down. This is because DMA buffers mapped
using dma_map_single() in sh_eth_ring_format() and sh_eth_start_xmit() are
never unmapped. Resolve this problem by unmapping the buffers when freeing
the descriptor rings; in order to do it right, we'd have to add an extra
parameter to sh_eth_txfree() (we rename this function to sh_eth_tx_free(),
while at it).
Based on the commit a47b70ea86bd ("ravb: unmap descriptors when freeing
rings").
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) BPF tail call handling bug fixes from Daniel Borkmann.
2) Fix allowance of too many rx queues in sfc driver, from Bert
Kenward.
3) Non-loopback ipv6 packets claiming src of ::1 should be dropped,
from Florian Westphal.
4) Statistics requests on KSZ9031 can crash, fix from Grygorii
Strashko.
5) TX ring handling fixes in mediatek driver, from Sean Wang.
6) ip_ra_control can deadlock, fix lock acquisition ordering to fix,
from Cong WANG.
7) Fix use after free in ip_recv_error(), from Willem de Buijn.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
bpf: fix checking xdp_adjust_head on tail calls
bpf: fix cb access in socket filter programs on tail calls
ipv6: drop non loopback packets claiming to originate from ::1
net: ethernet: mediatek: fix inconsistency of port number carried in TXD
net: ethernet: mediatek: fix inconsistency between TXD and the used buffer
net: phy: micrel: fix crash when statistic requested for KSZ9031 phy
net: vrf: Fix setting NLM_F_EXCL flag when adding l3mdev rule
net: thunderx: Fix set_max_bgx_per_node for 81xx rgx
net-timestamp: avoid use-after-free in ip_recv_error
ipv4: fix a deadlock in ip_ra_control
sfc: limit the number of receive queues
Make sure the start adderess is aligned to PMD_SIZE
boundary when freeing page table backing a hugepage
region. The issue was causing segfaults when a region
backed by 64K pages was unmapped since such a region
is in general not PMD_SIZE aligned.
Signed-off-by: Nitin Gupta <nitin.m.gupta@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Jordan [Mon, 10 Apr 2017 15:50:52 +0000 (11:50 -0400)]
sparc64: Use LOCKDEP_SMALL, not PROVE_LOCKING_SMALL
CONFIG_PROVE_LOCKING_SMALL shrinks the memory usage of lockdep so the
kernel text, data, and bss fit in the required 32MB limit, but this
option is not set for every config that enables lockdep.
A 4.10 kernel fails to boot with the console output
Kernel: Using 8 locked TLB entries for main kernel image.
hypervisor_tlb_lock[2000000:0:8000000071c007c3:1]: errors with f
Program terminated
To fix, rename CONFIG_PROVE_LOCKING_SMALL to CONFIG_LOCKDEP_SMALL, and
enable this option with CONFIG_LOCKDEP=y so we get the reduced memory
usage every time lockdep is turned on.
Tested that CONFIG_LOCKDEP_SMALL is set to 'y' if and only if
CONFIG_LOCKDEP is set to 'y'. When other lockdep-related config options
that select CONFIG_LOCKDEP are enabled (e.g. CONFIG_LOCK_STAT or
CONFIG_PROVE_LOCKING), verified that CONFIG_LOCKDEP_SMALL is also
enabled.
Fixes: e6b5f1be7afe ("config: Adding the new config parameter CONFIG_PROVE_LOCKING_SMALL for sparc") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Reviewed-by: Babu Moger <babu.moger@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Douglas Anderson [Tue, 11 Apr 2017 22:55:43 +0000 (15:55 -0700)]
mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
According to the SDIO standard interrupts are normally signalled in a
very complicated way. They require the card clock to be running and
require the controller to be paying close attention to the signals
coming from the card. This simply can't happen with the clock stopped
or with the controller in a low power mode.
To that end, we'll disable runtime_pm when we detect that an SDIO card
was inserted. This is much like with what we do with the special
"SDMMC_CLKEN_LOW_PWR" bit that dw_mmc supports.
NOTE: we specifically do this Runtime PM disabling at card init time
rather than in the enable_sdio_irq() callback. This is _different_
than how SDHCI does it. Why do we do it differently?
- Unlike SDHCI, dw_mmc uses the standard sdio_irq code in Linux (AKA
dw_mmc doesn't set MMC_CAP2_SDIO_IRQ_NOTHREAD).
- Because we use the standard sdio_irq code:
- We see a constant stream of enable_sdio_irq(0) and
enable_sdio_irq(1) calls. This is because the standard code
disables interrupts while processing and re-enables them after.
- While interrupts are disabled, there's technically a period where
we could get runtime disabled while processing interrupts.
- If we are runtime disabled while processing interrupts, we'll
reset the controller at resume time (see dw_mci_runtime_resume),
which seems like a terrible idea because we could possibly have
another interrupt pending.
To fix the above isues we'd want to put something in the standard
sdio_irq code that makes sure to call pm_runtime get/put when
interrupts are being actively being processed. That's possible to do,
but it seems like a more complicated mechanism when we really just
want the runtime pm disabled always for SDIO cards given that all the
other bits needed to get Runtime PM vs. SDIO just aren't there.
NOTE: at some point in time someone might come up with a fancy way to
do SDIO interrupts and still allow (some) amount of runtime PM.
Technically we could turn off the card clock if we used an alternate
way of signaling SDIO interrupts (and out of band interrupt is one way
to do this). We probably wouldn't actually want to fully runtime
suspend in this case though--at least not with the current
dw_mci_runtime_resume() which basically fully resets the controller at
resume time.
Fixes: e9ed8835e990 ("mmc: dw_mmc: add runtime PM callback") Cc: <stable@vger.kernel.org> Reported-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Douglas Anderson <dianders@chromium.org> Acked-by: Jaehoon Chung <jh80.chung@samsung.com> Reviewed-by: Shawn Lin <shawn.lin@rock-chips.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
Merge tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace testcase update from Steven Rostedt:
"While testing my development branch, without the fix for the pid use
after free bug, the selftest that Namhyung added triggers it. I
figured it would be good to add the test for the bug after the fix,
such that it does not exist without the fix.
I added another patch that lets the test only test part of the pid
filtering, and ignores the function-fork (filtering on children as
well) if the function-fork feature does not exist. This feature is
added by Namhyung just before he added this test. But since the test
tests both with and without the feature, it would be good to let it
not fail if the feature does not exist"
* tag 'trace-v4.11-rc5-4' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
selftests: ftrace: Add check for function-fork before running pid filter test
selftests: ftrace: Add a testcase for function PID filter
Heiner Kallweit [Wed, 29 Mar 2017 18:54:37 +0000 (20:54 +0200)]
mmc: sdio: fix alignment issue in struct sdio_func
Certain 64-bit systems (e.g. Amlogic Meson GX) require buffers to be
used for DMA to be 8-byte-aligned. struct sdio_func has an embedded
small DMA buffer not meeting this requirement.
When testing switching to descriptor chain mode in meson-gx driver
SDIO is broken therefore. Fix this by allocating the small DMA buffer
separately as kmalloc ensures that the returned memory area is
properly aligned for every basic data type.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Helmut Klein <hgkr.klein@gmail.com> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org>
selftests: ftrace: Add check for function-fork before running pid filter test
Have the func-filter-pid test check for the function-fork option before
testing it. It can still test the pid filtering, but will stop before
testing the function-fork option for children inheriting the pids.
This allows the test to be added before the function-fork feature, but after
a bug fix that triggers one of the bugs the test can cause.
Cc: Namhyung Kim <namhyung@kernel.org> Cc: Shuah Khan <shuahkh@osg.samsung.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Merge tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull ftrace fix from Steven Rostedt:
"Namhyung Kim discovered a use after free bug. It has to do with adding
a pid filter to function tracing in an instance, and then freeing the
instance"
* tag 'trace-v4.11-rc5-3' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ftrace: Fix function pid filter on instances
Namhyung Kim [Mon, 17 Apr 2017 02:44:30 +0000 (11:44 +0900)]
selftests: ftrace: Add a testcase for function PID filter
Like event pid filtering test, add function pid filtering test with the
new "function-fork" option. It also tests it on an instance directory
so that it can verify the bug related pid filtering on instances.
Eric Biggers [Tue, 18 Apr 2017 14:31:09 +0000 (15:31 +0100)]
KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings
This fixes CVE-2017-7472.
Running the following program as an unprivileged user exhausts kernel
memory by leaking thread keyrings:
#include <keyutils.h>
int main()
{
for (;;)
keyctl_set_reqkey_keyring(KEY_REQKEY_DEFL_THREAD_KEYRING);
}
Fix it by only creating a new thread keyring if there wasn't one before.
To make things more consistent, make install_thread_keyring_to_cred()
and install_process_keyring_to_cred() both return 0 if the corresponding
keyring is already present.
Fixes: d84f4f992cbd ("CRED: Inaugurate COW credentials") Cc: stable@vger.kernel.org # 2.6.29+ Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: David Howells <dhowells@redhat.com>
David Howells [Tue, 18 Apr 2017 14:31:08 +0000 (15:31 +0100)]
KEYS: Change the name of the dead type to ".dead" to prevent user access
This fixes CVE-2017-6951.
Userspace should not be able to do things with the "dead" key type as it
doesn't have some of the helper functions set upon it that the kernel
needs. Attempting to use it may cause the kernel to crash.
Fix this by changing the name of the type to ".dead" so that it's rejected
up front on userspace syscalls by key_get_type_from_user().
Though this doesn't seem to affect recent kernels, it does affect older
ones, certainly those prior to:
commit c06cfb08b88dfbe13be44a69ae2fdc3a7c902d81
Author: David Howells <dhowells@redhat.com>
Date: Tue Sep 16 17:36:06 2014 +0100
KEYS: Remove key_type::match in favour of overriding default by match_preparse
which went in before 3.18-rc1.
Signed-off-by: David Howells <dhowells@redhat.com>
cc: stable@vger.kernel.org
David Howells [Tue, 18 Apr 2017 14:31:07 +0000 (15:31 +0100)]
KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings
This fixes CVE-2016-9604.
Keyrings whose name begin with a '.' are special internal keyrings and so
userspace isn't allowed to create keyrings by this name to prevent
shadowing. However, the patch that added the guard didn't fix
KEYCTL_JOIN_SESSION_KEYRING. Not only can that create dot-named keyrings,
it can also subscribe to them as a session keyring if they grant SEARCH
permission to the user.
This, for example, allows a root process to set .builtin_trusted_keys as
its session keyring, at which point it has full access because now the
possessor permissions are added. This permits root to add extra public
keys, thereby bypassing module verification.
This also affects kexec and IMA.
This can be tested by (as root):
keyctl session .builtin_trusted_keys
keyctl add user a a @s
keyctl list @s
Michael Ellerman [Tue, 18 Apr 2017 04:08:15 +0000 (14:08 +1000)]
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
Prior to commit 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi
interrupts"), the branch from hmi_exception_early() to hmi_exception_realmode()
was just a bl hmi_exception_realmode, which the linker would turn into a bl to
the local entry point of hmi_exception_realmode. This was broken when
CONFIG_RELOCATABLE=y because hmi_exception_realmode() is not in the low part of
the kernel text that is copied down to 0x0.
But in fixing that, we added a new bug on little endian kernels. Because the
branch is now a bctrl when CONFIG_RELOCATABLE=y, we branch to the global entry
point of hmi_exception_realmode(). The global entry point must be called with
r12 containing the address of hmi_exception_realmode(), because it uses that
value to calculate the TOC value (r2).
This may manifest as a checkstop, because we take a junk value from r12 which
came from HSRR1, add a small constant to it and then use that as the TOC
pointer. The HSRR1 value will have 0x9 as the top nibble, which puts it above
RAM and somewhere in MMIO space.
Fix it by changing the BRANCH_LINK_TO_FAR() macro to always use r12 to load the
label we're branching to. This means r12 will be setup correctly on LE, fixing
this bug, and r12 is also volatile across function calls on BE so it's a good
choice anyway.
Fixes: 2337d207288f ("powerpc/64: CONFIG_RELOCATABLE support for hmi interrupts") Reported-by: Mahesh Salgaonkar <mahesh@linux.vnet.ibm.com> Acked-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
On a 64-bit system when the user probes on a 'stdu' instruction, the kernel does
not emulate actual store in emulate_step() because it may corrupt the exception
frame. So the kernel does the actual store operation in exception return code
i.e. resume_kernel().
resume_kernel() loads the saved stack pointer from memory using lwz, which only
loads the low 32-bits of the address, causing the kernel crash.
Fix this by loading the 64-bit value instead.
Fixes: be96f63375a1 ("powerpc: Split out instruction analysis part of emulate_step()") Cc: stable@vger.kernel.org # v3.18+ Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com> Reviewed-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Reviewed-by: Ananth N Mavinakayanahalli <ananth@linux.vnet.ibm.com>
[mpe: Change log massage, add stable tag] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Herbert Xu [Thu, 13 Apr 2017 10:35:59 +0000 (18:35 +0800)]
af_key: Fix sadb_x_ipsecrequest parsing
The parsing of sadb_x_ipsecrequest is broken in a number of ways.
First of all we're not verifying sadb_x_ipsecrequest_len. This
is needed when the structure carries addresses at the end. Worse
we don't even look at the length when we parse those optional
addresses.
The migration code had similar parsing code that's better but
it also has some deficiencies. The length is overcounted first
of all as it includes the header itself. It also fails to check
the length before dereferencing the sa_family field.
This patch fixes those problems in parse_sockaddr_pair and then
uses it in parse_ipsecrequest.
cifs: Do not send echoes before Negotiate is complete
commit 4fcd1813e640 ("Fix reconnect to not defer smb3 session reconnect
long after socket reconnect") added support for Negotiate requests to
be initiated by echo calls.
To avoid delays in calling echo after a reconnect, I added the patch
introduced by the commit b8c600120fc8 ("Call echo service immediately
after socket reconnect").
This has however caused a regression with cifs shares which do not have
support for echo calls to trigger Negotiate requests. On connections
which need to call Negotiation, the echo calls trigger an error which
triggers a reconnect which in turn triggers another echo call. This
results in a loop which is only broken when an operation is performed on
the cifs share. For an idle share, it can DOS a server.
The patch uses the smb_operation can_echo() for cifs so that it is
called only if connection has been already been setup.
kernel bz: 194531
Signed-off-by: Sachin Prabhu <sprabhu@redhat.com> Tested-by: Jonathan Liu <net147@gmail.com> Acked-by: Pavel Shilovsky <pshilov@microsoft.com> CC: Stable <stable@vger.kernel.org> Signed-off-by: Steve French <smfrench@gmail.com>