Roi Dayan [Sun, 25 Sep 2016 10:52:44 +0000 (13:52 +0300)]
net/devlink: Add E-Switch encapsulation control
This is an e-switch global knob to enable HW support for applying
encapsulation/decapsulation to VF traffic as part of SRIOV e-switch offloading.
The actual encap/decap is carried out (along with the matching and other actions)
per offloaded e-switch rules, e.g as done when offloading the TC tunnel key action.
Signed-off-by: Roi Dayan <roid@mellanox.com> Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
2) Verify lengths properly in IPSEC reqeusts, from Herbert Xu.
3) Fix out of bounds access in ipv6 segment routing code, from David
Lebrun.
4) Don't write into the header of cloned SKBs in smsc95xx driver, from
James Hughes.
5) Several other drivers have this bug too, fix them. From Eric
Dumazet.
6) Fix access to uninitialized data in TC action cookie code, from
Wolfgang Bumiller.
7) Fix double free in IPV6 segment routing, again from David Lebrun.
8) Don't let userspace set the RTF_PCPU flag, oops. From David Ahern.
9) Fix use after free in qrtr code, from Dan Carpenter.
10) Don't double-destroy devices in ip6mr code, from Nikolay
Aleksandrov.
11) Don't pass out-of-range TX queue indices into drivers, from Tushar
Dave.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (30 commits)
netpoll: Check for skb->queue_mapping
ip6mr: fix notification device destruction
bpf, doc: update bpf maintainers entry
net: qrtr: potential use after free in qrtr_sendmsg()
bpf: Fix values type used in test_maps
net: ipv6: RTF_PCPU should not be settable from userspace
gso: Validate assumption of frag_list segementation
kaweth: use skb_cow_head() to deal with cloned skbs
ch9200: use skb_cow_head() to deal with cloned skbs
lan78xx: use skb_cow_head() to deal with cloned skbs
sr9700: use skb_cow_head() to deal with cloned skbs
cx82310_eth: use skb_cow_head() to deal with cloned skbs
smsc75xx: use skb_cow_head() to deal with cloned skbs
ipv6: sr: fix double free of skb after handling invalid SRH
MAINTAINERS: Add "B:" field for networking.
net sched actions: allocate act cookie early
qed: Fix issue in populating the PFC config paramters.
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed: Fix sending an invalid PFC error mask to MFW.
qed: Fix possible error in populating max_tc field.
...
Tushar Dave [Thu, 20 Apr 2017 22:57:31 +0000 (15:57 -0700)]
netpoll: Check for skb->queue_mapping
Reducing real_num_tx_queues needs to be in sync with skb queue_mapping
otherwise skbs with queue_mapping greater than real_num_tx_queues
can be sent to the underlying driver and can result in kernel panic.
One such event is running netconsole and enabling VF on the same
device. Or running netconsole and changing number of tx queues via
ethtool on same device.
Andrey Konovalov reported a BUG caused by the ip6mr code which is caused
because we call unregister_netdevice_many for a device that is already
being destroyed. In IPv4's ipmr that has been resolved by two commits
long time ago by introducing the "notify" parameter to the delete
function and avoiding the unregister when called from a notifier, so
let's do the same for ip6mr.
Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
With CONFIG_I2C=m and NET_DSA_SMSC_LAN9303=y, we run into a link error:
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_read':
regmap-i2c.c:(.text.regmap_smbus_byte_reg_read+0x18): undefined reference to `i2c_smbus_read_byte_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_byte_reg_write':
regmap-i2c.c:(.text.regmap_smbus_byte_reg_write+0x18): undefined reference to `i2c_smbus_write_byte_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_reg_read':
regmap-i2c.c:(.text.regmap_smbus_word_reg_read+0x18): undefined reference to `i2c_smbus_read_word_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_read_swapped':
regmap-i2c.c:(.text.regmap_smbus_word_read_swapped+0x18): undefined reference to `i2c_smbus_read_word_data'
drivers/base/regmap/regmap-i2c.o: In function `regmap_smbus_word_write_swapped':
This adds a Kconfig dependency to avoid the broken configuration.
Fixes: be4e119f9914 ("net: dsa: LAN9303: add I2C managed mode support") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 19:29:40 +0000 (15:29 -0400)]
Merge tag 'nfc-next-4.12-1' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/nfc-next
Samuel Ortiz says:
====================
NFC 4.12 pull request
This is the NFC pull request for 4.12. We have:
- Improvements for the pn533 command queue handling and device
registration order.
- Removal of platform data for the pn544 and st21nfca drivers.
- Additional device tree options to support more trf7970a hardware options.
- Support for Sony's RC-S380P through the port100 driver.
- Removal of the obsolte nfcwilink driver.
- Headers inclusion cleanups (miscdevice.h, unaligned.h) for many drivers.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bonding: fix wq initialization for links created via netlink
Earlier patch 4493b81bea ("bonding: initialize work-queues during
creation of bond") moved the work-queue initialization from bond_open()
to bond_create(). However this caused the link those are created using
netlink 'create bond option' (ip link add bondX type bond); create the
new trunk without initializing work-queues. Prior to the above mentioned
change, ndo_open was in both paths and things worked correctly. The
consequence is visible in the report shared by Joe Stringer -
I've noticed that this patch breaks bonding within namespaces if
you're not careful to perform device cleanup correctly.
Here's my repro script, you can run on any net-next with this patch
and you'll start seeing some weird behaviour:
ip netns add foo
ip li add veth0 type veth peer name veth0+ netns foo
ip li add veth1 type veth peer name veth1+ netns foo
ip netns exec foo ip li add bond0 type bond
ip netns exec foo ip li set dev veth0+ master bond0
ip netns exec foo ip li set dev veth1+ master bond0
ip netns exec foo ip addr add dev bond0 192.168.0.1/24
ip netns exec foo ip li set dev bond0 up
ip li del dev veth0
ip li del dev veth1
The second to last command segfaults, last command hangs. rtnl is now
permanently locked. It's not a problem if you take bond0 down before
deleting veths, or delete bond0 before deleting veths. If you delete
either end of the veth pair as per above, either inside or outside the
namespace, it hits this problem.
Here's some kernel logs:
[ 1221.801610] bond0: Enslaving veth0+ as an active interface with an up link
[ 1224.449581] bond0: Enslaving veth1+ as an active interface with an up link
[ 1281.193863] bond0: Releasing backup interface veth0+
[ 1281.193866] bond0: the permanent HWaddr of veth0+ -
16:bf:fb:e0:b8:43 - is still in use by bond0 - set the HWaddr of
veth0+ to a different address to avoid conflicts
[ 1281.193867] ------------[ cut here ]------------
[ 1281.193873] WARNING: CPU: 0 PID: 2024 at kernel/workqueue.c:1511
__queue_delayed_work+0x13f/0x150
[ 1281.193873] Modules linked in: bonding veth openvswitch nf_nat_ipv6
nf_nat_ipv4 nf_nat autofs4 nfsd auth_rpcgss nfs_acl binfmt_misc nfs
lockd grace sunrpc fscache ppdev vmw_balloon coretemp psmouse
serio_raw vmwgfx ttm drm_kms_helper vmw_vmci netconsole parport_pc
configfs drm i2c_piix4 fb_sys_fops syscopyarea sysfillrect sysimgblt
shpchp mac_hid nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4
nf_defrag_ipv4 nf_conntrack libcrc32c lp parport hid_generic usbhid
hid mptspi mptscsih e1000 mptbase ahci libahci
[ 1281.193905] CPU: 0 PID: 2024 Comm: ip Tainted: G W
4.10.0-bisect-bond-v0.14 #37
[ 1281.193906] Hardware name: VMware, Inc. VMware Virtual
Platform/440BX Desktop Reference Platform, BIOS 6.00 09/30/2014
[ 1281.193906] Call Trace:
[ 1281.193912] dump_stack+0x63/0x89
[ 1281.193915] __warn+0xd1/0xf0
[ 1281.193917] warn_slowpath_null+0x1d/0x20
[ 1281.193918] __queue_delayed_work+0x13f/0x150
[ 1281.193920] queue_delayed_work_on+0x27/0x40
[ 1281.193929] bond_change_active_slave+0x25b/0x670 [bonding]
[ 1281.193932] ? synchronize_rcu_expedited+0x27/0x30
[ 1281.193935] __bond_release_one+0x489/0x510 [bonding]
[ 1281.193939] ? addrconf_notify+0x1b7/0xab0
[ 1281.193942] bond_netdev_event+0x2c5/0x2e0 [bonding]
[ 1281.193944] ? netconsole_netdev_event+0x124/0x190 [netconsole]
[ 1281.193947] notifier_call_chain+0x49/0x70
[ 1281.193948] raw_notifier_call_chain+0x16/0x20
[ 1281.193950] call_netdevice_notifiers_info+0x35/0x60
[ 1281.193951] rollback_registered_many+0x23b/0x3e0
[ 1281.193953] unregister_netdevice_many+0x24/0xd0
[ 1281.193955] rtnl_delete_link+0x3c/0x50
[ 1281.193956] rtnl_dellink+0x8d/0x1b0
[ 1281.193960] rtnetlink_rcv_msg+0x95/0x220
[ 1281.193962] ? __kmalloc_node_track_caller+0x35/0x280
[ 1281.193964] ? __netlink_lookup+0xf1/0x110
[ 1281.193966] ? rtnl_newlink+0x830/0x830
[ 1281.193967] netlink_rcv_skb+0xa7/0xc0
[ 1281.193969] rtnetlink_rcv+0x28/0x30
[ 1281.193970] netlink_unicast+0x15b/0x210
[ 1281.193971] netlink_sendmsg+0x319/0x390
[ 1281.193974] sock_sendmsg+0x38/0x50
[ 1281.193975] ___sys_sendmsg+0x25c/0x270
[ 1281.193978] ? mem_cgroup_commit_charge+0x76/0xf0
[ 1281.193981] ? page_add_new_anon_rmap+0x89/0xc0
[ 1281.193984] ? lru_cache_add_active_or_unevictable+0x35/0xb0
[ 1281.193985] ? __handle_mm_fault+0x4e9/0x1170
[ 1281.193987] __sys_sendmsg+0x45/0x80
[ 1281.193989] SyS_sendmsg+0x12/0x20
[ 1281.193991] do_syscall_64+0x6e/0x180
[ 1281.193993] entry_SYSCALL64_slow_path+0x25/0x25
[ 1281.193995] RIP: 0033:0x7f6ec122f5a0
[ 1281.193995] RSP: 002b:00007ffe69e89c48 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
[ 1281.193997] RAX: ffffffffffffffda RBX: 00007ffe69e8dd60 RCX: 00007f6ec122f5a0
[ 1281.193997] RDX: 0000000000000000 RSI: 00007ffe69e89c90 RDI: 0000000000000003
[ 1281.193998] RBP: 00007ffe69e89c90 R08: 0000000000000000 R09: 0000000000000003
[ 1281.193999] R10: 00007ffe69e89a10 R11: 0000000000000246 R12: 0000000058f14b9f
[ 1281.193999] R13: 0000000000000000 R14: 00000000006473a0 R15: 00007ffe69e8e450
[ 1281.194001] ---[ end trace 713a77486cbfbfa3 ]---
Fixes: 4493b81bea ("bonding: initialize work-queues during creation of bond") Reported-by: Joe Stringer <joe@ovn.org> Tested-by: Joe Stringer <joe@ovn.org> Signed-off-by: Mahesh Bandewar <maheshb@google.com> Acked-by: Andy Gospodarek <andy@greyhouse.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Thu, 20 Apr 2017 15:27:58 +0000 (17:27 +0200)]
bpf, doc: update bpf maintainers entry
Add various related files that have been missing under
BPF entry covering essential parts of its infrastructure
and also add myself as co-maintainer.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently driver use phy_start_aneg() in arc_emac_open() to bring
up PHY. But phy_start() function is more appropriate for this purposes.
Besides that it call phy_start_aneg() as part of PHY startup sequence
it also can correctly bring up PHY from error and suspended states.
So the patch replace phy_start_aneg() to phy_start().
Also the patch add call to phy_stop() to arc_emac_stop() to allow
the PHY device to be fully suspended when the interface is unused.
Signed-off-by: Alexander Kochetkov <al.kochet@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Thu, 20 Apr 2017 10:21:30 +0000 (13:21 +0300)]
net: qrtr: potential use after free in qrtr_sendmsg()
If skb_pad() fails then it frees the skb so we should check for errors.
Fixes: bdabad3e363d ("net: Add Qualcomm IPC router") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Miller [Thu, 20 Apr 2017 19:20:16 +0000 (15:20 -0400)]
bpf: Fix values type used in test_maps
Maps of per-cpu type have their value element size adjusted to 8 if it
is specified smaller during various map operations.
This makes test_maps as a 32-bit binary fail, in fact the kernel
writes past the end of the value's array on the user's stack.
To be quite honest, I think the kernel should reject creation of a
per-cpu map that doesn't have a value size of at least 8 if that's
what the kernel is going to silently adjust to later.
If the user passed something smaller, it is a sizeof() calcualtion
based upon the type they will actually use (just like in this testcase
code) in later calls to the map operations.
Fixes: df570f577231 ("samples/bpf: unit test for BPF_MAP_TYPE_PERCPU_ARRAY") Signed-off-by: David S. Miller <davem@davemloft.net> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>
This adds the basic infrastructure for IPsec hardware
offloading, it creates a configuration API and adjusts
the packet path.
1) Add the needed netdev features to configure IPsec offloads.
2) Add the IPsec hardware offloading API.
3) Prepare the ESP packet path for hardware offloading.
4) Add gso handlers for esp4 and esp6, this implements
the software fallback for GSO packets.
5) Add xfrm replay handler functions for offloading.
6) Change ESP to use a synchronous crypto algorithm on
offloading, we don't have the option for asynchronous
returns when we handle IPsec at layer2.
7) Add a xfrm validate function to validate_xmit_skb. This
implements the software fallback for non GSO packets.
8) Set the inner_network and inner_transport members of
the SKB, as well as encapsulation, to reflect the actual
positions of these headers, and removes them only once
encryption is done on the payload.
From Ilan Tayari.
9) Prepare the ESP GRO codepath for hardware offloading.
10) Fix incorrect null pointer check in esp6.
From Colin Ian King.
11) Fix for the GSO software fallback path to detect the
fallback correctly.
From Ilan Tayari.
Please pull or let me know if there are problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 18:13:11 +0000 (14:13 -0400)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2017-04-19
This series contains updates to i40e and i40evf only, most notable being
the addition of trace points for BPF programs.
Tobias Klauser updates i40evf to use net_device stats struct instead
of a local private copy.
Preethi updates the VF driver to not enable receive checksum offload by
default for tunneled packets.
Alex fixes an issue he introduced when he converted the code over to
using the length field to determine if a descriptor was done or not.
Mitch adds the ability to dump additional information on the VFs, which
is not available through 'ip link show' using debugfs.
Scott adds trace points to the drivers so that BPF programs can be
attached for feature testing and verification.
Jingjing adds admin queue functions for Pipeline Personalization Profile
commands.
Jake does most of the heavy lifting in this series, starting with the
a reduction in the scope of the RTNL lock being held while resetting VFs
to allow multiple PFs to reset in a timely manner. Factored out the
direct queue modification so that we are able to re-use the code.
Reduced the wait time for admin queue commands to complete, since we were
waiting a minimum of a millisecond, when in practice the admin queue
command is processed often much faster. Cleaned up code (flag) we never
use. Make the code to resetting all the VFs optimized for parallel
computing instead of the current way is a serialized fashion, to help
reduce the time it takes.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The NAPI data structure is embedded in the netvsc_device structure
and is freed when device is closed. There is still a reference
(in NAPI list) to this which causes a crash in netif_napi_del
when device is removed. Fix by managing NAPI instances correctly.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 19 Apr 2017 21:21:22 +0000 (14:21 -0700)]
net_sched: remove useless NULL to tp->root
There is no need to NULL tp->root in ->destroy(), since tp is
going to be freed very soon, and existing readers are still
safe to read them.
For cls_route, we always init its tp->root, so it can't be NULL,
we can drop more useless code.
Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
WANG Cong [Wed, 19 Apr 2017 21:21:21 +0000 (14:21 -0700)]
net_sched: move the empty tp check from ->destroy() to ->delete()
We could have a race condition where in ->classify() path we
dereference tp->root and meanwhile a parallel ->destroy() makes it
a NULL. Daniel cured this bug in commit d936377414fa
("net, sched: respect rcu grace period on cls destruction").
This happens when ->destroy() is called for deleting a filter to
check if we are the last one in tp, this tp is still linked and
visible at that time. The root cause of this problem is the semantic
of ->destroy(), it does two things (for non-force case):
1) check if tp is empty
2) if tp is empty we could really destroy it
and its caller, if cares, needs to check its return value to see if it
is really destroyed. Therefore we can't unlink tp unless we know it is
empty.
As suggested by Daniel, we could actually move the test logic to ->delete()
so that we can safely unlink tp after ->delete() tells us the last one is
just deleted and before ->destroy().
Fixes: 1e052be69d04 ("net_sched: destroy proto tp when all filters are gone") Cc: Roi Dayan <roid@mellanox.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
set. Flags passed to the kernel are blindly copied to the allocated
rt6_info by ip6_route_info_create making a newly inserted route appear
as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
and expects rt->dst.from to be set - which it is not since it is not
really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
generates the fault.
Fix by checking for the flag and failing with EINVAL.
Fixes: d52d3997f843f ("ipv6: Create percpu rt6_info") Reported-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Tested-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 19 Apr 2017 21:01:17 +0000 (23:01 +0200)]
bpf: add napi_id read access to __sk_buff
Add napi_id access to __sk_buff for socket filter program types, tc
program types and other bpf_convert_ctx_access() users. Having access
to skb->napi_id is useful for per RX queue listener siloing, f.e.
in combination with SO_ATTACH_REUSEPORT_EBPF and when busy polling is
used, meaning SO_REUSEPORT enabled listeners can then select the
corresponding socket at SYN time already [1]. The skb is marked via
skb_mark_napi_id() early in the receive path (e.g., napi_gro_receive()).
Currently, sockets can only use SO_INCOMING_NAPI_ID from 6d4339028b35
("net: Introduce SO_INCOMING_NAPI_ID") as a socket option to look up
the NAPI ID associated with the queue for steering, which requires a
prior sk_mark_napi_id() after the socket was looked up.
Semantics for the __sk_buff napi_id access are similar, meaning if
skb->napi_id is < MIN_NAPI_ID (e.g. outgoing packets using sender_cpu),
then an invalid napi_id of 0 is returned to the program, otherwise a
valid non-zero napi_id.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
K. Y. Srinivasan [Wed, 19 Apr 2017 20:53:49 +0000 (13:53 -0700)]
netvsc: Deal with rescinded channels correctly
We will not be able to send packets over a channel that has been
rescinded. Make necessary adjustments so we can properly cleanup
even when the channel is rescinded. This issue can be trigerred
in the NIC hot-remove path.
Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 17:33:55 +0000 (13:33 -0400)]
Merge branch 'ibmvnic-updates-and-bug-fixes'
Nathan Fontenot says:
====================
ibmvnic: Updates and bug fixes
This set of patches is a series of updates to remove some unneeded
and unused code in the driver as well as bug fixes for the
ibmvnic driver.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:45:10 +0000 (13:45 -0400)]
ibmvnic: Disable irq prior to close
Add some code to call disable_irq on all the vnic interface's irqs.
This fixes a crash observed when closing an active interface, as
seen in the oops below when we try to access a buffer in the interrupt
handler which we've already freed.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
We should not be releasing the crq's when calling close for the
adapter, these need to remain open to facilitate operations such
as updating the mac address. The crq's should be released in the
adpaters remove routine.
Additionally, we need to call release_reources from remove. This
corrects the scenario of trying to remove an adapter that has only
been probed.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The inflight list used to track memory that is allocated for crq that are
inflight is not needed. The one piece of the inflight list that does need
to be cleaned at module exit is the error buffer list which is already
attached to the adapter struct.
This patch removes the inflight list and moves checking the error buffer
list to ibmvnic_remove.
Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:53 +0000 (13:44 -0400)]
ibmvnic: Do not disable IRQ after scheduling tasklet
Since the primary CRQ is only used for service functions and
not in the performance path, simplify the code a bit and avoid
disabling the IRQ.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:47 +0000 (13:44 -0400)]
ibmvnic: Fixup atomic API usage
Replace a couple of modifications of an atomic followed
by a read of the atomic, which is no longer atomic, to
use atomic_XX_return variants to avoid race conditions.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Brian King [Wed, 19 Apr 2017 17:44:41 +0000 (13:44 -0400)]
ibmvnic: Unmap longer term buffer before free
Make sure we unregister long term buffers from the adapter
prior to DMA unmapping it and freeing the buffer. Failure
to do so could result in a DMA to a now invalid address.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ibmvnic: Fix ibmvnic_change_mac_addr struct format
The ibmvnic_change_mac_addr struct alignment was not matching the defined
format in PAPR+, it had the reserved and return code fields swapped. As a
consequence, the CHANGE_MAC_ADDR_RSP commands were being improperly handled
and executed even when the operation wasn't successfully completed by the
system firmware.
Also changing the endianness of the debug message to make it easier to
parse the CRQ content.
Signed-off-by: Murilo Fossa Vicentini <muvic@linux.vnet.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Falcon [Wed, 19 Apr 2017 17:44:29 +0000 (13:44 -0400)]
ibmvnic: Report errors when failing to release sub-crqs
Add reporting of errors when releasing sub-crqs fails.
Signed-off-by: Thomas Falcon <tlfalcon@us.ibm.com> Signed-off-by: Nathan Fontenot <nfont@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ilan Tayari [Wed, 19 Apr 2017 18:26:07 +0000 (21:26 +0300)]
gso: Validate assumption of frag_list segementation
Commit 07b26c9454a2 ("gso: Support partial splitting at the frag_list
pointer") assumes that all SKBs in a frag_list (except maybe the last
one) contain the same amount of GSO payload.
This assumption is not always correct, resulting in the following
warning message in the log:
skb_segment: too many frags
For example, mlx5 driver in Striding RQ mode creates some RX SKBs with
one frag, and some with 2 frags.
After GRO, the frag_list SKBs end up having different amounts of payload.
If this frag_list SKB is then forwarded, the aforementioned assumption
is violated.
Validate the assumption, and fall back to software GSO if it not true.
Change-Id: Ia03983f4a47b6534dd987d7a2aad96d54d46d212 Fixes: 07b26c9454a2 ("gso: Support partial splitting at the frag_list pointer") Signed-off-by: Ilan Tayari <ilant@mellanox.com> Signed-off-by: Ilya Lesokhin <ilyal@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
gcc points out an useless assignment that was added during code refactoring:
drivers/net/ethernet/cavium/liquidio/lio_ethtool.c: In function 'octnet_intrmod_callback':
drivers/net/ethernet/cavium/liquidio/lio_ethtool.c:1315:59: error: parameter 'oct_dev' set but not used [-Werror=unused-but-set-parameter]
This is harmless but can clearly be remove to avoid the warning.
Fixes: 50c0add534d2 ("liquidio: refactor interrupt moderation code") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:26 +0000 (09:59 -0700)]
kaweth: use skb_cow_head() to deal with cloned skbs
We can use skb_cow_head() to properly deal with clones,
especially the ones coming from TCP stack that allow their head being
modified. This avoids a copy.
Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:25 +0000 (09:59 -0700)]
ch9200: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 4a476bd6d1d9 ("usbnet: New driver for QinHeng CH9200 devices") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:24 +0000 (09:59 -0700)]
lan78xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: 55d7de9de6c3 ("Microchip's LAN7800 family USB 2/3 to 10/100/1000 Ethernet device driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Cc: Woojung Huh <woojung.huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:23 +0000 (09:59 -0700)]
sr9700: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: c9b37458e956 ("USB2NET : SR9700 : One chip USB 1.1 USB2NET SR9700Device Driver Support") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:22 +0000 (09:59 -0700)]
cx82310_eth: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: cc28a20e77b2 ("introduce cx82310_eth: Conexant CX82310-based ADSL router USB ethernet driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 19 Apr 2017 16:59:21 +0000 (09:59 -0700)]
smsc75xx: use skb_cow_head() to deal with cloned skbs
We need to ensure there is enough headroom to push extra header,
but we also need to check if we are allowed to change headers.
skb_cow_head() is the proper helper to deal with this.
Fixes: d0cad871703b ("smsc75xx: SMSC LAN75xx USB gigabit ethernet adapter driver") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: James Hughes <james.hughes@raspberrypi.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Replace 2 jiffies with sysctl netdev_budget_usecs to enable softirq tuning
Constants used for tuning are generally a bad idea, especially as hardware
changes over time. Replace the constant 2 jiffies with sysctl variable
netdev_budget_usecs to enable sysadmins to tune the softirq processing.
Also document the variable.
For example, a very fast machine might tune this to 1000 microseconds,
while my regression testing 486DX-25 needs it to be 4000 microseconds on
a nearly idle network to prevent time_squeeze from being incremented.
Version 2: changed jiffies to microseconds for predictable units.
Signed-off-by: Matthew Whitehead <tedheadster@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 21 Apr 2017 17:21:32 +0000 (13:21 -0400)]
Merge branch 'iptunnel-policy-based-routing'
Craig Gallek says:
====================
ip_tunnel: Allow policy-based routing through tunnels
iproute2 changes to follow. Example usage:
ip link add gre-test type gre local 10.0.0.1 remote 10.0.0.2 fwmark 0x4
ip -detail link show gre-test
...
ip link set gre-test type gre fwmark 0
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
ip_tunnel: Allow policy-based routing through tunnels
This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.
There is no concept of per-packet routing decisions through IPv4
tunnels, so this implementation does not need to work with
per-packet route lookups as the v6 implementation may
(with IP6_TNL_F_USE_ORIG_FWMARK).
Further, since the v4 tunnel ioctls share datastructures
(which can not be trivially modified) with the kernel's internal
tunnel configuration structures, the mark attribute must be stored
in the tunnel structure itself and passed as a parameter when
creating or changing tunnel attributes.
Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ip6_tunnel: Allow policy-based routing through tunnels
This feature allows the administrator to set an fwmark for
packets traversing a tunnel. This allows the use of independent
routing tables for tunneled packets without the use of iptables.
Signed-off-by: Craig Gallek <kraig@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Wed, 19 Apr 2017 14:10:19 +0000 (16:10 +0200)]
ipv6: sr: fix double free of skb after handling invalid SRH
The icmpv6_param_prob() function already does a kfree_skb(),
this patch removes the duplicate one.
Fixes: 1ababeba4a21f3dba3da3523c670b207fb2feb62 ("ipv6: implement dataplane support for rthdr type 4 (Segment Routing Header)") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Cc: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux
Pull powerpc fixes from Michael Ellerman:
"Just two fixes.
The first fixes kprobing a stdu, and is marked for stable as it's been
broken for ~ever. In hindsight this could have gone in next.
The other is a fix for a change we merged this cycle, where if we take
a certain exception when the kernel is running relocated (currently
only used for kdump), we checkstop the box.
Thanks to Ravi Bangoria"
* tag 'powerpc-4.11-8' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
powerpc/64: Fix HMI exception on LE with CONFIG_RELOCATABLE=y
powerpc/kprobe: Fix oops when kprobed on 'stdu' instruction
Merge branch 'for-linus' of git://git.kernel.dk/linux-block
Pull block layer fixes from Jens Axboe:
"A couple of last minute fixes for regressions in this cycle. More
specifically:
- Two patches from Andy, adjusting the NVMe APST quirks to avoid some
issues specific to one Toshiba drive, and some variant of Samsung
on two specific Dell laptops.
- A fix for mtip32xx, turning off mq scheduling on that device. We
have a real fix for this, but it's too late in the cycle.
Thankfully we already have a NO_SCHED flag we can apply here. A
prep patch for this is ensuring that we honor the NO_SCHED flag
when attempting to online switch schedulers, previsouly we only did
so for drive load time. From Ming.
- Fixing an oops in blk-mq polling with scheduling attached. This one
is easily reproducible, it would be a shame to release 4.11 with
that issue. From me.
I'd prefer not having to send in patches at this point in time, but
the above are all things that have regressed in this cycle and the
fixes are relatively straight forward"
* 'for-linus' of git://git.kernel.dk/linux-block:
blk-mq: fix potential oops with polling and blk-mq scheduler
nvme: Quirk APST off on "THNSF5256GPUK TOSHIBA"
nvme: Adjust the Samsung APST quirk
mtip32xx: pass BLK_MQ_F_NO_SCHED
block: respect BLK_MQ_F_NO_SCHED
Merge tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI build fix from Rafael Wysocki:
"This avoids a false-positive build warning from the compiler.
Specifics:
- Avoid a false-positive warning regarding a variable that may not be
initialized that started to trigger after a previous general build
fix (Arnd Bergmann)"
* tag 'acpi-4.11-final' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI / power: Avoid maybe-uninitialized warning
Merge tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc
Pull MMC fixes from Ulf Hansson:
"MMC core:
- kmalloc sdio scratch buffer to make it DMA-friendly
MMC host:
- dw_mmc: Fix behaviour for SDIO IRQs when runtime PM is used
- sdhci-esdhc-imx: Correct pad I/O drive strength for UHS-DDR50
cards"
* tag 'mmc-v4.11-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
mmc: sdhci-esdhc-imx: increase the pad I/O drive strength for DDR50 card
mmc: dw_mmc: Don't allow Runtime PM for SDIO cards
mmc: sdio: fix alignment issue in struct sdio_func
tag_lan9303.c does check for a NULL dst but that's already checked by
dsa_switch_rcv() one layer above.
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Juergen Borleis <jbe@pengutronix.de> Signed-off-by: David S. Miller <davem@davemloft.net>
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm: prevent NR_ISOLATE_* stats from going negative
Revert "mm, page_alloc: only use per-cpu allocator for irq-safe requests"
Rabin Vincent [Thu, 20 Apr 2017 21:37:46 +0000 (14:37 -0700)]
mm: prevent NR_ISOLATE_* stats from going negative
Commit 6afcf8ef0ca0 ("mm, compaction: fix NR_ISOLATED_* stats for pfn
based migration") moved the dec_node_page_state() call (along with the
page_is_file_cache() call) to after putback_lru_page().
But page_is_file_cache() can change after putback_lru_page() is called,
so it should be called before putback_lru_page(), as it was before that
patch, to prevent NR_ISOLATE_* stats from going negative.
Without this fix, non-CONFIG_SMP kernels end up hanging in the
while(too_many_isolated()) { congestion_wait() } loop in
shrink_active_list() due to the negative stats.
While the patch worked great for userspace allocations, the fact that
softirq loses the per-cpu allocator caused problems. It needs to be
redone taking into account that a separate list is needed for hard/soft
IRQs or alternatively find a cheap way of detecting reentry due to an
interrupt. Both are possible but sufficiently tricky that it shouldn't
be rushed.
Jesper had one method for allowing softirqs but reported that the cost
was high enough that it performed similarly to a plain revert. His
figures for netperf TCP_STREAM were as follows
Baseline v4.10.0 : 60316 Mbit/s
Current 4.11.0-rc6: 47491 Mbit/s
Jesper's patch : 60662 Mbit/s
This patch : 60106 Mbit/s
As this is a regression, I wish to revert to noirq allocator for now and
go back to the drawing board.
blk-mq: fix potential oops with polling and blk-mq scheduler
If we have a scheduler attached, blk_mq_tag_to_rq() on the
scheduled tags will return NULL if a request is no longer
in flight. This is different than using the normal tags,
where it will always return the fixed request. Check for
this condition for polling, in case we happen to enter
polling for a completed request.
The request address remains valid, so this check and return
should be perfectly safe.
Fixes: bd166ef183c2 ("blk-mq-sched: add framework for MQ capable IO schedulers") Tested-by: Stephen Bates <sbates@raithlin.com> Signed-off-by: Jens Axboe <axboe@fb.com>
Andy Lutomirski [Thu, 20 Apr 2017 20:37:55 +0000 (13:37 -0700)]
nvme: Adjust the Samsung APST quirk
I got a couple more reports: the Samsung APST issues appears to
affect multiple 950-series devices in Dell XPS 15 9550 and Precision
5510 laptops. Change the quirk: rather than blacklisting the
firmware on the first problematic SSD that was reported, disable
APST on all 144d:a802 devices if they're installed in the two
affected Dell models. While we're at it, disable only the deepest
sleep state instead of all of them -- the reporters say that this is
sufficient to fix the problem.
(I have a device that appears to be entirely identical to one of the
affected devices, but I have a different Dell laptop, so it's not
the case that all Samsung devices with firmware BXW75D0Q are broken
under all circumstances.)
Samsung engineers have an affected system, and hopefully they'll
give us a better workaround some time soon. In the mean time, this
should minimize regressions.
See https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1678184
Policing filters do not use the TCA_ACT_* enum and the tb[]
nlattr array in tcf_action_init_1() doesn't get filled for
them so we should not try to look for a TCA_ACT_COOKIE
attribute in the then uninitialized array.
The error handling in cookie allocation then calls
tcf_hash_release() leading to invalid memory access later
on.
Additionally, if cookie allocation fails after an already
existing non-policing filter has successfully been changed,
tcf_action_release() should not be called, also we would
have to roll back the changes in the error handling, so
instead we now allocate the cookie early and assign it on
success at the end.
CVE-2017-7979 Fixes: 1045ba77a596 ("net sched actions: Add support for user cookies") Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix issue in populating the PFC config paramters.
Change ieee_setpfc() callback implementation to populate traffic class
count with the user provided value.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible system hang in the dcbnl-getdcbx() path.
qed_dcbnl_get_dcbx() API uses kmalloc in GFT_KERNEL mode. The API gets
invoked in the interrupt context by qed_dcbnl_getdcbx callback. Need
to invoke this kmalloc in atomic mode.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix sending an invalid PFC error mask to MFW.
PFC error-mask value is not supported by MFW, but this bit could be
set in the pfc bit-map of the operational parameters if remote device
supports it. These operational parameters are used as basis for
populating the dcbx config parameters. User provided configs will be
applied on top of these parameters and then send them to MFW when
requested. Driver need to clear the error-mask bit before sending the
config parameters to MFW.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
qed: Fix possible error in populating max_tc field.
Some adapters may not publish the max_tc value. Populate the default
value for max_tc field in case the mfw didn't provide one.
Signed-off-by: Sudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
James Hughes [Wed, 19 Apr 2017 10:13:40 +0000 (11:13 +0100)]
smsc95xx: Use skb_cow_head to deal with cloned skbs
The driver was failing to check that the SKB wasn't cloned
before adding checksum data.
Replace existing handling to extend/copy the header buffer
with skb_cow_head.
Signed-off-by: James Hughes <james.hughes@raspberrypi.org> Acked-by: Eric Dumazet <edumazet@google.com> Acked-by: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 19 Apr 2017 09:59:15 +0000 (12:59 +0300)]
net/mlx5e: IPoIB, Fix error handling in mlx5_rdma_netdev_alloc()
The labels were out of order, so it either could result in an Oops or a
leak.
Fixes: 48935bbb7ae8 ("net/mlx5e: IPoIB, Add netdevice profile skeleton") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Wed, 19 Apr 2017 09:54:33 +0000 (12:54 +0300)]
qede: allocate enough data for ->arfs_fltr_bmap
We've got the number of longs, yes, but we should multiply by
sizeof(long) to get the number of bytes needed.
Fixes: e4917d46a653 ("qede: Add aRFS support") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sekhar Nori [Wed, 19 Apr 2017 08:38:24 +0000 (14:08 +0530)]
MAINTAINERS: update entry for TI's CPSW driver
Mugunthan V N, who was reviewing TI's CPSW driver patches is
not working for TI anymore and wont be reviewing patches for
that driver.
Drop Mugunthan as the maintiainer for this driver.
Grygorii continues to be a reviewer. Dave Miller applies the
patches directly and adding a maintainer is actually
misleading since get_maintainer.pl script stops suggesting
that Dave Miller be copied.
Signed-off-by: Sekhar Nori <nsekhar@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Apr 2017 20:10:31 +0000 (16:10 -0400)]
Merge branch '10GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
10GbE Intel Wired LAN Driver Updates 2017-04-18
This series contains updates to mainly ixgbe with only one ixgbevf change.
Usha adds a check to ensure the creation of number of VF's is valid based
on the traffic classes configured, all to avoid transmit hangs.
Joe Perches reduces the use of pr_cont since the output can be interleaved
by other processes.
Tony cleans up the code overwriting the KX4 config, which is configured by
the NVM. Adds a check for MMNGC.MNG_VETO, to resolve an issue where we
were getting a link loss for the BMC when loading the driver.
Don fixes up SGMII x553 config details which were missed in earlier
implementations. Added support for x552 XFI backplane interface support.
Cleaned up an unused define, which was causing confusion on supported
devices.
Emil fixes a link issue on KR parts by making sure the default setting is
set. Refactors the code so that the code for allocating memory for the
list of MAC addresses that the VFs can use into its own function. Made
some code cleans to help readability and ensure notification of SRIOV
being enabled is done upon completion. Fixed an issue where if we failed
to allocate vfinfo in __ixgbe_enable_sriov() the driver would crash with
a NULL pointer dereference.
Philippe Reynes updates ixgbevf to use the new API for
{get|set}_link_ksettings.
Alex increases the headroom allocation when using build_skb() on a
system with 4K pages. Fixed an issue in ixgbe_dump() where we were no
longer clearing the status bit.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Dan Carpenter [Tue, 18 Apr 2017 19:14:26 +0000 (22:14 +0300)]
dp83640: don't recieve time stamps twice
This patch is prompted by a static checker warning about a potential
use after free. The concern is that netif_rx_ni() can free "skb" and we
call it twice.
When I look at the commit that added this, it looks like some stray
lines were added accidentally. It doesn't make sense to me that we
would recieve the same data two times. I asked the author but never
recieved a response.
I can't test this code, but I'm pretty sure my patch is correct.
Fixes: 4b063258ab93 ("dp83640: Delay scheduled work.") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Stefan Sørensen <stefan.sorensen@spectralink.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ipv6: Fix UDP early demux lookup with udp_l3mdev_accept=0
David Ahern reported that 5425077d73e0c ("net: ipv6: Add early demux
handler for UDP unicast") breaks udp_l3mdev_accept=0 since early
demux for IPv6 UDP was doing a generic socket lookup which does not
require an exact match. Fix this by making UDPv6 early demux match
connected sockets only.
v1->v2: Take reference to socket after match as suggested by Eric
v2->v3: Add comment before break
Fixes: 5425077d73e0c ("net: ipv6: Add early demux handler for UDP unicast") Reported-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Cc: Eric Dumazet <edumazet@google.com> Acked-by: David Ahern <dsa@cumulusnetworks.com> Tested-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 18 Apr 2017 16:45:52 +0000 (09:45 -0700)]
tcp: remove poll() flakes with FastOpen
When using TCP FastOpen for an active session, we send one wakeup event
from tcp_finish_connect(), right before the data eventually contained in
the received SYNACK is queued to sk->sk_receive_queue.
This means that depending on machine load or luck, poll() users
might receive POLLOUT events instead of POLLIN|POLLOUT
To fix this, we need to move the call to sk->sk_state_change()
after the (optional) call to tcp_rcv_fastopen_synack()
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 18 Apr 2017 16:45:51 +0000 (09:45 -0700)]
tcp: remove poll() flakes when receiving RST
When a RST packet is processed, we send two wakeup events to interested
polling users.
First one by a sk->sk_error_report(sk) from tcp_reset(),
followed by a sk->sk_state_change(sk) from tcp_done().
Depending on machine load and luck, poll() can either return POLLERR,
or POLLIN|POLLOUT|POLLERR|POLLHUP (this happens on 99 % of the cases)
This is probably fine, but we can avoid the confusion by reordering
things so that we have more TCP fields updated before the first wakeup.
This might even allow us to remove some barriers we added in the past.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Lebrun [Tue, 18 Apr 2017 15:59:49 +0000 (17:59 +0200)]
ipv6: sr: fix out-of-bounds access in SRH validation
This patch fixes an out-of-bounds access in seg6_validate_srh() when the
trailing data is less than sizeof(struct sr6_tlv).
Reported-by: Andrey Konovalov <andreyknvl@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: David Lebrun <david.lebrun@uclouvain.be> Signed-off-by: David S. Miller <davem@davemloft.net>
Mike Maloney [Tue, 18 Apr 2017 15:14:16 +0000 (11:14 -0400)]
selftests/net: Fixes psock_fanout CBPF test case
'psock_fanout' has been failing since commit 4d7b9dc1f36a9 ("tools:
psock_lib: harden socket filter used by psock tests"). That commit
changed the CBPF filter to examine the full ethernet frame, and was
tested on 'psock_tpacket' which uses SOCK_RAW. But 'psock_fanout' was
also using this same CBPF in two places, for filtering and fanout, on a
SOCK_DGRAM socket.
Change 'psock_fanout' to use SOCK_RAW so that the CBPF program used with
SO_ATTACH_FILTER can examine the entire frame. Create a new CBPF
program for use with PACKET_FANOUT_DATA which ignores the header, as it
cannot see the ethernet header.
Tested: Ran tools/testing/selftests/net/psock_{fanout,tpacket} 10 times,
and they all passed.
Fixes: 4d7b9dc1f36a9 ("tools: psock_lib: harden socket filter used by psock tests") Signed-off-by: 'Mike Maloney <maloneykernel@gmail.com>' Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Thu, 20 Apr 2017 19:32:16 +0000 (21:32 +0200)]
mac80211: reject ToDS broadcast data frames
AP/AP_VLAN modes don't accept any real 802.11 multicast data
frames, but since they do need to accept broadcast management
frames the same is currently permitted for data frames. This
opens a security problem because such frames would be decrypted
with the GTK, and could even contain unicast L3 frames.
Since the spec says that ToDS frames must always have the BSSID
as the RA (addr1), reject any other data frames.
The problem was originally reported in "Predicting, Decrypting,
and Abusing WPA2/802.11 Group Keys" at usenix
https://www.usenix.org/conference/usenixsecurity16/technical-sessions/presentation/vanhoef
and brought to my attention by Jouni.
Cc: stable@vger.kernel.org Reported-by: Jouni Malinen <j@w1.fi> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
--
Dave, I didn't want to send you a new pull request for a single
commit yet again - can you apply this one patch as is? Signed-off-by: David S. Miller <davem@davemloft.net>
When there is no FID set for a specific packet, the HW will drop it.
However, by default these packets are useful to be delivered to CPU as
it can inspect them and program HW accordingly. So add this trap.
This would only ever happen when port is enslaved to an OVS master.
Otherwise, packets would be dropped during VLAN / STP filtering,
before FID classification.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Allow ports to work under OVS master
>From now on, a port can become a slave of OVS master. All vlans
are enabled, STP state is set to "forwarding". It is up to the OVS
userspace daemon to setup the flows either in kernel or in HW using TC
flower offload.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Teach mlxsw_sp_port_vlan_set to accept any vlan range
So far, mlxsw_sp_port_vlan_set range is limited by
MLXSW_REG_SPVM_REC_MAX_COUNT. In spectrum_switchdev code this is
wrapped up by a helper function which actually does multiple calls
to FW for bigger ranges. Move the code into mlxsw_sp_port_vlan_set
and use it always. That allows caller not to care about the range.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_flower: Set dummy FID before forward action
HW requires the FID to be valid in order for the forward action to work.
So regardless of the current FID validity, just set the dummy FID which
would do the trick.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
For forwarding using ACL action, HW needs a valid FID to be setup. It
does not actually use it, so it can be any valid FID. So create a dummy
FID only for this purpose.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement part of multipurpose Virtual Router and Forwarding Domain
Action that takes care of setting up FID. We need to use it to be able
to forward packets using ACL action when no FID is associated on RX.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull two more ftrace fixes from Steven Rostedt:
"While continuing my development, I uncovered two more small bugs.
One is a race condition when enabling the snapshot function probe
trigger. It enables the probe before allocating the snapshot, and if
the probe triggers first, it stops tracing with a warning that the
snapshot buffer was not allocated.
The seconds is that the snapshot file should show how to use it when
it is empty. But a bug fix from long ago broke the "is empty" test and
the snapshot file no longer displays the help message"
* tag 'trace-v4.11-rc5-5' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
ring-buffer: Have ring_buffer_iter_empty() return true when empty
tracing: Allocate the snapshot buffer before enabling probe
bindings: net: stmmac: add missing note about LPI interrupt
The hardware has a LPI interrupt.
There is already code in the stmmac driver to parse and handle the
interrupt. However, this information was missing from the DT binding.
At the same time, improve the description of the existing interrupts.
Signed-off-by: Niklas Cassel <niklas.cassel@axis.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bpf: remove reference to sock_filter_ext from kerneldoc comment
struct sock_filter_ext didn't make it into the tree and is now called
struct bpf_insn. Reword the kerneldoc comment for bpf_convert_filter()
accordingly.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fix from Martin Schwidefsky:
"There is one more fix I would like to see in 4.11: The combination of
KVM, CMMA and heavy paging can cause data corruption, the fix is to
clear the _PAGE_UNUSED bit in set_pte_at()"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/mm: fix CMMA vs KSM vs others