Heiner Kallweit [Mon, 5 Mar 2018 21:34:27 +0000 (22:34 +0100)]
net: phy: remove phy_error from phy_disable_interrupts
All callers of phy_disable_interrupts() call phy_error() in the error
case. Therefore we don't need to do this within the function too.
This change also allows us to use phy_disable_interrupts() in code
holding phydev->lock (because phy_error() can take this lock).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Tue, 6 Mar 2018 08:31:32 +0000 (17:31 +0900)]
selftests/net: fix in_netns.sh script
execute the subprocess in netns using 'ip netns exec'
Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: mvpp2_check_hw_buf_num() can be static
Fixes: effbf5f58d64 ("net: mvpp2: update the BM buffer free/destroy logic") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There are two compatibility strings for mv88e6xxx, but it isn't clear
from the documentation why only those two exist when the mv88e6xxx driver
supports more than the 6085 and 6190. Briefly describe how the compatible
property is used, and provide guidance on which to use.
The model list comes from looking at port_base_addr values (0x0 vs 0x10)
in drivers/net/dsa/mv88e6xxx/chip.c.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: bcast: use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Mar 2018 17:05:56 +0000 (12:05 -0500)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2018-03-05
This series contains updates to igb only.
Corinna Vinschen adds the support for trusted VFs into the igb driver.
Mika fixes an issue where PCIe device is physically unplugged can cause
a kernel crash. This issue is that netif_device_detach() is called in
these cases, which prevents netif_unregister() from bringing the device
down properly.
Christophe JAILLET fixes an issue with igb where HWTSTAMP_TX_ON was
being handled like a bit mask and not a value.
v2: dropped the e1000e fix from the series since I will be pushing it
through David Miller's net tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Mar 2018 16:44:43 +0000 (11:44 -0500)]
Merge branch 'lan743x-driver'
Bryan Whitehead says:
====================
lan743x: Add new lan743x driver
Add new lan743x driver.
The lan743x from Microchip Technologies Inc,
is a PCIe to Gigabit Ethernet Controller.
Updates for V4:
Patch 1/2 - Applied community suggestions
convert to using module_pci_driver
Updates for V3:
Patch 1/2 - Applied community suggestions
removed initialization tracking flags.
converted to 64 bit statistics.
converted tx clean up tasklet to napi.
Updates for V2:
Patch 1/2 - Applied community suggestions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Mon, 5 Mar 2018 19:23:30 +0000 (14:23 -0500)]
lan743x: Add main source files for new lan743x driver
Add main source files for new lan743x driver
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP_SENDALL: This flag, if set, will cause a one-to-many
style socket to send the message to all associations that
are currently established on this socket. For the one-to-
one style socket, this flag has no effect.
Note there is another msg_control option:
5.3.8. SCTP AUTH Information Structure (SCTP_AUTHINFO)
It's a little complicated, I will post it in another patchset after
this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:20 +0000 (20:44 +0800)]
sctp: add support for snd flag SCTP_SENDALL process in sendmsg
This patch is to add support for snd flag SCTP_SENDALL process
in sendmsg, as described in section 5.3.4 of RFC6458.
With this flag, you can send the same data to all the asocs of
this sk once.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:19 +0000 (20:44 +0800)]
sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg
This patch is to add support for Destination IPv4/6 Address options
for sendmsg, as described in section 5.3.9/10 of RFC6458.
With this option, you can provide more than one destination addrs
to sendmsg when creating asoc, like sctp_connectx.
It's also a necessary send info for sctp_sendv.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:18 +0000 (20:44 +0800)]
sctp: add support for PR-SCTP Information for sendmsg
This patch is to add support for PR-SCTP Information for sendmsg,
as described in section 5.3.7 of RFC6458.
With this option, you can specify pr_policy and pr_value for user
data in sendmsg.
It's also a necessary send info for sctp_sendv.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When addressing a review comment in a early version of the offending
patch a comment where left in which should have been removed. Remove the
comment to keep it consistent with the code.
Fixes: 75efa06f457bbed3 ("ravb: add support for changing MTU") Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Thu, 1 Mar 2018 12:23:28 +0000 (15:23 +0300)]
net: Make account struct net to memcg
The patch adds SLAB_ACCOUNT to flags of net_cachep cache,
which enables accounting of struct net memory to memcg kmem.
Since number of net_namespaces may be significant, user
want to know, how much there were consumed, and control.
Note, that we do not account net_generic to the same memcg,
where net was accounted, moreover, we don't do this at all (*).
We do not want the situation, when single memcg memory deficit
prevents us to register new pernet_operations.
(*)Even despite there is !current process accounting already
available in linux-next. See kmalloc_memcg() there for the details.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davidlohr Bueso [Mon, 22 Jan 2018 17:21:37 +0000 (09:21 -0800)]
ia64/err-inject: Use get_user_pages_fast()
At the point of sysfs callback, the call to gup is
done without mmap_sem (or any lock for that matter).
This is racy. As such, use the get_user_pages_fast()
alternative and safely avoid taking the lock, if possible.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
Matthew Wilcox [Mon, 19 Feb 2018 17:41:26 +0000 (09:41 -0800)]
ia64: Convert remaining atomic operations
While we've only seen inlining problems with atomic_sub_return(),
the other atomic operations could have the same problem. Convert all
remaining operations to use the same solution as atomic_sub_return().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Corentin Labbe [Wed, 14 Feb 2018 12:19:06 +0000 (12:19 +0000)]
ia64: convert unwcheck.py to python3
Since my system use python3 as default, arch/ia64/scripts/unwcheck.py no
longer run.
This patch convert it to the python3 syntax.
I have ran it with python2/python3 while printing values of
start/end/rlen_sum which could be impacted by this change and I see no difference.
Fixes: 94a47083522e ("scripts: change scripts to use system python instead of env") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Mon, 5 Mar 2018 19:57:06 +0000 (11:57 -0800)]
Merge tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"A fix for regression in memory-hotplug install script that prevents
the test from running on the target"
* tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: memory-hotplug: fix emit_tests regression
1) Use an appropriate TSQ pacing shift in mac80211, from Toke
Høiland-Jørgensen.
2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
in ip6_route_me_harder, from Eric Dumazet.
3) Fix several shutdown races and similar other problems in l2tp, from
James Chapman.
4) Handle missing XDP flush properly in tuntap, for real this time.
From Jason Wang.
5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
Borkmann.
6) Fix phy_resume() locking, from Andrew Lunn.
7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
from Xin Long.
8) Revert F-RTO middle box workarounds, they only handle one dimension
of the problem. From Yuchung Cheng.
9) Fix socket refcounting in RDS, from Ka-Cheong Poon.
10) Don't allow ppp unit registration to an unregistered channel, from
Guillaume Nault.
11) Various hv_netvsc fixes from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
hv_netvsc: propagate rx filters to VF
hv_netvsc: filter multicast/broadcast
hv_netvsc: defer queue selection to VF
hv_netvsc: use napi_schedule_irqoff
hv_netvsc: fix race in napi poll when rescheduling
hv_netvsc: cancel subchannel setup before halting device
hv_netvsc: fix error unwind handling if vmbus_open fails
hv_netvsc: only wake transmit queue if link is up
hv_netvsc: avoid retry on send during shutdown
virtio-net: re enable XDP_REDIRECT for mergeable buffer
ppp: prevent unregistered channels from connecting to PPP units
tc-testing: skbmod: fix match value of ethertype
mlxsw: spectrum_switchdev: Check success of FDB add operation
net: make skb_gso_*_seglen functions private
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
rds: Incorrect reference counting in TCP socket creation
net: ethtool: don't ignore return from driver get_fecparam method
vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
...
David S. Miller [Mon, 5 Mar 2018 17:55:55 +0000 (12:55 -0500)]
Merge branch 'mvpp2-jumbo-frames-support'
Antoine Tenart says:
====================
net: mvpp2: jumbo frames support
This series enable jumbo frames support in the Marvell PPv2 driver. The
first 2 patches rework the buffer management, then two patches prepare for
the final patch which adds the jumbo frames support into the driver.
This is based on top of net-next, and was tested on a mcbin.
Thanks!
Antoine
Since v1:
- Improved the Tx FIFO initialization comment.
- Improved the pool sanity check in mvpp2_bm_pool_use().
- Fixed pool related comments.
- Cosmetic fixes (used BIT() whenever possible).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:54 +0000 (15:16 +0100)]
net: mvpp2: jumbo frames support
This patch adds the support for jumbo frames in the Marvell PPv2 driver.
A third buffer pool is added with 10KB buffers, which is used if the MTU
is higher than 1518B for packets larger than 1518B. Please note only the
port 0 supports hardware checksum offload due to the Tx FIFO size
limitation.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Mon, 5 Mar 2018 14:16:53 +0000 (15:16 +0100)]
net: mvpp2: enable UDP/TCP checksum over IPv6
This patch adds the NETIF_F_IPV6_CSUM to the driver's features to enable
UDP/TCP checksum over IPv6. No extra configuration of the engine is
needed on top of the IPv4 counterpart, which already is in the features
list (NETIF_F_IP_CSUM).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Markman [Mon, 5 Mar 2018 14:16:52 +0000 (15:16 +0100)]
net: mvpp2: use a data size of 10kB for Tx FIFO on port 0
This patch sets the Tx FIFO data size on port 0 to 10kB. This prepares
the PPv2 driver for the Jumbo frame support addition as the hardware
will need big enough Tx FIFO buffers when dealing with frames going
through an interface with an MTU of 9000.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message, small reworks.] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:51 +0000 (15:16 +0100)]
net: mvpp2: update the BM buffer free/destroy logic
The buffer free routine is updated to release only given a number of
buffers, and the destroy routine now checks the actual number of buffers
in the (BPPI and BPPE) HW counters before draining the pools. This
change helps getting jumbo frames support.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:50 +0000 (15:16 +0100)]
net: mvpp2: use the same buffer pool for all ports
This patch configures the buffer manager long pool for all ports part of
the same CP. Long pool separation between ports is redundant since there
are no performance improvement when different pools are used.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: core: dst: Add kernel-doc for 'net' parameter
This fixes the following kernel-doc warning:
./include/net/dst.h:366: warning: Function parameter or member 'net' not described in 'skb_tunnel_rx'
Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency
The other dst_cache_{get,set}_ip{4,6} functions, and the doc comment for
dst_cache_set_ip6 use 'saddr' for their source address parameter. Rename
the parameter to increase consistency.
This fixes the following kernel-doc warnings:
./include/net/dst_cache.h:58: warning: Function parameter or member 'addr' not described in 'dst_cache_set_ip6'
./include/net/dst_cache.h:58: warning: Excess function parameter 'saddr' description in 'dst_cache_set_ip6'
Fixes: 911362c70df5 ("net: add dst_cache support") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask.
The modified code should behave the same, because HWTSTAMP_TX_ON is 1
and no other possible values of 'tx_type' would match the test.
However, this is more future-proof, should other values be allowed one day.
See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h'
This fixes a warning reported by smatch:
igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&'
Fixes: 26bd4e2db06be ("igb: protect TX timestamping from API misuse") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mika Westerberg [Tue, 23 Jan 2018 10:28:41 +0000 (13:28 +0300)]
igb: Do not call netif_device_detach() when PCIe link goes missing
When the driver notices that PCIe link is gone by reading 0xffffffff
from a register it clears hw->hw_addr and then calls netif_device_detach().
This happens when the PCIe device is physically unplugged for example
the user disconnected the Thunderbolt cable.
However, netif_device_detach() prevents netif_unregister() from bringing
the device down properly including tearing down MSI-X vectors. This
triggers following crash during the driver removal:
To prevent the crash do not call netif_device_detach() in igb_rd32().
This should be fine because hw->hw_addr is set to NULL preventing future
hardware access of the now missing device.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181 Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com> Reported-by: Nikolay Bogoychev <nheart@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. The patches touch mostly netfilter,
also there are small number of changes in other places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:23 +0000 (14:32 +0300)]
net: Convert proto_gre_net_ops
These pernet_operations register and unregister sysctl.
nf_conntrack_l4proto_gre4->init_net is simple memory
initializer. Also, exit method removes gre keymap_list,
which is per-net. This looks safe to be executed
in parallel with other pernet_operations.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:15 +0000 (14:32 +0300)]
net: Convert ctnetlink_net_ops
These pernet_operations register and unregister
two conntrack notifiers, and they seem to be safe
to be executed in parallel.
General/not related to async pernet_operations JFI:
ctnetlink_net_exit_batch() actions are grouped in batch,
and this could look like there is synchronize_rcu()
is forgotten. But there is synchronize_rcu() on module
exit patch (in ctnetlink_exit()), so this batch may
be reworked as simple .exit method.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:06 +0000 (14:32 +0300)]
net: Convert nf_conntrack_net_ops
These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:55 +0000 (14:31 +0300)]
net: Convert ip_set_net_ops
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
Kirill Tkhai [Mon, 5 Mar 2018 11:31:47 +0000 (14:31 +0300)]
net: Convert fou_net_ops
These pernet_operations initialize and destroy
pernet net_generic(net, fou_net_id) list.
The rest of net_generic(net, fou_net_id) accesses
may happen after netlink message, and in-tree
pernet_operations do not send FOU_GENL_NAME messages.
So, these pernet_operations are safe to be marked
as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:28 +0000 (14:31 +0300)]
net: Convert dccp_v4_ops
These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like another pernet_operations don't want to send
dccp packets to dying or creating net. Batch method similar
to ipv4/ipv6 sockets and it has to be safe to be executed
in parallel with anything else. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:19 +0000 (14:31 +0300)]
net: Convert cangw_pernet_ops
These pernet_operations have a deal with cgw_list,
and the rest of accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on netlink request, and it does not seem,
foreign pernet_operations want to send a net such
the messages. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:10 +0000 (14:31 +0300)]
net: Convert caif_net_ops
Init method just allocates memory for new cfg, and
assigns net_generic(net, caif_net_id). Despite there is
synchronize_rcu() on error path in cfcnfg_create(),
in real this function does not use global lists,
so it looks like this synchronize_rcu() is some legacy
inheritance. Exit method removes caif devices under
rtnl_lock().
There could be a problem, if someone from foreign net
pernet_operations dereference caif_net_id of this net.
It's dereferenced in get_cfcnfg() and caif_device_list().
get_cfcnfg() is used from netdevice notifiers, where
they are called under rtnl_lock(). The notifiers can't
be called from foreign nets pernet_operations. Also,
it's used from caif_disconnect_client() and from
caif_connect_client(). The both of the functions work
with caif socket, and there is the only possibility
to have a socket, when the net is dead. This may happen
only of the socket was created as kern using sk_alloc().
Grep by PF_CAIF shows we do not create kern caif sockets,
so get_cfcnfg() is safe.
caif_device_list() is used in netdevice notifiers and exit
method under rtnl lock. Also, from caif_get() used in
the netdev notifiers and in caif_flow_cb(). The last item
is skb destructor. Since there are no kernel caif sockets
nobody can send net a packet in parallel with init/exit,
so this is also safe.
So, these pernet_operations are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:00 +0000 (14:31 +0300)]
net: Convert arp_tables_net_ops and ip6_tables_net_ops
These pernet_operations call xt_proto_init() and xt_proto_fini(),
which just register and unregister /proc entries.
They are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nobody can send such a packet to a net before it's became
registered, nobody can send a packet after all netdevices
are unregistered. So, these pernet_operations are able
to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:30:41 +0000 (14:30 +0300)]
net: Convert broute_net_ops, frame_filter_net_ops and frame_nat_net_ops
These pernet_operations use ebt_register_table() and
ebt_unregister_table() to act on the tables, which
are used as argument in ebt_do_table(), called from
ebtables hooks.
Since there are no net-related bridge packets in-flight,
when the init and exit methods are called, these
pernet_operations are safe to be executed in parallel
with any other.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 5 Mar 2018 01:37:47 +0000 (17:37 -0800)]
selftests: forwarding: Add suppport to create veth interfaces
For tests using veth interfaces, the test infrastructure can create
the netdevs if they do not exist. Arguably this is a preferred approach
since the tests require p$N and p$(N+1) to be pairs.
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add a generic netlink family for NCSI. This supports three commands;
NCSI_CMD_PKG_INFO which returns information on packages and their
associated channels, NCSI_CMD_SET_INTERFACE which allows a specific
package or package/channel combination to be set as the preferred
choice, and NCSI_CMD_CLEAR_INTERFACE which clears any preferred setting.
Signed-off-by: Samuel Mendoza-Jonas <sam@mendozajonas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 4 Mar 2018 12:12:04 +0000 (14:12 +0200)]
net: Make RX-FCS and LRO mutually exclusive
LRO and RX-FCS offloads cannot be enabled at the same time since it is
not clear what should happen to the FCS of each coalesced packet.
The FCS is not really part of the TCP payload, hence cannot be merged
into one big packet. On the other hand, providing one big LRO packet
with one FCS contradicts the RX-FCS feature goal.
Use the fix features mechanism in order to prevent intersection of the
features and drop LRO in case RX-FCS is requested.
Enabling RX-FCS while LRO is enabled will result in:
$ ethtool -K ens6 rx-fcs on
Actual changes:
large-receive-offload: off [requested on]
rx-fcs: on
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Sat, 3 Mar 2018 02:29:04 +0000 (18:29 -0800)]
liquidio: Corrected Rx bytes counting
Corrected stats mismatch between Host Tx and its peer Rx stats
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Acked-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Mar 2018 03:18:21 +0000 (22:18 -0500)]
Merge branch 'hv_netvsc-minor-fixes'
Stephen Hemminger says:
====================
hv_netvsc: minor fixes
These are improvements to netvsc driver. They aren't functionality
changes so not targeting net-next; and they are not show stopper
bugs that need to go to stable either.
v2
- drop the irq flags patch, defer it to net-next
- split the multicast filter flag patch out
- change propogate rx mode patch to handle startup of vf
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The netvsc device should propagate filters to the SR-IOV VF
device (if present). The flags also need to be propagated to the
VF device as well. This only really matters on local Hyper-V
since Azure does not support multiple addresses.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When VF is used for accelerated networking it will likely have
more queues (and different policy) than the synthetic NIC.
This patch defers the queue policy to the VF so that all the
queues can be used. This impacts workloads like local generate UDP.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: fix race in napi poll when rescheduling
There is a race between napi_reschedule and re-enabling interrupts
which could lead to missed host interrrupts. This occurs when
interrupts are re-enabled (hv_end_read) and vmbus irq callback
(netvsc_channel_cb) has already scheduled NAPI.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: cancel subchannel setup before halting device
Block setup of multiple channels earlier in the teardown
process. This avoids possible races between halt and subchannel
initialization.
Suggested-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Change the initialization order so that the device is ready to transmit
(ie connect vsp is completed) before setting the internal reference
to the device with RCU.
This avoids any races on initialization and prevents retry issues
on shutdown.
Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 2 Mar 2018 09:29:14 +0000 (17:29 +0800)]
virtio-net: re enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 2 Mar 2018 17:41:16 +0000 (18:41 +0100)]
ppp: prevent unregistered channels from connecting to PPP units
PPP units don't hold any reference on the channels connected to it.
It is the channel's responsibility to ensure that it disconnects from
its unit before being destroyed.
In practice, this is ensured by ppp_unregister_channel() disconnecting
the channel from the unit before dropping a reference on the channel.
However, it is possible for an unregistered channel to connect to a PPP
unit: register a channel with ppp_register_net_channel(), attach a
/dev/ppp file to it with ioctl(PPPIOCATTCHAN), unregister the channel
with ppp_unregister_channel() and finally connect the /dev/ppp file to
a PPP unit with ioctl(PPPIOCCONNECT).
Once in this situation, the channel is only held by the /dev/ppp file,
which can be released at anytime and free the channel without letting
the parent PPP unit know. Then the ppp structure ends up with dangling
pointers in its ->channels list.
Prevent this scenario by forbidding unregistered channels from
connecting to PPP units. This maintains the code logic by keeping
ppp_unregister_channel() responsible from disconnecting the channel if
necessary and avoids modification on the reference counting mechanism.
This issue seems to predate git history (successfully reproduced on
Linux 2.6.26 and earlier PPP commits are unrelated).
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 2 Mar 2018 15:03:32 +0000 (16:03 +0100)]
ipvlan: forbid vlan devices on top of ipvlan
Currently we allow the creation of 8021q devices on top of
ipvlan, but such devices are nonfunctional, as the underlying
ipvlan rx_hanlder hook can't match the relevant traffic.
Be explicit and forbid the creation of such nonfunctional devices.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Fri, 2 Mar 2018 13:44:39 +0000 (14:44 +0100)]
tc-testing: skbmod: fix match value of ethertype
iproute2 print_skbmod() prints the configured ethertype using format 0x%X:
therefore, test 9aa8 systematically fails, because it configures action #4
using ethertype 0x0031, and expects 0x0031 when it reads it back. Changing
the expected value to 0x31 lets the test result 'not ok' become 'ok'.
tested with:
# ./tdc.py -e 9aa8
Test 9aa8: Get a single skbmod action from a list
All test results:
1..1
ok 1 9aa8 Get a single skbmod action from a list
Fixes: cf797ac49b94 ("tc-testing: Add test cases for police and skbmod") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Fri, 2 Mar 2018 09:29:14 +0000 (17:29 +0800)]
virtio-net: re enable XDP_REDIRECT for mergeable buffer
XDP_REDIRECT support for mergeable buffer was removed since commit 7324f5399b06 ("virtio_net: disable XDP_REDIRECT in receive_mergeable()
case"). This is because we don't reserve enough tailroom for struct
skb_shared_info which breaks XDP assumption. So this patch fixes this
by reserving enough tailroom and using fixed size of rx buffer.
Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Fri, 2 Mar 2018 02:22:20 +0000 (11:22 +0900)]
selftests: rtnetlink: remove testns on test fail
This patch removes testns after test failure so that next test can
continue with clean ns
Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: William Tu <u9012063@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 4 Mar 2018 23:35:02 +0000 (18:35 -0500)]
Merge branch 'gre-seq-collect_md'
William Tu says:
====================
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native tunnel mode.
The first patch adds sequence number support for gre collect
metadata mode, and the second patch tests it using BPF.
RFC2890 defines GRE sequence number to be specific to the traffic
flow identified by the key. However, this patch does not implement
per-key seqno. The sequence number is shared in the same tunnel
device. That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
A new BFP uapi tunnel flag 'BPF_F_SEQ_NUMBER' is added.
--
v1->v2:
rename BPF_F_GRE_SEQ to BPF_F_SEQ_NUMBER suggested by Daniel
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
William Tu [Thu, 1 Mar 2018 21:49:57 +0000 (13:49 -0800)]
gre: add sequence number for collect md mode.
Currently GRE sequence number can only be used in native
tunnel mode. This patch adds sequence number support for
gre collect metadata mode. RFC2890 defines GRE sequence
number to be specific to the traffic flow identified by the
key. However, this patch does not implement per-key seqno.
The sequence number is shared in the same tunnel device.
That is, different tunnel keys using the same collect_md
tunnel share single sequence number.
Signed-off-by: William Tu <u9012063@gmail.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
New adapter needs CMD_OPENF_IG_DESCCACHE flag to be set. If this flag is
not set, fw flushes the global IG desc cache. This flag is nop in older
adapter.
Also increment driver version
Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Thu, 1 Mar 2018 16:42:40 +0000 (16:42 +0000)]
net: amd8111e: remove redundant assignment to 'tx_index'
The variable tx_index is being initialized with a value that is never
read and re-assigned a little later, hence the initialization is redundant
and can be removed.
Cleans up clang warning:
drivers/net/ethernet/amd/amd8111e.c:652:6: warning: Value stored to
'tx_index' during its initialization is never read
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:35 +0000 (13:27 +0200)]
r8169: switch to device-managed functions in probe (part 2)
This is a follow up to the commit
4c45d24a759d ("r8169: switch to device-managed functions in probe")
to move towards managed resources even more.
Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Shevchenko [Thu, 1 Mar 2018 11:27:34 +0000 (13:27 +0200)]
r8169: Dereference MMIO address immediately before use
There is no need to dereference struct rtl8169_private to get mmio_addr
in almost every function in the driver.
Replace it by using pointer to struct rtl8169_private directly.
No functional change intended.
Next step might be a conversion of RTL_Wxx() / RTL_Rxx() macros
to inline functions for sake of type checking.
Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shalom Toledo [Thu, 1 Mar 2018 10:37:05 +0000 (11:37 +0100)]
mlxsw: spectrum_switchdev: Check success of FDB add operation
Until now, we assumed that in case of error when adding FDB entries, the
write operation will fail, but this is not the case. Instead, we need to
check that the number of entries reported in the response is equal to
the number of entries specified in the request.
Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC") Reported-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Shalom Toledo <shalomt@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The Virtual Interfaces are connected to an internal switch on the chip
which allows VIs attached to the same port to talk to each other even
when the port link is down. As a result, we generally want to always
report a VI's link as being "up".
Based on the original work by: Casey Leedom <leedom@chelsio.com> Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As requested [1], I went through and had a look at users of gso_size to
see if there were things that need to be fixed to consider
GSO_BY_FRAGS, and I have tried to improve our helper functions to deal
with this case.
I found a few. This fixes bugs relating to the use of
skb_gso_*_seglen() where GSO_BY_FRAGS is not considered.
Patch 1 renames skb_gso_validate_mtu to skb_gso_validate_network_len.
This is follow-up to my earlier patch 2b16f048729b ("net: create
skb_gso_validate_mac_len()"), and just makes everything a bit clearer.
Patches 2 and 3 replace the final users of skb_gso_network_seglen() -
which doesn't consider GSO_BY_FRAGS - with
skb_gso_validate_network_len(), which does. This allows me to make the
skb_gso_*_seglen functions private in patch 4 - now future users won't
accidentally do the wrong comparison.
Two things remain. One is qdisc_pkt_len_init, which is discussed at
[2] - it's caught up in the GSO_DODGY mess. I don't have any expertise
in GSO_DODGY, and it looks like a good clean fix will involve
unpicking the whole validation mess, so I have left it for now.
Secondly, there are 3 eBPF opcodes that change the gso_size of an SKB
and don't consider GSO_BY_FRAGS. This is going through the bpf tree.
PS: This is all in the core networking stack. For a driver to be
affected by this it would need to support NETIF_F_GSO_SCTP /
NETIF_F_GSO_SOFTWARE and then either use gso_size or not be a purely
virtual device. (Many drivers look at gso_size, but do not support
SCTP segmentation, so the core network will segment an SCTP gso before
it hits them.) Based on that, the only driver that may be affected is
sunvnet, but I have no way of testing it, so I haven't looked at it.
v2: split out bpf stuff
fix review comments from Dave Miller
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Thu, 1 Mar 2018 06:13:40 +0000 (17:13 +1100)]
net: make skb_gso_*_seglen functions private
They're very hard to use properly as they do not consider the
GSO_BY_FRAGS case. Code should use skb_gso_validate_network_len
and skb_gso_validate_mac_len as they do consider this case.
Make the seglen functions static, which stops people using them
outside of skbuff.c
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Thu, 1 Mar 2018 06:13:39 +0000 (17:13 +1100)]
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
Replace skb_gso_network_seglen() with
skb_gso_validate_network_len(), as it considers the GSO_BY_FRAGS
case.
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Axtens [Thu, 1 Mar 2018 06:13:38 +0000 (17:13 +1100)]
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
tbf_enqueue() checks the size of a packet before enqueuing it.
However, the GSO size check does not consider the GSO_BY_FRAGS
case, and so will drop GSO SCTP packets, causing a massive drop
in throughput.
Use skb_gso_validate_mac_len() instead, as it does consider that
case.
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If you take a GSO skb, and split it into packets, will the network
length (L3 headers + L4 headers + payload) of those packets be small
enough to fit within a given MTU?
skb_gso_validate_mtu gives you the answer to that question. However,
we recently added to add a way to validate the MAC length of a split GSO
skb (L2+L3+L4+payload), and the names get confusing, so rename
skb_gso_validate_mtu to skb_gso_validate_network_len
Signed-off-by: Daniel Axtens <dja@axtens.net> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 4 Mar 2018 20:12:48 +0000 (12:12 -0800)]
Merge branch 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Thomas Gleixner:
"A small set of fixes for x86:
- Add missing instruction suffixes to assembly code so it can be
compiled by newer GAS versions without warnings.
- Switch refcount WARN exceptions to UD2 as we did in general
- Make the reboot on Intel Edison platforms work
- A small documentation update so text and sample command match"
* 'x86/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
Documentation, x86, resctrl: Make text and sample command match
x86/platform/intel-mid: Handle Intel Edison reboot correctly
x86/asm: Add instruction suffixes to bitops
x86/entry/64: Add instruction suffix
x86/refcounts: Switch to UD2 for exceptions
Linus Torvalds [Sun, 4 Mar 2018 19:40:16 +0000 (11:40 -0800)]
Merge branch 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86/pti fixes from Thomas Gleixner:
"Three fixes related to melted spectrum:
- Sync the cpu_entry_area page table to initial_page_table on 32 bit.
Otherwise suspend/resume fails because resume uses
initial_page_table and triggers a triple fault when accessing the
cpu entry area.
- Zero the SPEC_CTL MRS on XEN before suspend to address a
shortcoming in the hypervisor.
- Fix another switch table detection issue in objtool"
* 'x86-pti-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/cpu_entry_area: Sync cpu_entry_area to initial_page_table
objtool: Fix another switch table detection issue
x86/xen: Zero MSR_IA32_SPEC_CTRL before suspend