Kirill Tkhai [Wed, 7 Mar 2018 09:40:09 +0000 (12:40 +0300)]
net: Convert nfnl_queue_net_ops
These pernet_operations register and unregister net::nf::queue_handler
and /proc entry. The handler is accessed only under RCU, so this looks
safe to convert them.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 7 Mar 2018 09:40:00 +0000 (12:40 +0300)]
net: Convert nfnl_log_net_ops
These pernet_operations create and destroy /proc entries.
Also, exit method unsets nfulnl_logger. The logger is not
set by default, and it becomes bound via userspace request.
So, they look safe to be made async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 7 Mar 2018 09:39:51 +0000 (12:39 +0300)]
net: Convert cttimeout_ops
These pernet_operations also look closed in themself.
Exit method touch only per-net structures, so it's
safe to execute them for several net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 7 Mar 2018 09:39:42 +0000 (12:39 +0300)]
net: Convert nfnl_acct_ops
These pernet_operations look closed in themself,
and there are no other users of net::nfnl_acct_list
outside. They are safe to be executed for several
net namespaces in parallel.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 7 Mar 2018 09:39:33 +0000 (12:39 +0300)]
net: Convert nfnetlink_net_ops
These pernet_operations create and destroy net::nfnl
socket of NETLINK_NETFILTER code. There are no other
places, where such type the socket is created, except
these pernet_operations. It seem other pernet_operations
depending on CONFIG_NETFILTER_NETLINK send messages
to this socket. So, we mark it async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Wed, 7 Mar 2018 09:39:14 +0000 (12:39 +0300)]
net: Convert xfrm_user_net_ops
These pernet_operations create and destroy net::xfrm::nlsk
socket of NETLINK_XFRM. There is only entry point, where
it's dereferenced, it's xfrm_user_rcv_msg(). There is no
in-kernel senders to this socket.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
have exit methods, which call ip6t_unregister_table().
ip6table_filter_net_ops has init method registering
filter table.
Since there must not be in-flight ipv6 packets at the time
of pernet_operations execution and since pernet_operations
don't send ipv6 packets each other, these pernet_operations
are safe to be async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched: cls_flower: Add support to handle first frag as match field
Allow setting firstfrag as matching option in tc flower classifier.
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 \
ip_flags firstfrag
action mirred egress redirect dev eth1
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 8 Mar 2018 16:23:38 +0000 (11:23 -0500)]
Merge branch 'hns3-next'
Peng Li says:
====================
fix some bugs for hns3 driver
This patchset fix some bugs for hns3 driver.
[Patch 1/6 - Patch 3/6] fix bugs related about VF driver.
[Patch 3/6 - Patch 6/6] fix the bugs about ethtool_ops.set_channels.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 8 Mar 2018 11:41:53 +0000 (19:41 +0800)]
net: hns3: fix the queue id for tqp enable&&reset
Command HCLGE_OPC_CFG_COM_TQP_QUEUE should use queue id in the
function, but command HCLGE_OPC_RESET_TQP_QUEUE should use global
queue id.
This patch fixes the queue id about queue enable/disable/reset.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Thu, 8 Mar 2018 11:41:51 +0000 (19:41 +0800)]
net: hns3: set the cmdq out_vld bit to 0 after used
Driver check the out_vld bit when get a new cmdq BD, if the bit is 1,
the BD is valid. driver Should set the bit 0 after used and hw will
set the bit 1 if get a valid BD.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Anders Roxell [Thu, 8 Mar 2018 10:17:23 +0000 (11:17 +0100)]
selftests/net: enable fragments for fib-onlink-tests
We miss CONFIG_* fragments so test fib-onlink-tests.sh can do:
ip li add lisa type vrf table 1101
ip li add veth1 type veth peer name veth2
And the follow message occurs if it isn't enabled:
Configuring interfaces
RTNETLINK answers: Operation not supported
This enables for NET_NRF (and friends) and VETH so we can create a vrf
table and veth.
Fixes: 153e1b84f477 ("selftests: Add FIB onlink tests") Signed-off-by: Anders Roxell <anders.roxell@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Thu, 8 Mar 2018 09:29:30 +0000 (10:29 +0100)]
ipvlan: properly annotate rx_handler access
The rx_handler field is rcu-protected, but I forgot to use the
proper accessor while refactoring netif_is_ipvlan_port(). Such
function only check the rx_handler value, so it is safe, but we need
to properly read rx_handler via rcu_access_pointer() to avoid sparse
warnings.
Fixes: 1ec54cb44e67 ("net: unpollute priv_flags space") Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Wed, 7 Mar 2018 16:43:19 +0000 (08:43 -0800)]
ip6mr: remove synchronize_rcu() in favor of SOCK_RCU_FREE
Kirill found that recently added synchronize_rcu() call in
ip6mr_sk_done()
was slowing down netns dismantle and posted a patch to use it only if
the socket
was found.
I instead suggested to get rid of this call, and use instead
SOCK_RCU_FREE
We might later change IPv4 side to use the same technique and unify
both stacks. IPv4 does not use synchronize_rcu() but has a call_rcu()
that could be replaced by SOCK_RCU_FREE.
Tested:
time for i in {1..1000}; do unshare -n /bin/false;done
Before : real 7m18.911s
After : real 10.187s
Fixes: 8571ab479a6e ("ip6mr: Make mroute_sk rcu-based") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Kirill Tkhai <ktkhai@virtuozzo.com> Cc: Yuval Mintz <yuvalm@mellanox.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A couple of enhancements to the rds zerocop code
- patch 1 refactors rds_message_copy_from_user to pull the zcopy logic
into its own function
- patch 2 drops the usage sk_buff to track MSG_ZEROCOPY cookies and
uses a simple linked list (enhancement suggested by willemb during
code review)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
rds: use list structure to track information for zerocopy completion notification
Commit 401910db4cd4 ("rds: deliver zerocopy completion notification
with data") removes support fo r zerocopy completion notification
on the sk_error_queue, thus we no longer need to track the cookie
information in sk_buff structures.
This commit removes the struct sk_buff_head rs_zcookie_queue by
a simpler list that results in a smaller memory footprint as well
as more efficient memory_allocation time.
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
rds: refactor zcopy code into rds_message_zcopy_from_user
Move the large block of code predicated on zcopy from
rds_message_copy_from_user into a new function,
rds_message_zcopy_from_user()
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Remove VLA usage and change the 'len' argument to a u8 and use a 256
byte buffer on the stack. Notice that these lengths are limited by the
encoding field in the VPD structure, which is a u8 [1].
Fix the SO_ZEROCOPY switch case on sock_setsockopt() avoiding the
ret values to be overwritten by the one set on the default case.
Fixes: 28190752c7092 ("sock: permit SO_ZEROCOPY on PF_RDS socket") Signed-off-by: Jesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Sowmini Varadhan <sowmini.varadhan@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series adds unicast filtering support to the Marvell PPv2 controller.
This is implemented using the header parser cababilities of the PPv2,
which allows for generic packet filtering based on matching patterns in
the packet headers.
PPv2 controller only has 256 of these entries, and we need to share them
with other features, such as VLAN filtering.
For each interface, we have 5 entries dedicated to unicast filtering (the
controller's own address, and 4 other), and 21 to multicast filtering.
When this number is reached, the controller switches to unicast or
multicast promiscuous mode.
The first patch reworks the function that adds and removes addresses to the
filter. This is preparatory work to ease UC filter implementation.
The second patch adds the UC filtering feature.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marvell PPv2 controller can be used to implement packet filtering based
on the destination MAC address. This is already used to implement
multicast filtering. This patch adds support for Unicast filtering.
Filtering is based on so-called "TCAM entries" to implement filtering.
Due to their limited number and the fact that these are also used for
other purposes, we reserve 80 entries for both unicast and multicast
filters. On top of the broadcast address, and each interface's own MAC
address, we reserve 25 entries per port, 4 for unicast filters, 21 for
multicast.
Whenever unicast or multicast range for one port is full, the filtering
is disabled and port goes into promiscuous mode for the given type of
addresses.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: Simplify MAC filtering function parameters
The mvpp2_prs_mac_da_accept function takes into parameter both the
struct representing the controller and the port id. This is meaningful
when we want to create TCAM entries for non-initialized ports, but in
this case we expect the port to be initialized before starting adding or
removing MAC addresses to the per-port filter.
This commit changes the function so that it takes struct mvpp2_port as
a parameter instead.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 7 Mar 2018 09:46:58 +0000 (10:46 +0100)]
net: cdc_eem: clean up bind error path
Drop bogus call to usb_driver_release_interface() from an error path in
the usbnet bind() callback, which is called during interface probe. At
this point the interface is not bound and usb_driver_release_interface()
returns early.
Also remove the bogus call to clear the interface data, which is owned
by the usbnet driver and would not even have been set by the time bind()
is called.
Signed-off-by: Johan Hovold <johan@kernel.org> Acked-by: Oliver Neukum <oneukum@suse.com> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Johan Hovold [Wed, 7 Mar 2018 09:46:57 +0000 (10:46 +0100)]
net: kalmia: clean up bind error path
Drop bogus call to usb_driver_release_interface() from an error path in
the usbnet bind() callback, which is called during interface probe. At
this point the interface is not bound and usb_driver_release_interface()
returns early.
Also remove the bogus call to clear the interface data, which is owned
by the usbnet driver and would not even have been set by the time bind()
is called.
Signed-off-by: Johan Hovold <johan@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
This series consists of some fixes and refactors for the mlx5 drivers,
especially around the FPGA and flow steering. Most of them are trivial
fixes and are the foundation of allowing IPSec acceleration from user-space.
We use flow steering abstraction in order to accelerate IPSec packets.
When a user creates a steering rule, [s]he states that we'll carry an
encrypt/decrypt flow action (using a specific configuration) for every
packet which conforms to a certain match. Since currently offloading these
packets is done via FPGA, we'll add another set of flow steering ops.
These ops will execute the required FPGA commands and then call the
standard steering ops.
In order to achieve this, we need that the commands will get all the
required information. Therefore, we pass the fte object and embed the
flow_action struct inside the fte. In addition, we add the shim layer
that will later be used for alternating between the standard and the
FPGA steering commands.
Some fixes, like " net/mlx5e: Wait for FPGA command responses with a timeout"
are very relevant for user-space applications, as these applications could
be killed, but we still want to wait for the FPGA and update the kernel's
database.
Regards,
Aviad and Matan
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefano Brivio [Tue, 6 Mar 2018 21:16:27 +0000 (22:16 +0100)]
selftests: net: Introduce first PMTU test
One single test implemented so far: test_pmtu_vti6_exception
checks that the PMTU of a route exception, caused by a tunnel
exceeding the link layer MTU, is affected by administrative
changes of the tunnel MTU. Creation of the route exception is
checked too.
Requested-by: David Ahern <dsahern@gmail.com> Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fengguang Wu [Tue, 6 Mar 2018 18:23:23 +0000 (02:23 +0800)]
enic: fix boolreturn.cocci warnings
drivers/net/ethernet/cisco/enic/vnic_dev.c:1294:9-10: WARNING: return of 0/1 in function 'vnic_dev_capable_udp_rss' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Fixes: 48398b6e7065 ("enic: set UDP rss flag") CC: Govindarajulu Varadarajan <gvaradar@cisco.com> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
drivers/net/dsa/mv88e6xxx/serdes.c:66:9-10: WARNING: return of 0/1 in function 'mv88e6352_port_has_serdes' with return type bool
Return statements in functions returning bool should use
true/false instead of 1/0.
Generated by: scripts/coccinelle/misc/boolreturn.cocci
Fixes: eb755c3f6b7d ("net: dsa: mv88e6xxx: Add helper to determining if port has SERDES") CC: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Tue, 6 Mar 2018 09:56:31 +0000 (10:56 +0100)]
net: unpollute priv_flags space
the ipvlan device driver defines and uses 2 bits inside the priv_flags
net_device field. Such bits and the related helper are used only
inside the ipvlan device driver, and the core networking does not
need to be aware of them.
This change moves netif_is_ipvlan* helper in the ipvlan driver and
re-implement them looking for ipvlan specific symbols instead of
using priv_flags.
Overall this frees two bits inside priv_flags - and move the following
ones to avoid gaps - without any intended functional change.
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: phy: remove phy_error from phy_disable_interrupts
All callers of phy_disable_interrupts() call phy_error() in the error
case. Therefore we don't need to do this within the function too.
This change also allows us to use phy_disable_interrupts() in code
holding phydev->lock (because phy_error() takes this lock).
Make use of this in phy_stop().
v2:
- splitted into two separate patches
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 5 Mar 2018 21:34:27 +0000 (22:34 +0100)]
net: phy: remove phy_error from phy_disable_interrupts
All callers of phy_disable_interrupts() call phy_error() in the error
case. Therefore we don't need to do this within the function too.
This change also allows us to use phy_disable_interrupts() in code
holding phydev->lock (because phy_error() can take this lock).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Prashant Bhole [Tue, 6 Mar 2018 08:31:32 +0000 (17:31 +0900)]
selftests/net: fix in_netns.sh script
execute the subprocess in netns using 'ip netns exec'
Fixes: cc30c93fa020 ("selftests/net: ignore background traffic in psock_fanout") Signed-off-by: Prashant Bhole <bhole_prashant_q7@lab.ntt.co.jp> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: mvpp2: mvpp2_check_hw_buf_num() can be static
Fixes: effbf5f58d64 ("net: mvpp2: update the BM buffer free/destroy logic") Signed-off-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
There are two compatibility strings for mv88e6xxx, but it isn't clear
from the documentation why only those two exist when the mv88e6xxx driver
supports more than the 6085 and 6190. Briefly describe how the compatible
property is used, and provide guidance on which to use.
The model list comes from looking at port_base_addr values (0x0 vs 0x10)
in drivers/net/dsa/mv88e6xxx/chip.c.
Signed-off-by: Brandon Streiff <brandon.streiff@ni.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
tipc: bcast: use true and false for boolean values
Assign true or false to boolean variables instead of an integer value.
This issue was detected with the help of Coccinelle.
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Mar 2018 17:05:56 +0000 (12:05 -0500)]
Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
1GbE Intel Wired LAN Driver Updates 2018-03-05
This series contains updates to igb only.
Corinna Vinschen adds the support for trusted VFs into the igb driver.
Mika fixes an issue where PCIe device is physically unplugged can cause
a kernel crash. This issue is that netif_device_detach() is called in
these cases, which prevents netif_unregister() from bringing the device
down properly.
Christophe JAILLET fixes an issue with igb where HWTSTAMP_TX_ON was
being handled like a bit mask and not a value.
v2: dropped the e1000e fix from the series since I will be pushing it
through David Miller's net tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 7 Mar 2018 16:44:43 +0000 (11:44 -0500)]
Merge branch 'lan743x-driver'
Bryan Whitehead says:
====================
lan743x: Add new lan743x driver
Add new lan743x driver.
The lan743x from Microchip Technologies Inc,
is a PCIe to Gigabit Ethernet Controller.
Updates for V4:
Patch 1/2 - Applied community suggestions
convert to using module_pci_driver
Updates for V3:
Patch 1/2 - Applied community suggestions
removed initialization tracking flags.
converted to 64 bit statistics.
converted tx clean up tasklet to napi.
Updates for V2:
Patch 1/2 - Applied community suggestions
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Bryan Whitehead [Mon, 5 Mar 2018 19:23:30 +0000 (14:23 -0500)]
lan743x: Add main source files for new lan743x driver
Add main source files for new lan743x driver
Signed-off-by: Bryan Whitehead <Bryan.Whitehead@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
SCTP_SENDALL: This flag, if set, will cause a one-to-many
style socket to send the message to all associations that
are currently established on this socket. For the one-to-
one style socket, this flag has no effect.
Note there is another msg_control option:
5.3.8. SCTP AUTH Information Structure (SCTP_AUTHINFO)
It's a little complicated, I will post it in another patchset after
this.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:20 +0000 (20:44 +0800)]
sctp: add support for snd flag SCTP_SENDALL process in sendmsg
This patch is to add support for snd flag SCTP_SENDALL process
in sendmsg, as described in section 5.3.4 of RFC6458.
With this flag, you can send the same data to all the asocs of
this sk once.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:19 +0000 (20:44 +0800)]
sctp: add support for SCTP_DSTADDRV4/6 Information for sendmsg
This patch is to add support for Destination IPv4/6 Address options
for sendmsg, as described in section 5.3.9/10 of RFC6458.
With this option, you can provide more than one destination addrs
to sendmsg when creating asoc, like sctp_connectx.
It's also a necessary send info for sctp_sendv.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 5 Mar 2018 12:44:18 +0000 (20:44 +0800)]
sctp: add support for PR-SCTP Information for sendmsg
This patch is to add support for PR-SCTP Information for sendmsg,
as described in section 5.3.7 of RFC6458.
With this option, you can specify pr_policy and pr_value for user
data in sendmsg.
It's also a necessary send info for sctp_sendv.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When addressing a review comment in a early version of the offending
patch a comment where left in which should have been removed. Remove the
comment to keep it consistent with the code.
Fixes: 75efa06f457bbed3 ("ravb: add support for changing MTU") Reported-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> Acked-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Thu, 1 Mar 2018 12:23:28 +0000 (15:23 +0300)]
net: Make account struct net to memcg
The patch adds SLAB_ACCOUNT to flags of net_cachep cache,
which enables accounting of struct net memory to memcg kmem.
Since number of net_namespaces may be significant, user
want to know, how much there were consumed, and control.
Note, that we do not account net_generic to the same memcg,
where net was accounted, moreover, we don't do this at all (*).
We do not want the situation, when single memcg memory deficit
prevents us to register new pernet_operations.
(*)Even despite there is !current process accounting already
available in linux-next. See kmalloc_memcg() there for the details.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aviad Yehezkel [Sun, 18 Feb 2018 13:00:54 +0000 (15:00 +0200)]
net/mlx5: Flow steering cmd interface should get the fte when deleting
Previously, deleting a flow steering entry only got the index.
Since the FPGA implementation of FTE's deletion might need to dig
inside the FTE itself, we would like to get the FTE's context.
Changing the interface to pass the FTE context.
Aviad Yehezkel [Sun, 18 Feb 2018 11:17:17 +0000 (13:17 +0200)]
net/mlx5: Add empty egress namespace to flow steering core
Currently, we don't support egress flow steering namespace in mlx5
flow steering core implementation. However, when we want to encrypt
a packet, we model it as a flow steering rule in the egress path.
To overcome this, we add an empty egress namespace to flow steering.
This namespace is initialized only when ipsec support exists.
In the future, this will grow to a full blown full steering
implementation, resembling the ingress path.
Matan Barak [Sun, 20 Aug 2017 12:46:51 +0000 (15:46 +0300)]
net/mlx5: Add shim layer between fs and cmd
The shim layer allows each namespace to define possibly different
functionality for add/delete/update commands. The shim layer
introduced here, will be used to support flow steering with the FPGA.
Matan Barak [Wed, 16 Aug 2017 06:43:48 +0000 (09:43 +0300)]
{net,IB}/mlx5: Add has_tag to mlx5_flow_act
The has_tag member will indicate whether a tag action was specified
in flow specification.
A flow tag 0 = MLX5_FS_DEFAULT_FLOW_TAG is assumed a valid flow tag
that is currently used by mlx5 RDMA driver, whereas in HW flow_tag = 0
means that the user doesn't care about flow_tag. HW always provide
a flow_tag = 0 if all flow tags requested on a specific flow are 0.
So we need a way (in the driver) to differentiate between a user really
requesting flow_tag = 0 and a user who does not care, in order to be
able to report conflicting flow tags on a specific flow.
Boris Pismenny [Wed, 16 Aug 2017 06:33:30 +0000 (09:33 +0300)]
IB/mlx5: Pass mlx5_flow_act struct instead of multiple arguments
Group and pass all function arguments of parse_flow_attr call in one
common struct mlx5_flow_act.
This patch passes all the action arguments of parse_flow_attr in one common
struct mlx5_flow_act. It allows us to scale the number of actions without adding
new arguments to the function.
Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Boris Pismenny <borisp@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com> Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Matan Barak [Sun, 19 Nov 2017 15:51:13 +0000 (15:51 +0000)]
net/mlx5: FPGA and IPSec initialization to be before flow steering
Some flow steering namespace initialization (i.e. egress namespace)
might depend on FPGA capabilities. Changing the initialization order
such that the FPGA will be initialized before flow steering.
Flow steering fs cmds initialization might depend on
IPSec capabilities. Changing the initialization order such
that the IPSec will be initialized before flow steering as well.
Davidlohr Bueso [Mon, 22 Jan 2018 17:21:37 +0000 (09:21 -0800)]
ia64/err-inject: Use get_user_pages_fast()
At the point of sysfs callback, the call to gup is
done without mmap_sem (or any lock for that matter).
This is racy. As such, use the get_user_pages_fast()
alternative and safely avoid taking the lock, if possible.
Signed-off-by: Davidlohr Bueso <dbueso@suse.de> Signed-off-by: Tony Luck <tony.luck@intel.com>
Matthew Wilcox [Mon, 19 Feb 2018 17:41:26 +0000 (09:41 -0800)]
ia64: Convert remaining atomic operations
While we've only seen inlining problems with atomic_sub_return(),
the other atomic operations could have the same problem. Convert all
remaining operations to use the same solution as atomic_sub_return().
Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Corentin Labbe [Wed, 14 Feb 2018 12:19:06 +0000 (12:19 +0000)]
ia64: convert unwcheck.py to python3
Since my system use python3 as default, arch/ia64/scripts/unwcheck.py no
longer run.
This patch convert it to the python3 syntax.
I have ran it with python2/python3 while printing values of
start/end/rlen_sum which could be impacted by this change and I see no difference.
Fixes: 94a47083522e ("scripts: change scripts to use system python instead of env") Signed-off-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Tony Luck <tony.luck@intel.com>
Linus Torvalds [Mon, 5 Mar 2018 19:57:06 +0000 (11:57 -0800)]
Merge tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest
Pull kselftest fixes from Shuah Khan:
"A fix for regression in memory-hotplug install script that prevents
the test from running on the target"
* tag 'linux-kselftest-4.16-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest:
selftests: memory-hotplug: fix emit_tests regression
1) Use an appropriate TSQ pacing shift in mac80211, from Toke
Høiland-Jørgensen.
2) Just like ipv4's ip_route_me_harder(), we have to use skb_to_full_sk
in ip6_route_me_harder, from Eric Dumazet.
3) Fix several shutdown races and similar other problems in l2tp, from
James Chapman.
4) Handle missing XDP flush properly in tuntap, for real this time.
From Jason Wang.
5) Out-of-bounds access in powerpc ebpf tailcalls, from Daniel
Borkmann.
6) Fix phy_resume() locking, from Andrew Lunn.
7) IFLA_MTU values are ignored on newlink for some tunnel types, fix
from Xin Long.
8) Revert F-RTO middle box workarounds, they only handle one dimension
of the problem. From Yuchung Cheng.
9) Fix socket refcounting in RDS, from Ka-Cheong Poon.
10) Don't allow ppp unit registration to an unregistered channel, from
Guillaume Nault.
11) Various hv_netvsc fixes from Stephen Hemminger.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (98 commits)
hv_netvsc: propagate rx filters to VF
hv_netvsc: filter multicast/broadcast
hv_netvsc: defer queue selection to VF
hv_netvsc: use napi_schedule_irqoff
hv_netvsc: fix race in napi poll when rescheduling
hv_netvsc: cancel subchannel setup before halting device
hv_netvsc: fix error unwind handling if vmbus_open fails
hv_netvsc: only wake transmit queue if link is up
hv_netvsc: avoid retry on send during shutdown
virtio-net: re enable XDP_REDIRECT for mergeable buffer
ppp: prevent unregistered channels from connecting to PPP units
tc-testing: skbmod: fix match value of ethertype
mlxsw: spectrum_switchdev: Check success of FDB add operation
net: make skb_gso_*_seglen functions private
net: xfrm: use skb_gso_validate_network_len() to check gso sizes
net: sched: tbf: handle GSO_BY_FRAGS case in enqueue
net: rename skb_gso_validate_mtu -> skb_gso_validate_network_len
rds: Incorrect reference counting in TCP socket creation
net: ethtool: don't ignore return from driver get_fecparam method
vrf: check forwarding on the original netdevice when generating ICMP dest unreachable
...
David S. Miller [Mon, 5 Mar 2018 17:55:55 +0000 (12:55 -0500)]
Merge branch 'mvpp2-jumbo-frames-support'
Antoine Tenart says:
====================
net: mvpp2: jumbo frames support
This series enable jumbo frames support in the Marvell PPv2 driver. The
first 2 patches rework the buffer management, then two patches prepare for
the final patch which adds the jumbo frames support into the driver.
This is based on top of net-next, and was tested on a mcbin.
Thanks!
Antoine
Since v1:
- Improved the Tx FIFO initialization comment.
- Improved the pool sanity check in mvpp2_bm_pool_use().
- Fixed pool related comments.
- Cosmetic fixes (used BIT() whenever possible).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:54 +0000 (15:16 +0100)]
net: mvpp2: jumbo frames support
This patch adds the support for jumbo frames in the Marvell PPv2 driver.
A third buffer pool is added with 10KB buffers, which is used if the MTU
is higher than 1518B for packets larger than 1518B. Please note only the
port 0 supports hardware checksum offload due to the Tx FIFO size
limitation.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Mon, 5 Mar 2018 14:16:53 +0000 (15:16 +0100)]
net: mvpp2: enable UDP/TCP checksum over IPv6
This patch adds the NETIF_F_IPV6_CSUM to the driver's features to enable
UDP/TCP checksum over IPv6. No extra configuration of the engine is
needed on top of the IPv4 counterpart, which already is in the features
list (NETIF_F_IP_CSUM).
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yan Markman [Mon, 5 Mar 2018 14:16:52 +0000 (15:16 +0100)]
net: mvpp2: use a data size of 10kB for Tx FIFO on port 0
This patch sets the Tx FIFO data size on port 0 to 10kB. This prepares
the PPv2 driver for the Jumbo frame support addition as the hardware
will need big enough Tx FIFO buffers when dealing with frames going
through an interface with an MTU of 9000.
Signed-off-by: Yan Markman <ymarkman@marvell.com>
[Antoine: commit message, small reworks.] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:51 +0000 (15:16 +0100)]
net: mvpp2: update the BM buffer free/destroy logic
The buffer free routine is updated to release only given a number of
buffers, and the destroy routine now checks the actual number of buffers
in the (BPPI and BPPE) HW counters before draining the pools. This
change helps getting jumbo frames support.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Chulski [Mon, 5 Mar 2018 14:16:50 +0000 (15:16 +0100)]
net: mvpp2: use the same buffer pool for all ports
This patch configures the buffer manager long pool for all ports part of
the same CP. Long pool separation between ports is redundant since there
are no performance improvement when different pools are used.
Signed-off-by: Stefan Chulski <stefanc@marvell.com>
[Antoine: cosmetic cleanup, commit message] Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: core: dst: Add kernel-doc for 'net' parameter
This fixes the following kernel-doc warning:
./include/net/dst.h:366: warning: Function parameter or member 'net' not described in 'skb_tunnel_rx'
Fixes: ea23192e8e57 ("tunnels: harmonize cleanup done on skb on rx path") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
net: core: dst_cache_set_ip6: Rename 'addr' parameter to 'saddr' for consistency
The other dst_cache_{get,set}_ip{4,6} functions, and the doc comment for
dst_cache_set_ip6 use 'saddr' for their source address parameter. Rename
the parameter to increase consistency.
This fixes the following kernel-doc warnings:
./include/net/dst_cache.h:58: warning: Function parameter or member 'addr' not described in 'dst_cache_set_ip6'
./include/net/dst_cache.h:58: warning: Excess function parameter 'saddr' description in 'dst_cache_set_ip6'
Fixes: 911362c70df5 ("net: add dst_cache support") Signed-off-by: Jonathan Neuschäfer <j.neuschaefer@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
'HWTSTAMP_TX_ON' should be handled as a value, not as a bit mask.
The modified code should behave the same, because HWTSTAMP_TX_ON is 1
and no other possible values of 'tx_type' would match the test.
However, this is more future-proof, should other values be allowed one day.
See 'struct hwtstamp_config' in 'include/uapi/linux/net_tstamp.h'
This fixes a warning reported by smatch:
igb_xmit_frame_ring() warn: bit shifter 'HWTSTAMP_TX_ON' used for logical '&'
Fixes: 26bd4e2db06be ("igb: protect TX timestamping from API misuse") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Mika Westerberg [Tue, 23 Jan 2018 10:28:41 +0000 (13:28 +0300)]
igb: Do not call netif_device_detach() when PCIe link goes missing
When the driver notices that PCIe link is gone by reading 0xffffffff
from a register it clears hw->hw_addr and then calls netif_device_detach().
This happens when the PCIe device is physically unplugged for example
the user disconnected the Thunderbolt cable.
However, netif_device_detach() prevents netif_unregister() from bringing
the device down properly including tearing down MSI-X vectors. This
triggers following crash during the driver removal:
To prevent the crash do not call netif_device_detach() in igb_rd32().
This should be fine because hw->hw_addr is set to NULL preventing future
hardware access of the now missing device.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=198181 Reported-by: Ferenc Boldog <ferenc.boldog@gmail.com> Reported-by: Nikolay Bogoychev <nheart@gmail.com> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
this series continues to review and to convert pernet_operations
to make them possible to be executed in parallel for several
net namespaces in the same time. The patches touch mostly netfilter,
also there are small number of changes in other places.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:23 +0000 (14:32 +0300)]
net: Convert proto_gre_net_ops
These pernet_operations register and unregister sysctl.
nf_conntrack_l4proto_gre4->init_net is simple memory
initializer. Also, exit method removes gre keymap_list,
which is per-net. This looks safe to be executed
in parallel with other pernet_operations.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:15 +0000 (14:32 +0300)]
net: Convert ctnetlink_net_ops
These pernet_operations register and unregister
two conntrack notifiers, and they seem to be safe
to be executed in parallel.
General/not related to async pernet_operations JFI:
ctnetlink_net_exit_batch() actions are grouped in batch,
and this could look like there is synchronize_rcu()
is forgotten. But there is synchronize_rcu() on module
exit patch (in ctnetlink_exit()), so this batch may
be reworked as simple .exit method.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:32:06 +0000 (14:32 +0300)]
net: Convert nf_conntrack_net_ops
These pernet_operations register and unregister sysctl and /proc
entries. Exit batch method also waits till all per-net conntracks
are dead. Thus, they are safe to be marked as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:55 +0000 (14:31 +0300)]
net: Convert ip_set_net_ops
These pernet_operations initialize and destroy
net_generic(net, ip_set_net_id)-related data.
Since ip_set is under CONFIG_IP_SET, it's easy
to watch drivers, which depend on this config.
All of them are in net/netfilter/ipset directory,
except of net/netfilter/xt_set.c. There are no
more drivers, which use ip_set, and all of
the above don't register another pernet_operations.
Also, there are is no indirect users, as header
file include/linux/netfilter/ipset/ip_set.h does
not define indirect users by something like this:
Kirill Tkhai [Mon, 5 Mar 2018 11:31:47 +0000 (14:31 +0300)]
net: Convert fou_net_ops
These pernet_operations initialize and destroy
pernet net_generic(net, fou_net_id) list.
The rest of net_generic(net, fou_net_id) accesses
may happen after netlink message, and in-tree
pernet_operations do not send FOU_GENL_NAME messages.
So, these pernet_operations are safe to be marked
as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:28 +0000 (14:31 +0300)]
net: Convert dccp_v4_ops
These pernet_operations create and destroy net::dccp::v4_ctl_sk.
It looks like another pernet_operations don't want to send
dccp packets to dying or creating net. Batch method similar
to ipv4/ipv6 sockets and it has to be safe to be executed
in parallel with anything else. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kirill Tkhai [Mon, 5 Mar 2018 11:31:19 +0000 (14:31 +0300)]
net: Convert cangw_pernet_ops
These pernet_operations have a deal with cgw_list,
and the rest of accesses are made under rtnl_lock().
The only exception is cgw_dump_jobs(), which is
accessed under rcu_read_lock(). cgw_dump_jobs() is
called on netlink request, and it does not seem,
foreign pernet_operations want to send a net such
the messages. So, we mark them as async.
Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net>