netfilter: nf_defrag: Skip defrag if NOTRACK is set
conntrack defrag is needed only if some module like CONNTRACK or NAT
explicitly requests it. For plain forwarding scenarios, defrag is
not needed and can be skipped if NOTRACK is set in a rule.
Since conntrack defrag is currently higher priority than raw table,
setting NOTRACK is not sufficient. We need to move raw to a higher
priority for iptables only.
This is achieved by introducing a module parameter "raw_before_defrag"
which allows to change the priority of raw table to place it before
defrag. By default, the parameter is disabled and the priority of raw
table is NF_IP_PRI_RAW to support legacy behavior. If the module
parameter is enabled, then the priority of the raw table is set to
NF_IP_PRI_RAW_BEFORE_DEFRAG.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
The newly added NF_FLOW_TABLE options cause some build failures in
randconfig kernels:
- when CONFIG_NF_CONNTRACK is disabled, or is a loadable module but
NF_FLOW_TABLE is built-in:
In file included from net/netfilter/nf_flow_table.c:8:0:
include/net/netfilter/nf_conntrack.h:59:22: error: field 'ct_general' has incomplete type
struct nf_conntrack ct_general;
include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
include/net/netfilter/nf_conntrack.h:148:15: error: 'const struct sk_buff' has no member named '_nfct'
include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
include/net/netfilter/nf_conntrack.h:157:2: error: implicit declaration of function 'nf_conntrack_put'; did you mean 'nf_ct_put'? [-Werror=implicit-function-declaration]
net/netfilter/nf_flow_table.o: In function `nf_flow_offload_work_gc':
(.text+0x1540): undefined reference to `nf_ct_delete'
- when CONFIG_NF_TABLES is disabled:
In file included from net/ipv6/netfilter/nf_flow_table_ipv6.c:13:0:
include/net/netfilter/nf_tables.h: In function 'nft_gencursor_next':
include/net/netfilter/nf_tables.h:1189:14: error: 'const struct net' has no member named 'nft'; did you mean 'nf'?
- when CONFIG_NF_FLOW_TABLE_INET is enabled, but NF_FLOW_TABLE_IPV4
or NF_FLOW_TABLE_IPV6 are not, or are loadable modules
net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
nf_flow_table_inet.c:(.text+0x94): undefined reference to `nf_flow_offload_ipv6_hook'
nf_flow_table_inet.c:(.text+0x40): undefined reference to `nf_flow_offload_ip_hook'
- when CONFIG_NF_FLOW_TABLES is disabled, but the other options are
enabled:
net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
nf_flow_table_inet.c:(.text+0x6c): undefined reference to `nf_flow_offload_ipv6_hook'
net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_exit':
nf_flow_table_inet.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_init':
nf_flow_table_inet.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'
net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_exit':
nf_flow_table_ipv4.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_init':
nf_flow_table_ipv4.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'
This adds additional Kconfig dependencies to ensure that NF_CONNTRACK and NF_TABLES
are always visible from NF_FLOW_TABLE, and that the internal dependencies between
the four new modules are met.
Fixes: 7c23b629a808 ("netfilter: flow table support for the mixed IPv4/IPv6 family") Fixes: 0995210753a2 ("netfilter: flow table support for IPv6") Fixes: 97add9f0d66d ("netfilter: flow table support for IPv4") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: add IPv6 segment routing header 'srh' match
It allows matching packets based on Segment Routing Header
(SRH) information.
The implementation considers revision 7 of the SRH draft.
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07
Currently supported match options include:
(1) Next Header
(2) Hdr Ext Len
(3) Segments Left
(4) Last Entry
(5) Tag value of SRH
Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: core: return EBUSY in case NAT hook is already in use
EEXIST is used for an object that already exists, with the same
name/handle. However, there no same object there, instead there is a
object that is using the single slot that is available for NAT hooks
since patch f92b40a8b264 ("netfilter: core: only allow one nat hook per
hook point"). Let's change this return value before this behaviour gets
exposed in the first -rc.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Now that we have a single table list for each netns, we can get rid of
one pointer per family and the global afinfo list, thus, shrinking
struct netns for nftables that now becomes 64 bytes smaller.
And call __nft_release_afinfo() from __net_exit path accordingly to
release netnamespace objects on removal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: add single table list for all families
Place all existing user defined tables in struct net *, instead of
having one list per family. This saves us from one level of indentation
in netlink dump functions.
Place pointer to struct nft_af_info in struct nft_table temporarily, as
we still need this to put back reference module reference counter on
table removal.
This patch comes in preparation for the removal of struct nft_af_info.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: remove nhooks field from struct nft_af_info
We already validate the hook through bitmask, so this check is
superfluous. When removing this, this patch is also fixing a bug in the
new flowtable codebase, since ctx->afi points to the table family
instead of the netdev family which is where the flowtable is really
hooked in.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
David S. Miller [Tue, 9 Jan 2018 17:38:57 +0000 (12:38 -0500)]
Merge branch 'r8169-improve-runtime-pm'
Heiner Kallweit says:
====================
r8169: improve runtime pm
On my system with two network ports I found that runtime PM didn't
suspend the unused port. Therefore I checked runtime pm in this driver
in somewhat more detail and this series improves runtime pm in general
and solves the mentioned issue.
Tested on a system with RTL8168evl (MAC version 34).
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:13 +0000 (21:39 +0100)]
r8169: improve runtime pm in general and suspend unused ports
So far rpm doesn't cover cases like unused ports which are never
brought up. If they are active at probe time they remain in this state.
Included in this patch:
- Let the idle notification check whether we can suspend and let it
schedule the suspend. This way we don't need to have calls to
pm_schedule_suspend in different places.
- At the end of rtl_open and rtl_init_one send an idle notification
to allow suspending if the link is down. If a cable is plugged in
aneg is finished before the suspend timer expires and the suspend
request is cancelled.
- Change rtl8169_runtime_suspend to power down the chip if the
interface is down.
Successfully tested on a RTL8168evl (mac version 34).
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:07 +0000 (21:39 +0100)]
r8169: improve runtime pm in rtl8169_check_link_status
This patch partially reverts commit e4fbce740f07 "r8169: Fix runtime
power management" from 2010. At that time the suspend delay was 100ms
and therefore suspending happened during initial aneg. Currently
suspend delay is 5s, so suspend starts after aneg and the issue
doesn't exist any longer. On my system aneg takes almost 3s, to be on
the safe side let's increase the suspend delay to 10s.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Mon, 8 Jan 2018 20:39:02 +0000 (21:39 +0100)]
r8169: remove unneeded rpm ops in rtl_shutdown
This patch reverts commit 2a15cd2ff488 "r8169: runtime resume before
shutdown" from 2012. Few months after this change the underlying issue
was solved in the PCI core with commit 3ff2de9ba1a2 "PCI/PM: Resume
device before shutdown".
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
tipc: improvements to group messaging
We make a number of simplifications and improvements to the group
messaging service. They aim at readability/maintainability of the code
as well as scalability.
The series is based on commit f9c935db8086 ("tipc: fix problems with
multipoint-to-point flow control) which has been applied to 'net' but
not yet to 'net-next'.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:31 +0000 (21:03 +0100)]
tipc: improve poll() for group member socket
The current criteria for returning POLLOUT from a group member socket is
too simplistic. It basically returns POLLOUT as soon as the group has
external destinations, something obviously leading to a lot of spinning
during destination congestion situations. At the same time, the internal
congestion handling is unnecessarily complex.
We now change this as follows.
- We introduce an 'open' flag in struct tipc_group. This flag is used
only to help poll() get the setting of POLLOUT right, and *not* for
congeston handling as such. This means that a user can choose to
ignore an EAGAIN for a destination and go on sending messages to
other destinations in the group if he wants to.
- The flag is set to false every time we return EAGAIN on a send call.
- The flag is set to true every time any member, i.e., not necessarily
the member that caused EAGAIN, is removed from the small_win list.
- We remove the group member 'usr_pending' flag. The size of the send
window and presence in the 'small_win' list is sufficient criteria
for recognizing congestion.
This solution seems to be a reasonable compromise between 'anycast',
which is normally not waiting for POLLOUT for a specific destination,
and the other three send modes, which are.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:30 +0000 (21:03 +0100)]
tipc: improve groupcast scope handling
When a member joins a group, it also indicates a binding scope. This
makes it possible to create both node local groups, invisible to other
nodes, as well as cluster global groups, visible everywhere.
In order to avoid that different members end up having permanently
differing views of group size and memberhip, we must inhibit locally
and globally bound members from joining the same group.
We do this by using the binding scope as an additional separator between
groups. I.e., a member must ignore all membership events from sockets
using a different scope than itself, and all lookups for message
destinations must require an exact match between the message's lookup
scope and the potential target's binding scope.
Apart from making it possible to create local groups using the same
identity on different nodes, a side effect of this is that it now also
becomes possible to create a cluster global group with the same identity
across the same nodes, without interfering with the local groups.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:29 +0000 (21:03 +0100)]
tipc: add option to suppress PUBLISH events for pre-existing publications
Currently, when a user is subscribing for binding table publications,
he will receive a PUBLISH event for all already existing matching items
in the binding table.
However, a group socket making a subscriptions doesn't need this initial
status update from the binding table, because it has already scanned it
during the join operation. Worse, the multiplicatory effect of issuing
mutual events for dozens or hundreds group members within a short time
frame put a heavy load on the topology server, with the end result that
scale out operations on a big group tend to take much longer than needed.
We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
subscriptions, so that this initial avalanche of events is suppressed.
This change, along with the previous commit, significantly improves the
range and speed of group scale out operations.
We keep the new option internal for the tipc driver, at least for now.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:28 +0000 (21:03 +0100)]
tipc: send out join messages as soon as new member is discovered
When a socket is joining a group, we look up in the binding table to
find if there are already other members of the group present. This is
used for being able to return EAGAIN instead of EHOSTUNREACH if the
user proceeds directly to a send attempt.
However, the information in the binding table can be used to directly
set the created member in state MBR_PUBLISHED and send a JOIN message
to the peer, instead of waiting for a topology PUBLISH event to do this.
When there are many members in a group, the propagation time for such
events can be significant, and we can save time during the join
operation if we use the initial lookup result fully.
In this commit, we eliminate the member state MBR_DISCOVERED which has
been the result of the initial lookup, and do instead go directly to
MBR_PUBLISHED, which initiates the setup.
After this change, the tipc_member FSM looks as follows:
Jon Maloy [Mon, 8 Jan 2018 20:03:27 +0000 (21:03 +0100)]
tipc: simplify group LEAVE sequence
After the changes in the previous commit the group LEAVE sequence
can be simplified.
We now let the arrival of a LEAVE message unconditionally issue a group
DOWN event to the user. When a topology WITHDRAW event is received, the
member, if it still there, is set to state LEAVING, but we only issue a
group DOWN event when the link to the peer node is gone, so that no
LEAVE message is to be expected.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:26 +0000 (21:03 +0100)]
tipc: create group member event messages when they are needed
In the current implementation, a group socket receiving topology
events about other members just converts the topology event message
into a group event message and stores it until it reaches the right
state to issue it to the user. This complicates the code unnecessarily,
and becomes impractical when we in the coming commits will need to
create and issue membership events independently.
In this commit, we change this so that we just notice the type and
origin of the incoming topology event, and then drop the buffer. Only
when it is time to actually send a group event to the user do we
explicitly create a new message and send it upwards.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Maloy [Mon, 8 Jan 2018 20:03:24 +0000 (21:03 +0100)]
tipc: let group member stay in JOINED mode if unable to reclaim
We handle a corner case in the function tipc_group_update_rcv_win().
During extreme pessure it might happen that a message receiver has all
its active senders in RECLAIMING or REMITTED mode, meaning that there
is nobody to reclaim advertisements from if an additional sender tries
to go active.
Currently we just set the new sender to ACTIVE anyway, hence at least
theoretically opening up for a receiver queue overflow by exceeding the
MAX_ACTIVE limit. The correct solution to this is to instead add the
member to the pending queue, while letting the oldest member in that
queue revert to JOINED state.
In this commit we refactor the code for handling message arrival from
a JOINED member, both to make it more comprehensible and to cover the
case described above.
Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset by Jenny adds sanity checks in ethtool ringparam
operation for input upper bounds, similarly to what's done in
ethtool_set_channels.
The checks are added in patch 1, using a call to get_ringparam
prior to calling set_ringparam NDO.
Patch 2 changes the function's behavior in mlx4_en, so that
it returns an error for out-of-range input, instead of rounding
it to closest valid, similar to mlx5e.
Patch 3 removes the upper bound checks in mlx5e_ethtool_set_ringparam
as it becomes redundant.
Series generated against net-next commit: f66faae2f80a Merge branch 'ipv6-ipv4-nexthop-align'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx5e: Remove redundant checks in set_ringparam
Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/mlx4_en: Align behavior of set ring size flow via ethtool
In current implementation, any requested RX/TX ring size value
that is less than minimum is silently casted to nearest valid value.
Update this behavior to align with mlx5 behavior by printing warning
in dmesg and remaining the size unchanged.
Kernel is responsible for verifying against the maximum.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
ethtool: Ensure new ring parameters are within bounds during SRINGPARAM
Add a sanity check to ensure that all requested ring parameters
are within bounds, which should reduce errors in driver implementation.
Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Baron [Fri, 5 Jan 2018 22:44:54 +0000 (17:44 -0500)]
virtio_net: propagate linkspeed/duplex settings from the hypervisor
The ability to set speed and duplex for virtio_net is useful in various
scenarios as described here:
16032be virtio_net: add ethtool support for set and get of settings
However, it would be nice to be able to set this from the hypervisor,
such that virtio_net doesn't require custom guest ethtool commands.
Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.
Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
is that device feature bits are to grow down from bit 63, since the
transports are starting from bit 24 and growing up.
Signed-off-by: Jason Baron <jbaron@akamai.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Jason Wang <jasowang@redhat.com> Cc: virtio-dev@lists.oasis-open.org Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Walter [Fri, 5 Jan 2018 13:33:31 +0000 (14:33 +0100)]
macsec: Add support for GCM-AES-256 cipher suite
This adds support for the GCM-AES-256 cipher suite as specified in
IEEE 802.1AEbn-2011. The prepared cipher suite selection mechanism is used,
with GCM-AES-128 being the default cipher suite as defined in the standard.
Signed-off-by: Felix Walter <felix.walter@cloudandheat.com> Cc: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 9 Jan 2018 15:57:19 +0000 (10:57 -0500)]
Merge branch 'XDP-transmission-for-tuntap'
Jason Wang says:
====================
XDP transmission for tuntap
This series tries to implement XDP transmission (ndo_xdp_xmit) for
tuntap. Pointer ring was used for queuing both XDP buffers and
sk_buff, this is done by encoding the type into lowest bit of the
pointer and storin XDP metadata in the headroom of XDP buff.
Tests gets 3.05 Mpps when doing xdp_redirect_map from ixgbe to VM
(testpmd + virtio-net in guest). This gives us ~20% improvments
compared to use skb during redirect.
Please review.
Changes from V1:
- slient warnings
- fix typos
- add skb mode number in the commit log
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Thu, 4 Jan 2018 03:14:28 +0000 (11:14 +0800)]
tuntap: XDP transmission
This patch implements XDP transmission for TAP. Since we can't create
new queues for TAP during XDP set, exist ptr_ring was reused for
queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
(0x1UL) was encoded into lowest bit of xpd_buff pointer during
ptr_ring_produce, and was decoded during consuming. XDP metadata was
stored in the headroom of the packet which should work in most of
cases since driver usually reserve enough headroom. Very minor changes
were done for vhost_net: it just need to peek the length depends on
the type of pointer.
Tests were done on two Intel E5-2630 2.40GHz machines connected back
to back through two 82599ES. Traffic were generated/received through
MoonGen/testpmd(rxonly). It reports ~20% improvements when
xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps
to 3.05Mpps)
Cc: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
Alexander Duyck.
2) Undo unintentional UAPI change in netfilter conntrack, from Florian
Westphal.
3) Revert a change to how error codes are returned from
dev_get_valid_name(), it broke some apps.
4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
dual-stack. From Eli Cooper.
5) Fix missed PMTU updates in geneve, from Xin Long.
6) Cure double free in macvlan, from Gao Feng.
7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
Mohamed Ghannam.
8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
deferral of probe when the regulator is not ready yet).
9) Missing DMA mapping error checks in 3c59x, from Neil Horman.
10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.
11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
isn't initialized. From Andrei Vagin.
12) Fix crashes in fib6_add(), from Wei Wang.
13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
sh_eth: fix TXALCR1 offsets
mdio-sun4i: Fix a memory leak
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
sctp: fix the handling of ICMP Frag Needed for too small MTUs
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
xen-netfront: enable device after manual module load
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
sh_eth: fix SH7757 GEther initialization
net: fec: free/restore resource in related probe error pathes
uapi/if_ether.h: prevent redefinition of struct ethhdr
ipv6: fix general protection fault in fib6_add()
RDS: null pointer dereference in rds_atomic_free_op
sh_eth: fix TSU resource handling
net: stmmac: enable EEE in MII, GMII or RGMII only
rtnetlink: give a user socket to get_target_net()
MAINTAINERS: Update my email address.
can: ems_usb: improve error reporting for error warning and error passive
can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
can: gs_usb: fix return value of the "set_bittiming" callback
...
Yang Shi [Mon, 8 Jan 2018 19:52:54 +0000 (03:52 +0800)]
net: tipc: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by TIPC at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Ying Xue <ying.xue@windriver.com> Tested-by: Ying Xue <ying.xue@windriver.com> Cc: Jon Maloy <jon.maloy@ericsson.com> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 8 Jan 2018 19:52:53 +0000 (03:52 +0800)]
net: ovs: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by openvswitch at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Acked-by: Pravin B Shelar <pshelar@ovn.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: dev@openvswitch.org Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Shi [Mon, 8 Jan 2018 19:52:52 +0000 (03:52 +0800)]
net: caif: remove unused hardirq.h
Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by caif at all.
So, remove the unused hardirq.h.
Signed-off-by: Yang Shi <yang.s@alibaba-inc.com> Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no> Cc: "David S. Miller" <davem@davemloft.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:40 +0000 (12:08 +0200)]
8139cp: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Realtek linux nic maintainers <nic_swsd@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:39 +0000 (12:08 +0200)]
bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Ariel Elior <ariel.elior@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:38 +0000 (12:08 +0200)]
e1000: Replace WARN_ONCE with netdev_WARN_ONCE
Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.
Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gal Pressman [Sun, 7 Jan 2018 10:08:35 +0000 (12:08 +0200)]
net: Fix netdev_WARN_ONCE macro
netdev_WARN_ONCE is broken (whoops..), this fix will remove the
unnecessary "condition" parameter, add the missing comma and change
"arg" to "args".
Fixes: 375ef2b1f0d0 ("net: Introduce netdev_*_once functions") Signed-off-by: Gal Pressman <galp@mellanox.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your
net-next tree:
1) Free hooks via call_rcu to speed up netns release path, from
Florian Westphal.
2) Reduce memory footprint of hook arrays, skip allocation if family is
not present - useful in case decnet support is not compiled built-in.
Patches from Florian Westphal.
3) Remove defensive check for malformed IPv4 - including ihl field - and
IPv6 headers in x_tables and nf_tables.
4) Add generic flow table offload infrastructure for nf_tables, this
includes the netlink control plane and support for IPv4, IPv6 and
mixed IPv4/IPv6 dataplanes. This comes with NAT support too. This
patchset adds the IPS_OFFLOAD conntrack status bit to indicate that
this flow has been offloaded.
5) Add secpath matching support for nf_tables, from Florian.
6) Save some code bytes in the fast path for the nf_tables netdev,
bridge and inet families.
7) Allow one single NAT hook per point and do not allow to register NAT
hooks in nf_tables before the conntrack hook, patches from Florian.
8) Seven patches to remove the struct nf_af_info abstraction, instead
we perform direct calls for IPv4 which is faster. IPv6 indirections
are still needed to avoid dependencies with the 'ipv6' module, but
these now reside in struct nf_ipv6_ops.
9) Seven patches to handle NFPROTO_INET from the Netfilter core,
hence we can remove specific code in nf_tables to handle this
pseudofamily.
10) No need for synchronize_net() call for nf_queue after conversion
to hook arrays. Also from Florian.
11) Call cond_resched_rcu() when dumping large sets in ipset to avoid
softlockup. Again from Florian.
12) Pass lockdep_nfnl_is_held() to rcu_dereference_protected(), patch
from Florian Westphal.
13) Fix matching of counters in ipset, from Jozsef Kadlecsik.
14) Missing nfnl lock protection in the ip_set_net_exit path, also
from Jozsef.
15) Move connlimit code that we can reuse from nf_tables into
nf_conncount, from Florian Westhal.
And asorted cleanups:
16) Get rid of nft_dereference(), it only has one single caller.
17) Add nft_set_is_anonymous() helper function.
18) Remove NF_ARP_FORWARD leftover chain definition in nf_tables_arp.
19) Remove unnecessary comments in nf_conntrack_h323_asn1.c
From Varsha Rao.
20) Remove useless parameters in frag_safe_skb_hp(), from Gao Feng.
21) Constify layer 4 conntrack protocol definitions, function
parameters to register/unregister these protocol trackers, and
timeouts. Patches from Florian Westphal.
22) Remove nlattr_size indirection, from Florian Westphal.
23) Add fall-through comments as -Wimplicit-fallthrough needs this,
from Gustavo A. R. Silva.
24) Use swap() macro to exchange values in ipset, patch from
Gustavo A. R. Silva.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Tue, 9 Jan 2018 00:17:31 +0000 (16:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Doug Ledford:
- One line fix to mlx4 error flow (same as mlx5 fix in last pull
request, just in the mlx4 driver)
- Fix a race condition in the IPoIB driver. This patch is larger than
just a one line fix, but resolves a race condition in a fairly
straight forward manner
- Fix a locking issue in the RDMA netlink code. This patch is also
larger than I would like for a late -rc. It has, however, had a week
to bake in the rdma tree prior to this pull request
- One line fix to fix granting remote machine access to memory that
they don't need and shouldn't have
- One line fix to correct the fact that our sgid/dgid pair is swapped
from what you would expect when receiving an incoming connection
request
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/srpt: Fix ACL lookup during login
IB/srpt: Disable RDMA access by the initiator
RDMA/netlink: Fix locking around __ib_get_device_by_index
IB/ipoib: Fix race condition in neigh creation
IB/mlx4: Fix mlx4_ib_alloc_mr error flow
Yafang Shao [Sun, 7 Jan 2018 06:31:47 +0000 (14:31 +0800)]
net: tracepoint: exposing sk_faimily in tracepoint inet_sock_set_state
As of now, there're two sk_family are traced with sock:inet_sock_set_state,
which are AF_INET and AF_INET6.
So the sk_family are exposed as well.
Then we can conveniently use it to do the filter.
Both sk_family and sk_protocol are showed in the printk message, so we need
not expose them as tracepoint arguments.
Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com> Suggested-by: Song Liu <songliubraving@fb.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Reviewed-by: Song Liu <songliubraving@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Sat, 6 Jan 2018 21:26:47 +0000 (00:26 +0300)]
sh_eth: fix TXALCR1 offsets
The TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error. Luckily, the driver never uses this
register. :-)
Fixes: 4a55530f38e4 ("net: sh_eth: modify the definitions of register") Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.
Fixes: 4bdcb1dd9feb ("net: Add MDIO bus driver for the Allwinner EMAC") Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Guillaume Nault [Fri, 5 Jan 2018 18:47:14 +0000 (19:47 +0100)]
l2tp: adjust comments about L2TPv3 offsets
The "offset" option has been removed by
commit 900631ee6a26 ("l2tp: remove configurable payload offset").
Signed-off-by: Guillaume Nault <g.nault@alphalink.fr> Acked-by: James Chapman <jchapman@katalix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
phylink: mark expected switch fall-throughs in phylink_mii_ioctl
In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.
Addresses-Coverity-ID: 1463447 ("Missing break in switch") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:19:13 +0000 (14:19 -0500)]
Merge branch 'SCTP-PMTU-discovery-fixes'
Marcelo Ricardo Leitner says:
====================
SCTP PMTU discovery fixes
This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.
Please consider these to stable.
====================
Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 5 Jan 2018 16:07:10 +0000 (16:07 +0000)]
net: phy: fix wrong masks to phy_modify()
The mask argument for phy_modify() in several locations was inverted.
Fixes: fea23fb591cc ("net: phy: convert read-modify-write to phy_modify()") Reported-by: Heiner Kallweit <hkallweit1@gmail.com> Tested-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: fix the handling of ICMP Frag Needed for too small MTUs
syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[ 246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512
That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).
As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.
The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.
Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.
Changes from v1:
- do not disable PMTU discovery, in the light of commit 06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
resulted in a change or not
Changes from v2:
none
See-also: https://lkml.org/lkml/2017/12/22/811 Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.
The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.
Changes from v2:
- updated stale comment, noticed by Xin Long
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eduardo Otubo [Fri, 5 Jan 2018 08:42:16 +0000 (09:42 +0100)]
xen-netfront: enable device after manual module load
When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.
The guest would face errors like:
[root@guest ~]# ethtool -i eth0
Cannot get driver information: No such device
[root@guest ~]# ifconfig eth0
eth0: error fetching interface information: Device not found
This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.
Signed-off-by: Eduardo Otubo <otubo@redhat.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zhu Yanjun [Fri, 5 Jan 2018 04:06:39 +0000 (23:06 -0500)]
forcedeth: remove duplicate structure member in rx
Since both first_rx and rx_ring are the head of rx ring, it not
necessary to use two structure members to statically indicate
the head of rx ring. So first_rx is removed.
CC: Srinivas Eeda <srinivas.eeda@oracle.com> CC: Joe Jin <joe.jin@oracle.com> CC: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:13:45 +0000 (14:13 -0500)]
Merge branch 'bnxt_en_fixes'
Michael Chan says:
====================
bnxt_en: 2 small bug fixes.
The first one fixes the TC Flower flow parameter passed to firmware. The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.
Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.") Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Challa [Thu, 4 Jan 2018 23:46:54 +0000 (18:46 -0500)]
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.
Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds") Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com> Reviewed-by: Sathya Perla <sathya.perla@broadcom.com> Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Mon, 8 Jan 2018 19:13:08 +0000 (11:13 -0800)]
Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup
Pull cgroup fixes from Tejun Heo:
"This contains fixes for the following two non-trivial issues:
- The task iterator got broken while adding thread mode support for
v4.14. It was less visible because it only triggers when both
cgroup1 and cgroup2 hierarchies are in use. The recent versions of
systemd uses cgroup2 for process management even when cgroup1 is
used for resource control exposing this issue.
- cpuset CPU hotplug path could deadlock when racing against exits.
There also are two patches to replace unlimited strcpy() usages with
strlcpy()"
* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
cgroup: Fix deadlock in cpu hotplug path
cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
cgroup: avoid copying strings longer than the buffers
Stefano Brivio [Thu, 4 Jan 2018 23:38:05 +0000 (00:38 +0100)]
tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertions
The two conditions triggering BUG_ON() are somewhat unrelated:
the tcp_skb_pcount() check is meant to catch TSO flaws, the
second one checks sanity of congestion window bookkeeping.
Split them into two separate BUG_ON() assertions on two lines,
so that we know which one actually triggers, when they do.
Signed-off-by: Stefano Brivio <sbrivio@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Thu, 4 Jan 2018 22:03:54 +0000 (14:03 -0800)]
net: ipv6: Allow connect to linklocal address from socket bound to vrf
Allow a process bound to a VRF to connect to a linklocal address.
Currently, this fails because of a mismatch between the scope of the
linklocal address and the sk_bound_dev_if inherited by the VRF binding:
$ ssh -6 fe80::70b8:cff:fedd:ead8%eth1
ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument
Relax the scope check to allow the socket to be bound to the same L3
device as the scope id.
This makes ipv6 linklocal consistent with other relaxed checks enabled
by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface")
and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave").
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergei Shtylyov [Thu, 4 Jan 2018 21:26:46 +0000 (00:26 +0300)]
sh_eth: remove sh_eth_plat_data::edmac_endian
Since the commit 888cc8c20cf ("sh_eth: remove EDMAC_BIG_ENDIAN") (geez,
I didn't realize that was 2 years ago!) the initializers in the SuperH
platform code for the 'sh_eth_plat_data::edmac_endian' stopped to matter,
so we can remove that field for good (not sure if it was ever useful --
SH7786 Ether has been reported to have the same EDMAC descriptor/register
endiannes as configured for the SuperH CPU)...
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 8 Jan 2018 19:06:20 +0000 (14:06 -0500)]
Merge branch 'hns3-next'
Peng Li says:
====================
add some new features and fix some bugs for HNS3 driver
This patchset adds some new features support and fixes some bugs:
[Patch 1/20] adds support to enable/disable vlan filter with ethtool
[Patch 2/20] disables VFs change rxvlan offload status
[Patch 3/20 - 13/120 fix bugs and refine some codes for packet
statistics, support query with both ifconfig and ethtool.
[Patch 14/20 - 20/20] fix some other bugs.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:22 +0000 (18:18 +0800)]
net: hns3: fix for not setting pause parameters
Pause parameters include source address, transmit gap and pause time.
The default value of the pause source address is zero in the hardware.
Default pause parameters need to be set to the hardware. Also, when
setting new mac address, the pause source address need to be updated.
Fixes: 9dc2145d910e ("net: hns3: Add support for PFC setting in TM module") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:21 +0000 (18:18 +0800)]
net: hns3: add MTU initialization for hardware
When initializing the MAC, the MTU vlaue need to be set to the hardware
too. Otherwise, the MTU value of software will be different from the MTU
value of hardware.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:20 +0000 (18:18 +0800)]
net: hns3: fix for changing MTU
when changing MTU, The new MTU must need to be set to netdevice.
Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:19 +0000 (18:18 +0800)]
net: hns3: fix for setting MTU
When setting MTU, actually what we do is configuring the max frame size
for the hardware. ETH_HLEN、ETH_FCS_LEN and VLAN_HLEN must need to be
considered. And the frame size which is less than the default value
should not be set to the hardware. Because in the hardware, the the max
frame size not only controls the RX packet size, but also controls the
TX packet size. the RX packets whose size are greater than the setting
value will be dropped.
This patch fixes the bug setting a error max frame size to hardware.
Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 5 Jan 2018 10:18:18 +0000 (18:18 +0800)]
net: hns3: fix for updating fc_mode_last_time
commit a9c782822166 ("net: hns3: add support for set_pauseparam")
adds set_pauseparam support for ethtool cmd, but forgets to update
fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam().
The wrong fc_mode_last_time will be used to update flow control mode
when lldpad has been running. As a result, when using the ethtool
command "-a", user will get a wrong pause parameter.
This patch adds the fc_mode_last_time update when PFC mode is disabled.
Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam") Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:14 +0000 (18:18 +0800)]
net: hns3: Fix an error macro definition of HNS3_TQP_STAT
The member "stats_offset" was designed to indicate the offset
of each member of struct ring_stats in struct hns3_enet_ring,
but forgot to add the offset of the member in struct ring_stats.
Fixes: 496d03e960a ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:13 +0000 (18:18 +0800)]
net: hns3: Fix a loop index error of tqp statistics query
An error loop index was used while querying statistics data
of tqps, which may cause call trace.
Fixes: 496d03e960ae ("net: hns3: Add Ethtool support to HNS3 driver") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:12 +0000 (18:18 +0800)]
net: hns3: Fix an error of total drop packet statistics
The dropped tx/rx packets number of each tqp should also
be counted into the total drop tx/rx packets numbers.
Fixes: 76ad4f0ee74 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:11 +0000 (18:18 +0800)]
net: hns3: Mask the packet statistics query when NIC is down
Update the HNS3_NIC_STATE_DOWN bit when NIC state changes.
When NIC is down, mask the packet statistics for querying
with ifconfig command. It's a common practice.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:10 +0000 (18:18 +0800)]
net: hns3: Modify the update period of packet statistics
It takes more than 200 query response messages between
driver and IMP, while updating the packet statistics.
It's too heavy for IMP to update it per second.
Extend the update period of packet statistics data from
1 second to 300 seconds(if too long, the statistics may
overflow).
As a result, we need to update it while querying with
ifconfig tool to keep the statistics data fresh.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:07 +0000 (18:18 +0800)]
net: hns3: Unify the strings display of packet statistics
Some members of packet statistics are named in different styles.
This patch unifies them with new internal name rules, the main
modification are below:
trans --> tx
rcv --> rx
rcb_q%d_tx --> txq#%d
rcb_q%d_rx --> rxq#%d
sw_err_cnt(tx side) --> tx_dropped
sw_err_cnt(rx side) --> rx_dropped
pkts --> packets
tx_err_cnt --> errors
rx_err_cnt --> errors
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 5 Jan 2018 10:18:06 +0000 (18:18 +0800)]
net: hns3: Disable VFs change rxvlan offload status
Rxvlan offload status can only be changed by PF. Initialize
the value of NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for
VFS to false, make sure user can't be able to change it.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This series introduces the MAPv4 packet format for checksum
offload plus some other minor changes.
Patches 1-3 are cleanups.
Patch 4 renames the ingress format to data format so that all data
formats can be configured using this going forward.
Patch 5 uses the pacing helper to improve TCP transmit performance.
Patch 6-9 defines the the MAPv4 for checksum offload for RX and TX.
A new header and trailer format are used as part of MAPv4.
For RX checksum offload, only the 1's complement of the IP payload
portion is computed by hardware. The meta data from RX header is
used to verify the checksum field in the packet. Note that the
IP packet and its field itself is not modified by hardware.
This gives metadata to help with the RX checksum. For TX, the
required metadata is filled up so hardware can compute the
checksum.
Patch 10 enables GSO on rmnet devices
v1->v2: Fix sparse errors reported by kbuild test robot
v2->v3: Update the commit message for Patch 5 based on Eric's comments
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Real devices may support scatter gather(SG), so enable SG on rmnet
devices to use GSO. GSO reduces CPU cycles by 20% for a rate of
146Mpbs for a single stream TCP connection.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: qualcomm: rmnet: Add support for TX checksum offload
TX checksum offload applies to TCP / UDP packets which are not
fragmented using the MAPv4 checksum trailer. The following needs to be
done to have checksum computed in hardware -
1. Set the checksum start offset and inset offset.
2. Set the csum_enabled bit
3. Compute and set 1's complement of partial checksum field in
transport header.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: qualcomm: rmnet: Handle command packets with checksum trailer
When using the MAPv4 packet format in conjunction with MAP commands,
a dummy DL checksum trailer will be appended to the packet. Before
this packet is sent out as an ACK, the DL checksum trailer needs to be
removed.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
net: qualcomm: rmnet: Add support for RX checksum offload
When using the MAPv4 packet format, receive checksum offload can be
enabled in hardware. The checksum computation over pseudo header is
not offloaded but the rest of the checksum computation over
the payload is offloaded. This applies only for TCP / UDP packets
which are not fragmented.
rmnet validates the TCP/UDP checksum for the packet using the checksum
from the checksum trailer added to the packet by hardware. The
validation performed is as following -
1. Perform 1's complement over the checksum value from the trailer
2. Compute 1's complement checksum over IPv4 / IPv6 header and
subtracts it from the value from step 1
3. Computes 1's complement checksum over IPv4 / IPv6 pseudo header and
adds it to the value from step 2
4. Subtracts the checksum value from the TCP / UDP header from the
value from step 3.
5. Compares the value from step 4 to the checksum value from the
TCP / UDP header.
6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set
and the packet is passed on to network stack. If there is a
failure, then the packet is passed on as such without modifying
the ip_summed field.
The checksum field is also checked for UDP checksum 0 as per RFC 768
and for unexpected TCP checksum of 0.
If checksum offload is disabled when using MAPv4 packet format in
receive path, the packet is queued as is to network stack without
the validations above.
Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>