]> git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log
mirror_ubuntu-eoan-kernel.git
6 years agonetfilter: nf_defrag: Skip defrag if NOTRACK is set
Subash Abhinov Kasiviswanathan [Thu, 11 Jan 2018 03:51:57 +0000 (20:51 -0700)]
netfilter: nf_defrag: Skip defrag if NOTRACK is set

conntrack defrag is needed only if some module like CONNTRACK or NAT
explicitly requests it. For plain forwarding scenarios, defrag is
not needed and can be skipped if NOTRACK is set in a rule.

Since conntrack defrag is currently higher priority than raw table,
setting NOTRACK is not sufficient. We need to move raw to a higher
priority for iptables only.

This is achieved by introducing a module parameter "raw_before_defrag"
which allows to change the priority of raw table to place it before
defrag. By default, the parameter is disabled and the priority of raw
table is NF_IP_PRI_RAW to support legacy behavior. If the module
parameter is enabled, then the priority of the raw table is set to
NF_IP_PRI_RAW_BEFORE_DEFRAG.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: clusterip: make sure arp hooks are available
Florian Westphal [Thu, 11 Jan 2018 08:21:29 +0000 (09:21 +0100)]
netfilter: clusterip: make sure arp hooks are available

The clusterip target needs to register an arp mangling hook,
so make sure NF_ARP hooks are available.

Fixes: 2a95183a5e ("netfilter: don't allocate space for arp/bridge hooks unless needed")
Reported-by: kernel test robot <fengguang.wu@intel.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: improve flow table Kconfig dependencies
Arnd Bergmann [Wed, 10 Jan 2018 17:10:59 +0000 (18:10 +0100)]
netfilter: improve flow table Kconfig dependencies

The newly added NF_FLOW_TABLE options cause some build failures in
randconfig kernels:

- when CONFIG_NF_CONNTRACK is disabled, or is a loadable module but
  NF_FLOW_TABLE is built-in:

  In file included from net/netfilter/nf_flow_table.c:8:0:
  include/net/netfilter/nf_conntrack.h:59:22: error: field 'ct_general' has incomplete type
    struct nf_conntrack ct_general;
  include/net/netfilter/nf_conntrack.h: In function 'nf_ct_get':
  include/net/netfilter/nf_conntrack.h:148:15: error: 'const struct sk_buff' has no member named '_nfct'
  include/net/netfilter/nf_conntrack.h: In function 'nf_ct_put':
  include/net/netfilter/nf_conntrack.h:157:2: error: implicit declaration of function 'nf_conntrack_put'; did you mean 'nf_ct_put'? [-Werror=implicit-function-declaration]

  net/netfilter/nf_flow_table.o: In function `nf_flow_offload_work_gc':
  (.text+0x1540): undefined reference to `nf_ct_delete'

- when CONFIG_NF_TABLES is disabled:

  In file included from net/ipv6/netfilter/nf_flow_table_ipv6.c:13:0:
  include/net/netfilter/nf_tables.h: In function 'nft_gencursor_next':
  include/net/netfilter/nf_tables.h:1189:14: error: 'const struct net' has no member named 'nft'; did you mean 'nf'?

 - when CONFIG_NF_FLOW_TABLE_INET is enabled, but NF_FLOW_TABLE_IPV4
  or NF_FLOW_TABLE_IPV6 are not, or are loadable modules

  net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
  nf_flow_table_inet.c:(.text+0x94): undefined reference to `nf_flow_offload_ipv6_hook'
  nf_flow_table_inet.c:(.text+0x40): undefined reference to `nf_flow_offload_ip_hook'

- when CONFIG_NF_FLOW_TABLES is disabled, but the other options are
  enabled:

  net/netfilter/nf_flow_table_inet.o: In function `nf_flow_offload_inet_hook':
  nf_flow_table_inet.c:(.text+0x6c): undefined reference to `nf_flow_offload_ipv6_hook'
  net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_exit':
  nf_flow_table_inet.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
  net/netfilter/nf_flow_table_inet.o: In function `nf_flow_inet_module_init':
  nf_flow_table_inet.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'
  net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_exit':
  nf_flow_table_ipv4.c:(.exit.text+0x8): undefined reference to `nft_unregister_flowtable_type'
  net/ipv4/netfilter/nf_flow_table_ipv4.o: In function `nf_flow_ipv4_module_init':
  nf_flow_table_ipv4.c:(.init.text+0x8): undefined reference to `nft_register_flowtable_type'

This adds additional Kconfig dependencies to ensure that NF_CONNTRACK and NF_TABLES
are always visible from NF_FLOW_TABLE, and that the internal dependencies between
the four new modules are met.

Fixes: 7c23b629a808 ("netfilter: flow table support for the mixed IPv4/IPv6 family")
Fixes: 0995210753a2 ("netfilter: flow table support for IPv6")
Fixes: 97add9f0d66d ("netfilter: flow table support for IPv4")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: add IPv6 segment routing header 'srh' match
Ahmed Abdelsalam [Sun, 7 Jan 2018 18:22:02 +0000 (19:22 +0100)]
netfilter: add IPv6 segment routing header 'srh' match

It allows matching packets based on Segment Routing Header
(SRH) information.
The implementation considers revision 7 of the SRH draft.
https://tools.ietf.org/html/draft-ietf-6man-segment-routing-header-07

Currently supported match options include:
(1) Next Header
(2) Hdr Ext Len
(3) Segments Left
(4) Last Entry
(5) Tag value of SRH

Signed-off-by: Ahmed Abdelsalam <amsalam20@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: core: return EBUSY in case NAT hook is already in use
Pablo Neira Ayuso [Wed, 10 Jan 2018 14:24:15 +0000 (15:24 +0100)]
netfilter: core: return EBUSY in case NAT hook is already in use

EEXIST is used for an object that already exists, with the same
name/handle. However, there no same object there, instead there is a
object that is using the single slot that is available for NAT hooks
since patch f92b40a8b264 ("netfilter: core: only allow one nat hook per
hook point"). Let's change this return value before this behaviour gets
exposed in the first -rc.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: remove duplicated include
Wei Yongjun [Wed, 10 Jan 2018 13:06:46 +0000 (13:06 +0000)]
netfilter: remove duplicated include

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: core: make local function __nf_unregister_net_hook static
Wei Yongjun [Wed, 10 Jan 2018 07:05:06 +0000 (07:05 +0000)]
netfilter: core: make local function __nf_unregister_net_hook static

Fixes the following sparse warning:

net/netfilter/core.c:380:6: warning:
 symbol '__nf_unregister_net_hook' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: fix a typo in nf_tables_getflowtable()
Wei Yongjun [Wed, 10 Jan 2018 07:04:54 +0000 (07:04 +0000)]
netfilter: nf_tables: fix a typo in nf_tables_getflowtable()

Fix a typo, we should check 'flowtable' instead of 'table'.

Fixes: 3b49e2e94e6e ("netfilter: nf_tables: add flow table netlink frontend")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: x_tables: unbreak module auto loading
Florian Westphal [Tue, 9 Jan 2018 13:30:48 +0000 (14:30 +0100)]
netfilter: x_tables: unbreak module auto loading

a typo causes module auto load support to never be compiled in.

Fixes: 03d13b6868a2 ("netfilter: xtables: add and use xt_request_find_table_lock")
Reported-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: get rid of struct nft_af_info abstraction
Pablo Neira Ayuso [Tue, 9 Jan 2018 01:48:47 +0000 (02:48 +0100)]
netfilter: nf_tables: get rid of struct nft_af_info abstraction

Remove the infrastructure to register/unregister nft_af_info structure,
this structure stores no useful information anymore.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: get rid of pernet families
Pablo Neira Ayuso [Tue, 9 Jan 2018 01:42:11 +0000 (02:42 +0100)]
netfilter: nf_tables: get rid of pernet families

Now that we have a single table list for each netns, we can get rid of
one pointer per family and the global afinfo list, thus, shrinking
struct netns for nftables that now becomes 64 bytes smaller.

And call __nft_release_afinfo() from __net_exit path accordingly to
release netnamespace objects on removal.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: add single table list for all families
Pablo Neira Ayuso [Tue, 9 Jan 2018 01:38:03 +0000 (02:38 +0100)]
netfilter: nf_tables: add single table list for all families

Place all existing user defined tables in struct net *, instead of
having one list per family. This saves us from one level of indentation
in netlink dump functions.

Place pointer to struct nft_af_info in struct nft_table temporarily, as
we still need this to put back reference module reference counter on
table removal.

This patch comes in preparation for the removal of struct nft_af_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: remove struct nft_af_info parameter in nf_tables_chain_type_loo...
Pablo Neira Ayuso [Tue, 19 Dec 2017 12:40:22 +0000 (13:40 +0100)]
netfilter: nf_tables: remove struct nft_af_info parameter in nf_tables_chain_type_lookup()

Pass family number instead, this comes in preparation for the removal of
struct nft_af_info.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: no need for struct nft_af_info to enable/disable table
Pablo Neira Ayuso [Tue, 19 Dec 2017 11:17:52 +0000 (12:17 +0100)]
netfilter: nf_tables: no need for struct nft_af_info to enable/disable table

nf_tables_table_enable() and nf_tables_table_disable() take a pointer to
struct nft_af_info that is never used, remove it.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: remove flag field from struct nft_af_info
Pablo Neira Ayuso [Tue, 19 Dec 2017 13:07:52 +0000 (14:07 +0100)]
netfilter: nf_tables: remove flag field from struct nft_af_info

Replace it by a direct check for the netdev protocol family.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agonetfilter: nf_tables: remove nhooks field from struct nft_af_info
Pablo Neira Ayuso [Tue, 19 Dec 2017 12:53:45 +0000 (13:53 +0100)]
netfilter: nf_tables: remove nhooks field from struct nft_af_info

We already validate the hook through bitmask, so this check is
superfluous. When removing this, this patch is also fixing a bug in the
new flowtable codebase, since ctx->afi points to the table family
instead of the netdev family which is where the flowtable is really
hooked in.

Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
6 years agoMerge branch 'r8169-improve-runtime-pm'
David S. Miller [Tue, 9 Jan 2018 17:38:57 +0000 (12:38 -0500)]
Merge branch 'r8169-improve-runtime-pm'

Heiner Kallweit says:

====================
r8169: improve runtime pm

On my system with two network ports I found that runtime PM didn't
suspend the unused port. Therefore I checked runtime pm in this driver
in somewhat more detail and this series improves runtime pm in general
and solves the mentioned issue.

Tested on a system with RTL8168evl (MAC version 34).
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve runtime pm in general and suspend unused ports
Heiner Kallweit [Mon, 8 Jan 2018 20:39:13 +0000 (21:39 +0100)]
r8169: improve runtime pm in general and suspend unused ports

So far rpm doesn't cover cases like unused ports which are never
brought up. If they are active at probe time they remain in this state.
Included in this patch:

- Let the idle notification check whether we can suspend and let it
  schedule the suspend. This way we don't need to have calls to
  pm_schedule_suspend in different places.

- At the end of rtl_open and rtl_init_one send an idle notification
  to allow suspending if the link is down. If a cable is plugged in
  aneg is finished before the suspend timer expires and the suspend
  request is cancelled.

- Change rtl8169_runtime_suspend to power down the chip if the
  interface is down.

Successfully tested on a RTL8168evl (mac version 34).

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: improve runtime pm in rtl8169_check_link_status
Heiner Kallweit [Mon, 8 Jan 2018 20:39:07 +0000 (21:39 +0100)]
r8169: improve runtime pm in rtl8169_check_link_status

This patch partially reverts commit e4fbce740f07 "r8169: Fix runtime
power management" from 2010. At that time the suspend delay was 100ms
and therefore suspending happened during initial aneg. Currently
suspend delay is 5s, so suspend starts after aneg and the issue
doesn't exist any longer. On my system aneg takes almost 3s, to be on
the safe side let's increase the suspend delay to 10s.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agor8169: remove unneeded rpm ops in rtl_shutdown
Heiner Kallweit [Mon, 8 Jan 2018 20:39:02 +0000 (21:39 +0100)]
r8169: remove unneeded rpm ops in rtl_shutdown

This patch reverts commit 2a15cd2ff488 "r8169: runtime resume before
shutdown" from 2012. Few months after this change the underlying issue
was solved in the PCI core with commit 3ff2de9ba1a2 "PCI/PM: Resume
device before shutdown".

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'tipc-improvements-to-group-messaging'
David S. Miller [Tue, 9 Jan 2018 17:35:59 +0000 (12:35 -0500)]
Merge branch 'tipc-improvements-to-group-messaging'

Jon Maloy says:

====================
tipc: improvements to group messaging

We make a number of simplifications and improvements to the group
messaging service. They aim at readability/maintainability of the code
as well as scalability.

The series is based on commit f9c935db8086 ("tipc: fix problems with
multipoint-to-point flow control) which has been applied to 'net' but
not yet to 'net-next'.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: improve poll() for group member socket
Jon Maloy [Mon, 8 Jan 2018 20:03:31 +0000 (21:03 +0100)]
tipc: improve poll() for group member socket

The current criteria for returning POLLOUT from a group member socket is
too simplistic. It basically returns POLLOUT as soon as the group has
external destinations, something obviously leading to a lot of spinning
during destination congestion situations. At the same time, the internal
congestion handling is unnecessarily complex.

We now change this as follows.

- We introduce an 'open' flag in  struct tipc_group. This flag is used
  only to help poll() get the setting of POLLOUT right, and *not* for
  congeston handling as such. This means that a user can choose to
  ignore an  EAGAIN for a destination and go on sending messages to
  other destinations in the group if he wants to.

- The flag is set to false every time we return EAGAIN on a send call.

- The flag is set to true every time any member, i.e., not necessarily
  the member that caused EAGAIN, is removed from the small_win list.

- We remove the group member 'usr_pending' flag. The size of the send
  window and presence in the 'small_win' list is sufficient criteria
  for recognizing congestion.

This solution seems to be a reasonable compromise between 'anycast',
which is normally not waiting for POLLOUT for a specific destination,
and the other three send modes, which are.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: improve groupcast scope handling
Jon Maloy [Mon, 8 Jan 2018 20:03:30 +0000 (21:03 +0100)]
tipc: improve groupcast scope handling

When a member joins a group, it also indicates a binding scope. This
makes it possible to create both node local groups, invisible to other
nodes, as well as cluster global groups, visible everywhere.

In order to avoid that different members end up having permanently
differing views of group size and memberhip, we must inhibit locally
and globally bound members from joining the same group.

We do this by using the binding scope as an additional separator between
groups. I.e., a member must ignore all membership events from sockets
using a different scope than itself, and all lookups for message
destinations must require an exact match between the message's lookup
scope and the potential target's binding scope.

Apart from making it possible to create local groups using the same
identity on different nodes, a side effect of this is that it now also
becomes possible to create a cluster global group with the same identity
across the same nodes, without interfering with the local groups.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: add option to suppress PUBLISH events for pre-existing publications
Jon Maloy [Mon, 8 Jan 2018 20:03:29 +0000 (21:03 +0100)]
tipc: add option to suppress PUBLISH events for pre-existing publications

Currently, when a user is subscribing for binding table publications,
he will receive a PUBLISH event for all already existing matching items
in the binding table.

However, a group socket making a subscriptions doesn't need this initial
status update from the binding table, because it has already scanned it
during the join operation. Worse, the multiplicatory effect of issuing
mutual events for dozens or hundreds group members within a short time
frame put a heavy load on the topology server, with the end result that
scale out operations on a big group tend to take much longer than needed.

We now add a new filter option, TIPC_SUB_NO_STATUS, for topology server
subscriptions, so that this initial avalanche of events is suppressed.
This change, along with the previous commit, significantly improves the
range and speed of group scale out operations.

We keep the new option internal for the tipc driver, at least for now.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: send out join messages as soon as new member is discovered
Jon Maloy [Mon, 8 Jan 2018 20:03:28 +0000 (21:03 +0100)]
tipc: send out join messages as soon as new member is discovered

When a socket is joining a group, we look up in the binding table to
find if there are already other members of the group present. This is
used for being able to return EAGAIN instead of EHOSTUNREACH if the
user proceeds directly to a send attempt.

However, the information in the binding table can be used to directly
set the created member in state MBR_PUBLISHED and send a JOIN message
to the peer, instead of waiting for a topology PUBLISH event to do this.
When there are many members in a group, the propagation time for such
events can be significant, and we can save time during the join
operation if we use the initial lookup result fully.

In this commit, we eliminate the member state MBR_DISCOVERED which has
been the result of the initial lookup, and do instead go directly to
MBR_PUBLISHED, which initiates the setup.

After this change, the tipc_member FSM looks as follows:

     +-----------+
---->| PUBLISHED |-----------------------------------------------+
PUB- +-----------+                                 LEAVE/WITHRAW |
LISH       |JOIN                                                 |
           |     +-------------------------------------------+   |
           |     |                            LEAVE/WITHDRAW |   |
           |     |                +------------+             |   |
           |     |   +----------->|  PENDING   |---------+   |   |
           |     |   |msg/maxactv +-+---+------+  LEAVE/ |   |   |
           |     |   |              |   |       WITHDRAW |   |   |
           |     |   |   +----------+   |                |   |   |
           |     |   |   |revert/maxactv|                |   |   |
           |     |   |   V              V                V   V   V
           |   +----------+  msg  +------------+       +-----------+
           +-->|  JOINED  |------>|   ACTIVE   |------>|  LEAVING  |--->
           |   +----------+       +--- -+------+ LEAVE/+-----------+DOWN
           |        A   A               |      WITHDRAW A   A    A   EVT
           |        |   |               |RECLAIM        |   |    |
           |        |   |REMIT          V               |   |    |
           |        |   |== adv   +------------+        |   |    |
           |        |   +---------| RECLAIMING |--------+   |    |
           |        |             +-----+------+  LEAVE/    |    |
           |        |                   |REMIT   WITHDRAW   |    |
           |        |                   |< adv              |    |
           |        |msg/               V            LEAVE/ |    |
           |        |adv==ADV_IDLE+------------+   WITHDRAW |    |
           |        +-------------|  REMITTED  |------------+    |
           |                      +------------+                 |
           |PUBLISH                                              |
JOIN +-----------+                                LEAVE/WITHDRAW |
---->|  JOINING  |-----------------------------------------------+
     +-----------+

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: simplify group LEAVE sequence
Jon Maloy [Mon, 8 Jan 2018 20:03:27 +0000 (21:03 +0100)]
tipc: simplify group LEAVE sequence

After the changes in the previous commit the group LEAVE sequence
can be simplified.

We now let the arrival of a LEAVE message unconditionally issue a group
DOWN event to the user. When a topology WITHDRAW event is received, the
member, if it still there, is set to state LEAVING, but we only issue a
group DOWN event when the link to the peer node is gone, so that no
LEAVE message is to be expected.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: create group member event messages when they are needed
Jon Maloy [Mon, 8 Jan 2018 20:03:26 +0000 (21:03 +0100)]
tipc: create group member event messages when they are needed

In the current implementation, a group socket receiving topology
events about other members just converts the topology event message
into a group event message and stores it until it reaches the right
state to issue it to the user. This complicates the code unnecessarily,
and becomes impractical when we in the coming commits will need to
create and issue membership events independently.

In this commit, we change this so that we just notice the type and
origin of the incoming topology event, and then drop the buffer. Only
when it is time to actually send a group event to the user do we
explicitly create a new message and send it upwards.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: adjustment to group member FSM
Jon Maloy [Mon, 8 Jan 2018 20:03:25 +0000 (21:03 +0100)]
tipc: adjustment to group member FSM

Analysis reveals that the member state MBR_QURANTINED in reality is
unnecessary, and can be replaced by the state MBR_JOINING at all
occurrencs.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: let group member stay in JOINED mode if unable to reclaim
Jon Maloy [Mon, 8 Jan 2018 20:03:24 +0000 (21:03 +0100)]
tipc: let group member stay in JOINED mode if unable to reclaim

We handle a corner case in the function tipc_group_update_rcv_win().
During extreme pessure it might happen that a message receiver has all
its active senders in RECLAIMING or REMITTED mode, meaning that there
is nobody to reclaim advertisements from if an additional sender tries
to go active.

Currently we just set the new sender to ACTIVE anyway, hence at least
theoretically opening up for a receiver queue overflow by exceeding the
MAX_ACTIVE limit. The correct solution to this is to instead add the
member to the pending queue, while letting the oldest member in that
queue revert to JOINED state.

In this commit we refactor the code for handling message arrival from
a JOINED member, both to make it more comprehensible and to cover the
case described above.

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotipc: a couple of cleanups
Jon Maloy [Mon, 8 Jan 2018 20:03:23 +0000 (21:03 +0100)]
tipc: a couple of cleanups

- We remove the 'reclaiming' member list in struct tipc_group, since
  it doesn't serve any purpose.

- We simplify the GRP_REMIT_MSG branch of tipc_group_protocol_rcv().

Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'ethtool-ringparam-upper-bound'
David S. Miller [Tue, 9 Jan 2018 16:54:50 +0000 (11:54 -0500)]
Merge branch 'ethtool-ringparam-upper-bound'

Tariq Toukan says:

====================
ethtool ringparam upper bound

This patchset by Jenny adds sanity checks in ethtool ringparam
operation for input upper bounds, similarly to what's done in
ethtool_set_channels.

The checks are added in patch 1, using a call to get_ringparam
prior to calling set_ringparam NDO.

Patch 2 changes the function's behavior in mlx4_en, so that
it returns an error for out-of-range input, instead of rounding
it to closest valid, similar to mlx5e.

Patch 3 removes the upper bound checks in mlx5e_ethtool_set_ringparam
as it becomes redundant.

Series generated against net-next commit:
f66faae2f80a Merge branch 'ipv6-ipv4-nexthop-align'
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: Remove redundant checks in set_ringparam
Eugenia Emantayev [Mon, 8 Jan 2018 14:00:26 +0000 (16:00 +0200)]
net/mlx5e: Remove redundant checks in set_ringparam

Since the checks are done in upper layer ethtool code,
checks in driver are not needed any more.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx4_en: Align behavior of set ring size flow via ethtool
Eugenia Emantayev [Mon, 8 Jan 2018 14:00:25 +0000 (16:00 +0200)]
net/mlx4_en: Align behavior of set ring size flow via ethtool

In current implementation, any requested RX/TX ring size value
that is less than minimum is silently casted to nearest valid value.
Update this behavior to align with mlx5 behavior by printing warning
in dmesg and remaining the size unchanged.
Kernel is responsible for verifying against the maximum.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoethtool: Ensure new ring parameters are within bounds during SRINGPARAM
Eugenia Emantayev [Mon, 8 Jan 2018 14:00:24 +0000 (16:00 +0200)]
ethtool: Ensure new ring parameters are within bounds during SRINGPARAM

Add a sanity check to ensure that all requested ring parameters
are within bounds, which should reduce errors in driver implementation.

Signed-off-by: Eugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoipv6: use ARRAY_SIZE for array sizing calculation on array seg6_action_table
Colin Ian King [Sun, 7 Jan 2018 23:50:26 +0000 (23:50 +0000)]
ipv6: use ARRAY_SIZE for array sizing calculation on array seg6_action_table

Use the ARRAY_SIZE macro on array seg6_action_table to determine size of
the array. Improvement suggested by coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobe2net: use ARRAY_SIZE for array sizing calculation on array cmd_priv_map
Colin Ian King [Sun, 7 Jan 2018 23:45:08 +0000 (23:45 +0000)]
be2net: use ARRAY_SIZE for array sizing calculation on array cmd_priv_map

Use the ARRAY_SIZE macro on array cmd_priv_map to determine size of the
array.  Improvement suggested by coccinelle.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agovirtio_net: propagate linkspeed/duplex settings from the hypervisor
Jason Baron [Fri, 5 Jan 2018 22:44:54 +0000 (17:44 -0500)]
virtio_net: propagate linkspeed/duplex settings from the hypervisor

The ability to set speed and duplex for virtio_net is useful in various
scenarios as described here:

16032be virtio_net: add ethtool support for set and get of settings

However, it would be nice to be able to set this from the hypervisor,
such that virtio_net doesn't require custom guest ethtool commands.

Introduce a new feature flag, VIRTIO_NET_F_SPEED_DUPLEX, which allows
the hypervisor to export a linkspeed and duplex setting. The user can
subsequently overwrite it later if desired via: 'ethtool -s'.

Note that VIRTIO_NET_F_SPEED_DUPLEX is defined as bit 63, the intention
is that device feature bits are to grow down from bit 63, since the
transports are starting from bit 24 and growing up.

Signed-off-by: Jason Baron <jbaron@akamai.com>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: virtio-dev@lists.oasis-open.org
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomacsec: Add support for GCM-AES-256 cipher suite
Felix Walter [Fri, 5 Jan 2018 13:33:31 +0000 (14:33 +0100)]
macsec: Add support for GCM-AES-256 cipher suite

This adds support for the GCM-AES-256 cipher suite as specified in
IEEE 802.1AEbn-2011. The prepared cipher suite selection mechanism is used,
with GCM-AES-128 being the default cipher suite as defined in the standard.

Signed-off-by: Felix Walter <felix.walter@cloudandheat.com>
Cc: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'XDP-transmission-for-tuntap'
David S. Miller [Tue, 9 Jan 2018 15:57:19 +0000 (10:57 -0500)]
Merge branch 'XDP-transmission-for-tuntap'

Jason Wang says:

====================
XDP transmission for tuntap

This series tries to implement XDP transmission (ndo_xdp_xmit) for
tuntap. Pointer ring was used for queuing both XDP buffers and
sk_buff, this is done by encoding the type into lowest bit of the
pointer and storin XDP metadata in the headroom of XDP buff.

Tests gets 3.05 Mpps when doing xdp_redirect_map from ixgbe to VM
(testpmd + virtio-net in guest). This gives us ~20% improvments
compared to use skb during redirect.

Please review.

Changes from V1:

- slient warnings
- fix typos
- add skb mode number in the commit log
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotuntap: XDP transmission
Jason Wang [Thu, 4 Jan 2018 03:14:28 +0000 (11:14 +0800)]
tuntap: XDP transmission

This patch implements XDP transmission for TAP. Since we can't create
new queues for TAP during XDP set, exist ptr_ring was reused for
queuing XDP buffers. To differ xdp_buff from sk_buff, TUN_XDP_FLAG
(0x1UL) was encoded into lowest bit of xpd_buff pointer during
ptr_ring_produce, and was decoded during consuming. XDP metadata was
stored in the headroom of the packet which should work in most of
cases since driver usually reserve enough headroom. Very minor changes
were done for vhost_net: it just need to peek the length depends on
the type of pointer.

Tests were done on two Intel E5-2630 2.40GHz machines connected back
to back through two 82599ES. Traffic were generated/received through
MoonGen/testpmd(rxonly). It reports ~20% improvements when
xdp_redirect_map is doing redirection from ixgbe to TAP (from 2.50Mpps
to 3.05Mpps)

Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agotun/tap: use ptr_ring instead of skb_array
Jason Wang [Thu, 4 Jan 2018 03:14:27 +0000 (11:14 +0800)]
tun/tap: use ptr_ring instead of skb_array

This patch switches to use ptr_ring instead of skb_array. This will be
used to enqueue different types of pointers by encoding type into
lower bits.

Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Tue, 9 Jan 2018 15:37:00 +0000 (10:37 -0500)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Tue, 9 Jan 2018 04:21:39 +0000 (20:21 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Frag and UDP handling fixes in i40e driver, from Amritha Nambiar and
    Alexander Duyck.

 2) Undo unintentional UAPI change in netfilter conntrack, from Florian
    Westphal.

 3) Revert a change to how error codes are returned from
    dev_get_valid_name(), it broke some apps.

 4) Cannot cache routes for ipv6 tunnels in the tunnel is ipv4/ipv6
    dual-stack. From Eli Cooper.

 5) Fix missed PMTU updates in geneve, from Xin Long.

 6) Cure double free in macvlan, from Gao Feng.

 7) Fix heap out-of-bounds write in rds_message_alloc_sgs(), from
    Mohamed Ghannam.

 8) FEC bug fixes from FUgang Duan (mis-accounting of dev_id, missed
    deferral of probe when the regulator is not ready yet).

 9) Missing DMA mapping error checks in 3c59x, from Neil Horman.

10) Turn off Broadcom tags for some b53 switches, from Florian Fainelli.

11) Fix OOPS when get_target_net() is passed an SKB whose NETLINK_CB()
    isn't initialized. From Andrei Vagin.

12) Fix crashes in fib6_add(), from Wei Wang.

13) PMTU bug fixes in SCTP from Marcelo Ricardo Leitner.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (56 commits)
  sh_eth: fix TXALCR1 offsets
  mdio-sun4i: Fix a memory leak
  phylink: mark expected switch fall-throughs in phylink_mii_ioctl
  sctp: fix the handling of ICMP Frag Needed for too small MTUs
  sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
  xen-netfront: enable device after manual module load
  bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
  bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
  sh_eth: fix SH7757 GEther initialization
  net: fec: free/restore resource in related probe error pathes
  uapi/if_ether.h: prevent redefinition of struct ethhdr
  ipv6: fix general protection fault in fib6_add()
  RDS: null pointer dereference in rds_atomic_free_op
  sh_eth: fix TSU resource handling
  net: stmmac: enable EEE in MII, GMII or RGMII only
  rtnetlink: give a user socket to get_target_net()
  MAINTAINERS: Update my email address.
  can: ems_usb: improve error reporting for error warning and error passive
  can: flex_can: Correct the checking for frame length in flexcan_start_xmit()
  can: gs_usb: fix return value of the "set_bittiming" callback
  ...

6 years agonet: tipc: remove unused hardirq.h
Yang Shi [Mon, 8 Jan 2018 19:52:54 +0000 (03:52 +0800)]
net: tipc: remove unused hardirq.h

Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by TIPC at all.

So, remove the unused hardirq.h.

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Tested-by: Ying Xue <ying.xue@windriver.com>
Cc: Jon Maloy <jon.maloy@ericsson.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ovs: remove unused hardirq.h
Yang Shi [Mon, 8 Jan 2018 19:52:53 +0000 (03:52 +0800)]
net: ovs: remove unused hardirq.h

Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by openvswitch at all.

So, remove the unused hardirq.h.

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Acked-by: Pravin B Shelar <pshelar@ovn.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: dev@openvswitch.org
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: caif: remove unused hardirq.h
Yang Shi [Mon, 8 Jan 2018 19:52:52 +0000 (03:52 +0800)]
net: caif: remove unused hardirq.h

Preempt counter APIs have been split out, currently, hardirq.h just
includes irq_enter/exit APIs which are not used by caif at all.

So, remove the unused hardirq.h.

Signed-off-by: Yang Shi <yang.s@alibaba-inc.com>
Cc: Dmitry Tarnyagin <dmitry.tarnyagin@lockless.no>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-netdev_WARN_ONCE'
David S. Miller [Tue, 9 Jan 2018 01:53:15 +0000 (20:53 -0500)]
Merge branch 'net-netdev_WARN_ONCE'

Gal Pressman says:

====================
Replace WARN_ONCE usages with netdev_WARN_ONCE

This series will fix an issue in netdev_WARN_ONCE, improve its formatting and
replace drivers' usage of WARN_ONCE to netdev_WARN_ONCE.

Driver specific patches were compilation tested, in addition, functional tested
on Mellanox NIC.

v1->v2:
- Addressed commit message comments in patch #1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years ago8139cp: Replace WARN_ONCE with netdev_WARN_ONCE
Gal Pressman [Sun, 7 Jan 2018 10:08:40 +0000 (12:08 +0200)]
8139cp: Replace WARN_ONCE with netdev_WARN_ONCE

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnx2x: Replace WARN_ONCE with netdev_WARN_ONCE
Gal Pressman [Sun, 7 Jan 2018 10:08:39 +0000 (12:08 +0200)]
bnx2x: Replace WARN_ONCE with netdev_WARN_ONCE

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Ariel Elior <ariel.elior@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoe1000: Replace WARN_ONCE with netdev_WARN_ONCE
Gal Pressman [Sun, 7 Jan 2018 10:08:38 +0000 (12:08 +0200)]
e1000: Replace WARN_ONCE with netdev_WARN_ONCE

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE
Gal Pressman [Sun, 7 Jan 2018 10:08:37 +0000 (12:08 +0200)]
net/mlx5e: Replace WARN_ONCE with netdev_WARN_ONCE

Use the more appropriate netdev_WARN_ONCE instead of WARN_ONCE macro.

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: No line break on netdev_WARN* formatting
Gal Pressman [Sun, 7 Jan 2018 10:08:36 +0000 (12:08 +0200)]
net: No line break on netdev_WARN* formatting

Remove the unnecessary line break between the netdev name and reg state
to the actual message that should be printed.

For example, this:
[86730.307236] ------------[ cut here ]------------
[86730.313496] netdevice: enp27s0f0
Message from the driver
[...]

Will be replaced with:
[86770.259289] ------------[ cut here ]------------
[86770.265191] netdevice: enp27s0f0: Message from the driver
[...]

Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: Fix netdev_WARN_ONCE macro
Gal Pressman [Sun, 7 Jan 2018 10:08:35 +0000 (12:08 +0200)]
net: Fix netdev_WARN_ONCE macro

netdev_WARN_ONCE is broken (whoops..), this fix will remove the
unnecessary "condition" parameter, add the missing comma and change
"arg" to "args".

Fixes: 375ef2b1f0d0 ("net: Introduce netdev_*_once functions")
Signed-off-by: Gal Pressman <galp@mellanox.com>
Reviewed-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Tue, 9 Jan 2018 01:40:42 +0000 (20:40 -0500)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter/IPVS updates for net-next

The following patchset contains Netfilter/IPVS updates for your
net-next tree:

1) Free hooks via call_rcu to speed up netns release path, from
   Florian Westphal.

2) Reduce memory footprint of hook arrays, skip allocation if family is
   not present - useful in case decnet support is not compiled built-in.
   Patches from Florian Westphal.

3) Remove defensive check for malformed IPv4 - including ihl field - and
   IPv6 headers in x_tables and nf_tables.

4) Add generic flow table offload infrastructure for nf_tables, this
   includes the netlink control plane and support for IPv4, IPv6 and
   mixed IPv4/IPv6 dataplanes. This comes with NAT support too. This
   patchset adds the IPS_OFFLOAD conntrack status bit to indicate that
   this flow has been offloaded.

5) Add secpath matching support for nf_tables, from Florian.

6) Save some code bytes in the fast path for the nf_tables netdev,
   bridge and inet families.

7) Allow one single NAT hook per point and do not allow to register NAT
   hooks in nf_tables before the conntrack hook, patches from Florian.

8) Seven patches to remove the struct nf_af_info abstraction, instead
   we perform direct calls for IPv4 which is faster. IPv6 indirections
   are still needed to avoid dependencies with the 'ipv6' module, but
   these now reside in struct nf_ipv6_ops.

9) Seven patches to handle NFPROTO_INET from the Netfilter core,
   hence we can remove specific code in nf_tables to handle this
   pseudofamily.

10) No need for synchronize_net() call for nf_queue after conversion
    to hook arrays. Also from Florian.

11) Call cond_resched_rcu() when dumping large sets in ipset to avoid
    softlockup. Again from Florian.

12) Pass lockdep_nfnl_is_held() to rcu_dereference_protected(), patch
    from Florian Westphal.

13) Fix matching of counters in ipset, from Jozsef Kadlecsik.

14) Missing nfnl lock protection in the ip_set_net_exit path, also
    from Jozsef.

15) Move connlimit code that we can reuse from nf_tables into
    nf_conncount, from Florian Westhal.

And asorted cleanups:

16) Get rid of nft_dereference(), it only has one single caller.

17) Add nft_set_is_anonymous() helper function.

18) Remove NF_ARP_FORWARD leftover chain definition in nf_tables_arp.

19) Remove unnecessary comments in nf_conntrack_h323_asn1.c
    From Varsha Rao.

20) Remove useless parameters in frag_safe_skb_hp(), from Gao Feng.

21) Constify layer 4 conntrack protocol definitions, function
    parameters to register/unregister these protocol trackers, and
    timeouts. Patches from Florian Westphal.

22) Remove nlattr_size indirection, from Florian Westphal.

23) Add fall-through comments as -Wimplicit-fallthrough needs this,
    from Gustavo A. R. Silva.

24) Use swap() macro to exchange values in ipset, patch from
    Gustavo A. R. Silva.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Linus Torvalds [Tue, 9 Jan 2018 00:17:31 +0000 (16:17 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma fixes from Doug Ledford:

 - One line fix to mlx4 error flow (same as mlx5 fix in last pull
   request, just in the mlx4 driver)

 - Fix a race condition in the IPoIB driver. This patch is larger than
   just a one line fix, but resolves a race condition in a fairly
   straight forward manner

 - Fix a locking issue in the RDMA netlink code. This patch is also
   larger than I would like for a late -rc. It has, however, had a week
   to bake in the rdma tree prior to this pull request

 - One line fix to fix granting remote machine access to memory that
   they don't need and shouldn't have

 - One line fix to correct the fact that our sgid/dgid pair is swapped
   from what you would expect when receiving an incoming connection
   request

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
  IB/srpt: Fix ACL lookup during login
  IB/srpt: Disable RDMA access by the initiator
  RDMA/netlink: Fix locking around __ib_get_device_by_index
  IB/ipoib: Fix race condition in neigh creation
  IB/mlx4: Fix mlx4_ib_alloc_mr error flow

6 years agoMerge tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform...
Linus Torvalds [Mon, 8 Jan 2018 19:52:24 +0000 (11:52 -0800)]
Merge tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86

Pull x86 platform driver fix from Darren Hart:
 "Address a wmi initcall ordering race resulting in a difficult to
  reproduce boot failure"

* tag 'platform-drivers-x86-v4.15-4' of git://git.infradead.org/linux-platform-drivers-x86:
  platform/x86: wmi: Call acpi_wmi_init() later

6 years agonet: tracepoint: exposing sk_faimily in tracepoint inet_sock_set_state
Yafang Shao [Sun, 7 Jan 2018 06:31:47 +0000 (14:31 +0800)]
net: tracepoint: exposing sk_faimily in tracepoint inet_sock_set_state

As of now, there're two sk_family are traced with sock:inet_sock_set_state,
which are AF_INET and AF_INET6.
So the sk_family are exposed as well.
Then we can conveniently use it to do the filter.

Both sk_family and sk_protocol are showed in the printk message, so we need
not expose them as tracepoint arguments.

Suggested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Suggested-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Reviewed-by: Song Liu <songliubraving@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: fix TXALCR1 offsets
Sergei Shtylyov [Sat, 6 Jan 2018 21:26:47 +0000 (00:26 +0300)]
sh_eth: fix TXALCR1 offsets

The  TXALCR1 offsets are incorrect in the register offset tables, most
probably due to copy&paste error.  Luckily, the driver never uses this
register. :-)

Fixes: 4a55530f38e4 ("net: sh_eth: modify the definitions of register")
Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agomdio-sun4i: Fix a memory leak
Christophe JAILLET [Sat, 6 Jan 2018 08:00:09 +0000 (09:00 +0100)]
mdio-sun4i: Fix a memory leak

If the probing of the regulator is deferred, the memory allocated by
'mdiobus_alloc_size()' will be leaking.
It should be freed before the next call to 'sun4i_mdio_probe()' which will
reallocate it.

Fixes: 4bdcb1dd9feb ("net: Add MDIO bus driver for the Allwinner EMAC")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agol2tp: adjust comments about L2TPv3 offsets
Guillaume Nault [Fri, 5 Jan 2018 18:47:14 +0000 (19:47 +0100)]
l2tp: adjust comments about L2TPv3 offsets

The "offset" option has been removed by
commit 900631ee6a26 ("l2tp: remove configurable payload offset").

Signed-off-by: Guillaume Nault <g.nault@alphalink.fr>
Acked-by: James Chapman <jchapman@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agophylink: mark expected switch fall-throughs in phylink_mii_ioctl
Gustavo A. R. Silva [Fri, 5 Jan 2018 17:23:45 +0000 (11:23 -0600)]
phylink: mark expected switch fall-throughs in phylink_mii_ioctl

In preparation to enabling -Wimplicit-fallthrough, mark switch cases
where we are expecting to fall through.

Addresses-Coverity-ID: 1463447 ("Missing break in switch")
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'SCTP-PMTU-discovery-fixes'
David S. Miller [Mon, 8 Jan 2018 19:19:13 +0000 (14:19 -0500)]
Merge branch 'SCTP-PMTU-discovery-fixes'

Marcelo Ricardo Leitner says:

====================
SCTP PMTU discovery fixes

This patchset fixes 2 issues with PMTU discovery that can lead to flood
of retransmissions.
The first patch fixes the issue for when PMTUD is disabled by the
application, while the second fixes it for when its enabled.

Please consider these to stable.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: phy: fix wrong masks to phy_modify()
Russell King [Fri, 5 Jan 2018 16:07:10 +0000 (16:07 +0000)]
net: phy: fix wrong masks to phy_modify()

The mask argument for phy_modify() in several locations was inverted.

Fixes: fea23fb591cc ("net: phy: convert read-modify-write to phy_modify()")
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: fix the handling of ICMP Frag Needed for too small MTUs
Marcelo Ricardo Leitner [Fri, 5 Jan 2018 13:17:18 +0000 (11:17 -0200)]
sctp: fix the handling of ICMP Frag Needed for too small MTUs

syzbot reported a hang involving SCTP, on which it kept flooding dmesg
with the message:
[  246.742374] sctp: sctp_transport_update_pmtu: Reported pmtu 508 too
low, using default minimum of 512

That happened because whenever SCTP hits an ICMP Frag Needed, it tries
to adjust to the new MTU and triggers an immediate retransmission. But
it didn't consider the fact that MTUs smaller than the SCTP minimum MTU
allowed (512) would not cause the PMTU to change, and issued the
retransmission anyway (thus leading to another ICMP Frag Needed, and so
on).

As IPv4 (ip_rt_min_pmtu=556) and IPv6 (IPV6_MIN_MTU=1280) minimum MTU
are higher than that, sctp_transport_update_pmtu() is changed to
re-fetch the PMTU that got set after our request, and with that, detect
if there was an actual change or not.

The fix, thus, skips the immediate retransmission if the received ICMP
resulted in no change, in the hope that SCTP will select another path.

Note: The value being used for the minimum MTU (512,
SCTP_DEFAULT_MINSEGMENT) is not right and instead it should be (576,
SCTP_MIN_PMTU), but such change belongs to another patch.

Changes from v1:
- do not disable PMTU discovery, in the light of commit
06ad391919b2 ("[SCTP] Don't disable PMTU discovery when mtu is small")
and as suggested by Xin Long.
- changed the way to break the rtx loop by detecting if the icmp
  resulted in a change or not
Changes from v2:
none

See-also: https://lkml.org/lkml/2017/12/22/811
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosctp: do not retransmit upon FragNeeded if PMTU discovery is disabled
Marcelo Ricardo Leitner [Fri, 5 Jan 2018 13:17:17 +0000 (11:17 -0200)]
sctp: do not retransmit upon FragNeeded if PMTU discovery is disabled

Currently, if PMTU discovery is disabled on a given transport, but the
configured value is higher than the actual PMTU, it is likely that we
will get some icmp Frag Needed. The issue is, if PMTU discovery is
disabled, we won't update the information and will issue a
retransmission immediately, which may very well trigger another ICMP,
and another retransmission, leading to a loop.

The fix is to simply not trigger immediate retransmissions if PMTU
discovery is disabled on the given transport.

Changes from v2:
- updated stale comment, noticed by Xin Long

Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoxen-netfront: enable device after manual module load
Eduardo Otubo [Fri, 5 Jan 2018 08:42:16 +0000 (09:42 +0100)]
xen-netfront: enable device after manual module load

When loading the module after unloading it, the network interface would
not be enabled and thus wouldn't have a backend counterpart and unable
to be used by the guest.

The guest would face errors like:

  [root@guest ~]# ethtool -i eth0
  Cannot get driver information: No such device

  [root@guest ~]# ifconfig eth0
  eth0: error fetching interface information: Device not found

This patch initializes the state of the netfront device whenever it is
loaded manually, this state would communicate the netback to create its
device and establish the connection between them.

Signed-off-by: Eduardo Otubo <otubo@redhat.com>
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoforcedeth: remove duplicate structure member in rx
Zhu Yanjun [Fri, 5 Jan 2018 04:06:39 +0000 (23:06 -0500)]
forcedeth: remove duplicate structure member in rx

Since both first_rx and rx_ring are the head of rx ring, it not
necessary to use two structure members to statically indicate
the head of rx ring. So first_rx is removed.

CC: Srinivas Eeda <srinivas.eeda@oracle.com>
CC: Joe Jin <joe.jin@oracle.com>
CC: Junxiao Bi <junxiao.bi@oracle.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'bnxt_en_fixes'
David S. Miller [Mon, 8 Jan 2018 19:13:45 +0000 (14:13 -0500)]
Merge branch 'bnxt_en_fixes'

Michael Chan says:

====================
bnxt_en: 2 small bug fixes.

The first one fixes the TC Flower flow parameter passed to firmware.  The
2nd one fixes the VF index range checking for iproute2 SRIOV related commands.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.
Venkat Duvvuru [Thu, 4 Jan 2018 23:46:55 +0000 (18:46 -0500)]
bnxt_en: Fix the 'Invalid VF' id check in bnxt_vf_ndo_prep routine.

In bnxt_vf_ndo_prep (which is called by bnxt_get_vf_config ndo), there is a
check for "Invalid VF id". Currently, the check is done against max_vfs.
However, the user doesn't always create max_vfs. So, the check should be
against the created number of VFs. The number of bnxt_vf_info structures
that are allocated in bnxt_alloc_vf_resources routine is the "number of
requested VFs". So, if an "invalid VF id" falls between the requested
number of VFs and the max_vfs, the driver will be dereferencing an invalid
pointer.

Fixes: c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver.")
Signed-off-by: Venkat Devvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agobnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()
Sunil Challa [Thu, 4 Jan 2018 23:46:54 +0000 (18:46 -0500)]
bnxt_en: Fix population of flow_type in bnxt_hwrm_cfa_flow_alloc()

flow_type in HWRM_FLOW_ALLOC is not being populated correctly due to
incorrect passing of pointer and size of l3_mask argument of is_wildcard().
Fixed this.

Fixes: db1d36a27324 ("bnxt_en: add TC flower offload flow_alloc/free FW cmds")
Signed-off-by: Sunil Challa <sunilkumar.challa@broadcom.com>
Reviewed-by: Sathya Perla <sathya.perla@broadcom.com>
Reviewed-by: Venkat Duvvuru <venkatkumar.duvvuru@broadcom.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj...
Linus Torvalds [Mon, 8 Jan 2018 19:13:08 +0000 (11:13 -0800)]
Merge branch 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup

Pull cgroup fixes from Tejun Heo:
 "This contains fixes for the following two non-trivial issues:

   - The task iterator got broken while adding thread mode support for
     v4.14. It was less visible because it only triggers when both
     cgroup1 and cgroup2 hierarchies are in use. The recent versions of
     systemd uses cgroup2 for process management even when cgroup1 is
     used for resource control exposing this issue.

   - cpuset CPU hotplug path could deadlock when racing against exits.

  There also are two patches to replace unlimited strcpy() usages with
  strlcpy()"

* 'for-4.15-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup: fix css_task_iter crash on CSS_TASK_ITER_PROC
  cgroup: Fix deadlock in cpu hotplug path
  cgroup: use strlcpy() instead of strscpy() to avoid spurious warning
  cgroup: avoid copying strings longer than the buffers

6 years agotcp: Split BUG_ON() in tcp_tso_should_defer() into two assertions
Stefano Brivio [Thu, 4 Jan 2018 23:38:05 +0000 (00:38 +0100)]
tcp: Split BUG_ON() in tcp_tso_should_defer() into two assertions

The two conditions triggering BUG_ON() are somewhat unrelated:
the tcp_skb_pcount() check is meant to catch TSO flaws, the
second one checks sanity of congestion window bookkeeping.

Split them into two separate BUG_ON() assertions on two lines,
so that we know which one actually triggers, when they do.

Signed-off-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: ipv6: Allow connect to linklocal address from socket bound to vrf
David Ahern [Thu, 4 Jan 2018 22:03:54 +0000 (14:03 -0800)]
net: ipv6: Allow connect to linklocal address from socket bound to vrf

Allow a process bound to a VRF to connect to a linklocal address.
Currently, this fails because of a mismatch between the scope of the
linklocal address and the sk_bound_dev_if inherited by the VRF binding:
    $ ssh -6 fe80::70b8:cff:fedd:ead8%eth1
    ssh: connect to host fe80::70b8:cff:fedd:ead8%eth1 port 22: Invalid argument

Relax the scope check to allow the socket to be bound to the same L3
device as the scope id.

This makes ipv6 linklocal consistent with other relaxed checks enabled
by commits 1ff23beebdd3 ("net: l3mdev: Allow send on enslaved interface")
and 7bb387c5ab12a ("net: Allow IP_MULTICAST_IF to set index to L3 slave").

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agosh_eth: remove sh_eth_plat_data::edmac_endian
Sergei Shtylyov [Thu, 4 Jan 2018 21:26:46 +0000 (00:26 +0300)]
sh_eth: remove sh_eth_plat_data::edmac_endian

Since the commit 888cc8c20cf ("sh_eth: remove EDMAC_BIG_ENDIAN") (geez,
I didn't realize that was 2 years ago!) the initializers in the SuperH
platform code for the 'sh_eth_plat_data::edmac_endian' stopped to matter,
so we can remove that field for good (not sure if  it  was ever useful --
SH7786 Ether has been reported  to have the same EDMAC descriptor/register
endiannes as configured for the SuperH CPU)...

Signed-off-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'hns3-next'
David S. Miller [Mon, 8 Jan 2018 19:06:20 +0000 (14:06 -0500)]
Merge branch 'hns3-next'

Peng Li says:

====================
add some new features and fix some bugs for HNS3 driver

This patchset adds some new features support and fixes some bugs:
[Patch 1/20] adds support to enable/disable vlan filter with ethtool
[Patch 2/20] disables VFs change rxvlan offload status
[Patch 3/20 - 13/120 fix bugs and refine some codes for packet
statistics, support query with both ifconfig and ethtool.
[Patch 14/20 - 20/20] fix some other bugs.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add more packet size statisctics
Jian Shen [Fri, 5 Jan 2018 10:18:24 +0000 (18:18 +0800)]
net: hns3: Add more packet size statisctics

The statistics of rx/tx packets size greater than 1518
are not detailed. This patch adds more statistics for
different packet size range.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: remove redundant semicolon
Peng Li [Fri, 5 Jan 2018 10:18:23 +0000 (18:18 +0800)]
net: hns3: remove redundant semicolon

There is a redundant semicolon, this patch removes it.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix for not setting pause parameters
Fuyun Liang [Fri, 5 Jan 2018 10:18:22 +0000 (18:18 +0800)]
net: hns3: fix for not setting pause parameters

Pause parameters include source address, transmit gap and pause time.
The default value of the pause source address is zero in the hardware.
Default pause parameters need to be set to the hardware. Also, when
setting new mac address, the pause source address need to be updated.

Fixes: 9dc2145d910e ("net: hns3: Add support for PFC setting in TM module")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: add MTU initialization for hardware
Fuyun Liang [Fri, 5 Jan 2018 10:18:21 +0000 (18:18 +0800)]
net: hns3: add MTU initialization for hardware

When initializing the MAC, the MTU vlaue need to be set to the hardware
too. Otherwise, the MTU value of software will be different from the MTU
value of hardware.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix for changing MTU
Fuyun Liang [Fri, 5 Jan 2018 10:18:20 +0000 (18:18 +0800)]
net: hns3: fix for changing MTU

when changing MTU, The new MTU must need to be set to netdevice.

Fixes: a8e8b7ff3517 ("net: hns3: Add support to change MTU in HNS3 hardware")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix for setting MTU
Fuyun Liang [Fri, 5 Jan 2018 10:18:19 +0000 (18:18 +0800)]
net: hns3: fix for setting MTU

When setting MTU, actually what we do is configuring the max frame size
for the hardware. ETH_HLEN、ETH_FCS_LEN and VLAN_HLEN must need to be
considered. And the frame size which is less than the default value
should not be set to the hardware. Because in the hardware, the the max
frame size not only controls the RX packet size, but also controls the
TX packet size. the RX packets whose size are greater than the setting
value will be dropped.

This patch fixes the bug setting a error max frame size to hardware.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: fix for updating fc_mode_last_time
Fuyun Liang [Fri, 5 Jan 2018 10:18:18 +0000 (18:18 +0800)]
net: hns3: fix for updating fc_mode_last_time

commit a9c782822166 ("net: hns3: add support for set_pauseparam")
adds set_pauseparam support for ethtool cmd, but forgets to update
fc_mode_last_time when PFC mode is disabled in hclge_cfg_pauseparam().
The wrong fc_mode_last_time will be used to update flow control mode
when lldpad has been running. As a result, when using the ethtool
command "-a", user will get a wrong pause parameter.

This patch adds the fc_mode_last_time update when PFC mode is disabled.

Fixes: a9c782822166 ("net: hns3: add support for set_pauseparam")
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix a response data read error of tqp statistics query
Jian Shen [Fri, 5 Jan 2018 10:18:17 +0000 (18:18 +0800)]
net: hns3: Fix a response data read error of tqp statistics query

The result of tqp statistics query was read with an
error position, fix it according to the user manual.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add packet statistics of netdev
Jian Shen [Fri, 5 Jan 2018 10:18:16 +0000 (18:18 +0800)]
net: hns3: Add packet statistics of netdev

Add packet statistics of netdev for ethtool -S, in
order to show the statistics data for current net
device.

Remove update_stats() calling because it has been
completed in hns3_get_netdev_stats().

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Remove a useless member of struct hns3_stats
Jian Shen [Fri, 5 Jan 2018 10:18:15 +0000 (18:18 +0800)]
net: hns3: Remove a useless member of struct hns3_stats

The member "stats_size" of struct hns3_stats is useless,
remove it and fix the macro definition which has uses this
struct.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix an error macro definition of HNS3_TQP_STAT
Jian Shen [Fri, 5 Jan 2018 10:18:14 +0000 (18:18 +0800)]
net: hns3: Fix an error macro definition of HNS3_TQP_STAT

The member "stats_offset" was designed to indicate the offset
of each member of struct ring_stats in struct hns3_enet_ring,
but forgot to add the offset of the member in struct ring_stats.

Fixes: 496d03e960a ("net: hns3: Add Ethtool support to HNS3 driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix a loop index error of tqp statistics query
Jian Shen [Fri, 5 Jan 2018 10:18:13 +0000 (18:18 +0800)]
net: hns3: Fix a loop index error of tqp statistics query

An error loop index was used while querying statistics data
of tqps, which may cause call trace.

Fixes: 496d03e960ae ("net: hns3: Add Ethtool support to HNS3 driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix an error of total drop packet statistics
Jian Shen [Fri, 5 Jan 2018 10:18:12 +0000 (18:18 +0800)]
net: hns3: Fix an error of total drop packet statistics

The dropped tx/rx packets number of each tqp should also
be counted into the total drop tx/rx packets numbers.

Fixes: 76ad4f0ee74 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Mask the packet statistics query when NIC is down
Jian Shen [Fri, 5 Jan 2018 10:18:11 +0000 (18:18 +0800)]
net: hns3: Mask the packet statistics query when NIC is down

Update the HNS3_NIC_STATE_DOWN bit when NIC state changes.
When NIC is down, mask the packet statistics for querying
with ifconfig command. It's a common practice.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Modify the update period of packet statistics
Jian Shen [Fri, 5 Jan 2018 10:18:10 +0000 (18:18 +0800)]
net: hns3: Modify the update period of packet statistics

It takes more than 200 query response messages between
driver and IMP, while updating the packet statistics.
It's too heavy for IMP to update it per second.

Extend the update period of packet statistics data from
1 second to 300 seconds(if too long, the statistics may
overflow).

As a result, we need to update it while querying with
ifconfig tool to keep the statistics data fresh.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Remove repeat statistic of rx_errors
Jian Shen [Fri, 5 Jan 2018 10:18:09 +0000 (18:18 +0800)]
net: hns3: Remove repeat statistic of rx_errors

The igu_rx_err_pkt indicates the same error with
mac_rx_fcs_err_pkt_num, so remove it.

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Fix spelling errors
Jian Shen [Fri, 5 Jan 2018 10:18:08 +0000 (18:18 +0800)]
net: hns3: Fix spelling errors

Fix spelling error "overrsize" --> "oversize".

Fixes: 46a3df9f9718 ("net: hns3: Add HNS3 Acceleration Engine & Compatibility Layer Support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Unify the strings display of packet statistics
Jian Shen [Fri, 5 Jan 2018 10:18:07 +0000 (18:18 +0800)]
net: hns3: Unify the strings display of packet statistics

Some members of packet statistics are named in different styles.
This patch unifies them with new internal name rules, the main
modification are below:
trans --> tx
rcv --> rx
rcb_q%d_tx -->  txq#%d
rcb_q%d_rx -->  rxq#%d
sw_err_cnt(tx side) --> tx_dropped
sw_err_cnt(rx side) --> rx_dropped
pkts --> packets
tx_err_cnt --> errors
rx_err_cnt --> errors

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Disable VFs change rxvlan offload status
Jian Shen [Fri, 5 Jan 2018 10:18:06 +0000 (18:18 +0800)]
net: hns3: Disable VFs change rxvlan offload status

Rxvlan offload status can only be changed by PF. Initialize
the value of NETIF_F_HW_VLAN_CTAG_RX bit of hw_features for
VFS to false, make sure user can't be able to change it.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: hns3: Add ethtool interface for vlan filter
Jian Shen [Fri, 5 Jan 2018 10:18:05 +0000 (18:18 +0800)]
net: hns3: Add ethtool interface for vlan filter

This patch adds vlan filter enable switch to
support ethtool -K ethX rx-vlan-filter on/off.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agoMerge branch 'net-qualcomm-rmnet-Enable-csum-offloads'
David S. Miller [Mon, 8 Jan 2018 18:58:50 +0000 (13:58 -0500)]
Merge branch 'net-qualcomm-rmnet-Enable-csum-offloads'

Subash Abhinov Kasiviswanathan says:

====================
net: qualcomm: rmnet: Enable csum offloads

This series introduces the MAPv4 packet format for checksum
offload plus some other minor changes.

Patches 1-3 are cleanups.

Patch 4 renames the ingress format to data format so that all data
formats can be configured using this going forward.

Patch 5 uses the pacing helper to improve TCP transmit performance.

Patch 6-9 defines the the MAPv4 for checksum offload for RX and TX.
A new header and trailer format are used as part of MAPv4.
For RX checksum offload, only the 1's complement of the IP payload
portion is computed by hardware. The meta data from RX header is
used to verify the checksum field in the packet. Note that the
IP packet and its field itself is not modified by hardware.
This gives metadata to help with the RX checksum. For TX, the
required metadata is filled up so hardware can compute the
checksum.

Patch 10 enables GSO on rmnet devices

v1->v2: Fix sparse errors reported by kbuild test robot

v2->v3: Update the commit message for Patch 5 based on Eric's comments
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qualcomm: rmnet: Add support for GSO
Subash Abhinov Kasiviswanathan [Sun, 7 Jan 2018 18:36:40 +0000 (11:36 -0700)]
net: qualcomm: rmnet: Add support for GSO

Real devices may support scatter gather(SG), so enable SG on rmnet
devices to use GSO. GSO reduces CPU cycles by 20% for a rate of
146Mpbs for a single stream TCP connection.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qualcomm: rmnet: Add support for TX checksum offload
Subash Abhinov Kasiviswanathan [Sun, 7 Jan 2018 18:36:39 +0000 (11:36 -0700)]
net: qualcomm: rmnet: Add support for TX checksum offload

TX checksum offload applies to TCP / UDP packets which are not
fragmented using the MAPv4 checksum trailer. The following needs to be
done to have checksum computed in hardware -

1. Set the checksum start offset and inset offset.
2. Set the csum_enabled bit
3. Compute and set 1's complement of partial checksum field in
   transport header.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qualcomm: rmnet: Handle command packets with checksum trailer
Subash Abhinov Kasiviswanathan [Sun, 7 Jan 2018 18:36:38 +0000 (11:36 -0700)]
net: qualcomm: rmnet: Handle command packets with checksum trailer

When using the MAPv4 packet format in conjunction with MAP commands,
a dummy DL checksum trailer will be appended to the packet. Before
this packet is sent out as an ACK, the DL checksum trailer needs to be
removed.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
6 years agonet: qualcomm: rmnet: Add support for RX checksum offload
Subash Abhinov Kasiviswanathan [Sun, 7 Jan 2018 18:36:37 +0000 (11:36 -0700)]
net: qualcomm: rmnet: Add support for RX checksum offload

When using the MAPv4 packet format, receive checksum offload can be
enabled in hardware. The checksum computation over pseudo header is
not offloaded but the rest of the checksum computation over
the payload is offloaded. This applies only for TCP / UDP packets
which are not fragmented.

rmnet validates the TCP/UDP checksum for the packet using the checksum
from the checksum trailer added to the packet by hardware. The
validation performed is as following -

1. Perform 1's complement over the checksum value from the trailer
2. Compute 1's complement checksum over IPv4 / IPv6 header and
   subtracts it from the value from step 1
3. Computes 1's complement checksum over IPv4 / IPv6 pseudo header and
   adds it to the value from step 2
4. Subtracts the checksum value from the TCP / UDP header from the
   value from step 3.
5. Compares the value from step 4 to the checksum value from the
   TCP / UDP header.
6. If the comparison in step 5 succeeds, CHECKSUM_UNNECESSARY is set
   and the packet is passed on to network stack. If there is a
   failure, then the packet is passed on as such without modifying
   the ip_summed field.

The checksum field is also checked for UDP checksum 0 as per RFC 768
and for unexpected TCP checksum of 0.

If checksum offload is disabled when using MAPv4 packet format in
receive path, the packet is queued as is to network stack without
the validations above.

Signed-off-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: David S. Miller <davem@davemloft.net>