git.proxmox.com Git - mirror

tc/pedit: p_eth: ETH header editor

For example, forward tcp traffic to veth0 and set
destination mac address to 11:22:33:44:55:66 :
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      eth dst set 11:22:33:44:55:66 \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>

tc/pedit: Support fields bigger than 32 bits

Make parse_val() accept fields up to 128 bits long, this should be
enough for current use cases and involves a minimal change to code.

Signed-off-by: Amir Vadai <amir@vadai.me>

tc/pedit: p_ip: introduce editing ttl header

Enable user to edit IP header ttl field.

For example, to forward any TCP packet and decrease its TTL by one:
$ tc filter add dev enp0s9 protocol ip parent ffff: \
    flower \
      ip_proto tcp \
    action pedit ex munge \
      ip ttl add 0xff pipe \
    action mirred egress \
      redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>

tc/pedit: Introduce 'add' operation

This command could be useful to increase/decrease fields value.

Signed-off-by: Amir Vadai <amir@vadai.me>

tc/pedit: Extend pedit to specify offset relative to mac/transport headers

Utilize the extended pedit netlink to set an offset relative to a
specific header type. Old netlink only enabled the user to set
approximated  offset relative to the IPv4 header.

To use this extended functionality need to use the 'ex' keyword after
'pedit' and before any 'munge'.
e.g:
$ tc filter add dev ens9 protocol ip parent ffff: \
    flower \
      ip_proto udp \
      dst_port 80 \
    action pedit ex munge \
      ip dst set 1.1.1.1 \
      pipe \
    action mirred egress redirect dev veth0

Signed-off-by: Amir Vadai <amir@vadai.me>

tc/pedit: Fix a typo in pedit usage message

Signed-off-by: Amir Vadai <amir@vadai.me>

iplink: whitespace cleanup

Break lines to conform to 80 col guideline.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: add support for IFLA_CARRIER attribute

Add support to set IFLA_CARRIER attribute.

Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com>

routel: fix infinite loop in line parser

As noticed by one of the few users of routel script, it ends up in an
infinite loop when they pull out the cable from the NIC used for some
route. This is caused by its parser expecting the line of "ip route show"
output consists of "key value" pairs (except for the initial target range),
together with an old trap of Bourne style shells that "shift 2" does
nothing if there is only one argument left. Some keywords, e.g. "linkdown",
are not followed by a value.

Improve the parser to

(1) only set variables for keywords we care about
(2) recognize (currently) known keywords without value

This is still far from perfect (and certainly not future proof) but to
fully fix the script, one would probably have to rewrite the logic
completely (and I'm not sure it's worth the effort).

Signed-off-by: Michal Kubecek <mkubecek@suse.cz>

man: ip-rule.8: Further clarify how to interpret priority value

Despite the past changes, users seemed to get confused by the seemingly
contradictory relation of priority value and actual rule priority.

Signed-off-by: Phil Sutter <phil@nwl.cc>

gre6: fix copy/paste bugs in GREv6 attribute manipulation

Fixes: af89576d7a8c("iproute2: GRE over IPv6 tunnel support.")
Signed-off-by: Craig Gallek <kraig@google.com>

actions: Add support for user cookies

Make use of 128b user cookies

Introduce optional 128-bit action cookie.
Like all other cookie schemes in the networking world (eg in protocols
like http or existing kernel fib protocol field, etc) the idea is to
save user state that when retrieved serves as a correlator. The kernel
_should not_ intepret it. The user can store whatever they wish in the
128 bits.

Sample exercise(showing variable length use of cookie)

.. create an accept action with cookie a1b2c3d4
sudo $TC actions add action ok index 1 cookie a1b2c3d4

.. dump all gact actions..
sudo $TC -s actions ls action gact

    action order 0: gact action pass
     random type none pass val 0
     index 1 ref 1 bind 0 installed 5 sec used 5 sec
    Action statistics:
    Sent 0 bytes 0 pkt (dropped 0, overlimits 0 requeues 0)
    backlog 0b 0p requeues 0
    cookie a1b2c3d4

.. bind the accept action to a filter..
sudo $TC filter add dev lo parent ffff: protocol ip prio 1 \
u32 match ip dst 127.0.0.1/32 flowid 1:1 action gact index 1

... send some traffic..
$ ping 127.0.0.1 -c 3
PING 127.0.0.1 (127.0.0.1) 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.020 ms
64 bytes from 127.0.0.1: icmp_seq=2 ttl=64 time=0.027 ms
64 bytes from 127.0.0.1: icmp_seq=3 ttl=64 time=0.038 ms

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>

ip vrf: Add command name next to pid

'ip vrf pids' is used to list processes bound to a vrf, but it only
shows the pid leaving a lot of work for the user. Add the command
name to the output. With this patch you get the more user friendly:

    $ ip vrf pids mgmt
     1121  ntpd
     1418  gdm-session-wor
     1488  gnome-session
     1491  dbus-launch
     1492  dbus-daemon
     1565  sshd
     ...

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

netem: fix out of bounds access in maketable

The maketable program used to generate one of the configuration
files at build time for netem would access past the end of the array
for one input value. This is a bug inherited from original NISTnet.
Just fold the value, like other code there.

This is not a runtime error security problem.
It only impacts the build process if the build machine
had extra hardening enabled.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

ip-route: Prevent some other double spaces in output

Print spaces only after text.

CC: Phil Sutter <phil@nwl.cc>
Signed-off-by: Timothy Redaelli <tredaelli@redhat.com>
Acked-by: Phil Sutter <phil@nwl.cc>

man: ip-link: Specify min/max values for bridge slave priority and cost

The values are parsed as u16/u32, but kernel limits allowed values.

Signed-off-by: Phil Sutter <phil@nwl.cc>

ip: link: Add missing link type help texts

These are basically stubs: The types which lacked their own help text
simply don't accept any options (yet). Still it might be a bit confusing
to users if they are presented with the generic 'ip link' help text
instead of something saying there are no type specific options.

Signed-off-by: Phil Sutter <phil@nwl.cc>

ip: link: Unify link type help functions a bit

Take help function in iplink_bridge.c as an example and make other link
types' help functions similar:

* Use a single fprintf() call (if possible).
* Don't state a full command line, just "... type OPTIONS".
* Put every option in it's own line, align options by column.
* List mandatory options first.

link_veth.c is intentionally left untouched because it's 'peer' option
eats all kinds of generic link options and the help text points this out
without duplicating all the options there again.

Signed-off-by: Phil Sutter <phil@nwl.cc>

ip: link: macvlan: Add newline to help output

A newline between synopsis and variable definition looks nice and is
consistent with others.

Signed-off-by: Phil Sutter <phil@nwl.cc>

ip: link: bond: Fix whitespace in help text

Signed-off-by: Phil Sutter <phil@nwl.cc>

man: ip-link.8: document bridge options

Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>

tc: print skbedit action when dumping actions.

Signed-off-by: Roman Mashak <mrv@mojatatu.com>

man: fix man page warnings

While generating PDFs from the man pages, I saw the warning below from
several files. Compared the tc-matchall.8 with bridge.8 and used .RI
instead of .R. It should have no effect on the man page rendering.

`R' is a string (producing the registered sign), not a macro.

Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>

update headers from 4.11-rc3

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

vxlan: use preferred address family when neither group or remote is specified

When neither group or remote is specified (or if they are specified with
the any address), nothing is sent to the kernel. In this case, the
kernel defaults to IPv4. This makes impossible to use IPv6 with
unspecified unicast remote ("bridge fdb add" will return
EAFNOTSUPPORT).

If the user specifies a preferred address family (eg, "ip -6 link add"),
then send either IFLA_VXLAN_GROUP or IFLA_VXLAN_GROUP6 to enforce the
use of the appropriate family.

Signed-off-by: Vincent Bernat <vincent@bernat.im>

ip route: Add missing space between nexthop and via for mpls multipath routes

MPLS multipath routes are missing a space between 'nexthop' and 'via':

$ ip -net ns1 -f mpls ro ls
100
nexthopvia inet 172.16.2.2 dev virt12
nexthopvia inet 172.16.3.2 dev br0

Add it.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

man: add examples to ip.8

Having some examples in the top level man page might make it a little bit easier
for new users to get started. Reused some words / sentences from the existing
man pages.

Suggested-by: 積丹尼 Dan Jacobson <jidanni@jidanni.org>
Signed-off-by: Alexander Alemayhu <alexander@alemayhu.com>

update headers from 4.11-rc2

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

man: Fix formatting of vrf parameter of ip-link show command

Add missing opening " [" for the vrf parameter.

Signed-off-by: Robert Shearman <rshearma@brocade.com>

pie: remove always false condition

When built with GCC warnings enabled:
q_pie.c: In function ‘pie_parse_opt’:
q_pie.c:78:38: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (alpha > ALPHA_MAX) || (alpha < ALPHA_MIN)) {
                                      ^
q_pie.c:85:35: warning: comparison of unsigned expression < 0 is always false [-Wtype-limits]
        (beta > BETA_MAX) || (beta < BETA_MIN)) {
                                   ^

This is because MIN is 0 and unsigned number can never be less than 0.
Therefore just remove the _MIN values.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iplink: add support for afstats subcommand

Add support for new afstats subcommand. This uses the new
IFLA_STATS_AF_SPEC attribute of RTM_GETSTATS messages to show
per-device, AF-specific stats. At the moment the kernel only supports
MPLS AF stats, so that is all that's implemented here.

The print_num function is exposed from ipaddress.c to be used for
printing the new stats so that the human-readable option, if set, can
be respected.

Example of use:

    $ ./ip/ip -f mpls link afstats dev eth1
    3: eth1
        mpls:
            RX: bytes  packets  errors  dropped  noroute
            9016       98       0       0        0
            TX: bytes  packets  errors  dropped
            7232       113      0       0

Signed-off-by: Robert Shearman <rshearma@brocade.com>

man: ss.8: Add missing protocols to description of -A

The list was missing dccp and sctp protocols.

Signed-off-by: Phil Sutter <phil@nwl.cc>

devlink: Add json and pretty options to help and man

While at it also fixed missing double dash for long opts.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

bpf: test for valid type in bpf_get_work_dir

Jan-Erik reported an assertion in bpf_prog_to_subdir() failed where
type was BPF_PROG_TYPE_UNSPEC, which is only used in bpf_init_env()
to auto-mount and cache the bpf fs mount point.

Therefore, make sure when bpf_init_env() is called multiple times
(f.e. eBPF classifier with eBPF action attached) and bpf_mnt_cached
is set already that the type is also valid. In bpf_init_env(), we're
only interested in the mount point and not a type-specific subdir.

Fixes: e42256699cac ("bpf: make tc's bpf loader generic and move into lib")
Reported-by: Jan-Erik Rediger <janerik@rediger.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

color: use "light" colors for dark background

COLORFGBG environment variable is used to detect dark background.

Idea and a bit of code is borrowed from Vim, thanks.

Signed-off-by: Petr Vorel <pvorel@suse.cz>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

bpf: remove unnecessary cast

No need to cast RTA_DATA

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: use rta_getattr_u32

Don't cast RTA_DATA use newish accessors.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

xfrm: remove unnecessary casts

Since RTA_DATA() returns void * no need to cast it.

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

iproute2: tc: introduce build dependency on libnetlink

Rebuilding libnetlink doesn't trigger rebuild of tc, which is wrong
(especially so for builds where libnetlink.a gets statically linked into
tc). Fix that by introducing an explicit dependency.

Signed-off-by: Jiri Kosina <jkosina@suse.cz>

netlink route attribute cleanup

Use the new helper functions rta_getattr_u* instead of direct
cast of RTA_DATA(). Where RTA_DATA() is a structure, then remove
the unnecessary cast since RTA_DATA() is void *

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

{f,m}_bpf: dump tag over insns

We already export TCA_BPF_TAG resp. TCA_ACT_BPF_TAG from kernel commit
f1f7714ea51c ("bpf: rework prog_digest into prog_tag"), thus also dump
it when filter/actions are shown.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

tc: flower: Fix parsing ip address

Fix order of arguments when passed to __flower_parse_ip_addr.

Fixes: ("f888f4e20534 tc: flower: Support matching ARP")
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Paul Blakey <paulb@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>

ip: Add support for MPLS netconf

Add support for MPLS netconf to ip monitor and ip netconf commands.
Changes to header files not included as those are typically pulled
in my a header sync with the kernel.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Update headers based on 4.11 merge window

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

update headers from net-next

updated sctp.h

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

add missing iplink_xstats.c

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into net-next

v4.10.0

devlink: use DEVLINK_CMD_ESWITCH_* instead of DEVLINK_CMD_ESWITCH_MODE_*

Sync with kernel and don't use the obsolete enum values.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>

iplink: bridge_slave: add support for displaying xstats

This patch adds support to the bridge_slave link type for displaying
xstats by reusing the previously added bridge xstats callbacks.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

iplink: bridge: add support for displaying xstats

Add support for the new parse/print_ifla_xstats callbacks and use them to
print the per-bridge multicast stats.
Example:
$ ip link xstats type bridge
br0
                    IGMP queries:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP reports:
                      RX: v1 0 v2 0 v3 0
                      TX: v1 0 v2 0 v3 0
                    IGMP leaves: RX: 0 TX: 0
                    IGMP parse errors: 0
                    MLD queries:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD reports:
                      RX: v1 0 v2 0
                      TX: v1 0 v2 0
                    MLD leaves: RX: 0 TX: 0
                    MLD parse errors: 0

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>

iplink: add support for xstats subcommand

This patch adds support for a new xstats link subcommand which uses the
specified link type's new parse/print_ifla_xstats callbacks to display
extended statistics.

Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into net-next

devlink: Call dl_free in early exit case

Prior to parsing command options, the devlink tool allocates memory
to store results. In case of early exit (wrong parameters or version
check), this memory wasn't freed.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>

man page: add page for skbmod action

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Roman Mashak <mrv@mojatatu.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>

Merge branch 'master' into net-next

utils: hex2mem get rid of unnecessary goto

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into net-next

update headers from 4.10-rc8

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'merge-4.10' of /tmp/iproute2

Merge branch 'merge-4.10' into next-merge

ip vrf: Detect invalid vrf name in pids command

Verify VRF name is valid before attempting to read cgroups files.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

ip vrf: Handle VRF nesting in namespace

Since cgroups are not namespace aware, the directory heirarchy used by
ip vrf should account for network namespaces. In this case, change the
path from CGRP/BASE/vrf/NAME to CGRP/BASE/NETNS/vrf/NAME where CGRP is
the cgroup2 mount path, BASE in any base heirarchy inherited before VRF
is applied and NAME is the VRF name.

The intent is as follows: a user logs into the box into some namespace
with a name known to iproute2. Some other policy may have put the
process into a BASE heirarchy. From there the user executes a task in
a VRF and in doing so the task heirarchy becomes CGRP/BASE/NETNS/vrf/NAME.
The namespace level is omitted for the default namespace.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

ip netns: refactor netns_identify

Move guts of netns_identify into a standalone function that returns
the netns name in a given buffer.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

ip vrf: Handle vrf in a cgroup hierarchy

Add support for VRF in a pre-existing hierarchy. For example, if the
current process is running in CGRP/foo/bar, the 'ip vrf exec NAME CMD'
should run CMD in the cgroup CGRP/foo/bar/vrf/NAME.

When listing process ids in a VRF, search for the directory vrf/NAME
regardless of base path (foo/bar/vrf/NAME and vrf/NAME) are still
running against the same vrf NAME.

Reported-by: Andy Lutomirski <luto@amacapital.net>
Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

Merge branch 'merge-4.10' into next-merge

tc: flower: support masked ICMP code and type match

Extend ICMP code and type match to support masks.

Also add missing documentation to synopsis in manpage.

tc qdisc add dev eth0 ingress
tc filter add dev eth0 protocol ipv6 parent ffff: flower \
indev eth0 ip_proto icmpv6 type 128/240 code 0 action drop

Signed-off-by: Simon Horman <simon.horman@netronome.com>

tc: flower: provide generic masked u8 print helper

Provide generic masked u8 print helper and use it to print arp operations.

Also:
* Make name parameter of arp op print helper const.
* Consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

tc: flower: provide generic masked u8 parser helper

Provide generic masked u8 paser helper and use it to parse arp operations.

Also consistently use __u8 rather than uint8_t, in keeping with the
pervasive style in the file.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

update headers from net-next

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into next-merge

testsuite: search for kernel config in /boot

Add support for finding the kernel config in Debian
and derivatives.

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>

testsuite: refactor kernel config search

Signed-off-by: Asbjørn Sloth Tønnesen <asbjorn@asbjorn.st>

tc: matchall: Print skip flags when dumping a filter

Print the skip flags when we dump a filter.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked by: Yotam Gigi <yotamg@mellanox.com>
Reviewed-by: Simon Horman <simon.horman@netronome.com>

ip route: Make name of protocol 0 consistent

iproute2 can inconsistently show the name of protocol 0 if a route with
a custom protocol is added. For example:
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto none  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto none  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto none  metric 0  pref medium

protocol 0 is pretty printed as "none". Add a route with a custom protocol:
  dsa@cartman:~$ sudo ip -6 ro add  2001:db8:200::1/128 dev eth0 proto 123

And now display has switched from "none" to "unspec":
  dsa@cartman:~$ ip -6 ro ls table all | egrep 'proto none|proto unspec'
  local ::1 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::225:90ff:fecb:1c18 dev lo  table local  proto unspec  metric 0  pref medium
  local fe80::92e2:baff:fe5c:da5d dev lo  table local  proto unspec  metric 0  pref medium

The rt_protos file has the id to name mapping as "unspec" while
rtnl_rtprot_tab[0] has "none". The presence of a custom protocol id
triggers reading the rt_protos file and overwriting the string in
rtnl_rtprot_tab. All of this is logic from 2004 and earlier.

Update rtnl_rtprot_tab to "unspec" to match the enum value.

Signed-off-by: David Ahern <dsa@cumulusnetworks.com>

man: ip-link.8: Document bridge_slave fdb_flush option

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

testsuite: Search kernel config in modules dir also

At least in Fedora there is no /proc/config.gz but instead
/lib/modules/`uname -r`/config, so use that as a fallback.

Signed-off-by: Phil Sutter <phil@nwl.cc>

testsuite: Generate nlmsg blob at runtime

Since netlink messages are in host byte order, shipping a pre-generated
nlmsg blob won't suffice on systems with different endianness. Therefore
generate the blob at runtime, so it's content fits the hosts endianness.

Note that the generated message will contain only a single interface
featuring two VFs instead of the full list before. Yet this is
sufficient, as it triggers the crash with iproute versions prior to
commit 8c29ae7cc2494 ("ip link: Fix crash on older kernels when show VF
dev").

Signed-off-by: Phil Sutter <phil@nwl.cc>

tc: flower: Update documentation to indicate ARP takes IPv4 prefixes

Unlike other PREFIXes documented in the usage for tc flower, which accept
both IPv4 and IPv6 prefixes, arp_sip and arp_tip only accepts IPv4
prefixes.

Signed-off-by: Simon Horman <simon.horman@netronome.com>

tc: flower: use correct type when calling flower_icmp_attr_type

Use enum flower_icmp_field rather than bool as type of third parameter
when calling flower_icmp_attr_type.

Fixes: eb3b5696f163 ("tc: flower: support matching on ICMP type and code")
Signed-off-by: Simon Horman <simon.horman@netronome.com>

man: ip-link.8: Document bridge_slave fdb_flush option

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>

tc: add missing sample file

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

update headers from bridge tunnel metadata

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

tc: bash-completion: Add support for matchall

Add support for the matchall classifier and its parameters.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: bash-completion: Add support for filter actions

Previously, the autocomplete routine did not complete actions after a
filter keyword, for example:

$ tc filter add dev eth0 u32 [...] action <TAB>

did not suggest the actions list, and:

$ tc filter add dev eth0 u32 [...] action mirred <TAB>

did not suggest the specific mirred parameters. Add the support for this
kind of completion by adding the _tc_filter_action_options routine and
invoking it from inside _tc_filter_options.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: bash-completion: Make the *_KIND variables global

The QDISC_KIND, FILTER_KIND, ACTION_KIND variables may be used by other
routines, thus make them global variables.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: bash-completion: Prepare action autocomplete to support several actions

The action autocomplete routine (_tc_action_options) currently does not
support several actions statements in one tc command line as it uses the
_tc_once_attr and _tc_one_from_list.

For example, in that case:

$ tc filter add dev eth0 handle ffff: u32 [...]  \
   action sample group 5 rate 12 \
   action sample <TAB>

the _tc_once_attr function, when invoked with "group rate" will not
suggest those as they already exist on the command line.

Fix the function to use the _from variant, thus allowing each action
autocomplete start from the action keyword, and not from the beginning of
the command line.

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: bash-completion: Add the _from variant to _tc_one* funcs

The _tc_one_of_list and _tc_once_attr functions simplfy the bash
completion task by validating each attr exist only once on the command
line.

For example, for the command line:

$ a b c d e

and the call to _tc_once_attr with "a f g", the function will suggest
"f g" as "a" existed in the command line in args 0.

Add the _from variant to those functions, which allows having the command
line option once from a specified index. In the previous example, calling
_tc_once_attr with 4 and "a f g" will suggest "a f g".

Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: man: matchall: Update examples to include sample

Add an example of packet sampling to the tc-matchall man page examples
section. The example uses the matchall classifier and the sample action to
create packet sampling on a port.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: man: Add man entry for the tc-sample action

In addition to general information about the tc action, the man entry
contains common usage examples and information about the tlv fields packed
within each sampled packet.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

tc: Add support for the sample tc action

The sample tc action allows sampling packets matching a classifier. It
peeks randomly packets, and samples them using the psample netlink
channel. The user can specify the psample group, which the packet will be
sampled to, the sampling rate and the packet truncation (to save
kernel-user traffic).

The sampled packets contain informative metadata, for example, the input
interface and the original packet length.

The action syntax:
tc filter add [...] \
action sample rate <RATE> group <GROUP> [trunc <SIZE>]
[...]

Where:
  RATE := The sampling rate which is the ratio of packets observed at the
  data source to the samples generated
  GROUP := the psample module sampling group
  SIZE := optional truncation size

An example for a common usecase of the sample tc action: to sample ingress
traffic from interface eth1, one may use the commands:

tc qdisc add dev eth1 handle ffff: ingress

tc filter add dev eth1 parent ffff: \
       matchall action sample rate 12 group 4

Where the first command adds an ingress qdisc and the second starts
sampling randomly with an average of one sampled packet per 12 packets
on dev eth1 to psample group 4.

Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Yotam Gigi <yotamg@mellanox.com>

Merge branch 'master' into net-next

tcp: header file update

Signed-off-by: Stephen Hemminger <stephen@networkplumber.org>

Merge branch 'master' into net-next

man: ip-route.8: Fix 'expires' indenting

Descriptions of each route sub-command's arguments are enclosed in
.RS/.RE pairs. For 'replace' sub-command, '.RE' was incorrectly put
before the last argument ('expires').

Fixes: 3fbe7ca847367 ("iproute2: ip-route.8.in: Add expires option for ip route")
Signed-off-by: Phil Sutter <phil@nwl.cc>

ss: print tcpi_rcv_mss and tcpi_advmss

tcpi_rcv_mss and tcpi_advmss tcp info fields were not yet reported
by ss.

While adding GRO support to packetdrill, I found this was useful.

Signed-off-by: Eric Dumazet <edumazet@google.com>

ip: HSR: Fix cut and paste error

Fixes: 5c0aec93a516 ("ip: Add HSR support")
Signed-off-by: Ralf Baechle <ralf@linux-mips.org>

ifstat: Add xstat to ifstat man page

Add documentation about the extended statistics to the ifstat man page.
Add ifstat man age to the man8 Makefile

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>

ifstat: Add "sw only" extended statistics to ifstat

Add support for extended statistics of SW only type, for counting only the
packets that went via the cpu. (useful for systems with forward
offloading). It reads it from filter type IFLA_STATS_LINK_OFFLOAD_XSTATS
and sub type IFLA_OFFLOAD_XSTATS_CPU_HIT.

It is under the name 'cpu_hits'
(or any shorten of it as 'cpu' or simply 'c')

For example:
ifstat -x c

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>

ifstat: Add extended statistics to ifstat

Extended stats are part of the RTM_GETSTATS method. This patch adds them
to ifstat.
While extended stats can come in many forms, we support only the
rtnl_link_stats64 struct for them (which is the 64 bits version of struct
rtnl_link_stats).
We support stats in the main nesting level, or one lower.
The extension can be called by its name or any shorten of it. If there is
more than one matched, the first one will be picked.

To get the extended stats the flag -x <stats type> is used.

Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>