Daniel Borkmann [Wed, 1 Apr 2015 15:57:44 +0000 (17:57 +0200)]
tc, bpf: finalize eBPF support for cls and act front-end
This work finalizes both eBPF front-ends for the classifier and action
part in tc, it allows for custom ELF section selection, a simplified tc
command frontend (while keeping compat), reusing of common maps between
classifier and actions residing in the same object file, and exporting
of all map fds to an eBPF agent for handing off further control in user
space.
It also adds an extensive example of how eBPF can be used, and a minimal
self-contained example agent that dumps map data. The example is well
documented and hopefully provides a good starting point into programming
cls_bpf and act_bpf.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexei Starovoitov <ast@plumgrid.com> Cc: Jiri Pirko <jiri@resnulli.us> Cc: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Alexei Starovoitov <ast@plumgrid.com> Acked-by: Thomas Graf <tgraf@suug.ch> Acked-by: Jiri Pirko <jiri@resnulli.us> Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
add a new command to configure the SPD hash table:
ip xfrm policy set [ hthresh4 LBITS RBITS ] [ hthresh6 LBITS RBITS ]
and code to display the SPD hash configuration:
ip -s -s xfrm policy count
hthresh4: defines minimum local and remote IPv4 prefix lengths of
selectors to hash a policy. If prefix lengths are greater or equal
to the thresholds, then the policy is hashed, otherwise it falls back
in the policy_inexact chained list.
hthresh6: defines minimum local and remote IPv6 prefix lengths of
selectors to hash a policy, otherwise it falls back
in the policy_inexact chained list.
Example:
% ip -s -s xfrm policy count
SPD IN 0 OUT 0 FWD 0 (Sock: IN 0 OUT 0 FWD 0)
SPD buckets: count 7 Max 1048576
SPD IPv4 thresholds: local 32 remote 32
SPD IPv6 thresholds: local 128 remote 128
% ip xfrm pol set hthresh4 24 16 hthresh6 64 56
% ip -s -s xfrm policy count
SPD IN 0 OUT 0 FWD 0 (Sock: IN 0 OUT 0 FWD 0)
SPD buckets: count 7 Max 1048576
SPD IPv4 thresholds: local 24 remote 16
SPD IPv6 thresholds: local 64 remote 56
- Pull in the uapi mpls.h
- Update rtnetlink.h to include the mpls rtnetlink notification multicast group.
- Define AF_MPLS in utils.h if it is not defined from elsewhere
as is done with AF_DECnet
The address syntax for multiple mpls labels is a complete invention.
When I looked there seemed to be no wide spread convention for talking
about an mpls label stack in text for. Sometimes people did:
"{ Label1, Label2, Label3 }", sometimes people would do:
"[ label3, label2, label1 ]", and most of the time label
stacks were not explicitly shown at all.
The syntax I wound up using, so it would not have spaces and so it
would visually distinct from other kinds of addresses is.
label1/label2/label3 Where label1 is the label at the top of the label
stack and label3 is the label at the bottom on the label stack.
When there is a single label this matches what seems to be convention
with other tools. Just print out the numeric value of the mpls label.
The netlink protocol for labels uses the on the wire format for a
label stack. The ttl and traffic class are expected to be 0. Using
the on the wire format is common and what happens with other address
types. BGP when passing label stacks also uses this technique with the
exception that the ttl byte is not included making each label in a BGP
label stack 3 bytes instead of 4.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
This attribute is like RTA_DST except it specifies the destination
address to place on a packet when it leaves the host. For ip based
protocols this is destination NAT and not a common part of forwarding.
For protocols like MPLS label swapping is something that typically
happens on every hop.
There is likely to be a RTA_NEWSRC at some point so RTA_NEWDST
is printed as "as to" and can be specified either as "as to"
or just "as"
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Daniel Borkmann [Mon, 16 Mar 2015 18:37:41 +0000 (19:37 +0100)]
tc: add eBPF support to f_bpf
This work adds the tc frontend for kernel commit e2e9b6541dd4 ("cls_bpf:
add initial eBPF support for programmable classifiers").
A C-like classifier program (f.e. see e2e9b6541dd4) is being compiled via
LLVM's eBPF backend into an ELF file, that is then being passed to tc. tc
then loads, if any, eBPF maps and eBPF opcodes (with fixed-up eBPF map file
descriptors) out of its dedicated sections, and via bpf(2) into the kernel
and then the resulting fd via netlink down to cls_bpf. cls_bpf allows for
annotations, currently, I've used the file name for that, so that the user
can easily identify his filter when dumping configurations back.
tc filter show dev em1 [...]
filter parent 1: protocol all pref 49152 bpf handle 0x1 flowid x:y cls.o
I placed the parser bits derived from Alexei's kernel sample, into tc_bpf.c
as my next step is to also add the same support for BPF action, so we can
have a fully fledged eBPF classifier and action in tc.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@plumgrid.com>
Madhu Challa [Wed, 4 Mar 2015 18:30:10 +0000 (10:30 -0800)]
ip: enable configuring multicast group autojoin
Joining multicast group on ethernet level via "ip maddr" command would
not work if we have an Ethernet switch that does igmp snooping since
the switch would not replicate multicast packets on ports that did not
have IGMP reports for the multicast addresses.
Linux vxlan interfaces created via "ip link add vxlan" have the group option
that enables then to do the required join.
By extending ip address command with option "autojoin" we can get similar
functionality for openvswitch vxlan interfaces as well as other tunneling
mechanisms that need to receive multicast traffic.
example:
ip address add 224.1.1.10/24 dev eth5 autojoin
ip address del 224.1.1.10/24 dev eth5
Scott Feldman [Sun, 8 Mar 2015 06:15:35 +0000 (22:15 -0800)]
route: label externally offloaded routes
On ip route print dump, label externally offloaded routes with "external".
Offloaded routes are flagged with RTNH_F_EXTERNAL, a recent additon to
net-next. For example:
$ ip route
default via 192.168.0.2 dev eth0
11.0.0.0/30 dev swp1 proto kernel scope link src 11.0.0.2 external
11.0.0.4/30 via 11.0.0.1 dev swp1 proto zebra metric 20 external
11.0.0.8/30 dev swp2 proto kernel scope link src 11.0.0.10 external
11.0.0.12/30 via 11.0.0.9 dev swp2 proto zebra metric 20 external
12.0.0.2 proto zebra metric 30 external
nexthop via 11.0.0.1 dev swp1 weight 1
nexthop via 11.0.0.9 dev swp2 weight 1
12.0.0.3 via 11.0.0.1 dev swp1 proto zebra metric 20 external
12.0.0.4 via 11.0.0.9 dev swp2 proto zebra metric 20 external
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.15
Signed-off-by: Scott Feldman <sfeldma@gmail.com> Reviewed-by: Jiri Pirko <jiri@resnulli.us>
Daniel Borkmann [Wed, 18 Mar 2015 09:13:34 +0000 (10:13 +0100)]
tc: m_bpf: fix next arg selection after tc opcode
Next argument after the tc opcode/verdict is optional, using NEXT_ARG()
requires to have another argument after that one otherwise tc will bail
out. Therefore, we need to advance to the next argument manually as done
elsewhere.
Fixes: 86ab59a6660f ("tc: add support for BPF based actions") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Pirko <jiri@resnulli.us>
Roopa Prabhu [Wed, 18 Mar 2015 02:18:28 +0000 (19:18 -0700)]
lib utils: fix family during af_bit_len calculation
commit f3a2ddc124e0 ("lib utils: Use helpers to get AF bit/byte len")
used a wrong family or family of zero in the default case
during af_bit_len calculation causing ip route commands to
fail with below error
Error: an inet prefix is expected rather than "10.0.2.14/24".
Mark Einon [Mon, 16 Mar 2015 09:59:09 +0000 (09:59 +0000)]
ip: Make uniform the use of synonyms list, show and lst
Where used in the ip tool, the 'show' option always has the synonyms
'list' and 'lst', except for ip-token and ip-addrlabel, which are missing
'lst'. Add this as a synonym for these commands.
Vadim Kochan [Sat, 7 Mar 2015 06:30:58 +0000 (08:30 +0200)]
ip netns: Fix rtnl error while print netns list
Observed on the Linux 3.18:
# ip netns
RTNETLINK answers: Operation not supported
net0
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> Fixes: d182ee1307c7 ("ipnetns: allow to get and set netns ids") Signed-off-by: Vadim Kochan <vadim4j@gmail.com>
Keep ss output consistent and format DCTCP socket statistics similar to skmen
and timer where a group of logical values are grouped by brackets. This makes
parser scripts *and* humans more happy.
Nicolas Dichtel [Tue, 17 Feb 2015 16:30:38 +0000 (17:30 +0100)]
iplink: add support of IFLA_LINK_NETNSID attribute
This new attribute is now advertised by the kernel for x-netns interfaces.
It's also possible to set it when an interface is created (and thus creating a
x-netns interface with one single message).
Example:
$ ip netns add foo
$ ip netns add bar
$ ip -n foo netns set bar 15
$ ip -n foo link add ipip1 link-netnsid 15 type ipip remote 10.16.0.121 local 10.16.0.249
$ ip -n foo link ls ipip1
3: ipip1@NONE: <POINTOPOINT,NOARP> mtu 1480 qdisc noop state DOWN mode DEFAULT group default
link/ipip 10.16.0.249 peer 10.16.0.121 link-netnsid 15
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Nicolas Dichtel [Tue, 17 Feb 2015 16:30:37 +0000 (17:30 +0100)]
ipnetns: allow to get and set netns ids
The kernel now provides ids for peer netns. This patch implements a new command
'set' to assign an id.
When netns are listed, if an id is assigned, it is now displayed.
Example:
$ ip netns add foo
$ ip netns set foo 1
$ ip netns
foo (id: 1)
init_net
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Tom Herbert [Thu, 29 Jan 2015 16:52:01 +0000 (08:52 -0800)]
iproute: Descriptions of fou and gue options in ip-link man pages
Add section for additional arguments to GRE, IPIP, and SIT types
that are related to Foo-over-UDP and Generic UDP Encapsulation.
Also, added an example GUE configuration in the examples section.
Tom Herbert [Thu, 29 Jan 2015 16:51:58 +0000 (08:51 -0800)]
ip link: Add support for remote checksum offload to IP tunnels
This patch adds support to remote checksum checksum offload
confinguration for IPIP, SIT, and GRE tunnels. This patch
adds a [no]encap-remcsum to ip link command which applicable
when configured tunnels that use GUE.
ip link add name tun1 type gre remote 192.168.1.1 local 192.168.1.2 \
ttl 225 encap fou encap-sport auto encap-dport 7777 encap-csum \
encap-remcsum
This would create an GRE tunnel in GUE encapsulation where the source
port is automatically selected (based on hash of inner packet),
checksums in the encapsulating UDP header are enabled (needed.for
remote checksum offload), and remote checksum ffload is configured to
be used on the tunnel (affects TX side).
Roopa Prabhu [Mon, 26 Jan 2015 02:26:24 +0000 (18:26 -0800)]
iproute2: bridge: support vlan range adds
This patch adds vlan range support to bridge add command
using the newly added vinfo flags BRIDGE_VLAN_INFO_RANGE_BEGIN and
BRIDGE_VLAN_INFO_RANGE_END.
$bridge vlan show
port vlan ids
br0 1 PVID Egress Untagged
dummy0 1 PVID Egress Untagged
$bridge vlan add vid 10-15 dev dummy0
port vlan ids
br0 1 PVID Egress Untagged
dummy0 1 PVID Egress Untagged
10
11
12
13
14
15
$bridge vlan del vid 14 dev dummy0
$bridge vlan show
port vlan ids
br0 1 PVID Egress Untagged
dummy0 1 PVID Egress Untagged
10
11
12
13
15
$bridge vlan del vid 10-15 dev dummy0
$bridge vlan show
port vlan ids
br0 1 PVID Egress Untagged
dummy0 1 PVID Egress Untagged
Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: Wilson Kok <wkok@cumulusnetworks.com>
Oliver Hartkopp [Thu, 22 Jan 2015 18:04:33 +0000 (19:04 +0100)]
can: Add support for CAN FD non-ISO feature
This patch makes CAN_CTRLMODE_FD_NON_ISO netlink feature configurable.
During the CAN FD standardization process within the ISO it turned out that
the failure detection capability has to be improved.
The CAN in Automation organization (CiA) defined the already implemented CAN
FD controllers as 'non-ISO' and the upcoming improved CAN FD controllers as
'ISO' compliant. See at http://www.can-cia.com/index.php?id=1937
Starting with the - currently non-ISO - driver for M_CAN v3.0.1 introduced in
Linux 3.18 this bit needs to be propagated to userspace. In future drivers this
bit will become configurable depending on the CAN FD controllers capabilities.
Signed-off-by: Oliver Hartkopp <socketcan@hartkopp.net>
iproute2/ip: fix up filter when printing addresses
"ip addr show up" would exclude the interface (link), but include the
addresses of down interfaces (which looked like they where indented
under a different interface). This fixes the filtering.
For a full example see the original bug report at:
http://bugs.debian.org/776040
Reported-by: Paul Slootman <paul@debian.org> CC: 776040@bugs.debian.org Signed-off-by: Andreas Henriksson <andreas@fatal.se>
Vadim Kochan [Sun, 18 Jan 2015 14:10:18 +0000 (16:10 +0200)]
ip netns: Allow exec on each netns
This change allows to exec some cmd on each
named netns (except default) by specifying '-all' option:
# ip -all netns exec ip link
Each command executes synchronously.
Exit status is not considered, so there might be a case
that some CMD can fail on some netns but success on the other.
EXAMPLES:
1) Show link info on all netns:
$ ip -all netns exec ip link
netns: test_net
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
link/ether 1a:19:6f:25:eb:85 brd ff:ff:ff:ff:ff:ff
netns: home0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
link/ether ea:1a:59:40:d3:29 brd ff:ff:ff:ff:ff:ff
netns: lan0
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN mode DEFAULT group default
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
4: tap0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast state DOWN mode DEFAULT group default qlen 500
link/ether ce:49:d5:46:81:ea brd ff:ff:ff:ff:ff:ff
Vadim Kochan [Thu, 8 Jan 2015 17:32:22 +0000 (19:32 +0200)]
ss: Filter inet dgram sockets with established state by default
As inet dgram sockets (udp, raw) can call connect(...) - they
might be set in ESTABLISHED state. So keep the original behaviour of
'ss' which filtered them by ESTABLISHED state by default. So:
$ ss -u
or
$ ss -w
Will show only ESTABLISHED UDP sockets by default.
Nicolas Dichtel [Thu, 15 Jan 2015 10:36:25 +0000 (11:36 +0100)]
lib: fix setns() function when !HAVE_SETNS
When HAVE_SETNS is not set, iproute2 provides a local implementation of this
function based on __NR_setns.
This macro is defined in sys/syscall.h, which was not included, thus the local
implementation always returned -1.
CC: Vadim Kochan <vadim4j@gmail.com> Fixes: eb67e4498aec ("lib: Add netns_switch func for change network namespace") Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>