Vivien Didelot [Mon, 6 Nov 2017 21:11:51 +0000 (16:11 -0500)]
net: dsa: setup routing table
The *_complete() functions take too much arguments to do only one thing:
they try to fetch the dsa_port structures corresponding to device nodes
under the "link" list property of DSA ports, and use them to setup the
routing table of switches.
This patch simplifies them by providing instead simpler
dsa_{port,switch,tree}_setup_routing_table functions which return a
boolean value, true if the tree is complete.
dsa_tree_setup_routing_table is called inside dsa_tree_setup which
simplifies the switch registering function as well.
A switch's routing table is now initialized before its setup.
This also makes dsa_port_is_valid obsolete, remove it.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:50 +0000 (16:11 -0500)]
net: dsa: use of_for_each_phandle
The OF code provides a of_for_each_phandle() helper to iterate over
phandles. Use it instead of arbitrary iterating ourselves over the list
of phandles hanging to the "link" property of the port's device node.
The of_phandle_iterator_next() helper calls of_node_put() itself on
it.node. Thus We must only do it ourselves if we break the loop.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:49 +0000 (16:11 -0500)]
net: dsa: add find port by node helper
Instead of having two dsa_ds_find_port_dn (which returns a bool) and
dsa_dst_find_port_dn (which returns a switch) functions, provide a more
explicit dsa_tree_find_port_by_node function which returns a matching
port.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:48 +0000 (16:11 -0500)]
net: dsa: setup and teardown ports
The dsa_dsa_port_apply and dsa_cpu_port_apply functions do exactly the
same. The dsa_user_port_apply function does not try to register a fixed
link but try to create a slave.
This commit factorizes and scopes all that in two convenient
dsa_port_setup and dsa_port_teardown functions.
It won't hurt to register a devlink_port for unused port as well.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:47 +0000 (16:11 -0500)]
net: dsa: setup and teardown switches
This patches brings no functional changes. It removes the unused dst
argument from the dsa_ds_apply and dsa_ds_unapply functions, rename them
to dsa_switch_setup and dsa_switch_teardown for a more explicit scope.
This clarifies the steps of the setup or teardown of a switch fabric.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:46 +0000 (16:11 -0500)]
net: dsa: setup and teardown tree
This commit provides better scope for the DSA tree setup and teardown
functions. It renames the "applied" bool to "setup" and print a message
when the tree is setup, as it is done during teardown.
At the same time, check dst->setup in dsa_tree_setup, where it is set to
true.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:44 +0000 (16:11 -0500)]
net: dsa: setup and teardown default CPU port
The dsa_dst_parse function called just before dsa_dst_apply does not
parse the tree but does only one thing: it assigns the default CPU port
to dst->cpu_dp and to each user ports.
This patch simplifies this by calling a dsa_tree_setup_default_cpu
function at the beginning of dsa_dst_apply directly.
A dsa_port_is_user helper is added for convenience.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Mon, 6 Nov 2017 21:11:43 +0000 (16:11 -0500)]
net: dsa: constify cpu_dp member of dsa_port
A DSA port has a dedicated CPU port assigned to it, stored in the cpu_dp
member. It is not meant to be modified by a port, thus make it const.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yi Yang [Tue, 7 Nov 2017 13:07:02 +0000 (21:07 +0800)]
openvswitch: enable NSH support
v16->17
- Fixed disputed check code: keep them in nsh_push and nsh_pop
but also add them in __ovs_nla_copy_actions
v15->v16
- Add csum recalculation for nsh_push, nsh_pop and set_nsh
pointed out by Pravin
- Move nsh key into the union with ipv4 and ipv6 and add
check for nsh key in match_validate pointed out by Pravin
- Add nsh check in validate_set and __ovs_nla_copy_actions
v14->v15
- Check size in nsh_hdr_from_nlattr
- Fixed four small issues pointed out By Jiri and Eric
v13->v14
- Rename skb_push_nsh to nsh_push per Dave's comment
- Rename skb_pop_nsh to nsh_pop per Dave's comment
v12->v13
- Fix NSH header length check in set_nsh
v11->v12
- Fix missing changes old comments pointed out
- Fix new comments for v11
v10->v11
- Fix the left three disputable comments for v9
but not fixed in v10.
v9->v10
- Change struct ovs_key_nsh to
struct ovs_nsh_key_base base;
__be32 context[NSH_MD1_CONTEXT_SIZE];
- Fix new comments for v9
v8->v9
- Fix build error reported by daily intel build
because nsh module isn't selected by openvswitch
v7->v8
- Rework nested value and mask for OVS_KEY_ATTR_NSH
- Change pop_nsh to adapt to nsh kernel module
- Fix many issues per comments from Jiri Benc
v6->v7
- Remove NSH GSO patches in v6 because Jiri Benc
reworked it as another patch series and they have
been merged.
- Change it to adapt to nsh kernel module added by NSH
GSO patch series
v5->v6
- Fix the rest comments for v4.
- Add NSH GSO support for VxLAN-gpe + NSH and
Eth + NSH.
v4->v5
- Fix many comments by Jiri Benc and Eric Garver
for v4.
v3->v4
- Add new NSH match field ttl
- Update NSH header to the latest format
which will be final format and won't change
per its author's confirmation.
- Fix comments for v3.
v2->v3
- Change OVS_KEY_ATTR_NSH to nested key to handle
length-fixed attributes and length-variable
attriubte more flexibly.
- Remove struct ovs_action_push_nsh completely
- Add code to handle nested attribute for SET_MASKED
- Change PUSH_NSH to use the nested OVS_KEY_ATTR_NSH
to transfer NSH header data.
- Fix comments and coding style issues by Jiri and Eric
v1->v2
- Change encap_nsh and decap_nsh to push_nsh and pop_nsh
- Dynamically allocate struct ovs_action_push_nsh for
length-variable metadata.
OVS master and 2.8 branch has merged NSH userspace
patch series, this patch is to enable NSH support
in kernel data path in order that OVS can support
NSH in compat mode by porting this.
Signed-off-by: Yi Yang <yi.y.yang@intel.com> Acked-by: Jiri Benc <jbenc@redhat.com> Acked-by: Eric Garver <e@erig.me> Acked-by: Pravin Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Tue, 7 Nov 2017 10:38:32 +0000 (11:38 +0100)]
pktgen: document 32-bit timestamp overflow
Timestamps in pktgen are currently retrieved using the deprecated
do_gettimeofday() function that wraps its signed 32-bit seconds in 2038
(on 32-bit architectures) and requires a division operation to calculate
microseconds.
The pktgen header is also defined with the same limitations, hardcoding
to a 32-bit seconds field that can be interpreted as unsigned to produce
times that only wrap in 2106. Whatever code reads the timestamps should
be aware of that problem in general, but probably doesn't care too
much as we are mostly interested in the time passing between packets,
and that is correctly represented.
Using 64-bit nanoseconds would be cheaper and good for 584 years. Using
monotonic times would also make this unambiguous by avoiding the overflow,
but would make it harder to correlate to the times with those on remote
machines. Either approach would require adding a new runtime flag and
implementing the same thing on the remote side, which we probably don't
want to do unless someone sees it as a real problem. Also, this should
be coordinated with other pktgen implementations and might need a new
magic number.
For the moment, I'm documenting the overflow in the source code, and
changing the implementation over to an open-coded ktime_get_real_ts64()
plus division, so we don't have to look at it again while scanning for
deprecated time interfaces.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree, they are:
1) Speed up table replacement on busy systems with large tables
(and many cores) in x_tables. Now xt_replace_table() synchronizes by
itself by waiting until all cpus had an even seqcount and we use no
use seqlock when fetching old counters, from Florian Westphal.
2) Add nf_l4proto_log_invalid() and nf_ct_l4proto_log_invalid() to speed
up packet processing in the fast path when logging is not enabled, from
Florian Westphal.
3) Precompute masked address from configuration plane in xt_connlimit,
from Florian.
4) Don't use explicit size for set selection if performance set policy
is selected.
5) Allow to get elements from an existing set in nf_tables.
6) Fix incorrect check in nft_hash_deactivate(), from Florian.
7) Cache netlink attribute size result in l4proto->nla_size, from
Florian.
8) Handle NFPROTO_INET in nf_ct_netns_get() from conntrack core.
9) Use power efficient workqueue in conntrack garbage collector, from
Vincent Guittot.
10) Remove unnecessary parameter, in conntrack l4proto functions, also
from Florian.
11) Constify struct nf_conntrack_l3proto definitions, from Florian.
12) Remove all typedefs in nf_conntrack_h323 via coccinelle semantic
patch, from Harsha Sharma.
13) Don't store address in the rbtree nodes in xt_connlimit, they are
never used, from Florian.
14) Fix out of bound access in the conntrack h323 helper, patch from
Eric Sesterhenn.
15) Print symbols for the address returned with %pS in IPVS, from
Helge Deller.
16) Proc output should only display its own netns in IPVS, from
KUWAZAWA Takuya.
17) Small clean up in size_entry_mwt(), from Colin Ian King.
18) Use test_and_clear_bit from nf_nat_proto_clean() instead of separated
non-atomic test and then clear bit, from Florian Westphal.
19) Consolidate prefix length maps in ipset, from Aaron Conole.
20) Fix sparse warnings in ipset, from Jozsef Kadlecsik.
21) Simplify list_set_memsize(), from simran singhal.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Miquel Raynal [Mon, 6 Nov 2017 21:56:53 +0000 (22:56 +0100)]
net: mvpp2: add ethtool GOP statistics
Add ethtool statistics support by reading the GOP statistics from the
hardware counters. Also implement a workqueue to gather the statistics
every second or some 32-bit counters could overflow.
Suggested-by: Stefan Chulski <stefanc@marvell.com> Signed-off-by: Miquel Raynal <miquel.raynal@free-electrons.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
There is no need to release explicitly some devm_ allocated resources.
If the 'mac_probe()' probe function fails, they will be released
automatically, as already done in the other error handling paths of
this function.
Also goto '_return_of_get_parent' as in the other error handling paths.
This is useless (priv->fixed_link is NULL at this point), but at least
it is consistent.
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 6 Nov 2017 15:04:54 +0000 (15:04 +0000)]
rtnetlink: fix missing size for IFLA_IF_NETNSID
The size for IFLA_IF_NETNSID is missing from the size calculation
because the proceeding semicolon was not removed. Fix this by removing
the semicolon.
Detected by CoverityScan, CID#1461135 ("Structurally dead code")
Fixes: 79e1ad148c84 ("rtnetlink: use netnsid to query interface") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jiri Benc <jbenc@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 6 Nov 2017 14:04:39 +0000 (15:04 +0100)]
bnxt: fix bnxt_hwrm_fw_set_time for y2038
On 32-bit architectures, rtc_time_to_tm() returns incorrect results
in 2038 or later, and do_gettimeofday() is broken for the same reason.
This changes the code to use ktime_get_real_seconds() and time64_to_tm()
instead, both of them are 2038-safe, and we can also get rid of the
CONFIG_RTC_LIB dependency that way.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Mon, 6 Nov 2017 13:26:10 +0000 (14:26 +0100)]
of: add of_property_read_variable_* dummy helpers
Commit a67e9472da42 ("of: Add array read functions with min/max size
limits") added a new interface for reading variable-length arrays from
DT properties. One user was added in dsa recently and this causes a
build error because that code can be built with CONFIG_OF disabled:
net/dsa/dsa2.c: In function 'dsa_switch_parse_member_of':
net/dsa/dsa2.c:678:7: error: implicit declaration of function 'of_property_read_variable_u32_array'; did you mean 'of_property_read_u32_array'? [-Werror=implicit-function-declaration]
This adds a dummy functions for of_property_read_variable_u32_array()
and a few others that had been missing here. I decided to move
of_property_read_string() and of_property_read_string_helper() in the
process to make it easier to compare the two sets of function prototypes
to make sure they match.
Fixes: 975e6e32215e ("net: dsa: rework switch parsing") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Rob Herring <robh@kernel.org> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Nov 2017 04:29:06 +0000 (13:29 +0900)]
Merge branch 'dsa-lan9303-Linting'
Egil Hjelmeland says:
====================
net: dsa: lan9303: Linting
This series is non-functional.
- Correct some errors in comments and documentation.
Remove scripts/checkpatch.pl WARNINGs and most CHECKs:
- Replace msleep(1) with usleep_range()
- Adjust indenting
Changes v1 -> v2:
- Removed patch 4 "Remove unnecessary parentheses", to be addressed later
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Mon, 6 Nov 2017 11:42:04 +0000 (12:42 +0100)]
net: dsa: lan9303: Adjust indenting
Remove scripts/checkpatch.pl CHECKs by adjusting indenting.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Mon, 6 Nov 2017 11:42:03 +0000 (12:42 +0100)]
net: dsa: lan9303: Replace msleep(1) with usleep_range()
Remove scripts/checkpatch.pl WARNING by replacing msleep(1) with usleep_range()
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Mon, 6 Nov 2017 11:42:02 +0000 (12:42 +0100)]
net: dsa: lan9303: Fix syntax errors in device tree examples
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Egil Hjelmeland [Mon, 6 Nov 2017 11:42:01 +0000 (12:42 +0100)]
net: dsa: lan9303: Correct register names in comments
Two comments refer to registers, but lack the LAN9303_ prefix.
Fix that.
Signed-off-by: Egil Hjelmeland <privat@egil-hjelmeland.no> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 8 Nov 2017 03:23:39 +0000 (12:23 +0900)]
Merge branch 'qdisc-RED-offload'
Jiri Pirko says:
====================
qdisc RED offload
Nogah says:
Add an offload support for RED qdisc for mlxsw driver.
The first patch adds the ability to offload RED qdisc by using
ndo_setup_tc. It gives RED three commands, to offload, change or delete
the qdisc, to get the qdisc generic stats and to get it's RED xstats.
There is no enforcement on a driver to offload or not offload the qdisc and
it is up to the driver to decide.
RED qdisc is first being created and only later graft to a parent (unless
it is a root qdisc). For that reason the return value of the offload
replace command that is called in the init process doesn't reflect actual
offload state. The offload state is determined in the dump function so it
can be reflected to the user. This function is also responsible for stats
update.
The patchses 2-3 change the name of TC_SETUP_MQPRIO & TC_SETUP_CBS to match
with the new convention of QDISC prefix.
The rest of the patchset is driver support for the qdisc. Currently only
as root qdisc that is being set on the default traffic class. It supports
only the following parameters of RED: min, max, probability and ECN mode.
Limit and burst size related params are being ignored at this moment.
---
v7->v8 internal: (external RFC->v1)
- patch 1/9:
- unite the offload and un-offload functions
- clean the OFFLOAD flag when the qdisc in not offloaded
- patch 2/9:
- minor change to avoid a conflict
- patch 5/9:
- check for bad min/max values
- clean the offloaded qdisc after a bad config call
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:49 +0000 (07:23 +0100)]
mlxsw: spectrum: Support general qdisc stats
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_QDISC_STATS. This call updates the generic qdisc stats from the
cache if the handle ID that is asked for matching the root qdisc ID and
fails otherwise.
Currently doesn't support qlen and rqueues.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:48 +0000 (07:23 +0100)]
mlxsw: spectrum: Support RED xstats
Add support for ndo_setup_tc with enum tc_setup_type value of
TC_SETUP_RED_XSTATS. This call returns the RED qdisc xstats from the cache
if the handle ID that is asked for matching the root qdisc ID and fails
otherwise.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:47 +0000 (07:23 +0100)]
mlxsw: spectrum: Collect tclass related stats periodically
Add more statistics to be collected from the HW periodically. These stats
are tclass based (beside ECN marked packet, that exist only port based).
They are needed to expose RED qdisc stats and xstats correctly.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:45 +0000 (07:23 +0100)]
mlxsw: spectrum: Support RED qdisc offload
Add support for ndo_setup_tc with enum tc_setup_type value of TC_SETUP_RED.
This call sets RED qdisc on a traffic class.
This patch supports RED qdisc only as a root qdisc and set in on the
default tclass. It can be set with or without ECN.
Signed-off-by: Yuval Mintz <yuvalm@mellanox.com> Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:43 +0000 (07:23 +0100)]
net_sch: cbs: Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS
Change TC_SETUP_CBS to TC_SETUP_QDISC_CBS to match the new convention..
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:42 +0000 (07:23 +0100)]
net_sch: mqprio: Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO
Change TC_SETUP_MQPRIO to TC_SETUP_QDISC_MQPRIO to match the new
convention.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nogah Frankel [Mon, 6 Nov 2017 06:23:41 +0000 (07:23 +0100)]
net_sch: red: Add offload ability to RED qdisc
Add the ability to offload RED qdisc by using ndo_setup_tc.
There are four commands for RED offloading:
* TC_RED_SET: handles set and change.
* TC_RED_DESTROY: handle qdisc destroy.
* TC_RED_STATS: update the qdiscs counters (given as reference)
* TC_RED_XSTAT: returns red xstats.
Whether RED is being offloaded is being determined every time dump action
is being called because parent change of this qdisc could change its
offload state but doesn't require any RED function to be called.
Signed-off-by: Nogah Frankel <nogahf@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Lawrence Brakmo [Mon, 6 Nov 2017 02:44:10 +0000 (18:44 -0800)]
bpf: Rename tcp_bbf.readme to tcp_bpf.readme
The original patch had the wrong filename.
Fixes: bfdf75693875 ("bpf: create samples/bpf/tcp_bpf.readme") Signed-off-by: Lawrence Brakmo <brakmo@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ila: make identifier format optional and other fixes
The identifier type and checksum neutral mapping bits are optional
in identifier formats. This patch set fixes the implementation to
make them optional and configurable.
Specific items:
- Clean up checksum diff code in ILA
- Add checksum neutral mapping auto so that checksum neutral
mapping can be configured without requiring use of the C-bit
- Add identifier type configuration and allow identifier
type to be configured so that the identifier type field does
not need to be present
- Added ILA documention: ila.txt
I have fixes for ILA in iproute2 that will be poseted separately.
Tested: Ran netperf TCP_RR on various combinations of checksum
mode and the two supported identifier types.
v2:
- Add proper sign off
- In ILA LWT, only check prefix length includes identifier type
if identifier type is enabled (ILA_ATYPE_USE_FORMAT).
- Add a hook type so that it can be specified whether ILA
translation is done on input or output route funciton in
LWT.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 5 Nov 2017 23:58:25 +0000 (15:58 -0800)]
ila: Add a hook type for LWT routes
In LWT tunnels both an input and output route method is defined.
If both of these are executed in the same path then double translation
happens and the effect is not correct.
This patch adds a new attribute that indicates the hook type. Two
values are defined for route output and route output. ILA
translation is only done for the one that is set. The default is
to enable ILA on route output.
Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 5 Nov 2017 23:58:24 +0000 (15:58 -0800)]
ila: allow configuration of identifier type
Allow identifier to be explicitly configured for a mapping.
This can either be one of the identifier types specified in the
ILA draft or a value of ILA_ATYPE_USE_FORMAT which means the
identifier type is inferred from the identifier type field.
If a value other than ILA_ATYPE_USE_FORMAT is set for a
mapping then it is assumed that the identifier type field is
not present in an identifier.
Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 5 Nov 2017 23:58:23 +0000 (15:58 -0800)]
ila: add checksum neutral map auto
Add checksum neutral auto that performs checksum neutral mapping
without using the C-bit. This is enabled by configuration of
a mapping.
The checksum neutral function has been split into
ila_csum_do_neutral_fmt and ila_csum_do_neutral_nofmt. The former
handles the C-bit and includes it in the adjustment value. The latter
just sets the adjustment value on the locator diff only.
Added configuration for checksum neutral map aut in ila_lwt
and ila_xlat.
Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Tom Herbert [Sun, 5 Nov 2017 23:58:22 +0000 (15:58 -0800)]
ila: cleanup checksum diff
Consolidate computing checksum diff into one function.
Add get_csum_diff_iaddr that computes the checksum diff between
an address argument and locator being written. get_csum_diff
calls this using the destination address in the IP header as
the argument.
Also moved ila_init_saved_csum to be close to the checksum
diff functions.
Signed-off-by: Tom Herbert <tom@quantonium.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Troy Kisky [Fri, 3 Nov 2017 17:29:59 +0000 (10:29 -0700)]
net: fec: Let fec_ptp have its own interrupt routine
This is better for code locality and should slightly
speed up normal interrupts.
This also allows PPS clock output to start working for
i.mx7. This is because i.mx7 was already using the limit
of 3 interrupts, and needed another.
Signed-off-by: Troy Kisky <troy.kisky@boundarydevices.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
hv_netvsc: fix a hang on channel/mtu changes
It was found that netvsc driver doesn't survive e.g.
test. I was able to identify a hang in guest/host communication, it is
fixed by PATCH1 of this series. PATCH2 is a cosmetic change masking
unneeded messages.
Changes since v1:
- Throw away patches 2 and 3 of the original series as one is unneeded and
the other is not justified [Eric Dumazet, Stephen Hemminger] so I'm only
fixing the hang now, the crash doesn't reproduce. Will keep an eye on it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
hv_netvsc: hide warnings about uninitialized/missing rndis device
Hyper-V hosts are known to send RNDIS messages even after we halt the
device in rndis_filter_halt_device(). Remove user visible messages
as they are not really useful.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It was found that in some cases host refuses to teardown GPADL for send/
receive buffers (probably when some work with these buffere is scheduled or
ongoing). Change the teardown logic to be:
1) Send NVSP_MSG1_TYPE_REVOKE_* messages
2) Close the channel
3) Teardown GPADLs.
This seems to work reliably.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, we create a LED trigger for any link speed known to a PHY.
These triggers only fire when their exact link speed had been negotiated
(they aren't cumulative, that is, they don't fire for "their or any higher"
link speed).
What we are missing, however, is a trigger which will fire on any link
speed known to the PHY. Such trigger can then be used for implementing a
poor man's substitute of the "link" LED on boards that lack it.
Let's add it.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: leds: Refactor "no link" handler into a separate function
Currently, phy_led_trigger_change_speed() is handling a "no link" condition
like it was some kind of an error (using "goto" to a code at the function
end).
However, having no link at PHY is an ordinary operational state, so let's
handle it in an appropriately named separate function so it is more obvious
what the code is doing.
Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: f3d9832e56c4 ("ipv6: addrconf: cleanup locking in ipv6_add_addr") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Valdis Kletnieks <valdis.kletnieks@vt.edu> Acked-by: David Ahern <dsahern@gmail.com> Tested-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
netfilter: nf_tables: get set elements via netlink
This patch adds a new get operation to look up for specific elements in
a set via netlink interface. You can also use it to check if an interval
already exists.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
netfilter: nf_tables: performance set policy skips size description in selection
Use the complexity and space notations if policy is performance, this
results in placing the bitmap set representation over the hashtable for
key <= 16 for better performance as we discussed during the last NFWS in
Faro, Portugal.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Vincent Guittot [Thu, 2 Nov 2017 15:16:07 +0000 (16:16 +0100)]
netfilter: conntrack: use power efficient workqueue
conntrack uses the bounded system_long_wq workqueue for its works that
don't have to run on the cpu they have been queued.
Using bounded workqueue prevents the scheduler to make smart decision about
the best place to schedule the work.
This patch replaces system_long_wq with system_power_efficient_wq. the work
stays bounded to a cpu by default unless the CONFIG_WQ_POWER_EFFICIENT is
enable. In the latter case, the work can be scheduled on the best cpu from
a power or a performance point of view.
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Colin Ian King [Mon, 16 Oct 2017 10:24:02 +0000 (11:24 +0100)]
netfilter: ebtables: clean up initialization of buf
buf is initialized to buf_start and then set on the next statement
to buf_start + offsets[i]. Clean this up to just initialize buf
to buf_start + offsets[i] to clean up the clang build warning:
"Value stored to 'buf' during its initialization is never read"
Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
KUWAZAWA Takuya [Sun, 15 Oct 2017 11:54:10 +0000 (20:54 +0900)]
netfilter: ipvs: Fix inappropriate output of procfs
Information about ipvs in different network namespace can be seen via procfs.
How to reproduce:
# ip netns add ns01
# ip netns add ns02
# ip netns exec ns01 ip a add dev lo 127.0.0.1/8
# ip netns exec ns02 ip a add dev lo 127.0.0.1/8
# ip netns exec ns01 ipvsadm -A -t 10.1.1.1:80
# ip netns exec ns02 ipvsadm -A -t 10.1.1.2:80
The ipvsadm displays information about its own network namespace only.
# ip netns exec ns01 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.1:80 wlc
# ip netns exec ns02 ipvsadm -Ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.1.1.2:80 wlc
But I can see information about other network namespace via procfs.
# ip netns exec ns01 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 0A010101:0050 wlc
TCP 0A010102:0050 wlc
# ip netns exec ns02 cat /proc/net/ip_vs
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 0A010102:0050 wlc
netfilter: ipvs: Use %pS printk format for direct addresses
The debug and error printk functions in ipvs uses wrongly the %pF instead of
the %pS printk format specifier for printing symbols for the address returned
by _builtin_return_address(0). Fix it for the ia64, ppc64 and parisc64
architectures.
This patchset introduces an eBPF-based device controller for cgroup v2.
Patches (1) and (2) are a preparational work required to share some code
with the existing device controller implementation.
Patch (3) is the main patch, which introduces a new bpf prog type
and all necessary infrastructure.
Patch (4) moves cgroup_helpers.c/h to use them by patch (4).
Patch (5) implements an example of eBPF program which controls access
to device files and corresponding userspace test.
v3:
Renamed constants introduced by patch (3) to BPF_DEVCG_*
Roman Gushchin [Sun, 5 Nov 2017 13:15:34 +0000 (08:15 -0500)]
selftests/bpf: add a test for device cgroup controller
Add a test for device cgroup controller.
The test loads a simple bpf program which logs all
device access attempts using trace_printk() and forbids
all operations except operations with /dev/zero and
/dev/urandom.
Then the test creates and joins a test cgroup, and attaches
the bpf program to it.
Then it tries to perform some simple device operations
and checks the result:
create /dev/null (should fail)
create /dev/zero (should pass)
copy data from /dev/urandom to /dev/zero (should pass)
copy data from /dev/urandom to /dev/full (should fail)
copy data from /dev/random to /dev/zero (should fail)
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:33 +0000 (08:15 -0500)]
bpf: move cgroup_helpers from samples/bpf/ to tools/testing/selftesting/bpf/
The purpose of this move is to use these files in bpf tests.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:32 +0000 (08:15 -0500)]
bpf, cgroup: implement eBPF-based device controller for cgroup v2
Cgroup v2 lacks the device controller, provided by cgroup v1.
This patch adds a new eBPF program type, which in combination
of previously added ability to attach multiple eBPF programs
to a cgroup, will provide a similar functionality, but with some
additional flexibility.
This patch introduces a BPF_PROG_TYPE_CGROUP_DEVICE program type.
A program takes major and minor device numbers, device type
(block/character) and access type (mknod/read/write) as parameters
and returns an integer which defines if the operation should be
allowed or terminated with -EPERM.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Tejun Heo <tj@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:31 +0000 (08:15 -0500)]
device_cgroup: prepare code for bpf-based device controller
This is non-functional change to prepare the device cgroup code
for adding eBPF-based controller for cgroups v2.
The patch performs the following changes:
1) __devcgroup_inode_permission() and devcgroup_inode_mknod()
are moving to the device-cgroup.h and converting into static inline.
2) __devcgroup_check_permission() is exported.
3) devcgroup_check_permission() wrapper is introduced to be used
by both existing and new bpf-based implementations.
Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Tejun Heo <tj@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Roman Gushchin [Sun, 5 Nov 2017 13:15:30 +0000 (08:15 -0500)]
device_cgroup: add DEVCG_ prefix to ACC_* and DEV_* constants
Rename device type and access type constants defined in
security/device_cgroup.c by adding the DEVCG_ prefix.
The reason behind this renaming is to make them global namespace
friendly, as they will be moved to the corresponding header file
by following patches.
Signed-off-by: Roman Gushchin <guro@fb.com> Cc: David S. Miller <davem@davemloft.net> Cc: Tejun Heo <tj@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Nov 2017 14:25:02 +0000 (23:25 +0900)]
Merge tag 'mlx5-updates-2017-11-04' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2017-11-04
This series includes:
From Huy: dscp to priority mapping for Ethernet packet.
===================================================
First six patches enable differentiated services code point (dscp) to
priority mapping for Ethernet packet. Once this feature is
enabled, the packet is routed to the corresponding priority based on its
dscp. User can combine this feature with priority flow control (pfc)
feature to have priority flow control based on the dscp.
Firmware interface:
Mellanox firmware provides two control knobs for this feature:
QPTS register allow changing the trust state between dscp and
pcp mode. The default is pcp mode. Once in dscp mode, firmware will
route the packet based on its dscp value if the dscp field exists.
QPDPM register allow mapping a specific dscp (0 to 63) to a
specific priority (0 to 7). By default, all the dscps are mapped to
priority zero.
Software interface:
This feature is controlled via application priority TLV. IEEE
specification P802.1Qcd/D2.1 defines priority selector id 5 for
application priority TLV. This APP TLV selector defines DSCP to priority
map. This APP TLV can be sent by the switch or can be set locally using
software such as lldptool. In mlx5 drivers, we add the support for net
dcb's getapp and setapp call back. Mlx5 driver only handles the selector
id 5 application entry (dscp application priority application entry).
If user sends multiple dscp to priority APP TLV entries on the same
dscp, the last sent one will take effect. All the previous sent will be
deleted.
The firmware trust state (in QPTS register) is changed based on the
number of dscp to priority application entries. When the first dscp to
priority application entry is added by the user, the trust state is
changed to dscp. When the last dscp to priority application entry is
deleted by the user, the trust state is changed to pcp.
When the port is in DSCP trust state, the transmit queue is selected
based on the dscp of the skb.
When the port is in DSCP trust state and vport inline mode is not NONE,
firmware requires mlx5 driver to copy the IP header to the
wqe ethernet segment inline header if the skb has it.
This is done by changing the transmit queue sq's min inline mode to L3.
Note that the min inline mode of sqs that belong to other features
such as xdpsq, icosq are not modified.
===================================================
Plus to the dscp series, some small misc changes are include as well:
From Inbar, Ethtool msglvl support and some debug prints in DCBNL logic
From Or Gerlitz, Enlarge the NIC TC offload table size
From Rabie, Initialize destination_flow struct to 0
From Feras, Add inner TTC table to IPoIB flow steering
From Tal, Enable CQE based moderation on TX CQ
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add support in the driver ethtool ops to modify the NFP FEC modes.
The FEC modes can be set for vNIC associated with physical ports or
for MAC representor netdevs.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Implement helpers to determine and modify FEC modes via the NSP.
The NSP advertises FEC capabilities on a per port basis and provides
support for:
* Auto mode selection
* Reed Solomon
* BaseR
* None/Off
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: add get/set link settings ndos to representors
Since it is now safe to modify link settings for representors, we can
attach the get/set link settings ndos to it. The get/set link settings
are nfp_port based operations.
If a port becomes invalid, the representor will be removed in the same
way a vnic would be.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
If the NSP port table has been refreshed, resync the representor state
with the new port information. At the moment, this only entails looking
for invalid ports and killing off representors associated with them.
The repr instance becomes NULL which is safe since the app accessor
function for reprs returns NULL when it cannot access a repr.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
The criteria that reprs cannot be replaced with another new set of reprs
has been removed. This check is not needed since the only use case that
could exercise this at the moment, would be to modify the number of
SRIOV VFs without first disabling them. This case is explicitly
disallowed in any case and subsequent patches in this series
need to be able to replace the running set of reprs.
All cases where the return code used to be checked for the
nfp_app_reprs_set function have been removed.
As stated above, it is not possible for the current code to encounter a
case where reprs exist and need to be replaced.
Signed-off-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 4 Nov 2017 15:48:55 +0000 (16:48 +0100)]
nfp: make use of MAC reinit
Recent management FW images can perform full reinit of MAC cores
without requiring a reboot. When loading the driver check if there
are changes pending and if so call NSP MAC reinit. Full application
FW reload is still required, and all MACs need to be reinited at the
same time (not only the ones which have been reconfigured, and thus
potentially causing disruption to unrelated netdevs) therefore for
now changing MAC config without reloading the driver still remains
future work.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Sat, 4 Nov 2017 15:48:54 +0000 (16:48 +0100)]
nfp: don't depend on compiler constant propagation
Matthias reports:
nfp_eth_set_bit_config() is marked as __always_inline to allow gcc to
identify the 'mask' parameter as known to be constant at compile time,
which is required to use the FIELD_GET() macro.
The forced inlining does the trick for gcc, but for kernel builds with
clang it results in undefined symbols:
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.o: In function
`__nfp_eth_set_aneg':
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x787):
undefined reference to `__compiletime_assert_492'
drivers/net/ethernet/netronome/nfp/nfpcore/nfp_nsp_eth.c:(.text+0x7b1):
undefined reference to `__compiletime_assert_496'
These __compiletime_assert_xyx() calls would have been optimized away
if
the compiler had seen 'mask' as a constant.
Add a macro to extract the mask and shift and pass those to
nfp_eth_set_bit_config() separately.
Reported-by: Matthias Kaehlcke <mka@chromium.org> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Tested-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Signed-off-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Priyaranjan Jha [Fri, 3 Nov 2017 23:38:48 +0000 (16:38 -0700)]
tcp: higher throughput under reordering with adaptive RACK reordering wnd
Currently TCP RACK loss detection does not work well if packets are
being reordered beyond its static reordering window (min_rtt/4).Under
such reordering it may falsely trigger loss recoveries and reduce TCP
throughput significantly.
This patch improves that by increasing and reducing the reordering
window based on DSACK, which is now supported in major TCP implementations.
It makes RACK's reo_wnd adaptive based on DSACK and no. of recoveries.
- If DSACK is received, increment reo_wnd by min_rtt/4 (upper bounded
by srtt), since there is possibility that spurious retransmission was
due to reordering delay longer than reo_wnd.
- Persist the current reo_wnd value for TCP_RACK_RECOVERY_THRESH (16)
no. of successful recoveries (accounts for full DSACK-based loss
recovery undo). After that, reset it to default (min_rtt/4).
- At max, reo_wnd is incremented only once per rtt. So that the new
DSACK on which we are reacting, is due to the spurious retx (approx)
after the reo_wnd has been updated last time.
- reo_wnd is tracked in terms of steps (of min_rtt/4), rather than
absolute value to account for change in rtt.
In our internal testing, we observed significant increase in throughput,
in scenarios where reordering exceeds min_rtt/4 (previous static value).
Signed-off-by: Priyaranjan Jha <priyarjha@google.com> Signed-off-by: Yuchung Cheng <ycheng@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 5 Nov 2017 13:31:39 +0000 (22:31 +0900)]
Merge branch 'dsa-parsing-stage'
Vivien Didelot says:
====================
net: dsa: parsing stage
When registering a DSA switch, there is basically two stages.
The first stage is the parsing of the switch device, from either device
tree or platform data. It fetches the DSA tree to which it belongs, and
validates its ports. The switch device is then added to the tree, and
the second stage is called if this was the last switch of the tree.
The second stage is the setup of the tree, which validates that the tree
is complete, sets up the routing tables, the default CPU port for user
ports, sets up the switch drivers and finally the master interfaces,
which makes the whole switch fabric functional.
This patch series covers the first parsing stage. It fixes the type of
the switch and tree indexes to unsigned int, simplifies the tree
reference counting and the switch and CPU ports parsing.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:27 +0000 (19:05 -0400)]
net: dsa: rework switch parsing
When parsing a switch, we have to identify to which tree it belongs and
parse its ports. Provide two functions to separate the OF and platform
data specific paths.
Also use the of_property_read_variable_u32_array function to parse the
OF member array instead of calling of_property_read_u32_index twice.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:25 +0000 (19:05 -0400)]
net: dsa: rework switch addition and removal
This patch removes the unnecessary index argument from the
dsa_dst_add_ds and dsa_dst_del_ds functions and renames them to
dsa_tree_add_switch and dsa_tree_remove_switch respectively.
In addition to a more explicit scope, we now check the presence of an
existing switch with the same index directly within dsa_tree_add_switch.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:24 +0000 (19:05 -0400)]
net: dsa: provide a find or new tree helper
Rename dsa_get_dst to dsa_tree_find since it doesn't increment the
reference counter, rename dsa_add_dst to dsa_tree_alloc for symmetry
with dsa_tree_free, and provide a convenient dsa_tree_touch function to
find or allocate a new tree.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:23 +0000 (19:05 -0400)]
net: dsa: get and put tree reference counting
Provide convenient dsa_tree_get and dsa_tree_put functions scoping a DSA
tree used to increment and decrement its reference counter, instead of
poking directly its kref structure.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:22 +0000 (19:05 -0400)]
net: dsa: simplify tree reference counting
DSA trees have a refcount used to automatically free the dsa_switch_tree
structure once there is no switch devices inside of it.
The refcount is incremented when a switch is added to the tree, and
decremented when it is removed from it.
But because of kref_init, the refcount is also incremented at
initialization, and when looking up the tree from the list for symmetry.
Thus the current code stores the number of switches plus one, and makes
the switch registration more complex.
To simplify the switch registration function, we reset the refcount to
zero after initialization and don't increment it when looking up a tree.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:21 +0000 (19:05 -0400)]
net: dsa: make tree index unsigned
Similarly to a DSA switch and port, rename the tree index from "tree" to
"index" and make it an unsigned int because it isn't supposed to be less
than 0.
u32 is an OF specific data used to retrieve the value and has no need to
be propagated up to the tree index.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vivien Didelot [Fri, 3 Nov 2017 23:05:20 +0000 (19:05 -0400)]
net: dsa: make switch index unsigned
Define the DSA switch index as an unsigned int, because it will never be
less than 0.
Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Intiyaz Basha [Fri, 3 Nov 2017 21:32:33 +0000 (14:32 -0700)]
liquidio: do not consider packets dropped by network stack as driver Rx dropped
netdev->rx_dropped was including packets dropped by napi_gro_receive.
If a packet is dropped by network stack, it should not be counted under
driver Rx dropped.
Made necessary changes to not include network stack drops under
netdev->rx_dropped.
Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Fri, 3 Nov 2017 20:59:07 +0000 (13:59 -0700)]
tools: bpftool: move p_err() and p_info() from main.h to common.c
The two functions were declared as static inline in a header file. There
is no particular reason why they should be inlined, they just happened to
remain in the same header file when they were turned from macros to
functions in a precious commit.
Make them non-inlined functions and move them to common.c file instead.
Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
bpf: add offload as a first class citizen
This series is my stab at what was discussed at a recent IOvisor
bi-weekly call. The idea is to make the device translator run at
the program load time. This makes the offload more explicit to
the user space. It also makes it easy for the device translator
to insert information into the original verifier log.
v2:
- include linux/bug.h instead of asm/bug.h;
- rebased on top of Craig's verifier fix (no changes, the last patch
just removes more code now). I checked the set doesn't conflict
with Jiri's, Josef's or Roman's patches, but missed Craig's fix :(
v1:
- rename the ifindex member on load;
- improve commit messages;
- split nfp patches more.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:30 +0000 (13:56 -0700)]
bpf: remove old offload/analyzer
Thanks to the ability to load a program for a specific device,
running verifier twice is no longer needed.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:29 +0000 (13:56 -0700)]
nfp: bpf: move to new BPF program offload infrastructure
Following steps are taken in the driver to offload an XDP program:
XDP_SETUP_PROG:
* prepare:
- allocate program state;
- run verifier (bpf_analyzer());
- run translation;
* load:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
* clean up:
- free program image.
With new infrastructure the flow will look like this:
BPF_OFFLOAD_VERIFIER_PREP:
- allocate program state;
BPF_OFFLOAD_TRANSLATE:
- run translation;
XDP_SETUP_PROG:
- stop old program if needed;
- load program;
- enable BPF if not enabled;
BPF_OFFLOAD_DESTROY:
- free program image.
Take advantage of the new infrastructure. Allocation of driver
metadata has to be moved from jit.c to offload.c since it's now
done at a different stage. Since there is no separate driver
private data for verification step, move temporary nfp_meta
pointer into nfp_prog. We will now use user space context
offsets.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:28 +0000 (13:56 -0700)]
nfp: bpf: move translation prepare to offload.c
struct nfp_prog is currently only used internally by the translator.
This means there is a lot of parameter passing going on, between
the translator and different stages of offload. Simplify things
by allocating nfp_prog in offload.c already.
We will now use kmalloc() to allocate the program area and only
DMA map it for the time of loading (instead of allocating DMA
coherent memory upfront).
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jakub Kicinski [Fri, 3 Nov 2017 20:56:27 +0000 (13:56 -0700)]
nfp: bpf: move program prepare and free into offload.c
Most of offload/translation prepare logic will be moved to
offload.c. To help git generate more reasonable diffs
move nfp_prog_prepare() and nfp_prog_free() functions
there as a first step.
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>