KP Singh [Wed, 4 Mar 2020 19:18:53 +0000 (20:18 +0100)]
bpf: Add selftests for BPF_MODIFY_RETURN
Test for two scenarios:
* When the fmod_ret program returns 0, the original function should
be called along with fentry and fexit programs.
* When the fmod_ret program returns a non-zero value, the original
function should not be called, no side effect should be observed and
fentry and fexit programs should be called.
The result of the kernel function call and whether a side effect was
observed are returned via the retval attr of the BPF_PROG_TEST_RUN (bpf)
syscall.
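One way to carry both values in the single 32-bit retval is to pack them into separate bit ranges. The sketch below assumes the side-effect flag sits in the upper 16 bits and the function result in the lower 16 bits; the helper names are hypothetical and the bit layout is an assumption made for illustration, not something this commit message states.

```c
#include <stdint.h>

/* Hypothetical helpers: pack a side-effect flag and a 16-bit result
 * into one 32-bit retval, and unpack them again. The layout (flag in
 * the upper half, result in the lower half) is an assumption. */
static uint32_t pack_retval(int side_effect, uint16_t ret)
{
	return ((uint32_t)(side_effect != 0) << 16) | ret;
}

static int retval_side_effect(uint32_t retval)
{
	return (int)(retval >> 16);
}

static uint16_t retval_result(uint32_t retval)
{
	return (uint16_t)(retval & 0xffff);
}
```

A test would then check both halves: the result half against the expected return value, and the flag half against whether the original function was expected to run.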
KP Singh [Wed, 4 Mar 2020 19:18:52 +0000 (20:18 +0100)]
bpf: Add test ops for BPF_PROG_TYPE_TRACING
The current fexit and fentry tests rely on a different program to
exercise the functions they attach to. Instead of doing this, implement
the test operations for tracing which will also be used for
BPF_MODIFY_RETURN in a subsequent patch.
Also, clean up the fexit test to use the generated skeleton.
KP Singh [Wed, 4 Mar 2020 19:18:49 +0000 (20:18 +0100)]
bpf: Introduce BPF_MODIFY_RETURN
When multiple programs are attached, each program receives the return
value from the previous program on the stack and the last program
provides the return value to the attached function.
The fmod_ret bpf programs are run after the fentry programs and before
the fexit programs. The original function is only called if all the
fmod_ret programs return 0 to avoid any unintended side-effects. The
success value, i.e. 0, is not currently configurable, but it could be made
configurable so that user-space can specify it at load time.
For example:
  int func_to_be_attached(int a, int b)
  {  <--- do_fentry
  do_fmod_ret:
     <update ret by calling fmod_ret>
     if (ret != 0)
          goto do_fexit;
  original_function:
      <side_effects_happen_here>
  }  <--- do_fexit
The fmod_ret program attached to this function can be defined as:
  SEC("fmod_ret/func_to_be_attached")
  int BPF_PROG(func_name, int a, int b, int ret)
  {
          // This will skip the original function logic.
          return 1;
  }
The first fmod_ret program is passed 0 in its return argument.
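The control flow described above can be modelled in plain user-space C. This is only a simulation of the ordering (fentry, then the fmod_ret chain, then the original function or a skip, then fexit), not the trampoline code itself. All names are hypothetical, and checking the return value after each fmod_ret program, rather than once after the whole chain, is an assumption.

```c
#include <stddef.h>

/* Hypothetical fmod_ret program type: it sees the arguments and the
 * return value produced by the previous program in the chain. */
typedef int (*fmod_ret_fn)(int a, int b, int ret);

struct call_trace {
	int fentry_ran;
	int original_ran;
	int fexit_ran;
};

static int original_function(int a, int b, struct call_trace *t)
{
	t->original_ran = 1;	/* side effects would happen here */
	return a + b;
}

static int run_attached(int a, int b, const fmod_ret_fn *progs, size_t n,
			struct call_trace *t)
{
	int ret = 0;	/* the first fmod_ret program is passed 0 */
	size_t i;

	t->fentry_ran = 1;	/* do_fentry */

	for (i = 0; i < n; i++) {
		ret = progs[i](a, b, ret);
		if (ret != 0)
			goto do_fexit;	/* skip the original function */
	}

	ret = original_function(a, b, t);

do_fexit:
	t->fexit_ran = 1;
	return ret;
}

/* Two sample programs: one lets the call through, one overrides it. */
static int allow(int a, int b, int ret)
{
	(void)a; (void)b;
	return ret;
}

static int deny(int a, int b, int ret)
{
	(void)a; (void)b; (void)ret;
	return 1;
}
```

With only `allow` attached, the original function runs and its result is returned. With `deny` in the chain, the call returns 1 and `original_ran` stays 0, while the fentry and fexit markers are set in both cases.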
KP Singh [Wed, 4 Mar 2020 19:18:48 +0000 (20:18 +0100)]
bpf: JIT helpers for fmod_ret progs
* Split the invoke_bpf program to prepare for special handling of
fmod_ret programs introduced in a subsequent patch.
* Move the definition of emit_cond_near_jump and emit_nops as they are
needed for fmod_ret.
* Refactor branch target alignment into its own generic helper function
i.e. emit_align.
KP Singh [Wed, 4 Mar 2020 19:18:47 +0000 (20:18 +0100)]
bpf: Refactor trampoline update code
As we need to introduce a third type of attachment for trampolines, the
flattened signature of arch_prepare_bpf_trampoline gets even more
complicated.
Refactor the prog and count argument to arch_prepare_bpf_trampoline to
use bpf_tramp_progs to simplify the addition and accounting for new
attachment types.
Andrii Nakryiko [Wed, 4 Mar 2020 18:43:36 +0000 (10:43 -0800)]
selftests/bpf: Support out-of-tree vmlinux builds for VMLINUX_BTF
Add detection of out-of-tree built vmlinux image for the purpose of
VMLINUX_BTF detection. According to Documentation/kbuild/kbuild.rst, O takes
precedence over KBUILD_OUTPUT.
Also ensure ~/path/to/build/dir also works by relying on wildcard's resolution
first, but then applying $(abspath) at the end to also handle
O=../../whatever cases.
Kees Cook [Wed, 4 Mar 2020 02:18:34 +0000 (18:18 -0800)]
kbuild: Remove debug info from kallsyms linking
When CONFIG_DEBUG_INFO is enabled, the two kallsyms linking steps spend
time collecting and writing the dwarf sections to the temporary output
files. kallsyms does not need this information, and leaving it off
halves their linking time. This is especially noticeable without
CONFIG_DEBUG_INFO_REDUCED. The BTF linking stage, however, does still
need those details.
Refactor the BTF and kallsyms generation stages slightly for more
regularized temporary names. Skip debug during kallsyms links.
Additionally move "info BTF" to the correct place since commit 8959e39272d6 ("kbuild: Parameterize kallsyms generation and correct
reporting"), which added "info LD ..." to vmlinux_link calls.
For a full debug info build with BTF, my link time goes from 1m06s to
0m54s, saving about 12 seconds, or 18%.
Daniel Borkmann [Wed, 4 Mar 2020 16:00:06 +0000 (17:00 +0100)]
Merge branch 'bpf-uapi-enums'
Andrii Nakryiko says:
====================
Convert BPF-related UAPI constants, currently defined as #define macros, into
anonymous enums. This makes no difference to how such constants are used in
C code (they can still be used in all the compile-time contexts that
`#define`s can), but they are recorded as part of DWARF type info, and
subsequently get recorded as part of kernel's BTF type info. This allows those
constants to be emitted as part of the auto-generated vmlinux.h header file and
be used from BPF programs, which is especially convenient for all kinds of BPF
helper flags and makes CO-RE BPF programs nicer to write.
libbpf's btf_dump logic currently assumes enum values are signed 32-bit
values, but that doesn't match a typical case, so switch it to emit unsigned
values. Once BTF encoding of BTF_KIND_ENUM is extended to capture signedness
properly, this will be made more flexible.
As an immediate validation of the approach, runqslower's copy of
BPF_F_CURRENT_CPU #define is dropped in favor of its enum variant from
vmlinux.h.
v2->v3:
- convert only constants usable from BPF programs (BPF helper flags, map
create flags, etc) (Alexei);
v1->v2:
- fix up btf_dump test to use max 32-bit unsigned value instead of negative one.
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Andrii Nakryiko [Tue, 3 Mar 2020 00:32:33 +0000 (16:32 -0800)]
tools/runqslower: Drop copy/pasted BPF_F_CURRENT_CPU definiton
With BPF_F_CURRENT_CPU being an enum, it is now captured in vmlinux.h and is
readily usable by runqslower. So drop local copy/pasted definition in favor of
the one coming from vmlinux.h.
Andrii Nakryiko [Tue, 3 Mar 2020 00:32:32 +0000 (16:32 -0800)]
libbpf: Assume unsigned values for BTF_KIND_ENUM
Currently, BTF_KIND_ENUM type doesn't record whether enum values should be
interpreted as signed or unsigned. In Linux, most enums are unsigned, though,
so interpreting them as unsigned matches real world better.
Change btf_dump test case to test maximum 32-bit value, instead of negative
value.
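The difference between the two interpretations is easy to see on a raw 32-bit value. The helpers below are hypothetical and only illustrate how the same bit pattern reads as a signed or an unsigned enumerator value.

```c
#include <stdint.h>

/* Model an enumerator value as a raw 32-bit pattern and view it both
 * ways. Function names are hypothetical. */
static int64_t enum_val_as_signed(uint32_t raw)
{
	return (int64_t)(int32_t)raw;
}

static int64_t enum_val_as_unsigned(uint32_t raw)
{
	return (int64_t)raw;
}
```

For the pattern 0xffffffff the signed view yields -1, while the unsigned view yields 4294967295, the maximum 32-bit value.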
Andrii Nakryiko [Tue, 3 Mar 2020 00:32:31 +0000 (16:32 -0800)]
bpf: Switch BPF UAPI #define constants used from BPF program side to enums
Switch BPF UAPI constants, previously defined as #define macros, to anonymous
enum values. This preserves the constants' values and behavior in expressions, but
has the added advantage of being captured as part of DWARF and, subsequently, BTF
type info. This, in turn, greatly improves the usefulness of the generated vmlinux.h
for BPF applications, as BPF users will no longer need to copy/paste various
flags and constants that are frequently used with BPF helpers. Only those
constants that are used/useful from BPF program side are converted.
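A small sketch of why the conversion is transparent to C code: enumerators of an anonymous enum are integer constant expressions, so they work wherever a macro constant would. The flag names below are hypothetical stand-ins, not the real UAPI definitions.

```c
/* Hypothetical flags, not the real UAPI definitions. */
#define EXAMPLE_F_OLD_STYLE	(1U << 0)	/* macro: not part of type info */

enum {
	EXAMPLE_F_NO_PREALLOC	= (1U << 0),	/* enum: recorded as type info */
	EXAMPLE_F_RDONLY	= (1U << 1),
};

/* Both spellings carry the same value and are usable at compile time. */
_Static_assert(EXAMPLE_F_OLD_STYLE == EXAMPLE_F_NO_PREALLOC,
	       "macro and enum constant must agree");

/* Enumerators work as array sizes ... */
static char scratch[EXAMPLE_F_RDONLY + 1];

/* ... and as case labels, just like macros. */
static const char *flag_name(unsigned int flag)
{
	switch (flag) {
	case EXAMPLE_F_NO_PREALLOC:
		return "no_prealloc";
	case EXAMPLE_F_RDONLY:
		return "rdonly";
	default:
		return "unknown";
	}
}
```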
BPF programs may want to know whether an skb is gso. The canonical
answer is skb_is_gso(skb), which tests that gso_size != 0.
Expose this field in the same manner as gso_segs. That field itself
is not a sufficient signal, as the comment in skb_shared_info makes
clear: gso_segs may be zero, e.g., from dodgy sources.
Also prepare net/bpf/test_run for upcoming BPF_PROG_TEST_RUN tests
of the feature.
====================
This patch series adds bpf_link abstraction, analogous to libbpf's already
existing bpf_link abstraction. This formalizes and makes more uniform existing
bpf_link-like BPF program link (attachment) types (raw tracepoint and tracing
links), which are FD-based objects that are automatically detached when last
file reference is closed. These types of BPF program links are switched to
using bpf_link framework.
FD-based bpf_link approach provides great safety guarantees, by ensuring there
is not going to be an abandoned BPF program attached, if user process suddenly
exits or forgets to clean up after itself. This is especially important in
production environment and is what all the recent new BPF link types followed.
One of the previously existing inconveniences of FD-based approach, though,
was the scenario in which user process wants to install BPF link and exit, but
let attached BPF program run. Now, with bpf_link abstraction in place, it's
easy to support pinning links in BPF FS, which is done as part of the same
patch #1. This allows FD-based BPF program links to survive exit of a user
process and original file descriptor being closed, by creating a file entry
in BPF FS. This provides great safety by default, with a simple way to opt out
for cases where it's needed.
Corresponding libbpf APIs are added in the same patch set, as well as
selftests for this functionality.
Other types of BPF program attachments (XDP, cgroup, perf_event, etc) are
going to be converted in subsequent patches to follow similar approach.
v1->v2:
- use bpf_link_new_fd() uniformly (Alexei).
====================
Andrii Nakryiko [Tue, 3 Mar 2020 04:31:58 +0000 (20:31 -0800)]
libbpf: Add bpf_link pinning/unpinning
With bpf_link abstraction supported by kernel explicitly, add
pinning/unpinning API for links. Also allow to create (open) bpf_link from BPF
FS file.
This API allows "ephemeral" FD-based BPF links (like raw tracepoint
or fexit/freplace attachments) to survive user process exit, by pinning them in
a BPF FS, which is an important use case for long-running BPF programs.
As part of this, expose underlying FD for bpf_link. While legacy bpf_link's
might not have a FD associated with them (which will be expressed as
a bpf_link with fd=-1), kernel's abstraction is based around FD-based usage,
so match it closely. This, subsequently, allows to have a generic
pinning/unpinning API for generalized bpf_link. For some types of bpf_links
kernel might not support pinning, in which case bpf_link__pin() will return
error.
With FD being part of generic bpf_link, also get rid of bpf_link_fd in favor
of using vanilla bpf_link.
Andrii Nakryiko [Tue, 3 Mar 2020 04:31:57 +0000 (20:31 -0800)]
bpf: Introduce pinnable bpf_link abstraction
Introduce bpf_link abstraction, representing an attachment of BPF program to
a BPF hook point (e.g., tracepoint, perf event, etc). bpf_link encapsulates
ownership of the attached BPF program and reference counting of the link itself
when it is referenced from multiple anonymous inodes, and it ensures that the
release callback is called from a process context, so that users can safely take
mutex locks and sleep.
Additionally, with a new abstraction it's now possible to generalize pinning
of a link object in BPF FS. Pinning a link in BPF FS explicitly prevents BPF
program detachment on process exit and lets another, independent process open
the link and keep working with it.
Convert two existing bpf_link-like objects (raw tracepoint and tracing BPF
program attachments) into utilizing bpf_link framework, making them pinnable
in BPF FS. More FD-based bpf_links will be added in follow up patches.
selftests/bpf: Declare bpf_log_buf variables as static
The cgroup selftests did not declare the bpf_log_buf variable as static, leading
to a linker error with GCC 10 (which defaults to -fno-common). Fix this by
adding the missing static declarations.
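A minimal sketch of the pattern, assuming each test file keeps its own private log buffer. Without `static`, the array is a tentative definition with external linkage, and two files that both define it collide at link time under -fno-common. The buffer size and the helper function are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

#define EXAMPLE_LOG_BUF_SIZE 4096	/* hypothetical size */

/* 'static' gives the buffer internal linkage, so another test file may
 * define its own bpf_log_buf without a multiple-definition error. */
static char bpf_log_buf[EXAMPLE_LOG_BUF_SIZE];

/* Hypothetical helper that writes into the file-local buffer and
 * returns the number of characters written. */
static size_t log_message(const char *msg)
{
	int n = snprintf(bpf_log_buf, sizeof(bpf_log_buf), "%s", msg);

	return n < 0 ? 0 : (size_t)n;
}
```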
Andrii Nakryiko [Sun, 1 Mar 2020 08:10:43 +0000 (00:10 -0800)]
bpf: Reliably preserve btf_trace_xxx types
btf_trace_xxx types, crucial for tp_btf BPF programs (raw tracepoint with
verifier-checked direct memory access), have to be preserved in kernel BTF to
allow the verifier to do its job and enforce type/memory safety. It was reported
([0]) that for kernels built with Clang the current type-casting approach doesn't
preserve these types.
This patch fixes it by declaring an anonymous union for each registered
tracepoint, capturing both struct bpf_raw_event_map information, as well as
recording btf_trace_##call type reliably. Structurally, it's still the same
content as for a plain struct bpf_raw_event_map, so no other changes are
necessary.
====================
Move BPF_PROG, BPF_KPROBE, and BPF_KRETPROBE helper macros from private
selftests helpers to public libbpf ones. These helpers are extremely helpful
for writing tracing BPF applications and have been requested to be exposed for
easy use (e.g., [0]).
As part of this move, fix up BPF_KRETPROBE to not allow for capturing input
arguments (as it's unreliable and they will be often clobbered). Also, add
vmlinux.h header guard to allow multi-time inclusion, if necessary; but also
to let PT_REGS_PARM do proper detection of struct pt_regs field names on x86
arch. See relevant patches for more details.
Andrii Nakryiko [Sat, 29 Feb 2020 23:11:11 +0000 (15:11 -0800)]
selftests/bpf: Fix BPF_KRETPROBE macro and use it in attach_probe test
For kretprobes, there is no point in capturing input arguments from pt_regs,
as they are going to be, most probably, clobbered by the time probed kernel
function returns. So switch BPF_KRETPROBE to accept zero or one argument
(optional return result).
Andrii Nakryiko [Sat, 29 Feb 2020 23:11:10 +0000 (15:11 -0800)]
libbpf: Fix use of PT_REGS_PARM macros with vmlinux.h
Add detection of vmlinux.h to bpf_tracing.h header for PT_REGS macro.
Currently, BPF applications have to define __KERNEL__ symbol to use correct
definition of struct pt_regs on x86 arch. This is due to different field names
under internal kernel vs UAPI conditions. To make this more transparent for
users, detect vmlinux.h by checking __VMLINUX_H__ symbol.
Andrii Nakryiko [Sat, 29 Feb 2020 23:11:09 +0000 (15:11 -0800)]
bpftool: Add header guards to generated vmlinux.h
Add canonical #ifndef/#define/#endif guard for generated vmlinux.h header with
__VMLINUX_H__ symbol. __VMLINUX_H__ is also going to play double role of
identifying whether vmlinux.h is being used, versus, say, BCC or non-CO-RE
libbpf modes with dependency on kernel headers. This will make it possible to
write helper macro/functions, agnostic to exact BPF program set up.
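The double role of the guard symbol can be sketched in a single file. The first part stands in for a generated vmlinux.h, the second for a helper header that picks register field names depending on which setup is in use. The struct and macro names prefixed with example_/EXAMPLE_ are hypothetical; only the __VMLINUX_H__ and __KERNEL__ symbols come from the commit messages.

```c
/* Stand-in for a generated vmlinux.h: the canonical include guard. */
#ifndef __VMLINUX_H__
#define __VMLINUX_H__
struct example_regs {
	unsigned long di;	/* kernel-internal style field name */
};
#endif /* __VMLINUX_H__ */

/* Stand-in for a helper header: select the field name depending on
 * whether kernel-internal definitions (or vmlinux.h) are in use. */
#if defined(__KERNEL__) || defined(__VMLINUX_H__)
#define EXAMPLE_REGS_PARM1(x) ((x)->di)
#else
#define EXAMPLE_REGS_PARM1(x) ((x)->rdi)
#endif

static unsigned long first_arg(const struct example_regs *regs)
{
	return EXAMPLE_REGS_PARM1(regs);
}
```

Because the guard is defined as soon as the header is included, the helper macro resolves to the right field without the program having to define __KERNEL__ itself.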
mvneta: add XDP ethtool errors stats for TX to driver
Adding ethtool stats for when XDP transmitted packets overrun the TX
queue. This is recorded separately for XDP_TX and ndo_xdp_xmit. This
is an important aid for troubleshooting XDP based setups.
It is currently a known weakness and property of XDP that there isn't
any push-back or congestion feedback when transmitting frames via XDP.
This is easy to trigger when redirecting from a higher speed link into a
slower speed link, or simply from two ingress links into a single egress.
The situation can also happen when Ethernet flow control is active.
To test the patch and provoke the situation on my
Espressobin board, I configured the TX-queue to be smaller (434) than the
RX-queue (512) and overloaded the network with large MTU size frames (as a
larger frame takes longer to transmit).
Hopefully the upcoming XDP TX hook can be extended to provide insight
into these TX queue overflows, to allow programmable adaptation
strategies.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: Lorenzo Bianconi <lorenzo@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
tehuti: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
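As a brief illustration of why allocations are unaffected, the sketch below allocates a structure with a flexible array member. Since sizeof covers only the fixed part of the structure, the trailing elements are added explicitly, exactly as with a zero-length array. The contents of struct boo and the foo_alloc() helper are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

struct boo {
	int value;
};

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member: must be last */
};

/* Allocate a struct foo with room for n trailing elements. The caller
 * owns the result and releases it with free(). */
static struct foo *foo_alloc(size_t n)
{
	struct foo *f = calloc(1, sizeof(*f) + n * sizeof(f->array[0]));

	if (f)
		f->stuff = (int)n;
	return f;
}
```

In kernel code the struct_size() helper computes the same size with overflow checking.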
r8152: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: atlantic: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
bna: bnad: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: inet_sock: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: ip6_fib: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: ip_fib: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
drop_monitor: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: mip6: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
netdevice: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
This patchset has changes for driver performance optimization and
load time optimization, and a change to the PCI device registration
table for the timestamp device.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Across Cavium's ThunderX and Marvell's OcteonTx2 silicons
the PTP timestamping block's PCI device ID and vendor ID
have remained the same, but the HW architecture has changed.
Hence add PCI subsystem IDs to the device table to prevent
this driver from being probed on OcteonTx2 silicons.
Signed-off-by: Prakash Brahmajyosyula <bprakash@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Mon, 2 Mar 2020 09:59:01 +0000 (15:29 +0530)]
net: thunderx: Reduce mbox wait response time.
Replace msleep() with usleep_range() as internally it uses hrtimers.
This will put a cap on maximum wait time.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 2 Mar 2020 09:59:00 +0000 (15:29 +0530)]
net: thunderx: Adjust CQE_RX drop levels for better performance
With the current RX RED/DROP levels of 192/184 for CQE_RX, when the
incoming packet rate is high, the LLC gets polluted, resulting
in more cache misses and higher latency in packet processing. This
slows down the whole process and causes a performance loss. Hence adjust
the levels to 224/216 (i.e. for a CQ size of 1024, Rx pkts will be
RED dropped or dropped when unused CQEs are fewer than 128/160 respectively).
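The 128/160 figures follow from the level values if a level is read as a fraction of the CQ in 1/256 units. That reading is an assumption, chosen because it reproduces the numbers quoted above; the function name is hypothetical.

```c
/* Assumption: a level L (0..255) triggers once L/256 of the CQ is in
 * use, so the unused entries at that point are (256 - L) * size / 256. */
static unsigned int unused_cqes_at_level(unsigned int level,
					 unsigned int cq_size)
{
	return (256 - level) * cq_size / 256;
}
```

For a CQ size of 1024, levels 224 and 216 give 128 and 160 unused entries, while the previous levels 192 and 184 give 256 and 288.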
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
octeontx2: Flow control support and other misc changes
This patch series adds flow control support (802.3 pause frames) and
has other changes wrt generic admin function (AF) driver functionality.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 2 Mar 2020 07:19:28 +0000 (12:49 +0530)]
octeontx2-af: Modify rvu_reg_poll() to check reg atleast twice
Currently, if the operation is not finished on the first check,
the poll goes to sleep for 2-5 usecs. But if for
some reason (e.g. other higher-priority work such as interrupts) the 10ms
timeout has expired by the time the poll wakes up, we return failure
without checking whether the operation has finished.
This patch modifies the poll logic to check the HW operation after the sleep so
that the status is checked at least twice.
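The idea can be sketched with a simulated clock: test the status first and the deadline second, so that a wake-up after the deadline still gets one more status check before failure is reported. This is a user-space model under stated assumptions, not the driver's code; the fake_hw structure and all function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the hardware status and the clock. */
struct fake_hw {
	unsigned long now_us;		/* simulated current time */
	unsigned long done_at_us;	/* time at which the operation completes */
};

static bool op_done(const struct fake_hw *hw)
{
	return hw->now_us >= hw->done_at_us;
}

/* Advance the simulated clock; a large value models being scheduled out. */
static void fake_sleep(struct fake_hw *hw, unsigned long us)
{
	hw->now_us += us;
}

/* Returns 0 on completion, -1 on timeout. */
static int poll_done(struct fake_hw *hw, unsigned long timeout_us,
		     unsigned long sleep_us)
{
	unsigned long deadline = hw->now_us + timeout_us;

	for (;;) {
		/* Status is always checked before the deadline is. */
		if (op_done(hw))
			return 0;
		if (hw->now_us >= deadline)
			return -1;
		fake_sleep(hw, sleep_us);
	}
}
```

If the operation completes after 5 usecs but the poller oversleeps past a 10000 usec deadline, this loop still returns success, whereas a loop that tests only the deadline after waking would report failure.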
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 2 Mar 2020 07:19:27 +0000 (12:49 +0530)]
octeontx2-af: Enable PCI master
Bus mastering is enabled by firmware, but when this driver
is unbound, bus mastering gets disabled by the PCI subsystem,
which results in interrupts not working when the driver is reloaded.
Hence set bus mastering every time in probe().
Also
- Converted pci_set_dma_mask() and pci_set_consistent_dma_mask()
to dma_set_mask_and_coherent().
- Cleared transaction pending bit which gets set during
driver unbind due to clearing of bus mastering (ME bit).
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Sunil Goutham [Mon, 2 Mar 2020 07:19:26 +0000 (12:49 +0530)]
octeontx2-af: Set discovery ID for RVUM block
Currently there is no way for AF dependent drivers in
any domain to check if the AF driver is loaded. This
patch sets an ID for the RVUM block which is automatically
reflected in the PF/VF discovery register, which they can
check to defer their probe until AF is up.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Linu Cherian [Mon, 2 Mar 2020 07:19:25 +0000 (12:49 +0530)]
octeontx2-af: Optimize data retrieval from firmware
For retrieving info like interface MAC addresses, packet
parser key extraction config etc, currently a command
is sent to firmware, which periodically polls
for commands, processes them and returns the info.
This results in interface initialization taking a lot
of time. To optimize this, a memory region is shared between
firmware and this driver; while booting, firmware puts
static info like this into that region for the driver to
read directly without using commands.
With this
- Logic for retrieving packet parser extraction config
via commands is removed and replaced with using the
shared 'fwdata' structure.
- Now RVU MSIX vector address is also retrieved from this fwdata struct
instead of from CSR. Otherwise when kexec/kdump crash kernel loads
CSR will have an IOVA set up by the primary kernel, which impacts
RVU PF/VF's interrupts.
- Also added a mbox handler for PF/VF interfaces to retrieve their MAC
addresses from AF.
Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: Christina Jacob <cjacob@marvell.com>
Signed-off-by: Rakesh Babu <rsaladi2@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Mon, 2 Mar 2020 07:19:24 +0000 (12:49 +0530)]
octeontx2-pf: Support to enable/disable pause frames via ethtool
Added mailbox requests to retrieve backpressure IDs (BPIDs) from AF. Aura and
CQ contexts are configured with these BPIDs, so that when resource
levels reach configured thresholds they assert backpressure on the
interface which is also mapped to the same BPID.
Also added support to enable/disable pause frame generation via ethtool.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
Geetha sowjanya [Mon, 2 Mar 2020 07:19:23 +0000 (12:49 +0530)]
octeontx2-af: Pause frame configuration at cgx
CGX LMAC, the physical interface, can generate pause frames when
internal resources assert backpressure due to exhaustion.
This patch configures CGX to generate 802.3 pause frames.
Also enabled processing of received pause frames on the line, which
will assert backpressure on the internal transmit path.
Also added mailbox handlers for PF drivers to enable or disable
pause frames at any time.
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Each of the interface receive channels can be backpressured by
resources upon exhaustion or reaching configured threshold levels.
Resources here are receive buffer queues (Auras) and pkt notification
descriptor queues (CQs). Resources and interface channels are mapped
using backpressure IDs (BPIDs).
HW supports up to 512 BPIDs. This patch divides these BPIDs statically
across CGX/LBK/SDP interfaces as follows.
BPIDs 0 - 191 are mapped to LMAC channels, 16 per LMAC.
BPIDs 192 - 255 are mapped to LBK channels.
BPIDs 256 - 511 are mapped to SDP channels.
Also did the needed basic configuration of BPIDs.
Added mbox handlers with which a PF device can request a BPID, which
it will use to configure Auras and CQs.
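The static partition listed above can be written as a small lookup. The enum and function names are hypothetical, and the assumption that each LMAC owns one consecutive block of 16 BPIDs, in index order, goes beyond what the commit message states.

```c
enum bpid_owner { BPID_LMAC, BPID_LBK, BPID_SDP, BPID_INVALID };

/* Classify a BPID according to the ranges 0-191, 192-255 and 256-511. */
static enum bpid_owner bpid_to_owner(unsigned int bpid)
{
	if (bpid <= 191)
		return BPID_LMAC;
	if (bpid <= 255)
		return BPID_LBK;
	if (bpid <= 511)
		return BPID_SDP;
	return BPID_INVALID;
}

/* Assumption: LMAC slot i owns BPIDs 16*i .. 16*i + 15.
 * Returns -1 for BPIDs outside the LMAC range. */
static int bpid_lmac_slot(unsigned int bpid)
{
	return bpid <= 191 ? (int)(bpid / 16) : -1;
}
```

With 16 BPIDs per LMAC, the 192 BPIDs in the first range cover 12 LMAC slots, numbered 0 to 11.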
Signed-off-by: Geetha sowjanya <gakula@marvell.com>
Signed-off-by: Sunil Goutham <sgoutham@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
arcnet: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
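As a userspace illustration of why allocations are unaffected, the
sketch below allocates the example struct with a trailing flexible
array. sizeof is applied to the whole struct and to one element, never
to the flexible array member itself. The "struct boo" layout and the
foo_alloc() helper are assumptions made for this example; kernel code
would typically use kmalloc() with the struct_size() helper instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct boo {
	int value;		/* assumed layout, for illustration only */
};

struct foo {
	int stuff;
	struct boo array[];	/* flexible array member, must be last */
};

/* Hypothetical helper: allocate a struct foo with n trailing elements. */
struct foo *foo_alloc(size_t n)
{
	struct foo *f;

	/* Header size plus n elements; the array itself adds no size. */
	f = malloc(sizeof(*f) + n * sizeof(f->array[0]));
	if (!f)
		return NULL;

	f->stuff = (int)n;
	memset(f->array, 0, n * sizeof(f->array[0]));
	return f;
}
```

The same size expression works whether the member is declared as
array[0] or array[], which is why existing allocation sites do not need
to change.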
neighbour: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: flow_offload: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: dn_fib: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
ndisc: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: ipv6: mld: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: lwtunnel: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: ip6_route: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: nexthop: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sctp: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: sock_reuseport: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
====================
net/ethtool: Introduce link_ksettings API for virtual network devices
This series provides an API for drivers of virtual network devices that
allows users to alter initial device speed and duplex settings to reflect
the actual capabilities of underlying hardware. The changes made include
a helper function ethtool_virtdev_set_link_ksettings, which is used to
retrieve alterable link settings. In addition, there is a new ethtool
function defined to validate those settings. These changes resolve code
duplication for existing virtual network drivers that have already
implemented this behavior. In the case of the ibmveth driver, this API is
used to provide this capability for the first time.
---
v7: - removed ethtool_validate_cmd function pointer parameter from
ethtool_virtdev_set_link_ksettings since none of the virtual drivers
pass in a custom validate function as suggested by Michal Kubecek.
v6: - removed netvsc_validate_ethtool_ss_cmd(). netvsc_drv now uses
ethtool_virtdev_validate_cmd() instead as suggested by Michal Kubecek
and approved by Haiyang Zhang.
- matched handler argument name of ethtool_virtdev_set_link_ksettings
in declaration and definition as suggested by Michal Kubecek.
- shortened validate variable assignment in
ethtool_virtdev_set_link_ksettings as suggested by Michal Kubecek.
v5: - virtdev_validate_link_ksettings is taken out of the ethtool global
structure and is instead added as an argument to
ethtool_virtdev_set_link_ksettings as suggested by Jakub Kicinski.
v4: - Cleaned up return statement in ethtool_virtdev_validate_cmd based
off of Michal Kubecek's and Thomas Falcon's suggestion.
- If the netvsc driver is using the VF device in order to get
accelerated networking, the real speed and duplex is reported by using
the VF device as suggested by Stephen Hemminger.
- The speed and duplex variables are now passed by value rather than
passed by pointer as suggested by Willem de Bruijn and Michal
Kubecek.
- Removed ethtool_virtdev_get_link_ksettings since it was too simple
to warrant a helper function.
v3: - Factored out duplicated code to core/ethtool to provide API to
virtual drivers
Cris Forno [Fri, 28 Feb 2020 20:12:05 +0000 (14:12 -0600)]
net/ethtool: Introduce link_ksettings API for virtual network devices
With the ethtool_virtdev_set_link_ksettings function in core/ethtool.c,
ibmveth, netvsc, and virtio now use the core's helper function.
Functionality changes that pertain to the ibmveth driver include:
1. Changed the initial hardcoded link speed to 1 Gb/s.
2. Added support for allowing a user to change the reported link
speed via ethtool.
Functionality changes to the netvsc driver include:
1. When netvsc_get_link_ksettings is called, it will defer to the VF
device if it exists to pull accelerated networking values, otherwise
pull default or user-defined values.
2. Similarly, if netvsc_set_link_ksettings is called and a VF device
exists, the real values of speed and duplex are changed.
Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cris Forno [Fri, 28 Feb 2020 20:12:04 +0000 (14:12 -0600)]
ethtool: Factored out similar ethtool link settings for virtual devices to core
Three virtual devices (ibmveth, virtio_net, and netvsc) all have
similar code to set link settings and validate ethtool command. To
eliminate duplication of code, it is factored out into core/ethtool.c.
Signed-off-by: Cris Forno <cforno12@linux.vnet.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
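A rough userspace sketch of the shared pattern follows: validate the
requested speed and duplex, then store them as the values the virtual
device reports. All names and constant values here are hypothetical
stand-ins chosen for the example; this is not the code added to
core/ethtool.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for the ethtool speed/duplex constants. */
#define VIRT_SPEED_UNKNOWN	UINT32_MAX
#define VIRT_DUPLEX_HALF	0
#define VIRT_DUPLEX_FULL	1
#define VIRT_DUPLEX_UNKNOWN	0xff

/* Link settings a virtual device reports to user space. */
struct virt_link {
	uint32_t speed;
	uint8_t duplex;
};

/* Accept any non-negative speed or "unknown", and a known duplex value. */
bool virt_link_validate(uint32_t speed, uint8_t duplex)
{
	bool speed_ok = speed <= (uint32_t)INT_MAX ||
			speed == VIRT_SPEED_UNKNOWN;
	bool duplex_ok = duplex == VIRT_DUPLEX_HALF ||
			 duplex == VIRT_DUPLEX_FULL ||
			 duplex == VIRT_DUPLEX_UNKNOWN;

	return speed_ok && duplex_ok;
}

/* Speed and duplex are passed by value; state changes only on success. */
int virt_link_set(struct virt_link *link, uint32_t speed, uint8_t duplex)
{
	if (!virt_link_validate(speed, duplex))
		return -1;	/* a kernel driver would return -EINVAL */

	link->speed = speed;
	link->duplex = duplex;
	return 0;
}
```

Keeping validation in one place is what lets each driver drop its own
copy of the check.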
====================
hsr: several code cleanup for hsr module
This patchset is to clean up hsr module code.
1. The first patch is to use debugfs_remove_recursive().
If it uses debugfs_remove_recursive() instead of debugfs_remove(),
hsr_priv() doesn't need to have "node_tbl_file" pointer variable.
2. The second patch is to use extack error message.
If HSR uses the extack instead of netdev_info(), users can get
error messages immediately without having to check the kernel log.
3. The third patch is to use netdev_err() instead of WARN_ONCE().
When a packet is being sent, hsr_addr_subst_dest() is called and
it tries to find the node with the ethernet destination address.
If it couldn't find a node, it warns with WARN_ONCE().
But using WARN_ONCE() here is overkill.
So, in this patch, netdev_err() is used instead.
4. The fourth patch is to remove unnecessary rcu_read_{lock/unlock}().
There are some rcu_read_{lock/unlock}() in hsr module and some of
them are unnecessary. In this patch,
these unnecessary rcu_read_{lock/unlock}() will be removed.
5. The fifth patch is to use upper/lower device infrastructure.
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
This function also internally checks for loops and maximum depth.
If the hsr module uses the upper/lower device infrastructure,
it can prevent these problems.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Fri, 28 Feb 2020 18:02:10 +0000 (18:02 +0000)]
hsr: use upper/lower device infrastructure
netdev_upper_dev_link() is useful to manage lower/upper interfaces.
This function also internally checks for loops and maximum depth.
All or most virtual interfaces that could have a real interface
(e.g. macsec, macvlan, ipvlan etc.) use lower/upper infrastructure.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Fri, 28 Feb 2020 18:01:56 +0000 (18:01 +0000)]
hsr: remove unnecessary rcu_read_lock() in hsr module
In order to access the port list, the hsr_port_get_hsr() is used.
And this is protected by RTNL and RCU.
The hsr_fill_info(), hsr_check_carrier(), hsr_dev_open() and
hsr_get_max_mtu() are protected by RTNL.
So, the rcu_read_lock() calls in these functions are not necessary.
The hsr_handle_frame() also uses rcu_read_lock(), but this function
is called from the packet path,
which is already protected by RCU.
So, the rcu_read_lock() in hsr_handle_frame() can be removed.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Taehee Yoo [Fri, 28 Feb 2020 18:01:46 +0000 (18:01 +0000)]
hsr: use netdev_err() instead of WARN_ONCE()
When HSR interface is sending a frame, it finds a node with
the destination ethernet address from the list.
If there is no node, it calls WARN_ONCE().
But using WARN_ONCE() for this situation is overkill.
So, in this patch, netdev_err() is used instead.
Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oleksij Rempel [Fri, 28 Feb 2020 14:50:49 +0000 (15:50 +0100)]
net: ag71xx: port to phylink
The port to phylink was done as close as possible to initial
functionality.
Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de> Acked-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 28 Feb 2020 07:57:41 +0000 (08:57 +0100)]
net: ll_temac: Add ethtool support for coalesce parameters
Please note that the delays are calculated based on typical
parameters. But as TEMAC is an HDL IP, designs may vary, and future
work might be needed to make this calculation configurable.
Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 28 Feb 2020 07:57:26 +0000 (08:57 +0100)]
net: ll_temac: Make RX/TX ring sizes configurable
Add support for setting the RX and TX ring sizes for this driver using
ethtool. Also increase the default RX ring size as the previous default
was far too low for good performance in some configurations.
Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 28 Feb 2020 07:57:12 +0000 (08:57 +0100)]
net: ll_temac: Remove unused start_p variable
The start_p variable was included in the initial commit,
commit 92744989533c ("net: add Xilinx ll_temac device driver"),
but has never had any real use.
Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Esben Haabendal [Fri, 28 Feb 2020 07:56:57 +0000 (08:56 +0100)]
net: ll_temac: Remove unused tx_bd_next struct field
The tx_bd_next field was included in the initial commit,
commit 92744989533c ("net: add Xilinx ll_temac device driver"),
but has never had any real use.
Signed-off-by: Esben Haabendal <esben@geanix.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Paolo Abeni [Fri, 28 Feb 2020 13:45:22 +0000 (14:45 +0100)]
net: datagram: drop 'destructor' argument from several helpers
The only users for such argument are the UDP protocol and the UNIX
socket family. We can safely reclaim the accounted memory directly
from the UDP code and, after the previous patch, we can do scm
stats accounting outside the datagram helpers.
Overall this cleans up a bit some datagram-related helpers, and
avoids an indirect call per packet in the UDP receive path.
v1 -> v2:
- call scm_stat_del() only when not peeking - Kirill
- fix build issue with CONFIG_INET_ESPINTCP
Signed-off-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Kirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
af_unix: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
bonding: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: core: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
ipv6: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
net: dccp: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
l2tp: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Lastly, fix the following checkpatch warning:
CHECK: Prefer kernel type 'u8' over 'uint8_t'
#50: FILE: net/l2tp/l2tp_core.h:119:
+ uint8_t priv[]; /* private data */
net: mpls: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
xdp: Replace zero-length array with flexible-array member
The current codebase makes use of the zero-length array language
extension to the C90 standard, but the preferred mechanism to declare
variable-length types such as these ones is a flexible array member[1][2],
introduced in C99:
struct foo {
int stuff;
struct boo array[];
};
By making use of the mechanism above, we will get a compiler warning
in case the flexible array does not occur last in the structure, which
will help us prevent some kind of undefined behavior bugs from being
inadvertently introduced[3] to the codebase from now on.
Also, notice that, dynamic memory allocations won't be affected by
this change:
"Flexible array members have incomplete type, and so the sizeof operator
may not be applied. As a quirk of the original implementation of
zero-length arrays, sizeof evaluates to zero."[1]
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Acked-by: Jonathan Lemon <jonathan.lemon@gmail.com> Acked-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
The bpf_prog can store specific info to a sk by using bpf_sk_storage.
In other words, a sk can be extended by a bpf_prog.
This series is to support providing bpf_sk_storage data during inet_diag's
dump. The primary target is the usage like iproute2's "ss".
The first two patches are refactoring works in inet_diag to make
adding bpf_sk_storage support easier. The next two patches do
the actual work.
Please see individual patch for details.
v2:
- Add commit message for u16 to u32 change in min_dump_alloc in Patch 4 (Song)
- Add comment to explain the !skb->len check in __inet_diag_dump in Patch 4.
- Do the map->map_type check earlier in Patch 3 for readability.
====================
Martin KaFai Lau [Tue, 25 Feb 2020 23:04:27 +0000 (15:04 -0800)]
bpf: inet_diag: Dump bpf_sk_storages in inet_diag_dump()
This patch will dump out the bpf_sk_storages of a sk
if the request has the INET_DIAG_REQ_SK_BPF_STORAGES nlattr.
An array of SK_DIAG_BPF_STORAGE_REQ_MAP_FD can be specified in
INET_DIAG_REQ_SK_BPF_STORAGES to select which bpf_sk_storage to dump.
If no map_fd is specified, all bpf_sk_storages of a sk will be dumped.
bpf_sk_storages can be added to the system at runtime. It is difficult
to find a proper static value for cb->min_dump_alloc.
This patch learns the nlattr size required to dump the bpf_sk_storages
of a sk. If it happens to be the very first nlmsg of a dump and it
cannot fit the needed bpf_sk_storages, it will try to expand the
skb by "pskb_expand_head()".
Instead of expanding it in inet_sk_diag_fill(), it is expanded at a
sleepable context in __inet_diag_dump() so __GFP_DIRECT_RECLAIM can
be used. In __inet_diag_dump(), it will retry as long as the
skb is empty and the cb->min_dump_alloc becomes larger than before.
cb->min_dump_alloc is bounded by KMALLOC_MAX_SIZE. The min_dump_alloc
is also changed from 'u16' to 'u32' to accommodate a sk that may have
a few large bpf_sk_storages.
The updated cb->min_dump_alloc will also be used to allocate the skb in
the next dump. This logic already exists in netlink_dump().
Here is the sample output of a locally modified 'ss' and it could be made
more readable by using BTF later:
[root@arch-fb-vm1 ~]# ss --bpf-map-id 14 --bpf-map-id 13 -t6an 'dst [::1]:8989'
State Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
ESTAB 0 0 [::1]:51072 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
ESTAB 0 0 [::1]:51070 [::1]:8989
bpf_map_id:14 value:[ 3feb ]
bpf_map_id:13 value:[ 3f ]
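The retry rule described above (retry only while the buffer is still
empty and cb->min_dump_alloc has grown, with an upper bound) can be
modelled in userspace as follows. This is a simplified sketch: the
struct, the function names and the SKETCH_MAX_ALLOC value are
assumptions, and assigning a larger buf_size merely stands in for
expanding the skb with pskb_expand_head().

```c
#include <assert.h>

/* Stand-in for KMALLOC_MAX_SIZE; the value is an assumption. */
#define SKETCH_MAX_ALLOC	(4u << 20)

struct dump_state {
	unsigned int min_dump_alloc;
};

/*
 * Model of one fill attempt: returns the bytes written, or 0 if the
 * buffer is too small. On failure it records the learned size,
 * capped at SKETCH_MAX_ALLOC.
 */
unsigned int try_fill(struct dump_state *cb, unsigned int buf_size,
		      unsigned int needed)
{
	if (needed <= buf_size)
		return needed;

	if (needed > cb->min_dump_alloc)
		cb->min_dump_alloc = needed < SKETCH_MAX_ALLOC ?
				     needed : SKETCH_MAX_ALLOC;
	return 0;
}

/* Returns the buffer size that finally worked, or 0 if none did. */
unsigned int dump_with_retry(struct dump_state *cb, unsigned int buf_size,
			     unsigned int needed)
{
	for (;;) {
		unsigned int prev = cb->min_dump_alloc;

		if (try_fill(cb, buf_size, needed))
			return buf_size;

		/* Buffer is empty: retry only if the learned size grew. */
		if (cb->min_dump_alloc <= prev)
			return 0;

		buf_size = cb->min_dump_alloc;	/* "expand" the buffer */
	}
}
```

Because the learned size is capped, a request that can never fit stops
after one expansion instead of looping.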
Martin KaFai Lau [Tue, 25 Feb 2020 23:04:21 +0000 (15:04 -0800)]
bpf: INET_DIAG support in bpf_sk_storage
This patch adds INET_DIAG support to bpf_sk_storage.
1. Although this series adds bpf_sk_storage diag capability to inet sk,
bpf_sk_storage is in general applicable to all fullsock. Hence, the
bpf_sk_storage logic will operate on SK_DIAG_* nlattr. The caller
will pass in its specific nesting nlattr (e.g. INET_DIAG_*) as
the argument.
2. The request will be like:
INET_DIAG_REQ_SK_BPF_STORAGES (nla_nest) (defined in a later patch)
SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
SK_DIAG_BPF_STORAGE_REQ_MAP_FD (nla_put_u32)
......
Considering there could be multiple bpf_sk_storages in a sk,
instead of reusing INET_DIAG_INFO ("ss -i"), the user can select
some specific bpf_sk_storage to dump by specifying an array of
SK_DIAG_BPF_STORAGE_REQ_MAP_FD.
If no SK_DIAG_BPF_STORAGE_REQ_MAP_FD is specified (i.e. an empty
INET_DIAG_REQ_SK_BPF_STORAGES), it will dump all bpf_sk_storages
of a sk.
3. The reply will be like:
INET_DIAG_BPF_SK_STORAGES (nla_nest) (defined in a later patch)
SK_DIAG_BPF_STORAGE (nla_nest)
SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
SK_DIAG_BPF_STORAGE (nla_nest)
SK_DIAG_BPF_STORAGE_MAP_ID (nla_put_u32)
SK_DIAG_BPF_STORAGE_MAP_VALUE (nla_reserve_64bit)
......
4. Unlike other INET_DIAG info of a sk which is pretty static, the size
required to dump the bpf_sk_storage(s) of a sk is dynamic as the
system adds more bpf_sk_storage_map. It is hard to set a static
min_dump_alloc size.
Hence, this series learns it at runtime and adjusts the
cb->min_dump_alloc as it iterates all sk(s) of a system. The
"unsigned int *res_diag_size" in bpf_sk_storage_diag_put()
is for this purpose.
The next patch will update the cb->min_dump_alloc as it
iterates the sk(s).
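To make the request layout in item 2 concrete, here is a userspace
sketch that packs a nested attribute holding an array of u32 map fds,
using the usual netlink TLV framing (a 4-byte header whose length field
covers header plus payload, with 4-byte alignment). The attribute type
numbers and helper names are hypothetical, and real user space would
normally build this with a netlink library rather than by hand.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ATTR_HDRLEN	4u
#define ATTR_ALIGN(len)	(((len) + 3u) & ~3u)

/* Hypothetical type numbers, not the values from the uapi headers. */
#define SKETCH_REQ_SK_BPF_STORAGES	1
#define SKETCH_REQ_MAP_FD		2

struct attr_hdr {
	uint16_t len;	/* header + payload, excluding padding */
	uint16_t type;
};

/* Append one u32 attribute at buf + off; returns the new offset. */
size_t put_u32(uint8_t *buf, size_t off, uint16_t type, uint32_t val)
{
	struct attr_hdr h = {
		.len = (uint16_t)(ATTR_HDRLEN + sizeof(val)),
		.type = type,
	};

	memcpy(buf + off, &h, sizeof(h));
	memcpy(buf + off + ATTR_HDRLEN, &val, sizeof(val));
	return off + ATTR_ALIGN(h.len);
}

/*
 * Build the nest with one MAP_FD attribute per fd and return its
 * total length. The caller must provide at least 4 + 8 * nfds bytes.
 * With nfds == 0 the nest is empty, which requests all storages.
 */
size_t build_storage_req(uint8_t *buf, const uint32_t *map_fds, size_t nfds)
{
	struct attr_hdr nest;
	size_t off = ATTR_HDRLEN;	/* leave room for the nest header */
	size_t i;

	for (i = 0; i < nfds; i++)
		off = put_u32(buf, off, SKETCH_REQ_MAP_FD, map_fds[i]);

	nest.len = (uint16_t)off;
	nest.type = SKETCH_REQ_SK_BPF_STORAGES;
	memcpy(buf, &nest, sizeof(nest));
	return off;
}
```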
Martin KaFai Lau [Tue, 25 Feb 2020 23:04:15 +0000 (15:04 -0800)]
inet_diag: Move the INET_DIAG_REQ_BYTECODE nlattr to cb->data
The INET_DIAG_REQ_BYTECODE nlattr is currently re-found every time when
the "dump()" is re-started.
In a later patch, it will also need to parse the new
INET_DIAG_REQ_SK_BPF_STORAGES nlattr to learn the map_fds. Thus, this
patch takes this chance to store the parsed nlattr in cb->data
during the "start" time of a dump.
By doing this, the "bc" argument also becomes unnecessary
and is removed. Also, the two copies of the INET_DIAG_REQ_BYTECODE
parsing-audit logic between compat/current version can be
consolidated to one.
Martin KaFai Lau [Tue, 25 Feb 2020 23:04:09 +0000 (15:04 -0800)]
inet_diag: Refactor inet_sk_diag_fill(), dump(), and dump_one()
In a later patch, there is a need to update "cb->min_dump_alloc"
in inet_sk_diag_fill() as it learns the different bpf_sk_storages
stored in a sk while dumping all sk(s) (e.g. tcp_hashinfo).
The inet_sk_diag_fill() currently does not take the "cb" as an argument.
One of the reasons is that inet_sk_diag_fill() is used by both dump_one()
and dump() (which belong to the "struct inet_diag_handler"). The dump_one()
interface does not pass the "cb" along.
This patch is to make dump_one() pass a "cb". The "cb" is created in
inet_diag_cmd_exact(). The "nlh" and "in_skb" are stored in "cb" as
the dump() interface does. The total number of args in
inet_sk_diag_fill() is also cut from 10 to 7 and
that helps many callers to pass fewer args.
In particular,
"struct user_namespace *user_ns", "u32 pid", and "u32 seq"
can be replaced by accessing "cb->nlh" and "cb->skb".
A similar argument reduction is also made to
inet_twsk_diag_fill() and inet_req_diag_fill().
inet_csk_diag_dump() and inet_csk_diag_fill() are also removed.
They are mostly equivalent to inet_sk_diag_fill(). Their repeated
usages are very limited. Thus, inet_sk_diag_fill() is directly used
in those occasions.