Daniel Borkmann [Wed, 24 Jan 2018 09:46:59 +0000 (10:46 +0100)]
Merge branch 'bpf-samples-sockmap-improvements'
John Fastabend says:
====================
The sockmap sample is pretty simple at the moment. All it does is open
a few sockets, attach BPF programs/sockmaps, and send a few packets.
However, for testing and debugging I wanted to have more control over
the sendmsg format and data than provided by tools like iperf3/netperf,
etc. The reason is that for testing BPF programs and the stream parser it
is helpful to be able to submit multiple sendmsg calls with different msg
layouts, for example lots of 1B iovs or a single large MB of data.
Additionally, my current test setup requires an entire orchestration
layer (cilium) to run, as well as lighttpd and http traffic generators
or, for kafka testing, brokers and clients. This makes it a bit more
difficult when doing performance optimizations to incrementally test
small changes and come up with performance deltas and perf numbers.
By adding a few more options and an additional few tests the sockmap
sample program can show a more complete example and do some of the
above. Because the sample program is self contained it doesn't require
additional infrastructure to run either.
This series, although still fairly crude, does provide some nice
additions. They are
- a new sendmsg test with sender and recv threads
- a new base test so we can get metrics/data without BPF
- multiple GBps of throughput on base and sendmsg tests
- automatically set rlimit and common variables
That said the UI is still primitive, more features could be added,
more tests might be useful, the reporting is bare bones, etc. But,
IMO let's push this now rather than sit on it for weeks until I get
time to do the above improvements. Additional patches can address
the other limitations/issues. Another thing I am considering is
moving this into selftests, after a few more fixes so we avoid
false failures, so that we get more sockmap testing.
v2: removed bogus file added by patch 3/7
v3: 1/7 replace goto out with returns, remove sighandler update,
2/7 free iov in error cases
3/7 fix bogus makefile change, bail out early on errors
v4: add Martin's "nits" and ACKs along with fixes to 2/7 iov free
also pointed out by Martin.
Thanks Daniel and Martin for the reviews!
====================
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Mon, 22 Jan 2018 18:37:11 +0000 (10:37 -0800)]
bpf: sockmap set rlimit
Avoid extra step of setting limit from cmdline and do it directly in
the program.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
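The patch body is not shown in this log, so the following is only a minimal sketch of setting a limit from inside the program. It assumes the limit in question is RLIMIT_MEMLOCK, and the helper name raise_memlock_limit() is hypothetical. The sketch raises the soft limit to the existing hard limit, which an unprivileged process is allowed to do; the actual sample may pick different values.

```c
#include <sys/resource.h>

/*
 * Hypothetical helper: raise the soft RLIMIT_MEMLOCK limit to the current
 * hard limit. Returns 0 on success, -1 on error (errno set by libc).
 */
int raise_memlock_limit(void)
{
	struct rlimit r;

	if (getrlimit(RLIMIT_MEMLOCK, &r) != 0)
		return -1;

	/* The soft limit may be raised up to the hard limit without privileges. */
	r.rlim_cur = r.rlim_max;
	return setrlimit(RLIMIT_MEMLOCK, &r);
}
```

Calling such a helper at the start of main() removes the need for a separate ulimit step on the command line.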
John Fastabend [Mon, 22 Jan 2018 18:36:53 +0000 (10:36 -0800)]
bpf: sockmap put client sockets in blocking mode
Put client sockets in blocking mode; otherwise, with sendmsg tests
it's easy to overrun the socket buffers, which results in the test
being aborted.
The original non-blocking mode was added to handle listen/accept with
a single thread. The client/accepted sockets do not need to be
non-blocking.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
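One way to put a descriptor back into blocking mode is to clear O_NONBLOCK with fcntl(). This is a sketch of the general technique, not necessarily how the patch does it; the helper name set_blocking() is an assumption.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: clear O_NONBLOCK on fd so that send/recv calls block
 * when the socket buffer is full or empty instead of failing with EAGAIN.
 * Returns 0 on success, -1 on error.
 */
int set_blocking(int fd)
{
	int flags = fcntl(fd, F_GETFL, 0);

	if (flags == -1)
		return -1;
	return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```

With a blocking socket, a sender that outpaces the receiver simply waits in sendmsg() rather than aborting the test.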
John Fastabend [Mon, 22 Jan 2018 18:36:36 +0000 (10:36 -0800)]
bpf: sockmap sample add base test without any BPF for comparison
Add a base test that does not use BPF hooks to test baseline case.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
John Fastabend [Mon, 22 Jan 2018 18:36:19 +0000 (10:36 -0800)]
bpf: sockmap sample, report bytes/sec
Report bytes/sec sent as well as total bytes. Useful to get rough
idea how different configurations and usage patterns perform with
sockmap.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
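A bytes/sec figure like the one described above can be derived from a byte count and two timestamps. The sketch below assumes the timestamps come from clock_gettime(CLOCK_MONOTONIC, ...); the function name bytes_per_sec() is hypothetical and the sample may compute the rate differently.

```c
#include <time.h>

/*
 * Hypothetical helper: throughput in bytes/sec between two timestamps.
 * Returns 0.0 when no time has elapsed, to avoid dividing by zero.
 */
double bytes_per_sec(unsigned long long bytes,
		     const struct timespec *start,
		     const struct timespec *end)
{
	double secs = (double)(end->tv_sec - start->tv_sec) +
		      (double)(end->tv_nsec - start->tv_nsec) / 1e9;

	return secs > 0.0 ? (double)bytes / secs : 0.0;
}
```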
John Fastabend [Mon, 22 Jan 2018 18:36:02 +0000 (10:36 -0800)]
bpf: sockmap sample, use fork() for send and recv
Currently, for SENDMSG tests, send completes first and then recv runs. This
does not work well for large data sizes and/or many iterations. So
fork the recv and send handler so that we run both send and recv. In
the future we can add a parameter to do more than a single fork of
tx/rx.
With this we can get many GBps of data which helps exercise the
sockmap code.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
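The idea of running the sender and receiver concurrently can be sketched with two fork() calls over a socketpair. Everything here (function names, the 4096-byte buffer, the use of AF_UNIX) is an assumption chosen to keep the example self-contained; the sample itself works on TCP sockets attached to a sockmap.

```c
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send exactly `total` zero bytes on fd; returns 0 on success, 1 on error. */
static int tx_loop(int fd, size_t total)
{
	char buf[4096] = {0};

	while (total > 0) {
		size_t chunk = total < sizeof(buf) ? total : sizeof(buf);
		ssize_t n = send(fd, buf, chunk, 0);

		if (n < 0)
			return 1;
		total -= (size_t)n;
	}
	return 0;
}

/* Receive exactly `total` bytes from fd; returns 0 on success, 1 on error/EOF. */
static int rx_loop(int fd, size_t total)
{
	char buf[4096];

	while (total > 0) {
		ssize_t n = recv(fd, buf, sizeof(buf), 0);

		if (n <= 0 || (size_t)n > total)
			return 1;
		total -= (size_t)n;
	}
	return 0;
}

/* Wait for a child and report whether it exited cleanly with status 0. */
static int child_ok(pid_t pid)
{
	int status;

	if (pid <= 0 || waitpid(pid, &status, 0) < 0)
		return 0;
	return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

/* Fork one receiver and one sender so both sides run at the same time. */
int run_forked_transfer(size_t total)
{
	int sv[2], rx_ok, tx_ok;
	pid_t rx, tx;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
		return -1;

	rx = fork();
	if (rx == 0) {
		close(sv[0]);
		_exit(rx_loop(sv[1], total));
	}
	tx = fork();
	if (tx == 0) {
		close(sv[1]);
		_exit(tx_loop(sv[0], total));
	}

	/* The parent only supervises; close its copies so EOF propagates. */
	close(sv[0]);
	close(sv[1]);
	rx_ok = child_ok(rx);
	tx_ok = child_ok(tx);
	return rx_ok && tx_ok ? 0 : -1;
}
```

Because the receiver drains the socket while the sender is still writing, the transfer size is no longer bounded by the socket buffer.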
John Fastabend [Mon, 22 Jan 2018 18:35:45 +0000 (10:35 -0800)]
bpf: add sendmsg option for testing BPF programs
When testing BPF programs using sockmap I often want to have more
control over how sendmsg is exercised. This becomes even more useful
as new sockmap program types are added.
This adds a test type option to select type of test to run. Currently,
only "ping" and "sendmsg" are supported, but more can be added as
needed.
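Controlling the sendmsg layout comes down to how the iovec array in struct msghdr is built. The sketch below sends one message made of a chosen number of equally sized buffers, so the same payload can be submitted as many 1B iovs or as one large iov. The helper name send_layout() and its parameters are hypothetical, not the option names used by the sample.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: send a single message consisting of iov_count
 * zero-filled buffers of iov_len bytes each.
 * Returns the number of bytes sent, or -1 on error.
 */
ssize_t send_layout(int fd, int iov_count, size_t iov_len)
{
	struct msghdr msg;
	struct iovec *iov;
	ssize_t sent = -1;
	int i;

	if (iov_count <= 0)
		return -1;

	iov = calloc((size_t)iov_count, sizeof(*iov));
	if (!iov)
		return -1;

	for (i = 0; i < iov_count; i++) {
		iov[i].iov_base = calloc(1, iov_len);
		if (!iov[i].iov_base)
			goto out;
		iov[i].iov_len = iov_len;
	}

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = iov;
	msg.msg_iovlen = iov_count;
	sent = sendmsg(fd, &msg, 0);
out:
	/* Free every buffer on both the success and the error path. */
	for (i = 0; i < iov_count; i++)
		free(iov[i].iov_base);
	free(iov);
	return sent;
}
```

For example, send_layout(fd, 1024, 1) and send_layout(fd, 1, 1024) carry the same number of bytes but exercise very different paths in a stream parser.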
John Fastabend [Mon, 22 Jan 2018 18:35:27 +0000 (10:35 -0800)]
bpf: refactor sockmap sample program update for arg parsing
The sockmap sample program takes arguments from the cmd line but reads them
in using offsets into the array. Because we want to add more arguments
in the future, let's do proper argument handling.
Also refactor code to pull apart sock init and ping/pong test. This
allows us to add new tests in the future.
Signed-off-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
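"Proper argument handling" in a C sample usually means getopt_long() instead of fixed argv offsets. The sketch below shows that pattern; the option names (--rate, --iov_count, --test), the struct test_opts type and the defaults are assumptions for illustration only. The "ping" and "sendmsg" test names are the two mentioned in this series.

```c
#include <getopt.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical option set; the real sample may use different names. */
struct test_opts {
	int rate;
	int iov_count;
	const char *test;
};

static const struct option long_options[] = {
	{"rate",      required_argument, NULL, 'r'},
	{"iov_count", required_argument, NULL, 'i'},
	{"test",      required_argument, NULL, 't'},
	{0, 0, 0, 0}
};

/* Parse argv into o; returns 0 on success, -1 on an unknown or bad option. */
int parse_opts(int argc, char **argv, struct test_opts *o)
{
	int opt;

	o->rate = 1;
	o->iov_count = 1;
	o->test = "ping";

	while ((opt = getopt_long(argc, argv, "r:i:t:", long_options, NULL)) != -1) {
		switch (opt) {
		case 'r':
			o->rate = atoi(optarg);
			break;
		case 'i':
			o->iov_count = atoi(optarg);
			break;
		case 't':
			if (strcmp(optarg, "ping") != 0 && strcmp(optarg, "sendmsg") != 0)
				return -1;
			o->test = optarg;
			break;
		default:
			return -1;
		}
	}
	return 0;
}
```

Adding a new argument then only requires one table entry and one case label, rather than renumbering argv offsets.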
selftests/bpf: make 'dubious pointer arithmetic' test useful
mostly revert the previous workaround and make
'dubious pointer arithmetic' test useful again.
Use (ptr - ptr) << const instead of ptr << const to generate large scalar.
The rest stays as before commit 2b36047e7889.
The test was incorrectly doing
mkdir /mnt/cgroup-test-work-dirtest-bpf-based-device-cgroup
instead of
mkdir /mnt/cgroup-test-work-dir/test-bpf-based-device-cgroup
Somehow this mkdir succeeds and a new directory appears:
/mnt/cgroup-test-work-dir/cgroup-test-work-dirtest-bpf-based-device-cgroup
Later cleanup via nftw("/mnt/cgroup-test-work-dir", ...);
doesn't walk this directory.
"rmdir /mnt/cgroup-test-work-dir" succeeds, but the bpf program and
dangling cgroup stay in memory.
That's a separate issue on a cgroup side.
For now fix the test.
Fixes: 37f1ba0909df ("selftests/bpf: add a test for device cgroup controller") Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
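The two mkdir paths above differ only by a missing '/' between the work directory and the test name. A small helper that always inserts the separator avoids this class of bug; join_path() is a hypothetical name, and this is a sketch of the idea rather than the actual fix.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write "dir/name" into out.
 * Returns 0 on success, -1 if the result did not fit or formatting failed.
 */
int join_path(char *out, size_t len, const char *dir, const char *name)
{
	int n = snprintf(out, len, "%s/%s", dir, name);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Called with "/mnt/cgroup-test-work-dir" and "test-bpf-based-device-cgroup", it produces the intended path rather than the concatenated one.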
test_hashmap_walk takes a very long time on a debug kernel with kasan on.
Reduce the number of iterations in this test without sacrificing
test coverage.
Also add printfs as progress indicator.
Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Tue, 23 Jan 2018 06:10:59 +0000 (22:10 -0800)]
tools/bpf: fix a test failure in selftests prog test_verifier
Commit 111e6b45315c ("selftests/bpf: make test_verifier run most programs")
enables tools/testing/selftests/bpf/test_verifier unit cases to run
via bpf_prog_test_run command. With the latest code base,
test_verifier had one test case failure:
The test case does not set return value in the test
structure and hence the return value from the prog run
is assumed to be 0. However, the actual return value is 1.
As a result, the test failed. The fix is to correctly set
the return value in the test structure.
Fixes: 111e6b45315c ("selftests/bpf: make test_verifier run most programs") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Yonghong Song [Tue, 23 Jan 2018 06:53:51 +0000 (22:53 -0800)]
bpf: fix incorrect kmalloc usage in lpm_trie MAP_GET_NEXT_KEY rcu region
In commit b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map"),
the implemented MAP_GET_NEXT_KEY callback function is guarded with rcu read lock.
In the function body, "kmalloc(size, GFP_USER | __GFP_NOWARN)" is used which may
sleep and violate rcu read lock region requirements. This patch fixed the issue
by using GFP_ATOMIC instead to avoid blocking kmalloc. Tested with
CONFIG_DEBUG_ATOMIC_SLEEP=y as suggested by Eric Dumazet.
Fixes: b471f2f1de8b ("bpf: implement MAP_GET_NEXT_KEY command for LPM_TRIE map") Signed-off-by: Yonghong Song <yhs@fb.com> Reported-by: syzbot <syzkaller@googlegroups.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Wei Yongjun [Tue, 23 Jan 2018 02:10:38 +0000 (02:10 +0000)]
net: aquantia: make symbol hw_atl_boards static
Fixes the following sparse warning:
drivers/net/ethernet/aquantia/atlantic/aq_pci_func.c:50:34: warning:
symbol 'hw_atl_boards' was not declared. Should it be static?
Fixes: 4948293ff963 ("net: aquantia: Introduce new AQC devices and capabilities") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Tue, 23 Jan 2018 02:10:27 +0000 (02:10 +0000)]
nfp: fix error return code in nfp_pci_probe()
Fix to return error code -EINVAL instead of 0 when num_vfs is above
limit_vfs, as done elsewhere in this function.
Fixes: 0dc786219186 ("nfp: handle SR-IOV already enabled when driver is probing") Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Carl Heymann [Tue, 23 Jan 2018 01:29:43 +0000 (17:29 -0800)]
nfp: fix fw dump handling of absolute rtsym size
Fix bug that causes _absolute_ rtsym sizes of > 8 bytes (as per symbol
table) to result in incorrect space used during a TLV-based debug dump.
Detail: The size calculation stage calculates the correct size (size of
the rtsym address field == 8), while the dump uses the size in the table
to calculate the TLV size to reserve. Symbols with size <= 8 are handled
OK due to aligning sizes to 8, but including any absolute symbol with
listed size > 8 leads to an ENOSPC error during the dump.
Fixes: da762863edd9 ("nfp: fix absolute rtsym handling in debug dump") Signed-off-by: Carl Heymann <carl.heymann@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
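The mismatch described above can be worked through with plain arithmetic. The sketch below is a simplified model, not driver code: it ignores any TLV header overhead, assumes both stages round sizes up to a multiple of 8, and all function names are hypothetical.

```c
#include <stddef.h>

/* Round n up to the next multiple of 8. */
static size_t align8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Size calculation stage: an absolute symbol is dumped as its 8-byte address field. */
size_t calc_stage_size(void)
{
	return align8(8);
}

/* Buggy dump stage: reserves space based on the size listed in the symbol table. */
size_t dump_stage_size(size_t table_size)
{
	return align8(table_size);
}

/* Returns 1 when the dump stage fits in what the calculation stage reserved. */
int dump_fits(size_t table_size)
{
	return dump_stage_size(table_size) <= calc_stage_size();
}
```

In this model a listed size of 4 or 8 both round to 8 and fit, while a listed size of 9 rounds to 16 and overruns the 8 bytes that were accounted for, which corresponds to the ENOSPC case.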
Cong Wang [Mon, 22 Jan 2018 21:49:27 +0000 (13:49 -0800)]
tun: avoid calling xdp_rxq_info_unreg() twice
Similarly to tx ring, xdp_rxq_info is only registered
when !tfile->detached, so we need to avoid calling
xdp_rxq_info_unreg() twice too. The helper tun_cleanup_tx_ring()
already checks for this properly, so it is correct to put
xdp_rxq_info_unreg() just inside there.
Reported-by: syzbot+1c788d7ce0f0888f1d7f@syzkaller.appspotmail.com Fixes: 8565d26bcb2f ("Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net") Cc: Jason Wang <jasowang@redhat.com> Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: sched: add extack support for cls offloads
I've dropped the tests from the series because test_offloads.py changes
will conflict with bpf-next patches. I will send four more patches with
tests once bpf-next is merged back, hopefully still making it into 4.16 :)
v4:
- rebase on top of Alex's changes.
---
Quentin says:
This series tries to improve user experience when eBPF hardware offload
hits error paths at load time. In particular, it introduces netlink
extended ack support in the nfp driver.
To that aim, transmission of the pointer to the extack object is piped
through the `change()` operation of the existing classifiers (patch 1 to
6). Then it is used for TC offload in the nfp driver (patch 8) and in
netdevsim (patch 9, selftest in patch 10). Patch 7 adds a helper to handle
extack messages in the core when TC offload is disabled on the net device.
For completeness extack is propagated for classifiers other than cls_bpf,
but it's up to the drivers to make use of it.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:50 +0000 (17:44 -0800)]
nfp: bpf: use extack support to improve debugging
Use the recently added extack support for eBPF offload in the driver.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:49 +0000 (17:44 -0800)]
nfp: bpf: plumb extack into functions related to XDP offload
Pass a pointer to an extack object to nfp_app_xdp_offload() in order to
prepare for extack usage in the nfp driver. Next step will be to forward
this extack pointer to nfp_net_bpf_offload(), once this function is able
to use it for printing error messages.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Create a wrapper around tc_can_offload() that takes an additional
extack pointer argument in order to output an error message if TC
offload is disabled on the device.
In this way, the error message is handled by the core and can be the
same for all drivers.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
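The wrapper pattern described above can be modelled in userspace. The types and names below (struct mock_netdev, struct mock_extack, mock_can_offload_extack()) are stand-ins invented for this sketch, and the message text is an assumption; they are not the kernel's definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for the net device and the extended ack object. */
struct mock_netdev {
	bool hw_tc_enabled;
};

struct mock_extack {
	const char *msg;
};

/* Plain check, with no error reporting. */
static bool mock_can_offload(const struct mock_netdev *dev)
{
	return dev->hw_tc_enabled;
}

/*
 * Wrapper: same check, but records one common error message when offload
 * is disabled, so individual drivers do not need their own wording.
 */
bool mock_can_offload_extack(const struct mock_netdev *dev,
			     struct mock_extack *extack)
{
	bool can = mock_can_offload(dev);

	if (!can && extack)
		extack->msg = "TC offload is disabled on net device";
	return can;
}
```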
Quentin Monnet [Sat, 20 Jan 2018 01:44:47 +0000 (17:44 -0800)]
net: sched: add extack support for offload via tc_cls_common_offload
Add extack support for hardware offload of classifiers. In order
to achieve this, a pointer to a struct netlink_ext_ack is added to the
struct tc_cls_common_offload that is passed to the callback for setting
up the classifier. Function tc_cls_common_offload_init() is updated to
support initialization of this new attribute.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:46 +0000 (17:44 -0800)]
net: sched: cls_bpf: plumb extack support in filter for hardware offload
Pass the extack pointer obtained in the `->change()` filter operation to
cls_bpf_offload() and then to cls_bpf_offload_cmd(). This makes it
possible to use this extack pointer in drivers offloading BPF programs
in a future patch.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:45 +0000 (17:44 -0800)]
net: sched: cls_u32: propagate extack support for filter offload
Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_u32. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:44 +0000 (17:44 -0800)]
net: sched: cls_matchall: propagate extack support for filter offload
Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_matchall. This makes
it possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Quentin Monnet [Sat, 20 Jan 2018 01:44:43 +0000 (17:44 -0800)]
net: sched: cls_flower: propagate extack support for filter offload
Propagate the extack pointer from the `->change()` classifier operation
to the function used for filter replacement in cls_flower. This makes it
possible to use netlink extack messages in the future at replacement
time for this filter, although it is not used at this point.
Signed-off-by: Quentin Monnet <quentin.monnet@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Haiyang Zhang [Fri, 19 Jan 2018 20:26:43 +0000 (13:26 -0700)]
hv_netvsc: Use the num_online_cpus() for channel limit
Since we no longer localize channel/CPU affiliation within one NUMA
node, num_online_cpus() is used as the cap on the number of channels,
instead of the number of processors in a NUMA node.
This patch allows a bigger range for tuning the number of channels.
Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Salil Mehta [Fri, 19 Jan 2018 15:20:53 +0000 (15:20 +0000)]
net: hns3: converting spaces into tabs to avoid checkpatch.pl warning
Spaces were mistakenly used instead of tabs in some of the code related
to reset functionality, which caused checkpatch.pl errors. These were
missed earlier so fixing them now.
Signed-off-by: Salil Mehta <salil.mehta@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arjun Vynipadath [Fri, 19 Jan 2018 09:41:48 +0000 (15:11 +0530)]
cxgb3: assign port id to net_device->dev_port
T3 devices have different ports on the same PCI function,
so use dev_port to identify ports.
Signed-off-by: Arjun Vynipadath <arjun@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: return boolean instead of integer in br_multicast_is_router
Return statements in functions returning bool should use
true/false instead of 1/0.
This issue was detected with the help of Coccinelle.
Fixes: 85b352693264 ("bridge: Fix build error when IGMP_SNOOPING is not enabled") Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Thu, 18 Jan 2018 23:12:21 +0000 (15:12 -0800)]
net: stmmac: Fix reception of Broadcom switches tags
Broadcom tags inserted by Broadcom switches put a 4 byte header after
the MAC SA and before the EtherType, which may look like some sort of 0
length LLC/SNAP packet (tcpdump and wireshark do think that way). With
ACS enabled in stmmac the packets were truncated to 8 bytes on
reception, whereas clearing this bit allowed normal reception to occur.
In order to make that possible, we need to pass a net_device argument to
the different core_init() functions and we are dependent on the Broadcom
tagger padding packets correctly (which it now does). To be as little
invasive as possible, this is only done for gmac1000 when the network
device is DSA-enabled (netdev_uses_dsa() returns true).
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
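The frame layout described above, a 4-byte tag after the MAC SA and before the EtherType, shifts the EtherType from byte offset 12 to offset 16. The sketch below only illustrates those offsets; it does not interpret the tag contents, and the macro and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define MAC_ADDR_LEN     6
#define BRCM_TAG_LEN     4                       /* 4-byte header, per the text above */
#define TAG_OFFSET       (2 * MAC_ADDR_LEN)      /* after MAC DA + MAC SA */
#define ETHERTYPE_OFFSET (TAG_OFFSET + BRCM_TAG_LEN)

/*
 * Read the EtherType of a tagged frame (big-endian on the wire).
 * Returns 0 on success, -1 if the frame is too short.
 */
int tagged_ethertype(const uint8_t *frame, size_t len, uint16_t *ethertype)
{
	if (len < ETHERTYPE_OFFSET + 2)
		return -1;
	*ethertype = (uint16_t)((frame[ETHERTYPE_OFFSET] << 8) |
				frame[ETHERTYPE_OFFSET + 1]);
	return 0;
}
```

A receiver that is unaware of the tag reads tag bytes at offset 12 where it expects the EtherType or length field, which is consistent with the frames being mistaken for short LLC/SNAP packets.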
David S. Miller [Mon, 22 Jan 2018 21:05:50 +0000 (16:05 -0500)]
Merge branch 'hns3-new-features'
Peng Li says:
====================
add some features to hns3 driver
This patchset adds some features to the hns3 driver, including support
for ethtool commands -d and -p and support for the manager table.
[Patch 1/4] adds support for ethtool command -d; its op is get_regs.
The driver sends a command to the command queue and gets the number of
regs and the reg values from the command queue.
[Patch 2/4] adds manager table initialization for hardware.
[Patch 3/4] adds support for ethtool command -p. For fiber ports, driver
sends command to command queue, and IMP will write SGPIO regs to control
leds.
[Patch 4/4] adds support for the net status led for fiber ports. Net status
includes port speed, total rx/tx packets and link status. The driver sends
the status to the command queue, and IMP will write SGPIO to control leds.
---
Change log:
V1 -> V2:
1, fix comments from Andrew Lunn, remove the patch "net: hns3: add
ethtool -p support for phy device".
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Fuyun Liang [Fri, 19 Jan 2018 06:41:10 +0000 (14:41 +0800)]
net: hns3: add manager table initialization for hardware
The manager table is empty by default. If it is not initialized,
management packets like LLDP will be dropped by hardware. Default entries
need to be added to the manager table.
Signed-off-by: Fuyun Liang <liangfuyun1@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Decotigny [Thu, 18 Jan 2018 17:59:13 +0000 (09:59 -0800)]
net: core: Expose number of link up/down transitions
Expose the number of times the link has been going UP or DOWN, and
update the "carrier_changes" counter to be the sum of these two events.
While at it, also update the sysfs-class-net documentation to cover:
carrier_changes (3.15), carrier_up_count (4.16) and carrier_down_count
(4.16)
Signed-off-by: David Decotigny <decot@googlers.com>
[Florian:
* rebase
* add documentation
* merge carrier_changes with up/down counters] Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
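The relationship between the three counters can be sketched as follows: up and down transitions are counted separately, and carrier_changes is derived as their sum. The struct and function names are hypothetical, and the sketch leaves out the atomic updates a kernel implementation would need.

```c
/* Hypothetical model of the per-device counters. */
struct carrier_stats {
	unsigned long up_count;    /* link went UP */
	unsigned long down_count;  /* link went DOWN */
};

void carrier_on(struct carrier_stats *s)
{
	s->up_count++;
}

void carrier_off(struct carrier_stats *s)
{
	s->down_count++;
}

/* carrier_changes is no longer stored on its own; it is the sum of both events. */
unsigned long carrier_changes(const struct carrier_stats *s)
{
	return s->up_count + s->down_count;
}
```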
Sabrina Dubroca [Thu, 18 Jan 2018 16:48:18 +0000 (17:48 +0100)]
macsec: restore uAPI after addition of GCM-AES-256
Commit ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite")
changed a few values in the uapi headers for MACsec.
Because of existing userspace implementations, we need to preserve the
value of MACSEC_DEFAULT_CIPHER_ID. Not doing that resulted in
wpa_supplicant segfaults when a secure channel was created using the
default cipher. Thus, swap MACSEC_DEFAULT_CIPHER_{ID,ALT} back to their
original values.
Changing the maximum length of the MACSEC_SA_ATTR_KEY attribute is
unnecessary, as the previous value (MACSEC_MAX_KEY_LEN, which was 128B)
is large enough to carry 32-bytes keys. This patch reverts
MACSEC_MAX_KEY_LEN to 128B and restores the old length check on
MACSEC_SA_ATTR_KEY.
Fixes: ccfdec908922 ("macsec: Add support for GCM-AES-256 cipher suite") Signed-off-by: Davide Caratti <dcaratti@redhat.com> Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Thu, 18 Jan 2018 02:37:34 +0000 (10:37 +0800)]
net: hns: Fix for variable may be used uninitialized warnings
When !CONFIG_REGMAP hns throws compiler warnings since
dsaf_read_syscon ignores the return result from regmap_read,
which allows val to be uninitialized.
Fixes: 86897c960b49 ("net: hns: add syscon operation for dsaf") Reported-by: Jason Gunthorpe <jgg@ziepe.ca> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This change converts the existing per-cpu stats structure into a per-queue one.
This should not impact performance since each queue counter is not
updated concurrently by multiple cpus.
Performance numbers:
- Guest has 2 vcpus and 2 queues
- Guest runs netserver
- Host runs 100-flow super_netperf
====================
Armada 7k/8k PP2 ACPI support
I quickly resend the series, thanks to Antoine Tenart's remark,
who spotted !CONFIG_ACPI compilation issue after introducing
the new fwnode_irq_get() routine. Please see the details in the changelog
below and the 3/7 commit log.
mvpp2 driver can work with the ACPI representation, as exposed
on a public branch:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/commits/marvell-armada-wip
It was compiled together with the most recent Tianocore EDK2 revision.
Please refer to the firmware build instruction on MacchiatoBin board:
http://wiki.macchiatobin.net/tiki-index.php?page=Build+from+source+-+UEFI+EDK+II
ACPI representation of PP2 controllers (without PHY support) can
be viewed on GitHub:
* MacchiatoBin:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada80x0McBin/Dsdt.asl#L201
* Armada 7040 DB:
https://github.com/MarvellEmbeddedProcessors/edk2-open-platform/blob/71ae395da1661374b0f07d1602afb1eee56e9794/Platforms/Marvell/Armada/AcpiTables/Armada70x0/Dsdt.asl#L131
I will appreciate any comments or remarks.
Best regards,
Marcin
Changelog:
v3 -> v4:
* 3/7
- add new macro (ACPI_HANDLE_FWNODE) and fix
compilation with !CONFIG_ACPI
- extend commit log and mention usability of fwnode_irq_get
for the child nodes as well
v1 -> v2:
* Remove MDIO patches
* Use PP2 ports only with link interrupts
* Release second region resources in mvpp2 driver (code moved from
mvmdio), as explained in detail in the 5/5 commit message.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:44 +0000 (13:31 +0100)]
net: mvpp2: enable ACPI support in the driver
This patch introduces an alternative way of obtaining resources - via
ACPI tables provided by firmware. Enabling coexistence with the DT
support, in addition to the OF_*->device_*/fwnode_* API replacement,
required following steps to be taken:
* Add mvpp2_acpi_match table
* Omit clock configuration and obtain tclk from the property - in ACPI
world, the firmware is responsible for clock maintenance.
* Disable comphy and syscon handling as they are not available for ACPI.
* Modify way of obtaining interrupts - use newly introduced
fwnode_irq_get() routine
* Until proper MDIO bus and PHY handling with ACPI is established in the
kernel, use only link interrupts feature in the driver. For the RGMII
port it results in depending on GMAC settings done during firmware
stage.
* When booting with ACPI MVPP2_QDIST_MULTI_MODE is picked by
default, as there is no need to keep any kind of the backward
compatibility.
Moreover, a memory region used by mvmdio driver is usually placed in
the middle of the address space of the PP2 network controller.
The MDIO base address is obtained without requesting memory region
(by devm_ioremap() call) in mvmdio.c, later overlapping resources are
requested by the network driver, which is responsible for avoiding
a concurrent access.
In case the MDIO memory region is declared in the ACPI, it can
already appear as 'in-use' in the OS. Because it is overlapped by second
region of the network controller, make sure it is released, before
requesting it again. The care is taken by mvpp2 driver to avoid
concurrent access to this memory region.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:43 +0000 (13:31 +0100)]
net: mvpp2: use device_*/fwnode_* APIs instead of of_*
OF functions can be used only by drivers using DT.
As a preparation for introducing ACPI support in mvpp2
driver, use struct fwnode_handle in order to obtain
properties from the hardware description.
This patch replaces of_* functions with device_*/fwnode_*
where possible in mvpp2.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:42 +0000 (13:31 +0100)]
net: mvpp2: simplify maintaining enabled ports' list
'port_count' field of the mvpp2 structure holds an overall amount
of available ports, based on DT nodes status. In order to be prepared
to support other HW description, obtain the value by incrementing it
upon each successful port initialization. This allowed for simplifying
port indexing in the controller's private array, whose size is now not
dynamically allocated, but fixed to MVPP2_MAX_PORTS.
This patch simplifies creating and filling list of enabled ports and
is a part of the preparation for adding ACPI support in the mvpp2 driver.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:41 +0000 (13:31 +0100)]
device property: Allow iterating over available child fwnodes
Implement a new helper function fwnode_get_next_available_child_node(),
which enables obtaining next enabled child fwnode, which
works on a similar basis to OF's of_get_next_available_child().
This commit also introduces a macro, thanks to which it is
possible to iterate over the available fwnodes, using the
new function described above.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:40 +0000 (13:31 +0100)]
device property: Introduce fwnode_irq_get()
Until now there were two very similar functions for getting
the Linux IRQ number from an ACPI handle (acpi_irq_get())
and an OF node (of_irq_get()). The first one appeared to be used
only as a subroutine of platform_irq_get(), which (in the generic
code) limited obtaining IRQs from the _CRS method to nodes
associated with the kernel's struct platform_device.
This patch introduces a new helper routine - fwnode_irq_get(),
which gets the IRQ number directly from the fwnode
and is common to the OF/ACPI worlds. It is usable not
only for parent fwnodes, but also for child nodes
that have their own _CRS methods with interrupt descriptions.
In order to be able to satisfy compilation with !CONFIG_ACPI
and also simplify the new code, introduce a helper macro
(ACPI_HANDLE_FWNODE), with which it is possible to reach
an ACPI handle directly from its fwnode.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marcin Wojtas [Thu, 18 Jan 2018 12:31:39 +0000 (13:31 +0100)]
device property: Introduce fwnode_get_phy_mode()
Until now there were two almost identical functions for
obtaining network PHY mode - of_get_phy_mode() and,
more generic, device_get_phy_mode(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.
This commit allows for getting the PHY mode for
children nodes in the ACPI world by introducing a new function -
fwnode_get_phy_mode(). This commit also changes
device_get_phy_mode() routine to be its wrapper, in order
to prevent unnecessary duplication.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Until now there were two almost identical functions for
obtaining MAC address - of_get_mac_address() and, more generic,
device_get_mac_address(). However it is not uncommon,
that the network interface is represented as a child
of the actual controller, hence it is not associated
directly to any struct device, required by the latter
routine.
This commit allows for getting the MAC address for
children nodes in the ACPI world by introducing a new function -
fwnode_get_mac_address(). This commit also changes
device_get_mac_address() routine to be its wrapper, in order
to prevent unnecessary duplication.
Signed-off-by: Marcin Wojtas <mw@semihalf.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ganesh Goudar [Mon, 22 Jan 2018 13:18:26 +0000 (18:48 +0530)]
cxgb4: add geneve offload support for T6
Add geneve segmentation offload support of T6 cards.
Original work by: Santosh Rastapur <santosh@chelsio.com> Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 22 Jan 2018 14:36:37 +0000 (09:36 -0500)]
Merge tag 'mac80211-next-for-davem-2018-01-22' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211-next
Johannes Berg says:
====================
Less than a handful of changes:
* possible memory leak fix in hwsim
* speed up hwsim
* add hwsim userspace rate control API
* code cleanups
====================
A conflict was resolved in mac80211_hwsim.c, mostly of
the simple overlapping changes category: one change adds
a rhashtable and another adds a workqueue.
Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Mon, 22 Jan 2018 10:31:19 +0000 (10:31 +0000)]
devlink: fix memory leak on 'resource'
Currently, if the call to devlink_resource_find returns null then
the error exit path does not free the devlink_resource 'resource'
and a memory leak occurs. Fix this by kfree'ing resource on the
error exit path.
Detected by CoverityScan, CID#1464184 ("Resource leak")
Fixes: d9f9b9a4d05f ("devlink: Add support for resource abstraction") Signed-off-by: Colin Ian King <colin.king@canonical.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
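The fix follows the common pattern of releasing an allocation on every
error exit. Below is a minimal userspace sketch of that pattern, not the
actual devlink code: resource_register(), parent_exists() and the
live_allocs counter are hypothetical, and calloc()/free() stand in for the
kernel allocator.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct devlink_resource. */
struct resource {
	unsigned long id;
	unsigned long parent_id;
};

/* Counts live allocations so leak-free behaviour can be observed. */
static int live_allocs;

/* Hypothetical lookup: pretend that only parent id 1 exists. */
static bool parent_exists(unsigned long parent_id)
{
	return parent_id == 1;
}

/* Returns 0 and stores the new resource in *out, or a negative errno. */
static int resource_register(unsigned long id, unsigned long parent_id,
			     struct resource **out)
{
	struct resource *res;
	int err;

	assert(out != NULL);

	res = calloc(1, sizeof(*res));
	if (!res)
		return -ENOMEM;
	live_allocs++;

	if (!parent_exists(parent_id)) {
		err = -EINVAL;
		goto err_free;	/* without this, 'res' would leak */
	}

	res->id = id;
	res->parent_id = parent_id;
	*out = res;
	return 0;

err_free:
	free(res);
	live_allocs--;
	return err;
}
```

A failed lookup leaves live_allocs unchanged, which is the property the
patch restores for the real error path.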
====================
mlxsw: spectrum_router: Optimize LPM trees
Ido says:
This set tries to optimize the structure of the LPM trees used for route
lookup by avoiding lookups that are guaranteed not to return a result.
This is done by making sure only used prefix lengths are present in the
tree.
First two patches are small preparatory steps towards the actual change
in the last patch.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Mon, 22 Jan 2018 08:17:42 +0000 (09:17 +0100)]
mlxsw: spectrum_router: Remove unnecessary prefix lengths from LPM tree
In commit fc922bb0dd94 ("mlxsw: spectrum_router: Use one LPM tree for
all virtual routers") I tried to make sure only used prefix lengths are
present in the LPM tree shared between all virtual routers.
However, this optimization had to be removed in commit a69518cf0b4c
("mlxsw: spectrum_router: Avoid expensive lookup during route removal"),
since determining the used prefix lengths required us to traverse all
the active virtual routers, which could result in a hung task depending
on the number of VRFs and whether routes were removed due to abort or
not.
Re-introduce the optimization by moving the prefix usage accounting from
the virtual routers to the LPM tree, as this accounting is only used in
order to determine the tree's structure.
To make the sharing of the trees more explicit, the two trees (for IPv4
and IPv6) are stored in the shared router struct and upon the creation
of a virtual router it is immediately bound to both.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
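The idea of keeping the prefix usage accounting in the tree itself can be
sketched as a per-prefix-length reference count. This is an illustration
under assumed names (struct lpm_tree, lpm_tree_prefix_get/put/used), not
the mlxsw data structures; a prefix length only needs to be present in the
tree while its count is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_PREFIX_LEN 128	/* enough for IPv6; IPv4 uses 0..32 */

/* Hypothetical LPM tree that does its own prefix usage accounting. */
struct lpm_tree {
	unsigned int ref_count[MAX_PREFIX_LEN + 1];
};

/*
 * A route with this prefix length was added. Returns true when the
 * length was previously unused, i.e. the tree structure must change.
 */
static bool lpm_tree_prefix_get(struct lpm_tree *tree, unsigned int len)
{
	assert(len <= MAX_PREFIX_LEN);
	return tree->ref_count[len]++ == 0;
}

/*
 * A route with this prefix length was removed. Returns true when the
 * length became unused and can be dropped from the tree.
 */
static bool lpm_tree_prefix_put(struct lpm_tree *tree, unsigned int len)
{
	assert(len <= MAX_PREFIX_LEN && tree->ref_count[len] > 0);
	return --tree->ref_count[len] == 0;
}

/* True while at least one route uses this prefix length. */
static bool lpm_tree_prefix_used(const struct lpm_tree *tree, unsigned int len)
{
	assert(len <= MAX_PREFIX_LEN);
	return tree->ref_count[len] > 0;
}
```

Because the counters live in the shared tree, deciding which prefix lengths
are in use does not require walking the virtual routers.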
Ido Schimmel [Mon, 22 Jan 2018 08:17:40 +0000 (09:17 +0100)]
mlxsw: spectrum_router: Use the nodes list as indication for empty FIB
Currently, each FIB (IPv4 / IPv6) in a virtual router holds a prefix
usage that is used to choose a matching LPM tree, but also to check if
the FIB is empty, so that the LPM tree could be unbound.
Next patches will remove the reliance on the per-FIB prefix usage for
LPM tree matching. Keeping it only to check if the FIB is empty is a
waste, since we can use the nodes ({Prefix, Length}) list instead.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jason Wang [Mon, 22 Jan 2018 02:55:38 +0000 (10:55 +0800)]
tun: add missing rcu annotation
This patch fixes the following sparse warnings:
drivers/net/tun.c:2241:15: error: incompatible types in comparison expression (different address spaces)
Fixes: cd5681d7d890 ("tuntap: rename struct tun_steering_prog to struct tun_prog") Cc: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Add support for mirror action with flower
Arkadi says:
Add support for mirror action with flower classifier. The first 3 patches
introduce a generic per-block resource infra. The last 4 patches add
support for flow based span.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_acl: Add support for mirror action
Add support for mirror action. Only one mirror action can be set per rule.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum: Extend mlxsw_afa_ops for counter index and implement for Spectrum
Introduce extension of mlxsw_afa_ops in order to add/del mirroring and
implement the ops for Spectrum.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Extend SPAN API for ACL case. In case of ACL triggering the MPAR register
shouldn't be configured. This patch also exports those helpers for
ACL usage.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_acl: Add support for mirroring action
The patch extends the trap action for mirroring.
Signed-off-by: Arkadi Sharshevsky <arkadis@mellanox.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 19 Jan 2018 08:24:48 +0000 (09:24 +0100)]
mlxsw: core: Make counter index allocated inside the action append
So far, the caller of mlxsw_afa_block_append_counter needed to allocate
counter index by hand. Benefit from the previously introduced resource
infra and counter_index_get/put callbacks, and allocate the counter
index in place where it is needed, inside the action append function.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Fri, 19 Jan 2018 08:24:47 +0000 (09:24 +0100)]
mlxsw: core: Convert fwd_entry_ref list to be generic per-block resource list
Since the resource list also needs to be used for entries other than
fwd_entry_ref, make the list generic. For that purpose, introduce a
resource structure with a couple of helpers that any code which needs to
store a per-block resource should use.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Aquantia atlantic driver new devices support
This patchset introduces support for new Aquantia hardware:
AQC11x family with updated hardware (B1) and firmware (2.x and 3.x branches).
For that, a number of improvements in overall driver model were done:
- Firmware specific ops tables. Firmware 2.x and 3.x series support
functions are now in separate fw2x module.
- PCI module cleanup and simplification done.
- Verified and tested hardware reset process.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:26 +0000 (17:03 +0300)]
net: aquantia: Introduce global AQC hardware reset sequence
The detailed reset sequence ensures all HW components are in aligned
state before NIC startup. It also supports cards with signed firmware (RBL)
and checks if their FW is valid.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:24 +0000 (17:03 +0300)]
net: aquantia: Introduce firmware ops callbacks
New AQC cards will have an updated firmware with new binary interface.
This patch extracts firmware specific operations into a separate table
and prepares for the introduction of new fw 2.x and 3.x.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:21 +0000 (17:03 +0300)]
net: aquantia: Cleanup pci functions module
The driver contained dead code for maintaining multiple pci port instances.
It will never be used, since a separate NIC instance is created for each
pci function.
Simplify this, making pci module only responsible for pci resource
management.
NIC initialization is also simplified accordingly.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Fri, 19 Jan 2018 14:03:19 +0000 (17:03 +0300)]
net: aquantia: Introduce new AQC devices and capabilities
A number of new AQC devices are going to be released. To support more
flexible capabilities management, a number of static caps instances are now
declared. Devices currently differ mainly by supported speeds, but in the
future more parameters will be customized. A set of AQC100 devices have
fibre media, not twisted pair; this is also reflected in the
new capabilities definitions.
HW level also now directly exports hw_ops for each of A0/B0 hardware.
PCI configuration now uses a device configuration table where each
device ID is explicitly mapped with hardware OPs and capabilities
structures.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
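A device configuration table of this kind can be sketched as an array that
maps each device ID to an ops structure and a capabilities structure. All
names and ID values here are hypothetical placeholders, not the driver's
actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability and ops descriptions. */
struct hw_caps {
	unsigned int max_speed_mbps;
	int fibre;		/* 1 for fibre media, 0 for twisted pair */
};

struct hw_ops {
	const char *name;
};

static const struct hw_ops ops_a0 = { "A0" };
static const struct hw_ops ops_b0 = { "B0" };

static const struct hw_caps caps_copper_10g = { 10000, 0 };
static const struct hw_caps caps_copper_5g = { 5000, 0 };
static const struct hw_caps caps_fibre_10g = { 10000, 1 };

/* One row per device ID: explicit mapping to ops and caps. */
struct board_entry {
	uint16_t devid;		/* placeholder values, not real PCI IDs */
	const struct hw_ops *ops;
	const struct hw_caps *caps;
};

static const struct board_entry board_table[] = {
	{ 0x0001, &ops_a0, &caps_copper_10g },
	{ 0x0002, &ops_b0, &caps_fibre_10g },
	{ 0x0003, &ops_b0, &caps_copper_5g },
};

struct nic {
	const struct hw_ops *ops;
	const struct hw_caps *caps;
};

/* Bind ops and caps for devid; returns 0 on success, -1 if unknown. */
static int nic_bind(struct nic *nic, uint16_t devid)
{
	size_t i;

	assert(nic != NULL);

	for (i = 0; i < sizeof(board_table) / sizeof(board_table[0]); i++) {
		if (board_table[i].devid == devid) {
			nic->ops = board_table[i].ops;
			nic->caps = board_table[i].caps;
			return 0;
		}
	}
	return -1;
}
```

Adding a device then amounts to adding one table row, without touching the
probe logic.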
Igor Russkikh [Fri, 19 Jan 2018 14:03:18 +0000 (17:03 +0300)]
net: aquantia: Introduce new device ids and constants
A new set of aquantia devices has upgraded hardware (B1).
The hardware interface is identical to B0. The difference will
be in the firmware, which is incompatible with the old one.
Reorganized and removed duplicate speed and devid definitions.
Introduced explicit flow control configuration defines.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 21 Jan 2018 23:13:23 +0000 (18:13 -0500)]
Merge tag 'mlx5-updates-2018-01-19' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mahameed says:
====================
mlx5-updates-2018-01-19
From: Or Gerlitz <ogerlitz@mellanox.com>
=======
First six patches of this series further enhance the mlx5 hairpin support.
The first two patches deal with using different hairpin instances
for flows whose packets have different priorities to align with the port
TX QoS model. The next four patches allow us to do HW spreading
of flows over a set of hairpin pairs using RSS. The last two patches
change the driver to also set the size of the HW hairpin queues.
========
Next four patches from Eran Ben Elisha <eranbe@mellanox.com>:
Add more debug data for TX timeout handling, and further enhance and optimize
TX timeout handling upon lost interrupts, which adds a mechanism for explicitly
polling EQ in case of a TX timeout in order to recover from a lost interrupt.
If this is not the case (no pending EQEs), perform a channels full recovery as
usual.
From Kamal Heib <kamalh@mellanox.com>, Two patches to extend the stats group API
to have an update_stats() callback which will be used to fetch the hardware or
software counters data, this will improve the current API and reduce code
duplication.
From Gal Pressman <galp@mellanox.com>, Last patch, Add likely to the common RX checksum
flow.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Previously it was possible to interrupt processing stats updates because
they were handled in a work queue. Interrupting the stats updates could
lead to a situation where we back up the control message queue. This patch
moves the stats update processing out of the work queue to be processed as
soon as hardware sends a request.
Reported-by: Louis Peens <louis.peens@netronome.com> Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 21 Jan 2018 13:15:41 +0000 (14:15 +0100)]
net: gemini: Depend on HAS_IOMEM
The zeroday builder notices that since Usermode Linux does not
have IO memory, the build fails for them when selecting everything
it can enable.
As the driver is clearly using memory-mapped registers to access
the network adapter, we add depends on HAS_IOMEM to solve this
problem.
Reported-by: kbuild test robot <fengguang.wu@intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Netfilter/IPVS updates for net-next
The following patchset contains Netfilter/IPVS updates for your net-next
tree. Basically, a new extension for ip6tables, simplification work of
nf_tables that saves us 500 LoC, allow raw table registration before
defragmentation, conversion of the SNMP helper to use the ASN.1 code
generator, unique 64-bit handle for all nf_tables objects and fixes to
address fallout from previous nf-next batch. More specifically, they
are:
1) Seven patches to remove family abstraction layer (struct nft_af_info)
in nf_tables, this simplifies our codebase and it saves us 64 bytes per
net namespace.
2) Add IPv6 segment routing header matching for ip6tables, from Ahmed
Abdelsalam.
3) Allow to register iptable_raw table before defragmentation, some
people do not want to waste cycles on defragmenting traffic that is
going to be dropped, hence add a new module parameter to enable this
behaviour in iptables and ip6tables. From Subash Abhinov
Kasiviswanathan. This patch needed a couple of follow up patches to
get things tidy from Arnd Bergmann.
4) SNMP helper uses the ASN.1 code generator, from Taehee Yoo. Several
patches for this helper to prepare this change are also part of this
patch series.
5) Add 64-bit handles to uniquely identify objects in nf_tables, from Harsha
Sharma.
6) Remove log message that several netfilter subsystems print at
boot/load time.
7) Restore x_tables module autoloading, that got broken in a previous
patch to allow singleton NAT hook callback registration per hook
spot, from Florian Westphal. Moreover, return EBUSY to report that
the singleton NAT hook slot is already in use instead.
8) Several fixes for the new nf_tables flowtable representation,
including incorrect error check after nf_tables_flowtable_lookup(),
missing Kconfig dependencies that lead to build breakage and missing
initialization of priority and hooknum in flowtable object.
9) Missing NETFILTER_FAMILY_ARP dependency in Kconfig for the clusterip
target. This is due to recent updates in the core to shrink the hook
array size and compile it out if no specific family is enabled via
.config file. Patch from Florian Westphal.
10) Remove duplicated include header files, from Wei Yongjun.
11) Sparse warning fix for the NFPROTO_INET handling from the core
due to missing static function definition, also from Wei Yongjun.
12) Restore ICMPv6 Parameter Problem error reporting when
defragmentation fails, from Subash Abhinov Kasiviswanathan.
13) Remove obsolete owner field initialization from struct
file_operations, patch from Alexey Dobriyan.
14) Use boolean datatype where needed in the Netfilter codebase, from
Gustavo A. R. Silva.
15) Remove double semicolon in dynset nf_tables expression, from
Luis de Bethencourt.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The following pull-request contains BPF updates for your *net-next* tree.
The main changes are:
1) bpf array map HW offload, from Jakub.
2) support for bpf_get_next_key() for LPM map, from Yonghong.
3) test_verifier now runs loaded programs, from Alexei.
4) xdp cpumap monitoring, from Jesper.
5) variety of tests, cleanups and small x64 JIT optimization, from Daniel.
6) user space can now retrieve HW JITed program, from Jiong.
Note there is a minor conflict between Russell's arm32 JIT fixes
and removal of bpf_jit_enable variable by Daniel which should
be resolved by keeping Russell's comment and removing that variable.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The BPF verifier conflict was some minor contextual issue.
The TUN conflict was less trivial. Cong Wang fixed a memory leak of
tfile->tx_array in 'net'. This is an skb_array. But meanwhile in
net-next tun changed tfile->tx_array into tfile->tx_ring which is a
ptr_ring.
Signed-off-by: David S. Miller <davem@davemloft.net>
====================
This series adds various misc improvements to BPF: detection
of BPF helper definition misconfiguration for mem/size argument
pairs, csum_diff helper also for XDP, various test cases,
removal of the recently added pure_initcall(), restriction
of the jit sysctls to cap_sys_admin for initns, a minor size
improvement for x86 jit in alu ops, output of complexity limit
to verifier log and, last but not least, making the event output
more flexible by moving to the const_size_or_zero type.
Daniel Borkmann [Sat, 20 Jan 2018 00:24:37 +0000 (01:24 +0100)]
bpf: move event_output to const_size_or_zero for xdp/skb as well
Similar rationale as in a60dd35d2e39 ("bpf: change bpf_perf_event_output
arg5 type to ARG_CONST_SIZE_OR_ZERO"), change the type to CONST_SIZE_OR_ZERO
such that we can better deal with optimized code. No changes needed in
bpf_event_output() as it can also deal with 0 size entirely (e.g. as only
wake-up signal with empty frame in perf RB, or packet dumps w/o meta data
as another such possibility).
Daniel Borkmann [Sat, 20 Jan 2018 00:24:36 +0000 (01:24 +0100)]
bpf: add upper complexity limit to verifier log
Given the limit could potentially get further adjustments in the
future, add it to the log so it becomes obvious what the current
limit is w/o having to check the source first. This may also be
helpful for debugging complexity related issues on kernels that
backport from upstream.
Daniel Borkmann [Sat, 20 Jan 2018 00:24:35 +0000 (01:24 +0100)]
bpf, x86: small optimization in alu ops with imm
For the BPF_REG_0 (BPF_REG_A in cBPF, respectively), we can use
the short form of the opcode as dst mapping is on eax/rax and
thus save a byte per such operation. Added to add/sub/and/or/xor
for 32/64 bit when K immediate is used. There may be more such
low-hanging fruit to add in future as well.
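The saving comes from the x86 accumulator encoding: "op eax, imm32" has a
one-byte opcode with no ModRM byte, while the generic form needs 0x81 plus
a ModRM byte. A standalone sketch of the 32-bit case follows; it is not the
kernel JIT code, emit_alu32_imm32() and the enum names are hypothetical,
and the 64-bit variant (REX.W prefix) and the imm8 form are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* x86 ALU group-1 operation numbers (the /digit of opcode 0x81). */
enum alu_op {
	ALU_ADD = 0,
	ALU_OR  = 1,
	ALU_AND = 4,
	ALU_SUB = 5,
	ALU_XOR = 6,
};

/* x86 register numbers for a few 32-bit registers. */
enum { REG_EAX = 0, REG_ECX = 1, REG_EDX = 2, REG_EBX = 3 };

/*
 * Encode "op r32, imm32" into buf (at least 6 bytes).
 * Returns the number of bytes written.
 */
static size_t emit_alu32_imm32(uint8_t *buf, enum alu_op op,
			       unsigned int reg, int32_t imm)
{
	uint32_t u = (uint32_t)imm;
	size_t n = 0;

	assert(buf != NULL && reg < 8);

	if (reg == REG_EAX) {
		/* Short form: opcode (op << 3) | 0x05, no ModRM byte. */
		buf[n++] = (uint8_t)((op << 3) | 0x05);
	} else {
		/* Generic form: 0x81, then ModRM = 11 op reg. */
		buf[n++] = 0x81;
		buf[n++] = (uint8_t)(0xC0 | (op << 3) | reg);
	}

	/* imm32, little-endian. */
	buf[n++] = (uint8_t)(u & 0xff);
	buf[n++] = (uint8_t)((u >> 8) & 0xff);
	buf[n++] = (uint8_t)((u >> 16) & 0xff);
	buf[n++] = (uint8_t)((u >> 24) & 0xff);
	return n;
}
```

With eax as destination the sketch emits 5 bytes instead of 6, which is the
one byte per operation mentioned above.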
Daniel Borkmann [Sat, 20 Jan 2018 00:24:34 +0000 (01:24 +0100)]
bpf: restrict access to core bpf sysctls
Given BPF reaches far beyond just networking these days, it was
never intended to allow setting and in some cases reading those
knobs out of a user namespace root running without CAP_SYS_ADMIN,
thus tighten such access.
Also the bpf_jit_enable = 2 debugging mode should only be allowed
if kptr_restrict is not set since it otherwise can leak addresses
to the kernel log. Dump a note to the kernel log that this is for
debugging JITs only when enabled.
Daniel Borkmann [Sat, 20 Jan 2018 00:24:33 +0000 (01:24 +0100)]
bpf: get rid of pure_initcall dependency to enable jits
Having a pure_initcall() callback just to permanently enable BPF
JITs under CONFIG_BPF_JIT_ALWAYS_ON is unnecessary and could leave
a small race window in future where JIT is still disabled on boot.
Since we know about the setting at compilation time anyway, just
initialize it properly there. Also consolidate all the individual
bpf_jit_enable variables into a single one and move them under one
location. Moreover, don't allow for setting unspecified garbage
values on them.
Daniel Borkmann [Sat, 20 Jan 2018 00:24:31 +0000 (01:24 +0100)]
bpf: add couple of test cases for signed extended imms
Add a couple of test cases for interpreter and JIT that are
related to an issue we faced some time ago in Cilium [1],
which is fixed in LLVM with commit e53750e1e086 ("bpf: fix
bug on silently truncating 64-bit immediate").
The test cases check at run time that the kernel behaves as intended,
which should also provide some guidance for current or new
JITs in case they should trip over this. Added for cBPF and
eBPF.
I've seen two patch proposals now for helper additions that used
ARG_PTR_TO_MEM or similar in reg_X but no corresponding ARG_CONST_SIZE
in reg_X+1. Verifier won't complain in such case, but it will omit
verifying the memory passed to the helper thus ending up badly.
Detect such buggy helper function signature and bail out during
verification rather than finding them through review.
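The kind of signature check described here can be sketched as a scan over
a helper's argument types. The enum below is a simplified, hypothetical
subset and helper_proto_ok() is not the verifier's actual function; it only
illustrates rejecting a memory pointer in reg_X that is not followed by a
size in reg_X+1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical subset of helper argument types. */
enum arg_type {
	ARG_DONTCARE,
	ARG_ANYTHING,
	ARG_PTR_TO_CTX,
	ARG_PTR_TO_MEM,
	ARG_CONST_SIZE,
};

#define MAX_ARGS 5

/*
 * Returns false if any ARG_PTR_TO_MEM argument is not immediately
 * followed by an ARG_CONST_SIZE argument.
 */
static bool helper_proto_ok(const enum arg_type args[MAX_ARGS])
{
	size_t i;

	assert(args != NULL);

	for (i = 0; i < MAX_ARGS; i++) {
		if (args[i] != ARG_PTR_TO_MEM)
			continue;
		/* A memory pointer in the last slot has no room for a size. */
		if (i + 1 == MAX_ARGS || args[i + 1] != ARG_CONST_SIZE)
			return false;
	}
	return true;
}
```

Running such a check when the helper prototype is first used turns a silent
verification gap into an immediate, visible failure.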
samples/bpf: xdp_monitor include cpumap tracepoints in monitoring
The xdp_redirect_cpu sample has some "builtin" monitoring of the
tracepoints for xdp_cpumap_*, but it is practical to have an external
tool that can monitor these tracepoints as an easy way to troubleshoot
an application using XDP + cpumap.
Specifically I need such external tool when working on Suricata and
XDP cpumap redirect. Extend the xdp_monitor tool sample with
monitoring of these xdp_cpumap_* tracepoints. Model the output format
like xdp_redirect_cpu.
Given I needed to handle per CPU decoding for cpumap, this patch also
adds per CPU info on the existing monitor events. This resembles part
of the builtin monitoring output from sample xdp_rxq_info. Thus, also
covering part of that sample in an external monitoring tool.
Performance wise, the cpumap tracepoints use bulking, which causes
them to have very little overhead. Thus, they are enabled by default.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>