octeontx2-pf: cn10k: Config DWRR weight based on MTU
Program SQ, MDQ, TL4 to TL2 transmit scheduler queues' DWRR
weight based on DWRR MTU programmed at NIX_AF_DWRR_RPM_MTU.
The DWRR MTU from admin function is retrieved via mbox.
On OcteaonTx2 silicon, admin function driver responds with DWRR
MTU as '1'. This helps to avoid silicon specific transmit
scheduler DWRR quantum/weight configuration logic.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On OcteonTx2 DWRR quantum is directly configured into each of
the transmit scheduler queues. And PF/VF drivers were free to
config any value upto 2^24.
On CN10K, HW is modified, the quantum configuration at scheduler
queues is in terms of weight. And SW needs to setup a base DWRR MTU
at NIX_AF_DWRR_RPM_MTU / NIX_AF_DWRR_SDP_MTU. HW will do
'DWRR MTU * weight' to get the quantum. For LBK traffic, value
programmed into NIX_AF_DWRR_RPM_MTU register is considered as
DWRR MTU.
This patch programs a default DWRR MTU of 8192 into HW and also
provides a way to change this via devlink params.
Signed-off-by: Sunil Goutham <sgoutham@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dust Li [Fri, 30 Jul 2021 03:41:55 +0000 (11:41 +0800)]
selftests/net: remove min gso test in packet_snd
This patch removed the 'raw gso min size - 1' test which
always fails now:
./in_netns.sh ./psock_snd -v -c -g -l "${mss}"
raw gso min size - 1 (expected to fail)
tx: 1524
rx: 1472
OK
After commit 7c6d2ecbda83 ("net: be more gentle about silly
gso requests coming from user"), we relaxed the min gso_size
check in virtio_net_hdr_to_skb().
So when a packet which is smaller then the gso_size,
GSO for this packet will not be set, the packet will be
send/recv successfully.
Signed-off-by: Dust Li <dust.li@linux.alibaba.com> Reviewed-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 30 Jul 2021 02:19:11 +0000 (10:19 +0800)]
bonding: 3ad: fix the concurrency between __bond_release_one() and bond_3ad_state_machine_handler()
Some time ago, I reported a calltrace issue
"did not find a suitable aggregator", please see[1].
After a period of analysis and reproduction, I find
that this problem is caused by concurrency.
Before the problem occurs, the bond structure is like follows:
step1: already removed slaver1(eth1) from list, but port1 remains
step2: receive a lacpdu and update port0
step3: port0 will be removed from agg0.lag_ports. The struct is
"agg0.lag_ports -> port1" now, and agg0 is not free. At the
same time, slaver1/agg1 has been removed from the list by step1.
So we can't find a free aggregator now.
step4: can't find suitable aggregator because of step2
step5: cause a calltrace since port->aggregator is NULL
To solve this concurrency problem, put bond_upper_dev_unlink()
after bond_3ad_unbind_slave(). In this way, we can invalid the port
first and skip this port in bond_3ad_state_machine_handler(). This
eliminates the situation that the slaver has been removed from the
list but the port is still valid.
Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Acked-by: Jay Vosburgh <jay.vosburgh@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cong Wang [Thu, 29 Jul 2021 23:12:14 +0000 (16:12 -0700)]
net_sched: refactor TC action init API
TC action ->init() API has 10 parameters, it becomes harder
to read. Some of them are just boolean and can be replaced
by flags. Similarly for the internal API tcf_action_init()
and tcf_exts_validate().
This patch converts them to flags and fold them into
the upper 16 bits of "flags", whose lower 16 bits are still
reserved for user-space. More specifically, the following
kernel flags are introduced:
TCA_ACT_FLAGS_POLICE replace 'name' in a few contexts, to
distinguish whether it is compatible with policer.
TCA_ACT_FLAGS_BIND replaces 'bind', to indicate whether
this action is bound to a filter.
TCA_ACT_FLAGS_REPLACE replaces 'ovr' in most contexts,
means we are replacing an existing action.
TCA_ACT_FLAGS_NO_RTNL replaces 'rtnl_held' but has the
opposite meaning, because we still hold RTNL in most
cases.
The only user-space flag TCA_ACT_FLAGS_NO_PERCPU_STATS is
untouched and still stored as before.
I have tested this patch with tdc and I do not see any
failure related to this patch.
Tested-by: Vlad Buslov <vladbu@nvidia.com> Acked-by: Jamal Hadi Salim<jhs@mojatatu.com> Cc: Jiri Pirko <jiri@resnulli.us> Signed-off-by: Cong Wang <cong.wang@bytedance.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Martin Kaiser [Thu, 29 Jul 2021 07:43:54 +0000 (09:43 +0200)]
niu: read property length only if we use it
In three places, the driver calls of_get_property and reads the property
length although the length is not used. Update the calls to not request
the length.
Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: David S. Miller <davem@davemloft.net>
We've added 64 non-merge commits during the last 15 day(s) which contain
a total of 83 files changed, 5027 insertions(+), 1808 deletions(-).
The main changes are:
1) BTF-guided binary data dumping libbpf API, from Alan.
2) Internal factoring out of libbpf CO-RE relocation logic, from Alexei.
3) Ambient BPF run context and cgroup storage cleanup, from Andrii.
4) Few small API additions for libbpf 1.0 effort, from Evgeniy and Hengqi.
5) bpf_program__attach_kprobe_opts() fixes in libbpf, from Jiri.
6) bpf_{get,set}sockopt() support in BPF iterators, from Martin.
7) BPF map pinning improvements in libbpf, from Martynas.
8) Improved module BTF support in libbpf and bpftool, from Quentin.
9) Bpftool cleanups and documentation improvements, from Quentin.
10) Libbpf improvements for supporting CO-RE on old kernels, from Shuyi.
11) Increased maximum cgroup storage size, from Stanislav.
12) Small fixes and improvements to BPF tests and samples, from various folks.
* https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (64 commits)
tools: bpftool: Complete metrics list in "bpftool prog profile" doc
tools: bpftool: Document and add bash completion for -L, -B options
selftests/bpf: Update bpftool's consistency script for checking options
tools: bpftool: Update and synchronise option list in doc and help msg
tools: bpftool: Complete and synchronise attach or map types
selftests/bpf: Check consistency between bpftool source, doc, completion
tools: bpftool: Slightly ease bash completion updates
unix_bpf: Fix a potential deadlock in unix_dgram_bpf_recvmsg()
libbpf: Add btf__load_vmlinux_btf/btf__load_module_btf
tools: bpftool: Support dumping split BTF by id
libbpf: Add split BTF support for btf__load_from_kernel_by_id()
tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
tools: Free BTF objects at various locations
libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
libbpf: Rename btf__load() as btf__load_into_kernel()
libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
bpf: Emit better log message if bpf_iter ctx arg btf_id == 0
tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
bpf: Increase supported cgroup storage value size
libbpf: Fix race when pinning maps in parallel
...
====================
MAINTAINERS 7d901a1e878a ("net: phy: add Maxlinear GPY115/21x/24x driver") 8a7b46fa7902 ("MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver")
- mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
LRO
- phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
PHY
- sctp: fix return value check in __sctp_rcv_asconf_lookup
Previous releases - always broken:
- bpf:
- more spectre corner case fixes, introduce a BPF nospec
instruction for mitigating Spectre v4
- fix OOB read when printing XDP link fdinfo
- sockmap: fix cleanup related races
- mac80211: fix enabling 4-address mode on a sta vif after assoc
- can:
- raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
- j1939: j1939_session_deactivate(): clarify lifetime of session
object, avoid UAF
- fix number of identical memory leaks in USB drivers
- tipc:
- do not blindly write skb_shinfo frags when doing decryption
- fix sleeping in tipc accept routine"
* tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
gve: Update MAINTAINERS list
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
bpf: Fix leakage due to insufficient speculative store bypass mitigation
bpf: Introduce BPF nospec instruction for mitigating Spectre v4
sis900: Fix missing pci_disable_device() in probe and remove
net: let flow have same hash in two directions
nfc: nfcsim: fix use after free during module unload
tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
sctp: fix return value check in __sctp_rcv_asconf_lookup
nfc: s3fwrn5: fix undefined parameter values in dev_err()
net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
net/mlx5: Unload device upon firmware fatal error
net/mlx5e: Fix page allocation failure for ptp-RQ over SF
net/mlx5e: Fix page allocation failure for trap-RQ over SF
...
Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"These revert a recent IRQ resources handling modification that turned
out to be problematic, fix suspend-to-idle handling on AMD platforms
to take upcoming systems into account properly and fix the retrieval
of the DPTF attributes of the PCH FIVR.
Specifics:
- Revert recent change of the ACPI IRQ resources handling that
attempted to improve the ACPI IRQ override selection logic, but
introduced serious regressions on some systems (Hui Wang).
- Fix up quirks for AMD platforms in the suspend-to-idle support code
so as to take upcoming systems using uPEP HID AMDI007 into account
as appropriate (Mario Limonciello).
- Fix the code retrieving DPTF attributes of the PCH FIVR so that it
agrees on the return data type with the ACPI control method
evaluated for this purpose (Srinivas Pandruvada)"
* tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: DPTF: Fix reading of attributes
Revert "ACPI: resources: Add checks for ACPI IRQ override"
ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
Since commit 1b6b26ae7053 ("pipe: fix and clarify pipe write wakeup
logic") we have sanitized the pipe write logic, and would only try to
wake up readers if they needed it.
In particular, if the pipe already had data in it before the write,
there was no point in trying to wake up a reader, since any existing
readers must have been aware of the pre-existing data already. Doing
extraneous wakeups will only cause potential thundering herd problems.
However, it turns out that some Android libraries have misused the EPOLL
interface, and expected "edge triggered" be to "any new write will
trigger it". Even if there was no edge in sight.
Quoting Sandeep Patil:
"The commit 1b6b26ae7053 ('pipe: fix and clarify pipe write wakeup
logic') changed pipe write logic to wakeup readers only if the pipe
was empty at the time of write. However, there are libraries that
relied upon the older behavior for notification scheme similar to
what's described in [1]
One such library 'realm-core'[2] is used by numerous Android
applications. The library uses a similar notification mechanism as GNU
Make but it never drains the pipe until it is full. When Android moved
to v5.10 kernel, all applications using this library stopped working.
The library has since been fixed[3] but it will be a while before all
applications incorporate the updated library"
Our regression rule for the kernel is that if applications break from
new behavior, it's a regression, even if it was because the application
did something patently wrong. Also note the original report [4] by
Michal Kerrisk about a test for this epoll behavior - but at that point
we didn't know of any actual broken use case.
So add the extraneous wakeup, to approximate the old behavior.
[ I say "approximate", because the exact old behavior was to do a wakeup
not for each write(), but for each pipe buffer chunk that was filled
in. The behavior introduced by this change is not that - this is just
"every write will cause a wakeup, whether necessary or not", which
seems to be sufficient for the broken library use. ]
It's worth noting that this adds the extraneous wakeup only for the
write side, while the read side still considers the "edge" to be purely
about reading enough from the pipe to allow further writes.
See commit f467a6a66419 ("pipe: fix and clarify pipe read wakeup logic")
for the pipe read case, which remains that "only wake up if the pipe was
full, and we read something from it".
Merge branch 'tools: bpftool: update, synchronise and validate types and options'
Quentin Monnet says:
====================
To work with the different program types, map types, attach types etc.
supported by eBPF, bpftool needs occasional updates to learn about the new
features supported by the kernel. When such types translate into new
keyword for the command line, updates are expected in several locations:
typically, the help message displayed from bpftool itself, the manual page,
and the bash completion file should be updated. The options used by the
different commands for bpftool should also remain synchronised at those
locations.
Several omissions have occurred in the past, and a number of types are
still missing today. This set is an attempt to improve the situation. It
brings up-to-date the lists of types or options in bpftool, and also adds a
Python script to the BPF selftests to automatically check that most of
these lists remain synchronised.
v2:
- Reformat some lines in the bash completion file.
- Do not reformat attach types, to preserve git-blame history.
- Do not call Python script from tools/testing/selftests/bpf/Makefile.
====================
tools: bpftool: Complete metrics list in "bpftool prog profile" doc
Profiling programs with bpftool was extended some time ago to support
two new metrics, namely itlb_misses and dtlb_misses (misses for the
instruction/data translation lookaside buffer). Update the manual page
and bash completion accordingly.
tools: bpftool: Document and add bash completion for -L, -B options
The -L|--use-loader option for using loader programs when loading, or
when generating a skeleton, did not have any documentation or bash
completion. Same thing goes for -B|--base-btf, used to pass a path to a
base BTF object for split BTF such as BTF for kernel modules.
This patch documents and adds bash completion for those options.
Fixes: 75fa1777694c ("tools/bpftool: Add bpftool support for split BTF") Fixes: d510296d331a ("bpftool: Use syscall/loader program in "prog load" and "gen skeleton" command.") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210730215435.7095-7-quentin@isovalent.com
selftests/bpf: Update bpftool's consistency script for checking options
Update the script responsible for checking that the different types used
at various places in bpftool are synchronised, and extend it to check
the consistency of options between the help messages in the source code
and the manual pages.
tools: bpftool: Update and synchronise option list in doc and help msg
All bpftool commands support the options for JSON output and debug from
libbpf. In addition, some commands support additional options
corresponding to specific use cases.
The list of options described in the man pages for the different
commands are not always accurate. The messages for interactive help are
mostly limited to HELP_SPEC_OPTIONS, and are even less representative of
the actual set of options supported for the commands.
Let's update the lists:
- HELP_SPEC_OPTIONS is modified to contain the "default" options (JSON
and debug), and to be extensible (no ending curly bracket).
- All commands use HELP_SPEC_OPTIONS in their help message, and then
complete the list with their specific options.
- The lists of options in the man pages are updated.
- The formatting of the list for bpftool.rst is adjusted to match
formatting for the other man pages. This is for consistency, and also
because it will be helpful in a future patch to automatically check
that the files are synchronised.
tools: bpftool: Complete and synchronise attach or map types
Update bpftool's list of attach type names to tell it about the latest
attach types, or the "ringbuf" map. Also update the documentation, help
messages, and bash completion when relevant.
These missing items were reported by the newly added Python script used
to help maintain consistency in bpftool.
selftests/bpf: Check consistency between bpftool source, doc, completion
Whenever the eBPF subsystem gains new elements, such as new program or
map types, it is necessary to update bpftool if we want it able to
handle the new items.
In addition to the main arrays containing the names of these elements in
the source code, there are also multiple locations to update:
- The help message in the do_help() functions in bpftool's source code.
- The RST documentation files.
- The bash completion file.
This has led to omissions multiple times in the past. This patch
attempts to address this issue by adding consistency checks for all
these different locations. It also verifies that the bpf_prog_type,
bpf_map_type and bpf_attach_type enums from the UAPI BPF header have all
their members present in bpftool.
The script requires no argument to run, it reads and parses the
different files to check, and prints the mismatches, if any. It
currently reports a number of missing elements, which will be fixed in a
later patch:
$ ./test_bpftool_synctypes.py
Comparing [...]/linux/tools/bpf/bpftool/map.c (map_type_name) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_MAP_CREATE_TYPES): {'ringbuf'}
Comparing BPF header (enum bpf_attach_type) and [...]/linux/tools/bpf/bpftool/common.c (attach_type_name): {'BPF_TRACE_ITER', 'BPF_XDP_DEVMAP', 'BPF_XDP', 'BPF_SK_REUSEPORT_SELECT', 'BPF_XDP_CPUMAP', 'BPF_SK_REUSEPORT_SELECT_OR_MIGRATE'}
Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/prog.c (do_help() ATTACH_TYPE): {'skb_verdict'}
Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/Documentation/bpftool-prog.rst (ATTACH_TYPE): {'skb_verdict'}
Comparing [...]/linux/tools/bpf/bpftool/prog.c (attach_type_strings) and [...]/linux/tools/bpf/bpftool/bash-completion/bpftool (BPFTOOL_PROG_ATTACH_TYPES): {'skb_verdict'}
Note that the script does NOT check for consistency between the list of
program types that bpftool claims it accepts and the actual list of
keywords that can be used. This is because bpftool does not "see" them,
they are ELF section names parsed by libbpf. It is not hard to parse the
section_defs[] array in libbpf, but some section names are associated
with program types that bpftool cannot load at the moment. For example,
some programs require a BTF target and an attach target that bpftool
cannot handle. The script may be extended to parse the array and check
only relevant values in the future.
The script is not added to the selftests' Makefile, because doing so
would require all patches with BPF UAPI change to also update bpftool.
Instead it is to be added to the CI.
Bash completion for bpftool gets two minor improvements in this patch.
Move the detection of attach types for "bpftool cgroup attach" outside
of the "case/esac" bloc, where we cannot reuse our variable holding the
list of supported attach types as a pattern list. After the change, we
have only one list of cgroup attach types to update when new types are
added, instead of the former two lists.
Also rename the variables holding lists of names for program types, map
types, and attach types, to make them more unique. This can make it
slightly easier to point people to the relevant variables to update, but
the main objective here is to help run a script to check that bash
completion is up-to-date with bpftool's source code.
====================
Clean devlink net namespace operations
This short series continues my work on devlink core code to make devlink
reload less prone to errors and harden it from API abuse.
Despite first patch being a clear fix, I would ask you to apply it to
net-next anyway, because the fixed patch is anyway old and it will
help us to eliminate merge conflicts that will arise for following
patches or even for the second one.
====================
Leon Romanovsky [Thu, 29 Jul 2021 17:19:25 +0000 (20:19 +0300)]
devlink: Allocate devlink directly in requested net namespace
There is no need in extra call indirection and check from impossible
flow where someone tries to set namespace without prior call
to devlink_alloc().
Instead of this extra logic and additional EXPORT_SYMBOL, use specialized
devlink allocation function that receives net namespace as an argument.
Such specialized API allows clear view when devlink initialized in wrong
net namespace and/or kernel users don't try to change devlink namespace
under the hood.
Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Leon Romanovsky [Thu, 29 Jul 2021 17:19:24 +0000 (20:19 +0300)]
devlink: Break parameter notification sequence to be before/after unload/load driver
The change of namespaces during devlink reload calls to driver unload
before it accesses devlink parameters. The commands below causes to
use-after-free bug when trying to get flow steering mode.
* ip netns add n1
* devlink dev reload pci/0000:00:09.0 netns n1
==================================================================
BUG: KASAN: use-after-free in mlx5_devlink_fs_mode_get+0x96/0xa0 [mlx5_core]
Read of size 4 at addr ffff888009d04308 by task devlink/275
Memory state around the buggy address: ffff888009d04200: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb ffff888009d04280: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff888009d04300: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^ ffff888009d04380: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff888009d04400: fa fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
The right solution to devlink reload is to notify about deletion of
parameters, unload driver, change net namespaces, load driver and notify
about addition of parameters.
Fixes: 070c63f20f6c ("net: devlink: allow to change namespaces during reload") Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Add two new APIs: btf__load_vmlinux_btf and btf__load_module_btf.
btf__load_vmlinux_btf is just an alias to the existing API named
libbpf_find_kernel_btf, rename to be more precisely and consistent
with existing BTF APIs. btf__load_module_btf can be used to load
module BTF, add it for completeness. These two APIs are useful for
implementing tracing tools and introspection tools. This is part
of the effort towards libbpf 1.0 ([0]).
Paolo Abeni [Fri, 30 Jul 2021 16:30:53 +0000 (18:30 +0200)]
sk_buff: avoid potentially clearing 'slow_gro' field
If skb_dst_set_noref() is invoked with a NULL dst, the 'slow_gro'
field is cleared, too. That could lead to wrong behavior if
the skb later enters the GRO stage.
Fix the potential issue replacing preserving a non-zero value of
the 'slow_gro' field.
Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
- gendisk freeing fix (Christoph)
- blk-iocost wake ordering fix (Tejun)
- tag allocation error handling fix (John)
- loop locking fix. While this isn't the prettiest fix in the world,
nobody has any good alternatives for 5.14. Something to likely
revisit for 5.15. (Tetsuo)
* tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
block: delay freeing the gendisk
blk-iocost: fix operation ordering in iocg_wake_fn()
blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
loop: reintroduce global lock for safe loop_validate_file() traversal
Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block
Pull libata fixlets from Jens Axboe:
- A fix for PIO highmem (Christoph)
- Kill HAVE_IDE as it's now unused (Lukas)
* tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
arch: Kconfig: clean up obsolete use of HAVE_IDE
libata: fix ata_pio_sector for CONFIG_HIGHMEM
Merge tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux
Pull btrfs fixes from David Sterba:
- fix -Warray-bounds warning, to help external patchset to make it
default treewide
- fix writeable device accounting (syzbot report)
- fix fsync and log replay after a rename and inode eviction
- fix potentially lost error code when submitting multiple bios for
compressed range
* tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
btrfs: calculate number of eb pages properly in csum_tree_block
btrfs: fix rw device counting in __btrfs_free_extra_devids
btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
btrfs: mark compressed range uptodate only if all bio succeed
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- resume timing fix for intel-ish driver (Ye Xiang)
- fix for using incorrect MMIO register in amd_sfh driver (Dylan
MacKenzie)
- Cintiq 24HDT / 27QHDT regression fix and touch processing fix for
Wacom driver (Jason Gerecke)
- device removal bugfix for ft260 driver (Michael Zaidman)
- other small assorted fixes
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: ft260: fix device removal due to USB disconnect
HID: wacom: Skip processing of touches with negative slot values
HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"
HID: apple: Add support for Keychron K1 wireless keyboard
HID: fix typo in Kconfig
HID: ft260: fix format type warning in ft260_word_show()
HID: amd_sfh: Use correct MMIO register for DMA address
HID: asus: Remove check for same LED brightness on set
HID: intel-ish-hid: use async resume function
Subsystems affected by this patch series: lib, ocfs2, and mm (slub,
migration, and memcg)"
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
slub: fix unreclaimable slab stat for bulk free
mm/migrate: fix NR_ISOLATED corruption on 64-bit
mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
ocfs2: issue zeroout to EOF blocks
ocfs2: fix zero out valid data
lib/test_string.c: move string selftest in the Runtime Testing menu
Jakub Kicinski [Fri, 30 Jul 2021 17:29:52 +0000 (19:29 +0200)]
Merge tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
Marc Kleine-Budde says:
====================
pull-request: can 2021-07-30
The first patch is by me and adds Yasushi SHOJI as a reviewer for the
Microchip CAN BUS Analyzer Tool driver.
Dan Carpenter's patch fixes a signedness bug in the hi311x driver.
Pavel Skripkin provides 4 patches, the first targets the mcba_usb
driver by adding the missing urb->transfer_dma initialization, which
was broken in a previous commit. The last 3 patches fix a memory leak
in the usb_8dev, ems_usb and esd_usb2 driver.
* tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
can: esd_usb2: fix memory leak
can: ems_usb: fix memory leak
can: usb_8dev: fix memory leak
can: mcba_usb_start(): add missing urb->transfer_dma initialization
can: hi311x: fix a signedness bug in hi3110_cmd()
MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
====================
When kmalloc_node() a large memory, page is allocated, not slab, so when
freeing memory via kfree_rcu(), this large memory should not be used by
memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
slab.
Using page_objcgs_check() instead of page_objcgs() in
memcg_slab_free_hook() to fix this bug.
Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com Fixes: 270c6a71460e ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data") Signed-off-by: Wang Hai <wanghai38@huawei.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
SLUB uses page allocator for higher order allocations and update
unreclaimable slab stat for such allocations. At the moment, the bulk
free for SLUB does not share code with normal free code path for these
type of allocations and have missed the stat update. So, fix the stat
update by common code. The user visible impact of the bug is the
potential of inconsistent unreclaimable slab stat visible through
meminfo and vmstat.
Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com Fixes: 6a486c0ad4dc ("mm, sl[ou]b: improve memory accounting") Signed-off-by: Shakeel Butt <shakeelb@google.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: Roman Gushchin <guro@fb.com> Reviewed-by: Muchun Song <songmuchun@bytedance.com> Cc: Christoph Lameter <cl@linux.com> Cc: Pekka Enberg <penberg@kernel.org> Cc: David Rientjes <rientjes@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Similar to commit 2da9f6305f30 ("mm/vmscan: fix NR_ISOLATED_FILE
corruption on 64-bit") avoid using unsigned int for nr_pages. With
unsigned int type the large unsigned int converts to a large positive
signed long.
Symptoms include CMA allocations hanging forever due to
alloc_contig_range->...->isolate_migratepages_block waiting forever in
"while (unlikely(too_many_isolated(pgdat)))".
Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com Fixes: c5fc5c3ae0c8 ("mm: migrate: account THP NUMA migration counters correctly") Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Reported-by: Michael Ellerman <mpe@ellerman.id.au> Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Reviewed-by: Yang Shi <shy828301@gmail.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Johannes Weiner [Thu, 29 Jul 2021 21:53:44 +0000 (14:53 -0700)]
mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
Dan Carpenter reports:
The patch 2d146aa3aa84: "mm: memcontrol: switch to rstat" from Apr
29, 2021, leads to the following static checker warning:
kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
warn: sleeping in atomic context
mm/memcontrol.c
3572 static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
3573 {
3574 unsigned long val;
3575
3576 if (mem_cgroup_is_root(memcg)) {
3577 cgroup_rstat_flush(memcg->css.cgroup);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is from static analysis and potentially a false positive. The
problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
which holds an rcu_read_lock(). And the cgroup_rstat_flush() function
can sleep.
3578 val = memcg_page_state(memcg, NR_FILE_PAGES) +
3579 memcg_page_state(memcg, NR_ANON_MAPPED);
3580 if (swap)
3581 val += memcg_page_state(memcg, MEMCG_SWAP);
3582 } else {
3583 if (!swap)
3584 val = page_counter_read(&memcg->memory);
3585 else
3586 val = page_counter_read(&memcg->memsw);
3587 }
3588 return val;
3589 }
__mem_cgroup_threshold() indeed holds the rcu lock. In addition, the
thresholding code is invoked during stat changes, and those contexts
have irqs disabled as well. If the lock breaking occurs inside the
flush function, it will result in a sleep from an atomic context.
Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat") Signed-off-by: Johannes Weiner <hannes@cmpxchg.org> Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Chris Down <chris@chrisdown.name> Reviewed-by: Rik van Riel <riel@surriel.com> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Thu, 29 Jul 2021 21:53:41 +0000 (14:53 -0700)]
ocfs2: issue zeroout to EOF blocks
For punch holes in EOF blocks, fallocate used buffer write to zero the
EOF blocks in last cluster. But since ->writepage will ignore EOF
pages, those zeros will not be flushed.
This "looks" ok as commit 6bba4471f0cc ("ocfs2: fix data corruption by
fallocate") will zero the EOF blocks when extend the file size, but it
isn't. The problem happened on those EOF pages, before writeback, those
pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
set, when writeback run by write_cache_pages(), DIRTY flag on the page
was cleared, but DIRTY flag on the buffer_head not.
When next write happened to those EOF pages, since buffer_head already
had DIRTY flag set, it would not mark page DIRTY again. That made
writeback ignore them forever. That will cause data corruption. Even
directio write can't work because it will fail when trying to drop pages
caches before direct io, as it found the buffer_head for those pages
still had DIRTY flag set, then it will fall back to buffer io mode.
To make a summary of the issue, as writeback ingores EOF pages, once any
EOF page is generated, any write to it will only go to the page cache,
it will never be flushed to disk even file size extends and that page is
not EOF page any more. The fix is to avoid zero EOF blocks with buffer
write.
The following code snippet from qemu-img could trigger the corruption.
Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.com Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Junxiao Bi [Thu, 29 Jul 2021 21:53:38 +0000 (14:53 -0700)]
ocfs2: fix zero out valid data
If append-dio feature is enabled, direct-io write and fallocate could
run in parallel to extend file size, fallocate used "orig_isize" to
record i_size before taking "ip_alloc_sem", when
ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already
extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed
out.
Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com Fixes: 6bba4471f0cc ("ocfs2: fix data corruption by fallocate") Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
lib/test_string.c: move string selftest in the Runtime Testing menu
STRING_SELFTEST is presented in the "Library routines" menu. Move it in
Kernel hacking > Kernel Testing and Coverage > Runtime Testing together
with other similar tests found in lib/
--- Runtime Testing
<*> Test functions located in the hexdump module at runtime
<*> Test string functions (NEW)
<*> Test functions located in the string_helpers module at runtime
<*> Test strscpy*() family of functions at runtime
<*> Test kstrto*() family of functions at runtime
<*> Test printf() family of functions at runtime
<*> Test scanf() family of functions at runtime
Link: https://lkml.kernel.org/r/20210719185158.190371-1-mcroce@linux.microsoft.com Signed-off-by: Matteo Croce <mcroce@microsoft.com> Cc: Peter Rosin <peda@axentia.se> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
nfc: nci: constify several pointers to u8, sk_buff and other structs
Several functions receive pointers to u8, sk_buff or other structs but
do not modify the contents so make them const. This allows doing the
same for local variables and in total makes the code a little bit safer.
This makes const also data passed as "unsigned long opt" argument to
nci_request() function. Usual flow for such functions is:
1. Receive "u8 *" and store it (the pointer) in a structure
allocated on stack (e.g. struct nci_set_config_param),
2. Call nci_request() or __nci_request() passing a callback function an
the pointer to the structure via an "unsigned long opt",
3. nci_request() calls the callback which dereferences "unsigned long
opt" in a read-only way.
This converts all above paths to use proper pointer to const data, so
entire flow is safer.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
nfc: constify several pointers to u8, char and sk_buff
Several functions receive pointers to u8, char or sk_buff but do not
modify the contents so make them const. This allows doing the same for
local variables and in total makes the code a little bit safer.
Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@canonical.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
The arch-specific Kconfig files use HAVE_IDE to indicate if IDE is
supported.
As IDE support and the HAVE_IDE config vanishes with commit b7fb14d3ac63
("ide: remove the legacy ide driver"), there is no need to mention
HAVE_IDE in all those arch-specific Kconfig files.
The issue was identified with ./scripts/checkkconfigsymbols.py.
Fixes: b7fb14d3ac63 ("ide: remove the legacy ide driver") Suggested-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Acked-by: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/r/20210728182115.4401-1-lukas.bulwahn@gmail.com Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
Pavel Skripkin [Tue, 27 Jul 2021 17:00:46 +0000 (20:00 +0300)]
can: esd_usb2: fix memory leak
In esd_usb2_setup_rx_urbs() MAX_RX_URBS coherent buffers are allocated
and there is nothing, that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see esd_usb2_setup_rx_urbs) and this flag cannot be used
with coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to mcba_usb driver and it works nice with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
Pavel Skripkin [Tue, 27 Jul 2021 17:00:33 +0000 (20:00 +0300)]
can: ems_usb: fix memory leak
In ems_usb_start() MAX_RX_URBS coherent buffers are allocated and
there is nothing, that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see ems_usb_start) and this flag cannot be used with
coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to mcba_usb driver and it works nice with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
Pavel Skripkin [Tue, 27 Jul 2021 16:59:57 +0000 (19:59 +0300)]
can: usb_8dev: fix memory leak
In usb_8dev_start() MAX_RX_URBS coherent buffers are allocated and
there is nothing, that frees them:
1) In callback function the urb is resubmitted and that's all
2) In disconnect function urbs are simply killed, but URB_FREE_BUFFER
is not set (see usb_8dev_start) and this flag cannot be used with
coherent buffers.
So, all allocated buffers should be freed with usb_free_coherent()
explicitly.
Side note: This code looks like a copy-paste of other can drivers. The
same patch was applied to mcba_usb driver and it works nice with real
hardware. There is no change in functionality, only clean-up code for
coherent buffers.
Yasushi reported, that his Microchip CAN Analyzer stopped working
since commit 91c02557174b ("can: mcba_usb: fix memory leak in
mcba_usb"). The problem was in missing urb->transfer_dma
initialization.
In my previous patch to this driver I refactored mcba_usb_start() code
to avoid leaking usb coherent buffers. To archive it, I passed local
stack variable to usb_alloc_coherent() and then saved it to private
array to correctly free all coherent buffers on ->close() call. But I
forgot to initialize urb->transfer_dma with variable passed to
usb_alloc_coherent().
All of this was causing device to not work, since dma addr 0 is not
valid and following log can be found on bug report page, which points
exactly to problem described above.
| DMAR: [DMA Write] Request device [00:14.0] PASID ffffffff fault addr 0 [fault reason 05] PTE Write access is not set
Dan Carpenter [Thu, 29 Jul 2021 14:12:46 +0000 (17:12 +0300)]
can: hi311x: fix a signedness bug in hi3110_cmd()
The hi3110_cmd() is supposed to return zero on success and negative
error codes on failure, but it was accidentally declared as a u8 when
it needs to be an int type.
Fixes: 57e83fb9b746 ("can: hi311x: Add Holt HI-311x CAN driver") Link: https://lore.kernel.org/r/20210729141246.GA1267@kili Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
Merge tag 'drm-fixes-2021-07-30' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Regular drm fixes pull, seems about the right size, lots of small
fixes across the board, mostly amdgpu, but msm and i915 are in there
along with panel and ttm.
amdgpu:
- Fix resource leak in an error path
- Avoid stack contents exposure in error path
- pmops check fix for S0ix vs S3
- DCN 2.1 display fixes
- DCN 2.0 display fix
- Backlight control fix for laptops with HDR panels
- Maintainers updates
i915:
- Fix vbt port mask
- Fix around reading the right DSC disable fuse in display_ver 10
- Split display version 9 and 10 in intel_setup_outputs
* tag 'drm-fixes-2021-07-30' of git://anongit.freedesktop.org/drm/drm:
maintainers: add bugs and chat URLs for amdgpu
drm/amdgpu/display: only enable aux backlight control for OLED panels
drm/amd/display: ensure dentist display clock update finished in DCN20
drm/amd/display: Add missing DCN21 IP parameter
drm/amd/display: Guard DST_Y_PREFETCH register overflow in DCN21
drm/amdgpu: Check pmops for desired suspend state
drm/msm/dp: Initialize dp->aux->drm_dev before registration
drm/msm/dp: signal audio plugged change at dp_pm_resume
drm/msm/dp: Initialize the INTF_CONFIG register
drm/msm/dp: use dp_ctrl_off_link_stream during PHY compliance test run
drm/msm: Fix display fault handling
drm/msm/dpu: Fix sm8250_mdp register length
drm/amdgpu: Avoid printing of stack contents on firmware load error
drm/amdgpu: Fix resource leak on probe error path
drm/i915/display: split DISPLAY_VER 9 and 10 in intel_setup_outputs()
drm/i915: fix not reading DSC disable fuse in GLK
drm/i915/bios: Fix ports mask
drm/panel: panel-simple: Fix proper bpc for ytc700tlag_05_201c
drm/ttm: Initialize debugfs from ttm_global_init()
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha
Pull alpha updates from Matt Turner:
"They're mostly small janitorial fixes but there's also more important
ones:
- drop the alpha-specific x86 binary loader (David Hildenbrand)
- regression fix for at least Marvel platforms (Mike Rapoport)
- fix for a scary-looking typo (Zheng Yongjun)"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mattst88/alpha:
alpha: register early reserved memory in memblock
alpha: fix spelling mistakes
alpha: Remove space between * and parameter name
alpha: fp_emul: avoid init/cleanup_module names
alpha: Add syscall_get_return_value()
binfmt: remove support for em86 (alpha only)
alpha: fix typos in a comment
alpha: defconfig: add necessary configs for boot testing
alpha: Send stop IPI to send to online CPUs
alpha: convert comma to semicolon
alpha: remove undef inline in compiler.h
alpha: Kconfig: Replace HTTP links with HTTPS ones
alpha: __udiv_qrnnd should be exported
Merge branch 'libbpf: rename btf__get_from_id() and btf__load() APIs, support split BTF'
Quentin Monnet says:
====================
As part of the effort to move towards a v1.0 for libbpf [0], this set
improves some confusing function names related to BTF loading from and to
the kernel:
- btf__load() becomes btf__load_into_kernel().
- btf__get_from_id becomes btf__load_from_kernel_by_id().
- A new version btf__load_from_kernel_by_id_split() extends the former to
add support for split BTF.
The last patch is a trivial change to bpftool to add support for dumping
split BTF objects by referencing them by their id (and not only by their
BTF path).
v3:
- Use libbpf_err_ptr() in btf__load_from_kernel_by_id(), ERR_PTR() in
bpftool's get_map_kv_btf().
- Move the definition of btf__load_from_kernel_by_id() closer to the
btf__parse() group in btf.h (move the legacy function with it).
- Fix a bug on the return value in libbpf_find_prog_btf_id(), as a new
patch.
- Move the btf__free() fixes to their own patch.
- Add "Fixes:" tags to relevant patches.
v2:
- Remove deprecation marking of legacy functions (patch 4/6 from v1).
- Make btf__load_from_kernel_by_id{,_split}() return the btf struct, adjust
surrounding code and call btf__free() when missing.
- Add new functions to v0.5.0 API (and not v0.6.0).
====================
Split BTF objects are typically BTF objects for kernel modules, which
are incrementally built on top of kernel BTF instead of redefining all
kernel symbols they need. We can use bpftool with its -B command-line
option to dump split BTF objects. It works well when the handle provided
for the BTF object to dump is a "path" to the BTF object, typically
under /sys/kernel/btf, because bpftool internally calls
btf__parse_split() which can take a "base_btf" pointer and resolve the
BTF reconstruction (although in that case, the "-B" option is
unnecessary because bpftool performs autodetection).
However, it did not work so far when passing the BTF object through its
id, because bpftool would call btf__get_from_id() which did not provide
a way to pass a "base_btf" pointer.
# bpftool btf dump id 6 -B /sys/kernel/btf/vmlinux
The libbpf API has recently changed, and btf__get_from_id() has been
deprecated in favour of btf__load_from_kernel_by_id() and its version
with support for split BTF, btf__load_from_kernel_by_id_split(). Let's
update bpftool to make it able to dump the BTF object in the second case
as well.
libbpf: Add split BTF support for btf__load_from_kernel_by_id()
Add a new API function btf__load_from_kernel_by_id_split(), which takes
a pointer to a base BTF object in order to support split BTF objects
when retrieving BTF information from the kernel.
tools: Replace btf__get_from_id() with btf__load_from_kernel_by_id()
Replace the calls to function btf__get_from_id(), which we plan to
deprecate before the library reaches v1.0, with calls to
btf__load_from_kernel_by_id() in tools/ (bpftool, perf, selftests).
Update the surrounding code accordingly (instead of passing a pointer to
the btf struct, get it as a return value from the function).
Make sure to call btf__free() (and not simply free(), which does not
free all pointers stored in the struct) on pointers to struct btf
objects retrieved at various locations.
These were found while updating the calls to btf__get_from_id().
Fixes: 999d82cbc044 ("tools/bpf: enhance test_btf file testing to test func info") Fixes: 254471e57a86 ("tools/bpf: bpftool: add support for func types") Fixes: 7b612e291a5a ("perf tools: Synthesize PERF_RECORD_* for loaded BPF programs") Fixes: d56354dc4909 ("perf tools: Save bpf_prog_info and BTF of new BPF programs") Fixes: 47c09d6a9f67 ("bpftool: Introduce "prog profile" command") Fixes: fa853c4b839e ("perf stat: Enable counting events for BPF programs") Signed-off-by: Quentin Monnet <quentin@isovalent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210729162028.29512-5-quentin@isovalent.com
libbpf: Rename btf__get_from_id() as btf__load_from_kernel_by_id()
Rename function btf__get_from_id() as btf__load_from_kernel_by_id() to
better indicate what the function does. Change the new function so that,
instead of requiring a pointer to the pointer to update and returning
with an error code, it takes a single argument (the id of the BTF
object) and returns the corresponding pointer. This is more in line with
the existing constructors.
The other tools calling the (soon-to-be) deprecated btf__get_from_id()
function will be updated in a future commit.
libbpf: Rename btf__load() as btf__load_into_kernel()
As part of the effort to move towards a v1.0 for libbpf, rename
btf__load() function, used to "upload" BTF information into the kernel,
as btf__load_into_kernel(). This new name better reflects what the
function does.
libbpf: Return non-null error on failures in libbpf_find_prog_btf_id()
Variable "err" is initialised to -EINVAL so that this error code is
returned when something goes wrong in libbpf_find_prog_btf_id().
However, a recent change in the function made use of the variable in
such a way that it is set to 0 if retrieving linear information on the
program is successful, and this 0 value remains if we error out on
failures at later stages.
Let's fix this by setting err to -EINVAL later in the function.
To avoid kernel build failure due to some missing .BTF-ids referenced
functions/types, the patch ([1]) tries to fill btf_id 0 for
these types.
In bpf verifier, for percpu variable and helper returning btf_id cases,
verifier already emitted proper warning with something like
verbose(env, "Helper has invalid btf_id in R%d\n", regno);
verbose(env, "invalid return type %d of func %s#%d\n",
fn->ret_type, func_id_name(func_id), func_id);
But this is not the case for bpf_iter context arguments.
I hacked resolve_btfids to encode btf_id 0 for struct task_struct.
With `./test_progs -n 7/5`, I got,
0: (79) r2 = *(u64 *)(r1 +0)
func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta'
; struct seq_file *seq = ctx->meta->seq;
1: (79) r6 = *(u64 *)(r2 +0)
; struct task_struct *task = ctx->task;
2: (79) r7 = *(u64 *)(r1 +8)
; if (task == (void *)0) {
3: (55) if r7 != 0x0 goto pc+11
...
; BPF_SEQ_PRINTF(seq, "%8d %8d\n", task->tgid, task->pid);
26: (61) r1 = *(u32 *)(r7 +1372)
Type '(anon)' is not a struct
Basically, verifier will return btf_id 0 for task_struct.
Later on, when the code tries to access task->tgid, the
verifier correctly complains the type is '(anon)' and it is
not a struct. Users still need to backtrace to find out
what is going on.
Let us catch the invalid btf_id 0 earlier
and provide better message indicating btf_id is wrong.
The new error message looks like below:
R1 type=ctx expected=fp
; struct seq_file *seq = ctx->meta->seq;
0: (79) r2 = *(u64 *)(r1 +0)
func 'bpf_iter_task' arg0 has btf_id 29739 type STRUCT 'bpf_iter_meta'
; struct seq_file *seq = ctx->meta->seq;
1: (79) r6 = *(u64 *)(r2 +0)
; struct task_struct *task = ctx->task;
2: (79) r7 = *(u64 *)(r1 +8)
invalid btf_id for context argument offset 8
invalid bpf_context access off=8 size=8
tools/resolve_btfids: Emit warnings and patch zero id for missing symbols
Kernel functions referenced by .BTF_ids may be changed from global to static
and get inlined or get renamed/removed, and thus disappears from BTF.
This causes kernel build failure when resolve_btfids do id patch for symbols
in .BTF_ids in vmlinux. Update resolve_btfids to emit warning messages and
patch zero id for missing symbols instead of aborting kernel build process.
Tang Bin [Thu, 29 Jul 2021 04:03:00 +0000 (12:03 +0800)]
bcm63xx_enet: delete a redundant assignment
In the function bcm_enetsw_probe(), 'ret' will be assigned by
bcm_enet_change_mtu(), so 'ret = 0' make no sense.
Signed-off-by: Zhang Shengju <zhangshengju@cmss.chinamobile.com> Signed-off-by: Tang Bin <tangbin@cmss.chinamobile.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 29 Jul 2021 14:56:00 +0000 (17:56 +0300)]
net: dsa: don't set skb->offload_fwd_mark when not offloading the bridge
DSA has gained the recent ability to deal gracefully with upper
interfaces it cannot offload, such as the bridge, bonding or team
drivers. When such uppers exist, the ports are still in standalone mode
as far as the hardware is concerned.
But when we deliver packets to the software bridge in order for that to
do the forwarding, there is an unpleasant surprise in that the bridge
will refuse to forward them. This is because we unconditionally set
skb->offload_fwd_mark = true, meaning that the bridge thinks the frames
were already forwarded in hardware by us.
Since dp->bridge_dev is populated only when there is hardware offload
for it, but not in the software fallback case, let's introduce a new
helper that can be called from the tagger data path which sets the
skb->offload_fwd_mark accordingly to zero when there is no hardware
offload for bridging. This lets the bridge forward packets back to other
interfaces of our switch, if needed.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Tobias Waldekranz <tobias@waldekranz.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Di Zhu [Thu, 29 Jul 2021 13:19:30 +0000 (21:19 +0800)]
ipvlan: Add handling of NETDEV_UP events
When an ipvlan device is created on a bond device, the link state
of the ipvlan device may be abnormal. This is because bonding device
allows to add physical network card device in the down state and so
NETDEV_CHANGE event will not be notified to other listeners, so ipvlan
has no chance to update its link status.
The following steps can cause such problems:
1) bond0 is down
2) ip link add link bond0 name ipvlan type ipvlan mode l2
3) echo +enp2s7 >/sys/class/net/bond0/bonding/slaves
4) ip link set bond0 up
After these steps, use ip link command, we found ipvlan has NO-CARRIER:
ipvlan@bond0: <NO-CARRIER, BROADCAST,MULTICAST,UP,M-DOWN> mtu ...>
We can deal with this problem like VLAN: Add handling of NETDEV_UP
events. If we receive NETDEV_UP event, we will update the link status
of the ipvlan.
Signed-off-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net/sched: store the last executed chain also for clsact egress
currently, only 'ingress' and 'clsact ingress' qdiscs store the tc 'chain
id' in the skb extension. However, userspace programs (like ovs) are able
to setup egress rules, and datapath gets confused in case it doesn't find
the 'chain id' for a packet that's "recirculated" by tc.
Change tcf_classify() to have the same semantic as tcf_classify_ingress()
so that a single function can be called in ingress / egress, using the tc
ingress / egress block respectively.
Suggested-by: Alaa Hleilel <alaa@nvidia.com> Signed-off-by: Davide Caratti <dcaratti@redhat.com> Reviewed-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Jul 2021 20:34:45 +0000 (21:34 +0100)]
Merge branch 'dpaa2-switch-add-mirroring-support'
Ioana Ciornei says:
====================
dpaa2-switch: add mirroring support
This patch set adds per port and per VLAN mirroring in dpaa2-switch.
The first 4 patches are just cosmetic changes. We renamed the
dpaa2_switch_acl_tbl structure into dpaa2_switch_filter_block so that we
can reuse it for filters that do not use the ACL table and reorganized
the addition of trap, redirect and drop filters into a separate
function. All this just to make for a more streamlined addition of the
support for mirroring.
The next 4 patches are actually adding the advertised support. Mirroring
rules can be added in shared blocks, the driver will replicate the same
configuration on all the switch ports part of the same block.
The last patch documents the feature, presents its behavior and
limitations and gives a couple of examples.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-switch: offload shared block mirror filters when binding to a port
When mirroring rules are added in shared filter blocks, the same
mirroring rule has to be configured on all the switch ports that are
part of the same block.
In case a switch port joins a shared block after mirroring filters have
been already added to it, then all the mirror rules should be offloaded
to the port. The reverse, removal of mirroring rules, has to be done at
block unbind.
For this purpose, the dpaa2_switch_block_offload_mirror() and
dpaa2_switch_block_unoffload_mirror() functions are added and called
upon binding and unbinding a switch port to/from a block.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add support for per port mirroring for the DPAA2 switch. We support
only single mirror port, therefore we allow mirroring rules only as long
as the destination port is always the same.
Unlike all the actions (drop, redirect, trap) already supported by the
dpaa2-switch driver, adding mirroring filters in shared blocks is not
achieved by a singular ACL entry added in a table shared by the ports.
This is why, when a new mirror filter is added in a block we have to got
through all the switch ports sharing it and configure the filter
individually on all.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Extract the necessary steps to offload a filter by using the ACL table
in a separate function - dpaa2_switch_cls_flower_replace_acl().
This is intended to help with the code readability when the mirroring
support is added.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-switch: rename dpaa2_switch_acl_tbl into filter_block
Until now, shared filter blocks were implemented only by ACL tables
shared between ports. Going forward, when the mirroring support will be
added, this will not be true anymore.
Rename the dpaa2_switch_acl_tbl into dpaa2_switch_filter_block so that
we make it clear that the structure is used not only for filters that
use the ACL table but will be used for all the filters that are added in
a block.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-switch: rename dpaa2_switch_tc_parse_action to specify the ACL
Until now, the dpaa2_switch_tc_parse_action() function was used for all
the supported tc actions since all of them were implemented by adding
ACL table entries. In the next commits, the dpaa2-switch driver will
gain mirroring support which is not using the same HW feature.
Make sure that we specify the ACL in the function name so that we make
it clear that it's only used for specific actions.
Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
scsi: acornscsi: Fix fall-through warning for clang
Fix the following fallthrough warning (on ARM):
drivers/scsi/arm/acornscsi.c:2651:2: warning: unannotated fall-through between switch labels [-Wimplicit-fallthrough]
case res_success:
^
drivers/scsi/arm/acornscsi.c:2651:2: note: insert '__attribute__((fallthrough));' to silence this warning
case res_success:
^
__attribute__((fallthrough));
drivers/scsi/arm/acornscsi.c:2651:2: note: insert 'break;' to avoid fall-through
case res_success:
^
break; Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/202107260355.bF00i5bi-lkp@intel.com/ Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull kvm fixes from Paolo Bonzini:
"ARM:
- Fix MTE shared page detection
- Enable selftest's use of PMU registers when asked to
s390:
- restore 5.13 debugfs names
x86:
- fix sizes for vcpu-id indexed arrays
- fixes for AMD virtualized LAPIC (AVIC)
- other small bugfixes
Generic:
- access tracking performance test
- dirty_log_perf_test command line parsing fix
- Fix selftest use of obsolete pthread_yield() in favour of
sched_yield()
- use cpu_relax when halt polling
- fixed missing KVM_CLEAR_DIRTY_LOG compat ioctl"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: add missing compat KVM_CLEAR_DIRTY_LOG
KVM: use cpu_relax when halt polling
KVM: SVM: use vmcb01 in svm_refresh_apicv_exec_ctrl
KVM: SVM: tweak warning about enabled AVIC on nested entry
KVM: SVM: svm_set_vintr don't warn if AVIC is active but is about to be deactivated
KVM: s390: restore old debugfs names
KVM: SVM: delay svm_vcpu_init_msrpm after svm->vmcb is initialized
KVM: selftests: Introduce access_tracking_perf_test
KVM: selftests: Fix missing break in dirty_log_perf_test arg parsing
x86/kvm: fix vcpu-id indexed array sizes
KVM: x86: Check the right feature bit for MSR_KVM_ASYNC_PF_ACK access
docs: virt: kvm: api.rst: replace some characters
KVM: Documentation: Fix KVM_CAP_ENFORCE_PV_FEATURE_CPUID name
KVM: nSVM: Swap the parameter order for svm_copy_vmrun_state()/svm_copy_vmloadsave_state()
KVM: nSVM: Rename nested_svm_vmloadsave() to svm_copy_vmloadsave_state()
KVM: arm64: selftests: get-reg-list: actually enable pmu regs in pmu sublist
KVM: selftests: change pthread_yield to sched_yield
KVM: arm64: Fix detection of shared VMAs on guest fault
David S. Miller [Thu, 29 Jul 2021 14:35:01 +0000 (15:35 +0100)]
Merge branch 'sja110-vlan-fixes'
Vladimir Oltean says:
====================
NXP SJA1105 VLAN regressions
These are 3 patches to fix issues seen with some more varied testing
done after the changes in the "Traffic termination for sja1105 ports
under VLAN-aware bridge" series were made:
https://patchwork.kernel.org/project/netdevbpf/cover/20210726165536.1338471-1-vladimir.oltean@nxp.com/
Issue 1: traffic no longer works on a port after leaving a VLAN-aware bridge
Issue 2: untagged traffic not dropped if pvid is absent from a VLAN-aware port
Issue 3: PTP and STP broken on ports under a VLAN-aware bridge
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 21:54:29 +0000 (00:54 +0300)]
net: dsa: tag_sja1105: fix control packets on SJA1110 being received on an imprecise port
On RX, a control packet with SJA1110 will have:
- an in-band control extension (DSA tag) composed of a header and an
optional trailer (if it is a timestamp frame). We can (and do) deduce
the source port and switch id from this.
- a VLAN header, which can either be the tag_8021q RX VLAN (pvid) or the
bridge VLAN. The sja1105_vlan_rcv() function attempts to deduce the
source port and switch id a second time from this.
The basic idea is that even though we don't need the source port
information from the tag_8021q header if it's a control packet, we do
need to strip that header before we pass it on to the network stack.
The problem is that we call sja1105_vlan_rcv for ports under VLAN-aware
bridges, and that function tells us it couldn't identify a tag_8021q
header, so we need to perform imprecise RX by VID. Well, we don't,
because we already know the source port and switch ID.
This patch drops the return value from sja1105_vlan_rcv and we just look
at the source_port and switch_id values from sja1105_rcv and sja1110_rcv
which were initialized to -1. If they are still -1 it means we need to
perform imprecise RX.
Fixes: 884be12f8566 ("net: dsa: sja1105: add support for imprecise RX") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 21:54:28 +0000 (00:54 +0300)]
net: dsa: sja1105: make sure untagged packets are dropped on ingress ports with no pvid
Surprisingly, this configuration:
ip link add br0 type bridge vlan_filtering 1
ip link set swp2 master br0
bridge vlan del dev swp2 vid 1
still has the sja1105 switch sending untagged packets to the CPU (and
failing to decode them, since dsa_find_designated_bridge_port_by_vid
searches by VID 1 and rightfully finds no bridge VLAN 1 on a port).
Dumping the switch configuration, the VLANs are managed properly:
- the pvid of swp2 is 1 in the MAC Configuration Table, but
- only the CPU port is in the port membership of VLANID 1 in the VLAN
Lookup Table
When the ingress packets are tagged with VID 1, they are properly
dropped. But when they are untagged, they are able to reach the CPU
port. Also, when the pvid in the MAC Configuration Table is changed to
e.g. 55 (an unused VLAN), the untagged packets are also dropped.
So it looks like:
- the switch bypasses ingress VLAN membership checks for untagged traffic
- the reason why the untagged traffic is dropped when I make the pvid 55
is due to the lack of valid destination ports in VLAN 55, rather than
an ingress membership violation
- the ingress VLAN membership cheks are only done for VLAN-tagged traffic
Interesting. It looks like there is an explicit bit to drop untagged
traffic, so we should probably be using that to preserve user expectations.
Note that only VLAN-aware ports should drop untagged packets due to no
pvid - when VLAN-unaware, the software bridge doesn't do this even if
there is no pvid on any bridge port and on the bridge itself. So the new
sja1105_drop_untagged() function cannot simply be called with "false"
from sja1105_bridge_vlan_add() and with "true" from sja1105_bridge_vlan_del.
Instead, we need to also consider the VLAN awareness state. That means
we need to hook the "drop untagged" setting in all the same places where
the "commit pvid" logic is, and it needs to factor in all the state when
flipping the "drop untagged" bit: is our current pvid in the VLAN Lookup
Table, and is the current port in that VLAN's port membership list?
VLAN-unaware ports will never drop untagged frames because these checks
always succeed by construction, and the tag_8021q VLANs cannot be changed
by the user.
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Wed, 28 Jul 2021 21:54:27 +0000 (00:54 +0300)]
net: dsa: sja1105: reset the port pvid when leaving a VLAN-aware bridge
Now that we no longer have the ultra-central sja1105_build_vlan_table(),
we need to be more careful about checking all corner cases manually.
For example, when a port leaves a VLAN-aware bridge, it becomes
standalone so its pvid should become a tag_8021q RX VLAN again. However,
sja1105_commit_pvid() only gets called from sja1105_bridge_vlan_add()
and from sja1105_vlan_filtering(), and no VLAN awareness change takes
place (VLAN filtering is a global setting for sja1105, so the switch
remains VLAN-aware overall).
This means that we need to put another sja1105_commit_pvid() call in
sja1105_bridge_member().
Fixes: 6dfd23d35e75 ("net: dsa: sja1105: delete vlan delta save/restore logic") Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Jul 2021 14:06:50 +0000 (15:06 +0100)]
Merge branch 'mctp'
Jeremy Kerr says:
====================
Add Management Component Transport Protocol support
This series adds core MCTP support to the kernel. From the Kconfig
description:
Management Component Transport Protocol (MCTP) is an in-system
protocol for communicating between management controllers and
their managed devices (peripherals, host processors, etc.). The
protocol is defined by DMTF specification DSP0236.
This option enables core MCTP support. For communicating with other
devices, you'll want to enable a driver for a specific hardware
channel.
This implementation allows a sockets-based API for sending and receiving
MCTP messages via sendmsg/recvmsg on SOCK_DGRAM sockets. Kernel stack
control is all via netlink, using existing RTM_* messages. The userspace
ABI change is fairly small; just the necessary AF_/ETH_P_/ARPHDR_
constants, a new sockaddr, and a new netlink attribute.
For MAINTAINERS, I've just included netdev@ as the list entry. I'm happy
to alter this based on preferences here - an alternative would be the
OpenBMC list (the main user of the MCTP interface), or we can create a
new list entirely.
We have a couple of interface drivers almost ready to go at the moment,
but those can wait until the core code has some review.
This is v4 of the series; v1 and v2 were both RFC.
selinux folks: CCing 01/15 due to the new PF_MCTP protocol family.
linux-doc folks: CCing 15/15 for the new MCTP overview document.
Review, comments, questions etc. are most welcome.
Cheers,
Jeremy
v2:
- change to match spec terminology: controller -> component
- require specific capabilities for bind() & sendmsg()
- add address and tag defintions to uapi
- add selinux AF_MCTP table definitions
- remove strict cflags; warnings are present in common headers
v3:
- require caps for MCTP bind() & send()
- comment typo fixes
- switch to an array for local EIDs
- fix addrinfo dump iteration & error path
- add RTM_DELADDR
- remove GENMASK() and BIT() from uapi
v4:
- drop tun patch; that can be submitted separately
- keep nipa happy: add maintainer CCs, including doc and selinux
- net-next rebase
- Include AF_MCTP in af_family_slock_keys and pf_family_names
- Introduce MODULE_ definitions earlier
- upstream change: set_link_af no longer called with RTNL held
- add kdoc for net_device.mctp_ptr
- don't inline mctp_rt_match_eid
- require rtm_type == RTN_UNICAST in route management handlers
- remove unused RTAX policy table
- fix mctp_sock->keys rcu annotations
- fix spurious rcu_read_unlock in route input
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Johnston [Thu, 29 Jul 2021 02:20:52 +0000 (10:20 +0800)]
mctp: Allow per-netns default networks
Currently we have a compile-time default network
(MCTP_INITIAL_DEFAULT_NET). This change introduces a default_net field
on the net namespace, allowing future configuration for new interfaces.
Signed-off-by: Matt Johnston <matt@codeconstruct.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>