]> git.proxmox.com Git - mirror_ubuntu-hirsute-kernel.git/log
mirror_ubuntu-hirsute-kernel.git
5 years agosamples/bpf: Convert XDP samples to libbpf usage
Maciej Fijalkowski [Fri, 1 Feb 2019 21:42:25 +0000 (22:42 +0100)]
samples/bpf: Convert XDP samples to libbpf usage

Some of XDP samples that are attaching the bpf program to the interface
via libbpf's bpf_set_link_xdp_fd are still using the bpf_load.c for
loading and manipulating the ebpf program and maps. Convert them to do
this through libbpf usage and remove bpf_load from the picture.

While at it remove what looks like debug leftover in
xdp_redirect_map_user.c

In xdp_redirect_cpu, change the way that the program to be loaded onto
interface is chosen - user now needs to pass the program's section name
instead of the relative number. In case of typo print out the section
names to choose from.

Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: Jesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agosamples/bpf: xdp_redirect_cpu have not need for read_trace_pipe
Jesper Dangaard Brouer [Fri, 1 Feb 2019 21:42:24 +0000 (22:42 +0100)]
samples/bpf: xdp_redirect_cpu have not need for read_trace_pipe

The sample xdp_redirect_cpu is not using helper bpf_trace_printk.
Thus it makes no sense that the --debug option us reading
from /sys/kernel/debug/tracing/trace_pipe via read_trace_pipe.
Simply remove it.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: Add a helper for retrieving a map fd for a given name
Maciej Fijalkowski [Fri, 1 Feb 2019 21:42:23 +0000 (22:42 +0100)]
libbpf: Add a helper for retrieving a map fd for a given name

XDP samples are mostly cooperating with eBPF maps through their file
descriptors. In case of a eBPF program that contains multiple maps it
might be tiresome to iterate through them and call bpf_map__fd for each
one. Add a helper mostly based on bpf_object__find_map_by_name, but
instead of returning the struct bpf_map pointer, return map fd.

Suggested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: powerpc64: add JIT support for bpf line info
Sandipan Das [Fri, 1 Feb 2019 10:32:32 +0000 (16:02 +0530)]
bpf: powerpc64: add JIT support for bpf line info

This adds support for generating bpf line info for
JITed programs.

Signed-off-by: Sandipan Das <sandipan@linux.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'bpf-spinlocks'
Daniel Borkmann [Fri, 1 Feb 2019 19:55:40 +0000 (20:55 +0100)]
Merge branch 'bpf-spinlocks'

Alexei Starovoitov says:

====================
Many algorithms need to read and modify several variables atomically.
Until now it was hard to impossible to implement such algorithms in BPF.
Hence introduce support for bpf_spin_lock.

The api consists of 'struct bpf_spin_lock' that should be placed
inside hash/array/cgroup_local_storage element
and bpf_spin_lock/unlock() helper function.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

and BPF_F_LOCK flag for lookup/update bpf syscall commands that
allows user space to read/write map elements under lock.

Together these primitives allow race free access to map elements
from bpf programs and from user space.

Key restriction: root only.
Key requirement: maps must be annotated with BTF.

This concept was discussed at Linux Plumbers Conference 2018.
Thank you everyone who participated and helped to iron out details
of api and implementation.

Patch 1: bpf_spin_lock support in the verifier, BTF, hash, array.
Patch 2: bpf_spin_lock in cgroup local storage.
Patches 3,4,5: tests
Patch 6: BPF_F_LOCK flag to lookup/update
Patches 7,8,9: tests

v6->v7:
- fixed this_cpu->__this_cpu per Peter's suggestion and added Ack.
- simplified bpf_spin_lock and load/store overlap check in the verifier
  as suggested by Andrii
- rebase

v5->v6:
- adopted arch_spinlock approach suggested by Peter
- switched to spin_lock_irqsave equivalent as the simplest way
  to avoid deadlocks in rare case of nested networking progs
  (cgroup-bpf prog in preempt_disable vs clsbpf in softirq sharing
  the same map with bpf_spin_lock)
  bpf_spin_lock is only allowed in networking progs that don't
  have arbitrary entry points unlike tracing progs.
- rebase and split test_verifier tests

v4->v5:
- disallow bpf_spin_lock for tracing progs due to insufficient preemption checks
- socket filter progs cannot use bpf_spin_lock due to missing preempt_disable
- fix atomic_set_release. Spotted by Peter.
- fixed hash_of_maps

v3->v4:
- fix BPF_EXIST | BPF_NOEXIST check patch 6. Spotted by Jakub. Thanks!
- rebase

v2->v3:
- fixed build on ia64 and archs where qspinlock is not supported
- fixed missing lock init during lookup w/o BPF_F_LOCK. Spotted by Martin

v1->v2:
- addressed several issues spotted by Daniel and Martin in patch 1
- added test11 to patch 4 as suggested by Daniel
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: test for BPF_F_LOCK
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:12 +0000 (15:40 -0800)]
selftests/bpf: test for BPF_F_LOCK

Add C based test that runs 4 bpf programs in parallel
that update the same hash and array maps.
And another 2 threads that read from these two maps
via lookup(key, value, BPF_F_LOCK) api
to make sure the user space sees consistent value in both
hash and array elements while user space races with kernel bpf progs.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agolibbpf: introduce bpf_map_lookup_elem_flags()
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:11 +0000 (15:40 -0800)]
libbpf: introduce bpf_map_lookup_elem_flags()

Introduce
int bpf_map_lookup_elem_flags(int fd, const void *key, void *value, __u64 flags)
helper to lookup array/hash/cgroup_local_storage elements with BPF_F_LOCK flag.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools/bpf: sync uapi/bpf.h
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:10 +0000 (15:40 -0800)]
tools/bpf: sync uapi/bpf.h

add BPF_F_LOCK definition to tools/include/uapi/linux/bpf.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: introduce BPF_F_LOCK flag
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:09 +0000 (15:40 -0800)]
bpf: introduce BPF_F_LOCK flag

Introduce BPF_F_LOCK flag for map_lookup and map_update syscall commands
and for map_update() helper function.
In all these cases take a lock of existing element (which was provided
in BTF description) before copying (in or out) the rest of map value.

Implementation details that are part of uapi:

Array:
The array map takes the element lock for lookup/update.

Hash:
hash map also takes the lock for lookup/update and tries to avoid the bucket lock.
If old element exists it takes the element lock and updates the element in place.
If element doesn't exist it allocates new one and inserts into hash table
while holding the bucket lock.
In rare case the hashmap has to take both the bucket lock and the element lock
to update old value in place.

Cgroup local storage:
It is similar to array. update in place and lookup are done with lock taken.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: add bpf_spin_lock C test
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:08 +0000 (15:40 -0800)]
selftests/bpf: add bpf_spin_lock C test

add bpf_spin_lock C based test that requires latest llvm with BTF support

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: add bpf_spin_lock verifier tests
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:07 +0000 (15:40 -0800)]
selftests/bpf: add bpf_spin_lock verifier tests

add bpf_spin_lock tests to test_verifier.c that don't require
latest llvm with BTF support

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agotools/bpf: sync include/uapi/linux/bpf.h
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:06 +0000 (15:40 -0800)]
tools/bpf: sync include/uapi/linux/bpf.h

sync bpf.h

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: add support for bpf_spin_lock to cgroup local storage
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:05 +0000 (15:40 -0800)]
bpf: add support for bpf_spin_lock to cgroup local storage

Allow 'struct bpf_spin_lock' to reside inside cgroup local storage.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: introduce bpf_spin_lock
Alexei Starovoitov [Thu, 31 Jan 2019 23:40:04 +0000 (15:40 -0800)]
bpf: introduce bpf_spin_lock

Introduce 'struct bpf_spin_lock' and bpf_spin_lock/unlock() helpers to let
bpf program serialize access to other variables.

Example:
struct hash_elem {
    int cnt;
    struct bpf_spin_lock lock;
};
struct hash_elem * val = bpf_map_lookup_elem(&hash_map, &key);
if (val) {
    bpf_spin_lock(&val->lock);
    val->cnt++;
    bpf_spin_unlock(&val->lock);
}

Restrictions and safety checks:
- bpf_spin_lock is only allowed inside HASH and ARRAY maps.
- BTF description of the map is mandatory for safety analysis.
- bpf program can take one bpf_spin_lock at a time, since two or more can
  cause dead locks.
- only one 'struct bpf_spin_lock' is allowed per map element.
  It drastically simplifies implementation yet allows bpf program to use
  any number of bpf_spin_locks.
- when bpf_spin_lock is taken the calls (either bpf2bpf or helpers) are not allowed.
- bpf program must bpf_spin_unlock() before return.
- bpf program can access 'struct bpf_spin_lock' only via
  bpf_spin_lock()/bpf_spin_unlock() helpers.
- load/store into 'struct bpf_spin_lock lock;' field is not allowed.
- to use bpf_spin_lock() helper the BTF description of map value must be
  a struct and have 'struct bpf_spin_lock anyname;' field at the top level.
  Nested lock inside another struct is not allowed.
- syscall map_lookup doesn't copy bpf_spin_lock field to user space.
- syscall map_update and program map_update do not update bpf_spin_lock field.
- bpf_spin_lock cannot be on the stack or inside networking packet.
  bpf_spin_lock can only be inside HASH or ARRAY map value.
- bpf_spin_lock is available to root only and to all program types.
- bpf_spin_lock is not allowed in inner maps of map-in-map.
- ld_abs is not allowed inside spin_lock-ed region.
- tracing progs and socket filter progs cannot use bpf_spin_lock due to
  insufficient preemption checks

Implementation details:
- cgroup-bpf class of programs can nest with xdp/tc programs.
  Hence bpf_spin_lock is equivalent to spin_lock_irqsave.
  Other solutions to avoid nested bpf_spin_lock are possible.
  Like making sure that all networking progs run with softirq disabled.
  spin_lock_irqsave is the simplest and doesn't add overhead to the
  programs that don't use it.
- arch_spinlock_t is used when its implemented as queued_spin_lock
- archs can force their own arch_spinlock_t
- on architectures where queued_spin_lock is not available and
  sizeof(arch_spinlock_t) != sizeof(__u32) trivial lock is used.
- presence of bpf_spin_lock inside map value could have been indicated via
  extra flag during map_create, but specifying it via BTF is cleaner.
  It provides introspection for map key/value and reduces user mistakes.

Next steps:
- allow bpf_spin_lock in other map types (like cgroup local storage)
- introduce BPF_F_LOCK flag for bpf_map_update() syscall and helper
  to request kernel to grab bpf_spin_lock before rewriting the value.
  That will serialize access to map elements.

Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf, cgroups: clean up kerneldoc warnings
Valdis Kletnieks [Tue, 29 Jan 2019 06:47:06 +0000 (01:47 -0500)]
bpf, cgroups: clean up kerneldoc warnings

Building with W=1 reveals some bitrot:

  CC      kernel/bpf/cgroup.o
kernel/bpf/cgroup.c:238: warning: Function parameter or member 'flags' not described in '__cgroup_bpf_attach'
kernel/bpf/cgroup.c:367: warning: Function parameter or member 'unused_flags' not described in '__cgroup_bpf_detach'

Add a kerneldoc line for 'flags'.

Fixing the warning for 'unused_flags' is best approached by
removing the unused parameter on the function call.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: fix missing prototype warnings
Valdis Kletnieks [Tue, 29 Jan 2019 06:04:25 +0000 (01:04 -0500)]
bpf: fix missing prototype warnings

Compiling with W=1 generates warnings:

  CC      kernel/bpf/core.o
kernel/bpf/core.c:721:12: warning: no previous prototype for ?bpf_jit_alloc_exec_limit? [-Wmissing-prototypes]
  721 | u64 __weak bpf_jit_alloc_exec_limit(void)
      |            ^~~~~~~~~~~~~~~~~~~~~~~~
kernel/bpf/core.c:757:14: warning: no previous prototype for ?bpf_jit_alloc_exec? [-Wmissing-prototypes]
  757 | void *__weak bpf_jit_alloc_exec(unsigned long size)
      |              ^~~~~~~~~~~~~~~~~~
kernel/bpf/core.c:762:13: warning: no previous prototype for ?bpf_jit_free_exec? [-Wmissing-prototypes]
  762 | void __weak bpf_jit_free_exec(void *addr)
      |             ^~~~~~~~~~~~~~~~~

All three are weak functions that archs can override, provide
proper prototypes for when a new arch provides their own.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: fix bitrotted kerneldoc
Valdis Kletnieks [Tue, 29 Jan 2019 04:04:46 +0000 (23:04 -0500)]
bpf: fix bitrotted kerneldoc

Over the years, the function signature has changed, but the
kerneldoc block hasn't.

Signed-off-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'bpf-tests-probe-kernel-support'
Daniel Borkmann [Thu, 31 Jan 2019 09:13:22 +0000 (10:13 +0100)]
Merge branch 'bpf-tests-probe-kernel-support'

Stanislav Fomichev says:

====================
If test_maps/test_verifier is running against the kernel which doesn't
have _all_ BPF features enabled, it fails with an error. This patch
series tries to probe kernel support for each failed test and skip
it instead. This lets users run BPF selftests in the not-all-bpf-yes
environments and received correct PASS/NON-PASS result.

See https://www.spinics.net/lists/netdev/msg539331.html for more
context.

The series goes like this:

* patch #1 skips sockmap tests in test_maps.c if BPF_MAP_TYPE_SOCKMAP
  map is not supported (if bpf_create_map fails, we probe the kernel
  for support)
* patch #2 skips verifier tests if test->prog_type is not supported (if
  bpf_verify_program fails, we probe the kernel for support)
* patch #3 skips verifier tests if test fixup map is not supported (if
  create_map fails, we probe the kernel for support)
* next patches fix various small issues that arise from the first four:
  * patch #4 sets "unknown func bpf_trace_printk#6" prog_type to
    BPF_PROG_TYPE_TRACEPOINT so it is correctly skipped in
    CONFIG_BPF_EVENTS=n case
  * patch #5 exposes BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} only when
    CONFIG_CGROUP_BPF=y, this makes verifier correctly skip appropriate
    tests

v3 changes:
* rebased on top of Quentin's series which adds probes to libbpf

v2 changes:
* don't sprinkle "ifdef CONFIG_CGROUP_BPF" all around net/core/filter.c,
  doing it only in the bpf_types.h is enough to disable
  BPF_PROG_TYPE_CGROUP_{SKB,SOCK,SOCK_ADDR} prog types for non-cgroup
  enabled kernels
====================

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agobpf: BPF_PROG_TYPE_CGROUP_{SKB, SOCK, SOCK_ADDR} require cgroups enabled
Stanislav Fomichev [Mon, 28 Jan 2019 17:21:19 +0000 (09:21 -0800)]
bpf: BPF_PROG_TYPE_CGROUP_{SKB, SOCK, SOCK_ADDR} require cgroups enabled

There is no way to exercise appropriate attach points without cgroups
enabled. This lets test_verifier correctly skip tests for these
prog_types if kernel was compiled without BPF cgroup support.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: mark verifier test that uses bpf_trace_printk as BPF_PROG_TYPE_TRACEPOINT
Stanislav Fomichev [Mon, 28 Jan 2019 17:21:18 +0000 (09:21 -0800)]
selftests/bpf: mark verifier test that uses bpf_trace_printk as BPF_PROG_TYPE_TRACEPOINT

We don't have this helper if the kernel was compiled without
CONFIG_BPF_EVENTS. Setting prog_type to BPF_PROG_TYPE_TRACEPOINT
let's verifier correctly skip this test based on the missing
prog_type support in the kernel.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: skip verifier tests for unsupported map types
Stanislav Fomichev [Mon, 28 Jan 2019 17:21:17 +0000 (09:21 -0800)]
selftests/bpf: skip verifier tests for unsupported map types

Use recently introduced bpf_probe_map_type() to skip tests in the
test_verifier if map creation (create_map) fails. It's handled
explicitly for each fixup, i.e. if bpf_create_map returns negative fd,
we probe the kernel for the appropriate map support and skip the
test is map type is not supported.

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: skip verifier tests for unsupported program types
Stanislav Fomichev [Mon, 28 Jan 2019 17:21:16 +0000 (09:21 -0800)]
selftests/bpf: skip verifier tests for unsupported program types

Use recently introduced bpf_probe_prog_type() to skip tests in the
test_verifier() if bpf_verify_program() fails. The skipped test is
indicated in the output.

Example:

...
679/p bpf_get_stack return R0 within range SKIP (unsupported program
type 5)
680/p ld_abs: invalid op 1 OK
...
Summary: 863 PASSED, 165 SKIPPED, 3 FAILED

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoselftests/bpf: skip sockmap in test_maps if kernel doesn't have support
Stanislav Fomichev [Mon, 28 Jan 2019 17:21:15 +0000 (09:21 -0800)]
selftests/bpf: skip sockmap in test_maps if kernel doesn't have support

Use recently introduced bpf_probe_map_type() to skip test_sockmap()
if map creation fails. The skipped test is indicated in the output.

Example:

test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP)
Fork 1024 tasks to 'test_update_delete'
...
test_sockmap SKIP (unsupported map type BPF_MAP_TYPE_SOCKMAP)
Fork 1024 tasks to 'test_update_delete'
...
test_maps: OK, 2 SKIPPED

Signed-off-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
5 years agoMerge branch 'hns3-next'
David S. Miller [Wed, 30 Jan 2019 22:50:04 +0000 (14:50 -0800)]
Merge branch 'hns3-next'

Huazhong Tan says:

====================
code optimizations & bugfixes for HNS3 driver

This patchset includes bugfixes and code optimizations for the HNS3
ethernet controller driver
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: keep flow director state unchanged when reset
Jian Shen [Wed, 30 Jan 2019 20:55:52 +0000 (04:55 +0800)]
net: hns3: keep flow director state unchanged when reset

In orginal codes, driver always enables flow director when
intializing. When user disable flow director with command
ethtool -K, the flow director will be enabled again after
resetting.

This patch fixes it by only enabling it when first initialzing.

Fixes: 6871af29b3ab ("net: hns3: Add reset handle for flow director")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: stop sending keep alive msg to PF when VF is resetting
Jian Shen [Wed, 30 Jan 2019 20:55:51 +0000 (04:55 +0800)]
net: hns3: stop sending keep alive msg to PF when VF is resetting

When VF is resetting, it can't communicate to PF with mailbox msg.
This patch adds reset state checking before sending keep alive msg
to PF.

Fixes: a6d818e31d08 ("net: hns3: Add vport alive state checking support")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix an issue for hclgevf_ae_get_hdev
Peng Li [Wed, 30 Jan 2019 20:55:50 +0000 (04:55 +0800)]
net: hns3: fix an issue for hclgevf_ae_get_hdev

HNS3 VF driver support NIC and Roce, hdev stores NIC
handle and Roce handle, should use correct parameter for
container_of.

Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix improper error handling in the hclge_init_ae_dev()
Huazhong Tan [Wed, 30 Jan 2019 20:55:49 +0000 (04:55 +0800)]
net: hns3: fix improper error handling in the hclge_init_ae_dev()

While hclge_init_umv_space() failed in the hclge_init_ae_dev(),
we should undo all the operation which has been done successfully,
the last success operation maybe hclge_mac_mdio_config(), so if
hclge_init_umv_space() failed, we also need to undo it.

Fixes: 288475b2ad01 ("{topost} net: hns3: refine umv space allocation")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix for rss result nonuniform
Jian Shen [Wed, 30 Jan 2019 20:55:48 +0000 (04:55 +0800)]
net: hns3: fix for rss result nonuniform

The rss result is more uniform when use recommended hash key from
microsoft, instead of the one generated by netdev_rss_key_fill().
Also using hash algorithm "xor" is better than "toeplitz".

This patch modifies the default hash key and hash algorithm.

Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix netif_napi_del() not do problem when unloading
Huazhong Tan [Wed, 30 Jan 2019 20:55:47 +0000 (04:55 +0800)]
net: hns3: fix netif_napi_del() not do problem when unloading

When the driver is unloading, if a global reset occurs,
unmap_ring_from_vector() in the hns3_nic_uninit_vector_data() will
fail, and hns3_nic_uninit_vector_data() just return. There may be
some netif_napi_del() not be done.

Since hardware will unmap all ring while resetting, so
hns3_nic_uninit_vector_data() should ignore this error, and do the
rest uninitialization.

Fixes: 76ad4f0ee747 ("net: hns3: Add support of HNS3 Ethernet Driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: Fix NULL deref when unloading driver
Huazhong Tan [Wed, 30 Jan 2019 20:55:46 +0000 (04:55 +0800)]
net: hns3: Fix NULL deref when unloading driver

When the driver is unloading, if there is a calling of ndo_open occurs
between phy_disconnect() and unregister_netdev(), it will end up
causing the kernel to eventually hit a NULL deref:

[14942.417828] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000048
[14942.529878] Mem abort info:
[14942.551166]   ESR = 0x96000006
[14942.567070]   Exception class = DABT (current EL), IL = 32 bits
[14942.623081]   SET = 0, FnV = 0
[14942.639112]   EA = 0, S1PTW = 0
[14942.643628] Data abort info:
[14942.659227]   ISV = 0, ISS = 0x00000006
[14942.674870]   CM = 0, WnR = 0
[14942.679449] user pgtable: 4k pages, 48-bit VAs, pgdp = 00000000224ad6ad
[14942.695595] [0000000000000048] pgd=00000021e6673003, pud=00000021dbf01003, pmd=0000000000000000
[14942.723163] Internal error: Oops: 96000006 [#1] PREEMPT SMP
[14942.729358] Modules linked in: hns3(O) hclge(O) pv680_mii(O) hnae3(O) [last unloaded: hclge]
[14942.738907] CPU: 1 PID: 26629 Comm: kworker/u4:13 Tainted: G           O      4.18.0-rc1-12928-ga960791-dirty #145
[14942.749491] Hardware name: Huawei Technologies Co., Ltd. D05/D05, BIOS Hi1620 FPGA TB BOOT BIOS B763 08/17/2018
[14942.760392] Workqueue: events_power_efficient phy_state_machine
[14942.766644] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[14942.771918] pc : test_and_set_bit+0x18/0x38
[14942.776589] lr : netif_carrier_off+0x24/0x70
[14942.781033] sp : ffff0000121abd20
[14942.784518] x29: ffff0000121abd20 x28: 0000000000000000
[14942.790208] x27: ffff0000164d3cd8 x26: ffff8021da68b7b8
[14942.795832] x25: 0000000000000000 x24: ffff8021eb407800
[14942.801445] x23: 0000000000000000 x22: 0000000000000000
[14942.807046] x21: 0000000000000001 x20: 0000000000000000
[14942.812672] x19: 0000000000000000 x18: ffff000009781708
[14942.818284] x17: 00000000004970e8 x16: ffff00000816ad48
[14942.823900] x15: 0000000000000000 x14: 0000000000000008
[14942.829528] x13: 0000000000000000 x12: 0000000000000f65
[14942.835149] x11: 0000000000000001 x10: 00000000000009d0
[14942.840753] x9 : ffff0000121abaa0 x8 : 0000000000000000
[14942.846360] x7 : ffff000009781708 x6 : 0000000000000003
[14942.851970] x5 : 0000000000000020 x4 : 0000000000000004
[14942.857575] x3 : 0000000000000002 x2 : 0000000000000001
[14942.863180] x1 : 0000000000000048 x0 : 0000000000000000
[14942.868875] Process kworker/u4:13 (pid: 26629, stack limit = 0x00000000c909dbf3)
[14942.876464] Call trace:
[14942.879200]  test_and_set_bit+0x18/0x38
[14942.883376]  phy_link_change+0x38/0x78
[14942.887378]  phy_state_machine+0x3dc/0x4f8
[14942.891968]  process_one_work+0x158/0x470
[14942.896223]  worker_thread+0x50/0x470
[14942.900219]  kthread+0x104/0x130
[14942.903905]  ret_from_fork+0x10/0x1c
[14942.907755] Code: d2800022 8b400c21 f9800031 9ac32044 (c85f7c22)
[14942.914185] ---[ end trace 968c9e12eb740b23 ]---

So this patch fixes it by modifying the timing to do phy_connect_direct()
and phy_disconnect().

Fixes: 256727da7395 ("net: hns3: Add MDIO support to HNS3 Ethernet driver for hip08 SoC")
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: only support tc 0 for VF
Yunsheng Lin [Wed, 30 Jan 2019 20:55:45 +0000 (04:55 +0800)]
net: hns3: only support tc 0 for VF

When the VF shares the same TC config as PF, the business
running on PF and VF must have samiliar module.

For simplicity, we are not considering VF sharing the same tc
configuration as PF use case, so this patch removes the support
of TC configuration from VF and forcing VF to just use single
TC.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: change hnae3_register_ae_dev() to int
Huazhong Tan [Wed, 30 Jan 2019 20:55:44 +0000 (04:55 +0800)]
net: hns3: change hnae3_register_ae_dev() to int

hnae3_register_ae_dev() may fail, and it should return a error code
to its caller, so change hnae3_register_ae_dev() return type to int.

Also, when hnae3_register_ae_dev() return error, hns3_probe() should
do some error handling and return the error code.

Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: use the correct interface to stop|open port
Peng Li [Wed, 30 Jan 2019 20:55:43 +0000 (04:55 +0800)]
net: hns3: use the correct interface to stop|open port

dev_close() stop the netdev and the service base on the netdev
will stop. But ndev->netdev_ops->ndo_stop() may only stop HW
and stack queue, the service base on the netdev can still work.

Fixes: 5668abda0931 ("net: hns3: add support for set_ringparam")
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: fix VF dump register issue
Jian Shen [Wed, 30 Jan 2019 20:55:42 +0000 (04:55 +0800)]
net: hns3: fix VF dump register issue

In original codes, the .get_regs_len and .get_regs were missed
assigned. This patch fixes it.

Fixes: 1600c3e5f23e ("net: hns3: Support "ethtool -d" for HNS3 VF driver")
Signed-off-by: Jian Shen <shenjian15@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: hns3: reuse the definition of l3 and l4 header info union
liyongxin [Wed, 30 Jan 2019 20:55:41 +0000 (04:55 +0800)]
net: hns3: reuse the definition of l3 and l4 header info union

Union l3_hdr_info and l4_hdr_info have already been defined in
the hns3_enet.h, so it is unnecessary to define them elsewhere.

This patch removes the redundant definition, and reuses the one
defined in the hns3_enet.h.

Signed-off-by: liyongxin <liyongxin1@huawei.com>
Signed-off-by: Peng Li <lipeng321@huawei.com>
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'net-dsa-mt7530-support-MT7530-in-the-MT7621-SoC'
David S. Miller [Wed, 30 Jan 2019 22:26:07 +0000 (14:26 -0800)]
Merge branch 'net-dsa-mt7530-support-MT7530-in-the-MT7621-SoC'

Greg Ungerer says:

====================
net: dsa: mt7530: support MT7530 in the MT7621 SoC

This is the fourth version of a patch series supporting the MT7530 switch
as used in the MediaTek MT7621 SoC. Unlike the MediaTek MT7623 the MT7621
is built around a dual core MIPS CPU architecture. But inside it uses
basically the same 7530 switch.

This series resolves all issues I had with previous versions, and I can
now reliably use the driver on a 7621 SoC platform. These patches were
generated against linux-5.0-rc4.

The first patch enables support for the existing kernel mediatek ethernet
driver on the MT7621 SoC. This support is from Bjørn Mork, with an update
and fix by me. Using this driver fixed a number of problems I had
(TX checksums, large RX packet drop) over the staging driver
(drivers/staging/mt7621-eth).

Patch 2 modifies the mt7530 DSA driver to support the 7530 switch as
implemented in the Mediatek MT7621 SoC. The last patch updates the
devicetree bindings to reflect the new support in the mt7530 driver.

There is no real dependencies between the patches, so they can be taken
independantly.

Creating a new binding for the MT7621 seems like the only viable approach
to distinguish between a stand alone 7530 switch, the silicon module
in the MT7623 SoC and the silicon in the MT7621. Certainly the 7530 ID
register in the MT7623 and MT7621 returns the same value, "0x7530001".

Looking at the mt7530.c DSA driver it might make some sense to convert
the existing "mediatek,mcm" binding to something like "mediatek,mt7623"
to be consistent with this new MT7621 support. As far as I can tell
this is the intention of this binding.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodt-bindings: net: dsa: add new MT7530 binding to support MT7621
Greg Ungerer [Wed, 30 Jan 2019 01:24:06 +0000 (11:24 +1000)]
dt-bindings: net: dsa: add new MT7530 binding to support MT7621

Add devicetree binding to support the compatible mt7530 switch as used
in the MediaTek MT7621 SoC.

Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC
Greg Ungerer [Wed, 30 Jan 2019 01:24:05 +0000 (11:24 +1000)]
net: dsa: mt7530: support the 7530 switch on the Mediatek MT7621 SoC

The MediaTek MT7621 SoC device contains a 7530 switch, and the existing
linux kernel 7530 DSA switch driver can be used with it.

The bulk of the changes required stem from the 7621 having different
regulator and pad setup. The existing setup of these in the 7530
driver appears to be very specific to its implemtation in the Mediatek
7623 SoC. (Not entirely surprising given the 7623 is a quad core ARM
based SoC, and the 7621 is a dual core, dual thread MIPS based SoC).

Create a new devicetree type, "mediatek,mt7621", to support the 7530
switch in the 7621 SoC. There appears to be no usable ID register to
distinguish it from a 7530 in other hardware at runtime. This is used
to carry out the appropriate configuration and setup.

Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ethernet: mediatek: support MT7621 SoC ethernet hardware
Bjørn Mork [Wed, 30 Jan 2019 01:24:04 +0000 (11:24 +1000)]
net: ethernet: mediatek: support MT7621 SoC ethernet hardware

The Mediatek MT7621 SoC contains the same ethernet hardware module as
used on a number of other MediaTek SoC parts. There are some minor
differences to deal with but we can use the same driver to support
them all.

This patch is based on work by Bjørn Mork <bjorn@mork.no>, and his
original patch is at:

https://github.com/bmork/LEDE/commit/3293bc63f5461ca1eb0bbc4ed90145335e7e3404

There is an additional compatible devicetree type added, and the primary
change to the code required is to support a single interrupt (for both
RX and TX interrupts).

Signed-off-by: Bjørn Mork <bjorn@mork.no>
[gerg@kernel.org: rebase to mainline and irq handler fix]
Signed-off-by: Greg Ungerer <gerg@kernel.org>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Sean Wang <sean.wang@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'mlxsw-spectrum_acl-Include-delta-bits-into-hashtable-key'
David S. Miller [Wed, 30 Jan 2019 18:00:40 +0000 (10:00 -0800)]
Merge branch 'mlxsw-spectrum_acl-Include-delta-bits-into-hashtable-key'

Ido Schimmel says:

====================
mlxsw: spectrum_acl: Include delta bits into hashtable key

The Spectrum-2 ASIC allows multiple rules to use the same mask provided
that the difference between their masks is small enough (up to 8
consecutive delta bits). A more detailed explanation is provided in
merge commit 756cd36626f7 ("Merge branch
'mlxsw-Introduce-algorithmic-TCAM-support'").

These delta bits are part of the rule's key and therefore rules that
only differ in their delta bits can be inserted with the same A-TCAM
mask. In case two rules share the same key and only differ in their
priority, then the second will spill to the C-TCAM.

Current code does not take the delta bits into account when checking for
duplicate rules, which leads to unnecessary spillage to the C-TCAM.
This may result in reduced scale and performance.

Patch #1 includes the delta bits in the rule's key to avoid the above
mentioned problem.

Patch #2 adds a tracepoint when a rule is inserted into the C-TCAM.

Patches #3-#5 add test cases to make sure unnecessary spillage into the
C-TCAM does not occur.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: spectrum-2: Add delta two masks one key test
Jiri Pirko [Wed, 30 Jan 2019 08:58:37 +0000 (08:58 +0000)]
selftests: spectrum-2: Add delta two masks one key test

Ensure that the bug is fixed and we no longer have C-TCAM spill for two
keys that differ only in delta.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: spectrum-2: Fix multiple_masks_test
Jiri Pirko [Wed, 30 Jan 2019 08:58:36 +0000 (08:58 +0000)]
selftests: spectrum-2: Fix multiple_masks_test

With recent fix in C-TCAM spillage for delta masks, the test stops to be
falsely positive. So fix it not to use delta by adding src_ip bits to the
masks. Alongside with that, use C-TCAM spill trace to see when the
spillage actually happens.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoselftests: spectrum-2: Extend and move trace helpers
Jiri Pirko [Wed, 30 Jan 2019 08:58:35 +0000 (08:58 +0000)]
selftests: spectrum-2: Extend and move trace helpers

Allow to specify number of trace hits and move helpers
to the beginning of the file.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Add C-TCAM spill tracepoint
Jiri Pirko [Wed, 30 Jan 2019 08:58:34 +0000 (08:58 +0000)]
mlxsw: spectrum_acl: Add C-TCAM spill tracepoint

Add some visibility to the rule addition process and trace whenever rule
spilled into C-TCAM.

Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agomlxsw: spectrum_acl: Include delta bits into hashtable key
Jiri Pirko [Wed, 30 Jan 2019 08:58:33 +0000 (08:58 +0000)]
mlxsw: spectrum_acl: Include delta bits into hashtable key

Currently only ERP mask masked bits in key are considered for
the hashtable key. That leads to false negative collisions
and fallbacks to C-TCAM in case two keys differ only in delta bits.

Fix this by taking full encoded key as a hashtable key,
including delta bits.

Reported-by: Nir Dotan <nird@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'sctp-support-SCTP_FUTURE-CURRENT-ALL_ASSOC'
David S. Miller [Wed, 30 Jan 2019 08:44:08 +0000 (00:44 -0800)]
Merge branch 'sctp-support-SCTP_FUTURE-CURRENT-ALL_ASSOC'

Xin Long says:

====================
sctp: support SCTP_FUTURE/CURRENT/ALL_ASSOC

This patchset adds the support for 3 assoc_id constants: SCTP_FUTURE_ASSOC
SCTP_CURRENT_ASSOC, SCTP_ALL_ASSOC, described in rfc6458#section-7.2:

   All socket options set on a one-to-one style listening socket also
   apply to all future accepted sockets.  For one-to-many style sockets,
   often a socket option will pass a structure that includes an assoc_id
   field.  This field can be filled with the association identifier of a
   particular association and unless otherwise specified can be filled
   with one of the following constants:

   SCTP_FUTURE_ASSOC:  Specifies that only future associations created
      after this socket option will be affected by this call.

   SCTP_CURRENT_ASSOC:  Specifies that only currently existing
      associations will be affected by this call, and future
      associations will still receive the previous default value.

   SCTP_ALL_ASSOC:  Specifies that all current and future associations
      will be affected by this call.

The functions for many other sockopts that use assoc_id also need to be
updated accordingly.
====================

Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt
Xin Long [Mon, 28 Jan 2019 07:08:46 +0000 (15:08 +0800)]
sctp: add SCTP_FUTURE_ASOC and SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_scheduler and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_scheduler,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_STREAM_SCHEDULER in this
patch. It also adds default_ss in sctp_sock to support
SCTP_FUTURE_ASSOC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt
Xin Long [Mon, 28 Jan 2019 07:08:45 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_EVENT sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_event and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_event,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_EVENT in this patch.

It also adds sctp_assoc_ulpevent_type_set() to make code more
readable.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET...
Xin Long [Mon, 28 Jan 2019 07:08:44 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_ENABLE_STREAM_RESET sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_enable_strreset and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_enable_strreset,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_ENABLE_STREAM_RESET in this patch.
It also adjusts some code to keep a same check form as other functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt
Xin Long [Mon, 28 Jan 2019 07:08:43 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_PRINFO sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_prinfo and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_prinfo,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_PRINFO in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY...
Xin Long [Mon, 28 Jan 2019 07:08:42 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DEACTIVATE_KEY sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_deactivate_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DEACTIVATE_KEY in this
patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt
Xin Long [Mon, 28 Jan 2019 07:08:41 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_DELETE_KEY sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_del_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_DELETE_KEY in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt
Xin Long [Mon, 28 Jan 2019 07:08:40 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_ACTIVE_KEY sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_ACTIVE_KEY in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt
Xin Long [Mon, 28 Jan 2019 07:08:39 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_AUTH_KEY sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_auth_key.
SCTP_CURRENT_ASSOC is supported for SCTP_AUTH_KEY in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt
Xin Long [Mon, 28 Jan 2019 07:08:38 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_MAX_BURST sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_maxburst and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_maxburst,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch.
It also adjusts some code to keep a same check form as other
functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt
Xin Long [Mon, 28 Jan 2019 07:08:37 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_CONTEXT sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_context and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_context,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_CONTEXT in this patch.
It also adjusts some code to keep a same check form as other
functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt
Xin Long [Mon, 28 Jan 2019 07:08:36 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SNDINFO sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_sndinfo and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_sndinfo,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SNDINFO in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM...
Xin Long [Mon, 28 Jan 2019 07:08:35 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DEFAULT_SEND_PARAM sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_default_send_param and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_default_send_param,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_DEFAULT_SEND_PARAM in this patch.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt
Xin Long [Mon, 28 Jan 2019 07:08:34 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC and add SCTP_CURRENT_ASSOC for SCTP_DELAYED_SACK sockopt

Check with SCTP_ALL_ASSOC instead in sctp_setsockopt_delayed_ack and
check with SCTP_FUTURE_ASSOC instead in sctp_getsockopt_delayed_ack,
it's compatible with 0.

SCTP_CURRENT_ASSOC is supported for SCTP_DELAYED_SACK in this patch.

It also adds sctp_apply_asoc_delayed_ack() to make code more readable.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: add SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER_VALUE sockopt
Xin Long [Mon, 28 Jan 2019 07:08:33 +0000 (15:08 +0800)]
sctp: add SCTP_CURRENT_ASSOC for SCTP_STREAM_SCHEDULER_VALUE sockopt

SCTP_STREAM_SCHEDULER_VALUE is a special one, as its value is not
save in sctp_sock, but only in asoc. So only SCTP_CURRENT_ASSOC
reserved assoc_id can be used in sctp_setsockopt_scheduler_value.

This patch adds SCTP_CURRENT_ASOC support for
SCTP_STREAM_SCHEDULER_VALUE.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_INTERLEAVING_SUPPORTED sockopt
Xin Long [Mon, 28 Jan 2019 07:08:32 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_INTERLEAVING_SUPPORTED sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_reconfig_supported, it's compatible with 0.

It also adjusts some code to keep a same check form as other functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_RECONFIG_SUPPORTED sockopt
Xin Long [Mon, 28 Jan 2019 07:08:31 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_RECONFIG_SUPPORTED sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_reconfig_supported, it's compatible with 0.

It also adjusts some code to keep a same check form as other functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_PR_SUPPORTED sockopt
Xin Long [Mon, 28 Jan 2019 07:08:30 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_PR_SUPPORTED sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_pr_supported, it's compatible with 0.

It also adjusts some code to keep a same check form as other functions.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt
Xin Long [Mon, 28 Jan 2019 07:08:29 +0000 (15:08 +0800)]
sctp: add SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_THLDS sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_paddr_thresholds, it's compatible with 0.

It also adds pf_retrans in sctp_sock to support SCTP_FUTURE_ASSOC.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_LOCAL_AUTH_CHUNKS sockopt
Xin Long [Mon, 28 Jan 2019 07:08:28 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_LOCAL_AUTH_CHUNKS sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_getsockopt_local_auth_chunks, it's compatible with 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_MAXSEG sockopt
Xin Long [Mon, 28 Jan 2019 07:08:27 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_MAXSEG sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_maxseg, it's compatible with 0.
Also check asoc_id early as other sctp setsockopts does.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_ASSOCINFO sockopt
Xin Long [Mon, 28 Jan 2019 07:08:26 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_ASSOCINFO sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_associnfo, it's compatible with 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_RTOINFO sockopt
Xin Long [Mon, 28 Jan 2019 07:08:25 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_RTOINFO sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_set/getsockopt_rtoinfo, it's compatible with 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: use SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_PARAMS sockopt
Xin Long [Mon, 28 Jan 2019 07:08:24 +0000 (15:08 +0800)]
sctp: use SCTP_FUTURE_ASSOC for SCTP_PEER_ADDR_PARAMS sockopt

Check with SCTP_FUTURE_ASSOC instead in
sctp_/setgetsockopt_peer_addr_params, it's compatible with 0.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agosctp: introduce SCTP_FUTURE/CURRENT/ALL_ASSOC
Xin Long [Mon, 28 Jan 2019 07:08:23 +0000 (15:08 +0800)]
sctp: introduce SCTP_FUTURE/CURRENT/ALL_ASSOC

This patch is to add 3 constants SCTP_FUTURE_ASSOC,
SCTP_CURRENT_ASSOC and SCTP_ALL_ASSOC for reserved
assoc_ids, as defined in rfc6458#section-7.2.

And add the process for them when doing lookup and
inserting in sctp_id2assoc and sctp_assoc_set_id.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge branch 'devlink-port'
David S. Miller [Wed, 30 Jan 2019 06:13:09 +0000 (22:13 -0800)]
Merge branch 'devlink-port'

Vasundhara Volam says:

====================
devlink: Add configuration parameters support for devlink_port

This patchset adds support for configuration parameters setting through
devlink_port.  Each device registers supported configuration parameters
table.

The user can retrieve data on these parameters by
"devlink port param show" command and can set new value to a
parameter by "devlink port param set" command.
All configuration modes supported by devlink_dev are supported
by devlink_port also.

Command examples and output:

pci/0000:3b:00.0/0:
  name wake-on-lan type generic
    values:
      cmode permanent value false

pci/0000:3b:00.1/1:
  name wake-on-lan type generic
    values:
      cmode permanent value false

pci/0000:af:00.0/0:
  name wake-on-lan type generic
    values:
      cmode permanent value true

pci/0000:3b:00.0/0:
  name wake-on-lan type generic
    values:
      cmode permanent value false

There is difference of opinion on adding WOL parameter to devlink, between
Jakub Kicinski and Michael Chan.

Quote from Jakud Kicinski:
********
As explained previously I think it's a very bad idea to add existing
configuration options to devlink, just because devlink has the ability
to persist the setting in NVM.  Especially that for WoL you have to get
the link up so you potentially have all link config stuff as well.  And
that n-tuple filters are one of the WoL options, meaning we'd need the
ability to persist n-tuple filters via devlink.

The effort would be far better spent helping with migrating ethtool to
netlink, and allowing persisting there.

I have not heard any reason why devlink is a better fit.  I can imagine
you're just doing it here because it's less effort for you since
ethtool is not yet migrated.
********

Quote from Michael Chan:
********
The devlink's WoL parameter is a persistent WoL parameter stored in the
NIC's NVRAM. It is different from ethtool's WoL parameter in a number of
ways. ethtool WoL is not persistent over AC power cycle and is considered
OS-present WoL. As such, ethtool WoL can use a more sophisticated pattern
including n-tuple with IP address in addition to the more basic types
(e.g. magic packet). Whereas OS-absent power up WoL should only include
magic packet and other simple types. The devlink WoL setting does not have
to match the ethtool WoL setting. The card will autoneg up to the speed
supported by Vaux so no special devlink link setting is needed.
********

Future expansion of WOL parameter to devlink:
********
Add an additional flag to support additional setting to address link settings.
This will allow attributes to support both runtime and persistent
configuration.
********

v7->v8:
* Re-ordered function definitions.
* Append with "Acked-by: Jiri Pirko <jiri@mellanox.com>" to first 3 patches.
* Add missing devlink_port_param_driverinit_value_get() declaration.

v6->v7:
* Remove RFC tag from the patch-set.

v5->v6:
* Replace '-' with '*' in cover letter to avoid cutoff by git.

v4->v5:
* Added quotes from Jakub Kicinski and Michael chan on devlink's WOL
  parameter in the cover letter.

v3->v4:
* Update changes done from v2 to v3 version in individual patch
  descriptions.

v2->v3:
Make following changes as per suggestions from Jiri Pirko and
Michal Kubecek.
* Add a helper __devlink_params_register() with common code used by
  both devlink_params_register() and devlink_port_params_register().
* Define only WOL types used now and define them as bitfield, so that
  mutliple WOL types can be enabled upon power on.
* Modify "wake-on-lan" name to "wake_on_lan" to be symmetric with
  previous definitions.
* Rename DEVLINK_PARAM_WOL_XXX to DEVLINK_PARAM_WAKE_XXX to be
  symmetrical with ethtool WOL definitions.
* Modify bnxt_dl_wol_validate(), to throw error message when user gives
  value other than DEVLINK_PARAM_WAKE_MAGIC or to disable WOL.
* Use netdev_err() instead of netdev_warn(), when devlink_port_register()
  and devlink_port_params_register() returns error. Also, don't log rc
  in this message.

v1->v2:
Make following changes as per suggestions from Jiri Pirko.
* Remove separate enum devlink_port_param_generic_id for port params.
  Instead club it with existing device params. Accordingly refactor
  remaining patchset.
* Move INIT_LIST_HEAD of port param_list to devlink_port_register()
* Add a helper devlink_param_verify() to be used for both
  devlink_params_register() and devlink_port_params_register().
* Add a helper __devlink_params_unregister() for common code in
  devlink_params_unregister() and devlink_port_params_unregister().
* Move DEVLINK_CMD_PORT_PARAM_XXX definitions to the end of the enum.
* Split the patches for devlink_port_param_driverinit_value_get() and
  devlink_port_param_driverinit_value_set() into separate patches.
* define DEVLINK_PARAM_GENERIC_ID_WOL type as u8 and define enum for
  different types of WOL. Accordingly modify bnxt_en patch to validate
  wol type.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agobnxt_en: Add bnxt_en initial port params table and register it
Vasundhara Volam [Mon, 28 Jan 2019 12:30:27 +0000 (18:00 +0530)]
bnxt_en: Add bnxt_en initial port params table and register it

Register devlink_port with devlink and create initial port params
table for bnxt_en. The table consists of a generic parameter:

wake_on_lan: Enables Wake on Lan for this port when magic packet
is received with this port's MAC address using ACPI pattern.
If enabled, the controller asserts a wake pin upon reception of
WoL packet.  ACPI (Advanced Configuration and Power Interface) is
an industry specification for the efficient handling of power
consumption in desktop and mobile computers.

v2->v3:
- Modify bnxt_dl_wol_validate(), to throw error message when user gives
  value other than DEVLINK_PARAM_WAKE_MAGIC ot to disable WOL.
- Use netdev_err() instead of netdev_warn(), when devlink_port_register()
  and devlink_port_params_register() returns error. Also, don't log rc
  in this message.

Cc: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add a generic wake_on_lan port parameter
Vasundhara Volam [Mon, 28 Jan 2019 12:30:26 +0000 (18:00 +0530)]
devlink: Add a generic wake_on_lan port parameter

wake_on_lan - Enables Wake on Lan for this port. If enabled,
the controller asserts a wake pin based on the WOL type.

v2->v3:
- Define only WOL types used now and define them as bitfield, so that
  mutliple WOL types can be enabled upon power on.
- Modify "wake-on-lan" name to "wake_on_lan" to be symmetric with
  previous definitions.
- Rename DEVLINK_PARAM_WOL_XXX to DEVLINK_PARAM_WAKE_XXX to be
  symmetrical with ethtool WOL definitions.

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add devlink notifications support for port params
Vasundhara Volam [Mon, 28 Jan 2019 12:30:25 +0000 (18:00 +0530)]
devlink: Add devlink notifications support for port params

Add notification call for devlink port param set, register and unregister
functions.
Add devlink_port_param_value_changed() function to enable the driver notify
devlink on value change. Driver should use this function after value was
changed on any configuration mode part to driverinit.

v7->v8:
Order devlink_port_param_value_changed() definitions followed by
devlink_param_value_changed()

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add support for driverinit set value for devlink_port
Vasundhara Volam [Mon, 28 Jan 2019 12:30:24 +0000 (18:00 +0530)]
devlink: Add support for driverinit set value for devlink_port

Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_set()
function to help the driver set the value to devlink_port.

Also, move the common code to __devlink_param_driverinit_value_set()
to be used by both device and port params.

v7->v8:
Re-order the definitions as follows:
__devlink_param_driverinit_value_get
__devlink_param_driverinit_value_set
devlink_param_driverinit_value_get
devlink_param_driverinit_value_set
devlink_port_param_driverinit_value_get
devlink_port_param_driverinit_value_set

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add support for driverinit get value for devlink_port
Vasundhara Volam [Mon, 28 Jan 2019 12:30:23 +0000 (18:00 +0530)]
devlink: Add support for driverinit get value for devlink_port

Add support for "driverinit" configuration mode value for devlink_port
configuration parameters. Add devlink_port_param_driverinit_value_get()
function to help the driver get the value from devlink_port.

Also, move the common code to __devlink_param_driverinit_value_get()
to be used by both device and port params.

v7->v8:
-Add the missing devlink_port_param_driverinit_value_get() declaration.
-Also, order devlink_port_param_driverinit_value_get() after
devlink_param_driverinit_value_get/set() calls

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add port param set command
Vasundhara Volam [Mon, 28 Jan 2019 12:30:22 +0000 (18:00 +0530)]
devlink: Add port param set command

Add port param set command to set the value for a parameter.
Value can be set to any of the supported configuration modes.

v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add port param get command
Vasundhara Volam [Mon, 28 Jan 2019 12:30:21 +0000 (18:00 +0530)]
devlink: Add port param get command

Add port param get command which gets data per parameter.
It also has option to dump the parameters data per port.

v7->v8: Append "Acked-by: Jiri Pirko <jiri@mellanox.com>"

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agodevlink: Add devlink_param for port register and unregister
Vasundhara Volam [Mon, 28 Jan 2019 12:30:20 +0000 (18:00 +0530)]
devlink: Add devlink_param for port register and unregister

Add functions to register and unregister for the driver supported
configuration parameters table per port.

v7->v8:
- Order the definitions following way as suggested by Jiri.
__devlink_params_register
__devlink_params_unregister
devlink_params_register
devlink_params_unregister
devlink_port_params_register
devlink_port_params_unregister
- Append with Acked-by: Jiri Pirko <jiri@mellanox.com>.

v2->v3:
- Add a helper __devlink_params_register() with common code used by
  both devlink_params_register() and devlink_port_params_register().

Cc: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
David S. Miller [Wed, 30 Jan 2019 05:18:54 +0000 (21:18 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

5 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Linus Torvalds [Wed, 30 Jan 2019 01:11:47 +0000 (17:11 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

 1) Need to save away the IV across tls async operations, from Dave
    Watson.

 2) Upon successful packet processing, we should liberate the SKB with
    dev_consume_skb{_irq}(). From Yang Wei.

 3) Only apply RX hang workaround on effected macb chips, from Harini
    Katakam.

 4) Dummy netdev need a proper namespace assigned to them, from Josh
    Elsasser.

 5) Some paths of nft_compat run lockless now, and thus we need to use a
    proper refcnt_t. From Florian Westphal.

 6) Avoid deadlock in mlx5 by doing IRQ locking, from Moni Shoua.

 7) netrom does not refcount sockets properly wrt. timers, fix that by
    using the sock timer API. From Cong Wang.

 8) Fix locking of inexact inserts of xfrm policies, from Florian
    Westphal.

 9) Missing xfrm hash generation bump, also from Florian.

10) Missing of_node_put() in hns driver, from Yonglong Liu.

11) Fix DN_IFREQ_SIZE, from Johannes Berg.

12) ip6mr notifier is invoked during traversal of wrong table, from Nir
    Dotan.

13) TX promisc settings not performed correctly in qed, from Manish
    Chopra.

14) Fix OOB access in vhost, from Jason Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (52 commits)
  MAINTAINERS: Add entry for XDP (eXpress Data Path)
  net: set default network namespace in init_dummy_netdev()
  net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
  net: caif: call dev_consume_skb_any when skb xmit done
  net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: macb: Apply RXUBR workaround only to versions with errata
  net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
  net: tls: Fix deadlock in free_resources tx
  net: tls: Save iv in tls_rec for async crypto requests
  vhost: fix OOB in get_rx_bufs()
  qed: Fix stack out of bounds bug
  qed: Fix system crash in ll2 xmit
  qed: Fix VF probe failure while FLR
  qed: Fix LACP pdu drops for VFs
  qed: Fix bug in tx promiscuous mode settings
  net: i825xx: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
  netfilter: ipt_CLUSTERIP: fix warning unused variable cn
  ...

5 years agoMAINTAINERS: Add entry for XDP (eXpress Data Path)
Jesper Dangaard Brouer [Tue, 29 Jan 2019 13:22:33 +0000 (14:22 +0100)]
MAINTAINERS: Add entry for XDP (eXpress Data Path)

Add multiple people as maintainers for XDP, sorted alphabetically.

XDP is also tied to driver level support and code, but we cannot add all
drivers to the list. Instead K: and N: match on 'xdp' in hope to catch some
of those changes in drivers.

Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Acked-by: David S. Miller <davem@davemloft.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: set default network namespace in init_dummy_netdev()
Josh Elsasser [Sat, 26 Jan 2019 22:38:33 +0000 (14:38 -0800)]
net: set default network namespace in init_dummy_netdev()

Assign a default net namespace to netdevs created by init_dummy_netdev().
Fixes a NULL pointer dereference caused by busy-polling a socket bound to
an iwlwifi wireless device, which bumps the per-net BUSYPOLLRXPACKETS stat
if napi_poll() received packets:

  BUG: unable to handle kernel NULL pointer dereference at 0000000000000190
  IP: napi_busy_loop+0xd6/0x200
  Call Trace:
    sock_poll+0x5e/0x80
    do_sys_poll+0x324/0x5a0
    SyS_poll+0x6c/0xf0
    do_syscall_64+0x6b/0x1f0
    entry_SYSCALL_64_after_hwframe+0x3d/0xa2

Fixes: 7db6b048da3b ("net: Commonize busy polling code to focus on napi_id instead of socket")
Signed-off-by: Josh Elsasser <jelsasser@appneta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()
Gustavo A. R. Silva [Tue, 29 Jan 2019 17:05:23 +0000 (11:05 -0600)]
cxgb4: cxgb4_tc_u32: use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4: clip_tbl: Use struct_size() in kvzalloc()
Gustavo A. R. Silva [Tue, 29 Jan 2019 16:44:44 +0000 (10:44 -0600)]
cxgb4: clip_tbl: Use struct_size() in kvzalloc()

One of the more common cases of allocation size calculations is finding the
size of a structure that has a zero-sized array at the end, along with memory
for some number of elements for that array. For example:

struct foo {
    int stuff;
    struct boo entry[];
};

instance = kvzalloc(sizeof(struct foo) + count * sizeof(struct boo), GFP_KERNEL);

Instead of leaving these open-coded and prone to type mistakes, we can now
use the new struct_size() helper:

instance = kvzalloc(struct_size(instance, entry, count), GFP_KERNEL);

This code was detected with the help of Coccinelle.

Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles
Yang Wei [Tue, 29 Jan 2019 15:04:40 +0000 (23:04 +0800)]
net: b44: replace dev_kfree_skb_xxx by dev_consume_skb_xxx for drop profiles

The skb should be freed by dev_consume_skb_any() in b44_start_xmit()
when bounce_skb is used. The skb is be replaced by bounce_skb, so the
original skb should be consumed(not drop).

dev_consume_skb_irq() should be called in b44_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: caif: call dev_consume_skb_any when skb xmit done
Yang Wei [Tue, 29 Jan 2019 15:32:22 +0000 (23:32 +0800)]
net: caif: call dev_consume_skb_any when skb xmit done

The skb shouled be consumed when xmit done, it makes drop profiles
(dropwatch, perf) more friendly.
dev_kfree_skb_irq()/kfree_skb() shouled be replaced by
dev_consume_skb_any(), it makes code cleaner.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Tue, 29 Jan 2019 14:40:51 +0000 (22:40 +0800)]
net: 8139cp: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in cp_tx() when skb xmit
done. It makes drop profiles(dropwatch, perf) more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoMAINTAINERS: update cxgb4 and cxgb3 maintainer
Arjun Vynipadath [Tue, 29 Jan 2019 10:08:42 +0000 (15:38 +0530)]
MAINTAINERS: update cxgb4 and cxgb3 maintainer

Vishal Kulkarni will be the new maintainer for Chelsio cxgb3/cxgb4
drivers.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agocxgb4vf: Update port information in cxgb4vf_open()
Arjun Vynipadath [Tue, 29 Jan 2019 09:50:19 +0000 (15:20 +0530)]
cxgb4vf: Update port information in cxgb4vf_open()

It's possible that the basic port information could have
changed since we first read it.

Signed-off-by: Arjun Vynipadath <arjun@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: macb: Apply RXUBR workaround only to versions with errata
Harini Katakam [Tue, 29 Jan 2019 09:50:03 +0000 (15:20 +0530)]
net: macb: Apply RXUBR workaround only to versions with errata

The interrupt handler contains a workaround for RX hang applicable
to Zynq and AT91RM9200 only. Subsequent versions do not need this
workaround. This workaround unnecessarily resets RX whenever RX used
bit read is observed, which can be often under heavy traffic. There
is no other action performed on RX UBR interrupt. Hence introduce a
CAPS mask; enable this interrupt and workaround only on affected
versions.

Signed-off-by: Harini Katakam <harini.katakam@xilinx.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agoliquidio: fix the validation of rx checksum status from NIC hardware
Veerasenareddy Burru [Mon, 28 Jan 2019 19:38:31 +0000 (19:38 +0000)]
liquidio: fix the validation of rx checksum status from NIC hardware

Fixed the code that was incorrectly interpreting the rx checksum validation
status from hardware, and updating kernel that the packet arrived with
correct checksum though the packet arrived with incorrect checksum and
hardware also indicated checksum is not correct.

Signed-off-by: Veerasenareddy Burru <vburru@marvell.com>
Acked-by: Derek Chickles <dchickles@marvell.com>
Signed-off-by: Felix Manlunas <fmanlunas@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 28 Jan 2019 23:40:10 +0000 (07:40 +0800)]
net: ti: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in cpmac_end_xmit() when
xmit done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles
Yang Wei [Mon, 28 Jan 2019 23:39:13 +0000 (07:39 +0800)]
net: apple: replace dev_kfree_skb_irq by dev_consume_skb_irq for drop profiles

dev_consume_skb_irq() should be called in bmac_txdma_intr() when
xmit done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq
Yang Wei [Sun, 27 Jan 2019 15:58:25 +0000 (23:58 +0800)]
net: amd8111e: replace dev_kfree_skb_irq by dev_consume_skb_irq

dev_consume_skb_irq() should be called in amd8111e_tx() when xmit
done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq
Yang Wei [Sun, 27 Jan 2019 15:56:34 +0000 (23:56 +0800)]
net: alteon: replace dev_kfree_skb_irq by dev_consume_skb_irq

dev_consume_skb_irq() should be called in ace_tx_int() when xmit
done. It makes drop profiles more friendly.

Signed-off-by: Yang Wei <yang.wei9@zte.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tls: Fix deadlock in free_resources tx
Dave Watson [Sun, 27 Jan 2019 00:59:03 +0000 (00:59 +0000)]
net: tls: Fix deadlock in free_resources tx

If there are outstanding async tx requests (when crypto returns EINPROGRESS),
there is a potential deadlock: the tx work acquires the lock, while we
cancel_delayed_work_sync() while holding the lock.  Drop the lock while waiting
for the work to complete.

Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agonet: tls: Save iv in tls_rec for async crypto requests
Dave Watson [Sun, 27 Jan 2019 00:57:38 +0000 (00:57 +0000)]
net: tls: Save iv in tls_rec for async crypto requests

aead_request_set_crypt takes an iv pointer, and we change the iv
soon after setting it.  Some async crypto algorithms don't save the iv,
so we need to save it in the tls_rec for async requests.

Found by hardcoding x64 aesni to use async crypto manager (to test the async
codepath), however I don't think this combination can happen in the wild.
Presumably other hardware offloads will need this fix, but there have been
no user reports.

Fixes: a42055e8d2c30 ("Add support for async encryption of records...")
Signed-off-by: Dave Watson <davejwatson@fb.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
5 years agovhost: fix OOB in get_rx_bufs()
Jason Wang [Mon, 28 Jan 2019 07:05:05 +0000 (15:05 +0800)]
vhost: fix OOB in get_rx_bufs()

After batched used ring updating was introduced in commit e2b3b35eb989
("vhost_net: batch used ring update in rx"). We tend to batch heads in
vq->heads for more than one packet. But the quota passed to
get_rx_bufs() was not correctly limited, which can result a OOB write
in vq->heads.

        headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
                    vhost_len, &in, vq_log, &log,
                    likely(mergeable) ? UIO_MAXIOV : 1);

UIO_MAXIOV was still used which is wrong since we could have batched
used in vq->heads, this will cause OOB if the next buffer needs more
than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads after we've
batched 64 (VHOST_NET_BATCH) heads:
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
=============================================================================
BUG kmalloc-8k (Tainted: G    B            ): Redzone overwritten
-----------------------------------------------------------------------------

INFO: 0x00000000fd93b7a2-0x00000000f0713384. First byte 0xa9 instead of 0xcc
INFO: Allocated in alloc_pd+0x22/0x60 age=3933677 cpu=2 pid=2674
    kmem_cache_alloc_trace+0xbb/0x140
    alloc_pd+0x22/0x60
    gen8_ppgtt_create+0x11d/0x5f0
    i915_ppgtt_create+0x16/0x80
    i915_gem_create_context+0x248/0x390
    i915_gem_context_create_ioctl+0x4b/0xe0
    drm_ioctl_kernel+0xa5/0xf0
    drm_ioctl+0x2ed/0x3a0
    do_vfs_ioctl+0x9f/0x620
    ksys_ioctl+0x6b/0x80
    __x64_sys_ioctl+0x11/0x20
    do_syscall_64+0x43/0xf0
    entry_SYSCALL_64_after_hwframe+0x44/0xa9
INFO: Slab 0x00000000d13e87af objects=3 used=3 fp=0x          (null) flags=0x200000000010201
INFO: Object 0x0000000003278802 @offset=17064 fp=0x00000000e2e6652b

Fixing this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
vhost-net. This is done through set the limitation through
vhost_dev_init(), then set_owner can allocate the number of iov in a
per device manner.

This fixes CVE-2018-16880.

Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>