git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log

bpf: Fix various lib and testsuite build failures on 32-bit.

A u64 cannot be cast to a pointer on 32-bit without an intervening (long)
cast; otherwise GCC warns.
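
For illustration only, here is a minimal sketch of the cast pattern described above. It is not the code touched by this patch: the helper names u64_to_ptr and ptr_to_u64 are hypothetical, and the sketch assumes unsigned long is pointer-sized, as it is on Linux.

```c
#include <stdint.h>

/* Hypothetical helper: convert a u64 that carries a pointer value. */
static inline void *u64_to_ptr(uint64_t val)
{
	/*
	 * Casting straight to (void *) warns on 32-bit because the integer
	 * is wider than a pointer. Going through unsigned long first makes
	 * the narrowing explicit.
	 */
	return (void *)(unsigned long)val;
}

/* The reverse direction only widens, so no bits are lost. */
static inline uint64_t ptr_to_u64(const void *ptr)
{
	return (uint64_t)(unsigned long)ptr;
}
```

On 64-bit builds both casts are no-ops, so the same source compiles cleanly for either word size.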

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

selftests/bpf: add config fragment CONFIG_FTRACE_SYSCALLS

CONFIG_FTRACE_SYSCALLS=y is required for the get_cgroup_id_user test case.
This test reads a file from the debug trace path
/sys/kernel/debug/tracing/events/syscalls/sys_enter_nanosleep/id

Signed-off-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Acked-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'bpf-sk-msg-pop-data'

John Fastabend says:

====================
After being able to add metadata to messages with sk_msg_push_data we
have also found it useful to be able to "pop" this metadata off before
sending it to applications in some cases. This series adds a new helper
sk_msg_pop_data() and the associated patches to add tests and tools/lib
support.

Thanks!

v2: Daniel caught that we missed adding sk_msg_pop_data to the changes
    data helper so that the verifier ensures BPF programs revalidate
    data after using this helper. Also improve documentation adding a
    return description and using RST syntax per Quentin's comment. And
    delta calculations for DROP with pop'd data (albeit a strange set
    of operations for a program to be doing) had potential to be
    incorrect possibly confusing user space applications, so fix it.
====================

Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: test_sockmap, add options for msg_pop_data() helper

Similar to msg_pull_data and msg_push_data add a set of options to
have msg_pop_data() exercised.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: add msg_pop_data helper to tools

Add the necessary header definitions to tools for the new
msg_pop_data helper.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: helper to pop data from messages

This adds a BPF SK_MSG program helper so that we can pop data from a
msg. We use this to pop metadata from a previous push data call.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge branch 'libbpf-versioning-doc'

Andrey Ignatov says:

====================
This patch set adds ABI versioning and documentation to libbpf.

Patch 1 renames btf_get_from_id to btf__get_from_id to follow naming
convention.
Patch 2 adds version script and has more details on ABI versioning.
Patch 3 adds simple check that all global symbols are versioned.
Patch 4 documents a few aspects of libbpf API and ABI in dev process.

v1->v2:
* add patch from Martin KaFai Lau <kafai@fb.com> to rename btf_get_from_id;
* add documentation for libbpf API and ABI.
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Document API and ABI conventions

Document API and ABI for libbpf: naming convention, symbol visibility,
ABI versioning.

This is just a starting point. Documentation can be significantly
extended in the future to cover more topics.

ABI versioning section touches only a few basic points with a link to
more comprehensive documentation from Ulrich Drepper. This section can
be extended in the future when there is better understanding what works
well and what not so well in libbpf development process and production
usage.

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Verify versioned symbols

Since ABI versioning info is kept separately from the code it's easy to
forget to update it while adding a new API.

Add simple verification that all global symbols exported with LIBBPF_API
are versioned in libbpf.map version script.

The idea is to check that number of global symbols in libbpf-in.o, that
is the input to the linker, matches with number of unique versioned
symbols in libbpf.so, that is the output of the linker. If these numbers
don't match, it may mean some symbol was not versioned and make will
fail.

"Unique" means that if a symbol is present in more than one version of
ABI due to ABI changes, it'll be counted once.

Another option to calculate number of global symbols in the "input"
could be to count number of LIBBPF_ABI entries in C headers but it seems
to be fragile.

Example of output when a symbol is missing in version script:

    ...
    LD       libbpf-in.o
    LINK     libbpf.a
    LINK     libbpf.so
  Warning: Num of global symbols in libbpf-in.o (115) does NOT match
  with num of versioned symbols in libbpf.so (114). Please make sure all
  LIBBPF_API symbols are versioned in libbpf.map.
  make: *** [check_abi] Error 1
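
The counting idea can be sketched in C as follows. This is an illustration under stated assumptions, not the Makefile rule the patch adds: the function names are hypothetical, and versioned symbol names are assumed to carry an "@VERSION" or "@@VERSION" suffix, as in readelf output.

```c
#include <stdlib.h>
#include <string.h>

/* Length of the symbol name without any "@VERSION" / "@@VERSION" suffix. */
static size_t base_len(const char *sym)
{
	const char *at = strchr(sym, '@');

	return at ? (size_t)(at - sym) : strlen(sym);
}

/* qsort() comparator that orders symbols by base name only. */
static int cmp_base(const void *a, const void *b)
{
	const char *sa = *(const char *const *)a;
	const char *sb = *(const char *const *)b;
	size_t la = base_len(sa), lb = base_len(sb);
	int c = strncmp(sa, sb, la < lb ? la : lb);

	return c ? c : (la > lb) - (la < lb);
}

/* Sorts syms in place and returns the number of distinct base names. */
size_t count_unique_symbols(const char **syms, size_t n)
{
	size_t i, uniq = 0;

	qsort(syms, n, sizeof(*syms), cmp_base);
	for (i = 0; i < n; i++)
		if (i == 0 || cmp_base(&syms[i - 1], &syms[i]) != 0)
			uniq++;
	return uniq;
}

/* Returns 0 when the counts match, -1 when a symbol looks unversioned. */
int check_abi_counts(size_t num_global, const char **versioned, size_t n)
{
	return count_unique_symbols(versioned, n) == num_global ? 0 : -1;
}
```

A symbol that appears under two ABI versions contributes two entries to the input list but only one to the unique count, which matches the definition of "unique" given above.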

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Add version script for DSO

More and more projects use libbpf and one day it'll likely be packaged
and distributed as DSO and that requires ABI versioning so that both
compatible and incompatible changes to ABI can be introduced in a safe
way in the future without breaking executables dynamically linked with a
previous version of the library.

The usual way to do ABI versioning is a version script for the linker. Add
such a script for libbpf. All global symbols currently exported via
LIBBPF_API macro are added to the version script libbpf.map.

The version name LIBBPF_0.0.1 is constructed from the name of the
library + version specified by $(LIBBPF_VERSION) in Makefile.

Version script does not duplicate the work done by LIBBPF_API macro, it
rather complements it. The macro is used at compile time and can be used
by compiler to do optimization that can't be done at link time, it is
purely about global symbol visibility. The version script, in turn, is
used at link time and takes care of ABI versioning. Both techniques are
described in detail in [1].
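
A minimal sketch of how the two techniques fit together, using a hypothetical library "toylib" rather than libbpf itself. The macro TOYLIB_API, the function names and the version name TOYLIB_0.0.1 are all assumptions made for the example.

```c
/* Hypothetical export macro; it plays the role LIBBPF_API plays in libbpf. */
#define TOYLIB_API __attribute__((visibility("default")))

/* Exported: stays visible even when built with -fvisibility=hidden. */
TOYLIB_API int toylib_version_major(void)
{
	return 0;
}

/* Not annotated: hidden from the DSO's users when built that way. */
int toylib_internal_helper(void)
{
	return 42;
}

/*
 * Matching version script (toylib.map), handed to the linker with
 * -Wl,--version-script=toylib.map:
 *
 *	TOYLIB_0.0.1 {
 *		global:
 *			toylib_version_major;
 *		local:
 *			*;
 *	};
 */
```

The attribute is seen by the compiler, while the map file is seen only by the linker, which is why neither replaces the other.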

Whenever ABI is changed in the future, version script should be changed
appropriately.

[1] https://www.akkadia.org/drepper/dsohowto.pdf

Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

libbpf: Name changing for btf_get_from_id

s/btf_get_from_id/btf__get_from_id/ to restore the API naming convention.

Signed-off-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Andrey Ignatov <rdna@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

Merge branch 'non-jit-btf-func_info'

Yonghong Song says:

====================
Commit 838e96904ff3 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

The following example shows bpftool output for
the bpf program in selftests test_btf_haskv.o when jit
is disabled:
  $ bpftool prog dump xlated id 1490
  int _dummy_tracepoint(struct dummy_tracepoint_args * arg):
     0: (85) call pc+2#__bpf_prog_run_args32
     1: (b7) r0 = 0
     2: (95) exit
  int test_long_fname_1(struct dummy_tracepoint_args * arg):
     3: (85) call pc+1#__bpf_prog_run_args32
     4: (95) exit
  int test_long_fname_2(struct dummy_tracepoint_args * arg):
     5: (b7) r2 = 0
     6: (63) *(u32 *)(r10 -4) = r2
     7: (79) r1 = *(u64 *)(r1 +8)
     8: (15) if r1 == 0x0 goto pc+9
     9: (bf) r2 = r10
    10: (07) r2 += -4
    11: (18) r1 = map[id:1173]
    13: (85) call bpf_map_lookup_elem#77088
    14: (15) if r0 == 0x0 goto pc+3
    15: (61) r1 = *(u32 *)(r0 +4)
    16: (07) r1 += 1
    17: (63) *(u32 *)(r0 +4) = r1
    18: (95) exit
  $ bpftool prog dump jited id 1490
    no instructions returned
====================

Signed-off-by: Alexei Starovoitov <ast@kernel.org>

tools/bpf: change selftest test_btf for both jit and non-jit

The selftest test_btf is changed to test both jit and non-jit.
The test result should be the same regardless of whether jit
is enabled or not.

Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: btf: support proper non-jit func info

Commit 838e96904ff3 ("bpf: Introduce bpf_func_info")
added bpf func info support. The userspace is able
to get better ksym's for bpf programs with jit, and
is able to print out func prototypes.

For a program containing func-to-func calls, the existing
implementation returns user specified number of function
calls and BTF types if jit is enabled. If the jit is not
enabled, it only returns the type for the main function.

This is undesirable. Interpreter may still be used
and we should keep feature identical regardless of
whether jit is enabled or not.
This patch fixed this discrepancy.

Fixes: 838e96904ff3 ("bpf: Introduce bpf_func_info")
Signed-off-by: Yonghong Song <yhs@fb.com>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>

bpf: Avoid unnecessary instruction in convert_bpf_ld_abs()

'offset' is constant and if it is zero, no need to subtract it
from BPF_REG_TMP.
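
The idea can be sketched with a toy instruction emitter. This is not the code in convert_bpf_ld_abs(); struct toy_insn and emit_sub_offset are hypothetical names used only to show the "skip the no-op" pattern.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction record. */
struct toy_insn {
	char op;	/* '-' means "tmp -= imm" */
	int imm;
};

/*
 * Emits "tmp -= offset" into out[] only when offset is non-zero.
 * Returns the number of instructions written (0 or 1).
 */
size_t emit_sub_offset(struct toy_insn *out, int offset)
{
	size_t n = 0;

	if (offset)
		out[n++] = (struct toy_insn){ .op = '-', .imm = offset };
	return n;
}
```

Because the offset is known when the program is converted, the check costs nothing at run time and saves one instruction per load when the offset is zero.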

Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
pull-request: bpf-next 2018-11-26

The following pull-request contains BPF updates for your *net-next* tree.

The main changes are:

1) Extend BTF to support function call types and improve the BPF
   symbol handling with this info for kallsyms and bpftool program
   dump to make debugging easier, from Martin and Yonghong.

2) Optimize LPM lookups by making longest_prefix_match() handle
   multiple bytes at a time, from Eric.

3) Adds support for loading and attaching flow dissector BPF progs
   from bpftool, from Stanislav.

4) Extend the sk_lookup() helper to be supported from XDP, from Nitin.

5) Enable verifier to support narrow context loads with offset > 0
   to adapt to LLVM code generation (currently only offset of 0 was
   supported). Add test cases as well, from Andrey.

6) Simplify passing device functions for offloaded BPF progs by
   adding callbacks to bpf_prog_offload_ops instead of ndo_bpf.
   Also convert nfp and netdevsim to make use of them, from Quentin.

7) Add support for sock_ops based BPF programs to send events to
   the perf ring-buffer through perf_event_output helper, from
   Sowmini and Daniel.

8) Add read / write support for skb->tstamp from tc BPF and cg BPF
   programs to allow for supporting rate-limiting in EDT qdiscs
   like fq from BPF side, from Vlad.

9) Extend libbpf API to support map in map types and add test cases
   for it as well to BPF kselftests, from Nikita.

10) Account the maximum packet offset accessed by a BPF program in
    the verifier and use it for optimizing nfp JIT, from Jiong.

11) Fix error handling regarding kprobe_events in BPF sample loader,
    from Daniel T.

12) Add support for queue and stack map type in bpftool, from David.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: align map type names formatting.

Make the formatting for map_type_name array consistent.

Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf: btf: fix spelling mistake "Memmber" -> "Member"

There is a spelling mistake in a btf_verifier_log_member message,
fix it.

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

bpf, tags: Fix DEFINE_PER_CPU expansion

Building tags produces warning:

ctags: Warning: kernel/bpf/local_storage.c:10: null expansion of name pattern "\1"

Let's use the same fix as in commit 25528213fe9f ("tags: Fix DEFINE_PER_CPU
expansions"), even though it violates the usual code style.

Signed-off-by: Rustam Kovhaev <rkovhaev@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

net: remove unsafe skb_insert()

I do not see how one can effectively use skb_insert() without holding
some kind of lock. Otherwise other cpus could have changed the list
right before we have a chance of acquiring list->lock.

Only existing user is in drivers/infiniband/hw/nes/nes_mgt.c and this
one probably meant to use __skb_insert() since it appears nesqp->pau_list
is protected by nesqp->pau_lock. This looks like nesqp->pau_lock
could be removed, since nesqp->pau_list.lock could be used instead.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Faisal Latif <faisal.latif@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma <linux-rdma@vger.kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: bridge: remove redundant checks for null p->dev and p->br

A recent change added a null check on p->dev after p->dev was being
dereferenced by the ns_capable check on p->dev. It turns out that
neither the p->dev nor the p->br null check is necessary, and both can be
removed, which cleans up a static analysis warning.

As Nikolay Aleksandrov noted, these checks can be removed because:

"My reasoning of why it shouldn't be possible:
- On port add new_nbp() sets both p->dev and p->br before creating
  kobj/sysfs

- On port del (trickier) del_nbp() calls kobject_del() before call_rcu()
  to destroy the port which in turn calls sysfs_remove_dir() which uses
  kernfs_remove() which deactivates (shouldn't be able to open new
  files) and calls kernfs_drain() to drain current open/mmaped files in
  the respective dir before continuing, thus making it impossible to
  open a bridge port sysfs file with p->dev and p->br equal to NULL.

So I think it's safe to remove those checks altogether. It'd be nice to
get a second look over my reasoning as I might be missing something in
sysfs/kernfs call path."

Thanks to Nikolay Aleksandrov for suggesting removal of the checks and to
David Miller for sanity checking this.

Detected by CoverityScan, CID#751490 ("Dereference before null check")

Fixes: a5f3ea54f3cc ("net: bridge: add support for raw sysfs port options")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-xmit_more'

Heiner Kallweit says:

====================
r8169: make use of xmit_more and __netdev_sent_queue

This series adds helper __netdev_sent_queue to the core and makes use
of it in the r8169 driver.

Heiner Kallweit (2):
net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue
r8169: make use of xmit_more and __netdev_sent_queue

v2:
- fix minor style issue
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: make use of xmit_more and __netdev_sent_queue

Make use of xmit_more and add the functionality introduced with
3e59020abf0f ("net: bql: add __netdev_tx_sent_queue()").
I used the mlx4 driver as template.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: core: add __netdev_sent_queue as variant of __netdev_tx_sent_queue

Similar to netdev_sent_queue add helper __netdev_sent_queue as variant
of __netdev_tx_sent_queue.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests/net: add txring_overwrite

Packet sockets with PACKET_TX_RING send skbs with user data in frags.

Before commit 5cd8d46ea156 ("packet: copy user buffers before orphan
or clone") ring slots could be released prematurely, possibly allowing
a process to overwrite data still in flight.

This test opens two packet sockets, one to send and one to read.
The sender has a tx ring of one slot. It sends two packets with
different payload, then reads both and verifies their payload.

Before the above commit, both receive calls return the same data as
the send calls use the same buffer. From the commit, the clone
needed for looping onto a packet socket triggers an skb_copy_ubufs
to create a private copy. The separate sends each arrive correctly.

Signed-off-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: qualcomm: rmnet: move null check on dev before dereferencing it

Currently dev is dereferenced by the call dev_net(dev) before dev is null
checked. Fix this by null checking dev before the potential null
pointer dereference.
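
The ordering problem can be shown with a small sketch. The types and functions below are hypothetical stand-ins, not the rmnet code: toy_dev_net() plays the role of dev_net() and dereferences its argument.

```c
#include <errno.h>
#include <stddef.h>

struct toy_net { int id; };
struct toy_device { struct toy_net *net; };

/* Dereferences dev, so dev must already be known to be non-NULL. */
static struct toy_net *toy_dev_net(const struct toy_device *dev)
{
	return dev->net;
}

/* Fixed ordering: validate dev first, then look at what it points to. */
int toy_configure(struct toy_device *dev)
{
	struct toy_net *net;

	if (!dev)
		return -EINVAL;

	net = toy_dev_net(dev);
	return net ? net->id : -EINVAL;
}
```

With the check placed after the toy_dev_net() call, a NULL dev would fault before the check could run, which is the pattern the static checker reports.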

Detected by CoverityScan, CID#1462955 ("Dereference before null check")

Fixes: 23790ef12082 ("net: qualcomm: rmnet: Allow to configure flags for existing devices")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: remove set but not used variables 'multitrc, speed'

Fixes gcc '-Wunused-but-set-variable' warning:

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:5883:6:
warning: variable 'multitrc' set but not used [-Wunused-but-set-variable]

drivers/net/ethernet/chelsio/cxgb4/t4_hw.c:8585:32:
warning: variable 'speed' set but not used [-Wunused-but-set-variable]

'multitrc' never used since introduction in
commit 8e3d04fd7d70 ("cxgb4: Add MPS tracing support")

'speed' never used since introduction in
commit c3168cabe1af ("cxgb4/cxgbvf: Handle 32-bit fw port capabilities")

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: fixup type in netdev_start_xmit()

Return code should be formally "netdev_tx_t".

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 fixes from Catalin Marinas:

- Fix wrong conflict resolution around CONFIG_ARM64_SSBD

- Fix sparse warning on unsigned long constant

* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: cpufeature: Fix mismerge of CONFIG_ARM64_SSBD block
arm64: sysreg: fix sparse warnings

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) Need to take mutex in ath9k_add_interface(), from Dan Carpenter.

2) Fix mt76 build without CONFIG_LEDS_CLASS, from Arnd Bergmann.

3) Fix socket wmem accounting in SCTP, from Xin Long.

4) Fix failed resume crash in ena driver, from Arthur Kiyanovski.

5) qed driver passes bytes instead of bits into second arg of
    bitmap_weight(). From Denis Bolotin.

6) Fix reset deadlock in ibmvnic, from Juliet Kim.

7) skb_scrub_packet() needs to scrub the fwd marks too, from Petr
    Machata.

8) Make sure older TCP stacks see enough dup ACKs, and avoid doing SACK
    compression during this period, from Eric Dumazet.

9) Add atomicity to SMC protocol cursor handling, from Ursula Braun.

10) Don't leave dangling error pointer if bpf_prog_add() fails in
    thunderx driver, from Lorenzo Bianconi. Also, when we unmap TSO
    headers, set sq->tso_hdrs to NULL.

11) Fix race condition over state variables in act_police, from Davide
    Caratti.

12) Disable guest csum in the presence of XDP in virtio_net, from Jason
    Wang.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (64 commits)
  net: gemini: Fix copy/paste error
  net: phy: mscc: fix deadlock in vsc85xx_default_config
  dt-bindings: dsa: Fix typo in "probed"
  net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue
  net: amd: add missing of_node_put()
  team: no need to do team_notify_peers or team_mcast_rejoin when disabling port
  virtio-net: fail XDP set if guest csum is negotiated
  virtio-net: disable guest csum during XDP set
  net/sched: act_police: add missing spinlock initialization
  net: don't keep lonely packets forever in the gro hash
  net/ipv6: re-do dad when interface has IFF_NOARP flag change
  packet: copy user buffers before orphan or clone
  ibmvnic: Update driver queues after change in ring size support
  ibmvnic: Fix RX queue buffer cleanup
  net: thunderx: set xdp_prog to NULL if bpf_prog_add fails
  net/dim: Update DIM start sample after each DIM iteration
  net: faraday: ftmac100: remove netif_running(netdev) check before disabling interrupts
  net/smc: use after free fix in smc_wr_tx_put_slot()
  net/smc: atomic SMCD cursor handling
  net/smc: add SMC-D shutdown signal
  ...

Merge tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux

Pull xfs fixes from Darrick Wong:
"Dave and I have continued our work fixing corruption problems that can
  be found when running long-term burn-in exercisers on xfs. Here are
  some patches fixing most of the problems, but there will likely be
  more. :/

   - Numerous corruption fixes for copy on write

   - Numerous corruption fixes for blocksize < pagesize writes

   - Don't miscalculate AG reservations for small final AGs

   - Fix page cache truncation to work properly for reflink and extent
     shifting

   - Fix use-after-free when retrying failed inode/dquot buffer logging

   - Fix corruptions seen when using copy_file_range in directio mode"

* tag 'xfs-4.20-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  iomap: readpages doesn't zero page tail beyond EOF
  vfs: vfs_dedupe_file_range() doesn't return EOPNOTSUPP
  iomap: dio data corruption and spurious errors when pipes fill
  iomap: sub-block dio needs to zeroout beyond EOF
  iomap: FUA is wrong for DIO O_DSYNC writes into unwritten extents
  xfs: delalloc -> unwritten COW fork allocation can go wrong
  xfs: flush removing page cache in xfs_reflink_remap_prep
  xfs: extent shifting doesn't fully invalidate page cache
  xfs: finobt AG reserves don't consider last AG can be a runt
  xfs: fix transient reference count error in xfs_buf_resubmit_failed_buffers
  xfs: uncached buffer tracing needs to print bno
  xfs: make xfs_file_remap_range() static
  xfs: fix shared extent data corruption due to missing cow reservation

net: gemini: Fix copy/paste error

The TX stats should be started with the tx_stats_syncp,
there seems to be a copy/paste error in the driver.

Signed-off-by: Andreas Fiedler <andreas.fiedler@gmx.net>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mscc: fix deadlock in vsc85xx_default_config

The vsc85xx_default_config function called in the vsc85xx_config_init
function which is used by VSC8530, VSC8531, VSC8540 and VSC8541 PHYs
mistakenly calls phy_read and phy_write in-between phy_select_page and
phy_restore_page.

phy_select_page and phy_restore_page actually take and release the MDIO
bus lock and phy_write and phy_read take and release the lock to write
or read to a PHY register.

Let's fix this deadlock by using phy_modify_paged, which correctly
handles a read followed by a write in a non-standard page.
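
The locking pattern can be modelled with a toy bus. Everything below is a hypothetical sketch, not the phylib API: the lock is a plain flag that reports -EDEADLK instead of hanging, so the failure is observable in a single thread.

```c
#include <errno.h>
#include <stdbool.h>

struct toy_bus {
	bool locked;
	int page;
	int regs[2][4];		/* [page][register] */
};

static int toy_bus_lock(struct toy_bus *b)
{
	if (b->locked)
		return -EDEADLK;	/* a real non-recursive lock would hang */
	b->locked = true;
	return 0;
}

static void toy_bus_unlock(struct toy_bus *b)
{
	b->locked = false;
}

/* Locked accessor: takes and releases the bus lock by itself. */
int toy_read(struct toy_bus *b, int reg)
{
	int val, err = toy_bus_lock(b);

	if (err)
		return err;
	val = b->regs[b->page][reg];
	toy_bus_unlock(b);
	return val;
}

/* Page selection keeps the lock held until toy_restore_page(). */
int toy_select_page(struct toy_bus *b, int page)
{
	int err = toy_bus_lock(b);

	if (err)
		return err;
	b->page = page;
	return 0;
}

void toy_restore_page(struct toy_bus *b)
{
	b->page = 0;
	toy_bus_unlock(b);
}

/* Paged read-modify-write done under a single lock acquisition. */
int toy_modify_paged(struct toy_bus *b, int page, int reg, int mask, int set)
{
	int err = toy_select_page(b, page);

	if (err)
		return err;
	b->regs[page][reg] = (b->regs[page][reg] & ~mask) | set;
	toy_restore_page(b);
	return 0;
}
```

Calling toy_read() between toy_select_page() and toy_restore_page() fails in this model, while toy_modify_paged() succeeds because it never tries to take the lock a second time.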

Fixes: 6a0bfbbe20b0 ("net: phy: mscc: migrate to phy_select/restore_page functions")
Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: dsa: Fix typo in "probed"

The correct form is "can be probed", so fix the typo.

Signed-off-by: Fabio Estevam <festevam@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: thunderx: set tso_hdrs pointer to NULL in nicvf_free_snd_queue

Reset snd_queue tso_hdrs pointer to NULL in nicvf_free_snd_queue routine
since it is used to check if tso dma descriptor queue has been previously
allocated. The issue can be triggered with the following reproducer:

$ ip link set dev enP2p1s0v0 xdpdrv obj xdp_dummy.o
$ ip link set dev enP2p1s0v0 xdpdrv off

[  341.467649] WARNING: CPU: 74 PID: 2158 at mm/vmalloc.c:1511 __vunmap+0x98/0xe0
[  341.515010] Hardware name: GIGABYTE H270-T70/MT70-HD0, BIOS T49 02/02/2018
[  341.521874] pstate: 60400005 (nZCv daif +PAN -UAO)
[  341.526654] pc : __vunmap+0x98/0xe0
[  341.530132] lr : __vunmap+0x98/0xe0
[  341.533609] sp : ffff00001c5db860
[  341.536913] x29: ffff00001c5db860 x28: 0000000000020000
[  341.542214] x27: ffff810feb5090b0 x26: ffff000017e57000
[  341.547515] x25: 0000000000000000 x24: 00000000fbd00000
[  341.552816] x23: 0000000000000000 x22: ffff810feb5090b0
[  341.558117] x21: 0000000000000000 x20: 0000000000000000
[  341.563418] x19: ffff000017e57000 x18: 0000000000000000
[  341.568719] x17: 0000000000000000 x16: 0000000000000000
[  341.574020] x15: 0000000000000010 x14: ffffffffffffffff
[  341.579321] x13: ffff00008985eb27 x12: ffff00000985eb2f
[  341.584622] x11: ffff0000096b3000 x10: ffff00001c5db510
[  341.589923] x9 : 00000000ffffffd0 x8 : ffff0000086868e8
[  341.595224] x7 : 3430303030303030 x6 : 00000000000006ef
[  341.600525] x5 : 00000000003fffff x4 : 0000000000000000
[  341.605825] x3 : 0000000000000000 x2 : ffffffffffffffff
[  341.611126] x1 : ffff0000096b3728 x0 : 0000000000000038
[  341.616428] Call trace:
[  341.618866]  __vunmap+0x98/0xe0
[  341.621997]  vunmap+0x3c/0x50
[  341.624961]  arch_dma_free+0x68/0xa0
[  341.628534]  dma_direct_free+0x50/0x80
[  341.632285]  nicvf_free_resources+0x160/0x2d8 [nicvf]
[  341.637327]  nicvf_config_data_transfer+0x174/0x5e8 [nicvf]
[  341.642890]  nicvf_stop+0x298/0x340 [nicvf]
[  341.647066]  __dev_close_many+0x9c/0x108
[  341.650977]  dev_close_many+0xa4/0x158
[  341.654720]  rollback_registered_many+0x140/0x530
[  341.659414]  rollback_registered+0x54/0x80
[  341.663499]  unregister_netdevice_queue+0x9c/0xe8
[  341.668192]  unregister_netdev+0x28/0x38
[  341.672106]  nicvf_remove+0xa4/0xa8 [nicvf]
[  341.676280]  nicvf_shutdown+0x20/0x30 [nicvf]
[  341.680630]  pci_device_shutdown+0x44/0x88
[  341.684720]  device_shutdown+0x144/0x250
[  341.688640]  kernel_restart_prepare+0x44/0x50
[  341.692986]  kernel_restart+0x20/0x68
[  341.696638]  __se_sys_reboot+0x210/0x238
[  341.700550]  __arm64_sys_reboot+0x24/0x30
[  341.704555]  el0_svc_handler+0x94/0x110
[  341.708382]  el0_svc+0x8/0xc
[  341.711252] ---[ end trace 3f4019c8439959c9 ]---
[  341.715874] page:ffff7e0003ef4000 count:0 mapcount:0 mapping:0000000000000000 index:0x4
[  341.723872] flags: 0x1fffe000000000()
[  341.727527] raw: 001fffe000000000 ffff7e0003f1a008 ffff7e0003ef4048 0000000000000000
[  341.735263] raw: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
[  341.742994] page dumped because: VM_BUG_ON_PAGE(page_ref_count(page) == 0)

where xdp_dummy.c is a simple bpf program that forwards the incoming
frames to the network stack (available here:
https://github.com/altoor/xdp_walkthrough_examples/blob/master/sample_1/xdp_dummy.c)
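
The free-then-reset pattern the fix relies on can be sketched as follows. The struct and function names are hypothetical, and plain malloc()/free() stand in for the DMA allocation used by the driver.

```c
#include <stdlib.h>

struct toy_snd_queue {
	void *tso_hdrs;		/* non-NULL means "currently allocated" */
};

int toy_alloc_tso_hdrs(struct toy_snd_queue *sq, size_t len)
{
	sq->tso_hdrs = malloc(len);
	return sq->tso_hdrs ? 0 : -1;
}

void toy_free_snd_queue(struct toy_snd_queue *sq)
{
	if (!sq->tso_hdrs)
		return;		/* never allocated, or already freed */

	free(sq->tso_hdrs);
	/*
	 * Without this reset the pointer still looks "allocated", and a
	 * second teardown would free the same buffer again.
	 */
	sq->tso_hdrs = NULL;
}
```

Since the pointer doubles as the "is allocated" flag, it has to be cleared in the same place the buffer is released.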

Fixes: 05c773f52b96 ("net: thunderx: Add basic XDP support")
Fixes: 4863dea3fab0 ("net: Adding support for Cavium ThunderX network controller")
Signed-off-by: Lorenzo Bianconi <lorenzo.bianconi@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: Fix pass zero to ERR_PTR() in ptp_clock_register

Fix smatch warning:

drivers/ptp/ptp_clock.c:298 ptp_clock_register() warn:
passing zero to 'ERR_PTR'

'err' should be set when device_create_with_groups or
pps_register_source fails.
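
Why a zero 'err' is a problem can be seen in a small userspace model of the error-pointer convention. The toy_* helpers below are hypothetical re-implementations written for this example; they follow the usual scheme of encoding small negative errno values in the top of the address range.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_ERRNO 4095

static inline void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline int toy_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-TOY_MAX_ERRNO;
}

static inline long toy_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Registration with one step that may fail (selected by create_fails). */
void *toy_register(int create_fails)
{
	int err = 0;
	void *obj = malloc(16);

	if (!obj)
		return toy_err_ptr(-ENOMEM);

	if (create_fails) {
		err = -ENODEV;	/* must be set before jumping to cleanup */
		goto fail;
	}
	return obj;

fail:
	free(obj);
	return toy_err_ptr(err);
}
```

If 'err' were left at zero on the failure path, toy_err_ptr(0) would yield a NULL pointer, which toy_is_err() does not treat as an error, so the caller would go on to use it.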

Fixes: 85a66e550195 ("ptp: create "pins" together with the rest of attributes")
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'switchdev-blocking-notifiers'

Petr Machata says:

====================
switchdev: Convert switchdev_port_obj_{add,del}() to notifiers

An offloading driver may need to have access to switchdev events on
ports that aren't directly under its control. An example is a VXLAN port
attached to a bridge offloaded by a driver. The driver needs to know
about VLANs configured on the VXLAN device. However the VXLAN device
isn't stashed between the bridge and a front-panel-port device (such as
is the case e.g. for LAG devices), so the usual switchdev ops don't
reach the driver.

VXLAN is likely not the only device type like this: in theory any L2
tunnel device that needs offloading will prompt requirement of this
sort.

A way to fix this is to give up the notion of port object addition /
deletion as a switchdev operation, which assumes somewhat tight coupling
between the message producer and consumer. And instead send the message
over a notifier chain.

The series starts with a clean-up patch #1, where
SWITCHDEV_OBJ_PORT_{VLAN, MDB}() are fixed up to lift the constraint
that the passed-in argument be a simple variable named "obj".

switchdev_port_obj_add and _del are invoked in a context that permits
blocking. Not only that, at least for the VLAN notification, being able
to signal failure is actually important. Therefore introduce a new
blocking notifier chain that the new events will be sent on. That's done
in patch #2. Retain the current (atomic) notifier chain for the
preexisting notifications.

In patch #3, introduce two new switchdev notifier types,
SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
communicate the same event as the corresponding switchdev op, except in
a form of a notification. struct switchdev_notifier_port_obj_info was
added to carry the fields that correspond to the switchdev op arguments.
An additional field, handled, will be used to communicate back to
switchdev that the event has reached an interested party, which will be
important for the two-phase commit.
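
How a blocking chain with a "handled" field could behave is sketched below. This is a simplified model, not the switchdev implementation: the toy_* names and the ifindex-based filtering are assumptions, and returning -EOPNOTSUPP when nobody handled the event is a choice made for the example.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct switchdev_notifier_port_obj_info. */
struct toy_port_obj_info {
	int ifindex;	/* device the object is being added to */
	bool handled;	/* set by a listener that consumed the event */
};

typedef int (*toy_notifier_fn)(struct toy_port_obj_info *info);

/* Example listener: only ifindex 1..4 are its "front-panel ports". */
int toy_front_panel_listener(struct toy_port_obj_info *info)
{
	if (info->ifindex < 1 || info->ifindex > 4)
		return 0;	/* not ours, stay silent */
	info->handled = true;
	return 0;
}

/* Example listener that vetoes ifindex 3, e.g. for lack of resources. */
int toy_veto_listener(struct toy_port_obj_info *info)
{
	return info->ifindex == 3 ? -ENOSPC : 0;
}

/* Walks the chain; any listener may fail the whole operation. */
int toy_port_obj_notify(toy_notifier_fn *chain, size_t n, int ifindex)
{
	struct toy_port_obj_info info = { .ifindex = ifindex, .handled = false };
	size_t i;
	int err;

	for (i = 0; i < n; i++) {
		err = chain[i](&info);
		if (err)
			return err;
	}
	return info.handled ? 0 : -EOPNOTSUPP;
}
```

The sender needs no knowledge of which drivers are listening; it only sees whether somebody claimed the event and whether anybody objected.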

In patches #4, #5, and #7, rocker, DSA resp. ethsw are updated to
subscribe to the switchdev blocking notifier chain, and handle the new
notifier types. #6 introduces a helper to determine whether a
netdevice corresponds to a front panel port.

What these three drivers have in common is that their ports don't
support any uppers besides bridge. That makes it possible to ignore any
notifiers that don't reference a front-panel port device, because they
are certainly out of scope.

Unlike the previous three, mlxsw and ocelot drivers admit stacked
devices as uppers. While the current switchdev code recursively descends
through layers of lower devices, eventually calling the op on a
front-panel port device, the notifier would reference a stacking device
that's one of the front-panel ports' uppers. The filtering is thus more
complex.

For ocelot, such iteration is currently pretty much required, because
there's no bookkeeping of LAG devices. mlxsw does keep the list of LAGs,
however it iterates the lower devices anyway when deciding whether an
event on a tunnel device pertains to the driver or not.

Therefore this patch set instead introduces, in patch #8, a helper to
iterate through lowers, much like the current switchdev code does,
looking for devices that match a given predicate.
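
A sketch of such a predicate-driven descent, using a hypothetical device tree. struct toy_dev and the toy_* functions are invented for the example and are not the helper added by the patch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_LOWERS 4

struct toy_dev {
	bool front_panel;
	int vlans_added;
	size_t n_lowers;
	struct toy_dev *lowers[TOY_MAX_LOWERS];
};

/* Predicate: does this device belong to the driver? */
bool toy_is_front_panel(const struct toy_dev *dev)
{
	return dev->front_panel;
}

/* Handler invoked on every matching device. */
int toy_add_vlan(struct toy_dev *dev)
{
	dev->vlans_added++;
	return 0;
}

/*
 * If dev matches, handle it directly. Otherwise descend through its
 * lower devices and stop at the first error.
 */
int toy_handle_port_obj_add(struct toy_dev *dev,
			    bool (*check)(const struct toy_dev *),
			    int (*add)(struct toy_dev *))
{
	size_t i;
	int err;

	if (check(dev))
		return add(dev);

	for (i = 0; i < dev->n_lowers; i++) {
		err = toy_handle_port_obj_add(dev->lowers[i], check, add);
		if (err)
			return err;
	}
	return 0;
}
```

An event sent for a bridge that sits on top of a LAG of two ports ends up invoking the handler once per port, without the driver keeping its own list of LAG devices.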

Then in patches #9 and #10, first mlxsw and then ocelot are updated to
dispatch the newly-added notifier types to the preexisting
port_obj_add/_del handlers. The dispatch is done via the new helper, to
recursively descend through lower devices.

Finally in patch #11, the actual switch is made, retiring the current
SDO-based code in favor of a notifier.

Now that the event is distributed through a notifier, the explicit
netdevice check in rocker, DSA and ethsw doesn't let through any events
except those done on a front-panel port itself. It is therefore
unnecessary to check in VLAN-handling code whether a VLAN was added to
the bridge itself: such events will simply be ignored much sooner.
Therefore remove it in patch #12.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

rocker, dsa, ethsw: Don't filter VLAN events on bridge itself

Due to an explicit check in rocker_world_port_obj_vlan_add(),
dsa_slave_switchdev_event() resp. port_switchdev_event(), VLAN objects
that are added to a device that is not a front-panel port device are
ignored. Therefore this check is immaterial.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Replace port obj add/del SDO with a notification

Drop switchdev_ops.switchdev_port_obj_add and _del. Drop the uses of
this field from all clients, which were migrated to use switchdev
notification in the previous patches.

Add a new function switchdev_port_obj_notify() that sends the switchdev
notifications SWITCHDEV_PORT_OBJ_ADD and _DEL.

Update switchdev_port_obj_del_now() to dispatch to this new function.
Drop __switchdev_port_obj_add() and update switchdev_port_obj_add()
likewise.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ocelot: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of the front-panel ports' uppers, or a
completely unrelated device.

Dispatch the new events to ocelot_port_obj_add() resp. _del() to
maintain the same behavior that the switchdev operation based code
currently has. Pass through switchdev_handle_port_obj_add() / _del() to
handle the recursive descent, because Ocelot supports LAG uppers.

Register to the new switchdev blocking notifier chain to get the new
events when they start getting distributed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_switchdev: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of the front-panel ports' uppers, or a
completely unrelated device.

To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
notifier chain. Dispatch to mlxsw_sp_port_obj_add() resp. _del() to
maintain the behavior that the switchdev operation based code currently
has. Defer to switchdev_handle_port_obj_add() / _del() to handle the
recursive descent, because mlxsw supports a number of upper types.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Add helpers to aid traversal through lower devices

After the transition from switchdev operations to notifier chain (which
will take place in following patches), the onus is on the driver to find
its own devices below a possible layer of LAG or other uppers.

The logic to do so is fairly repetitive: each driver is looking for its
own devices among the lowers of the notified device. For those that it
finds, it calls a handler. To indicate that the event was handled,
struct switchdev_notifier_port_obj_info.handled is set. The differences
lie only in what constitutes an "own" device and what handler to call.

Therefore abstract this logic into two helpers,
switchdev_handle_port_obj_add() and switchdev_handle_port_obj_del(). If
a driver only supports physical ports under a bridge device, it will
simply avoid this layer of indirection.

One area where this helper diverges from the current switchdev behavior
is the case of mixed lowers, some of which are switchdev ports and some
of which are not. Previously, such a scenario would fail with -EOPNOTSUPP.
The helper could do that for lowers for which the passed-in predicate
doesn't hold. That would however break the case that switchdev ports
from several different drivers are stashed under one master, a scenario
that switchdev currently happily supports. Therefore tolerate any and
all unknown netdevices, whether they are backed by a switchdev driver
or not.
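
The traversal described above can be sketched in userspace C. This is only
an illustration under stated assumptions, not the kernel implementation:
struct dev, struct port_obj_info, handle_port_obj_add(), is_my_port() and
my_port_obj_add() are hypothetical stand-ins for the netdevice, the notifier
payload, the helper, the driver predicate and the driver handler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LOWERS 4

/* Hypothetical stand-in for a netdevice and its lower devices. */
struct dev {
	int driver_id;                  /* owning driver; 0 = none */
	struct dev *lowers[MAX_LOWERS]; /* NULL-terminated unless full */
	int objs_added;                 /* counts handler invocations */
};

/* Stand-in for the notifier payload; only "handled" is modelled. */
struct port_obj_info {
	bool handled;
};

/* Walk dev and its lowers, calling add_cb on each device check_cb accepts. */
static int handle_port_obj_add(struct dev *dev, struct port_obj_info *info,
			       bool (*check_cb)(const struct dev *),
			       int (*add_cb)(struct dev *))
{
	int err = -EOPNOTSUPP;
	size_t i;

	if (check_cb(dev)) {
		info->handled = true;
		return add_cb(dev);
	}

	for (i = 0; i < MAX_LOWERS && dev->lowers[i]; i++) {
		err = handle_port_obj_add(dev->lowers[i], info, check_cb, add_cb);
		/* Unknown devices yield -EOPNOTSUPP and are tolerated. */
		if (err && err != -EOPNOTSUPP)
			return err;
	}
	return err;
}

/* Predicate and handler of a made-up driver with id 1. */
static bool is_my_port(const struct dev *dev)
{
	return dev->driver_id == 1;
}

static int my_port_obj_add(struct dev *dev)
{
	dev->objs_added++;
	return 0;
}

/* A LAG-like upper with two own ports and one foreign port. */
static void example(void)
{
	struct dev p1 = { .driver_id = 1 }, p2 = { .driver_id = 1 };
	struct dev foreign = { .driver_id = 2 };
	struct dev lag = { .lowers = { &p1, &foreign, &p2 } };
	struct port_obj_info info = { .handled = false };

	assert(handle_port_obj_add(&lag, &info, is_my_port, my_port_obj_add) == 0);
	assert(info.handled && p1.objs_added == 1 && p2.objs_added == 1);
	assert(foreign.objs_added == 0);
}
```

In this sketch the foreign lower is skipped rather than failing the whole
walk, which corresponds to the "tolerate unknown netdevices" choice; the
handled flag is what tells the caller that some driver took the event.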

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: fsl-dpaa2: ethsw: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of the front-panel ports' uppers, or a
completely unrelated device.

ethsw currently doesn't support any uppers other than bridge.
SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on
the bridge port device. Thus the only case that a stacked device could
be validly referenced by port object notifications are bridge
notifications for VLAN objects added to the bridge itself. But the
driver explicitly rejects such notifications in port_vlans_add(). It is
therefore safe to assume that the only interesting case is that the
notification is on a front-panel port netdevice.

To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
notifier chain. Dispatch to swdev_port_obj_add() resp. _del() to
maintain the behavior that the switchdev operation based code currently
has.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

staging: fsl-dpaa2: ethsw: Introduce ethsw_port_dev_check()

ethsw currently uses an open-coded comparison of netdev_ops to determine
whether a device represents a front panel port. Wrap this into a
named function to simplify reuse.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: slave: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of the front-panel ports' uppers, or a
completely unrelated device.

DSA currently doesn't support any other uppers than bridge.
SWITCHDEV_OBJ_ID_HOST_MDB and _PORT_MDB objects are always notified on
the bridge port device. Thus the only case that a stacked device could
be validly referenced by port object notifications are bridge
notifications for VLAN objects added to the bridge itself. But the
driver explicitly rejects such notifications in dsa_port_vlan_add(). It
is therefore safe to assume that the only interesting case is that the
notification is on a front-panel port netdevice. Therefore keep the
filtering by dsa_slave_dev_check() in place.

To handle SWITCHDEV_PORT_OBJ_ADD and _DEL, subscribe to the blocking
notifier chain. Dispatch to dsa_slave_port_obj_add() resp. _del() to
maintain the behavior that the switchdev operation based code currently
has.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rocker: Handle SWITCHDEV_PORT_OBJ_ADD/_DEL

Following patches will change the way of distributing port object
changes from a switchdev operation to a switchdev notifier. The
switchdev code currently recursively descends through layers of lower
devices, eventually calling the op on a front-panel port device. The
notifier will instead be sent referencing the bridge port device, which
may be a stacking device that's one of the front-panel ports' uppers, or a
completely unrelated device.

rocker currently doesn't support any uppers other than bridge. Thus the
only case that a stacked device could be validly referenced by port
object notifications are bridge notifications for VLAN objects added to
the bridge itself. But the driver explicitly rejects such notifications
in rocker_world_port_obj_vlan_add(). It is therefore safe to assume that
the only interesting case is that the notification is on a front-panel
port netdevice.

Subscribe to the blocking notifier chain. In the handler, filter out
notifications on any foreign netdevices. Dispatch the new notifiers to
rocker_port_obj_add() resp. _del() to maintain the behavior that the
switchdev operation based code currently has.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Add SWITCHDEV_PORT_OBJ_ADD, SWITCHDEV_PORT_OBJ_DEL

An offloading driver may need to have access to switchdev events on
ports that aren't directly under its control. An example is a VXLAN port
attached to a bridge offloaded by a driver. The driver needs to know
about VLANs configured on the VXLAN device. However the VXLAN device
isn't stashed between the bridge and a front-panel-port device (such as
is the case e.g. for LAG devices), so the usual switchdev ops don't
reach the driver.

VXLAN is likely not the only device type like this: in theory any L2
tunnel device that needs offloading will prompt a requirement of this
sort. This falsifies the assumption that only the lower devices of a
front panel port need to be notified to achieve flawless offloading.

A way to fix this is to give up the notion of port object addition /
deletion as a switchdev operation, which assumes somewhat tight coupling
between the message producer and consumer. And instead send the message
over a notifier chain.

To that end, introduce two new switchdev notifier types,
SWITCHDEV_PORT_OBJ_ADD and SWITCHDEV_PORT_OBJ_DEL. These notifier types
communicate the same event as the corresponding switchdev op, except in
the form of a notification. struct switchdev_notifier_port_obj_info was
added to carry the fields that the switchdev op carries. An additional
field, handled, will be used to communicate back to switchdev that the
event has reached an interested party, which will be important for the
two-phase commit.

The two switchdev operations themselves are kept in place. Following
patches first convert individual clients to the notifier protocol, and
only then are the operations removed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: Add a blocking notifier chain

In general one can't assume that a switchdev notifier is called in a
non-atomic context, and correspondingly, the switchdev notifier chain is
an atomic one.

However, port object addition and deletion messages are delivered from a
process context. Even the MDB addition messages, whose delivery is
scheduled from atomic context, are queued and the delivery itself takes
place in blocking context. For VLAN messages in particular, keeping the
blocking nature is important for error reporting.

Therefore introduce a blocking notifier chain and related service
functions to distribute the notifications for which a blocking context
can be assumed.

Signed-off-by: Petr Machata <petrm@mellanox.com>
Reviewed-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

switchdev: SWITCHDEV_OBJ_PORT_{VLAN, MDB}(): Sanitize

The two macros SWITCHDEV_OBJ_PORT_VLAN() and SWITCHDEV_OBJ_PORT_MDB()
expand to a container_of() call, yielding an appropriate container of
their sole argument. However, due to a name collision, the first
argument, i.e. the contained object pointer, is not the only one to get
expanded. The third argument, which is a structure member name, and
should be kept literal, gets expanded as well. The only safe way to use
these two macros is therefore to name the local variable passed to them
"obj".

To fix this, rename the sole argument of the two macros from
"obj" (which collides with the member name) to "OBJ". Additionally,
instead of passing "OBJ" to container_of() verbatim, parenthesize it, so
that a comma in the passed-in expression doesn't pollute the
container_of() invocation.
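
The collision can be reproduced outside the kernel. The sketch below is a
minimal userspace illustration: container_of() is a simplified stand-in,
and struct obj_base, struct obj_vlan, OLD_VLAN(), NEW_VLAN() and PICK() are
hypothetical names chosen for the example, not the switchdev definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace stand-in for the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical miniature structures; not the kernel definitions. */
struct obj_base { int id; };
struct obj_vlan { int vid; struct obj_base obj; };

/*
 * Old shape: the parameter is called "obj", so the member name in the
 * third argument is substituted as well. OLD_VLAN(o) would expand to
 * container_of(o, struct obj_vlan, o) and fail to compile.
 */
#define OLD_VLAN(obj) container_of(obj, struct obj_vlan, obj)

/* Fixed shape: distinct parameter name, parenthesized where used. */
#define NEW_VLAN(OBJ) container_of((OBJ), struct obj_vlan, obj)

/* Compiles with the old macro only because the variable is named "obj". */
static int vid_old(struct obj_base *obj)
{
	return OLD_VLAN(obj)->vid;
}

/* An argument that expands to a comma expression. */
#define PICK(p) (void)0, (p)

/* The new macro accepts any variable name and the comma expression. */
static int vid_new(struct obj_base *base)
{
	return NEW_VLAN(PICK(base))->vid;
}

static void example(void)
{
	struct obj_vlan vlan = { .vid = 42, .obj = { .id = 1 } };

	assert(vid_old(&vlan.obj) == 42);
	assert(vid_new(&vlan.obj) == 42);
}
```

Without the parentheses around OBJ, the expansion of PICK(base) would hand
container_of() four arguments instead of three.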

Signed-off-by: Petr Machata <petrm@mellanox.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-next'

Heiner Kallweit says:

====================
r8169: some functional improvements

This series includes a few functional improvements.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: replace macro TX_FRAGS_READY_FOR with a function

Replace macro TX_FRAGS_READY_FOR with function rtl_tx_slots_avail
to make code cleaner and type-safe.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use napi_consume_skb where possible

Use napi_consume_skb() where possible to profit from
bulk free infrastructure.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: simplify detecting chip versions with same XID

For the GMII chip versions we set the version number which was set
already. This can be simplified.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove default chip versions

Even the chip versions within a family have so many differences that
using a default chip version doesn't really make sense. Instead of
leaving the user with, at best, flaky network connectivity, bail out
and report the unknown chip version.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove ancient GCC bug workaround in a second place

Remove ancient GCC bug workaround in a second place and factor out
rtl_8169_get_txd_opts1.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'hns3-debugfs'

Salil Mehta says:

====================
net: hns3: Adds support of debugfs to HNS3 driver

This patchset adds support of debugfs to the HNS3 driver.

Support has been added to query info related to below items:
1. Queue related ("echo queue info [queue no] > cmd")
2. Flow Director ("echo dump fd tcam > cmd")
3. TC config ("echo dump tc > cmd")
4. Transmit Module/Scheduler ("echo dump tm > cmd")
5. QoS pause ("echo dump qos pause cfg > cmd")
6. QoS buffer ("echo dump qos buf cfg > cmd")
7. QoS prio map ("echo dump qos pri map > cmd")

NOTE: The above commands are *read-only*. They are only intended to
query information from the SoC (and, for now, dump it inside the
kernel) and in no way try to perform write operations for the
purpose of configuration etc.

Change Log
----------
V1-->V2:
   * Addressed the comments provided by Jakub Kicinski.
     1. Removed the .rej files mistakenly made part of Flow Director patch.
        Link: https://lkml.org/lkml/2018/11/20/249
     2. Added command summary in the cover letter
        Link: https://lkml.org/lkml/2018/11/22/1
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "qos buffer" config info query function

This patch prints qos buffer config information.

debugfs command:
echo dump qos buf cfg > cmd

Sample Command:
root@(none)# echo dump qos buf cfg > cmd
hns3 0000:7d:00.0: dump qos buf cfg
hns3 0000:7d:00.0: tx_packet_buf_tc_0: 0x1aa
hns3 0000:7d:00.0: tx_packet_buf_tc_1: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_2: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_3: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_4: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_5: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_6: 0x0
hns3 0000:7d:00.0: tx_packet_buf_tc_7: 0x0
hns3 0000:7d:00.0:
hns3 0000:7d:00.0: rx_packet_buf_tc_0: 0x130
hns3 0000:7d:00.0: rx_packet_buf_tc_1: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_2: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_3: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_4: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_5: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_6: 0x0
hns3 0000:7d:00.0: rx_packet_buf_tc_7: 0x0
hns3 0000:7d:00.0: rx_share_buf: 0x1e0e
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "qos prio map" info query function

This patch prints qos priority map information.

debugfs command:
echo dump qos pri map > cmd

Sample Command:
root@(none)# echo dump qos pri map > cmd
hns3 0000:7d:00.0: dump qos pri map
hns3 0000:7d:00.0: vlan_to_pri: 0x0
hns3 0000:7d:00.0: pri_0_to_tc: 0x0
hns3 0000:7d:00.0: pri_1_to_tc: 0x0
hns3 0000:7d:00.0: pri_2_to_tc: 0x0
hns3 0000:7d:00.0: pri_3_to_tc: 0x0
hns3 0000:7d:00.0: pri_4_to_tc: 0x0
hns3 0000:7d:00.0: pri_5_to_tc: 0x0
hns3 0000:7d:00.0: pri_6_to_tc: 0x0
hns3 0000:7d:00.0: pri_7_to_tc: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "qos pause" config info query function

This patch prints qos pause config information.

debugfs command:
echo dump qos pause cfg > cmd

Sample Command:
root@(none)# echo dump qos pause cfg > cmd
hns3 0000:7d:00.0: dump qos pause cfg
hns3 0000:7d:00.0: pause_trans_gap: 0xff
hns3 0000:7d:00.0: pause_trans_time: 0xffff
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "tm config" info query function

This patch prints Transmit Module's Traffic sched
related config information.

debugfs command:
echo dump tm > cmd

Sample output:
root@(none)# echo dump tm > cmd
hns3 0000:7d:00.0: dump tm
hns3 0000:7d:00.0: PG_TO_PRI gp_id: 0
hns3 0000:7d:00.0: PG_TO_PRI map: 0x1
hns3 0000:7d:00.0: QS_TO_PRI qs_id: 0
hns3 0000:7d:00.0: QS_TO_PRI priority: 0
hns3 0000:7d:00.0: QS_TO_PRI link_vld: 1
hns3 0000:7d:00.0: NQ_TO_QS nq_id: 0
hns3 0000:7d:00.0: NQ_TO_QS qset_id: 1024
hns3 0000:7d:00.0: PG pg_id: 0
hns3 0000:7d:00.0: PG dwrr: 100
hns3 0000:7d:00.0: QS qs_id: 0
hns3 0000:7d:00.0: QS dwrr: 100
hns3 0000:7d:00.0: PRI pri_id: 0
hns3 0000:7d:00.0: PRI dwrr: 100
hns3 0000:7d:00.0: PRI_C pri_id: 0
hns3 0000:7d:00.0: PRI_C pri_shapping: 0x2850000
hns3 0000:7d:00.0: PRI_P pri_id: 0
hns3 0000:7d:00.0: PRI_P pri_shapping: 0x2850796
hns3 0000:7d:00.0: PG_C pg_id: 0
hns3 0000:7d:00.0: PG_C pg_shapping: 0x2850000
hns3 0000:7d:00.0: PG_P pg_id: 0
hns3 0000:7d:00.0: PG_P pg_shapping: 0x2850496
hns3 0000:7d:00.0: PORT port_shapping: 0x2850296
hns3 0000:7d:00.0: PG_SCH pg_id: 0
hns3 0000:7d:00.0: PRI_SCH pg_id: 0
hns3 0000:7d:00.0: QS_SCH pg_id: 0
hns3 0000:7d:00.0: BP_TO_QSET pg_id: 0
hns3 0000:7d:00.0: BP_TO_QSET pg_shapping: 0x0
hns3 0000:7d:00.0: BP_TO_QSET qs_bit_map: 0x0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "tc config" info query function

This patch prints tc config information.

debugfs command:
echo dump tc > cmd

Sample Output:
root@(none)# echo dump tc > cmd
hns3 0000:7d:00.0: weight_offset: 14
hns3 0000:7d:00.0: tc(0): no sp mode
hns3 0000:7d:00.0: tc(1): no sp mode
hns3 0000:7d:00.0: tc(2): no sp mode
hns3 0000:7d:00.0: tc(3): no sp mode
hns3 0000:7d:00.0: tc(4): no sp mode
hns3 0000:7d:00.0: tc(5): no sp mode
hns3 0000:7d:00.0: tc(6): no sp mode
hns3 0000:7d:00.0: tc(7): no sp mode
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "FD flow table" info query function

All the Flow Director rules are stored in tcam blocks.
For each bit of a tcam entry, the match value
depends on two input values (x, y).

debugfs command:
echo dump fd tcam > cmd

Sample output:
root@(none)# echo dump fd tcam > cmd
hns3 0000:7d:00.0: read result tcam key x(31):
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 08000000
hns3 0000:7d:00.0: 00000600
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: read result tcam key y(31):
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: f7ff0000
hns3 0000:7d:00.0: 0000f900
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 00000000
hns3 0000:7d:00.0: 0000fff8
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add "queue info" query function

Query the queue information of the current NIC
such as BD size, queue header and tail pointer.

This patch adds support for debugfs command:
echo queue info 1 > cmd

It prints the queue config information, for example:

root@(none)# echo queue info 1 > cmd
hns3 0000:7d:00.0: queue info
hns3 0000:7d:00.0: RX(1) BASE ADD: 0x00000000ffb58000
hns3 0000:7d:00.0: RX(1) RING BD NUM: 127
hns3 0000:7d:00.0: RX(1) RING BD LEN: 2
hns3 0000:7d:00.0: RX(1) RING TAIL: 120
hns3 0000:7d:00.0: RX(1) RING HEAD: 0
hns3 0000:7d:00.0: RX(1) RING FBDNUM: 0
hns3 0000:7d:00.0: RX(1) RING PKTNUM: 0
hns3 0000:7d:00.0: TX(1) BASE ADD: 0x00000000fffd8000
hns3 0000:7d:00.0: TX(1) RING BD NUM: 127
hns3 0000:7d:00.0: TX(1) RING TC: 0
hns3 0000:7d:00.0: TX(1) RING TAIL: 2
hns3 0000:7d:00.0: TX(1) RING HEAD: 2
hns3 0000:7d:00.0: TX(1) RING FBDNUM: 0
hns3 0000:7d:00.0: TX(1) RING OFFSET: 0
hns3 0000:7d:00.0: TX(1) RING PKTNUM: 0
root@(none)#

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hns3: Add debugfs framework registration

Add the debugfs framework to the driver and create a debugfs
command interface for each device.

example command:
"echo queue info > cmd" Query the packet forwarding queue information.

Signed-off-by: liuzhongzhu <liuzhongzhu@huawei.com>
Signed-off-by: Salil Mehta <salil.mehta@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: cavium: clean up return value check in cavium_ptp_probe

ptp_clock_register() never returns NULL, so there is no need to check
for this in cavium_ptp_probe().

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: vitesse: remove duplicate support for VSC8574

A more featureful support for VSC8574 was recently added to the
Microsemi (mscc.c) driver. I checked that features supported in the
Vitesse driver are also supported in the Microsemi driver.

Signed-off-by: Quentin Schulz <quentin.schulz@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: amd: add missing of_node_put()

of_find_node_by_path() acquires a reference to the node
returned by it and that reference needs to be dropped by its caller.
This place doesn't do that, so fix it.

Signed-off-by: Yangtao Li <tiny.windzz@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'octeontx2-af-CGX-LMAC-link-bringup-and-cleanups'

Linu Cherian says:

====================
octeontx2-af: CGX LMAC link bringup and cleanups

Patch 1: Code cleanup
Patch 2: Adds support for an unhandled hardware configuration
Patch 3: Preparatory patch for enabling cgx lmac links
Patch 4: Support for enabling cgx lmac links
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Bringup CGX LMAC links by default

- Added new CGX firmware interface API for sending link up/down
  commands

- Do link up for cgx lmac ports by default at the time of CGX
  driver probe. Since cgx link up in driver probe affects the
  Linux boot time, linkup procedure is kept threaded using
  workqueues.
  For this, a new cgx API cgx_lmac_linkup_start has been added.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Unregister cgx event callbacks gracefully

Added provision to unregister cgx event callbacks.
This enables the exit path to ensure event callbacks are
unregistered before workqueues get destroyed.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Handle non-contiguous CGX LMAC interfaces

For this, cgx_id(struct cgx) definition has been changed to
reflect cgx port id instead of device instance id.
Now cgx_id can be directly used as channel offset for NPC configuration.
Assumptions on contiguous cgx port ids have been removed from
nix_calibrate_x2p as well.

As a side effect, allocation of conversion tables that were based
on cgx count are changed to cgx port id max value.
Tables would return NULL for invalid cgx ports.

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

octeontx2-af: Misc cleanups in cgx driver

* Do CGX init before NIX init
  This would add consistency in NIX code that depends on cgx ports

* Few other misc cleanups
  - rvu_cgx_probe renamed as rvu_cgx_init for consistency
  - rvu_cgx_exit wrapper added to take care of the exit path
  - Added error check on cgx_lmac_event_handler_init
  - Minor cleanups in cgx.h related to tab alignment
  - Removed redundant ids from enum cgx_cmd_id

Signed-off-by: Linu Cherian <lcherian@marvell.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: hinic: fix null pointer dereference on pointer hwdev

Pointer hwdev is dereferenced when declaring hwif; however, hwdev is
only null checked later on, hence we have a dereference-before-null-check
error. Fix this by assigning hwif and pdev only once hwdev has
been null checked.
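
The pattern can be sketched as follows. This is a userspace illustration
only: struct hwif, struct hwdev and get_func_id() are hypothetical
miniature stand-ins, not the hinic driver's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature stand-ins for the driver structures. */
struct hwif { int func_id; };
struct hwdev { struct hwif *hwif; };

/*
 * Buggy shape (shown as a comment, not compiled): the initializer
 * dereferences hwdev before the NULL check below it can run.
 *
 *	struct hwif *hwif = hwdev->hwif;
 *
 *	if (!hwdev)
 *		return -EINVAL;
 */

/* Fixed shape: check first, dereference afterwards. */
static int get_func_id(const struct hwdev *hwdev)
{
	const struct hwif *hwif;

	if (!hwdev)
		return -EINVAL;

	hwif = hwdev->hwif;
	return hwif->func_id;
}

static void example(void)
{
	struct hwif hwif = { .func_id = 7 };
	struct hwdev hwdev = { .hwif = &hwif };

	assert(get_func_id(NULL) == -EINVAL);
	assert(get_func_id(&hwdev) == 7);
}
```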

Detected by CoverityScan, CID#1485581 ("Dereference before null check")

Fixes: 4a61abb100c8 ("net-next/hinic:add rx checksum offload for HiNIC")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'smc-next'

Ursula Braun says:

====================
net/smc: patches 2018-11-22

here are more patches for SMC:
* patches 1-3 and 7 are cleanups without functional change
* patches 4-6 and 8 are optimizations of existing code
* patches 9 and 10 introduce and exploit LLC message DELETE RKEY
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: unregister rkeys of unused buffer

When an rmb is no longer in use by a connection, unregister its rkey at
the remote peer with an LLC DELETE RKEY message. With this change,
unused buffers held in the buffer pool are no longer registered at the
remote peer. They are registered before the buffer is actually used and
unregistered when they are no longer used by a connection.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: add infrastructure to send delete rkey messages

Add the infrastructure to send LLC messages of type DELETE RKEY to
unregister a shared memory region at the peer.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: avoid a delay by waiting for nothing

When a send fails, don't start waiting for a response in
smc_llc_do_confirm_rkey.

Signed-off-by: Karsten Graul <kgraul@linux.ibm.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: cleanup listen worker mutex unlocking

For easier reading, move the unlock of mutex smc_create_lgr_pending into
smc_listen_work(), i.e. into the function in which the mutex is locked.
No functional change.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: short wait for late smc_clc_wait_msg

After sending one of the initial LLC messages CONFIRM LINK or
ADD LINK, there is already a wait for the LLC response. It does
not make sense to wait another long time for a CLC DECLINE. Thus
this patch introduces a shorter wait time for these cases.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: no link delete for a never active link

If a link is terminated that has never reached the active state,
there is no need to trigger an LLC DELETE LINK.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: allow fallback after clc timeouts

If connection initialization fails for the LLC CONFIRM LINK or the
LLC ADD LINK step, fallback to TCP should be enabled. Thus
the negative return code -EAGAIN should switch to a positive timeout
reason code in these cases, and the internal CLC socket should
not have a set sk_err.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: remove sock_error detour in clc-functions

There is no need to store the return value in sk_err, if it is
afterwards cleared again with sock_error(). This patch sets the
return value directly. Just cleanup, no functional change.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: make smc_lgr_free() static

smc_lgr_free() is just called inside smc_core.c. Make it static.
Just cleanup, no functional change.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/smc: cleanup tcp_listen_worker initialization

The tcp_listen_worker is already initialized when socket is
created (in smc_sock_alloc()). Get rid of the duplicate
initialization in smc_listen(). No functional change.

Signed-off-by: Ursula Braun <ubraun@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: no need to do team_notify_peers or team_mcast_rejoin when disabling port

team_notify_peers() will send ARP and NA to notify peers. team_mcast_rejoin()
will send multicast join group messages to notify peers. We should do this when
enabling or changing to a new port, but it doesn't make sense to do it when a
port is disabled.

On the other hand, when we set mcast_rejoin_count to 2 and do a failover,
team_port_disable() will increase mcast_rejoin.count_pending to 2 and then
team_port_enable() will increase mcast_rejoin.count_pending to 4. We will end
up sending 4 mcast rejoin messages, which confuses the user. The same
applies to notify_peers.count.

Fix it by deleting team_notify_peers() and team_mcast_rejoin() in
team_port_disable().

Reported-by: Liang Li <liali@redhat.com>
Fixes: fc423ff00df3a ("team: add peer notification")
Fixes: 492b200efdd20 ("team: add support for sending multicast rejoins")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: remove redundant check for eee->tx_lpi_timer < 0

Fixes the smatch warning:

drivers/net/ethernet/marvell/mvneta.c:4252 mvneta_ethtool_set_eee() warn:
unsigned 'eee->tx_lpi_timer' is never less than zero.
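
A minimal sketch of why the check is redundant, not the driver code:
check_lpi_timer() and EXAMPLE_LPI_TIMER_MAX are hypothetical, and the
bound of 255 is chosen only for illustration. An unsigned value is never
negative, so a range check on it only needs the upper bound.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical upper bound, chosen only for this illustration. */
#define EXAMPLE_LPI_TIMER_MAX 255u

/*
 * Range check on an unsigned timer value. A "< 0" test would always be
 * false here. A value converted from -1 wraps to UINT32_MAX and is
 * rejected by the upper-bound comparison alone.
 */
int check_lpi_timer(uint32_t tx_lpi_timer)
{
	if (tx_lpi_timer > EXAMPLE_LPI_TIMER_MAX)
		return -EINVAL;
	return 0;
}
```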

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Thomas Petazzoni <thomas.petazzoni@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: Add BPF_MAP_TYPE_QUEUE and BPF_MAP_TYPE_STACK to bpftool-map

I noticed that these two new BPF Maps are not defined in bpftool.
This patch defines those two maps and adds their names to the
bpftool-map documentation.

Signed-off-by: David Calavera <david.calavera@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

samples: bpf: fix: error handling regarding kprobe_events

Currently, a kprobe_events failure isn't handled properly.
Because system() is called to write to kprobe_events indirectly,
it can't be determined whether an error comes from kprobe or from system().

    // buf = "echo '%c:%s %s' >> /s/k/d/t/kprobe_events"
    err = system(buf);
    if (err < 0) {
        printf("failed to create kprobe ..");
        return -1;
    }

For example, when running the ./tracex7 sample on an ext4 partition,
"echo p:open_ctree open_ctree >> /s/k/d/t/kprobe_events"
makes system() return 256.
=> The error comes from kprobe, but it's not handled correctly.

According to the system(3) man page, its return value
is the termination status of the child shell, so a failed
command is not reported as -1.

This means the check in the snippet above does not work as intended.

    e.g. running ./tracex7 in an ext4 environment
    # Current Output
    sh: echo: I/O error
    failed to open event open_ctree

    # Desired Output
    failed to create kprobe 'open_ctree' error 'No such file or directory'

The problem is that with system() it can't be determined whether an
error comes from the child process or from system() itself. Using
write() directly makes a failed command detectable, since every error
is reported as -1. So I suggest writing to 'kprobe_events' with write()
directly rather than calling system().
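
A minimal sketch of the two behaviours described above, not the code
from this patch: shell_exit_code() and write_event_line() are
hypothetical helpers, and the path passed in is up to the caller. The
first shows that, on Linux, a system() return value of 256 decodes to a
shell exit code of 1. The second writes the command with write(), so a
failure is reported as -1 with errno set.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Decode a system() return value. system() returns a wait status, so a
 * shell that exits with code 1 yields 256, not -1.
 * Returns the shell's exit code, or -1 if system() itself failed or the
 * shell did not exit normally.
 */
int shell_exit_code(int status)
{
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}

/*
 * Hypothetical helper: append `cmd` to the file at `path` with write().
 * Returns 0 on success, or -1 with errno set by open() or write().
 */
int write_event_line(const char *path, const char *cmd)
{
	int fd, saved_errno;
	ssize_t n;

	fd = open(path, O_WRONLY | O_APPEND);
	if (fd < 0)
		return -1;

	n = write(fd, cmd, strlen(cmd));
	saved_errno = errno;	/* close() may overwrite errno */
	close(fd);

	if (n < 0) {
		errno = saved_errno;
		return -1;
	}
	return 0;
}
```

A caller can then print strerror(errno) on failure, which gives the
kind of message shown as the desired output above.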

Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

libbpf: make bpf_object__open default to UNSPEC

Currently, libbpf's bpf_object__open by default requires a BPF
program to specify a version in its code, for two reasons:
1) the default prog type is set to KPROBE
2) KPROBE requires (in kernel/bpf/syscall.c) the version to be specified

This patch changes the default prog type to UNSPEC and also changes the
requirement for the version section to be present in the object file.
This now reflects what the kernel does today
(only the KPROBE prog type requires the version to be set explicitly).

v1 -> v2:
- RFC tag has been dropped

Signed-off-by: Nikita V. Shirokov <tehnerd@tehnerd.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>

virtio-net: fail XDP set if guest csum is negotiated

We don't support partially checksummed packets, since their metadata
will be lost or incorrect during XDP processing. So fail the XDP set if
the guest_csum feature is negotiated.
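
The rule can be sketched as a plain feature-mask test. This is an
illustration under assumptions, not the driver code:
example_xdp_set_check() is a hypothetical name, the negotiated features
are modelled as a 64-bit mask, and the -EOPNOTSUPP return value is an
assumption of the sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Feature bit 1 is VIRTIO_NET_F_GUEST_CSUM in the virtio-net feature set. */
#define EXAMPLE_F_GUEST_CSUM_BIT 1

/*
 * Hypothetical check: refuse to attach an XDP program when the
 * negotiated feature mask contains the guest checksum feature.
 * The error code is an assumption made for this sketch.
 */
int example_xdp_set_check(uint64_t negotiated_features)
{
	if (negotiated_features & (UINT64_C(1) << EXAMPLE_F_GUEST_CSUM_BIT))
		return -EOPNOTSUPP;
	return 0;
}
```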

Fixes: f600b6905015 ("virtio_net: Add XDP support")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

virtio-net: disable guest csum during XDP set

We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
can receive partial csumed packets with metadata kept in the
vnet_hdr. This may have several side effects:

- It could be overridden by header adjustment, thus it might not be
correct after XDP processing.
- There's no way to pass such metadata information through
XDP_REDIRECT to another driver.
- XDP does not support checksum offload right now.

So simply disable guest csum if possible in the case of XDP.

Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible on XDP set")
Reported-by: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Pavel Popa <pashinho1990@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
Intel Wired LAN Driver Updates 2018-11-21

This series contains updates to all of the Intel LAN drivers and
documentation.

Shannon Nelson updates the ixgbe kernel documentation to include IPsec
hardware offload.

Joe Perches cleans up whitespace issues in the igb driver.

Jesse updates the netdev kernel documentation for NETIF_F_GSO_UDP_L4 to
align with the actual code. He also aligns the NAPI code of all the
Intel drivers with the recommendation of Eric Dumazet to check the
return code of napi_complete_done() to determine whether to enable
interrupts or exit poll.

Paul E. McKenney replaces synchronize_sched() with synchronize_rcu() for
ixgbe.

Sasha implements suggestions made by Joe Perches to remove obsolete code
and to use the dev_err() method.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net-gro: use ffs() to speedup napi_gro_flush()

We very often have few flows/chains to look at, and we
might increase GRO_HASH_BUCKETS to 32 or 64 in the future.
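
One way such a loop could look in user space is sketched below. It is
an illustration under assumptions, not the kernel code:
example_set_buckets() is a hypothetical name, bit i of the mask is
assumed to mark bucket i as non-empty, and unsigned int is assumed to be
32 bits wide. Only set bits are visited, so the cost follows the number
of active buckets rather than the size of the table.

```c
#include <strings.h>

/*
 * Store the index of every set bit of `bitmask` in `out`, lowest first,
 * and return how many were found. `out` needs room for 32 entries.
 * ffs() returns the 1-based position of the least significant set bit,
 * or 0 when no bit is set, so empty buckets are never examined.
 */
unsigned int example_set_buckets(unsigned int bitmask, unsigned int *out)
{
	unsigned int next = 0, found = 0;
	int pos;

	while ((pos = ffs((int)bitmask)) != 0) {
		out[found++] = next + pos - 1;
		next += pos;
		/* Two shifts, so a hit on bit 31 never shifts by the full width. */
		bitmask >>= pos - 1;
		bitmask >>= 1;
	}
	return found;
}
```

For a mask of 0x8a the loop runs three times and reports buckets 1, 3
and 7, instead of testing every bucket in turn.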

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client

Pull ceph fix from Ilya Dryomov:
"A messenger fix, marked for stable"

* tag 'ceph-for-4.20-rc4' of https://github.com/ceph/ceph-client:
libceph: fall back to sendmsg for slab pages

Merge tag 'for-linus-20181123' of git://git.kernel.dk/linux-block

Pull block fix from Jens Axboe:
"Just a single fix for this week, fixing an issue with nvme-fc"

* tag 'for-linus-20181123' of git://git.kernel.dk/linux-block:
nvme-fc: resolve io failures during connect

net/sched: act_police: add missing spinlock initialization

commit f2cbd4852820 ("net/sched: act_police: fix race condition on state
variables") introduces a new spinlock, but forgets its initialization.
Ensure that tcf_police_init() initializes 'tcfp_lock' every time a 'police'
action is newly created, to avoid the following lockdep splat:

INFO: trying to register non-static key.
the code is fine but needs lockdep annotation.
turning off the locking correctness validator.
<...>
Call Trace:
  dump_stack+0x85/0xcb
  register_lock_class+0x581/0x590
  __lock_acquire+0xd4/0x1330
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? lock_acquire+0x9e/0x1a0
  lock_acquire+0x9e/0x1a0
  ? tcf_police_init+0x2fa/0x650 [act_police]
  ? tcf_police_init+0x55a/0x650 [act_police]
  _raw_spin_lock_bh+0x34/0x40
  ? tcf_police_init+0x2fa/0x650 [act_police]
  tcf_police_init+0x2fa/0x650 [act_police]
  tcf_action_init_1+0x384/0x4c0
  tcf_action_init+0xf6/0x160
  tcf_action_add+0x73/0x170
  tc_ctl_action+0x122/0x160
  rtnetlink_rcv_msg+0x2a4/0x490
  ? netlink_deliver_tap+0x99/0x400
  ? validate_linkmsg+0x370/0x370
  netlink_rcv_skb+0x4d/0x130
  netlink_unicast+0x196/0x230
  netlink_sendmsg+0x2e5/0x3e0
  sock_sendmsg+0x36/0x40
  ___sys_sendmsg+0x280/0x2f0
  ? _raw_spin_unlock+0x24/0x30
  ? handle_pte_fault+0xafe/0xf30
  ? find_held_lock+0x2d/0x90
  ? syscall_trace_enter+0x1df/0x360
  ? __sys_sendmsg+0x5e/0xa0
  __sys_sendmsg+0x5e/0xa0
  do_syscall_64+0x60/0x210
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x7f1841c7cf10
Code: c3 48 8b 05 82 6f 2c 00 f7 db 64 89 18 48 83 cb ff eb dd 0f 1f 80 00 00 00 00 83 3d 8d d0 2c 00 00 75 10 b8 2e 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 ae cc 00 00 48 89 04 24
RSP: 002b:00007ffcf9df4d68 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f1841c7cf10
RDX: 0000000000000000 RSI: 00007ffcf9df4dc0 RDI: 0000000000000003
RBP: 000000005bf56105 R08: 0000000000000002 R09: 00007ffcf9df8edc
R10: 00007ffcf9df47e0 R11: 0000000000000246 R12: 0000000000671be0
R13: 00007ffcf9df4e84 R14: 0000000000000008 R15: 0000000000000000

Fixes: f2cbd4852820 ("net/sched: act_police: fix race condition on state variables")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: don't keep lonely packets forever in the gro hash

Eric noted that with UDP GRO and NAPI timeout, we could keep a single
UDP packet inside the GRO hash forever, if the related NAPI instance
calls napi_gro_complete() at a higher frequency than the NAPI timeout.
Willem noted that even TCP packets could be trapped there, till the
next retransmission.
This patch tries to address the issue, flushing the old packets -
those with a NAPI_GRO_CB age before the current jiffy - before scheduling
the NAPI timeout. The rationale is that such a timeout should be
well below a jiffy and we are not flushing packets eligible for sane GRO.
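
The flushing rule can be sketched with a simplified user-space model.
This is not the kernel code: struct example_held_pkt and
example_flush_old() are hypothetical, and a packet is reduced to the
tick at which it was queued. Entries stamped with the current tick stay
held, and everything else is treated as flushed.

```c
#include <stddef.h>

/* Hypothetical stand-in for a held packet: only its queue tick matters. */
struct example_held_pkt {
	unsigned long age;
};

/*
 * Compact `pkts` in place, keeping only the entries stamped with the
 * current tick `now`. Entries with any other stamp count as flushed.
 * Returns the number of entries kept.
 */
size_t example_flush_old(struct example_held_pkt *pkts, size_t n,
			 unsigned long now)
{
	size_t kept = 0;

	for (size_t i = 0; i < n; i++)
		if (pkts[i].age == now)
			pkts[kept++] = pkts[i];
	return kept;
}
```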

v1 -> v2:
- clarified the commit message and comment

RFC -> v1:
- added 'Fixes' tags, cleaned up the wording.

Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Fixes: 3b47d30396ba ("net: gro: add a per device gro flush timer")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: re-do dad when interface has IFF_NOARP flag change

When we add a new IPv6 address, we should also join the corresponding
solicited-node multicast address, unless the interface has the IFF_NOARP
flag, as addrconf_join_solict() does. But if the IFF_NOARP flag is removed
later, we do not redo DAD or add the mcast address, so we drop the
corresponding neighbour discovery messages that come from other nodes.
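
For reference, the solicited-node multicast address is derived from the
unicast address as sketched below. This is a user-space illustration,
not the kernel code, and example_solicited_node() is a hypothetical
name. Addresses are passed as 16 bytes in network order.

```c
#include <stdint.h>
#include <string.h>

/*
 * Build the solicited-node multicast address ff02::1:ffXX:XXXX for a
 * unicast IPv6 address: the fixed 104-bit prefix followed by the low
 * 24 bits of the unicast address (RFC 4291, section 2.7.1).
 */
void example_solicited_node(const uint8_t addr[16], uint8_t out[16])
{
	memset(out, 0, 16);
	out[0] = 0xff;
	out[1] = 0x02;
	out[11] = 0x01;
	out[12] = 0xff;
	memcpy(&out[13], &addr[13], 3);
}
```

For 2001:db8::1234:5678 this yields ff02::1:ff34:5678, the group an
interface has to join in order to receive neighbour solicitations for
that address.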

A typical example is creating an ipvlan with mode l3, setting up an ipv6
address and then changing the mode to l2. We will not be able to ping this
address, as the interface hasn't joined the related solicited-node mcast
address.

Fix it by redoing DAD when the interface's IFF_NOARP flag changes. We then
add the corresponding mcast group and check whether there is a duplicate
address on the network.

Reported-by: Jianlin Shi <jishi@redhat.com>
Reviewed-by: Stefano Brivio <sbrivio@redhat.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'dpaa-coalesce'

Madalin Bucur says:

====================
dpaa_eth: add ethtool coalesce control

Add control of the DPAA portal interrupt coalescing settings from
ethtool.

changes from v2: read ithresh from HW, set previous values on failure
changes from v1: added range checking for the QMan APIs
====================

Signed-off-by: David S. Miller <davem@davemloft.net>