git.proxmox.com Git - mirror_ubuntu-kernels.git/log

tipc: unclone unbundled buffers before forwarding

When extracting an individual message from a received "bundle" buffer,
we just create a clone of the base buffer, and adjust it to point into
the right position of the linearized data area of the latter. This works
well for regular message reception, but during periods of extremely high
load it may happen that an extracted buffer, e.g, a connection probe, is
reversed and forwarded through an external interface while the preceding
extracted message is still unhandled. When this happens, the header or
data area of the preceding message will be partially overwritten by a
MAC header, leading to unpredicatable consequences, such as a link
reset.

We now fix this by ensuring that the msg_reverse() function never
returns a cloned buffer, and that the returned buffer always contains
sufficient valid head and tail room to be forwarded.

Reported-by: Erik Hugne <erik.hugne@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

kcm: fix /proc memory leak

Every open of /proc/net/kcm leaks 16 bytes of memory as is reported by
kmemleak:
unreferenced object 0xffff88059c0e3458 (size 192):
  comm "cat", pid 1401, jiffies 4294935742 (age 310.720s)
  hex dump (first 32 bytes):
    28 45 71 96 05 88 ff ff 00 10 00 00 00 00 00 00  (Eq.............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<ffffffff8156a2de>] kmem_cache_alloc_trace+0x16e/0x230
    [<ffffffff8162a479>] seq_open+0x79/0x1d0
    [<ffffffffa0578510>] kcm_seq_open+0x0/0x30 [kcm]
    [<ffffffff8162a479>] seq_open+0x79/0x1d0
    [<ffffffff8162a8cf>] __seq_open_private+0x2f/0xa0
    [<ffffffff81712548>] seq_open_net+0x38/0xa0
...

It is caused by a missing free in the ->release path. So fix it by
providing seq_release_net as the ->release method.

Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Fixes: cd6e111bf5 (kcm: Add statistics and proc interfaces)
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Tom Herbert <tom@herbertland.com>
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>

team: Fix possible deadlock during team enslave

Both dev_uc_sync_multiple() and dev_mc_sync_multiple() require the
source device to be locked by netif_addr_lock_bh(), but this is missing
in team's enslave function, so add it.

This fixes the following lockdep warning:

Possible interrupt unsafe locking scenario:

        CPU0                    CPU1
        ----                    ----
   lock(_xmit_ETHER/1);
                                local_irq_disable();
                                lock(&(&mc->mca_lock)->rlock);
                                lock(&team_netdev_addr_lock_key);
   <Interrupt>
     lock(&(&mc->mca_lock)->rlock);

  *** DEADLOCK ***

Fixes: cb41c997d444 ("team: team should sync the port's uc/mc addrs when add a port")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge tag 'linux-can-fixes-for-4.7-20160620' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can

Marc Kleine-Budde says:

====================
pull-request: can 2016-06-20

this is a pull request of 3 patches for the upcoming linux-4.7 release.

The first patch is by Thor Thayer for the c_can/d_can driver. It fixes the
registar access on Altera Cyclone devices, which caused CAN frames to have 0x0
in the first two bytes incorrectly. Wolfgang Grandegger's patch for the at91
driver fixes a hanging driver under high bus load situations. A patch for the
gs_usb driver by Maximilian Schneider adds support for the bytewerk.org
candleLight interface.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

can: gs_usb: Add Basic support for the bytewerk.org candleLight interface

This patchs adds basic support for the bytewerk.org candleLight interface,
a open hardware (CERN OHL) USB CAN adapter.

Signed-off-by: Hubert Denkmair <hubert@denkmair.de>
Signed-off-by: Maximilian Schneider <max@schneidersoft.net>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: at91_can: RX queue could get stuck at high bus load

At high bus load it could happen that "at91_poll()" enters with all RX
message boxes filled up. If then at the end the "quota" is exceeded as
well, "rx_next" will not be reset to the first RX mailbox and hence the
interrupts remain disabled.

Signed-off-by: Wolfgang Grandegger <wg@grandegger.com>
Tested-by: Amr Bekhit <amrbekhit@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

can: c_can: Update D_CAN TX and RX functions to 32 bit - fix Altera Cyclone access

When testing CAN write floods on Altera's CycloneV, the first 2 bytes
are sometimes 0x00, 0x00 or corrupted instead of the values sent. Also
observed bytes 4 & 5 were corrupted in some cases.

The D_CAN Data registers are 32 bits and changing from 16 bit writes to
32 bit writes fixes the problem.

Testing performed on Altera CycloneV (D_CAN). Requesting tests on other
C_CAN & D_CAN platforms.

Reported-by: Richard Andrysek <richard.andrysek@gomtec.de>
Signed-off-by: Thor Thayer <tthayer@opensource.altera.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>

Merge branch 'qed-fixes'

Yuval Mintz says:

====================
qed*: Fixes series

This series contains several small fixes to driver behavior
[4th patch is the only one containing a 'fatal' fix, but the error
is only theoretical for qede; if would require another protocol
driver yet unsubmitted to reach it].
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Add missing port-mode

The 'MODULE_FIBER' value replaced several other FIBER values
in newer management firmware images, so existing code would
fail to properly reflect its mode.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Fix returning unlimited SPQ entries

Driver has 2 sets of entries for handling ramrod configurations
toward firmware - a regular pre-allocated set of entires and a
possible 'unlimited' list of additional pending entries.

In most scenarios the 'unlimited' list would not be used, but
when it does the handling of the ramrod completion doesn't
properly handle the release of the entry.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed*: Don't reset statistics on inner reload

Several user APIs can cause driver to perform an inner-reload.
Currently, doing this would cause the HW/FW statistics of the
adapter to reset, which isn't the expected behavior [statistics
should only reset on explicit unloads].

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Prevent VF from Tx-switching 'promisc'

Internal loopback in driver is based on two things - first
is the willingness of transmitter to use it [in case of VFs,
this can be forced based on VEPA/VEB] and secondly on another
vport classification configuration which should match the
packet's destination.

Current code allows non-linux VFs to configure a 'promisc'
mode on Tx, meaning all traffic sent by VF would be loopbacked
internally by firmware; This isn't considered a valid mode and
as such should be prevented by PF.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Correct default vlan behavior

When no vlan filter is configured, firmware has a configurable
default on whether to pass only untagged packets or all packets
regardless of their tagging. Driver currently doesn't set this
field in the necessary ramrod, causing the default to always be
'receive all'.

Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: rds: fix coding style issues

Fix coding style issues in the following files:

ib_cm.c:      add space
loop.c:       convert spaces to tabs
sysctl.c:     add space
tcp.h:        convert spaces to tabs
tcp_connect.c:remove extra indentation in switch statement
tcp_recv.c:   convert spaces to tabs
tcp_send.c:   convert spaces to tabs
transport.c:  move brace up one line on for statement

Signed-off-by: Joshua Houghton <josh@awful.name>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

AX.25: Close socket connection on session completion

A socket connection made in ax.25 is not closed when session is
completed. The heartbeat timer is stopped prematurely and this is
where the socket gets closed. Allow heatbeat timer to run to close
socket. Symptom occurs in kernels >= 4.2.0

Originally sent 6/15/2016. Resend with distribution list matching
scripts/maintainer.pl output.

Signed-off-by: Basil Gunn <basil@pacabunga.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

RDS: TCP: rds_tcp_accept_one() should transition socket from RESETTING to UP

The state of the rds_connection after rds_tcp_reset_callbacks() would
be RDS_CONN_RESETTING and this is the value that should be passed
by rds_tcp_accept_one() to rds_connect_path_complete() to transition
the socket to RDS_CONN_UP.

Fixes: b5c21c0947c1 ("RDS: TCP: fix race windows in send-path quiescence
by rds_tcp_accept_one()")
Signed-off-by: Sowmini Varadhan <sowmini.varadhan@oracle.com>
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: tilegx: use correct timespec64 type

The conversion to the 64-bit time based ptp methods left two instances
of 'struct timespec' in place. This is harmless because 64-bit
architectures define timespec64 as timespec, and this driver is
not used on 32-bit machines.

However, using 'struct timespec64' directly is obviously the right
thing to do, and will help us remove 'struct timespec' in the future.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Fixes: b9acf24f779c ("ptp: tilegx: convert to the 64 bit get/set time methods.")
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlxsw-fixes'

Jiri Pirko says:

====================
mlxsw: couple of fixes

Couple of slowpath tx stats fixes for Spectrum and SwitchX-2.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: switchx2: Don't count internal TX header bytes to stats

Stop the SW TX counter from counting the TX header bytes
since they are not being sent out.

Fixes: e577516b9db3 ("mlxsw: Fix use-after-free bug in mlxsw_sx_port_xmit")
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum: Don't count internal TX header bytes to stats

Stop the SW TX counter from counting the TX header bytes
since they are not being sent out.

Fixes: 56ade8fe3fe1 ("mlxsw: spectrum: Add initial support for Spectrum ASIC")
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: Nogah Frankel <nogahf@mellanox.com>
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix socket timer deadlock

We sometimes observe a 'deadly embrace' type deadlock occurring
between mutually connected sockets on the same node. This happens
when the one-hour peer supervision timers happen to expire
simultaneously in both sockets.

The scenario is as follows:

CPU 1:                          CPU 2:
--------                        --------
tipc_sk_timeout(sk1)            tipc_sk_timeout(sk2)
  lock(sk1.slock)                 lock(sk2.slock)
  msg_create(probe)               msg_create(probe)
  unlock(sk1.slock)               unlock(sk2.slock)
  tipc_node_xmit_skb()            tipc_node_xmit_skb()
    tipc_node_xmit()                tipc_node_xmit()
      tipc_sk_rcv(sk2)                tipc_sk_rcv(sk1)
        lock(sk2.slock)                 lock((sk1.slock)
        filter_rcv()                    filter_rcv()
          tipc_sk_proto_rcv()             tipc_sk_proto_rcv()
            msg_create(probe_rsp)           msg_create(probe_rsp)
            tipc_sk_respond()               tipc_sk_respond()
              tipc_node_xmit_skb()            tipc_node_xmit_skb()
                tipc_node_xmit()                tipc_node_xmit()
                  tipc_sk_rcv(sk1)                tipc_sk_rcv(sk2)
                    lock((sk1.slock)                lock((sk2.slock)
                    ===> DEADLOCK                   ===> DEADLOCK

Further analysis reveals that there are three different locations in the
socket code where tipc_sk_respond() is called within the context of the
socket lock, with ensuing risk of similar deadlocks.

We now solve this by passing a buffer queue along with all upcalls where
sk_lock.slock may potentially be held. Response or rejected message
buffers are accumulated into this queue instead of being sent out
directly, and only sent once we know we are safely outside the slock
context.

Reported-by: GUNA <gbalasun@gmail.com>
Acked-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for your net tree,
they are rather small patches but fixing several outstanding bugs in
nf_conntrack and nf_tables, as well as minor problems with missing
SYNPROXY header uapi installation:

1) Oneliner not to leak conntrack kmemcache on module removal, this
   problem was introduced in the previous merge window, patch from
   Florian Westphal.

2) Two fixes for insufficient ruleset loop validation, one due to
   incorrect flag check in nf_tables_bind_set() and another related to
   silly wrong generation mask logic from the walk path, from Liping
   Zhang.

3) Fix double-free of anonymous sets on error, this fix simplifies the
   code to let the abort path take care of releasing the set object,
   also from Liping Zhang.

4) The introduction of helper function for transactions broke the skip
   inactive rules logic from the nft_do_chain(), again from Liping
   Zhang.

5) Two patches to install uapi xt_SYNPROXY.h header and calm down
   kbuild robot due to missing #include <linux/types.h>.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: xt_SYNPROXY: include missing <linux/types.h>

./usr/include/linux/netfilter/xt_SYNPROXY.h:11: found __[us]{8,16,32,64} type without #include <linux/types.h>

Reported-by: kbuild test robot <lkp@intel.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: xt_SYNPROXY: add missing header to Kbuild

Matt Whitlock says:

Without this line, the file xt_SYNPROXY.h does not get installed in
/usr/include/linux/netfilter/, and thus user-space programs cannot make
use of it.

Reported-by: Matt Whitlock <kernel@mattwhitlock.name>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

nfp: use correct index to mask link state irq

We were using an incorrect define to get the irq vector number.
NFP_NET_CFG_LSC is a control BAR offset, LSC interrupt vector
index is called NFP_NET_IRQ_LSC_IDX. For machines with less
than 30 CPUs this meant that we were disabling/enabling IRQ 0.
For bigger hosts we were just playing with the 31st RX/TX
interrupt.

Fixes: 0ba40af963f0 ("nfp: move link state interrupt request/free calls")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: correct IP protocol used in ipip6_err

Since 32b8a8e59c9c ("sit: add IPv4 over IPv4 support")
ipip6_err() may be called for packets whose IP protocol is
IPPROTO_IPIP as well as those whose IP protocol is IPPROTO_IPV6.

In the case of IPPROTO_IPIP packets the correct protocol value is not
passed to ipv4_update_pmtu() or ipv4_redirect().

This patch resolves this problem by using the IP protocol of the packet
rather than a hard-coded value. This appears to be consistent
with the usage of the protocol of a packet by icmp_socket_deliver()
the caller of ipip6_err().

I was able to exercise the redirect case by using a setup where an ICMP
redirect was received for the destination of the encapsulated packet.
However, it appears that although incorrect the protocol field is not used
in this case and thus no problem manifests. On inspection it does not
appear that a problem will manifest in the fragmentation needed/update pmtu
case either.

In short I believe this is a cosmetic fix. None the less, the use of
IPPROTO_IPV6 seems wrong and confusing.

Reviewed-by: Dinan Gunawardena <dinan.gunawardena@netronome.com>
Signed-off-by: Simon Horman <simon.horman@netronome.com>
Acked-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlx4e: Do not attempt to offload VXLAN ports that are unrecognized

The mlx4e driver does not support more than one port for VXLAN offload. As
such expecting the hardware to offload other ports is invalid since it
appears the parsing logic is used to perform Tx checksum and segmentation
offloads. Use the vxlan_port number to determine in which cases we can
apply the offload and in which cases we can not.

Signed-off-by: Alexander Duyck <aduyck@mirantis.com>
Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sfc: avoid -Wtype-limits warning

When building with -Wextra, we get a harmless warning from the
EFX_EXTRACT_OWORD32 macro:

ethernet/sfc/farch.c: In function 'efx_farch_test_registers':
ethernet/sfc/farch.c:119:30: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
ethernet/sfc/farch.c:124:144: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
ethernet/sfc/farch.c:124:392: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]
ethernet/sfc/farch.c:124:731: error: comparison of unsigned expression < 0 is always false [-Werror=type-limits]

The macro and the caller are both correct, but we can avoid the
warning by changing the index variable to a signed type.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Bert Kenward <bkenward@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8152-fixes'

Hayes Wang says:

====================
r8152: fix known issues

These patches fix some known issues.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: correct the rx early size

The rx early size should be

(agg_buf_sz - packet size) / 8

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: reset the bmu

Reset the BMU to clear the rx/tx fifo. This avoids that the unexpected
data remains in the hw.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: disable MAC clock speed down

Disable MAC clock speed down. It may casue the first control
transfer to contain the wrong data, when the power state change
from U1 to U0.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'bpf-fixes'

Alexei Starovoitov says:

====================
bpf fixes

Fixes for two bpf bugs:
1st bug reported by Sasha Goldshtein here:
https://github.com/iovisor/bcc/issues/570
2nd discovered by Daniel Borkmann by manual code analysis.
See patches for details.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

bpf, trace: check event type in bpf_perf_event_read

similar to bpf_perf_event_output() the bpf_perf_event_read() helper
needs to check the type of the perf_event before reading the counter.

Fixes: a43eec304259 ("bpf: introduce bpf_perf_event_output() helper")
Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

bpf: fix matching of data/data_end in verifier

The ctx structure passed into bpf programs is different depending on bpf
program type. The verifier incorrectly marked ctx->data and ctx->data_end
access based on ctx offset only. That caused loads in tracing programs
int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
to be incorrectly marked as PTR_TO_PACKET which later caused verifier
to reject the program that was actually valid in tracing context.
Fix this by doing program type specific matching of ctx offsets.

Fixes: 969bf05eb3ce ("bpf: direct packet access")
Reported-by: Sasha Goldshtein <goldshtn@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: fix error handler

1) gre_parse_header() can be called from gre_err()

   At this point transport header points to ICMP header, not the inner
header.

2) We can not really change transport header as ipgre_err() will later
assume transport header still points to ICMP header (using icmp_hdr())

3) pskb_may_pull() logic in gre_parse_header() really works
  if we are interested at zone pointed by skb->data

4) As Jiri explained in commit b7f8fe251e46 ("gre: do not pull header in
ICMP error processing") we should not pull headers in error handler.

So this fix :

A) changes gre_parse_header() to use skb->data instead of
skb_transport_header()

B) Adds a nhs parameter to gre_parse_header() so that we can skip the
not pulled IP header from error path.
  This offset is 0 for normal receive path.

C) remove obsolete IPV6 includes

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <tom@herbertland.com>
Cc: Maciej Żenczykowski <maze@google.com>
Cc: Jiri Benc <jbenc@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG

The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b5373 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.

However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.

Fixes: 2c94b5373 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: skfb: remove obsolete -I cflag

The skfp driver has been moved to drivers/net/fddi/skfp a long time
ago, but we still attempt to include headers from the old location,
which causes a warning when building with W=1:

cc1: error: /git/arm-soc/drivers/net/skfp: No such file or directory [-Werror=missing-include-dirs]
cc1: error: drivers/net/skfp: No such file or directory [-Werror=missing-include-dirs]

Clearly this include directive is not needed any more, so we can
just remove it now.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: eliminate uninitialized variable warning

net/tipc/link.c: In function ‘tipc_link_timeout’:
net/tipc/link.c:744:28: warning: ‘mtyp’ may be used uninitialized in this function [-Wuninitialized]

Fixes: 42b18f605fea ("tipc: refactor function tipc_link_timeout()")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: fix suspicious RCU usage

When run tipcTS&tipcTC test suite, the following complaint appears:

[   56.926168] ===============================
[   56.926169] [ INFO: suspicious RCU usage. ]
[   56.926171] 4.7.0-rc1+ #160 Not tainted
[   56.926173] -------------------------------
[   56.926174] net/tipc/bearer.c:408 suspicious rcu_dereference_protected() usage!
[   56.926175]
[   56.926175] other info that might help us debug this:
[   56.926175]
[   56.926177]
[   56.926177] rcu_scheduler_active = 1, debug_locks = 1
[   56.926179] 3 locks held by swapper/4/0:
[   56.926180]  #0:  (((&req->timer))){+.-...}, at: [<ffffffff810e79b5>] call_timer_fn+0x5/0x340
[   56.926203]  #1:  (&(&req->lock)->rlock){+.-...}, at: [<ffffffffa000c29b>] disc_timeout+0x1b/0xd0 [tipc]
[   56.926212]  #2:  (rcu_read_lock){......}, at: [<ffffffffa00055e0>] tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926218]
[   56.926218] stack backtrace:
[   56.926221] CPU: 4 PID: 0 Comm: swapper/4 Not tainted 4.7.0-rc1+ #160
[   56.926222] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2007
[   56.926224]  0000000000000000 ffff880016803d28 ffffffff813c4423 ffff8800154252c0
[   56.926227]  0000000000000001 ffff880016803d58 ffffffff810b7512 ffff8800124d8120
[   56.926230]  ffff880013f8a160 ffff8800132b5ccc ffff8800124d8120 ffff880016803d88
[   56.926234] Call Trace:
[   56.926235]  <IRQ>  [<ffffffff813c4423>] dump_stack+0x67/0x94
[   56.926250]  [<ffffffff810b7512>] lockdep_rcu_suspicious+0xe2/0x120
[   56.926256]  [<ffffffffa00051f1>] tipc_l2_send_msg+0x131/0x1c0 [tipc]
[   56.926261]  [<ffffffffa000567c>] tipc_bearer_xmit_skb+0x14c/0x2e0 [tipc]
[   56.926266]  [<ffffffffa00055e0>] ? tipc_bearer_xmit_skb+0xb0/0x2e0 [tipc]
[   56.926273]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926278]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926283]  [<ffffffffa000c2d6>] disc_timeout+0x56/0xd0 [tipc]
[   56.926288]  [<ffffffff810e7a68>] call_timer_fn+0xb8/0x340
[   56.926291]  [<ffffffff810e79b5>] ? call_timer_fn+0x5/0x340
[   56.926296]  [<ffffffffa000c280>] ? tipc_disc_init_msg+0x1f0/0x1f0 [tipc]
[   56.926300]  [<ffffffff810e8f4a>] run_timer_softirq+0x23a/0x390
[   56.926306]  [<ffffffff810f89ff>] ? clockevents_program_event+0x7f/0x130
[   56.926316]  [<ffffffff819727c3>] __do_softirq+0xc3/0x4a2
[   56.926323]  [<ffffffff8106ba5a>] irq_exit+0x8a/0xb0
[   56.926327]  [<ffffffff81972456>] smp_apic_timer_interrupt+0x46/0x60
[   56.926331]  [<ffffffff81970a49>] apic_timer_interrupt+0x89/0x90
[   56.926333]  <EOI>  [<ffffffff81027fda>] ? default_idle+0x2a/0x1a0
[   56.926340]  [<ffffffff81027fd8>] ? default_idle+0x28/0x1a0
[   56.926342]  [<ffffffff810289cf>] arch_cpu_idle+0xf/0x20
[   56.926345]  [<ffffffff810adf0f>] default_idle_call+0x2f/0x50
[   56.926347]  [<ffffffff810ae145>] cpu_startup_entry+0x215/0x3e0
[   56.926353]  [<ffffffff81040ad9>] start_secondary+0xf9/0x100

The warning appears as rtnl_dereference() is wrongly used in
tipc_l2_send_msg() under RCU read lock protection. Instead the proper
usage should be that rcu_dereference_rtnl() is called here.

Fixes: 5b7066c3dd24 ("tipc: stricter filtering of packets in bearer layer")
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: Ying Xue <ying.xue@windriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'macsec-fixes'

Sabrina Dubroca says:

====================
macsec fixes

Patch 1 adds rcu_barrier() during module unload to prevent possible
panics.

Patch 2 allocates memory for scattergather lists and the IV on the
heap, since they can escape the current function's context during
crypto callbacks.

Patch 3 fixes a failure to create secure associations.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

macsec: fix SA initialization

The ASYNC flag prevents initialization on some physical machines.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

macsec: allocate sg and iv on the heap

For the crypto callbacks to work properly, we cannot have sg and iv on
the stack. Use kmalloc instead, with a single allocation for
aead_request + scatterlist + iv.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

macsec: add rcu_barrier() on module exit

Without this, the various uses of call_rcu could cause a kernel panic.

Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver")
Signed-off-by: Sabrina Dubroca <sd@queasysnail.net>
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

htb: call qdisc_root with rcu read lock held

saw a debug splat:
net/include/net/sch_generic.h:287 suspicious rcu_dereference_check() usage!
other info that might help us debug this:
rcu_scheduler_active = 1, debug_locks = 0
2 locks held by kworker/2:1/710:
#0: ("events"){.+.+.+}, at: [<ffffffff8106ca1d>]
#1: ((&q->work)){+.+...}, at: [<ffffffff8106ca1d>] process_one_work+0x14d/0x690
Workqueue: events htb_work_func
Call Trace:
[<ffffffff812dc763>] dump_stack+0x85/0xc2
[<ffffffff8109fee7>] lockdep_rcu_suspicious+0xe7/0x120
[<ffffffff814ced47>] htb_work_func+0x67/0x70

Signed-off-by: Florian Westphal <fw@strlen.de>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net sched actions: bug fix dumping actions directly didnt produce NLMSG_DONE

This refers to commands to direct action access as follows:

sudo tc actions add action drop index 12
sudo tc actions add action pipe index 10

And then dumping them like so:
sudo tc actions ls action gact

iproute2 worked because it depended on absence of TCA_ACT_TAB TLV
as end of message.
This fix has been tested with iproute2 and is backward compatible.

Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

act_ipt: fix a bind refcnt leak

And avoid calling tcf_hash_check() twice.

Fixes: a57f19d30b2d ("net sched: ipt action fix late binding")
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: prio: insure proper transactional behavior

Now prio_init() can return -ENOMEM, it also has to make sure
any allocated qdiscs are freed, since the caller (qdisc_create()) wont
call ->destroy() handler for us.

More generally, we want a transactional behavior for "tc qdisc
change ...", so prio_tune() should not make modifications if
any error is returned.

It means that we must validate parameters and allocate missing qdisc(s)
before taking root qdisc lock exactly once, to not leave the prio qdisc
in an intermediate state.

Fixes: cbdf45116478 ("net_sched: prio: properly report out of memory errors")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx4_en: initialize cmd.context_lock spinlock earlier

Maciej Żenczykowski reported lockdep warning a spinlock
was not registered before being held in mlx4_cmd_wake_completions()

cmd.context_lock initialization is not at the right place.

1) mlx4_cmd_use_events() can be called multiple times.
   Calling spin_lock_init() on a live spinlock can lead
   to hangs.

2) mlx4_cmd_wake_completions() can be called while lock
   has not been initialized.
   Lockdep complains, and current logic is not race prone.

It seems better to move the initialization earlier in
mlx4_load_one()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Maciej Żenczykowski <maze@google.com>
Cc: Eugenia Emantayev <eugenia@mellanox.com>
Cc: Tariq Toukan <tariqt@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

netfilter: nf_tables: fix a wrong check to skip the inactive rules

nft_genmask_cur has already done left-shift operator on the gencursor,
so there's no need to do left-shift operator on it again.

Fixes: ea4bd995b0f2 ("netfilter: nf_tables: add transaction helper functions")
Cc: Patrick McHardy <kaber@trash.net>
Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: fix wrong destroy anonymous sets if binding fails

When we add a nft rule like follows:
# nft add rule filter test tcp dport vmap {1: jump test}
-ELOOP error will be returned, and the anonymous set will be
destroyed.

But after that, nf_tables_abort will also try to remove the
element and destroy the set, which was already destroyed and
freed.

If we add a nft wrong rule, nft_tables_abort will do the cleanup
work rightly, so nf_tables_set_destroy call here is redundant and
wrong, remove it.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: nf_tables: reject loops from set element jump to chain

Liping Zhang says:

"Users may add such a wrong nft rules successfully, which will cause an
endless jump loop:

  # nft add rule filter test tcp dport vmap {1: jump test}

This is because before we commit, the element in the current anonymous
set is inactive, so osp->walk will skip this element and miss the
validate check."

To resolve this problem, this patch passes the generation mask to the
walk function through the iter container structure depending on the code
path:

1) If we're dumping the elements, then we have to check if the element
   is active in the current generation. Thus, we check for the current
   bit in the genmask.

2) If we're checking for loops, then we have to check if the element is
   active in the next generation, as we're in the middle of a
   transaction. Thus, we check for the next bit in the genmask.

Based on original patch from Liping Zhang.

Reported-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Tested-by: Liping Zhang <liping.zhang@spreadtrum.com>

netfilter: nf_tables: fix wrong check of NFT_SET_MAP in nf_tables_bind_set

We should check "i" is used as a dictionary or not, "binding" is already
checked before.

Signed-off-by: Liping Zhang <liping.zhang@spreadtrum.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

netfilter: conntrack: destroy kmemcache on module removal

I forgot to move the kmem_cache_destroy into the exit path.

Fixes: 0c5366b3a8c7 ("netfilter: conntrack: use single slab cache)
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>

Merge branch 'ovs-notifications'

Nicolas Dichtel says:

====================
ovs: fix rtnl notifications on interface deletion

There was no rtnl notifications for interfaces (gre, vxlan, geneve) created
by ovs. This problem is fixed by adjusting the creation path.

v1 -> v2:
- add patch #1 and #4
- rework error handling in patch #2
====================

Acked-by: Pravin B Shelar <pshelar@ovn.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ovs/geneve: fix rtnl notifications on iface deletion

The function geneve_dev_create_fb() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: e305ac6cf5a1 ("geneve: Add support to collect tunnel metadata.")
CC: Pravin B Shelar <pshelar@nicira.com>
CC: Jesse Gross <jesse@nicira.com>
CC: Thomas Graf <tgraf@suug.ch>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ovs/gre: fix rtnl notifications on iface deletion

The function gretap_fb_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Fixes: b2acd1dc3949 ("openvswitch: Use regular GRE net_device instead of vport")
CC: Thomas Graf <tgraf@suug.ch>
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ovs/vxlan: fix rtnl notifications on iface deletion

The function vxlan_dev_create() (only used by ovs) never calls
rtnl_configure_link(). The consequence is that dev->rtnl_link_state is
never set to RTNL_LINK_INITIALIZED.
During the deletion phase, the function rollback_registered_many() sends
a RTM_DELLINK only if dev->rtnl_link_state is set to RTNL_LINK_INITIALIZED.

Note that the function vxlan_dev_create() is moved after the rtnl stuff so
that vxlan_dellink() can be called in this function.

Fixes: dcc38c033b32 ("openvswitch: Re-add CONFIG_OPENVSWITCH_VXLAN")
CC: Thomas Graf <tgraf@suug.ch>
CC: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ovs/gre,geneve: fix error path when creating an iface

After ipgre_newlink()/geneve_configure() call, the netdev is registered.

Fixes: 7e059158d57b ("vxlan, gre, geneve: Set a large MTU on ovs-created tunnel devices")
CC: David Wragg <david@weave.works>
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp reuseport: fix packet of same flow hashed to different socket

There is a corner case in which udp packets belonging to a same
flow are hashed to different socket when hslot->count changes from 10
to 11:

1) When hslot->count <= 10, __udp_lib_lookup() searches udp_table->hash,
and always passes 'daddr' to udp_ehashfn().

2) When hslot->count > 10, __udp_lib_lookup() searches udp_table->hash2,
but may pass 'INADDR_ANY' to udp_ehashfn() if the sockets are bound to
INADDR_ANY instead of some specific addr.

That means when hslot->count changes from 10 to 11, the hash calculated by
udp_ehashfn() is also changed, and the udp packets belonging to a same
flow will be hashed to different socket.

This is easily reproduced:
1) Create 10 udp sockets and bind all of them to 0.0.0.0:40000.
2) From the same host send udp packets to 127.0.0.1:40000, record the
socket index which receives the packets.
3) Create 1 more udp socket and bind it to 0.0.0.0:44096. The number 44096
is 40000 + UDP_HASH_SIZE(4096), this makes the new socket put into the
same hslot as the aformentioned 10 sockets, and makes the hslot->count
change from 10 to 11.
4) From the same host send udp packets to 127.0.0.1:40000, and the socket
index which receives the packets will be different from the one received
in step 2.
This should not happen as the socket bound to 0.0.0.0:44096 should not
change the behavior of the sockets bound to 0.0.0.0:40000.

It's the same case for IPv6, and this patch also fixes that.

Signed-off-by: Su, Xuemin <suxm@chinanetcenter.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: fix pfifo_head_drop behavior vs backlog

When the qdisc is full, we drop a packet at the head of the queue,
queue the current skb and return NET_XMIT_CN

Now we track backlog on upper qdiscs, we need to call
qdisc_tree_reduce_backlog(), even if the qlen did not change.

Fixes: 2ccccf5fb43f ("net_sched: update hierarchical backlog too")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: WANG Cong <xiyou.wangcong@gmail.com>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: alx: Work around the DMA RX overflow issue

Commit 26c5f03 uses a new skb allocator to avoid the RFD overflow
issue.

But from debugging without datasheet, we found the error always
happen when the DMA RX address is set to 0x....fc0, which is very
likely to be a HW/silicon problem.

So one idea is instead of adding a new allocator, why not just
hitting the right target by avaiding the error-prone DMA address?

This patch will actually
* Remove the commit 26c5f03
* Apply rx skb with 64 bytes longer space, and if the allocated skb
has a 0x...fc0 address, it will use skb_resever(skb, 64) to
advance the address, so that the RX overflow can be avoided.

In theory this method should also apply to atl1c driver, which
I can't find anyone who can help to test on real devices.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=70761
Signed-off-by: Feng Tang <feng.tang@intel.com>
Suggested-by: Eric Dumazet <edumazet@google.com>
Tested-by: Ole Lukoie <olelukoie@mail.ru>
Acked-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv4: fix checksum annotation in udp4_csum_init

Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Tom Herbert <tom@herbertland.com>
Fixes: 4068579e1e098fa ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix checksum annotation in udp6_csum_init

Cc: Tom Herbert <tom@herbertland.com>
Fixes: 4068579e1e098fa ("net: Implmement RFC 6936 (zero RX csums for UDP/IPv6")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: tcp: fix endianness annotation in tcp_v6_send_response

Cc: Florent Fourcot <florent.fourcot@enst-bretagne.fr>
Fixes: 1d13a96c74fc ("ipv6: tcp: fix flowlabel value in ACK messages send from TIME_WAIT")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: fix endianness error in icmpv6_err

IPv6 ping socket error handler doesn't correctly convert the new 32 bit
mtu to host endianness before using.

Cc: Lorenzo Colitti <lorenzo@google.com>
Fixes: 6d0bfe22611602f ("net: ipv6: Add IPv6 support to the ping socket.")
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phy: marvell: fix LED configuration via marvell,reg-init

Configuring the PHY LED registers for the Marvell 88E1510 and others is
not possible, because regardless of the values in marvell,reg-init, it
is later overridden in m88e1121_config_aneg with a non-standard default.

This patch moves that default configuration to .config_init to allow
setting the LED configuration through marvell,reg-init in the device
tree, which should override said default if it exists.

Signed-off-by: Clemens Gruber <clemens.gruber@pqgruber.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ti: cpsw: use destroy ctlr to destroy channels

There is no reason to destroy channels that are destroyed while
cpdma_ctlr destroy. In this case no need to remember how much
channels where created and destroy them by one, as cpdma_ctlr
destroys all of them.

Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net_sched: prio: properly report out of memory errors

At Qdisc creation or change time, prio_tune() creates missing
pfifo qdiscs but does not return an error code if one
qdisc could not be allocated.

Leaving a qdisc in non operational state without telling user
anything about this problem is not good.

Also, testing if we replace something different than noop_qdisc
a second time makes no sense so I removed useless code.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipconfig: Protect ic_addrservaddr with IPCONFIG_DYNAMIC.

>> net/ipv4/ipconfig.c:130:15: warning: 'ic_addrservaddr' defined but not used [-Wunused-variable]
static __be32 ic_addrservaddr = NONE; /* IP Address of the IP addresses'server */

Reported-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: au1000_eth: fix PHY detection

Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541
("phy: Add API for {un}registering an mdio device to a bus.")
broke PHY detection on this driver with a copy-paste bug:
The code is looking 32 times for a PHY at address 0.

Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have
their (autodetected) PHYs at address 31.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: au1000_eth: fix PHY detection"

This reverts commit a2f27217e4e60e663b5b971b0ccb287a9548b04e.

I applied the wrong version of this.

Signed-off-by: David S. Miller <davem@davemloft.net>

net: au1000_eth: fix PHY detection

Commit 7f854420fbfe9d49afe2ffb1df052cfe8e215541
("phy: Add API for {un}registering an mdio device to a bus.")
broke PHY detection on this driver with a copy-paste bug:
The code is looking 32 times for a PHY at address 0.

Fixes ethernet on AMD DB1100/DB1500/DB1550 boards which have
their (autodetected) PHYs at address 31.

Cc: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Manuel Lauss <manuel.lauss@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mediatek-fixes'

John Crispin says:

====================
net: mediatek: various small fixes

This series contains various small fixes that we stumbled across while
doing thorough testing and code level reviewing of the driver.

Changes in V2:
* drop the DQL patch from the list until a better solution is found
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: remove superfluous queue wake up call

The code checks if the queue should be stopped because we are below the
threshold of free descriptors only to check if it should be started again.
If we do end up in a state where we are at the threshold limit, it makes
more sense to just stop the queue and wait for the next IRQ to trigger the
TX housekeeping again. There is no rush in enqueuing the next packet, it
needs to wait for all the others in the queue to be dispatched first
anyway.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: only wake the queue if it is stopped

The current code unconditionally wakes up the queue at the end of each
tx_poll action. Change the code to only wake up the queues if any of
them have actually been stopped before.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: fix off by one in the TX ring allocation

The TX ring setup has an off by one error causing it to not utilise all
descriptors. This has the side effect that we need to reset the next
pointer at runtime to make it work. Fix the off by one and remove the
code fixing the ring at runtime.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: increase watchdog_timeo

During stress testing, after reducing the threshold value, we have seen
TX timeouts that were caused by the watchdog_timeo value being too low.
Increase the value to 5 * HZ which is a value commonly used by many other
drivers.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: fix threshold value

The logic to calculate the threshold value for stopping the TX queue is
bad. Currently it will always use 1/2 of the rings size, which is way too
much. Set the threshold to MAX_SKB_FRAGS. This makes sure that the queue
is stopped when there is not enough room to accept an additional segment.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: disable all interrupts during probe

The current code only disables those IRQs that we will later use. To
ensure that we have a predefined state, we really want to disable all IRQs.
Change the code to disable all IRQs to achieve this.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: add next data pointer coherency protection

The QDMA engine can fail to update the register pointing to the next TX
descriptor if this bit does not get set in the QDMA configuration register.
Not setting this bit can result in invalid values inside the TX rings
registers which will causes TX stalls.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: dropped rx packets are not being counted properly

There are two places inside mtk_poll_rx where rx_dropped is not being
incremented properly. Fix this by adding the missing code to increment
the counter.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: invalid buffer lookup in mtk_tx_map()

The lookup of the tx_buffer in the error path inside mtk_tx_map() uses the
wrong descriptor pointer. This looks like a copy & paste error. Change the
code to use the correct pointer.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: fix missing free of scratch memory

Scratch memory gets allocated in mtk_init_fq_dma() but the corresponding
code to free it is missing inside mtk_dma_free() causing a memory leak.
With this patch applied, we can run ifconfig up/down several thousand
times without any problems.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: mediatek: add missing return code check

The code fails to check if the scratch memory was properly allocated. Add
this check and return with an error if the allocation failed.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ipconfig: avoid warning by making ic_addrservaddr static

The symbol ic_addrservaddr is not static, but has no declaration
to match so make it static to fix the following warning:

net/ipv4/ipconfig.c:130:8: warning: symbol 'ic_addrservaddr' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: diag: add missing declarations

The functions inet_diag_msg_common_fill and inet_diag_msg_attrs_fill
seem to have been missed from the include/linux/inet_diag.h header
file. Add them to fix the following warnings:

net/ipv4/inet_diag.c:69:6: warning: symbol 'inet_diag_msg_common_fill' was not declared. Should it be static?
net/ipv4/inet_diag.c:108:5: warning: symbol 'inet_diag_msg_attrs_fill' was not declared. Should it be static?

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Fix incorrect re-injection of STP packets

Commit 8626c56c8279 ("bridge: fix potential use-after-free when hook
returns QUEUE or STOLEN verdict") fixed incorrect usage of NF_HOOK's
return value by consuming packets in okfn via br_pass_frame_up().

However, this function re-injects packets to the Rx path with skb->dev
set to the bridge device, which breaks kernel's STP, as all STP packets
appear to originate from the bridge device itself.

Instead, if STP is enabled and bridge isn't a 802.1ad bridge, then learn
packet's SMAC and inject it back to the Rx path for further processing
by the packet handlers.

The patch also makes netfilter's behavior consistent with regards to
packets destined to the Bridge Group Address, as no hook registered at
LOCAL_IN will ever be called, regardless if STP is enabled or not.

Cc: Florian Westphal <fw@strlen.de>
Cc: Shmulik Ladkani <shmulik.ladkani@gmail.com>
Cc: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp>
Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict")
Signed-off-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: smsc: reintroduced unconditional soft reset

We detected some problems using the smsc lan8720a in combination with
i.MX28 and tracked this down to commit 21009686662f ("net: phy: smsc: move
smsc_phy_config_init reset part in a soft_reset function")
With 2100968666 the generic soft reset is replaced by a specific function
which handles power down state correctly. But additionally the soft reset
itself got conditional and is therefore also only performed if the phy is
in power down state.

This patch keeps the conditional wake up from power down, but
re-introduces the unconditional soft reset using the generic soft reset
function.
It was tested on linux-4.1.25 and linux-4.7.0-rc2.

Signed-off-by: Manfred Schlaegl <manfred.schlaegl@ginzinger.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net

Pull networking fixes from David Miller:

1) nfnetlink timestamp taken from wrong skb, fix from Florian Westphal.

2) Revert some msleep conversions in rtlwifi as these spots are in
    atomic context, from Larry Finger.

3) Validate that NFTA_SET_TABLE attribute is actually specified when we
    call nf_tables_getset().  From Phil Turnbull.

4) Don't do mdio_reset in stmmac driver with spinlock held as that can
    sleep, from Vincent Palatin.

5) sk_filter() does things other than run a BPF filter, so we should
    not elide it's call just because sk->sk_filter is NULL.  Fix from
    Eric Dumazet.

6) Fix missing backlog updates in several packet schedulers, from Cong
    Wang.

7) bnx2x driver should allow VLAN add/remove while the interface is
    down, from Michal Schmidt.

8) Several RDS/TCP race fixes from Sowmini Varadhan.

9) fq_codel scheduler doesn't return correct queue length in dumps,
    from Eric Dumazet.

10) Fix TCP stats for tail loss probe and early retransmit in ipv6, from
    Yuchung Cheng.

11) Properly initialize udp_tunnel_socket_cfg in l2tp_tunnel_create(),
    from Guillaume Nault.

12) qfq scheduler leaks SKBs if a kzalloc fails, fix from Florian
    Westphal.

13) sock_fprog passed into PACKET_FANOUT_DATA needs compat handling,
    from Willem de Bruijn.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (85 commits)
  vmxnet3: segCnt can be 1 for LRO packets
  packet: compat support for sock_fprog
  stmmac: fix parameter to dwmac4_set_umac_addr()
  net/mlx5e: Fix blue flame quota logic
  net/mlx5e: Use ndo_stop explicitly at shutdown flow
  net/mlx5: E-Switch, always set mc_promisc for allmulti vports
  net/mlx5: E-Switch, Modify node guid on vf set MAC
  net/mlx5: E-Switch, Fix vport enable flow
  net/mlx5: E-Switch, Use the correct error check on returned pointers
  net/mlx5: E-Switch, Use the correct free() function
  net/mlx5: Fix E-Switch flow steering capabilities check
  net/mlx5: Fix flow steering NIC capabilities check
  net/mlx5: Fix root flow table update
  net/mlx5: Fix MLX5_CMD_OP_MAX to be defined correctly
  net/mlx5: Fix masking of reserved bits in XRCD number
  net/mlx5: Fix the size of modify QP mailbox
  mlxsw: spectrum: Don't sleep during ndo_get_phys_port_name()
  mlxsw: spectrum: Make split flow match firmware requirements
  wext: Fix 32 bit iwpriv compatibility issue with 64 bit Kernel
  cfg80211: remove get/set antenna and tx power warnings
  ...

Merge tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound

Pull sound fixes from Takashi Iwai:
"We have only few, mainly HD-audio device-specific fixes.  Realtek
  codec driver got a slightly more LOC, but they are all for the new
  codec chip, and won't affect others at all"

* tag 'sound-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
  ALSA: hda - Add PCI ID for Kabylake
  ALSA: hda/realtek: Add T560 docking unit fixup
  ALSA: hda - Fix headset mic detection problem for Dell machine
  ALSA: uapi: Add three missing header files to Kbuild file
  ALSA: hda/realtek - Add support for new codecs ALC700/ALC701/ALC703
  ALSA: hda/realtek - ALC256 speaker noise issue

Merge tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux

Pull drm fixes from Dave Airlie:
"This weeks instalment of fixes:

  amdgpu:
     Lots of memory leak and firmware leak fixes

  nouveau:
     Collection of display fixes, KASAN fixes

  vc4:
     vblank/pageflipping fixes

  fsl-dcu:
     Regmap cache fix

  omap:
     Unused variable warning fix.

  Nothing too surprising so far"

* tag 'drm-fixes-for-v4.7-rc3' of git://people.freedesktop.org/~airlied/linux: (46 commits)
  drm/amdgpu: fix warning with powerplay disabled.
  drm/amd/powerplay: delete useless code as pptable changed in vbios.
  drm/amd/powerplay: fix bug visit array out of bounds
  drm/amdgpu: fix smu ucode memleak (v2)
  drm/amdgpu: add release firmware for cgs
  drm/amdgpu: fix tonga smu_fini mem leak
  drm/amdgpu: fix fiji smu fini mem leak
  drm/amdgpu: fix cik sdma ucode memleak
  drm/amdgpu: fix sdma24 ucode mem leak
  drm/amdgpu: fix sdma3 ucode mem leak
  drm/amdgpu: fix uvd fini mem leak
  drm/amdgpu: fix gfx 7 ucode mem leak
  drm/amdgpu: fix gfx8 ucode mem leak
  drm/amdgpu: fix missing free wb for cond_exec
  drm/amdgpu: fix memleak in pptable_init
  drm/amdgpu: fix mem leak in atombios
  drm/amdgpu: fix mem leak in pplib/hwmgr
  drm/amdgpu: fix mem leak in smumgr
  drm/amdgpu: add pipeline sync while vmid switch in same ctx
  drm/amdgpu: vBIOS post only call when mem_size zero
  ...

Merge tag 'acpi-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull ACPI fix from Rafael Wysocki:
"A recently introduced boot regression related to the ACPI EC
  initialization is addressed by restoring the previous behavior (Lv
  Zheng)"

* tag 'acpi-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  ACPI / EC: Fix a boot EC regresion by restoring boot EC support for the DSDT EC

Merge tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull power management fixes from Rafael Wysocki:
"Stable-candidate fixes for the intel_pstate driver and the cpuidle
  core.

  Specifics:

   - Fix two intel_pstate initialization issues, one of which was
     introduced during the 4.4 cycle (Srinivas Pandruvada)

   - Fix kernel build with CONFIG_UBSAN set and CONFIG_CPU_IDLE unset
     (Catalin Marinas)"

* tag 'pm-4.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  cpufreq: intel_pstate: Fix ->set_policy() interface for no_turbo
  cpufreq: intel_pstate: Fix code ordering in intel_pstate_set_policy()
  cpuidle: Do not access cpuidle_devices when !CONFIG_CPU_IDLE

Merge branch 'akpm' (patches from Andrew)

Merge misc fixes from Andrew Morton:
"7 fixes"

* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
  mm/fadvise.c: do not discard partial pages with POSIX_FADV_DONTNEED
  mm: introduce dedicated WQ_MEM_RECLAIM workqueue to do lru_add_drain_all
  kernel/relay.c: fix potential memory leak
  mm: thp: broken page count after commit aa88b68c3b1d
  revert "mm: memcontrol: fix possible css ref leak on oom"
  kasan: change memory hot-add error messages to info messages
  mm/hugetlb: fix huge page reserve accounting for private mappings

vmxnet3: segCnt can be 1 for LRO packets

The device emulation may send segCnt of 1 for LRO packets.

Signed-off-by: Shrikrishna Khare <skhare@vmware.com>
Signed-off-by: Jin Heo <heoj@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

packet: compat support for sock_fprog

Socket option PACKET_FANOUT_DATA takes a struct sock_fprog as argument
if PACKET_FANOUT has mode PACKET_FANOUT_CBPF. This structure contains
a pointer into user memory. If userland is 32-bit and kernel is 64-bit
the two disagree about the layout of struct sock_fprog.

Add compat setsockopt support to convert a 32-bit compat_sock_fprog to
a 64-bit sock_fprog. This is analogous to compat_sock_fprog support for
SO_REUSEPORT added in commit 1957598840f4 ("soreuseport: add compat
case for setsockopt SO_ATTACH_REUSEPORT_CBPF").

Reported-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Willem de Bruijn <willemb@google.com>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

stmmac: fix parameter to dwmac4_set_umac_addr()

The dwmac4_set_umac_addr() takes a struct mac_device_info as
the first parameter, but is being passed a ioaddr instead from
dwmac4_set_filter(). Fix the warning/bug by changing the first
parameter.

drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: warning: incorrect type in argument 1 (different address spaces)
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: expected struct mac_device_info *hw
drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c:159:46: got void [noderef] <asn:2>*ioaddr

Note, only compile tested this as do not have any
hardware with it in.

Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
Acked-by: Giuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mlx5-fixes'

Saeed Mahameed says:

====================
Mellanox 100G mlx5 fixes for 4.7-rc

The following series provides some small fixes for mlx5 driver.

Two small fixes for the mlx5e netdev, the 1st is for the blue flame
quota accounting and the 2nd is a small refactoring in shutdown flow.

Five trivial fixes for mlx5 E-Switch.
- Allmulti mc_promisc flag was not set in a specific flow.
- Modify VF node guid when admin mac is changed.
- Race in vport enable flow.
- Misc code fixes (kvfree when needed and error pointers checking).

Three in mlx5 steering area. Correct capabilities checking and root flow table update.

Three misc fixes in mlx5 commands enum and layouts.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: Fix blue flame quota logic

Blue flame is a latency enhancement feature that allows the driver to
write the packet data directly to the NIC's registers thus making the
read of the packet data from host memory redundant.

We maintain a quota for the blue flame which is reloaded whenever we
identify that the hardware is processing send requests and processes
them fast enough so by the time we post the next send request it was
able to process all the pending ones. This indicates that the hardware
is capable of processing more blue flame requests efficiently. The blue
flame quota is decremented whenever we send using blue flame.

The current code erroneously clears the budget if we did not use blue
flame for the current post send operation and we fix it here.

Fixes: 88a85f99e51f ('net/mlx5e: TX latency optimization to save DMA reads')
Signed-off-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>