git.proxmox.com Git - mirror_ubuntu-kernels.git/log

net: introduce SO_PEERGROUPS getsockopt

This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
retrieve the auxiliary groups of the remote peer. It is designed to
naturally extend SO_PEERCRED. That is, the underlying data is from the
same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
is, if the provided buffer is too small, ERANGE is returned and @optlen
is updated. Otherwise, the information is copied, @optlen is set to the
actual size, and 0 is returned.

While SO_PEERCRED (and thus `struct ucred') already returns the primary
group, it lacks the auxiliary group vector. However, nearly all access
controls (including kernel side VFS and SYSVIPC, but also user-space
polkit, DBus, ...) consider the entire set of groups, rather than just
the primary group. But this is currently not possible with pure
SO_PEERCRED. Instead, user-space has to work around this and query the
system database for the auxiliary groups of a UID retrieved via
SO_PEERCRED.

Unfortunately, there is no race-free way to query the auxiliary groups
of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
solution is to use getgrouplist(3p), which itself falls back to NSS and
whatever is configured in nsswitch.conf(3). This effectively checks
which groups we *would* assign to the user if it logged in *now*. On
normal systems it is as easy as reading /etc/group, but with NSS it can
resort to quering network databases (eg., LDAP), using IPC or network
communication.

Long story short: Whenever we want to use auxiliary groups for access
checks on IPC, we need further IPC to talk to the user/group databases,
rather than just relying on SO_PEERCRED and the incoming socket. This
is unfortunate, and might even result in dead-locks if the database
query uses the same IPC as the original request.

So far, those recursions / dead-locks have been avoided by using
primitive IPC for all crucial NSS modules. However, we want to avoid
re-inventing the wheel for each NSS module that might be involved in
user/group queries. Hence, we would preferably make DBus (and other IPC
that supports access-management based on groups) work without resorting
to the user/group database. This new SO_PEERGROUPS ioctl would allow us
to make dbus-daemon work without ever calling into NSS.

Cc: Michal Sekletar <msekleta@redhat.com>
Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
Reviewed-by: Tom Gundersen <teg@jklm.no>
Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: prefetch rmem_alloc in udp_queue_rcv_skb()

On UDP packets processing, if the BH is the bottle-neck, it
always sees a cache miss while updating rmem_alloc; try to
avoid it prefetching the value as soon as we have the socket
available.

Performances under flood with multiple NIC rx queues used are
unaffected, but when a single NIC rx queue is in use, this
gives ~10% performance improvement.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Fix compilation without QED_RDMA

When CONFIG_QED_RDMA isn't defined, we'd hit the following:

/include/linux/qed/qede_rdma.h:84:19:
warning: ‘qede_rdma_dev_add’ used but never defined [enabled by default]
static inline int qede_rdma_dev_add(struct qede_dev *dev);

Fixes: bbfcd1e8e167 ("qed*: Set rdma generic functions prefix")
Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8152: correct the definition

Replace VLAN_HLEN and CRC_SIZE with ETH_FCS_LEN.

Signed-off-by: Hayes Wang <hayeswang@realtek.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Jeff Kirsher says:

====================
40GbE Intel Wired LAN Driver Updates 2017-06-20

This series contains updates to i40e and i40evf only.

Björn adds additional XDP support for i40e, by adding pass and drop actions
and XDP_TX action support.

Jake fixes a possible NULL pointer dereference in
i40evf_get_ethtool_stats() which could occur if the VF fails to recover
from a reset, and then a user requests statistics.  Changed the use of
dev_info() to dev_dbg() for vf_capability client routine so that the
standard log is not spammed with this information which "might" cause
administrators to worry.  Also added more code comments to help explain
why udp_port has be in host byte order and to avoid future changes which
may cause this to break.  Fixed the holding of the RTNL lock for the
entire reset routine, reduced the scope so that the reset function will
handle its own lock, so that we do not have to wrap every reference
to i40e_do_reset() with RTNL lock/unlock.

Alice updates flags related to firmware interactions for WoL and admin
queue command address with the correct value.

Sudheer makes a fix to ensure that the array is not accessed past the
size of the array.

Greg fixes the parsing of firmware 4.33 admin queue commmand "Get CEE
DCBX PER CFG" because the firmware now creates the oper_prio_tc nibbles
reversed from those in the CDD Priority Group sub-TLV.

Carolyn adds a check and message to let users know that when in MFP mode,
changing RSS hash input set is not supported.

Shannon makes the partition bandwidth control more generic since it is not
in just one form of multi-function partitioning (MFP).  Also fixes a bug
which was causing the firmware confusion in some reset sequences, when
we were disabling interrupts and we were clearing the whole register.
Instead we should only be clearing the CAUSE_ENA bit when disabling
interrupts.

Filip adds support for OEM firmware version, so that if a OEM specific
adapter is detected, ethtool reports the OEM product version in the
firmware version string instead of etrack id.

Alan fixes a bug where the driver was not correctly exiting overflow
promiscuous mode, which can happen if "too many" MAC filters are added,
putting the driver into overflow promiscuous mode, and the filters are
then removed.  The bug occurs because the conditional for toggling
promiscuous mode was only be executed when enabled and not when it was
disabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipmr-ip6mr-add-Netlink-notifications-on-cache-reports'

Julien Gomes says:

====================
ipmr/ip6mr: add Netlink notifications on cache reports

Currently, all ipmr/ip6mr cache reports are sent through the
mroute/mroute6 socket only.
This forces the use of a single socket for mroute programming, cache
reports and, regarding ipmr, IGMP messages without Router Alert option
reception.

The present patches are aiming to send Netlink notifications in addition
to the existing igmpmsg/mrt6msg to give user programs a way to handle
cache reports in parallel with multiple sockets other than the
mroute/mroute6 socket.

Changes in v2:
- Changed attributes naming from {IPMRA,IP6MRA}_CACHEREPORTA_* to
  {IPMRA,IP6MRA}_CREPORT_*
- Improved packet data copy to handle non-linear packets in
  ipmr/ip6mr cache report Netlink notification creation
- Added two rtnetlink groups with restricted-binding
- Changed cache report notified groups from RTNL_{IPV4,IPV6}_MROUTE to
  the new restricted groups in ipmr/ip6mr

Changes in v3:
- Put message size calculation for {igmp,mrt6}msg_netlink_event in separate
  functions
- Increased vif id attributes size from u8 to u32
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

ip6mr: add netlink notifications on mrt6msg cache reports

Add Netlink notifications on cache reports in ip6mr, in addition to the
existing mrt6msg sent to mroute6_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.

MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the mrt6msg header.
PKT attribute is the packet sent to mroute6_sk, without the added
mrt6msg header.

Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipmr: add netlink notifications on igmpmsg cache reports

Add Netlink notifications on cache reports in ipmr, in addition to the
existing igmpmsg sent to mroute_sk.
Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.

MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
same data as their equivalent fields in the igmpmsg header.
PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
header.

Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute

Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
NETLINK_ROUTE family.
Binding to these groups specifically requires CAP_NET_ADMIN to allow
multicast of sensitive messages (e.g. mroute cache reports).

Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rtnetlink: add NEWCACHEREPORT message type

New NEWCACHEREPORT message type to be used for cache reports sent
via Netlink, effectively allowing splitting cache report reception from
mroute programming.

Suggested-by: Ryan Halbrook <halbrook@arista.com>
Signed-off-by: Julien Gomes <julien@arista.com>
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: hide unused variable

Changing from a memcpy to per-member comparison left the
size variable unused:

net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]

This does not show up when CONFIG_IPV6 is enabled, but the
variable can be removed either way, along with the now unused
assignment.

Fixes: 6797318e623d ("tcp: md5: add an address prefix for key lookup")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

idsn: fix wrong skb_put() used

in my commit b952f4dff2751252db073c27c0f8a16a416a2ddc,
- *(u8 *)skb_put(skb_out, 1) = (u8)(accm >> 24); \
+ skb_put(skb_out, (u8)(accm >> 24)); \
it should skb_put_u8()

Fixes: b952f4dff275 ("net: manual clean code which call skb_put_[data:zero])")
Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

i40e: don't hold RTNL lock for the entire reset

We recently refactored i40e_do_reset() and its friends to be able to
hold the RTNL lock only for the portions that actually need to be
protected. However, a separate refactoring added several new callers of
these functions during the PCIe error recovery and suspend/resume
cycles.

When merging the changes together, it was not noticed that we could
reduce the RTNL scope by letting the reset function handle the lock
itself, as previously it was not possible.

Fix this by replacing these call sites to indicate that the reset
function should handle its own lock. This enables multiple PFs to reset
or resume simultaneously without serializing the resets via the RTNL
lock. The end result is that on systems with lots of PFs and VFs the
resets don't stall waiting for each other to finish.

It is probable that we can also do the same for i40e_do_reset_safe, but
this author did not research that change carefully enough to be
confident.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Handle PE_CRITERR properly with IWARP enabled

When IWARP is enabled, we weren't clearing the PE_CRITERR, just logging
it and removing it from the mask. We need to do a corer to reset the
PE_CRITERR register, so set the bit for that as we handle the
interrupt.

We should also be checking for the error against the PFINT_ICR0 register,
and only need to clear it in the value getting written to
PFINT_ICR0_ENA.

Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: clear only cause_ena bit

When disabling interrupts, we should only be clearing the CAUSE_ENA bit,
not clearing the whole register. Clearing the whole register sets the
NEXTQ_IDX field to 0 instead of 0x7ff which can confuse the Firmware in
some reset sequences.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: fix disabling overflow promiscuous mode

There exists a bug in which the driver does not correctly exit overflow
promiscuous mode.  This can occur if "too many" mac filters are added,
putting the driver into overflow promiscuous mode, and the filters are
then removed.  When the failed filters are removed, the driver reports
exiting overflow promiscuous mode which is correct, however traffic
continues to be received as if in promiscuous mode still.

The bug occurs because the conditional for toggling promiscuous mode was
set to only execute when promiscuous mode was enabled and not when it
was disabled as well.  This patch fixes the conditional to correctly
execute when promiscuous mode is toggled and not just enabled.  Without
this patch, the driver is unable to correctly exit overflow promiscuous
mode.

Signed-off-by: Alan Brady <alan.brady@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add support for OEM firmware version

This patch adds support for OEM firmware version. If OEM specific
adapter is detected ethtool reports OEM product version in firmware
version string instead of etrack id.

Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: genericize the partition bandwidth control

Partition bandwidth control is not in just one form of MFP (multi-function
partitioning), so make the code more generic and be sure to nudge the Tx
scheduler for all MFP.

Copyright updated to 2017.

Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Add message for unsupported MFP mode

This patch adds a check and message if the device is in
MFP mode as changing RSS input set is not supported in
MFP mode.

Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Support firmware CEE DCB UP to TC map re-definition

Changes parsing of FW 4.33 AQ command Get CEE DCBX OPER CFG (0x0A07).
Change is required because FW now creates the oper_prio_tc
nibbles reversed from those in the CEE Priority Group sub-TLV.
This change will only apply to FW 4.33 as future FW versions will use a
different function to parse the CEE data.

Signed-off-by: Greg Bowers <gregory.j.bowers@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: Fix potential out of bound array access

This is a fix for the static code analysis issue where dcbcfg->numapps
could be greater than size of array (i.e dcbcfg->app[I40E_DCBX_MAX_APPS]).
The fix makes sure that the array is not accessed past the size of
of the array (i.e. I40E_DCBX_MAX_APPS).

Copyright updated to 2017.

Signed-off-by: Sudheer Mogilappagari <sudheer.mogilappagari@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: comment that udp_port must be in host byte order

The firmware expects the port number passed when setting up
the UDP tunnel configuration to be in Little Endian format.
The i40e_aq_add_udp_tunnel command byte swaps the value from
host order to Little Endian.

Since commit fe0b0cd97b4f ("i40e: send correct port number to
AdminQ when enabling UDP tunnels") we've correctly
sent the value in host order.

Let's also add a comment to the function explaining that it must
be in host order, as the port numbers are commonly stored as Big
Endian values.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: use dev_dbg instead of dev_info when warning about missing routine

When searching for the vf_capability client routine, dev_info() was
used, instead of the normal dev_dbg(). This causes the message to be
displayed at standard log levels which can cause administrators to
worry. Avoid this by using dev_dbg instead.

Copyright updated to 2017.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e/i40evf: update WOL and I40E_AQC_ADDR_VALID_MASK flags

Update a few flags related to FW interactions.

Copyright updated to 2017.

Signed-off-by: Alice Michael <alice.michael@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40evf: assign num_active_queues inside i40evf_alloc_queues

The variable num_active_queues represents the number of active queues we
have for the device. We assign this pretty early in i40evf_init_subtask.

Several code locations are written with loops over the tx_rings and
rx_rings structures, which don't get allocated until
i40evf_alloc_queues, and which get freed by i40evf_free_queues.

These call sites were written under the assumption that tx_rings and
rx_rings would always be allocated at least when num_active_queues is
non-zero.

Lets fix this by moving the assignment into the function where we
allocate queues. We'll use a temporary variable for storage so that we
don't assign the value in the adapter structure until after the rings
have been set up.

Finally, when we free the queues, we'll clear the value to ensure that
we do not loop over the rings memory that no longer exists.

This resolves a possible NULL pointer dereference in
i40evf_get_ethtool_stats which could occur if the VF fails to recover
from a reset, and then a user requests statistics.

Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add support for XDP_TX action

This patch adds proper XDP_TX action support. For each Tx ring, an
additional XDP Tx ring is allocated and setup. This version does the
DMA mapping in the fast-path, which will penalize performance for
IOMMU enabled systems. Further, debugfs support is not wired up for
the XDP Tx rings.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

i40e: add XDP support for pass and drop actions

This commit adds basic XDP support for i40e derived NICs. All XDP
actions will end up in XDP_DROP.

Signed-off-by: Björn Töpel <bjorn.topel@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

Merge tag 'mlx5-updates-2017-06-20' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux

Saeed Mahameed says:

====================
mlx5-updates-2017-06-20 (mlx5 IPoIB updates)

This series includes updates to mlx5 IPoIB netdevice driver (mlx5i),

1. We move ipoib files into separate directory, to allow it to grow
separately in its own space
2. Remove HW update carrier logic from IPoIB and VF representors profiles.
3. Add basic ethtool support. (Rings options/statistics and driver info).
4. Change MTU support.
5. Xmit path statistics reporting.
6. add PTP support.

For the new ethtool ops, PTP (ioctl) and change_mtu ndos in IPoIB, we didn't add new
implementation or new logic, we only reused those callbacks from the already existing
mlx5e (ethernet netdevice profile) and exposed them in IPoIB netdevice/ethtool ops.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-net-updates-part-2'

Julian Wiedmann says:

====================
s390/net updates, part 2 (v2)

thanks for the feedback. Here's an updated patchset that honours
the reverse christmas tree and drops the __packed attribute. Please apply.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: use diag26c to get MAC address on L2

When a s390 guest runs on a z/VM host that's part of a SSI cluster,
it can be migrated to a different host. In this case, the MAC address
it originally obtained on the old host may be re-assigned to a new
guest. This would result in address conflicts between the two guests.

When running as z/VM guest, use the diag26c MAC Service to obtain
a hypervisor-managed MAC address. The MAC Service is SSI-aware, and
won't re-assign the address after the guest is migrated to a new host.

This patch adds support for the z/VM MAC Service on L2 devices.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/diag: add diag26c support

Implement support for the hypervisor diagnose 0x26c
('Access Certain System Information').
It passes a request buffer and a subfunction code, and receives
a response buffer and a return code.

Also add the scaffolding for the 'MAC Services' subfunction.
It may be used by network devices to obtain a hypervisor-managed
MAC address.

Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Acked-by: Heiko Carstens <heiko.carstens@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: fix packing buffer statistics

There's two spots in qeth_send_packet() where we don't accurately
account for transmitted packing buffers in qeth's performance
statistics:

1) when flushing the current buffer due to insufficient size,
   and the next buffer is not EMPTY, we need to account for that
   flushed buffer.
2) when synchronizing with the TX completion code, we reset
   flush_count and thus forget to account for any previously
   flushed buffers.

Reported-by: Nils Hoppmann <niho@de.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: add ipa return codes for bridgeport

add ipa return codes for Bridgeport (HiperSockets and OSA) according to
system level design.

Signed-off-by: Kittipon Meesompop <kmeesomp@linux.vnet.ibm.com>
Reviewed-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Reviewed-by: Ursula Braun <ubraun@linux.vnet.ibm.com>
Signed-off-by: Julian Wiedmann <jwi@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: handle errors when updating asoc

It's a bad thing not to handle errors when updating asoc. The memory
allocation failure in any of the functions called in sctp_assoc_update()
would cause sctp to work unexpectedly.

This patch is to fix it by aborting the asoc and reporting the error when
any of these functions fails.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: uncork the old asoc before changing to the new one

local_cork is used to decide if it should uncork asoc outq after processing
some cmds, and it is set when replying or sending msgs. local_cork should
always have the same value with current asoc q->cork in some way.

The thing is when changing to a new asoc by cmd SET_ASOC, local_cork may
not be consistent with the current asoc any more. The cmd seqs can be:

  SCTP_CMD_UPDATE_ASSOC (asoc)
  SCTP_CMD_REPLY (asoc)
  SCTP_CMD_SET_ASOC (new_asoc)
  SCTP_CMD_DELETE_TCB (new_asoc)
  SCTP_CMD_SET_ASOC (asoc)
  SCTP_CMD_REPLY (asoc)

The 1st REPLY makes OLD asoc q->cork and local_cork both are 1, and the cmd
DELETE_TCB clears NEW asoc q->cork and local_cork. After asoc goes back to
OLD asoc, q->cork is still 1 while local_cork is 0. The 2nd REPLY will not
set local_cork because q->cork is already set and it can't be uncorked and
sent out because of this.

To keep local_cork consistent with the current asoc q->cork, this patch is
to uncork the old asoc if local_cork is set before changing to the new one.

Note that the above cmd seqs will be used in the next patch when updating
asoc and handling errors in it.

Suggested-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: call inet_add_protocol after register_pernet_subsys in dccp_v6_init

Patch "call inet_add_protocol after register_pernet_subsys in dccp_v4_init"
fixed a null pointer dereference issue for dccp_ipv4 module.

The same fix is needed for dccp_ipv6 module.

Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dccp: call inet_add_protocol after register_pernet_subsys in dccp_v4_init

Now dccp_ipv4 works as a kernel module. During loading this module, if
one dccp packet is being recieved after inet_add_protocol but before
register_pernet_subsys in which v4_ctl_sk is initialized, a null pointer
dereference may be triggered because of init_net.dccp.v4_ctl_sk is 0x0.

Jianlin found this issue when the following call trace occurred:

[  171.950177] BUG: unable to handle kernel NULL pointer dereference at 0000000000000110
[  171.951007] IP: [<ffffffffc0558364>] dccp_v4_ctl_send_reset+0xc4/0x220 [dccp_ipv4]
[...]
[  171.984629] Call Trace:
[  171.984859]  <IRQ>
[  171.985061]
[  171.985213]  [<ffffffffc0559a53>] dccp_v4_rcv+0x383/0x3f9 [dccp_ipv4]
[  171.985711]  [<ffffffff815ca054>] ip_local_deliver_finish+0xb4/0x1f0
[  171.986309]  [<ffffffff815ca339>] ip_local_deliver+0x59/0xd0
[  171.986852]  [<ffffffff810cd7a4>] ? update_curr+0x104/0x190
[  171.986956]  [<ffffffff815c9cda>] ip_rcv_finish+0x8a/0x350
[  171.986956]  [<ffffffff815ca666>] ip_rcv+0x2b6/0x410
[  171.986956]  [<ffffffff810c83b4>] ? task_cputime+0x44/0x80
[  171.986956]  [<ffffffff81586f22>] __netif_receive_skb_core+0x572/0x7c0
[  171.986956]  [<ffffffff810d2c51>] ? trigger_load_balance+0x61/0x1e0
[  171.986956]  [<ffffffff81587188>] __netif_receive_skb+0x18/0x60
[  171.986956]  [<ffffffff8158841e>] process_backlog+0xae/0x180
[  171.986956]  [<ffffffff8158799d>] net_rx_action+0x16d/0x380
[  171.986956]  [<ffffffff81090b7f>] __do_softirq+0xef/0x280
[  171.986956]  [<ffffffff816b6a1c>] call_softirq+0x1c/0x30

This patch is to move inet_add_protocol after register_pernet_subsys in
dccp_v4_init, so that v4_ctl_sk is initialized before any incoming dccp
packets are processed.

Reported-by: Jianlin Shi <jishi@redhat.com>
Signed-off-by: Xin Long <lucien.xin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

enic: Fix format truncation warning

With -Wformat-truncation, gcc throws the following warning.

Fix this by increasing the size of devname to accommodate 15 character
netdev interface name and description.

Remove length format precision for %s. We can fit entire name.

Also increment the version.

drivers/net/ethernet/cisco/enic/enic_main.c: In function ‘enic_open’:
drivers/net/ethernet/cisco/enic/enic_main.c:1740:15: warning: ‘%u’ directive output may be truncated writing between 1 and 2 bytes into a region of size between 1 and 12 [-Wformat-truncation=]
     "%.11s-rx-%u", netdev->name, i);
               ^~
drivers/net/ethernet/cisco/enic/enic_main.c:1740:5: note: directive argument in the range [0, 16]
     "%.11s-rx-%u", netdev->name, i);
     ^~~~~~~~~~~~~
drivers/net/ethernet/cisco/enic/enic_main.c:1738:4: note: ‘snprintf’ output between 6 and 18 bytes into a destination of size 16
    snprintf(enic->msix[intr].devname,
    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     sizeof(enic->msix[intr].devname),
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     "%.11s-rx-%u", netdev->name, i);
     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Signed-off-by: Govindarajulu Varadarajan <gvaradar@cisco.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: enable TSO for IPv6

There is nothing in the IP that prevents us from enabling TSO for IPv6.

Before patch:
ftp fe80::2aa:bbff:fecc:1336%eth0
ftp> get /dev/zero
882512708 bytes received in 00:14 (56.11 MiB/s)

After patch:
ftp fe80::2aa:bbff:fecc:1336%eth0
ftp> get /dev/zero
1203326784 bytes received in 00:12 (94.52 MiB/s)

Signed-off-by: Niklas Cassel <niklas.cassel@axis.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ibmvnic: Return from ibmvnic_resume if not in VNIC_OPEN state

If the ibmvnic driver is not in the VNIC_OPEN state, return from
ibmvnic_resume callback. If we are not in the VNIC_OPEN state, interrupts
may not be initialized and directly calling the interrupt handler will
cause a crash.

Signed-off-by: John Allen <jallen@linux.vnet.ibm.com>
Reviewed-by: Nathan Fontenot <nfont@linux.vnet.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: lxt: Export link partner advertising

Provide link partner advertising information.
Removed testing for gigabit modes, which is useless for a fast ethernet phy.

Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mediatek-various-performance-improvements'

John Crispin says:

====================
net-next: mediatek: various performance improvements

During development we mainly ran testing using iperf doing 1500 byte
tcp frames. It was pointed out recently, that the driver does not perform
very well when using 512 byte udp frames. The biggest problem was that
RPS was not working as no rx queue was being set. fixing this more than
doubled the throughput. Additionally the IRQ mask register is now locked
independently for RX and TX. RX IRQ aggregation is also added. With all
these patches applied we can almost triple the throughput.

While at it we also add PHY status change reporting for GMACs connecting
directly to a PHY.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: mediatek: set the rx_queue to 0

The get_rps_cpu() function will not do any RPS on the data flow when no
queue is setup and always use the current cpu where the IRQ was handled
to also handle the backlog. As we only have one physical queue we always
set this to 0 unconditionally.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: mediatek: split IRQ register locking into TX and RX

Originally the driver only utilised the new QDMA engine. The current code
still assumes this is the case when locking the IRQ mask register. Since
RX now runs on the old style PDMA engine we can add a second lock. This
patch reduces the IRQ latency as the TX and RX path no longer need to wait
on each other under heavy load.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: mediatek: add RX IRQ delay support

The PDMA engine used for RX allows IRQ aggregation. The patch sets up the
corresponding registers to aggregate 4 IRQs into one. Using aggregation
reduces the load on the core handling to a quarter thus reducing IRQ
latency and increasing RX performance by around 10%.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: mediatek: print phy status changes for non DSA GMACs

Currently PHY status changes are only printed for DSA ports. This patch
adds code to also print status changes for non-fixed links.

Signed-off-by: John Crispin <john@phrozen.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'vxlan-cleanup-and-IPv6-link-local-support'

Matthias Schiffer says:

====================
vxlan: cleanup and IPv6 link-local support

Running VXLANs over IPv6 link-local addresses allows to use them as a
drop-in replacement for VLANs, avoiding to allocate additional outer IP
addresses to run the VXLAN over.

Since v1, I have added a lot more consistency checks to the address
configuration, making sure address families and scopes match. To simplify
the implementation, I also did some general refactoring of the
configuration handling in the new first patch of the series.

The second patch is more cleanup; is slightly touches OVS code, so that
list is in CC this time, too.

As in v1, the last two patches actually make VXLAN over IPv6 link-local
work, and allow multiple VXLANs with the same VNI and port, as long as
link-local addresses on different interfaces are used. As suggested, I now
store in the flags field if the VXLAN uses link-local addresses or not.

v3 removes log messages as suggested by Roopa Prabhu (as it is very unusual
for errors in netlink requests to be printed to the kernel log.) The commit
message of patch 5 has been extended to add a note about IPv4.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: allow multiple VXLANs with same VNI for IPv6 link-local addresses

As link-local addresses are only valid for a single interface, we can allow
to use the same VNI for multiple independent VXLANs, as long as the used
interfaces are distinct. This way, VXLANs can always be used as a drop-in
replacement for VLANs with greater ID space.

This also extends VNI lookup to respect the ifindex when link-local IPv6
addresses are used, so using the same VNI on multiple interfaces can
actually work.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: fix snooping for link-local IPv6 addresses

If VXLAN is run over link-local IPv6 addresses, it is necessary to store
the ifindex in the FDB entries. Otherwise, the used interface is undefined
and unicast communication will most likely fail.

Support for link-local IPv4 addresses should be possible as well, but as
the semantics aren't as well defined as for IPv6, and there doesn't seem to
be much interest in having the support, it's not implemented for now.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: check valid combinations of address scopes

* Multicast addresses are never valid as local address
* Link-local IPv6 unicast addresses may only be used as remote when the
local address is link-local as well
* Don't allow link-local IPv6 local/remote addresses without interface

We also store in the flags field if link-local addresses are used for the
follow-up patches that actually make VXLAN over link-local IPv6 work.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: improve validation of address family configuration

Address families of source and destination addresses must match, and
changelink operations can't change the address family.

In addition, always use the VXLAN_F_IPV6 to check if a VXLAN device uses
IPv4 or IPv6.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: get rid of redundant vxlan_dev.flags

There is no good reason to keep the flags twice in vxlan_dev and
vxlan_config.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

vxlan: refactor verification and application of configuration

The vxlan_dev_configure function was mixing validation and application of
the vxlan configuration; this could easily lead to bugs with the changelink
operation, as it was hard to see if the function wcould return an error
after parts of the configuration had already been applied.

This commit splits validation and application out of vxlan_dev_configure as
separate functions to make it clearer where error returns are allowed and
where the vxlan_dev or net_device may be configured. Log messages in these
functions are removed, as it is generally unexpected to find error output
for netlink requests in the kernel log. Userspace should be able to handle
errors based on the error codes returned via netlink just fine.

In addition, some validation and initialization is moved to vxlan_validate
and vxlan_setup respectively to improve grouping of similar settings.

Finally, this also fixes two actual bugs:

* if set, conf->mtu would overwrite dev->mtu in each changelink operation,
  reverting other changes of dev->mtu
* the "if (!conf->dst_port)" branch would never be run, as conf->dst_port
  was set in vxlan_setup before. This caused VXLAN-GPE to use the same
  default port as other VXLAN sockets instead of the intended IANA-assigned
  4790.

Signed-off-by: Matthias Schiffer <mschiffer@universe-factory.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-more-skb_put-work'

yuan linyu says:

====================
net: more skb_put_[data:zero] related work

yuan linyu (3):
  net: introduce __skb_put_[zero, data, u8]
  net: replace more place to skb_put_[data:zero]
  net: manual clean code which call skb_put_[data:zero]
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: manual clean code which call skb_put_[data:zero]

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: replace more place to skb_put_[data:zero]

spatch file,
@@
expression skb, len, data;
type t;
@@
-memcpy((t *)skb_put(skb, len), data, len);
+skb_put_data(skb, data, len);

@@
identifier p;
expression skb, len, data;
type t;
@@
-p = (t *)memset(skb_put(skb, len), data, len);
+p = skb_put_zero(skb, len);

@@
expression skb, len, data;
type t;
@@
-memcpy((t *)__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
identifier p;
expression skb, len, data;
type t;
@@
-p = (t *)memset(__skb_put(skb, len), data, len);
+p = __skb_put_zero(skb, len);

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: introduce __skb_put_[zero, data, u8]

follow Johannes Berg, semantic patch file as below,
@@
identifier p, p2;
expression len;
expression skb;
type t, t2;
@@
(
-p = __skb_put(skb, len);
+p = __skb_put_zero(skb, len);
|
-p = (t)__skb_put(skb, len);
+p = __skb_put_zero(skb, len);
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, len);
|
-memset(p, 0, len);
)

@@
identifier p;
expression len;
expression skb;
type t;
@@
(
-t p = __skb_put(skb, len);
+t p = __skb_put_zero(skb, len);
)
... when != p
(
-memset(p, 0, len);
)

@@
type t, t2;
identifier p, p2;
expression skb;
@@
t *p;
...
(
-p = __skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
|
-p = (t *)__skb_put(skb, sizeof(t));
+p = __skb_put_zero(skb, sizeof(t));
)
... when != p
(
p2 = (t2)p;
-memset(p2, 0, sizeof(*p));
|
-memset(p, 0, sizeof(*p));
)

@@
expression skb, len;
@@
-memset(__skb_put(skb, len), 0, len);
+__skb_put_zero(skb, len);

@@
expression skb, len, data;
@@
-memcpy(__skb_put(skb, len), data, len);
+__skb_put_data(skb, data, len);

@@
expression SKB, C, S;
typedef u8;
identifier fn = {__skb_put};
fresh identifier fn2 = fn ## "_u8";
@@
- *(u8 *)fn(SKB, S) = C;
+ fn2(SKB, C);

Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: better IEEE Prio Mapping Table description

Kill the remaining shift macro in favor of calculating at compile time
its value from the more descriptive mask, which gives us a better
representation of the register layout.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-dsa-Global-2-cosmetics'

Vivien Didelot says:

====================
net: dsa: Global 2 cosmetics

Similarly to what has been done for the Port and Global 1 registers,
this patch series prefixes and documents the macros of Global 2.

It brings no functional changes except for 1/10 which fixes the IRL init
for 88E6390 family.

Changes in v2: make *_g2_irl_init_all static inline without
NET_DSA_MV88E6XXX_GLOBAL2 and compile test with and without the symbol.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 remaining macros

Prefix and document the remaining Global 2 registers macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 Watchdog macros

The Marvell 88E6352 family has a Global 2 register dedicated to the
watchdog setup. But the 88E6390 turned it into an indirect table.

Prefix and document that.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 Switch MAC macros

Prefix and document the Global 2 Switch MAC registers macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 EEPROM macros

Prefix and document the Global 2 EEPROM registers macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 PVT macros

Prefix and document the Global 2 Cross-chip Port VLAN registers macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 MGMT macros

Prefix and document the Global 2 MGMT registers macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 Device Mapping macros

Prefix and document the Global 2 Device Mapping macros.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: prefix Global 2 Trunk macros

Prefix and document the Global 2 Trunk registers macros. At the same
time, fix the hask -> hash typo and use the mv88e6xxx_port_mask helper.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: clarify SMI PHY functions

Marvell chips with an SMI PHY access in Global 2 registers handle both
Clause 22 and Clause 45 of IEEE 802.3.

The 88E6390 family has addition bits to target the internal or external
PHYs connected to the device, and a Setup function in addition to the
default (register) Access function.

Prefix the SMI PHY Command and Data registers macros, implement clear
helpers for Clause 22 and 44 Access functions, rename variable to match
the SMI and switch vocabulary (device and register addresses for Clause
22 and port and device class for Clause 45.)

Finally do not use complex macros but simple 16-bit mask to document the
registers organization.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: dsa: mv88e6xxx: add irl_init_all op

Some Marvell chips have an Ingress Rate Limit unit. But the command
values slightly differs between models: 88E6352 use 3-bit for operations
while 88E6390 use different 2-bit operations.

This commit kills the IRL flags in favor of a new operation implementing
the "Init all resources to the initial state" operation.

This fixes the operation of 88E6390 family where 0x1000 means Read the
selected resource 0, register 0 on port 16, instead of init all.

A mv88e6xxx_irl_setup helper is added to wrap the operation call.

Signed-off-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-next-stmmac-dwmac-sun8i-add-support-for-V3s'

Icenowy Zheng says:

====================
net-next: stmmac: dwmac-sun8i: add support for V3s

Allwinner V3s features an EMAC like the on in H3, but without external MII
interfaces, so being not able really to use RMII/RGMII.

And it has a different default value of syscon (0x38000 instead of 0x58000
on H3), which shows a problem that the EMAC clock freq should be 24MHz.
(Both H3 and V3s SoCs doesn't have extra xtal input for EPHY, and the
main xtal is 24MHz. The default value of H3 is set to 24MHz, but the V3s
default value is set to 25MHz).

First two patches are device tree binding patches, the third forces
the frequency to 24MHz and the fourth really add the V3s support.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: stmmac: dwmac-sun8i: add support for V3s EMAC

Allwinner V3s SoC has an Ethernet MAC and an internal PHY like the ones
in H3 SoC, however the MAC has no external *MII interfaces available at
GPIOs, thus only MII connection to internal PHY is supported.

Add this variant of EMAC to dwmac-sun8i driver.

The default value of the syscon EMAC-related register seems to have
changed from H3, but it seems to be a harmless change.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: stmmac: dwmac-sun8i: force EPHY clock freq to 24MHz

The EPHY control part of the EMAC syscon register has a bit called
CLK_SEL. On the datasheet it says that if it's 0 the EPHY clock is 25MHz
and if it's 1 the clock is 24MHz.

However, according to the datasheets, no Allwinner SoC with EPHY has any
extra xtal input pins for the EPHY, and the system xtal is 24MHz.

That means the EPHY is not possible to get a 25MHz xtal input, and thus
the frequency can only be 24MHz.

It doesn't matter on H3 as the default value of H3 is 24MHz, however on
V3s the default value is wrongly set to 25MHz, which prevented the EPHY
from working properly.

Force the EPHY clock frequency to 24MHz.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: syscon: Add DT bindings documentation for Allwinner V3s syscon

Allwinner V3s SoC has a syscon like the one in H3.

Add its compatible string.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net-next: Add DT bindings documentation for Allwinner V3s EMAC

Allwinner V3s SoC has a Ethernet MAC like the one in Allwinner H3, but
have no external MII capability. That means that it can only use the
EPHY and cannot do Gbps transmission.

Add binding for it.

Signed-off-by: Icenowy Zheng <icenowy@aosc.io>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-Introduction-of-the-tc-tests'

Lucas Bates says:

====================
net: Introduction of the tc tests

Apologies for sending this as one big patch. I've been sitting on this a little
too long, but it's ready and I wanted to get it out.

There are a limited number of tests to start - I plan to add more on a regular
basis.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: Introduce tc testsuite

Add the beginnings of a testsuite for tc functionality in the kernel.
These are a series of unit tests that use the tc executable and verify
the success of those commands by checking both the exit codes and the
output from tc's 'show' operation.

To run the tests:
# cd tools/testing/selftests/tc-testing
# sudo ./tdc.py

You can specify the tc executable to use with the -p argument on the command
line or editing the 'TC' variable in tdc_config.py. Refer to the README for
full details on how to run.

The initial complement of test cases are limited mostly to tc actions. Test
cases are most welcome; see the creating-testcases subdirectory for help
in creating them.

Signed-off-by: Lucas Bates <lucasb@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'qed-RDMA-and-infrastructure-for-iWARP'

Yuval Mintz says:

====================
qed*: RDMA and infrastructure for iWARP

This series focuses on RDMA in general with emphasis on required changes
toward adding iWARP support. The vast majority of the changes introduced
are in qed/qede, with a couple of small changes to qedr
[mentioned below].

The infrastructure changes:
- Patch #1 adds the ability to pass PBL memory externally for a newly
created chain.
- Patches #4, #5 rename qede_roce.[ch] into qede_rdma.[ch] + change
prefixes from _roce_ to _rdma_, as the API between qede and qedr is
agnostic to the variant of the RDMA protocol used. These patches also
touch qedr [basically to align it with the renaming, nothing more].
- Patch #7 replaces the current SPQ async mechanism into serving
registered callbacks [before adding iWARP which would add another client
in need of this sort of functionallity].

The non-infrastrucutre changes:
- Patches #2, #3 contain DCB-related changes to better align RDMA with
configured DCB.
- Patch #6 contains a minor [mostly theoretical fix] to release flow.

Changes from previous versions
------------------------------
- V4: This is actually a repost of V3 due to some confusion regarding
the sent cover-letter
- V3: Add commit log message in #4 indicating change in header inclusion
- V2: Add several inclusion into qede_rdma.h to have proper declarations
of all variable types used in it
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

qed: SPQ async callback registration

Whenever firmware indicates that there's an async indication it needs
to handle, there's a switch-case where the right functionality is called
based on function's personality and information.

Before iWARP is added [as yet another client], switch over the SPQ into
a callback-registered mechanism, allowing registration of the relevant
event-processing logic based on the function's personality. This allows
us to tidy the code by removing protocol-specifics from a common file.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Wait for resources before FUNC_CLOSE

Driver needs to wait for all resources to return from FW before it can send
the FUNC_CLOSE ramrod.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed*: Set rdma generic functions prefix

Rename the functions common to both iWARP and RoCE to have a prefix of
_rdma_ instead of _roce_.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed*: qede_roce.[ch] -> qede_rdma.[ch]

Once we have iWARP support, the qede portion of the qedr<->qede would
serve all the RDMA protocols - so rename the file to be appropriate
to its function.

While we're at it, we're also moving a couple of inclusions to it into
.h files and adding includes to make sure it contains all type
definitions it requires.

Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Disable RoCE dpm when DCBx change occurs

If DCBx update occurs while QPs are open, stop sending edpms until all
QPs are closed.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: RoCE EDPM to honor PFC

Configure device according to DCBx results so that EDPMs
made by RoCE would honor flow-control.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Chain support for external PBL

iWARP would require the chains to allocate/free their PBL memory
independently, so add the infrastructure to provide it externally.

Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: add TCP_MD5SIG_EXT socket option to set a key address prefix

Replace first padding in the tcp_md5sig structure with a new flag field
and address prefix length so it can be specified when configuring a new
key for TCP MD5 signature. The tcpm_flags field will only be used if the
socket option is TCP_MD5SIG_EXT to avoid breaking existing programs, and
tcpm_prefixlen only when the TCP_MD5SIG_FLAG_PREFIX flag is set.

Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: Eric Mowat <mowat@arista.com>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: md5: add an address prefix for key lookup

This allows the keys used for TCP MD5 signature to be used for whole
range of addresses, specified with a prefix length, instead of only one
address as it currently is.

Signed-off-by: Bob Gilligan <gilligan@arista.com>
Signed-off-by: Eric Mowat <mowat@arista.com>
Signed-off-by: Ivan Delalande <colona@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/mlx5e: IPoIB, Add ioctl support to IPoIB device driver

Add ioctl support to IPoIB device driver. For now, this
ioctl will support timestamp get and set.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Add PTP support to IPoIB device driver

Enable PTP for IPoIB rdma_netdev and add the ability
to get the time stamping parameters using ethtool.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Eitan Rabin <rabin@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Get more TX statistics

Add misses counters (bytes, packet, gso, xmit_more) in TX flow for ipoib
traffic.

Fixes: 58545449b7b ("net/mlx5e: IPoIB, Xmit flow")
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Handle change_mtu

Add the ndo that supports change mtu for IPoIB.
The callback called from the ipoib ULP driver, that gives the ability to
change the SW and HW resources accordingly in the lower driver.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Use hard_mtu as part of the mlx5e_priv struct

The mtu extra space that kept for the HW is specific for each link type,
and it is different in mlx5e and mlx5i modules.
Now it is kept in the priv structures, set by the mlx5e/mlx5i driver
accordingly.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Change parameters default values

Add function that sets the default values for ipoib, setting/clearing
abilities that IPoIB doesn't support, like RQ size in this case.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Add new profile function update_carrier

Updating the carrier involves specific HW setting, each profile should
use its own function for that.

Both IPoIB and VF representor don't need carrier update function, since
VF representor has only a logical link to VF and IPoIB manages its own
link via ib_core upper layer.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Add ethtool support

Add support for the following:
"ethtool -S" (statistics).
"ethtool -i" (driver info).
"ethtool -g/G" (rings parameters).
"ethtool -l/L" (channels parameters).
"ethtool -c/C" (coalesce options).

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: Prevent PFC call for non ethernet ports

Port flow control supported only for ethernet ports,
therefore, prevent any call if the port type differs from
MLX5_CAP_PORT_TYPE_ETH.

Signed-off-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

net/mlx5e: IPoIB, Move to a separate directory

IPoIB netdevice driver was only introduced in previous kernel release
and it is growing in terms of features and LOC, move it to a separate
directory.

Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>

cxgb4: notify uP to route ctrlq compl to rdma rspq

During the module initialisation there is a possible race
(basically race between uld and lld) where neither the uld
nor lld notifies the uP about where to route the ctrl queue
completions. LLD skips notifying uP as the rdma queues were
not created by then (will leave it to ULD to notify the uP).
As the ULD comes up, it also skips notifying the uP as the
flag FULL_INIT_DONE is not set yet (ULD assumes that the
interface is not up yet).

Consequently, this race between uld and lld leaves uP
unnotified about where to send the ctrl queue completions
to, leading to iwarp RI_RES WR failure.

Here is the race:

CPU 0                                   CPU1

- allocates nic rx queus
- t4_sge_alloc_ctrl_txq()
(if rdma rsp queues exists,
tell uP to route ctrl queue
compl to rdma rspq)
                                - acquires the mutex_lock
                                - allocates rdma response queues
                                - if FULL_INIT_DONE set,
                                  tell uP to route ctrl queue compl
                                  to rdma rspq
                                - relinquishes mutex_lock
- acquires the mutex_lock
- enable_rx()
- set FULL_INIT_DONE
- relinquishes mutex_lock

This patch fixes the above issue.

Fixes: e7519f9926f1('cxgb4: avoid enabling napi twice to the same queue')
Signed-off-by: Raju Rangoju <rajur@chelsio.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
CC: Stable <stable@vger.kernel.org> # 4.9+
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: add new T6 pci device id's

Add 0x6082, 0x6083 and 0x6084 T6 device id's

Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

nfp: add VLAN filtering support

Add general use per-vNIC mailbox area and use it for VLAN filtering
support. Initially proto is hardcoded to 802.1q.

Signed-off-by: Pablo Cascón <pablo.cascon@netronome.com>
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

cxgb4: fix a NULL dereference

Avoid NULL dereference in setup_sge_queues() when the adapter is
in non offload mode.

Fixes: 0fbc81b3ad51 ('chcr/cxgb4i/cxgbit/RDMA/cxgb4: Allocate resources dynamically for all cxgb4 ULD's')
Signed-off-by: Ganesh Goudar <ganeshgr@chelsio.com>
Signed-off-by: David S. Miller <davem@davemloft.net>