git.proxmox.com Git - mirror_ubuntu-kernels.git/log

ice: Allow 2 queue pairs per VF on SR-IOV initialization

Currently VFs are only allowed to get 16, 4, and 1 queue pair by
default, which require 17, 5, and 2 MSI-X vectors respectively. This
is because each VF needs a MSI-X per data queue and a MSI-X for its
other interrupt. The calculation is based on the number of VFs created,
MSI-X available, and queue pairs available at the time of VF creation.

Unfortunately the values above exclude 2 queue pairs when only 3 MSI-X
are available to each VF based on resource constraints. The current
calculation would default to 2 MSI-X and 1 queue pair. This is a waste
of resources, so fix this by allowing 2 queue pairs per VF when there
are between 2 and 5 MSI-X available per VF.

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Clear and free XLT entries on reset

This fix has been added to address memory leak issues resulting from
triggering a sudden driver reset which does not allow us to follow our
normal removal flows for SW XLT entries for advanced features.

- Adding call to destroy flow profile locks when clearing SW XLT tables.

- Extraction sequence entries were not correctly cleared previously
which could cause ownership conflicts for repeated reset-replay calls.

Fixes: 31ad4e4ee1e4 ("ice: Allocate flow profile")
Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: add useful statistics

Display and count some useful hot-path statistics. The usefulness is as
follows:

- tx_restart: use to determine if the transmit ring size is too small or
  if the transmit interrupt rate is too low.
- rx_gro_dropped: use to count drops from GRO layer, which previously were
  completely uncounted when occurring.
- tx_busy: use to determine when the driver is miscounting number of
  descriptors needed for an skb.
- tx_timeout: as our other drivers, count the number of times we've reset
  due to timeout because the kernel only prints a warning once per netdev.

Several of these were already counted but not displayed.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: remove page_reuse statistic

The page reuse statistic wasn't even being displayed to the user, even
though the driver counted it. Don't waste the struct space and hot-path
cycles since the driver doesn't display it.

Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Fix RSS profile locks

Replacing flow profile locks with RSS profile locks in the function to
remove all RSS rules for a given VSI. This is to align the locks used
for RSS rule addition to VSI and removal during VSI teardown to avoid
a race condition owing to several iterations of the above operations.
In function to get RSS rules for given VSI and protocol header replacing
the pointer reference of the RSS entry with a copy of hash value to
ensure thread safety.

Signed-off-by: Vignesh Sridhar <vignesh.sridhar@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix the vsi_id mask to be 10 bit for set_rss_lut

set_rss_lut can fail due to incorrect vsi_id mask. vsi_id is 10 bit
but mask was 0x1FF whereas it should be 0x3FF.

For vsi_num >= 512, FW set_rss_lut can fail with return code
EACCESS (VSI ownership issue) because software was providing
incorrect vsi_num (dropping 10th bit due to incorrect mask) for
set_rss_lut admin command

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: rename misleading grst_delay variable

The grst_delay variable in ice_check_reset contains the maximum time
(in 100 msec units) that the driver will wait for a reset event to
transition to the Device Active state. The value is the sum of three
separate components:
1) The maximum time it may take for the firmware to process its
outstanding command before handling the reset request.
2) The value in RSTCTL.GRSTDEL (the delay firmware inserts between first
seeing the driver reset request and the actual hardware assertion).
3) The maximum expected reset processing time in hardware.

Referring to this total time as "grst_delay" is misleading and
potentially confusing to someone checking the code and cross-referencing
the hardware specification.

Fix this by renaming the variable to "grst_timeout", which is more
descriptive of its actual use.

Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: mark PM functions as __maybe_unused

In certain configurations without power management support, the
following warnings happen:

drivers/net/ethernet/intel/ice/ice_main.c:4214:12: warning:
'ice_resume' defined but not used [-Wunused-function]
4214 | static int ice_resume(struct device *dev)
| ^~~~~~~~~~
drivers/net/ethernet/intel/ice/ice_main.c:4150:12: warning:
'ice_suspend' defined but not used [-Wunused-function]
4150 | static int ice_suspend(struct device *dev)
| ^~~~~~~~~~~

Mark these functions as __maybe_unused to make it clear to the
compiler that this is going to happen based on the configuration,
which is the standard for these types of functions.

Fixes: 769c500dcc1e ("ice: Add advanced power mgmt for WoL")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

tun: add missing rcu annotation in tun_set_ebpf()

We expecte prog_p to be protected by rcu, so adding the rcu annotation
to fix the following sparse warning:

drivers/net/tun.c:3003:36: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3003:36:    expected struct tun_prog [noderef] __rcu **prog_p
drivers/net/tun.c:3003:36:    got struct tun_prog **prog_p
drivers/net/tun.c:3292:42: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3292:42:    expected struct tun_prog **prog_p
drivers/net/tun.c:3292:42:    got struct tun_prog [noderef] __rcu **
drivers/net/tun.c:3296:42: warning: incorrect type in argument 2 (different address spaces)
drivers/net/tun.c:3296:42:    expected struct tun_prog **prog_p
drivers/net/tun.c:3296:42:    got struct tun_prog [noderef] __rcu **

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: add earliest departure time to SCM_TIMESTAMPING_OPT_STATS

This change adds TCP_NLA_EDT to SCM_TIMESTAMPING_OPT_STATS that reports
the earliest departure time(EDT) of the timestamped skb. By tracking EDT
values of the skb from different timestamps, we can observe when and how
much the value changed. This allows to measure the precise delay
injected on the sender host e.g. by a bpf-base throttler.

Signed-off-by: Yousuk Seung <ysseung@google.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Neal Cardwell <ncardwell@google.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
Acked-by: Yuchung Cheng <ycheng@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
1GbE Intel Wired LAN Driver Updates 2020-07-30

This series contains updates to e100, e1000, e1000e, igb, igbvf, ixgbe,
ixgbevf, iavf, and driver documentation.

Vaibhav Gupta converts legacy .suspend() and .resume() to generic PM
callbacks for e100, igbvf, ixgbe, ixgbevf, and iavf.

Suraj Upadhyay replaces 1 byte memsets with assignments for e1000,
e1000e, igb, and ixgbe.

Alexander Klimov replaces http links with https.

Miaohe Lin replaces uses of memset to clear MAC addresses with
eth_zero_addr().
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'mptcp-syncookies'

Florian Westphal says:

====================
mptcp: add syncookie support

Changes in v2:
- first patch renames req->ts_cookie to req->syncookie instead of
  removing ts_cookie member.
- patch to add 'want_cookie' arg to init_req() functions has been dropped.
  All users of that arg were changed to check 'req->syncookie' instead.

v1 cover letter:

When syn-cookies are used the SYN?ACK never contains a MPTCP option,
because the code path that creates a request socket based on a valid
cookie ACK lacks the needed changes to construct MPTCP request sockets.

After this series, if SYN carries MP_CAPABLE option, the option is not
cleared anymore and request socket will be reconstructed using the
MP_CAPABLE option data that is re-sent with the ACK.

This means that no additional state gets encoded into the syn cookie or
the TCP timestamp.

There are two caveats for SYN-Cookies with MPTCP:

1. When syn-cookies are used, the server-generated key is not stored.
The drawback is that the next connection request that comes in before
the cookie-ACK has a small chance that it will generate the same local_key.

If this happens, the cookie ACK that comes in second will (re)compute the
token hash and then detects that this is already in use.
Unlike normal case, where the server will pick a new key value and then
re-tries, we can't do that because we already committed to the key value
(it was sent to peer already).

Im this case, MPTCP cannot be used and late TCP fallback happens.

2). SYN packets with a MP_JOIN requests cannot be handled without storing
    state. This is because the SYN contains a nonce value that is needed to
    verify the HMAC of the MP_JOIN ACK that completes the three-way
    handshake.  Also, a local nonce is generated and used in the cookie
    SYN/ACK.

There are only 2 ways to solve this:
a) Do not support JOINs when cookies are in effect.
b) Store the nonces somewhere.

The approach chosen here is b).
Patch 8 adds a fixed-size (1024 entries) state table to store the
information required to validate the MP_JOIN ACK and re-build the
request socket.

State gets stored when syn-cookies are active and the token in the JOIN
request referred to an established MPTCP connection that can also accept
a new subflow.

State is restored if the ACK cookie is valid, an MP_JOIN option is present
and the state slot contains valid data from a previous SYN.

After the request socket has been re-build, normal HMAC check is done just
as without syn cookies.

Largely identical to last RFC, except patch #8 which follows Paolos
suggestion to use a private table storage area rather than keeping
request sockets around.  This also means I dropped the patch to remove
const qualifier from sk_listener pointers.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: add test cases for mptcp join tests with syn cookies

Also add test cases with MP_JOIN when tcp_syncookies sysctl is 2 (i.e.,
syncookies are always-on).

While at it, also print the test number and add the test number
to the pcap files that can be generated optionally.

This makes it easier to match the pcap to the test case.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: mptcp: make 2nd net namespace use tcp syn cookies unconditionally

check we can establish connections also when syn cookies are in use.

Check that
MPTcpExtMPCapableSYNRX and MPTcpExtMPCapableACKRX increase for each
MPTCP test.

Check TcpExtSyncookiesSent and TcpExtSyncookiesRecv increase in netns2.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: enable JOIN requests even if cookies are in use

JOIN requests do not work in syncookie mode -- for HMAC validation, the
peers nonce and the mptcp token (to obtain the desired connection socket
the join is for) are required, but this information is only present in the
initial syn.

So either we need to drop all JOIN requests once a listening socket enters
syncookie mode, or we need to store enough state to reconstruct the request
socket later.

This adds a state table (1024 entries) to store the data present in the
MP_JOIN syn request and the random nonce used for the cookie syn/ack.

When a MP_JOIN ACK passed cookie validation, the table is consulted
to rebuild the request socket from it.

An alternate approach would be to "cancel" syn-cookie mode and force
MP_JOIN to always use a syn queue entry.

However, doing so brings the backlog over the configured queue limit.

v2: use req->syncookie, not (removed) want_cookie arg

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: syncookies: create mptcp request socket for ACK cookies with MPTCP option

If SYN packet contains MP_CAPABLE option, keep it enabled.
Syncokie validation and cookie-based socket creation is changed to
instantiate an mptcp request sockets if the ACK contains an MPTCP
connection request.

Rather than extend both cookie_v4/6_check, add a common helper to create
the (mp)tcp request socket.

Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: subflow: add mptcp_subflow_init_cookie_req helper

Will be used to initialize the mptcp request socket when a MP_CAPABLE
request was handled in syncookie mode, i.e. when a TCP ACK containing a
MP_CAPABLE option is a valid syncookie value.

Normally (non-cookie case), MPTCP will generate a unique 32 bit connection
ID and stores it in the MPTCP token storage to be able to retrieve the
mptcp socket for subflow joining.

In syncookie case, we do not want to store any state, so just generate the
unique ID and use it in the reply.

This means there is a small window where another connection could generate
the same token.

When Cookie ACK comes back, we check that the token has not been registered
in the mean time. If it was, the connection needs to fall back to TCP.

Changes in v2:
- use req->syncookie instead of passing 'want_cookie' arg to ->init_req()
(Eric Dumazet)

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: rename and export mptcp_subflow_request_sock_ops

syncookie code path needs to create an mptcp request sock.

Prepare for this and add mptcp prefix plus needed export of ops struct.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: subflow: split subflow_init_req

When syncookie support is added, we will need to add a variant of
subflow_init_req() helper. It will do almost same thing except
that it will not compute/add a token to the mptcp token tree.

To avoid excess copy&paste, this commit splits away part of the
code into a new helper, __subflow_init_req, that can then be re-used
from the 'no insert' function added in a followup change.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

mptcp: token: move retry to caller

Once syncookie support is added, no state will be stored anymore when the
syn/ack is generated in syncookie mode.

When the ACK comes back, the generated key will be taken from the TCP ACK,
the token is re-generated and inserted into the token tree.

This means we can't retry with a new key when the token is already taken
in the syncookie case.

Therefore, move the retry logic to the caller to prepare for syncookie
support in mptcp.

Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: rename request_sock cookie_ts bit to syncookie

Nowadays output function has a 'synack_type' argument that tells us when
the syn/ack is emitted via syncookies.

The request already tells us when timestamps are supported, so check
both to detect special timestamp for tcp option encoding is needed.

We could remove cookie_ts altogether, but a followup patch would
otherwise need to adjust function signatures to pass 'want_cookie' to
mptcp core.

This way, the 'existing' bit can be used.

Suggested-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: Florian Westphal <fw@strlen.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: cls_u32: Use struct_size() helper

Make use of the struct_size() helper, in multiple places, instead
of an open-coded version in order to avoid any potential type
mistakes and protect against potential integer overflows.

Also, remove unnecessary object identifier size.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

qede: Use %pM format specifier for MAC addresses

Convert to %pM instead of using custom code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

qed: Use %pM format specifier for MAC addresses

Convert to %pM instead of using custom code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hsr: Use %pM format specifier for MAC addresses

Convert to %pM instead of using custom code.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 's390-qeth-next'

Julian Wiedmann says:

====================
s390/qeth: updates 2020-07-30

please apply the following patch series for qeth to netdev's net-next tree.

This primarily brings some modernization to the RX path, laying the
groundwork for smarter RX refill policies.
Some of the patches are tagged as fixes, but really target only rare /
theoretical issues. So given where we are in the release cycle and that we
touch the main RX path, taking them through net-next seems more appropriate.
====================

Reviewed-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: use all configured RX buffers

The (misplaced) comment doesn't make any sense, enforcing an
uninitialized RX buffer won't help with IRQ reduction.

So make the best use of all available RX buffers.

Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: don't process empty bridge port events

Discard events that don't contain any entries. This shouldn't happen,
but subsequent code relies on being able to use entry 0. So better
be safe than accessing garbage.

Fixes: b4d72c08b358 ("qeth: bridgeport support - basic control")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: integrate RX refill worker with NAPI

Running a RX refill outside of NAPI context is inherently racy, even
though the worker is only started for an entirely idle RX ring.
>From the moment that the worker has replenished parts of the RX ring,
the HW can use those RX buffers, raise an IRQ and cause our NAPI code to
run concurrently to the RX refill worker.

Instead let the worker schedule our NAPI instance, and refill the RX
ring from there. Keeping accurate count of how many buffers still need
to be refilled also removes some quirky arithmetic from the low-level
code.

Fixes: b333293058aa ("qeth: add support for af_iucv HiperSockets transport")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

s390/qeth: tolerate pre-filled RX buffer

When preparing a buffer for RX refill, tolerate that it already has a
pool_entry attached. Otherwise we could easily leak such a pool_entry
when re-driving the RX refill after an error (from eg. do_qdio()).

This needs some minor adjustment in the code that drains RX buffer(s)
prior to RX refill and during teardown, so that ->pool_entry is NULLed
accordingly.

Fixes: 4a71df50047f ("qeth: new qeth device driver")
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Pass NULL to skb_network_protocol() when we don't care about vlan depth

When we don't care about vlan depth, we could pass NULL instead of the
address of a unused local variable to skb_network_protocol() as a param.

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bluetooth: sco: Fix sockptr reference.

net/bluetooth/sco.c: In function ‘sco_sock_setsockopt’:
net/bluetooth/sco.c:862:3: error: cannot convert to a pointer type
862 | if (get_user(opt, (u32 __user *)optval)) {
| ^~

Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'for-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next

Johan Hedberg says:

====================
pull request: bluetooth-next 2020-07-31

Here's the main bluetooth-next pull request for 5.9:

- Fix firmware filenames for Marvell chipsets
- Several suspend-related fixes
- Addedd mgmt commands for runtime configuration
- Multiple fixes for Qualcomm-based controllers
- Add new monitoring feature for mgmt
- Fix handling of legacy cipher (E4) together with security level 4
- Add support for Realtek 8822CE controller
- Fix issues with Chinese controllers using fake VID/PID values
- Multiple other smaller fixes & improvements
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

Bluetooth: Remove CRYPTO_ALG_INTERNAL flag

The flag CRYPTO_ALG_INTERNAL is not meant to be used outside of
the Crypto API. It isn't needed here anyway.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: Increment management interface revision

Increment the mgmt revision due to the recently added new commands.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Johan Hedberg <johan.hedberg@intel.com>

Bluetooth: use the proper scan params when conn is pending

When an LE connection is requested and an RPA update is needed via
hci_connect_le_scan, the default scanning parameters are used rather
than the connect parameters.  This leads to significant delays in the
connection establishment process when using lower duty cycle scanning
parameters.

The patch simply looks at the pended connection list when trying to
determine which scanning parameters should be used.

Before:
< HCI Command: LE Set Extended Scan Parameters (0x08|0x0041) plen 8
                            #378 [hci0] 1659.247156
        Own address type: Public (0x00)
        Filter policy: Ignore not in white list (0x01)
        PHYs: 0x01
        Entry 0: LE 1M
          Type: Passive (0x00)
          Interval: 367.500 msec (0x024c)
          Window: 37.500 msec (0x003c)

After:
< HCI Command: LE Set Extended Scan Parameters (0x08|0x0041) plen 8
                               #39 [hci0] 7.422109
        Own address type: Public (0x00)
        Filter policy: Ignore not in white list (0x01)
        PHYs: 0x01
        Entry 0: LE 1M
          Type: Passive (0x00)
          Interval: 60.000 msec (0x0060)
          Window: 60.000 msec (0x0060)

Signed-off-by: Alain Michaud <alainm@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Reviewed-by: Yu Liu <yudiliu@google.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

net: ll_temac: Use devm_platform_ioremap_resource_byname()

platform_get_resource() may fail and return NULL, so we had better
check its return value to avoid a NULL pointer dereference a bit later
in the code. Fix it to use devm_platform_ioremap_resource_byname()
instead of calling platform_get_resource_byname() and devm_ioremap().

Fixes: 8425c41d1ef7 ("net: ll_temac: Extend support to non-device-tree platforms")
Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'net-ethernet-use-generic-power-management'

Vaibhav Gupta says:

====================
net: ethernet: use generic power management

Linux Kernel Mentee: Remove Legacy Power Management.

The purpose of this patch series is to upgrade power management in net ethernet
drivers. This has been done by upgrading .suspend() and .resume() callbacks.

The upgrade makes sure that the involvement of PCI Core does not change the
order of operations executed in a driver. Thus, does not change its behavior.

In general, drivers with legacy PM, .suspend() and .resume() make use of PCI
helper functions like pci_enable/disable_device_mem(), pci_set_power_state(),
pci_save/restore_state(), pci_enable/disable_device(), etc. to complete
their job.

The conversion requires the removal of those function calls, change the
callbacks' definition accordingly and make use of dev_pm_ops structure.

All patches are compile-tested only.

Test tools:
- Compiler: gcc (GCC) 10.1.0
- allmodconfig build: make -j$(nproc) W=1 all
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tlan: use generic power management

Drivers using legacy power management .suspen()/.resume() callbacks
have to manage PCI states and device's PM states themselves. They also
need to take care of standard configuration registers.

Switch to generic power management framework using a single
"struct dev_pm_ops" variable to take the unnecessary load from the driver.
This also avoids the need for the driver to directly call most of the PCI
helper functions and device power state control functions, as through
the generic framework PCI Core takes care of the necessary operations,
and drivers are required to do only device-specific jobs.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sis900: use generic power management

Drivers using legacy power management .suspen()/.resume() callbacks
have to manage PCI states and device's PM states themselves. They also
need to take care of standard configuration registers.

Switch to generic power management framework using a single
"struct dev_pm_ops" variable to take the unnecessary load from the driver.
This also avoids the need for the driver to directly call most of the PCI
helper functions and device power state control functions, as through
the generic framework PCI Core takes care of the necessary operations,
and drivers are required to do only device-specific jobs.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sc92031: use generic power management

Drivers using legacy power management .suspen()/.resume() callbacks
have to manage PCI states and device's PM states themselves. They also
need to take care of standard configuration registers.

Switch to generic power management framework using a single
"struct dev_pm_ops" variable to take the unnecessary load from the driver.
This also avoids the need for the driver to directly call most of the PCI
helper functions and device power state control functions, as through
the generic framework PCI Core takes care of the necessary operations,
and drivers are required to do only device-specific jobs.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bnxt_en: Remove superfluous memset()

Fixes coccicheck warning:

./drivers/net/ethernet/broadcom/bnxt/bnxt.c:3730:19-37: WARNING:
dma_alloc_coherent use in stats -> hw_stats already zeroes out
memory, so memset is not needed

dma_alloc_coherent use in status already zeroes out memory,
so memset is not needed

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Li Heng <liheng40@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: Replace vmalloc with kmalloc in octeon_register_dispatch_fn()

The size of struct octeon_dispatch is too small, it is better to use
kmalloc instead of vmalloc.

Suggested-by: Joe Perches <joe@perches.com>
Signed-off-by: Wang Hai <wanghai38@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/sched: act_pedit: Use flex_array_size() helper in memcpy()

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing with a flexible array member.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

mlxsw: spectrum_cnt: Use flex_array_size() helper in memcpy()

Make use of the flex_array_size() helper to calculate the size of a
flexible array member within an enclosing structure.

This helper offers defense-in-depth against potential integer
overflows, while at the same time makes it explicitly clear that
we are dealing witha flexible array member.

Also, remove unnecessary pointer identifier sub_pool.

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Reviewed-by: Ido Schimmel <idosch@mellanox.com>
Tested-by: Ido Schimmel <idosch@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sfc_ef100: remove duplicated include from ef100_netdev.c

Remove duplicated include.

Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Acked-by: Edward Cree <ecree@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ptp: ptp_clockmatrix: update to support 4.8.7 firmware

With 4.8.7 firmware, adjtime can change delta instead of absolute time,
which greately increases snap accuracy. PPS alignment doesn't have to
be set for every single TOD change. Other minor changes includes:
adding more debug logs, increasing snap accuracy for pre 4.8.7 firmware
and supporting new tcs2bin format.

Signed-off-by: Min Li <min.li.xe@renesas.com>
Acked-by: Richard Cochran <richardcochran@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'l2tp-tidy-up-l2tp-core-API'

Tom Parkin says:

====================
l2tp: tidy up l2tp core API

This short series makes some minor tidyup changes to the L2TP core API.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: improve API documentation in l2tp_core.h

* Improve the description of the key l2tp subsystem data structures.
* Add high-level description of the main APIs for interacting with l2tp
core.
* Add documentation for the l2tp netlink session command callbacks.
* Document the session pseudowire callbacks.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: tweak exports for l2tp_recv_common and l2tp_ioctl

All of the l2tp subsystem's exported symbols are exported using
EXPORT_SYMBOL_GPL, except for l2tp_recv_common and l2tp_ioctl.

These functions alone are not useful without the rest of the l2tp
infrastructure, so there's no practical benefit to these symbols using a
different export policy.

Change these exports to use EXPORT_SYMBOL_GPL for consistency with the
rest of l2tp.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: remove build_header callback in struct l2tp_session

The structure of an L2TP data packet header varies depending on the
version of the L2TP protocol being used.

struct l2tp_session used to have a build_header callback to abstract
this difference away. It's clearer to simply choose the correct
function to use when building the data packet (and we save on the
function pointer in the session structure).

This approach does mean dereferencing the parent tunnel structure in
order to determine the tunnel version, but we're doing that in the
transmit path in any case.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: return void from l2tp_session_delete

l2tp_session_delete is used to schedule a session instance for deletion.
The function itself always returns zero, and none of its direct callers
check its return value, so have the function return void.

This change de-facto changes the l2tp netlink session_delete callback
prototype since all pseudowires currently use l2tp_session_delete for
their implementation of that operation.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: don't export tunnel and session free functions

Tunnel and session instances are reference counted, and shouldn't be
directly freed by pseudowire code.

Rather than exporting l2tp_tunnel_free and l2tp_session_free, make them
private to l2tp_core.c, and export the refcount functions instead.

In order to do this, the refcount functions cannot be declared as
inline. Since the codepaths which take and drop tunnel and session
references are not directly in the datapath this shouldn't cause
performance issues.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

l2tp: don't export __l2tp_session_unhash

When __l2tp_session_unhash was first added it was used outside of
l2tp_core.c, but that's no longer the case.

As such, there's no longer a need to export the function. Make it
private inside l2tp_core.c, and relocate it to avoid having to declare
the function prototype in l2tp_core.h.

Since the function is no longer used outside l2tp_core.c, remove the
"__" prefix since we don't need to indicate anything special about its
expected use to callers.

Signed-off-by: Tom Parkin <tparkin@katalix.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

selftests: txtimestamp: add flag for timestamp validation tolerance.

The txtimestamp selftest sets a fixed 500us tolerance. This value was
arrived at experimentally. Some platforms have higher variances. Make
this adjustable by adding the following flag:

-t N: tolerance (usec) for timestamp validation.

Signed-off-by: Jian Yang <jianyang@google.com>
Acked-by: Willem de Bruijn <willemb@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'master' of git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next

Steffen Klassert says:

====================
pull request (net-next): ipsec-next 2020-07-30

Please note that I did the first time now --no-ff merges
of my testing branch into the master branch to include
the [PATCH 0/n] message of a patchset. Please let me
know if this is desirable, or if I should do it any
different.

1) Introduce a oseq-may-wrap flag to disable anti-replay
   protection for manually distributed ICVs as suggested
   in RFC 4303. From Petr Vaněk.

2) Patchset to fully support IPCOMP for vti4, vti6 and
   xfrm interfaces. From Xin Long.

3) Switch from a linear list to a hash list for xfrm interface
   lookups. From Eyal Birger.

4) Fixes to not register one xfrm(6)_tunnel object twice.
   From Xin Long.

5) Fix two compile errors that were introduced with the
   IPCOMP support for vti and xfrm interfaces.
   Also from Xin Long.

6) Make the policy hold queue work with VTI. This was
   forgotten when VTI was implemented.

Please pull or let me know if there are problems.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

igb: use eth_zero_addr() to clear mac address

Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: use eth_zero_addr() to clear mac address

Use eth_zero_addr() to clear mac address instead of memset().

Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Documentation: intel: Replace HTTP links with HTTPS ones

Rationale:
Reduces attack surface on kernel devs opening the links for MITM
as HTTPS traffic is much harder to manipulate.

Deterministic algorithm:
For each file:
  If not .svg:
    For each line:
      If doesn't contain `\bxmlns\b`:
        For each link, `\bhttp://[^# \t\r\n]*(?:\w|/)`:
          If neither `\bgnu\.org/license`, nor `\bmozilla\.org/MPL\b`:
            If both the HTTP and HTTPS versions
            return 200 OK and serve the same content:
              Replace HTTP with HTTPS.

Signed-off-by: Alexander A. Klimov <grandmaster@al2klimov.de>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: Remove unnecessary usages of memset

Replace memsets of 1 byte with simple assignment.
Issue found with checkpatch.pl

Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igb: Remove unnecessary usages of memset

Replace memsets of 1 byte with simple assignment.
Issue found with checkpatch.pl

Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000e: Remove unnecessary usages of memset

Replace memsets of 1 byte with simple assignments.
Issue found with checkpatch.pl

Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e1000: Remove unnecessary usages of memset

Replace memsets of 1 byte with simple assignments.
Issue reported by checkpatch.pl.

Signed-off-by: Suraj Upadhyay <usuraj35@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

e100: use generic power management

With legacy PM hooks, it was the responsibility of a driver to manage PCI
states and also the device's power state. The generic approach is to let
PCI core handle the work.

e100_suspend() calls __e100_shutdown() to perform intermediate tasks.
__e100_shutdown() calls pci_save_state() which is not recommended.

e100_suspend() also calls __e100_power_off() which is calling PCI helper
functions, pci_prepare_to_sleep(), pci_set_power_state(), along with
pci_wake_from_d3(...,false). Hence, the functin call is removed and wol is
disabled as earlier using device_wakeup_disable().

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbevf: use generic power management

With legacy PM, drivers themselves were responsible for managing the
device's power states and takes care of register states.

After upgrading to the generic structure, PCI core will take care of
required tasks and drivers should do only device-specific operations.

The driver was invoking PCI helper functions like pci_save/restore_state(),
and pci_enable/disable_device(), which is not recommended.

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ixgbe: use generic power management

With legacy PM hooks, it was the responsibility of a driver to manage PCI
states and also the device's power state. The generic approach is to let
PCI core handle the work.

ixgbe_suspend() calls __ixgbe_shutdown() to perform intermediate tasks.
__ixgbe_shutdown() modifies the value of "wake" (device should be wakeup
enabled or not), responsible for controlling the flow of legacy PM.

Since, PCI core has no idea about the value of "wake", new code for generic
PM may produce unexpected results. Thus, use "device_set_wakeup_enable()"
to wakeup-enable the device accordingly.

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

igbvf: use generic power management

Remove legacy PM callbacks and use generic operations. With legacy code,
drivers were responsible for handling PCI PM operations like
pci_save_state(). In generic code, all these are handled by PCI core.

The generic suspend() and resume() are called at the same point the legacy
ones were called. Thus, it does not affect the normal functioning of the
driver.

__maybe_unused attribute is used with .resume() but not with .suspend(), as
.suspend() is called by .shutdown().

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Aaron Brown <aaron.f.brown@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

iavf: use generic power management

With the support of generic PM callbacks, drivers no longer need to use
legacy .suspend() and .resume() in which they had to maintain PCI states
changes and device's power state themselves. The required operations are
done by PCI core.

PCI drivers are not expected to invoke PCI helper functions like
pci_save/restore_state(), pci_enable/disable_device(),
pci_set_power_state(), etc. Their tasks are completed by PCI core itself.

Compile-tested only.

Signed-off-by: Vaibhav Gupta <vaibhavgupta40@gmail.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Revert "Bluetooth: btusb: Disable runtime suspend on Realtek devices"

This reverts commit 7ecacafc240638148567742cca41aa7144b4fe1e.

Testing this change on a board with RTL8822CE, I found that enabling
autosuspend has no effect on the stability of the system. The board
continued working after autosuspend, suspend and reboot.

The original commit makes it impossible to enable autosuspend on working
systems so it should be reverted. Disabling autosuspend should be done
via module param or udev in userspace instead.

Signed-off-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Acked-by: Kai-Heng Feng <kai.heng.feng@canonical.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Enable controller RPA resolution using Experimental feature

This patch adds support to enable the use of RPA Address resolution
using expermental feature mgmt command.

Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Enable RPA Timeout

Enable RPA timeout during bluetooth initialization.
The RPA timeout value is used from hdev, which initialized from
debug_fs

Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Enable/Disable address resolution during le create conn

In this patch if le_create_conn process is started restrict to
disable address resolution and same is disabled during
le_enh_connection_complete

Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Let controller creates RPA during le create conn

When address resolution is enabled and set_privacy is enabled let's
use own address type as 0x03

Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Translate additional address type during le_conn

When using controller based address resolution, then the new address
types 0x02 and 0x03 are used. These types need to be converted back into
either public address or random address types.

This patch is specially during LE_CREATE_CONN if using own_add_type as 0x02
or 0x03.

Signed-off-by: Sathish Narasimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Update resolving list when updating whitelist

When the whitelist is updated, then also update the entries of the
resolving list for devices where IRKs are available.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Configure controller address resolution if available

When the LL Privacy support is available, then as part of enabling or
disabling passive background scanning, it is required to set up the
controller based address resolution as well.

Since only passive background scanning is utilizing the whitelist, the
address resolution is now bound to the whitelist and passive background
scanning. All other resolution can be easily done by the host stack.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

Bluetooth: Translate additional address type correctly

When using controller based address resolution, then the new address
types 0x02 and 0x03 are used. These types need to be converted back into
either public address or random address types.

Signed-off-by: Marcel Holtmann <marcel@holtmann.org>
Signed-off-by: Sathish Narsimman <sathish.narasimman@intel.com>
Signed-off-by: Marcel Holtmann <marcel@holtmann.org>

fib: fix fib_rules_ops indirect calls wrappers

This patch fixes:
commit b9aaec8f0be5 ("fib: use indirect call wrappers in the most common
fib_rules_ops") which didn't consider the case when
CONFIG_IPV6_MULTIPLE_TABLES is not set.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: b9aaec8f0be5 ("fib: use indirect call wrappers in the most common fib_rules_ops")
Signed-off-by: Brian Vazquez <brianvv@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue

Tony Nguyen says:

====================
100GbE Intel Wired LAN Driver Updates 2020-07-29

This series contains updates to the ice driver only.

Dave works around LFC settings not being preserved through link events.
Fixes link issues with GLOBR reset and handling of multiple link events.

Nick restores VF MSI-X after PCI reset.

Kiran corrects the error code returned in ice_aq_sw_rules if the rule
does not exist.

Paul prevents overwriting of user set descriptors.

Tarun adds masking before accessing rate limiting profile types and
corrects queue bandwidth configuration.

Victor modifies Tx queue scheduler distribution to spread more evenly
across queue group nodes.

Krzysztof sets need_wakeup flag for Tx AF_XDP.

Brett allows VLANs in safe mode.

Marcin cleans up VSIs on probe failure.

Bruce reduces the scope of a variable.

Ben removes a FW workaround.

Tony fixes an unused parameter warning.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: mvneta: fix comment about phylink_speed_down

mvneta has switched to phylink, so the comment should look
like "We may have called phylink_speed_down before".

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Reviewed-by: Russell King <rmk+kernel@armlinux.org.uk>
Signed-off-by: David S. Miller <davem@davemloft.net>

ice: fix unused parameter warning

Depending on PAGE_SIZE, the following unused parameter warning can be
reported:

drivers/net/ethernet/intel/ice/ice_txrx.c: In function ‘ice_rx_frame_truesize’:
drivers/net/ethernet/intel/ice/ice_txrx.c:513:21: warning: unused parameter ‘size’ [-Wunused-parameter]
unsigned int size)

The 'size' variable is used only when PAGE_SIZE >= 8192. Add __maybe_unused
to remove the warning.

Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>

ice: disable no longer needed workaround for FW logging

For the FW logging info AQ command, we currently set the ICE_AQ_FLAG_RD
in order to work around a FW issue. This issue has been fixed so remove the
workaround.

Signed-off-by: Ben Shelton <benjamin.h.shelton@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: reduce scope of variable

The scope of the macro local variable 'i' can be reduced. Do so to avoid
static analysis tools from complaining.

Signed-off-by: Bruce Allan <bruce.w.allan@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: cleanup VSI on probe fail

As part of ice_setup_pf_sw() a PF VSI is setup; release the VSI in case of
failure.

Signed-off-by: Marcin Szycik <marcin.szycik@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Allow all VLANs in safe mode

Currently the PF VSI's context parameters are left in a bad state when
going into safe mode. This is causing VLAN traffic to not pass. Fix this
by configuring the PF VSI to allow all VLAN tagged traffic.

Also, remove redundant comment explaining the safe mode flow in
ice_probe().

Signed-off-by: Brett Creeley <brett.creeley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: need_wakeup flag might not be set for Tx

This is a port of i40e commit 705639572e8c ("i40e: need_wakeup flag might
not be set for Tx").

Quoting the original commit message:

"The need_wakeup flag for Tx might not be set for AF_XDP sockets that
are only used to send packets. This happens if there is at least one
outstanding packet that has not been completed by the hardware and we
get that corresponding completion (which will not generate an interrupt
since interrupts are disabled in the napi poll loop) between the time we
stopped processing the Tx completions and interrupts are enabled again.
In this case, the need_wakeup flag will have been cleared at the end of
the Tx completion processing as we believe we will get an interrupt from
the outstanding completion at a later point in time. But if this
completion interrupt occurs before interrupts are enable, we lose it and
should at that point really have set the need_wakeup flag since there
are no more outstanding completions that can generate an interrupt to
continue the processing. When this happens, user space will see a Tx
queue need_wakeup of 0 and skip issuing a syscall, which means will
never get into the Tx processing again and we have a deadlock."

As a result, packet processing stops. This patch introduces a fix for
this issue, by always setting the need_wakeup flag at the end of an
interrupt processing. This ensures that the deadlock will not happen.

Signed-off-by: Krzysztof Kazimierczak <krzysztof.kazimierczak@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: distribute Tx queues evenly

Distribute the Tx queues evenly across all queue groups. This will
help the queues to get more equal sharing among the queues when all
are in use.

In the previous algorithm, the next queue group node will be picked up
only after the previous one filled with max children.
For example: if VSI is configured with 9 queues, the first 8 queues
will be assigned to queue group 1 and the 9th queue will be assigned to
queue group 2.

The 2 queue groups split the bandwidth between them equally (50:50).
The first queue group node will share the 50% bandwidth with all of
its children (8 queues). And the second queue group node will share
the entire 50% bandwidth with its only children.

The new algorithm will fix this issue.

Signed-off-by: Victor Raj <victor.raj@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Adjust scheduler default BW weight

By default the queues are configured in legacy mode. The default
BW settings for legacy/advanced modes are different. The existing
code was using the advanced mode default value of 1 which was
incorrect. This caused the unbalanced BW sharing among siblings.
The recommended default value is applied.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Add RL profile bit mask check

Mask bits before accessing the profile type field.

Signed-off-by: Tarun Singh <tarun.k.singh@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix overwriting TX/RX descriptor values when rebuilding VSI

If a user sets the value of the TX or RX descriptors to some non-default
value using 'ethtool -G' then we need to not overwrite the values when
we rebuild the VSI. The VSI rebuild could happen as a result of a user
setting the number of queues via the 'ethtool -L' command. Fix this by
checking to see if the value we have stored is non-zero and if it is
then don't change the value.

Signed-off-by: Paul M Stillwell Jr <paul.m.stillwell.jr@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: return correct error code from ice_aq_sw_rules

Return ICE_ERR_DOES_NOT_EXIST return code if admin command error code is
ICE_AQ_RC_ENOENT (not exist). ice_aq_sw_rules is used when switch
rule is getting added/deleted/updated. In case of delete/update
switch rule, admin command can return ICE_AQ_RC_ENOENT error code
if such rule does not exist, hence return ICE_ERR_DOES_NOT_EXIST error
code from ice_aq_sw_rule, so that caller of this function can decide
how to handle ICE_ERR_DOES_NOT_EXIST.

Signed-off-by: Kiran Patil <kiran.patil@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: restore VF MSI-X state during PCI reset

During a PCI FLR the MSI-X Enable flag in the VF PCI MSI-X capability
register will be cleared. This can lead to issues when a VF is
assigned to a VM because in these cases the VF driver receives no
indication of the PF PCI error/reset and additionally it is incapable
of restoring the cleared flag in the hypervisor configuration space
without fully reinitializing the driver interrupt functionality.

Since the VF driver is unable to easily resolve this condition on its own,
restore the VF MSI-X flag during the PF PCI reset handling.

Signed-off-by: Nick Nunley <nicholas.d.nunley@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: fix link event handling timing

When the driver experiences a link event (especially link up)
there can be multiple events generated. Some of these are
link fault and still have a state of DOWN set. The problem
happens when the link comes UP during the PF driver handling
one of the LINK DOWN events. The status of the link is updated
and is now seen as UP, so when the actual LINK UP event comes,
the port information has already been updated to be seen as UP,
even though none of the UP activities have been completed.

After the link information has been updated in the link
handler and evaluated for MEDIA PRESENT, if the state
of the link has been changed to UP, treat the DOWN event
as an UP event since the link is now UP.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Fix link broken after GLOBR reset

After a GLOBR, the link was broken so that a link
up situation was being seen as a link down.

The problem was that the rebuild process was updating
the port_info link status without doing any of the
other things that need to be done when link changes.

This was causing the port_info struct to have current
"UP" information so that any further UP interrupts
were skipped as redundant.

The rebuild flow should *not* be updating the port_info
struct link information, so eliminate this and leave
it to the link event handling code.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

ice: Implement LFC workaround

There is a bug where the LFC settings are not being preserved
through a link event. The registers in question are the ones
that are touched (and restored) when a set_local_mib AQ command
is performed.

On a link-up event, make sure that a set_local_mib is being
performed.

Move the function ice_aq_set_lldp_mib() from the DCB specific
ice_dcb.c to ice_common.c so that the driver always has access
to this AQ command.

Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
Signed-off-by: Tony Nguyen <anthony.l.nguyen@intel.com>

Merge branch 'net-stmmac-improve-WOL'

Jisheng Zhang says:

====================
net: stmmac: improve WOL

Currently, stmmac driver relies on the HW PMT to support WOL. We want
to support phy based WOL.

patch1 is a small improvement to disable WAKE_MAGIC for PMT case if
no pmt_magic_frame.
patch2 and patch3 are two prepation patches.
patch4 implement the phy based WOL
patch5 tries to save a bit energy if WOL is enabled.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Speed down the PHY if WoL to save energy

When WoL is enabled and the machine is powered off, the PHY remains
waiting for wakeup events at max speed, which is a waste of energy.

Slow down the PHY speed before stopping the ethernet if WoL is enabled,

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Support WOL with phy

Currently, the stmmac driver WOL implementation relies on MAC's PMT
feature. We have a case: the MAC HW doesn't enable PMT, instead, we
rely on the phy to support WOL. Implement the support for this case.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: only call pmt() during suspend/resume if HW enables PMT

This is to prepare WOL support with phy. Compared with WOL
implementation which relies on the MAC's PMT features, in phy
supported WOL case, device_may_wakeup() may also be true, but we
should not call mac's pmt() function if HW doesn't enable PMT.

And during resume, we should call phylink_start() if PMT is disabled.

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: stmmac: Move device_can_wakeup() check earlier in set_wol

If !device_can_wakeup(), there's no need to futher check. And return
-EOPNOTSUPP rather than -EINVAL if !device_can_wakeup().

Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>