To make ICMPv6 closer to ICMPv4, add ratemask parameter. Since the ICMP
message types use larger numeric values, a simple bitmask doesn't fit.
I use large bitmap. The input and output are the in form of list of
ranges. Set the default to rate limit all error messages but Packet Too
Big. For Packet Too Big, use ratemask instead of hard-coded.
There are functions where icmpv6_xrlim_allow() and icmpv6_global_allow()
aren't called. This patch only adds them to icmpv6_echo_reply().
Rate limiting error messages is mandated by RFC 4443 but RFC 4890 says
that it is also acceptable to rate limit informational messages. Thus,
I removed the current hard-coded behavior of icmpv6_mask_allow() that
doesn't rate limit informational messages.
v2: Add dummy function proc_do_large_bitmap() if CONFIG_PROC_SYSCTL
isn't defined, expand the description in ip-sysctl.txt and remove
unnecessary conditional before kfree().
v3: Inline the bitmap instead of dynamically allocated. Still is a
pointer to it is needed because of the way proc_do_large_bitmap work.
Signed-off-by: Stephen Suryaputra <ssuryaextr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
In phy_device_create() we set phydev->autoneg = 1. This isn't changed
even if the PHY doesn't support autoneg. This seems to affect very
few PHY's, and they disable phydev->autoneg in their config_init
callback. So it's more of an improvement, therefore net-next.
The patch also wouldn't apply to older kernel versions because the
link mode bitmaps have been introduced recently.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 18 Apr 2019 18:25:33 +0000 (11:25 -0700)]
Merge branch '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
100GbE Intel Wired LAN Driver Updates 2019-04-18
This series contains updates to the ice driver only.
Anirudh fixes up code comments which had typos. Added support for DCB
into the ice driver, which required a bit of refactoring of the existing
code. Also fixed a potential race condition between closing and opening
the VSI for a MIB change event, so resolved this by grabbing the
rtnl_lock prior to closing. Added support to process LLDP MIB change
notifications. Added support for reporting DCB stats via ethtool.
Brett updates the calculation to increment ITR to use a direct
calculation instead of using estimations. This provides a more accurate
value.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Brett Creeley [Thu, 28 Feb 2019 23:25:47 +0000 (15:25 -0800)]
ice: Calculate ITR increment based on direct calculation
Currently when calculating how much to increment ITR by inside of
ice_update_itr() we do some estimations and intermediate
calculations. Instead of doing estimations, just do the
calculation directly. This allows for a more accurate value and it
makes it easier for the next person to understand and update.
Also, remove the dividing the ITR value by 2 when latency
driven because the ITR values are already so low for 100Gbps
speed. This should help get to the desired ITR value faster.
Signed-off-by: Brett Creeley <brett.creeley@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch adds a new function ice_pf_dcb_cfg (and related helpers)
which applies the DCB configuration obtained from the firmware. As
part of this, VSIs/netdevs are updated with traffic class information.
This patch requires a bit of a refactor of existing code.
1. For a MIB change event, the associated VSI is closed and brought up
again. The gap between closing and opening the VSI can cause a race
condition. Fix this by grabbing the rtnl_lock prior to closing the
VSI and then only free it after re-opening the VSI during a MIB
change event.
2. ice_sched_query_elem is used in ice_sched.c and with this patch, in
ice_dcb.c as well. However, ice_dcb.c is not built when CONFIG_DCB is
unset. This results in namespace warnings (ice_sched.o: Externally
defined symbols with no external references) when CONFIG_DCB is unset.
To avoid this move ice_sched_query_elem from ice_sched.c to
ice_common.c.
Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a new top level function ice_init_dcb (and
related lower level helper functions) which continues the DCB init
flow.
This function uses ice_get_dcb_cfg to get, parse and store the DCB
configuration. Once this is done, it sets itself up to be notified
by the firmware on LLDP MIB change events.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces a skeleton for ice_init_pf_dcb, the top level
function for DCB initialization. Subsequent patches will add to this
DCB init flow.
In this patch, ice_init_pf_dcb checks if DCB is a supported capability.
If so, an admin queue call to start the LLDP and DCBx in firmware is
issued. If not, an error is reported. Note that we don't fail the driver
init if DCB init fails.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Capitalize abbreviations and spell out some that aren't obvious.
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Reviewed-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Larry Finger [Wed, 17 Apr 2019 00:35:47 +0000 (19:35 -0500)]
rtlwifi: rtl8188ee: Remove extraneous file
Somehow file drivers/net/wireless/realtek/rtlwifi/rtl8188ee/trx.c.rej was
incorporated into the sources. Obviously, it can be removed.
Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Reported-by: Andrew Morton <akpm@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org>
David Ahern [Wed, 17 Apr 2019 00:31:43 +0000 (17:31 -0700)]
net ipv6: Prevent neighbor add if protocol is disabled on device
Disabling IPv6 on an interface removes existing entries but nothing prevents
new entries from being manually added. To that end, add a new neigh_table
operation, allow_add, that is called on RTM_NEWNEIGH to see if neighbor
entries are allowed on a given device. If IPv6 is disabled on the device,
allow_add returns false and passes a message back to the user via extack.
$ echo 1 > /proc/sys/net/ipv6/conf/eth1/disable_ipv6
$ ip -6 neigh add fe80::4c88:bff:fe21:2704 dev eth1 lladdr de:ad:be:ef:01:01
Error: IPv6 is disabled on this device.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ipv6: Use fib6_result for fib_lookups
Add fib6_result as a single data structure to hold results from a fib
lookup. IPv6 currently has everything in 1 data structure - a fib6_info,
but with nexthop objects the fib6_nh can be in a nexthop or a nexthop
can be a blackhole which affects the fib6_type and flags (REJECT).
v2
- fixed 2 bugs in patch12:
i. checking return from fib6_table_lookup in fib6_lookup
ii. call to fib6_rule_saddr in fib6_rule_action_alt should use res->nh
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:11 +0000 (14:36 -0700)]
ipv6: Add fib6_type and fib6_flags to fib6_result
Add the fib6_flags and fib6_type to fib6_result. Update the lookup helpers
to set them and update post fib lookup users to use the version from the
result.
This allows nexthop objects to have blackhole nexthop.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:05 +0000 (14:36 -0700)]
ipv6: Pass fib6_result to rt6_insert_exception
Update rt6_insert_exception to take a fib6_result over a fib6_info.
Change ort to f6i from the fib6_result and rename to better reflect
what it references (a fib6_info).
Since this function is already getting changed, update the comments
to reference fib6_info variables rather than the older rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:36:00 +0000 (14:36 -0700)]
ipv6: Pass fib6_result to rt6_find_cached_rt
Simplify rt6_find_cached_rt for the fast path cases and pass fib6_result
to rt6_find_cached_rt. Rename the local return variable to ret to maintain
consisting with fib6_result name.
Update the comment in rt6_find_cached_rt to reference the new names in
a fib6_info vs the old name when fib entries were an rt6_info.
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Tue, 16 Apr 2019 21:35:59 +0000 (14:35 -0700)]
ipv6: Rename fib6_multipath_select and pass fib6_result
Add 'struct fib6_result' to hold the fib entry and fib6_nh from a fib
lookup as separate entries, similar to what IPv4 now has with fib_result.
Rename fib6_multipath_select to fib6_select_path, pass fib6_result to
it, and set f6i and nh in the result once a path selection is done.
Call fib6_select_path unconditionally for path selection which means
moving the sibling and oif check to fib6_select_path. To handle the two
different call paths (2 only call multipath_select if flowi6_oif == 0 and
the other always calls it), add a new have_oif_match that controls the
sibling walk if relevant.
Update callers of fib6_multipath_select accordingly and have them use the
fib6_info and fib6_nh from the result.
This is needed for multipath nexthop objects where a single f6i can
point to multiple fib6_nh (similar to IPv4).
Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: stop/wake TX queues based on their fill level
Current xmit code only stops the txq after attempting to fill an
IO buffer that hasn't been TX-completed yet. In many-connection
scenarios, this can result in frequent rejected TX attempts, requeuing
of skbs with NETDEV_TX_BUSY and extra overhead.
Now that we have a proper 1-to-1 relation between stack-side txqs and
our HW Queues, overhaul the stop/wake logic so that the xmit code
stops the txq as needed.
Given that we might map multiple skbs into a single buffer, it's crucial
to ensure that the queue always provides an _entirely_ empty IO buffer.
Otherwise large skbs (eg TSO) might not fit into the last available
buffer. So whenever qeth_do_send_packet() first utilizes an _empty_
buffer, it updates & checks the used_buffers count.
This now ensures that an skb passed to qeth_xmit() can always be mapped
into an IO buffer, so remove all of the -EBUSY roll-back handling in the
TX path. We preserve the minimal safety-checks ("Is this IO buffer
really available?"), just in case some nasty future bug ever attempts to
corrupt an in-use buffer.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: add TX multiqueue support for OSA devices
This adds trivial support for multiple TX queues on OSA-style devices
(both real HW and z/VM NICs). For now we expose the driver's existing
QoS mechanism via .ndo_select_queue, and adjust the number of available
TX queues when qeth_update_from_chp_desc() detects that the
HW configuration has changed.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: add TX multiqueue support for IQD devices
qeth has been supporting multiple HW Output Queues for a long time. But
rather than exposing those queues to the stack, it uses its own queue
selection logic in .ndo_start_xmit... with all the drawbacks that
entails.
Start off by switching IQD devices over to a proper mqs net_device,
and converting all the netdev_queue management code.
One oddity with IQD devices is the requirement to place all mcast
traffic on the _highest_ established HW queue. Doing so via
.ndo_select_queue seems straight-forward - but that won't work if only
some of the HW queues are active
(ie. when dev->real_num_tx_queues < dev->num_tx_queues), since
netdev_cap_txqueue() will not allow us to put skbs on the higher queues.
To make this work, we
1. let .ndo_select_queue() map all mcast traffic to netdev_queue 0, and
2. later re-map the netdev_queue and HW queue indices in
.ndo_start_xmit and the TX completion handler.
With this patch we default to a fixed set of 1 ucast and 1 mcast queue.
Support for dynamic reconfiguration is added at a later time.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
struct netdev_queue contains a counter for tx timeouts, which gets
updated by dev_watchdog(). So let's not attempt to maintain our own
statistics, in particular not by overloading the skb-error counter.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As the documentation for netif_trans_update() says, netdev_start_xmit()
already updates the last-tx time after every good xmit. So don't
duplicate that effort.
One odd case is that qeth_flush_buffers() also gets called from our
TX completion handler, to flush out any partially filled buffer when
we switch the queue to non-packing mode. But as the TX completion
handler will _always_ wake the txq, we don't have to worry about
the TX watchdog there.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
s390/qeth: handle error from qeth_update_from_chp_desc()
Subsequent code relies on the values that qeth_update_from_chp_desc()
reads from the CHP descriptor. Rather than dealing with weird errors
later on, just handle it properly here.
Signed-off-by: Julian Wiedmann <jwi@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 17 Apr 2019 17:31:21 +0000 (10:31 -0700)]
Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue
Jeff Kirsher says:
====================
40GbE Intel Wired LAN Driver Updates 2019-04-16
This series contains updates to i40e driver only.
Adam fixes i40e so that queues can be restored to its original value if
configuring queue channels fails. Bumped the maximum API version
supported and added the API version to error messages to clarify
supported firmware API versions. Fixed the problem with the driver
being able to add only 7 multicast MAC address filters instead of 16.
Aleksandr adds support for Dynamic Device Personalization (DDP) which
allows loading profiles that change the way internal parser interprets
processed frames.
Nick fixes an issue where if we modify the VLAN stripping options when a
port VLAN is configured, it will break traffic for the VSI, so prevent
changes from being made.
Jake fixes an issue where a device reset can mess up the clock time
because we reset the clock time based on the kernel time every reset.
This causes us to potentially completely reset the PTP time, and can
cause unexpected behavior in programs like ptp4l.
Piotr fixes an LED blink issue with the 'ethtool -p' command, so that
identification blinking will work on all hardware.
Chinh fixed the error returned to correctly reflect the current state
when LLDP or DCBx is not in an operational state.
Grzegorz cleans up a misleading error message when untrusted VF tries to
exceed addresses beyond the NIC limit.
Carolyn fixes the error return code to correctly reflect the error case.
v2: updated the URL provided in the DDP patch (#2)
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi
Pull IPMI fixes from Corey Minyard:
"Fixes for some bugs cause by recent changes. One crash if you feed bad
data to the module parameters, one BUG that sometimes occurs when a
user closes the connection, and one bug that cause the driver to not
work if the configuration information only comes in from SMBIOS"
* tag 'for-linus-5.1-2' of git://github.com/cminyard/linux-ipmi:
ipmi: fix sleep-in-atomic in free_user at cleanup SRCU user->release_barrier
ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
ipmi: Fix failure on SMBIOS specified devices
Jose Abreu [Wed, 17 Apr 2019 07:33:05 +0000 (09:33 +0200)]
net: stmmac: Set Flow Control to automatic mode in the driver
By default Flow Control feature is not being enabled in stmmac.
This is a useful feature that can prevent loss of packets and now that
XGMAC already supports it (along with GMAC and QoS) it makes sense to
activate it.
Switch the module parameter to FLOW_AUTO so that Flow Control is
activated.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Wed, 17 Apr 2019 07:33:04 +0000 (09:33 +0200)]
net: stmmac: dwxgmac: Finish the Flow Control implementation
Finish the implementation of Flow Control feature. In order for it to
work correctly we need to set EHFC bit and the correct threshold values
for activating and deactivating it.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It looks like the new socket options only work correctly
for native execution, but in case of compat mode fall back
to the old behavior as we ignore the 'old_timeval' flag.
Rework so we treat SO_RCVTIMEO_NEW/SO_SNDTIMEO_NEW the
same way in compat and native 32-bit mode.
Cc: Deepa Dinamani <deepa.kernel@gmail.com> Fixes: a9beb86ae6e5 ("sock: Add SO_RCVTIMEO_NEW and SO_SNDTIMEO_NEW") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Deepa Dinamani <deepa.kernel@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Tue, 16 Apr 2019 17:55:20 +0000 (10:55 -0700)]
tcp: tcp_grow_window() needs to respect tcp_space()
For some reason, tcp_grow_window() correctly tests if enough room
is present before attempting to increase tp->rcv_ssthresh,
but does not prevent it to grow past tcp_space()
This is causing hard to debug issues, like failing
the (__tcp_select_window(sk) >= tp->rcv_wnd) test
in __tcp_ack_snd_check(), causing ACK delays and possibly
slow flows.
Depending on tcp_rmem[2], MTU, skb->len/skb->truesize ratio,
we can see the problem happening on "netperf -t TCP_RR -- -r 2000,2000"
after about 60 round trips, when the active side no longer sends
immediate acks.
This bug predates git history.
Signed-off-by: Eric Dumazet <edumazet@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Acked-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
dpaa2-eth: Add flow steering support without masking
On DPAA2 platforms that lack a TCAM (like LS1088A), masking of
flow steering keys is not supported. Until now we didn't offer
flow steering capabilities at all on these platforms.
Introduce a limited support for flow steering, where we only
allow ethtool rules that share a common key (i.e. have the same
header fields). If a rule with a new composition key is wanted,
the user must first manually delete all previous rules.
First patch fixes a minor bug, the next two cleanup and prepare
the code and the last one introduces the actual FS support.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
dpaa2-eth: Add flow steering support without masking
On platforms that lack a TCAM (like LS1088A), masking of
flow steering keys is not supported. Until now we didn't
offer flow steering capabilities at all on these platforms,
since our driver implementation configured a "comprehensive"
FS key (containing all supported header fields), with masks
used to ignore the fields not present in the rules provided
by the user.
We now allow ethtool rules that share a common key (i.e. have
the same header fields). The FS key is now kept in the driver
private data and initialized when the first rule is added to
an empty table, rather than at probe time. If a rule with a new
composition key is wanted, the user must first manually delete
all previous rules.
When building a FS table entry to pass to firmware, we still use
the old building algorithm, which assumes an all-supported-fields
key, and later collapse the fields which aren't actually needed.
Masked rules are not supported; if provided, the mask value
will be ignored. For firmware versions older than MC10.7.0
(that only offer the legacy ABIs for configuring distribution
keys) flow steering without masking support remains unavailable.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Introduce an internal id bitfield to uniquely identify header fields
supported by the Rx distribution keys. For the hash key, add a
conversion from the RXH_* bitmask provided by ethtool to the internal
ids.
Signed-off-by: Ioana Radulescu <ruxandra.radulescu@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This is preventive cleanup that may save troubles later.
No need to cancel repeateadly queued work if code is properly
refactored.
Don't let the ethtool -s process interfere with the stat workqueue
scheduling.
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Tue, 16 Apr 2019 14:04:23 +0000 (15:04 +0100)]
nfp: flower: fix implicit fallthrough warning
The nfp_flower_copy_pre_actions function introduces a case statement with
an intentional fallthrough. However, this generates a warning if built
with the -Wimplicit-fallthrough flag.
Remove the warning by adding a fall through comment.
Fixes: 1c6952ca587d ("nfp: flower: generate merge flow rule") Signed-off-by: John Hurley <john.hurley@netronome.com> Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Reviewed-by: Jiong Wang <jiong.wang@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: fix netlink export of vlan_stats_per_port option
Since the introduction of the vlan_stats_per_port option the netlink
export of it has been broken since I made a typo and used the ifla
attribute instead of the bridge option to retrieve its state.
Sysfs export is fine, only netlink export has been affected.
Fixes: 9163a0fc1f0c0 ("net: bridge: add support for per-port vlan stats") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Tue, 16 Apr 2019 11:43:17 +0000 (12:43 +0100)]
qed: fix spelling mistake "faspath" -> "fastpath"
There is a spelling mistake in a DP_INFO message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com> Reviewed-by: Mukesh Ojha <mojha@codeaurora.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Antoine Tenart [Tue, 16 Apr 2019 10:10:20 +0000 (12:10 +0200)]
net: phy: micrel: add Asym Pause workaround
The Micrel KSZ9031 PHY may fail to establish a link when the Asymmetric
Pause capability is set. This issue is described in a Silicon Errata
(DS80000691D or DS80000692D), which advises to always disable the
capability. This patch implements the workaround by defining a KSZ9031
specific get_feature callback to force the Asymmetric Pause capability
bit to be cleared.
This fixes issues where the link would not come up at boot time, or when
the Asym Pause bit was set later on.
Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
bnx2x: Support for timestamping in P2P mode.
The patch series adds driver support for timestamping the ptp packets
in peer-delay (P2P) mode.
- Patch (1) performs the code cleanup.
- Patch (2) adds the required implementation.
Please consider applying it 'net-next' tree.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Add support for detection of P2P event packets.
The patch adds support for detecting the P2P (peer-to-peer) event packets.
This is required for timestamping the PTP packets in peer delay mode.
Unmask the below bits (set to 0) for device to detect the p2p packets.
NIG_REG_P0/1_LLH_PTP_PARAM_MASK
NIG_REG_P0/1_TLLH_PTP_PARAM_MASK
bit 1 - IPv4 DA 1 of 224.0.0.107.
bit 3 - IPv6 DA 1 of 0xFF02:0:0:0:0:0:0:6B.
bit 9 - MAC DA 1 of 0x01-80-C2-00-00-0E.
NIG_REG_P0/1_LLH_PTP_RULE_MASK
NIG_REG_P0/1_TLLH_PTP_RULE_MASK
bit 2 - {IPv4 DA 1; UDP DP 0}
bit 6 - MAC Ethertype 0 of 0x88F7.
bit 9 - MAC DA 1 of 0x01-80-C2-00-00-0E.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bnx2x: Replace magic numbers with macro definitions.
This patch performs code cleanup by defining macros for the ptp-timestamp
filters.
Signed-off-by: Sudarsana Reddy Kalluru <skalluru@marvell.com> Signed-off-by: Ariel Elior <aelior@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jie Liu [Tue, 16 Apr 2019 05:10:09 +0000 (13:10 +0800)]
tipc: set sysctl_tipc_rmem and named_timeout right range
We find that sysctl_tipc_rmem and named_timeout do not have the right minimum
setting. sysctl_tipc_rmem should be larger than zero, like sysctl_tcp_rmem.
And named_timeout as a timeout setting should be not less than zero.
Fixes: cc79dd1ba9c10 ("tipc: change socket buffer overflow control to respect sk_rcvbuf") Fixes: a5325ae5b8bff ("tipc: add name distributor resiliency queue") Signed-off-by: Jie Liu <liujie165@huawei.com> Reported-by: Qiang Ning <ningqiang1@huawei.com> Reviewed-by: Zhiqiang Liu <liuzhiqiang26@huawei.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
According to the link FSM, when a link endpoint got RESET_MSG (- a
traditional one without the stopping bit) from its peer, it moves to
PEER_RESET state and raises a LINK_DOWN event which then resets the
link itself. Its state will become ESTABLISHING after the reset event
and the link will be re-established soon after this endpoint starts to
send ACTIVATE_MSG to the peer.
There is no problem with this mechanism, however the link resetting has
cleared the link 'in_session' flag (along with the other important link
data such as: the link 'mtu') that was correctly set up at the 1st step
(i.e. when this endpoint received the peer RESET_MSG). As a result, the
link will become ESTABLISHED, but the 'in_session' flag is not set, and
all STATE_MSG from its peer will be dropped at the link_validate_msg().
It means the link not synced and will sooner or later face a failure.
Since the link reset action is obviously needed for a new link session
(this is also true in the other situations), the problem here is that
the link is re-established a bit too early when the link endpoints are
not really in-sync yet. The commit forces a resync as already done in
the previous commit 91986ee166cf ("tipc: fix link session and
re-establish issues") by simply varying the link 'peer_session' value
at the link_reset().
Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: Tuong Lien <tuong.t.lien@dektech.com.au> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Fix missing meta data in skb with vlan packet
skb_reorder_vlan_header() should move XDP meta data with ethernet header
if XDP meta data exists.
Fixes: de8f3a83b0a0 ("bpf: add meta pointer for direct access") Signed-off-by: Yuya Kusakabe <yuya.kusakabe@gmail.com> Signed-off-by: Takeru Hayasaka <taketarou2@gmail.com> Co-developed-by: Takeru Hayasaka <taketarou2@gmail.com> Reviewed-by: Toshiaki Makita <makita.toshiaki@lab.ntt.co.jp> Signed-off-by: David S. Miller <davem@davemloft.net>
In preparation to enabling -Wimplicit-fallthrough, mark switch
cases where we are expecting to fall through.
This patch fixes the following warning:
drivers/net/xen-netfront.c: In function ‘netback_changed’:
drivers/net/xen-netfront.c:2038:6: warning: this statement may fall through [-Wimplicit-fallthrough=]
if (dev->state == XenbusStateClosed)
^
drivers/net/xen-netfront.c:2041:2: note: here
case XenbusStateClosing:
^~~~
Warning level 3 was used: -Wimplicit-fallthrough=3
Notice that, in this particular case, the code comment is modified
in accordance with what GCC is expecting to find.
This patch is part of the ongoing efforts to enable
-Wimplicit-fallthrough.
Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix this by sanitizing arg before using it to index dev_lec.
Notice that given that speculation windows are large, the policy is
to kill the speculation on the first load and not worry if it can be
completed with a dependent load/store [1].
net/core: work around section mismatch warning for ptp_classifier
The routine ptp_classifier_init() uses an initializer for an
automatic struct type variable which refers to an __initdata
symbol. This is perfectly legal, but may trigger a section
mismatch warning when running the compiler in -fpic mode, due
to the fact that the initializer may be emitted into an anonymous
.data section thats lack the __init annotation. So work around it
by using assignments instead.
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the commit below was introduced it changed two visible things:
- the skb was no longer passed through the protocol handlers with the
original device
- the skb was passed up the stack with skb->dev = bridge
The first change broke af_packet sockets on bridge ports. For example we
use them for hostapd which listens for ETH_P_PAE packets on the ports.
We discussed two possible fixes:
- create a clone and pass it through NF_HOOK(), act on the original skb
based on the result
- somehow signal to the caller from the okfn() that it was called,
meaning the skb is ok to be passed, which this patch is trying to
implement via returning 1 from the bridge link-local okfn()
Note that we rely on the fact that NF_QUEUE/STOLEN would return 0 and
drop/error would return < 0 thus the okfn() is called only when the
return was 1, so we signal to the caller that it was called by preserving
the return value from nf_hook().
Fixes: 8626c56c8279 ("bridge: fix potential use-after-free when hook returns QUEUE or STOLEN verdict") Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Adam Ludkiewicz [Wed, 6 Feb 2019 23:08:25 +0000 (15:08 -0800)]
i40e: Able to add up to 16 MAC filters on an untrusted VF
This patch fixes the problem with the driver being able to add only 7
multicast MAC address filters instead of 16. The problem is fixed by
changing the maximum number of MAC address filters to 16+1+1 (two extra
are needed because the driver uses 1 for unicast MAC address and 1 for
broadcast).
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adam Ludkiewicz [Wed, 6 Feb 2019 23:08:24 +0000 (15:08 -0800)]
i40e: Report advertised link modes on 40GBASE_SR4
Defined the advertised link mode field for 40000baseSR4_Full for
use with ethtool.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adam Ludkiewicz [Wed, 6 Feb 2019 23:08:23 +0000 (15:08 -0800)]
i40e: The driver now prints the API version in error message
Added the API version in the error message for clarity.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adam Ludkiewicz [Wed, 6 Feb 2019 23:08:22 +0000 (15:08 -0800)]
i40e: Changed maximum supported FW API version to 1.8
A new FW has been released, which uses API version 1.8.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Grzegorz Siwik [Wed, 6 Feb 2019 23:08:21 +0000 (15:08 -0800)]
i40e: Remove misleading messages for untrusted VF
Removed misleading messages when untrusted VF tries to
add more addresses than NIC limit
Signed-off-by: Grzegorz Siwik <grzegorz.siwik@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Chinh T Cao [Wed, 6 Feb 2019 23:08:20 +0000 (15:08 -0800)]
i40e: Update i40e_init_dcb to return correct error
Modify the i40e_init_dcb to return the correct error when LLDP or DCBX
is not in operational state.
Signed-off-by: Chinh T Cao <chinh.t.cao@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Piotr Marczak [Wed, 6 Feb 2019 23:08:19 +0000 (15:08 -0800)]
i40e: Fix for 10G ports LED not blinking
On some hardware LEDs would not blink after command 'ethtool -p {eth-port}'
in certain circumstances. Now, function does not care about the activity
of the LED (though still preserves its state) but forcibly executes
identification blinking and then restores the LED state.
Signed-off-by: Piotr Marczak <piotr.marczak@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jacob Keller [Tue, 12 Feb 2019 21:56:24 +0000 (13:56 -0800)]
i40e: save PTP time before a device reset
In the case where PTP is running on the hardware clock, but the kernel
system time is not being synced, a device reset can mess up the clock
time.
This occurs because we reset the clock time based on the kernel time
every reset. This causes us to potentially completely reset the PTP
time, and can cause unexpected behavior in programs like ptp4l.
Avoid this by saving the PTP time prior to device reset, and then
restoring using that time after the reset.
Directly restoring the PTP time we saved isn't perfect, because time
should have continued running, but the clock will essentially be stopped
during the reset. This is still better than the current solution of
assuming that the PTP HW clock is synced to the CLOCK_REALTIME.
We can do even better, by saving the ktime and calculating
a differential, using ktime_get(). This is based on CLOCK_MONOTONIC, and
allows us to get a fairly precise measure of the time difference between
saving and restoring the time.
Using this, we can update the saved PTP time, and use that as the value
to write to the hardware clock registers. This, of course is not perfect.
However, it does help ensure that the PTP time is restored as close as
feasible to the time it should have been if the reset had not occurred.
During device initialization, continue using the system time as the
source for the creation of the PTP clock, since this is the best known
current time source at driver load.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Nicholas Nunley [Wed, 6 Feb 2019 23:08:17 +0000 (15:08 -0800)]
i40e: don't allow changes to HW VLAN stripping on active port VLANs
Modifying the VLAN stripping options when a port VLAN is configured
will break traffic for the VSI, and conceptually doesn't make sense,
so don't allow this.
Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
This patch introduces DDP (Dynamic Device Personalization) which allows
loading profiles that change the way internal parser interprets processed
frames. To load DDP profiles it utilizes ethtool flash feature. The files
with recipes must be located in /var/lib/firmware directory. Afterwards
the recipe can be loaded by invoking:
See further details of this feature in the i40e documentation, or
visit
https://www.intel.com/content/www/us/en/architecture-and-technology/ethernet/dynamic-device-personalization-brief.html
The driver shall verify DDP profile can be loaded in accordance with
the rules:
* Package with Group ID 0 are exclusive and can only be loaded the first.
* Packages with Group ID 0x01-0xFE can only be loaded simultaneously
with the packages from the same group.
* Packages with Group ID 0xFF are compatible with all other packages.
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Adam Ludkiewicz [Wed, 6 Feb 2019 23:08:15 +0000 (15:08 -0800)]
i40e: Queues are reserved despite "Invalid argument" error
Added a new local variable in the i40e_setup_tc function named
old_queue_pairs so num_queue_pairs can be restored to the correct
value in case configuring queue channels fails. Additionally, moved
the exit label in the i40e_setup_tc function so the if (need_reset)
block can be executed.
Also, fixed data packing in the i40e_setup_tc function.
Signed-off-by: Adam Ludkiewicz <adam.ludkiewicz@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Tony Camuso [Tue, 9 Apr 2019 19:20:03 +0000 (15:20 -0400)]
ipmi: ipmi_si_hardcode.c: init si_type array to fix a crash
The intended behavior of function ipmi_hardcode_init_one() is to default
to kcs interface when no type argument is presented when initializing
ipmi with hard coded addresses.
However, the array of char pointers allocated on the stack by function
ipmi_hardcode_init() was not inited to zeroes, so it contained stack
debris.
Consequently, passing the cruft stored in this array to function
ipmi_hardcode_init_one() caused a crash when it was unable to detect
that the char * being passed was nonsense and tried to access the
address specified by the bogus pointer.
The fix is simply to initialize the si_type array to zeroes, so if
there were no type argument given to at the command line, function
ipmi_hardcode_init_one() could properly default to the kcs interface.
Signed-off-by: Tony Camuso <tcamuso@redhat.com>
Message-Id: <1554837603-40299-1-git-send-email-tcamuso@redhat.com> Signed-off-by: Corey Minyard <cminyard@mvista.com>
Merge tag 'riscv-for-linus-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux
Pull RISC-V fixes from Palmer Dabbelt:
"This contains an assortment of RISC-V-related fixups that we found
after rc4. They're all really unrelated:
- The addition of a 32-bit defconfig, to emphasize testing the 32-bit
port.
- A device tree bindings patch, which is pre-work for some patches
that target 5.2.
- A fix to support booting on systems with more physical memory than
the maximum supported by the kernel"
* tag 'riscv-for-linus-5.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/palmer/riscv-linux:
RISC-V: Fix Maximum Physical Memory 2GiB option for 64bit systems
dt-bindings: clock: sifive: add FU540-C000 PRCI clock constants
RISC-V: Add separate defconfig for 32bit systems
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"5.1 keeps its reputation as a big bugfix release for KVM x86.
- Fix for a memory leak introduced during the merge window
- Fixes for nested VMX with ept=0
- Fixes for AMD (APIC virtualization, NMI injection)
- Fixes for Hyper-V under KVM and KVM under Hyper-V
- Fixes for 32-bit SMM and tests for SMM virtualization
- More array_index_nospec peppering"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (21 commits)
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
KVM: fix spectrev1 gadgets
KVM: x86: fix warning Using plain integer as NULL pointer
selftests: kvm: add a selftest for SMM
selftests: kvm: fix for compilers that do not support -no-pie
selftests: kvm/evmcs_test: complete I/O before migrating guest state
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
KVM: x86: clear SMM flags before loading state while leaving SMM
KVM: x86: Open code kvm_set_hflags
KVM: x86: Load SMRAM in a single shot when leaving SMM
KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU
KVM: x86: Raise #GP when guest vCPU do not support PMU
x86/kvm: move kvm_load/put_guest_xcr0 into atomic context
KVM: x86: svm: make sure NMI is injected after nmi_singlestep
svm/avic: Fix invalidate logical APIC id entry
Revert "svm: Fix AVIC incomplete IPI emulation"
kvm: mmu: Fix overflow on kvm mmu page limit calculation
KVM: nVMX: always use early vmcs check when EPT is disabled
KVM: nVMX: allow tests to use bad virtual-APIC page address
...
Vitaly Kuznetsov [Wed, 27 Mar 2019 14:12:20 +0000 (15:12 +0100)]
KVM: x86: avoid misreporting level-triggered irqs as edge-triggered in tracing
In __apic_accept_irq() interface trig_mode is int and actually on some code
paths it is set above u8:
kvm_apic_set_irq() extracts it from 'struct kvm_lapic_irq' where trig_mode
is u16. This is done on purpose as e.g. kvm_set_msi_irq() sets it to
(1 << 15) & e->msi.data
kvm_apic_local_deliver sets it to reg & (1 << 15).
Fix the immediate issue by making 'tm' into u16. We may also want to adjust
__apic_accept_irq() interface and use proper sizes for vector, level,
trig_mode but this is not urgent.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Add a simple test for SMM, based on VMX. The test implements its own
sync between the guest and the host as using our ucall library seems to
be too cumbersome: SMI handler is happening in real-address mode.
This patch also fixes KVM_SET_NESTED_STATE to happen after
KVM_SET_VCPU_EVENTS, in fact it places it last. This is because
KVM needs to know whether the processor is in SMM or not.
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 11 Apr 2019 13:51:19 +0000 (15:51 +0200)]
selftests: kvm: fix for compilers that do not support -no-pie
-no-pie was added to GCC at the same time as their configuration option
--enable-default-pie. Compilers that were built before do not have
-no-pie, but they also do not need it. Detect the option at build
time.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Paolo Bonzini [Thu, 11 Apr 2019 13:57:14 +0000 (15:57 +0200)]
selftests: kvm/evmcs_test: complete I/O before migrating guest state
Starting state migration after an IO exit without first completing IO
may result in test failures. We already have two tests that need this
(this patch in fact fixes evmcs_test, similar to what was fixed for
state_test in commit 0f73bbc851ed, "KVM: selftests: complete IO before
migrating guest state", 2019-03-13) and a third is coming. So, move the
code to vcpu_save_state, and while at it do not access register state
until after I/O is complete.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: x86: Always use 32-bit SMRAM save state for 32-bit kernels
Invoking the 64-bit variation on a 32-bit kenrel will crash the guest,
trigger a WARN, and/or lead to a buffer overrun in the host, e.g.
rsm_load_state_64() writes r8-r15 unconditionally, but enum kvm_reg and
thus x86_emulate_ctxt._regs only define r8-r15 for CONFIG_X86_64.
KVM allows userspace to report long mode support via CPUID, even though
the guest is all but guaranteed to crash if it actually tries to enable
long mode. But, a pure 32-bit guest that is ignorant of long mode will
happily plod along.
SMM complicates things as 64-bit CPUs use a different SMRAM save state
area. KVM handles this correctly for 64-bit kernels, e.g. uses the
legacy save state map if userspace has hid long mode from the guest,
but doesn't fare well when userspace reports long mode support on a
32-bit host kernel (32-bit KVM doesn't support 64-bit guests).
Since the alternative is to crash the guest, e.g. by not loading state
or explicitly requesting shutdown, unconditionally use the legacy SMRAM
save state map for 32-bit KVM. If a guest has managed to get far enough
to handle SMIs when running under a weird/buggy userspace hypervisor,
then don't deliberately crash the guest since there are no downsides
(from KVM's perspective) to allow it to continue running.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
KVM: x86: Don't clear EFER during SMM transitions for 32-bit vCPU
Neither AMD nor Intel CPUs have an EFER field in the legacy SMRAM save
state area, i.e. don't save/restore EFER across SMM transitions. KVM
somewhat models this, e.g. doesn't clear EFER on entry to SMM if the
guest doesn't support long mode. But during RSM, KVM unconditionally
clears EFER so that it can get back to pure 32-bit mode in order to
start loading CRs with their actual non-SMM values.
Clear EFER only when it will be written when loading the non-SMM state
so as to preserve bits that can theoretically be set on 32-bit vCPUs,
e.g. KVM always emulates EFER_SCE.
And because CR4.PAE is cleared only to play nice with EFER, wrap that
code in the long mode check as well. Note, this may result in a
compiler warning about cr4 being consumed uninitialized. Re-read CR4
even though it's technically unnecessary, as doing so allows for more
readable code and RSM emulation is not a performance critical path.
Fixes: 660a5d517aaab ("KVM: x86: save/load state on SMM switch") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>