Currently ICE_MAX_MTU subtracts only ETH_HLEN from max frame size and
adds ETH_FCS_LEN and VLAN_HLEN, which is not what was intended.
The ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN expression should be surrounded
with parentheses.
Wrap mentioned expression and take into account VLAN double tagging.
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Bruce Allan [Fri, 8 Feb 2019 20:50:28 +0000 (12:50 -0800)]
ice: Mark extack argument as __always_unused
Commit 87b0984ebfab ("net: Add extack argument to ndo_fdb_add()") in
net-next added an extended parameter to the .ndo_fdb_add op and changed
ice_fdb_add() accordingly. Update the function header and add the
__always_unused attribute to the new parameter to avoid -Wunused-parameter
warnings.
Signed-off-by: Bruce Allan <bruce.w.allan@intel.com> Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Peter Oskolkov [Sun, 24 Feb 2019 02:25:01 +0000 (18:25 -0800)]
net: fix double-free in bpf_lwt_xmit_reroute
dst_output() frees skb when it fails (see, for example,
ip_finish_output2), so it must not be freed in this case.
Fixes: 3bd0b15281af ("bpf: add handling of BPF_LWT_REROUTE to lwt_bpf.c") Signed-off-by: Peter Oskolkov <posk@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When acquiring the GPIO interrupt line for the switch, it is possible
to trigger lockdep splats. These are false positives, the mutex is in
a different IRQ descriptor. But fix it anyway, since it could mask
real locking issues.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Sat, 23 Feb 2019 16:43:57 +0000 (17:43 +0100)]
net: dsa: mv88e6xxx: Release lock while requesting IRQ
There is no need to hold the register lock while requesting the GPIO
interrupt. By not holding it we can also avoid a false positive
lockdep splat.
Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
&desc->request_mutex refer to two different mutex. #1 is the GPIO for
the chip interrupt. #2 is the chained interrupt between global 1 and
global 2.
Add lockdep classes to the GPIO interrupt to avoid this.
Reported-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Vakul Garg [Sat, 23 Feb 2019 08:42:37 +0000 (08:42 +0000)]
tls: Return type of non-data records retrieved using MSG_PEEK in recvmsg
The patch enables returning 'type' in msghdr for records that are
retrieved with MSG_PEEK in recvmsg. Further it prevents records peeked
from socket from getting clubbed with any other record of different
type when records are subsequently dequeued from strparser.
For each record, we now retain its type in sk_buff's control buffer
cb[]. Inside control buffer, record's full length and offset are already
stored by strparser in 'struct strp_msg'. We store record type after
'struct strp_msg' inside 'struct tls_msg'. For tls1.2, the type is
stored just after record dequeue. For tls1.3, the type is stored after
record has been decrypted.
Inside process_rx_list(), before processing a non-data record, we check
that we must be able to return back the record type to the user
application. If not, the decrypted records in tls context's rx_list is
left there without consuming any data.
Fixes: 692d7b5d1f912 ("tls: Fix recvmsg() to be able to peek across multiple records") Signed-off-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
ipv4/v6: icmp: small cleanup and update
v2:
- Add cover letter and user proper patch subject-prefix suggested-by Eric Dumazet
This patch series contains some small cleanup and update,
1) use icmp/v6_sk_exit when icmp_sk_init fails instead of open-code
2) use new percpu allocation interface for the ipv6.icmp_sk
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Sat, 23 Feb 2019 05:30:47 +0000 (13:30 +0800)]
ila: Fix uninitialised return value in ila_xlat_nl_cmd_flush
This patch fixes an uninitialised return value error in
ila_xlat_nl_cmd_flush.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Fixes: 6c4128f65857 ("rhashtable: Remove obsolete...") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Sat, 23 Feb 2019 09:22:19 +0000 (17:22 +0800)]
net: hns3: fix improper error handling for hns3_client_start
If hns3_client_start() failed in the hns3_client_init(),
register_dev() should be undo in its error handling.
Fixes: a6d818e31d08 ("net: hns3: Add vport alive state checking support") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Shiju Jose [Sat, 23 Feb 2019 09:22:18 +0000 (17:22 +0800)]
net: hns3: fix setting of the hns reset_type for rdma hw errors
Presently the hns reset_type for the roce errors is set
in the hclge_log_and_clear_rocee_ras_error function.
This function is also called to detect and clear roce errors
while enabling the rdma error interrupts. However there is no hns
reset requested for this case. This can cause issue of wrong
reset_type used with subsequent hns reset as the
reset_type set in the above case was not cleared.
This patch moves setting of hns reset_type for the roce errors from
hclge_log_and_clear_rocee_ras_error function
to hclge_handle_rocee_ras_error.
Fixes: 630ba007f475 ("net: hns3: add handling of RDMA RAS errors") Reported-by: Huazhong Tan <tanhuazhong@huawei.com> Reported-by: Xiaofei Tan <tanxiaofei@huawei.com> Signed-off-by: Shiju Jose <shiju.jose@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Sat, 23 Feb 2019 09:22:17 +0000 (17:22 +0800)]
net: hns3: fix get VF RSS issue
For revision 0x20, VF shares the same RSS config with PF.
In original codes, it always return 0 when query RSS hash
key for VF. This patch fixes it by return the hash key
got from PF.
Fixes: 374ad291762a ("net: hns3: net: hns3: Add RSS general configuration support for VF") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Sat, 23 Feb 2019 09:22:16 +0000 (17:22 +0800)]
net: hns3: enable VF VLAN filter for each VF when initializing
For revision 0x21, the switch of VF VLAN filter is per function.
It's necessary to enable VF VLAN filter for each VF when initializing.
Otherwise, VF will be able to receive broadcast packets with unknown
VLAN when PF enters promisc mode.
Fixes: 64d114f0a750 ("net: hns3: Add egress/ingress vlan filter for revision 0x21") Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Sat, 23 Feb 2019 09:22:15 +0000 (17:22 +0800)]
net: hns3: add support to config depth for tx|rx ring separately
This patch adds support to config depth for tx|rx ring separately
by ethtool command "-G".
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:14 +0000 (17:22 +0800)]
net: hns3: remove hnae3_get_bit in data path
The hnae3_get_bit uses hnae3_get_field, and hnae3_get_field
masks the data, which is unnecessary in data path.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:13 +0000 (17:22 +0800)]
net: hns3: replace hnae3_set_bit and hnae3_set_field in data path
hnae3_set_bit and hnae3_set_field masks the data before setting
the field or bit, which is unnecessary because the data is already
zero initialized.
Suggested-by: John Garry <john.garry@huawei.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:12 +0000 (17:22 +0800)]
net: hns3: add unlikely for error handling in data path
This patch adds unlikely hint for error handling in critical data
path.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:11 +0000 (17:22 +0800)]
net: hns3: remove some ops in struct hns3_nic_ops
The fill_desc ops has only one implementation, and
get_rxd_bnum has not been used, so this patch removes
them.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:10 +0000 (17:22 +0800)]
net: hns3: limit some variable scope in critical data path
This patch limits some variables' scope as much as possible in
hns3_fill_desc.
Also, only set l3_type and l4_type when necessary.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:09 +0000 (17:22 +0800)]
net: hns3: avoid mult + div op in critical data path
This patch uses shift offset to avoid doing mult and div operation.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Sat, 23 Feb 2019 09:22:08 +0000 (17:22 +0800)]
net: hns3: add xps setting support for hns3 driver
This patch adds xps setting support for hns3 driver based on
the interrupt affinity info.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: spectrum_acl: Don't take rtnl mutex for region rehash
Jiri says:
During region rehash, a new region is created with a more optimized set
of masks (ERPs). When transitioning to the new region, all the rules
from the old region are copied one-by-one to the new region. This
transition can be time consuming and currently done under RTNL lock.
In order to remove RTNL lock dependency during region rehash, introduce
multiple smaller locks guarding dedicated structures or parts of them.
That is the vast majority of this patchset. Only patch #1 is simple
cleanup and patches 12-15 are improving or introducing new selftests.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:30 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Remove RTNL lock assertions from ERP code
No longer require RTNL lock in this code. Newly introduced mutexes take
care of guarding objagg and bloom filter. There is no need to guard
gen_pool_alloc()/gen_pool_free() as they are fine to be called lockless.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
For MR ACL profile is does not make sense to do periodical rehashes, as
there is only one mask in use during the whole vregion lifetime.
Therefore periodical work is scheduled but the rehash never happens.
So allow to enable/disable rehash for the whole group, which is added
per-profile. Disable rehashing for MR profile.
Addition to the vregion list is done only in case the rehash is enable
on the particular vregion. Also, the addition is moved after delayed
work init to avoid schedule of uninitialized work
from vregion_rehash_intrvl_set(). Symmetrically, deletion from
the list is done before canceling the delayed work so it is
not scheduled by vregion_rehash_intrvl_set() again.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:25 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Refactor vregion association code
Refactor existing _vchunk_assoc/_vchunk_deassoc() functions into
_vregion_get()/_vregion_put() to make the code simpler and prepared for
vregion locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jiri Pirko [Sun, 24 Feb 2019 06:46:23 +0000 (06:46 +0000)]
mlxsw: spectrum_acl: Split TCAM group structure into two
Make the existing group structure to contain fields needed for HW region
list manipulations. Move the rest of the fields into new vgroup struct.
This makes layering cleaner as the vgroup struct is on higher level than
low-level group struct. Also, this makes it possible to introduce
fine-grained locking.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: dsa: microchip: add MIB counters support
This series of patches is to modify the KSZ9477 DSA driver to read MIB
counters periodically to avoid overflow.
The MIB counters should be read only when there is link. Otherwise it is
a waste of time as hardware never increases the counters.
Functions are added to check the port link status so that MIB counters
read call is used efficiently.
v4
- Use readx_poll_timeout
- Fix using mutex in a timer callback function problem
- use dp->slave directly instead of checking whether it is valid
- Add port_cleanup function in a separate patch
- Add a mutex so that changing device variables is safe
v3
- Use netif_carrier_ok instead of checking the phy device pointer
v2
- Create macro similar to readx_poll_timeout to use with switch
- Create ksz_port_cleanup function so that variables like on_ports and
live_ports can be updated inside it
v1
- Use readx_poll_timeout
- Do not clear MIB counters when port is enabled
- Do not advertise 1000 half-duplex mode when port is enabled
- Do not use freeze function as MIB counters may miss counts
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:51 +0000 (16:36 -0800)]
net: dsa: microchip: add port_cleanup function
Add port_cleanup function to reset some device variables when the port is
disabled. Add a mutex to make sure changing those variables is
thread-safe.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:50 +0000 (16:36 -0800)]
net: dsa: microchip: remove unnecessary include headers
Remove unnecessary header include.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:49 +0000 (16:36 -0800)]
net: dsa: microchip: get port link status
Get port link status to know whether to read MIB counters when the link
is going down.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:48 +0000 (16:36 -0800)]
net: dsa: microchip: add MIB counter reading support
Add background MIB counter reading support.
Port MIB counters should only be read when there is link. Otherwise it is
a waste of time as hardware never increases those counters. There are
exceptions as some switches keep track of dropped counts no matter what.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tristram Ha [Sat, 23 Feb 2019 00:36:47 +0000 (16:36 -0800)]
net: dsa: microchip: prepare PHY for proper advertisement
Prepare PHY for proper advertisement as sometimes the PHY in the switch
has its own problems even though it may share the PHY id from regular PHY
but the fixes in the PHY driver do not apply.
Signed-off-by: Tristram Ha <Tristram.Ha@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: phy: marvell10g: Add 2.5GBaseT support
This series adds the missing bits necessary to fully support 2.5GBaseT
in the Marvell Alaska PHYs.
The main points for that support are :
- Making use of the .get_features call, recently introduced by Heiner
and Andrew, that allows having a fully populated list of supported
modes, including 2500BaseT.
- Configuring the MII to 2500BaseX when establishing a link at 2.5G
- Adding a small quirk to take into account the fact that some PHYs in
the family won't report the correct supported abilities
The rest of the series consists of small cosmetic improvements such as
using the correct helper to set a linkmode bit and adding macros for the
PHY ids.
We also add support for the 88E2110 PHY, which doesn't require the
quirk, and support for 2500BaseT in the PPv2 driver, in order to have a
fully working setup on the MacchiatoBin board.
Changes since V1 : Fixed formatting issue in patch 01, rebased.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell10g: add support for the 88x2110 PHY
This patch adds support for the 88x2110 PHY, which is similar to the
already supported 88x3310 PHY without the SFP interface.
It supports 10/100/1000BASET along with 2.5GBASET, 5GBASET and 10GBASET,
with the same interface modes that are used by the 3310.
This PHY don't have the same issue as the 88x3310 regarding 2.5/5G
abilities, and correctly follows the 802.3bz standard to list the
supported abilities.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Suggested-by: Antoine Tenart <antoine.tenart@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
The PPv2 controller is able to support 2.5G speeds, allowing to use
2.5GBASET in conjunction with PHYs that use 2500BASEX as their MII
interface when using this mode.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
As per 802.3bz, if bit 14 of (1.11) "PMA Extended Abilities" indicates
whether or not we should read register (1.21) "2.52/5G PMA Extended
Abilities", which contains information on the support of 2.5GBASET and
5GBASET.
After testing on several variants of PHYS of this family, it appears
that bit 14 in (1.11) isn't always set when it should be.
PHYs 88X3310 (on MacchiatoBin) and 88E2010 do support 2.5G and 5GBASET,
but don't have 1.11.14 set. Their register 1.21 is filled with the
correct values, indicating 2.5G and 5G support.
PHYs 88E2110 do have their 1.11.14 bit set, as it should.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell10g: Use a #define for 88X3310 family id
The PHY ID corresponding to the 88X3310 is also used for other PHYs in
the same family, such as the 88E2010. Use a #define for the PHY id, that
ignores the last nibble.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell10g: Use 2500BASEX when using 2.5GBASET
The Marvell Alaska family of PHYs supports 2.5GBaseT and 5GBaseT modes,
as defined in the 802.3bz specification.
Upon establishing a 2.5GBASET link, the PHY will reconfigure it's MII
interface to 2500BASEX.
At 5G, the PHY will reconfigure it's interface to 5GBASE-R, but this
mode isn't supported by any MAC for now.
This was tested with :
- The 88X3310, which is on the MacchiatoBin
- The 88E2010, an Alaska PHY that has no fiber interfaces, and is
limited to 5G maximum speed.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell10g: Use linkmode_set_bit helper instead of __set_bit
Cosmetic patch making use of helpers dedicated to linkmodes handling.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: marvell10g: Use get_features to get the PHY abilities
The Alaska family of 10G PHYs has more abilities than the ones listed in
PHY_10GBIT_FULL_FEATURES, the exact list depending on the model.
Make use of the newly introduced .get_features call to build this list,
using genphy_c45_pma_read_abilities to build the list of supported
linkmodes, and adding autoneg ability based on what's reported by the AN
MMD.
.config_init is still used to validate the interface_mode.
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 21:59:38 +0000 (22:59 +0100)]
net: phy: check PMAPMD link status only in genphy_c45_read_link
The current code reports a link as up if all devices (except a few
blacklisted ones) report the link as up. This breaks Aquantia AQCS109
for lower speeds because on this PHY the PCS link status reflects a
10G link only. For Marvell there's a similar issue, therefore PHYXS
device isn't checked.
There may be more PHYs where depending on the mode the link status
of only selected devices is relevant.
For now it seems to be sufficient to check the link status of the
PMAPMD device only. Leave the loop in the code to be prepared in
case we have to add functionality to check more than one device,
depending on the mode.
Successfully tested on a board with an AQCS109.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Fri, 22 Feb 2019 20:31:33 +0000 (12:31 -0800)]
nfp: Remove switchdev.h inclusion
This is no longer necessary after a5084bb71fa4 ("nfp: Implement
ndo_get_port_parent_id()")
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 18:25:59 +0000 (19:25 +0100)]
net: phy: improve definition of __ETHTOOL_LINK_MODE_MASK_NBITS
The way to define __ETHTOOL_LINK_MODE_MASK_NBITS seems to be overly
complicated, go with a standard approach instead.
Whilst we're at it, move the comment to the right place.
v2:
- rebased
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: protodown support for macvlan and vxlan
This patch series adds dev_change_proto_down_generic, a generic
implementation of ndo_change_proto_down, which sets the netdev carrier
state according to the new proto_down value.
This handler adds the ability to set protodown on macvlan and vxlan
interfaces in a generic way for use by control protocols like VRRPD.
Patch (1) introduces the handler in net/code/dev.c. Patch (2) and (3) add
support for change_proto_down in macvlan and vxlan drivers, respectively,
using the new function.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:38 +0000 (18:06 +0000)]
vxlan: add ndo_change_proto_down support
Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:37 +0000 (18:06 +0000)]
macvlan: add ndo_change_proto_down support
Add ndo_change_proto_down support through dev_change_proto_down_generic
for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andy Roulin [Fri, 22 Feb 2019 18:06:36 +0000 (18:06 +0000)]
net: dev: add generic protodown handler
Introduce dev_change_proto_down_generic, a generic ndo_change_proto_down
implementation, which sets the netdev carrier state according to proto_down.
This adds the ability to set protodown on vxlan and macvlan devices in a
generic way for use by control protocols like VRRPD.
Signed-off-by: Andy Roulin <aroulin@cumulusnetworks.com> Acked-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
Add tests for unlocked flower classifier implementation
Implement tests for tdc testsuite to verify concurrent rules update with
rtnl-unlocked flower classifier implementation. The goal of these tests
is to verify general flower classifier correctness by updating filters
on same classifier instance in parallel and to verify its atomicity by
concurrently updating filters in same handle range. All three filter
update operations (add, replace, delete) are tested.
Existing script tdc_batch.py is re-used for batch file generation. It is
extended with several optional CLI arguments that are needed for
concurrency tests. Thin wrapper tdc_multibatch.py is implemented on top
of tdc_batch.py to simplify its usage when generating multiple batch
files for several test configurations.
Parallelism in tests is implemented by running multiple instances of tc
in batch mode with xargs tool. Xargs is chosen for its ease of use and
because it is available by default on most modern Linux distributions.
====================
Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:47 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel replace/delete
Implement test that runs 5 instances of tc replace filter in parallel with
5 instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:46 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel add/delete
Implement test that runs 5 instances of tc add filter in parallel with 5
instances of tc del filter from same tp instance. Each instance uses its
own filter handle and key range.
Extend tdc_multibatch.py with additional options required to implement the
test: common prefix for all generated batch files, first value of filter
handle range, MAC address prefix modifier. These are necessary to allow
creating batch files with unique keys and handle ranges with multiple
invocation of tdc_multibatch.py helper script.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:45 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify concurrent delete
Implement test that verifies concurrent deletion of rules by executing 10
tc instances that delete flower filters in same handle range. In this case
only one tc instance succeeds in deleting a filter with particular handle.
To mitigate expected failures of all other instances, run tc with 'force'
option to continue processing batch file in case of errors and expect xargs
to return code '123' that indicates that invocation of command(s) exited
with error in range 1-125.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:41 +0000 (16:00 +0200)]
selftests: concurrency: add test to verify parallel rules insertion
Implement test that verifies parallel rules insertion by adding 1 million
flower filters with 10 concurrent tc instances. Put it to standalone
'concurrency' category.
Implement tdc_multibatch.py helper script that is used to generate multiple
batch files for concurrent tc execution. Extend config with new 'BATCH_DIR'
variable to specify temporary output directory that is used to store batch
files generated by tdc_multibatch.py.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vlad Buslov [Fri, 22 Feb 2019 14:00:40 +0000 (16:00 +0200)]
selftests: tdc_batch.py: add options needed for concurrency tests
Extend tdc_batch.py with several optional CLI arguments that are used for
implementation of concurrency tests in following patches in this set:
- Add optional argument to specify range of filter handles used in batch
file [fitler_handle, filter_handle+number). This is needed for testing
filter deletion where it is necessary to know exact handles of configured
filters.
- Add optional argument to specify filter operation type (possible values
are ['add', 'del', 'replace']) instead of hardcoded "add" value. This
allows generation of batches for filter addition, deletion and
replacement.
- Add optional argument to allow user to change mac address prefix that is
used for all filters in batch. This is necessary to allow generating
multiple batches with unique flower classifier keys.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: Skip GSO length estimation if transport header is not set
qdisc_pkt_len_init expects transport_header to be set for GSO packets.
Patch [1] skips transport_header validation for GSO packets that don't
have network_header set at the moment of calling virtio_net_hdr_to_skb,
and allows them to pass into the stack. After patch [2] no placeholder
value is assigned to transport_header if dissection fails, so this patch
adds a check to the place where the value of transport_header is used.
Signed-off-by: Maxim Mikityanskiy <maximmi@mellanox.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Russell King [Fri, 22 Feb 2019 11:31:41 +0000 (11:31 +0000)]
net: phylink: update mac_config() documentation
A detail for mac_config() had been missed in the documentation for the
method - it is expected that the method will update the MAC to the
settings, rather than completely reprogram the MAC on each call.
Update the documentation for this method for this detail.
Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 07:23:04 +0000 (08:23 +0100)]
net: phy: let genphy_c45_read_abilities also check aneg capability
When using genphy_c45_read_abilities() as get_features callback we
also have to set the autoneg capability in phydev->supported.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 24 Feb 2019 17:47:07 +0000 (09:47 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull KVM fixes from Paolo Bonzini:
"Bug fixes"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: MMU: record maximum physical address width in kvm_mmu_extended_role
kvm: x86: Return LA57 feature based on hardware capability
x86/kvm/mmu: fix switch between root and guest MMUs
s390: vsie: Use effective CRYCBD.31 to check CRYCBD validity
Pull networking fixes from David Miller:
"Hopefully the last pull request for this release. Fingers crossed:
1) Only refcount ESP stats on full sockets, from Martin Willi.
2) Missing barriers in AF_UNIX, from Al Viro.
3) RCU protection fixes in ipv6 route code, from Paolo Abeni.
4) Avoid false positives in untrusted GSO validation, from Willem de
Bruijn.
5) Forwarded mesh packets in mac80211 need more tailroom allocated,
from Felix Fietkau.
6) Use operstate consistently for linkup in team driver, from George
Wilkie.
7) ThunderX bug fixes from Vadim Lomovtsev. Mostly races between VF
and PF code paths.
8) Purge ipv6 exceptions during netdevice removal, from Paolo Abeni.
9) nfp eBPF code gen fixes from Jiong Wang.
10) bnxt_en firmware timeout fix from Michael Chan.
11) Use after free in udp/udpv6 error handlers, from Paolo Abeni.
12) Fix a race in x25_bind triggerable by syzbot, from Eric Dumazet"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (65 commits)
net: phy: realtek: Dummy IRQ calls for RTL8366RB
tcp: repaired skbs must init their tso_segs
net/x25: fix a race in x25_bind()
net: dsa: Remove documentation for port_fdb_prepare
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
selftests: fib_tests: sleep after changing carrier. again.
net: set static variable an initial value in atl2_probe()
net: phy: marvell10g: Fix Multi-G advertisement to only advertise 10G
bpf, doc: add bpf list as secondary entry to maintainers file
udp: fix possible user after free in error handler
udpv6: fix possible user after free in error handler
fou6: fix proto error handler argument type
udpv6: add the required annotation to mib type
mdio_bus: Fix use-after-free on device_register fails
net: Set rtm_table to RT_TABLE_COMPAT for ipv6 for tables > 255
bnxt_en: Wait longer for the firmware message response to complete.
bnxt_en: Fix typo in firmware message timeout logic.
nfp: bpf: fix ALU32 high bits clearance bug
nfp: bpf: fix code-gen bug on BPF_ALU | BPF_XOR | BPF_K
Documentation: networking: switchdev: Update port parent ID section
...
Roopa Prabhu [Sun, 24 Feb 2019 06:25:12 +0000 (22:25 -0800)]
trace: events: neigh_update: print new state in string format
Also, extend neigh_state_str to include neigh dummy states
noarp and permanent
Fixes: 9c03b282badb ("trace: events: add a few neigh tracepoints") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Walleij [Sun, 24 Feb 2019 00:11:15 +0000 (01:11 +0100)]
net: phy: realtek: Dummy IRQ calls for RTL8366RB
This fixes a regression introduced by
commit 0d2e778e38e0ddffab4bb2b0e9ed2ad5165c4bf7
"net: phy: replace PHY_HAS_INTERRUPT with a check for
config_intr and ack_interrupt".
This assumes that a PHY cannot trigger interrupt unless
it has .config_intr() or .ack_interrupt() implemented.
A later patch makes the code assume both need to be
implemented for interrupts to be present.
But this PHY (which is inside a DSA) will happily
fire interrupts without either callback.
Implement dummy callbacks for .config_intr() and
.ack_interrupt() in the phy header to fix this.
Tested on the RTL8366RB on D-Link DIR-685.
Fixes: 0d2e778e38e0 ("net: phy: replace PHY_HAS_INTERRUPT with a check for config_intr and ack_interrupt") Cc: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Sat, 23 Feb 2019 21:24:59 +0000 (13:24 -0800)]
net/x25: fix a race in x25_bind()
syzbot was able to trigger another soft lockup [1]
I first thought it was the O(N^2) issue I mentioned in my
prior fix (f657d22ee1f "net/x25: do not hold the cpu
too long in x25_new_lci()"), but I eventually found
that x25_bind() was not checking SOCK_ZAPPED state under
socket lock protection.
This means that multiple threads can end up calling
x25_insert_socket() for the same socket, and corrupt x25_list
Fixes: 90c27297a9bf ("X.25 remove bkl in bind") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: andrew hendry <andrew.hendry@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Hangbin Liu [Fri, 22 Feb 2019 13:22:32 +0000 (21:22 +0800)]
Revert "bridge: do not add port to router list when receives query with source 0.0.0.0"
This reverts commit 5a2de63fd1a5 ("bridge: do not add port to router list
when receives query with source 0.0.0.0") and commit 0fe5119e267f ("net:
bridge: remove ipv6 zero address check in mcast queries")
The reason is RFC 4541 is not a standard but suggestive. Currently we
will elect 0.0.0.0 as Querier if there is no ip address configured on
bridge. If we do not add the port which recives query with source
0.0.0.0 to router list, the IGMP reports will not be about to forward
to Querier, IGMP data will also not be able to forward to dest.
As Nikolay suggested, revert this change first and add a boolopt api
to disable none-zero election in future if needed.
Reported-by: Linus Lüssing <linus.luessing@c0d3.blue> Reported-by: Sebastian Gottschall <s.gottschall@newmedia-net.de> Fixes: 5a2de63fd1a5 ("bridge: do not add port to router list when receives query with source 0.0.0.0") Fixes: 0fe5119e267f ("net: bridge: remove ipv6 zero address check in mcast queries") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Acked-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
selftests: fib_tests: sleep after changing carrier. again.
Just like commit e2ba732a1681 ("selftests: fib_tests: sleep after
changing carrier"), wait one second to allow linkwatch to propagate the
carrier change to the stack.
There are two sets of carrier tests. The first slept after the carrier
was set to off, and when the second set ran, it was likely that the
linkwatch would be able to run again without much delay, reducing the
likelihood of a race. However, if you run 'fib_tests.sh -t carrier' on a
loop, you will quickly notice the failures.
Sleeping on the second set of tests make the failures go away.
Cc: David Ahern <dsahern@gmail.com> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Reviewed-by: David Ahern <dsahern@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 22:50:49 +0000 (23:50 +0100)]
net: phy: don't change modes we don't care about in genphy_c45_read_lpa
Because 1000BaseT isn't covered by Clause 45, the 1000BaseT flags in
phydev->lp_advertising may have been set based on vendor registers
already. genphy_c45_read_lpa() would clear these flags as of today.
Therefore switch to mii_lpa_mod_linkmode_lpa_t.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrew Lunn [Fri, 22 Feb 2019 22:49:54 +0000 (23:49 +0100)]
net: phy: aquantia: add support for auto-negotiation configuration
Make use of the generic c45 code, plus code specific to the Aquantia
phy for 1000BaseT negotiation.
Signed-off-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Heiner Kallweit [Fri, 22 Feb 2019 22:48:14 +0000 (23:48 +0100)]
net: phy: aquantia: remove false 5G and 10G speed ability for AQCS109
AQCS109 belongs to a family of PHY's where certain members don't
support 5G or 10G. However for all members of the family the chip
reports 10G and 5G capability. Therefore remove the not supported
modes for AQCS109.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
mlxsw: Add support for new port types and speeds for Spectrum-2
Shalom says:
This patchset adds support for new port types and speeds for Spectrum-2.
Patch #1 + #2 removes an unsupported PTYS field and a duplicate link
mode entry.
Patch #3 queries port's connector type from firmware instead of deriving
it from port admin state.
Patch #4 renames functions which relate to port type-speed to be
Spectrum-1 specific.
Patch #5 defines port type-speed operations and applies it for
Spectrum-1.
Patch #6 + #7 are small renaming and cosmetic changes.
Patch #8 adds new port type-speed fields for PTYS register. These new
fields extend the existing ones in order to support more types and
speeds.
Patch #9 adds Spectrum-2 support for port type-speed operations.
Patch #10 adds Spectrum-2 new port types and speeds.
For Spectrum-2, the user must configure all the types per speed if he /
she wants a specific speed to be advertised. For example, if the user
wants to advertise 100Gbps 4-lanes speed, the following ethtool bits
should be advertised:
Supported ethtool bits for 100Gbps 4-lanes:
0x1000000000 100000baseKR4 Full
0x2000000000 100000baseSR4 Full
0x4000000000 100000baseCR4 Full
0x8000000000 100000baseLR4_ER4 Full