Alex Vesker [Sun, 10 Nov 2019 13:39:36 +0000 (15:39 +0200)]
net/mlx5: DR, Limit STE hash table enlarge based on bytemask
When an ste hash table has too many collision we enlarge it
to a bigger hash table (rehash). Rehashing collision improvement
depends on the bytemask value. The more 1 bits we have in bytemask
means better spreading in the table.
Without this fix tables can grow in size without providing any
improvement which can lead to memory depletion and failures.
This patch will limit table rehash to reduce memory and improve
the performance.
Vlad Buslov [Thu, 7 Nov 2019 11:37:57 +0000 (13:37 +0200)]
net/mlx5e: Reorder mirrer action parsing to check for encap first
Mirred action parsing code in parse_tc_fdb_actions() first checks if
out_dev has same parent id, and only verifies that there is a pending encap
action that was parsed before. Recent change in vxlan module made function
netdev_port_same_parent_id() to return true when called for mlx5 eswitch
representor and vxlan device created explicitly on mlx5 representor
device (vxlan devices created with "external" flag without explicitly
specifying parent interface are not affected). With call to
netdev_port_same_parent_id() returning true, incorrect code path is chosen
and encap rules fail to offload because vxlan dev is not a valid eswitch
forwarding dev. Dmesg log of error:
[ 1784.389797] devices ens1f0_0 vxlan1 not on same switch HW, can't offload forwarding
In order to fix the issue, rearrange conditional in parse_tc_fdb_actions()
to check for pending encap action before checking if out_dev has the same
parent id.
Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Vlad Buslov <vladbu@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Eli Cohen [Thu, 7 Nov 2019 07:07:34 +0000 (09:07 +0200)]
net/mlx5e: Fix ingress rate configuration for representors
Current code uses the old method of prio encoding in
flow_cls_common_offload. Fix to follow the changes introduced in
commit ef01adae0e43 ("net: sched: use major priority number as hardware priority").
Fixes: fcb64c0f5640 ("net/mlx5: E-Switch, add ingress rate support") Signed-off-by: Eli Cohen <eli@mellanox.com> Reviewed-by: Roi Dayan <roid@mellanox.com> Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Commit 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization
to constant") accidentally changed a check from -ENOTSUPP to -ENOSYS,
causing failures if reset controller support is not enabled. E.g. on
r7s72100/rskrza1:
sh-eth e8203000.ethernet: MDIO init failed: -524
sh-eth: probe of e8203000.ethernet failed with error -524
Seen on r8a7740/armadillo, r7s72100/rskrza1, and r7s9210/rza2mevb.
Fixes: 1d4639567d97 ("mdio_bus: Fix PTR_ERR applied after initialization to constant") Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be> Cc: YueHaibing <yuehaibing@huawei.com> Cc: David S. Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Wed, 20 Nov 2019 02:07:15 +0000 (10:07 +0800)]
net: hns3: fix a wrong reset interrupt status mask
According to hardware user manual, bits5~7 in register
HCLGE_MISC_VECTOR_INT_STS means reset interrupts status,
but HCLGE_RESET_INT_M is defined as bits0~2 now. So it
will make hclge_reset_err_handle() read the wrong reset
interrupt status.
This patch fixes this wrong bit mask.
Fixes: 2336f19d7892 ("net: hns3: check reset interrupt status when reset fails") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Wed, 20 Nov 2019 01:25:13 +0000 (09:25 +0800)]
net: fec: fix clock count mis-match
pm_runtime_put_autosuspend in probe will call runtime suspend to
disable clks automatically if CONFIG_PM is defined. (If CONFIG_PM
is not defined, its implementation will be empty, then runtime
suspend will not be called.)
Therefore, we can call pm_runtime_get_sync to runtime resume it
first to enable clks, which matches the runtime suspend. (Only when
CONFIG_PM is defined, otherwise pm_runtime_get_sync will also be
empty, then runtime resume will not be called.)
Then it is fine to disable clks without causing clock count mis-match.
Fixes: c43eab3eddb4 ("net: fec: add missed clk_disable_unprepare in remove") Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Davide Caratti [Tue, 19 Nov 2019 22:47:33 +0000 (23:47 +0100)]
net/sched: act_pedit: fix WARN() in the traffic path
when configuring act_pedit rules, the number of keys is validated only on
addition of a new entry. This is not sufficient to avoid hitting a WARN()
in the traffic path: for example, it is possible to replace a valid entry
with a new one having 0 extended keys, thus causing splats in dmesg like:
Russell King [Tue, 19 Nov 2019 17:28:14 +0000 (17:28 +0000)]
net: phylink: fix link mode modification in PHY mode
Modifying the link settings via phylink_ethtool_ksettings_set() and
phylink_ethtool_set_pauseparam() didn't always work as intended for
PHY based setups, as calling phylink_mac_config() would result in the
unresolved configuration being committed to the MAC, rather than the
configuration with the speed and duplex setting.
This would work fine if the update caused the link to renegotiate,
but if no settings have changed, phylib won't trigger a renegotiation
cycle, and the MAC will be left incorrectly configured.
Avoid calling phylink_mac_config() unless we are using an inband mode
in phylink_ethtool_ksettings_set(), and use phy_set_asym_pause() as
introduced in 4.20 to set the PHY settings in
phylink_ethtool_set_pauseparam().
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
Corinna Vinschen [Tue, 19 Nov 2019 09:09:39 +0000 (10:09 +0100)]
r8169: disable TSO on a single version of RTL8168c to fix performance
During performance testing, I found that one of my r8169 NICs suffered
a major performance loss, a 8168c model.
Running netperf's TCP_STREAM test didn't return the expected
throughput of > 900 Mb/s, but rather only about 22 Mb/s. Strange
enough, running the TCP_MAERTS and UDP_STREAM tests all returned with
throughput > 900 Mb/s, as did TCP_STREAM with the other r8169 NICs I can
test (either one of 8169s, 8168e, 8168f).
I added my 8168c version, RTL_GIGA_MAC_VER_22, to the code
special-casing the 8168evl as per the patch below. This fixed the
performance problem for me.
Fixes: 93681cd7d94f ("r8169: enable HW csum and TSO") Signed-off-by: Corinna Vinschen <vinschen@redhat.com> Reviewed-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
I prefer to use my personal email address for kernel related work.
Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Rain River <rain.1986.08.12@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ivan Khoronzhuk [Tue, 19 Nov 2019 00:23:12 +0000 (02:23 +0200)]
taprio: don't reject same mqprio settings
The taprio qdisc allows to set mqprio setting but only once. In case
if mqprio settings are provided next time the error is returned as
it's not allowed to change traffic class mapping in-flignt and that
is normal. But if configuration is absolutely the same - no need to
return error. It allows to provide same command couple times,
changing only base time for instance, or changing only scheds maps,
but leaving mqprio setting w/o modification. It more corresponds the
message: "Changing the traffic mapping of a running schedule is not
supported", so reject mqprio if it's really changed.
Also corrected TC_BITMASK + 1 for consistency, as proposed.
Fixes: a3d43c0d56f1 ("taprio: Add support adding an admin schedule") Reviewed-by: Vladimir Oltean <olteanv@gmail.com> Tested-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Ivan Khoronzhuk <ivan.khoronzhuk@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Adi Suresh [Tue, 19 Nov 2019 16:02:47 +0000 (08:02 -0800)]
gve: fix dma sync bug where not all pages synced
The previous commit had a bug where the last page in the memory range
could not be synced. This change fixes the behavior so that all the
required pages are synced.
Fixes: 9cfeeb576d49 ("gve: Fixes DMA synchronization") Signed-off-by: Adi Suresh <adisuresh@google.com> Reviewed-by: Catherine Sullivan <csully@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Behún [Mon, 18 Nov 2019 18:15:05 +0000 (19:15 +0100)]
mdio_bus: fix mdio_register_device when RESET_CONTROLLER is disabled
When CONFIG_RESET_CONTROLLER is disabled, the
devm_reset_control_get_exclusive function returns -ENOTSUPP. This is not
handled in subsequent check and then the mdio device fails to probe.
When CONFIG_RESET_CONTROLLER is enabled, its code checks in OF for reset
device, and since it is not present, returns -ENOENT. -ENOENT is handled.
Add -ENOTSUPP also.
This happened to me when upgrading kernel on Turris Omnia. You either
have to enable CONFIG_RESET_CONTROLLER or use this patch.
Signed-off-by: Marek Behún <marek.behun@nic.cz> Fixes: 71dd6c0dff51b ("net: phy: add support for reset-controller") Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
net/ipv4: fix sysctl max for fib_multipath_hash_policy
Commit eec4844fae7c ("proc/sysctl: add shared variables for range
check") did:
- .extra2 = &two,
+ .extra2 = SYSCTL_ONE,
here, which doesn't seem to be intentional, given the changelog.
This patch restores it to the previous, as the value of 2 still makes
sense (used in fib_multipath_hash()).
Fixes: eec4844fae7c ("proc/sysctl: add shared variables for range check") Cc: Matteo Croce <mcroce@redhat.com> Signed-off-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Acked-by: Matteo Croce <mcroce@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tariq Toukan [Mon, 18 Nov 2019 09:41:04 +0000 (11:41 +0200)]
net/mlx4_en: Fix wrong limitation for number of TX rings
XDP_TX rings should not be limited by max_num_tx_rings_p_up.
To make sure total number of TX rings never exceed MAX_TX_RINGS,
add similar check in mlx4_en_alloc_tx_queue_per_tc(), where
a new value is assigned for num_up.
Fixes: 7e1dc5e926d5 ("net/mlx4_en: Limit the number of TX rings") Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Mon, 18 Nov 2019 09:39:34 +0000 (17:39 +0800)]
net: sched: ensure opts_len <= IP_TUNNEL_OPTS_MAX in act_tunnel_key
info->options_len is 'u8' type, and when opts_len with a value >
IP_TUNNEL_OPTS_MAX, 'info->options_len = opts_len' will cast int
to u8 and set a wrong value to info->options_len.
Kernel crashed in my test when doing:
# opts="0102:80:00800022"
# for i in {1..99}; do opts="$opts,0102:80:00800022"; done
# ip link add name geneve0 type geneve dstport 0 external
# tc qdisc add dev eth0 ingress
# tc filter add dev eth0 protocol ip parent ffff: \
flower indev eth0 ip_proto udp action tunnel_key \
set src_ip 10.0.99.192 dst_ip 10.0.99.193 \
dst_port 6081 id 11 geneve_opts $opts \
action mirred egress redirect dev geneve0
So we should do the similar check as cls_flower does, return error
when opts_len > IP_TUNNEL_OPTS_MAX in tunnel_key_copy_opts().
Fixes: 0ed5269f9e41 ("net/sched: add tunnel option support to act_tunnel_key") Signed-off-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Petr Machata [Mon, 18 Nov 2019 07:18:42 +0000 (09:18 +0200)]
mlxsw: spectrum_router: Fix determining underlay for a GRE tunnel
The helper mlxsw_sp_ipip_dev_ul_tb_id() determines the underlay VRF of a
GRE tunnel. For a tunnel without a bound device, it uses the same VRF that
the tunnel is in. However in Linux, a GRE tunnel without a bound device
uses the main VRF as the underlay. Fix the function accordingly.
mlxsw further assumed that moving a tunnel to a different VRF could cause
conflict in local tunnel endpoint address, which cannot be offloaded.
However, the only way that an underlay could be changed by moving the
tunnel device itself is if the tunnel device does not have a bound device.
But in that case the underlay is always the main VRF, so there is no
opportunity to introduce a conflict by moving such device. Thus this check
constitutes a dead code, and can be removed, which do.
Fixes: 6ddb7426a7d4 ("mlxsw: spectrum_router: Introduce loopback RIFs") Signed-off-by: Petr Machata <petrm@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Aditya Pakki [Sun, 17 Nov 2019 20:28:36 +0000 (14:28 -0600)]
net: atm: Reduce the severity of logging in unlink_clip_vcc
In case of errors in unlink_clip_vcc, the logging level is set to
pr_crit but failures in clip_setentry are handled by pr_err().
The patch changes the severity consistent across invocations.
Signed-off-by: Aditya Pakki <pakki001@umn.edu> Signed-off-by: David S. Miller <davem@davemloft.net>
Luigi Rizzo [Fri, 15 Nov 2019 20:12:25 +0000 (12:12 -0800)]
net/mlx4_en: fix mlx4 ethtool -N insertion
ethtool expects ETHTOOL_GRXCLSRLALL to set ethtool_rxnfc->data with the
total number of entries in the rx classifier table. Surprisingly, mlx4
is missing this part (in principle ethtool could still move forward and
try the insert).
Tested: compiled and run command:
phh13:~# ethtool -N eth1 flow-type udp4 queue 4
Added rule with ID 255
Signed-off-by: Luigi Rizzo <lrizzo@google.com> Reviewed-by: Tariq Toukan <tariqt@mellanox.com>
Linus Torvalds [Sun, 17 Nov 2019 02:14:32 +0000 (18:14 -0800)]
Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fix from Herbert Xu:
"This reverts a number of changes to the khwrng thread which feeds the
kernel random number pool from hwrng drivers. They were trying to fix
issues with suspend-and-resume but ended up causing regressions"
* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
Revert "hwrng: core - Freeze khwrng thread during suspend"
Herbert Xu [Sun, 17 Nov 2019 00:48:17 +0000 (08:48 +0800)]
Revert "hwrng: core - Freeze khwrng thread during suspend"
This reverts commit 03a3bb7ae631 ("hwrng: core - Freeze khwrng
thread during suspend"), ff296293b353 ("random: Support freezable
kthreads in add_hwgenerator_randomness()") and 59b569480dc8 ("random:
Use wait_event_freezable() in add_hwgenerator_randomness()").
These patches introduced regressions and we need more time to
get them ready for mainline.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
1) Fix memory leak in xfrm_state code, from Steffen Klassert.
2) Fix races between devlink reload operations and device
setup/cleanup, from Jiri Pirko.
3) Null deref in NFC code, from Stephan Gerhold.
4) Refcount fixes in SMC, from Ursula Braun.
5) Memory leak in slcan open error paths, from Jouni Hogander.
6) Fix ETS bandwidth validation in hns3, from Yonglong Liu.
7) Info leak on short USB request answers in ax88172a driver, from
Oliver Neukum.
8) Release mem region properly in ep93xx_eth, from Chuhong Yuan.
9) PTP config timestamp flags validation, from Richard Cochran.
10) Dangling pointers after SKB data realloc in seg6, from Andrea Mayer.
11) Missing free_netdev() in gemini driver, from Chuhong Yuan.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (56 commits)
ipmr: Fix skb headroom in ipmr_get_route().
net: hns3: cleanup of stray struct hns3_link_mode_mapping
net/smc: fix fastopen for non-blocking connect()
rds: ib: update WR sizes when bringing up connection
net: gemini: add missed free_netdev
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
seg6: fix skb transport_header after decap_and_validate()
seg6: fix srh pointer in get_srh()
net: stmmac: Use the correct style for SPDX License Identifier
octeontx2-af: Use the correct style for SPDX License Identifier
ptp: Extend the test program to check the external time stamp flags.
mlx5: Reject requests to enable time stamping on both edges.
igb: Reject requests that fail to enable time stamping on both edges.
dp83640: Reject requests to enable time stamping on both edges.
mv88e6xxx: Reject requests to enable time stamping on both edges.
ptp: Introduce strict checking of external time stamp options.
renesas: reject unsupported external timestamp flags
mlx5: reject unsupported external timestamp flags
igb: reject unsupported external timestamp flags
dp83640: reject unsupported external timestamp flags
...
Guillaume Nault [Fri, 15 Nov 2019 17:29:52 +0000 (18:29 +0100)]
ipmr: Fix skb headroom in ipmr_get_route().
In route.c, inet_rtm_getroute_build_skb() creates an skb with no
headroom. This skb is then used by inet_rtm_getroute() which may pass
it to rt_fill_info() and, from there, to ipmr_get_route(). The later
might try to reuse this skb by cloning it and prepending an IPv4
header. But since the original skb has no headroom, skb_push() triggers
skb_under_panic():
Actually the original skb used to have enough headroom, but the
reserve_skb() call was lost with the introduction of
inet_rtm_getroute_build_skb() by commit 404eb77ea766 ("ipv4: support
sport, dport and ip_proto in RTM_GETROUTE").
We could reserve some headroom again in inet_rtm_getroute_build_skb(),
but this function shouldn't be responsible for handling the special
case of ipmr_get_route(). Let's handle that directly in
ipmr_get_route() by calling skb_realloc_headroom() instead of
skb_clone().
Fixes: 404eb77ea766 ("ipv4: support sport, dport and ip_proto in RTM_GETROUTE") Signed-off-by: Guillaume Nault <gnault@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ursula Braun [Fri, 15 Nov 2019 11:39:30 +0000 (12:39 +0100)]
net/smc: fix fastopen for non-blocking connect()
FASTOPEN does not work with SMC-sockets. Since SMC allows fallback to
TCP native during connection start, the FASTOPEN setsockopts trigger
this fallback, if the SMC-socket is still in state SMC_INIT.
But if a FASTOPEN setsockopt is called after a non-blocking connect(),
this is broken, and fallback does not make sense.
This change complements
commit cd2063604ea6 ("net/smc: avoid fallback in case of non-blocking connect")
and fixes the syzbot reported problem "WARNING in smc_unhash_sk".
Reported-by: syzbot+8488cc4cf1c9e09b8b86@syzkaller.appspotmail.com Fixes: e1bbdd570474 ("net/smc: reduce sock_put() for fallback sockets") Signed-off-by: Ursula Braun <ubraun@linux.ibm.com> Signed-off-by: Karsten Graul <kgraul@linux.ibm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dag Moxnes [Fri, 15 Nov 2019 08:56:01 +0000 (09:56 +0100)]
rds: ib: update WR sizes when bringing up connection
Currently WR sizes are updated from rds_ib_sysctl_max_send_wr and
rds_ib_sysctl_max_recv_wr when a connection is shut down. As a result,
a connection being down while rds_ib_sysctl_max_send_wr or
rds_ib_sysctl_max_recv_wr are updated, will not update the sizes when
it comes back up.
Move resizing of WRs to rds_ib_setup_qp so that connections will be setup
with the most current WR sizes.
Signed-off-by: Dag Moxnes <dag.moxnes@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Fri, 15 Nov 2019 06:24:54 +0000 (14:24 +0800)]
net: gemini: add missed free_netdev
This driver forgets to free allocated netdev in remove like
what is done in probe failure.
Add the free to fix it.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Sat, 16 Nov 2019 16:08:25 +0000 (18:08 +0200)]
net: dsa: tag_8021q: Fix dsa_8021q_restore_pvid for an absent pvid
This sequence of operations:
ip link set dev br0 type bridge vlan_filtering 1
bridge vlan del dev swp2 vid 1
ip link set dev br0 type bridge vlan_filtering 1
ip link set dev br0 type bridge vlan_filtering 0
apparently fails with the message:
[ 31.305716] sja1105 spi0.1: Reset switch and programmed static config. Reason: VLAN filtering
[ 31.322161] sja1105 spi0.1: Couldn't determine PVID attributes (pvid 0)
[ 31.328939] sja1105 spi0.1: Failed to setup VLAN tagging for port 1: -2
[ 31.335599] ------------[ cut here ]------------
[ 31.340215] WARNING: CPU: 1 PID: 194 at net/switchdev/switchdev.c:157 switchdev_port_attr_set_now+0x9c/0xa4
[ 31.349981] br0: Commit of attribute (id=6) failed.
[ 31.354890] Modules linked in:
[ 31.357942] CPU: 1 PID: 194 Comm: ip Not tainted 5.4.0-rc6-01792-gf4f632e07665-dirty #2062
[ 31.366167] Hardware name: Freescale LS1021A
[ 31.370437] [<c03144dc>] (unwind_backtrace) from [<c030e184>] (show_stack+0x10/0x14)
[ 31.378153] [<c030e184>] (show_stack) from [<c11d1c1c>] (dump_stack+0xe0/0x10c)
[ 31.385437] [<c11d1c1c>] (dump_stack) from [<c034c730>] (__warn+0xf4/0x10c)
[ 31.392373] [<c034c730>] (__warn) from [<c034c7bc>] (warn_slowpath_fmt+0x74/0xb8)
[ 31.399827] [<c034c7bc>] (warn_slowpath_fmt) from [<c11ca204>] (switchdev_port_attr_set_now+0x9c/0xa4)
[ 31.409097] [<c11ca204>] (switchdev_port_attr_set_now) from [<c117036c>] (__br_vlan_filter_toggle+0x6c/0x118)
[ 31.418971] [<c117036c>] (__br_vlan_filter_toggle) from [<c115d010>] (br_changelink+0xf8/0x518)
[ 31.427637] [<c115d010>] (br_changelink) from [<c0f8e9ec>] (__rtnl_newlink+0x3f4/0x76c)
[ 31.435613] [<c0f8e9ec>] (__rtnl_newlink) from [<c0f8eda8>] (rtnl_newlink+0x44/0x60)
[ 31.443329] [<c0f8eda8>] (rtnl_newlink) from [<c0f89f20>] (rtnetlink_rcv_msg+0x2cc/0x51c)
[ 31.451477] [<c0f89f20>] (rtnetlink_rcv_msg) from [<c1008df8>] (netlink_rcv_skb+0xb8/0x110)
[ 31.459796] [<c1008df8>] (netlink_rcv_skb) from [<c1008648>] (netlink_unicast+0x17c/0x1f8)
[ 31.468026] [<c1008648>] (netlink_unicast) from [<c1008980>] (netlink_sendmsg+0x2bc/0x3b4)
[ 31.476261] [<c1008980>] (netlink_sendmsg) from [<c0f43858>] (___sys_sendmsg+0x230/0x250)
[ 31.484408] [<c0f43858>] (___sys_sendmsg) from [<c0f44c84>] (__sys_sendmsg+0x50/0x8c)
[ 31.492209] [<c0f44c84>] (__sys_sendmsg) from [<c0301000>] (ret_fast_syscall+0x0/0x28)
[ 31.500090] Exception stack(0xedf47fa8 to 0xedf47ff0)
[ 31.505122] 7fa0: 00000002b6f2e06000000003beabd6a40000000000000000
[ 31.513265] 7fc0: 00000002b6f2e0605d6e321300000128000000000000000100000006000619c4
[ 31.521405] 7fe0: 00086078beabd6580005edbcb6e7ce68
Since VID 0 is an invalid pvid from the bridge's point of view, let's
add this check in dsa_8021q_restore_pvid to avoid restoring a pvid that
doesn't really exist.
Fixes: 5f33183b7fdf ("net: dsa: tag_8021q: Restore bridge VLANs when enabling vlan_filtering") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Andrea Mayer [Sat, 16 Nov 2019 15:05:53 +0000 (16:05 +0100)]
seg6: fix skb transport_header after decap_and_validate()
in the receive path (more precisely in ip6_rcv_core()) the
skb->transport_header is set to skb->network_header + sizeof(*hdr). As a
consequence, after routing operations, destination input expects to find
skb->transport_header correctly set to the next protocol (or extension
header) that follows the network protocol. However, decap behaviors (DX*,
DT*) remove the outer IPv6 and SRH extension and do not set again the
skb->transport_header pointer correctly. For this reason, the patch sets
the skb->transport_header to the skb->network_header + sizeof(hdr) in each
DX* and DT* behavior.
Signed-off-by: Andrea Mayer <andrea.mayer@uniroma2.it> Signed-off-by: David S. Miller <davem@davemloft.net>
Nishad Kamdar [Sat, 16 Nov 2019 09:40:59 +0000 (15:10 +0530)]
net: stmmac: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in
header files related to STMicroelectronics based Multi-Gigabit
Ethernet driver. For C header files Documentation/process/license-rules.rst
mandates C-like comments (opposed to C source files where
C++ style should be used).
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Nishad Kamdar [Sat, 16 Nov 2019 09:20:45 +0000 (14:50 +0530)]
octeontx2-af: Use the correct style for SPDX License Identifier
This patch corrects the SPDX License Identifier style in
header files related to Marvell OcteonTX2 network devices.
It uses an expilict block comment for the SPDX License
Identifier.
Changes made by using a script provided by Joe Perches here:
https://lkml.org/lkml/2019/2/7/46.
Suggested-by: Joe Perches <joe@perches.com> Signed-off-by: Nishad Kamdar <nishadkamdar@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sat, 16 Nov 2019 16:20:43 +0000 (08:20 -0800)]
Merge branch 'akpm' (patches from Andrew)
Merge misc fixes from Andrew Morton:
"11 fixes"
MM fixes and one xz decompressor fix.
* emailed patches from Andrew Morton <akpm@linux-foundation.org>:
mm/debug.c: PageAnon() is true for PageKsm() pages
mm/debug.c: __dump_page() prints an extra line
mm/page_io.c: do not free shared swap slots
mm/memory_hotplug: fix try_offline_node()
mm,thp: recheck each page before collapsing file THP
mm: slub: really fix slab walking for init_on_free
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
mm: fix trying to reclaim unevictable lru page when calling madvise_pageout
mm: mempolicy: fix the wrong return value and potential pages leak of mbind
Ralph Campbell [Sat, 16 Nov 2019 01:35:07 +0000 (17:35 -0800)]
mm/debug.c: PageAnon() is true for PageKsm() pages
PageAnon() and PageKsm() use the low two bits of the page->mapping
pointer to indicate the page type. PageAnon() only checks the LSB while
PageKsm() checks the least significant 2 bits are equal to 3.
Therefore, PageAnon() is true for KSM pages. __dump_page() incorrectly
will never print "ksm" because it checks PageAnon() first. Fix this by
checking PageKsm() first.
Link: http://lkml.kernel.org/r/20191113000651.20677-1-rcampbell@nvidia.com Fixes: 1c6fb1d89e73 ("mm: print more information about mapping in __dump_page") Signed-off-by: Ralph Campbell <rcampbell@nvidia.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Jerome Glisse <jglisse@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It looks like the intent was to use pr_cont() for printing "flags:" but
pr_cont() usage is discouraged so fix this by extending the format to
include the flags into a single line:
Vinayak Menon [Sat, 16 Nov 2019 01:35:00 +0000 (17:35 -0800)]
mm/page_io.c: do not free shared swap slots
The following race is observed due to which a processes faulting on a
swap entry, finds the page neither in swapcache nor swap. This causes
zram to give a zero filled page that gets mapped to the process,
resulting in a user space crash later.
Consider parent and child processes Pa and Pb sharing the same swap slot
with swap_count 2. Swap is on zram with SWP_SYNCHRONOUS_IO set.
Virtual address 'VA' of Pa and Pb points to the shared swap entry.
Pa Pb
fault on VA fault on VA
do_swap_page do_swap_page
lookup_swap_cache fails lookup_swap_cache fails
Pb scheduled out
swapin_readahead (deletes zram entry)
swap_free (makes swap_count 1)
Pb scheduled in
swap_readpage (swap_count == 1)
Takes SWP_SYNCHRONOUS_IO path
zram enrty absent
zram gives a zero filled page
Fix this by making sure that swap slot is freed only when swap count
drops down to one.
Link: http://lkml.kernel.org/r/1571743294-14285-1-git-send-email-vinmenon@codeaurora.org Fixes: aa8d22a11da9 ("mm: swap: SWP_SYNCHRONOUS_IO: skip swapcache only if swapped page has no other reference") Signed-off-by: Vinayak Menon <vinmenon@codeaurora.org> Suggested-by: Minchan Kim <minchan@google.com> Acked-by: Minchan Kim <minchan@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Hugh Dickins <hughd@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
try_offline_node() is pretty much broken right now:
- The node span is updated when onlining memory, not when adding it. We
ignore memory that was mever onlined. Bad.
- We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
trigger a kernel panic. Bad for memory that is offline but also bad
for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
first PFN of a section might contain garbage.
- Sections belonging to mixed nodes are not properly considered.
As memory blocks might belong to multiple nodes, we would have to walk
all pageblocks (or at least subsections) within present sections.
However, we don't have a way to identify whether a memmap that is not
online was initialized (relevant for ZONE_DEVICE). This makes things
more complicated.
Luckily, we can piggy pack on the node span and the nid stored in memory
blocks. Currently, the node span is grown when calling
move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
removing memory, before calling try_offline_node(). Sysfs links are
created via link_mem_sections(), e.g., during boot or when adding
memory.
If the node still spans memory or if any memory block belongs to the
nid, we don't set the node offline. As memory blocks that span multiple
nodes cannot get offlined, the nid stored in memory blocks is reliable
enough (for such online memory blocks, the node still spans the memory).
Introduce for_each_memory_block() to efficiently walk all memory blocks.
Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
when removing ZONE_DEVICE memory to fix similar issues (access of
garbage memmaps) - until we have a reliable way to identify whether
these memmaps were properly initialized. This implies later, that once
a node had ZONE_DEVICE memory, we won't be able to set a node offline -
which should be acceptable.
Since commit f1dd2cd13c4b ("mm, memory_hotplug: do not associate
hotadded memory to zones until online") memory that is added is not
assoziated with a zone/node (memmap not initialized). The introducing
commit 60a5a19e7419 ("memory-hotplug: remove sysfs file of node")
already missed that we could have multiple nodes for a section and that
the zone/node span is updated when onlining pages, not when adding them.
I tested this by hotplugging two DIMMs to a memory-less and cpu-less
NUMA node. The node is properly onlined when adding the DIMMs. When
removing the DIMMs, the node is properly offlined.
Song Liu [Sat, 16 Nov 2019 01:34:53 +0000 (17:34 -0800)]
mm,thp: recheck each page before collapsing file THP
In collapse_file(), for !is_shmem case, current check cannot guarantee
the locked page is up-to-date. Specifically, xas_unlock_irq() should
not be called before lock_page() and get_page(); and it is necessary to
recheck PageUptodate() after locking the page.
With this bug and CONFIG_READ_ONLY_THP_FOR_FS=y, madvise(HUGE)'ed .text
may contain corrupted data. This is because khugepaged mistakenly
collapses some not up-to-date sub pages into a huge page, and assumes
the huge page is up-to-date. This will NOT corrupt data in the disk,
because the page is read-only and never written back. Fix this by
properly checking PageUptodate() after locking the page. This check
replaces "VM_BUG_ON_PAGE(!PageUptodate(page), page);".
Also, move PageDirty() check after locking the page. Current khugepaged
should not try to collapse dirty file THP, because it is limited to
read-only .text. The only case we hit a dirty page here is when the
page hasn't been written since write. Bail out and retry when this
happens.
syzbot reported bug on previous version of this patch.
Link: http://lkml.kernel.org/r/20191106060930.2571389-2-songliubraving@fb.com Fixes: 99cb0dbd47a1 ("mm,thp: add read-only THP support for (non-shmem) FS") Signed-off-by: Song Liu <songliubraving@fb.com> Reported-by: syzbot+efb9e48b9fbdc49bb34a@syzkaller.appspotmail.com Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Cc: William Kucharski <william.kucharski@oracle.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Laura Abbott [Sat, 16 Nov 2019 01:34:50 +0000 (17:34 -0800)]
mm: slub: really fix slab walking for init_on_free
Commit 1b7e816fc80e ("mm: slub: Fix slab walking for init_on_free")
fixed one problem with the slab walking but missed a key detail: When
walking the list, the head and tail pointers need to be updated since we
end up reversing the list as a result. Without doing this, bulk free is
broken.
One way this is exposed is a NULL pointer with slub_debug=F:
=============================================================================
BUG skbuff_head_cache (Tainted: G T): Object already free
-----------------------------------------------------------------------------
Roman Gushchin [Sat, 16 Nov 2019 01:34:46 +0000 (17:34 -0800)]
mm: hugetlb: switch to css_tryget() in hugetlb_cgroup_charge_cgroup()
An exiting task might belong to an offline cgroup. In this case an
attempt to grab a cgroup reference from the task can end up with an
infinite loop in hugetlb_cgroup_charge_cgroup(), because neither the
cgroup will become online, neither the task will be migrated to a live
cgroup.
Fix this by switching over to css_tryget(). As css_tryget_online()
can't guarantee that the cgroup won't go offline, in most cases the
check doesn't make sense. In this particular case users of
hugetlb_cgroup_charge_cgroup() are not affected by this change.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Link: http://lkml.kernel.org/r/20191106225131.3543616-2-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Roman Gushchin [Sat, 16 Nov 2019 01:34:43 +0000 (17:34 -0800)]
mm: memcg: switch to css_tryget() in get_mem_cgroup_from_mm()
We've encountered a rcu stall in get_mem_cgroup_from_mm():
rcu: INFO: rcu_sched self-detected stall on CPU
rcu: 33-....: (21000 ticks this GP) idle=6c6/1/0x4000000000000002 softirq=35441/35441 fqs=5017
(t=21031 jiffies g=324821 q=95837) NMI backtrace for cpu 33
<...>
RIP: 0010:get_mem_cgroup_from_mm+0x2f/0x90
<...>
__memcg_kmem_charge+0x55/0x140
__alloc_pages_nodemask+0x267/0x320
pipe_write+0x1ad/0x400
new_sync_write+0x127/0x1c0
__kernel_write+0x4f/0xf0
dump_emit+0x91/0xc0
writenote+0xa0/0xc0
elf_core_dump+0x11af/0x1430
do_coredump+0xc65/0xee0
get_signal+0x132/0x7c0
do_signal+0x36/0x640
exit_to_usermode_loop+0x61/0xd0
do_syscall_64+0xd4/0x100
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The problem is caused by an exiting task which is associated with an
offline memcg. We're iterating over and over in the do {} while
(!css_tryget_online()) loop, but obviously the memcg won't become online
and the exiting task won't be migrated to a live memcg.
Let's fix it by switching from css_tryget_online() to css_tryget().
As css_tryget_online() cannot guarantee that the memcg won't go offline,
the check is usually useless, except some rare cases when for example it
determines if something should be presented to a user.
A similar problem is described by commit 18fa84a2db0e ("cgroup: Use
css_tryget() instead of css_tryget_online() in task_get_css()").
Johannes:
: The bug aside, it doesn't matter whether the cgroup is online for the
: callers. It used to matter when offlining needed to evacuate all charges
: from the memcg, and so needed to prevent new ones from showing up, but we
: don't care now.
Link: http://lkml.kernel.org/r/20191106225131.3543616-1-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Tejun Heo <tj@kernel.org> Reviewed-by: Shakeel Butt <shakeeb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Michal Koutn <mkoutny@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Lasse Collin [Sat, 16 Nov 2019 01:34:39 +0000 (17:34 -0800)]
lib/xz: fix XZ_DYNALLOC to avoid useless memory reallocations
s->dict.allocated was initialized to 0 but never set after a successful
allocation, thus the code always thought that the dictionary buffer has
to be reallocated.
Link: http://lkml.kernel.org/r/20191104185107.3b6330df@tukaani.org Signed-off-by: Lasse Collin <lasse.collin@tukaani.org> Reported-by: Yu Sun <yusun2@cisco.com> Acked-by: Daniel Walker <danielwa@cisco.com> Cc: "Yixia Si (yisi)" <yisi@cisco.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
madvise_pageout() accesses the specified range of the vma and isolates
them, then runs shrink_page_list() to reclaim its memory. But it also
isolates the unevictable pages to reclaim. Hence, we can catch the
cases in shrink_page_list().
The root cause is that we scan the page tables instead of specific LRU
list. and so we need to filter out the unevictable lru pages from our
end.
Link: http://lkml.kernel.org/r/1572616245-18946-1-git-send-email-zhongjiang@huawei.com Fixes: 1a4e58cce84e ("mm: introduce MADV_PAGEOUT") Signed-off-by: zhong jiang <zhongjiang@huawei.com> Suggested-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Minchan Kim <minchan@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Yang Shi [Sat, 16 Nov 2019 01:34:33 +0000 (17:34 -0800)]
mm: mempolicy: fix the wrong return value and potential pages leak of mbind
Commit d883544515aa ("mm: mempolicy: make the behavior consistent when
MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") fixed the return value
of mbind() for a couple of corner cases. But, it altered the errno for
some other cases, for example, mbind() should return -EFAULT when part
or all of the memory range specified by nodemask and maxnode points
outside your accessible address space, or there was an unmapped hole in
the specified memory range specified by addr and len.
Fix this by preserving the errno returned by queue_pages_range(). And,
the pagelist may be not empty even though queue_pages_range() returns
error, put the pages back to LRU since mbind_range() is not called to
really apply the policy so those pages should not be migrated, this is
also the old behavior before the problematic commit.
Link: http://lkml.kernel.org/r/1572454731-3925-1-git-send-email-yang.shi@linux.alibaba.com Fixes: d883544515aa ("mm: mempolicy: make the behavior consistent when MPOL_MF_MOVE* and MPOL_MF_STRICT were specified") Signed-off-by: Yang Shi <yang.shi@linux.alibaba.com> Reported-by: Li Xinhai <lixinhai.lxh@gmail.com> Reviewed-by: Li Xinhai <lixinhai.lxh@gmail.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Michal Hocko <mhocko@suse.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: <stable@vger.kernel.org> [4.19 and 5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Just got one of these for debugging some unrelated issues, and noticed
that Lenovo seems to have gone back to using RMI4 over smbus with
Synaptics touchpads on some of their new systems, particularly this one.
So, let's enable RMI mode for the X1 Extreme 2nd Generation.
Linus Torvalds [Fri, 15 Nov 2019 21:02:34 +0000 (13:02 -0800)]
Merge tag 'for-linus-20191115' of git://git.kernel.dk/linux-block
Pull block fixes from Jens Axboe:
"A few fixes that should make it into this release. This contains:
- io_uring:
- The timeout command assumes sequence == 0 means that we want
one completion, but this kind of overloading is unfortunate as
it prevents users from doing a pure time based wait. Since
this operation was introduced in this cycle, let's correct it
now, while we can. (me)
- One-liner to fix an issue with dependent links and fixed
buffer reads. The actual IO completed fine, but the link got
severed since we stored the wrong expected value. (me)
- Add TIMEOUT to list of opcodes that don't need a file. (Pavel)
- rsxx missing workqueue destry calls. Old bug. (Chuhong)
- Fix blk-iocost active list check (Jiufei)
- Fix impossible-to-hit overflow merge condition, that still hit some
folks very rarely (Junichi)
- Fix bfq hang issue from 5.3. This didn't get marked for stable, but
will go into stable post this merge (Paolo)"
* tag 'for-linus-20191115' of git://git.kernel.dk/linux-block:
rsxx: add missed destroy_workqueue calls in remove
iocost: check active_list of all the ancestors in iocg_activate()
block, bfq: deschedule empty bfq_queues not referred by any process
io_uring: ensure registered buffer import returns the IO length
io_uring: Fix getting file for timeout
block: check bi_size overflow before merge
io_uring: make timeout sequence == 0 mean no sequence
====================
ptp: Validate the ancillary ioctl flags more carefully.
The flags passed to the ioctls for periodic output signals and
time stamping of external signals were never checked, and thus formed
a useless ABI inadvertently. More recently, a version 2 of the ioctls
was introduced in order make the flags meaningful. This series
tightens up the checks on the new ioctl flags.
- Patch 1 ensures at least one edge flag is set for the new ioctl.
- Patches 2-7 are Jacob's recent checks, picking up the tags.
- Patch 8 introduces a "strict" flag for passing to the drivers when the
new ioctl is used.
- Patches 9-12 implement the "strict" checking in the drivers.
- Patch 13 extends the test program to exercise combinations of flags.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 14 Nov 2019 18:45:07 +0000 (10:45 -0800)]
ptp: Extend the test program to check the external time stamp flags.
Because each driver and hardware has different capabilities, the test
cannot provide a simple pass/fail result, but it can at least show what
combinations of flags are supported.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 14 Nov 2019 18:45:06 +0000 (10:45 -0800)]
mlx5: Reject requests to enable time stamping on both edges.
This driver enables rising edge or falling edge, but not both, and so
this patch validates that the request contains only one of the two
edges.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 14 Nov 2019 18:45:02 +0000 (10:45 -0800)]
ptp: Introduce strict checking of external time stamp options.
User space may request time stamps on rising edges, falling edges, or
both. However, the particular mode may or may not be supported in the
hardware or in the driver. This patch adds a "strict" flag that tells
drivers to ensure that the requested mode will be honored.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the renesas PTP support to explicitly reject any future flags that
get added to the external timestamp request ioctl.
In order to maintain currently functioning code, this patch accepts all
three current flags. This is because the PTP_RISING_EDGE and
PTP_FALLING_EDGE flags have unclear semantics and each driver seems to
have interpreted them slightly differently.
Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Thu, 14 Nov 2019 18:45:00 +0000 (10:45 -0800)]
mlx5: reject unsupported external timestamp flags
Fix the mlx5 core PTP support to explicitly reject any future flags that
get added to the external timestamp request ioctl.
In order to maintain currently functioning code, this patch accepts all
three current flags. This is because the PTP_RISING_EDGE and
PTP_FALLING_EDGE flags have unclear semantics and each driver seems to
have interpreted them slightly differently.
[ RC: I'm not 100% sure what this driver does, but if I'm not wrong it
follows the dp83640:
flags Meaning
---------------------------------------------------- --------------------------
PTP_ENABLE_FEATURE Time stamp rising edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge
PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge
]
Cc: Feras Daoud <ferasda@mellanox.com> Cc: Eugenia Emantayev <eugenia@mellanox.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Thu, 14 Nov 2019 18:44:59 +0000 (10:44 -0800)]
igb: reject unsupported external timestamp flags
Fix the igb PTP support to explicitly reject any future flags that
get added to the external timestamp request ioctl.
In order to maintain currently functioning code, this patch accepts all
three current flags. This is because the PTP_RISING_EDGE and
PTP_FALLING_EDGE flags have unclear semantics and each driver seems to
have interpreted them slightly differently.
This HW always time stamps both edges:
flags Meaning
---------------------------------------------------- --------------------------
PTP_ENABLE_FEATURE Time stamp both edges
PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp both edges
PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp both edges
PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp both edges
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the dp83640 PTP support to explicitly reject any future flags that
get added to the external timestamp request ioctl.
In order to maintain currently functioning code, this patch accepts all
three current flags. This is because the PTP_RISING_EDGE and
PTP_FALLING_EDGE flags have unclear semantics and each driver seems to
have interpreted them slightly differently.
For the record, the semantics of this driver are:
flags Meaning
---------------------------------------------------- --------------------------
PTP_ENABLE_FEATURE Time stamp rising edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge
PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp falling edge
Cc: Stefan Sørensen <stefan.sorensen@spectralink.com> Cc: Richard Cochran <richardcochran@gmail.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Fix the mv88e6xxx PTP support to explicitly reject any future flags that
get added to the external timestamp request ioctl.
In order to maintain currently functioning code, this patch accepts all
three current flags. This is because the PTP_RISING_EDGE and
PTP_FALLING_EDGE flags have unclear semantics and each driver seems to
have interpreted them slightly differently.
For the record, the semantics of this driver are:
flags Meaning
---------------------------------------------------- --------------------------
PTP_ENABLE_FEATURE Time stamp falling edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE Time stamp rising edge
PTP_ENABLE_FEATURE|PTP_FALLING_EDGE Time stamp falling edge
PTP_ENABLE_FEATURE|PTP_RISING_EDGE|PTP_FALLING_EDGE Time stamp rising edge
Cc: Brandon Streiff <brandon.streiff@ni.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jacob Keller [Thu, 14 Nov 2019 18:44:56 +0000 (10:44 -0800)]
net: reject PTP periodic output requests with unsupported flags
Commit 823eb2a3c4c7 ("PTP: add support for one-shot output") introduced
a new flag for the PTP periodic output request ioctl. This flag is not
currently supported by any driver.
Fix all drivers which implement the periodic output request ioctl to
explicitly reject any request with flags they do not understand. This
ensures that the driver does not accidentally misinterpret the
PTP_PEROUT_ONE_SHOT flag, or any new flag introduced in the future.
This is important for forward compatibility: if a new flag is
introduced, the driver should reject requests to enable the flag until
the driver has actually been modified to support the flag in question.
Cc: Felipe Balbi <felipe.balbi@linux.intel.com> Cc: David S. Miller <davem@davemloft.net> Cc: Christopher Hall <christopher.s.hall@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Signed-off-by: Richard Cochran <richardcochran@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Richard Cochran [Thu, 14 Nov 2019 18:44:55 +0000 (10:44 -0800)]
ptp: Validate requests to enable time stamping of external signals.
Commit 415606588c61 ("PTP: introduce new versions of IOCTLs")
introduced a new external time stamp ioctl that validates the flags.
This patch extends the validation to ensure that at least one rising
or falling edge flag is set when enabling external time stamps.
Signed-off-by: Richard Cochran <richardcochran@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Chuhong Yuan [Thu, 14 Nov 2019 15:43:24 +0000 (23:43 +0800)]
net: ep93xx_eth: fix mismatch of request_mem_region in remove
The driver calls release_resource in remove to match request_mem_region
in probe, which is incorrect.
Fix it by using the right one, release_mem_region.
Signed-off-by: Chuhong Yuan <hslester96@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Oliver Neukum [Thu, 14 Nov 2019 10:16:01 +0000 (11:16 +0100)]
ax88172a: fix information leak on short answers
If a malicious device gives a short MAC it can elicit up to
5 bytes of leaked memory out of the driver. We need to check for
ETH_ALEN instead.
Reported-by: syzbot+a8d4acdad35e6bbca308@syzkaller.appspotmail.com Signed-off-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ido Schimmel [Thu, 14 Nov 2019 09:50:57 +0000 (11:50 +0200)]
selftests: mlxsw: Adjust test to recent changes
mlxsw does not support VXLAN devices with a physical device attached and
vetoes such configurations upon enslavement to an offloaded bridge.
Commit 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level")
changed the VXLAN device to be an upper of the physical device which
causes mlxsw to veto the creation of the VXLAN device with "Unknown
upper device type".
This is OK as this configuration is not supported, but it prevents us
from testing bad flows involving the enslavement of VXLAN devices with a
physical device to a bridge, regardless if the physical device is an
mlxsw netdev or not.
Adjust the test to use a dummy device as a physical device instead of a
mlxsw netdev.
Fixes: 0ce1822c2a08 ("vxlan: add adjacent link to limit depth level") Signed-off-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 15 Nov 2019 18:30:24 +0000 (10:30 -0800)]
Merge tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client
Pull ceph fixes from Ilya Dryomov:
"Two fixes for the buffered reads and O_DIRECT writes serialization
patch that went into -rc1 and a fixup for a bogus warning on older gcc
versions"
* tag 'ceph-for-5.4-rc8' of git://github.com/ceph/ceph-client:
rbd: silence bogus uninitialized warning in rbd_object_map_update_finish()
ceph: increment/decrement dio counter on async requests
ceph: take the inode lock before acquiring cap refs
David Howells [Thu, 14 Nov 2019 18:41:03 +0000 (18:41 +0000)]
afs: Fix race in commit bulk status fetch
When a lookup is done, the afs filesystem will perform a bulk status-fetch
operation on the requested vnode (file) plus the next 49 other vnodes from
the directory list (in AFS, directory contents are downloaded as blobs and
parsed locally). When the results are received, it will speculatively
populate the inode cache from the extra data.
However, if the lookup races with another lookup on the same directory, but
for a different file - one that's in the 49 extra fetches, then if the bulk
status-fetch operation finishes first, it will try and update the inode
from the other lookup.
If this other inode is still in the throes of being created, however, this
will cause an assertion failure in afs_apply_status():
BUG_ON(test_bit(AFS_VNODE_UNSET, &vnode->flags));
on or about fs/afs/inode.c:175 because it expects data to be there already
that it can compare to.
Fix this by skipping the update if the inode is being created as the
creator will presumably set up the inode with the same information.
Fixes: 39db9815da48 ("afs: Fix application of the results of a inline bulk status fetch") Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Fri, 15 Nov 2019 17:14:23 +0000 (09:14 -0800)]
Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fix from Will Deacon:
"One trivial fix for -rc8/final that ensures that the script used to
detect RELR relocation support in the toolchain works correctly when
$CC contains quotes. Although it fails safely (by failing to detect
the support when it exists), it would be nice to have this fixed in
5.4 given that it was only introduced in the last merge window.
Summary:
- Handle CC variables containing quotes in tools-support-relr.sh
script"
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
scripts/tools-support-relr.sh: un-quote variables
Linus Torvalds [Fri, 15 Nov 2019 17:10:13 +0000 (09:10 -0800)]
Merge tag 'mips_fixes_5.4_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux
Pull MIPS fixes from Paul Burton:
"A fix and simplification for SGI IP27 exception handlers, and a small
MAINTAINERS update for Broadcom MIPS systems"
* tag 'mips_fixes_5.4_4' of git://git.kernel.org/pub/scm/linux/kernel/git/mips/linux:
MAINTAINERS: Remove Kevin as maintainer of BMIPS generic platforms
MIPS: SGI-IP27: fix exception handler replication
Linus Torvalds [Fri, 15 Nov 2019 17:05:39 +0000 (09:05 -0800)]
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
Pull more KVM fixes from Paolo Bonzini:
- fixes for CONFIG_KVM_COMPAT=n
- two updates to the IFU erratum
- selftests build fix
- brown paper bag fix
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: Add a comment describing the /dev/kvm no_compat handling
KVM: x86/mmu: Take slots_lock when using kvm_mmu_zap_all_fast()
KVM: Forbid /dev/kvm being opened by a compat task when CONFIG_KVM_COMPAT=n
KVM: X86: Reset the three MSR list number variables to 0 in kvm_init_msr_list()
selftests: kvm: fix build with glibc >= 2.30
kvm: x86: disable shattered huge page recovery for PREEMPT_RT.
Linus Torvalds [Fri, 15 Nov 2019 16:47:34 +0000 (08:47 -0800)]
Merge tag 'drm-fixes-2019-11-15' of git://anongit.freedesktop.org/drm/drm
Pull drm fixes from Dave Airlie:
"Here is this weeks non-intel hw vuln fixes pull. Three drivers, all
small fixes.
i915:
- MOCS table fixes for EHL and TGL
- Update Display's rawclock on resume
- GVT's dmabuf reference drop fix
amdgpu:
- Fix a potential crash in firmware parsing
sun4i:
- One fix to the dotclock dividers range for sun4i"
* tag 'drm-fixes-2019-11-15' of git://anongit.freedesktop.org/drm/drm:
drm/amdgpu: fix null pointer deref in firmware header printing
drm/i915/tgl: MOCS table update
Revert "drm/i915/ehl: Update MOCS table for EHL"
drm/sun4i: tcon: Set min division of TCON0_DCLK to 1.
drm/i915: update rawclk also on resume
drm/i915/gvt: fix dropping obj reference twice
Linus Torvalds [Fri, 15 Nov 2019 16:44:08 +0000 (08:44 -0800)]
Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
Pull misc vfs fixes from Al Viro:
"Assorted fixes all over the place; some of that is -stable fodder,
some regressions from the last window"
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
ecryptfs_lookup_interpose(): lower_dentry->d_parent is not stable either
ecryptfs_lookup_interpose(): lower_dentry->d_inode is not stable
ecryptfs: fix unlink and rmdir in face of underlying fs modifications
audit_get_nd(): don't unlock parent too early
exportfs_decode_fh(): negative pinned may become positive without the parent locked
cgroup: don't put ERR_PTR() into fc->root
autofs: fix a leak in autofs_expire_indirect()
aio: Fix io_pgetevents() struct __compat_aio_sigset layout
fs/namespace.c: fix use-after-free of mount in mnt_warn_timestamp_expiry()
Yonglong Liu [Thu, 14 Nov 2019 02:32:41 +0000 (10:32 +0800)]
net: hns3: fix ETS bandwidth validation bug
Some device only support 4 TCs, but the driver check the total
bandwidth of 8 TCs, so may cause wrong configurations write to
the hw.
This patch uses hdev->tc_max to instead HNAE3_MAX_TC to fix it.
Fixes: e432abfb99e5 ("net: hns3: add common validation in hclge_dcb") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Thu, 14 Nov 2019 02:32:40 +0000 (10:32 +0800)]
net: hns3: reallocate SSU' buffer size when pfc_en changes
When a TC's PFC is disabled or enabled, the RX private buffer for
this TC need to be changed too, otherwise this may cause packet
dropped problem.
This patch fixes it by calling hclge_buffer_alloc to reallocate
buffer when pfc_en changes.
Fixes: cacde272dd00 ("net: hns3: Add hclge_dcb module for the support of DCB feature") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Guangbin Huang [Thu, 14 Nov 2019 02:32:39 +0000 (10:32 +0800)]
net: hns3: add compatible handling for MAC VLAN switch parameter configuration
Previously, hns3 driver just directly send specific setting bit
and mask bits of MAC VLAN switch parameter to the firmware, it
can not be compatible with the old firmware, because the old one
ignores mask bits and covers all bits with new setting bits.
So when running with old firmware, the communication between PF
and VF will fail after resetting or configuring spoof check, since
they will do the MAC VLAN switch parameter configuration.
This patch fixes this problem by reading switch parameter firstly,
then just modifies the corresponding bit and sends it to firmware.
Fixes: dd2956eab104 ("net: hns3: not allow SSU loopback while execute ethtool -t dev") Signed-off-by: Guangbin Huang <huangguangbin2@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Ulrich Hecht [Thu, 14 Nov 2019 01:49:49 +0000 (02:49 +0100)]
ravb: implement MTU change while device is up
Pre-allocates buffers sufficient for the maximum supported MTU (2026) in
order to eliminate the possibility of resource exhaustion when changing the
MTU while the device is up.
Signed-off-by: Ulrich Hecht <uli+renesas@fpond.eu> Reviewed-by: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Matt Bennett [Wed, 13 Nov 2019 23:20:03 +0000 (12:20 +1300)]
tipc: add back tipc prefix to log messages
The tipc prefix for log messages generated by tipc was
removed in commit 07f6c4bc048a ("tipc: convert tipc reference
table to use generic rhashtable").
This is still a useful prefix so add it back.
Signed-off-by: Matt Bennett <matt.bennett@alliedtelesis.co.nz> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>