Two bits in RxConfig are controlled by the following dev->feature's:
- NETIF_F_RXALL
- NETIF_F_HW_VLAN_CTAG_RX (since RTL8125)
We have to take care that RxConfig gets fully configured in
rtl_hw_start() after e.g. resume from hibernation. Therefore:
- Factor out setting the feature-controlled RxConfig bits to a new
function rtl_set_rx_config_features() that is called from
rtl8169_set_features() and rtl_hw_start().
- Don't deal with RX_VLAN_8125 in rtl_init_rxcfg(), it will be set
by rtl_set_rx_config_features().
- Don't handle NETIF_F_RXALL in rtl_set_rx_mode().
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 27 Apr 2020 18:40:25 +0000 (11:40 -0700)]
Merge branch 'net-bridge-mrp'
Horatiu Vultur says:
====================
net: bridge: mrp: Add support for Media Redundancy Protocol(MRP)
Media Redundancy Protocol is a data network protocol standardized by
International Electrotechnical Commission as IEC 62439-2. It allows rings of
Ethernet switches to overcome any single failure with recovery time faster than
STP. It is primarily used in Industrial Ethernet applications.
Based on the previous RFC[1][2][3][4][5], and patches[6][7][8], the MRP state
machine and all the timers were moved to userspace, except for the timers used
to generate MRP Test frames. In this way the userspace doesn't know and should
not know if the HW or the kernel will generate the MRP Test frames. The
following changes were added to the bridge to support the MRP:
- the existing netlink interface was extended with MRP support,
- allow to detect when a MRP frame was received on a MRP ring port
- allow MRP instance to forward/terminate MRP frames
- generate MRP Test frames in case the HW doesn't have support for this
To be able to offload MRP support to HW, the switchdev API was extend.
With these changes the userspace doesn't do the following because already the
kernel/HW will do:
- doesn't need to forward/terminate MRP frames
- doesn't need to generate MRP Test frames
- doesn't need to detect when the ring is open/closed.
The userspace application that is using the new netlink can be found here[9].
The current implementation both in kernel and userspace supports only 2 roles:
MRM - this one is responsible to send MRP_Test and MRP_Topo frames on both
ring ports. It needs to process MRP_Test to know if the ring is open or
closed. This operation is desired to be offloaded to the HW because it
requires to generate and process up to 4000 frames per second. Whenever it
detects that the ring is open it sends MRP_Topo frames to notify all MRC about
changes in the topology. MRM needs also to process MRP_LinkChange frames,
these frames are generated by the MRC. When the ring is open then the state
of both ports is to forward frames and when the ring is closed then the
secondary port is blocked.
MRC - this one is responsible to forward MRP frames between the ring ports.
In case one of the ring ports gets a link down or up, then MRC will generate
a MRP_LinkChange frames. This node should also process MRP_Topo frames and to
clear its FDB when it receives this frame.
The user interacts using the client (called 'mrp'), the client talks to the
deamon (called 'mrp_server'), which talks with the kernel using netlink. The
kernel will try to offload the requests to the HW via switchdev API.
If this will be accepted then in the future the netlink interface can be
expended with multiple attributes which are required by different roles of the
MRP. Like Media Redundancy Automanager(MRA), Media Interconnect Manager(MIM) and
Media Interconnect Client(MIC).
It is not possible to have the MRP and STP running at the same time on the
bridge, therefore add check when enabling the STP to check if MRP is already
enabled. In that case return error.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To integrate MRP into the bridge, the bridge needs to do the following:
- detect if the MRP frame was received on MRP ring port in that case it would be
processed otherwise just forward it as usual.
- enable parsing of MRP
- before whenever the bridge was set up, it would set all the ports in
forwarding state. Add an extra check to not set ports in forwarding state if
the port is an MRP ring port. The reason of this change is that if the MRP
instance initially sets the port in blocked state by setting the bridge up it
would overwrite this setting.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: mrp: Connect MRP API with the switchdev API
Implement the MRP API.
In case the HW can't generate MRP Test frames then the SW will try to generate
the frames. In case that also the SW will fail in generating the frames then a
error is return to the userspace. The userspace is responsible to generate all
the other MRP frames regardless if the test frames are generated by HW or SW.
The forwarding/termination of MRP frames is happening in the kernel and is done
by the MRP instance. The userspace application doesn't do the forwarding.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
bridge: switchdev: mrp: Implement MRP API for switchdev
Implement the MRP api for switchdev.
These functions will just eventually call the switchdev functions:
switchdev_port_obj_add/del and switchdev_port_attr_set.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
switchdev: mrp: Extend switchdev API to offload MRP
Extend switchdev API to add support for MRP. The HW is notified in
following cases:
SWITCHDEV_OBJ_ID_MRP: This is used when a MRP instance is added/removed
from the MRP ring.
SWITCHDEV_OBJ_ID_RING_ROLE_MRP: This is used when the role of the node
changes. The current supported roles are MRM and MRC.
SWITCHDEV_OBJ_ID_RING_TEST_MRP: This is used when to start/stop sending
MRP_Test frames on the mrp ring ports. This is called only on nodes that have
the role MRM. In case this fails then the SW will generate the frames.
SWITCHDEV_OBJ_ID_RING_STATE_STATE: This is used when the ring changes it states
to open or closed. This is required to notify HW because the MRP_Test frame
contains the field MRP_InState which contains this information.
SWITCHDEV_ATTR_ID_MRP_PORT_STATE: This is used when the port's state is
changed. It can be in blocking/forwarding mode.
SWITCHDEV_ATTR_ID_MRP_PORT_ROLE: This is used when port's role changes. The
roles of the port can be primary/secondary. This is required to notify HW
because the MRP_Test frame contains the field MRP_PortRole that contains this
information.
Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Define the MRP interface.
This interface is used by the netlink to update the MRP instances and by the MRP
to make the calls to switchdev to offload it to HW.
It defines an MRP instance 'struct br_mrp' which is a list of MRP instances.
Which will be part of the 'struct net_bridge'. Each instance has 2 ring ports,
a bridge and an ID.
In case the HW can't generate MRP Test frames then the SW will generate those.
br_mrp_add - adds a new MRP instance.
br_mrp_del - deletes an existing MRP instance. Each instance has an ID(ring_id).
br_mrp_set_port_state - changes the port state. The port can be in forwarding
state, which means that the frames can pass through or in blocked state which
means that the frames can't pass through except MRP frames. This will
eventually call the switchdev API to notify the HW. This information is used
also by the SW bridge to know how to forward frames in case the HW doesn't
have this capability.
br_mrp_set_port_role - a port role can be primary or secondary. This
information is required to be pushed to HW in case the HW can generate
MRP_Test frames. Because the MRP_Test frames contains a file with this
information. Otherwise the HW will not be able to generate the frames
correctly.
br_mrp_set_ring_state - a ring can be in state open or closed. State open means
that the mrp port stopped receiving MRP_Test frames, while closed means that
the mrp port received MRP_Test frames. Similar with br_mrp_port_role, this
information is pushed in HW because the MRP_Test frames contain this
information.
br_mrp_set_ring_role - a ring can have the following roles MRM or MRC. For the
role MRM it is expected that the HW can terminate the MRP frames, notify the
SW that it stopped receiving MRP_Test frames and trapp all the other MRP
frames. While for MRC mode it is expected that the HW can forward the MRP
frames only between the MRP ports and copy MRP_Topology frames to CPU. In
case the HW doesn't support a role it needs to return an error code different
than -EOPNOTSUPP.
br_mrp_start_test - this starts/stops the generation of MRP_Test frames. To stop
the generation of frames the interval needs to have a value of 0. In this case
the userspace needs to know if the HW supports this or not. Not to have
duplicate frames(generated by HW and SW). Because if the HW supports this then
the SW will not generate anymore frames and will expect that the HW will
notify when it stopped receiving MRP frames using the function
br_mrp_port_open.
br_mrp_port_open - this function is used by drivers to notify the userspace via
a netlink callback that one of the ports stopped receiving MRP_Test frames.
This function is called only when the node has the role MRM. It is not
supposed to be called from userspace.
br_mrp_port_switchdev_add - this corresponds to the function br_mrp_add,
and will notify the HW that a MRP instance is added. The function gets
as parameter the MRP instance.
br_mrp_port_switchdev_del - this corresponds to the function br_mrp_del,
and will notify the HW that a MRP instance is removed. The function
gets as parameter the ID of the MRP instance that is removed.
br_mrp_port_switchdev_set_state - this corresponds to the function
br_mrp_set_port_state. It would notify the HW if it should block or not
non-MRP frames.
br_mrp_port_switchdev_set_port - this corresponds to the function
br_mrp_set_port_role. It would set the port role, primary or secondary.
br_mrp_switchdev_set_role - this corresponds to the function
br_mrp_set_ring_role and would set one of the role MRM or MRC.
br_mrp_switchdev_set_ring_state - this corresponds to the function
br_mrp_set_ring_state and would set the ring to be open or closed.
br_mrp_switchdev_send_ring_test - this corresponds to the function
br_mrp_start_test. This will notify the HW to start or stop generating
MRP_Test frames. Value 0 for the interval parameter means to stop generating
the frames.
br_mrp_port_open - this function is used to notify the userspace that the port
lost the continuity of MRP Test frames.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bridge: Add port attribute IFLA_BRPORT_MRP_RING_OPEN
This patch adds a new port attribute, IFLA_BRPORT_MRP_RING_OPEN, which allows
to notify the userspace when the port lost the continuite of MRP frames.
This attribute is set by kernel whenever the SW or HW detects that the ring is
being open or closed.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
To integrate MRP into the bridge, first the bridge needs to be aware of ports
that are part of an MRP ring and which rings are on the bridge.
Therefore extend bridge interface with the following:
- add new flag(BR_MPP_AWARE) to the net bridge ports, this bit will be
set when the port is added to an MRP instance. In this way it knows if
the frame was received on MRP ring port
- add new flag(BR_MRP_LOST_CONT) to the net bridge ports, this bit will be set
when the port lost the continuity of MRP Test frames.
- add a list of MRP instances
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add the option BRIDGE_MRP to allow to build in or not MRP support.
The default value is N.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Add new nested netlink attribute to configure the MRP. These attributes are used
by the userspace to add/delete/configure MRP instances and by the kernel to
notify the userspace when the MRP ring gets open/closed. MRP nested attribute
has the following attributes:
IFLA_BRIDGE_MRP_INSTANCE - the parameter type is br_mrp_instance which contains
the instance id, and the ifindex of the two ports. The ports can't be part of
multiple instances. This is used to create/delete MRP instances.
IFLA_BRIDGE_MRP_PORT_STATE - the parameter type is u32. Which can be forwarding,
blocking or disabled.
IFLA_BRIDGE_MRP_PORT_ROLE - the parameter type is br_mrp_port_role which
contains the instance id and the role. The role can be primary or secondary.
IFLA_BRIDGE_MRP_RING_STATE - the parameter type is br_mrp_ring_state which
contains the instance id and the state. The state can be open or closed.
IFLA_BRIDGE_MRP_RING_ROLE - the parameter type is br_mrp_ring_role which
contains the instance id and the ring role. The role can be MRM or MRC.
IFLA_BRIDGE_MRP_START_TEST - the parameter type is br_mrp_start_test which
contains the instance id, the interval at which to send the MRP_Test frames,
how many test frames can be missed before declaring the ring open and the
period which represent for how long to send the test frames.
Also add the file include/uapi/linux/mrp_bridge.h which defines all the types
used by MRP that are also needed by the userpace.
Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Colin Ian King [Sat, 25 Apr 2020 11:28:14 +0000 (12:28 +0100)]
net: rtnetlink: remove redundant assignment to variable err
The variable err is being initializeed with a value that is never read
and it is being updated later with a new value. The initialization
is redundant and can be removed.
Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: openvswitch: use div_u64() for 64-by-32 divisions
Compile the kernel for arm 32 platform, the build warning found.
To fix that, should use div_u64() for divisions.
| net/openvswitch/meter.c:396: undefined reference to `__udivdi3'
[add more commit msg, change reported tag, and use div_u64 instead
of do_div by Tonghao]
Fixes: e57358873bb5d6ca ("net: openvswitch: use u64 for meter bucket") Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Tested-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: openvswitch: suitable access to the dp_meters
To fix the following sparse warning:
| net/openvswitch/meter.c:109:38: sparse: sparse: incorrect type
| in assignment (different address spaces) ...
| net/openvswitch/meter.c:720:45: sparse: sparse: incorrect type
| in argument 1 (different address spaces) ...
Reported-by: kbuild test robot <lkp@intel.com> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 26 Apr 2020 03:46:28 +0000 (20:46 -0700)]
Merge branch 'hinic-add-SR-IOV-support'
Luo bin says:
====================
hinic: add SR-IOV support
patch #1 adds mailbox channel support and vf can
communicate with pf or hw through it.
patch #2 adds support for enabling vf and tx/rx
capabilities based on vf.
patch #3 adds support for vf's basic configurations.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zou Wei [Fri, 24 Apr 2020 13:53:14 +0000 (21:53 +0800)]
net/mlx4_core: Add missing iounmap() in error path
This fixes the following coccicheck warning:
drivers/net/ethernet/mellanox/mlx4/crdump.c:200:2-8: ERROR: missing iounmap;
ioremap on line 190 and execution via conditional on line 198
Fixes: 7ef19d3b1d5e ("devlink: report error once U32_MAX snapshot ids have been used") Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zou Wei <zou_wei@huawei.com> Reviewed-by: Saeed Mahameed <saeedm@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yang Yingliang [Fri, 24 Apr 2020 12:52:26 +0000 (20:52 +0800)]
ptp: clockmatrix: remove unnecessary comparison
The type of loaddr is u8 which is always '<=' 0xff, so the
loaddr <= 0xff is always true, we can remove this comparison.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Reviewed-by: Vincent Cheng <vincent.cheng.xh@renesas.com> Signed-off-by: David S. Miller <davem@davemloft.net>
tcp: mptcp: use mptcp receive buffer space to select rcv window
In MPTCP, the receive window is shared across all subflows, because it
refers to the mptcp-level sequence space.
MPTCP receivers already place incoming packets on the mptcp socket
receive queue and will charge it to the mptcp socket rcvbuf until
userspace consumes the data.
Update __tcp_select_window to use the occupancy of the parent/mptcp
socket instead of the subflow socket in case the tcp socket is part
of a logical mptcp connection.
This commit doesn't change choice of initial window for passive or active
connections.
While it would be possible to change those as well, this adds complexity
(especially when handling MP_JOIN requests). Furthermore, the MPTCP RFC
specifically says that a MPTCP sender 'MUST NOT use the RCV.WND field
of a TCP segment at the connection level if it does not also carry a DSS
option with a Data ACK field.'
SYN/SYNACK packets do not carry a DSS option with a Data ACK field.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Sun, 26 Apr 2020 03:29:44 +0000 (20:29 -0700)]
Merge branch 'net-hns3-refactor-for-MAC-table'
Huazhong Tan says:
====================
net: hns3: refactor for MAC table
This patchset refactors the MAC table management, configure
the MAC address asynchronously, instead of synchronously.
Base on this change, it also refines the handle of promisc
mode and filter table entries restoring after reset.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: optimize the filter table entries handling when resetting
Currently, the PF driver removes all (including its VFs') MAC/VLAN
flow director table entries when resetting, and restores them after
reset completed.
In fact, the hardware will clear all table entries only in IMP
reset and global reset. So driver only needs to restore the table
entries in these cases, and needs do nothing when PF reset, FLR
or other function level reset.
This patch optimizes it by removing unnecessary table entries clear
and restoring handling in the reset flow, and doing the restoring
after reset completed.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: use mutex vport_lock instead of mutex umv_lock
Currently, the driver use mutex umv_lock to protect the variable
vport->share_umv_size. And there is already a mutex vport_lock
being defined in the driver, which is designed to protect the
resource of vport. So we can use vport_lock instead of umv_lock.
Furthermore, there is a time window for protect share_umv_size
between checking UMV space and doing MAC configuration in the
lin function hclge_add_uc_addr_common(). It should be extended.
This patch uses mutex vport_lock intead of spin lock umv_lock to
protect share_umv_size, and adjusts the mutex's range.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
As the HNS3 driver doesn't update the MAC address directly in
function hns3_set_rx_mode() now, it can't know whether the
MAC table is full from __dev_uc_sync() and __dev_mc_sync(),
so it's senseless to handle the overflow promisc here.
This patch removes the handle of overflow promisc from function
hns3_set_rx_mode(), and updates the promisc mode in the service
task.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: add support for dumping UC and MC MAC list
This patch adds support for dumping entries of UC and MC MAC list,
which help checking whether a MAC address being added into hardware
or not.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently, the HNS3 driver sync and unsync MAC address in function
hns3_set_rx_mode(). For PF, it adds and deletes MAC address directly
in the path of dev_set_rx_mode(). If failed, it won't retry until
next calling of hns3_set_rx_mode(). On the other hand, if request
add and remove a same address many times at a short interval, each
request must be done one by one, can't be merged. For VF, it sends
mailbox messages to PF to request adding or deleting MAC address in
the path of function hns3_set_rx_mode(), no matter the address is
configured success.
This patch refines it by recording the MAC address in function
hns3_set_rx_mode(), and updating MAC address in the service task.
If failed, it will retry by the next calling of periodical service
task. It also uses some state to mark the state of each MAC address
in the MAC list, which can help merge configure request for a same
address. With these changes, when global reset or IMP reset occurs,
we can restore the MAC table with the MAC list.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: replace num_req_vfs with num_alloc_vport in hclge_reset_umv_space()
Like the calculation elsewhere, replaces num_req_vfs with
num_alloc_vport in hclge_reset_umv_space().
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: remove unnecessary parameter 'is_alloc' in hclge_set_umv_space()
Since hclge_set_umv_space() is only called by hclge_init_umv_space(),
so parameter 'is_alloc' is redundant.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: hns3: refine for unicast MAC VLAN space management
Currently, firmware helps manage the unicast MAC VLAN table
space for each PF. PF just needs to tell firmware its wanted
space when initializing, and unnecessary to free it when
un-intializing. So this patch removes the umv space free handle,
and removes the forward statement of hclge_set_umv_space()
by defining hclge_init_umv_space() after it.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
"Misc fixes:
- an uclamp accounting fix
- three frequency invariance fixes and a readability improvement"
* tag 'sched-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
sched/core: Fix reset-on-fork from RT with uclamp
x86, sched: Move check for CPU type to caller function
x86, sched: Don't enable static key when starting secondary CPUs
x86, sched: Account for CPUs with less than 4 cores in freq. invariance
x86, sched: Bail out of frequency invariance if base frequency is unknown
Merge tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"Two changes:
- fix exit event records
- extend x86 PMU driver enumeration to add Intel Jasper Lake CPU
support"
* tag 'perf-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
perf/core: fix parent pid/tid in task exit events
perf/x86/cstate: Add Jasper Lake CPU support
Merge tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull objtool fixes from Ingo Molnar:
"Two fixes: fix an off-by-one bug, and fix 32-bit builds on 64-bit
systems"
* tag 'objtool-urgent-2020-04-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
objtool: Fix off-by-one in symbol_by_offset()
objtool: Fix 32bit cross builds
1) Fix memory leak in netfilter flowtable, from Roi Dayan.
2) Ref-count leaks in netrom and tipc, from Xiyu Yang.
3) Fix warning when mptcp socket is never accepted before close, from
Florian Westphal.
4) Missed locking in ovs_ct_exit(), from Tonghao Zhang.
5) Fix large delays during PTP synchornization in cxgb4, from Rahul
Lakkireddy.
6) team_mode_get() can hang, from Taehee Yoo.
7) Need to use kvzalloc() when allocating fw tracer in mlx5 driver,
from Niklas Schnelle.
8) Fix handling of bpf XADD on BTF memory, from Jann Horn.
9) Fix BPF_STX/BPF_B encoding in x86 bpf jit, from Luke Nelson.
10) Missing queue memory release in iwlwifi pcie code, from Johannes
Berg.
11) Fix NULL deref in macvlan device event, from Taehee Yoo.
12) Initialize lan87xx phy correctly, from Yuiko Oshino.
13) Fix looping between VRF and XFRM lookups, from David Ahern.
14) etf packet scheduler assumes all sockets are full sockets, which is
not necessarily true. From Eric Dumazet.
15) Fix mptcp data_fin handling in RX path, from Paolo Abeni.
16) fib_select_default() needs to handle nexthop objects, from David
Ahern.
17) Use GFP_ATOMIC under spinlock in mac80211_hwsim, from Wei Yongjun.
18) vxlan and geneve use wrong nlattr array, from Sabrina Dubroca.
19) Correct rx/tx stats in bcmgenet driver, from Doug Berger.
20) BPF_LDX zero-extension is encoded improperly in x86_32 bpf jit, fix
from Luke Nelson.
* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (100 commits)
selftests/bpf: Fix a couple of broken test_btf cases
tools/runqslower: Ensure own vmlinux.h is picked up first
bpf: Make bpf_link_fops static
bpftool: Respect the -d option in struct_ops cmd
selftests/bpf: Add test for freplace program with expected_attach_type
bpf: Propagate expected_attach_type when verifying freplace programs
bpf: Fix leak in LINK_UPDATE and enforce empty old_prog_fd
bpf, x86_32: Fix logic error in BPF_LDX zero-extension
bpf, x86_32: Fix clobbering of dst for BPF_JSET
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
bpf: Fix reStructuredText markup
net: systemport: suppress warnings on failed Rx SKB allocations
net: bcmgenet: suppress warnings on failed Rx SKB allocations
macsec: avoid to set wrong mtu
mac80211: sta_info: Add lockdep condition for RCU list usage
mac80211: populate debugfs only after cfg80211 init
net: bcmgenet: correct per TX/RX ring statistics
net: meth: remove spurious copyright text
net: phy: bcm84881: clear settings on link down
chcr: Fix CPU hard lockup
...
selftests/bpf: Fix a couple of broken test_btf cases
Commit 51c39bb1d5d1 ("bpf: Introduce function-by-function verification")
introduced function linkage flag and changed the error message from
"vlen != 0" to "Invalid func linkage" and broke some fake BPF programs.
Adjust the test accordingly.
AFACT, the programs don't really need any arguments and only look
at BTF for maps, so let's drop the args altogether.
Before:
BTF raw test[103] (func (Non zero vlen)): do_test_raw:3703:FAIL expected
err_str:vlen != 0
magic: 0xeb9f
version: 1
flags: 0x0
hdr_len: 24
type_off: 0
type_len: 72
str_off: 72
str_len: 10
btf_total_size: 106
[1] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=SIGNED
[2] INT (anon) size=4 bits_offset=0 nr_bits=32 encoding=(none)
[3] FUNC_PROTO (anon) return=0 args=(1 a, 2 b)
[4] FUNC func type_id=3 Invalid func linkage
BTF libbpf test[1] (test_btf_haskv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_haskv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[2] (test_btf_newkv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_newkv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
BTF libbpf test[3] (test_btf_nokv.o): libbpf: load bpf program failed:
Invalid argument
libbpf: -- BEGIN DUMP LOG ---
libbpf:
Validating test_long_fname_2() func#1...
Arg#0 type PTR in test_long_fname_2() is not supported yet.
processed 0 insns (limit 1000000) max_states_per_insn 0 total_states 0
peak_states 0 mark_read 0
libbpf: -- END LOG --
libbpf: failed to load program 'dummy_tracepoint'
libbpf: failed to load object 'test_btf_nokv.o'
do_test_file:4201:FAIL bpf_object__load: -4007
tools/runqslower: Ensure own vmlinux.h is picked up first
Reorder include paths to ensure that runqslower sources are picking up
vmlinux.h, generated by runqslower's own Makefile. When runqslower is built
from selftests/bpf, due to current -I$(BPF_INCLUDE) -I$(OUTPUT) ordering, it
might pick up not-yet-complete vmlinux.h, generated by selftests Makefile,
which could lead to compilation errors like [0]. So ensure that -I$(OUTPUT)
goes first and rely on runqslower's Makefile own dependency chain to ensure
vmlinux.h is properly completed before source code relying on it is compiled.
selftests/bpf: Add test for freplace program with expected_attach_type
This adds a new selftest that tests the ability to attach an freplace
program to a program type that relies on the expected_attach_type of the
target program to pass verification.
bpf: Propagate expected_attach_type when verifying freplace programs
For some program types, the verifier relies on the expected_attach_type of
the program being verified in the verification process. However, for
freplace programs, the attach type was not propagated along with the
verifier ops, so the expected_attach_type would always be zero for freplace
programs.
This in turn caused the verifier to sometimes make the wrong call for
freplace programs. For all existing uses of expected_attach_type for this
purpose, the result of this was only false negatives (i.e., freplace
functions would be rejected by the verifier even though they were valid
programs for the target they were replacing). However, should a false
positive be introduced, this can lead to out-of-bounds accesses and/or
crashes.
The fix introduced in this patch is to propagate the expected_attach_type
to the freplace program during verification, and reset it after that is
done.
Luke Nelson [Wed, 22 Apr 2020 17:36:30 +0000 (10:36 -0700)]
bpf, x86_32: Fix clobbering of dst for BPF_JSET
The current JIT clobbers the destination register for BPF_JSET BPF_X
and BPF_K by using "and" and "or" instructions. This is fine when the
destination register is a temporary loaded from a register stored on
the stack but not otherwise.
This patch fixes the problem (for both BPF_K and BPF_X) by always loading
the destination register into temporaries since BPF_JSET should not
modify the destination register.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-2-luke.r.nels@gmail.com
Luke Nelson [Wed, 22 Apr 2020 17:36:29 +0000 (10:36 -0700)]
bpf, x86_32: Fix incorrect encoding in BPF_LDX zero-extension
The current JIT uses the following sequence to zero-extend into the
upper 32 bits of the destination register for BPF_LDX BPF_{B,H,W},
when the destination register is not on the stack:
EMIT3(0xC7, add_1reg(0xC0, dst_hi), 0);
The problem is that C7 /0 encodes a MOV instruction that requires a 4-byte
immediate; the current code emits only 1 byte of the immediate. This
means that the first 3 bytes of the next instruction will be treated as
the rest of the immediate, breaking the stream of instructions.
This patch fixes the problem by instead emitting "xor dst_hi,dst_hi"
to clear the upper 32 bits. This fixes the problem and is more efficient
than using MOV to load a zero immediate.
This bug may not be currently triggerable as BPF_REG_AX is the only
register not stored on the stack and the verifier uses it in a limited
way, and the verifier implements a zero-extension optimization. But the
JIT should avoid emitting incorrect encodings regardless.
Fixes: 03f5781be2c7b ("bpf, x86_32: add eBPF JIT compiler for ia32") Signed-off-by: Xi Wang <xi.wang@gmail.com> Signed-off-by: Luke Nelson <luke.r.nels@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: H. Peter Anvin (Intel) <hpa@zytor.com> Acked-by: Wang YanQing <udknight@gmail.com> Link: https://lore.kernel.org/bpf/20200422173630.8351-1-luke.r.nels@gmail.com
Yang Yingliang [Fri, 24 Apr 2020 08:03:48 +0000 (16:03 +0800)]
ptp: idt82p33: remove unnecessary comparison
The type of loaddr is u8 which is always '<=' 0xff, so the
loaddr <= 0xff is always true, we can remove this comparison.
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Acked-by: Igor Russkikh <irusskikh@marvell.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Zheng Bin <zhengbin13@huawei.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: systemport: suppress warnings on failed Rx SKB allocations
The driver is designed to drop Rx packets and reclaim the buffers
when an allocation fails, and the network interface needs to safely
handle this packet loss. Therefore, an allocation failure of Rx
SKBs is relatively benign.
However, the output of the warning message occurs with a high
scheduling priority that can cause excessive jitter/latency for
other high priority processing.
This commit suppresses the warning messages to prevent scheduling
problems while retaining the failure count in the statistics of
the network interface.
Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: bcmgenet: suppress warnings on failed Rx SKB allocations
The driver is designed to drop Rx packets and reclaim the buffers
when an allocation fails, and the network interface needs to safely
handle this packet loss. Therefore, an allocation failure of Rx
SKBs is relatively benign.
However, the output of the warning message occurs with a high
scheduling priority that can cause excessive jitter/latency for
other high priority processing.
This commit suppresses the warning messages to prevent scheduling
problems while retaining the failure count in the statistics of
the network interface.
Signed-off-by: Doug Berger <opendmb@gmail.com> Acked-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: phy: clear phydev->suspended after soft reset
If a soft reset is triggered whilst PHY is in power-down, then
phydev->suspended will remain set. Seems we didn't face any issue yet
caused by this, but better reset the suspended flag after soft reset.
See also the following from 22.2.4.1.1
Resetting a PHY is accomplished by setting bit 0.15 to a logic one.
This action shall set the status and control registers to their default
states.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Since 6e2d85ec0559 ("net: phy: Stop with excessive soft reset")
we don't need genphy_no_soft_reset() any longer. Not setting
callback soft_reset results in a no-op now.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Move the callback into the phylink_config structure, rather than
providing a callback to set this up.
Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 24 Apr 2020 23:44:55 +0000 (16:44 -0700)]
Merge branch 'qdisc-noop'
Jesper Dangaard Brouer says:
====================
Fix qdisc noop issue caused by driver and identify future bugs
I've been very puzzled why networking on my NXP development board,
using driver dpaa2-eth, stopped working when I updated the kernel
version >= 5.3. The observable issue were that interface would drop
all TX packets, because it had assigned the qdisc noop.
This turned out the be a NIC driver bug, that would only get triggered
when using sysctl net/core/default_qdisc=fq_codel. It was non-trivial
to find out[1] this was driver related. Thus, this patchset besides
fixing the driver bug, also helps end-user identify the issue.
Drivers ndo_setup_tc call should return -EOPNOTSUPP, when it cannot
support the qdisc type. Other return values will result in failing the
qdisc setup. This lead to qdisc noop getting assigned, which will
drop all TX packets on the interface.
Fixes: ab1e6de2bd49 ("dpaa2-eth: Add mqprio support") Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: report ndo_setup_tc failures via extack
Help end-users of the 'tc' command to see if the drivers ndo_setup_tc
function call fails. Troubleshooting when this happens is non-trivial
(see full process here[1]), and results in net_device getting assigned
the 'qdisc noop', which will drop all TX packets on the interface.
Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> Tested-by: Ioana Ciornei <ioana.ciornei@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When a macsec interface is created, the mtu is calculated with the lower
interface's mtu value.
If the mtu of lower interface is lower than the length, which is needed
by macsec interface, macsec's mtu value will be overflowed.
So, if the lower interface's mtu is too low, macsec interface's mtu
should be set to 0.
Test commands:
ip link add dummy0 mtu 10 type dummy
ip link add macsec0 link dummy0 type macsec
ip link show macsec0
Before:
11: macsec0@dummy0: <BROADCAST,MULTICAST,M-DOWN> mtu 4294967274
After:
11: macsec0@dummy0: <BROADCAST,MULTICAST,M-DOWN> mtu 0
Fixes: c09440f7dcb3 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two minor fixes: one to update a Kconfig reference and the other to
fix a resource leak on an error path in sg"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: Update referenced link to cdrtools
scsi: sg: add sg_remove_request in sg_write
David S. Miller [Fri, 24 Apr 2020 22:41:51 +0000 (15:41 -0700)]
Merge branch 'mlxsw-Mirroring-cleanups'
Ido Schimmel says:
====================
mlxsw: Mirroring cleanups
This patch set contains various cleanups in SPAN (mirroring) code
noticed by Amit and I while working on future enhancements in this area.
No functional changes intended. Tested by current mirroring selftests.
Patches #1-#2 from Amit reduce nesting in a certain function and rename
a callback to a more meaningful name.
Patch #3 removes debug prints that have little value.
Patch #4 converts a reference count to 'refcount_t' in order to catch
over/under flows.
Patch #5 replaces a zero-length array with flexible-array member in
order to get a compiler warning in case the flexible array does not
occur last in the structure.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
mlxsw: spectrum_span: Replace zero-length array with flexible-array member
In a similar fashion to commit e99f8e7f88b5 ("mlxsw: Replace zero-length
array with flexible-array member"), use a flexible-array member to get a
compiler warning in case the flexible array does not occur last in the
structure.
Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Fri, 24 Apr 2020 15:43:42 +0000 (18:43 +0300)]
mlxsw: spectrum_span: Rename parms() to parms_set()
Use a more meaningful name for parms() function.
Signed-off-by: Amit Cohen <amitc@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Amit Cohen [Fri, 24 Apr 2020 15:43:41 +0000 (18:43 +0300)]
mlxsw: spectrum_span: Reduce nesting in mlxsw_sp_span_entry_configure()
Use early return to avoid unnecessary nesting.
Signed-off-by: Amit Cohen <amitc@mellanox.com> Signed-off-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Jiri Pirko <jiri@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
proc: Put thread_pid in release_task not proc_flush_pid
Oleg pointed out that in the unlikely event the kernel is compiled
with CONFIG_PROC_FS unset that release_task will now leak the pid.
Move the put_pid out of proc_flush_pid into release_task to fix this
and to guarantee I don't make that mistake again.
When possible it makes sense to keep get and put in the same function
so it can easily been seen how they pair up.
Fixes: 7bc3e6e55acf ("proc: Use a list of inodes to flush from proc") Reported-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Merge tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management fixes from Rafael Wysocki:
"Restore an optimization related to asynchronous suspend and resume of
devices during system-wide power transitions that was disabled by
mistake (Kai-Heng Feng) and update the pm-graph suite of power
management utilities (Todd Brandt)"
* tag 'pm-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
PM: sleep: core: Switch back to async_schedule_dev()
pm-graph v5.6
Merge tag 'acpi-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull ACPI fixes from Rafael Wysocki:
"Drop a lid status quirk for Asus T200TA that is not necessary any more
and clean up a resource management inconsistency in the PCI IRQ link
configuration code.
Both changes from Hans de Goede"
* tag 'acpi-5.7-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
ACPI: button: Drop no longer necessary Asus T200TA lid_init_state quirk
ACPI/PCI: pci_link: use extended_irq union member when setting ext-irq shareable
IORING_OP_MADVISE can end up basically doing mprotect() on the VM of
another process, which means that it can race with our crazy core dump
handling which accesses the VM state without holding the mmap_sem
(because it incorrectly thinks that it is the final user).
This is clearly a core dumping problem, but we've never fixed it the
right way, and instead have the notion of "check that the mm is still
ok" using mmget_still_valid() after getting the mmap_sem for writing in
any situation where we're not the original VM thread.
See commit 04f5866e41fb ("coredump: fix race condition between
mmget_not_zero()/get_task_mm() and core dumping") for more background on
this whole mmget_still_valid() thing. You might want to have a barf bag
handy when you do.
We're discussing just fixing this properly in the only remaining core
dumping routines. But even if we do that, let's make do_madvise() do
the right thing, and then when we fix core dumping, we can remove all
these mmget_still_valid() checks.
David S. Miller [Fri, 24 Apr 2020 20:17:01 +0000 (13:17 -0700)]
Merge tag 'mac80211-for-net-2020-04-24' of git://git.kernel.org/pub/scm/linux/kernel/git/jberg/mac80211
Johannes Berg says:
====================
Just three changes:
* fix a wrong GFP_KERNEL in hwsim
* fix the debugfs mess after the mac80211 registration race fix
* suppress false-positive RCU list lockdep warnings
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Merge tag 'trace-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace
Pull tracing fixes from Steven Rostedt:
"A few tracing fixes:
- Two fixes for memory leaks detected by kmemleak
- Removal of some dead code
- A few local functions turned static"
* tag 'trace-v5.7-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace:
tracing: Convert local functions in tracing_map.c to static
tracing: Remove DECLARE_TRACE_NOARGS
ftrace: Fix memory leak caused by not freeing entry in unregister_ftrace_direct()
tracing: Fix memory leaks in trace_events_hist.c