net: sched: em_ipt: keep the user-specified nfproto and dump it
If we dump NFPROTO_UNSPEC as nfproto user-space libxtables can't handle
it and would exit with an error like:
"libxtables: unhandled NFPROTO in xtables_set_nfproto"
In order to avoid the error return the user-specified nfproto. If we
don't record it then the match family is used which can be
NFPROTO_UNSPEC. Even if we add support to mask NFPROTO_UNSPEC in
iproute2 we have to be compatible with older versions which would be
also be allowed to add NFPROTO_UNSPEC matches (e.g. addrtype after the
last patch).
v3: don't use the user nfproto for matching, only for dumping the rule,
also don't allow the nfproto to be unspecified (explained above)
v2: adjust changes to missing patch, was patch 04 in v1
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: sched: em_ipt: set the family based on the packet if it's unspecified
Set the family based on the packet if it's unspecified otherwise
protocol-neutral matches will have wrong information (e.g. NFPROTO_UNSPEC).
In preparation for using NFPROTO_UNSPEC xt matches.
v2: set the nfproto only when unspecified
Suggested-by: Eyal Birger <eyal.birger@gmail.com> Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Restrict matching only to ip/ipv6 traffic and make sure we can use the
headers, otherwise matches will be attempted on any protocol which can
be unexpected by the xt matches. Currently policy supports only ipv4/6.
Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net/sched: Add txtime-assist support for taprio.
Changes in v6:
- Use _BITUL() instead of BIT() in UAPI for etf. (patch #1)
- Fix a bug reported by kbuild test bot in length_to_duration(). (patch #6)
- Remove an unused function (get_cycle_start()). (Patch #6)
Changes in v5:
- Commit message improved for the igb patch (patch #1).
- Fixed typo in commit message for etf patch (patch #2).
Changes in v4:
- Remove inline directive from functions in foo.c.
- Fix spacing in pkt_sched.h (for etf patch).
Changes in v3:
- Simplify implementation for taprio flags.
- txtime_delay can only be set if txtime-assist mode is enabled.
- txtime_delay and flags will only be visible in tc output if set by user.
- Minor changes in error reporting.
Changes in v2:
- Txtime-offload has now been renamed to txtime-assist mode.
- Renamed the offload parameter to flags.
- Removed the code which introduced the hardware offloading functionality.
Original Cover letter (with above changes included)
--------------------------------------------------
Currently, we are seeing packets being transmitted outside their
timeslices. We can confirm that the packets are being dequeued at the right
time. So, the delay is induced after the packet is dequeued, because
taprio, without any offloading, has no control of when a packet is actually
transmitted.
In order to solve this, we are making use of the txtime feature provided by
ETF qdisc. Hardware offloading needs to be supported by the ETF qdisc in
order to take advantage of this feature. The taprio qdisc will assign
txtime (in skb->tstamp) for all the packets which do not have the txtime
allocated via the SO_TXTIME socket option. For the packets which already
have SO_TXTIME set, taprio will validate whether the packet will be
transmitted in the correct interval.
In order to support this, the following parameters have been added:
- flags (taprio): This is added in order to support different offloading
modes which will be added in the future.
- txtime-delay (taprio): This indicates the minimum time it will take for
the packet to hit the wire after it reaches taprio_enqueue(). This is
useful in determining whether we can transmit the packet in the remaining
time if the gate corresponding to the packet is currently open.
- skip_skb_check (ETF): ETF currently drops any packet which does not have
the SO_TXTIME socket option set. This check can be skipped by specifying
this option.
Here, the "flags" parameter is indicating that the txtime-assist mode is
enabled. Also, all the traffic classes have been assigned the same queue.
This is to prevent the traffic classes in the lower priority queues from
getting starved. Note that this configuration is specific to the i210
ethernet card. Other network cards where the hardware queues are given the
same priority, might be able to utilize more than one queue.
Following are some of the other highlights of the series:
- Fix a bug where hardware timestamping and SO_TXTIME options cannot be
used together. (Patch 1)
- Introduces the skip_skb_check option. (Patch 2)
- Make TxTime assist mode work with TCP packets (Patch 7).
The following changes are recommended to be done in order to get the best
performance from taprio in this mode:
ip link set dev enp1s0 mtu 1514
ethtool -K eth0 gso off
ethtool -K eth0 tso off
ethtool --set-eee eth0 eee off
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:19 +0000 (15:07 -0700)]
taprio: Adjust timestamps for TCP packets
When the taprio qdisc is running in "txtime offload" mode, it will
set the launchtime value (in skb->tstamp) for all the packets which do
not have the SO_TXTIME socket option. But, the TCP packets already have
this value set and it indicates the earliest departure time represented
in CLOCK_MONOTONIC clock.
We need to respect the timestamp set by the TCP subsystem. So, convert
this time to the clock which taprio is using and ensure that the packet
is not transmitted before the deadline set by TCP.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:18 +0000 (15:07 -0700)]
taprio: make clock reference conversions easier
Later in this series we will need to transform from
CLOCK_MONOTONIC (used in TCP) to the clock reference used in TAPRIO.
Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com> Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:17 +0000 (15:07 -0700)]
taprio: Add support for txtime-assist mode
Currently, we are seeing non-critical packets being transmitted outside of
their timeslice. We can confirm that the packets are being dequeued at the
right time. So, the delay is induced in the hardware side. The most likely
reason is the hardware queues are starving the lower priority queues.
In order to improve the performance of taprio, we will be making use of the
txtime feature provided by the ETF qdisc. For all the packets which do not
have the SO_TXTIME option set, taprio will set the transmit timestamp (set
in skb->tstamp) in this mode. TAPrio Qdisc will ensure that the transmit
time for the packet is set to when the gate is open. If SO_TXTIME is set,
the TAPrio qdisc will validate whether the timestamp (in skb->tstamp)
occurs when the gate corresponding to skb's traffic class is open.
Following two parameters added to support this mode:
- flags: used to enable txtime-assist mode. Will also be used to enable
other modes (like hardware offloading) later.
- txtime-delay: This indicates the minimum time it will take for the packet
to hit the wire. This is useful in determining whether we can transmit
the packet in the remaining time if the gate corresponding to the packet is
currently open.
An example configuration for enabling txtime-assist:
Note that all the traffic classes are mapped to the same queue. This is
only possible in taprio when txtime-assist is enabled. Also, note that the
ETF Qdisc is enabled with offload mode set.
In this mode, if the packet's traffic class is open and the complete packet
can be transmitted, taprio will try to transmit the packet immediately.
This will be done by setting skb->tstamp to current_time + the time delta
indicated in the txtime-delay parameter. This parameter indicates the time
taken (in software) for packet to reach the network adapter.
If the packet cannot be transmitted in the current interval or if the
packet's traffic is not currently transmitting, the skb->tstamp is set to
the next available timestamp value. This is tracked in the next_launchtime
parameter in the struct sched_entry.
The behaviour w.r.t admin and oper schedules is not changed from what is
present in software mode.
The transmit time is already known in advance. So, we do not need the HR
timers to advance the schedule and wakeup the dequeue side of taprio. So,
HR timer won't be run when this mode is enabled.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:15 +0000 (15:07 -0700)]
taprio: calculate cycle_time when schedule is installed
cycle time for a particular schedule is calculated only when it is first
installed. So, it makes sense to just calculate it once right after the
'cycle_time' parameter has been parsed and store it in cycle_time.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:14 +0000 (15:07 -0700)]
etf: Add skip_sock_check
Currently, etf expects a socket with SO_TXTIME option set for each packet
it encounters. So, it will drop all other packets. But, in the future
commits we are planning to add functionality where tstamp value will be set
by another qdisc. Also, some packets which are generated from within the
kernel (e.g. ICMP packets) do not have any socket associated with them.
So, this commit adds support for skip_sock_check. When this option is set,
etf will skip checking for a socket and other associated options for all
skbs.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vedang Patel [Tue, 25 Jun 2019 22:07:12 +0000 (15:07 -0700)]
igb: clear out skb->tstamp after reading the txtime
If a packet which is utilizing the launchtime feature (via SO_TXTIME socket
option) also requests the hardware transmit timestamp, the hardware
timestamp is not delivered to the userspace. This is because the value in
skb->tstamp is mistaken as the software timestamp.
Applications, like ptp4l, request a hardware timestamp by setting the
SOF_TIMESTAMPING_TX_HARDWARE socket option. Whenever a new timestamp is
detected by the driver (this work is done in igb_ptp_tx_work() which calls
igb_ptp_tx_hwtstamps() in igb_ptp.c[1]), it will queue the timestamp in the
ERR_QUEUE for the userspace to read. When the userspace is ready, it will
issue a recvmsg() call to collect this timestamp. The problem is in this
recvmsg() call. If the skb->tstamp is not cleared out, it will be
interpreted as a software timestamp and the hardware tx timestamp will not
be successfully sent to the userspace. Look at skb_is_swtx_tstamp() and the
callee function __sock_recv_timestamp() in net/socket.c for more details.
Signed-off-by: Vedang Patel <vedang.patel@intel.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 28 Jun 2019 21:36:25 +0000 (14:36 -0700)]
Merge branch 'mirred-recurse'
John Hurley says:
====================
Track recursive calls in TC act_mirred
These patches aim to prevent act_mirred causing stack overflow events from
recursively calling packet xmit or receive functions. Such events can
occur with poor TC configuration that causes packets to travel in loops
within the system.
Florian Westphal advises that a recursion crash and packets looping are
separate issues and should be treated as such. David Miller futher points
out that pcpu counters cannot track the precise skb context required to
detect loops. Hence these patches are not aimed at detecting packet loops,
rather, preventing stack flows arising from such loops.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 24 Jun 2019 22:13:36 +0000 (23:13 +0100)]
net: sched: protect against stack overflow in TC act_mirred
TC hooks allow the application of filters and actions to packets at both
ingress and egress of the network stack. It is possible, with poor
configuration, that this can produce loops whereby an ingress hook calls
a mirred egress action that has an egress hook that redirects back to
the first ingress etc. The TC core classifier protects against loops when
doing reclassifies but there is no protection against a packet looping
between multiple hooks and recursively calling act_mirred. This can lead
to stack overflow panics.
Add a per CPU counter to act_mirred that is incremented for each recursive
call of the action function when processing a packet. If a limit is passed
then the packet is dropped and CPU counter reset.
Note that this patch does not protect against loops in TC datapaths. Its
aim is to prevent stack overflow kernel panics that can be a consequence
of such loops.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
John Hurley [Mon, 24 Jun 2019 22:13:35 +0000 (23:13 +0100)]
net: sched: refactor reinsert action
The TC_ACT_REINSERT return type was added as an in-kernel only option to
allow a packet ingress or egress redirect. This is used to avoid
unnecessary skb clones in situations where they are not required. If a TC
hook returns this code then the packet is 'reinserted' and no skb consume
is carried out as no clone took place.
This return type is only used in act_mirred. Rather than have the reinsert
called from the main datapath, call it directly in act_mirred. Instead of
returning TC_ACT_REINSERT, change the type to the new TC_ACT_CONSUMED
which tells the caller that the packet has been stolen by another process
and that no consume call is required.
Moving all redirect calls to the act_mirred code is in preparation for
tracking recursion created by act_mirred.
Signed-off-by: John Hurley <john.hurley@netronome.com> Reviewed-by: Simon Horman <simon.horman@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Tools such as vpnc try to flush routes when run inside network
namespaces by writing 1 into /proc/sys/net/ipv4/route/flush. This
currently does not work because flush is not enabled in non-initial
network namespaces.
Since routes are per network namespace it is safe to enable
/proc/sys/net/ipv4/route/flush in there.
Link: https://github.com/lxc/lxd/issues/4257 Signed-off-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Peng Li [Fri, 28 Jun 2019 11:50:18 +0000 (19:50 +0800)]
net: hns3: optimize the CSQ cmd error handling
If CMDQ ring is full, hclge_cmd_send may return directly, but IMP still
working and HW pointer changed, SW ring pointer do not match the HW
pointer. This patch update the SW pointer every time when the space is
full, so it can work normally next time if IMP and HW still working.
Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 28 Jun 2019 11:50:17 +0000 (19:50 +0800)]
net: hns3: remove RXD_VLD check in hns3_handle_bdinfo
The HNS3_RXD_VLD_B bit has already been checked in hns3_add_frag
or hns3_handle_rx_bd before calling hns3_handle_bdinfo, so when
hns3_handle_bdinfo is called, the HNS3_RXD_VLD_B bit is always
set, which makes the checking in hns3_handle_bdinfo unnecessary.
This patch removes the RXD_VLD_B checking in hns3_handle_bdinfo.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jian Shen [Fri, 28 Jun 2019 11:50:16 +0000 (19:50 +0800)]
net: hns3: remove unused linkmode definition
This patch removes unused linkmode definition.
Signed-off-by: Jian Shen <shenjian15@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yufeng Mo [Fri, 28 Jun 2019 11:50:15 +0000 (19:50 +0800)]
net: hns3: fix a statistics issue about l3l4 checksum error
The frame column is based on rx_crc_errors and rx_frame_errors. So
l3l4 checksum error should not be counted by rx_crc_errors. Instead,
l3l4 checksum error should be counted in ifconfig error column.
Fixes: d3ec4ef66937 ("net: hns3: refactor the statistics updating for netdev") Signed-off-by: Yufeng Mo <moyufeng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 28 Jun 2019 11:50:14 +0000 (19:50 +0800)]
net: hns3: handle empty unknown interrupt
Since some MSI-X interrupt's status may be cleared by hardware,
so when the driver receives the interrupt, reading
HCLGE_VECTOR0_PF_OTHER_INT_STS_REG register will get an empty
unknown interrupt. For this case, the irq handler should enable
vector0 interrupt. This patch also use dev_info() instead of
dev_dbg() in the hclge_check_event_cause(), since this information
will be useful for normal usage.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 28 Jun 2019 11:50:13 +0000 (19:50 +0800)]
net: hns3: re-schedule reset task while VF reset fail
The VF reset may fail for some probabilistic reasons,
such as wait for hardware reset timeout, wait for mailbox
response timeout, so this patch tries to re-schedule the
reset task when the number of reset failing is under
HCLGEVF_RESET_MAX_FAIL_CNT. This patch also add a function
hclgevf_reset_err_handle() to handle the reset failing.
Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 28 Jun 2019 11:50:12 +0000 (19:50 +0800)]
net: hns3: add Asym Pause support to fix autoneg problem
Local device and link partner config auto-negotiation on both,
local device config pause frame use as: rx on/tx off,
link partner config pause frame use as: rx off/tx on.
We except the result is:
Local device:
Autonegotiate: on
RX: on
TX: off
RX negotiated: on
TX negotiated: off
Link partner:
Autonegotiate: on
RX: off
TX: on
RX negotiated: off
TX negotiated: on
But actually, the result of Local device and link partner is both:
Autonegotiate: on
RX: off
TX: off
RX negotiated: off
TX negotiated: off
The root cause is that the supported flag is has only Pause,
reference to the function genphy_config_advert():
static int genphy_config_advert(struct phy_device *phydev)
{
...
linkmode_and(phydev->advertising, phydev->advertising,
phydev->supported);
...
}
The pause frame use of link partner is rx off/tx on, so its
advertising only set the bit Asym_Pause, and the supported is
only set the bit Pause, so the result of linkmode_and(), is
rx off/tx off.
This patch adds Asym_Pause to the supported flag to fix it.
Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yonglong Liu [Fri, 28 Jun 2019 11:50:11 +0000 (19:50 +0800)]
net: hns3: fix a -Wformat-nonliteral compile warning
When setting -Wformat=2, there is a compiler warning like this:
hclge_main.c:xxx:x: warning: format not a string literal and no
format arguments [-Wformat-nonliteral]
strs[i].desc);
^~~~
This patch adds missing format parameter "%s" to snprintf() to
fix it.
Fixes: 46a3df9f9718 ("Add HNS3 Acceleration Engine & Compatibility Layer Support") Signed-off-by: Yonglong Liu <liuyonglong@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 28 Jun 2019 11:50:10 +0000 (19:50 +0800)]
net: hns3: add some error checking in hclge_tm module
When hdev->tx_sch_mode is HCLGE_FLAG_VNET_BASE_SCH_MODE, the
hclge_tm_schd_mode_vnet_base_cfg calls hclge_tm_pri_schd_mode_cfg
with vport->vport_id as pri_id, which is used as index for
hdev->tm_info.tc_info, it will cause out of bound access issue
if vport_id is equal to or larger than HNAE3_MAX_TC.
Also hardware only support maximum speed of HCLGE_ETHER_MAX_RATE.
So this patch adds two checks for above cases.
Fixes: 848440544b41 ("net: hns3: Add support of TX Scheduler & Shaper to HNS3 driver") Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 28 Jun 2019 11:50:09 +0000 (19:50 +0800)]
net: hns3: change SSU's buffer allocation according to UM
Currently when there is share buffer in the SSU(storage
switching unit), the low waterline for RX private buffer is
too low to keep the hardware running. Hardware may have
processed all the packet stored in the private buffer of the
low waterline before the new packet comes, because hardware
only tell the peer send packet again when the private buffer
is under the low waterline.
So this patch only allocate RX private buffer if there is
enough buffer according to hardware user manual.
This patch also reserve some buffer for reusing when TC num
is less than or equal to 2, and change PAUSE_TRANS_GAP &
HCLGE_NON_DCB_ADDITIONAL_BUF according to hardware user
manual.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Yunsheng Lin [Fri, 28 Jun 2019 11:50:08 +0000 (19:50 +0800)]
net: hns3: enable DCB when TC num is one and pfc_en is non-zero
Currently when TC num is one, the DCB will be disabled no matter if
pfc_en is non-zero or not.
This patch enables the DCB if pfc_en is non-zero, even when TC num
is one.
Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Signed-off-by: Peng Li <lipeng321@huawei.com> Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Huazhong Tan [Fri, 28 Jun 2019 11:50:07 +0000 (19:50 +0800)]
net: hns3: fix __QUEUE_STATE_STACK_XOFF not cleared issue
When change MTU or other operations, which just calling .reset_notify
to do HNAE3_DOWN_CLIENT and HNAE3_UP_CLIENT, then
the netdev_tx_reset_queue() in the hns3_clear_all_ring() will be
ignored. So the dev_watchdog() may misdiagnose a TX timeout.
This patch separates netdev_tx_reset_queue() from
hns3_clear_all_ring(), and unifies hns3_clear_all_ring() and
hns3_force_clear_all_ring into one, since they are doing
similar things.
Fixes: 3a30964a2eef ("net: hns3: delay ring buffer clearing during reset") Signed-off-by: Huazhong Tan <tanhuazhong@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 27 Jun 2019 21:46:37 +0000 (00:46 +0300)]
net: dsa: sja1105: Mark in-band AN modes not supported for PHYLINK
We need a better way to signal this, perhaps in phylink_validate, but
for now just print this error message as guidance for other people
looking at this driver's code while trying to rework PHYLINK.
Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 27 Jun 2019 21:46:36 +0000 (00:46 +0300)]
net: dsa: sja1105: Check for PHY mode mismatches with what PHYLINK reports
PHYLINK being designed with PHYs in mind that can change MII protocol,
for correct operation it is necessary to ensure that the PHY interface
mode stays the same (otherwise clear the supported bit mask, as
required).
Because this is just a hypothetical situation for now, we don't bother
to check whether we could actually support the new PHY interface mode.
Actually we could modify the xMII table, reset the switch and send an
updated static configuration, but adding that would just be dead code.
Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Thu, 27 Jun 2019 21:46:35 +0000 (00:46 +0300)]
net: dsa: sja1105: Don't check state->link in phylink_mac_config
It has been pointed out that PHYLINK can call mac_config only to update
the phy_interface_type and without knowing what the AN results are.
Experimentally, when this was observed to happen, state->link was also
unset, and therefore was used as a proxy to ignore this call. However it
is also suggested that state->link is undefined for this callback and
should not be relied upon.
So let the previously-dead codepath for SPEED_UNKNOWN be called, and
update the comment to make sure the MAC's behavior is sane.
Cc: Russell King <rmk+kernel@armlinux.org.uk> Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Arnd Bergmann [Fri, 28 Jun 2019 10:31:44 +0000 (12:31 +0200)]
hinic: reduce rss_init stack usage
On 32-bit architectures, putting an array of 256 u32 values on the
stack uses more space than the warning limit:
drivers/net/ethernet/huawei/hinic/hinic_main.c: In function 'hinic_rss_init':
drivers/net/ethernet/huawei/hinic/hinic_main.c:286:1: error: the frame size of 1068 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
I considered changing the code to use u8 values here, since that's
all the hardware supports, but dynamically allocating the array is
a more isolated fix here.
Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:21 +0000 (09:29 +0200)]
net: stmmac: Update Kconfig entry
We support more speeds now. Update the Kconfig entry.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:20 +0000 (09:29 +0200)]
net: stmmac: Only disable interrupts if NAPI is scheduled
Only disable the interrupts if RX NAPI gets to be scheduled. Also,
schedule the TX NAPI only when the interrupts are disabled.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:19 +0000 (09:29 +0200)]
net: stmmac: Update RX Tail Pointer to last free entry
Update the RX Tail Pointer to the last available SKB entry.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:18 +0000 (09:29 +0200)]
net: stmmac: Enable support for > 32 Bits addressing in XGMAC
Currently, stmmac only supports 32 bits addressing for SKB. Enable the
support for upto 48 bits addressing in XGMAC core.
This avoids the use of bounce buffers and increases performance.
Changes from v1:
- Fallback to 32 bits in failure (Andrew)
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:17 +0000 (09:29 +0200)]
net: stmmac: Do not disable interrupts when cleaning TX
This is a performance killer and anyways the interrupts are being
disabled by RX NAPI so no need to disable them again.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:16 +0000 (09:29 +0200)]
net: stmmac: Add the missing speeds that XGMAC supports
XGMAC supports following speeds:
- 10G XGMII
- 5G XGMII
- 2.5G XGMII
- 2.5G GMII
- 1G GMII
- 100M MII
- 10M MII
Add them to the stmmac driver.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:15 +0000 (09:29 +0200)]
net: stmmac: dwxgmac: Fix the undefined burst setting
Undefined burst shall only be set if pdata asks to.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:14 +0000 (09:29 +0200)]
net: stmmac: Decrease default RX Watchdog value
For performance reasons decrease the default RX Watchdog value for the
minimum allowed.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:13 +0000 (09:29 +0200)]
net: stmmac: Do not try to enable PHY EEE if MAC does not support it
Do not enable EEE feature in the PHY if MAC does not support it.
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:29:12 +0000 (09:29 +0200)]
net: stmmac: dwxgmac: Enable EDMA by default
Enable the EDMA feature by default which gives higher performance.
Changes from v1:
- Do not use magic values (David)
Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jose Abreu [Fri, 28 Jun 2019 07:25:07 +0000 (09:25 +0200)]
net: stmmac: Fix case when PHY handle is not present
Some DT bindings do not have the PHY handle. Let's fallback to manually
discovery in case phylink_of_phy_connect() fails.
Changes from v1:
- Fixup comment style (Sergei)
Fixes: 74371272f97f ("net: stmmac: Convert to phylink and remove phylib logic") Reported-by: Katsuhiro Suzuki <katsuhiro@katsuster.net> Tested-by: Katsuhiro Suzuki <katsuhiro@katsuster.net> Signed-off-by: Jose Abreu <joabreu@synopsys.com> Cc: Joao Pinto <jpinto@synopsys.com> Cc: David S. Miller <davem@davemloft.net> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com> Cc: Alexandre Torgue <alexandre.torgue@st.com> Cc: Sergei Shtylyov <sergei.shtylyov@cogentembedded.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sergej Benilov [Mon, 24 Jun 2019 21:21:02 +0000 (23:21 +0200)]
sis900: remove TxIDLE
Before "sis900: fix TX completion" patch, TX completion was done on TxIDLE interrupt.
TX completion also was the only thing done on TxIDLE interrupt.
Since "sis900: fix TX completion", TX completion is done on TxDESC interrupt.
So it is not necessary any more to set and to check for TxIDLE.
Eliminate TxIDLE from sis900.
Correct some typos, too.
Signed-off-by: Sergej Benilov <sergej.benilov@googlemail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Xin Long [Thu, 20 Jun 2019 11:03:41 +0000 (19:03 +0800)]
tipc: add dst_cache support for udp media
As other udp/ip tunnels do, tipc udp media should also have a
lockless dst_cache supported on its tx path.
Here we add dst_cache into udp_replicast to support dst cache
for both rmcast and rcast, and rmcast uses ub->rcast and each
rcast uses its own node in ub->rcast.list.
Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
nfp: extend flower capabilities for GRE tunnel offload
Pieter says:
This set extends the flower match and action components to offload
GRE decapsulation with classification and encapsulation actions. The
first 3 patches are refactor and cleanup patches for improving
readability and reusability. Patch 4 and 5 implement GRE decap and
encap functionality respectively.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Add new GRE encapsulation support, which allows offload of filters
using tunnel_key set action in combination with actions that egress
to GRE type ports.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Extend the existing tunnel matching support to include GRE decap
classification. Specifically matching existing tunnel fields for
NVGRE (GRE with protocol field set to TEB).
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: flower: rename tunnel related functions in action offload
Previously tunnel related functions in action offload only applied
to UDP tunnels. Rename these functions in preparation for new
tunnel types.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
nfp: flower: add helper functions for tunnel classification
Adds IPv4 address and TTL/TOS helper functions, which is done in
preparation for compiling new tunnel types.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Refactor the key layer calculation function, in particular the tunnel
key layer calculation by introducing helper functions. This is done
in preparation for supporting GRE tunnel offloads.
Signed-off-by: Pieter Jansen van Vuuren <pieter.jansenvanvuuren@netronome.com> Reviewed-by: Jakub Kicinski <jakub.kicinski@netronome.com> Reviewed-by: John Hurley <john.hurley@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
net: dsa: microchip: Further regmap cleanups
This patchset cleans up KSZ9477 switch driver by replacing various
ad-hoc polling implementations and register RMW with regmap functions.
Each polling function is replaced separately to make it easier to review
and possibly bisect, but maybe the patches can be squashed.
====================
Signed-off-by: Marek Vasut <marex@denx.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 27 Jun 2019 21:55:56 +0000 (23:55 +0200)]
net: dsa: microchip: Replace bit RMW with regmap
Regmap provides read-modify-write function to update bitfields in
registers. Replace ad-hoc read-modify-write with regmap_update_bits()
where applicable.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 27 Jun 2019 21:55:55 +0000 (23:55 +0200)]
net: dsa: microchip: Replace ksz9477_wait_alu_sta_ready polling with regmap
Regmap provides polling function to poll for bits in a register. This
function is another reimplementation of polling for bit being clear in
a register. Replace this with regmap polling function. Moreover, inline
the function parameters, as the function is never called with any other
parameter values than this one.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 27 Jun 2019 21:55:54 +0000 (23:55 +0200)]
net: dsa: microchip: Replace ksz9477_wait_alu_ready polling with regmap
Regmap provides polling function to poll for bits in a register. This
function is another reimplementation of polling for bit being clear in
a register. Replace this with regmap polling function. Moreover, inline
the function parameters, as the function is never called with any other
parameter values than this one.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 27 Jun 2019 21:55:53 +0000 (23:55 +0200)]
net: dsa: microchip: Replace ksz9477_wait_vlan_ctrl_ready polling with regmap
Regmap provides polling function to poll for bits in a register. This
function is another reimplementation of polling for bit being clear in
a register. Replace this with regmap polling function. Moreover, inline
the function parameters, as the function is never called with any other
parameter values than this one.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Thu, 27 Jun 2019 21:55:52 +0000 (23:55 +0200)]
net: dsa: microchip: Replace ad-hoc polling with regmap
Regmap provides polling function to poll for bits in a register,
use in instead of reimplementing it.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 28 Jun 2019 00:50:09 +0000 (08:50 +0800)]
Merge tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux
Pull clk fixes from Stephen Boyd:
"A handful of clk driver fixes and one core framework fix
- Do a DT/firmware lookup in clk_core_get() even when the DT index is
a nonsensical value
- Fix some clk data typos in the Amlogic DT headers/code
- Avoid returning junk in the TI clk driver when an invalid clk is
looked for
- Fix dividers for the emac clks on Stratix10 SoCs
- Fix default HDA rates on Tegra210 to correct distorted audio"
* tag 'clk-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux:
clk: socfpga: stratix10: fix divider entry for the emac clocks
clk: Do a DT parent lookup even when index < 0
clk: tegra210: Fix default rates for HDA clocks
clk: ti: clkctrl: Fix returning uninitialized data
clk: meson: meson8b: fix a typo in the VPU parent names array variable
clk: meson: fix MPLL 50M binding id typo
Linus Torvalds [Fri, 28 Jun 2019 00:48:21 +0000 (08:48 +0800)]
Merge tag 'for-5.2/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm
Pull device mapper fixes from Mike Snitzer:
- Fix incorrect uses of kstrndup and DM logging macros in DM's early
init code.
- Fix DM log-writes target's handling of super block sectors so updates
are made in order through use of completion.
- Fix DM core's argument splitting code to avoid undefined behaviour
reported as a side-effect of UBSAN analysis on ppc64le.
- Fix DM verity target to limit the amount of error messages that can
result from a corrupt block being found.
* tag 'for-5.2/dm-fixes-2' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
dm verity: use message limit for data block corruption message
dm table: don't copy from a NULL pointer in realloc_argv()
dm log writes: make sure super sector log updates are written in order
dm init: remove trailing newline from calls to DMERR() and DMINFO()
dm init: fix incorrect uses of kstrndup()
Linus Torvalds [Fri, 28 Jun 2019 00:41:18 +0000 (08:41 +0800)]
Merge tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux
Pull pidfd fixes from Christian Brauner:
"Userspace tools and libraries such as strace or glibc need a cheap and
reliable way to tell whether CLONE_PIDFD is supported. The easiest way
is to pass an invalid fd value in the return argument, perform the
syscall and verify the value in the return argument has been changed
to a valid fd.
However, if CLONE_PIDFD is specified we currently check if pidfd == 0
and return EINVAL if not.
The check for pidfd == 0 was originally added to enable us to abuse
the return argument for passing additional flags along with
CLONE_PIDFD in the future.
However, extending legacy clone this way would be a terrible idea and
with clone3 on the horizon and the ability to reuse CLONE_DETACHED
with CLONE_PIDFD there's no real need for this clutch. So remove the
pidfd == 0 check and help userspace out.
Also, accordig to Al, anon_inode_getfd() should only be used past the
point of no failure and ksys_close() should not be used at all since
it is far too easy to get wrong. Al's motto being "basically, once
it's in descriptor table, it's out of your control". So Al's patch
switches back to what we already had in v1 of the original patchset
and uses a anon_inode_getfile() + put_user() + fd_install() sequence
in the success path and a fput() + put_unused_fd() in the failure
path.
The other two changes should be trivial"
* tag 'for-linus-20190627' of gitolite.kernel.org:pub/scm/linux/kernel/git/brauner/linux:
proc: remove useless d_is_dir() check
copy_process(): don't use ksys_close() on cleanups
samples: make pidfd-metadata fail gracefully on older kernels
fork: don't check parent_tidptr with CLONE_PIDFD
Linus Torvalds [Fri, 28 Jun 2019 00:39:18 +0000 (08:39 +0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Jiri Kosina:
- fix for one corner case in HID++ protocol with respect to handling
very long reports, from Hans de Goede
- power management fix in Intel-ISH driver, from Hyungwoo Yang
- use-after-free fix in Intel-ISH driver, from Dan Carpenter
- a couple of new device IDs/quirks from Kai-Heng Feng, Kyle Godbey and
Oleksandr Natalenko
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: intel-ish-hid: fix wrong driver_data usage
HID: multitouch: Add pointstick support for ALPS Touchpad
HID: logitech-dj: Fix forwarding of very long HID++ reports
HID: uclogic: Add support for Huion HS64 tablet
HID: chicony: add another quirk for PixArt mouse
HID: intel-ish-hid: Fix a use after free in load_fw_from_host()
Linus Torvalds [Fri, 28 Jun 2019 00:37:04 +0000 (08:37 +0800)]
Merge tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull ARM SoC fixes from Olof Johansson:
"A smaller batch of fixes, nothing that stands out as risky or scary.
Mostly DTS tweaks for a few issues:
- GPU fixlets for Meson
- CPU idle fix for LS1028A
- PWM interrupt fixes for i.MX6UL
Also, enable a driver (FSL_EDMA) on arm64 defconfig, and a warning and
two MAINTAINER tweaks"
* tag 'armsoc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc:
ARM: dts: imx6ul: fix PWM[1-4] interrupts
ARM: omap2: remove incorrect __init annotation
ARM: dts: gemini Fix up DNS-313 compatible string
ARM: dts: Blank D-Link DIR-685 console
arm64: defconfig: Enable FSL_EDMA driver
arm64: dts: ls1028a: Fix CPU idle fail.
MAINTAINERS: BCM53573: Add internal Broadcom mailing list
MAINTAINERS: BCM2835: Add internal Broadcom mailing list
ARM: dts: meson8b: fix the operating voltage of the Mali GPU
ARM: dts: meson8b: drop undocumented property from the Mali GPU node
ARM: dts: meson8: fix GPU interrupts and drop an undocumented property
Linus Torvalds [Fri, 28 Jun 2019 00:34:12 +0000 (08:34 +0800)]
Merge tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs
Pull AFS fixes from David Howells:
"The in-kernel AFS client has been undergoing testing on opendev.org on
one of their mirror machines. They are using AFS to hold data that is
then served via apache, and Ian Wienand had reported seeing oopses,
spontaneous machine reboots and updates to volumes going missing. This
patch series appears to have fixed the problem, very probably due to
patch (2), but it's not 100% certain.
(1) Fix the printing of the "vnode modified" warning to exclude checks
on files for which we don't have a callback promise from the
server (and so don't expect the server to tell us when it
changes).
Without this, for every file or directory for which we still have
an in-core inode that gets changed on the server, we may get a
message logged when we next look at it. This can happen in bulk
if, for instance, someone does "vos release" to update a R/O
volume from a R/W volume and a whole set of files are all changed
together.
We only really want to log a message if the file changed and the
server didn't tell us about it or we failed to track the state
internally.
(2) Fix accidental corruption of either afs_vlserver struct objects or
the the following memory locations (which could hold anything).
The issue is caused by a union that points to two different
structs in struct afs_call (to save space in the struct). The call
cleanup code assumes that it can simply call the cleanup for one
of those structs if not NULL - when it might be actually pointing
to the other struct.
This means that every Volume Location RPC op is going to corrupt
something.
(3) Fix an uninitialised spinlock. This isn't too bad, it just causes
a one-off warning if lockdep is enabled when "vos release" is
called, but the spinlock still behaves correctly.
(4) Fix the setting of i_block in the inode. This causes du, for
example, to produce incorrect results, but otherwise should not be
dangerous to the kernel"
* tag 'afs-fixes-20190620' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs:
afs: Fix setting of i_blocks
afs: Fix uninitialised spinlock afs_volume::cb_break_lock
afs: Fix vlserver record corruption
afs: Fix over zealous "vnode modified" warnings
1) Fix ppp_mppe crypto soft dependencies, from Takashi Iawi.
2) Fix TX completion to be finite, from Sergej Benilov.
3) Use register_pernet_device to avoid a dst leak in tipc, from Xin
Long.
4) Double free of TX cleanup in Dirk van der Merwe.
5) Memory leak in packet_set_ring(), from Eric Dumazet.
6) Out of bounds read in qmi_wwan, from Bjørn Mork.
7) Fix iif used in mcast/bcast looped back packets, from Stephen
Suryaputra.
8) Fix neighbour resolution on raw ipv6 sockets, from Nicolas Dichtel.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (25 commits)
af_packet: Block execution of tasks waiting for transmit to complete in AF_PACKET
sctp: change to hold sk after auth shkey is created successfully
ipv6: fix neighbour resolution with raw socket
ipv6: constify rt6_nexthop()
net: dsa: microchip: Use gpiod_set_value_cansleep()
net: aquantia: fix vlans not working over bridged network
ipv4: reset rt_iif for recirculated mcast/bcast out pkts
team: Always enable vlan tx offload
net/smc: Fix error path in smc_init
net/smc: hold conns_lock before calling smc_lgr_register_conn()
bonding: Always enable vlan tx offload
net/ipv6: Fix misuse of proc_dointvec "skip_notify_on_dev_down"
ipv4: Use return value of inet_iif() for __raw_v4_lookup in the while loop
qmi_wwan: Fix out-of-bounds read
tipc: check msg->req data len in tipc_nl_compat_bearer_disable
net: macb: do not copy the mac address if NULL
net/packet: fix memory leak in packet_set_ring()
net/tls: fix page double free on TX cleanup
net/sched: cbs: Fix error path of cbs_module_init
tipc: change to use register_pernet_device
...
David S. Miller [Thu, 27 Jun 2019 19:42:51 +0000 (12:42 -0700)]
Merge tag 'blk-dim-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux
Saeed Mamameed says:
====================
Generic DIM
From: Tal Gilboa and Yamin Fridman
Implement net DIM over a generic DIM library, add RDMA DIM
dim.h lib exposes an implementation of the DIM algorithm for
dynamically-tuned interrupt moderation for networking interfaces.
We want a similar functionality for other protocols, which might need to
optimize interrupts differently. Main motivation here is DIM for NVMf
storage protocol.
Current DIM implementation prioritizes reducing interrupt overhead over
latency. Also, in order to reduce DIM's own overhead, the algorithm might
take some time to identify it needs to change profiles. While this is
acceptable for networking, it might not work well on other scenarios.
Here we propose a new structure to DIM. The idea is to allow a slightly
modified functionality without the risk of breaking Net DIM behavior for
netdev. We verified there are no degradations in current DIM behavior with
the modified solution.
Suggested solution:
- Common logic is implemented in lib/dim/dim.c
- Net DIM (existing) logic is implemented in lib/dim/net_dim.c, which uses
the common logic in dim.c
- Any new DIM logic will be implemented in "lib/dim/new_dim.c".
This new implementation will expose modified versions of profiles,
dim_step() and dim_decision().
- DIM API is declared in include/linux/dim.h for all implementations.
Pros for this solution are:
- Zero impact on existing net_dim implementation and usage
- Relatively more code reuse (compared to two separate solutions)
- Increased extensibility
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
The QCA8337(N) has a RESETn signal on Pin B42 that
triggers a chip reset if the line is pulled low.
The datasheet says that: "The active low duration
must be greater than 10 ms".
This can hopefully fix some of the issues related
to pin strapping in OpenWrt for the EA8500 which
suffers from detection issues after a SoC reset.
Please note that the qca8k_probe() function does
currently require to read the chip's revision
register for identification purposes.
Signed-off-by: Christian Lamparter <chunkeey@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David Ahern [Mon, 24 Jun 2019 20:44:51 +0000 (13:44 -0700)]
ipv6: Convert gateway validation to use fib6_info
Gateway validation does not need a dst_entry, it only needs the fib
entry to validate the gateway resolution and egress device. So,
convert ip6_nh_lookup_table from ip6_pol_route to fib6_table_lookup
and ip6_route_check_nh to use fib6_lookup over rt6_lookup.
ip6_pol_route is a call to fib6_table_lookup and if successful a call
to fib6_select_path. From there the exception cache is searched for an
entry or a dst_entry is created to return to the caller. The exception
entry is not relevant for gateway validation, so what matters are the
calls to fib6_table_lookup and then fib6_select_path.
Similarly, rt6_lookup can be replaced with a call to fib6_lookup with
RT6_LOOKUP_F_IFACE set in flags. Again, the exception cache search is
not relevant, only the lookup with path selection. The primary difference
in the lookup paths is the use of rt6_select with fib6_lookup versus
rt6_device_match with rt6_lookup. When you remove complexities in the
rt6_select path, e.g.,
1. saddr is not set for gateway validation, so RT6_LOOKUP_F_HAS_SADDR
is not relevant
2. rt6_check_neigh is not called so that removes the RT6_NUD_FAIL_DO_RR
return and round-robin logic.
the code paths are believed to be equivalent for the given use case -
validate the gateway and optionally given the device. Furthermore, it
aligns the validation with onlink code path and the lookup path actually
used for rx and tx.
Adjust the users, ip6_route_check_nh_onlink and ip6_route_check_nh to
handle a fib6_info vs a rt6_info when performing validation checks.
Existing selftests fib-onlink-tests.sh and fib_tests.sh are used to
verify the changes.
Signed-off-by: David Ahern <dsahern@gmail.com> Reviewed-by: Wei Wang <weiwan@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
====================
FDB, VLAN and PTP fixes for SJA1105 DSA
This patchset is an assortment of fixes for the net-next version of the
sja1105 DSA driver:
- Avoid a kernel panic when the driver fails to probe or unregisters
- Finish Arnd Bermann's idea of compiling PTP support as part of the
main DSA driver and not separately
- Better handling of initial port-based VLAN as well as VLANs for
dsa_8021q FDB entries
- Fix address learning for the SJA1105 P/Q/R/S family
- Make static FDB entries persistent across switch resets
- Fix reporting of statically-added FDB entries in 'bridge fdb show'
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:42 +0000 (02:39 +0300)]
net: dsa: sja1105: Implement is_static for FDB entries on E/T
The first generation switches don't tell us through the dynamic config
interface whether the dumped FDB entries are static or not (the LOCKEDS
bit from P/Q/R/S).
However, now that we're keeping a mirror of all 'bridge fdb' commands in
the static config, this is an opportunity to compare a dumped FDB entry
to the driver's private database. After all, what makes an entry static
is that *we* added it.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:41 +0000 (02:39 +0300)]
net: dsa: sja1105: Use correct dsa_8021q VIDs for FDB commands
A FDB entry means that "frames that match this VID and DMAC must be
forwarded to this port".
In the case of dsa_8021q however, the VID is not a single one (and
neither two, as my previous patch assumed). The VID can be set either by
the CPU port (1 tx_vid), or by any of the other front-panel port (n-1
rx_vid's).
Fixes: 93647594d8f5 ("net: dsa: sja1105: Hide the dsa_8021q VLANs from the bridge fdb command") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:40 +0000 (02:39 +0300)]
net: dsa: sja1105: Populate is_static for FDB entries on P/Q/R/S
The reason why this wasn't tackled earlier is that I had hoped I
understood the user manual wrong. But unfortunately hacks are required
in order to retrieve the static/dynamic nature of FDB entries on SJA1105
P/Q/R/S, since this info is stored in the writeback buffer of the
dynamic config command.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:39 +0000 (02:39 +0300)]
net: dsa: sja1105: Add a high-level overview of the dynamic config interface
When trying to add support for LOCKEDS (static FDB entries) on SJA1105
P/Q/R/S, at first I didn't remember how the abstraction I created
worked, and actually thought it works by mistake.
To avoid other people staring at the code and not making much sense out
of it, add some comments at the top of the file.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:38 +0000 (02:39 +0300)]
net: dsa: sja1105: Back up static FDB entries in kernel memory
After commit 8456721dd4ec ("net: dsa: sja1105: Add support for
configuring address ageing time"), we started to reset the switch rather
often (each time the bridge core changes the ageing time on a switch
port).
The unfortunate reality is that SJA1105 doesn't have any {cold, warm,
whatever} reset mode in which it accepts a new configuration stream
without flushing the FDB. Instead, in its world, the FDB *is* an
optional part of the static configuration.
So we play its game, and do what we also do for VLANs: for each 'bridge
fdb' command, we add the FDB entry through the dynamic interface, and we
append the in-kernel static config memory with info that we're going to
use later, when the next reset command is going to be issued.
The result is that 'bridge fdb' commands are now persistent (dynamically
learned entries are lost, but that's ok).
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:37 +0000 (02:39 +0300)]
net: dsa: sja1105: Make P/Q/R/S learn MAC addresses
At the end of the commit 1da73821343c ("net: dsa: sja1105: Add FDB
operations for P/Q/R/S series") message, I said that:
At the moment only FDB entries installed statically through 'bridge fdb'
are visible in the dump callback - the dynamically learned ones are
still under investigation.
It looks like the reason why they were not visible in 'bridge fdb' was
that they were never learned - always flooded.
SJA1105 P/Q/R/S manual says about the MAXADDRP[port] field:
Specify the maximum number of MAC address dynamically learned from
the respective port. It is used to limit the number of learned MAC
addresses per port.
It looks like not providing a value in the static config (aka providing
zeroes) is enough for it to not store the learned addresses in the FDB.
For now we divide the 1024 entry FDB "equally" amongst the 5 ports. This
may be revisited if the situation calls for that - for now I'm happy
that learning works.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:36 +0000 (02:39 +0300)]
net: dsa: sja1105: Actually implement the P/Q/R/S FDB bits
In commit 1da73821343c ("net: dsa: sja1105: Add FDB operations for
P/Q/R/S series"), these bits were set in the static config, but
apparently they did not do anything. The reason is that the packing
accessors for them were part of a patch I forgot to send.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:35 +0000 (02:39 +0300)]
net: dsa: sja1105: Make vid 1 the default pvid
In SJA1105 there is no concept of 'default values' per se, everything
needs to be driver-supplied through the static configuration tables.
The issue is that the hardware manual says that 'at least the default
untagging VLAN' is mandatory to be provided through the static config.
But VLAN 0 isn't a very good initial pvid - its use is reserved for
priority-tagged frames, and the layers of the stack that care about
those already make sure that this VLAN is installed, as can be seen in
the message below:
8021q: adding VLAN 0 to HW filter on device swp2
So change the pvid provided through the static configuration to 1, which
matches the bridge core's defaults.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:34 +0000 (02:39 +0300)]
net: dsa: sja1105: Cancel PTP delayed work on unregister
Currently when the driver unloads and PTP is enabled, the delayed work
that prevents the timecounter from expiring becomes a ticking time bomb.
The kernel will schedule the work thread within 60 seconds of driver
removal, but the work handler is no longer there, leading to this
strange and inconclusive stack trace:
In this case, what happened is that the DSA driver failed to probe at
boot time due to a PHY issue during phylink_connect_phy:
[ 2.245607] fsl-gianfar soc:ethernet@2d90000 eth2: error -19 setting up slave phy
[ 2.258051] sja1105 spi0.1: failed to create slave for port 0.0
Fixes: bb77f36ac21d ("net: dsa: sja1105: Add support for the PTP clock") Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Vladimir Oltean [Tue, 25 Jun 2019 23:39:33 +0000 (02:39 +0300)]
net: dsa: sja1105: Build PTP support in main DSA driver
As Arnd Bergmann pointed out in commit 78fe8a28fb96 ("net: dsa: sja1105:
fix ptp link error"), there is no point in having PTP support as a
separate loadable kernel module.
So remove the exported symbols and make sja1105.ko contain PTP support
or not based on CONFIG_NET_DSA_SJA1105_PTP.
Signed-off-by: Vladimir Oltean <olteanv@gmail.com> Acked-by: Willem de Bruijn <willemb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:48 +0000 (01:43 +0200)]
net: dsa: microchip: Replace ad-hoc bit manipulation with regmap
Regmap provides bit manipulation functions to set/clear bits, use those
insted of reimplementing them.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:47 +0000 (01:43 +0200)]
net: dsa: microchip: Factor out regmap config generation into common header
The regmap config tables are rather similar for various generations of
the KSZ8xxx/KSZ9xxx switches. Introduce a macro which allows generating
those tables without duplication. Note that $regalign parameter is not
used right now, but will be used in KSZ87xx series switches.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:46 +0000 (01:43 +0200)]
net: dsa: microchip: Dispose of ksz_io_ops
Since the driver now uses regmap , get rid of ad-hoc ksz_io_ops
abstraction, which no longer has any meaning. Moreover, since regmap
has it's own locking, get rid of the register access mutex.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:45 +0000 (01:43 +0200)]
net: dsa: microchip: Initial SPI regmap support
Add basic SPI regmap support into the driver.
Previous patches unconver that ksz_spi_write() is always ever called
with len = 1, 2 or 4. We can thus drop the if (len > SPI_TX_BUF_LEN)
check and we can also drop the allocation of the txbuf which is part
of the driver data and wastes 256 bytes for no reason. Regmap covers
the whole thing now.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:44 +0000 (01:43 +0200)]
net: dsa: microchip: Factor out register access opcode generation
Factor out the code which sends out the register read/write opcodes
to the switch, since the code differs in single bit between read and
write.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:43 +0000 (01:43 +0200)]
net: dsa: microchip: Use PORT_CTRL_ADDR() instead of indirect function call
The indirect function call to dev->dev_ops->get_port_addr() is expensive
especially if called for every single register access, and only returns
the value of PORT_CTRL_ADDR() macro. Use PORT_CTRL_ADDR() macro directly
instead.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:42 +0000 (01:43 +0200)]
net: dsa: microchip: Move ksz_cfg and ksz_port_cfg to ksz9477.c
These functions are only used by the KSZ9477 code, move them from
the header into that code. Note that these functions will be soon
replaced by regmap equivalents.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:41 +0000 (01:43 +0200)]
net: dsa: microchip: Inline ksz_spi.h
The functions in the header file are static, and the header file is
included from single C file, just inline the code into the C file.
The bonus is that it's easier to spot further content to clean up.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:40 +0000 (01:43 +0200)]
net: dsa: microchip: Remove ksz_{get,set}()
These functions and callbacks are never used, remove them.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Marek Vasut [Tue, 25 Jun 2019 23:43:39 +0000 (01:43 +0200)]
net: dsa: microchip: Remove ksz_{read,write}24()
These functions and callbacks are never used, remove them.
Signed-off-by: Marek Vasut <marex@denx.de> Cc: Andrew Lunn <andrew@lunn.ch> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Tristram Ha <Tristram.Ha@microchip.com> Cc: Woojung Huh <Woojung.Huh@microchip.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
This patchset introduces hardware VLAN offload support and also does some
maintenance: we replace driver version with uts version string, add
documentation file for atlantic driver, and update maintainers
adding Igor as a maintainer.
v3: shuffle doc sections, per Andrew's comments
v2: updates in doc, gpl spdx tag cleanup
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
set_features should update flags and reinit hardware if
vlan offload settings were changed.
Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Tested-by: Nikita Danilov <ndanilov@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Igor Russkikh [Wed, 26 Jun 2019 12:35:46 +0000 (12:35 +0000)]
net: aquantia: vlan offloads logic in datapath
Update datapath by adding logic related to hardware assisted
vlan strip/insert behaviour.
Tested-by: Nikita Danilov <ndanilov@aquantia.com> Signed-off-by: Igor Russkikh <igor.russkikh@aquantia.com> Signed-off-by: David S. Miller <davem@davemloft.net>