git.proxmox.com Git - mirror_ubuntu-eoan-kernel.git/log

geneve: remove white-space before '#if IS_ENABLED(CONFIG_IPV6)'

Signed-off-by: Alexey Kodanev <alexey.kodanev@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

team: account for oper state

Account for operational state when determining port linkup state,
as per Documentation/networking/operstates.txt.

Signed-off-by: George Wilkie <gwilkie@vyatta.att-mail.com>
Acked-by: Jiri Pirko <jiri@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'tipc-Confgiuration-of-MTU-for-media-UDP'

GhantaKrishnamurthy MohanKrishna says:

====================
tipc: Confgiuration of MTU for media UDP

Systematic measurements have shown that an emulated MTU of 14k for
UDP bearers is the optimal value for maximal throughput. Accordingly,
the default MTU of UDP bearers is changed to 14k.

We also provide users with a fallback option from this value,
by providing support to configure MTU for UDP bearers. The following
options are introduced which are symmetrical to the design of
confguring link tolerance.

- Configure media with new MTU value, which will take effect on
links going up after the moment it was configured. Alternatively,
the bearer has to be disabled and re-enabled, for existing links to
reflect the configured value.

- Configure bearer with new MTU value, which take effect on
running links dynamically.

Please note:
- User has to change MTU at both endpoints, otherwise the link
will fall back to smallest MTU after a reset.
- Failover from a link with higher MTU to a link with lower MTU
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: confgiure and apply UDP bearer MTU on running links

Currently, we have option to configure MTU of UDP media. The configured
MTU takes effect on the links going up after that moment. I.e, a user
has to reset bearer to have new value applied across its links. This is
confusing and disturbing on a running cluster.

We now introduce the functionality to change the default UDP bearer MTU
in struct tipc_bearer. Additionally, the links are updated dynamically,
without any need for a reset, when bearer value is changed. We leverage
the existing per-link functionality and the design being symetrical to
the confguration of link tolerance.

Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: implement configuration of UDP media MTU

In previous commit, we changed the default emulated MTU for UDP bearers
to 14k.

This commit adds the functionality to set/change the default value
by configuring new MTU for UDP media. UDP bearer(s) have to be disabled
and enabled back for the new MTU to take effect.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tipc: set default MTU for UDP media

Currently, all bearers are configured with MTU value same as the
underlying L2 device. However, in case of bearers with media type
UDP, higher throughput is possible with a fixed and higher emulated
MTU value than adapting to the underlying L2 MTU.

In this commit, we introduce a parameter mtu in struct tipc_media
and a default value is set for UDP. A default value of 14k
was determined by experimentation and found to have a higher throughput
than 16k. MTU for UDP bearers are assigned the above set value of
media MTU.

Acked-by: Ying Xue <ying.xue@windriver.com>
Acked-by: Jon Maloy <jon.maloy@ericsson.com>
Signed-off-by: GhantaKrishnamurthy MohanKrishna <mohan.krishna.ghanta.krishnamurthy@ericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

liquidio: Added ndo_get_vf_stats support

Added the ndo to gather VF statistics through the PF.

Collect VF statistics via mailbox from VF.

Signed-off-by: Intiyaz Basha <intiyaz.basha@cavium.com>
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ave-fix-the-activation-issues-for-some-UniPhier-SoCs'

Kunihiko Hayashi says:

====================
ave: fix the activation issues for some UniPhier SoCs

This add the following stuffs to fix the activation issues and satisfy
requirements for AVE ethernet driver implemented on some UniPhier SoCs.

- Add support for additional necessary clocks and resets, because the kernel
is stalled on Pro4 due to lack of them.

- Check whether the SoC supports the specified phy-mode

- Add DT property support indicating system controller that has the feature
for configurating phy-mode including built-in phy on LD11.

v1: https://www.spinics.net/lists/netdev/msg494904.html

Changes since v1:
- Add 'Reviewed-by' lines
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ave: add support for phy-mode setting of system controller

This patch adds support for specifying system controller that configures
phy-mode setting.

According to the DT property "phy-mode", it's necessary to configure the
controller, which is used to choose the settings of the MAC suitable,
for example, mdio pin connections, internal clocks, and so on.

Supported phy-modes are SoC-dependent. The driver allows phy-mode to set
"internal" if the SoC has a built-in PHY, and {"mii", "rmii", "rgmii"}
if the SoC supports each mode. So we have to check whether the phy-mode
is valid or not.

This adds the following features for each SoC:
- check whether the SoC supports the specified phy-mode
- configure the controller accroding to phy-mode

The DT property accepts one argument to distinguish them for multiple MAC
instances.

ethernet@65000000 {
...
socionext,syscon-phy-mode = <&soc_glue 0>;
};

ethernet@65200000 {
...
socionext,syscon-phy-mode = <&soc_glue 1>;
};

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dt-bindings: net: ave: add syscon-phy-mode property to configure phy-mode setting

Add "socionext,syscon-phy-mode" property to specify system controller that
configures the settings about phy-mode.

Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: ethernet: ave: add multiple clocks and resets support as required property

When the link is becoming up for Pro4 SoC, the kernel is stalled
due to some missing clocks and resets.

The AVE block for Pro4 is connected to the GIO bus in the SoC.
Without its clock/reset, the access to the AVE register makes the
system stall.

In the same way, another MAC clock for Giga-bit Connection and
the PHY clock are also required for Pro4 to activate the Giga-bit feature
and to recognize the PHY.

To satisfy these requirements, this patch adds support for multiple clocks
and resets, and adds the clock-names and reset-names to the binding because
we need to distinguish clock/reset for the AVE main block and the others.

Also, make the resets a required property. Currently, "reset is
optional" relies on that the bootloader or firmware has deasserted
the reset before booting the kernel. Drivers should work without
such expectation.

Fixes: 4c270b55a5af ("net: ethernet: socionext: add AVE ethernet driver")
Suggested-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Signed-off-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-boardinfo: Allow recursive mdiobus_register()

mdiobus_register will search for any mdiobus board info registered for
the bus being registered. If found, it will probe devices on the bus.
That device, if for example it is an ethernet switch, may then try to
register an mdio bus. Thus we need to allow recursive calls to
mdiobus_register.

Holding the mdio_board_lock will cause a deadlock during this
recursion. Release the lock and use list_for_each_entry_safe.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Amiga-xsurf100'

Michael Schmitz says:

====================
New network driver for Amiga X-Surf 100 (m68k)

[This is a resend of my v3 series which was based on the wrong version and
tree. Only substantial change is to Asix AX99796B PHY driver.]

This patch series adds support for the Individual Computers X-Surf 100
network card for m68k Amiga, a network adapter based on the AX88796 chip set.

The driver was originally written for kernel version 3.19 by Michael Karcher
(see CC:), and adapted to 4.16+ for submission to netdev by me. Questions
regarding motivation for some of the changes are probably best directed at
Michael Karcher.

The driver has been tested by Adrian <glaubitz@physik.fu-berlin.de> who will
send his Tested-by tag separately.

A few changes to the ax88796 driver were required:
- to read the MAC address, some setup of the ax99796 chip must be done,
- attach to the MII bus only on device open to allow module unloading,
- allow to supersede ax_block_input/ax_block_output by card-specific
  optimized code,
- use an optional interrupt status callback to allow easier sharing of the
  card interrupt,
- set IRQF_SHARED if platform IRQ resource is marked shareable

The Asix Electronix PHY used on the X-Surf 100 is buggy, and causes the
software reset to hang if the previous command sent to the PHY was also
a soft reset. This bug requires addition of a PHY driver for Asix PHYs
to provide a fixed .soft_reset function, included in this series.

Some additional cleanup:
- do not attempt to free IRQ in ax_remove (complements 82533ad9a1c),
- clear platform drvdata on probe fail and module remove.

Changes since v1:

Raised in review by Andrew Lunn:
- move MII code around to avoid need for forward declaration,
- combine patches 2 and 7 to add cleanup in error path

Changes since v2:

- corrected authorship attribution to Michael Karcher

Suggested by Geert Uytterhoeven:
- use ei_local->reset_8390() instead of duplicating ax_reset_8390(),
- use %pR to format struct resource pointers,
- assign pdev and xs100 pointers in declaration,
- don't split error messages,
- change Kconfig logic to only require XSURF100 set on Amiga

Suggested by Andrew Lunn:
- add COMPILE_TEST to ax88796 Kconfig options,
- use new Asix PHY driver for X-Surf 100

Suggested by Andrew Lunn/Finn Thain:
- declare struct sk_buff in ax88796.h,
- correct whitespace error in ax88796.h

Changes since v3:

- various checkpatch cleanup

Andrew Lunn:
- don't duplicate genphy_soft_reset in Asix PHY driver, just call
  genphy_soft_reset after writing zero to control register
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: New ax88796 platform driver for Amiga X-Surf 100 Zorro board (m68k)

Add platform device driver to populate the ax88796 platform data from
information provided by the XSurf100 zorro device driver. The ax88796
module will be loaded through this module's probe function.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: release platform device drvdata on probe error and module remove

The net device struct pointer is stored as platform device drvdata on
module probe - clear the drvdata entry on probe fail there, as well as
when unloading the module.

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: set IRQF_SHARED flag when IRQ resource is marked as shareable

On the Amiga X-Surf100, the network card interrupt is shared with many
other interrupt sources, so requires the IRQF_SHARED flag to register.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: add interrupt status callback to platform data

To be able to tell the ax88796 driver whether it is sensible to enter
the 8390 interrupt handler, an "is this interrupt caused by the 88796"
callback has been added to the ax_plat_data structure (with NULL being
compatible to the previous behaviour).

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: Add block_input/output hooks to ax_plat_data

Add platform specific hooks for block transfer reads/writes of packet
buffer data, superseding the default provided ax_block_input/output.
Currently used for m68k Amiga XSurf100.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: Do not free IRQ in ax_remove() (already freed in ax_close()).

This complements the fix in 82533ad9a1c ("net: ethernet: ax88796:
don't call free_irq without request_irq first") that removed the
free_irq call in the error path of probe, to also not call free_irq
when remove is called to revert the effects of probe.

Fixes: 82533ad9a1c (net: ethernet: ax88796: don't call free_irq without request_irq first)
Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: Attach MII bus only when open

Call ax_mii_init in ax_open(), and unregister/remove mdiobus resources
in ax_close().

This is needed to be able to unload the module, as the module is busy
while the MII bus is attached.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: ax88796: Fix MAC address reading

To read the MAC address from the (virtual) SAprom, the remote DMA
unit needs to be set up like for every other process access to card-local
memory.

Signed-off-by: Michael Karcher <kernel@mkarcher.dialup.fu-berlin.de>
Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next: phy: new Asix Electronics PHY driver

The Asix Electronics PHY found on the X-Surf 100 Amiga Zorro network
card by Individual Computers is buggy, and needs the reset bit toggled
as workaround to make a PHY soft reset succeed.

Add workaround driver just for this special case.

Suggested in xsurf100 patch series review by Andrew Lunn <andrew@lunn.ch>

Signed-off-by: Michael Schmitz <schmitzmic@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'Modernize-mdio-gpio'

Andrew Lunn says:

====================
Modernize mdio-gpio

This patchset is inspired by a previous version by Linus Walleij

It reworks the mdio-gpio code to make use of gpio descriptors instead
of gpio numbers. However compared to the previous version, it retains
support for platform devices. It does however remove the platform_data
header file. The needed GPIOs are now passed by making use of a gpiod
lookup table. e.g:

static struct gpiod_lookup_table zii_scu_mdio_gpiod_table = {
.dev_id = "mdio-gpio.0",
.table = {
GPIO_LOOKUP_IDX("gpio_ich", 17, NULL, MDIO_GPIO_MDC,
GPIO_ACTIVE_HIGH),
GPIO_LOOKUP_IDX("gpio_ich", 2, NULL, MDIO_GPIO_MDIO,
GPIO_ACTIVE_HIGH),
GPIO_LOOKUP_IDX("gpio_ich", 21, NULL, MDIO_GPIO_MDO,
GPIO_ACTIVE_LOW),
},
};
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Remove redundant platform data header

The platform data header file is now unused. Remove it, but add
an extra include which it brought in.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Add #defines for the GPIO index's

The GPIOs are described in device tree using a list, without names.
Add defines to indicate what each index in the list means. These
defines should also be used by platform devices passing GPIOs via a
GPIO lookup table.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Parse properties directly into bitbang structure

The same parsing code can be used for both OF and platform devices, if
the platform device uses a gpiod_lookup_table. Parse these properties
directly into the bitbang structure, rather than use an intermediate
platform data structure.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Move allocation for bitbanging data

Moving the allocation of this structure to the probe function is a
step towards making it the core data structure of the driver.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Swap to using gpio descriptors

This simplifies the code, removing the need to handle active low
flags, etc.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Remove support for IRQs in platform data

No current devices use IRQs in platform data, so remove support for
it. The MDIO core will also initialise the new bus such that all
addresses are polled, so remove the unneeded re-initialisation.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: remove support for phy mask

This is not needed any more by devices using platform data, so remove
it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: remove support for ignoring turn around

This is not needed any more by devices using platform data, so remove
it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-bitbang: Remove reset support

The mdio-gpio driver was the only user of the interface reset option.
Since it no longer uses it, remove it from the bit banging code.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy: mdio-gpio: Remove reset function

The platform data can contain a function to call to reset
the bit banging interface. It is not used, so remove it.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: phy_ mdio-gpio: Fixup , which should be ;

Seems like an old typ0.

Signed-off-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-followup-to-fib6_info-change'

David Ahern says:

====================
net/ipv6: followup to fib6_info change

Followup to fib change for IPv6.

First 2 patches rename fib6_info struct elements to match its name,
and rename addrconf_dst_alloc to match what it returns.

Patches 3-7 refactor the code to remove the need for fib6_idev reducing
fib6_info by another 8 bytes to 200 bytes.

Patch 8 fixes the gfp flags argument to addrconf_prefix_route in a
couple of places.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Fix gfp_flags arg to addrconf_prefix_route

Eric noticed that __ipv6_ifa_notify is called under rcu_read_lock, so
the gfp argument to addrconf_prefix_route can not be GFP_KERNEL.

While scrubbing other calls I noticed addrconf_addr_gen has one
place with GFP_ATOMIC that can be GFP_KERNEL.

Fixes: acb54e3cba404 ("net/ipv6: Add gfp_flags to route add functions")
Reported-by: syzbot+2add39b05179b31f912f@syzkaller.appspotmail.com
Reported-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Remove fib6_idev

fib6_idev can be obtained from __in6_dev_get on the nexthop device
rather than caching it in the fib6_info. Remove it.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Remove compare of fib6_idev from rt6_duplicate_nexthop

After 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") the comparison of idev does not add value since it
correlates to the nexthop device which is already compared. Remove
the idev comparison.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Change ip6_route_get_saddr to get dev from route

Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change ip6_route_get_saddr can just pass the nexthop
device to ipv6_dev_get_saddr.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Remove unnecessary checks on fib6_idev

Prior to 4832c30d5458 ("net: ipv6: put host and anycast routes on device
with address") host routes and anycast routes were installed with the
device set to loopback (or VRF device once that feature was added). In the
older code dst.dev was set to loopback (needed for packet tx) and rt6i_idev
was used to denote the actual interface.

Commit 4832c30d5458 changed the code to have dst.dev pointing to the real
device with the switch to lo or vrf device done on dst clones. As a
consequence of this change a couple of device checks during route lookups
are no longer needed. Remove them.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Remove aca_idev

aca_idev has only 1 user - inet6_fill_ifacaddr - and it only
wants the device index which can be extracted from the fib6_info
nexthop.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Rename addrconf_dst_alloc

addrconf_dst_alloc now returns a fib6_info. Update the name
and its users to reflect the change.

Rename only; no functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Rename fib6_info struct elements

Change the prefix for fib6_info struct elements from rt6i_ to fib6_.
rt6i_pcpu and rt6i_exception_bucket are left as is given that they
point to rt6_info entries.

Rename only; not functional change intended.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: pskb_trim_rcsum() and CHECKSUM_COMPLETE are friends

After working on IP defragmentation lately, I found that some large
packets defeat CHECKSUM_COMPLETE optimization because of NIC adding
zero paddings on the last (small) fragment.

While removing the padding with pskb_trim_rcsum(), we set skb->ip_summed
to CHECKSUM_NONE, forcing a full csum validation, even if all prior
fragments had CHECKSUM_COMPLETE set.

We can instead compute the checksum of the part we are trimming,
usually smaller than the part we keep.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net-next/hinic: add arm64 support

This patch enables arm64 platform support for the HINIC driver.

Signed-off-by: Zhao Chen <zhaochen6@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'TCP-data-delivery-and-ECN-stats-tracking'

Yuchung Cheng says:

====================
tracking TCP data delivery and ECN stats

This patch series improve tracking the data delivery status
  1. minor improvement on SYN data
  2. accounting bytes delivered with CE marks
  3. exporting the delivery stats to applications

s.t. users can get better sense of TCP performance at per host,
per connection, and even per application message level.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: export packets delivery info

Export data delivered and delivered with CE marks to
1) SNMP TCPDelivered and TCPDeliveredCE
2) getsockopt(TCP_INFO)
3) Timestamping API SOF_TIMESTAMPING_OPT_STATS

Note that for SCM_TSTAMP_ACK, the delivery info in
SOF_TIMESTAMPING_OPT_STATS is reported before the info
was fully updated on the ACK.

These stats help application monitor TCP delivery and ECN status
on per host, per connection, even per message level.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: track total bytes delivered with ECN CE marks

Introduce a new delivered_ce stat in tcp socket to estimate
number of packets being marked with CE bits. The estimation is
done via ACKs with ECE bit. Depending on the actual receiver
behavior, the estimation could have biases.

Since the TCP sender can't really see the CE bit in the data path,
so the sender is technically counting packets marked delivered with
the "ECE / ECN-Echo" flag set.

With RFC3168 ECN, because the ECE bit is sticky, this count can
drastically overestimate the nummber of CE-marked data packets

With DCTCP-style ECN this should be reasonably precise unless there
is loss in the ACK path, in which case it's not precise.

With AccECN proposal this can be made still more precise, even in
the case some degree of ACK loss.

However this is sender's best estimate of CE information.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: new helper to calculate newly delivered

Add new helper tcp_newly_delivered() to prepare the ECN accounting change.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: better delivery accounting for SYN-ACK and SYN-data

the tcp_sock:delivered has inconsistent accounting for SYN and FIN.
1. it counts pure FIN
2. it counts pure SYN
3. it counts SYN-data twice
4. it does not count SYN-ACK

For congestion control perspective it does not matter much as C.C. only
cares about the difference not the aboslute value. But the next patch
would export this field to user-space so it's better to report the absolute
value w/o these caveats.

This patch counts SYN, SYN-ACK, or SYN-data delivery once always in
the "delivered" field.

Signed-off-by: Yuchung Cheng <ycheng@google.com>
Reviewed-by: Neal Cardwell <ncardwell@google.com>
Reviewed-by: Soheil Hassas Yeganeh <soheil@google.com>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: frags: fix a lockdep false positive

lockdep does not know that the locks used by IPv4 defrag
and IPv6 reassembly units are of different classes.

It complains because of following chains :

1) sch_direct_xmit()        (lock txq->_xmit_lock)
    dev_hard_start_xmit()
     xmit_one()
      dev_queue_xmit_nit()
       packet_rcv_fanout()
        ip_check_defrag()
         ip_defrag()
          spin_lock()     (lock frag queue spinlock)

2) ip6_input_finish()
    ipv6_frag_rcv()       (lock frag queue spinlock)
     ip6_frag_queue()
      icmpv6_param_prob() (lock txq->_xmit_lock at some point)

We could add lockdep annotations, but we also can make sure IPv6
calls icmpv6_param_prob() only after the release of the frag queue spinlock,
since this naturally makes frag queue spinlock a leaf in lock hierarchy.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: Add NetVSP v6 and v6.1 into version negotiation

This patch adds the NetVSP v6 and 6.1 message structures, and includes
these versions into NetVSC/NetVSP version negotiation process.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

hv_netvsc: propogate Hyper-V friendly name into interface alias

This patch implement the 'Device Naming' feature of the Hyper-V
network device API. In Hyper-V on the host through the GUI or PowerShell
it is possible to enable the device naming feature which causes
the host to make available to the guest the name of the device.
This shows up in the RNDIS protocol as the friendly name.

The name has no particular meaning and is limited to 256 characters.
The value can only be set via PowerShell on the host, but could
be scripted for mass deployments. The default value is the
string 'Network Adapter' and since that is the same for all devices
and useless, the driver ignores it.

In Windows, the value goes into a registry key for use in SNMP
ifAlias. For Linux, this patch puts the value in the network
device alias property; where it is visible in ip tools and SNMP.

The host provided ifAlias is just a suggestion, and can be
overridden by later ip commands.

Also requires exporting dev_set_alias in netdev core.

Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'r8169-series-with-further-smaller-improvements'

Heiner Kallweit says:

====================
r8169: series with further smaller improvements

This series includes further smaller improvements.

Then I think the basic cleanup has been done and next step would be
preparing the switch to phylib.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove jumbo_tx_csum from chip config struct

According to the chip configuration entries only RTL8169 (ver <= 06)
supports tx checksumming for jumbo packets.
By the way: constant JUMBO_1K is a little misleading because it refers
to the standard packet size and not to a jumbo packet size.

By implementing this rule we can get rid of configuring tx checksumming
support per chip type.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve pci region handling

The region to be used is always the first of type IORESOURCE_MEM.
We can implement this rule directly w/o having to specify which
region is the first one per configuration entry.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: drop member txd_version from struct rtl8169_private

txd_version is used in rtl_init_one() only, so we can drop member
txd_version from struct rtl8169_private.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve rtl8169_get_mac_version

Certain entries in array mac_info[] are redundant, so remove them:
0x7cf, 0x2c200000 (VER 33): matched by entry 0x7c8, 0x2c000000
0x7cf, 0x28300000 (VER 26): matched by entry 0x7c8, 0x28000000
0x7cf, 0x3cb00000 (VER 24): matched by entry 0x7c8, 0x3c800000
0x7cf, 0x3c400000 (VER 22): matched by entry 0x7c8, 0x3c000000
0x7cf, 0x38500000 (VER 17): matched by entry 0x7c8, 0x38000000
0x7cf, 0x44900000 (VER 39): matched by entry 0x7c8, 0x44800000
0x7cf, 0x40b00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x40a00000 (VER 30): matched by entry 0x7c8, 0x40800000
0x7cf, 0x34a00000 (VER 09): matched by entry 0x7c8, 0x34800000
0x7cf, 0x24a00000 (VER 09): matched by entry 0x7c8, 0x24800000

In addition don't mask out bits 30 and 29 when printing the XID.
Most likely this is a relict from the times when the driver covered
RTL8169 chip version only.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: don't display tp->mmio_addr address

For security reasons since commit ad67b74d2469 "printk: hash addresses
printed with %p" %p doesn't display the full address any longer.
We could switch to %px, but I think the pointer address doesn't
provide a real benefit, so remove printing the hashed address.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: drop member opts1_mask from struct rtl8169_private

We can get rid of member opts1_mask and in addition save a few cpu
cycles in the hot path of rtl_rx().

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: change interrupt handler argument type

Code can be a little simplified by switching the interrupt handler
argument type to struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: change argument type of counters handling functions

The counter handling functions don't deal with the net_device, so code
can be simplified by changing the argument type to
struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: change hw_start argument type

Code can be simplified by changing the argument type of hw_start
callbacks from struct net_device * to struct rtl8169_private *.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove rtl8169_map_to_asic

This function is very simple and used only once, so we can inline
the two statements.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: replace rx_buf_sz with a constant

rx_buf_sz is constant, so we don't have to pass it as parameter and
in general can replace it with a constant.

When working on this I noticed that also before in
rtl_set_rx_max_size() a value of 0x4000 is set, what is not in line
with the chip spec. According to the spec only bits 0..13 are used
and we set an effective value of zero therefore.
However, the driver still seems to work and due to potential side
effects I'm reluctant to make a change.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove unneeded check in rtl8169_rx_fill

rtl8169_rx_fill() is called only once and directly before the call
array tp->Rx_databuff[] is filled with zero's. Therefore we don't
need this check.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: improve rtl8169_init_ring

This function doesn't use the net_device, therefore change the
parameter to type struct rtl8169_private * to simplify the code.
In addition we don't need the calculations in the memset
statements, we can use the size of the arrays directly.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: simplify rtl8169_alloc_rx_data

dev->dev.parent has the same value as tp_to_dev(tp)
(set by SET_NETDEV_DEV() in rtl_init_one()) and we know it can't be NULL.
This allows us to simplify the code.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: switch to napi_schedule_irqoff

napi_schedule() is called from hard irq context, so we can switch to
napi_schedule_irqoff() and avoid some overhead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use constant NAPI_POLL_WAIT

We can use generic constant NAPI_POLL_WAIT instead of defining an own
constant for the same value.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: use skb_copy_to_linear_data in rtl8169_try_rx_copy

Not a giant leap for mankind, but let's avoid the open-coded memcpy
and use standard helper skb_copy_to_linear_data instead.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove member align from struct rtl_cfg_info

Since commit 6f0333b8fde4 "r8169: use 50% less ram for RX ring" member
align isn't used any longer, so remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

r8169: remove unused member features from struct

Member features of struct rtl8169_private isn't used any longer since
commit 6c6aa15fdea5 "r8169: improve interrupt handling", so remove it.

Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'netcp-K2G-SoC-support'

Murali Karicheri says:

====================
Add support for netcp driver on K2G SoC

K2G SoC is another variant of Keystone family of SoCs. This patch
series add support for NetCP driver on this SoC. The QMSS found on
K2G SoC is a cut down version of the QMSS found on other keystone
devices with less number of queues, internal link ram etc. The patch
series has 2 patch sets that goes into the drivers/soc and the
rest has to be applied to net sub system. Please review and merge
if this looks good.

K2G TRM is located at http://www.ti.com/lit/ug/spruhy8g/spruhy8g.pdf
Thanks

The boot logs on K2G ICE board (tftp boot over Ethernet and from mmc)
https://pastebin.ubuntu.com/p/yvZ6drFhkW/

The boot logs on K2G GP board (tftp boot over Ethernet and from mmc)
https://pastebin.ubuntu.com/p/QTr6K7s4Zp/

Also regressed boot on K2HK and K2L EVMs as we have modified GBE
version detection logic (K2E uses same version of NetCP as in K2L.
So regression on one of them is needed).

Boot log on K2L and K2HK EVMs are at
https://pastebin.ubuntu.com/p/N9DBdPjbvR/

This series applies to net-next master branch.

Change history:
  v4 - ready for merge to net-next
       Folded the series "Add promiscous mode support in k2g network driver"
       into this.
       Fixed a typo in 5/11 (sgmii to rgmii) based on TI internal comment
       Reworked 4/11 and title changed to reflect additional changes to
       exclude sgmii configuration code for 2U cpsw. Use IS_SS_ID_2U()
       macro for customization.
       Added Reviewed-by from Rob Herring against 1/13
  v3 - Addressed comments from Andrew Lunn and Grygorii Strashko
       against v2.
  v2 - Addressed following comments on initial version
       - split patch 3/5 to multiple patches from Andrew Lunn
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: k2g: add promiscuous mode support

This patch adds support for promiscuous mode in k2g's network
driver. When upper layer instructs to transition from
non-promiscuous mode to promiscuous mode or vice versa
K2G network driver needs to configure ALE accordingly
so that in case of non-promiscuous mode, ALE will not flood
all unicast packets to host port, while in promiscuous
mode, it will pass all received unicast packets to
host port.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: add api to support set rx mode in netcp modules

This patch adds an API to support setting rx mode in
netcp modules. If a netcp module needs to be notified
when upper layer transitions from one rx mode to
another and react accordingly, such a module will implement
the new API set_rx_mode added in this patch. Currently
rx modes supported are PROMISCUOUS and NON_PROMISCUOUS
modes.

Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: support probe deferral

The netcp driver shouldn't proceed until the knav qmss and dma
devices are ready. So return -EPROBE_DEFER if these devices are not
ready.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "net: netcp: remove dead code from the driver"

As the probe sequence is not guaranteed contrary to the assumption
of the commit 2d8e276a9030, same has to be reverted.

commit 2d8e276a9030 ("net: netcp: remove dead code from the driver")

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: use of_get_phy_mode() to support different RGMII modes

The phy used for K2G allows for internal delays to be added optionally
to the clock circuitry based on board desing. To add this support,
enhance the driver to use of_get_phy_mode() to read the phy-mode from
the phy device and pass the same to phy through of_phy_connect().

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: re-use stats handling code for 2u hardware

The stats block in 2u cpsw hardware is similar to the one on nu
and hence handle it in a similar way by using a macro that includes
2u hardware as well.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: map vlan priorities to zero flow

The driver currently support only vlan priority zero. So map the
vlan priorities to zero flow in hardware.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: use rgmii link status for 2u cpsw hardware

Introduce rgmii link status to handle link state events for 2u
cpsw hardware on K2G.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: add support for handling rgmii link interface

2u cpsw hardware on K2G uses rgmii link to interface with Phy. So add
support for this interface in the code so that driver can be re-used
for 2u hardware.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: make sgmii configuration conditional

As a preparatory patch to add support for 2u cpsw hardware found on
K2G SoC, make sgmii configuration conditional. This is required
since 2u uses RGMII interface instead of SGMII and to allow for driver
re-use.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netcp: ethss: use macro for checking ss_version consistently

Driver currently uses macro for NU and XBE hardwrae, while other
places for older hardware such as that on K2H/K SoC (version 1.4
of the cpsw hardware, it explicitly check for the ss_version
inline. Add a new macro for version 1.4 and use it to customize
code in the driver. While at it also fix similar issue with
checking XBE version by re-using existing macro IS_SS_ID_XGBE().

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: ti: K2G: provide APIs to support driver probe deferral

This patch provide APIs to allow client drivers to support
probe deferral. On K2G SoC, devices can be probed only
after the ti_sci_pm_domains driver is probed and ready.
As drivers may get probed at different order, any driver
that depends on knav dma and qmss drivers, for example
netcp network driver, needs to defer probe until
knav devices are probed and ready to service. To do this,
add an API to query the device ready status from the knav
dma and qmss devices.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

soc: ti: K2G: enhancement to support QMSS in K2G NAVSS

Navigator Subsystem (NAVSS) available on K2G SoC has a cut down
version of QMSS with less number of queues, internal linking ram
with lesser number of buffers etc. It doesn't have status and
explicit push register space as in QMSS available on other K2 SoCs.
So define reg indices specific to QMSS on K2G. This patch introduces
"ti,66ak2g-navss-qm" compatibility to identify QMSS on K2G NAVSS
and to customize the dts handling code. Per Device manual,
descriptors with index less than or equal to regions0_size is in region 0
in the case of K2 QMSS where as for QMSS on K2G, descriptors with index
less than regions0_size is in region 0. So update the size accordingly in
the regions0_size bits of the linking ram size 0 register.

Signed-off-by: Murali Karicheri <m-karicheri2@ti.com>
Signed-off-by: WingMan Kwok <w-kwok2@ti.com>
Reviewed-by: Rob Herring <robh@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'ipv6-Separate-data-structures-for-FIB-and-data-path'

David Ahern says:

====================
net/ipv6: Separate data structures for FIB and data path

IPv6 uses the same data struct for both control plane (FIB entries) and
data path (dst entries). This struct has elements needed for both paths
adding memory overhead and complexity (taking a dst hold in most places
but an additional reference on rt6i_ref in a few). Furthermore, because
of the dst_alloc tie, all FIB entries are allocated with GFP_ATOMIC.

This patch set separates FIB entries from dst entries, better aligning
IPv6 code with IPv4, simplifying the reference counting and allowing
FIB entries added by userspace (not autoconf) to use GFP_KERNEL. It is
first step to a number of performance and scalability changes.

The end result of this patch set:
  - FIB entries (fib6_info):
        /* size: 208, cachelines: 4, members: 25 */
        /* sum members: 207, holes: 1, sum holes: 1 */

  - dst entries (rt6_info)
       /* size: 240, cachelines: 4, members: 11 */

Versus the the single rt6_info struct today for both paths:
      /* size: 320, cachelines: 5, members: 28 */

This amounts to a 35% reduction in memory use for FIB entries and a
25% reduction for dst entries.

With respect to locking FIB entries use RCU and a single atomic
counter with fib6_info_hold and fib6_info_release helpers to manage
the reference counting. dst entries use only the traditional dst
refcounts with dst_hold and dst_release.

FIB entries for host routes are referenced by inet6_ifaddr and
ifacaddr6. In both cases, additional holds are taken -- similar to
what is done for devices.

This set is the first of many changes to improve the scalability of the
IPv6 code. Follow on changes include:
- consolidating duplicate fib6_info references like IPv4 does with
  duplicate fib_info

- moving fib6_info into a slab cache to avoid allocation roundups to
  power of 2 (the 208 size becomes a 256 actual allocation)

- Allow FIB lookups without generating a dst (e.g., most rt6_lookup
  users just want to verify the egress device). Means moving dst
  allocation to the other side of fib6_rule_lookup which again aligns
  with IPv4 behavior

- using separate standalone nexthop objects which have performance
  benefits beyond fib_info consolidation

At this point I am not seeing any refcount leaks or underflows, no
oops or bug_ons, or warnings from kasan, so I think it is ready for
others to beat up on it finding errors in code paths I have missed.

v2 changes
- rebased to top of tree
- improved commit message on patch 7

v1 changes
- rebased to top of tree
- fix memory leak of metrics as noted by Ido
- MTU fixes based on pmtu tests (thanks Stefano Brivio for writing)

RFC v2 changes
- improved commit messages
- move common metrics code from dst.c to net/ipv4/metrics.c (comment
  from DaveM)
- address comments from Wei Wang and Martin KaFai Lau (let me know if
  I missed something)
- fixes detected by kernel test robots
  + added fib6_metric_set to change metric on a FIB entry which could
    be pointing to read-only dst_default_metrics
  + 0day testing found a problem with an intermediate patch; added
    dst_hold_safe on rt->from. Code is removed 3 patches later
- allow cacheinfo to handle NULL dst; means only expires is pushed to
  userspace
====================

Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Remove unused code and variables for rt6_info

Drop unneeded elements from rt6_info struct and rearrange layout to
something more relevant for the data path.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Flip FIB entries to fib6_info

Convert all code paths referencing a FIB entry from
rt6_info to fib6_info.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: separate handling of FIB entries from dst based routes

Last step before flipping the data type for FIB entries:
- use fib6_info_alloc to create FIB entries in ip6_route_info_create
  and addrconf_dst_alloc
- use fib6_info_release in place of dst_release, ip6_rt_put and
  rt6_release
- remove the dst_hold before calling __ip6_ins_rt or ip6_del_rt
- when purging routes, drop per-cpu routes
- replace inc and dec of rt6i_ref with fib6_info_hold and fib6_info_release
- use rt->from since it points to the FIB entry
- drop references to exception bucket, fib6_metrics and per-cpu from
  dst entries (those are relevant for fib entries only)

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: introduce fib6_info struct and helpers

Add fib6_info struct and alloc, destroy, hold and release helpers.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Cleanup exception and cache route handling

IPv6 FIB will only contain FIB entries with exception routes added to
the FIB entry. Once this transformation is complete, FIB lookups will
return a fib6_info with the lookup functions still returning a dst
based rt6_info. The current code uses rt6_info for both paths and
overloads the rt6_info variable usually called 'rt'.

This patch introduces a new 'f6i' variable name for the result of the FIB
lookup and keeps 'rt' as the dst based return variable. 'f6i' becomes a
fib6_info in a later patch which is why it is introduced as f6i now;
avoids the additional churn in the later patch.

In addition, remove RTF_CACHE and dst checks from fib6 add and delete
since they can not happen now and will never happen after the data
type flip.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Add gfp_flags to route add functions

Most FIB entries can be added using memory allocated with GFP_KERNEL.
Add gfp_flags to ip6_route_add and addrconf_dst_alloc. Code paths that
can be reached from the packet path (e.g., ndisc and autoconfig) or
atomic notifiers use GFP_ATOMIC; paths from user context (adding
addresses and routes) use GFP_KERNEL.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Create a neigh_lookup for FIB entries

The router discovery code has a FIB entry and wants to validate the
gateway has a neighbor entry. Refactor the existing dst_neigh_lookup
for IPv6 and create a new function that takes the gateway and device
and returns a neighbor entry. Use the new function in
ndisc_router_discovery to validate the gateway.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Move dst flags to booleans in fib entries

Continuing to wean FIB paths off of dst_entry, use a bool to hold
requests for certain dst settings. Add a helper to convert the
flags to DST flags when a FIB entry is converted to a dst_entry.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Add rt6_info create function for ip6_pol_route_lookup

ip6_pol_route_lookup is the lookup function for ip6_route_lookup and
rt6_lookup. At the moment it returns either a reference to a FIB entry
or a cached exception. To move FIB entries to a separate struct, this
lookup function needs to convert FIB entries to an rt6_info that is
returned to the caller.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: Add fib6_null_entry

ip6_null_entry will stay a dst based return for lookups that fail to
match an entry.

Add a new fib6_null_entry which constitutes the root node and leafs
for fibs. Replace existing references to ip6_null_entry with the
new fib6_null_entry when dealing with FIBs.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: move expires into rt6_info

Add expires to rt6_info for FIB entries, and add fib6 helpers to
manage it. Data path use of dst.expires remains.

The transition is fairly straightforward: when working with fib entries,
rt->dst.expires is just rt->expires, rt6_clean_expires is replaced with
fib6_clean_expires, rt6_set_expires becomes fib6_set_expires, and
rt6_check_expired becomes fib6_check_expired, where the fib6 versions
are added by this patch.

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net/ipv6: move metrics from dst to rt6_info

Similar to IPv4, add fib metrics to the fib struct, which at the moment
is rt6_info. Will be moved to fib6_info in a later patch. Copy metrics
into dst by reference using refcount.

To make the transition:
- add dst_metrics to rt6_info. Default to dst_default_metrics if no
  metrics are passed during route add. No need for a separate pmtu
  entry; it can reference the MTU slot in fib6_metrics

- ip6_convert_metrics allocates memory in the FIB entry and uses
  ip_metrics_convert to copy from netlink attribute to metrics entry

- the convert metrics call is done in ip6_route_info_create simplifying
  the route add path
  + fib6_commit_metrics and fib6_copy_metrics and the temporary
    mx6_config are no longer needed

- add fib6_metric_set helper to change the value of a metric in the
  fib entry since dst_metric_set can no longer be used

- cow_metrics for IPv6 can drop to dst_cow_metrics_generic

- rt6_dst_from_metrics_check is no longer needed

- rt6_fill_node needs the FIB entry and dst as separate arguments to
  keep compatibility with existing output. Current dst address is
  renamed to dest.
  (to be consistent with IPv4 rt6_fill_node really should be split
  into 2 functions similar to fib_dump_info and rt_fill_info)

- rt6_fill_node no longer needs the temporary metrics variable

Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>