Alexander Duyck [Wed, 14 Sep 2016 23:24:34 +0000 (16:24 -0700)]
i40e: Remove unused function i40e_vsi_lookup
The function is not used so there is no need to carry it forward. I have
plans to add a slightly different function that can be inlined to handle
the same kind of functionality.
Change-ID: Ie2dfcb189dc75e5fbc156bac23003e3b4210ae0f Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Filip Sadowski [Wed, 14 Sep 2016 23:24:33 +0000 (16:24 -0700)]
i40e: Bit test mask correction
An incorrect bit mask was used to test the "get link status" response.
Instead of I40E_AQ_LSE_ENABLE (which is actually 0x03), it should most
probably be I40E_AQ_LSE_IS_ENABLED (which is defined as 0x01).
Change-ID: Ia199142906720507f847de3a33a25c61a9781b2f Signed-off-by: Filip Sadowski <filip.sadowski@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
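The mask difference in the "Bit test mask correction" commit can be shown with a small user-space sketch. The two macro values are the ones quoted in the commit message. The function names and the `link_info` byte are hypothetical stand-ins for the admin-queue response field, not the driver's actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Values as quoted in the commit message. */
#define I40E_AQ_LSE_ENABLE      0x03
#define I40E_AQ_LSE_IS_ENABLED  0x01

/* Old test (hypothetical helper): true if EITHER of the two low bits is set. */
static bool lse_enabled_buggy(uint8_t link_info)
{
	return (link_info & I40E_AQ_LSE_ENABLE) != 0;
}

/* Corrected test (hypothetical helper): true only if the "is enabled" bit is set. */
static bool lse_enabled_fixed(uint8_t link_info)
{
	return (link_info & I40E_AQ_LSE_IS_ENABLED) != 0;
}
```

With a response byte of 0x02 (bit 1 set, bit 0 clear), the 0x03 mask reports the link status event as enabled, but the 0x01 mask does not.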
Alexander Duyck [Wed, 14 Sep 2016 23:24:32 +0000 (16:24 -0700)]
i40e: Rewrite Flow Director busy wait loop
We can reorder the busy wait loop at the start of the Flow Director
transmit function to reduce the overall code size while retaining the
same functionality, so I am taking the opportunity to do so.
Change-ID: I34c403ca001953c6ac9816e65d5305e73d869026 Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
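The Flow Director commit does not show the loop itself. The following user-space sketch illustrates one compact form such a wait loop can take. The readiness test serves as the loop condition, and the timeout check sits inside the body. The ring model, `wait_one_tick()` (which stands in for a short sleep) and the two-descriptor threshold are all assumptions made for this sketch.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a Tx ring: only the free-descriptor count. */
struct fake_ring {
	int unused;          /* descriptors currently free */
	int freed_per_tick;  /* how many the "hardware" frees per wait */
};

/* Hypothetical stand-in for a 1 ms sleep: lets the hardware make progress. */
static void wait_one_tick(struct fake_ring *r)
{
	r->unused += r->freed_per_tick;
}

/*
 * Wait until at least two descriptors are free, giving up after
 * max_delay waits. The readiness test appears only once, as the loop
 * condition, so no separate re-check is needed after the loop.
 */
static int wait_for_two_descs(struct fake_ring *r, int max_delay)
{
	int i;

	for (i = max_delay; r->unused < 2; i--) {
		if (!i)
			return -EAGAIN;
		wait_one_tick(r);
	}
	return 0;
}
```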
This patch fixes a problem in the client interface that
was causing random stack traces in RDMA driver load and
unload tests. It does so by checking for an existing
client before trying to open it. Without this patch,
there is a timing-related NULL pointer dereference.
Change-ID: Ib73d30671a27f6f9770dd53b3e5292b88d6b62da Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
Jiri Pirko [Thu, 27 Oct 2016 13:12:59 +0000 (15:12 +0200)]
mlxsw: Move PCI id table definitions into driver modules
So far, mlxsw_pci.ko is the module that registers the PCI table for all
drivers (spectrum and switchx2). That is problematic, for example, with
dracut. Since mlxsw_spectrum.ko and mlxsw_switchx2.ko are loaded
dynamically from within mlxsw_core.ko, dracut does not keep track of
them and therefore does not include them in the initramfs.
So do this the ordinary way and define the PCI tables in the individual
driver modules, so that they can be properly loaded and included in the
dracut initramfs image. As a side effect, this patch removes the no
longer necessary driver "kind" strings, which were used to link PCI ids
with individual mlxsw drivers.
Suggested-by: Ivan Vecera <ivecera@redhat.com> Tested-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Acked-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
It is critical to optimize the exit path for network namespaces,
because they are destroyed under net_mutex and many namespaces can be
destroyed in one iteration.
v2: use dev_set_uevent_suppress()
Cc: Cong Wang <xiyou.wangcong@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Stefan Richter [Mon, 24 Oct 2016 12:42:26 +0000 (14:42 +0200)]
ethernet: fix min/max MTU typos
Fixes: d894be57ca92('ethernet: use net core MTU range checking in more drivers') CC: Jarod Wilson <jarod@redhat.com> CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 27 Oct 2016 20:16:14 +0000 (16:16 -0400)]
Merge branch 'genetlink-improvements'
Johannes Berg says:
====================
genetlink improvements
This series contains some generic netlink improvements, making
the API safer to use, and making the function pointers in the
family struct safer by allowing it to be __ro_after_init.
The first patch, introducing genl_family_attrbuf(), just ensures
that the users of family->attrbuf aren't actually racy, by making
them use the indirection function for obtaining a reference and
checking that the context can actually do so.
The second patch removes the more or less broken ability to have
a static family ID, except for the three IDs that need to be static,
either because it's simply needed (genl controller) or because of old
API misuse. Everything else couldn't be static anyway, or could fail
when the family is registered, if somebody else had already taken
that static ID.
The third patch statically initializes the families, mostly to save
some code. I wrote this initially because I thought I could make
them all const, but that ends up being very inefficient (it would
require always doing some kind of family -> id lookup), so now it's
just here because I had it already and it reduces the code size.
The fourth patch then, finally, lays the groundwork for what I had
really wanted - now with __ro_after_init instead of const; I remove
code there to do the ID->family hash table mapping in genetlink and
use IDR instead to both allocate and map the IDs, which again ends
up saving some code size.
Finally, the fifth patch updates all families; as it turns out, no
families exist that really register/unregister dynamically. This
last patch should perhaps be split up. I could submit it for each
subsystem separately, but it'd depend on the second and third patches
going in first, so it would take a while. I can do that, though, if
that seems better to you.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 24 Oct 2016 12:40:05 +0000 (14:40 +0200)]
genetlink: mark families as __ro_after_init
Now genl_register_family() is the only thing (other than the
users themselves, perhaps, but I didn't find any doing that)
writing to the family struct.
In all families that I found, genl_register_family() is only
called from __init functions (some indirectly, in which case
I've added __init annotations to clarify things), so all can
actually be marked __ro_after_init.
This protects the data structure from accidental corruption.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 24 Oct 2016 12:40:04 +0000 (14:40 +0200)]
genetlink: use idr to track families
Since generic netlink family IDs are small integers, allocated
densely, IDR is an ideal match for lookups. Replace the existing
hand-written hash-table with IDR for allocation and lookup.
This lets the families only be written to once, during register,
since the list_head can be removed and removal of a family won't
cause any writes.
It also slightly reduces the code size (by about 1.3k on x86-64).
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
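A plain array indexed by ID shows why IDR suits small, densely allocated IDs. This is a simplified stand-in written for illustration, not the kernel IDR API. `MAX_IDS`, `struct family` and the `id_*` helpers are hypothetical. Removing an entry writes only the table and never the family structure. That property is what lets families stay read-only after registration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_IDS 64  /* hypothetical capacity for the sketch */

struct family {
	const char *name;
	int id;
};

/* Dense table: slot i holds the family registered with ID i, or NULL. */
static struct family *id_table[MAX_IDS];

/* Allocate the lowest free ID >= start and map it to f; -1 if full. */
static int id_alloc(struct family *f, int start)
{
	for (int id = start; id < MAX_IDS; id++) {
		if (!id_table[id]) {
			id_table[id] = f;
			f->id = id;   /* the only write to f, at register time */
			return id;
		}
	}
	return -1;
}

/* O(1) lookup: the ID is the array index. */
static struct family *id_find(int id)
{
	if (id < 0 || id >= MAX_IDS)
		return NULL;
	return id_table[id];
}

/* Unregister: clears the slot only, the family struct is not touched. */
static void id_remove(int id)
{
	if (id >= 0 && id < MAX_IDS)
		id_table[id] = NULL;
}
```

Starting allocation at 16 in the test is illustrative. A freed ID is reused by the next allocation.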
Johannes Berg [Mon, 24 Oct 2016 12:40:02 +0000 (14:40 +0200)]
genetlink: no longer support using static family IDs
Static family IDs have never really been used; the only
use case was the workaround I introduced for those users
that assumed their family ID was also their multicast
group ID.
Additionally, because static family IDs would never be
reserved by the generic netlink code, using a relatively
low ID would only work for built-in families that can be
registered immediately after generic netlink is started,
which is basically only the control family (apart from
the workaround code, for which I also had to add code so
it would reserve those IDs).
Thus, anything other than GENL_ID_GENERATE is flawed and
luckily not used except in the cases I mentioned. Move
those workarounds into a few lines of code, and then get
rid of GENL_ID_GENERATE entirely, making it more robust.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Johannes Berg [Mon, 24 Oct 2016 12:40:01 +0000 (14:40 +0200)]
genetlink: introduce and use genl_family_attrbuf()
This helper function allows family implementations to access
their family's attrbuf. This gets rid of the attrbuf usage
in families, and also adds locking validation, since it's not
valid to use the attrbuf with parallel_ops or outside of the
dumpit callback.
Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
skbedit: allow the user to specify bitmask for mark
The user may want to use only some bits of the skb mark in
their skbedit rules because the remaining part might be used by
something else.
Introduce the "mask" parameter to the skbedit action in order
to implement such functionality.
When the mask is specified, only the bits selected by it are
changed by the action, while the rest are left untouched.
Signed-off-by: Antonio Quartulli <antonio@open-mesh.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: David S. Miller <davem@davemloft.net>
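The masked update described for skbedit is a clear-then-set on the selected bits. The following is a minimal sketch that assumes a 32-bit mark. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Replace only the bits of mark selected by mask with the corresponding
 * bits of value; all bits outside the mask are preserved. */
static uint32_t apply_masked_mark(uint32_t mark, uint32_t value, uint32_t mask)
{
	mark &= ~mask;         /* clear the bits this rule owns */
	mark |= value & mask;  /* set them from the configured value */
	return mark;
}
```

With mask 0x000000ff, only the low byte of the mark is rewritten. The upper 24 bits pass through unchanged for whatever else uses them.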
Stefan Richter [Sun, 23 Oct 2016 14:30:56 +0000 (16:30 +0200)]
firewire: net: set initial MTU = 1500 unconditionally, fix IPv6 on some CardBus cards
firewire-net, like the older eth1394 driver, reduced the initial MTU to
less than 1500 octets if the local link layer controller's asynchronous
packet reception limit was lower.
This is bogus, since this reception limit does not have anything to do
with the transmission limit. Neither did this reduction affect the TX
path positively, nor could it prevent link fragmentation at the RX path.
Many FireWire CardBus cards have a max_rec of 9, causing an initial MTU
of 1024 - 16 = 1008. RFC 2734 and RFC 3146 allow a minimum max_rec = 8,
which would result in an initial MTU of 512 - 16 = 496. On such cards,
IPv6 could only be employed if the MTU was manually increased to 1280 or
more, i.e. IPv6 would not work without intervention from userland.
We now always initialize the MTU to 1500, which is the default according
to RFC 2734 and RFC 3146.
On a VIA VT6316 based CardBus card which was affected by this, changing
the MTU from 1008 to 1500 also increases TX bandwidth by 6 %.
RX remains unaffected.
CC: netdev@vger.kernel.org CC: linux1394-devel@lists.sourceforge.net CC: Jarod Wilson <jarod@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Signed-off-by: David S. Miller <davem@davemloft.net>
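The MTU figures in the firewire-net commit follow from simple arithmetic. This sketch assumes the usual IEEE 1394 relation, in which max_rec = n permits asynchronous payloads of 2^(n+1) bytes. It takes the 16 bytes of overhead from the commit text. Under those assumptions, the old initial-MTU rule looks like this (function and macro names are hypothetical):

```c
#include <assert.h>

#define OLD_HDR_OVERHEAD 16    /* overhead subtracted in the commit text */
#define DEFAULT_MTU      1500  /* RFC 2734 / RFC 3146 default */
#define IPV6_MIN_MTU     1280  /* minimum link MTU required by IPv6 */

/* Old rule: derive the initial MTU from the controller's reception
 * limit, 2^(max_rec + 1) bytes, capped at the 1500-octet default. */
static int old_initial_mtu(unsigned int max_rec)
{
	int payload = 1 << (max_rec + 1);
	int mtu = payload - OLD_HDR_OVERHEAD;

	return mtu < DEFAULT_MTU ? mtu : DEFAULT_MTU;
}
```

Both 1008 and 496 fall below the 1280-octet IPv6 minimum. That is why IPv6 did not work on such cards unless the MTU was raised by hand.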
Stefan Richter [Sun, 23 Oct 2016 14:29:03 +0000 (16:29 +0200)]
firewire: net: fix maximum possible MTU
Commit b3e3893e1253 ("net: use core MTU range checking in misc drivers")
mistakenly introduced an upper limit for firewire-net's MTU based on the
local link layer controller's reception capability. Revert this. Neither
RFC 2734 nor our implementation imposes any particular upper limit.
Actually, to be on the safe side and to make the code explicit, set
ETH_MAX_MTU = 65535 as the upper limit now.
(I replaced sizeof(struct rfc2734_header) by the equivalent
RFC2374_FRAG_HDR_SIZE in order to avoid distracting long/int conversions.)
Fixes: b3e3893e1253('net: use core MTU range checking in misc drivers') CC: netdev@vger.kernel.org CC: linux1394-devel@lists.sourceforge.net CC: Jarod Wilson <jarod@redhat.com> Signed-off-by: Stefan Richter <stefanr@s5r6.in-berlin.de> Acked-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 22 Oct 2016 14:36:36 +0000 (14:36 +0000)]
net: ena: use setup_timer() and mod_timer()
Use setup_timer() instead of init_timer(), as it is the preferred/standard
way to set a timer up.
Also, quoting the mod_timer() function comment:
-> mod_timer() is a more efficient way to update the expire field of an
active timer (if the timer is inactive it will be activated).
Use setup_timer and mod_timer to set up and arm a timer, to make the code
cleaner and easier to read.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Wei Yongjun [Sat, 22 Oct 2016 14:35:30 +0000 (14:35 +0000)]
amd-xgbe: Fix error return code in xgbe_probe()
Return error code -ENODEV instead of 0 from the error handling case
where DMA is not supported, as is done elsewhere in this function.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Acked-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: David S. Miller <davem@davemloft.net>
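The amd-xgbe fix follows a common probe pattern: every failure path must assign an error code before it jumps to the shared cleanup label. The user-space sketch below shows that pattern. `probe_sketch()` and its two flags are hypothetical and do not reflect the real xgbe_probe() structure.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Every failure path sets ret before jumping to the common cleanup
 * label, so no error path can fall through and return 0. */
static int probe_sketch(bool dma_supported, bool alloc_ok)
{
	int ret;

	if (!alloc_ok) {
		ret = -ENOMEM;
		goto err;
	}
	if (!dma_supported) {
		ret = -ENODEV;  /* the kind of assignment the fix adds */
		goto err;
	}
	return 0;

err:
	/* shared cleanup of partially initialized state would go here */
	return ret;
}
```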
Wei Yongjun [Sat, 22 Oct 2016 14:34:55 +0000 (14:34 +0000)]
net: ns83820: use dev_kfree_skb_irq instead of kfree_skb
It is not allowed to call kfree_skb() from hardware interrupt
context or with interrupts disabled, and this code runs under
spin_lock_irqsave(), which guarantees that interrupts are disabled.
So the kfree_skb() should be replaced with dev_kfree_skb_irq().
This was detected by a Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Sven Eckelmann [Sat, 22 Oct 2016 07:46:24 +0000 (09:46 +0200)]
batman-adv: Revert "use core MTU range checking in misc drivers"
The maximum MTU is defined via the slave devices of a batman-adv
interface. Thus it is not possible to calculate the max_mtu during the
creation of the batman-adv device when no slave devices are attached. Doing
so would, for example, break non-fragmentation setups, which would then
(incorrectly) allow an MTU of 1500 even when the underlying device cannot
transport 1500 bytes + batman-adv headers.
Checking against a max_mtu calculated dynamically (as the minimum of the
slave devices' MTUs) during .ndo_change_mtu is also the approach used by
the bridge interface.
Cc: Jarod Wilson <jarod@redhat.com> Fixes: b3e3893e1253 ("net: use core MTU range checking in misc drivers") Signed-off-by: Sven Eckelmann <sven@narfation.org> Signed-off-by: David S. Miller <davem@davemloft.net>
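This is a simplified sketch of the check that the batman-adv revert returns to. The upper bound is computed from the currently attached slave devices at the time the MTU change is requested, instead of once at creation. The header overhead constant, the 68-byte lower bound and the no-slave fallback are assumptions. Fragmentation is ignored.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define BATADV_HDR_OVERHEAD 32  /* hypothetical per-packet header cost */
#define SKETCH_MIN_MTU      68  /* hypothetical lower bound */

/* Largest MTU the soft interface can offer given the slaves attached
 * right now; with no slaves, fall back to the caller's default. */
static int soft_iface_max_mtu(const int *slave_mtus, size_t n, int fallback)
{
	int min_mtu;

	if (n == 0)
		return fallback;
	min_mtu = slave_mtus[0];
	for (size_t i = 1; i < n; i++)
		if (slave_mtus[i] < min_mtu)
			min_mtu = slave_mtus[i];
	return min_mtu - BATADV_HDR_OVERHEAD;
}

/* Validate a requested MTU against a bound computed at change time. */
static int check_new_mtu(int new_mtu, const int *slave_mtus, size_t n)
{
	if (new_mtu < SKETCH_MIN_MTU ||
	    new_mtu > soft_iface_max_mtu(slave_mtus, n, 1500))
		return -EINVAL;
	return 0;
}
```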
Xo Wang [Fri, 21 Oct 2016 17:20:13 +0000 (10:20 -0700)]
net: phy: broadcom: Add support for BCM54612E
This PHY has internal delays enabled after reset. This patch clears the
internal delay enables unless the interface specifically requests them.
Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Xo Wang [Fri, 21 Oct 2016 17:20:12 +0000 (10:20 -0700)]
net: phy: broadcom: Update Auxiliary Control Register macros
Add the RXD-to-RXC skew (delay) time bit in the Miscellaneous Control
shadow register and a mask for the shadow selector field.
Remove a re-definition of MII_BCM54XX_AUXCTL_SHDWSEL_AUXCTL.
Signed-off-by: Xo Wang <xow@google.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Joel Stanley <joel@jms.id.au> Signed-off-by: David S. Miller <davem@davemloft.net>
Eric Dumazet [Fri, 21 Oct 2016 16:02:43 +0000 (09:02 -0700)]
sch_htb: do not report fake rate estimators
When I prepared commit d250a5f90e53 ("pkt_sched: gen_estimator: Dont
report fake rate estimators"), htb still had an implicit rate estimator
for all its classes.
Then later, I made this rate estimator optional in commit 64153ce0a7b6
("net_sched: htb: do not setup default rate estimators"), but I forgot
to update htb's use of gnet_stats_copy_rate_est().
After this patch, "tc -s qdisc ..." no longer reports fake rate
estimators for HTB classes.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Cyrill Gorcunov [Fri, 21 Oct 2016 10:03:44 +0000 (13:03 +0300)]
net: ip, diag -- Add diag interface for raw sockets
In CRIU we actively use the diag interface to collect sockets
present in the system when dumping applications. While this works
as expected for unix, tcp, udp[lite], packet and netlink sockets,
raw sockets have no diag support. Thus add it.
v2:
- add missing sock_put calls in raw_diag_dump_one (by eric.dumazet@)
- implement @destroy for diag requests (by dsa@)
v3:
- add export of raw_abort for IPv6 (by dsa@)
- pass net-admin flag into inet_sk_diag_fill due to
changes in net-next branch (by dsa@)
v4:
- use @pad in struct inet_diag_req_v2 for raw socket
protocol specification: the raw module carries sockets
which may have a custom protocol passed from the socket()
syscall, and @sdiag_protocol alone is not enough to
match the underlying ones
- start reporting the protocol specified in the socket() call
when sockets are raw ones, for the same reason: user
space tools like ss may parse this attribute and use
it for socket matching
v5 (by eric.dumazet@):
- use sock_hold in raw_sock_get instead of atomic_inc;
we're holding (raw_v4_hashinfo|raw_v6_hashinfo)->lock
when looking up, so the counter won't be zero here.
v6:
- use sdiag_raw_protocol() helper which will access @pad
structure used for raw sockets protocol specification:
we can't simply rename this member without breaking uapi
v7:
- since the sdiag_raw_protocol() helper is not suitable for
uapi, let's rather make an alias structure with proper
names. The __check_inet_diag_req_raw helper will catch
any unintentional changes to the structures.
CC: David S. Miller <davem@davemloft.net> CC: Eric Dumazet <eric.dumazet@gmail.com> CC: David Ahern <dsa@cumulusnetworks.com> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> CC: Andrey Vagin <avagin@openvz.org> CC: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
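In version 7, the raw diag patch replaces the pad-access helper with an alias structure and a compile-time check. The sketch below shows that technique using simplified mirror structures. The trailing socket-id member is omitted, and the struct names are hypothetical. This is not the uapi definition.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified mirror of the generic request header. */
struct req_v2 {
	uint8_t  sdiag_family;
	uint8_t  sdiag_protocol;
	uint8_t  idiag_ext;
	uint8_t  pad;            /* historically unused byte */
	uint32_t idiag_states;
};

/* Alias layout for raw sockets: same bytes, but the former pad byte
 * is named for what it now carries, the socket() protocol number. */
struct req_raw {
	uint8_t  sdiag_family;
	uint8_t  sdiag_protocol;
	uint8_t  idiag_ext;
	uint8_t  sdiag_raw_protocol;
	uint32_t idiag_states;
};

/* Compile-time guard in the spirit of __check_inet_diag_req_raw:
 * the build fails if the two layouts ever drift apart. */
#define SAME_OFFSET(a, b) \
	(offsetof(struct req_v2, a) == offsetof(struct req_raw, b))
_Static_assert(sizeof(struct req_v2) == sizeof(struct req_raw), "size");
_Static_assert(SAME_OFFSET(sdiag_family, sdiag_family), "family");
_Static_assert(SAME_OFFSET(sdiag_protocol, sdiag_protocol), "protocol");
_Static_assert(SAME_OFFSET(idiag_ext, idiag_ext), "ext");
_Static_assert(SAME_OFFSET(pad, sdiag_raw_protocol), "raw protocol");
_Static_assert(SAME_OFFSET(idiag_states, idiag_states), "states");

/* Read the raw protocol through the alias view (memcpy avoids
 * type-punning through incompatible pointer types). */
static uint8_t raw_protocol(const struct req_v2 *req)
{
	struct req_raw raw;

	memcpy(&raw, req, sizeof(raw));
	return raw.sdiag_raw_protocol;
}
```

The test request assumes sdiag_protocol holds IPPROTO_RAW (255) and the alias field carries the protocol that was passed to socket(). Both values are illustrative.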
Andrey Vagin [Fri, 21 Oct 2016 02:45:43 +0000 (19:45 -0700)]
net: allow to kill a task which waits net_mutex in copy_new_ns
net_mutex can be locked for a long time. This may be because many
namespaces are being destroyed or because many processes decide to
create a network namespace.
Both these operations are heavy, so it is better to have the ability to
kill a process which is waiting for net_mutex.
Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: Andrei Vagin <avagin@openvz.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Shmulik Ladkani [Thu, 20 Oct 2016 21:18:08 +0000 (00:18 +0300)]
net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames
META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
is zero, then no vlan accel tag is present.
This is incorrect for zero VID vlan accel packets, making the following
match fail:
tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...
Apparently 'int_vlan_tag' was implemented prior to the introduction of
VLAN_TAG_PRESENT in 05423b2 "vlan: allow null VLAN ID to be used"
(and when VLAN_TAG_PRESENT was introduced, the 'vlan_tx_tag_get' call
in em_meta was not adapted).
Fix by testing skb_vlan_tag_present instead of testing skb_vlan_tag_get's
value.
Fixes: 05423b2413 ("vlan: allow null VLAN ID to be used") Fixes: 1a31f2042e ("netsched: Allow meta match on vlan tag on receive") Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com> Cc: Eric Dumazet <eric.dumazet@gmail.com> Cc: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
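The zero-VID bug arises from using the tag value as a presence flag. The sketch below uses a hypothetical packet model that stores presence and TCI in separate fields. The real skb encoding differs, so only the logic of the two checks carries over.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical packet model: explicit presence flag plus 16-bit TCI. */
struct pkt {
	bool     vlan_present;
	uint16_t vlan_tci;
};

/* Old logic: TCI == 0 is taken to mean "no tag", so VID 0 frames with an
 * accelerated tag are reported as untagged and the match fails. */
static bool old_get_vlan(const struct pkt *p, uint16_t *vid)
{
	if (p->vlan_tci == 0)
		return false;
	*vid = p->vlan_tci & 0x0fff;
	return true;
}

/* Fixed logic: consult the presence flag first, then read the value. */
static bool new_get_vlan(const struct pkt *p, uint16_t *vid)
{
	if (!p->vlan_present)
		return false;
	*vid = p->vlan_tci & 0x0fff;
	return true;
}
```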
Jiri Pirko [Fri, 21 Oct 2016 14:07:23 +0000 (16:07 +0200)]
mlxsw: Convert resources into array
Since the number of resources is going to get much bigger, ease their
addition by simply defining IDs. Convert the existing structure members
into a pair of arrays, one for validity and one for values. Introduce a
set of getters and setters for easy access.
Signed-off-by: Jiri Pirko <jiri@mellanox.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
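The mlxsw conversion pattern uses resource IDs to index a validity array and a value array, accessed through getters and setters. It can be sketched as follows. The enum members and function names are hypothetical and do not come from the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource IDs; adding a resource means adding one line. */
enum res_id {
	RES_ID_MAX_SPAN,
	RES_ID_MAX_LAG,
	RES_ID_KVD_SIZE,
	RES_ID_COUNT,  /* must stay last: sizes the arrays below */
};

struct resources {
	bool     valid[RES_ID_COUNT];
	uint64_t values[RES_ID_COUNT];
};

static void res_set(struct resources *r, enum res_id id, uint64_t value)
{
	r->values[id] = value;
	r->valid[id] = true;
}

static bool res_valid(const struct resources *r, enum res_id id)
{
	return r->valid[id];
}

static uint64_t res_get(const struct resources *r, enum res_id id)
{
	assert(res_valid(r, id));  /* reading an unqueried resource is a bug */
	return r->values[id];
}
```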
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Fri, 21 Oct 2016 10:46:33 +0000 (12:46 +0200)]
bpf: add helper for retrieving current numa node id
The use case is mainly for SO_REUSEPORT, to select sockets for the local
NUMA node, but since the helper is generic, let's also add it for other
networking and tracing program types.
Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
This patch series refactors the udp memory accounting, replacing the
generic implementation with a custom one, in order to remove the need for
locking the socket on the enqueue and dequeue operations. The socket backlog
usage is dropped, as well.
The first patch factors out pieces of some queue and memory management
socket helpers, so that they can later be used by the udp memory accounting
functions.
The second patch adds the memory accounting helpers, without using them.
The third patch replaces the old rx memory accounting path for udp over ipv4 and
udp over ipv6. In-kernel UDP users are updated, as well.
The memory accounting schema is described in detail in the individual patch
commit message.
The performance gain depends on the specific scenario; with few flows (and
little contention in the original code) the differences are in the noise range,
while with several flows contending for the same socket, the measured speed-up
is significant (e.g. over 100% in case of extreme contention).
Many thanks to Eric Dumazet for the reiterated reviews and suggestions.
v5 -> v6:
- do not orphan the skb on enqueue, skb_steal_sock() already did
the work for us
v4 -> v5:
- use the receive queue spin lock to protect the memory accounting
- several minor clean-ups
v3 -> v4:
- simplified the locking schema, always use a plain spinlock
v2 -> v3:
- do not set the now unused backlog_rcv callback
v1 -> v2:
- slightly changed the memory accounting schema; we now perform lazy reclaim
- fixed forward_alloc updating issue
- fixed memory counter integer overflows
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 21 Oct 2016 11:55:47 +0000 (13:55 +0200)]
udp: use it's own memory accounting schema
Completely avoid default sock memory accounting and replace it
with udp-specific accounting.
Since the new memory accounting model completely encapsulates
the required locking, remove the socket lock on both enqueue and
dequeue, and avoid using the backlog on enqueue.
Be sure to clean up rx queue memory on socket destruction, using
udp's own sk_destruct.
Tested using pktgen with random src port, 64-byte packets,
wire-speed on a 10G link as sender and udp_sink as the receiver,
using an l4 tuple rxhash to stress the contention, and one or more
udp_sink instances with reuseport.
v2 -> v3:
- do not set the now unused backlog_rcv callback
v1 -> v2:
- add memory pressure support
- fixed dropwatch accounting for ipv6
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Paolo Abeni [Fri, 21 Oct 2016 11:55:46 +0000 (13:55 +0200)]
udp: implement memory accounting helpers
Avoid using the generic helpers.
Use the receive queue spin lock to protect the memory
accounting operation, both on enqueue and on dequeue.
On dequeue perform partial memory reclaiming, trying to
leave a quantum of forward allocated memory.
On enqueue use a custom helper, to allow some optimizations:
- use a plain spin_lock() variant instead of the slightly
costly spin_lock_irqsave(),
- avoid dst_force check, since the calling code has already
dropped the skb dst
- avoid orphaning the skb, since skb_steal_sock() already did
the work for us
The above needs custom memory reclaiming on shutdown, provided
by udp_destruct_sock().
v5 -> v6:
- don't orphan the skb on enqueue
v4 -> v5:
- replace the mem_lock with the receive queue spin lock
- ensure that the bh is always allowed to enqueue at least
one skb, even if sk_rcvbuf is exceeded
v3 -> v4:
- reworked memory accounting, simplifying the schema
- provide a helper for both memory scheduling and enqueuing
v1 -> v2:
- use a udp specific destructor to perform memory reclaiming
- remove a couple of helpers, unneeded after the above cleanup
- do not reclaim memory on dequeue if not under memory
pressure
- reworked the fwd accounting schema to avoid potential
integer overflow
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
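Two details of the UDP accounting helpers suit a sketch. The first is an enqueue check consistent with "always allowed to enqueue at least one skb, even if sk_rcvbuf is exceeded". The second is a dequeue path that reclaims forward-allocated memory only above one quantum. The model is single-threaded and simplified: the real code holds the receive queue spin lock. The structure, the 4096-byte quantum and the round-up-to-quanta rule are assumptions, not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

#define MEM_QUANTUM 4096  /* illustrative reclaim/schedule granularity */

/* Hypothetical model of a socket's receive memory state. */
struct rxq_mem {
	int rmem_alloc;     /* bytes charged to queued skbs */
	int rcvbuf;         /* configured limit (sk_rcvbuf) */
	int forward_alloc;  /* pre-charged bytes not yet consumed */
};

/* Drop only if the queue was ALREADY over rcvbuf before this skb, so a
 * queue within its limit always admits one more skb, however large. */
static bool try_enqueue(struct rxq_mem *m, int size)
{
	int rmem = m->rmem_alloc + size;

	if (rmem > size + m->rcvbuf)
		return false;
	if (m->forward_alloc < size) {
		/* "schedule" enough whole quanta to cover this skb */
		int need = size - m->forward_alloc;

		m->forward_alloc +=
			(need + MEM_QUANTUM - 1) / MEM_QUANTUM * MEM_QUANTUM;
	}
	m->forward_alloc -= size;
	m->rmem_alloc = rmem;
	return true;
}

/* Uncharge a dequeued skb, then reclaim only the forward-allocated
 * excess above one quantum; returns the number of bytes reclaimed. */
static int dequeue_and_reclaim(struct rxq_mem *m, int size)
{
	int reclaimed = 0;

	m->rmem_alloc -= size;
	m->forward_alloc += size;
	if (m->forward_alloc > MEM_QUANTUM) {
		reclaimed = m->forward_alloc - MEM_QUANTUM;
		m->forward_alloc = MEM_QUANTUM;
	}
	return reclaimed;
}
```

Keeping one quantum cached lets the next enqueue usually proceed without touching the global memory counters.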
Paolo Abeni [Fri, 21 Oct 2016 11:55:45 +0000 (13:55 +0200)]
net/socket: factor out helpers for memory and queue manipulation
Basic sock operations that udp code can use with its own
memory accounting schema. No functional change is introduced
in the existing APIs.
v4 -> v5:
- avoid whitespace changes
v2 -> v4:
- avoid exporting __sock_enqueue_skb
v1 -> v2:
- avoid exporting sock_rmem_free
Acked-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Fri, 21 Oct 2016 03:25:27 +0000 (23:25 -0400)]
net: remove MTU limits on a few ether_setup callers
These few drivers call ether_setup(), but have no ndo_change_mtu, and thus
were overlooked for changes to MTU range checking behavior. They
previously had no range checks, so for feature-parity, set their min_mtu
to 0 and max_mtu to ETH_MAX_MTU (65535), instead of the 68 and 1500
inherited from the ether_setup() changes. Fine-tuning can come after we get
back to full feature-parity here.
CC: netdev@vger.kernel.org Reported-by: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> CC: Asbjoern Sloth Toennesen <asbjorn@asbjorn.st> CC: R Parameswaran <parameswaran.r7@gmail.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
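For feature parity, setting min_mtu = 0 and max_mtu = ETH_MAX_MTU turns the core check into a plain 0..65535 bound. The following is a simplified model of that core range check. The struct and function names are hypothetical. The in-kernel path also invokes the driver callback and sends notifications, and neither is modeled here.

```c
#include <assert.h>
#include <errno.h>

/* Minimal model of the two fields the core range check consults. */
struct netdev_mtu {
	int min_mtu;
	int max_mtu;  /* 0 here means "no upper bound configured" */
};

/* Reject out-of-range values before any driver callback would run. */
static int core_check_mtu(const struct netdev_mtu *dev, int new_mtu)
{
	if (new_mtu < 0 || new_mtu < dev->min_mtu)
		return -EINVAL;
	if (dev->max_mtu > 0 && new_mtu > dev->max_mtu)
		return -EINVAL;
	return 0;
}
```

With the ether_setup() defaults of 68 and 1500, a request for 9000 is rejected. With 0 and 65535, the same request passes, as it did before range checking was centralized.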
Vitaly Kuznetsov [Wed, 19 Oct 2016 13:53:01 +0000 (15:53 +0200)]
hv_netvsc: fix a race between netvsc_send() and netvsc_init_buf()
The fix in commit 880988348270 ("hv_netvsc: set nvdev link after populating
chn_table") turns out to be incomplete. A crash in
netvsc_get_next_send_section() is observed on MTU change when the device
is under load. The race I identified is: if we get to netvsc_send() after
we set the net_device_ctx->nvdev link in netvsc_device_add() but before we
finish netvsc_connect_vsp()->netvsc_init_buf(), send_section_map is not
allocated and we crash. Unfortunately we can't set the net_device_ctx->nvdev
link after the netvsc_init_buf() call, as during the negotiation we need
to receive packets and on the receive path we check for it. It would
probably be possible to split nvdev into a pair of nvdev_in and nvdev_out
links and check them accordingly in get_outbound_net_device()/
get_inbound_net_device(), but this looks like overkill.
Check that send_section_map is allocated in netvsc_send().
Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 20 Oct 2016 18:51:11 +0000 (14:51 -0400)]
Merge branch 'MTU-core-range-checking-more'
Jarod Wilson says:
====================
net: use core MTU range checking everywhere
This stack of patches should get absolutely everything in the kernel
converted from doing their own MTU range checking to the core MTU range
checking. This second spin includes alterations to hopefully fix all
concerns raised with the first, as well as including some additional
changes to drivers and infrastructure where I completely missed necessary
updates.
These have all been built through the 0-day build infrastructure via the
(rebasing) master branch at https://github.com/jarodwilson/linux-muck, which
at the time of the most recent compile across 147 configs, was based on
net-next at commit 7b1536ef0aa0.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
CC: netdev@vger.kernel.org CC: "David S. Miller" <davem@davemloft.net> CC: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> CC: James Morris <jmorris@namei.org> CC: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> CC: Patrick McHardy <kaber@trash.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:23 +0000 (13:55 -0400)]
s390/net: use net core MTU range checking
ctcm:
- min_mtu = 576, max_mtu = 65527
netiucv:
- min_mtu = 576, max_mtu = 65535
qeth:
- min_mtu = 64, max_mtu = 65535
CC: netdev@vger.kernel.org CC: linux-s390@vger.kernel.org CC: Ursula Braun <ubraun@linux.vnet.ibm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:22 +0000 (13:55 -0400)]
net: use core MTU range checking in misc drivers
firewire-net:
- set min/max_mtu
- remove fwnet_change_mtu
nes:
- set max_mtu
- clean up nes_netdev_change_mtu
xpnet:
- set min/max_mtu
- remove xpnet_dev_change_mtu
hippi:
- set min/max_mtu
- remove hippi_change_mtu
batman-adv:
- set max_mtu
- remove batadv_interface_change_mtu
- initialization is a little async, not 100% certain that max_mtu is set
in the optimal place, don't have hardware to test with
rionet:
- set min/max_mtu
- remove rionet_change_mtu
slip:
- set min/max_mtu
- streamline sl_change_mtu
um/net_kern:
- remove pointless ndo_change_mtu
hsi/clients/ssi_protocol:
- use core MTU range checking
- remove now redundant ssip_pn_set_mtu
ipoib:
- set a default max MTU value
- Note: ipoib's actual max MTU can vary, depending on whether the device is in
connected mode or not, so we'll just set the max_mtu value to the max
possible, and let the ndo_change_mtu function continue to validate any new
MTU change requests with checks for CM or not. Note that ipoib has no
min_mtu set, and thus, the network core's mtu > 0 check is the only lower
bound here.
mptlan:
- use net core MTU range checking
- remove now redundant mpt_lan_change_mtu
fjes:
- min_mtu = 8192, max_mtu = 65536
- The max_mtu value is actually one over IP_MAX_MTU here, but the idea is to
get past the core net MTU range checks so fjes_change_mtu can validate a
new MTU against what it supports (see fjes_support_mtu in fjes_hw.c)
hsr:
- min_mtu = 0 (calls ether_setup, max_mtu is 1500)
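Some entries above, such as ipoib and fjes, keep a driver-specific ndo_change_mtu behind a permissive core range. A request therefore has to pass both stages. The sketch below shows that two-stage validation. The core bounds follow the fjes entry. The supported-MTU list is hypothetical and is not taken from fjes_hw.c.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define CORE_MIN_MTU 8192   /* fjes min_mtu from the list above */
#define CORE_MAX_MTU 65536  /* fjes max_mtu from the list above */

/* Hypothetical zero-terminated list of MTUs the device accepts. */
static const int supported_mtu[] = { 8192, 16384, 32768, 0 };

/* Stage 1: the generic range check performed by the core. */
static bool core_in_range(int mtu)
{
	return mtu >= CORE_MIN_MTU && mtu <= CORE_MAX_MTU;
}

/* Stage 2: the driver's own callback accepts only listed values. */
static int driver_change_mtu(int mtu)
{
	for (int i = 0; supported_mtu[i]; i++)
		if (supported_mtu[i] == mtu)
			return 0;
	return -EINVAL;
}

static int set_mtu(int mtu)
{
	if (!core_in_range(mtu))
		return -EINVAL;
	return driver_change_mtu(mtu);
}
```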
Jarod Wilson [Thu, 20 Oct 2016 17:55:20 +0000 (13:55 -0400)]
net: use core MTU range checking in core net infra
geneve:
- Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
- This one isn't quite as straightforward as others, could use some
closer inspection and testing
macvlan:
- set min/max_mtu
tun:
- set min/max_mtu, remove tun_net_change_mtu
vxlan:
- Merge __vxlan_change_mtu back into vxlan_change_mtu
- Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function
- This one is also not as straightforward and could use closer inspection
and testing from vxlan folks
bridge:
- set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
change_mtu function
openvswitch:
- set min/max_mtu, remove internal_dev_change_mtu
- note: max_mtu wasn't checked previously, it's been set to 65535, which
is the largest possible size supported
sch_teql:
- set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
macsec:
- min_mtu = 0, max_mtu = 65535
macvlan:
- min_mtu = 0, max_mtu = 65535
ntb_netdev:
- min_mtu = 0, max_mtu = 65535
veth:
- min_mtu = 68, max_mtu = 65535
8021q:
- min_mtu = 0, max_mtu = 65535
CC: netdev@vger.kernel.org CC: Nicolas Dichtel <nicolas.dichtel@6wind.com> CC: Hannes Frederic Sowa <hannes@stressinduktion.org> CC: Tom Herbert <tom@herbertland.com> CC: Daniel Borkmann <daniel@iogearbox.net> CC: Alexander Duyck <alexander.h.duyck@intel.com> CC: Paolo Abeni <pabeni@redhat.com> CC: Jiri Benc <jbenc@redhat.com> CC: WANG Cong <xiyou.wangcong@gmail.com> CC: Roopa Prabhu <roopa@cumulusnetworks.com> CC: Pravin B Shelar <pshelar@ovn.org> CC: Sabrina Dubroca <sd@queasysnail.net> CC: Patrick McHardy <kaber@trash.net> CC: Stephen Hemminger <stephen@networkplumber.org> CC: Pravin Shelar <pshelar@nicira.com> CC: Maxim Krasnyansky <maxk@qti.qualcomm.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:19 +0000 (13:55 -0400)]
net: use core MTU range checking in WAN drivers
- set min/max_mtu in all hdlc drivers, remove hdlc_change_mtu
- set max_mtu in lec driver, remove lec_change_mtu
- set min/max_mtu in x25_asy driver
CC: netdev@vger.kernel.org CC: Krzysztof Halasa <khc@pm.waw.pl> CC: Krzysztof Halasa <khalasa@piap.pl> CC: Jan "Yenya" Kasprzak <kas@fi.muni.cz> CC: Francois Romieu <romieu@fr.zoreil.com> CC: Kevin Curtis <kevin.curtis@farsite.co.uk> CC: Zhao Qiang <qiang.zhao@nxp.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:18 +0000 (13:55 -0400)]
net: use core MTU range checking in wireless drivers
- set max_mtu in wil6210 driver
- set max_mtu in atmel driver
- set min/max_mtu in cisco airo driver, remove airo_change_mtu
- set min/max_mtu in ipw2100/ipw2200 drivers, remove libipw_change_mtu
- set min/max_mtu in p80211netdev, remove wlan_change_mtu
- set min/max_mtu in net/mac80211/iface.c and remove ieee80211_change_mtu
- set min/max_mtu in wimax/i2400m and remove i2400m_change_mtu
- set min/max_mtu in intersil/hostap and remove prism2_change_mtu
- set min/max_mtu in intersil/orinoco
- set min/max_mtu in tty/n_gsm and remove gsm_change_mtu
CC: netdev@vger.kernel.org CC: linux-wireless@vger.kernel.org CC: Maya Erez <qca_merez@qca.qualcomm.com> CC: Simon Kelley <simon@thekelleys.org.uk> CC: Stanislav Yakovlev <stas.yakovlev@gmail.com> CC: Johannes Berg <johannes@sipsolutions.net> CC: Inaky Perez-Gonzalez <inaky.perez-gonzalez@intel.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Acked-by: Johannes Berg <johannes@sipsolutions.net> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:17 +0000 (13:55 -0400)]
net: use core MTU range checking in USB NIC drivers
usbnet:
- Remove stale new_mtu <= 0 check in usbnet.c
- Set min_mtu = 0, max_mtu = 65535 (sub-drivers must set their own
max_mtu and/or min_mtu as needed)
r8152:
- Set appropriate max_mtu for different variants (1500 or 9194)
lan78xx:
- Set max_mtu = 9000
asix_driver:
- max_mtu = 16384 for ax88178 variant
ax88179:
- max_mtu = 4088
cdc_ncm:
- max_mtu from hardware
cdc-phonet:
- min_mtu = 6, max_mtu = 65541
sierra_net:
- max_mtu = 1500, call usbnet_change_mtu directly
- sierra_net_change_mtu checked for MTU > 1500, then called
usbnet_change_mtu, but if we set max_mtu to let the network core handle
the range check, then we can simply call usbnet_change_mtu directly
smsc75xx:
- max_mtu = 9000
CC: netdev@vger.kernel.org CC: Woojung Huh <woojung.huh@microchip.com> CC: Microchip Linux Driver Support <UNGLinuxDriver@microchip.com> CC: Hayes Wang <hayeswang@realtek.com> CC: Oliver Neukum <oneukum@suse.com> CC: Steve Glendinning <steve.glendinning@shawell.net> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jarod Wilson [Thu, 20 Oct 2016 17:55:16 +0000 (13:55 -0400)]
ethernet: use net core MTU range checking in more drivers
Somehow, I missed a healthy number of ethernet drivers in the last pass.
Most of these drivers were in need of an updated max_mtu to make it
possible to enable jumbo frames again. In a few cases, a different min_mtu
is also set to match the previous lower bounds. There are also a few
drivers that had no upper bounds checking, so they're getting a brand new
ETH_MAX_MTU that is identical to IP_MAX_MTU, but accessible through the
headers that all ethernet and ethernet-like drivers already include.
qlge:
- min_mtu = 1500, max_mtu = 9000
- driver only supports setting mtu to 1500 or 9000, so the core check only
rules out < 1500 and > 9000, qlge_change_mtu still needs to check that
the value is 1500 or 9000
qualcomm/emac:
- min_mtu = 46, max_mtu = 9194
xilinx_axienet:
- min_mtu = 64, max_mtu = 9000
Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") CC: netdev@vger.kernel.org CC: Jes Sorensen <jes@trained-monkey.org> CC: Netanel Belgazal <netanel@annapurnalabs.com> CC: Tom Lendacky <thomas.lendacky@amd.com> CC: Santosh Raspatur <santosh@chelsio.com> CC: Hariprasad S <hariprasad@chelsio.com> CC: Sathya Perla <sathya.perla@broadcom.com> CC: Ajit Khaparde <ajit.khaparde@broadcom.com> CC: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> CC: Somnath Kotur <somnath.kotur@broadcom.com> CC: Thomas Falcon <tlfalcon@linux.vnet.ibm.com> CC: John Allen <jallen@linux.vnet.ibm.com> CC: Guo-Fu Tseng <cooldavid@cooldavid.org> CC: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com> CC: Jiri Pirko <jiri@mellanox.com> CC: Ido Schimmel <idosch@mellanox.com> CC: Manish Chopra <manish.chopra@qlogic.com> CC: Sony Chacko <sony.chacko@qlogic.com> CC: Rajesh Borundia <rajesh.borundia@qlogic.com> CC: Timur Tabi <timur@codeaurora.org> CC: Anirudha Sarangi <anirudh@xilinx.com> CC: John Linn <John.Linn@xilinx.com> Signed-off-by: Jarod Wilson <jarod@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
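For qlge, the core range check and the driver hook now split the work. A minimal sketch, assuming the 1500/9000 values quoted above; the function names are hypothetical and the real qlge_change_mtu may differ in detail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bounds mirroring the qlge values quoted above. */
#define SKETCH_QLGE_MIN_MTU 1500
#define SKETCH_QLGE_MAX_MTU 9000

/* Hypothetical driver hook: the core already bounded the value, but the
 * hardware only supports two sizes, so anything in between is rejected. */
static int sketch_qlge_change_mtu(int *cur_mtu, int new_mtu)
{
	if (new_mtu != 1500 && new_mtu != 9000)
		return -EINVAL;
	*cur_mtu = new_mtu;
	return 0;
}

/* Hypothetical core path: range check first, then the driver hook. */
static int sketch_set_mtu(int *cur_mtu, int new_mtu)
{
	if (new_mtu < SKETCH_QLGE_MIN_MTU || new_mtu > SKETCH_QLGE_MAX_MTU)
		return -EINVAL;
	return sketch_qlge_change_mtu(cur_mtu, new_mtu);
}
```

Here 1499 and 9001 are stopped by the core, while 4000 passes the core range check and is rejected by the driver hook.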
Wei Yongjun [Thu, 20 Oct 2016 17:00:32 +0000 (17:00 +0000)]
net: ethernet: mediatek: use dev_kfree_skb_any instead of dev_kfree_skb
Replace dev_kfree_skb with dev_kfree_skb_any in mtk_start_xmit()
which can be called from hard irq context (netpoll) and from
other contexts. mtk_start_xmit() only frees skbs that it has
dropped.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
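dev_kfree_skb_any() is safe here because it chooses a freeing path based on the calling context. The sketch below models that decision in user space. The `sketch_*` names, the context flags and the single deferred list are hypothetical stand-ins for the kernel's hard-IRQ/IRQs-disabled checks and its per-CPU completion queue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical buffer and simulated execution context. */
struct sketch_skb {
	struct sketch_skb *next;
};

struct sketch_ctx {
	bool in_hardirq;             /* e.g. netpoll calling the xmit routine */
	bool irqs_disabled;
	struct sketch_skb *deferred; /* stands in for the completion queue */
	int freed_now;
};

/* Immediate free: only appropriate outside hard IRQ context. */
static void sketch_kfree_skb(struct sketch_ctx *ctx, struct sketch_skb *skb)
{
	free(skb);
	ctx->freed_now++;
}

/* IRQ-safe variant: only queue the buffer; it is freed later. */
static void sketch_kfree_skb_irq(struct sketch_ctx *ctx, struct sketch_skb *skb)
{
	skb->next = ctx->deferred;
	ctx->deferred = skb;
}

/* The "_any" variant picks the right path for the current context. */
static void sketch_kfree_skb_any(struct sketch_ctx *ctx, struct sketch_skb *skb)
{
	if (ctx->in_hardirq || ctx->irqs_disabled)
		sketch_kfree_skb_irq(ctx, skb);
	else
		sketch_kfree_skb(ctx, skb);
}

/* Later, outside IRQ context, drain the deferred list; returns the count. */
static int sketch_drain_deferred(struct sketch_ctx *ctx)
{
	int n = 0;

	while (ctx->deferred) {
		struct sketch_skb *skb = ctx->deferred;

		ctx->deferred = skb->next;
		free(skb);
		n++;
	}
	return n;
}
```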
Wei Yongjun [Thu, 20 Oct 2016 16:59:49 +0000 (16:59 +0000)]
dwc_eth_qos: use dev_kfree_skb_any instead of dev_kfree_skb
Replace dev_kfree_skb with dev_kfree_skb_any in dwceqos_start_xmit()
which can be called from hard irq context (netpoll) and from
other contexts. dwceqos_start_xmit() only frees skbs that it has
dropped.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Uwe Kleine-König [Thu, 20 Oct 2016 08:28:27 +0000 (10:28 +0200)]
net: fec: drop check for clk==NULL before calling clk_*
clk_prepare, clk_enable and their counterparts (at least the common clk
ones, but also most others) do check for the clk being NULL anyhow (and
return 0 then), so there is no gain when the caller checks, too.
Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Acked-by: Fugang Duan <fugang.duan@nxp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
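The pattern relied on here is a NULL-tolerant API: the callee treats NULL as "no clock" and returns success, so callers need no guard. A hypothetical sketch of that contract (not the clk framework's actual code):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical clock object; not the kernel's struct clk. */
struct sketch_clk {
	int enable_count;
};

/* A NULL clock is a valid "no clock": report success and do nothing. */
static int sketch_clk_enable(struct sketch_clk *clk)
{
	if (!clk)
		return 0;
	clk->enable_count++;
	return 0;
}

static void sketch_clk_disable(struct sketch_clk *clk)
{
	if (!clk)
		return;
	if (clk->enable_count > 0)
		clk->enable_count--;
}

/* Hypothetical caller: no "if (clk)" needed around each call. */
static int sketch_enable_clks(struct sketch_clk *ptp, struct sketch_clk *ref)
{
	int ret = sketch_clk_enable(ptp);

	if (ret)
		return ret;
	ret = sketch_clk_enable(ref);
	if (ret)
		sketch_clk_disable(ptp); /* undo on failure */
	return ret;
}
```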
Eric Dumazet [Thu, 20 Oct 2016 04:24:58 +0000 (21:24 -0700)]
tcp: relax listening_hash operations
softirq handlers use RCU protection to lookup listeners,
and write operations all happen from process context.
We do not need to block BH for dump operations.
Also, since SYN_RECV request sockets are stored in the ehash table:
1) inet_diag_dump_icsk() no longer needs to clear
cb->args[3] and cb->args[4], which were used as cursors while
iterating the old per-listener hash table.
2) Factorize a test: there is no need to scan listening_hash[]
if r->id.idiag_dport is not zero.
Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Wed, 19 Oct 2016 21:23:50 +0000 (23:23 +0200)]
net: smc91x: fix neponset breakage by pxa u16 writes
The patch isolating the u16 writes for pxa assumed all machine_is_*()
calls were removed, and therefore removed the mach-types.h include which
provided them.
Unfortunately, two machine_is_*() calls remained in smc91x.c, which
includes smc91x.h (from which the include was removed), triggering the error:
drivers/net/ethernet/smsc/smc91x.c: In function ‘smc_drv_probe’:
drivers/net/ethernet/smsc/smc91x.c:2380:2: error: implicit declaration
of function ‘machine_is_assabet’
[-Werror=implicit-function-declaration]
if (machine_is_assabet() && machine_has_neponset())
This adds back the wrongly removed include.
Fixes: d09d747ae4c2 ("net: smc91x: isolate u16 writes alignment workaround") Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Fixes: 65d7ab8de582 ("net: Identifier Locator Addressing module") Signed-off-by: Thomas Graf <tgraf@suug.ch> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Wed, 19 Oct 2016 19:18:21 +0000 (15:18 -0400)]
Merge branch 'macb-ethtool-ringparam'
Zach Brown says:
====================
macb: Add ethtool get_ringparam and set_ringparam to cadence
There are use cases like RT that would benefit from being able to tune the
macb rx/tx ring sizes. The ethtool set_ringparam function is the standard way
of doing so.
The first patch changes the hardcoded tx/rx ring sizes to variables that are
set to a hardcoded default.
The second patch implements the get_ringparam and set_ringparam functions.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Wed, 19 Oct 2016 14:56:57 +0000 (09:56 -0500)]
net: macb: Use variables with defaults for tx/rx ring sizes instead of hardcoded values
The macb driver hardcoded the tx/rx ring sizes. This made it
impossible to change the sizes at run time.
Add tx_ring_size and rx_ring_size variables to the macb object, which
are initialized with default values during macb_init. Change all
references to RX_RING_SIZE and TX_RING_SIZE to their respective
replacements.
Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
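One way to replace compile-time ring constants with per-device sizes is sketched below. All names and limits are hypothetical. In particular, the clamp-then-round-to-power-of-two policy is an assumption, chosen so that ring index wrapping can remain a bit mask; it is not taken from this log.

```c
#include <assert.h>

/* Hypothetical defaults and limits; the real driver's values may differ. */
#define SKETCH_DEFAULT_RX_RING_SIZE 512u
#define SKETCH_DEFAULT_TX_RING_SIZE 128u
#define SKETCH_MIN_RING_SIZE 64u
#define SKETCH_MAX_RING_SIZE 8192u /* itself a power of two */

/* Hypothetical per-device state replacing fixed RX/TX_RING_SIZE macros. */
struct sketch_macb {
	unsigned int rx_ring_size;
	unsigned int tx_ring_size;
};

static void sketch_macb_init(struct sketch_macb *bp)
{
	bp->rx_ring_size = SKETCH_DEFAULT_RX_RING_SIZE;
	bp->tx_ring_size = SKETCH_DEFAULT_TX_RING_SIZE;
}

/* Clamp to [MIN, MAX], then round up to the next power of two. */
static unsigned int sketch_ring_size(unsigned int requested)
{
	unsigned int size = 1;

	if (requested < SKETCH_MIN_RING_SIZE)
		requested = SKETCH_MIN_RING_SIZE;
	if (requested > SKETCH_MAX_RING_SIZE)
		requested = SKETCH_MAX_RING_SIZE;
	while (size < requested)
		size <<= 1;
	return size;
}

/* set_ringparam-style update; reallocating the rings is not modeled. */
static void sketch_set_ringparam(struct sketch_macb *bp, unsigned int rx,
				 unsigned int tx)
{
	bp->rx_ring_size = sketch_ring_size(rx);
	bp->tx_ring_size = sketch_ring_size(tx);
}

/* Wrap an index into the TX ring; a mask is valid only for power-of-two sizes. */
static unsigned int sketch_tx_ring_wrap(const struct sketch_macb *bp,
					unsigned int index)
{
	assert((bp->tx_ring_size & (bp->tx_ring_size - 1)) == 0);
	return index & (bp->tx_ring_size - 1);
}
```

Under these assumptions, a request for 1000 RX and 10 TX descriptors becomes 1024 and 64.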
Wei Yongjun [Wed, 19 Oct 2016 13:47:52 +0000 (13:47 +0000)]
net: arc_emac: use dev_kfree_skb_any instead of dev_kfree_skb
Replace dev_kfree_skb with dev_kfree_skb_any in arc_emac_tx()
which can be called from hard irq context (netpoll) and from
other contexts. arc_emac_tx() only frees skbs that it has
dropped.
This is detected by Coccinelle semantic patch.
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: David S. Miller <davem@davemloft.net>
A BPF program is required to check the return register of a
map_elem_lookup() call before accessing memory. The verifier keeps
track of this by converting the type of the result register from
PTR_TO_MAP_VALUE_OR_NULL to PTR_TO_MAP_VALUE after a conditional
jump ensures safety. This check is currently performed only for the
result register, R0.
In the event the compiler reorders instructions, BPF_MOV64_REG
instructions may be moved before the conditional jump, which causes
the copied registers to keep the type PTR_TO_MAP_VALUE_OR_NULL; the
verifier then rejects the program when those registers are accessed.
This commit extends the verifier to keep track of all identical
PTR_TO_MAP_VALUE_OR_NULL registers after a map_elem_lookup() by
assigning them an ID and then marking them all when the conditional
jump is observed.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Reviewed-by: Josef Bacik <jbacik@fb.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
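The ID-based tracking described above can be modeled with a small register file. This is an illustrative sketch, not the verifier's code: the types and function names are hypothetical, and only the non-NULL branch of the conditional jump is modeled.

```c
#include <assert.h>

/* Hypothetical register types, loosely following the verifier's names. */
enum sketch_reg_type {
	SKETCH_NOT_INIT,
	SKETCH_MAP_VALUE_OR_NULL,
	SKETCH_MAP_VALUE,
};

struct sketch_reg {
	enum sketch_reg_type type;
	unsigned int id; /* shared by every copy of one lookup result */
};

#define SKETCH_NREGS 11

struct sketch_state {
	struct sketch_reg regs[SKETCH_NREGS];
	unsigned int id_gen;
};

/* A map lookup leaves a maybe-NULL pointer in R0 with a fresh ID. */
static void sketch_map_lookup(struct sketch_state *st)
{
	st->regs[0].type = SKETCH_MAP_VALUE_OR_NULL;
	st->regs[0].id = ++st->id_gen;
}

/* BPF_MOV64_REG dst, src: copy both type and ID so the alias is tracked. */
static void sketch_mov(struct sketch_state *st, int dst, int src)
{
	st->regs[dst] = st->regs[src];
}

/* Non-NULL branch of "if (rX != 0)": every register sharing rX's ID
 * is known to be non-NULL as well. */
static void sketch_mark_non_null(struct sketch_state *st, int reg)
{
	unsigned int id = st->regs[reg].id;

	for (int i = 0; i < SKETCH_NREGS; i++)
		if (st->regs[i].type == SKETCH_MAP_VALUE_OR_NULL &&
		    st->regs[i].id == id)
			st->regs[i].type = SKETCH_MAP_VALUE;
}

/* Memory access is allowed only through a proven non-NULL pointer. */
static int sketch_can_deref(const struct sketch_state *st, int reg)
{
	return st->regs[reg].type == SKETCH_MAP_VALUE;
}
```

If R4 is copied from R0 before the NULL check, the check on R0 now also clears R4, while a register holding a different lookup result is left alone.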
Tobias Klauser [Wed, 19 Oct 2016 09:24:57 +0000 (11:24 +0200)]
net: fs_enet: Use net_device_stats from struct net_device
Instead of using a private copy of struct net_device_stats in struct
fs_enet_private, use stats from struct net_device. Also remove the now
unnecessary .ndo_get_stats function.
Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 18:14:22 +0000 (14:14 -0400)]
Merge branch 'smc91x-dt'
Robert Jarzmik says:
====================
support smc91x on mainstone and devicetree
This series aims to bring support for the mainstone board to
device-tree based builds, matching what is already in place for legacy
mainstone.
The bulk of the mainstone "specific" behavior is that a u16 write
doesn't work on an address of the form 4*n + 2, while it works on 4*n.
The legacy workaround was in SMC_outw(), with calls to
machine_is_mainstone(). These calls don't work with a pxa27x-dt
machine type, which is used when a generic device-tree pxa27x machine
is used to boot the mainstone board.
Therefore, this series enables the smc91c111 adapter of the mainstone
board to work on a device-tree build, exactly as it has been working for
years with the legacy arch/arm/mach-pxa/mainstone.c definition.
As a sum up, this extends an existing mechanism to device-tree based
pxa platforms.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:32 +0000 (21:45 +0200)]
net: smsc91x: add u16 workaround for pxa platforms
Add a workaround for mainstone, idp and stargate2 boards, for u16 writes
which must be aligned on 32-bit addresses.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Cc: Jeremy Linton <jeremy.linton@arm.com> Acked-by: Rob Herring <robh@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Robert Jarzmik [Mon, 17 Oct 2016 19:45:31 +0000 (21:45 +0200)]
net: smc91x: take into account half-word workaround
For device-tree builds, platforms such as mainstone, idp and stargate2
must have their u16 writes all aligned on 32 bit boundaries. This is
already enabled in platform data builds, and this patch adds it to
device-tree builds.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
Writes of u16 values need special handling on 3 PXA platforms, where the
hardware wiring forces these writes to be u32 aligned.
This patch isolates this handling for PXA platforms as before, but
enables this "workaround" to be set up dynamically, which will be the
case in device-tree build types.
This patch was tested on 2 PXA platforms: mainstone, which relies on
the workaround, and lubbock, which doesn't.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr> Signed-off-by: David S. Miller <davem@davemloft.net>
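The workaround amounts to turning a 16-bit write at offset 4n+2 into an aligned 32-bit read-modify-write. The sketch below simulates the register window with an array. The function names are hypothetical, and the layout (offset 4n+2 as the upper half of word n, as on a little-endian bus) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical register window: word n covers byte offsets 4n..4n+3,
 * with offset 4n+2 modeled as the upper 16 bits (little-endian bus). */
#define SKETCH_WINDOW_WORDS 8
static uint32_t sketch_regs[SKETCH_WINDOW_WORDS];

static uint32_t sketch_readl(unsigned int off)
{
	return sketch_regs[off / 4];
}

static void sketch_writel(uint32_t v, unsigned int off)
{
	sketch_regs[off / 4] = v;
}

/* Plain 16-bit write: the affected boards cannot do this at 4n+2. */
static void sketch_writew(uint16_t v, unsigned int off)
{
	uint32_t w = sketch_regs[off / 4];

	if (off & 2)
		w = (w & 0x0000ffffu) | ((uint32_t)v << 16);
	else
		w = (w & 0xffff0000u) | v;
	sketch_regs[off / 4] = w;
}

/* Workaround: at 4n+2, read the aligned word, keep its low half,
 * replace the high half, and write the whole word back. */
static void sketch_outw_align4(uint16_t val, unsigned int reg, bool use_align4)
{
	if (use_align4 && (reg & 2)) {
		uint32_t v = (uint32_t)val << 16;

		v |= sketch_readl(reg & ~2u) & 0xffffu;
		sketch_writel(v, reg & ~2u);
	} else {
		sketch_writew(val, reg);
	}
}
```

The flag models the per-platform setting that device-tree builds now have to supply at run time.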
Bert Kenward [Tue, 18 Oct 2016 16:47:45 +0000 (17:47 +0100)]
ethernet/sfc: use core min/max MTU checking
Fixes: 61e84623ace3 ("net: centralize net_device min/max MTU checking") Signed-off-by: Bert Kenward <bkenward@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Tue, 18 Oct 2016 15:56:31 +0000 (11:56 -0400)]
Merge branch 'phy-led-triggers'
Zach Brown says:
====================
Add support for led triggers on phy link state change
Fix the skge driver, which declared enum constants that conflicted with enum
constants in linux/leds.h.
Create function that encapsulates actions taken during the adjust phy link step
of phy state changes.
Create function that provides list of speeds currently supported by the phy.
Add support for led triggers on phy link state changes by adding
a config option. When set, the config option creates a set of led triggers
for each phy device. Users can use the led triggers to represent link state
changes on the phy.
v2:
* New patch that creates phy_adjust_link function to encapsulate actions taken
when adjusting phy link during phy state changes
* led trigger speed strings changed to match existing phy speed strings
* New function that maps speeds to led triggers
* Replace magic constants with definitions when declaring trigger name
buffer and number of triggers.
v3:
* Changed LED_ON to LED_REG_ON in skge driver to avoid possible future
conflict and improve consistency.
* Dropped rtl8712 patch that was accepted separately.
v4:
* tweaked commit message
v5:
* Changed commit message to explain relationship between the new triggers and
leds driven by phys.
* Added new patch that creates phy_supported_speeds function.
* Moved phy_leds_triggers_register and phy_leds_triggers_unregister to
phy_attach and phy_detach respectively. This change is so the
phydev->supported field will be filled by the time the triggers are
registered.
* Changed hardcoded list of triggers to dynamic list determined by speeds
returned by phy_supported_speeds.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:55 +0000 (10:49 -0500)]
net: phy: leds: add support for led triggers on phy link state change
Create an option CONFIG_LED_TRIGGER_PHY (default n), which will create a
set of led triggers for each instantiated PHY device. There is one LED
trigger per link-speed, per-phy.
The triggers are registered during phy_attach and unregistered during
phy_detach.
This allows a user to configure their system so that a set of LEDs
not controlled by the phy represents link state changes on the phy.
LEDs controlled by the phy are unaffected.
For example, we have a board where some of the leds in the
RJ45 socket are controlled by the phy, but others are not. Using the
triggers provided by this patch the leds not controlled by the phy can
be configured to show the current speed of the ethernet connection. The
leds controlled by the phy are unaffected.
Signed-off-by: Josh Cartwright <josh.cartwright@ni.com> Signed-off-by: Nathan Sullivan <nathan.sullivan@ni.com> Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
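The per-speed trigger scheme can be sketched as follows. Everything here is hypothetical: the structures, the `<id>:<speed>Mbps` name format (the actual series reuses the existing phy speed strings) and the fixed-size trigger array. It only illustrates lighting the trigger for the current speed and turning off the previous one on each link change.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_TRIGGERS 4
#define SKETCH_TRIGGER_NAME_LEN 32

/* Hypothetical trigger: one per supported link speed. */
struct sketch_led_trigger {
	char name[SKETCH_TRIGGER_NAME_LEN];
	int speed;      /* Mb/s this trigger represents */
	int brightness; /* 0 = off, 1 = on */
};

/* Hypothetical PHY state holding its triggers. */
struct sketch_phy {
	const char *id; /* hypothetical "bus:addr" identifier */
	struct sketch_led_trigger triggers[SKETCH_MAX_TRIGGERS];
	int num_triggers;
	struct sketch_led_trigger *last_triggered;
};

/* Create one trigger per supported speed (done at attach time). */
static void sketch_register_triggers(struct sketch_phy *phy,
				     const int *speeds, int n)
{
	phy->num_triggers = 0;
	phy->last_triggered = NULL;
	for (int i = 0; i < n && i < SKETCH_MAX_TRIGGERS; i++) {
		struct sketch_led_trigger *t = &phy->triggers[i];

		snprintf(t->name, sizeof(t->name), "%s:%dMbps", phy->id, speeds[i]);
		t->speed = speeds[i];
		t->brightness = 0;
		phy->num_triggers++;
	}
}

/* On a link change, light the trigger for the new speed (none if down)
 * and turn off the previously lit one. */
static void sketch_link_change(struct sketch_phy *phy, int link_up, int speed)
{
	struct sketch_led_trigger *next = NULL;

	if (link_up)
		for (int i = 0; i < phy->num_triggers; i++)
			if (phy->triggers[i].speed == speed)
				next = &phy->triggers[i];
	if (next == phy->last_triggered)
		return;
	if (phy->last_triggered)
		phy->last_triggered->brightness = 0;
	if (next)
		next->brightness = 1;
	phy->last_triggered = next;
}
```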
Zach Brown [Mon, 17 Oct 2016 15:49:53 +0000 (10:49 -0500)]
net: phy: Encapsulate actions performed during link state changes into function phy_adjust_link
During phy state machine state transitions, some set of actions should
occur whenever the link state changes. These actions should be
encapsulated into a single function.
This patch adds the phy_adjust_link function, which is called whenever
phydev->adjust_link would have been called before. Actions that should
occur whenever the phy link is adjusted can now be added to the
phy_adjust_link function.
Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Zach Brown [Mon, 17 Oct 2016 15:49:52 +0000 (10:49 -0500)]
skge: Rename LED_OFF and LED_ON in marvel skge driver to avoid conflicts with leds namespace
Adding led support for phy causes namespace conflicts for some
phy drivers.
The Marvell skge driver declared an enum for representing the states of
the Link LED Register. The enum contained the constant LED_OFF, which
conflicted with a declaration found in linux/leds.h.
LED_OFF changed to LED_REG_OFF
Also changed LED_ON to LED_REG_ON to avoid possible future conflict and
for consistency.
Signed-off-by: Zach Brown <zach.brown@ni.com> Signed-off-by: David S. Miller <davem@davemloft.net>
When the BUG is converted to a WARN_ON it shows 4 missing adjacencies:
eth3 - myvrf, myvrf - eth3, bond1 - myvrf and myvrf - bond1
All of those are because the __netdev_upper_dev_link function does not
properly link macvlan lower devices to myvrf when it is enslaved.
The second case just flips the ordering of the enslavements:
ip link set bond1 master bridge
ip link set macvlan master myvrf
Then run:
ip link delete bond1
ip link delete myvrf
The vrf delete command hangs because myvrf has a reference that has not
been released. In this case the removal code does not account for 2 paths
between eth3 and myvrf - one from bridge to vrf and the other through the
macvlan.
Rather than try to maintain a linked list of all upper and lower devices
per netdevice, only track the direct neighbors. The remaining stack can
be determined by recursively walking the neighbors.
The existing netdev_for_each_all_upper_dev_rcu,
netdev_for_each_all_lower_dev and netdev_for_each_all_lower_dev_rcu macros
are replaced with APIs that walk the upper and lower device lists. The
new APIs take a callback function and a data arg that is passed to the
callback for each device in the list. Drivers using the old macros are
converted in separate patches to make it easier on reviewers. It is an
API conversion only; no functional change is intended.
v3
- address Stephen's comment to simplify logic and remove typecasts
v2
- fixed bond0 references in cover-letter
- fixed definition of netdev_next_lower_dev_rcu to mirror the upper_dev
version.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
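Walking the whole stack from direct neighbors, using a callback and a data argument, can be sketched as below. The structure and function names are hypothetical. The topology in the test is loosely modeled on the bond1/bridge/macvlan/myvrf setup described above, and its exact wiring is an assumption. A device reachable through two paths is visited once per path.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_MAX_UPPERS 4

/* Hypothetical device node that stores only its DIRECT upper neighbors. */
struct sketch_dev {
	const char *name;
	struct sketch_dev *upper[SKETCH_MAX_UPPERS];
	int n_upper;
};

static void sketch_link(struct sketch_dev *lower, struct sketch_dev *upper)
{
	assert(lower->n_upper < SKETCH_MAX_UPPERS);
	lower->upper[lower->n_upper++] = upper;
}

/* Visit every device above @dev by recursing through direct neighbors.
 * A nonzero return from @fn stops the walk and is propagated. */
static int sketch_walk_all_upper(struct sketch_dev *dev,
				 int (*fn)(struct sketch_dev *dev, void *data),
				 void *data)
{
	for (int i = 0; i < dev->n_upper; i++) {
		struct sketch_dev *up = dev->upper[i];
		int ret = fn(up, data);

		if (ret)
			return ret;
		ret = sketch_walk_all_upper(up, fn, data);
		if (ret)
			return ret;
	}
	return 0;
}

/* Example callback: count how often a named device is reached. */
struct sketch_count_arg {
	const char *target;
	int hits;
};

static int sketch_count_cb(struct sketch_dev *dev, void *data)
{
	struct sketch_count_arg *arg = data;

	if (strcmp(dev->name, arg->target) == 0)
		arg->hits++;
	return 0;
}

/* Example callback: stop as soon as the device passed in @data is found. */
static int sketch_find_cb(struct sketch_dev *dev, void *data)
{
	return dev == data;
}
```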
David Ahern [Tue, 18 Oct 2016 02:15:53 +0000 (19:15 -0700)]
net: dev: Improve debug statements for adjacency tracking
Adjacency code only has debugs for the insert case. Add debugs for
the remove path and make both consistently worded to make it easier
to follow the insert and removal with reference counts.
In addition, change the BUG to a WARN_ON. A missing adjacency at
removal time is not cause for a panic.
Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>