It turns out that the p54 and cw2100 drivers assume that there's
tailroom even when they don't say they really need it. However,
there's currently no way for them to explicitly say they do need
it, so for now revert this.
This fixes https://bugzilla.kernel.org/show_bug.cgi?id=90331.
Cc: stable@vger.kernel.org Fixes: ca34e3b5c808 ("mac80211: Fix accounting of the tailroom-needed counter") Reported-by: Christopher Chavez <chrischavez@gmx.us> Bisected-by: Larry Finger <Larry.Finger@lwfinger.net> Debugged-by: Christian Lamparter <chunkeey@googlemail.com> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Ying Xue [Sun, 4 Jan 2015 07:24:35 +0000 (15:24 +0800)]
list_nulls: fix missing header
Fixup below build error:
include/linux/list_nulls.h: In function ‘hlist_nulls_del’:
include/linux/list_nulls.h:84:13: error: ‘LIST_POISON2’ undeclared (first use in this function)
Signed-off-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Mon, 5 Jan 2015 03:21:39 +0000 (22:21 -0500)]
Merge branch 'geneve-next'
Jesse Gross says:
====================
Geneve Cleanups
Much of the basis for the Geneve code comes from VXLAN. However,
Geneve is quite a bit simpler than VXLAN and so this cleans up a
lot of the infrastruction - particularly around locking - where the
extra complexity is not necessary.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Sat, 3 Jan 2015 02:26:05 +0000 (18:26 -0800)]
geneve: Check family when reusing sockets.
When searching for an existing socket to reuse, the address family
is not taken into account - only port number. This means that an
IPv4 socket could be used for IPv6 traffic and vice versa, which
is sure to cause problems when passing packets.
It is not possible to trigger this problem currently because the
only user of Geneve creates just IPv4 sockets. However, that is
likely to change in the near future.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Sat, 3 Jan 2015 02:26:04 +0000 (18:26 -0800)]
geneve: Remove socket hash table.
The hash table for open Geneve ports is used only on creation and
deletion time. It is not performance critical and is not likely to
grow to a large number of items. Therefore, this can be changed
to use a simple linked list.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Sat, 3 Jan 2015 02:26:03 +0000 (18:26 -0800)]
geneve: Simplify locking.
The existing Geneve locking scheme was pulled over directly from
VXLAN. However, VXLAN has a number of built in mechanisms which make
the locking more complex and are unlikely to be necessary with Geneve.
This simplifies the locking to use a basic scheme of a mutex
when doing updates plus RCU on receive.
In addition to making the code easier to read, this also avoids the
possibility of a race when creating or destroying sockets since
UDP sockets and the list of Geneve sockets are protected by different
locks. After this change, the entire operation is atomic.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jesse Gross [Sat, 3 Jan 2015 02:26:02 +0000 (18:26 -0800)]
geneve: Remove workqueue.
The work queue is used only to free the UDP socket upon destruction.
This is not necessary with Geneve and generally makes the code more
difficult to reason about. It also introduces nondeterministic
behavior such as when a socket is rapidly deleted and recreated, which
could fail as the the deletion happens asynchronously.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felipe Balbi [Fri, 2 Jan 2015 22:15:59 +0000 (16:15 -0600)]
net: ethernet: cpsw: fix hangs with interrupts
The CPSW IP implements pulse-signaled interrupts. Due to
that we must write a correct, pre-defined value to the
CPDMA_MACEOIVECTOR register so the controller generates
a pulse on the correct IRQ line to signal the End Of
Interrupt.
The way the driver is written today, all four IRQ lines
are requested using the same IRQ handler and, because of
that, we could fall into situations where a TX IRQ fires
but we tell the controller that we ended an RX IRQ (or
vice-versa). This situation triggers an IRQ storm on the
reserved IRQ 127 of INTC which will in turn call ack_bad_irq()
which will, then, print a ton of:
unexpected IRQ trap at vector 00
In order to fix the problem, we are moving all calls to
cpdma_ctlr_eoi() inside the IRQ handler and making sure
we *always* write the correct value to the CPDMA_MACEOIVECTOR
register. Note that the algorithm assumes that IRQ numbers and
value-to-be-written-to-EOI are proportional, meaning that a
write of value 0 would trigger an EOI pulse for the RX_THRESHOLD
Interrupt and that's the IRQ number sitting in the 0-th index
of our irqs_table array.
This, however, is safe at least for current implementations of
CPSW so we will refrain from making the check smarter (and, as
a side-effect, slower) until we actually have a platform where
IRQ lines are swapped.
This patch has been tested for several days with AM335x- and
AM437x-based platforms. AM57x was left out because there are
still pending patches to enable ethernet in mainline for that
platform. A read of the TRM confirms the statement on previous
paragraph.
Reported-by: Yegor Yefremov <yegorslists@googlemail.com> Fixes: 510a1e7 (drivers: net: davinci_cpdma: acknowledge interrupt properly) Cc: <stable@vger.kernel.org> # v3.9+ Signed-off-by: Felipe Balbi <balbi@ti.com> Acked-by: Tony Lindgren <tony@atomide.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Sun, 4 Jan 2015 19:46:43 +0000 (11:46 -0800)]
Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml
Pull UML fixes from Richard Weinberger:
"Two fixes for UML regressions. Nothing exciting"
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml:
x86, um: actually mark system call tables readonly
um: Skip futex_atomic_cmpxchg_inatomic() test
Pavel Machek [Sun, 4 Jan 2015 19:01:23 +0000 (20:01 +0100)]
Revert "ARM: 7830/1: delay: don't bother reporting bogomips in /proc/cpuinfo"
Commit 9fc2105aeaaf ("ARM: 7830/1: delay: don't bother reporting
bogomips in /proc/cpuinfo") breaks audio in python, and probably
elsewhere, with message
Daniel Borkmann [Sat, 3 Jan 2015 12:11:10 +0000 (13:11 +0100)]
x86, um: actually mark system call tables readonly
Commit a074335a370e ("x86, um: Mark system call tables readonly") was
supposed to mark the sys_call_table in UML as RO by adding the const,
but it doesn't have the desired effect as it's nevertheless being placed
into the data section since __cacheline_aligned enforces sys_call_table
being placed into .data..cacheline_aligned instead. We need to use
the ____cacheline_aligned version instead to fix this issue.
Before:
$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev 0000000000000000 D sys_call_table 0000000000000000 D syscall_table_size
After:
$ nm -v arch/x86/um/sys_call_table_64.o | grep -1 "sys_call_table"
U sys_writev 0000000000000000 R sys_call_table 0000000000000000 D syscall_table_size
Fixes: a074335a370e ("x86, um: Mark system call tables readonly") Cc: H. Peter Anvin <hpa@zytor.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Daniel Borkmann <dborkman@redhat.com> Signed-off-by: Richard Weinberger <richard@nod.at>
futex_atomic_cmpxchg_inatomic() does not work on UML because
it triggers a copy_from_user() in kernel context.
On UML copy_from_user() can only be used if the kernel was called
by a real user space process such that UML can use ptrace()
to fetch the value.
Reported-by: Miklos Szeredi <miklos@szeredi.hu> Suggested-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Richard Weinberger <richard@nod.at> Tested-by: Daniel Walter <d.walter@0x90.at>
David S. Miller [Sat, 3 Jan 2015 19:33:03 +0000 (14:33 -0500)]
Merge branch 'rhashtable-next'
Thomas Graf says:
====================
rhashtable: Per bucket locks & deferred table resizing
Prepares for and introduces per bucket spinlocks and deferred table
resizing. This allows for parallel table mutations in different hash
buckets from atomic context. The resizing occurs in the background
in a separate worker thread while lookups, inserts, and removals can
continue.
Also modified the chain linked list to be terminated with a special
nulls marker to allow entries to move between multiple lists.
Last but not least, reintroduces lockless netlink_lookup() with
deferred Netlink socket destruction to avoid the side effect of
increased netlink_release() runtime.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 2 Jan 2015 22:00:22 +0000 (23:00 +0100)]
netlink: Lockless lookup with RCU grace period in socket release
Defers the release of the socket reference using call_rcu() to
allow using an RCU read-side protected call to rhashtable_lookup()
This restores behaviour and performance gains as previously
introduced by e341694 ("netlink: Convert netlink_lookup() to use
RCU protected hash table") without the side effect of severely
delayed socket destruction.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 2 Jan 2015 22:00:21 +0000 (23:00 +0100)]
rhashtable: Supports for nulls marker
In order to allow for wider usage of rhashtable, use a special nulls
marker to terminate each chain. The reason for not using the existing
nulls_list is that the prev pointer usage would not be valid as entries
can be linked in two different buckets at the same time.
The 4 nulls base bits can be set through the rhashtable_params structure
like this:
Thomas Graf [Fri, 2 Jan 2015 22:00:20 +0000 (23:00 +0100)]
rhashtable: Per bucket locks & deferred expansion/shrinking
Introduces an array of spinlocks to protect bucket mutations. The number
of spinlocks per CPU is configurable and selected based on the hash of
the bucket. This allows for parallel insertions and removals of entries
which do not share a lock.
The patch also defers expansion and shrinking to a worker queue which
allows insertion and removal from atomic context. Insertions and
deletions may occur in parallel to it and are only held up briefly
while the particular bucket is linked or unzipped.
Mutations of the bucket table pointer is protected by a new mutex, read
access is RCU protected.
In the event of an expansion or shrinking, the new bucket table allocated
is exposed as a so called future table as soon as the resize process
starts. Lookups, deletions, and insertions will briefly use both tables.
The future table becomes the main table after an RCU grace period and
initial linking of the old to the new table was performed. Optimization
of the chains to make use of the new number of buckets follows only the
new table is in use.
The side effect of this is that during that RCU grace period, a bucket
traversal using any rht_for_each() variant on the main table will not see
any insertions performed during the RCU grace period which would at that
point land in the future table. The lookup will see them as it searches
both tables if needed.
Having multiple insertions and removals occur in parallel requires nelems
to become an atomic counter.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 2 Jan 2015 22:00:18 +0000 (23:00 +0100)]
nft_hash: Remove rhashtable_remove_pprev()
The removal function of nft_hash currently stores a reference to the
previous element during lookup which is used to optimize removal later
on. This was possible because a lock is held throughout calling
rhashtable_lookup() and rhashtable_remove().
With the introdution of deferred table resizing in parallel to lookups
and insertions, the nftables lock will no longer synchronize all
table mutations and the stored pprev may become invalid.
Removing this optimization makes removal slightly more expensive on
average but allows taking the resize cost out of the insert and
remove path.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Cc: netfilter-devel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 2 Jan 2015 22:00:17 +0000 (23:00 +0100)]
rhashtable: Factor out bucket_tail() function
Subsequent patches will require access to the bucket tail. Access
to the tail is relatively cheap as the automatic resizing of the
table should keep the number of entries per bucket to no more
than 0.75 on average.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Graf [Fri, 2 Jan 2015 22:00:16 +0000 (23:00 +0100)]
rhashtable: Convert bucket iterators to take table and index
This patch is in preparation to introduce per bucket spinlocks. It
extends all iterator macros to take the bucket table and bucket
index. It also introduces a new rht_dereference_bucket() to
handle protected accesses to buckets.
It introduces a barrier() to the RCU iterators to the prevent
the compiler from caching the first element.
The lockdep verifier is introduced as stub which always succeeds
and properly implement in the next patch when the locks are
introduced.
Signed-off-by: Thomas Graf <tgraf@suug.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Fri, 2 Jan 2015 21:47:51 +0000 (16:47 -0500)]
Merge branch 'timecounter-next'
Richard Cochran says:
====================
Fixing the "Time Counter fixes and improvements"
For this series I had only tested the build with ARCH=x86 and arm, but
others like sparc64, microblaze, powerpc, and s390 will fail because
they somehow don't indirectly include clocksource.h for the drivers in
question.
This series fixes the build issues reported by:
kbuild test robot <fengguang.wu@intel.com>
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Some buggy firmwares export an incorrect MAC address (00:a0:c6:00:00:00). This
makes for example checking devices for random MAC addresses tricky, and you
might end up with multiple network interfaces with the same address.
This patch tries to fix, or at least improve, the situation by setting the MAC
address of devices with this firmware bug to a random address. I tested the
patch with two devices that has this firmware bug (Huawei E398 and E392), and
network traffic worked fine after changing the address.
Signed-off-by: Kristian Evensen <kristian.evensen@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
net: ethernet: cisco: enic: enic_dev: Remove some unused functions
Removes some functions that are not used anywhere:
enic_dev_enable2_done() enic_dev_enable2() enic_dev_deinit_done()
enic_dev_init_prov2() enic_vnic_dev_deinit()
This was partially found by using a static code analysis program called cppcheck.
Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Jan 2015 21:24:41 +0000 (13:24 -0800)]
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"This is a set of three fixes: one to correct an abort path thinko
causing failures (and a panic) in USB on device misbehaviour, One to
fix an out of order issue in the fnic driver and one to match discard
expectations to qemu which otherwise cause Linux to behave badly as a
guest"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
SCSI: fix regression in scsi_send_eh_cmnd()
fnic: IOMMU Fault occurs when IO and abort IO is out of order
sd: tweak discard heuristics to work around QEMU SCSI issue
Ben Pfaff [Wed, 31 Dec 2014 16:45:46 +0000 (08:45 -0800)]
openvswitch: Consistently include VLAN header in flow and port stats.
Until now, when VLAN acceleration was in use, the bytes of the VLAN header
were not included in port or flow byte counters. They were however
included when VLAN acceleration was not used. This commit corrects the
inconsistency, by always including the VLAN header in byte counters.
Previous discussion at
http://openvswitch.org/pipermail/dev/2014-December/049521.html
Reported-by: Motonori Shindo <mshindo@vmware.com> Signed-off-by: Ben Pfaff <blp@nicira.com> Reviewed-by: Flavio Leitner <fbl@sysclose.org> Acked-by: Pravin B Shelar <pshelar@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Herbert Xu [Wed, 31 Dec 2014 13:39:23 +0000 (00:39 +1100)]
tcp: Do not apply TSO segment limit to non-TSO packets
Thomas Jarosch reported IPsec TCP stalls when a PMTU event occurs.
In fact the problem was completely unrelated to IPsec. The bug is
also reproducible if you just disable TSO/GSO.
The problem is that when the MSS goes down, existing queued packet
on the TX queue that have not been transmitted yet all look like
TSO packets and get treated as such.
This then triggers a bug where tcp_mss_split_point tells us to
generate a zero-sized packet on the TX queue. Once that happens
we're screwed because the zero-sized packet can never be removed
by ACKs.
Fixes: 1485348d242 ("tcp: Apply device TSO segment limit earlier") Reported-by: Thomas Jarosch <thomas.jarosch@intra2net.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Cheers, Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Westphal [Wed, 31 Dec 2014 12:33:41 +0000 (13:33 +0100)]
net: skbuff: don't zero tc members when freeing skb
Not needed, only four cases:
- kfree_skb (or one of its aliases).
Don't need to zero, memory will be freed.
- kfree_skb_partial and head was stolen: memory will be freed.
- skb_morph: The skb header fields (including tc ones) will be
copied over from the 'to-be-morphed' skb right after
skb_release_head_state returns.
- skb_segment: Same as before, all the skb header
fields are copied over from the original skb right away.
Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net>
Here's the first batch of bluetooth patches for 3.20.
- Cleanups & fixes to ieee802154 drivers
- Fix synchronization of mgmt commands with respective HCI commands
- Add self-tests for LE pairing crypto functionality
- Remove 'BlueFritz!' specific handling from core using a new quirk flag
- Public address configuration support for ath3012
- Refactor debugfs support into a dedicated file
- Initial support for LE Data Length Extension feature from Bluetooth 4.2
Please let me know if there are any issues pulling. Thanks.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Jan 2015 20:57:20 +0000 (12:57 -0800)]
Merge tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound
Pull sound fixes from Takashi Iwai:
"Nothing too exciting as a new year's start here: most of fixes are for
ASoC, a boot crash fix on OMAP for deferred probe, a few driver
specific fixes (Intel, dwc, rockchip, rt5677), in addition to typo
fixes in kerneldoc comments for PCM"
* tag 'sound-3.19-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
ALSA: pcm: Fix kerneldoc for params_*() functions
ASoC: rockchip: i2s: fix maxburst of dma data to 4
ASoC: rockchip: i2s: fix error defination of transmit data level
ASoC: Intel: correct the fixed free block allocation
ASoC: rt5677: fixed rt5677_dsp_vad_put rt5677_dsp_vad_get panic
ASoC: Intel: Fix BYTCR machine driver MODULE_ALIAS
ASoC: Intel: Fix BYTCR firmware name
ASoC: dwc: Iterate over all channels
ASoC: dwc: Ensure FIFOs are flushed to prevent channel swap
ASoC: Intel: Add I2C dependency to two new machines
ASoC: dapm: Remove snd_soc_of_parse_audio_routing() due to deferred probe
Joe Stringer [Wed, 31 Dec 2014 03:10:16 +0000 (19:10 -0800)]
geneve: Add Geneve GRO support
This results in an approximately 30% increase in throughput
when handling encapsulated bulk traffic.
Signed-off-by: Joe Stringer <joestringer@nicira.com> Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Currently the only tunnel protocol that supports GRO with encapsulated
Ethernet is VXLAN. This pulls out the Ethernet code into a proper layer
so that it can be used by other tunnel protocols such as GRE and Geneve.
Signed-off-by: Jesse Gross <jesse@nicira.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Kostya Belezko [Tue, 30 Dec 2014 17:27:09 +0000 (12:27 -0500)]
Altera TSE: Add missing phydev
Altera network device doesn't come up after
ifconfig eth0 down
ifconfig eth0 up
The reason behind is clearing priv->phydev during tse_shutdown().
The phydev is not restored back at tse_open().
Resubmiting as to follow Tobias Klauser suggestion.
phy_start/phy_stop are called on each ifup/ifdown and
phy_disconnect is called once during the module removal.
Signed-off-by: Kostya Belezko <bkostya@hotmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jack Morgenstein [Tue, 30 Dec 2014 09:59:50 +0000 (11:59 +0200)]
net/mlx4_core: Fix error flow in mlx4_init_hca()
We shouldn't call UNMAP_FA here, this is done in mlx4_load_one.
If mlx4_query_func fails, we need to invoke CLOSE_HCA for both
native and master.
Fixes: a0eacca948d2 ('net/mlx4_core: Refactor mlx4_load_one') Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Maor Gottlieb [Tue, 30 Dec 2014 09:59:49 +0000 (11:59 +0200)]
net/mlx4_core: Correcly update the mtt's offset in the MR re-reg flow
Previously, mlx4_mt_rereg_write filled the MPT's entity_size with the
old MTT's page shift, which could result in using an incorrect offset.
Fix the initialization to be after we calculate the new MTT offset.
In addition, assign mtt order to -1 after calling mlx4_mtt_cleanup. This
is necessary in order to mark the MTT as invalid and avoid freeing it later.
Fixes: e630664 ('mlx4_core: Add helper functions to support MR re-registration') Signed-off-by: Maor Gottlieb <maorg@mellanox.com> Signed-off-by: Matan Barak <matanb@mellanox.com> Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Fri, 2 Jan 2015 20:07:50 +0000 (12:07 -0800)]
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull vhost cleanup and virtio bugfix
"There's a single change here, fixing a vhost bug where vhost
initialization fails due to used ring alignment check being too
strict"
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vhost: relax used address alignment
virtio_ring: document alignment requirements
Yongjian Xu [Tue, 30 Dec 2014 08:03:46 +0000 (16:03 +0800)]
qlcnic: Fix return value in qlcnic_probe()
If the check of adapter fails and goes into the 'else' branch, the
return value 'err' should not still be zero.
Signed-off-by: Yongjian Xu <xuyongjiande@gmail.com> Acked-by: Shahed Shaikh <shahed.shaikh@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Roger Chen [Mon, 29 Dec 2014 09:43:36 +0000 (17:43 +0800)]
GMAC: add driver for Rockchip RK3288 SoCs integrated GMAC
This driver is based on stmmac driver.
changes since v2:
- use tab instead of space for macros
- use HIWORD_UPDATE macro for GMAC_CLK_RX_DL_CFG and GMAC_CLK_TX_DL_CFG
- remove drive-strength setting in the driver and set it in the pinctrl settings
- use dev_err instead of pr_err
- remove clock names's macros, just use the real name of the clock
- use devm_clk_get() instead of clk_get()
- remove clk_set_parent(bsp_priv->clk_mac, bsp_priv->clk_mac_pll)
- remove gpio setting for LDO, just use regulator API
- remove phy reset using gpio in the glue layer, it has been handled in the stmmac driver
- remove handling phy interrupt (mii interrupt)
changes since v1:
- use BIT() to set register
- combine two remap_write() operations into one for the same register
- use macros for register value setting
- remove grf fail check in rk_gmac_setup() and save all the check in set_rgmii_speed()
- remove .tx_coe=1 in rk_gmac_data
Signed-off-by: Roger Chen <roger.chen@rock-chips.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Mon, 8 Dec 2014 04:28:39 +0000 (04:28 +0000)]
i40e: Fix possible memory leak in i40e_dbg_dump_desc
I didn't notice that return in the code, fix it by
adding a goto out instead to free the memory.
Fixes:
> New smatch warnings:
> drivers/net/ethernet/intel/i40e/i40e_debugfs.c:832 i40e_dbg_dump_desc() warn: possible memory leak of 'ring'
Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Joe Perches <joe@perches.com> Tested-by: Jim Young <james.m.young@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
David S. Miller [Wed, 31 Dec 2014 23:26:02 +0000 (18:26 -0500)]
Merge branch 'fib_trie-next'
Alexander Duyck says:
====================
fib_trie: Reduce time spent in fib_table_lookup by 35 to 75%
These patches are meant to address several performance issues I have seen
in the fib_trie implementation, and fib_table_lookup specifically. With
these changes in place I have seen a reduction of up to 35 to 75% for the
total time spent in fib_table_lookup depending on the type of search being
performed.
On a VM running in my Corei7-4930K system with a trie of maximum depth of 7
this resulted in a reduction of over 370ns per packet in the total time to
process packets received from an ixgbe interface and route them to a dummy
interface. This represents a failed lookup in the local trie followed by
a successful search in the main trie.
In the simple case of receiving a frame and dropping it before it can reach
the socket layer I saw a reduction of 40ns per packet. This represents a
trip through the local trie with the correct leaf found with no need for
any backtracing.
These changes have resulted in several functions being inlined such as
check_leaf and fib_find_node, but due to the code simplification the
overall size of the code has been reduced.
text data bss dec hex filename
16932 376 16 17324 43ac net/ipv4/fib_trie.o - before
15259 376 8 15643 3d1b net/ipv4/fib_trie.o - after
Changes since RFC:
Replaced this_cpu_ptr with correct call to this_cpu_inc in patch 1
Changed test for leaf_info mismatch to (key ^ n->key) & li->mask_plen in patch 10
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:57:08 +0000 (10:57 -0800)]
fib_trie: Add tracking value for suffix length
This change adds a tracking value for the maximum suffix length of all
prefixes stored in any given tnode. With this value we can determine if we
need to backtrace or not based on if the suffix is greater than the pos
value.
By doing this we can reduce the CPU overhead for lookups in the local table
as many of the prefixes there are 32b long and have a suffix length of 0
meaning we can immediately backtrace to the root node without needing to
test any of the nodes between it and where we ended up.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:57:02 +0000 (10:57 -0800)]
fib_trie: Remove checks for index >= tnode_child_length from tnode_get_child
For some reason the compiler doesn't seem to understand that when we are in
a loop that runs from tnode_child_length - 1 to 0 we don't expect the value
of tn->bits to change. As such every call to tnode_get_child was rerunning
tnode_chile_length which ended up consuming quite a bit of space in the
resultant assembly code.
I have gone though and verified that in all cases where tnode_get_child
is used we are either winding though a fixed loop from tnode_child_length -
1 to 0, or are in a fastpath case where we are verifying the value by
either checking for any remaining bits after shifting index by bits and
testing for leaf, or by using tnode_child_length.
size net/ipv4/fib_trie.o
Before:
text data bss dec hex filename
15506 376 8 15890 3e12 net/ipv4/fib_trie.o
After:
text data bss dec hex filename
14827 376 8 15211 3b6b net/ipv4/fib_trie.o
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:55 +0000 (10:56 -0800)]
fib_trie: inflate/halve nodes in a more RCU friendly way
This change pulls the node_set_parent functionality out of put_child_reorg
and instead leaves that to the function to take care of as well. By doing
this we can fully construct the new cluster of tnodes and all of the
pointers out of it before we start routing pointers into it.
I am suspecting this will likely fix some concurency issues though I don't
have a good test to show as such.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:49 +0000 (10:56 -0800)]
fib_trie: Push tnode flushing down to inflate/halve
This change pushes the tnode freeing down into the inflate and halve
functions. It makes more sense here as we have a better grasp of what is
going on and when a given cluster of nodes is ready to be freed.
I believe this may address a bug in the freeing logic as well. For some
reason if the freelist got to a certain size we would call
synchronize_rcu(). I'm assuming that what they meant to do is call
synchronize_rcu() after they had handed off that much memory via
call_rcu(). As such that is what I have updated the behavior to be.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:43 +0000 (10:56 -0800)]
fib_trie: Push assignment of child to parent down into inflate/halve
This change makes it so that the assignment of the tnode to the parent is
handled directly within whatever function is currently handling the node be
it inflate, halve, or resize. By doing this we can avoid some of the need
to set NULL pointers in the tree while we are resizing the subnodes.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:37 +0000 (10:56 -0800)]
fib_trie: Add functions should_inflate and should_halve
This change pulls the logic for if we should inflate/halve the nodes out
into separate functions. It also addresses what I believe is a bug where 1
full node is all that is needed to keep a node from ever being halved.
Simple script to reproduce the issue:
modprobe dummy; ifconfig dummy0 up
for i in `seq 0 255`; do ifconfig dummy0:$i 10.0.${i}.1/24 up; done
ifconfig dummy0:256 10.0.255.33/16 up
for i in `seq 0 254`; do ifconfig dummy0:$i down; done
Results from /proc/net/fib_triestat
Before:
Local:
Aver depth: 3.00
Max depth: 4
Leaves: 17
Prefixes: 18
Internal nodes: 11
1: 8 2: 2 10: 1
Pointers: 1048
Null ptrs: 1021
Total size: 11 kB
After:
Local:
Aver depth: 3.41
Max depth: 5
Leaves: 17
Prefixes: 18
Internal nodes: 12
1: 8 2: 3 3: 1
Pointers: 36
Null ptrs: 8
Total size: 3 kB
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:24 +0000 (10:56 -0800)]
fib_trie: Push rcu_read_lock/unlock to callers
This change is to start cleaning up some of the rcu_read_lock/unlock
handling. I realized while reviewing the code there are several spots that
I don't believe are being handled correctly or are masking warnings by
locally calling rcu_read_lock/unlock instead of calling them at the correct
level.
A common example is a call to fib_get_table followed by fib_table_lookup.
The rcu_read_lock/unlock ought to wrap both but there are several spots where
they were not wrapped.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:18 +0000 (10:56 -0800)]
fib_trie: Use unsigned long for anything dealing with a shift by bits
This change makes it so that anything that can be shifted by, or compared
to a value shifted by bits is updated to be an unsigned long. This is
mostly a precaution against an insanely huge address space that somehow
starts coming close to the 2^32 root node size which would require
something like 1.5 billion addresses.
I chose unsigned long instead of unsigned long long since I do not believe
it is possible to allocate a 32 bit tnode on a 32 bit system as the memory
consumed would be 16GB + 28B which exceeds the addressible space for any
one process.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:12 +0000 (10:56 -0800)]
fib_trie: Update meaning of pos to represent unchecked bits
This change moves the pos value to the other side of the "bits" field. By
doing this it actually simplifies a significant amount of code in the trie.
For example when halving a tree we know that the bit lost exists at
oldnode->pos, and if we inflate the tree the new bit being add is at
tn->pos. Previously to find those bits you would have to subtract pos and
bits from the keylength or start with a value of (1 << 31) and then shift
that.
There are a number of spots throughout the code that benefit from this. In
the case of the hot-path searches the main advantage is that we can drop 2
or more operations from the search path as we no longer need to compute the
value for the index to be shifted by and can instead just use the raw pos
value.
In addition the tkey_extract_bits is now defunct and can be replaced by
get_index since the two operations were doing the same thing, but now
get_index does it much more quickly as it is only an xor and shift versus a
pair of shifts and a subtraction.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:06 +0000 (10:56 -0800)]
fib_trie: Optimize fib_table_insert
This patch updates the fib_table_insert function to take advantage of the
changes made to improve the performance of fib_table_lookup. As a result
the code should be smaller and run faster then the original.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:56:00 +0000 (10:56 -0800)]
fib_trie: Optimize fib_find_node
This patch makes use of the same features I made use of for
fib_table_lookup to streamline fib_find_node. The resultant code should be
smaller and run faster than the original.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:55:54 +0000 (10:55 -0800)]
fib_trie: Optimize fib_table_lookup to avoid wasting time on loops/variables
This patch is meant to reduce the complexity of fib_table_lookup by reducing
the number of variables to the bare minimum while still keeping the same if
not improved functionality versus the original.
Most of this change was started off by the desire to rid the function of
chopped_off and current_prefix_length as they actually added very little to
the function since they only applied when computing the cindex. I was able
to replace them mostly with just a check for the prefix match. As long as
the prefix between the key and the node being tested was the same we know
we can search the tnode fully versus just testing cindex 0.
The second portion of the change ended up being a massive reordering.
Originally the calls to check_leaf were up near the start of the loop, and
the backtracing and descending into lower levels of tnodes was later. This
didn't make much sense as the structure of the tree means the leaves are
always the last thing to be tested. As such I reordered things so that we
instead have a loop that will delve into the tree and only exit when we
have either found a leaf or we have exhausted the tree. The advantage of
rearranging things like this is that we can fully inline check_leaf since
there is now only one reference to it in the function.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:55:47 +0000 (10:55 -0800)]
fib_trie: Merge leaf into tnode
This change makes it so that leaf and tnode are the same struct. As a
result there is no need for rt_trie_node anymore since everyting can be
merged into tnode.
On 32b systems this results in the leaf being 4 bytes larger, however I
don't know if that is really an issue as this and an eariler patch that
added bits & pos have increased the size from 20 to 28. If I am not
mistaken slub/slab allocate on power of 2 sizes so 20 was likely being
rounded up to 32 anyway.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:55:41 +0000 (10:55 -0800)]
fib_trie: Merge tnode_free and leaf_free into node_free
Both the leaf and the tnode had an rcu_head in them, but they had them in
slightly different places. Since we now have them in the same spot and
know that any node with bits == 0 is a leaf and the rest are either vmalloc
or kmalloc tnodes depending on the value of bits it makes it easy to combine
the functions and reduce overhead.
In addition I have taken advantage of the rcu_head pointer to go ahead and
put together a simple linked list instead of using the tnode pointer as
this way we can merge either type of structure for freeing.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:55:35 +0000 (10:55 -0800)]
fib_trie: Make leaf and tnode more uniform
This change makes some fundamental changes to the way leaves and tnodes are
constructed. The big differences are:
1. Leaves now populate pos and bits indicating their full key size.
2. Trie nodes now mask out their lower bits to be consistent with the leaf
3. Both structures have been reordered so that rt_trie_node now consisists
of a much larger region including the pos, bits, and rcu portions of
the tnode structure.
On 32b systems this will result in the leaf being 4B larger as the pos and
bits values were added to a hole created by the key as it was only 4B in
length.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Alexander Duyck [Wed, 31 Dec 2014 18:55:29 +0000 (10:55 -0800)]
fib_trie: Update usage stats to be percpu instead of global variables
The trie usage stats were currently being shared by all threads that were
calling fib_table_lookup. As a result when multiple threads were
performing lookups simultaneously the trie would begin to cache bounce
between those threads.
In order to prevent this I have updated the usage stats to use a set of
percpu variables. By doing this we should be able to avoid the cache
bouncing and still make use of these stats.
Signed-off-by: Alexander Duyck <alexander.h.duyck@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Linus Torvalds [Wed, 31 Dec 2014 22:52:18 +0000 (14:52 -0800)]
Merge branch 'upstream' of git://git.infradead.org/users/pcmoore/audit
Pull audit fix from Paul Moore:
"One audit patch to resolve a panic/oops when recording filenames in
the audit log, see the mail archive link below.
The fix isn't as nice as I would like, as it involves an allocate/copy
of the filename, but it solves the problem and the overhead should
only affect users who have configured audit rules involving file
names.
We'll revisit this issue with future kernels in an attempt to make
this suck less, but in the meantime I think this fix should go into
the next release of v3.19-rcX.
Todd Fujinaka [Thu, 27 Nov 2014 01:00:02 +0000 (01:00 +0000)]
igb: Remove unneeded FIXME
Remove a FIXME comment that was missed in a commit on 1/2007.
Signed-off-by: Todd Fujinaka <todd.fujinaka@intel.com> Reported-by: nick <xerofoify@gmail.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
John W. Linville [Sat, 15 Nov 2014 01:19:53 +0000 (01:19 +0000)]
e100: fix typo in MDI/MDI-X eeprom check in e100_phy_init
Although it doesn't explicitly say so, commit 60ffa478759f39a2 ("e100:
Fix MDIO/MDIO-X") appears to be intended to revert the earlier commit 648951451e6d2d53 ("e100: fixed e100 MDI/MDI-X issues"). However,
careful examination reveals that the attempted revert actually
_inverted_ the test for eeprom_mdix_enabled. That is bound to program
a few PHYs incorrectly...
Signed-off-by: "John W. Linville" <linville@tuxdriver.com> Tested-by: Aaron Brown <aaron.f.brown@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
The GRE tap device supports Ethernet over GRE, but doesn't
care about the source address of the tunnel, therefore it
can be changed without bring device down.
Signed-off-by: Stephen Hemminger <stephen@networkplumber.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Bill Hong [Sat, 27 Dec 2014 18:12:39 +0000 (10:12 -0800)]
l2tp : multicast notification to the registered listeners
Previously l2tp module did not provide any means for the user space to
get notified when tunnels/sessions are added/modified/deleted.
This change contains the following
- create a multicast group for the listeners to register.
- notify the registered listeners when the tunnels/sessions are
created/modified/deleted.
Signed-off-by: Bill Hong <bhong@brocade.com> Reviewed-by: Stephen Hemminger <stephen@networkplumber.org> Reviewed-by: Sven-Thorsten Dietrich <sven@brocade.com> Signed-off-by: David S. Miller <davem@davemloft.net>