Mintz, Yuval [Sun, 1 Jan 2017 11:57:07 +0000 (13:57 +0200)]
qed*: RSS indirection based on queue-handles
A step toward having qede agnostic to the queue configurations
in firmware/hardware - let the RSS indirections use queue handles
instead of actual queue indices.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Manish Chopra [Sun, 1 Jan 2017 11:57:05 +0000 (13:57 +0200)]
qede - mark SKB as encapsulated
When driver receives a recognized encapsulated packet it needs
to set the skb->encapsulation field as well.
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com> Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 1 Jan 2017 11:57:04 +0000 (13:57 +0200)]
qede: Postpone reallocation until NAPI end
During Rx flow driver allocates a replacement buffer each time
it consumes an Rx buffer. Failing to do so, it would consume the
currently processed buffer and re-post it on the ring.
As a result, the Rx ring is always completely full [from driver POV].
We now allow the Rx ring to shorten by doing the re-allocations
at the end of the NAPI run. The only limitation is that we still want to
make sure each time we reallocate that we'd still have sufficient
elements in the Rx ring to guarantee that FW would be able to post
additional data and trigger an interrupt.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 1 Jan 2017 11:57:03 +0000 (13:57 +0200)]
qed*: Change maximal number of queues
Today qede requests contexts that would suffice for 64 'whole'
combined queues [192 meant for 64 rx, tx and xdp tx queues],
but registers netdev and limits the number of queues based on
information received by qed. In turn, qed doesn't take context
into account when informing qede how many queues it can support.
This would lead to a configuration problem in case user tries
configuring >64 combined queues to interface [or >96 in case
xdp isn't enabled]. Since we don't have a management firmware
that actually provides so many interrupt lines to a single
device we're currently safe but that's about to change soon.
The new maximum is hence changed:
- For RoCE devices, the limit would remain 64.
- For non-RoCE devices, the limit might be higher [depending
on the actual configuration of the device].
qed would start enforcing that limit in both scenarios.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mintz, Yuval [Sun, 1 Jan 2017 11:57:00 +0000 (13:57 +0200)]
qed*: Update to dual-license
Since the submission of the qedr driver, there's inconsistency
in the licensing of the various qed/qede files - some are GPLv2
and some are dual-license.
Since qedr requires dual-license and it's dependent on both,
we're updating the licensing of all qed/qede source files.
Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Thomas Preisner [Fri, 30 Dec 2016 02:37:54 +0000 (03:37 +0100)]
net: 3com: typhoon: typhoon_init_one: make return values more specific
In some cases the return value of a failing function is not being used
and the function typhoon_init_one() returns another negative error code
instead.
Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de> Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de> Signed-off-by: David S. Miller <davem@davemloft.net>
In a few cases the err-variable is not set to a negative error code if a
function call in typhoon_init_one() fails and thus 0 is returned
instead.
It may be better to set err to the appropriate negative error
code before returning.
Reported-by: Pan Bian <bianpan2016@163.com> Signed-off-by: Thomas Preisner <thomas.preisner+linux@fau.de> Signed-off-by: Milan Stephan <milan.stephan+linux@fau.de> Signed-off-by: David S. Miller <davem@davemloft.net>
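A minimal sketch of the error-handling pattern both typhoon commits aim for, not the driver's actual code. The step functions, their error codes and the undo counter are hypothetical stand-ins; the point is only that err carries the failing call's own negative code to the exit label.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical probe steps: each returns 0 or a negative errno. */
static int step_enable_sketch(int fail) { return fail ? -EIO : 0; }
static int step_alloc_sketch(int fail)  { return fail ? -ENOMEM : 0; }

static int undo_count;	/* counts how often the unwind path ran */

int init_one_sketch(int fail_enable, int fail_alloc)
{
	int err;

	err = step_enable_sketch(fail_enable);
	if (err < 0)
		goto out;		/* err already holds the specific code */

	err = step_alloc_sketch(fail_alloc);
	if (err < 0)
		goto out_disable;	/* do not overwrite err with a generic value */

	return 0;

out_disable:
	undo_count++;			/* stands in for undoing step_enable_sketch() */
out:
	assert(err < 0);		/* an error path must never return 0 */
	return err;
}
```

Calling init_one_sketch(0, 1) returns the allocation step's -ENOMEM rather than 0 or an unrelated code.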
Relax the check in setsockopt to allow setting mc_index to an L3 slave if
sk_bound_dev_if points to an L3 master.
Make a similar change for IPv6. In this case change the device lookup to
take the rcu_read_lock avoiding a refcnt. The rcu lock is also needed for
the lookup of a potential L3 master device.
This really only silences a setsockopt failure since uses of mc_index are
secondary to sk_bound_dev_if if it is set. In both cases, if either index
is an L3 slave or master, lookups are directed to the same FIB table so
relaxing the check at setsockopt time causes no harm.
Patch is based on a suggested change by Darwin for a problem noted in
their code base.
Suggested-by: Darwin Dingel <darwin.dingel@alliedtelesis.co.nz> Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Felix Manlunas [Fri, 30 Dec 2016 01:04:47 +0000 (17:04 -0800)]
liquidio: optimize reads from Octeon PCI console
Reads from Octeon PCI console are inefficient because before each read
operation, a dynamic mapping to Octeon DRAM is set up. This patch replaces
the repeated setup of a dynamic mapping with a one-time setup of a static
mapping.
Signed-off-by: Felix Manlunas <felix.manlunas@cavium.com> Signed-off-by: Raghu Vatsavayi <raghu.vatsavayi@cavium.com> Signed-off-by: Derek Chickles <derek.chickles@cavium.com> Signed-off-by: Satanand Burla <satananda.burla@cavium.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Often, introducing side effects on packet processing on the other half
of the stack by adjusting one of TX/RX via sysctl is not desirable.
There are cases of demand for asymmetric, orthogonal configurability.
This holds true especially for nodes where RPS for RFS usage on top is
configured and therefore use the 'old dev_weight'. This is quite a
common base configuration setup nowadays, even with NICs of superior processing
support (e.g. aRFS).
A good example use case is nodes acting as NoSQL databases with a
large number of tiny requests and rather fewer but large packets as responses.
It's affordable to have large budget and rx dev_weights for the
requests. But as a side effect having this large a number on TX
processed in one run can overwhelm drivers.
This patch therefore introduces an independent configurability via sysctl to
userland.
Signed-off-by: Matthias Tafelmeier <matthias.tafelmeier@gmx.net> Signed-off-by: David S. Miller <davem@davemloft.net>
David S. Miller [Thu, 29 Dec 2016 19:37:25 +0000 (14:37 -0500)]
Merge branch 'bnxt_en-updates'
Michael Chan says:
====================
bnxt_en: updates for net-next.
This patch series for net-next contains cleanups, new features and minor
fixes. The driver specific busy polling code is removed to use busy
polling support in core networking. Hardware RFS support is enhanced with
added ipv6 flows support and VF support. A new scheme to allocate TX
rings from the firmware is implemented for newer chips and firmware. Plus
some misc. cleanups, minor fixes, and to add the maintainer entry. Please
review.
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:43 +0000 (12:13 -0500)]
bnxt_en: Handle no aggregation ring gracefully.
The current code assumes that we will always have at least 2 rx rings, 1
will be used as an aggregation ring for TPA and jumbo page placements.
However, it is possible, especially on a VF, that there is only 1 rx
ring available. In this scenario, the current code will fail to initialize.
To handle it, we need to properly set up only 1 ring without aggregation.
Set a new flag BNXT_FLAG_NO_AGG_RINGS for this condition and add logic to
set up the chip to place RX data linearly into a single buffer per packet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:42 +0000 (12:13 -0500)]
bnxt_en: Set default completion ring for async events.
With the added support for the bnxt_re RDMA driver, both drivers can be
allocating completion rings in any order. The firmware does not know
which completion ring should be receiving async events. Add an
extra step to tell firmware the completion ring number for receiving
async events after bnxt_en allocates the completion rings.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:41 +0000 (12:13 -0500)]
bnxt_en: Implement new scheme to reserve tx rings.
In order to properly support TX rate limiting in SRIOV VF functions or
NPAR functions, firmware needs better control over tx ring allocations.
The new scheme requires the driver to reserve the number of tx rings
and to query to see if the requested number of tx rings is reserved.
The driver will use the new scheme when the firmware interface spec is
1.6.1 or newer.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:38 +0000 (12:13 -0500)]
bnxt_en: Add new hardware RFS mode.
The existing hardware RFS mode uses one hardware RSS context block
per ring just to calculate the RSS hash. This is very wasteful and
prevents VF functions from using it. The new hardware mode shares
the same hardware RSS context for RSS placement and RFS steering.
This allows VFs to enable RFS.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:37 +0000 (12:13 -0500)]
bnxt_en: Refactor code that determines RFS capability.
Add function bnxt_rfs_supported() that determines if the chip supports
RFS. Refactor the existing function bnxt_rfs_capable() that determines
if run-time conditions support RFS.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:34 +0000 (12:13 -0500)]
bnxt_en: Fix and clarify link_info->advertising.
The advertising field is closely related to the auto_link_speeds field.
The former is the user setting while the latter is the firmware setting.
Both should be u16. We should use the advertising field in
bnxt_get_link_ksettings because the auto_link_speeds field may not
be updated with the latest from the firmware yet.
Signed-off-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Michael Chan [Thu, 29 Dec 2016 17:13:33 +0000 (12:13 -0500)]
bnxt_en: Improve the IRQ disable sequence during shutdown.
The IRQ is disabled by writing to the completion ring doorbell. This
should be done before the hardware completion ring is freed for correctness.
The current code disables IRQs after all the completion rings are freed.
Fix it by calling bnxt_disable_int_sync() before freeing the completion
rings. Rearrange the code to avoid forward declaration.
Signed-off-by: Michael Chan <michael.chan@broadocm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Dave Jones [Wed, 28 Dec 2016 16:53:18 +0000 (11:53 -0500)]
ipv6: remove unnecessary inet6_sk check
np is already assigned in the variable declaration of ping_v6_sendmsg.
At this point, we have already dereferenced np several times, so the
NULL check is also redundant.
Suggested-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: Dave Jones <davej@codemonkey.org.uk> Signed-off-by: David S. Miller <davem@davemloft.net>
jpinto [Wed, 28 Dec 2016 12:57:48 +0000 (12:57 +0000)]
stmmac: enable rx queues
When the hardware is synthesized with multiple queues, all queues are
disabled by default. This patch adds the rx queues configuration.
This patch was successfully tested in a Synopsys QoS Reference design.
Signed-off-by: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Joe Perches [Thu, 22 Dec 2016 03:54:53 +0000 (19:54 -0800)]
fddi: skfp: Use more common logging styles
Several macros use non-standard styles where format and arguments
are not verified. Convert these to a more typical fmt, ##__VA_ARGS__
use so format and arguments match as appropriate.
Miscellanea:
o Fix format and argument mismatches
o Realign and reindent misindented block
o Strip newlines from formats and add to macro defines
o Coalesce a few consecutive logging uses to more simple single uses
Signed-off-by: Joe Perches <joe@perches.com> Signed-off-by: David S. Miller <davem@davemloft.net>
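The "fmt, ##__VA_ARGS__" style mentioned above can be sketched roughly as follows. This is an illustration under assumptions, not the skfp macros themselves: log_sketch(), DB_SKETCH() and last_log_is() are hypothetical names, and the format attribute and ##__VA_ARGS__ are GCC/Clang extensions.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

static char log_buf[128];	/* holds the last message so it can be inspected */

/* The format attribute lets the compiler check fmt against the arguments. */
static int log_sketch(const char *fmt, ...)
	__attribute__((format(printf, 1, 2)));

static int log_sketch(const char *fmt, ...)
{
	va_list ap;
	int n;

	assert(fmt != NULL);
	va_start(ap, fmt);
	n = vsnprintf(log_buf, sizeof(log_buf), fmt, ap);
	va_end(ap);
	return n;
}

/* The newline lives in the macro, so call sites pass formats without "\n". */
#define DB_SKETCH(fmt, ...) log_sketch(fmt "\n", ##__VA_ARGS__)

/* Helper to compare the last logged message with an expected string. */
static int last_log_is(const char *expected)
{
	return strcmp(log_buf, expected) == 0;
}
```

With this shape, a call such as DB_SKETCH("ring %d of %d", 2, 8) is type-checked at compile time, and a mismatch like passing a pointer for %d produces a warning instead of passing silently.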
Gao Feng [Wed, 28 Dec 2016 08:47:42 +0000 (16:47 +0800)]
driver: ipvlan: Remove unnecessary ipvlan NULL check in ipvlan_count_rx
There are three functions which would invoke the ipvlan_count_rx. They
are ipvlan_process_multicast, ipvlan_rcv_frame, and ipvlan_nf_input.
The former two functions already use the ipvlan directly before
ipvlan_count_rx, and ipvlan_nf_input gets the ipvlan from
ipvl_addr->master, which cannot be NULL either.
So the ipvlan pointer check is unnecessary in ipvlan_count_rx.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Gao Feng [Wed, 28 Dec 2016 08:46:51 +0000 (16:46 +0800)]
driver: ipvlan: Define common functions to decrease duplicated codes used to add or del IP address
There are some duplicated codes in ipvlan_add_addr6/4 and
ipvlan_del_addr6/4. Now define two common functions ipvlan_add_addr
and ipvlan_del_addr to decrease the duplicated codes.
It could be helpful to maintain the codes.
Signed-off-by: Gao Feng <fgao@ikuai8.com> Signed-off-by: David S. Miller <davem@davemloft.net>
1) Various ipvlan fixes from Eric Dumazet and Mahesh Bandewar.
The most important is to not assume the packet is RX just because
the destination address matches that of the device. Such an
assumption causes problems when an interface is put into loopback
mode.
2) If we retry when creating a new tc entry (because we dropped the
RTNL mutex in order to load a module, for example) we end up with
-EAGAIN and then loop trying to replay the request. But we didn't
reset some state when looping back to the top like this, and if
another thread meanwhile inserted the same tc entry we were trying
to, we re-link it creating an endless loop in the tc chain. Fix from
Daniel Borkmann.
3) There are two different WRITE bits in the MDIO address register for
the stmmac chip, depending upon the chip variant. Due to a bug we
could set them both, fix from Hock Leong Kweh.
4) Fix mlx4 bug in XDP_TX handling, from Tariq Toukan.
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
r8169: add support for RTL8168 series add-on card.
net: xdp: remove unused bfp_warn_invalid_xdp_buffer()
openvswitch: upcall: Fix vlan handling.
ipv4: Namespaceify tcp_tw_reuse knob
net: korina: Fix NAPI versus resources freeing
net, sched: fix soft lockup in tc_classify
net/mlx4_en: Fix user prio field in XDP forward
tipc: don't send FIN message from connectionless socket
ipvlan: fix multicast processing
ipvlan: fix various issues in ipvlan_process_multicast()
net: stmmac: fix incorrect bit set in gmac4 mdio addr register
Fixing the gmac4 mdio write access to use MII_GMAC4_WRITE only instead of
OR together with MII_WRITE.
Signed-off-by: Kweh, Hock Leong <hock.leong.kweh@intel.com> Acked-By: Joao Pinto <jpinto@synopsys.com> Signed-off-by: David S. Miller <davem@davemloft.net>
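A small sketch of the idea, assuming placeholder bit positions: the two defines below are illustrative values chosen for this example and are not taken from the stmmac headers. What matters is that exactly one of the two write bits is selected per chip variant, never both.

```c
#include <assert.h>
#include <stdint.h>

/* Placeholder bit positions, for illustration only. */
#define MII_WRITE_SKETCH	(1u << 1)	/* write bit on older variants */
#define MII_GMAC4_WRITE_SKETCH	(1u << 2)	/* write bit on gmac4 */

/* Build the MDIO address register value for a write access. */
uint32_t mdio_write_cmd_sketch(int is_gmac4, uint32_t addr_bits)
{
	uint32_t both = MII_WRITE_SKETCH | MII_GMAC4_WRITE_SKETCH;
	uint32_t value = addr_bits;

	if (is_gmac4)
		value |= MII_GMAC4_WRITE_SKETCH;	/* gmac4: only its own bit */
	else
		value |= MII_WRITE_SKETCH;

	/* Never both bits, provided addr_bits sets neither of them. */
	assert((addr_bits & both) != 0 || (value & both) != both);
	return value;
}
```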
Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.r.fastabend@intel.com> Signed-off-by: Jason Wang <jasowang@redhat.com> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
pravin shelar [Mon, 26 Dec 2016 16:31:27 +0000 (08:31 -0800)]
openvswitch: upcall: Fix vlan handling.
Networking stack accelerates vlan tag handling by
keeping topmost vlan header in skb. This works as
long as packet remains in OVS datapath. But during
OVS upcall vlan header is pushed on to the packet.
When such packet is sent back to OVS datapath, core
networking stack might not handle it correctly. Following
patch avoids this issue by accelerating the vlan tag
during flow key extract. This simplifies datapath by
bringing uniform packet processing for packets from
all code paths.
Fixes: 5108bbaddc ("openvswitch: add processing of L3 packets"). CC: Jarno Rajahalme <jarno@ovn.org> CC: Jiri Benc <jbenc@redhat.com> Signed-off-by: Pravin B Shelar <pshelar@ovn.org> Signed-off-by: David S. Miller <davem@davemloft.net>
Haishuang Yan [Sun, 25 Dec 2016 06:33:16 +0000 (14:33 +0800)]
ipv4: Namespaceify tcp_tw_reuse knob
Different namespaces might have different requirements to reuse
TIME-WAIT sockets for new connections. This might be required in
cases where different namespace applications are in place which
require TIME_WAIT socket connections to be reduced independently
of the host.
Signed-off-by: Haishuang Yan <yanhaishuang@cmss.chinamobile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Florian Fainelli [Sat, 24 Dec 2016 03:56:56 +0000 (19:56 -0800)]
net: korina: Fix NAPI versus resources freeing
Commit beb0babfb77e ("korina: disable napi on close and restart")
introduced calls to napi_disable() that were missing before,
unfortunately this leaves a small window during which NAPI has a chance
to run, yet we just freed resources since korina_free_ring() has been
called:
Fix this by disabling NAPI first then freeing resource, and make sure
that we also cancel the restart task before doing the resource freeing.
Fixes: beb0babfb77e ("korina: disable napi on close and restart") Reported-by: Alexandros C. Couloumbis <alex@ozo.com> Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Daniel Borkmann [Wed, 21 Dec 2016 17:04:11 +0000 (18:04 +0100)]
net, sched: fix soft lockup in tc_classify
Shahar reported a soft lockup in tc_classify(), where we run into an
endless loop when walking the classifier chain due to tp->next == tp
which is a state we should never run into. The issue only seems to
trigger under load in the tc control path.
What happens is that in tc_ctl_tfilter(), thread A allocates a new
tp, initializes it, sets tp_created to 1, and calls into tp->ops->change()
with it. In that classifier callback we had to unlock/lock the rtnl
mutex and returned with -EAGAIN. One reason why we need to drop there
is, for example, that we need to request an action module to be loaded.
This happens via tcf_exts_validate() -> tcf_action_init/_1() meaning
after we loaded and found the requested action, we need to redo the
whole request so we don't race against others. While we had to unlock
rtnl in that time, thread B's request was processed next on that CPU.
Thread B added a new tp instance successfully to the classifier chain.
When thread A returned grabbing the rtnl mutex again, propagating -EAGAIN
and destroying its tp instance which never got linked, we goto replay
and redo A's request.
This time when walking the classifier chain in tc_ctl_tfilter() for
checking for existing tp instances we had a priority match and found
the tp instance that was created and linked by thread B. Now calling
again into tp->ops->change() with that tp was successful and returned
without error.
tp_created was never cleared in the second round, thus kernel thinks
that we need to link it into the classifier chain (once again). tp and
*back point to the same object due to the match we had earlier on. Thus
for thread B's already public tp, we reset tp->next to tp itself and
link it into the chain, which eventually causes the mentioned endless
loop in tc_classify() once a packet hits the data path.
Fix is to clear tp_created at the beginning of each request, also when
we replay it. On the paths that can cause -EAGAIN we already destroy
the original tp instance we had and on replay we really need to start
from scratch. It seems that this issue was first introduced in commit 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining
and avoid kernel panic when we use cls_cgroup").
Fixes: 12186be7d2e1 ("net_cls: fix unconfigured struct tcf_proto keeps chaining and avoid kernel panic when we use cls_cgroup") Reported-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Cong Wang <xiyou.wangcong@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Tested-by: Shahar Klein <shahark@mellanox.com> Signed-off-by: David S. Miller <davem@davemloft.net>
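The replay race described above can be modelled in a few lines of single-threaded C. This is a heavily simplified sketch, not tc_ctl_tfilter(): struct tp_sketch, add_filter_sketch() and the racer argument (standing in for thread B's node, linked while the lock is dropped) are hypothetical, and nodes are linked at the chain head only.

```c
#include <assert.h>
#include <stddef.h>

struct tp_sketch {
	int prio;
	struct tp_sketch *next;
};

/*
 * Simulate thread A's request. On the first attempt the "lock" is dropped,
 * thread B links `racer`, and A replays. `clear_on_replay` selects the fixed
 * behaviour (1) or the old behaviour (0).
 */
int add_filter_sketch(struct tp_sketch **chain, struct tp_sketch *fresh,
		      struct tp_sketch *racer, int clear_on_replay)
{
	struct tp_sketch *tp;
	int tp_created = 0;
	int first_try = 1;

	assert(chain && fresh && racer);

replay:
	if (clear_on_replay)
		tp_created = 0;		/* the fix: start each round from scratch */

	/* Look for an existing node with the same priority. */
	for (tp = *chain; tp; tp = tp->next)
		if (tp->prio == fresh->prio)
			break;

	if (!tp) {
		tp = fresh;		/* nothing found: use a new, unlinked node */
		tp->next = NULL;
		tp_created = 1;
	}

	if (first_try) {
		/* change() returned -EAGAIN; meanwhile thread B linked its node. */
		first_try = 0;
		racer->next = *chain;
		*chain = racer;
		goto replay;		/* A's unlinked node is discarded */
	}

	if (tp_created) {		/* stale flag relinks B's node to itself */
		tp->next = *chain;
		*chain = tp;
	}
	return 0;
}
```

With clear_on_replay set to 0 and both nodes at the same priority, the second round finds thread B's node, still believes it was freshly created, and sets its next pointer to itself, which is the tp->next == tp state behind the soft lockup.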
Larry Finger [Fri, 23 Dec 2016 03:06:53 +0000 (21:06 -0600)]
powerpc: Fix build warning on 32-bit PPC
I am getting the following warning when I build kernel 4.9-git on my
PowerBook G4 with a 32-bit PPC processor:
AS arch/powerpc/kernel/misc_32.o
arch/powerpc/kernel/misc_32.S:299:7: warning: "CONFIG_FSL_BOOKE" is not defined [-Wundef]
This problem is evident after commit 989cea5c14be ("kbuild: prevent
lib-ksyms.o rebuilds"); however, this change in kbuild only exposes an
error that has been in the code since 2005 when this source file was
created. That was with commit 9994a33865f4 ("powerpc: Introduce
entry_{32,64}.S, misc_{32,64}.S, systbl.S").
The offending line does not make a lot of sense. This error does not
seem to cause any errors in the executable, thus I am not recommending
that it be applied to any stable versions.
Thanks to Nicholas Piggin for suggesting this solution.
Fixes: 9994a33865f4 ("powerpc: Introduce entry_{32,64}.S, misc_{32,64}.S, systbl.S") Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Linus Torvalds [Sun, 25 Dec 2016 22:56:58 +0000 (14:56 -0800)]
avoid spurious "may be used uninitialized" warning
The timer type simplifications caused a new gcc warning:
drivers/base/power/domain.c: In function ‘genpd_runtime_suspend’:
drivers/base/power/domain.c:562:14: warning: ‘time_start’ may be used uninitialized in this function [-Wmaybe-uninitialized]
elapsed_ns = ktime_to_ns(ktime_sub(ktime_get(), time_start));
despite the actual use of "time_start" not having changed in any way.
It appears that simply changing the type of ktime_t from a union to a
plain scalar type made gcc check the use.
The variable wasn't actually used uninitialized, but gcc apparently
failed to notice that the conditional around the use was exactly the
same as the conditional around the initialization of that variable.
Add an unnecessary initialization just to shut up the compiler.
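A reduced sketch of the pattern, assuming a fake clock in place of ktime_get(); suspend_sketch() and the other names are hypothetical and this is not the genpd code. The initialization and the use sit under the same condition, so the variable is never read uninitialized, yet a compiler may fail to prove that.

```c
#include <stdint.h>

typedef int64_t ktime_sketch_t;		/* scalar time type, as after the cleanup */

static ktime_sketch_t fake_clock;	/* stand-in for a real clock source */

static ktime_sketch_t now_sketch(void)
{
	fake_clock += 5;		/* each reading advances by 5 ns */
	return fake_clock;
}

int64_t suspend_sketch(int measure)
{
	ktime_sketch_t time_start = 0;	/* not needed for correctness, only to silence the warning */
	int64_t elapsed_ns = 0;

	if (measure)
		time_start = now_sketch();

	/* ... the work being timed would go here ... */

	if (measure)			/* same condition as the initialization above */
		elapsed_ns = now_sketch() - time_start;

	return elapsed_ns;
}
```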
Linus Torvalds [Sun, 25 Dec 2016 22:30:04 +0000 (14:30 -0800)]
Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull timer type cleanups from Thomas Gleixner:
"This series does a tree wide cleanup of types related to
timers/timekeeping.
- Get rid of cycles_t and use a plain u64. The type is not really
helpful and caused more confusion than clarity
- Get rid of the ktime union. The union has become useless as we use
the scalar nanoseconds storage unconditionally now. The 32bit
timespec alike storage got removed due to the Y2038 limitations
some time ago.
That leaves the odd union access around for no reason. Clean it up.
Both changes have been done with coccinelle and a small amount of
manual mopping up"
* 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
ktime: Get rid of ktime_equal()
ktime: Cleanup ktime_set() usage
ktime: Get rid of the union
clocksource: Use a plain u64 instead of cycle_t
Linus Torvalds [Sun, 25 Dec 2016 22:05:56 +0000 (14:05 -0800)]
Merge branch 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull SMP hotplug notifier removal from Thomas Gleixner:
"This is the final cleanup of the hotplug notifier infrastructure. The
series has been reintegrated in the last two days because there came a
new driver using the old infrastructure via the SCSI tree.
Summary:
- convert the last leftover drivers utilizing notifiers
- fixup for a completely broken hotplug user
- prevent setup of already used states
- removal of the notifiers
- treewide cleanup of hotplug state names
- consolidation of state space
There is a sphinx based documentation pending, but that needs review
from the documentation folks"
* 'smp-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
irqchip/armada-xp: Consolidate hotplug state space
irqchip/gic: Consolidate hotplug state space
coresight/etm3/4x: Consolidate hotplug state space
cpu/hotplug: Cleanup state names
cpu/hotplug: Remove obsolete cpu hotplug register/unregister functions
staging/lustre/libcfs: Convert to hotplug state machine
scsi/bnx2i: Convert to hotplug state machine
scsi/bnx2fc: Convert to hotplug state machine
cpu/hotplug: Prevent overwriting of callbacks
x86/msr: Remove bogus cleanup from the error path
bus: arm-ccn: Prevent hotplug callback leak
perf/x86/intel/cstate: Prevent hotplug callback leak
ARM/imx/mmcd: Fix broken cpu hotplug handling
scsi: qedi: Convert to hotplug state machine
Linus Torvalds [Sun, 25 Dec 2016 22:01:28 +0000 (14:01 -0800)]
Merge branch 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux
Pull turbostat updates from Len Brown.
* 'turbostat' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
tools/power turbostat: remove obsolete -M, -m, -C, -c options
tools/power turbostat: Make extensible via the --add parameter
tools/power turbostat: Denverton uses a 25 MHz crystal, not 19.2 MHz
tools/power turbostat: line up headers when -M is used
tools/power turbostat: fix SKX PKG_CSTATE_LIMIT decoding
tools/power turbostat: Support Knights Mill (KNM)
tools/power turbostat: Display HWP OOB status
tools/power turbostat: fix Denverton BCLK
tools/power turbostat: use intel-family.h model strings
tools/power/turbostat: Add Denverton RAPL support
tools/power/turbostat: Add Denverton support
tools/power/turbostat: split core MSR support into status + limit
tools/power turbostat: fix error case overflow read of slm_freq_table[]
tools/power turbostat: Allocate correct amount of fd and irq entries
tools/power turbostat: switch to tab delimited output
tools/power turbostat: Gracefully handle ACPI S3
tools/power turbostat: tidy up output on Joule counter overflow
Nicholas Piggin [Sun, 25 Dec 2016 03:00:30 +0000 (13:00 +1000)]
mm: add PageWaiters indicating tasks are waiting for a page bit
Add a new page flag, PageWaiters, to indicate the page waitqueue has
tasks waiting. This can be tested rather than testing waitqueue_active
which requires another cacheline load.
This bit is always set when the page has tasks on page_waitqueue(page),
and is set and cleared under the waitqueue lock. It may be set when
there are no tasks on the waitqueue, which will cause a harmless extra
wakeup check that will clear the bit.
The generic bit-waitqueue infrastructure is no longer used for pages.
Instead, waitqueues are used directly with a custom key type. The
generic code was not flexible enough to have PageWaiters manipulation
under the waitqueue lock (which simplifies concurrency).
This improves the performance of page lock intensive microbenchmarks by
2-3%.
Putting two bits in the same word opens the opportunity to remove the
memory barrier between clearing the lock bit and testing the waiters
bit, after some work on the arch primitives (e.g., ensuring memory
operand widths match and cover both bits).
Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Bob Peterson <rpeterso@redhat.com> Cc: Steven Whitehouse <swhiteho@redhat.com> Cc: Andrew Lutomirski <luto@kernel.org> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Mel Gorman <mgorman@techsingularity.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Thomas Gleixner [Sun, 25 Dec 2016 11:30:41 +0000 (12:30 +0100)]
ktime: Cleanup ktime_set() usage
ktime_set(S,N) was required for the timespec storage type and is still
useful for situations where a Seconds and Nanoseconds part of a time value
needs to be converted. For anything where the Seconds argument is 0, this
is pointless and can be replaced with a simple assignment.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
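To illustrate why the zero-seconds case is pointless once the type is a scalar, here is a simplified sketch. ktime_sketch_t and ktime_set_sketch() are hypothetical stand-ins, and any overflow handling the real helper may perform is deliberately left out.

```c
#include <stdint.h>

typedef int64_t ktime_sketch_t;		/* scalar nanoseconds */

#define NSEC_PER_SEC_SKETCH 1000000000LL

/* Combine a seconds part and a nanoseconds part into scalar nanoseconds. */
static inline ktime_sketch_t ktime_set_sketch(int64_t secs, unsigned long nsecs)
{
	return secs * NSEC_PER_SEC_SKETCH + (int64_t)nsecs;
}

/* With secs == 0 the helper adds nothing over a plain assignment. */
static inline ktime_sketch_t timeout_direct_sketch(unsigned long nsecs)
{
	ktime_sketch_t t = (ktime_sketch_t)nsecs;

	return t;
}
```

The helper remains useful when a real seconds part is involved, for example ktime_set_sketch(2, 500).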
Thomas Gleixner [Sun, 25 Dec 2016 10:38:40 +0000 (11:38 +0100)]
ktime: Get rid of the union
ktime is a union because the initial implementation stored the time in
scalar nanoseconds on 64 bit machines and in an endianness optimized timespec
variant for 32bit machines. The Y2038 cleanup removed the timespec variant
and switched everything to scalar nanoseconds. The union remained, but
became completely pointless.
Get rid of the union and just keep ktime_t as simple typedef of type s64.
The conversion was done with coccinelle and some manual mopping up.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org>
Thomas Gleixner [Wed, 21 Dec 2016 19:19:57 +0000 (20:19 +0100)]
irqchip/armada-xp: Consolidate hotplug state space
The mpic is either the main interrupt controller or is cascaded behind a
GIC. The mpic is single instance and the modes are mutually exclusive, so
there is no reason to have separate cpu hotplug states.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Petazzoni <thomas.petazzoni@free-electrons.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/20161221192112.333161745@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 21 Dec 2016 19:19:56 +0000 (20:19 +0100)]
irqchip/gic: Consolidate hotplug state space
Even if both drivers are compiled in only one instance can run on a given
system depending on the available GIC version.
So having separate hotplug states for them is pointless.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Marc Zyngier <marc.zyngier@arm.com> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.252416267@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 21 Dec 2016 19:19:54 +0000 (20:19 +0100)]
cpu/hotplug: Cleanup state names
When the state names got added a script was used to add the extra argument
to the calls. The script basically converted the state constant to a
string, but the cleanup to convert these strings into meaningful ones did
not happen.
Replace all the useless strings with 'subsys/xxx/yyy:state' strings which
are used in all the other places already.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192112.085444152@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Remove also the now pointless cpu notifier error injection mechanism. The
states can be executed step by step and error rollback is the same as cpu
down, so any state transition can be tested w/o requiring the notifier
error injection.
Some CPU hotplug states are kept as they are (ab)used for hotplug state
tracking.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: rt@linutronix.de Link: http://lkml.kernel.org/r/20161221192112.005642358@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Thomas Gleixner [Wed, 21 Dec 2016 19:19:49 +0000 (20:19 +0100)]
cpu/hotplug: Prevent overwriting of callbacks
Developers manage to overwrite states blindly without thought. That's fatal
and hard to debug. Add sanity checks to make it fail.
This requires restructuring the code so that the dynamic state allocation
happens in the same lock protected section as the actual store. Otherwise
the previous assignment of 'Reserved' to the name field would trigger the
overwrite check.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Link: http://lkml.kernel.org/r/20161221192111.675234535@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
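The overwrite check described above can be illustrated with a small userland model. This is only a sketch and not the kernel implementation: the names model_step, model_store() and model_setup_dynamic() are hypothetical, the table size is arbitrary, and locking is left out (in the kernel, per the commit text, allocation and store happen in the same lock protected section). The point it shows is that the free-slot search and the store happen in one step, so no placeholder name has to be written first that would later trip the check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userland model of a hotplug state table; not kernel code. */
enum { MODEL_NR_STATES = 8, MODEL_DYN_START = 4 };

struct model_step {
	const char *name;                 /* NULL means the slot is free */
	int (*startup)(unsigned int cpu); /* callback, may be NULL in this model */
};

static struct model_step model_states[MODEL_NR_STATES];

/*
 * Store a callback in a slot. Installing (name != NULL) into an occupied
 * slot is refused; passing name == NULL clears the slot again.
 */
static int model_store(int state, const char *name,
		       int (*startup)(unsigned int cpu))
{
	assert(state >= 0 && state < MODEL_NR_STATES);

	if (name && model_states[state].name)
		return -EBUSY;

	model_states[state].name = name;
	model_states[state].startup = startup;
	return 0;
}

/*
 * Pick the first free dynamic slot and store into it right away, so the
 * slot is never marked with a placeholder before the real store.
 * Returns the slot index on success or a negative errno value.
 */
static int model_setup_dynamic(const char *name,
			       int (*startup)(unsigned int cpu))
{
	for (int i = MODEL_DYN_START; i < MODEL_NR_STATES; i++) {
		if (!model_states[i].name) {
			int ret = model_store(i, name, startup);

			return ret ? ret : i;
		}
	}
	return -ENOSPC;
}
```

In this model a second model_store() on an occupied slot returns -EBUSY instead of silently replacing the callback, which is the failure mode the commit wants to make visible.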
Thomas Gleixner [Thu, 22 Dec 2016 09:32:38 +0000 (10:32 +0100)]
x86/msr: Remove bogus cleanup from the error path
The error cleanup which is invoked when the hotplug state setup failed
tries to remove the failed state, which is broken.
Fixes: 8fba38c937cd ("x86/msr: Convert to hotplug state machine") Reported-by: kernel test robot <fengguang.wu@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de>
Thomas Gleixner [Thu, 22 Dec 2016 10:14:06 +0000 (11:14 +0100)]
bus: arm-ccn: Prevent hotplug callback leak
In case the driver registration fails, the hotplug callback is leaked.
Not fatal, because it's never invoked as there are no instances registered,
but wrong nevertheless.
Fixes: fdc15a36d84e ("bus/arm-ccn: Convert to hotplug statemachine") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Pawel Moll <pawel.moll@arm.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will.deacon@arm.com>
If the pmu registration fails the registered hotplug callbacks are not
removed. Wrong in any case, but fatal in case of a modular driver.
Replace the nonsensical state names with proper ones while at it.
Fixes: 77c34ef1c319 ("perf/x86/intel/cstate: Convert Intel CSTATE to hotplug state machine") Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org
Thomas Gleixner [Wed, 21 Dec 2016 19:19:48 +0000 (20:19 +0100)]
ARM/imx/mmcd: Fix broken cpu hotplug handling
The cpu hotplug support of this perf driver is broken in several ways:
1) It adds an instance before setting up the state.
2) The state for the instance is different from the state of the
callback. It's just a randomly chosen state.
3) The instance registration is not error checked so nobody noticed that
the call can never succeed.
4) The state for the multi install callbacks is chosen randomly and
overwrites existing state. This is now prevented by the core code so the
call is guaranteed to fail.
5) The error exit path in the init function leaves the instance registered
and then frees the memory which contains the enqueued hlist node.
6) The remove function is removing the state and not the instance.
Fix it by:
- Setting up the state before adding instances. Use a dynamically allocated
state for it.
- Installing instances after the state has been set up
- Removing the instance in the error path before freeing memory
- Removing the instance not the state in the driver remove callback
While at it, use raw_cpu_processor_id(), because cpu_processor_id() cannot
be used in preemptible context, and set the driver data after successful
registration of the pmu.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Shawn Guo <shawnguo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sebastian Siewior <bigeasy@linutronix.de> Cc: Frank Li <frank.li@nxp.com> Cc: Zhengyu Shen <zhengyu.shen@nxp.com> Link: http://lkml.kernel.org/r/20161221192111.596204211@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
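The ordering listed under "Fix it by" can be sketched with a userland model. Everything here is hypothetical and simplified (model_state, model_pmu, model_probe() and the fail_register switch are illustration names, not the driver's code or the kernel's hotplug API); it only shows the sequence: state first, instance second, instance removed before the memory holding its list node is freed, and the remove path dropping the instance rather than the state.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical model: one multi-instance state with a singly linked list. */
struct model_node {
	struct model_node *next;
};

struct model_state {
	int installed;                /* set once the state has been set up */
	struct model_node *instances; /* instances enqueued on this state */
};

struct model_pmu {
	struct model_node node;       /* list node lives inside the object */
	int registered;
};

static int model_state_setup(struct model_state *st)
{
	if (st->installed)
		return -EBUSY;
	st->installed = 1;
	st->instances = NULL;
	return 0;
}

/* Adding an instance is only valid after the state has been set up. */
static int model_add_instance(struct model_state *st, struct model_node *n)
{
	if (!st->installed)
		return -EINVAL;
	n->next = st->instances;
	st->instances = n;
	return 0;
}

static void model_remove_instance(struct model_state *st, struct model_node *n)
{
	assert(st->installed);
	for (struct model_node **pp = &st->instances; *pp; pp = &(*pp)->next) {
		if (*pp == n) {
			*pp = n->next;
			n->next = NULL;
			return;
		}
	}
}

/* Probe: add the instance, "register" the pmu, unwind in reverse on error. */
static int model_probe(struct model_state *st, int fail_register,
		       struct model_pmu **out)
{
	struct model_pmu *pmu = calloc(1, sizeof(*pmu));
	int ret;

	if (!pmu)
		return -ENOMEM;

	ret = model_add_instance(st, &pmu->node); /* error checked */
	if (ret)
		goto free_pmu;

	if (fail_register) {                      /* simulated failure */
		ret = -EIO;
		goto remove_instance;
	}

	pmu->registered = 1;
	*out = pmu;         /* publish only after successful registration */
	return 0;

remove_instance:
	model_remove_instance(st, &pmu->node);    /* before freeing the node */
free_pmu:
	free(pmu);
	return ret;
}

/* Driver remove: drop the instance, leave the state itself installed. */
static void model_remove(struct model_state *st, struct model_pmu *pmu)
{
	model_remove_instance(st, &pmu->node);
	free(pmu);
}
```

A probe attempted before model_state_setup() fails with -EINVAL in this model, which mirrors point 3 above: once the return value is checked, the wrong ordering can no longer go unnoticed.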
Thomas Gleixner [Sat, 24 Dec 2016 11:34:02 +0000 (12:34 +0100)]
scsi: qedi: Convert to hotplug state machine
The CPU hotplug code is a trainwreck. It leaks a notifier in case of driver
registration error and the per cpu loop is racy against cpu hotplug. Aside
from that the driver should have been written and merged with the new state
machine interfaces in the first place.
Mop up the mess and convert it to the hotplug state machine.
Signed-off-by: Thomas Grumpy Gleixner <tglx@linutronix.de> Cc: Nilesh Javali <nilesh.javali@cavium.com> Cc: Adheer Chandravanshi <adheer.chandravanshi@qlogic.com> Cc: Chad Dupuis <chad.dupuis@cavium.com> Cc: Saurav Kashyap <saurav.kashyap@cavium.com> Cc: Arun Easi <arun.easi@cavium.com> Cc: Manish Rangankar <manish.rangankar@cavium.com> Cc: Johannes Thumshirn <jthumshirn@suse.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Martin K. Petersen <martin.petersen@oracle.com> Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Linus Torvalds [Sat, 24 Dec 2016 19:37:18 +0000 (11:37 -0800)]
Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6
Pull cifs fixes from Steve French:
"This includes various cifs/smb3 bug fixes, mostly for stable as well.
In the next week I expect that Germano will have some reconnection
fixes, and also I expect to have the remaining pieces of the snapshot
enablement and SMB3 ACLs, but wanted to get this set of bug fixes in"
* 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
cifs_get_root shouldn't use path with tree name
Fix default behaviour for empty domains and add domainauto option
cifs: use %16phN for formatting md5 sum
cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack
CIFS: Fix a possible double locking of mutex during reconnect
CIFS: Fix a possible memory corruption during reconnect
CIFS: Fix a possible memory corruption in push locks
CIFS: Fix missing nls unload in smb2_reconnect()
CIFS: Decrease verbosity of ioctl call
SMB3: parsing for new snapshot timestamp mount parm
Linus Torvalds [Sat, 24 Dec 2016 19:27:45 +0000 (11:27 -0800)]
Merge tag 'watchdog-for-linus-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging
Pull watchdog updates from Wim Van Sebroeck and Guenter Roeck:
- new driver for Loongson1 SoC
- minor cleanup and fixes in various drivers
* tag 'watchdog-for-linus-v4.10' of git://git.kernel.org/pub/scm/linux/kernel/git/groeck/linux-staging:
watchdog: it87_wdt: add IT8620E ID
watchdog: mpc8xxx: Remove unneeded linux/miscdevice.h include
watchdog: octeon: Remove unneeded linux/miscdevice.h include
watchdog: bcm2835_wdt: set WDOG_HW_RUNNING bit when appropriate
watchdog: loongson1: Add Loongson1 SoC watchdog driver
watchdog: cpwd: remove memory allocate failure message
watchdog: da9062/61: watchdog driver
intel-mid_wdt: Error code is just an integer
intel-mid_wdt: make sure watchdog is not running at startup
watchdog: mei_wdt: request stop on reboot to prevent false positive event
watchdog: hpwdt: changed maintainer information
watchdog: jz4740: Fix modular build
watchdog: qcom: fix kernel panic due to external abort on non-linefetch
watchdog: davinci: add support for deferred probing
watchdog: meson: Remove unneeded platform MODULE_ALIAS
watchdog: Standardize leading tabs and spaces in Kconfig file
watchdog: max77620_wdt: fix module autoload
watchdog: bcm7038_wdt: fix module autoload
Linus Torvalds [Sat, 24 Dec 2016 19:23:24 +0000 (11:23 -0800)]
Merge tag 'ntb-4.10' of git://github.com/jonmason/ntb
Pull NTB update from Jon Mason:
- NTB bug fixes for removing an unnecessary call to ntb_peer_spad_read,
and correcting a free_irq inconsistency
- add Intel SKX support
- change the AMD NTB maintainer, and fix some bugs present there
* tag 'ntb-4.10' of git://github.com/jonmason/ntb:
ntb_transport: Remove unnecessary call to ntb_peer_spad_read
NTB: Fix 'request_irq()' and 'free_irq()' inconsistancy
ntb: fix SKX NTB config space size register offsets
NTB: correct ntb_peer_spad_read for case when callback is not supplied.
MAINTAINERS: Change in maintainer for AMD NTB
ntb_transport: Limit memory windows based on available, scratchpads
NTB: Register and offset values fix for memory window
NTB: add support for hotplug feature
ntb: Adding Skylake Xeon NTB support
Linus Torvalds [Sat, 24 Dec 2016 00:54:46 +0000 (16:54 -0800)]
Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
"There's a number of fixes:
- a round of fixes for CPUID-less legacy CPUs
- a number of microcode loader fixes
- i8042 detection robustization fixes
- stack dump/unwinder fixes
- x86 SoC platform driver fixes
- a GCC 7 warning fix
- virtualization related fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits)
Revert "x86/unwind: Detect bad stack return address"
x86/paravirt: Mark unused patch_default label
x86/microcode/AMD: Reload proper initrd start address
x86/platform/intel/quark: Add printf attribute to imr_self_test_result()
x86/platform/intel-mid: Switch MPU3050 driver to IIO
x86/alternatives: Do not use sync_core() to serialize I$
x86/topology: Document cpu_llc_id
x86/hyperv: Handle unknown NMIs on one CPU when unknown_nmi_panic
x86/asm: Rewrite sync_core() to use IRET-to-self
x86/microcode/intel: Replace sync_core() with native_cpuid()
Revert "x86/boot: Fail the boot if !M486 and CPUID is missing"
x86/asm/32: Make sync_core() handle missing CPUID on all 32-bit kernels
x86/cpu: Probe CPUID leaf 6 even when cpuid_level == 6
x86/tools: Fix gcc-7 warning in relocs.c
x86/unwind: Dump stack data on warnings
x86/unwind: Adjust last frame check for aligned function stacks
x86/init: Fix a couple of comment typos
x86/init: Remove i8042_detect() from platform ops
Input: i8042 - Trust firmware a bit more when probing on X86
x86/init: Add i8042 state to the platform data
...
Linus Torvalds [Sat, 24 Dec 2016 00:49:12 +0000 (16:49 -0800)]
Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf fixes from Ingo Molnar:
"On the kernel side there's two x86 PMU driver fixes and a uprobes fix,
plus on the tooling side there's a number of fixes and some late
updates"
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (36 commits)
perf sched timehist: Fix invalid period calculation
perf sched timehist: Remove hardcoded 'comm_width' check at print_summary
perf sched timehist: Enlarge default 'comm_width'
perf sched timehist: Honour 'comm_width' when aligning the headers
perf/x86: Fix overlap counter scheduling bug
perf/x86/pebs: Fix handling of PEBS buffer overflows
samples/bpf: Move open_raw_sock to separate header
samples/bpf: Remove perf_event_open() declaration
samples/bpf: Be consistent with bpf_load_program bpf_insn parameter
tools lib bpf: Add bpf_prog_{attach,detach}
samples/bpf: Switch over to libbpf
perf diff: Do not overwrite valid build id
perf annotate: Don't throw error for zero length symbols
perf bench futex: Fix lock-pi help string
perf trace: Check if MAP_32BIT is defined (again)
samples/bpf: Make perf_event_read() static
uprobes: Fix uprobes on MIPS, allow for a cache flush after ixol breakpoint creation
samples/bpf: Make samples more libbpf-centric
tools lib bpf: Add flags to bpf_create_map()
tools lib bpf: use __u32 from linux/types.h
...
Tariq Toukan [Thu, 22 Dec 2016 12:32:58 +0000 (14:32 +0200)]
net/mlx4_en: Fix user prio field in XDP forward
The user prio field is wrong (and overflows) in the XDP forward
flow.
This is a result of a bad value for num_tx_rings_p_up, which should
account for all XDP TX rings, as they operate for the same user prio.
Signed-off-by: Tariq Toukan <tariqt@mellanox.com> Reported-by: Martin KaFai Lau <kafai@fb.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Jon Paul Maloy [Thu, 22 Dec 2016 12:22:29 +0000 (07:22 -0500)]
tipc: don't send FIN message from connectionless socket
In commit 6f00089c7372 ("tipc: remove SS_DISCONNECTING state") the
check for socket type is in the wrong place, causing a closing socket
to always send out a FIN message even when the socket was never
connected. This is normally harmless, since the destination node for
such messages most often is zero, and the message will be dropped, but
it is still a wrong and confusing behavior.
We fix this in this commit.
Reviewed-by: Parthasarathy Bhuvaragan <parthasarathy.bhuvaragan@ericsson.com> Signed-off-by: Jon Maloy <jon.maloy@ericsson.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Mahesh Bandewar [Thu, 22 Dec 2016 01:30:16 +0000 (17:30 -0800)]
ipvlan: fix multicast processing
In an IPvlan setup, a crash occurs when the master is set in loopback mode, e.g.
ethtool -K eth0 set loopback on
where eth0 is the master device for the IPvlan setup.
The failure is caused by the faulty logic that determines if the
packet is from TX-path vs. RX-path by just looking at the mac-
addresses on the packet while processing multicast packets.
In the loopback-mode where this crash was happening, the packets
that are sent out are reflected by the NIC and are processed on
the RX path, but the mac-address check tricks the code into thinking this
packet is from the TX path, so it wrongly uses dev_forward_skb() to pass
packets to the slave (virtual) devices.
This patch records the path while queueing packets and eliminates
the logic of looking at mac-addresses for the same decision.
Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Mahesh Bandewar <maheshb@google.com> CC: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
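The difference between guessing the path from MAC addresses and recording it at queueing time can be sketched in a userland model. This is an illustration under assumptions, not the ipvlan code: model_pkt, model_enqueue() and the exact form of the old heuristic (comparing the source MAC against the local device address) are hypothetical, since the commit text only says the decision was made by looking at the mac-addresses on the packet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_ETH_ALEN 6

/* Hypothetical packet model; the real code works on struct sk_buff. */
struct model_pkt {
	unsigned char src_mac[MODEL_ETH_ALEN];
	bool tx_pkt; /* path recorded by the caller when the packet is queued */
};

/* New approach: the enqueueing path knows where the packet came from. */
static void model_enqueue(struct model_pkt *pkt, bool from_tx_path)
{
	assert(pkt != NULL);
	pkt->tx_pkt = from_tx_path;
}

/*
 * Assumed form of the old heuristic: a packet whose source MAC equals the
 * local device address is taken to come from the TX path.
 */
static bool model_guess_tx_by_mac(const struct model_pkt *pkt,
				  const unsigned char *dev_mac)
{
	return memcmp(pkt->src_mac, dev_mac, MODEL_ETH_ALEN) == 0;
}

/* Decision after the fix in this model: trust the recorded flag. */
static bool model_is_tx(const struct model_pkt *pkt)
{
	return pkt->tx_pkt;
}
```

For a packet that the NIC reflects in loopback mode, the source MAC is the local one although the packet is processed on the RX path, so the heuristic answers "TX" while the recorded flag still says "RX".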
Eric Dumazet [Thu, 22 Dec 2016 02:00:24 +0000 (18:00 -0800)]
ipvlan: fix various issues in ipvlan_process_multicast()
1) netif_rx() / dev_forward_skb() should not be called from process
context.
2) ipvlan_count_rx() should be called with preemption disabled.
3) We should check if ipvlan->dev is up before feeding packets
to netif_rx()
4) We need to prevent device from disappearing if some packets
are in the multicast backlog.
5) One kfree_skb() should be a consume_skb() eventually
Fixes: ba35f8588f47 ("ipvlan: Defer multicast / broadcast processing to a work-queue") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Mahesh Bandewar <maheshb@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
Steve Wahl [Wed, 21 Dec 2016 16:45:22 +0000 (11:45 -0500)]
ntb_transport: Remove unnecessary call to ntb_peer_spad_read
The results were previously ignored, anyway.
Signed-off-by: Steve Wahl <Steve.Wahl@dell.com> Fixes: e26a5843f7f5014ae4460030ca4de029a3ac35d3 Acked-by: Allen Hubbe <Allen.Hubbe@dell.com> Signed-off-by: Jon Mason <jdmason@kudzu.us>