]> git.proxmox.com Git - mirror_ubuntu-jammy-kernel.git/log
mirror_ubuntu-jammy-kernel.git
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next
David S. Miller [Wed, 17 Feb 2021 21:19:24 +0000 (13:19 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/pablo/nf-next

Pablo Neira Ayuso says:

====================
Netfilter updates for net-next

The following patchset contains Netfilter updates for net-next:

1) Add two helper functions to release one table and hooks from
   the netns and netlink event path.

2) Add table ownership infrastructure, this new infrastructure allows
   users to bind a table (and its content) to a process through the
   netlink socket.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mdio: Remove of_phy_attach()
Florian Fainelli [Wed, 17 Feb 2021 20:25:57 +0000 (12:25 -0800)]
net: mdio: Remove of_phy_attach()

We have no in-tree users, also update the sfp-phylink.rst documentation
to indicate that phy_attach_direct() is used instead of of_phy_attach().

Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: select PACKING in the Kconfig
Vladimir Oltean [Wed, 17 Feb 2021 20:33:48 +0000 (22:33 +0200)]
net: mscc: ocelot: select PACKING in the Kconfig

Ocelot now uses include/linux/dsa/ocelot.h which makes use of
CONFIG_PACKING to pack/unpack bits into the Injection/Extraction Frame
Headers. So it needs to explicitly select it, otherwise there might be
build errors due to the missing dependency.

Fixes: 40d3f295b5fe ("net: mscc: ocelot: use common tag parsing code with DSA")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoALSA: hda: intel-dsp-config: add Alder Lake support
Kai Vehmanen [Wed, 10 Feb 2021 11:13:10 +0000 (13:13 +0200)]
ALSA: hda: intel-dsp-config: add Alder Lake support

Add rules to select SOF driver for Alder Lake systems if a digital
microphone or SoundWire codecs are present in the system. This is
following same rules as for older Tiger Lake systems.

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Reviewed-by: Xiuli Pan <xiuli.pan@intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210210111310.2227417-1-kai.vehmanen@linux.intel.com
Signed-off-by: Takashi Iwai <tiwai@suse.de>
3 years agoMerge tag 'asoc-v5.12' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie...
Takashi Iwai [Wed, 17 Feb 2021 20:16:27 +0000 (21:16 +0100)]
Merge tag 'asoc-v5.12' of https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus

ASoC: Updates for v5.12

Another quiet release in terms of features, though several of the
drivers got quite a bit of work and there were a lot of general changes
resulting from Morimoto-san's ongoing cleanup work.

 - As ever, lots of hard work by Morimoto-san cleaning up the code and
   making it more consistent.
 - Many improvements in the Intel drivers including a wide range of
   quirks and bug fixes.
 - A KUnit testsuite for the topology code.
 - Support for Ingenic JZ4760(B), Intel AlderLake-P, DT configured
   nVidia cards, Qualcomm lpass-rx-macro and lpass-tx-macro
 - Removal of obsolete SIRF prima/atlas, Txx9 and ZTE zx drivers.

3 years agoMerge remote-tracking branch 'asoc/for-5.12' into asoc-linus
Mark Brown [Wed, 17 Feb 2021 18:52:26 +0000 (18:52 +0000)]
Merge remote-tracking branch 'asoc/for-5.12' into asoc-linus

3 years agoMerge remote-tracking branch 'asoc/for-5.11' into asoc-linus
Mark Brown [Wed, 17 Feb 2021 18:52:25 +0000 (18:52 +0000)]
Merge remote-tracking branch 'asoc/for-5.11' into asoc-linus

3 years agoMerge series "ASoC: Intel: bytcr_rt5640: Add quirks for 4 more tablet / 2-in-1 models...
Mark Brown [Wed, 17 Feb 2021 18:48:09 +0000 (18:48 +0000)]
Merge series "ASoC: Intel: bytcr_rt5640: Add quirks for 4 more tablet / 2-in-1 models" from Hans de Goede <hdegoede@redhat.com>:

Hi All,

Here is a patch series adding quirks with device-specific settings for
4 more tablet / 2-in-1 models.

Regards,

Hans

Hans de Goede (4):
  ASoC: Intel: bytcr_rt5640: Add quirk for the Estar Beauty HD MID 7316R tablet
  ASoC: Intel: bytcr_rt5640: Add quirk for the Voyo Winpad A15 tablet
  ASoC: Intel: bytcr_rt5651: Add quirk for the Jumper EZpad 7 tablet
  ASoC: Intel: bytcr_rt5640: Add quirk for the Acer One S1002 tablet

 sound/soc/intel/boards/bytcr_rt5640.c | 37 +++++++++++++++++++++++++++
 sound/soc/intel/boards/bytcr_rt5651.c | 13 ++++++++++
 2 files changed, 50 insertions(+)

--
2.30.1

3 years agoASoC: soc-pcm: fix hw param limits calculation for multi-DAI
Kai Vehmanen [Tue, 16 Feb 2021 17:22:51 +0000 (19:22 +0200)]
ASoC: soc-pcm: fix hw param limits calculation for multi-DAI

In case DPCM runtime has multiple CPU DAIs, dpcm_init_runtime_hw() is
called multiple times, once for each CPU DAI. This will lead to
ignoring hw limits of all but the last DAI.

Fix this by moving soc_pcm_hw_init() up by one level to
dpcm_init_runtime_hw().

Fixes: 140f553d1298 ("ASoC: soc-pcm: fix hwparams min/max init for dpcm")
Suggested-by: Kuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
Link: https://lore.kernel.org/r/20210216172251.3023723-1-kai.vehmanen@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 years agoASoC: Intel: bytcr_rt5640: Add quirk for the Acer One S1002 tablet
Hans de Goede [Tue, 16 Feb 2021 21:35:55 +0000 (22:35 +0100)]
ASoC: Intel: bytcr_rt5640: Add quirk for the Acer One S1002 tablet

The Acer One S1002 tablet is using an analog mic on IN1 and has
its jack-detect connected to JD2_IN4N, instead of using the default
IN3 for its internal mic and JD1_IN4P for jack-detect.

Note it is also using AIF2 instead of AIF1 which is somewhat unusual,
this is correctly advertised in the ACPI CHAN package, so the speakers
do work without the quirk.

Add a quirk for the mic and jack-detect settings.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210216213555.36555-5-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 years agoASoC: Intel: bytcr_rt5651: Add quirk for the Jumper EZpad 7 tablet
Hans de Goede [Tue, 16 Feb 2021 21:35:54 +0000 (22:35 +0100)]
ASoC: Intel: bytcr_rt5651: Add quirk for the Jumper EZpad 7 tablet

Add a DMI quirk for the Jumper EZpad 7 tablet, this tablet has
a jack-detect switch which reads 1/high when a jack is inserted,
rather then using the standard active-low setup which most
jack-detect switches use. All other settings are using the defaults.

Add a DMI-quirk setting the defaults + the BYT_RT5651_JD_NOT_INV
flags for this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210216213555.36555-4-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 years agoASoC: Intel: bytcr_rt5640: Add quirk for the Voyo Winpad A15 tablet
Hans de Goede [Tue, 16 Feb 2021 21:35:53 +0000 (22:35 +0100)]
ASoC: Intel: bytcr_rt5640: Add quirk for the Voyo Winpad A15 tablet

The Voyo Winpad A15 tablet uses a Bay Trail (non CR) SoC, so it is using
SSP2 (AIF1) and it mostly works with the defaults. But instead of using
DMIC1 it is using an analog mic on IN1, add a quirk for this.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210216213555.36555-3-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 years agoASoC: Intel: bytcr_rt5640: Add quirk for the Estar Beauty HD MID 7316R tablet
Hans de Goede [Tue, 16 Feb 2021 21:35:52 +0000 (22:35 +0100)]
ASoC: Intel: bytcr_rt5640: Add quirk for the Estar Beauty HD MID 7316R tablet

The Estar Beauty HD MID 7316R tablet almost fully works with out default
settings. The only problem is that it has only 1 speaker so any sounds
only playing on the right channel get lost.

Add a quirk for this model using the default settings + MONO_SPEAKER.

Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Acked-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.intel.com>
Link: https://lore.kernel.org/r/20210216213555.36555-2-hdegoede@redhat.com
Signed-off-by: Mark Brown <broonie@kernel.org>
3 years agosched,x86: Allow !PREEMPT_DYNAMIC
Peter Zijlstra [Tue, 9 Feb 2021 21:02:33 +0000 (22:02 +0100)]
sched,x86: Allow !PREEMPT_DYNAMIC

Allow building x86 with PREEMPT_DYNAMIC=n, this is needed for
PREEMPT_RT as it makes no sense to not have full preemption on
PREEMPT_RT.

Fixes: 8c98e8cf723c ("preempt/dynamic: Provide preempt_schedule[_notrace]() static calls")
Reported-by: Mike Galbraith <efault@gmx.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Mike Galbraith <efault@gmx.de>
Link: https://lkml.kernel.org/r/YCK1+JyFNxQnWeXK@hirez.programming.kicks-ass.net
3 years agoentry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point
Frederic Weisbecker [Sun, 31 Jan 2021 23:05:48 +0000 (00:05 +0100)]
entry/kvm: Explicitly flush pending rcuog wakeup before last rescheduling point

Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point upon resuming to guest mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-6-frederic@kernel.org
3 years agoentry: Explicitly flush pending rcuog wakeup before last rescheduling point
Frederic Weisbecker [Sun, 31 Jan 2021 23:05:47 +0000 (00:05 +0100)]
entry: Explicitly flush pending rcuog wakeup before last rescheduling point

Following the idle loop model, cleanly check for pending rcuog wakeup
before the last rescheduling point on resuming to user mode. This
way we can avoid to do it from rcu_user_enter() with the last resort
self-IPI hack that enforces rescheduling.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-5-frederic@kernel.org
3 years agorcu/nocb: Trigger self-IPI on late deferred wake up before user resume
Frederic Weisbecker [Sun, 31 Jan 2021 23:05:46 +0000 (00:05 +0100)]
rcu/nocb: Trigger self-IPI on late deferred wake up before user resume

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Unfortunately the call to rcu_user_enter() is already past the last
rescheduling opportunity before we resume to userspace or to guest mode.
We may escape there with the woken task ignored.

The ultimate resort to fix every callsites is to trigger a self-IPI
(nohz_full depends on arch to implement arch_irq_work_raise()) that will
trigger a reschedule on IRQ tail or guest exit.

Eventually every site that want a saner treatment will need to carefully
place a call to rcu_nocb_flush_deferred_wakeup() before the last explicit
need_resched() check upon resume.

Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-4-frederic@kernel.org
3 years agorcu/nocb: Perform deferred wake up before last idle's need_resched() check
Frederic Weisbecker [Sun, 31 Jan 2021 23:05:45 +0000 (00:05 +0100)]
rcu/nocb: Perform deferred wake up before last idle's need_resched() check

Entering RCU idle mode may cause a deferred wake up of an RCU NOCB_GP
kthread (rcuog) to be serviced.

Usually a local wake up happening while running the idle task is handled
in one of the need_resched() checks carefully placed within the idle
loop that can break to the scheduler.

Unfortunately the call to rcu_idle_enter() is already beyond the last
generic need_resched() check and we may halt the CPU with a resched
request unhandled, leaving the task hanging.

Fix this with splitting the rcuog wakeup handling from rcu_idle_enter()
and place it before the last generic need_resched() check in the idle
loop. It is then assumed that no call to call_rcu() will be performed
after that in the idle loop until the CPU is put in low power mode.

Fixes: 96d3fd0d315a (rcu: Break call_rcu() deadlock involving scheduler and perf)
Reported-by: Paul E. McKenney <paulmck@kernel.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-3-frederic@kernel.org
3 years agorcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers
Frederic Weisbecker [Sun, 31 Jan 2021 23:05:44 +0000 (00:05 +0100)]
rcu: Pull deferred rcuog wake up to rcu_eqs_enter() callers

Deferred wakeup of rcuog kthreads upon RCU idle mode entry is going to
be handled differently whether initiated by idle, user or guest. Prepare
with pulling that control up to rcu_eqs_enter() callers.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lkml.kernel.org/r/20210131230548.32970-2-frederic@kernel.org
3 years agosched/features: Distinguish between NORMAL and DEADLINE hrtick
Juri Lelli [Mon, 8 Feb 2021 07:35:54 +0000 (08:35 +0100)]
sched/features: Distinguish between NORMAL and DEADLINE hrtick

The HRTICK feature has traditionally been servicing configurations that
need precise preemptions point for NORMAL tasks. More recently, the
feature has been extended to also service DEADLINE tasks with stringent
runtime enforcement needs (e.g., runtime < 1ms with HZ=1000).

Enabling HRTICK sched feature currently enables the additional timer and
task tick for both classes, which might introduced undesired overhead
for no additional benefit if one needed it only for one of the cases.

Separate HRTICK sched feature in two (and leave the traditional case
name unmodified) so that it can be selectively enabled when needed.

With:

  $ echo HRTICK > /sys/kernel/debug/sched_features

the NORMAL/fair hrtick gets enabled.

With:

  $ echo HRTICK_DL > /sys/kernel/debug/sched_features

the DEADLINE hrtick gets enabled.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-3-juri.lelli@redhat.com
3 years agosched/features: Fix hrtick reprogramming
Juri Lelli [Mon, 8 Feb 2021 07:35:53 +0000 (08:35 +0100)]
sched/features: Fix hrtick reprogramming

Hung tasks and RCU stall cases were reported on systems which were not
100% busy. Investigation of such unexpected cases (no sign of potential
starvation caused by tasks hogging the system) pointed out that the
periodic sched tick timer wasn't serviced anymore after a certain point
and that caused all machinery that depends on it (timers, RCU, etc.) to
stop working as well. This issues was however only reproducible if
HRTICK was enabled.

Looking at core dumps it was found that the rbtree of the hrtimer base
used also for the hrtick was corrupted (i.e. next as seen from the base
root and actual leftmost obtained by traversing the tree are different).
Same base is also used for periodic tick hrtimer, which might get "lost"
if the rbtree gets corrupted.

Much alike what described in commit 1f71addd34f4c ("tick/sched: Do not
mess with an enqueued hrtimer") there is a race window between
hrtimer_set_expires() in hrtick_start and hrtimer_start_expires() in
__hrtick_restart() in which the former might be operating on an already
queued hrtick hrtimer, which might lead to corruption of the base.

Use hrtick_start() (which removes the timer before enqueuing it back) to
ensure hrtick hrtimer reprogramming is entirely guarded by the base
lock, so that no race conditions can occur.

Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210208073554.14629-2-juri.lelli@redhat.com
3 years agosched/deadline: Reduce rq lock contention in dl_add_task_root_domain()
Dietmar Eggemann [Tue, 19 Jan 2021 08:35:42 +0000 (09:35 +0100)]
sched/deadline: Reduce rq lock contention in dl_add_task_root_domain()

dl_add_task_root_domain() is called during sched domain rebuild:

  rebuild_sched_domains_locked()
    partition_and_rebuild_sched_domains()
      rebuild_root_domains()
         for all top_cpuset descendants:
           update_tasks_root_domain()
             for all tasks of cpuset:
               dl_add_task_root_domain()

Change it so that only the task pi lock is taken to check if the task
has a SCHED_DEADLINE (DL) policy. In case that p is a DL task take the
rq lock as well to be able to safely de-reference root domain's DL
bandwidth structure.

Most of the tasks will have another policy (namely SCHED_NORMAL) and
can now bail without taking the rq lock.

One thing to note here: Even in case that there aren't any DL user
tasks, a slow frequency switching system with cpufreq gov schedutil has
a DL task (sugov) per frequency domain running which participates in DL
bandwidth management.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Reviewed-by: Valentin Schneider <valentin.schneider@arm.com>
Reviewed-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lkml.kernel.org/r/20210119083542.19856-1-dietmar.eggemann@arm.com
3 years agouprobes: (Re)add missing get_uprobe() in __find_uprobe()
Sven Schnelle [Tue, 9 Feb 2021 15:07:11 +0000 (16:07 +0100)]
uprobes: (Re)add missing get_uprobe() in __find_uprobe()

commit c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
accidentally removed the refcount increase. Add it again.

Fixes: c6bc9bd06dff ("rbtree, uprobes: Use rbtree helpers")
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210209150711.36778-1-svens@linux.ibm.com
3 years agosmp: Process pending softirqs in flush_smp_call_function_from_idle()
Sebastian Andrzej Siewior [Sat, 23 Jan 2021 20:10:25 +0000 (21:10 +0100)]
smp: Process pending softirqs in flush_smp_call_function_from_idle()

send_call_function_single_ipi() may wake an idle CPU without sending an
IPI. The woken up CPU will process the SMP-functions in
flush_smp_call_function_from_idle(). Any raised softirq from within the
SMP-function call will not be processed.
Should the CPU have no tasks assigned, then it will go back to idle with
pending softirqs and the NOHZ will rightfully complain.

Process pending softirqs on return from flush_smp_call_function_queue().

Fixes: b2a02fc43a1f4 ("smp: Optimize send_call_function_single_ipi()")
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210123201027.3262800-2-bigeasy@linutronix.de
3 years agosched: Harden PREEMPT_DYNAMIC
Peter Zijlstra [Mon, 25 Jan 2021 15:26:50 +0000 (16:26 +0100)]
sched: Harden PREEMPT_DYNAMIC

Use the new EXPORT_STATIC_CALL_TRAMP() / static_call_mod() to unexport
the static_call_key for the PREEMPT_DYNAMIC calls such that modules
can no longer update these calls.

Having modules change/hi-jack the preemption calls would be horrible.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 years agostatic_call: Allow module use without exposing static_call_key
Josh Poimboeuf [Wed, 27 Jan 2021 23:18:37 +0000 (17:18 -0600)]
static_call: Allow module use without exposing static_call_key

When exporting static_call_key; with EXPORT_STATIC_CALL*(), the module
can use static_call_update() to change the function called.  This is
not desirable in general.

Not exporting static_call_key however also disallows usage of
static_call(), since objtool needs the key to construct the
static_call_site.

Solve this by allowing objtool to create the static_call_site using
the trampoline address when it builds a module and cannot find the
static_call_key symbol. The module loader will then try and map the
trampole back to a key before it constructs the normal sites list.

Doing this requires a trampoline -> key associsation, so add another
magic section that keeps those.

Originally-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210127231837.ifddpn7rhwdaepiu@treble
3 years agosched: Add /debug/sched_preempt
Peter Zijlstra [Fri, 22 Jan 2021 12:01:58 +0000 (13:01 +0100)]
sched: Add /debug/sched_preempt

Add a debugfs file to muck about with the preempt mode at runtime.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/YAsGiUYf6NyaTplX@hirez.programming.kicks-ass.net
3 years agopreempt/dynamic: Support dynamic preempt with preempt= boot option
Peter Zijlstra (Intel) [Mon, 18 Jan 2021 14:12:23 +0000 (15:12 +0100)]
preempt/dynamic: Support dynamic preempt with preempt= boot option

Support the preempt= boot option and patch the static call sites
accordingly.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-9-frederic@kernel.org
3 years agopreempt/dynamic: Provide irqentry_exit_cond_resched() static call
Peter Zijlstra (Intel) [Mon, 18 Jan 2021 14:12:22 +0000 (15:12 +0100)]
preempt/dynamic: Provide irqentry_exit_cond_resched() static call

Provide static call to control IRQ preemption (called in CONFIG_PREEMPT)
so that we can override its behaviour when preempt= is overriden.

Since the default behaviour is full preemption, its call is
initialized to provide IRQ preemption when preempt= isn't passed.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-8-frederic@kernel.org
3 years agopreempt/dynamic: Provide preempt_schedule[_notrace]() static calls
Peter Zijlstra (Intel) [Mon, 18 Jan 2021 14:12:21 +0000 (15:12 +0100)]
preempt/dynamic: Provide preempt_schedule[_notrace]() static calls

Provide static calls to control preempt_schedule[_notrace]()
(called in CONFIG_PREEMPT) so that we can override their behaviour when
preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
initialized to the arch provided wrapper, if any.

[fweisbec: only define static calls when PREEMPT_DYNAMIC, make it less
           dependent on x86 with __preempt_schedule_func]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-7-frederic@kernel.org
3 years agopreempt/dynamic: Provide cond_resched() and might_resched() static calls
Peter Zijlstra (Intel) [Mon, 18 Jan 2021 14:12:20 +0000 (15:12 +0100)]
preempt/dynamic: Provide cond_resched() and might_resched() static calls

Provide static calls to control cond_resched() (called in !CONFIG_PREEMPT)
and might_resched() (called in CONFIG_PREEMPT_VOLUNTARY) to that we
can override their behaviour when preempt= is overriden.

Since the default behaviour is full preemption, both their calls are
ignored when preempt= isn't passed.

  [fweisbec: branch might_resched() directly to __cond_resched(), only
             define static calls when PREEMPT_DYNAMIC]

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-6-frederic@kernel.org
3 years agopreempt: Introduce CONFIG_PREEMPT_DYNAMIC
Michal Hocko [Mon, 18 Jan 2021 14:12:19 +0000 (15:12 +0100)]
preempt: Introduce CONFIG_PREEMPT_DYNAMIC

Preemption mode selection is currently hardcoded on Kconfig choices.
Introduce a dedicated option to tune preemption flavour at boot time,

This will be only available on architectures efficiently supporting
static calls in order not to tempt with the feature against additional
overhead that might be prohibitive or undesirable.

CONFIG_PREEMPT_DYNAMIC is automatically selected by CONFIG_PREEMPT if
the architecture provides the necessary support (CONFIG_STATIC_CALL_INLINE,
CONFIG_GENERIC_ENTRY, and provide with __preempt_schedule_function() /
__preempt_schedule_notrace_function()).

Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
[peterz: relax requirement to HAVE_STATIC_CALL]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-5-frederic@kernel.org
3 years agostatic_call: Provide DEFINE_STATIC_CALL_RET0()
Frederic Weisbecker [Mon, 18 Jan 2021 14:12:17 +0000 (15:12 +0100)]
static_call: Provide DEFINE_STATIC_CALL_RET0()

DECLARE_STATIC_CALL() must pass the original function targeted for a
given static call. But DEFINE_STATIC_CALL() may want to initialize it as
off. In this case we can't pass NULL (for functions without return value)
or __static_call_return0 (for functions returning a value) directly
to DEFINE_STATIC_CALL() as that may trigger a static call redeclaration
with a different function prototype. Type casts neither can work around
that as they don't get along with typeof().

The proper way to do that for functions that don't return a value is
to use DEFINE_STATIC_CALL_NULL(). But functions returning a actual value
don't have an equivalent yet.

Provide DEFINE_STATIC_CALL_RET0() to solve this situation.

Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-3-frederic@kernel.org
3 years agostatic_call/x86: Add __static_call_return0()
Peter Zijlstra [Mon, 18 Jan 2021 14:12:16 +0000 (15:12 +0100)]
static_call/x86: Add __static_call_return0()

Provide a stub function that return 0 and wire up the static call site
patching to replace the CALL with a single 5 byte instruction that
clears %RAX, the return value register.

The function can be cast to any function pointer type that has a
single %RAX return (including pointers). Also provide a version that
returns an int for convenience. We are clearing the entire %RAX register
in any case, whether the return value is 32 or 64 bits, since %RAX is
always a scratch register anyway.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-2-frederic@kernel.org
3 years agostatic_call: Pull some static_call declarations to the type headers
Peter Zijlstra [Mon, 18 Jan 2021 14:12:18 +0000 (15:12 +0100)]
static_call: Pull some static_call declarations to the type headers

Some static call declarations are going to be needed on low level header
files. Move the necessary material to the dedicated static call types
header to avoid inclusion dependency hell.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210118141223.123667-4-frederic@kernel.org
3 years agosched/core: Update task_prio() function header
Dietmar Eggemann [Thu, 28 Jan 2021 13:10:40 +0000 (14:10 +0100)]
sched/core: Update task_prio() function header

The description of the RT offset and the values for 'normal' tasks needs
update. Moreover there are DL tasks now.
task_prio() has to stay like it is to guarantee compatibility with the
/proc/<pid>/stat priority field:

  # cat /proc/<pid>/stat | awk '{ print $18; }'

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-4-dietmar.eggemann@arm.com
3 years agosched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO
Dietmar Eggemann [Thu, 28 Jan 2021 13:10:39 +0000 (14:10 +0100)]
sched: Remove USER_PRIO, TASK_USER_PRIO and MAX_USER_PRIO

The only remaining use of MAX_USER_PRIO (and USER_PRIO) is the
SCALE_PRIO() definition in the PowerPC Cell architecture's Synergistic
Processor Unit (SPU) scheduler. TASK_USER_PRIO isn't used anymore.

Commit fe443ef2ac42 ("[POWERPC] spusched: Dynamic timeslicing for
SCHED_OTHER") copied SCALE_PRIO() from the task scheduler in v2.6.23.

Commit a4ec24b48dde ("sched: tidy up SCHED_RR") removed it from the task
scheduler in v2.6.24.

Commit 3ee237dddcd8 ("sched/prio: Add 3 macros of MAX_NICE, MIN_NICE and
NICE_WIDTH in prio.h") introduced NICE_WIDTH much later.

With:

  MAX_USER_PRIO = USER_PRIO(MAX_PRIO)

                = MAX_PRIO - MAX_RT_PRIO

       MAX_PRIO = MAX_RT_PRIO + NICE_WIDTH

  MAX_USER_PRIO = MAX_RT_PRIO + NICE_WIDTH - MAX_RT_PRIO

  MAX_USER_PRIO = NICE_WIDTH

MAX_USER_PRIO can be replaced by NICE_WIDTH to be able to remove all the
{*_}USER_PRIO defines.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-3-dietmar.eggemann@arm.com
3 years agosched: Remove MAX_USER_RT_PRIO
Dietmar Eggemann [Thu, 28 Jan 2021 13:10:38 +0000 (14:10 +0100)]
sched: Remove MAX_USER_RT_PRIO

Commit d46523ea32a7 ("[PATCH] fix MAX_USER_RT_PRIO and MAX_RT_PRIO")
was introduced due to a a small time period in which the realtime patch
set was using different values for MAX_USER_RT_PRIO and MAX_RT_PRIO.

This is no longer true, i.e. now MAX_RT_PRIO == MAX_USER_RT_PRIO.

Get rid of MAX_USER_RT_PRIO and make everything use MAX_RT_PRIO
instead.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Link: https://lkml.kernel.org/r/20210128131040.296856-2-dietmar.eggemann@arm.com
3 years agosched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()
Dietmar Eggemann [Mon, 1 Feb 2021 09:53:53 +0000 (10:53 +0100)]
sched/topology: Fix sched_domain_topology_level alloc in sched_init_numa()

Commit "sched/topology: Make sched_init_numa() use a set for the
deduplicating sort" allocates 'i + nr_levels (level)' instead of
'i + nr_levels + 1' sched_domain_topology_level.

This led to an Oops (on Arm64 juno with CONFIG_SCHED_DEBUG):

sched_init_domains
  build_sched_domains()
    __free_domain_allocs()
      __sdt_free() {
...
        for_each_sd_topology(tl)
  ...
          sd = *per_cpu_ptr(sdd->sd, j); <--
  ...
      }

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Barry Song <song.bao.hua@hisilicon.com>
Link: https://lkml.kernel.org/r/6000e39e-7d28-c360-9cd6-8798fd22a9bf@arm.com
3 years agorbtree, timerqueue: Use rb_add_cached()
Peter Zijlstra [Wed, 29 Apr 2020 15:07:53 +0000 (17:07 +0200)]
rbtree, timerqueue: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree, rtmutex: Use rb_add_cached()
Peter Zijlstra [Wed, 29 Apr 2020 15:29:58 +0000 (17:29 +0200)]
rbtree, rtmutex: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree, uprobes: Use rbtree helpers
Peter Zijlstra [Wed, 29 Apr 2020 15:06:27 +0000 (17:06 +0200)]
rbtree, uprobes: Use rbtree helpers

Reduce rbtree boilerplate by using the new helpers.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree, perf: Use new rbtree helpers
Peter Zijlstra [Wed, 29 Apr 2020 15:05:15 +0000 (17:05 +0200)]
rbtree, perf: Use new rbtree helpers

Reduce rbtree boiler plate by using the new helpers.

One noteworthy change is unification of the various (partial) compare
functions. We construct a subtree match by forcing the sub-order to
always match, see __group_cmp().

Due to 'const' we had to touch cgroup_id().

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree, sched/deadline: Use rb_add_cached()
Peter Zijlstra [Wed, 29 Apr 2020 15:04:41 +0000 (17:04 +0200)]
rbtree, sched/deadline: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helpers.

Make rb_add_cached() / rb_erase_cached() return a pointer to the
leftmost node to aid in updating additional state.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree, sched/fair: Use rb_add_cached()
Peter Zijlstra [Wed, 29 Apr 2020 15:04:12 +0000 (17:04 +0200)]
rbtree, sched/fair: Use rb_add_cached()

Reduce rbtree boiler plate by using the new helper function.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agorbtree: Add generic add and find helpers
Peter Zijlstra [Wed, 29 Apr 2020 15:03:22 +0000 (17:03 +0200)]
rbtree: Add generic add and find helpers

I've always been bothered by the endless (fragile) boilerplate for
rbtree, and I recently wrote some rbtree helpers for objtool and
figured I should lift them into the kernel and use them more widely.

Provide:

partial-order; less() based:
 - rb_add(): add a new entry to the rbtree
 - rb_add_cached(): like rb_add(), but for a rb_root_cached

total-order; cmp() based:
 - rb_find(): find an entry in an rbtree
 - rb_find_add(): find an entry, and add if not found

 - rb_find_first(): find the first (leftmost) matching entry
 - rb_next_match(): continue from rb_find_first()
 - rb_for_each(): iterate a sub-tree using the previous two

Inlining and constant propagation should see the compiler inline the
whole thing, including the various compare functions.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Michel Lespinasse <walken@google.com>
Acked-by: Davidlohr Bueso <dbueso@suse.de>
3 years agosched/fair: Merge select_idle_core/cpu()
Mel Gorman [Wed, 27 Jan 2021 13:52:03 +0000 (13:52 +0000)]
sched/fair: Merge select_idle_core/cpu()

Both select_idle_core() and select_idle_cpu() do a loop over the same
cpumask. Observe that by clearing the already visited CPUs, we can
fold the iteration and iterate a core at a time.

All we need to do is remember any non-idle CPU we encountered while
scanning for an idle core. This way we'll only iterate every CPU once.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210127135203.19633-5-mgorman@techsingularity.net
3 years agosched/fair: Remove select_idle_smt()
Mel Gorman [Mon, 25 Jan 2021 08:59:08 +0000 (08:59 +0000)]
sched/fair: Remove select_idle_smt()

In order to make the next patch more readable, and to quantify the
actual effectiveness of this pass, start by removing it.

Signed-off-by: Mel Gorman <mgorman@techsingularity.net>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20210125085909.4600-4-mgorman@techsingularity.net
3 years agoMerge tag 'v5.11' into sched/core, to pick up fixes & refresh the branch
Ingo Molnar [Wed, 17 Feb 2021 13:04:39 +0000 (14:04 +0100)]
Merge tag 'v5.11' into sched/core, to pick up fixes & refresh the branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 years agodrm/i915/gt: Correct surface base address for renderclear
Chris Wilson [Wed, 10 Feb 2021 12:27:28 +0000 (12:27 +0000)]
drm/i915/gt: Correct surface base address for renderclear

The surface_state_base is an offset into the batch, so we need to pass
the correct batch address for STATE_BASE_ADDRESS.

Fixes: 47f8253d2b89 ("drm/i915/gen7: Clear all EU/L3 residual contexts")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Prathap Kumar Valsan <prathap.kumar.valsan@intel.com>
Cc: Akeem G Abodunrin <akeem.g.abodunrin@intel.com>
Cc: Hans de Goede <hdegoede@redhat.com>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: <stable@vger.kernel.org> # v5.7+
Link: https://patchwork.freedesktop.org/patch/msgid/20210210122728.20097-1-chris@chris-wilson.co.uk
(cherry picked from commit 1914911f4aa08ddc05bae71d3516419463e0c567)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
3 years agodrm/i915: Disallow plane x+w>stride on ilk+ with X-tiling
Ville Syrjälä [Tue, 9 Feb 2021 02:19:16 +0000 (04:19 +0200)]
drm/i915: Disallow plane x+w>stride on ilk+ with X-tiling

ilk+ planes get notably unhappy when the plane x+w exceeds
the stride. This wasn't a problem previously because we
always aligned SURF to the closest tile boundary so the
x offset never got particularly large. But now with async
flips we have to align to 256KiB instead and thus this
becomes a real issue.

On ilk/snb/ivb it looks like the accesses just wrap
early to the next tile row when scanout goes past the
SURF+n*stride boundary, hsw/bdw suffer more heavily and
start to underrun constantly. i965/g4x appear to be immune.
vlv/chv I've not yet checked.

Let's borrow another trick from the skl+ code and search
backwards for a better SURF offset in the hopes of getting the
x offset below the limit. IIRC when I ran into a similar issue
on skl years ago it was causing the hardware to fall over
pretty hard as well.

And let's be consistent and include i965/g4x in the check
as well, just in case I just got super lucky somehow when
I wasn't able to reproduce the issue. Not that it really
matters since we still use 4k SURF alignment for i965/g4x
anyway.

Fixes: 6ede6b0616b2 ("drm/i915: Implement async flips for vlv/chv")
Fixes: 4bb18054adc4 ("drm/i915: Implement async flip for ilk/snb")
Fixes: 2a636e240c77 ("drm/i915: Implement async flip for ivb/hsw")
Fixes: cda195f13abd ("drm/i915: Implement async flips for bdw")
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Link: https://patchwork.freedesktop.org/patch/msgid/20210209021918.16234-1-ville.syrjala@linux.intel.com
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
(cherry picked from commit 59fb8218c8e5001f854e7d5fdb5fb135cba58102)
Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
[Rodrigo also exported some functions from intel_display.c during backport]

3 years agoMerge branch 'perf/kprobes' into perf/core, to pick up finished branch
Ingo Molnar [Wed, 17 Feb 2021 10:50:11 +0000 (11:50 +0100)]
Merge branch 'perf/kprobes' into perf/core, to pick up finished branch

Signed-off-by: Ingo Molnar <mingo@kernel.org>
3 years agonet: re-solve some conflicts after net -> net-next merge
Jakub Kicinski [Wed, 17 Feb 2021 06:58:44 +0000 (22:58 -0800)]
net: re-solve some conflicts after net -> net-next merge

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
3 years agoMerge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
David S. Miller [Wed, 17 Feb 2021 01:30:20 +0000 (17:30 -0800)]
Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

3 years agonet: dsa: tag_rtl4_a: Support also egress tags
Linus Walleij [Tue, 16 Feb 2021 23:55:42 +0000 (00:55 +0100)]
net: dsa: tag_rtl4_a: Support also egress tags

Support also transmitting frames using the custom "8899 A"
4 byte tag.

Qingfang came up with the solution: we need to pad the
ethernet frame to 60 bytes using eth_skb_pad(), then the
switch will happily accept frames with custom tags.

Cc: Mauri Sandberg <sandberg@mailfence.com>
Reported-by: DENG Qingfang <dqfext@gmail.com>
Fixes: efd7fe68f0c6 ("net: dsa: tag_rtl4_a: Implement Realtek 4 byte A tag")
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'broadcom-next'
David S. Miller [Tue, 16 Feb 2021 23:23:24 +0000 (15:23 -0800)]
Merge branch 'broadcom-next'

Robert Hancock says:

====================
Broadcom PHY driver updates

Updates to the Broadcom PHY driver related to use with copper SFP modules.

Changed since v3:
-fixed kerneldoc error

Changed since v2:
-Create flag for PHY on SFP module and use that rather than accessing
 attached_dev directly in PHY driver

Changed since v1:
-Reversed conditional to reduce indentation
-Added missing setting of MII_BCM54XX_AUXCTL_MISC_WREN in
 MII_BCM54XX_AUXCTL_SHDWSEL_MISC register
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: broadcom: Do not modify LED configuration for SFP module PHYs
Robert Hancock [Tue, 16 Feb 2021 22:54:54 +0000 (16:54 -0600)]
net: phy: broadcom: Do not modify LED configuration for SFP module PHYs

bcm54xx_config_init was modifying the PHY LED configuration to enable link
and activity indications. However, some SFP modules (such as Bel-Fuse
SFP-1GBT-06) have no LEDs but use the LED outputs to control the SFP LOS
signal, and modifying the LED settings will cause the LOS output to
malfunction. Skip this configuration for PHYs which are bound to an SFP
bus.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: Add is_on_sfp_module flag and phy_on_sfp helper
Robert Hancock [Tue, 16 Feb 2021 22:54:53 +0000 (16:54 -0600)]
net: phy: Add is_on_sfp_module flag and phy_on_sfp helper

Add a flag and helper function to indicate that a PHY device is part of
an SFP module, which is set on attach. This can be used by PHY drivers
to handle SFP-specific quirks or behavior.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S
Robert Hancock [Tue, 16 Feb 2021 22:54:52 +0000 (16:54 -0600)]
net: phy: broadcom: Set proper 1000BaseX/SGMII interface mode for BCM54616S

The default configuration for the BCM54616S PHY may not match the desired
mode when using 1000BaseX or SGMII interface modes, such as when it is on
an SFP module. Add code to explicitly set the correct mode using
programming sequences provided by Bel-Fuse:

https://www.belfuse.com/resources/datasheets/powersolutions/ds-bps-sfp-1gbt-05-series.pdf
https://www.belfuse.com/resources/datasheets/powersolutions/ds-bps-sfp-1gbt-06-series.pdf

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan743x: sync only the received area of an rx ring buffer
Sven Van Asbroeck [Tue, 16 Feb 2021 01:08:03 +0000 (20:08 -0500)]
lan743x: sync only the received area of an rx ring buffer

On cpu architectures w/o dma cache snooping, dma_unmap() is a
is a very expensive operation, because its resulting sync
needs to invalidate cpu caches.

Increase efficiency/performance by syncing only those sections
of the lan743x's rx ring buffers that are actually in use.

Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agolan743x: boost performance on cpu archs w/o dma cache snooping
Sven Van Asbroeck [Tue, 16 Feb 2021 01:08:02 +0000 (20:08 -0500)]
lan743x: boost performance on cpu archs w/o dma cache snooping

The buffers in the lan743x driver's receive ring are always 9K,
even when the largest packet that can be received (the mtu) is
much smaller. This performs particularly badly on cpu archs
without dma cache snooping (such as ARM): each received packet
results in a 9K dma_{map|unmap} operation, which is very expensive
because cpu caches need to be invalidated.

Careful measurement of the driver rx path on armv7 reveals that
the cpu spends the majority of its time waiting for cache
invalidation.

Optimize by keeping the rx ring buffer size as close as possible
to the mtu. This limits the amount of cache that requires
invalidation.

This optimization would normally force us to re-allocate all
ring buffers when the mtu is changed - a disruptive event,
because it can only happen when the network interface is down.

Remove the need to re-allocate all ring buffers by adding support
for multi-buffer frames. Now any combination of mtu and ring
buffer size will work. When the mtu changes from mtu1 to mtu2,
consumed buffers of size mtu1 are lazily replaced by newly
allocated buffers of size mtu2.

These optimizations double the rx performance on armv7.
Third parties report 3x rx speedup on armv8.

Tested with iperf3 on a freescale imx6qp + lan7430, both sides
set to mtu 1500 bytes, measure rx performance:

Before:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
After:
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0

Signed-off-by: Sven Van Asbroeck <thesven73@gmail.com>
Reviewed-by: Bryan Whitehead <Bryan.Whitehead@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: enetc: fix destroyed phylink dereference during unbind
Vladimir Oltean [Tue, 16 Feb 2021 10:16:28 +0000 (12:16 +0200)]
net: enetc: fix destroyed phylink dereference during unbind

The following call path suggests that calling unregister_netdev on an
interface that is up will first bring it down.

enetc_pf_remove
-> unregister_netdev
   -> unregister_netdevice_queue
      -> unregister_netdevice_many
         -> dev_close_many
            -> __dev_close_many
               -> enetc_close
                  -> enetc_stop
                     -> phylink_stop

However, enetc first destroys the phylink instance, then calls
unregister_netdev. This is already dissimilar to the setup (and error
path teardown path) from enetc_pf_probe, but more than that, it is buggy
because it is invalid to call phylink_stop after phylink_destroy.

So let's first unregister the netdev (and let the .ndo_stop events
consume themselves), then destroy the phylink instance, then free the
netdev.

Fixes: 71b77a7a27a3 ("enetc: Migrate to PHYLINK and PCS_LYNX")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'net-mvneta-implement-basic-MQPrio-support'
David S. Miller [Tue, 16 Feb 2021 23:03:26 +0000 (15:03 -0800)]
Merge branch 'net-mvneta-implement-basic-MQPrio-support'

Maxime Chevallier says:

====================
net: mvneta: implement basic MQPrio support

This is V2 for the MQPrio support in mvneta.

This small series adds basic support for mqprio offloading, by having
the rx queueing mirroring the TCs based on VLAN prio fields.

This was tested on Armada 3700, and proves useful to make sure
high-priority traffic has a better chance not getting dropped when
there's lots of packets incoming.

The first patch of the series deals with the per-cpu interrupts on the
armada 3700. Since they don't work, there were already some patches
applied to keep all queue mappings to CPU0, but there still were some
remaining mappings left to be dealt with.

The second patch implements the MQPrio offloading for the receive path.

Changes in V2 :
 - Add a Fixes tag for the first patch
 - Fix some warnings and the xmas tree in the second patch
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mvneta: Implement mqprio support
Maxime Chevallier [Tue, 16 Feb 2021 09:25:36 +0000 (10:25 +0100)]
net: mvneta: Implement mqprio support

Implement a basic MQPrio support, inserting rules in RX that translate
the TC to prio mapping into vlan prio to queues.

The TX logic stays the same as when we don't offload the qdisc.

Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mvneta: Remove per-cpu queue mapping for Armada 3700
Maxime Chevallier [Tue, 16 Feb 2021 09:25:35 +0000 (10:25 +0100)]
net: mvneta: Remove per-cpu queue mapping for Armada 3700

According to Errata #23 "The per-CPU GbE interrupt is limited to Core
0", we can't use the per-cpu interrupt mechanism on the Armada 3700
familly.

This is correctly checked for RSS configuration, but the initial queue
mapping is still done by having the queues spread across all the CPUs in
the system, both in the init path and in the cpu_hotplug path.

Fixes: 2636ac3cc2b4 ("net: mvneta: Add network support for Armada 3700 SoC")
Signed-off-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: sched: fix police ext initialization
Vlad Buslov [Tue, 16 Feb 2021 16:22:00 +0000 (18:22 +0200)]
net: sched: fix police ext initialization

When police action is created by cls API tcf_exts_validate() first
conditional that calls tcf_action_init_1() directly, the action idr is not
updated according to latest changes in action API that require caller to
commit newly created action to idr with tcf_idr_insert_many(). This results
such action not being accessible through act API and causes crash reported
by syzbot:

==================================================================
BUG: KASAN: null-ptr-deref in instrument_atomic_read include/linux/instrumented.h:71 [inline]
BUG: KASAN: null-ptr-deref in atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
BUG: KASAN: null-ptr-deref in __tcf_idr_release net/sched/act_api.c:178 [inline]
BUG: KASAN: null-ptr-deref in tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
Read of size 4 at addr 0000000000000010 by task kworker/u4:5/204

CPU: 0 PID: 204 Comm: kworker/u4:5 Not tainted 5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:400 [inline]
 kasan_report.cold+0x5f/0xd5 mm/kasan/report.c:413
 check_memory_region_inline mm/kasan/generic.c:179 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 __tcf_idr_release net/sched/act_api.c:178 [inline]
 tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
 tc_action_net_exit include/net/act_api.h:151 [inline]
 police_exit_net+0x168/0x360 net/sched/act_police.c:390
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
==================================================================
Kernel panic - not syncing: panic_on_warn set ...
CPU: 0 PID: 204 Comm: kworker/u4:5 Tainted: G    B             5.11.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Workqueue: netns cleanup_net
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x107/0x163 lib/dump_stack.c:120
 panic+0x306/0x73d kernel/panic.c:231
 end_report+0x58/0x5e mm/kasan/report.c:100
 __kasan_report mm/kasan/report.c:403 [inline]
 kasan_report.cold+0x67/0xd5 mm/kasan/report.c:413
 check_memory_region_inline mm/kasan/generic.c:179 [inline]
 check_memory_region+0x13d/0x180 mm/kasan/generic.c:185
 instrument_atomic_read include/linux/instrumented.h:71 [inline]
 atomic_read include/asm-generic/atomic-instrumented.h:27 [inline]
 __tcf_idr_release net/sched/act_api.c:178 [inline]
 tcf_idrinfo_destroy+0x129/0x1d0 net/sched/act_api.c:598
 tc_action_net_exit include/net/act_api.h:151 [inline]
 police_exit_net+0x168/0x360 net/sched/act_police.c:390
 ops_exit_list+0x10d/0x160 net/core/net_namespace.c:190
 cleanup_net+0x4ea/0xb10 net/core/net_namespace.c:604
 process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
 worker_thread+0x64c/0x1120 kernel/workqueue.c:2421
 kthread+0x3b1/0x4a0 kernel/kthread.c:292
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:296
Kernel Offset: disabled

Fix the issue by calling tcf_idr_insert_many() after successful action
initialization.

Fixes: 0fedc63fadf0 ("net_sched: commit action insertions together")
Reported-by: syzbot+151e3e714d34ae4ce7e8@syzkaller.appspotmail.com
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
Reviewed-by: Cong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox...
David S. Miller [Tue, 16 Feb 2021 22:53:30 +0000 (14:53 -0800)]
Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux

Saeed Mahameed says:
====================
pull-request: mlx5-next 2021-02-16

The patches in this pr are already submitted and reviewed through the
netdev and rdma mailing lists.

The series includes mlx5 HW bits and definitions for mlx5 real time clock
translation and handling in the mlx5 driver clock module to enable and
support such mode [1]

[1] https://patchwork.kernel.org/project/netdevbpf/patch/20210212223042.449816-7-saeed@kernel.org/
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodrivers: net: xilinx_emaclite: remove arch limitation
Gary Guo [Tue, 16 Feb 2021 22:33:42 +0000 (22:33 +0000)]
drivers: net: xilinx_emaclite: remove arch limitation

The changes made in eccd540 is enough for xilinx_emaclite to run
without problem on 64-bit systems. I have tested it on a Xilinx
FPGA with RV64 softcore. The architecture limitation in Kconfig
seems no longer necessary.

A small change is included to print address with %lx instead of
casting to int and print with %x.

Signed-off-by: Gary Guo <gary@garyguo.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'bridge-mrp-Extend-br_mrp_switchdev_'
David S. Miller [Tue, 16 Feb 2021 22:47:46 +0000 (14:47 -0800)]
Merge branch 'bridge-mrp-Extend-br_mrp_switchdev_'

Horatiu Vulturv says:

====================
bridge: mrp: Extend br_mrp_switchdev_*

This patch series extends MRP switchdev to allow the SW to have a better
understanding if the HW can implement the MRP functionality or it needs
to help the HW to run it. There are 3 cases:
- when HW can't implement at all the functionality.
- when HW can implement a part of the functionality but needs the SW
  implement the rest. For example if it can't detect when it stops
  receiving MRP Test frames but it can copy the MRP frames to CPU to
  allow the SW to determine this.  Another example is generating the MRP
  Test frames. If HW can't do that then the SW is used as backup.
- when HW can implement completely the functionality.

So, initially the SW tries to offload the entire functionality in HW, if
that fails it tries offload parts of the functionality in HW and use the
SW as helper and if also this fails then MRP can't run on this HW.

Based on these new calls, implement the switchdev for Ocelot driver. This
is an example where the HW can't run completely the functionality but it
can help the SW to run it, by trapping all MRP frames to CPU.

Also this patch series adds MRP support to DSA and implements the Felix
driver which just reuse the Ocelot functions. This part was just compiled
tested because I don't have any HW on which to do the actual tests.

v4:
 - remove ifdef MRP from include/net/switchdev.h
 - move MRP implementation for Ocelot in a different file such that
   Felix driver can use it.
 - extend DSA with MRP support
 - implement MRP support for Felix.
v3:
 - implement the switchdev calls needed by Ocelot driver.
v2:
 - fix typos in comments and in commit messages
 - remove some of the comments
 - move repeated code in helper function
 - fix issue when deleting a node when sw_backup was true
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: felix: Add support for MRP
Horatiu Vultur [Tue, 16 Feb 2021 21:42:05 +0000 (22:42 +0100)]
net: dsa: felix: Add support for MRP

Implement functions 'port_mrp_add', 'port_mrp_del',
'port_mrp_add_ring_role' and 'port_mrp_del_ring_role' to call the mrp
functions from ocelot.

Also all MRP frames that arrive to CPU on queue number OCELOT_MRP_CPUQ
will be forward by the SW.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: add MRP support
Horatiu Vultur [Tue, 16 Feb 2021 21:42:04 +0000 (22:42 +0100)]
net: dsa: add MRP support

Add support for offloading MRP in HW. Currently implement the switchdev
calls 'SWITCHDEV_OBJ_ID_MRP', 'SWITCHDEV_OBJ_ID_RING_ROLE_MRP',
to allow to create MRP instances and to set the role of these instances.

Add DSA_NOTIFIER_MRP_ADD/DEL and DSA_NOTIFIER_MRP_ADD/DEL_RING_ROLE
which calls to .port_mrp_add/del and .port_mrp_add/del_ring_role in the
DSA driver for the switch.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: mscc: ocelot: Add support for MRP
Horatiu Vultur [Tue, 16 Feb 2021 21:42:03 +0000 (22:42 +0100)]
net: mscc: ocelot: Add support for MRP

Add basic support for MRP. The HW will just trap all MRP frames on the
ring ports to CPU and allow the SW to process them. In this way it is
possible to for this node to behave both as MRM and MRC.

Current limitations are:
- it doesn't support Interconnect roles.
- it supports only a single ring.
- the HW should be able to do forwarding of MRP Test frames so the SW
  will not need to do this. So it would be able to have the role MRC
  without SW support.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev
Horatiu Vultur [Tue, 16 Feb 2021 21:42:02 +0000 (22:42 +0100)]
bridge: mrp: Update br_mrp to use new return values of br_mrp_switchdev

Check the return values of the br_mrp_switchdev function.
In case of:
- BR_MRP_NONE, return the error to userspace,
- BR_MRP_SW, continue with SW implementation,
- BR_MRP_HW, continue without SW implementation,

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobridge: mrp: Extend br_mrp_switchdev to detect better the errors
Horatiu Vultur [Tue, 16 Feb 2021 21:42:01 +0000 (22:42 +0100)]
bridge: mrp: Extend br_mrp_switchdev to detect better the errors

This patch extends the br_mrp_switchdev functions to be able to have a
better understanding what cause the issue and if the SW needs to be used
as a backup.

There are the following cases:
- when the code is compiled without CONFIG_NET_SWITCHDEV. In this case
  return success so the SW can continue with the protocol. Depending
  on the function, it returns 0 or BR_MRP_SW.
- when code is compiled with CONFIG_NET_SWITCHDEV and the driver doesn't
  implement any MRP callbacks. In this case the HW can't run MRP so it
  just returns -EOPNOTSUPP. So the SW will stop further to configure the
  node.
- when code is compiled with CONFIG_NET_SWITCHDEV and the driver fully
  supports any MRP functionality. In this case the SW doesn't need to do
  anything. The functions will return 0 or BR_MRP_HW.
- when code is compiled with CONFIG_NET_SWITCHDEV and the HW can't run
  completely the protocol but it can help the SW to run it. For
  example, the HW can't support completely MRM role(can't detect when it
  stops receiving MRP Test frames) but it can redirect these frames to
  CPU. In this case it is possible to have a SW fallback. The SW will
  try initially to call the driver with sw_backup set to false, meaning
  that the HW should implement completely the role. If the driver returns
  -EOPNOTSUPP, the SW will try again with sw_backup set to false,
  meaning that the SW will detect when it stops receiving the frames but
  it needs HW support to redirect the frames to CPU. In case the driver
  returns 0 then the SW will continue to configure the node accordingly.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agobridge: mrp: Add 'enum br_mrp_hw_support'
Horatiu Vultur [Tue, 16 Feb 2021 21:42:00 +0000 (22:42 +0100)]
bridge: mrp: Add 'enum br_mrp_hw_support'

Add the enum br_mrp_hw_support that is used by the br_mrp_switchdev
functions to allow the SW to detect the cases where HW can't implement
the functionality or when SW is used as a backup.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoswitchdev: mrp: Extend ring_role_mrp and in_role_mrp
Horatiu Vultur [Tue, 16 Feb 2021 21:41:59 +0000 (22:41 +0100)]
switchdev: mrp: Extend ring_role_mrp and in_role_mrp

Add the member sw_backup to the structures switchdev_obj_ring_role_mrp
and switchdev_obj_in_role_mrp. In this way the SW can call the driver in
2 ways, once when sw_backup is set to false, meaning that the driver
should implement this completely in HW. And if that is not supported the
SW will call again but with sw_backup set to true, meaning that the
HW should help or allow the SW to run the protocol.

For example when role is MRM, if the HW can't detect when it stops
receiving MRP Test frames but it can trap these frames to CPU, then it
needs to return -EOPNOTSUPP when sw_backup is false and return 0 when
sw_backup is true.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoswitchdev: mrp: Remove CONFIG_BRIDGE_MRP
Horatiu Vultur [Tue, 16 Feb 2021 21:41:58 +0000 (22:41 +0100)]
switchdev: mrp: Remove CONFIG_BRIDGE_MRP

Remove #IS_ENABLED(CONFIG_BRIDGE_MRP) from switchdev.h. This will
simplify the code implements MRP callbacks and will be similar with the
vlan filtering.

Signed-off-by: Horatiu Vultur <horatiu.vultur@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: marvell: Ensure SGMII auto-negotiation is enabled for 88E1111
Robert Hancock [Tue, 16 Feb 2021 20:53:30 +0000 (14:53 -0600)]
net: phy: marvell: Ensure SGMII auto-negotiation is enabled for 88E1111

When 88E1111 is operating in SGMII mode, auto-negotiation should be enabled
on the SGMII side so that the link will come up properly with PCSes which
normally have auto-negotiation enabled. This is normally the case when the
PHY defaults to SGMII mode at power-up, however if we switched it from some
other mode like 1000Base-X, as may happen in some SFP module situations,
it may not be, particularly for modules which have 1000Base-X
auto-negotiation defaulting to disabled.

Call genphy_check_and_restart_aneg on the fiber page to ensure that auto-
negotiation is properly enabled on the SGMII interface.

Signed-off-by: Robert Hancock <robert.hancock@calian.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'Add-5gbase-r-PHY-interface-mode'
David S. Miller [Tue, 16 Feb 2021 22:15:12 +0000 (14:15 -0800)]
Merge branch 'Add-5gbase-r-PHY-interface-mode'

Marek Behún says:

====================-
Add 5gbase-r PHY interface mode

there is still some testing needed for Amethyst patches, so I have
split the part adding support for 5gbase-r interface mode and am sending
it alone.

The first two patches are already reviewed.

Changes since last patches (Amethyst v16):
- added phylink 5gbase-r handler
- added SFP support for 5gbase-r mode
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agosfp: add support for 5gbase-t SFPs
Marek Behún [Tue, 16 Feb 2021 19:20:55 +0000 (20:20 +0100)]
sfp: add support for 5gbase-t SFPs

The sfp_parse_support() function is setting 5000baseT_Full in some cases.
Now that we have PHY_INTERFACE_MODE_5GBASER interface mode available,
change sfp_select_interface() to return PHY_INTERFACE_MODE_5GBASER if
5000baseT_Full is set in the link mode mask.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phylink: Add 5gbase-r support
Marek Behún [Tue, 16 Feb 2021 19:20:54 +0000 (20:20 +0100)]
net: phylink: Add 5gbase-r support

Add 5GBASER interface type and speed to phylink.

Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: Add 5GBASER interface mode
Pavana Sharma [Tue, 16 Feb 2021 19:20:53 +0000 (20:20 +0100)]
net: phy: Add 5GBASER interface mode

Add 5GBASE-R phy interface mode

Signed-off-by: Pavana Sharma <pavana.sharma@digi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agodt-bindings: net: Add 5GBASER phy interface
Pavana Sharma [Tue, 16 Feb 2021 19:20:52 +0000 (20:20 +0100)]
dt-bindings: net: Add 5GBASER phy interface

Add 5gbase-r PHY interface mode.

Signed-off-by: Pavana Sharma <pavana.sharma@digi.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Reviewed-by: Florian Fainelli <f.fainelli@gmail.com>
Acked-by: Rob Herring <robh@kernel.org>
Signed-off-by: Marek Behún <kabel@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agotg3: Remove unused PHY_BRCM flags
Florian Fainelli [Tue, 16 Feb 2021 19:08:37 +0000 (11:08 -0800)]
tg3: Remove unused PHY_BRCM flags

The tg3 driver tried to communicate towards the PHY driver whether it
wanted RGMII in-band signaling enabled or disabled however there is
nothing that looks at those flags in drivers/net/phy/broadcom.c so this
does do not anything.

Suggested-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Florian Fainelli <f.fainelli@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'amd-xgbe-fixes'
David S. Miller [Tue, 16 Feb 2021 22:09:46 +0000 (14:09 -0800)]
Merge branch 'amd-xgbe-fixes'

Shyam Sundar S K says:

====================
Bug fixes to amd-xgbe driver

General fixes on amd-xgbe driver are addressed in this series, mostly
on the mailbox communication failures and improving the link stability
of the amd-xgbe device.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP
Shyam Sundar S K [Tue, 16 Feb 2021 19:07:10 +0000 (00:37 +0530)]
net: amd-xgbe: Fix network fluctuations when using 1G BELFUSE SFP

Frequent link up/down events can happen when a Bel Fuse SFP part is
connected to the amd-xgbe device. Try to avoid the frequent link
issues by resetting the PHY as documented in Bel Fuse SFP datasheets.

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd-xgbe: Reset link when the link never comes back
Shyam Sundar S K [Tue, 16 Feb 2021 19:07:09 +0000 (00:37 +0530)]
net: amd-xgbe: Reset link when the link never comes back

Normally, auto negotiation and reconnect should be automatically done by
the hardware. But there seems to be an issue where auto negotiation has
to be restarted manually. This happens because of link training and so
even though still connected to the partner the link never "comes back".
This needs an auto-negotiation restart.

Also, a change in xgbe-mdio is needed to get ethtool to recognize the
link down and get the link change message. This change is only
required in a backplane connection mode.

Fixes: abf0a1c2b26a ("amd-xgbe: Add support for SFP+ modules")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning
Shyam Sundar S K [Tue, 16 Feb 2021 19:07:08 +0000 (00:37 +0530)]
net: amd-xgbe: Fix NETDEV WATCHDOG transmit queue timeout warning

The current driver calls netif_carrier_off() late in the link tear down
which can result in a netdev watchdog timeout.

Calling netif_carrier_off() immediately after netif_tx_stop_all_queues()
avoids the warning.

 ------------[ cut here ]------------
 NETDEV WATCHDOG: enp3s0f2 (amd-xgbe): transmit queue 0 timed out
 WARNING: CPU: 3 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x20d/0x220
 Modules linked in: amd_xgbe(E)  amd-xgbe 0000:03:00.2 enp3s0f2: Link is Down
 CPU: 3 PID: 0 Comm: swapper/3 Tainted: G            E
 Hardware name: AMD Bilby-RV2/Bilby-RV2, BIOS RBB1202A 10/18/2019
 RIP: 0010:dev_watchdog+0x20d/0x220
 Code: 00 49 63 4e e0 eb 92 4c 89 e7 c6 05 c6 e2 c1 00 01 e8 e7 ce fc ff 89 d9 48
 RSP: 0018:ffff90cfc28c3e88 EFLAGS: 00010286
 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000006
 RDX: 0000000000000007 RSI: 0000000000000086 RDI: ffff90cfc28d63c0
 RBP: ffff90cfb977845c R08: 0000000000000050 R09: 0000000000196018
 R10: ffff90cfc28c3ef8 R11: 0000000000000000 R12: ffff90cfb9778000
 R13: 0000000000000003 R14: ffff90cfb9778480 R15: 0000000000000010
 FS:  0000000000000000(0000) GS:ffff90cfc28c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 00007f240ff2d9d0 CR3: 00000001e3e0a000 CR4: 00000000003406e0
 Call Trace:
  <IRQ>
  ? pfifo_fast_reset+0x100/0x100
  call_timer_fn+0x2b/0x130
  run_timer_softirq+0x3e8/0x440
  ? enqueue_hrtimer+0x39/0x90

Fixes: e722ec82374b ("amd-xgbe: Update the BelFuse quirk to support SGMII")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: amd-xgbe: Reset the PHY rx data path when mailbox command timeout
Shyam Sundar S K [Tue, 16 Feb 2021 19:07:07 +0000 (00:37 +0530)]
net: amd-xgbe: Reset the PHY rx data path when mailbox command timeout

Sometimes mailbox commands timeout when the RX data path becomes
unresponsive. This prevents the submission of new mailbox commands to DXIO.
This patch identifies the timeout and resets the RX data path so that the
next message can be submitted properly.

Fixes: 549b32af9f7c ("amd-xgbe: Simplify mailbox interface rate change code")
Co-developed-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Sudheesh Mavila <sudheesh.mavila@amd.com>
Signed-off-by: Shyam Sundar S K <Shyam-sundar.S-k@amd.com>
Acked-by: Tom Lendacky <thomas.lendacky@amd.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agoMerge branch 'Fixes-applied-to-VCS8514'
David S. Miller [Tue, 16 Feb 2021 22:06:19 +0000 (14:06 -0800)]
Merge branch 'Fixes-applied-to-VCS8514'

Bjarni Jonasson says:

====================
Fixes applied to VCS8514

3 different fixes applied to VSC8514:
LCPLL reset, serdes calibration and coma mode disabled.
Especially the serdes calibration is large and is now placed
in a new file 'mscc_serdes.c' which can act as
a placeholder for future serdes configuration.

v1 -> v2:
  Preserved reversed christmas tree
  Removed forward definitions
  Fixed build issues
  Changed net to net-next

v2 -> v3:
  Added cover letter.
  Removed ena_clk_bypass from function call
  Created mscc_serdes.c and .h for serdes configuration
  Modified coma register config.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: mscc: coma mode disabled for VSC8514
Bjarni Jonasson [Tue, 16 Feb 2021 15:29:44 +0000 (16:29 +0100)]
net: phy: mscc: coma mode disabled for VSC8514

The 'coma mode' (configurable through sw or hw) provides an
optional feature that may be used to control when the PHYs become active.
The typical usage is to synchronize the link-up time across
all PHY instances. This patch releases coma mode if not done by hardware,
otherwise the phys will not link-up.

Fixes: e4f9ba642f0b ("net: phy: mscc: add support for VSC8514 PHY.")
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: mscc: improved serdes calibration applied to VSC8514
Bjarni Jonasson [Tue, 16 Feb 2021 15:29:43 +0000 (16:29 +0100)]
net: phy: mscc: improved serdes calibration applied to VSC8514

The current IB serdes calibration algorithm (performed by the onboard 8051)
has proven to be unstable for the VSC8514 QSGMII phy.
A new algorithm has been developed based on
'Frequency-offset Jittered-Injection' or 'FoJi' method which solves
all known issues.  This patch disables the 8051 algorithm and
replaces it with the new FoJi algorithm.
The calibration is now performed in a new file (mscc_serdes.c),
which can act as an placeholder for future serdes configurations.

Fixes: e4f9ba642f0b ("net: phy: mscc: add support for VSC8514 PHY.")
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: phy: mscc: adding LCPLL reset to VSC8514
Bjarni Jonasson [Tue, 16 Feb 2021 15:29:42 +0000 (16:29 +0100)]
net: phy: mscc: adding LCPLL reset to VSC8514

At Power-On Reset, transients may cause the LCPLL to lock onto a
clock that is momentarily unstable. This is normally seen in QSGMII
setups where the higher speed 6G SerDes is being used.
This patch adds an initial LCPLL Reset to the PHY (first instance)
to avoid this issue.

Fixes: e4f9ba642f0b ("net: phy: mscc: add support for VSC8514 PHY.")
Signed-off-by: Steen Hegelund <steen.hegelund@microchip.com>
Signed-off-by: Bjarni Jonasson <bjarni.jonasson@microchip.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet/mlx5: Add cyc2time HW translation mode support
Aya Levin [Fri, 12 Feb 2021 22:30:42 +0000 (14:30 -0800)]
net/mlx5: Add cyc2time HW translation mode support

Device timestamp can be in real time mode (cycles to time translation is
offloaded into the Hardware). With real time mode, HW provides timestamp
which is already translated into nanoseconds.

With this mode, driver adjusts both the HW and timecounter (to keep
clock_info_page updated) using callbacks: adjfreq, adjtime and settime.
HW clock modifications are done via MTUTC access reg commands. Driver is
allowed to modify HW real time clock only if MCAM ptpcyc2realtime_modify
capability is set.

Add MTUTC set function to be used for configuring the HW real time
clock. Modify existing code to support both internal timer (with
conversion via timecounter_cyc2time() and real time (no conversions).

Align the signatures of the helpers converting from timestamp to
nanoseconds. With that, when allocating a queue assign the corresponding
callback with respect to the capability.

Adjust 1PPS timestamp calculation flows based on the timestamp mode.

Cyc2time offload brings two major advantages:
- Improve MTAE (Max Time Absolute Error) for HW TS by up to 160 ns over a
  100% loaded CPU.
- Faster data-path timestamp to nanoseconds, as translation is
  lock-less and done in HW.

On real time mode, timestamp format is 32 high bits of seconds and 32
low bits of nanoseconds. On some flows, driver shall convert this format
into nanoseconds wall-clock with REAL_TIME_TO_NS macro.

HW supports a single clock, and it is shared by all functions on a
device. In case real time clock is used, it is recommended to use
a single GM to all device's functions.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Move some PPS logic into helper functions
Eran Ben Elisha [Fri, 12 Feb 2021 22:30:41 +0000 (14:30 -0800)]
net/mlx5: Move some PPS logic into helper functions

Some of PPS logic (timestamp calculations) fits only internal timer
timestamp mode. Move these logics into helper functions. Later in the
patchset cyc2time HW translation mode will expose its own PPS timestamp
calculations.

With this change, main flow will only hold calling PPS logic based on run
time mode.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Move all internal timer metadata into a dedicated struct
Eran Ben Elisha [Fri, 12 Feb 2021 22:30:40 +0000 (14:30 -0800)]
net/mlx5: Move all internal timer metadata into a dedicated struct

Internal timer mode (SW clock) requires some PTP clock related metadata
structs. Real time mode (HW clock) will not need these metadata structs.
This separation emphasize the different interfaces for HW clock and SW
clock.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Refactor init clock function
Eran Ben Elisha [Fri, 12 Feb 2021 22:30:39 +0000 (14:30 -0800)]
net/mlx5: Refactor init clock function

Function mlx5_init_clock() is responsible for internal PTP related metadata
initializations. Break mlx5_init_clock() to sub functions, each takes care
of its own logic.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agonet/mlx5: Add register layout to support real-time time-stamp
Eran Ben Elisha [Fri, 12 Feb 2021 22:30:38 +0000 (14:30 -0800)]
net/mlx5: Add register layout to support real-time time-stamp

Add needed structure layouts and defines for MTUTC (Management UTC)
register. MTUTC will be used for cyc2time HW translation.

In addition, add cyc2time modify capability bit and init segment HCA
real time address.

Finally, add capability bits indicating which time-stamping format is
supported per SQ and RQ. Add ts_format in the queue's context layout to
allow configuration.

Signed-off-by: Eran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: Aya Levin <ayal@nvidia.com>
Reviewed-by: Moshe Shemesh <moshe@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
3 years agoMerge branch 'Fix-buggy-brport-flags-offload-for-SJA1105-DSA'
David S. Miller [Tue, 16 Feb 2021 22:02:46 +0000 (14:02 -0800)]
Merge branch 'Fix-buggy-brport-flags-offload-for-SJA1105-DSA'

Vladimir Oltean says:

====================
Fix buggy brport flags offload for SJA1105 DSA

I am resending this series because the title and the patches were mixed
up and these patches were lost. This series' cover letter was used as
the merge commit for the unrelated "Fixing build breakage after "Merge
branch 'Propagate-extack-for-switchdev-LANs-from-DSA'"" series, as can
be seen below:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=ca04422afd6998611a81d0ea1b61d5a5f4923f84

while the actual patches from the "Fix buggy brport flags offload for
SJA1105 DSA" series were marked as superseded and not applied:
https://patchwork.kernel.org/project/netdevbpf/cover/20210214155704.1784220-1-olteanv@gmail.com/
which they should have.

I know with so many bugs I introduced it's hard to keep track, I'm sorry.

Original series description:

While testing software bridging on sja1105, I discovered that I managed
to introduce two bugs in a single patch submitted recently to net-next.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
3 years agonet: dsa: sja1105: fix leakage of flooded frames outside bridging domain
Vladimir Oltean [Tue, 16 Feb 2021 11:41:19 +0000 (13:41 +0200)]
net: dsa: sja1105: fix leakage of flooded frames outside bridging domain

Quite embarrasingly, I managed to fool myself into thinking that the
flooding domain of sja1105 source ports is restricted by the forwarding
domain, which it isn't. Frames which match an FDB entry are forwarded
towards that entry's DESTPORTS restricted by REACH_PORT[SRC_PORT], while
frames that don't match any FDB entry are forwarded towards
FL_DOMAIN[SRC_PORT] or BC_DOMAIN[SRC_PORT].

This means we can't get away with doing the simple thing, and we must
manage the flooding domain ourselves such that it is restricted by the
forwarding domain. This new function must be called from the
.port_bridge_join and .port_bridge_leave methods too, not just from
.port_bridge_flags as we did before.

Fixes: 4d9423549501 ("net: dsa: sja1105: offload bridge port flags to device")
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>