From 39f2bfe0177d3f56c9feac4e70424e4952949e2a Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Wed, 10 Jan 2024 13:47:23 -0800
Subject: [PATCH] sched/core: Drop spinlocks on contention iff kernel is
 preemptible

Use preempt_model_preemptible() to detect a preemptible kernel when
deciding whether or not to reschedule in order to drop a contended
spinlock or rwlock. Because PREEMPT_DYNAMIC selects PREEMPTION, kernels
built with PREEMPT_DYNAMIC=y will yield contended locks even if the live
preemption model is "none" or "voluntary". In short, make kernels with
dynamically selected models behave the same as kernels with statically
selected models.
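
For reference, a typical contention-yield loop that consumes this check
looks roughly like the sketch below (illustration only: "foo" and
foo_process_next() are made-up names, while need_resched(),
spin_needbreak() and cond_resched_lock() are the real <linux/sched.h>
helpers):

	spin_lock(&foo->lock);
	while (foo_process_next(foo)) {		/* hypothetical helper */
		/* Drop the lock and reschedule if warranted, then reacquire. */
		if (need_resched() || spin_needbreak(&foo->lock))
			cond_resched_lock(&foo->lock);
	}
	spin_unlock(&foo->lock);

With this change, spin_needbreak() returns 0 when the live preemption
model is "none" or "voluntary", so such a loop stops dropping the lock
merely because another CPU is spinning on it.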

Somewhat counter-intuitively, NOT yielding a lock can provide better
latency for the relevant tasks/processes. E.g. KVM x86's mmu_lock, a
rwlock, is often contended between an invalidation event (takes mmu_lock
for write) and a vCPU servicing a guest page fault (takes mmu_lock for
read). For _some_ setups, letting the invalidation task complete even
if there is mmu_lock contention provides lower latency for *all* tasks,
i.e. the invalidation completes sooner *and* the vCPU services the guest
page fault sooner.
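
For context, the write-side of such an invalidation typically yields via
a check along these lines (a loose sketch of the pattern, not KVM's exact
code; zap_next_range() is a made-up helper, while rwlock_needbreak() and
cond_resched_rwlock_write() are the real helpers involved):

	write_lock(&kvm->mmu_lock);
	while (zap_next_range(kvm)) {		/* hypothetical helper */
		/* Yield mmu_lock if a reader is waiting or a resched is due. */
		if (need_resched() || rwlock_needbreak(&kvm->mmu_lock))
			cond_resched_rwlock_write(&kvm->mmu_lock);
	}
	write_unlock(&kvm->mmu_lock);

With rwlock_needbreak() honoring the live preemption model, a
preempt=none or preempt=voluntary host no longer drops mmu_lock here
purely because a vCPU's read-side fault path is waiting on it.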

But even KVM's mmu_lock behavior isn't uniform, e.g. the "best" behavior
can vary depending on the host VMM, the guest workload, the number of
vCPUs, the number of pCPUs in the host, why there is lock contention, etc.

In other words, simply deleting the CONFIG_PREEMPTION guard (or doing the
opposite and removing contention yielding entirely) needs to come with a
big pile of data proving that changing the status quo is a net positive.

Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Marco Elver <elver@google.com>
Cc: Frederic Weisbecker <frederic@kernel.org>
Cc: David Matlack <dmatlack@google.com>
Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 include/linux/sched.h | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 292c31697248..a274bc85f222 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2234,11 +2234,10 @@ static inline bool preempt_model_preemptible(void)
  */
 static inline int spin_needbreak(spinlock_t *lock)
 {
-#ifdef CONFIG_PREEMPTION
+	if (!preempt_model_preemptible())
+		return 0;
+
 	return spin_is_contended(lock);
-#else
-	return 0;
-#endif
 }
 
 /*
@@ -2251,11 +2250,10 @@ static inline int spin_needbreak(spinlock_t *lock)
  */
 static inline int rwlock_needbreak(rwlock_t *lock)
 {
-#ifdef CONFIG_PREEMPTION
+	if (!preempt_model_preemptible())
+		return 0;
+
 	return rwlock_is_contended(lock);
-#else
-	return 0;
-#endif
 }
 
 static __always_inline bool need_resched(void)
--
2.39.2
