From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
To: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen,
    Dietmar Eggemann, H. Peter Anvin, Ingo Molnar, Juri Lelli, Mel Gorman,
    Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Sean Christopherson,
    Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot,
    Vitaly Kuznetsov, Wanpeng Li
Cc: Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org,
    linux-kernel@vger.kernel.org, x86@kernel.org, Joel Fernandes
Subject: [RFC PATCH 1/8] kvm: x86: MSR for setting up scheduler info shared memory
Date: Wed, 13 Dec 2023 21:47:18 -0500
Message-ID: <20231214024727.3503870-2-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

Implement a kvm MSR that the guest uses to provide the GPA of shared
memory for communicating scheduling information between host and
guest. wrmsr(0) disables the feature. wrmsr(valid_gpa) enables the
feature and uses that gpa for further communication.

Also add a new cpuid feature flag for the host to advertise the
feature to the guest.
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 arch/x86/include/asm/kvm_host.h      | 25 ++++++++++++
 arch/x86/include/uapi/asm/kvm_para.h | 24 +++++++++++
 arch/x86/kvm/Kconfig                 | 12 ++++++
 arch/x86/kvm/cpuid.c                 |  2 +
 arch/x86/kvm/x86.c                   | 61 ++++++++++++++++++++++++++++
 include/linux/kvm_host.h             |  5 +++
 6 files changed, 129 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f72b30d2238a..f89ba1f07d88 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -987,6 +987,18 @@ struct kvm_vcpu_arch {
 	/* Protected Guests */
 	bool guest_state_protected;
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	/*
+	 * MSR to setup a shared memory for scheduling
+	 * information sharing between host and guest.
+	 */
+	struct {
+		enum kvm_vcpu_boost_state boost_status;
+		u64 msr_val;
+		struct gfn_to_hva_cache data;
+	} pv_sched;
+#endif
+
 	/*
 	 * Set when PDPTS were loaded directly by the userspace without
 	 * reading the guest memory
@@ -2217,4 +2229,17 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
  */
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+static inline bool kvm_arch_vcpu_pv_sched_enabled(struct kvm_vcpu_arch *arch)
+{
+	return arch->pv_sched.msr_val;
+}
+
+static inline void kvm_arch_vcpu_set_boost_status(struct kvm_vcpu_arch *arch,
+		enum kvm_vcpu_boost_state boost_status)
+{
+	arch->pv_sched.boost_status = boost_status;
+}
+#endif
+
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6e64b27b2c1e..6b1dea07a563 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -36,6 +36,7 @@
 #define KVM_FEATURE_MSI_EXT_DEST_ID	15
 #define KVM_FEATURE_HC_MAP_GPA_RANGE	16
 #define KVM_FEATURE_MIGRATION_CONTROL	17
+#define KVM_FEATURE_PV_SCHED		18
 
 #define KVM_HINTS_REALTIME	0
 
@@ -58,6 +59,7 @@
 #define MSR_KVM_ASYNC_PF_INT	0x4b564d06
 #define MSR_KVM_ASYNC_PF_ACK	0x4b564d07
 #define MSR_KVM_MIGRATION_CONTROL	0x4b564d08
+#define MSR_KVM_PV_SCHED	0x4b564da0
 
 struct kvm_steal_time {
 	__u64 steal;
@@ -150,4 +152,26 @@ struct kvm_vcpu_pv_apf_data {
 #define KVM_PV_EOI_ENABLED	KVM_PV_EOI_MASK
 #define KVM_PV_EOI_DISABLED	0x0
 
+/*
+ * VCPU boost state shared between the host and guest.
+ */
+enum kvm_vcpu_boost_state {
+	/* Priority boosting feature disabled in host */
+	VCPU_BOOST_DISABLED = 0,
+	/*
+	 * vcpu is not explicitly boosted by the host.
+	 * (Default priority when the guest started)
+	 */
+	VCPU_BOOST_NORMAL,
+	/* vcpu is boosted by the host */
+	VCPU_BOOST_BOOSTED
+};
+
+/*
+ * Structure passed in via MSR_KVM_PV_SCHED
+ */
+struct pv_sched_data {
+	__u64 boost_status;
+};
+
 #endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kvm/Kconfig b/arch/x86/kvm/Kconfig
index 89ca7f4c1464..dbcba73fb508 100644
--- a/arch/x86/kvm/Kconfig
+++ b/arch/x86/kvm/Kconfig
@@ -141,4 +141,16 @@ config KVM_XEN
 config KVM_EXTERNAL_WRITE_TRACKING
 	bool
 
+config PARAVIRT_SCHED_KVM
+	bool "Enable paravirt scheduling capability for kvm"
+	depends on KVM
+	help
+	  Paravirtualized scheduling facilitates the exchange of scheduling
+	  related information between the host and guest through shared memory,
+	  enhancing the efficiency of vCPU thread scheduling by the hypervisor.
+	  An illustrative use case involves dynamically boosting the priority of
+	  a vCPU thread when the guest is executing a latency-sensitive workload
+	  on that specific vCPU.
+	  This config enables paravirt scheduling in the kvm hypervisor.
+
 endif # VIRTUALIZATION
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 7bdc66abfc92..960ef6e869f2 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -1113,6 +1113,8 @@ static inline int __do_cpuid_func(struct kvm_cpuid_array *array, u32 function)
 			     (1 << KVM_FEATURE_POLL_CONTROL) |
 			     (1 << KVM_FEATURE_PV_SCHED_YIELD) |
 			     (1 << KVM_FEATURE_ASYNC_PF_INT);
+		if (IS_ENABLED(CONFIG_PARAVIRT_SCHED_KVM))
+			entry->eax |= (1 << KVM_FEATURE_PV_SCHED);
 
 		if (sched_info_on())
 			entry->eax |= (1 << KVM_FEATURE_STEAL_TIME);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 7bcf1a76a6ab..0f475b50ac83 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -3879,6 +3879,33 @@ int kvm_set_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 			return 1;
 		break;
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	case MSR_KVM_PV_SCHED:
+		if (!guest_pv_has(vcpu, KVM_FEATURE_PV_SCHED))
+			return 1;
+
+		if (!(data & KVM_MSR_ENABLED))
+			break;
+
+		if (!(data & ~KVM_MSR_ENABLED)) {
+			/*
+			 * Disable the feature
+			 */
+			vcpu->arch.pv_sched.msr_val = 0;
+			kvm_set_vcpu_boosted(vcpu, false);
+		} else if (!kvm_gfn_to_hva_cache_init(vcpu->kvm,
+				&vcpu->arch.pv_sched.data, data & ~KVM_MSR_ENABLED,
+				sizeof(struct pv_sched_data))) {
+			vcpu->arch.pv_sched.msr_val = data;
+			kvm_set_vcpu_boosted(vcpu, false);
+		} else {
+			pr_warn("MSR_KVM_PV_SCHED: kvm:%p, vcpu:%p, "
+				"msr value: %llx, kvm_gfn_to_hva_cache_init failed!\n",
+				vcpu->kvm, vcpu, data & ~KVM_MSR_ENABLED);
+		}
+		break;
+#endif
+
 	case MSR_KVM_POLL_CONTROL:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
 			return 1;
@@ -4239,6 +4266,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, struct msr_data *msr_info)
 
 		msr_info->data = vcpu->arch.pv_eoi.msr_val;
 		break;
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	case MSR_KVM_PV_SCHED:
+		msr_info->data = vcpu->arch.pv_sched.msr_val;
+		break;
+#endif
 	case MSR_KVM_POLL_CONTROL:
 		if (!guest_pv_has(vcpu, KVM_FEATURE_POLL_CONTROL))
 			return 1;
@@ -9820,6 +9852,29 @@ static int complete_hypercall_exit(struct kvm_vcpu *vcpu)
 	return kvm_skip_emulated_instruction(vcpu);
 }
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+static void record_vcpu_boost_status(struct kvm_vcpu *vcpu)
+{
+	u64 val = vcpu->arch.pv_sched.boost_status;
+
+	if (!kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch))
+		return;
+
+	pagefault_disable();
+	kvm_write_guest_offset_cached(vcpu->kvm, &vcpu->arch.pv_sched.data,
+		&val, offsetof(struct pv_sched_data, boost_status), sizeof(u64));
+	pagefault_enable();
+}
+
+void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted)
+{
+	kvm_arch_vcpu_set_boost_status(&vcpu->arch, boosted ?
+			VCPU_BOOST_BOOSTED : VCPU_BOOST_NORMAL);
+
+	kvm_make_request(KVM_REQ_VCPU_BOOST_UPDATE, vcpu);
+}
+#endif
+
 int kvm_emulate_hypercall(struct kvm_vcpu *vcpu)
 {
 	unsigned long nr, a0, a1, a2, a3, ret;
@@ -10593,6 +10648,12 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 		}
 		if (kvm_check_request(KVM_REQ_STEAL_UPDATE, vcpu))
 			record_steal_time(vcpu);
+
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+		if (kvm_check_request(KVM_REQ_VCPU_BOOST_UPDATE, vcpu))
+			record_vcpu_boost_status(vcpu);
+#endif
+
 #ifdef CONFIG_KVM_SMM
 		if (kvm_check_request(KVM_REQ_SMI, vcpu))
 			process_smi(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index 9d3ac7720da9..a74aeea55347 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -167,6 +167,7 @@ static inline bool is_error_page(struct page *page)
 #define KVM_REQ_VM_DEAD           (1 | KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_UNBLOCK           2
 #define KVM_REQ_DIRTY_RING_SOFT_FULL 3
+#define KVM_REQ_VCPU_BOOST_UPDATE 6
 #define KVM_REQUEST_ARCH_BASE     8
 
 /*
@@ -2287,4 +2288,8 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 /* Max number of entries allowed for each kvm dirty ring */
 #define KVM_DIRTY_RING_MAX_ENTRIES  65536
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted);
+#endif
+
 #endif
-- 
2.43.0

From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Subject: [RFC PATCH 2/8] sched/core: sched_setscheduler_pi_nocheck for interrupt context usage
Date: Wed, 13 Dec 2023 21:47:19 -0500
Message-ID: <20231214024727.3503870-3-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

__sched_setscheduler takes an argument 'pi' so as to allow its use
from interrupt context when PI is not used. But it is not exported and
cannot be used outside of scheduler code. sched_setscheduler_nocheck
is exported, but does not allow that flexibility.
Introduce sched_setscheduler_pi_nocheck to allow the flexibility of
calling it from interrupt context.
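As a hedged usage sketch (hypothetical caller; the SCHED_RR policy and
priority value are illustrative assumptions), the new entry point is
meant to be callable where sched_setscheduler_nocheck() is not:

#include <linux/sched.h>
#include <uapi/linux/sched/types.h>	/* struct sched_param */

/*
 * Hypothetical: bump a kernel thread to RT from hard-irq context,
 * where the PI (rt-mutex) validation done by
 * sched_setscheduler_nocheck() is not allowed. Safe only because the
 * target task is known not to take rt-mutexes.
 */
static void boost_worker_from_irq(struct task_struct *p)
{
	struct sched_param param = { .sched_priority = 8 };

	sched_setscheduler_pi_nocheck(p, SCHED_RR, &param, /* pi = */ false);
}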
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 include/linux/sched.h |  2 ++
 kernel/sched/core.c   | 34 +++++++++++++++++++++++++++++++---
 2 files changed, 33 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 609bde814cb0..de7382f149cf 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1908,6 +1908,8 @@ extern int idle_cpu(int cpu);
 extern int available_idle_cpu(int cpu);
 extern int sched_setscheduler(struct task_struct *, int, const struct sched_param *);
 extern int sched_setscheduler_nocheck(struct task_struct *, int, const struct sched_param *);
+extern int sched_setscheduler_pi_nocheck(struct task_struct *p, int policy,
+		const struct sched_param *sp, bool pi);
 extern void sched_set_fifo(struct task_struct *p);
 extern void sched_set_fifo_low(struct task_struct *p);
 extern void sched_set_normal(struct task_struct *p, int nice);
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index e8f73ff12126..b47f72b6595f 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -7850,8 +7850,8 @@ static int __sched_setscheduler(struct task_struct *p,
 	return retval;
 }
 
-static int _sched_setscheduler(struct task_struct *p, int policy,
-			       const struct sched_param *param, bool check)
+static int _sched_setscheduler_pi(struct task_struct *p, int policy,
+			       const struct sched_param *param, bool check, bool pi)
 {
 	struct sched_attr attr = {
 		.sched_policy   = policy,
@@ -7866,8 +7866,15 @@ static int _sched_setscheduler(struct task_struct *p, int policy,
 		attr.sched_policy = policy;
 	}
 
-	return __sched_setscheduler(p, &attr, check, true);
+	return __sched_setscheduler(p, &attr, check, pi);
+}
+
+static inline int _sched_setscheduler(struct task_struct *p, int policy,
+			       const struct sched_param *param, bool check)
+{
+	return _sched_setscheduler_pi(p, policy, param, check, true);
 }
+
 /**
  * sched_setscheduler - change the scheduling policy and/or RT priority of a thread.
  * @p: the task in question.
@@ -7916,6 +7923,27 @@ int sched_setscheduler_nocheck(struct task_struct *p, int policy,
 	return _sched_setscheduler(p, policy, param, false);
 }
 
+/**
+ * sched_setscheduler_pi_nocheck - change the scheduling policy and/or RT priority of a thread from kernelspace.
+ * @p: the task in question.
+ * @policy: new policy.
+ * @param: structure containing the new RT priority.
+ * @pi: boolean flag stating if pi validation needs to be performed.
+ *
+ * A flexible version of sched_setscheduler_nocheck which allows for
+ * specifying whether PI context validation needs to be done or not.
+ * sched_setscheduler_nocheck is not allowed in interrupt context as it
+ * assumes that PI is used. This function allows interrupt context calls
+ * by specifying pi = false.
+ *
+ * Return: 0 on success. An error code otherwise.
+ */
+int sched_setscheduler_pi_nocheck(struct task_struct *p, int policy,
+		const struct sched_param *param, bool pi)
+{
+	return _sched_setscheduler_pi(p, policy, param, false, pi);
+}
+EXPORT_SYMBOL_GPL(sched_setscheduler_pi_nocheck);
+
 /*
  * SCHED_FIFO is a broken scheduler model; that is, it is fundamentally
  * incapable of resource management, which is the one thing an OS really should
-- 
2.43.0

From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Subject: [RFC PATCH 3/8] kvm: x86: vcpu boosting/unboosting framework
Date: Wed, 13 Dec 2023 21:47:20 -0500
Message-ID: <20231214024727.3503870-4-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

When the guest kernel is about to run a critical or latency-sensitive
workload, it can request the hypervisor to boost the priority of the
vcpu thread. Similarly, the guest kernel can request an unboost when
the vcpu switches back to a normal workload.

When a guest determines that it needs a boost, it need not immediately
request a synchronous boost, as it is already running at that moment.
A synchronous request is detrimental because it incurs a VMEXIT.
Rather, the guest notes down its request in shared memory, and the
host checks the request on the next VMEXIT and boosts if needed.
A vcpu with preemption disabled is also considered latency-sensitive
and requires boosting. The guest passes its preemption state to the
host, and the host boosts the vcpu thread as needed.

Unboost requests need to be synchronous, because a guest that keeps
running boosted when it no longer needs to might hurt host workloads.
Implement a synchronous unboost mechanism using MSR_KVM_PV_SCHED. The
host checks the shared memory for boost/unboost requests on every
VMEXIT, so the VMEXIT caused by wrmsr(ULLONG_MAX) also goes through a
fast-exit path that takes care of the boost/unboost.
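For orientation, the guest half of this protocol could look roughly as
follows (hypothetical guest code reusing the per-cpu pv_sched area
sketched under patch 1; only the host side is implemented here):

/*
 * Lazy boost: note the request in the shared page; the host acts on
 * it at the next natural VMEXIT, so no exit is forced here.
 */
static void guest_request_boost(void)
{
	struct pv_sched_data *pv = this_cpu_ptr(&pv_sched);

	WRITE_ONCE(pv->schedinfo.boost_req, VCPU_REQ_BOOST);
}

/*
 * Synchronous unboost: wrmsr(ULLONG_MAX) forces a VMEXIT that the
 * host handles on its fast path, deboosting immediately.
 */
static void guest_request_unboost(void)
{
	struct pv_sched_data *pv = this_cpu_ptr(&pv_sched);

	WRITE_ONCE(pv->schedinfo.boost_req, VCPU_REQ_UNBOOST);
	wrmsrl(MSR_KVM_PV_SCHED, ULLONG_MAX);
}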
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 arch/x86/include/asm/kvm_host.h      | 42 ++++++++++++++++
 arch/x86/include/uapi/asm/kvm_para.h | 19 +++++++
 arch/x86/kvm/x86.c                   | 48 ++++++++++++++++++
 include/linux/kvm_host.h             | 11 +++++
 virt/kvm/kvm_main.c                  | 74 ++++++++++++++++++++++++++++
 5 files changed, 194 insertions(+)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f89ba1f07d88..474fe2d6d3e0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -994,6 +994,8 @@ struct kvm_vcpu_arch {
 	 */
 	struct {
 		enum kvm_vcpu_boost_state boost_status;
+		int boost_policy;
+		int boost_prio;
 		u64 msr_val;
 		struct gfn_to_hva_cache data;
 	} pv_sched;
@@ -2230,6 +2232,13 @@ int memslot_rmap_alloc(struct kvm_memory_slot *slot, unsigned long npages);
 #define KVM_EXIT_HYPERCALL_MBZ		GENMASK_ULL(31, 1)
 
 #ifdef CONFIG_PARAVIRT_SCHED_KVM
+/*
+ * Default policy and priority used for boosting
+ * VCPU threads.
+ */
+#define VCPU_BOOST_DEFAULT_PRIO		8
+#define VCPU_BOOST_DEFAULT_POLICY	SCHED_RR
+
 static inline bool kvm_arch_vcpu_pv_sched_enabled(struct kvm_vcpu_arch *arch)
 {
 	return arch->pv_sched.msr_val;
@@ -2240,6 +2249,39 @@ static inline void kvm_arch_vcpu_set_boost_status(struct kvm_vcpu_arch *arch,
 {
 	arch->pv_sched.boost_status = boost_status;
 }
+
+static inline bool kvm_arch_vcpu_boosted(struct kvm_vcpu_arch *arch)
+{
+	return arch->pv_sched.boost_status == VCPU_BOOST_BOOSTED;
+}
+
+static inline int kvm_arch_vcpu_boost_policy(struct kvm_vcpu_arch *arch)
+{
+	return arch->pv_sched.boost_policy;
+}
+
+static inline int kvm_arch_vcpu_boost_prio(struct kvm_vcpu_arch *arch)
+{
+	return arch->pv_sched.boost_prio;
+}
+
+static inline int kvm_arch_vcpu_set_boost_prio(struct kvm_vcpu_arch *arch, u64 prio)
+{
+	if (prio >= MAX_RT_PRIO)
+		return -EINVAL;
+
+	arch->pv_sched.boost_prio = prio;
+	return 0;
+}
+
+static inline int kvm_arch_vcpu_set_boost_policy(struct kvm_vcpu_arch *arch, u64 policy)
+{
+	if (policy != SCHED_FIFO && policy != SCHED_RR)
+		return -EINVAL;
+
+	arch->pv_sched.boost_policy = policy;
+	return 0;
+}
 #endif
 
 #endif /* _ASM_X86_KVM_HOST_H */
diff --git a/arch/x86/include/uapi/asm/kvm_para.h b/arch/x86/include/uapi/asm/kvm_para.h
index 6b1dea07a563..e53c3f3a88d7 100644
--- a/arch/x86/include/uapi/asm/kvm_para.h
+++ b/arch/x86/include/uapi/asm/kvm_para.h
@@ -167,11 +167,30 @@ enum kvm_vcpu_boost_state {
 	VCPU_BOOST_BOOSTED
 };
 
+/*
+ * Boost request from guest to host for lazy boosting.
+ */
+enum kvm_vcpu_boost_request {
+	VCPU_REQ_NONE = 0,
+	VCPU_REQ_UNBOOST,
+	VCPU_REQ_BOOST,
+};
+
+union guest_schedinfo {
+	struct {
+		__u8 boost_req;
+		__u8 preempt_disabled;
+	};
+	__u64 pad;
+};
+
 /*
  * Structure passed in via MSR_KVM_PV_SCHED
  */
 struct pv_sched_data {
 	__u64 boost_status;
+	union guest_schedinfo schedinfo;
 };
 
 #endif /* _UAPI_ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 0f475b50ac83..2577e1083f91 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -2148,6 +2148,37 @@ static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
 		xfer_to_guest_mode_work_pending();
 }
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+static inline bool __vcpu_needs_boost(struct kvm_vcpu *vcpu, union guest_schedinfo schedinfo)
+{
+	bool pending_event = kvm_cpu_has_pending_timer(vcpu) || kvm_cpu_has_interrupt(vcpu);
+
+	/*
+	 * vcpu needs a boost if
+	 * - A lazy boost request active, or
+	 * - Pending latency sensitive event, or
+	 * - Preemption disabled in this vcpu.
+	 */
+	return (schedinfo.boost_req == VCPU_REQ_BOOST || pending_event || schedinfo.preempt_disabled);
+}
+
+static inline void kvm_vcpu_do_pv_sched(struct kvm_vcpu *vcpu)
+{
+	union guest_schedinfo schedinfo;
+
+	if (!kvm_vcpu_sched_enabled(vcpu))
+		return;
+
+	if (kvm_read_guest_offset_cached(vcpu->kvm, &vcpu->arch.pv_sched.data,
+			&schedinfo, offsetof(struct pv_sched_data, schedinfo), sizeof(schedinfo)))
+		return;
+
+	kvm_vcpu_set_sched(vcpu, __vcpu_needs_boost(vcpu, schedinfo));
+}
+#else
+static inline void kvm_vcpu_do_pv_sched(struct kvm_vcpu *vcpu) { }
+#endif
+
 /*
  * The fast path for frequent and performance sensitive wrmsr emulation,
 * i.e.
 * the sending of IPI, sending IPI early in the VM-Exit flow reduces
@@ -2201,6 +2232,15 @@ fastpath_t handle_fastpath_set_msr_irqoff(struct kvm_vcpu *vcpu)
 			ret = EXIT_FASTPATH_REENTER_GUEST;
 		}
 		break;
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	case MSR_KVM_PV_SCHED:
+		data = kvm_read_edx_eax(vcpu);
+		if (data == ULLONG_MAX) {
+			kvm_skip_emulated_instruction(vcpu);
+			ret = EXIT_FASTPATH_EXIT_HANDLED;
+		}
+		break;
+#endif
 	default:
 		break;
 	}
@@ -10919,6 +10959,9 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
 	guest_timing_exit_irqoff();
 
 	local_irq_enable();
+
+	kvm_vcpu_do_pv_sched(vcpu);
+
 	preempt_enable();
 
 	kvm_vcpu_srcu_read_lock(vcpu);
@@ -11990,6 +12033,11 @@ int kvm_arch_vcpu_create(struct kvm_vcpu *vcpu)
 	if (r)
 		goto free_guest_fpu;
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	kvm_arch_vcpu_set_boost_prio(&vcpu->arch, VCPU_BOOST_DEFAULT_PRIO);
+	kvm_arch_vcpu_set_boost_policy(&vcpu->arch, VCPU_BOOST_DEFAULT_POLICY);
+#endif
+
 	vcpu->arch.arch_capabilities = kvm_get_arch_capabilities();
 	vcpu->arch.msr_platform_info = MSR_PLATFORM_INFO_CPUID_FAULT;
 	kvm_xen_init_vcpu(vcpu);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index a74aeea55347..c6647f6312c9 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2290,6 +2290,17 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 
 #ifdef CONFIG_PARAVIRT_SCHED_KVM
 void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted);
+int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool boost);
+
+static inline bool kvm_vcpu_sched_enabled(struct kvm_vcpu *vcpu)
+{
+	return kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch);
+}
+#else
+static inline int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool boost)
+{
+	return 0;
+}
 #endif
 
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 5bbb5612b207..37748e2512e1 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -57,6 +57,9 @@
 #include <...>
 #include <...>
 
+#include <...>
+#include <...>
+
 #include "coalesced_mmio.h"
 #include "async_pf.h"
 #include "kvm_mm.h"
@@ -3602,6 +3605,77 @@ bool kvm_vcpu_wake_up(struct kvm_vcpu *vcpu)
 }
 EXPORT_SYMBOL_GPL(kvm_vcpu_wake_up);
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+/*
+ * Check if we can ignore the boost/unboost request.
+ * Returns true if:
+ * - caller is requesting boost and vcpu is already boosted, or
+ * - caller is requesting unboost and vcpu is not boosted.
+ */
+static inline bool __can_ignore_set_sched(struct kvm_vcpu *vcpu, bool boost)
+{
+	return ((boost && kvm_arch_vcpu_boosted(&vcpu->arch)) ||
+			(!boost && !kvm_arch_vcpu_boosted(&vcpu->arch)));
+}
+
+int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool boost)
+{
+	int policy;
+	int ret = 0;
+	struct pid *pid;
+	struct sched_param param = { 0 };
+	struct task_struct *vcpu_task = NULL;
+
+	/*
+	 * We can ignore the request if a boost request comes
+	 * when we are already boosted or an unboost request
+	 * when we are already unboosted.
+	 */
+	if (__can_ignore_set_sched(vcpu, boost))
+		goto set_boost_status;
+
+	if (boost) {
+		policy = kvm_arch_vcpu_boost_policy(&vcpu->arch);
+		param.sched_priority = kvm_arch_vcpu_boost_prio(&vcpu->arch);
+	} else {
+		/*
+		 * TODO: here we just unboost to SCHED_NORMAL. Ideally we
+		 * should either
+		 * - revert to the initial priority before boost, or
+		 * - introduce tunables for unboost priority.
+		 */
+		policy = SCHED_NORMAL;
+		param.sched_priority = 0;
+	}
+
+	rcu_read_lock();
+	pid = rcu_dereference(vcpu->pid);
+	if (pid)
+		vcpu_task = get_pid_task(pid, PIDTYPE_PID);
+	rcu_read_unlock();
+	if (vcpu_task == NULL)
+		return -KVM_EINVAL;
+
+	/*
+	 * This might be called from interrupt context.
+	 * Since we do not use rt-mutexes, we can safely call
+	 * sched_setscheduler_pi_nocheck with pi = false.
+	 * NOTE: If in future, we use rt-mutexes, this should
+	 * be modified to use a tasklet to do boost/unboost.
+	 */
+	WARN_ON_ONCE(vcpu_task->pi_top_task);
+	ret = sched_setscheduler_pi_nocheck(vcpu_task, policy,
+			&param, false);
+	put_task_struct(vcpu_task);
+set_boost_status:
+	if (!ret)
+		kvm_set_vcpu_boosted(vcpu, boost);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(kvm_vcpu_set_sched);
+#endif
+
 #ifndef CONFIG_S390
 /*
  * Kick a sleeping VCPU, or a guest VCPU in guest mode, into host kernel mode.
-- 
2.43.0

From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Subject: [RFC PATCH 4/8] kvm: x86: boost vcpu threads on latency sensitive paths
Date: Wed, 13 Dec 2023 21:47:21 -0500
Message-ID: <20231214024727.3503870-5-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

Proactively boost the vcpu thread when delivering an interrupt, so
that the guest vcpu gets to run and service the interrupt with minimum
latency. The host knows that the guest vcpu is going to service an
irq/nmi as soon as it is delivered, and boosting the priority helps
the guest avoid latencies. The timer interrupt is one common scenario
that benefits from this.

When a vcpu resumes from halt, it is because of an event such as a
timer or an irq/nmi, and that wakeup is latency-sensitive. So it makes
sense to boost the priority of the vcpu thread when it goes idle: the
eventual wakeup will be for a latency-sensitive event, and the boost
does not hurt the host while the thread is scheduled out.
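The conversion below is mechanical: every latency-sensitive wakeup
path swaps kvm_vcpu_kick() for kvm_vcpu_kick_boost(). As a sketch, a
hypothetical new call site would follow the same pattern
(example_deliver_event() is not part of this patch):

/* Hypothetical host-side call site following this patch's pattern. */
static void example_deliver_event(struct kvm_vcpu *vcpu)
{
	kvm_make_request(KVM_REQ_EVENT, vcpu);
	kvm_vcpu_kick_boost(vcpu);	/* boost the vcpu thread, then kick it */
}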
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 arch/x86/kvm/i8259.c     | 2 +-
 arch/x86/kvm/lapic.c     | 8 ++++----
 arch/x86/kvm/svm/svm.c   | 2 +-
 arch/x86/kvm/vmx/vmx.c   | 2 +-
 include/linux/kvm_host.h | 8 ++++++++
 virt/kvm/kvm_main.c      | 8 ++++++++
 6 files changed, 23 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/i8259.c b/arch/x86/kvm/i8259.c
index 8dec646e764b..6841ed802f00 100644
--- a/arch/x86/kvm/i8259.c
+++ b/arch/x86/kvm/i8259.c
@@ -62,7 +62,7 @@ static void pic_unlock(struct kvm_pic *s)
 		kvm_for_each_vcpu(i, vcpu, s->kvm) {
 			if (kvm_apic_accept_pic_intr(vcpu)) {
 				kvm_make_request(KVM_REQ_EVENT, vcpu);
-				kvm_vcpu_kick(vcpu);
+				kvm_vcpu_kick_boost(vcpu);
 				return;
 			}
 		}
diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index e74e223f46aa..ae25176fddc8 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -1309,12 +1309,12 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 		result = 1;
 		vcpu->arch.pv.pv_unhalted = 1;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_kick_boost(vcpu);
 		break;
 
 	case APIC_DM_SMI:
 		if (!kvm_inject_smi(vcpu)) {
-			kvm_vcpu_kick(vcpu);
+			kvm_vcpu_kick_boost(vcpu);
 			result = 1;
 		}
 		break;
@@ -1322,7 +1322,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
 	case APIC_DM_NMI:
 		result = 1;
 		kvm_inject_nmi(vcpu);
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_kick_boost(vcpu);
 		break;
 
 	case APIC_DM_INIT:
@@ -1901,7 +1901,7 @@ static void apic_timer_expired(struct kvm_lapic *apic, bool from_timer_fn)
 	atomic_inc(&apic->lapic_timer.pending);
 	kvm_make_request(KVM_REQ_UNBLOCK, vcpu);
 	if (from_timer_fn)
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_kick_boost(vcpu);
 }
 
 static void start_sw_tscdeadline(struct kvm_lapic *apic)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index c8466bc64b87..578c19aeef73 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3566,7 +3566,7 @@ void svm_complete_interrupt_delivery(struct kvm_vcpu *vcpu, int delivery_mode,
 	if (!READ_ONCE(vcpu->arch.apic->apicv_active)) {
 		/* Process the interrupt via kvm_check_and_inject_events(). */
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_kick_boost(vcpu);
 		return;
 	}
 
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index bc6f0fea48b4..b786cb2eb185 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4266,7 +4266,7 @@ static void vmx_deliver_interrupt(struct kvm_lapic *apic, int delivery_mode,
 	if (vmx_deliver_posted_interrupt(vcpu, vector)) {
 		kvm_lapic_set_irr(vector, apic);
 		kvm_make_request(KVM_REQ_EVENT, vcpu);
-		kvm_vcpu_kick(vcpu);
+		kvm_vcpu_kick_boost(vcpu);
 	} else {
 		trace_kvm_apicv_accept_irq(vcpu->vcpu_id, delivery_mode,
 					   trig_mode, vector);
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index c6647f6312c9..f76680fbc60d 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -2296,11 +2296,19 @@ static inline bool kvm_vcpu_sched_enabled(struct kvm_vcpu *vcpu)
 {
 	return kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch);
 }
+
+static inline void kvm_vcpu_kick_boost(struct kvm_vcpu *vcpu)
+{
+	kvm_vcpu_set_sched(vcpu, true);
+	kvm_vcpu_kick(vcpu);
+}
 #else
 static inline int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool boost)
 {
 	return 0;
 }
+
+#define kvm_vcpu_kick_boost kvm_vcpu_kick
 #endif
 
 #endif
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 37748e2512e1..0dd8b84ed073 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -3460,6 +3460,14 @@ bool kvm_vcpu_block(struct kvm_vcpu *vcpu)
 		if (kvm_vcpu_check_block(vcpu) < 0)
 			break;
 
+		/*
+		 * Boost before scheduling out. Wakeup happens only on
+		 * an event or a signal and hence it is beneficial to
+		 * be scheduled ASAP. Ultimately, guest gets to idle loop
+		 * and then will request deboost.
+		 */
+		kvm_vcpu_set_sched(vcpu, true);
+
 		waited = true;
 		schedule();
 	}
-- 
2.43.0

From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Subject: [RFC PATCH 5/8] kvm: x86: upper bound for preemption based boost duration
Date: Wed, 13 Dec 2023 21:47:22 -0500
Message-ID: <20231214024727.3503870-6-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

The guest requests a boost when it disables preemption, but does not
request an unboost when it enables preemption again. This may cause
the guest vcpu to stay boosted for longer than it deserves. Also,
there are many preemption-disabled paths in the kernel, and some can
be quite long.

This patch sets a bound on the maximum time a vcpu stays boosted due
to preemption being disabled in the guest. The default is 3000us, and
it can be changed via a kvm module parameter.
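A hedged sketch of tuning this cap from host userspace; the sysfs path
follows from module_param() living in kvm.ko, but treat it as an
assumption:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Lower the preemption-based boost cap from the default 3000us to 1500us. */
int main(void)
{
	int fd = open("/sys/module/kvm/parameters/pvsched_max_preempt_disabled_us",
		      O_WRONLY);

	if (fd < 0)
		return 1;
	dprintf(fd, "1500");
	close(fd);
	return 0;
}

Writing 0 would disable the bound entirely, since the check below
treats a zero max_delta as "no limit".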
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 arch/x86/include/asm/kvm_host.h |  2 ++
 arch/x86/kvm/x86.c              | 49 ++++++++++++++++++++++++++++++---
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 474fe2d6d3e0..6a8326baa6a0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -994,6 +994,8 @@ struct kvm_vcpu_arch {
 	 */
 	struct {
 		enum kvm_vcpu_boost_state boost_status;
+		bool preempt_disabled;
+		ktime_t preempt_disabled_ts;
 		int boost_policy;
 		int boost_prio;
 		u64 msr_val;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 2577e1083f91..8c15c6ff352e 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -199,6 +199,15 @@ module_param(eager_page_split, bool, 0644);
 static bool __read_mostly mitigate_smt_rsb;
 module_param(mitigate_smt_rsb, bool, 0444);
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+/*
+ * Maximum time in microseconds a guest vcpu can stay boosted due
+ * to preemption disabled.
+ */
+unsigned int pvsched_max_preempt_disabled_us = 3000;
+module_param(pvsched_max_preempt_disabled_us, uint, 0644);
+#endif
+
 /*
  * Restoring the host value for MSRs that are only consumed when running in
  * usermode, e.g. SYSCALL MSRs and TSC_AUX, can be deferred until the CPU
@@ -2149,17 +2158,47 @@ static inline bool kvm_vcpu_exit_request(struct kvm_vcpu *vcpu)
 }
 
 #ifdef CONFIG_PARAVIRT_SCHED_KVM
+static inline void kvm_vcpu_update_preempt_disabled(struct kvm_vcpu_arch *arch,
+		bool preempt_disabled)
+{
+	if (arch->pv_sched.preempt_disabled != preempt_disabled) {
+		arch->pv_sched.preempt_disabled = preempt_disabled;
+		if (preempt_disabled)
+			arch->pv_sched.preempt_disabled_ts = ktime_get();
+		else
+			arch->pv_sched.preempt_disabled_ts = 0;
+	}
+}
+
+static inline bool kvm_vcpu_exceeds_preempt_disabled_duration(struct kvm_vcpu_arch *arch)
+{
+	s64 max_delta = pvsched_max_preempt_disabled_us * NSEC_PER_USEC;
+
+	if (max_delta && arch->pv_sched.preempt_disabled) {
+		s64 delta;
+
+		WARN_ON_ONCE(arch->pv_sched.preempt_disabled_ts == 0);
+		delta = ktime_to_ns(ktime_sub(ktime_get(),
+				arch->pv_sched.preempt_disabled_ts));
+
+		if (delta >= max_delta)
+			return true;
+	}
+
+	return false;
+}
+
 static inline bool __vcpu_needs_boost(struct kvm_vcpu *vcpu, union guest_schedinfo schedinfo)
 {
 	bool pending_event = kvm_cpu_has_pending_timer(vcpu) || kvm_cpu_has_interrupt(vcpu);
 
 	/*
 	 * vcpu needs a boost if
-	 * - A lazy boost request active, or
-	 * - Pending latency sensitive event, or
-	 * - Preemption disabled in this vcpu.
+	 * - A lazy boost request active or a pending latency sensitive event, and
+	 * - Preemption disabled duration on this vcpu has not crossed the threshold.
 	 */
-	return (schedinfo.boost_req == VCPU_REQ_BOOST || pending_event || schedinfo.preempt_disabled);
+	return ((schedinfo.boost_req == VCPU_REQ_BOOST || pending_event) &&
+		!kvm_vcpu_exceeds_preempt_disabled_duration(&vcpu->arch));
 }
 
 static inline void kvm_vcpu_do_pv_sched(struct kvm_vcpu *vcpu)
@@ -2173,6 +2212,8 @@ static inline void kvm_vcpu_do_pv_sched(struct kvm_vcpu *vcpu)
 			&schedinfo, offsetof(struct pv_sched_data, schedinfo), sizeof(schedinfo)))
 		return;
 
+	kvm_vcpu_update_preempt_disabled(&vcpu->arch, schedinfo.preempt_disabled);
+
 	kvm_vcpu_set_sched(vcpu, __vcpu_needs_boost(vcpu, schedinfo));
 }
 #else
-- 
2.43.0

From nobody Sun Dec 28 00:49:34 2025
From: "Vineeth Pillai (Google)" <vineeth@bitbyteword.org>
Subject: [RFC PATCH 6/8] kvm: x86: enable/disable global/per-guest vcpu boost feature
Date: Wed, 13 Dec 2023 21:47:23 -0500
Message-ID: <20231214024727.3503870-7-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>
References: <20231214024727.3503870-1-vineeth@bitbyteword.org>

Implement a module parameter to enable/disable the feature globally,
and implement ioctls to enable/disable the feature per guest.

TODO: Documentation for the ioctls and the kvm module parameter.
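A hedged sketch of how a VMM could drive the per-guest toggle
(hypothetical userspace code; vm_fd is assumed to be an open KVM VM
file descriptor):

#include <linux/kvm.h>
#include <sys/ioctl.h>

/* Hypothetical: opt a guest out of vcpu boosting if the host supports it. */
static int disable_pv_sched(int vm_fd)
{
	if (ioctl(vm_fd, KVM_CHECK_EXTENSION, KVM_CAP_PV_SCHED) <= 0)
		return -1;	/* feature not compiled in or not supported */

	/* The handler takes the value directly in the ioctl argument. */
	return ioctl(vm_fd, KVM_SET_PV_SCHED_ENABLED, 0);
}

The global kvm_pv_sched module parameter gates all of this through a
static key, as the diff below shows.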
Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 arch/x86/kvm/x86.c       |  8 +++--
 include/linux/kvm_host.h | 34 +++++++++++++++++-
 include/uapi/linux/kvm.h |  5 +++
 virt/kvm/kvm_main.c      | 76 +++++++++++++++++++++++++++++++++++++---
 4 files changed, 116 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 8c15c6ff352e..4fb73833fc68 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -9949,8 +9949,12 @@ static void record_vcpu_boost_status(struct kvm_vcpu *vcpu)
 
 void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted)
 {
-	kvm_arch_vcpu_set_boost_status(&vcpu->arch,
-			boosted ? VCPU_BOOST_BOOSTED : VCPU_BOOST_NORMAL);
+	enum kvm_vcpu_boost_state boost_status = VCPU_BOOST_DISABLED;
+
+	if (kvm_pv_sched_enabled(vcpu->kvm))
+		boost_status = boosted ? VCPU_BOOST_BOOSTED : VCPU_BOOST_NORMAL;
+
+	kvm_arch_vcpu_set_boost_status(&vcpu->arch, boost_status);
 
 	kvm_make_request(KVM_REQ_VCPU_BOOST_UPDATE, vcpu);
 }
diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index f76680fbc60d..07f60a27025c 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -807,6 +807,9 @@ struct kvm {
 	struct notifier_block pm_notifier;
 #endif
 	char stats_id[KVM_STATS_NAME_SIZE];
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+	bool pv_sched_enabled;
+#endif
 };
 
 #define kvm_err(fmt, ...) \
@@ -2292,9 +2295,38 @@ static inline void kvm_account_pgtable_pages(void *virt, int nr)
 void kvm_set_vcpu_boosted(struct kvm_vcpu *vcpu, bool boosted);
 int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool boost);
 
+DECLARE_STATIC_KEY_FALSE(kvm_pv_sched);
+
+static inline bool kvm_pv_sched_enabled(struct kvm *kvm)
+{
+	if (static_branch_unlikely(&kvm_pv_sched))
+		return kvm->pv_sched_enabled;
+
+	return false;
+}
+
+static inline void kvm_set_pv_sched_enabled(struct kvm *kvm, bool enabled)
+{
+	unsigned long i;
+	struct kvm_vcpu *vcpu;
+
+	kvm->pv_sched_enabled = enabled;
+	/*
+	 * After setting pv_sched_enabled, we need to update each vcpu's
+	 * state (VCPU_BOOST_{DISABLED,NORMAL}) so that the guest knows
+	 * about the update.
+	 * When disabling, we would also need to unboost vcpu threads
+	 * if already boosted.
+	 * XXX: this can race, needs locking!
+	 */
+	kvm_for_each_vcpu(i, vcpu, kvm)
+		kvm_vcpu_set_sched(vcpu, false);
+}
+
 static inline bool kvm_vcpu_sched_enabled(struct kvm_vcpu *vcpu)
 {
-	return kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch);
+	return kvm_pv_sched_enabled(vcpu->kvm) &&
+		kvm_arch_vcpu_pv_sched_enabled(&vcpu->arch);
 }
 
 static inline void kvm_vcpu_kick_boost(struct kvm_vcpu *vcpu)
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index f089ab290978..4beaeaa3e78f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -1192,6 +1192,7 @@ struct kvm_ppc_resize_hpt {
 #define KVM_CAP_COUNTER_OFFSET 227
 #define KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE 228
 #define KVM_CAP_ARM_SUPPORTED_BLOCK_SIZES 229
+#define KVM_CAP_PV_SCHED 600
 
 #ifdef KVM_CAP_IRQ_ROUTING
 
@@ -2249,4 +2250,8 @@ struct kvm_s390_zpci_op {
 /* flags for kvm_s390_zpci_op->u.reg_aen.flags */
 #define KVM_S390_ZPCIOP_REGAEN_HOST    (1 << 0)
 
+/* Available with KVM_CAP_PV_SCHED */
+#define KVM_SET_PV_SCHED_ENABLED	_IOW(KVMIO, 0xe0, int)
+#define KVM_GET_PV_SCHED_ENABLED	_IOR(KVMIO, 0xe1, int)
+
 #endif /* __LINUX_KVM_H */
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index 0dd8b84ed073..d17cd28d5a92 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -99,6 +99,52 @@ unsigned int halt_poll_ns_shrink;
 module_param(halt_poll_ns_shrink, uint, 0644);
 EXPORT_SYMBOL_GPL(halt_poll_ns_shrink);
 
+#ifdef CONFIG_PARAVIRT_SCHED_KVM
+__read_mostly DEFINE_STATIC_KEY_FALSE(kvm_pv_sched);
+EXPORT_SYMBOL_GPL(kvm_pv_sched);
+
+static int set_kvm_pv_sched(const char *val, const struct kernel_param *cp)
+{
+	struct kvm *kvm;
+	char *s = strstrip((char *)val);
+	bool new_val, old_val = static_key_enabled(&kvm_pv_sched);
+
+	if (!strcmp(s, "0"))
+		new_val = 0;
+	else if (!strcmp(s, "1"))
+		new_val = 1;
+	else
+		return -EINVAL;
+
+	if (old_val != new_val) {
+		if (new_val)
+			static_branch_enable(&kvm_pv_sched);
+		else
+			static_branch_disable(&kvm_pv_sched);
+
+		mutex_lock(&kvm_lock);
+		list_for_each_entry(kvm, &vm_list, vm_list)
+			kvm_set_pv_sched_enabled(kvm, !old_val);
+		mutex_unlock(&kvm_lock);
+	}
+
+	return 0;
+}
+
+static int get_kvm_pv_sched(char *buf, const struct kernel_param *cp)
+{
+	return sprintf(buf, "%s\n", static_key_enabled(&kvm_pv_sched) ?
"1" : "0"); +} + +static const struct kernel_param_ops kvm_pv_sched_ops =3D { + .set =3D set_kvm_pv_sched, + .get =3D get_kvm_pv_sched +}; + +module_param_cb(kvm_pv_sched, &kvm_pv_sched_ops, NULL, 0644); +#endif + /* * Ordering of locks: * @@ -1157,6 +1203,9 @@ static struct kvm *kvm_create_vm(unsigned long type, = const char *fdname) =20 BUILD_BUG_ON(KVM_MEM_SLOTS_NUM > SHRT_MAX); =20 +#ifdef CONFIG_PARAVIRT_SCHED_KVM + kvm->pv_sched_enabled =3D true; +#endif /* * Force subsequent debugfs file creations to fail if the VM directory * is not created (by kvm_create_vm_debugfs()). @@ -3635,11 +3684,15 @@ int kvm_vcpu_set_sched(struct kvm_vcpu *vcpu, bool = boost) struct task_struct *vcpu_task =3D NULL; =20 /* - * We can ignore the request if a boost request comes - * when we are already boosted or an unboost request - * when we are already unboosted. + * If the feature is disabled and we receive a boost request, + * we can ignore the request and set VCPU_BOOST_DISABLED for the + * guest to see(kvm_set_vcpu_boosted). + * Similarly, we can ignore the request if a boost request comes + * when we are already boosted or an unboost request when we are + * already unboosted. */ - if (__can_ignore_set_sched(vcpu, boost)) + if ((!kvm_vcpu_sched_enabled(vcpu) && boost) || + __can_ignore_set_sched(vcpu, boost)) goto set_boost_status; =20 if (boost) { @@ -4591,6 +4644,9 @@ static int kvm_vm_ioctl_check_extension_generic(struc= t kvm *kvm, long arg) case KVM_CAP_CHECK_EXTENSION_VM: case KVM_CAP_ENABLE_CAP_VM: case KVM_CAP_HALT_POLL: +#ifdef CONFIG_PARAVIRT_SCHED_KVM + case KVM_CAP_PV_SCHED: +#endif return 1; #ifdef CONFIG_KVM_MMIO case KVM_CAP_COALESCED_MMIO: @@ -5018,6 +5074,18 @@ static long kvm_vm_ioctl(struct file *filp, case KVM_GET_STATS_FD: r =3D kvm_vm_ioctl_get_stats_fd(kvm); break; +#ifdef CONFIG_PARAVIRT_SCHED_KVM + case KVM_SET_PV_SCHED_ENABLED: + r =3D -EINVAL; + if (arg =3D=3D 0 || arg =3D=3D 1) { + kvm_set_pv_sched_enabled(kvm, arg); + r =3D 0; + } + break; + case KVM_GET_PV_SCHED_ENABLED: + r =3D kvm->pv_sched_enabled; + break; +#endif default: r =3D kvm_arch_vm_ioctl(filp, ioctl, arg); } --=20 2.43.0 From nobody Sun Dec 28 00:49:34 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9A93DC4332F for ; Thu, 14 Dec 2023 02:48:09 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1442926AbjLNCsA (ORCPT ); Wed, 13 Dec 2023 21:48:00 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:58716 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S235370AbjLNCrl (ORCPT ); Wed, 13 Dec 2023 21:47:41 -0500 Received: from mail-qt1-x833.google.com (mail-qt1-x833.google.com [IPv6:2607:f8b0:4864:20::833]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5A92A187 for ; Wed, 13 Dec 2023 18:47:44 -0800 (PST) Received: by mail-qt1-x833.google.com with SMTP id d75a77b69052e-425ff068d02so515601cf.1 for ; Wed, 13 Dec 2023 18:47:44 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bitbyteword.org; s=google; t=1702522063; x=1703126863; darn=vger.kernel.org; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=hAar5i9/lcWjhhQPz6VKvLzxomcjdsaa1gOTUvqg6u0=; b=TLk6wGLM6TvHmadZf7aJXG+U6KiAZWelER7w7r8xDWhLZsfrtEb3uXe21PZA4x/ocy 
-- 
2.43.0

From: "Vineeth Pillai (Google)"
To: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann, "H. Peter Anvin", Ingo Molnar, Juri Lelli, Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Sean Christopherson, Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot, Vitaly Kuznetsov, Wanpeng Li
Cc: "Vineeth Pillai (Google)", Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, Joel Fernandes
Subject: [RFC PATCH 7/8] sched/core: boost/unboost in guest scheduler
Date: Wed, 13 Dec 2023 21:47:24 -0500
Message-ID: <20231214024727.3503870-8-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>

RT and higher priority tasks in the guest are considered critical
workloads: the guest scheduler can request a boost/unboost on a task
switch and/or a task wakeup. The preempt status of the guest vcpu is
also shared with the host so that the host can factor it into its
boost/unboost decisions.

CONFIG_TRACE_PREEMPT_TOGGLE is enabled so that the function equivalents
of preempt_count_{add,sub} can update the shared memory. The
alternative, modifying the preempt_count_{add,sub} macros themselves,
would mean more code churn and more complexity.

Boost requests are lazy, but unboost requests are synchronous.

The guest detects the feature from the cpuid flags and uses the MSR to
pass the GPA of the memory region used for sharing scheduling
information.

Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
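The layout of the shared area itself is introduced earlier in the
series and is not part of this patch; below is a rough reconstruction
inferred from the accessors used in the diff (boost_status,
schedinfo.boost_req, schedinfo.preempt_disabled). The types, field
ordering and VCPU_REQ_NONE value are assumptions, not the series'
actual definition.

/*
 * Hypothetical sketch of the guest/host shared per-vcpu area,
 * reconstructed from the accesses in this patch. The real definition
 * lives in an earlier patch of the series.
 */
enum kvm_vcpu_boost_state {
	VCPU_BOOST_DISABLED = 0,
	VCPU_BOOST_NORMAL,
	VCPU_BOOST_BOOSTED,
};

enum pv_sched_boost_req {
	VCPU_REQ_NONE = 0,	/* assumed default */
	VCPU_REQ_BOOST,
	VCPU_REQ_UNBOOST,
};

struct pv_sched_data {
	__u64 boost_status;		/* written by host, read by guest */
	struct {
		__u32 boost_req;	/* VCPU_REQ_BOOST / VCPU_REQ_UNBOOST */
		__u32 preempt_disabled;	/* guest preempt state, checked lazily */
	} schedinfo;			/* written by guest, read by host */
};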
---
 arch/x86/Kconfig                | 13 +++++
 arch/x86/include/asm/kvm_para.h |  7 +++
 arch/x86/kernel/kvm.c           | 16 ++++++
 include/linux/sched.h           | 21 ++++++++
 kernel/entry/common.c           |  9 ++++
 kernel/sched/core.c             | 93 ++++++++++++++++++++++++++++++++-
 6 files changed, 158 insertions(+), 1 deletion(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 68ce4f786dcd..556ae2698633 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -827,6 +827,19 @@ config KVM_GUEST
	  underlying device model, the host provides the guest with
	  timing infrastructure such as time of day, and system time

+config PARAVIRT_SCHED
+	bool "Enable paravirt scheduling capability for guests"
+	depends on KVM_GUEST
+	select TRACE_PREEMPT_TOGGLE
+	help
+	  Paravirtualized scheduling facilitates the exchange of scheduling
+	  related information between the host and guest through shared memory,
+	  enhancing the efficiency of vCPU thread scheduling by the hypervisor.
+	  An illustrative use case involves dynamically boosting the priority of
+	  a vCPU thread when the guest is executing a latency-sensitive workload
+	  on that specific vCPU.
+	  This config enables paravirt scheduling in the guest (VM).
+
 config ARCH_CPUIDLE_HALTPOLL
	def_bool n
	prompt "Disable host haltpoll when loading haltpoll driver"
diff --git a/arch/x86/include/asm/kvm_para.h b/arch/x86/include/asm/kvm_para.h
index 57bc74e112f2..3473dd2915b5 100644
--- a/arch/x86/include/asm/kvm_para.h
+++ b/arch/x86/include/asm/kvm_para.h
@@ -176,4 +176,11 @@ static __always_inline bool kvm_handle_async_pf(struct pt_regs *regs, u32 token)
 }
 #endif

+#ifdef CONFIG_PARAVIRT_SCHED
+static inline void kvm_pv_sched_notify_host(void)
+{
+	wrmsrl(MSR_KVM_PV_SCHED, ULLONG_MAX);
+}
+#endif
+
 #endif /* _ASM_X86_KVM_PARA_H */
diff --git a/arch/x86/kernel/kvm.c b/arch/x86/kernel/kvm.c
index 526d4da3dcd4..5f96b228bdd5 100644
--- a/arch/x86/kernel/kvm.c
+++ b/arch/x86/kernel/kvm.c
@@ -377,6 +377,14 @@ static void kvm_guest_cpu_init(void)
		wrmsrl(MSR_KVM_PV_EOI_EN, pa);
	}

+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled()) {
+		unsigned long pa = pv_sched_pa() | KVM_MSR_ENABLED;
+
+		wrmsrl(MSR_KVM_PV_SCHED, pa);
+	}
+#endif
+
	if (has_steal_clock)
		kvm_register_steal_time();
 }
@@ -832,6 +840,14 @@ static void __init kvm_guest_init(void)
		alloc_intr_gate(HYPERVISOR_CALLBACK_VECTOR, asm_sysvec_kvm_asyncpf_interrupt);
	}

+#ifdef CONFIG_PARAVIRT_SCHED
+	if (kvm_para_has_feature(KVM_FEATURE_PV_SCHED)) {
+		pr_info("KVM host has PV_SCHED!\n");
+		pv_sched_enable();
+	} else
+		pr_info("KVM host does not support PV_SCHED!\n");
+#endif
+
 #ifdef CONFIG_SMP
	if (pv_tlb_flush_supported()) {
		pv_ops.mmu.flush_tlb_multi = kvm_flush_tlb_multi;
diff --git a/include/linux/sched.h b/include/linux/sched.h
index de7382f149cf..e740b1e8abe3 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2442,4 +2442,25 @@ static inline void sched_core_fork(struct task_struct *p) { }

 extern void sched_set_stop_task(int cpu, struct task_struct *stop);

+#ifdef CONFIG_PARAVIRT_SCHED
+DECLARE_STATIC_KEY_FALSE(__pv_sched_enabled);
+
+extern unsigned long pv_sched_pa(void);
+
+static inline bool pv_sched_enabled(void)
+{
+	return static_branch_unlikely(&__pv_sched_enabled);
+}
+
+static inline void pv_sched_enable(void)
+{
+	static_branch_enable(&__pv_sched_enabled);
+}
+
+extern bool pv_sched_vcpu_boosted(void);
+extern void pv_sched_boost_vcpu(void);
+extern void pv_sched_unboost_vcpu(void);
+extern void pv_sched_boost_vcpu_lazy(void);
+extern void pv_sched_unboost_vcpu_lazy(void);
+#endif
 #endif
diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index be61332c66b5..fae56faac0b0 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -210,6 +210,15 @@ static void exit_to_user_mode_prepare(struct pt_regs *regs)
	kmap_assert_nomap();
	lockdep_assert_irqs_disabled();
	lockdep_sys_exit();
+#ifdef CONFIG_PARAVIRT_SCHED
+	/*
+	 * The guest requests a boost when preemption is disabled, but not an
+	 * immediate unboost when preemption is re-enabled. There is a chance
+	 * that we are still boosted here. Unboost if needed.
+	 */
+	if (pv_sched_enabled() && !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+#endif
 }

 /*
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index b47f72b6595f..57f211f1b3d7 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -151,6 +151,71 @@ const_debug unsigned int sysctl_sched_nr_migrate = SCHED_NR_MIGRATE_BREAK;

 __read_mostly int scheduler_running;

+#ifdef CONFIG_PARAVIRT_SCHED
+#include <linux/kvm_para.h>	/* include target elided in the posting; kvm_para is the apparent dependency */
+
+DEFINE_STATIC_KEY_FALSE(__pv_sched_enabled);
+
+DEFINE_PER_CPU_DECRYPTED(struct pv_sched_data, pv_sched) __aligned(64);
+
+unsigned long pv_sched_pa(void)
+{
+	return slow_virt_to_phys(this_cpu_ptr(&pv_sched));
+}
+
+bool pv_sched_vcpu_boosted(void)
+{
+	return (this_cpu_read(pv_sched.boost_status) == VCPU_BOOST_BOOSTED);
+}
+
+void pv_sched_boost_vcpu_lazy(void)
+{
+	this_cpu_write(pv_sched.schedinfo.boost_req, VCPU_REQ_BOOST);
+}
+
+void pv_sched_unboost_vcpu_lazy(void)
+{
+	this_cpu_write(pv_sched.schedinfo.boost_req, VCPU_REQ_UNBOOST);
+}
+
+void pv_sched_boost_vcpu(void)
+{
+	pv_sched_boost_vcpu_lazy();
+	/*
+	 * XXX: there could be a race between the boost_status check
+	 * and the hypercall.
+	 */
+	if (this_cpu_read(pv_sched.boost_status) == VCPU_BOOST_NORMAL)
+		kvm_pv_sched_notify_host();
+}
+
+void pv_sched_unboost_vcpu(void)
+{
+	pv_sched_unboost_vcpu_lazy();
+	/*
+	 * XXX: there could be a race between the boost_status check
+	 * and the hypercall.
+	 */
+	if (this_cpu_read(pv_sched.boost_status) == VCPU_BOOST_BOOSTED &&
+	    !preempt_count())
+		kvm_pv_sched_notify_host();
+}
+
+/*
+ * Share the preemption enabled/disabled status with the host. This does
+ * not incur a VMEXIT and acts as a lazy boost/unboost mechanism: the host
+ * checks it on the next VMEXIT when making boost/unboost decisions.
+ * XXX: Lazy unboosting may allow cfs tasks to run on an RT vcpu until the
+ * next VMEXIT.
+ */
+static inline void pv_sched_update_preempt_status(bool preempt_disabled)
+{
+	if (pv_sched_enabled())
+		this_cpu_write(pv_sched.schedinfo.preempt_disabled, preempt_disabled);
+}
+#else
+static inline void pv_sched_update_preempt_status(bool preempt_disabled) {}
+#endif
+
 #ifdef CONFIG_SCHED_CORE

 DEFINE_STATIC_KEY_FALSE(__sched_core_enabled);
@@ -2070,6 +2135,19 @@ unsigned long get_wchan(struct task_struct *p)

 static inline void enqueue_task(struct rq *rq, struct task_struct *p, int flags)
 {
+#ifdef CONFIG_PARAVIRT_SCHED
+	/*
+	 * TODO: requesting a boost for remote vcpus is not implemented yet,
+	 * so boost only if this enqueue happens on this cpu. This is not a
+	 * big problem though; the target cpu gets an IPI and is then boosted
+	 * by the host. Posted interrupts are the exception: the target vcpu
+	 * will not get boosted immediately, but on the next schedule().
+	 */
+	if (pv_sched_enabled() && this_rq() == rq &&
+	    sched_class_above(p->sched_class, &fair_sched_class))
+		pv_sched_boost_vcpu_lazy();
+#endif
+
	if (!(flags & ENQUEUE_NOCLOCK))
		update_rq_clock(rq);

@@ -5835,6 +5913,8 @@ static inline void preempt_latency_start(int val)
 #ifdef CONFIG_DEBUG_PREEMPT
		current->preempt_disable_ip = ip;
 #endif
+		pv_sched_update_preempt_status(true);
+
		trace_preempt_off(CALLER_ADDR0, ip);
	}
 }
@@ -5867,8 +5947,10 @@ NOKPROBE_SYMBOL(preempt_count_add);
  */
 static inline void preempt_latency_stop(int val)
 {
-	if (preempt_count() == val)
+	if (preempt_count() == val) {
+		pv_sched_update_preempt_status(false);
		trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip());
+	}
 }

 void preempt_count_sub(int val)
@@ -6678,6 +6760,15 @@ static void __sched notrace __schedule(unsigned int sched_mode)
	rq->last_seen_need_resched_ns = 0;
 #endif

+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled()) {
+		if (sched_class_above(next->sched_class, &fair_sched_class))
+			pv_sched_boost_vcpu_lazy();
+		else if (next->sched_class == &fair_sched_class)
+			pv_sched_unboost_vcpu();
+	}
+#endif
+
	if (likely(prev != next)) {
		rq->nr_switches++;
		/*
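Taken together, the guest-side policy keys entirely off the scheduling
class, so a latency-sensitive guest workload opts in simply by running
under an RT policy; the enqueue_task()/__schedule() hooks above then
request the boost on its behalf. A short sketch (plain POSIX, not part
of the series):

/*
 * Sketch: a guest application becomes a "critical workload" in this
 * scheme just by switching to SCHED_FIFO; nothing here is specific
 * to the series.
 */
#include <sched.h>
#include <stdio.h>

int main(void)
{
	struct sched_param sp = { .sched_priority = 10 };

	if (sched_setscheduler(0, SCHED_FIFO, &sp)) {
		perror("sched_setscheduler");
		return 1;
	}
	/* ... latency-sensitive work; the vcpu is boosted while we run ... */
	return 0;
}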
-- 
2.43.0

From: "Vineeth Pillai (Google)"
To: Ben Segall, Borislav Petkov, Daniel Bristot de Oliveira, Dave Hansen, Dietmar Eggemann, "H. Peter Anvin", Ingo Molnar, Juri Lelli, Mel Gorman, Paolo Bonzini, Andy Lutomirski, Peter Zijlstra, Sean Christopherson, Steven Rostedt, Thomas Gleixner, Valentin Schneider, Vincent Guittot, Vitaly Kuznetsov, Wanpeng Li
Cc: "Vineeth Pillai (Google)", Suleiman Souhlal, Masami Hiramatsu, kvm@vger.kernel.org, linux-kernel@vger.kernel.org, x86@kernel.org, Joel Fernandes
Subject: [RFC PATCH 8/8] irq: boost/unboost in irq/nmi entry/exit and softirq
Date: Wed, 13 Dec 2023 21:47:25 -0500
Message-ID: <20231214024727.3503870-9-vineeth@bitbyteword.org>
In-Reply-To: <20231214024727.3503870-1-vineeth@bitbyteword.org>

The host proactively boosts the vcpu threads during irq/nmi injection.
However, the host is unaware of posted interrupts, so the guest should
request a boost itself if it has not already been boosted. Similarly,
the guest should request an unboost on irq/nmi/softirq exit if the vcpu
no longer needs the boost.

Co-developed-by: Joel Fernandes (Google)
Signed-off-by: Joel Fernandes (Google)
Signed-off-by: Vineeth Pillai (Google)
---
 kernel/entry/common.c | 30 ++++++++++++++++++++++++++++++
 kernel/softirq.c      | 11 +++++++++++
 2 files changed, 41 insertions(+)

diff --git a/kernel/entry/common.c b/kernel/entry/common.c
index fae56faac0b0..c69912b71725 100644
--- a/kernel/entry/common.c
+++ b/kernel/entry/common.c
@@ -327,6 +327,13 @@ noinstr irqentry_state_t irqentry_enter(struct pt_regs *regs)
		.exit_rcu = false,
	};

+#ifdef CONFIG_PARAVIRT_SCHED
+	instrumentation_begin();
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+	instrumentation_end();
+#endif
+
	if (user_mode(regs)) {
		irqentry_enter_from_user_mode(regs);
		return ret;
@@ -452,6 +459,18 @@ noinstr void irqentry_exit(struct pt_regs *regs, irqentry_state_t state)
		if (state.exit_rcu)
			ct_irq_exit();
	}
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	instrumentation_begin();
+	/*
+	 * On irq exit, request an unboost from the hypervisor if no softirq
+	 * is pending and the current task is not RT and !need_resched.
+	 */
+	if (pv_sched_enabled() && !local_softirq_pending() &&
+	    !need_resched() && !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+	instrumentation_end();
+#endif
 }

 irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
@@ -469,6 +488,11 @@ irqentry_state_t noinstr irqentry_nmi_enter(struct pt_regs *regs)
	kmsan_unpoison_entry_regs(regs);
	trace_hardirqs_off_finish();
	ftrace_nmi_enter();
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+#endif
	instrumentation_end();

	return irq_state;
@@ -482,6 +506,12 @@ void noinstr irqentry_nmi_exit(struct pt_regs *regs, irqentry_state_t irq_state)
		trace_hardirqs_on_prepare();
		lockdep_hardirqs_on_prepare();
	}
+
+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled() && !in_hardirq() && !local_softirq_pending() &&
+	    !need_resched() && !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+#endif
	instrumentation_end();

	ct_nmi_exit();
diff --git a/kernel/softirq.c b/kernel/softirq.c
index 807b34ccd797..90a127615e16 100644
--- a/kernel/softirq.c
+++ b/kernel/softirq.c
@@ -530,6 +530,11 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
	in_hardirq = lockdep_softirq_start();
	account_softirq_enter(current);

+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled())
+		pv_sched_boost_vcpu_lazy();
+#endif
+
 restart:
	/* Reset the pending bitmask before enabling irqs */
	set_softirq_pending(0);
@@ -577,6 +582,12 @@ asmlinkage __visible void __softirq_entry __do_softirq(void)
			wakeup_softirqd();
	}

+#ifdef CONFIG_PARAVIRT_SCHED
+	if (pv_sched_enabled() && !need_resched() &&
+	    !task_is_realtime(current))
+		pv_sched_unboost_vcpu();
+#endif
+
	account_softirq_exit(current);
	lockdep_softirq_end(in_hardirq);
	softirq_handle_end();
-- 
2.43.0
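Nearly the same unboost predicate recurs in irqentry_exit(),
irqentry_nmi_exit() and __do_softirq(); a possible consolidation (a
sketch, not part of the series; the helper name is hypothetical):

/*
 * Sketch only: the recurring "safe to unboost" predicate from the exit
 * paths above, factored into one helper. pv_sched_*() and
 * task_is_realtime() are the series' own symbols.
 */
static inline void pv_sched_try_unboost_vcpu(void)
{
	if (pv_sched_enabled() && !local_softirq_pending() &&
	    !need_resched() && !task_is_realtime(current))
		pv_sched_unboost_vcpu();
}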