From: Mykola Kvach <mykola_kvach@epam.com>
This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI
(virtual PSCI) interface, allowing guests to request suspend via the PSCI
v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants).
The implementation:
- Adds SYSTEM_SUSPEND function IDs to PSCI definitions
- Implements trapping and handling of SYSTEM_SUSPEND in vPSCI
- Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the
hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the
system in hwdom_shutdown() called from domain_shutdown
- Ensures all secondary VCPUs of the calling domain are offline before
allowing suspend due to PSCI spec
GIC and virtual timer context must be saved when the domain suspends.
This is done by moving the respective code in ctxt_switch_from
before the return that happens if the domain suspended.
domain_resume_nopause() is introduced to allow resuming a domain from
SYSTEM_SUSPEND without pausing it first. This avoids problematic
domain_pause() calls when resuming from suspend on Arm, particularly
in error paths. The function is only used on Arm; other architectures
continue to use the original domain_resume().
Usage:
For Linux-based guests, suspend can be initiated with:
echo mem > /sys/power/state
or via:
systemctl suspend
Resuming the guest is performed from control domain using:
xl resume <domain>
Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
Changes in V9:
- no functional changes
- cosmetic chnages after review
- enhance commit message and add extra comment to the code after review
Changes in V8:
- GIC and virtual timer context must be saved when the domain suspends
- rework locking
- minor changes after code review
Changes in V7:
- add proper locking
- minor changes after code review
Changes in V6:
- skip execution of ctxt_switch_from for vcpu that is in paused domain
- add implementation of domain_resume without domain_pause
- add helper function to determine if vcpu is suspended or not
- ignore upper 32 bits of argument values when the domain is 64-bit
and calls the SMC32 SYSTEM_SUSPEND function
- cosmetic changes after review
Changes in V5:
- don't use standby mode, restore execution in a provided by guest point
- move checking that all CPUs, except current one, are offline to after
pausing the vCPUs
- provide ret status from arch_domain_shutdown and handle it in
domain_shutdown
- adjust VPSCI_NR_FUNCS to reflect the number of newly added PSCI functions
Changes in V4:
Dropped all changes related to watchdog, domain is marked as shutting
down in domain_shutdown and watchdog timeout handler won't trigger
because of it.
Previous versions included code to manage Xen watchdog timers during suspend,
but this was removed. When a guest OS starts the Xen watchdog (either via the
kernel driver or xenwatchdogd), it is responsible for managing that state
across suspend/resume. On Linux, the Xen kernel driver properly stops the
watchdog during suspend. However, when xenwatchdogd is used instead, suspend
handling is incomplete, potentially leading to watchdog-triggered resets on
resume. Xen leaves watchdog handling to the guest OS and its services.
Dropped all changes related to VCPU context, because instead domain_shutdown
is used, so we don't need any extra changes for suspending domain.
Changes in V3:
Dropped all domain flags and related code (which touched common functions like
vcpu_unblock), keeping only the necessary changes for Xen suspend/resume, i.e.
suspend/resume is now fully supported only for the hardware domain.
Proper support for domU suspend/resume will be added in a future patch.
This patch does not yet include VCPU context reset or domain context
restoration in VCPU.
---
xen/arch/arm/domain.c | 17 +++--
xen/arch/arm/include/asm/perfc_defn.h | 1 +
xen/arch/arm/include/asm/psci.h | 2 +
xen/arch/arm/include/asm/vpsci.h | 2 +-
xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++----
xen/common/domain.c | 31 ++++++--
xen/include/xen/sched.h | 6 ++
7 files changed, 131 insertions(+), 29 deletions(-)
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 310c578909..9e9649c4e2 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
if ( is_idle_vcpu(p) )
return;
+ /* Arch timer */
+ p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
+ virt_timer_save(p);
+
+ /* VGIC */
+ gic_save_state(p);
+
+ if ( test_bit(_VPF_suspended, &p->pause_flags) )
+ return;
+
p2m_save_state(p);
/* CP 15 */
@@ -106,10 +116,6 @@ static void ctxt_switch_from(struct vcpu *p)
p->arch.tpidrro_el0 = READ_SYSREG(TPIDRRO_EL0);
p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1);
- /* Arch timer */
- p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
- virt_timer_save(p);
-
if ( is_32bit_domain(p->domain) && cpu_has_thumbee )
{
p->arch.teecr = READ_SYSREG(TEECR32_EL1);
@@ -158,9 +164,6 @@ static void ctxt_switch_from(struct vcpu *p)
/* XXX MPU */
- /* VGIC */
- gic_save_state(p);
-
isb();
}
diff --git a/xen/arch/arm/include/asm/perfc_defn.h b/xen/arch/arm/include/asm/perfc_defn.h
index effd25b69e..8dfcac7e3b 100644
--- a/xen/arch/arm/include/asm/perfc_defn.h
+++ b/xen/arch/arm/include/asm/perfc_defn.h
@@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci: system_reset")
PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend")
PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info")
PERFCOUNTER(vpsci_features, "vpsci: features")
+PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend")
PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu")
diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
index 4780972621..48a93e6b79 100644
--- a/xen/arch/arm/include/asm/psci.h
+++ b/xen/arch/arm/include/asm/psci.h
@@ -47,10 +47,12 @@ void call_psci_system_reset(void);
#define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8)
#define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9)
#define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10)
+#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14)
#define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1)
#define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3)
#define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4)
+#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
/* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
#define PSCI_0_2_AFFINITY_LEVEL_ON 0
diff --git a/xen/arch/arm/include/asm/vpsci.h b/xen/arch/arm/include/asm/vpsci.h
index 0cca5e6830..69d40f9d7f 100644
--- a/xen/arch/arm/include/asm/vpsci.h
+++ b/xen/arch/arm/include/asm/vpsci.h
@@ -23,7 +23,7 @@
#include <asm/psci.h>
/* Number of function implemented by virtual PSCI (only 0.2 or later) */
-#define VPSCI_NR_FUNCS 12
+#define VPSCI_NR_FUNCS 14
/* Functions handle PSCI calls from the guests */
bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
index 7ba9ccd94b..67d369a8a2 100644
--- a/xen/arch/arm/vpsci.c
+++ b/xen/arch/arm/vpsci.c
@@ -10,28 +10,18 @@
#include <public/sched.h>
-static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
- register_t context_id)
+static int do_setup_vcpu_ctx(struct vcpu *v, register_t entry_point,
+ register_t context_id)
{
- struct vcpu *v;
struct domain *d = current->domain;
struct vcpu_guest_context *ctxt;
int rc;
bool is_thumb = entry_point & 1;
- register_t vcpuid;
-
- vcpuid = vaffinity_to_vcpuid(target_cpu);
-
- if ( (v = domain_vcpu(d, vcpuid)) == NULL )
- return PSCI_INVALID_PARAMETERS;
/* THUMB set is not allowed with 64-bit domain */
if ( is_64bit_domain(d) && is_thumb )
return PSCI_INVALID_ADDRESS;
- if ( !test_bit(_VPF_down, &v->pause_flags) )
- return PSCI_ALREADY_ON;
-
if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
return PSCI_DENIED;
@@ -78,11 +68,32 @@ static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
if ( rc < 0 )
return PSCI_DENIED;
- vcpu_wake(v);
-
return PSCI_SUCCESS;
}
+static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
+ register_t context_id)
+{
+ int rc;
+ struct vcpu *v;
+ struct domain *d = current->domain;
+ register_t vcpuid;
+
+ vcpuid = vaffinity_to_vcpuid(target_cpu);
+
+ if ( (v = domain_vcpu(d, vcpuid)) == NULL )
+ return PSCI_INVALID_PARAMETERS;
+
+ if ( !test_bit(_VPF_down, &v->pause_flags) )
+ return PSCI_ALREADY_ON;
+
+ rc = do_setup_vcpu_ctx(v, entry_point, context_id);
+ if ( rc == PSCI_SUCCESS )
+ vcpu_wake(v);
+
+ return rc;
+}
+
static int32_t do_psci_cpu_on(uint32_t vcpuid, register_t entry_point)
{
int32_t ret;
@@ -197,6 +208,48 @@ static void do_psci_0_2_system_reset(void)
domain_shutdown(d,SHUTDOWN_reboot);
}
+static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
+{
+ int32_t rc;
+ struct vcpu *v;
+ struct domain *d = current->domain;
+
+ /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
+ if ( is_hardware_domain(d) )
+ return PSCI_NOT_SUPPORTED;
+
+ /* Ensure that all CPUs other than the calling one are offline */
+ domain_lock(d);
+ for_each_vcpu ( d, v )
+ {
+ if ( v != current && is_vcpu_online(v) )
+ {
+ domain_unlock(d);
+ return PSCI_DENIED;
+ }
+ }
+ domain_unlock(d);
+
+ rc = domain_shutdown(d, SHUTDOWN_suspend);
+ if ( rc )
+ return PSCI_DENIED;
+
+ rc = do_setup_vcpu_ctx(current, epoint, cid);
+ if ( rc != PSCI_SUCCESS )
+ {
+ domain_resume_nopause(d);
+ return rc;
+ }
+
+ set_bit(_VPF_suspended, ¤t->pause_flags);
+
+ dprintk(XENLOG_DEBUG,
+ "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n",
+ d->domain_id, epoint, cid);
+
+ return rc;
+}
+
static int32_t do_psci_1_0_features(uint32_t psci_func_id)
{
/* /!\ Ordered by function ID and not name */
@@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
case PSCI_0_2_FN32_SYSTEM_OFF:
case PSCI_0_2_FN32_SYSTEM_RESET:
case PSCI_1_0_FN32_PSCI_FEATURES:
+ case PSCI_1_0_FN32_SYSTEM_SUSPEND:
+ case PSCI_1_0_FN64_SYSTEM_SUSPEND:
case ARM_SMCCC_VERSION_FID:
return 0;
default:
@@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid)
return true;
}
+ case PSCI_1_0_FN32_SYSTEM_SUSPEND:
+ case PSCI_1_0_FN64_SYSTEM_SUSPEND:
+ {
+ register_t epoint = PSCI_ARG(regs, 1);
+ register_t cid = PSCI_ARG(regs, 2);
+
+ if ( is_64bit_domain(current->domain) &&
+ fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
+ {
+ epoint &= GENMASK(31, 0);
+ cid &= GENMASK(31, 0);
+ }
+
+ perfc_incr(vpsci_system_suspend);
+ PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
+ return true;
+ }
+
default:
return false;
}
diff --git a/xen/common/domain.c b/xen/common/domain.c
index 5241a1629e..624c3e2c27 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason)
return 0;
}
-void domain_resume(struct domain *d)
+#ifndef CONFIG_ARM
+static
+#endif
+void domain_resume_nopause(struct domain *d)
{
struct vcpu *v;
- /*
- * Some code paths assume that shutdown status does not get reset under
- * their feet (e.g., some assertions make this assumption).
- */
- domain_pause(d);
-
spin_lock(&d->shutdown_lock);
d->is_shutting_down = d->is_shut_down = 0;
@@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
for_each_vcpu ( d, v )
{
+ /*
+ * No need to conditionally clear _VPF_suspended here:
+ * - This bit is only set on Arm64, and only after a successful suspend.
+ * - domain_resume_nopause() may also be called from paths other than
+ * the suspend/resume flow, such as "soft-reset" actions (e.g.,
+ * on_poweroff), as part of the Xenstore control/shutdown protocol.
+ * These require guest acknowledgement to complete the operation.
+ * So clearing the bit unconditionally is safe.
+ */
+ clear_bit(_VPF_suspended, &v->pause_flags);
+
if ( v->paused_for_shutdown )
vcpu_unpause(v);
v->paused_for_shutdown = 0;
}
spin_unlock(&d->shutdown_lock);
+}
+void domain_resume(struct domain *d)
+{
+ /*
+ * Some code paths assume that shutdown status does not get reset under
+ * their feet (e.g., some assertions make this assumption).
+ */
+ domain_pause(d);
+ domain_resume_nopause(d);
domain_unpause(d);
}
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index fd5c9f9333..c1848d8ea6 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -814,6 +814,9 @@ void domain_destroy(struct domain *d);
int domain_kill(struct domain *d);
int domain_shutdown(struct domain *d, u8 reason);
void domain_resume(struct domain *d);
+#ifdef CONFIG_ARM
+void domain_resume_nopause(struct domain *d);
+#endif
int domain_soft_reset(struct domain *d, bool resuming);
@@ -1010,6 +1013,9 @@ static inline struct domain *next_domain_in_cpupool(
/* VCPU is parked. */
#define _VPF_parked 8
#define VPF_parked (1UL<<_VPF_parked)
+/* VCPU is suspended. */
+#define _VPF_suspended 9
+#define VPF_suspended (1UL << _VPF_suspended)
static inline bool vcpu_runnable(const struct vcpu *v)
{
--
2.48.1
Hi Mykola,
On 18/08/2025 09:49, Mykola Kvach wrote:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI
> (virtual PSCI) interface, allowing guests to request suspend via the PSCI
> v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants).
>
> The implementation:
> - Adds SYSTEM_SUSPEND function IDs to PSCI definitions
> - Implements trapping and handling of SYSTEM_SUSPEND in vPSCI
> - Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the
> hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the
> system in hwdom_shutdown() called from domain_shutdown
> - Ensures all secondary VCPUs of the calling domain are offline before
> allowing suspend due to PSCI spec
>
> GIC and virtual timer context must be saved when the domain suspends.
Can you expand a bit more on this? Is this a requirement from the Arm
Arm? If so, please specify the specification so we can easily check.
> This is done by moving the respective code in ctxt_switch_from
> before the return that happens if the domain suspended.
From the commit message, it is unclear why we want to skip the save part.
[...]
> ---
> xen/arch/arm/domain.c | 17 +++--
> xen/arch/arm/include/asm/perfc_defn.h | 1 +
> xen/arch/arm/include/asm/psci.h | 2 +
> xen/arch/arm/include/asm/vpsci.h | 2 +-
> xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++----
> xen/common/domain.c | 31 ++++++--
> xen/include/xen/sched.h | 6 ++
> 7 files changed, 131 insertions(+), 29 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 310c578909..9e9649c4e2 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> if ( is_idle_vcpu(p) )
> return;
>
It would be good to have a comment explaining what (and why) we need to
save only partially the state while the VCPU is suspended.
> + /* Arch timer */
> + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> + virt_timer_save(p);
> +
> + /* VGIC */
> + gic_save_state(p);
> +
> + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> + return;
Can you explain why we don't need an isb()?
> +
> p2m_save_state(p);
At least part of p2m_save_state() can't be skipped because it is
applying a workaround on platform where AT mistakenly speculate.
[...]
>
> +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> +{
> + int32_t rc;
> + struct vcpu *v;
> + struct domain *d = current->domain;
> +
> + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> + if ( is_hardware_domain(d) )
> + return PSCI_NOT_SUPPORTED;
> +
> + /* Ensure that all CPUs other than the calling one are offline */
> + domain_lock(d);
> + for_each_vcpu ( d, v )
> + {
> + if ( v != current && is_vcpu_online(v) )
> + {
> + domain_unlock(d);
> + return PSCI_DENIED;
> + }
> + }
> + domain_unlock(d);
> +
> + rc = domain_shutdown(d, SHUTDOWN_suspend);
> + if ( rc )
> + return PSCI_DENIED;
> +
> + rc = do_setup_vcpu_ctx(current, epoint, cid);
> + if ( rc != PSCI_SUCCESS )
> + {
> + domain_resume_nopause(d);
> + return rc;
> + }
> +
> + set_bit(_VPF_suspended, ¤t->pause_flags);
> +
> + dprintk(XENLOG_DEBUG,
I think you should use gdprintk() which will print the domain for you as
well as appropriately ratelimit the message.
That said, I would consider using gprintk() so the message can also be
printed in a debug build.
> + "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n",
For both of them you technically want to use PRIregister rather than lx.
Lastly, we prefer to use %pd when printing a domain. The argument is
'd'. But this should not be necessary if you use gprintk().
> + d->domain_id, epoint, cid);
> +
> + return rc;
> +}
> +
> static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> {
> /* /!\ Ordered by function ID and not name */
> @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> case PSCI_0_2_FN32_SYSTEM_OFF:
> case PSCI_0_2_FN32_SYSTEM_RESET:
> case PSCI_1_0_FN32_PSCI_FEATURES:
> + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> case ARM_SMCCC_VERSION_FID:
> return 0;
> default:
> @@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid)
> return true;
> }
>
> + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> + {
> + register_t epoint = PSCI_ARG(regs, 1);
> + register_t cid = PSCI_ARG(regs, 2);
> +
> + if ( is_64bit_domain(current->domain) &&
Shouldn't this be removed so the check also apply for 32-bit domain on
64-bit system?
> + fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
> + {> + epoint &= GENMASK(31, 0);
> + cid &= GENMASK(31, 0);
> + }
> +
> + perfc_incr(vpsci_system_suspend);
> + PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
> + return true;
> + }
> +
> default:
> return false;
> }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5241a1629e..624c3e2c27 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason)
> return 0;
> }
>
> -void domain_resume(struct domain *d)
> +#ifndef CONFIG_ARM
> +static
> +#endif
I am not sure who suggests this but personally, I dislike using
CONFIG_<ARCH> in common code. I also think this is worse for hiding the
"static" keyword.
For the latter, I think it would be better to provide a separate helper
that can be #ifdef.
[...]
Cheers,
--
Julien Grall
Hi Julien,
On Sun, Aug 24, 2025 at 12:26 AM Julien Grall <julien@xen.org> wrote:
>
> Hi Mykola,
>
> On 18/08/2025 09:49, Mykola Kvach wrote:
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI
> > (virtual PSCI) interface, allowing guests to request suspend via the PSCI
> > v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants).
> >
> > The implementation:
> > - Adds SYSTEM_SUSPEND function IDs to PSCI definitions
> > - Implements trapping and handling of SYSTEM_SUSPEND in vPSCI
> > - Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the
> > hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the
> > system in hwdom_shutdown() called from domain_shutdown
> > - Ensures all secondary VCPUs of the calling domain are offline before
> > allowing suspend due to PSCI spec
> >
> > GIC and virtual timer context must be saved when the domain suspends.
>
> Can you expand a bit more on this? Is this a requirement from the Arm
> Arm? If so, please specify the specification so we can easily check.
>
> > This is done by moving the respective code in ctxt_switch_from
> > before the return that happens if the domain suspended.
>
> From the commit message, it is unclear why we want to skip the save part.
I'll add extra information to the commit message.
>
> [...]
>
> > ---
> > xen/arch/arm/domain.c | 17 +++--
> > xen/arch/arm/include/asm/perfc_defn.h | 1 +
> > xen/arch/arm/include/asm/psci.h | 2 +
> > xen/arch/arm/include/asm/vpsci.h | 2 +-
> > xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++----
> > xen/common/domain.c | 31 ++++++--
> > xen/include/xen/sched.h | 6 ++
> > 7 files changed, 131 insertions(+), 29 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 310c578909..9e9649c4e2 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> > if ( is_idle_vcpu(p) )
> > return;
> >
>
> It would be good to have a comment explaining what (and why) we need to
> save only partially the state while the VCPU is suspended.
Actually, nothing bad happens if we save everything except SCTLR_EL1,
which is modified in PSCI SYSTEM_SUSPEND. The only downside might be
losing some CPU cycles.
If we talk just about GIC and the Arch timer state, the relevant guidance
can be found in the Power State Coordination Interface
(DEN0022F.b, 6.8 Preserving the execution context).
Even if the guest saves the context it has access to and restores it on
resume, we only need to save the transient state. Looks like we can
do nothing with the arch timer here. And gic save function can be called
from PSCI SYSTEM_SUSPEND.
>
> > + /* Arch timer */
> > + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> > + virt_timer_save(p);
> > +
> > + /* VGIC */
> > + gic_save_state(p);
> > +
> > + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> > + return;
>
> Can you explain why we don't need an isb()?
>
> > +
> > p2m_save_state(p);
> At least part of p2m_save_state() can't be skipped because it is
> applying a workaround on platform where AT mistakenly speculate.
Thank you for pointing that out, we need it here.
>
> [...]
>
> >
> > +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> > +{
> > + int32_t rc;
> > + struct vcpu *v;
> > + struct domain *d = current->domain;
> > +
> > + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> > + if ( is_hardware_domain(d) )
> > + return PSCI_NOT_SUPPORTED;
> > +
> > + /* Ensure that all CPUs other than the calling one are offline */
> > + domain_lock(d);
> > + for_each_vcpu ( d, v )
> > + {
> > + if ( v != current && is_vcpu_online(v) )
> > + {
> > + domain_unlock(d);
> > + return PSCI_DENIED;
> > + }
> > + }
> > + domain_unlock(d);
> > +
> > + rc = domain_shutdown(d, SHUTDOWN_suspend);
> > + if ( rc )
> > + return PSCI_DENIED;
> > +
> > + rc = do_setup_vcpu_ctx(current, epoint, cid);
> > + if ( rc != PSCI_SUCCESS )
> > + {
> > + domain_resume_nopause(d);
> > + return rc;
> > + }
> > +
> > + set_bit(_VPF_suspended, ¤t->pause_flags);
> > +
> > + dprintk(XENLOG_DEBUG,
>
> I think you should use gdprintk() which will print the domain for you as
> well as appropriately ratelimit the message.
>
> That said, I would consider using gprintk() so the message can also be
> printed in a debug build.
Got it, I will use gprintk() instead as suggested. Thank you for the
recommendation.
>
> > + "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n",
>
> For both of them you technically want to use PRIregister rather than lx.
Got it, I will use PRIregister instead as suggested. Thank you for the
recommendation.
>
> Lastly, we prefer to use %pd when printing a domain. The argument is
> 'd'. But this should not be necessary if you use gprintk().
Ok.
>
>
> > + d->domain_id, epoint, cid);
> > +
> > + return rc;
> > +}
> > +
> > static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> > {
> > /* /!\ Ordered by function ID and not name */
> > @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> > case PSCI_0_2_FN32_SYSTEM_OFF:
> > case PSCI_0_2_FN32_SYSTEM_RESET:
> > case PSCI_1_0_FN32_PSCI_FEATURES:
> > + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> > + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> > case ARM_SMCCC_VERSION_FID:
> > return 0;
> > default:
> > @@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid)
> > return true;
> > }
> >
> > + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> > + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> > + {
> > + register_t epoint = PSCI_ARG(regs, 1);
> > + register_t cid = PSCI_ARG(regs, 2);
> > +
> > + if ( is_64bit_domain(current->domain) &&
>
> Shouldn't this be removed so the check also apply for 32-bit domain on
> 64-bit system?
AFAIK, this question was already addressed in a previous version of
this patch series:
https://patchew.org/Xen/cover.1751020456.git.mykola._5Fkvach@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5Fkvach@epam.com/#7070f416-119c-49f8-acd0-82c6e31f0fc6@xen.org
>
> > + fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
> > + {> + epoint &= GENMASK(31, 0);
> > + cid &= GENMASK(31, 0);
> > + }
> > +
> > + perfc_incr(vpsci_system_suspend);
> > + PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
> > + return true;
> > + }
> > +
> > default:
> > return false;
> > }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 5241a1629e..624c3e2c27 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason)
> > return 0;
> > }
> >
> > -void domain_resume(struct domain *d)
> > +#ifndef CONFIG_ARM
> > +static
> > +#endif
>
> I am not sure who suggests this but personally, I dislike using
> CONFIG_<ARCH> in common code. I also think this is worse for hiding the
> "static" keyword.
>
> For the latter, I think it would be better to provide a separate helper
> that can be #ifdef.
Will do. I will provide a separate helper as suggested.
>
> [...]
>
> Cheers,
>
> --
> Julien Grall
>
Best regards,
Mykola
Hi Mykola, On 26/08/2025 10:09, Mykola Kvach wrote: >> Shouldn't this be removed so the check also apply for 32-bit domain on >> 64-bit system? > > AFAIK, this question was already addressed in a previous version of > this patch series: > https://patchew.org/Xen/cover.1751020456.git.mykola._5Fkvach@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5Fkvach@epam.com/#7070f416-119c-49f8-acd0-82c6e31f0fc6@xen.org Sure. For 32-bit domain, in theory the top bits should be zeroed. But AFAIK, there is no harm to zero them again and it would avoid someone to wonder why this is protected by is_64bit_domain(). So unless there is a strong reason to keep, I would rather prefer if we remove the 64-bit. Cheers, -- Julien Grall
On Tue, Aug 26, 2025 at 12:26 PM Julien Grall <julien@xen.org> wrote: > > Hi Mykola, > > On 26/08/2025 10:09, Mykola Kvach wrote: > >> Shouldn't this be removed so the check also apply for 32-bit domain on > >> 64-bit system? > > > > AFAIK, this question was already addressed in a previous version of > > this patch series: > > https://patchew.org/Xen/cover.1751020456.git.mykola._5Fkvach@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5Fkvach@epam.com/#7070f416-119c-49f8-acd0-82c6e31f0fc6@xen.org > > Sure. For 32-bit domain, in theory the top bits should be zeroed. But > AFAIK, there is no harm to zero them again and it would avoid someone to > wonder why this is protected by is_64bit_domain(). > > So unless there is a strong reason to keep, I would rather prefer if we > remove the 64-bit. Understood, I’ll remove the 64-bit domain check as suggested. > > Cheers, > > -- > Julien Grall > Best Regards, Mykola
Hi Mykola,
Mykola Kvach <xakep.amatop@gmail.com> writes:
> From: Mykola Kvach <mykola_kvach@epam.com>
>
> This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI
> (virtual PSCI) interface, allowing guests to request suspend via the PSCI
> v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants).
>
> The implementation:
> - Adds SYSTEM_SUSPEND function IDs to PSCI definitions
> - Implements trapping and handling of SYSTEM_SUSPEND in vPSCI
> - Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the
> hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the
> system in hwdom_shutdown() called from domain_shutdown
> - Ensures all secondary VCPUs of the calling domain are offline before
> allowing suspend due to PSCI spec
>
> GIC and virtual timer context must be saved when the domain suspends.
> This is done by moving the respective code in ctxt_switch_from
> before the return that happens if the domain suspended.
>
> domain_resume_nopause() is introduced to allow resuming a domain from
> SYSTEM_SUSPEND without pausing it first. This avoids problematic
> domain_pause() calls when resuming from suspend on Arm, particularly
> in error paths. The function is only used on Arm; other architectures
> continue to use the original domain_resume().
>
> Usage:
>
> For Linux-based guests, suspend can be initiated with:
> echo mem > /sys/power/state
> or via:
> systemctl suspend
>
> Resuming the guest is performed from control domain using:
> xl resume <domain>
>
> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
> Changes in V9:
> - no functional changes
> - cosmetic chnages after review
> - enhance commit message and add extra comment to the code after review
>
> Changes in V8:
> - GIC and virtual timer context must be saved when the domain suspends
> - rework locking
> - minor changes after code review
>
> Changes in V7:
> - add proper locking
> - minor changes after code review
>
> Changes in V6:
> - skip execution of ctxt_switch_from for vcpu that is in paused domain
> - add implementation of domain_resume without domain_pause
> - add helper function to determine if vcpu is suspended or not
> - ignore upper 32 bits of argument values when the domain is 64-bit
> and calls the SMC32 SYSTEM_SUSPEND function
> - cosmetic changes after review
>
> Changes in V5:
> - don't use standby mode, restore execution in a provided by guest point
> - move checking that all CPUs, except current one, are offline to after
> pausing the vCPUs
> - provide ret status from arch_domain_shutdown and handle it in
> domain_shutdown
> - adjust VPSCI_NR_FUNCS to reflect the number of newly added PSCI functions
>
> Changes in V4:
> Dropped all changes related to watchdog, domain is marked as shutting
> down in domain_shutdown and watchdog timeout handler won't trigger
> because of it.
>
> Previous versions included code to manage Xen watchdog timers during suspend,
> but this was removed. When a guest OS starts the Xen watchdog (either via the
> kernel driver or xenwatchdogd), it is responsible for managing that state
> across suspend/resume. On Linux, the Xen kernel driver properly stops the
> watchdog during suspend. However, when xenwatchdogd is used instead, suspend
> handling is incomplete, potentially leading to watchdog-triggered resets on
> resume. Xen leaves watchdog handling to the guest OS and its services.
>
> Dropped all changes related to VCPU context, because instead domain_shutdown
> is used, so we don't need any extra changes for suspending domain.
>
> Changes in V3:
> Dropped all domain flags and related code (which touched common functions like
> vcpu_unblock), keeping only the necessary changes for Xen suspend/resume, i.e.
> suspend/resume is now fully supported only for the hardware domain.
> Proper support for domU suspend/resume will be added in a future patch.
> This patch does not yet include VCPU context reset or domain context
> restoration in VCPU.
> ---
> xen/arch/arm/domain.c | 17 +++--
> xen/arch/arm/include/asm/perfc_defn.h | 1 +
> xen/arch/arm/include/asm/psci.h | 2 +
> xen/arch/arm/include/asm/vpsci.h | 2 +-
> xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++----
> xen/common/domain.c | 31 ++++++--
> xen/include/xen/sched.h | 6 ++
> 7 files changed, 131 insertions(+), 29 deletions(-)
>
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 310c578909..9e9649c4e2 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> if ( is_idle_vcpu(p) )
> return;
>
> + /* Arch timer */
> + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> + virt_timer_save(p);
> +
> + /* VGIC */
> + gic_save_state(p);
> +
I'd like to see comment here on why we don't need to do rest of the
ctx save code. I saw that this is described in the commit message, but
comment there will be very helpful for future contributors.
> + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> + return;
> +
> p2m_save_state(p);
>
> /* CP 15 */
> @@ -106,10 +116,6 @@ static void ctxt_switch_from(struct vcpu *p)
> p->arch.tpidrro_el0 = READ_SYSREG(TPIDRRO_EL0);
> p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1);
>
> - /* Arch timer */
> - p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> - virt_timer_save(p);
> -
> if ( is_32bit_domain(p->domain) && cpu_has_thumbee )
> {
> p->arch.teecr = READ_SYSREG(TEECR32_EL1);
> @@ -158,9 +164,6 @@ static void ctxt_switch_from(struct vcpu *p)
>
> /* XXX MPU */
>
> - /* VGIC */
> - gic_save_state(p);
> -
> isb();
> }
>
> diff --git a/xen/arch/arm/include/asm/perfc_defn.h b/xen/arch/arm/include/asm/perfc_defn.h
> index effd25b69e..8dfcac7e3b 100644
> --- a/xen/arch/arm/include/asm/perfc_defn.h
> +++ b/xen/arch/arm/include/asm/perfc_defn.h
> @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci: system_reset")
> PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend")
> PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info")
> PERFCOUNTER(vpsci_features, "vpsci: features")
> +PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend")
>
> PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu")
>
> diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> index 4780972621..48a93e6b79 100644
> --- a/xen/arch/arm/include/asm/psci.h
> +++ b/xen/arch/arm/include/asm/psci.h
> @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
> #define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8)
> #define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9)
> #define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10)
> +#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14)
>
> #define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1)
> #define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3)
> #define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4)
> +#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
>
> /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
> #define PSCI_0_2_AFFINITY_LEVEL_ON 0
> diff --git a/xen/arch/arm/include/asm/vpsci.h b/xen/arch/arm/include/asm/vpsci.h
> index 0cca5e6830..69d40f9d7f 100644
> --- a/xen/arch/arm/include/asm/vpsci.h
> +++ b/xen/arch/arm/include/asm/vpsci.h
> @@ -23,7 +23,7 @@
> #include <asm/psci.h>
>
> /* Number of function implemented by virtual PSCI (only 0.2 or later) */
> -#define VPSCI_NR_FUNCS 12
> +#define VPSCI_NR_FUNCS 14
>
> /* Functions handle PSCI calls from the guests */
> bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
> diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> index 7ba9ccd94b..67d369a8a2 100644
> --- a/xen/arch/arm/vpsci.c
> +++ b/xen/arch/arm/vpsci.c
> @@ -10,28 +10,18 @@
>
> #include <public/sched.h>
>
> -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> - register_t context_id)
> +static int do_setup_vcpu_ctx(struct vcpu *v, register_t entry_point,
> + register_t context_id)
> {
> - struct vcpu *v;
> struct domain *d = current->domain;
> struct vcpu_guest_context *ctxt;
> int rc;
> bool is_thumb = entry_point & 1;
> - register_t vcpuid;
> -
> - vcpuid = vaffinity_to_vcpuid(target_cpu);
> -
> - if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> - return PSCI_INVALID_PARAMETERS;
>
> /* THUMB set is not allowed with 64-bit domain */
> if ( is_64bit_domain(d) && is_thumb )
> return PSCI_INVALID_ADDRESS;
>
> - if ( !test_bit(_VPF_down, &v->pause_flags) )
> - return PSCI_ALREADY_ON;
> -
> if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> return PSCI_DENIED;
>
> @@ -78,11 +68,32 @@ static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> if ( rc < 0 )
> return PSCI_DENIED;
>
> - vcpu_wake(v);
> -
> return PSCI_SUCCESS;
> }
>
> +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> + register_t context_id)
> +{
> + int rc;
> + struct vcpu *v;
> + struct domain *d = current->domain;
> + register_t vcpuid;
> +
> + vcpuid = vaffinity_to_vcpuid(target_cpu);
> +
> + if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> + return PSCI_INVALID_PARAMETERS;
> +
> + if ( !test_bit(_VPF_down, &v->pause_flags) )
> + return PSCI_ALREADY_ON;
> +
> + rc = do_setup_vcpu_ctx(v, entry_point, context_id);
> + if ( rc == PSCI_SUCCESS )
> + vcpu_wake(v);
> +
> + return rc;
> +}
> +
> static int32_t do_psci_cpu_on(uint32_t vcpuid, register_t entry_point)
> {
> int32_t ret;
> @@ -197,6 +208,48 @@ static void do_psci_0_2_system_reset(void)
> domain_shutdown(d,SHUTDOWN_reboot);
> }
>
> +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> +{
> + int32_t rc;
> + struct vcpu *v;
> + struct domain *d = current->domain;
> +
> + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> + if ( is_hardware_domain(d) )
> + return PSCI_NOT_SUPPORTED;
> +
> + /* Ensure that all CPUs other than the calling one are offline */
> + domain_lock(d);
> + for_each_vcpu ( d, v )
> + {
> + if ( v != current && is_vcpu_online(v) )
> + {
> + domain_unlock(d);
> + return PSCI_DENIED;
> + }
> + }
> + domain_unlock(d);
> +
> + rc = domain_shutdown(d, SHUTDOWN_suspend);
> + if ( rc )
> + return PSCI_DENIED;
> +
> + rc = do_setup_vcpu_ctx(current, epoint, cid);
> + if ( rc != PSCI_SUCCESS )
> + {
> + domain_resume_nopause(d);
> + return rc;
> + }
> +
> + set_bit(_VPF_suspended, ¤t->pause_flags);
> +
> + dprintk(XENLOG_DEBUG,
> + "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n",
> + d->domain_id, epoint, cid);
> +
> + return rc;
> +}
> +
> static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> {
> /* /!\ Ordered by function ID and not name */
> @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> case PSCI_0_2_FN32_SYSTEM_OFF:
> case PSCI_0_2_FN32_SYSTEM_RESET:
> case PSCI_1_0_FN32_PSCI_FEATURES:
> + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> case ARM_SMCCC_VERSION_FID:
> return 0;
> default:
> @@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid)
> return true;
> }
>
> + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> + {
> + register_t epoint = PSCI_ARG(regs, 1);
> + register_t cid = PSCI_ARG(regs, 2);
> +
> + if ( is_64bit_domain(current->domain) &&
> + fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
> + {
> + epoint &= GENMASK(31, 0);
> + cid &= GENMASK(31, 0);
> + }
> +
> + perfc_incr(vpsci_system_suspend);
> + PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
> + return true;
> + }
> +
> default:
> return false;
> }
> diff --git a/xen/common/domain.c b/xen/common/domain.c
> index 5241a1629e..624c3e2c27 100644
> --- a/xen/common/domain.c
> +++ b/xen/common/domain.c
> @@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason)
> return 0;
> }
>
> -void domain_resume(struct domain *d)
> +#ifndef CONFIG_ARM
> +static
> +#endif
> +void domain_resume_nopause(struct domain *d)
It took me some time to understand that this function makes assumption
that domain is already paused. As it behaves like *_unlocked()
functions, maybe it is better to call it something like domain_resume_paused() ?
> {
> struct vcpu *v;
>
> - /*
> - * Some code paths assume that shutdown status does not get reset under
> - * their feet (e.g., some assertions make this assumption).
> - */
> - domain_pause(d);
> -
> spin_lock(&d->shutdown_lock);
>
> d->is_shutting_down = d->is_shut_down = 0;
> @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
>
> for_each_vcpu ( d, v )
> {
> + /*
> + * No need to conditionally clear _VPF_suspended here:
> + * - This bit is only set on Arm64, and only after a successful suspend.
> + * - domain_resume_nopause() may also be called from paths other than
> + * the suspend/resume flow, such as "soft-reset" actions (e.g.,
> + * on_poweroff), as part of the Xenstore control/shutdown protocol.
> + * These require guest acknowledgement to complete the operation.
> + * So clearing the bit unconditionally is safe.
> + */
> + clear_bit(_VPF_suspended, &v->pause_flags);
> +
> if ( v->paused_for_shutdown )
> vcpu_unpause(v);
> v->paused_for_shutdown = 0;
> }
>
> spin_unlock(&d->shutdown_lock);
> +}
>
> +void domain_resume(struct domain *d)
> +{
> + /*
> + * Some code paths assume that shutdown status does not get reset under
> + * their feet (e.g., some assertions make this assumption).
> + */
> + domain_pause(d);
> + domain_resume_nopause(d);
> domain_unpause(d);
> }
>
> diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> index fd5c9f9333..c1848d8ea6 100644
> --- a/xen/include/xen/sched.h
> +++ b/xen/include/xen/sched.h
> @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d);
> int domain_kill(struct domain *d);
> int domain_shutdown(struct domain *d, u8 reason);
> void domain_resume(struct domain *d);
> +#ifdef CONFIG_ARM
> +void domain_resume_nopause(struct domain *d);
> +#endif
Probably I need comment from other reviewers here. Do we really need to
make this ARM-specific? I understand that right now it will be used only
by ARM, but still... Also, I am not big fan of that
> +#ifndef CONFIG_ARM
> +static
> +#endif
in the function definition.
--
WBR, Volodymyr
Hi Volodymyr,
On Sat, Aug 23, 2025 at 2:30 AM Volodymyr Babchuk
<Volodymyr_Babchuk@epam.com> wrote:
>
>
> Hi Mykola,
>
> Mykola Kvach <xakep.amatop@gmail.com> writes:
>
> > From: Mykola Kvach <mykola_kvach@epam.com>
> >
> > This patch adds support for the PSCI SYSTEM_SUSPEND function in the vPSCI
> > (virtual PSCI) interface, allowing guests to request suspend via the PSCI
> > v1.0 SYSTEM_SUSPEND call (both 32-bit and 64-bit variants).
> >
> > The implementation:
> > - Adds SYSTEM_SUSPEND function IDs to PSCI definitions
> > - Implements trapping and handling of SYSTEM_SUSPEND in vPSCI
> > - Allows only non-hardware domains to invoke SYSTEM_SUSPEND; for the
> > hardware domain, PSCI_NOT_SUPPORTED is returned to avoid halting the
> > system in hwdom_shutdown() called from domain_shutdown
> > - Ensures all secondary VCPUs of the calling domain are offline before
> > allowing suspend due to PSCI spec
> >
> > GIC and virtual timer context must be saved when the domain suspends.
> > This is done by moving the respective code in ctxt_switch_from
> > before the return that happens if the domain suspended.
> >
> > domain_resume_nopause() is introduced to allow resuming a domain from
> > SYSTEM_SUSPEND without pausing it first. This avoids problematic
> > domain_pause() calls when resuming from suspend on Arm, particularly
> > in error paths. The function is only used on Arm; other architectures
> > continue to use the original domain_resume().
> >
> > Usage:
> >
> > For Linux-based guests, suspend can be initiated with:
> > echo mem > /sys/power/state
> > or via:
> > systemctl suspend
> >
> > Resuming the guest is performed from control domain using:
> > xl resume <domain>
> >
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> > Changes in V9:
> > - no functional changes
> > - cosmetic chnages after review
> > - enhance commit message and add extra comment to the code after review
> >
> > Changes in V8:
> > - GIC and virtual timer context must be saved when the domain suspends
> > - rework locking
> > - minor changes after code review
> >
> > Changes in V7:
> > - add proper locking
> > - minor changes after code review
> >
> > Changes in V6:
> > - skip execution of ctxt_switch_from for vcpu that is in paused domain
> > - add implementation of domain_resume without domain_pause
> > - add helper function to determine if vcpu is suspended or not
> > - ignore upper 32 bits of argument values when the domain is 64-bit
> > and calls the SMC32 SYSTEM_SUSPEND function
> > - cosmetic changes after review
> >
> > Changes in V5:
> > - don't use standby mode, restore execution in a provided by guest point
> > - move checking that all CPUs, except current one, are offline to after
> > pausing the vCPUs
> > - provide ret status from arch_domain_shutdown and handle it in
> > domain_shutdown
> > - adjust VPSCI_NR_FUNCS to reflect the number of newly added PSCI functions
> >
> > Changes in V4:
> > Dropped all changes related to watchdog, domain is marked as shutting
> > down in domain_shutdown and watchdog timeout handler won't trigger
> > because of it.
> >
> > Previous versions included code to manage Xen watchdog timers during suspend,
> > but this was removed. When a guest OS starts the Xen watchdog (either via the
> > kernel driver or xenwatchdogd), it is responsible for managing that state
> > across suspend/resume. On Linux, the Xen kernel driver properly stops the
> > watchdog during suspend. However, when xenwatchdogd is used instead, suspend
> > handling is incomplete, potentially leading to watchdog-triggered resets on
> > resume. Xen leaves watchdog handling to the guest OS and its services.
> >
> > Dropped all changes related to VCPU context, because instead domain_shutdown
> > is used, so we don't need any extra changes for suspending domain.
> >
> > Changes in V3:
> > Dropped all domain flags and related code (which touched common functions like
> > vcpu_unblock), keeping only the necessary changes for Xen suspend/resume, i.e.
> > suspend/resume is now fully supported only for the hardware domain.
> > Proper support for domU suspend/resume will be added in a future patch.
> > This patch does not yet include VCPU context reset or domain context
> > restoration in VCPU.
> > ---
> > xen/arch/arm/domain.c | 17 +++--
> > xen/arch/arm/include/asm/perfc_defn.h | 1 +
> > xen/arch/arm/include/asm/psci.h | 2 +
> > xen/arch/arm/include/asm/vpsci.h | 2 +-
> > xen/arch/arm/vpsci.c | 101 ++++++++++++++++++++++----
> > xen/common/domain.c | 31 ++++++--
> > xen/include/xen/sched.h | 6 ++
> > 7 files changed, 131 insertions(+), 29 deletions(-)
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 310c578909..9e9649c4e2 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> > if ( is_idle_vcpu(p) )
> > return;
> >
> > + /* Arch timer */
> > + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> > + virt_timer_save(p);
> > +
> > + /* VGIC */
> > + gic_save_state(p);
> > +
>
> I'd like to see comment here on why we don't need to do rest of the
> ctx save code. I saw that this is described in the commit message, but
> comment there will be very helpful for future contributors.
Honestly, regarding the vCPU context, the only thing we need to
prevent from being saved (overwritten) in this call is the SCTLR_EL1
register. This was already handled in one of the previous patches,
but the change was not accepted during review.
As for other registers, saving them is not strictly required
according to the PSCI specification. It states that at startup we
must set the AIF or DAIF bits for AArch32 or AArch64, respectively
(DEN0022F.b 1.3, "6.4.3.3 Exceptions"). SCTLR.{I, C, M} must be set
to {0, 0, 0} (DEN0022F.b 1.3, "6.4.3.4 MMU, cache and branch
predictor enable").
Finally, the context ID and the entry point must also be saved in
the guest registers, but in this function they are not modified.
I'll add a comment as you proposed, to make this clearer for future
contributors.
>
> > + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> > + return;
> > +
> > p2m_save_state(p);
> >
> > /* CP 15 */
> > @@ -106,10 +116,6 @@ static void ctxt_switch_from(struct vcpu *p)
> > p->arch.tpidrro_el0 = READ_SYSREG(TPIDRRO_EL0);
> > p->arch.tpidr_el1 = READ_SYSREG(TPIDR_EL1);
> >
> > - /* Arch timer */
> > - p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> > - virt_timer_save(p);
> > -
> > if ( is_32bit_domain(p->domain) && cpu_has_thumbee )
> > {
> > p->arch.teecr = READ_SYSREG(TEECR32_EL1);
> > @@ -158,9 +164,6 @@ static void ctxt_switch_from(struct vcpu *p)
> >
> > /* XXX MPU */
> >
> > - /* VGIC */
> > - gic_save_state(p);
> > -
> > isb();
> > }
> >
> > diff --git a/xen/arch/arm/include/asm/perfc_defn.h b/xen/arch/arm/include/asm/perfc_defn.h
> > index effd25b69e..8dfcac7e3b 100644
> > --- a/xen/arch/arm/include/asm/perfc_defn.h
> > +++ b/xen/arch/arm/include/asm/perfc_defn.h
> > @@ -33,6 +33,7 @@ PERFCOUNTER(vpsci_system_reset, "vpsci: system_reset")
> > PERFCOUNTER(vpsci_cpu_suspend, "vpsci: cpu_suspend")
> > PERFCOUNTER(vpsci_cpu_affinity_info, "vpsci: cpu_affinity_info")
> > PERFCOUNTER(vpsci_features, "vpsci: features")
> > +PERFCOUNTER(vpsci_system_suspend, "vpsci: system_suspend")
> >
> > PERFCOUNTER(vcpu_kick, "vcpu: notify other vcpu")
> >
> > diff --git a/xen/arch/arm/include/asm/psci.h b/xen/arch/arm/include/asm/psci.h
> > index 4780972621..48a93e6b79 100644
> > --- a/xen/arch/arm/include/asm/psci.h
> > +++ b/xen/arch/arm/include/asm/psci.h
> > @@ -47,10 +47,12 @@ void call_psci_system_reset(void);
> > #define PSCI_0_2_FN32_SYSTEM_OFF PSCI_0_2_FN32(8)
> > #define PSCI_0_2_FN32_SYSTEM_RESET PSCI_0_2_FN32(9)
> > #define PSCI_1_0_FN32_PSCI_FEATURES PSCI_0_2_FN32(10)
> > +#define PSCI_1_0_FN32_SYSTEM_SUSPEND PSCI_0_2_FN32(14)
> >
> > #define PSCI_0_2_FN64_CPU_SUSPEND PSCI_0_2_FN64(1)
> > #define PSCI_0_2_FN64_CPU_ON PSCI_0_2_FN64(3)
> > #define PSCI_0_2_FN64_AFFINITY_INFO PSCI_0_2_FN64(4)
> > +#define PSCI_1_0_FN64_SYSTEM_SUSPEND PSCI_0_2_FN64(14)
> >
> > /* PSCI v0.2 affinity level state returned by AFFINITY_INFO */
> > #define PSCI_0_2_AFFINITY_LEVEL_ON 0
> > diff --git a/xen/arch/arm/include/asm/vpsci.h b/xen/arch/arm/include/asm/vpsci.h
> > index 0cca5e6830..69d40f9d7f 100644
> > --- a/xen/arch/arm/include/asm/vpsci.h
> > +++ b/xen/arch/arm/include/asm/vpsci.h
> > @@ -23,7 +23,7 @@
> > #include <asm/psci.h>
> >
> > /* Number of function implemented by virtual PSCI (only 0.2 or later) */
> > -#define VPSCI_NR_FUNCS 12
> > +#define VPSCI_NR_FUNCS 14
> >
> > /* Functions handle PSCI calls from the guests */
> > bool do_vpsci_0_1_call(struct cpu_user_regs *regs, uint32_t fid);
> > diff --git a/xen/arch/arm/vpsci.c b/xen/arch/arm/vpsci.c
> > index 7ba9ccd94b..67d369a8a2 100644
> > --- a/xen/arch/arm/vpsci.c
> > +++ b/xen/arch/arm/vpsci.c
> > @@ -10,28 +10,18 @@
> >
> > #include <public/sched.h>
> >
> > -static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > - register_t context_id)
> > +static int do_setup_vcpu_ctx(struct vcpu *v, register_t entry_point,
> > + register_t context_id)
> > {
> > - struct vcpu *v;
> > struct domain *d = current->domain;
> > struct vcpu_guest_context *ctxt;
> > int rc;
> > bool is_thumb = entry_point & 1;
> > - register_t vcpuid;
> > -
> > - vcpuid = vaffinity_to_vcpuid(target_cpu);
> > -
> > - if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > - return PSCI_INVALID_PARAMETERS;
> >
> > /* THUMB set is not allowed with 64-bit domain */
> > if ( is_64bit_domain(d) && is_thumb )
> > return PSCI_INVALID_ADDRESS;
> >
> > - if ( !test_bit(_VPF_down, &v->pause_flags) )
> > - return PSCI_ALREADY_ON;
> > -
> > if ( (ctxt = alloc_vcpu_guest_context()) == NULL )
> > return PSCI_DENIED;
> >
> > @@ -78,11 +68,32 @@ static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > if ( rc < 0 )
> > return PSCI_DENIED;
> >
> > - vcpu_wake(v);
> > -
> > return PSCI_SUCCESS;
> > }
> >
> > +static int do_common_cpu_on(register_t target_cpu, register_t entry_point,
> > + register_t context_id)
> > +{
> > + int rc;
> > + struct vcpu *v;
> > + struct domain *d = current->domain;
> > + register_t vcpuid;
> > +
> > + vcpuid = vaffinity_to_vcpuid(target_cpu);
> > +
> > + if ( (v = domain_vcpu(d, vcpuid)) == NULL )
> > + return PSCI_INVALID_PARAMETERS;
> > +
> > + if ( !test_bit(_VPF_down, &v->pause_flags) )
> > + return PSCI_ALREADY_ON;
> > +
> > + rc = do_setup_vcpu_ctx(v, entry_point, context_id);
> > + if ( rc == PSCI_SUCCESS )
> > + vcpu_wake(v);
> > +
> > + return rc;
> > +}
> > +
> > static int32_t do_psci_cpu_on(uint32_t vcpuid, register_t entry_point)
> > {
> > int32_t ret;
> > @@ -197,6 +208,48 @@ static void do_psci_0_2_system_reset(void)
> > domain_shutdown(d,SHUTDOWN_reboot);
> > }
> >
> > +static int32_t do_psci_1_0_system_suspend(register_t epoint, register_t cid)
> > +{
> > + int32_t rc;
> > + struct vcpu *v;
> > + struct domain *d = current->domain;
> > +
> > + /* SYSTEM_SUSPEND is not supported for the hardware domain yet */
> > + if ( is_hardware_domain(d) )
> > + return PSCI_NOT_SUPPORTED;
> > +
> > + /* Ensure that all CPUs other than the calling one are offline */
> > + domain_lock(d);
> > + for_each_vcpu ( d, v )
> > + {
> > + if ( v != current && is_vcpu_online(v) )
> > + {
> > + domain_unlock(d);
> > + return PSCI_DENIED;
> > + }
> > + }
> > + domain_unlock(d);
> > +
> > + rc = domain_shutdown(d, SHUTDOWN_suspend);
> > + if ( rc )
> > + return PSCI_DENIED;
> > +
> > + rc = do_setup_vcpu_ctx(current, epoint, cid);
> > + if ( rc != PSCI_SUCCESS )
> > + {
> > + domain_resume_nopause(d);
> > + return rc;
> > + }
> > +
> > + set_bit(_VPF_suspended, ¤t->pause_flags);
> > +
> > + dprintk(XENLOG_DEBUG,
> > + "Dom %u: SYSTEM_SUSPEND requested, epoint=%#lx, cid=%#lx\n",
> > + d->domain_id, epoint, cid);
> > +
> > + return rc;
> > +}
> > +
> > static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> > {
> > /* /!\ Ordered by function ID and not name */
> > @@ -214,6 +267,8 @@ static int32_t do_psci_1_0_features(uint32_t psci_func_id)
> > case PSCI_0_2_FN32_SYSTEM_OFF:
> > case PSCI_0_2_FN32_SYSTEM_RESET:
> > case PSCI_1_0_FN32_PSCI_FEATURES:
> > + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> > + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> > case ARM_SMCCC_VERSION_FID:
> > return 0;
> > default:
> > @@ -344,6 +399,24 @@ bool do_vpsci_0_2_call(struct cpu_user_regs *regs, uint32_t fid)
> > return true;
> > }
> >
> > + case PSCI_1_0_FN32_SYSTEM_SUSPEND:
> > + case PSCI_1_0_FN64_SYSTEM_SUSPEND:
> > + {
> > + register_t epoint = PSCI_ARG(regs, 1);
> > + register_t cid = PSCI_ARG(regs, 2);
> > +
> > + if ( is_64bit_domain(current->domain) &&
> > + fid == PSCI_1_0_FN32_SYSTEM_SUSPEND )
> > + {
> > + epoint &= GENMASK(31, 0);
> > + cid &= GENMASK(31, 0);
> > + }
> > +
> > + perfc_incr(vpsci_system_suspend);
> > + PSCI_SET_RESULT(regs, do_psci_1_0_system_suspend(epoint, cid));
> > + return true;
> > + }
> > +
> > default:
> > return false;
> > }
> > diff --git a/xen/common/domain.c b/xen/common/domain.c
> > index 5241a1629e..624c3e2c27 100644
> > --- a/xen/common/domain.c
> > +++ b/xen/common/domain.c
> > @@ -1343,16 +1343,13 @@ int domain_shutdown(struct domain *d, u8 reason)
> > return 0;
> > }
> >
> > -void domain_resume(struct domain *d)
> > +#ifndef CONFIG_ARM
> > +static
> > +#endif
> > +void domain_resume_nopause(struct domain *d)
>
> It took me some time to understand that this function makes assumption
> that domain is already paused. As it behaves like *_unlocked()
> functions, maybe it is better to call it something like domain_resume_paused() ?
Thank you for your suggestion.
The "nopause" in the function name is intended to highlight the
difference from the previous implementation, where domain_resume()
would explicitly pause the domain before resuming it.
Pausing the domain is not required here. This function is executed on the
last running virtual CPU of the domain, so there is no need to wait for
other vCPUs to finish execution in order to perform a safe operation.
Since we are already inside the HVC trap handler, it is not possible for
other code to run or for race conditions to occur in shutdown states.
The state of other vCPUs cannot change at this point.
For other parts of the code that do not execute in the context of the
current domain, a domain pause is performed before such operations to
ensure that all virtual CPUs are blocked and not running. This guarantees
that operations which require all vCPUs to be paused/unpaused are
performed safely.
>
> > {
> > struct vcpu *v;
> >
> > - /*
> > - * Some code paths assume that shutdown status does not get reset under
> > - * their feet (e.g., some assertions make this assumption).
> > - */
> > - domain_pause(d);
> > -
> > spin_lock(&d->shutdown_lock);
> >
> > d->is_shutting_down = d->is_shut_down = 0;
> > @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
> >
> > for_each_vcpu ( d, v )
> > {
> > + /*
> > + * No need to conditionally clear _VPF_suspended here:
> > + * - This bit is only set on Arm64, and only after a successful suspend.
> > + * - domain_resume_nopause() may also be called from paths other than
> > + * the suspend/resume flow, such as "soft-reset" actions (e.g.,
> > + * on_poweroff), as part of the Xenstore control/shutdown protocol.
> > + * These require guest acknowledgement to complete the operation.
> > + * So clearing the bit unconditionally is safe.
> > + */
> > + clear_bit(_VPF_suspended, &v->pause_flags);
> > +
> > if ( v->paused_for_shutdown )
> > vcpu_unpause(v);
> > v->paused_for_shutdown = 0;
> > }
> >
> > spin_unlock(&d->shutdown_lock);
> > +}
> >
> > +void domain_resume(struct domain *d)
> > +{
> > + /*
> > + * Some code paths assume that shutdown status does not get reset under
> > + * their feet (e.g., some assertions make this assumption).
> > + */
> > + domain_pause(d);
> > + domain_resume_nopause(d);
> > domain_unpause(d);
> > }
> >
> > diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
> > index fd5c9f9333..c1848d8ea6 100644
> > --- a/xen/include/xen/sched.h
> > +++ b/xen/include/xen/sched.h
> > @@ -814,6 +814,9 @@ void domain_destroy(struct domain *d);
> > int domain_kill(struct domain *d);
> > int domain_shutdown(struct domain *d, u8 reason);
> > void domain_resume(struct domain *d);
> > +#ifdef CONFIG_ARM
> > +void domain_resume_nopause(struct domain *d);
> > +#endif
>
> Probably I need comment from other reviewers here. Do we really need to
> make this ARM-specific? I understand that right now it will be used only
It is done to avoid violation of MISRA C Rule 8.7:
Functions and objects should not be defined with external linkage if
they are referenced in only one translation unit
> by ARM, but still... Also, I am not big fan of that
>
> > +#ifndef CONFIG_ARM
> > +static
> > +#endif
>
> in the function definition.
This is done to avoid violation of MISRA C Rule 8.8:
The static storage class specifier shall be used in all declarations of
objects and functions that have internal linkage
>
>
> --
> WBR, Volodymyr
Best regards,
Mykola
On 18.08.2025 10:49, Mykola Kvach wrote:
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> if ( is_idle_vcpu(p) )
> return;
>
> + /* Arch timer */
> + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> + virt_timer_save(p);
> +
> + /* VGIC */
> + gic_save_state(p);
> +
> + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> + return;
As I had to look at the Arm side uses of the new bit to at least try to
follow the comment further down, I came across this. And now I wonder:
Why would one of the pause flags need special casing here?
> @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
>
> for_each_vcpu ( d, v )
> {
> + /*
> + * No need to conditionally clear _VPF_suspended here:
> + * - This bit is only set on Arm64, and only after a successful suspend.
> + * - domain_resume_nopause() may also be called from paths other than
> + * the suspend/resume flow, such as "soft-reset" actions (e.g.,
> + * on_poweroff), as part of the Xenstore control/shutdown protocol.
> + * These require guest acknowledgement to complete the operation.
> + * So clearing the bit unconditionally is safe.
> + */
> + clear_bit(_VPF_suspended, &v->pause_flags);
Seeing that you set this bit for a single vCPU in a domain only, I wonder why
it needs to be a per-vCPU flag.
Apart from this, with the comment I still fear I wouldn't feel capable of
ack-ing this, as there's too much Arm-only interaction in here. It's not even
clear whether this could easily be re-used by another port.
Jan
Hi Jan,
On Mon, Aug 18, 2025 at 1:15 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 18.08.2025 10:49, Mykola Kvach wrote:
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -90,6 +90,16 @@ static void ctxt_switch_from(struct vcpu *p)
> > if ( is_idle_vcpu(p) )
> > return;
> >
> > + /* Arch timer */
> > + p->arch.cntkctl = READ_SYSREG(CNTKCTL_EL1);
> > + virt_timer_save(p);
> > +
> > + /* VGIC */
> > + gic_save_state(p);
> > +
> > + if ( test_bit(_VPF_suspended, &p->pause_flags) )
> > + return;
>
> As I had to look at the Arm side uses of the new bit to at least try to
> follow the comment further down, I came across this. And now I wonder:
> Why would one of the pause flags need special casing here?
Some kind of answer was given in a previous version of this patch series,
see:
https://patchew.org/Xen/cover.1751020456.git.mykola._5Fkvach@epam.com/072270e0940b6bcc2743d56a336363f4719ba60a.1751020456.git.mykola._5Fkvach@epam.com/#066c6e93-a478-4c8f-b161-d109bd0e6bb4@xen.org
To clarify:
We need to avoid updating the vCPU context when switching from a vCPU of a
domain that is being suspended to a vCPU of another domain. After the current
HVC trap, which handles the PSCI SYSTEM_SUSPEND call, completes, the scheduler
will switch to a vCPU from another domain. At this point, the virtual CPU is
marked as paused either via pause_count (domain_shutdown -> vcpu_pause_nosync)
or by having the _VPF_suspended bit set in pause_flags. In both cases,
vcpu_runnable() will return false, so control will not be returned to the guest
OS from this trap, and the scheduler will switch to another domain's vCPU.
During this context switch, we must not update the saved context of certain
vCPU registers for the domain that has entered suspend. The vCPU context for
suspend is set up in do_setup_vcpu_ctx(). If another function were to overwrite
the saved values with the current ones in a physical CPU at this point, the
domain would not be able to resume correctly after a resume command.
As an alternative, all suspend-related actions could be performed in a tasklet,
which would avoid the need for modifications in domain_pause and the
introduction of the new flag. The tasklet would execute after the context
switch, but this approach would increase code complexity and introduce
synchronization issues, such as passing the suspend command context to the
tasklet, adding extra locks, or creating a dedicated tasklet for each domain.
Locking would also need to be reworked to handle cases where another domain
might attempt to change the vCPU context concurrently and just domain_pause
won't be enough.
Since the trap is a synchronous call, the current approach greatly simplifies
synchronization compared to the tasklet-based alternative.
>
> > @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
> >
> > for_each_vcpu ( d, v )
> > {
> > + /*
> > + * No need to conditionally clear _VPF_suspended here:
> > + * - This bit is only set on Arm64, and only after a successful suspend.
Note to self: s/Arm64/Arm/g
> > + * - domain_resume_nopause() may also be called from paths other than
> > + * the suspend/resume flow, such as "soft-reset" actions (e.g.,
> > + * on_poweroff), as part of the Xenstore control/shutdown protocol.
> > + * These require guest acknowledgement to complete the operation.
> > + * So clearing the bit unconditionally is safe.
> > + */
> > + clear_bit(_VPF_suspended, &v->pause_flags);
>
> Seeing that you set this bit for a single vCPU in a domain only, I wonder why
> it needs to be a per-vCPU flag.
That's a good question. In earlier versions of this patch series, both I and
some other contributors used existed fields from the domain structure, such as
shutdown_code and is_shutting_down, for this purpose. However, I recall that
in a previous review, this approach was not well received. See:
https://lore.kernel.org/all/d24be446-af5a-7881-2db4-b25afac3e1f4@citrix.com/
Technically, there is nothing preventing me from storing this information in
the domain structure. However, I do not see much benefit in introducing a new
field to the domain struct when there is already an existing per-vCPU flags
field that tracks powerdown and pause states. Using one more bit in the
pause_flags field seems sufficient and avoids further bloating the domain
structure.
>
> Apart from this, with the comment I still fear I wouldn't feel capable of
> ack-ing this, as there's too much Arm-only interaction in here. It's not even
> clear whether this could easily be re-used by another port.
Thank you for your feedback.
I understand your concern regarding the Arm-specific nature of this code and
the potential challenges for reusing it on other architectures. The current
implementation is focused on supporting PSCI SYSTEM_SUSPEND, which is an
Arm-specific interface, so much of the logic is naturally tied to Arm.
That said, I have tried to keep the changes as contained as possible, and
where feasible, I have avoided making unnecessary modifications to common code.
If there is interest or a use case for supporting similar suspend/resume flows
on other architectures, I am open to suggestions on how to further abstract or
generalize the implementation.
If you have specific recommendations for making this code more portable or
easier to adapt for other ports, I would be happy to discuss and consider
them.
>
> Jan
Best regards,
Mykola
On 18.08.2025 14:43, Mykola Kvach wrote:
> On Mon, Aug 18, 2025 at 1:15 PM Jan Beulich <jbeulich@suse.com> wrote:
>> On 18.08.2025 10:49, Mykola Kvach wrote:
>>> @@ -1360,13 +1357,33 @@ void domain_resume(struct domain *d)
>>>
>>> for_each_vcpu ( d, v )
>>> {
>>> + /*
>>> + * No need to conditionally clear _VPF_suspended here:
>>> + * - This bit is only set on Arm64, and only after a successful suspend.
>
> Note to self: s/Arm64/Arm/g
>
>>> + * - domain_resume_nopause() may also be called from paths other than
>>> + * the suspend/resume flow, such as "soft-reset" actions (e.g.,
>>> + * on_poweroff), as part of the Xenstore control/shutdown protocol.
>>> + * These require guest acknowledgement to complete the operation.
>>> + * So clearing the bit unconditionally is safe.
>>> + */
>>> + clear_bit(_VPF_suspended, &v->pause_flags);
>>
>> Seeing that you set this bit for a single vCPU in a domain only, I wonder why
>> it needs to be a per-vCPU flag.
>
> That's a good question. In earlier versions of this patch series, both I and
> some other contributors used existed fields from the domain structure, such as
> shutdown_code and is_shutting_down, for this purpose. However, I recall that
> in a previous review, this approach was not well received. See:
> https://lore.kernel.org/all/d24be446-af5a-7881-2db4-b25afac3e1f4@citrix.com/
>
> Technically, there is nothing preventing me from storing this information in
> the domain structure. However, I do not see much benefit in introducing a new
> field to the domain struct when there is already an existing per-vCPU flags
> field that tracks powerdown and pause states. Using one more bit in the
> pause_flags field seems sufficient and avoids further bloating the domain
> structure.
Hmm, yes, I was mis-remembering something here: I thought that much like we
have pause_count both for vCPU-s and for domains, we'd also have pause_flags
for both. Perhaps indeed okay as is then, as far as where to put the flag
goes.
Jan
© 2016 - 2025 Red Hat, Inc.