[v1] systemd: Add hooks to stop/start xen-watchdog on suspend/resume

[PATCH] systemd: Add hooks to stop/start xen-watchdog on suspend/resume

Posted by Mykola Kvach 7 months, 2 weeks ago

This patch adds a systemd sleep hook script to stop the xen-watchdog
service before system suspend and start it again after resume.

Stopping the watchdog before a system suspend operation may look unsafe.
Let’s imagine the following situation: 'systemctl suspend' does not
interact with the running service at all. In such a case, the Xen
watchdog daemon freezes just before suspend. If this happens, for
example, right before sending a ping, and Xen has not yet marked the
domain as suspended (is_shutting_down), the Xen watchdog timer may
trigger a false alert.

This is an almost impossible situation, because typically:
    ping time = watchdog timeout / 2

and the watchdog timeout is usually set to a relatively large value
(dozens of seconds).

Still, this is more likely with very short watchdog timeouts. It may
happen in the following scenarios:
    * Significant delays occur between freezing Linux tasks and
      triggering the ACPI or PSCI sleep request or handler.
    * Long delays happen inside Xen between the entrance to the sleep
      trigger and the actual forwarding of the sleep request further.

A similar situation may occur on resume with short timeouts. During the
resume operation, Xen restores timers and the domain context. The Xen
watchdog timer also resumes. If it schedules the domain right before the
watchdog timeout expires, and the daemon responsible for pinging is not
yet running, a timeout might occur.

Both scenarios are rare and typically require very small watchdog
timeouts combined with significant delays in Xen or the Linux kernel
during suspend/resume flows.

Conceptually, however, if activating and pinging the Xen watchdog is the
responsibility of the domain and its services, then the domain should
also manage the watchdog service/daemon lifecycle. This is similar to
what is already done by the Xen watchdog driver inside the Linux kernel.

Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
---
 tools/hotplug/Linux/systemd/Makefile          | 12 ++++-
 .../Linux/systemd/xen-watchdog-sleep.sh       | 45 +++++++++++++++++++
 2 files changed, 56 insertions(+), 1 deletion(-)
 create mode 100644 tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh

diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile
index e29889156d..98d325cc5d 100644
--- a/tools/hotplug/Linux/systemd/Makefile
+++ b/tools/hotplug/Linux/systemd/Makefile
@@ -5,6 +5,9 @@ XEN_SYSTEMD_MODULES := xen.conf
 
 XEN_SYSTEMD_MOUNT := proc-xen.mount
 
+XEN_SYSTEMD_SLEEP_SCRIPTS := xen-watchdog-sleep.sh
+XEN_SYSTEMD_SLEEP_DIR := $(XEN_SYSTEMD_DIR)/../system-sleep
+
 XEN_SYSTEMD_SERVICE := xenstored.service
 XEN_SYSTEMD_SERVICE += xenconsoled.service
 XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service
@@ -15,7 +18,8 @@ XEN_SYSTEMD_SERVICE += xendriverdomain.service
 
 ALL_XEN_SYSTEMD :=	$(XEN_SYSTEMD_MODULES)  \
 			$(XEN_SYSTEMD_MOUNT)	\
-			$(XEN_SYSTEMD_SERVICE)
+			$(XEN_SYSTEMD_SERVICE)	\
+			$(XEN_SYSTEMD_SLEEP_SCRIPTS)
 
 .PHONY: all
 all:	$(ALL_XEN_SYSTEMD)
@@ -31,15 +35,21 @@ distclean: clean
 install: $(ALL_XEN_SYSTEMD)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_DIR)
 	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
+	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)
 	$(INSTALL_DATA) *.service $(DESTDIR)$(XEN_SYSTEMD_DIR)
 	$(INSTALL_DATA) *.mount $(DESTDIR)$(XEN_SYSTEMD_DIR)
 	$(INSTALL_DATA) *.conf $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
+	set -e; for i in $(XEN_SYSTEMD_SLEEP_SCRIPTS); \
+	    do \
+	    $(INSTALL_PROG) $$i $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR); \
+	done
 
 .PHONY: uninstall
 uninstall:
 	rm -f $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)/*.conf
 	rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.mount
 	rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.service
+	rm -f $(addprefix $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)/, $(XEN_SYSTEMD_SLEEP_SCRIPTS))
 
 $(XEN_SYSTEMD_MODULES):
 	rm -f $@.tmp
diff --git a/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
new file mode 100644
index 0000000000..2b2f0e16d8
--- /dev/null
+++ b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
@@ -0,0 +1,45 @@
+#!/bin/sh
+
+# The first argument ($1) is:
+#     "pre" or "post"
+# The second argument ($2) is:
+#     "suspend", "hibernate", "hybrid-sleep", or "suspend-then-hibernate"
+
+. /etc/xen/scripts/hotplugpath.sh
+
+SERVICE_NAME="xen-watchdog.service"
+STATE_FILE="/run/xen-watchdog-sleep-marker"
+XEN_WATCHDOG_SLEEP_LOG="${XEN_LOG_DIR}/xen-watchdog-sleep.log"
+
+log_watchdog() {
+    echo "$1"
+    echo "$(date): $1" >> "${XEN_WATCHDOG_SLEEP_LOG}"
+}
+
+# Exit silently if Xen watchdog service is not present
+if ! systemctl show "${SERVICE_NAME}" > /dev/null 2>&1; then
+    exit 0
+fi
+
+case "$1" in
+pre)
+    if systemctl is-active --quiet "${SERVICE_NAME}"; then
+        touch "${STATE_FILE}"
+        log_watchdog "Stopping ${SERVICE_NAME} before $2."
+        systemctl stop "${SERVICE_NAME}"
+    fi
+    ;;
+post)
+    if [ -f "${STATE_FILE}" ]; then
+        log_watchdog "Starting ${SERVICE_NAME} after $2."
+        systemctl start "${SERVICE_NAME}"
+        rm "${STATE_FILE}"
+    fi
+    ;;
+*)
+    log_watchdog "Script called with unknown action '$1'. Arguments: '$@'"
+    exit 1
+    ;;
+esac
+
+exit 0
-- 
2.48.1

Re: [PATCH] systemd: Add hooks to stop/start xen-watchdog on suspend/resume

Posted by Anthony PERARD 7 months, 2 weeks ago

Hi Mykola,

First, since you used a different email account to send your patch, the
email should start with "From: Mykola ... <...@epam.com>" so that
`git am` can set the correct author for the commit. Often
`git send-email` manages to do that automatically, if it knows that the
author of the email is going to be different from the author of the
commit being sent.

On Thu, Jun 26, 2025 at 11:12:46AM +0300, Mykola Kvach wrote:
> This patch adds a systemd sleep hook script to stop the xen-watchdog
> service before system suspend and start it again after resume.
> 
> Stopping the watchdog before a system suspend operation may look unsafe.
> Let’s imagine the following situation: 'systemctl suspend' does not
> interact with the running service at all. In such a case, the Xen
> watchdog daemon freezes just before suspend. If this happens, for
> example, right before sending a ping, and Xen has not yet marked the
> domain as suspended (is_shutting_down), the Xen watchdog timer may
> trigger a false alert.
> 
> This is an almost impossible situation, because typically:
>     ping time = watchdog timeout / 2
> 
> and the watchdog timeout is usually set to a relatively large value
> (dozens of seconds).
> 
> Still, this is more likely with very short watchdog timeouts. It may
> happen in the following scenarios:
>     * Significant delays occur between freezing Linux tasks and
>       triggering the ACPI or PSCI sleep request or handler.
>     * Long delays happen inside Xen between the entrance to the sleep
>       trigger and the actual forwarding of the sleep request further.
> 
> A similar situation may occur on resume with short timeouts. During the
> resume operation, Xen restores timers and the domain context. The Xen
> watchdog timer also resumes. If it schedules the domain right before the
> watchdog timeout expires, and the daemon responsible for pinging is not
> yet running, a timeout might occur.

On resume from suspend, does Xen expect a ping from the guest? Or is the
watchdog only rearmed on the first ping from the guest after being
resumed?

> Both scenarios are rare and typically require very small watchdog
> timeouts combined with significant delays in Xen or the Linux kernel
> during suspend/resume flows.
> 
> Conceptually, however, if activating and pinging the Xen watchdog is the
> responsibility of the domain and its services, then the domain should
> also manage the watchdog service/daemon lifecycle. This is similar to
> what is already done by the Xen watchdog driver inside the Linux kernel.

So there's already a watchdog driver in Linux, why not activate it with
systemd, since it knows how to do it? I almost want to remove the
service file and redirect users to use systemd's watchdog instead, in
the documentation.

> Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> ---
>  tools/hotplug/Linux/systemd/Makefile          | 12 ++++-
>  .../Linux/systemd/xen-watchdog-sleep.sh       | 45 +++++++++++++++++++
>  2 files changed, 56 insertions(+), 1 deletion(-)
>  create mode 100644 tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> 
> diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile
> index e29889156d..98d325cc5d 100644
> --- a/tools/hotplug/Linux/systemd/Makefile
> +++ b/tools/hotplug/Linux/systemd/Makefile
> @@ -5,6 +5,9 @@ XEN_SYSTEMD_MODULES := xen.conf
>  
>  XEN_SYSTEMD_MOUNT := proc-xen.mount
>  
> +XEN_SYSTEMD_SLEEP_SCRIPTS := xen-watchdog-sleep.sh
> +XEN_SYSTEMD_SLEEP_DIR := $(XEN_SYSTEMD_DIR)/../system-sleep

This is the wrong directory, I have no idea what "$(XEN_SYSTEMD_DIR)/.."
could be, even if it's likely to be systemd's directory.
$(XEN_SYSTEMD_DIR) should only be used for system unit files, because
that's how it is defined.

Another comment, from `man 8 systemd-suspend.service`:

    Note that scripts or binaries dropped in /lib/systemd/system-sleep/ are
    intended for local use only and should be considered hacks. If
    applications want to react to system suspend/hibernation and resume,
    they should rather use the Inhibitor interface[1].

    [1] https://www.freedesktop.org/wiki/Software/systemd/inhibit

So is a script in system-sleep the right way?
We probably don't want to go the "inhibitor" way that the manual
suggests, as this would add many dependencies to the daemon (and it's
probably not needed).

How about enhancing xen-watchdog.service to deal with suspend?
It's possible to have "Conflicts=sleep.target", which means stop this unit
when doing suspend. But restarting the unit on resume seems to need a
second service file which might be a bit more complicated to write,
something like:
    [Unit]
    After=sleep.target
    [Service]
    ExecStart=systemctl restart xen-watchdogd
    [Install]
    WantedBy=suspend.target
    WantedBy=hibernate.target
    WantedBy=hybrid-sleep.target
    WantedBy=suspend-then-hibernate.target
    ...
Actually, I'm not sure After=sleep.target is going to work... we should
be able to use systemd's watchdog capability instead :-) (which seems to
mean that a driver in Linux for xen's watchdog is needed); Never mind,
I've re-read the patch description and commented there.

Anyway, don't use XEN_SYSTEMD_DIR and introduce a new variable
in "systemd.m4".
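
For reference, a minimal sketch of how such a directory could be probed
from a shell, outside of autoconf. It assumes that systemd's pkg-config
file exports a variable named "systemdsleepdir"; that name should be
checked against the installed systemd.pc before relying on it. The
fallback path is the conventional location, not something defined by
this patch.

```sh
#!/bin/sh
# Hypothetical probe for systemd's sleep-hook directory.
# "systemdsleepdir" is assumed to be exported by systemd.pc; verify locally.
sleep_dir=$(pkg-config --variable=systemdsleepdir systemd 2>/dev/null || true)

# Fall back to the conventional location when pkg-config has no answer.
: "${sleep_dir:=/usr/lib/systemd/system-sleep}"

echo "sleep hooks would be installed into: ${sleep_dir}"
```

A configure-time check would store the same value in a dedicated
variable instead of deriving it from $(XEN_SYSTEMD_DIR).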

> +
>  XEN_SYSTEMD_SERVICE := xenstored.service
>  XEN_SYSTEMD_SERVICE += xenconsoled.service
>  XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service
> @@ -15,7 +18,8 @@ XEN_SYSTEMD_SERVICE += xendriverdomain.service
>  
>  ALL_XEN_SYSTEMD :=	$(XEN_SYSTEMD_MODULES)  \
>  			$(XEN_SYSTEMD_MOUNT)	\
> -			$(XEN_SYSTEMD_SERVICE)
> +			$(XEN_SYSTEMD_SERVICE)	\
> +			$(XEN_SYSTEMD_SLEEP_SCRIPTS)
>  
>  .PHONY: all
>  all:	$(ALL_XEN_SYSTEMD)
> @@ -31,15 +35,21 @@ distclean: clean
>  install: $(ALL_XEN_SYSTEMD)
>  	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_DIR)
>  	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
> +	$(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)
>  	$(INSTALL_DATA) *.service $(DESTDIR)$(XEN_SYSTEMD_DIR)
>  	$(INSTALL_DATA) *.mount $(DESTDIR)$(XEN_SYSTEMD_DIR)
>  	$(INSTALL_DATA) *.conf $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
> +	set -e; for i in $(XEN_SYSTEMD_SLEEP_SCRIPTS); \
> +	    do \
> +	    $(INSTALL_PROG) $$i $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR); \

I don't think you need a loop for that, `install` is perfectly capable
of installing multiple sources.
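
A small sketch of that behaviour, using scratch directories rather than
the real install paths. The two script names are made up for the
example; only the `install` call itself is the point.

```sh
#!/bin/sh
set -e

# Scratch directories stand in for the source tree and for
# $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR); both script names are hypothetical.
src=$(mktemp -d)
dest=$(mktemp -d)
printf '#!/bin/sh\nexit 0\n' > "${src}/first-sleep.sh"
printf '#!/bin/sh\nexit 0\n' > "${src}/second-sleep.sh"

# One install call: every argument but the last is a source file,
# the last one is the target directory.
install -m 0755 "${src}/first-sleep.sh" "${src}/second-sleep.sh" "${dest}"

ls -l "${dest}"
```

In the Makefile this would presumably collapse to a single
$(INSTALL_PROG) line taking $(XEN_SYSTEMD_SLEEP_SCRIPTS) followed by the
destination directory, assuming INSTALL_PROG expands to a plain
`install` invocation.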

> +	done
>  
>  .PHONY: uninstall
>  uninstall:
>  	rm -f $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)/*.conf
>  	rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.mount
>  	rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.service
> +	rm -f $(addprefix $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)/, $(XEN_SYSTEMD_SLEEP_SCRIPTS))
>  
>  $(XEN_SYSTEMD_MODULES):
>  	rm -f $@.tmp
> diff --git a/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> new file mode 100644
> index 0000000000..2b2f0e16d8
> --- /dev/null
> +++ b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> @@ -0,0 +1,45 @@
> +#!/bin/sh
> +
> +# The first argument ($1) is:
> +#     "pre" or "post"
> +# The second argument ($2) is:
> +#     "suspend", "hibernate", "hybrid-sleep", or "suspend-then-hibernate"
> +
> +. /etc/xen/scripts/hotplugpath.sh
> +
> +SERVICE_NAME="xen-watchdog.service"
> +STATE_FILE="/run/xen-watchdog-sleep-marker"

This should use $XEN_RUN_DIR

> +XEN_WATCHDOG_SLEEP_LOG="${XEN_LOG_DIR}/xen-watchdog-sleep.log"

Is this necessary? Use only `logger`, if `echo log` doesn't log anything.

> +log_watchdog() {
> +    echo "$1"
> +    echo "$(date): $1" >> "${XEN_WATCHDOG_SLEEP_LOG}"
> +}
> +
> +# Exit silently if Xen watchdog service is not present
> +if ! systemctl show "${SERVICE_NAME}" > /dev/null 2>&1; then

Is this necessary? It seems `systemctl is-active` works fine when the
unit doesn't exist.

> +    exit 0
> +fi
> +
> +case "$1" in
> +pre)
> +    if systemctl is-active --quiet "${SERVICE_NAME}"; then
> +        touch "${STATE_FILE}"
> +        log_watchdog "Stopping ${SERVICE_NAME} before $2."
> +        systemctl stop "${SERVICE_NAME}"
> +    fi
> +    ;;
> +post)
> +    if [ -f "${STATE_FILE}" ]; then

Would using `systemctl is-enabled` instead work? It seems to work for a
service on my machine.

> +        log_watchdog "Starting ${SERVICE_NAME} after $2."
> +        systemctl start "${SERVICE_NAME}"
> +        rm "${STATE_FILE}"
> +    fi
> +    ;;
> +*)
> +    log_watchdog "Script called with unknown action '$1'. Arguments: '$@'"
> +    exit 1
> +    ;;
> +esac
> +
> +exit 0
> -- 
> 2.48.1

Thanks,

-- 
Anthony PERARD

Re: [PATCH] systemd: Add hooks to stop/start xen-watchdog on suspend/resume

Posted by Mykola Kvach 7 months, 1 week ago

Hi Anthony,

On Fri, Jun 27, 2025 at 3:37 PM Anthony PERARD <anthony@xenproject.org> wrote:
>
> Hi Mykola,
>
> First, since you used a different email account to send your patch, the
> email should start with "From: Mykola ... <...@epam.com>" so that
> `git am` can set the correct author for the commit. Often
> `git send-email` manages to do that automatically, if it knows that the
> author of the email is going to be different from the author of the
> commit being sent.

Thank you for pointing this out. All my previous patches were sent
correctly — I mean when I used my other email address. I’ll look
into why this patch is missing the "From" line, even though I followed
the usual process.

>
> On Thu, Jun 26, 2025 at 11:12:46AM +0300, Mykola Kvach wrote:
> > This patch adds a systemd sleep hook script to stop the xen-watchdog
> > service before system suspend and start it again after resume.
> >
> > Stopping the watchdog before a system suspend operation may look unsafe.
> > Let’s imagine the following situation: 'systemctl suspend' does not
> > interact with the running service at all. In such a case, the Xen
> > watchdog daemon freezes just before suspend. If this happens, for
> > example, right before sending a ping, and Xen has not yet marked the
> > domain as suspended (is_shutting_down), the Xen watchdog timer may
> > trigger a false alert.
> >
> > This is an almost impossible situation, because typically:
> >     ping time = watchdog timeout / 2
> >
> > and the watchdog timeout is usually set to a relatively large value
> > (dozens of seconds).
> >
> > Still, this is more likely with very short watchdog timeouts. It may
> > happen in the following scenarios:
> >     * Significant delays occur between freezing Linux tasks and
> >       triggering the ACPI or PSCI sleep request or handler.
> >     * Long delays happen inside Xen between the entrance to the sleep
> >       trigger and the actual forwarding of the sleep request further.
> >
> > A similar situation may occur on resume with short timeouts. During the
> > resume operation, Xen restores timers and the domain context. The Xen
> > watchdog timer also resumes. If it schedules the domain right before the
> > watchdog timeout expires, and the daemon responsible for pinging is not
> > yet running, a timeout might occur.
>
> On resume from suspend, does Xen expect a ping from the guest? Or is the
> watchdog only rearmed on the first ping from the guest after being
> resumed?

If the suspend sequence is correct, the Xen watchdog timer is stopped.
It can be stopped either by the domain or directly by the Xen watchdog
timer handler, in the case where the domain is marked as shutting down.

During the resume process, a ping should be sent to start the watchdog
timer again. The guest must continue sending pings to prevent the
watchdog from triggering.

In this case, since a service is used, it starts a daemon that pings
the Xen watchdog at the configured interval:

https://elixir.bootlin.com/xen/v4.20.0/source/tools/misc/xenwatchdogd.c#L186
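
To put rough numbers on the timing window from the patch description,
here is a small sketch. It assumes the daemon pings every timeout / 2
seconds (integer seconds) and gets frozen just before a ping is due, so
one full interval has already elapsed. The function name is
illustrative only.

```sh
#!/bin/sh
# Worst-case slack, in seconds, between the daemon being frozen and the
# watchdog firing, assuming ping interval = timeout / 2 (integer seconds).
watchdog_slack() {
    timeout=$1
    interval=$((timeout / 2))
    # Frozen just before a ping: 'interval' seconds have already elapsed.
    echo $((timeout - interval))
}

for t in 30 10 2; do
    echo "timeout=${t}s slack=$(watchdog_slack "$t")s"
done
```

With a 30 second timeout there are about 15 seconds for Xen to mark the
domain as shutting down; with a 2 second timeout only about 1 second
remains, which is where the delays listed in the patch description
start to matter.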

>
> > Both scenarios are rare and typically require very small watchdog
> > timeouts combined with significant delays in Xen or the Linux kernel
> > during suspend/resume flows.
> >
> > Conceptually, however, if activating and pinging the Xen watchdog is the
> > responsibility of the domain and its services, then the domain should
> > also manage the watchdog service/daemon lifecycle. This is similar to
> > what is already done by the Xen watchdog driver inside the Linux kernel.
>
> So there's already a watchdog driver in Linux, why not activate it with
> systemd, since it knows how to do it? I almost want to remove the

Actually, I don't know the exact reason why we have two different
implementations. It could be historical — for example, initially the
Xen daemon was used, and later the Linux kernel driver was introduced,
or vice versa. It's also possible that some setups still use very old
kernels that lack the driver, and backporting it would require
additional effort.

More likely, though, the daemon approach was chosen for simplicity.
Using the Linux kernel driver requires building the module or even
rebuilding the entire kernel if the driver needs to be built-in.
In contrast, with the daemon, you just need to build the binary and
copy it into the domain's filesystem. It's easier and requires much
less effort.

> service file and redirect users to use systemd's watchdog instead, in
> the documentation.

I think that would be a better solution too. At the very least, we
would avoid having to handle power-related scenarios for every
existing init system.

For example, I see that we currently have three separate watchdog
services: one for NetBSD (rc.d), and two for Linux (init.d and
systemd). I also looked into how to set up power-down hooks with
init.d, and it’s neither easy nor safe -- especially if pm-utils is
not installed on the system.

>
> > Signed-off-by: Mykola Kvach <mykola_kvach@epam.com>
> > ---
> >  tools/hotplug/Linux/systemd/Makefile          | 12 ++++-
> >  .../Linux/systemd/xen-watchdog-sleep.sh       | 45 +++++++++++++++++++
> >  2 files changed, 56 insertions(+), 1 deletion(-)
> >  create mode 100644 tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> >
> > diff --git a/tools/hotplug/Linux/systemd/Makefile b/tools/hotplug/Linux/systemd/Makefile
> > index e29889156d..98d325cc5d 100644
> > --- a/tools/hotplug/Linux/systemd/Makefile
> > +++ b/tools/hotplug/Linux/systemd/Makefile
> > @@ -5,6 +5,9 @@ XEN_SYSTEMD_MODULES := xen.conf
> >
> >  XEN_SYSTEMD_MOUNT := proc-xen.mount
> >
> > +XEN_SYSTEMD_SLEEP_SCRIPTS := xen-watchdog-sleep.sh
> > +XEN_SYSTEMD_SLEEP_DIR := $(XEN_SYSTEMD_DIR)/../system-sleep
>
> This is the wrong directory, I have no idea what "$(XEN_SYSTEMD_DIR)/.."
> could be, even if it's likely to be systemd's directory.
> $(XEN_SYSTEMD_DIR) should only be used for system unit files, because
> that's how it is defined.
>
> Another comment, from `man 8 systemd-suspend.service`:
>
>     Note that scripts or binaries dropped in /lib/systemd/system-sleep/ are
>     intended for local use only and should be considered hacks. If
>     applications want to react to system suspend/hibernation and resume,
>     they should rather use the Inhibitor interface[1].
>
>     [1] https://www.freedesktop.org/wiki/Software/systemd/inhibit
>
> So is a script in system-sleep the right way?
> We probably don't want to go the "inhibitor" way that the manual
> suggests, as this would add many dependencies to the daemon (and it's
> probably not needed).

I'm not very familiar with this area, but thank you for the information --
I'll read up on it.

>
> How about enhancing xen-watchdog.service to deal with suspend?
> It's possible to have "Conflicts=sleep.target", which means stop this unit
> when doing suspend. But restarting the unit on resume seems to need a
> second service file which might be a bit more complicated to write,
> something like:
>     [Unit]
>     After=sleep.target
>     [Service]
>     ExecStart=systemctl restart xen-watchdogd
>     [Install]
>     WantedBy=suspend.target
>     WantedBy=hibernate.target
>     WantedBy=hybrid-sleep.target
>     WantedBy=suspend-then-hibernate.target
>     ...
> Actually, I'm not sure After=sleep.target is going to work... we should
> be able to use systemd's watchdog capability instead :-) (which seems to
> mean that a driver in Linux for xen's watchdog is needed); Never mind,
> I've re-read the patch description and commented there.

I had considered that approach before, but I decided it's better to keep
everything in one place.

>
> Anyway, don't use XEN_SYSTEMD_DIR and introduce a new variable
> in "systemd.m4".

Got it.

>
> > +
> >  XEN_SYSTEMD_SERVICE := xenstored.service
> >  XEN_SYSTEMD_SERVICE += xenconsoled.service
> >  XEN_SYSTEMD_SERVICE += xen-qemu-dom0-disk-backend.service
> > @@ -15,7 +18,8 @@ XEN_SYSTEMD_SERVICE += xendriverdomain.service
> >
> >  ALL_XEN_SYSTEMD :=   $(XEN_SYSTEMD_MODULES)  \
> >                       $(XEN_SYSTEMD_MOUNT)    \
> > -                     $(XEN_SYSTEMD_SERVICE)
> > +                     $(XEN_SYSTEMD_SERVICE)  \
> > +                     $(XEN_SYSTEMD_SLEEP_SCRIPTS)
> >
> >  .PHONY: all
> >  all: $(ALL_XEN_SYSTEMD)
> > @@ -31,15 +35,21 @@ distclean: clean
> >  install: $(ALL_XEN_SYSTEMD)
> >       $(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_DIR)
> >       $(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
> > +     $(INSTALL_DIR) $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)
> >       $(INSTALL_DATA) *.service $(DESTDIR)$(XEN_SYSTEMD_DIR)
> >       $(INSTALL_DATA) *.mount $(DESTDIR)$(XEN_SYSTEMD_DIR)
> >       $(INSTALL_DATA) *.conf $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)
> > +     set -e; for i in $(XEN_SYSTEMD_SLEEP_SCRIPTS); \
> > +         do \
> > +         $(INSTALL_PROG) $$i $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR); \
>
> I don't think you need a loop for that, `install` is perfectly capable
> of installing multiple sources.

Ok, thanks for the suggestion.
I’ll redo it without the loop and use install to handle multiple files directly.

>
> > +     done
> >
> >  .PHONY: uninstall
> >  uninstall:
> >       rm -f $(DESTDIR)$(XEN_SYSTEMD_MODULES_LOAD)/*.conf
> >       rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.mount
> >       rm -f $(DESTDIR)$(XEN_SYSTEMD_DIR)/*.service
> > +     rm -f $(addprefix $(DESTDIR)$(XEN_SYSTEMD_SLEEP_DIR)/, $(XEN_SYSTEMD_SLEEP_SCRIPTS))
> >
> >  $(XEN_SYSTEMD_MODULES):
> >       rm -f $@.tmp
> > diff --git a/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> > new file mode 100644
> > index 0000000000..2b2f0e16d8
> > --- /dev/null
> > +++ b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> > @@ -0,0 +1,45 @@
> > +#!/bin/sh
> > +
> > +# The first argument ($1) is:
> > +#     "pre" or "post"
> > +# The second argument ($2) is:
> > +#     "suspend", "hibernate", "hybrid-sleep", or "suspend-then-hibernate"
> > +
> > +. /etc/xen/scripts/hotplugpath.sh
> > +
> > +SERVICE_NAME="xen-watchdog.service"
> > +STATE_FILE="/run/xen-watchdog-sleep-marker"
>
> This should use $XEN_RUN_DIR

Got it.

>
> > +XEN_WATCHDOG_SLEEP_LOG="${XEN_LOG_DIR}/xen-watchdog-sleep.log"
>
> Is this necessary? Use only `logger`, if `echo log` doesn't log anything.

In my case, I do see logs in journalctl after using echo in this script.
I thought it was a common approach for the Xen toolstack to use its own log
files for tools and services.

However, if it’s preferable to use only the standard systemd logging, I can
remove all changes related to logging into separate files.

>
> > +log_watchdog() {
> > +    echo "$1"
> > +    echo "$(date): $1" >> "${XEN_WATCHDOG_SLEEP_LOG}"
> > +}
> > +
> > +# Exit silently if Xen watchdog service is not present
> > +if ! systemctl show "${SERVICE_NAME}" > /dev/null 2>&1; then
>
> Is this necessary? It seems `systemctl is-active` works fine when the
> unit doesn't exist.

is-active doesn't work correctly on resume.
In the case of resume, the script exits early because of
the is-active check. Maybe I'm missing something.

>
> > +    exit 0
> > +fi
> > +
> > +case "$1" in
> > +pre)
> > +    if systemctl is-active --quiet "${SERVICE_NAME}"; then
> > +        touch "${STATE_FILE}"
> > +        log_watchdog "Stopping ${SERVICE_NAME} before $2."
> > +        systemctl stop "${SERVICE_NAME}"
> > +    fi
> > +    ;;
> > +post)
> > +    if [ -f "${STATE_FILE}" ]; then
>
> Would using `systemctl is-enabled` instead work? It seems to work for a
> service on my machine.

The service may be enabled (set to start on boot) but not running right
now. I want to restart it only if it was running before suspend --
for example, if the user stopped the service before suspend, they’d
expect it to remain stopped after resume.

>
> > +        log_watchdog "Starting ${SERVICE_NAME} after $2."
> > +        systemctl start "${SERVICE_NAME}"
> > +        rm "${STATE_FILE}"
> > +    fi
> > +    ;;
> > +*)
> > +    log_watchdog "Script called with unknown action '$1'. Arguments: '$@'"
> > +    exit 1
> > +    ;;
> > +esac
> > +
> > +exit 0
> > --
> > 2.48.1

Thank you for the detailed review.

Best regards,
Mykola


Re: [PATCH] systemd: Add hooks to stop/start xen-watchdog on suspend/resume

Posted by Anthony PERARD 7 months ago

On Tue, Jul 01, 2025 at 03:11:39PM +0300, Mykola Kvach wrote:
> On Fri, Jun 27, 2025 at 3:37 PM Anthony PERARD <anthony@xenproject.org> wrote:
> > On Thu, Jun 26, 2025 at 11:12:46AM +0300, Mykola Kvach wrote:
> > > Both scenarios are rare and typically require very small watchdog
> > > timeouts combined with significant delays in Xen or the Linux kernel
> > > during suspend/resume flows.
> > >
> > > Conceptually, however, if activating and pinging the Xen watchdog is the
> > > responsibility of the domain and its services, then the domain should
> > > also manage the watchdog service/daemon lifecycle. This is similar to
> > > what is already done by the Xen watchdog driver inside the Linux kernel.
> >
> > > So there's already a watchdog driver in Linux, why not activate it with
> > > systemd, since it knows how to do it? I almost want to remove the
> 
> Actually, I don't know the exact reason why we have two different
> implementations. It could be historical — for example, initially the

It's definitely historical. xenwatchdogd was introduced before systemd
existed. Then someone added systemd service files to Xen, but I don't
know if systemd's watchdog support existed at the time, or the service
file was created just to replace the existing init script.

> Xen daemon was used, and later the Linux kernel driver was introduced,
> or vice versa. It's also possible that some setups still use very old
> kernels that lack the driver, and backporting it would require
> additional effort.
> 
> More likely, though, the daemon approach was chosen for simplicity.
> Using the Linux kernel driver requires building the module or even
> rebuilding the entire kernel if the driver needs to be built-in.
> In contrast, with the daemon, you just need to build the binary and
> copy it into the domain's filesystem. It's easier and requires much
> less effort.

That sounds like a good explanation: it makes it easier to use Xen's
watchdog even if the currently built kernel doesn't have support for
it, so having a systemd service file for this case is useful.

> 
> > service file and redirect users to use systemd's watchdog instead, in
> > the documentation.
> 
> I think that would be a better solution too. At the very least, we
> would avoid having to handle power-related scenarios for every
> existing init system.
> 
> For example, I see that we currently have three separate watchdog
> services: one for NetBSD (rc.d), and two for Linux (init.d and
> systemd). I also looked into how to set up power-down hooks with
> init.d, and it’s neither easy nor safe -- especially if pm-utils is
> not installed on the system.
> 
> > > diff --git a/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh
> > > new file mode 100644
> > > index 0000000000..2b2f0e16d8
> > > --- /dev/null
> > > +++ b/tools/hotplug/Linux/systemd/xen-watchdog-sleep.sh

> > > +XEN_WATCHDOG_SLEEP_LOG="${XEN_LOG_DIR}/xen-watchdog-sleep.log"
> >
> > Is this necessary? Use only `logger`, if `echo log` doesn't log anything.
> 
> In my case, I do see logs in journalctl after using echo in this script.
> I thought it was a common approach for the Xen toolstack to use its own log
> files for tools and services.

Well, most or all of those log files were "created" before systemd
existed, and I don't think a single one of them was created only for
a systemd service.

> However, if it’s preferable to use only the standard systemd logging, I can
> remove all changes related to logging into separate files.

For something systemd-specific, I think we should use systemd's logging
facility. There are already too many separate log files that are hard to
discover when there's an error. Adding one more isn't going to help,
especially for something that barely generates any logs.

> > > +log_watchdog() {
> > > +    echo "$1"
> > > +    echo "$(date): $1" >> "${XEN_WATCHDOG_SLEEP_LOG}"
> > > +}
> > > +
> > > +# Exit silently if Xen watchdog service is not present
> > > +if ! systemctl show "${SERVICE_NAME}" > /dev/null 2>&1; then
> >
> > Is this necessary? It seems `systemctl is-active` works fine when the
> > unit doesn't exist.
> 
> is-active doesn't work correctly on resume.
> In the case of resume, the script exits early because of
> the is-active check. Maybe I'm missing something.

I think I meant to remove this check and let the script just do its
thing. If the unit isn't present, `systemctl is-active` is going to be
"false" on suspend, and the $STATE_FILE isn't going to exist on resume.
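
Putting the comments so far together, the hook could look roughly like
the sketch below. This is not a tested replacement for the patch: the
SYSTEMCTL override and the fallback value for XEN_RUN_DIR are
assumptions added only so that the sketch can run on its own, and
logging relies on the hook's output reaching the journal, as reported
earlier in this thread.

```sh
#!/bin/sh
# Sketch of the hook with the review comments applied. SYSTEMCTL and the
# XEN_RUN_DIR fallback are assumptions added so the sketch runs on its own;
# the real script would get XEN_RUN_DIR from hotplugpath.sh.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"
SERVICE_NAME="xen-watchdog.service"
STATE_FILE="${XEN_RUN_DIR:-/run/xen}/xen-watchdog-sleep-marker"

watchdog_sleep_hook() {
    case "$1" in
    pre)
        # A missing unit is simply "not active", so no separate check.
        if "${SYSTEMCTL}" is-active --quiet "${SERVICE_NAME}"; then
            touch "${STATE_FILE}"
            echo "Stopping ${SERVICE_NAME} before $2."
            "${SYSTEMCTL}" stop "${SERVICE_NAME}"
        fi
        ;;
    post)
        # Restart only what the "pre" step stopped.
        if [ -f "${STATE_FILE}" ]; then
            echo "Starting ${SERVICE_NAME} after $2."
            "${SYSTEMCTL}" start "${SERVICE_NAME}"
            rm -f "${STATE_FILE}"
        fi
        ;;
    *)
        echo "Unknown action '$1'. Arguments: '$*'" >&2
        return 1
        ;;
    esac
}

# As a sleep hook the last line would be: watchdog_sleep_hook "$@"
```

The marker file keeps the behaviour discussed above: a service that was
already stopped before suspend stays stopped after resume.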

> > > +    exit 0
> > > +fi
> > > +
> > > +case "$1" in
> > > +pre)
> > > +    if systemctl is-active --quiet "${SERVICE_NAME}"; then
> > > +        touch "${STATE_FILE}"
> > > +        log_watchdog "Stopping ${SERVICE_NAME} before $2."
> > > +        systemctl stop "${SERVICE_NAME}"
> > > +    fi
> > > +    ;;
> > > +post)
> > > +    if [ -f "${STATE_FILE}" ]; then
> >
> > Would using `systemctl is-enabled` instead work? It seems to work for a
> > service on my machine.
> 
> The service may be enabled (set to start on boot) but not running right
> now. I want to restart it only if it was running before suspend --
> for example, if the user stopped the service before suspend, they’d
> expect it to remain stopped after resume.

Sounds good.

> >
> > > +        log_watchdog "Starting ${SERVICE_NAME} after $2."
> > > +        systemctl start "${SERVICE_NAME}"
> > > +        rm "${STATE_FILE}"
> > > +    fi
> > > +    ;;
> > > +*)
> > > +    log_watchdog "Script called with unknown action '$1'. Arguments: '$@'"
> > > +    exit 1
> > > +    ;;
> > > +esac

Cheers,

-- 
Anthony PERARD