[Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks

Posted by Ross Lagerwall 4 years, 9 months ago
When suspending/resuming or migrating under Xen, there isn't much need
to suspend and resume all the attached devices since Xen/QEMU should
correctly maintain the hardware state. Drop these calls and replace
them with more specific calls to ensure that Xen frontend devices are
properly reconnected.

This change is needed to make NVIDIA vGPU migration work under Xen,
since suspending the vGPU device interferes with migration working
correctly. It also has the benefit of reducing migration downtime, by
approximately 500 ms with an HVM guest in my environment.

Tested by putting an HVM guest through 1000 migration cycles. I also
tested PV guest migration (though less rigorously).
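
For context, the resulting do_suspend() flow is roughly the following
(a sketch reconstructed from the diff below; process freezing, locking
and most error handling are elided):

    static void do_suspend(void)
    {
            /* ... freeze processes and take locks (unchanged) ... */

            xenbus_suspend_frontends();  /* quiesce Xen frontend devices */

            xs_suspend();                /* suspend xenstore connection */

            suspend_device_irqs();       /* replaces dpm_suspend_end() */

            xen_arch_suspend();
            /* ... run xen_suspend (the actual suspend hypercall path) ... */
            raw_notifier_call_chain(&xen_resume_notifier, 0, NULL);

            resume_device_irqs();        /* replaces dpm_resume_start() */

            xen_arch_resume();

            if (!si.cancelled) {
                    xs_resume();
                    xenbus_resume_frontends();  /* reconnect frontends */
            } else {
                    xs_suspend_cancel();
            }

            /* ... thaw processes (unchanged) ... */
    }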

Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
---
 drivers/xen/manage.c                       | 24 +++++++---------------
 drivers/xen/xenbus/xenbus_probe_frontend.c | 22 ++++++++++++++++++++
 include/xen/xenbus.h                       |  3 +++
 3 files changed, 32 insertions(+), 17 deletions(-)

diff --git a/drivers/xen/manage.c b/drivers/xen/manage.c
index cd046684e0d1..53768e0e2560 100644
--- a/drivers/xen/manage.c
+++ b/drivers/xen/manage.c
@@ -113,21 +113,12 @@ static void do_suspend(void)
 		goto out_thaw;
 	}
 
-	err = dpm_suspend_start(PMSG_FREEZE);
-	if (err) {
-		pr_err("%s: dpm_suspend_start %d\n", __func__, err);
-		goto out_thaw;
-	}
+	xenbus_suspend_frontends();
 
 	printk(KERN_DEBUG "suspending xenstore...\n");
 	xs_suspend();
 
-	err = dpm_suspend_end(PMSG_FREEZE);
-	if (err) {
-		pr_err("dpm_suspend_end failed: %d\n", err);
-		si.cancelled = 0;
-		goto out_resume;
-	}
+	suspend_device_irqs();
 
 	xen_arch_suspend();
 
@@ -141,7 +132,7 @@ static void do_suspend(void)
 
 	raw_notifier_call_chain(&xen_resume_notifier, 0, NULL);
 
-	dpm_resume_start(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
+	resume_device_irqs();
 
 	if (err) {
 		pr_err("failed to start xen_suspend: %d\n", err);
@@ -150,13 +141,12 @@ static void do_suspend(void)
 
 	xen_arch_resume();
 
-out_resume:
-	if (!si.cancelled)
+	if (!si.cancelled) {
 		xs_resume();
-	else
+		xenbus_resume_frontends();
+	} else {
 		xs_suspend_cancel();
-
-	dpm_resume_end(si.cancelled ? PMSG_THAW : PMSG_RESTORE);
+	}
 
 out_thaw:
 	thaw_processes();
diff --git a/drivers/xen/xenbus/xenbus_probe_frontend.c b/drivers/xen/xenbus/xenbus_probe_frontend.c
index a7d90a719cea..8cd836c402e1 100644
--- a/drivers/xen/xenbus/xenbus_probe_frontend.c
+++ b/drivers/xen/xenbus/xenbus_probe_frontend.c
@@ -153,6 +153,28 @@ static struct xen_bus_type xenbus_frontend = {
 	},
 };
 
+static int xenbus_suspend_one(struct device *dev, void *data)
+{
+	xenbus_dev_suspend(dev);
+	return 0;
+}
+
+void xenbus_suspend_frontends(void)
+{
+	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, xenbus_suspend_one);
+}
+
+static int xenbus_resume_one(struct device *dev, void *data)
+{
+	xenbus_frontend_dev_resume(dev);
+	return 0;
+}
+
+void xenbus_resume_frontends(void)
+{
+	bus_for_each_dev(&xenbus_frontend.bus, NULL, NULL, xenbus_resume_one);
+}
+
 static void frontend_changed(struct xenbus_watch *watch,
 			     const char *path, const char *token)
 {
diff --git a/include/xen/xenbus.h b/include/xen/xenbus.h
index 869c816d5f8c..71eeb442c375 100644
--- a/include/xen/xenbus.h
+++ b/include/xen/xenbus.h
@@ -233,4 +233,7 @@ extern const struct file_operations xen_xenbus_fops;
 extern struct xenstore_domain_interface *xen_store_interface;
 extern int xen_store_evtchn;
 
+void xenbus_suspend_frontends(void);
+void xenbus_resume_frontends(void);
+
 #endif /* _XEN_XENBUS_H */
-- 
2.17.2


Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks
Posted by Jan Beulich 4 years, 9 months ago
On 29.07.2019 17:41, Ross Lagerwall wrote:
> When suspending/resuming or migrating under Xen, there isn't much need
> to suspend and resume all the attached devices since Xen/QEMU should
> correctly maintain the hardware state. Drop these calls and replace
> them with more specific calls to ensure that Xen frontend devices are
> properly reconnected.

Is this true for the general pass-through case as well? While migration
may not be (fully) compatible with pass-through, iirc save/restore is.
Would qemu restore state of physical PCI devices?

Jan
Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks
Posted by Andrew Cooper 4 years, 9 months ago
On 29/07/2019 16:57, Jan Beulich wrote:
> On 29.07.2019 17:41, Ross Lagerwall wrote:
>> When suspending/resuming or migrating under Xen, there isn't much need
>> to suspend and resume all the attached devices since Xen/QEMU should
>> correctly maintain the hardware state. Drop these calls and replace
>> them with more specific calls to ensure that Xen frontend devices are
>> properly reconnected.
> Is this true for the general pass-through case as well? While migration
> may not be (fully) compatible with pass-through, iirc save/restore is.

What gives you this impression?

Migration and save/restore are *literally* the same thing, except that
in one case you're piping the data to/from disk, and in the other you're
piping it to the destination and restoring it immediately.

If you look at the toolstack code, it is all in terms of reading/writing
an fd (including libxl's API) which is either a network socket or a
file, as chosen by xl.
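
To illustrate the point, a minimal sketch (save_domain() here is a
hypothetical stand-in for the toolstack's fd-based save path, not a
real libxl function):

    #include <fcntl.h>
    #include <stdint.h>
    #include <unistd.h>

    /* Hypothetical: write the domain's migration stream to fd. */
    extern int save_domain(uint32_t domid, int fd);

    /* "save": the stream is written to a file on disk. */
    int save_to_file(uint32_t domid, const char *path)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
        int ret;

        if (fd < 0)
            return -1;
        ret = save_domain(domid, fd);
        close(fd);
        return ret;
    }

    /* "migrate": the very same stream goes down a connected socket. */
    int migrate_to_host(uint32_t domid, int sock)
    {
        return save_domain(domid, sock);
    }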

> Would qemu restore state of physical PCI devices?

What state would Qemu be in a position to know about, which isn't
already present in Qemu's datablob?

What we do with graphics cards is to merge Xen's logdirty bitmap with
a dirty list provided by the card itself.  This needs device-specific
knowledge.  In addition, there is an opaque blob of data produced by
the source card, which is handed to the destination card.  That also
lives in the stream.
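
In outline, that merge amounts to OR-ing the card's dirty list into
Xen's logdirty bitmap before each send iteration. A hypothetical
sketch (the names are illustrative, not real toolstack interfaces):

    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical: fetch the card's self-reported dirty pages. */
    extern void device_get_dirty_bitmap(uint64_t *bm, size_t nr_words);

    /* Fold the device-reported dirty pages into the logdirty bitmap so
     * that the migration loop also resends pages the card wrote. */
    static void merge_dirty_bitmaps(uint64_t *logdirty, uint64_t *scratch,
                                    size_t nr_words)
    {
        size_t i;

        device_get_dirty_bitmap(scratch, nr_words);
        for (i = 0; i < nr_words; i++)
            logdirty[i] |= scratch[i];
    }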

Intel's Scalable IOV spec is attempting to rationalise this by having
a standard way of getting logdirty and "internal state" information
out of a device, but for the moment it requires custom, device-specific
driver code to do anything migration-related with real hardware.

As for why it's safe to do it like this, the best argument is that this
is how all other vendors do migration, including KVM.  Xen is the
odd one out in using the full S3 path.

~Andrew

Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks
Posted by Jan Beulich 4 years, 9 months ago
On 29.07.2019 19:06, Andrew Cooper wrote:
> On 29/07/2019 16:57, Jan Beulich wrote:
>> On 29.07.2019 17:41, Ross Lagerwall wrote:
>>> When suspending/resuming or migrating under Xen, there isn't much need
>>> to suspend and resume all the attached devices since Xen/QEMU should
>>> correctly maintain the hardware state. Drop these calls and replace
>>> them with more specific calls to ensure that Xen frontend devices are
>>> properly reconnected.
>> Is this true for the general pass-through case as well? While migration
>> may not be (fully) compatible with pass-through, iirc save/restore is.
> 
> What gives you this impression?
> 
> Migration and save/restore are *literally* the same thing, except that
> in one case you're piping the data to/from disk, and in the other you're
> piping it to the destination and restoring it immediately.
> 
> If you look at the toolstack code, it is all in terms of reading/writing
> an fd (including libxl's API) which is either a network socket or a
> file, as chosen by xl.

Sure. The main difference is where the restore happens (by default):
For live migration I expect this to be on a different host, whereas
for a non-live restore I'd expect this to be the same host. And it
is only the "same host" case where one can assume the same physical
piece of hardware to be available again for passing through to this
guest. In the "different host" case restore _may_ be possible, using
identical hardware. (And yes, in the "same host" case restore may
also be impossible, if the hardware meanwhile has been assigned to
another guest. But as said, I'm talking about the default case here.)

>> Would qemu restore state of physical PCI devices?
> 
> What state would Qemu be in a position to know about, which isn't
> already present in Qemu's datablob?

That's a valid (rhetorical) question, but it doesn't help to answer mine.

> What we do with graphics cards is to merge Xen's logdirty bitmap with
> a dirty list provided by the card itself.  This needs device-specific
> knowledge.  In addition, there is an opaque blob of data produced by
> the source card, which is handed to the destination card.  That also
> lives in the stream.
> 
> Intel's Scalable IOV spec is attempting to rationalise this by having
> a standard way of getting logdirty and "internal state" information
> out of a device, but for the moment it requires custom, device-specific
> driver code to do anything migration-related with real hardware.

Which isn't very nice, since it doesn't scale well as a model.

> As for why it's safe to do it like this, the best argument is that this
> is how all other vendors do migration, including KVM.  Xen is the
> odd one out in using the full S3 path.

So how do "all other vendors" deal with device-specific state? So
far I was under the impression that dealing with this is precisely
why we use the S3 logic in the kernel.

Jan
Re: [Xen-devel] [PATCH] xen: Avoid calling device suspend/resume callbacks
Posted by Marek Marczykowski-Górecki 4 years, 9 months ago
On Tue, Jul 30, 2019 at 07:55:02AM +0000, Jan Beulich wrote:
> On 29.07.2019 19:06, Andrew Cooper wrote:
> > On 29/07/2019 16:57, Jan Beulich wrote:
> >> On 29.07.2019 17:41, Ross Lagerwall wrote:
> >>> When suspending/resuming or migrating under Xen, there isn't much need
> >>> to suspend and resume all the attached devices since Xen/QEMU should
> >>> correctly maintain the hardware state. Drop these calls and replace
> >>> them with more specific calls to ensure that Xen frontend devices are
> >>> properly reconnected.
> >> Is this true for the general pass-through case as well? While migration
> >> may not be (fully) compatible with pass-through, iirc save/restore is.
> > 
> > What gives you this impression?
> > 
> > Migration and save/restore are *literally* the same thing, except that
> > in one case you're piping the data to/from disk, and in the other you're
> > piping it to the destination and restoring it immediately.
> > 
> > If you look at the toolstack code, it is all in terms of reading/writing
> > an fd (including libxl's API) which is either a network socket or a
> > file, as chosen by xl.
> 
> Sure. The main difference is where the restore happens (by default):
> For live migration I expect this to be on a different host, whereas
> for a non-live restore I'd expect this to be the same host. And it
> is only the "same host" case where one can assume the same physical
> piece of hardware to be available again for passing through to this
> guest. In the "different host" case restore _may_ be possible, using
> identical hardware. (And yes, in the "same host" case restore may
> also be impossible, if the hardware meanwhile has been assigned to
> another guest. But as said, I'm talking about the default case here.)
> 
> >> Would qemu restore state of physical PCI devices?
> > 
> > What state would Qemu be in a position to know about, which isn't
> > already present in Qemu's datablob?
> 
> That's a valid (rhetorical) question, but not helping to answer mine.
> 
> > What we do with graphics cards is to merge Xen's logdirty bitmap with
> > a dirty list provided by the card itself.  This needs device-specific
> > knowledge.  In addition, there is an opaque blob of data produced by
> > the source card, which is handed to the destination card.  That also
> > lives in the stream.
> > 
> > Intel's Scalable IOV spec is attempting to rationalise this by having
> > a standard way of getting logdirty and "internal state" information
> > out of a device, but for the moment it requires custom, device-specific
> > driver code to do anything migration-related with real hardware.
> 
> Which isn't very nice, since it doesn't scale well as a model.
> 
> > As for why it's safe to do it like this, the best argument is that this
> > is how all other vendors do migration, including KVM.  Xen is the
> > odd one out in using the full S3 path.
> 
> So how do "all other vendors" deal with device-specific state? So
> far I was under the impression that dealing with this is precisely
> why we use the S3 logic in the kernel.

FWIW, in Qubes we specifically S3 domUs with PCI devices assigned just
before host S3, to let the driver save the state and later restore it.
AFAIR in some cases (though it might have been PV, not HVM, then) not
doing so was breaking host S3 altogether.

-- 
Best Regards,
Marek Marczykowski-Górecki
Invisible Things Lab
A: Because it messes up the order in which people normally read text.
Q: Why is top-posting such a bad thing?