[PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug

Jason Andryuk posted 1 patch 2 years, 4 months ago
Test gitlab-ci passed
Patches applied successfully (tree, apply log)
git fetch https://gitlab.com/xen-project/patchew/xen tags/patchew/20220109180436.4112-1-jandryuk@gmail.com
There is a newer version of this series
tools/libs/light/libxl_pci.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
[PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug
Posted by Jason Andryuk 2 years, 4 months ago
commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
reflected in the config" broken PCI hotplug (xl pci-attach) for PV
domains when it moved libxl__create_pci_backend() later in the function.

This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
PCI devices are hotplugged to a running PV stubdom, and then the QEMU
QMP device_add commands are made to QEMU inside the stubdom.

Are running PV domain calls libxl__wait_for_backend().  With the current
placement of libxl__create_pci_backend(), the path does not exist and
the call immediately fails:
libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/43/0 does not exist
libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 42:unable to add pci devices

The wait is only relevant when the backend is already present.  num_devs
is already used to determine if the backend needs to be created.  Re-use
num_devs to determine if the backend wait is necessary.  The wait is
necessary to avoid racing with another PCI attachment reconfiguring the
front/back or changing to some other state like closing. If we are
creating the backend, then we don't have to worry about the state since
it is being created.

Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
reflected in the config")

Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
---
Alternative to Jan's patch:
https://lore.kernel.org/xen-devel/5114ae87-bc0e-3d58-e16e-6d9d2fee0801@suse.com/

v3:
Change title & commit message

v2:
https://lore.kernel.org/xen-devel/20210812005700.3159-1-jandryuk@gmail.com/
Add Fixes
Expand num_devs use in commit message
---
 tools/libs/light/libxl_pci.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/libs/light/libxl_pci.c b/tools/libs/light/libxl_pci.c
index 4c2d7aeefb..e8fd3bd937 100644
--- a/tools/libs/light/libxl_pci.c
+++ b/tools/libs/light/libxl_pci.c
@@ -157,8 +157,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
     if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
         return ERROR_FAIL;
 
-    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
-        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
+    /* wait is only needed if the backend already exists (num_devs != NULL) */
+    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
+        if (libxl__wait_for_backend(gc, be_path,
+                                    GCSPRINTF("%d", XenbusStateConnected)) < 0)
             return ERROR_FAIL;
     }
 
-- 
2.32.0


Re: [PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug
Posted by Jan Beulich 2 years, 4 months ago
(also Cc-ing Paul)

On 09.01.2022 19:04, Jason Andryuk wrote:
> commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
> reflected in the config" broken PCI hotplug (xl pci-attach) for PV
> domains when it moved libxl__create_pci_backend() later in the function.
> 
> This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
> PCI devices are hotplugged to a running PV stubdom, and then the QEMU
> QMP device_add commands are made to QEMU inside the stubdom.
> 
> Are running PV domain calls libxl__wait_for_backend().

I could only make sense of this when replacing "Are" by "A", but then
I'm not sure that's what you've meant.

>  With the current
> placement of libxl__create_pci_backend(), the path does not exist and
> the call immediately fails:
> libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/43/0 does not exist
> libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
> libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 42:unable to add pci devices
> 
> The wait is only relevant when the backend is already present.  num_devs
> is already used to determine if the backend needs to be created.  Re-use
> num_devs to determine if the backend wait is necessary.  The wait is
> necessary to avoid racing with another PCI attachment reconfiguring the
> front/back or changing to some other state like closing. If we are
> creating the backend, then we don't have to worry about the state since
> it is being created.

While I follow all of this, what I'm missing here is some form of proof
why the wait is _not_ needed for a newly created backend. Isn't it
necessary for the backend to reach Connected also when putting in place
the first device, in order for the guest to be able to actually use the
device? Is it perhaps assumed that the frontend driver would wait for
the backend reaching Connected (emphasizing the difference to HVM,
where no frontend driver exists)? Whatever the answer, it may be
worthwhile to also add that (in suitably abbreviated form) to the newly
added code comment.

> Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
> reflected in the config")
> 
> Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> ---
> Alternative to Jan's patch:
> https://lore.kernel.org/xen-devel/5114ae87-bc0e-3d58-e16e-6d9d2fee0801@suse.com/

This answers the question raised in reply to Anthony's comment on my
patch.

> --- a/tools/libs/light/libxl_pci.c
> +++ b/tools/libs/light/libxl_pci.c
> @@ -157,8 +157,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
>      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
>          return ERROR_FAIL;
>  
> -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
> +    /* wait is only needed if the backend already exists (num_devs != NULL) */
> +    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> +        if (libxl__wait_for_backend(gc, be_path,
> +                                    GCSPRINTF("%d", XenbusStateConnected)) < 0)
>              return ERROR_FAIL;
>      }
>  

In how far is the !starting part of the condition then still needed? I
have to admit that I've been wondering about the applicability of this
condition anyway, and your patch description actually seems to further
support my suspicion about this not being quite right (which is
connected to the fact that devices get added one by one instead of all
in one go, which would avoid the need for the backend to reconfigure
at all during domain creation).

Two nits:

With tools/libs/light/CODING_STYLE not saying anything about comment
style, imo ./CODING_STYLE applies, which demands comments to start with
a capital letter.

Plus, while I'm not sure what the conventions in libxl are, touching this
code would seem like a great opportunity to me to fold the two if()-s.

Jan


Re: [PATCH v3] libxl/PCI: Fix PV hotplug & stubdom coldplug
Posted by Jason Andryuk 2 years, 4 months ago
On Mon, Jan 10, 2022 at 4:09 AM Jan Beulich <jbeulich@suse.com> wrote:
>
> (also Cc-ing Paul)

Thanks.

> On 09.01.2022 19:04, Jason Andryuk wrote:
> > commit 0fdb48ffe7a1 "libxl: Make sure devices added by pci-attach are
> > reflected in the config" broken PCI hotplug (xl pci-attach) for PV
> > domains when it moved libxl__create_pci_backend() later in the function.
> >
> > This also broke HVM + stubdom PCI passthrough coldplug.  For that, the
> > PCI devices are hotplugged to a running PV stubdom, and then the QEMU
> > QMP device_add commands are made to QEMU inside the stubdom.
> >
> > Are running PV domain calls libxl__wait_for_backend().
>
> I could only make sense of this when replacing "Are" by "A", but then
> I'm not sure that's what you've meant.

Yes, "A".

> >  With the current
> > placement of libxl__create_pci_backend(), the path does not exist and
> > the call immediately fails:
> > libxl: error: libxl_device.c:1388:libxl__wait_for_backend: Backend /local/domain/0/backend/pci/43/0 does not exist
> > libxl: error: libxl_pci.c:1764:device_pci_add_done: Domain 42:libxl__device_pci_add failed for PCI device 0:2:0.0 (rc -3)
> > libxl: error: libxl_create.c:1857:domcreate_attach_devices: Domain 42:unable to add pci devices
> >
> > The wait is only relevant when the backend is already present.  num_devs
> > is already used to determine if the backend needs to be created.  Re-use
> > num_devs to determine if the backend wait is necessary.  The wait is
> > necessary to avoid racing with another PCI attachment reconfiguring the
> > front/back or changing to some other state like closing. If we are
> > creating the backend, then we don't have to worry about the state since
> > it is being created.
>
> While I follow all of this, what I'm missing here is some form of proof
> why the wait is _not_ needed for a newly created backend.  Isn't it
> necessary for the backend to reach Connected also when putting in place
> the first device, in order for the guest to be able to actually use the
> device?

The backend creation is now done as part of the xenstore transaction,
see "This is the first device, so create the backend" in
libxl__device_pci_add_xenstore().  That is part of Paul's change in
0fdb48ffe7a1.  Before that, it created the backend and waited.

> Is it perhaps assumed that the frontend driver would wait for
> the backend reaching Connected (emphasizing the difference to HVM,
> where no frontend driver exists)? Whatever the answer, it may be
> worthwhile to also add that (in suitably abbreviated form) to the newly
> added code comment.

I think my starting vs. !starting comment below will clarify this.

> > Fixes: 0fdb48ffe7a1 ("libxl: Make sure devices added by pci-attach are
> > reflected in the config")
> >
> > Signed-off-by: Jason Andryuk <jandryuk@gmail.com>
> > ---
> > Alternative to Jan's patch:
> > https://lore.kernel.org/xen-devel/5114ae87-bc0e-3d58-e16e-6d9d2fee0801@suse.com/
>
> This answers the question raised in reply to Anthony's comment on my
> patch.
>
> > --- a/tools/libs/light/libxl_pci.c
> > +++ b/tools/libs/light/libxl_pci.c
> > @@ -157,8 +157,10 @@ static int libxl__device_pci_add_xenstore(libxl__gc *gc,
> >      if (domtype == LIBXL_DOMAIN_TYPE_INVALID)
> >          return ERROR_FAIL;
> >
> > -    if (!starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> > -        if (libxl__wait_for_backend(gc, be_path, GCSPRINTF("%d", XenbusStateConnected)) < 0)
> > +    /* wait is only needed if the backend already exists (num_devs != NULL) */
> > +    if (num_devs && !starting && domtype == LIBXL_DOMAIN_TYPE_PV) {
> > +        if (libxl__wait_for_backend(gc, be_path,
> > +                                    GCSPRINTF("%d", XenbusStateConnected)) < 0)
> >              return ERROR_FAIL;
> >      }
> >
>
> In how far is the !starting part of the condition then still needed? I
> have to admit that I've been wondering about the applicability of this
> condition anyway, and your patch description actually seems to further
> support my suspicion about this not being quite right (which is
> connected to the fact that devices get added one by one instead of all
> in one go, which would avoid the need for the backend to reconfigure
> at all during domain creation).

In the starting case, libxl can write all the xenstore entries without
waiting.  The frontend is not running, so this function is called
multiple times and the additional subdevices are written.

But you have a point that the backend is running and will see the
xenstores entries.  However, Reconfiguring is only set when !starting.
And since the frontend is not running when starting, then the backend
cannot have transitioned to Connected.  Looking at xen-pciback, it
waits for the frontend to go to Initialized (From Initializing) before
transitioning to Connected.  So the multiple independent xenstore
writes accumulate for the starting case.

In the !starting (running) case, the code adds subdevices one at a
time and waits for each reconfiguration to complete prior to modifying
the xenstore state.

You previously mentioned wanting all the devices at written at once.
libxl is written to handle 1 device at a time.  That works fine for
most PV devices, but PCI with subdevices doesn't fit that paradigm.
While the PV xenstore writes could be done as a single transaction,
HVM still requires individual QMP device_add commands per device.  It
could be made to work, but it seemed like more work than it is worth
and doesn't seem to mesh well with the rest of libxl.

> Two nits:
>
> With tools/libs/light/CODING_STYLE not saying anything about comment
> style, imo ./CODING_STYLE applies, which demands comments to start with
> a capital letter.

Yes, this sounds good.

> Plus, while I'm not sure what the conventions in libxl are, touching this
> code would seem like a great opportunity to me to fold the two if()-s.

I prefer them separate.  There are two logical conditions - "do we
need to wait?" and "did the wait succeed?".  Keeping them separate
maintains that distinction, and it is more readable IMO.

Regards,
Jason