When multiple PCI devices get assigned to a guest right at boot, libxl
incrementally populates the backend tree. The writes for the first of
the devices trigger the backend watch. In turn xen_pcibk_setup_backend()
will set the XenBus state to Initialised, at which point no further
reconfigures would happen unless a device got hotplugged. Arrange for
reconfigure to also get triggered from the backend watch handler.
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Cc: stable@vger.kernel.org
---
I will admit that this isn't entirely race-free (with the guest actually
attaching in parallel), but from the looks of it such a race ought to be
benign (not the least thanks to the mutex). Ideally the tool stack
wouldn't write num_devs until all devices had their information
populated. I tried doing so in libxl, but xen_pcibk_setup_backend()
calling xenbus_dev_fatal() when not being able to read that node
prohibits such an approach (or else libxl and driver changes would need
to go into use in lock-step).
I wonder why what is being watched isn't just the num_devs node. Right
now the watch triggers quite frequently without anything relevant
actually having changed (I suppose in at least some cases in response
to writes by the backend itself).
--- a/drivers/xen/xen-pciback/xenbus.c
+++ b/drivers/xen/xen-pciback/xenbus.c
@@ -359,7 +359,8 @@ out:
return err;
}
-static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev)
+static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev,
+ enum xenbus_state state)
{
int err = 0;
int num_devs;
@@ -374,8 +375,7 @@ static int xen_pcibk_reconfigure(struct
mutex_lock(&pdev->dev_lock);
/* Make sure we only reconfigure once */
- if (xenbus_read_driver_state(pdev->xdev->nodename) !=
- XenbusStateReconfiguring)
+ if (xenbus_read_driver_state(pdev->xdev->nodename) != state)
goto out;
err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, "num_devs", "%d",
@@ -500,6 +500,9 @@ static int xen_pcibk_reconfigure(struct
}
}
+ if (state != XenbusStateReconfiguring)
+ goto out;
+
err = xenbus_switch_state(pdev->xdev, XenbusStateReconfigured);
if (err) {
xenbus_dev_fatal(pdev->xdev, err,
@@ -525,7 +528,7 @@ static void xen_pcibk_frontend_changed(s
break;
case XenbusStateReconfiguring:
- xen_pcibk_reconfigure(pdev);
+ xen_pcibk_reconfigure(pdev, XenbusStateReconfiguring);
break;
case XenbusStateConnected:
@@ -664,6 +667,10 @@ static void xen_pcibk_be_watch(struct xe
xen_pcibk_setup_backend(pdev);
break;
+ case XenbusStateInitialised:
+ xen_pcibk_reconfigure(pdev, XenbusStateInitialised);
+ break;
+
default:
break;
}
On 4/7/21 10:37 AM, Jan Beulich wrote:
> When multiple PCI devices get assigned to a guest right at boot, libxl
> incrementally populates the backend tree. The writes for the first of
> the devices trigger the backend watch. In turn xen_pcibk_setup_backend()
> will set the XenBus state to Initialised, at which point no further
> reconfigures would happen unless a device got hotplugged. Arrange for
> reconfigure to also get triggered from the backend watch handler.
>
> Signed-off-by: Jan Beulich <jbeulich@suse.com>
> Cc: stable@vger.kernel.org
> ---
> I will admit that this isn't entirely race-free (with the guest actually
> attaching in parallel), but from the looks of it such a race ought to be
> benign (not the least thanks to the mutex). Ideally the tool stack
> wouldn't write num_devs until all devices had their information
> populated. I tried doing so in libxl, but xen_pcibk_setup_backend()
> calling xenbus_dev_fatal() when not being able to read that node
> prohibits such an approach (or else libxl and driver changes would need
> to go into use in lock-step).
>
> I wonder why what is being watched isn't just the num_devs node. Right
> now the watch triggers quite frequently without anything relevant
> actually having changed (I suppose in at least some cases in response
> to writes by the backend itself).
>
> --- a/drivers/xen/xen-pciback/xenbus.c
> +++ b/drivers/xen/xen-pciback/xenbus.c
> @@ -359,7 +359,8 @@ out:
> return err;
> }
>
> -static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev)
> +static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev,
> + enum xenbus_state state)
> {
> int err = 0;
> int num_devs;
> @@ -374,8 +375,7 @@ static int xen_pcibk_reconfigure(struct
>
> mutex_lock(&pdev->dev_lock);
> /* Make sure we only reconfigure once */
Is this comment still valid or should it be moved ...
> - if (xenbus_read_driver_state(pdev->xdev->nodename) !=
> - XenbusStateReconfiguring)
> + if (xenbus_read_driver_state(pdev->xdev->nodename) != state)
> goto out;
>
> err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, "num_devs", "%d",
> @@ -500,6 +500,9 @@ static int xen_pcibk_reconfigure(struct
> }
> }
>
> + if (state != XenbusStateReconfiguring)
> + goto out;
> +
... here?
> err = xenbus_switch_state(pdev->xdev, XenbusStateReconfigured);
> if (err) {
> xenbus_dev_fatal(pdev->xdev, err,
> @@ -525,7 +528,7 @@ static void xen_pcibk_frontend_changed(s
> break;
>
> case XenbusStateReconfiguring:
> - xen_pcibk_reconfigure(pdev);
> + xen_pcibk_reconfigure(pdev, XenbusStateReconfiguring);
> break;
>
> case XenbusStateConnected:
> @@ -664,6 +667,10 @@ static void xen_pcibk_be_watch(struct xe
> xen_pcibk_setup_backend(pdev);
> break;
>
> + case XenbusStateInitialised:
> + xen_pcibk_reconfigure(pdev, XenbusStateInitialised);
Could you add a comment here that this is needed when multiple devices are added?
It also looks a bit odd that adding a device is now viewed as a reconfiguration. It seems to me that xen_pcibk_setup_backend() and xen_pcibk_reconfigure() really should be merged --- initialization code for both looks pretty much the same.
-boris
On 09.04.2021 23:43, Boris Ostrovsky wrote:
>
> On 4/7/21 10:37 AM, Jan Beulich wrote:
>> When multiple PCI devices get assigned to a guest right at boot, libxl
>> incrementally populates the backend tree. The writes for the first of
>> the devices trigger the backend watch. In turn xen_pcibk_setup_backend()
>> will set the XenBus state to Initialised, at which point no further
>> reconfigures would happen unless a device got hotplugged. Arrange for
>> reconfigure to also get triggered from the backend watch handler.
>>
>> Signed-off-by: Jan Beulich <jbeulich@suse.com>
>> Cc: stable@vger.kernel.org
>> ---
>> I will admit that this isn't entirely race-free (with the guest actually
>> attaching in parallel), but from the looks of it such a race ought to be
>> benign (not the least thanks to the mutex). Ideally the tool stack
>> wouldn't write num_devs until all devices had their information
>> populated. I tried doing so in libxl, but xen_pcibk_setup_backend()
>> calling xenbus_dev_fatal() when not being able to read that node
>> prohibits such an approach (or else libxl and driver changes would need
>> to go into use in lock-step).
>>
>> I wonder why what is being watched isn't just the num_devs node. Right
>> now the watch triggers quite frequently without anything relevant
>> actually having changed (I suppose in at least some cases in response
>> to writes by the backend itself).
>>
>> --- a/drivers/xen/xen-pciback/xenbus.c
>> +++ b/drivers/xen/xen-pciback/xenbus.c
>> @@ -359,7 +359,8 @@ out:
>> return err;
>> }
>>
>> -static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev)
>> +static int xen_pcibk_reconfigure(struct xen_pcibk_device *pdev,
>> + enum xenbus_state state)
>> {
>> int err = 0;
>> int num_devs;
>> @@ -374,8 +375,7 @@ static int xen_pcibk_reconfigure(struct
>>
>> mutex_lock(&pdev->dev_lock);
>> /* Make sure we only reconfigure once */
>
>
> Is this comment still valid or should it be moved ...
>
>
>> - if (xenbus_read_driver_state(pdev->xdev->nodename) !=
>> - XenbusStateReconfiguring)
>> + if (xenbus_read_driver_state(pdev->xdev->nodename) != state)
>> goto out;
>>
>> err = xenbus_scanf(XBT_NIL, pdev->xdev->nodename, "num_devs", "%d",
>> @@ -500,6 +500,9 @@ static int xen_pcibk_reconfigure(struct
>> }
>> }
>>
>> + if (state != XenbusStateReconfiguring)
>> + goto out;
>> +
>
>
> ... here?
Yeah, probably better to move it.
>> err = xenbus_switch_state(pdev->xdev, XenbusStateReconfigured);
>> if (err) {
>> xenbus_dev_fatal(pdev->xdev, err,
>> @@ -525,7 +528,7 @@ static void xen_pcibk_frontend_changed(s
>> break;
>>
>> case XenbusStateReconfiguring:
>> - xen_pcibk_reconfigure(pdev);
>> + xen_pcibk_reconfigure(pdev, XenbusStateReconfiguring);
>> break;
>>
>> case XenbusStateConnected:
>> @@ -664,6 +667,10 @@ static void xen_pcibk_be_watch(struct xe
>> xen_pcibk_setup_backend(pdev);
>> break;
>>
>> + case XenbusStateInitialised:
>> + xen_pcibk_reconfigure(pdev, XenbusStateInitialised);
>
>
> Could you add a comment here that this is needed when multiple devices are added?
Sure.
> It also looks a bit odd that adding a device is now viewed as a reconfiguration. It seems to me that xen_pcibk_setup_backend() and xen_pcibk_reconfigure() really should be merged --- initialization code for both looks pretty much the same.
I thought the same, but didn't want to make more of a change than is
needed to address the problem at hand. Plus of course there's still
the possibility that someone may fix libxl, at which point the change
here (and hence the merging) would become unnecessary.
Jan
On 4/12/21 5:44 AM, Jan Beulich wrote: > >> It also looks a bit odd that adding a device is now viewed as a reconfiguration. It seems to me that xen_pcibk_setup_backend() and xen_pcibk_reconfigure() really should be merged --- initialization code for both looks pretty much the same. > I thought the same, but didn't want to make more of a change than is > needed to address the problem at hand. Plus of course there's still > the possibility that someone may fix libxl, at which point the change > here (and hence the merging) would become unnecessary. Merging them would be a good thing regardless of fixing libxl. But I agree that it is separate from fixing the issue at hand. Maybe rename the function? (I myself can't think of a good name). -boris
© 2016 - 2026 Red Hat, Inc.