[PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready

Marcel Apfelbaum posted 1 patch 3 years, 6 months ago
Test checkpatch passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20201022114026.31968-1-marcel.apfelbaum@gmail.com
Maintainers: "Michael S. Tsirkin" <mst@redhat.com>, Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
hw/pci/pcie.c | 7 +++++++
1 file changed, 7 insertions(+)
[PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
From: Marcel Apfelbaum <marcel@redhat.com>

During a PCIe Root Port's transition from Power-Off to Power-On (or
vice versa), the "Slot Control Register" has its "Power Indicator Control"
field set to "Blinking", expressing a "power transition" mode.

Any hotplug operation during the "power transition" mode is not permitted,
or at least not expected by the Guest OS, and leads to strange failures.

Detect and refuse hotplug operations in such a case.

Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
---
 hw/pci/pcie.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 5b48bae0f6..2fe5c1473f 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
     PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
     uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
     uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
+    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
 
     /* Check if hot-plug is disabled on the slot */
     if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
@@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
         return;
     }
 
+    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
+        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
+                   DEVICE(hotplug_pdev)->id);
+        return;
+    }
+
     pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
 }
 
-- 
2.17.2


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 5 months ago
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
> 
> During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking" expressing a "power transition" mode.
> 
> Any hotplug operation during the "power transition" mode is not permitted
> or at least not expected by the Guest OS leading to strange failures.
> 
> Detect and refuse hotplug operations in such case.
> 
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>


Going back to this I have another question: could we get
a bit more detail on when do we get into this situation?
When does guest start blinking the indicator without us
first starting a hotplug operation?

> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>  
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }
>  
> -- 
> 2.17.2


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 5 months ago
Hi Michael,

On Wed, Nov 11, 2020 at 2:35 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > From: Marcel Apfelbaum <marcel@redhat.com>
> >
> > During PCIe Root Port's transition from Power-Off to Power-ON (or
> vice-versa)
> > the "Slot Control Register" has the "Power Indicator Control"
> > set to "Blinking" expressing a "power transition" mode.
> >
> > Any hotplug operation during the "power transition" mode is not permitted
> > or at least not expected by the Guest OS leading to strange failures.
> >
> > Detect and refuse hotplug operations in such case.
> >
> > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>
>
> Going back to this I have another question: could we get
> a bit more detail on when do we get into this situation?
> When does guest start blinking the indicator without us
> first starting a hotplug operation?
>

While David has more insight into the kernel behavior during the
mentioned issue, it seems the PCI subsystem sets the Power Indicator
to "Blinking" as part of the init sequence of the PCIe Root Port.
The kernel turns it off as soon as it finishes the init sequence.

A hotplug operation during the "init" sequence will surprise the Guest OS.
I'll let David give us more info.

Thank you,
Marcel


> > ---
> >  hw/pci/pcie.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 5b48bae0f6..2fe5c1473f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >
> >      /* Check if hot-plug is disabled on the slot */
> >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> >          return;
> >      }
> >
> > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > +                   DEVICE(hotplug_pdev)->id);
> > +        return;
> > +    }
> > +
> >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >  }
> >
> > --
> > 2.17.2
>
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Roman Kagan 3 years, 5 months ago
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
> 
> During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking" expressing a "power transition" mode.
> 
> Any hotplug operation during the "power transition" mode is not permitted
> or at least not expected by the Guest OS leading to strange failures.
> 
> Detect and refuse hotplug operations in such case.
> 
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>  
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }

I wonder if hw/pci/shpc.c is free from this issue?

Roman.

Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 5 months ago
Hi Roman,

On Wed, Nov 11, 2020 at 6:10 PM Roman Kagan <rvkagan@yandex-team.ru> wrote:

> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > From: Marcel Apfelbaum <marcel@redhat.com>
> >
> > During PCIe Root Port's transition from Power-Off to Power-ON (or
> vice-versa)
> > the "Slot Control Register" has the "Power Indicator Control"
> > set to "Blinking" expressing a "power transition" mode.
> >
> > Any hotplug operation during the "power transition" mode is not permitted
> > or at least not expected by the Guest OS leading to strange failures.
> >
> > Detect and refuse hotplug operations in such case.
> >
> > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > ---
> >  hw/pci/pcie.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 5b48bae0f6..2fe5c1473f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >
> >      /* Check if hot-plug is disabled on the slot */
> >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> >          return;
> >      }
> >
> > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > +                   DEVICE(hotplug_pdev)->id);
> > +        return;
> > +    }
> > +
> >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >  }
>
> I wonder if hw/pci/shpc.c is free from this issue?
>
>
I did not encounter the issue with shpc.

However, the SHPC spec describes the Power Indicator behavior for hotplug
only, leaving the boot sequence out of it:
    "Although the Standard Usage Model presented in Section 2.2.1.2
     requires a specific relationship between the Power Indicator and the
     state of the slot, this relationship is enforced by software, not by
     the SHPC."

It looks like it depends on the software implementation.


Thanks,
Marcel


> Roman.
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> From: Marcel Apfelbaum <marcel@redhat.com>
> 
> During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> the "Slot Control Register" has the "Power Indicator Control"
> set to "Blinking" expressing a "power transition" mode.
> 
> Any hotplug operation during the "power transition" mode is not permitted
> or at least not expected by the Guest OS leading to strange failures.
> 
> Detect and refuse hotplug operations in such case.
> 
> Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> ---
>  hw/pci/pcie.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 5b48bae0f6..2fe5c1473f 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
>      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>  
>      /* Check if hot-plug is disabled on the slot */
>      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
>          return;
>      }
>  
> +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> +                   DEVICE(hotplug_pdev)->id);
> +        return;
> +    }
> +
>      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>  }

Probably the only way to handle for existing machine types.
For new ones, can't we queue it in host memory somewhere?

> -- 
> 2.17.2


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Thu, 22 Oct 2020 08:06:55 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > From: Marcel Apfelbaum <marcel@redhat.com>
> > 
> > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> > the "Slot Control Register" has the "Power Indicator Control"
> > set to "Blinking" expressing a "power transition" mode.
> > 
> > Any hotplug operation during the "power transition" mode is not permitted
> > or at least not expected by the Guest OS leading to strange failures.
> > 
> > Detect and refuse hotplug operations in such case.
> > 
> > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > ---
> >  hw/pci/pcie.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> > 
> > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > index 5b48bae0f6..2fe5c1473f 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >  
> >      /* Check if hot-plug is disabled on the slot */
> >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> >          return;
> >      }
> >  
> > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > +                   DEVICE(hotplug_pdev)->id);
> > +        return;
> > +    }
> > +
> >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> >  }  
> 
> Probably the only way to handle for existing machine types.
> For new ones, can't we queue it in host memory somewhere?

I'm not actually convinced we can't do that even for existing machine
types.  So I'm a bit hesitant to suggest going ahead with this without
looking a bit closer at whether we can implement a wait-for-ready in
qemu, rather than forcing every user of qemu (human or machine) to do
so.


-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Thu, Oct 22, 2020 at 11:56:32PM +1100, David Gibson wrote:
> On Thu, 22 Oct 2020 08:06:55 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > > From: Marcel Apfelbaum <marcel@redhat.com>
> > > 
> > > During PCIe Root Port's transition from Power-Off to Power-ON (or vice-versa)
> > > the "Slot Control Register" has the "Power Indicator Control"
> > > set to "Blinking" expressing a "power transition" mode.
> > > 
> > > Any hotplug operation during the "power transition" mode is not permitted
> > > or at least not expected by the Guest OS leading to strange failures.
> > > 
> > > Detect and refuse hotplug operations in such case.
> > > 
> > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > ---
> > >  hw/pci/pcie.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > > 
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index 5b48bae0f6..2fe5c1473f 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> > >  
> > >      /* Check if hot-plug is disabled on the slot */
> > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >          return;
> > >      }
> > >  
> > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > > +                   DEVICE(hotplug_pdev)->id);
> > > +        return;
> > > +    }
> > > +
> > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> > >  }  
> > 
> > Probably the only way to handle for existing machine types.
> > For new ones, can't we queue it in host memory somewhere?
> 
> I'm not actually convinced we can't do that even for existing machine
> types.

The difficulty would be in migrating the extra "requested but deferred"
state.

>  So I'm a bit hesitant to suggest going ahead with this without
> looking a bit closer at whether we can implement a wait-for-ready in
> qemu, rather than forcing every user of qemu (human or machine) to do
> so.
> 
> 
> -- 
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat



Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Thu, 22 Oct 2020 09:15:28 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 11:56:32PM +1100, David Gibson wrote:
>  [...]  
>  [...]  
>  [...]  
> > > 
> > > Probably the only way to handle for existing machine types.
> > > For new ones, can't we queue it in host memory somewhere?  
> > 
> > I'm not actually convinced we can't do that even for existing machine
> > types.  
> 
> The difficulty would be in migrating the extra "requested but deferred"
> state.

Ah, true.  Although we could block migration for the duration instead.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
Hi David, Michael,

On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:

> On Thu, 22 Oct 2020 08:06:55 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > > From: Marcel Apfelbaum <marcel@redhat.com>
> > >
> > > During PCIe Root Port's transition from Power-Off to Power-ON (or
> vice-versa)
> > > the "Slot Control Register" has the "Power Indicator Control"
> > > set to "Blinking" expressing a "power transition" mode.
> > >
> > > Any hotplug operation during the "power transition" mode is not
> permitted
> > > or at least not expected by the Guest OS leading to strange failures.
> > >
> > > Detect and refuse hotplug operations in such case.
> > >
> > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > ---
> > >  hw/pci/pcie.c | 7 +++++++
> > >  1 file changed, 7 insertions(+)
> > >
> > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > index 5b48bae0f6..2fe5c1473f 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> > >      uint8_t *exp_cap = hotplug_pdev->config +
> hotplug_pdev->exp.exp_cap;
> > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> > >
> > >      /* Check if hot-plug is disabled on the slot */
> > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> *hotplug_dev, DeviceState *dev,
> > >          return;
> > >      }
> > >
> > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
> PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > > +                   DEVICE(hotplug_pdev)->id);
> > > +        return;
> > > +    }
> > > +
> > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> > >  }
> >
> > Probably the only way to handle for existing machine types.
>

I agree


> > For new ones, can't we queue it in host memory somewhere?
>
>
I am not sure I understand what the flow would be:
  - The user asks for a hotplug operation.
  - QEMU defers the operation.
After that the operation may still fail; how would the user know whether
the operation succeeded or not?



> I'm not actually convinced we can't do that even for existing machine
> types.


It is a Guest-visible change; I don't think we can do it.


> So I'm a bit hesitant to suggest going ahead with this without
> looking a bit closer at whether we can implement a wait-for-ready in
> qemu, rather than forcing every user of qemu (human or machine) to do
> so.
>

While I agree it is a pain from the usability point of view, hotplug
operations are allowed to fail. This is no more than a corner case;
ensuring the right response (gracefully erroring out) may be enough.

Thanks,
Marcel



>
>
> --
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> Hi David, Michael,
> 
> On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> 
>     On Thu, 22 Oct 2020 08:06:55 -0400
>     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
>     > > From: Marcel Apfelbaum <marcel@redhat.com>
>     > >
>     > > During PCIe Root Port's transition from Power-Off to Power-ON (or
>     vice-versa)
>     > > the "Slot Control Register" has the "Power Indicator Control"
>     > > set to "Blinking" expressing a "power transition" mode.
>     > >
>     > > Any hotplug operation during the "power transition" mode is not
>     permitted
>     > > or at least not expected by the Guest OS leading to strange failures.
>     > >
>     > > Detect and refuse hotplug operations in such case.
>     > >
>     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>     > > ---
>     > >  hw/pci/pcie.c | 7 +++++++
>     > >  1 file changed, 7 insertions(+)
>     > >
>     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     > > index 5b48bae0f6..2fe5c1473f 100644
>     > > --- a/hw/pci/pcie.c
>     > > +++ b/hw/pci/pcie.c
>     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
>     *hotplug_dev, DeviceState *dev,
>     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->
>     exp.exp_cap;
>     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
>     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>     > > 
>     > >      /* Check if hot-plug is disabled on the slot */
>     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
>     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
>     *hotplug_dev, DeviceState *dev,
>     > >          return;
>     > >      }
>     > > 
>     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK)
>     {
>     > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
>     > > +                   DEVICE(hotplug_pdev)->id);
>     > > +        return;
>     > > +    }
>     > > +
>     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>     > >  } 
>     >
>     > Probably the only way to handle for existing machine types.
> 
> 
> I agree
>  
> 
>     > For new ones, can't we queue it in host memory somewhere?
> 
> 
> 
> I am not sure I understand what will be the flow.
>   - The user asks for a hotplug operation.
>   -  QEMU deferred operation.
> After that the operation may still fail, how would the user know if the
> operation
> succeeded or not?


How can it fail? It's just a button press ...

>  
> 
>     I'm not actually convinced we can't do that even for existing machine
>     types. 
> 
> 
> It is a Guest-visible change; I don't think we can do it.
>  
> 
>     So I'm a bit hesitant to suggest going ahead with this without
>     looking a bit closer at whether we can implement a wait-for-ready in
>     qemu, rather than forcing every user of qemu (human or machine) to do
>     so.
> 
> 
> While I agree it is a pain from the usability point of view, hotplug operations
> are allowed to fail. This is not more than a corner case, ensuring the right
> response (gracefully erroring out) may be enough.
> 
> Thanks,
> Marcel
> 


I don't think they ever failed in the past so management is unlikely
to handle the failure by retrying ...

> 
> 
> 
>     --
>     David Gibson <dgibson@redhat.com>
>     Principal Software Engineer, Virtualization, Red Hat
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> > Hi David, Michael,
> >
> > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> >
> >     On Thu, 22 Oct 2020 08:06:55 -0400
> >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >
> >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> >     > > From: Marcel Apfelbaum <marcel@redhat.com>
> >     > >
> >     > > During PCIe Root Port's transition from Power-Off to Power-ON (or
> >     vice-versa)
> >     > > the "Slot Control Register" has the "Power Indicator Control"
> >     > > set to "Blinking" expressing a "power transition" mode.
> >     > >
> >     > > Any hotplug operation during the "power transition" mode is not
> >     permitted
> >     > > or at least not expected by the Guest OS leading to strange
> failures.
> >     > >
> >     > > Detect and refuse hotplug operations in such case.
> >     > >
> >     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> >     > > ---
> >     > >  hw/pci/pcie.c | 7 +++++++
> >     > >  1 file changed, 7 insertions(+)
> >     > >
> >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >     > > index 5b48bae0f6..2fe5c1473f 100644
> >     > > --- a/hw/pci/pcie.c
> >     > > +++ b/hw/pci/pcie.c
> >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
> >     *hotplug_dev, DeviceState *dev,
> >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->
> >     exp.exp_cap;
> >     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> >     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> >     > >
> >     > >      /* Check if hot-plug is disabled on the slot */
> >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> >     > > @@ -418,6 +419,12 @@ void
> pcie_cap_slot_pre_plug_cb(HotplugHandler
> >     *hotplug_dev, DeviceState *dev,
> >     > >          return;
> >     > >      }
> >     > >
> >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
> PCI_EXP_SLTCTL_PWR_IND_BLINK)
> >     {
> >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power
> Transition",
> >     > > +                   DEVICE(hotplug_pdev)->id);
> >     > > +        return;
> >     > > +    }
> >     > > +
> >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev,
> errp);
> >     > >  }
> >     >
> >     > Probably the only way to handle for existing machine types.
> >
> >
> > I agree
> >
> >
> >     > For new ones, can't we queue it in host memory somewhere?
> >
> >
> >
> > I am not sure I understand what will be the flow.
> >   - The user asks for a hotplug operation.
> >   -  QEMU deferred operation.
> > After that the operation may still fail, how would the user know if the
> > operation
> > succeeded or not?
>
>
> How can it fail? It's just a button press ...
>
>
Currently we have "Hotplug unsupported".
With this change we have "Guest/System not ready".



> >
> >
> >     I'm not actually convinced we can't do that even for existing machine
> >     types.
> >
> >
> > Is a Guest visible change, I don't think we can do it.
> >
> >
> >     So I'm a bit hesitant to suggest going ahead with this without
> >     looking a bit closer at whether we can implement a wait-for-ready in
> >     qemu, rather than forcing every user of qemu (human or machine) to do
> >     so.
> >
> >
> > While I agree it is a pain from the usability point of view, hotplug
> operations
> > are allowed to fail. This is not more than a corner case, ensuring the
> right
> > response (gracefully erroring out) may be enough.
> >
> > Thanks,
> > Marcel
> >
>
>
> I don't think they ever failed in the past so management is unlikely
> to handle the failure by retrying ...
>

That would require some management handling, yes.
But even without a "retry", failing is better than strange OS behavior.

Trying a better alternative, like deferring the operation for new machine
types, would make sense; however, it is out of the scope of this patch,
which simply detects the error, leaving us in a slightly better state
than today.

Thanks,
Marcel


>
> >
> >
> >
> >     --
> >     David Gibson <dgibson@redhat.com>
> >     Principal Software Engineer, Virtualization, Red Hat
> >
>
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> 
> 
> On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
>     > Hi David, Michael,
>     >
>     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
>     >
>     >     On Thu, 22 Oct 2020 08:06:55 -0400
>     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     >
>     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
>     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
>     >     > >
>     >     > > During PCIe Root Port's transition from Power-Off to Power-ON (or
>     >     vice-versa)
>     >     > > the "Slot Control Register" has the "Power Indicator Control"
>     >     > > set to "Blinking" expressing a "power transition" mode.
>     >     > >
>     >     > > Any hotplug operation during the "power transition" mode is not
>     >     permitted
>     >     > > or at least not expected by the Guest OS leading to strange
>     failures.
>     >     > >
>     >     > > Detect and refuse hotplug operations in such case.
>     >     > >
>     >     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>     >     > > ---
>     >     > >  hw/pci/pcie.c | 7 +++++++
>     >     > >  1 file changed, 7 insertions(+)
>     >     > >
>     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     >     > > index 5b48bae0f6..2fe5c1473f 100644
>     >     > > --- a/hw/pci/pcie.c
>     >     > > +++ b/hw/pci/pcie.c
>     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
>     >     *hotplug_dev, DeviceState *dev,
>     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>     >     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->
>     >     exp.exp_cap;
>     >     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
>     >     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>     >     > > 
>     >     > >      /* Check if hot-plug is disabled on the slot */
>     >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
>     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
>     (HotplugHandler
>     >     *hotplug_dev, DeviceState *dev,
>     >     > >          return;
>     >     > >      }
>     >     > > 
>     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
>     PCI_EXP_SLTCTL_PWR_IND_BLINK)
>     >     {
>     >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power
>     Transition",
>     >     > > +                   DEVICE(hotplug_pdev)->id);
>     >     > > +        return;
>     >     > > +    }
>     >     > > +
>     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev,
>     errp);
>     >     > >  } 
>     >     >
>     >     > Probably the only way to handle for existing machine types.
>     >
>     >
>     > I agree
>     >  
>     >
>     >     > For new ones, can't we queue it in host memory somewhere?
>     >
>     >
>     >
>     > I am not sure I understand what will be the flow.
>     >   - The user asks for a hotplug operation.
>     >   -  QEMU deferred operation.
>     > After that the operation may still fail, how would the user know if the
>     > operation
>     > succeeded or not?
> 
> 
>     How can it fail? It's just a button press ...
> 
> 
> 
> Currently we have "Hotplug unsupported."
> With this change we have "Guest/System not ready"


"Hotplug unsupported" is not an error that can trigger with
well-behaved management such as libvirt.


>  
> 
>     >  
>     >
>     >     I'm not actually convinced we can't do that even for existing machine
>     >     types. 
>     >
>     >
>     > Is a Guest visible change, I don't think we can do it.
>     >  
>     >
>     >     So I'm a bit hesitant to suggest going ahead with this without
>     >     looking a bit closer at whether we can implement a wait-for-ready in
>     >     qemu, rather than forcing every user of qemu (human or machine) to do
>     >     so.
>     >
>     >
>     > While I agree it is a pain from the usability point of view, hotplug
>     operations
>     > are allowed to fail. This is not more than a corner case, ensuring the
>     right
>     > response (gracefully erroring out) may be enough.
>     >
>     > Thanks,
>     > Marcel
>     >
> 
> 
>     I don't think they ever failed in the past so management is unlikely
>     to handle the failure by retrying ...
> 
> 
> That would require some management handling, yes.
> But even without a "retry", failing is better than strange OS behavior.
> 
> Trying a better alternative like deferring the operation for new machines
> would make sense, however is out of the scope of this patch

Expand the scope please. The scope should be "solve a problem xx" not
"solve a problem xx by doing abc".

> that simply
> detects the error leaving us in a slightly better state than today.
> 
> Thanks,
> Marcel

Not applying a patch is the only tool we maintainers have to influence
people to solve the problem fully. That's why I'm not inclined to apply
"slightly better" patches generally.


> 
> 
>     >
>     >
>     >
>     >     --
>     >     David Gibson <dgibson@redhat.com>
>     >     Principal Software Engineer, Virtualization, Red Hat
>     >
> 
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> >
> >
> > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> >
> >     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> >     > Hi David, Michael,
> >     >
> >     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com>
> wrote:
> >     >
> >     >     On Thu, 22 Oct 2020 08:06:55 -0400
> >     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     >
> >     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum
> wrote:
> >     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
> >     >     > >
> >     >     > > During PCIe Root Port's transition from Power-Off to
> Power-ON (or
> >     >     vice-versa)
> >     >     > > the "Slot Control Register" has the "Power Indicator
> Control"
> >     >     > > set to "Blinking" expressing a "power transition" mode.
> >     >     > >
> >     >     > > Any hotplug operation during the "power transition" mode
> is not
> >     >     permitted
> >     >     > > or at least not expected by the Guest OS leading to strange
> >     failures.
> >     >     > >
> >     >     > > Detect and refuse hotplug operations in such case.
> >     >     > >
> >     >     > > Signed-off-by: Marcel Apfelbaum <
> marcel.apfelbaum@gmail.com>
> >     >     > > ---
> >     >     > >  hw/pci/pcie.c | 7 +++++++
> >     >     > >  1 file changed, 7 insertions(+)
> >     >     > >
> >     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >     >     > > index 5b48bae0f6..2fe5c1473f 100644
> >     >     > > --- a/hw/pci/pcie.c
> >     >     > > +++ b/hw/pci/pcie.c
> >     >     > > @@ -410,6 +410,7 @@ void
> pcie_cap_slot_pre_plug_cb(HotplugHandler
> >     >     *hotplug_dev, DeviceState *dev,
> >     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> >     >     > >      uint8_t *exp_cap = hotplug_pdev->config +
> hotplug_pdev->
> >     >     exp.exp_cap;
> >     >     > >      uint32_t sltcap = pci_get_word(exp_cap +
> PCI_EXP_SLTCAP);
> >     >     > > +    uint32_t sltctl = pci_get_word(exp_cap +
> PCI_EXP_SLTCTL);
> >     >     > >
> >     >     > >      /* Check if hot-plug is disabled on the slot */
> >     >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC)
> == 0) {
> >     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
> >     (HotplugHandler
> >     >     *hotplug_dev, DeviceState *dev,
> >     >     > >          return;
> >     >     > >      }
> >     >     > >
> >     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
> >     PCI_EXP_SLTCTL_PWR_IND_BLINK)
> >     >     {
> >     >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power
> >     Transition",
> >     >     > > +                   DEVICE(hotplug_pdev)->id);
> >     >     > > +        return;
> >     >     > > +    }
> >     >     > > +
> >     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev),
> dev,
> >     errp);
> >     >     > >  }
> >     >     >
> >     >     > Probably the only way to handle for existing machine types.
> >     >
> >     >
> >     > I agree
> >     >
> >     >
> >     >     > For new ones, can't we queue it in host memory somewhere?
> >     >
> >     >
> >     >
> >     > I am not sure I understand what will be the flow.
> >     >   - The user asks for a hotplug operation.
> >     >   -  QEMU deferred operation.
> >     > After that the operation may still fail, how would the user know
> if the
> >     > operation
> >     > succeeded or not?
> >
> >
> >     How can it fail? It's just a button press ...
> >
> >
> >
> > Currently we have "Hotplug unsupported."
> > With this change we have "Guest/System not ready"
>
>
> Hotplug unsupported is not an error that can trigger with
> a well behaved management such as libvirt.
>
>
> >
> >
> >     >
> >     >
> >     >     I'm not actually convinced we can't do that even for existing
> machine
> >     >     types.
> >     >
> >     >
> >     > Is a Guest visible change, I don't think we can do it.
> >     >
> >     >
> >     >     So I'm a bit hesitant to suggest going ahead with this without
> >     >     looking a bit closer at whether we can implement a
> wait-for-ready in
> >     >     qemu, rather than forcing every user of qemu (human or
> machine) to do
> >     >     so.
> >     >
> >     >
> >     > While I agree it is a pain from the usability point of view,
> hotplug
> >     operations
> >     > are allowed to fail. This is not more than a corner case, ensuring
> the
> >     right
> >     > response (gracefully erroring out) may be enough.
> >     >
> >     > Thanks,
> >     > Marcel
> >     >
> >
> >
> >     I don't think they ever failed in the past so management is unlikely
> >     to handle the failure by retrying ...
> >
> >
> > That would require some management handling, yes.
> > But even without a "retry", failing is better than strange OS behavior.
> >
> > Trying a better alternative like deferring the operation for new machines
> > would make sense, however is out of the scope of this patch
>
> Expand the scope please. The scope should be "solve a problem xx" not
> "solve a problem xx by doing abc".
>
>
The scope is detecting a hotplug error early instead of passing to the Guest OS
a hotplug operation that we know will fail.



> > that simply
> > detects the error leaving us in a slightly better state than today.
> >
> > Thanks,
> > Marcel
>
> Not applying a patch is the only tool we maintainers have to influence
> people to solve the problem fully.

> That's why I'm not inclined to apply
> "slightly better" patches generally.
>
>
The patch is a proposal following some offline discussions on this matter.
I personally see the value of it versus what we have today.

Thanks,
Marcel


>
> >
> >
> >     >
> >     >
> >     >
> >     >     --
> >     >     David Gibson <dgibson@redhat.com>
> >     >     Principal Software Engineer, Virtualization, Red Hat
> >     >
> >
> >
>
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> 
> 
> On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
>     >
>     >
>     > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com>
>     wrote:
>     >
>     >     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
>     >     > Hi David, Michael,
>     >     >
>     >     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com>
>     wrote:
>     >     >
>     >     >     On Thu, 22 Oct 2020 08:06:55 -0400
>     >     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     >     >
>     >     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum
>     wrote:
>     >     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
>     >     >     > >
>     >     >     > > During PCIe Root Port's transition from Power-Off to
>     Power-ON (or
>     >     >     vice-versa)
>     >     >     > > the "Slot Control Register" has the "Power Indicator
>     Control"
>     >     >     > > set to "Blinking" expressing a "power transition" mode.
>     >     >     > >
>     >     >     > > Any hotplug operation during the "power transition" mode is
>     not
>     >     >     permitted
>     >     >     > > or at least not expected by the Guest OS leading to strange
>     >     failures.
>     >     >     > >
>     >     >     > > Detect and refuse hotplug operations in such case.
>     >     >     > >
>     >     >     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com
>     >
>     >     >     > > ---
>     >     >     > >  hw/pci/pcie.c | 7 +++++++
>     >     >     > >  1 file changed, 7 insertions(+)
>     >     >     > >
>     >     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     >     >     > > index 5b48bae0f6..2fe5c1473f 100644
>     >     >     > > --- a/hw/pci/pcie.c
>     >     >     > > +++ b/hw/pci/pcie.c
>     >     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb
>     (HotplugHandler
>     >     >     *hotplug_dev, DeviceState *dev,
>     >     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>     >     >     > >      uint8_t *exp_cap = hotplug_pdev->config +
>     hotplug_pdev->
>     >     >     exp.exp_cap;
>     >     >     > >      uint32_t sltcap = pci_get_word(exp_cap +
>     PCI_EXP_SLTCAP);
>     >     >     > > +    uint32_t sltctl = pci_get_word(exp_cap +
>     PCI_EXP_SLTCTL);
>     >     >     > > 
>     >     >     > >      /* Check if hot-plug is disabled on the slot */
>     >     >     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) =
>     = 0) {
>     >     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
>     >     (HotplugHandler
>     >     >     *hotplug_dev, DeviceState *dev,
>     >     >     > >          return;
>     >     >     > >      }
>     >     >     > > 
>     >     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
>     >     PCI_EXP_SLTCTL_PWR_IND_BLINK)
>     >     >     {
>     >     >     > > +        error_setg(errp, "Hot-plug failed: %s is in Power
>     >     Transition",
>     >     >     > > +                   DEVICE(hotplug_pdev)->id);
>     >     >     > > +        return;
>     >     >     > > +    }
>     >     >     > > +
>     >     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev),
>     dev,
>     >     errp);
>     >     >     > >  } 
>     >     >     >
>     >     >     > Probably the only way to handle for existing machine types.
>     >     >
>     >     >
>     >     > I agree
>     >     >  
>     >     >
>     >     >     > For new ones, can't we queue it in host memory somewhere?
>     >     >
>     >     >
>     >     >
>     >     > I am not sure I understand what will be the flow.
>     >     >   - The user asks for a hotplug operation.
>     >     >   -  QEMU deferred operation.
>     >     > After that the operation may still fail, how would the user know if
>     the
>     >     > operation
>     >     > succeeded or not?
>     >
>     >
>     >     How can it fail? It's just a button press ...
>     >
>     >
>     >
>     > Currently we have "Hotplug unsupported."
>     > With this change we have "Guest/System not ready"
> 
> 
>     Hotplug unsupported is not an error that can trigger with
>     a well behaved management such as libvirt.
> 
> 
>     >  
>     >
>     >     >  
>     >     >
>     >     >     I'm not actually convinced we can't do that even for existing
>     machine
>     >     >     types. 
>     >     >
>     >     >
>     >     > Is a Guest visible change, I don't think we can do it.
>     >     >  
>     >     >
>     >     >     So I'm a bit hesitant to suggest going ahead with this without
>     >     >     looking a bit closer at whether we can implement a
>     wait-for-ready in
>     >     >     qemu, rather than forcing every user of qemu (human or machine)
>     to do
>     >     >     so.
>     >     >
>     >     >
>     >     > While I agree it is a pain from the usability point of view,
>     hotplug
>     >     operations
>     >     > are allowed to fail. This is not more than a corner case, ensuring
>     the
>     >     right
>     >     > response (gracefully erroring out) may be enough.
>     >     >
>     >     > Thanks,
>     >     > Marcel
>     >     >
>     >
>     >
>     >     I don't think they ever failed in the past so management is unlikely
>     >     to handle the failure by retrying ...
>     >
>     >
>     > That would require some management handling, yes.
>     > But even without a "retry", failing is better than strange OS behavior.
>     >
>     > Trying a better alternative like deferring the operation for new machines
>     > would make sense, however is out of the scope of this patch
> 
>     Expand the scope please. The scope should be "solve a problem xx" not
>     "solve a problem xx by doing abc".
> 
> 
> 
> The scope is detecting a hotplug error early instead
> passing to the Guest OS a hotplug operation that we know it will fail.
> 

Right. But after detecting it, just failing unconditionally is a bit too
simplistic IMHO.

> 
>     > that simply
>     > detects the error leaving us in a slightly better state than today.
>     >
>     > Thanks,
>     > Marcel
> 
>     Not applying a patch is the only tool we maintainers have to influence
>     people to solve the problem fully. 
> 
>     That's why I'm not inclined to apply
>     "slightly better" patches generally.
> 
> 
> 
> The patch is a proposal following some offline discussions on this matter.
> I personally see the value of it versus what we have today.
> 
> Thanks,
> Marcel
> 
> 
>     >
>     >
>     >     >
>     >     >
>     >     >
>     >     >     --
>     >     >     David Gibson <dgibson@redhat.com>
>     >     >     Principal Software Engineer, Virtualization, Red Hat
>     >     >
>     >
>     >
> 
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Thu, 22 Oct 2020 11:01:04 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
>  [...]  
> 
> Right. After detecting just failing unconditionally it a bit too
> simplistic IMHO.

There's also another factor here, which I thought I'd mentioned
already, but it looks like I didn't: I think we're still missing some
details in what's going on.

The premise for this patch is that plugging while the indicator is in
transition state is allowed to fail in any way on the guest side.  I
don't think that's a reasonable interpretation, because it's unworkable
for physical hotplug.  If the indicator starts blinking while you're in
the middle of shoving a card in, you'd be in trouble.

So, what I'm assuming here is that while "don't plug while blinking" is
the instruction for the operator to obey as best they can, on the guest
side the rule has to be "start blinking, wait a while and by the time
you leave blinking state again, you can be confident any plugs or
unplugs have completed".  Obviously still racy in the strict computer
science sense, but about the best you can do with slow humans in the
mix.

So, qemu should of course endeavour to follow that rule as though it
was a human operator on a physical machine and not plug when the
indicator is blinking.  *But* the qemu plug will in practice be fast
enough that if we're hitting real problems here, it suggests the guest
is still doing something wrong.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
Hi David,

On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote:

> On Thu, 22 Oct 2020 11:01:04 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
> > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> >  [...]
> >
> > Right. After detecting just failing unconditionally it a bit too
> > simplistic IMHO.
>
> There's also another factor here, which I thought I'd mentioned
> already, but looks like I didn't: I think we're still missing some
> details in what's going on.
>
> The premise for this patch is that plugging while the indicator is in
> transition state is allowed to fail in any way on the guest side.  I
> don't think that's a reasonable interpretation, because it's unworkable
> for physical hotplug.  If the indicator starts blinking while you're in
> the middle of shoving a card in, you'd be in trouble.
>
> So, what I'm assuming here is that while "don't plug while blinking" is
> the instruction for the operator to obey as best they can, on the guest
> side the rule has to be "start blinking, wait a while and by the time
> you leave blinking state again, you can be confident any plugs or
> unplugs have completed".  Obviously still racy in the strict computer
> science sense, but about the best you can do with slow humans in the
> mix.
>
> So, qemu should of course endeavour to follow that rule as though it
> was a human operator on a physical machine and not plug when the
> indicator is blinking.  *But* the qemu plug will in practice be fast
> enough that if we're hitting real problems here, it suggests the guest
> is still doing something wrong.
>

I personally think there is a little bit of over-engineering here.
Let's start with the spec:

    Power Indicator Blinking
    A blinking Power Indicator indicates that the slot is powering up or
    powering down and that insertion or removal of the adapter is not
    permitted.

What exactly is open to interpretation here?
As you stated, the races are theoretical; the whole point of the indicator
is to let the operator know they can't plug the device just yet.

I understand it would be more user friendly if QEMU would wait internally
for the blinking to end, but the whole point of the indicator is to let the
operator (human or machine) know they can't plug the device at a specific
time. Should QEMU take over the operator's responsibility? Is that even
correct?

Even if we did want such a feature, how is it related to this patch?
The patch simply refuses to start a hotplug operation when it knows it will
not succeed.

Another way that would make sense to me is a new QEMU interface other than
"device_add", let's say "adding_device_allowed", that would return true if
hotplug is allowed at this point in time. (I am aware of the theoretical
races.)

The above would at least mimic the mechanics of the physical world: the
operator looks at the indicator; the management software checks whether
adding the device is allowed.
Since it is a corner case I would prefer device_add to fail rather than
introducing a new interface, but that's just me.

Thanks,
Marcel


> --
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat
>
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote:
> Hi David,
> 
> On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote:
> 
>     On Thu, 22 Oct 2020 11:01:04 -0400
>     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>     > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
>     >  [...] 
>     >
>     > Right. After detecting just failing unconditionally it a bit too
>     > simplistic IMHO.
> 
>     There's also another factor here, which I thought I'd mentioned
>     already, but looks like I didn't: I think we're still missing some
>     details in what's going on.
> 
>     The premise for this patch is that plugging while the indicator is in
>     transition state is allowed to fail in any way on the guest side.  I
>     don't think that's a reasonable interpretation, because it's unworkable
>     for physical hotplug.  If the indicator starts blinking while you're in
>     the middle of shoving a card in, you'd be in trouble.
> 
>     So, what I'm assuming here is that while "don't plug while blinking" is
>     the instruction for the operator to obey as best they can, on the guest
>     side the rule has to be "start blinking, wait a while and by the time
>     you leave blinking state again, you can be confident any plugs or
>     unplugs have completed".  Obviously still racy in the strict computer
>     science sense, but about the best you can do with slow humans in the
>     mix.
> 
>     So, qemu should of course endeavour to follow that rule as though it
>     was a human operator on a physical machine and not plug when the
>     indicator is blinking.  *But* the qemu plug will in practice be fast
>     enough that if we're hitting real problems here, it suggests the guest
>     is still doing something wrong.
> 
> 
> I personally think there is a little bit of over-engineering here.
> Let's start with the spec:
> 
>     Power Indicator Blinking
>     A blinking Power Indicator indicates that the slot is powering up or
> powering down and that
>     insertion or removal of the adapter is not permitted.
> 
> What exactly is an interpretation here?
> As you stated, the races are theoretical, the whole point of the indicator
> is to let the operator know he can't plug the device just yet.
> 
> I understand it would be more user friendly if the QEMU would wait internally
> for the
> blinking to end, but the whole point of the indicator is to let the operator 
> (human or machine)
> know they can't plug the device at a specific time.
> Should QEMU take the responsibility of the operator? Is it even correct?
> 
> Even if we would want such a feature, how is it related to this patch?
> The patch simply refuses to start a hotplug operation when it knows it will not
> succeed. 
>  
> Another way that would make sense to me would be  is a new QEMU interface other
> than
> "add_device", let's say "adding_device_allowed", that would return true if the
> hotplug is allowed
> at this point of time. (I am aware of the theoretical races) 

Rather than adding_device_allowed, something like "query slot"
might be helpful for debugging. That would help user figure out
e.g. why isn't device visible without any races.


> The above will at least mimic the mechanics of the pyhs world.  The operator
> looks at the indicator,
> the management software checks if adding the device is allowed.
> Since it is a corner case I would prefer the device_add to fail rather than
> introducing a new interface,
> but that's just me.
> 
> Thanks,
> Marcel
> 

I think we want the QEMU management interface to be reasonably
abstract and agnostic if possible. Pushing knowledge of hardware
details to management will just lead to pain IMHO.
We have supported a device_add that practically never fails for years;
at this point it's easier to keep supporting it than
to change all users ...


> 
>     --
>     David Gibson <dgibson@redhat.com>
>     Principal Software Engineer, Virtualization, Red Hat
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Igor Mammedov 3 years, 6 months ago
On Fri, 23 Oct 2020 11:54:40 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote:
> > Hi David,
> > 
> > On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote:
> > 
> >     On Thu, 22 Oct 2020 11:01:04 -0400
> >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >   
> >     > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> >     >  [...] 
> >     >
> >     > Right. After detecting just failing unconditionally it a bit too
> >     > simplistic IMHO.  
> > 
> >     There's also another factor here, which I thought I'd mentioned
> >     already, but looks like I didn't: I think we're still missing some
> >     details in what's going on.
> > 
> >     The premise for this patch is that plugging while the indicator is in
> >     transition state is allowed to fail in any way on the guest side.  I
> >     don't think that's a reasonable interpretation, because it's unworkable
> >     for physical hotplug.  If the indicator starts blinking while you're in
> >     the middle of shoving a card in, you'd be in trouble.
> > 
> >     So, what I'm assuming here is that while "don't plug while blinking" is
> >     the instruction for the operator to obey as best they can, on the guest
> >     side the rule has to be "start blinking, wait a while and by the time
> >     you leave blinking state again, you can be confident any plugs or
> >     unplugs have completed".  Obviously still racy in the strict computer
> >     science sense, but about the best you can do with slow humans in the
> >     mix.
> > 
> >     So, qemu should of course endeavour to follow that rule as though it
> >     was a human operator on a physical machine and not plug when the
> >     indicator is blinking.  *But* the qemu plug will in practice be fast
> >     enough that if we're hitting real problems here, it suggests the guest
> >     is still doing something wrong.
> > 
> > 
> > I personally think there is a little bit of over-engineering here.
> > Let's start with the spec:
> > 
> >     Power Indicator Blinking
> >     A blinking Power Indicator indicates that the slot is powering up or
> > powering down and that
> >     insertion or removal of the adapter is not permitted.
> > 
> > What exactly is an interpretation here?
> > As you stated, the races are theoretical, the whole point of the indicator
> > is to let the operator know he can't plug the device just yet.
> > 
> > I understand it would be more user friendly if the QEMU would wait internally
> > for the
> > blinking to end, but the whole point of the indicator is to let the operator 
> > (human or machine)
> > know they can't plug the device at a specific time.
> > Should QEMU take the responsibility of the operator? Is it even correct?
> > 
> > Even if we would want such a feature, how is it related to this patch?
> > The patch simply refuses to start a hotplug operation when it knows it will not
> > succeed. 
> >  
> > Another way that would make sense to me would be  is a new QEMU interface other
> > than
> > "add_device", let's say "adding_device_allowed", that would return true if the
> > hotplug is allowed
> > at this point of time. (I am aware of the theoretical races)   
> 
> Rather than adding_device_allowed, something like "query slot"
> might be helpful for debugging. That would help user figure out
> e.g. why isn't device visible without any races.

Would a new command be useful, though? What we end up with is a broken guest
(if I read the commit message right) and a user who has no idea whether
device_add was successful or not.
So what should the user do in this case?
  - wait till it explodes?
  - can the user remove it, or would it be stuck there forever?
  - poll the slot before hotplug, manually?

(If this is the case then failing device_add cleanly doesn't sound bad;
it looks similar to another error we have, "/* Check if hot-plug is disabled on the slot */",
in pcie_cap_slot_pre_plug_cb.)

CCing libvirt, as it concerns not only QEMU.

> 
> > The above will at least mimic the mechanics of the pyhs world.  The operator
> > looks at the indicator,
> > the management software checks if adding the device is allowed.
> > Since it is a corner case I would prefer the device_add to fail rather than
> > introducing a new interface,
> > but that's just me.
> > 
> > Thanks,
> > Marcel
> >   
> 
> I think we want QEMU management interface to be reasonably
> abstract and agnostic if possible. Pushing knowledge of hardware
> detail to management will just lead to pain IMHO.
> We supported device_add which practically never fails for years,

For CPUs and RAM, device_add can fail, so maybe management is also
prepared to handle errors on the PCI hotplug path.

> at this point it's easier to keep supporting it than
> change all users ...
> 
> 
> > 
> >     --
> >     David Gibson <dgibson@redhat.com>
> >     Principal Software Engineer, Virtualization, Red Hat
> >   
> 
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Fri, 23 Oct 2020 19:27:55 +0200
Igor Mammedov <imammedo@redhat.com> wrote:

> On Fri, 23 Oct 2020 11:54:40 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>  [...]  
>  [...]  
>  [...]  
>  [...]  
> > 
> > Rather than adding_device_allowed, something like "query slot"
> > might be helpful for debugging. That would help user figure out
> > e.g. why isn't device visible without any races.  
> 
> Would a new command be useful, though? What we end up with is a broken guest
> (if I read the commit message right) and a user who has no idea whether
> device_add was successful or not.
> So what should the user do in this case:
>   - wait till it explodes?
>   - can user remove it or it would be stuck there forever?
>   - poll slot before hotplug, manually?
> 
> (if this is the case then failing device_add cleanly doesn't sound bad,
> it looks similar to another error we have "/* Check if hot-plug is disabled on the slot */"
> in pcie_cap_slot_pre_plug_cb)
> 
> CCing libvirt, as it concerns not only QEMU.
> 
>  [...]  
>  [...]  
> > 
> > I think we want QEMU management interface to be reasonably
> > abstract and agnostic if possible. Pushing knowledge of hardware
> > detail to management will just lead to pain IMHO.
> > We supported device_add which practically never fails for years,  
> 
> For CPUs and RAM, device_add can fail so maybe management is also
> prepared to handle errors on PCI hotplug path.

There can be unarguable reasons for PCI hotplug to fail as well
(attempting to plug into a bus that can't support it, for one).  The
difference here is that it's a failure that we expect to be transitory.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Peter Krempa 3 years, 6 months ago
On Fri, Oct 23, 2020 at 19:27:55 +0200, Igor Mammedov wrote:
> On Fri, 23 Oct 2020 11:54:40 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Fri, Oct 23, 2020 at 09:47:14AM +0300, Marcel Apfelbaum wrote:
> > > Hi David,
> > > 
> > > On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote:
> > > 
> > >     On Thu, 22 Oct 2020 11:01:04 -0400
> > >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > >   
> > >     > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> > >     >  [...] 
> > >     >
> > >     > Right. After detecting just failing unconditionally it a bit too
> > >     > simplistic IMHO.  
> > > 
> > >     There's also another factor here, which I thought I'd mentioned
> > >     already, but looks like I didn't: I think we're still missing some
> > >     details in what's going on.
> > > 
> > >     The premise for this patch is that plugging while the indicator is in
> > >     transition state is allowed to fail in any way on the guest side.  I
> > >     don't think that's a reasonable interpretation, because it's unworkable
> > >     for physical hotplug.  If the indicator starts blinking while you're in
> > >     the middle of shoving a card in, you'd be in trouble.
> > > 
> > >     So, what I'm assuming here is that while "don't plug while blinking" is
> > >     the instruction for the operator to obey as best they can, on the guest
> > >     side the rule has to be "start blinking, wait a while and by the time
> > >     you leave blinking state again, you can be confident any plugs or
> > >     unplugs have completed".  Obviously still racy in the strict computer
> > >     science sense, but about the best you can do with slow humans in the
> > >     mix.
> > > 
> > >     So, qemu should of course endeavour to follow that rule as though it
> > >     was a human operator on a physical machine and not plug when the
> > >     indicator is blinking.  *But* the qemu plug will in practice be fast
> > >     enough that if we're hitting real problems here, it suggests the guest
> > >     is still doing something wrong.
> > > 
> > > 
> > > I personally think there is a little bit of over-engineering here.
> > > Let's start with the spec:
> > > 
> > >     Power Indicator Blinking
> > >     A blinking Power Indicator indicates that the slot is powering up or
> > > powering down and that
> > >     insertion or removal of the adapter is not permitted.
> > > 
> > > What exactly is an interpretation here?
> > > As you stated, the races are theoretical, the whole point of the indicator
> > > is to let the operator know he can't plug the device just yet.
> > > 
> > > I understand it would be more user friendly if the QEMU would wait internally
> > > for the
> > > blinking to end, but the whole point of the indicator is to let the operator 
> > > (human or machine)
> > > know they can't plug the device at a specific time.
> > > Should QEMU take the responsibility of the operator? Is it even correct?
> > > 
> > > Even if we would want such a feature, how is it related to this patch?
> > > The patch simply refuses to start a hotplug operation when it knows it will not
> > > succeed. 
> > >  
> > > Another way that would make sense to me would be  is a new QEMU interface other
> > > than
> > > "add_device", let's say "adding_device_allowed", that would return true if the
> > > hotplug is allowed
> > > at this point of time. (I am aware of the theoretical races)   
> > 
> > Rather than adding_device_allowed, something like "query slot"
> > might be helpful for debugging. That would help user figure out
> > e.g. why isn't device visible without any races.
> 
> Would a new command be useful, though? What we end up with is a broken guest
> (if I read the commit message right) and a user who has no idea whether
> device_add was successful or not.
> So what should the user do in this case:
>   - wait till it explodes?
>   - can user remove it or it would be stuck there forever?
>   - poll slot before hotplug, manually?
> 
> (if this is the case then failing device_add cleanly doesn't sound bad,
> it looks similar to another error we have "/* Check if hot-plug is disabled on the slot */"
> in pcie_cap_slot_pre_plug_cb)
> 
> CCing libvirt, as it concerns not only QEMU.

The only reason a separate command might make sense is if libvirt would
want to provide a specific error to the user/upper management layer that
the operation failed due to a transient failure (so that it can be
retried later).

We don't really want to go to a policy decision of how long to wait in
such case, so unless qemu waits libvirt will plainly want to report an
error.

That said, IMO 'device_add' should still fail if it's certain that the
device won't be plugged in. This will fix any client which is currently
in use. Adding a separate command is worthwhile only as a pre-check, for
saner error handling.
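The transient-vs-permanent distinction described above is essentially a
retry policy on the management side. A hypothetical sketch (the enum and
function names are illustrative, not libvirt or QEMU API):

```c
#include <stddef.h>

/* Hypothetical failure classification for a plug attempt; only the
 * transient kind (e.g. slot in power transition) is worth retrying. */
typedef enum {
    PLUG_OK,
    PLUG_ERR_PERMANENT,  /* e.g. hot-plug disabled on the slot */
    PLUG_ERR_TRANSIENT,  /* e.g. slot indicator still blinking */
} PlugResult;

/* Retry only on transient failures, up to max_attempts. */
static PlugResult plug_with_retry(PlugResult (*try_plug)(void *),
                                  void *opaque, int max_attempts)
{
    PlugResult r = PLUG_ERR_TRANSIENT;
    for (int i = 0; i < max_attempts && r == PLUG_ERR_TRANSIENT; i++) {
        r = try_plug(opaque);
    }
    return r;
}

/* Demo stand-in: fails transiently twice, then succeeds. */
static int demo_calls;
static PlugResult demo_try_plug(void *opaque)
{
    (void)opaque;
    return ++demo_calls < 3 ? PLUG_ERR_TRANSIENT : PLUG_OK;
}
```

Without an error that marks the failure as transient, management cannot
tell this case apart from a permanent one and has to give up.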

> > > The above will at least mimic the mechanics of the physical world.  The operator
> > > looks at the indicator,
> > > the management software checks if adding the device is allowed.
> > > Since it is a corner case I would prefer the device_add to fail rather than
> > > introducing a new interface,
> > > but that's just me.
> > > 
> > > Thanks,
> > > Marcel
> > >   
> > 
> > I think we want QEMU management interface to be reasonably
> > abstract and agnostic if possible. Pushing knowledge of hardware
> > detail to management will just lead to pain IMHO.
> > We supported device_add which practically never fails for years,
> 
> For CPUs and RAM, device_add can fail so maybe management is also
> prepared to handle errors on PCI hotplug path.

While it was me who implemented device_add for cpu/memory, I don't
remember any more whether we are really prepared for it.

We certainly want to do it if there's a possibility to do it.


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Fri, 23 Oct 2020 09:47:14 +0300
Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:

> Hi David,
> 
> On Fri, Oct 23, 2020 at 6:49 AM David Gibson <dgibson@redhat.com> wrote:
> 
> > On Thu, 22 Oct 2020 11:01:04 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >  
> > > On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> > >  [...]
> > >
> > > Right. After detecting just failing unconditionally it a bit too
> > > simplistic IMHO.  
> >
> > There's also another factor here, which I thought I'd mentioned
> > already, but looks like I didn't: I think we're still missing some
> > details in what's going on.
> >
> > The premise for this patch is that plugging while the indicator is in
> > transition state is allowed to fail in any way on the guest side.  I
> > don't think that's a reasonable interpretation, because it's unworkable
> > for physical hotplug.  If the indicator starts blinking while you're in
> > the middle of shoving a card in, you'd be in trouble.
> >
> > So, what I'm assuming here is that while "don't plug while blinking" is
> > the instruction for the operator to obey as best they can, on the guest
> > side the rule has to be "start blinking, wait a while and by the time
> > you leave blinking state again, you can be confident any plugs or
> > unplugs have completed".  Obviously still racy in the strict computer
> > science sense, but about the best you can do with slow humans in the
> > mix.
> >
> > So, qemu should of course endeavour to follow that rule as though it
> > was a human operator on a physical machine and not plug when the
> > indicator is blinking.  *But* the qemu plug will in practice be fast
> > enough that if we're hitting real problems here, it suggests the guest
> > is still doing something wrong.
> >  
> 
> I personally think there is a little bit of over-engineering here.
> Let's start with the spec:
> 
>     Power Indicator Blinking
>     A blinking Power Indicator indicates that the slot is powering up or
> powering down and that
>     insertion or removal of the adapter is not permitted.

Right, but the weird bit here is that IIUC, the kernel during general
probe is switching the indicator from off -> blinking -> off without it
ever going to "on" state.  And it seems to do the power on and presence
check with the indicator still in blinking state.  This is different
from the normal sequence on a hotplug:
	press button
	indicator goes to blinking
	...wait...
	indicator goes to full on
	slot powers on
	presence detect

Or am I missing something?
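The contrast between the two sequences above can be written out as data
(a toy model, nothing more):

```c
#include <stdbool.h>

typedef enum { IND_OFF, IND_BLINK, IND_ON } Indicator;

/* Normal hotplug, per the sequence above: blinking ends in "full on"
 * before the slot powers up and presence is detected. */
static const Indicator hotplug_seq[] = { IND_OFF, IND_BLINK, IND_ON };

/* What the kernel is observed doing during general probe. */
static const Indicator probe_seq[] = { IND_OFF, IND_BLINK, IND_OFF };

/* A plug is only "announced" if the indicator settles on full on. */
static bool settles_on(const Indicator *seq, int n)
{
    return n > 0 && seq[n - 1] == IND_ON;
}
```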

> What exactly is an interpretation here?
> As you stated, the races are theoretical, the whole point of the indicator
> is to let the operator know he can't plug the device just yet.
> 
> I understand it would be more user friendly if the QEMU would wait
> internally for the
> blinking to end, but the whole point of the indicator is to let the
> operator (human or machine)
> know they can't plug the device at a specific time.
> Should QEMU take the responsibility of the operator? Is it even correct?

I don't think there's an unambiguous correct answer here.  But I think
it's reasonable to interpret a general "device_add" as "instruct the
virtual operator to plug in the card ASAP" as easily as "shove the
virtual card in right now".  device_add already covers a bunch of
different pluggable interfaces, so I don't think precisely aligning to
pci-e semantics is really a priority.

> Even if we would want such a feature, how is it related to this patch?
> The patch simply refuses to start a hotplug operation when it knows it will
> not succeed.

I don't think I was clear.  There are two separate and unrelated things
I'm bringing up here.  The first is that having the device_add fail for
transitory reasons that the management layer doesn't really care about
is really bad UX.

But the second point I'm raising here is that "don't plug when blinking"
is not, strictly speaking, an implementable strategy for a human
operator.  That makes me conclude that the idea here is that plugs
started at basically the same time as the blinking starts (which could
be a little before or a little after, humans being fuzzy) should
actually be acceptable, even if they finish after the indicator has
started blinking.  It's not clear to me that the kernel's current
behaviour actually permits that.

> Another way that would make sense to me would be a new QEMU interface
> other than
> "add_device", let's say "adding_device_allowed", that would return true if
> the hotplug is allowed
> at this point of time. (I am aware of the theoretical races)
> 
> The above will at least mimic the mechanics of the physical world.  The
> operator looks at the indicator,
> the management software checks if adding the device is allowed.
> Since it is a corner case I would prefer the device_add to fail rather than
> introducing a new interface,
> but that's just me.

Oh, no, that's not what I'm suggesting.  Adding an "is allowed" command
is even worse.  It makes the management's task *more* complex, while
making any possible race here even wider.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Marcel Apfelbaum 3 years, 6 months ago
Hi Michael,

On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:

> On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> >
> >
> > On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com>
> wrote:
> >
> >     On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> >     >
> >     >
> >     > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com
> >
> >     wrote:
> >     >
> >     >     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum
> wrote:
> >     >     > Hi David, Michael,
> >     >     >
> >     >     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <
> dgibson@redhat.com>
> >     wrote:
> >     >     >
> >     >     >     On Thu, 22 Oct 2020 08:06:55 -0400
> >     >     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
> >     >     >
> >     >     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel
> Apfelbaum
> >     wrote:
> >     >     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
> >     >     >     > >
> >     >     >     > > During PCIe Root Port's transition from Power-Off to
> >     Power-ON (or
> >     >     >     vice-versa)
> >     >     >     > > the "Slot Control Register" has the "Power Indicator
> >     Control"
> >     >     >     > > set to "Blinking" expressing a "power transition"
> mode.
> >     >     >     > >
> >     >     >     > > Any hotplug operation during the "power transition"
> mode is
> >     not
> >     >     >     permitted
> >     >     >     > > or at least not expected by the Guest OS leading to
> strange
> >     >     failures.
> >     >     >     > >
> >     >     >     > > Detect and refuse hotplug operations in such case.
> >     >     >     > >
> >     >     >     > > Signed-off-by: Marcel Apfelbaum <
> marcel.apfelbaum@gmail.com
> >     >
> >     >     >     > > ---
> >     >     >     > >  hw/pci/pcie.c | 7 +++++++
> >     >     >     > >  1 file changed, 7 insertions(+)
> >     >     >     > >
> >     >     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> >     >     >     > > index 5b48bae0f6..2fe5c1473f 100644
> >     >     >     > > --- a/hw/pci/pcie.c
> >     >     >     > > +++ b/hw/pci/pcie.c
> >     >     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb
> >     (HotplugHandler
> >     >     >     *hotplug_dev, DeviceState *dev,
> >     >     >     > >      PCIDevice *hotplug_pdev =
> PCI_DEVICE(hotplug_dev);
> >     >     >     > >      uint8_t *exp_cap = hotplug_pdev->config +
> >     hotplug_pdev->
> >     >     >     exp.exp_cap;
> >     >     >     > >      uint32_t sltcap = pci_get_word(exp_cap +
> >     PCI_EXP_SLTCAP);
> >     >     >     > > +    uint32_t sltctl = pci_get_word(exp_cap +
> >     PCI_EXP_SLTCTL);
> >     >     >     > >
> >     >     >     > >      /* Check if hot-plug is disabled on the slot */
> >     >     >     > >      if (dev->hotplugged && (sltcap &
> PCI_EXP_SLTCAP_HPC) =
> >     = 0) {
> >     >     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
> >     >     (HotplugHandler
> >     >     >     *hotplug_dev, DeviceState *dev,
> >     >     >     > >          return;
> >     >     >     > >      }
> >     >     >     > >
> >     >     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
> >     >     PCI_EXP_SLTCTL_PWR_IND_BLINK)
> >     >     >     {
> >     >     >     > > +        error_setg(errp, "Hot-plug failed: %s is in
> Power
> >     >     Transition",
> >     >     >     > > +                   DEVICE(hotplug_pdev)->id);
> >     >     >     > > +        return;
> >     >     >     > > +    }
> >     >     >     > > +
> >     >     >     > >
> pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev),
> >     dev,
> >     >     errp);
> >     >     >     > >  }
> >     >     >     >
> >     >     >     > Probably the only way to handle for existing machine
> types.
> >     >     >
> >     >     >
> >     >     > I agree
> >     >     >
> >     >     >
> >     >     >     > For new ones, can't we queue it in host memory
> somewhere?
> >     >     >
> >     >     >
> >     >     >
> >     >     > I am not sure I understand what will be the flow.
> >     >     >   - The user asks for a hotplug operation.
> >     >     >   -  QEMU deferred operation.
> >     >     > After that the operation may still fail, how would the user
> know if
> >     the
> >     >     > operation
> >     >     > succeeded or not?
> >     >
> >     >
> >     >     How can it fail? It's just a button press ...
> >     >
> >     >
> >     >
> >     > Currently we have "Hotplug unsupported."
> >     > With this change we have "Guest/System not ready"
> >
> >
> >     Hotplug unsupported is not an error that can trigger with
> >     a well behaved management such as libvirt.
> >
> >
> >     >
> >     >
> >     >     >
> >     >     >
> >     >     >     I'm not actually convinced we can't do that even for
> existing
> >     machine
> >     >     >     types.
> >     >     >
> >     >     >
> >     >     > Is a Guest visible change, I don't think we can do it.
> >     >     >
> >     >     >
> >     >     >     So I'm a bit hesitant to suggest going ahead with this
> without
> >     >     >     looking a bit closer at whether we can implement a
> >     wait-for-ready in
> >     >     >     qemu, rather than forcing every user of qemu (human or
> machine)
> >     to do
> >     >     >     so.
> >     >     >
> >     >     >
> >     >     > While I agree it is a pain from the usability point of view,
> >     hotplug
> >     >     operations
> >     >     > are allowed to fail. This is not more than a corner case,
> ensuring
> >     the
> >     >     right
> >     >     > response (gracefully erroring out) may be enough.
> >     >     >
> >     >     > Thanks,
> >     >     > Marcel
> >     >     >
> >     >
> >     >
> >     >     I don't think they ever failed in the past so management is
> unlikely
> >     >     to handle the failure by retrying ...
> >     >
> >     >
> >     > That would require some management handling, yes.
> >     > But even without a "retry", failing is better than strange OS
> behavior.
> >     >
> >     > Trying a better alternative like deferring the operation for new
> machines
> >     > would make sense, however is out of the scope of this patch
> >
> >     Expand the scope please. The scope should be "solve a problem xx" not
> >     "solve a problem xx by doing abc".
> >
> >
> >
> > The scope is detecting a hotplug error early instead
> > passing to the Guest OS a hotplug operation that we know it will fail.
> >
>
> Right. After detecting just failing unconditionally it a bit too
> simplistic IMHO.
>
>
Simplistic does not mean wrong or incorrect.
I fail to see why it is not enough.

What can QEMU do better? Wait an unbounded time for the blinking to finish?
What if we have a buggy guest with a kernel stuck in blinking?
Is it QEMU's responsibility to emulate the operator itself? Because the
operator is the one who is supposed to wait.


Thanks,
Marcel

[...]
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Fri, Oct 23, 2020 at 09:26:48AM +0300, Marcel Apfelbaum wrote:
> Hi Michael,
> 
> On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>     On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
>     >
>     >
>     > On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com>
>     wrote:
>     >
>     >     On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
>     >     >
>     >     >
>     >     > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com>
>     >     wrote:
>     >     >
>     >     >     On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum
>     wrote:
>     >     >     > Hi David, Michael,
>     >     >     >
>     >     >     > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <
>     dgibson@redhat.com>
>     >     wrote:
>     >     >     >
>     >     >     >     On Thu, 22 Oct 2020 08:06:55 -0400
>     >     >     >     "Michael S. Tsirkin" <mst@redhat.com> wrote:
>     >     >     >
>     >     >     >     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel
>     Apfelbaum
>     >     wrote:
>     >     >     >     > > From: Marcel Apfelbaum <marcel@redhat.com>
>     >     >     >     > >
>     >     >     >     > > During PCIe Root Port's transition from Power-Off to
>     >     Power-ON (or
>     >     >     >     vice-versa)
>     >     >     >     > > the "Slot Control Register" has the "Power Indicator
>     >     Control"
>     >     >     >     > > set to "Blinking" expressing a "power transition"
>     mode.
>     >     >     >     > >
>     >     >     >     > > Any hotplug operation during the "power transition"
>     mode is
>     >     not
>     >     >     >     permitted
>     >     >     >     > > or at least not expected by the Guest OS leading to
>     strange
>     >     >     failures.
>     >     >     >     > >
>     >     >     >     > > Detect and refuse hotplug operations in such case.
>     >     >     >     > >
>     >     >     >     > > Signed-off-by: Marcel Apfelbaum <
>     marcel.apfelbaum@gmail.com
>     >     >
>     >     >     >     > > ---
>     >     >     >     > >  hw/pci/pcie.c | 7 +++++++
>     >     >     >     > >  1 file changed, 7 insertions(+)
>     >     >     >     > >
>     >     >     >     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     >     >     >     > > index 5b48bae0f6..2fe5c1473f 100644
>     >     >     >     > > --- a/hw/pci/pcie.c
>     >     >     >     > > +++ b/hw/pci/pcie.c
>     >     >     >     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb
>     >     (HotplugHandler
>     >     >     >     *hotplug_dev, DeviceState *dev,
>     >     >     >     > >      PCIDevice *hotplug_pdev = PCI_DEVICE
>     (hotplug_dev);
>     >     >     >     > >      uint8_t *exp_cap = hotplug_pdev->config +
>     >     hotplug_pdev->
>     >     >     >     exp.exp_cap;
>     >     >     >     > >      uint32_t sltcap = pci_get_word(exp_cap +
>     >     PCI_EXP_SLTCAP);
>     >     >     >     > > +    uint32_t sltctl = pci_get_word(exp_cap +
>     >     PCI_EXP_SLTCTL);
>     >     >     >     > > 
>     >     >     >     > >      /* Check if hot-plug is disabled on the slot */
>     >     >     >     > >      if (dev->hotplugged && (sltcap &
>     PCI_EXP_SLTCAP_HPC) =
>     >     = 0) {
>     >     >     >     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb
>     >     >     (HotplugHandler
>     >     >     >     *hotplug_dev, DeviceState *dev,
>     >     >     >     > >          return;
>     >     >     >     > >      }
>     >     >     >     > > 
>     >     >     >     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) ==
>     >     >     PCI_EXP_SLTCTL_PWR_IND_BLINK)
>     >     >     >     {
>     >     >     >     > > +        error_setg(errp, "Hot-plug failed: %s is in
>     Power
>     >     >     Transition",
>     >     >     >     > > +                   DEVICE(hotplug_pdev)->id);
>     >     >     >     > > +        return;
>     >     >     >     > > +    }
>     >     >     >     > > +
>     >     >     >     > >      pcie_cap_slot_plug_common(PCI_DEVICE
>     (hotplug_dev),
>     >     dev,
>     >     >     errp);
>     >     >     >     > >  } 
>     >     >     >     >
>     >     >     >     > Probably the only way to handle for existing machine
>     types.
>     >     >     >
>     >     >     >
>     >     >     > I agree
>     >     >     >  
>     >     >     >
>     >     >     >     > For new ones, can't we queue it in host memory
>     somewhere?
>     >     >     >
>     >     >     >
>     >     >     >
>     >     >     > I am not sure I understand what will be the flow.
>     >     >     >   - The user asks for a hotplug operation.
>     >     >     >   -  QEMU deferred operation.
>     >     >     > After that the operation may still fail, how would the user
>     know if
>     >     the
>     >     >     > operation
>     >     >     > succeeded or not?
>     >     >
>     >     >
>     >     >     How can it fail? It's just a button press ...
>     >     >
>     >     >
>     >     >
>     >     > Currently we have "Hotplug unsupported."
>     >     > With this change we have "Guest/System not ready"
>     >
>     >
>     >     Hotplug unsupported is not an error that can trigger with
>     >     a well behaved management such as libvirt.
>     >
>     >
>     >     >  
>     >     >
>     >     >     >  
>     >     >     >
>     >     >     >     I'm not actually convinced we can't do that even for
>     existing
>     >     machine
>     >     >     >     types. 
>     >     >     >
>     >     >     >
>     >     >     > Is a Guest visible change, I don't think we can do it.
>     >     >     >  
>     >     >     >
>     >     >     >     So I'm a bit hesitant to suggest going ahead with this
>     without
>     >     >     >     looking a bit closer at whether we can implement a
>     >     wait-for-ready in
>     >     >     >     qemu, rather than forcing every user of qemu (human or
>     machine)
>     >     to do
>     >     >     >     so.
>     >     >     >
>     >     >     >
>     >     >     > While I agree it is a pain from the usability point of view,
>     >     hotplug
>     >     >     operations
>     >     >     > are allowed to fail. This is not more than a corner case,
>     ensuring
>     >     the
>     >     >     right
>     >     >     > response (gracefully erroring out) may be enough.
>     >     >     >
>     >     >     > Thanks,
>     >     >     > Marcel
>     >     >     >
>     >     >
>     >     >
>     >     >     I don't think they ever failed in the past so management is
>     unlikely
>     >     >     to handle the failure by retrying ...
>     >     >
>     >     >
>     >     > That would require some management handling, yes.
>     >     > But even without a "retry", failing is better than strange OS
>     behavior.
>     >     >
>     >     > Trying a better alternative like deferring the operation for new
>     machines
>     >     > would make sense, however is out of the scope of this patch
>     >
>     >     Expand the scope please. The scope should be "solve a problem xx" not
>     >     "solve a problem xx by doing abc".
>     >
>     >
>     >
>     > The scope is detecting a hotplug error early instead
>     > passing to the Guest OS a hotplug operation that we know it will fail.
>     >
> 
>     Right. After detecting just failing unconditionally it a bit too
>     simplistic IMHO.
> 
> 
> 
> Simplistic does not mean wrong or incorrect.
> I fail to see why it is not enough.

The failure path requires management to retry later.
A more elaborate scheme would fix the bug without the need for
management changes.


> What QEMU can do better? Wait an unbounded time for the blinking to finish?
> What if we have a buggy guest with a kernel stuck in blinking?

Then it will never see the new device, but does it even matter? It's
stuck ... I'd ack adding a query command to see what is going
on with the device. It can be generic, implementable on top of ACPI too.

> Is QEMU's responsibility to emulate the operator itself? Because the operator
> is the one who is supposed to wait.

I think these details are immaterial for users. They don't read the PCI
spec.

> 
> Thanks,
> Marcel
> 
> [...] 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Fri, 23 Oct 2020 09:26:48 +0300
Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:

> Hi Michael,
> 
> On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> 
>  [...]  
> Simplistic does not mean wrong or incorrect.
> I fail to see why it is not enough.
> 
> What QEMU can do better? Wait an unbounded time for the blinking to finish?

It certainly shouldn't wait an unbounded time.  But a wait with timeout
seems worth investigating to me.
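A bounded wait of the kind suggested here would amount to polling the
indicator with a cap on attempts. A hypothetical sketch (the callback
and names are not QEMU API; real code would sleep or arm a timer between
polls rather than spin):

```c
#include <stddef.h>
#include <stdbool.h>

typedef bool (*is_blinking_fn)(void *opaque);

/* Poll the power indicator, giving up after max_polls instead of
 * blocking forever. */
static bool wait_for_slot_ready(is_blinking_fn is_blinking, void *opaque,
                                int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        if (!is_blinking(opaque)) {
            return true;   /* indicator settled: proceed with the plug */
        }
    }
    return false;          /* timed out: fail device_add cleanly */
}

/* Demo: indicator stops blinking after three polls. */
static int demo_polls;
static bool demo_blinking(void *opaque)
{
    (void)opaque;
    return ++demo_polls <= 3;
}
```

On timeout this still degenerates into the clean failure the patch
proposes; the wait only narrows the window in which that happens.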

> What if we have a buggy guest with a kernel stuck in blinking?
> Is QEMU's responsibility to emulate the operator itself? Because the
> operator
> is the one who is supposed to wait.
> 
> 
> Thanks,
> Marcel
> 
> [...]


-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote:
> On Fri, 23 Oct 2020 09:26:48 +0300
> Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> 
> > Hi Michael,
> > 
> > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > 
> >  [...]  
> > Simplistic does not mean wrong or incorrect.
> > I fail to see why it is not enough.
> > 
> > What can QEMU do better? Wait an unbounded time for the blinking to finish?
> 
> It certainly shouldn't wait an unbounded time.  But a wait with timeout
> seems worth investigating to me.

If it's helpful, I'd add a query to check the state
so management can figure out why the guest doesn't see the device yet.
But otherwise just buffer the request until such time as
we can deliver it to the guest ...


> > What if we have a buggy guest with a kernel stuck in blinking?
> > Is it QEMU's responsibility to emulate the operator itself? Because the
> > operator
> > is the one who is supposed to wait.
> > 
> > 
> > Thanks,
> > Marcel
> > 
> > [...]
> 
> 
> -- 
> David Gibson <dgibson@redhat.com>
> Principal Software Engineer, Virtualization, Red Hat



Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Igor Mammedov 3 years, 6 months ago
On Tue, 27 Oct 2020 07:26:44 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote:
> > On Fri, 23 Oct 2020 09:26:48 +0300
> > Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> >   
> > > Hi Michael,
> > > 
> > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > 
> > >  [...]  
[...]
> > > Simplistic does not mean wrong or incorrect.
> > > I fail to see why it is not enough.
> > > 
> > > What can QEMU do better? Wait an unbounded time for the blinking to finish?
> > 
> > It certainly shouldn't wait an unbounded time.  But a wait with timeout
> > seems worth investigating to me.  
racy, a timeout is bound to break once it's in an overcommitted env.

> If it's helpful, I'd add a query to check the state
> so management can figure out why the guest doesn't see the device yet.
that means mgmt would have to poll it and forward it to the user
somehow.

> But otherwise just buffer the request until such time as
> we can deliver it to the guest ...
I have more questions wrt the suggestion/workflow:
* at what place would you suggest buffering it?
* what would the request be in this case, i.e. create the PCI device anyway
and try to signal the hotplug event later?
* what would baremetal do in such a case?
* what to do in case the guest is never ready; what should the user do then?
* can such a device be removed?

not sure that all of this is worth the effort and added complexity.

alternatively:
maybe ports can send QMP events about their state changes, which the end user
would be able to see + an error like in this patch.

On top of it, mgmt could build a better UX, like retry/notify logic if
that's what the user really wishes for and configures (it would be up to the
user to define the behaviour).

> > > What if we have a buggy guest with a kernel stuck in blinking?
> > > Is it QEMU's responsibility to emulate the operator itself? Because the
> > > operator
> > > is the one who is supposed to wait.
> > > 
> > > 
> > > Thanks,
> > > Marcel
> > > 
> > > [...]  
> > 
> > 
> > -- 
> > David Gibson <dgibson@redhat.com>
> > Principal Software Engineer, Virtualization, Red Hat  
> 
> 
> 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Tue, 27 Oct 2020 13:54:26 +0100
Igor Mammedov <imammedo@redhat.com> wrote:

> On Tue, 27 Oct 2020 07:26:44 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
>  [...]  
> > > 
> > > It certainly shouldn't wait an unbounded time.  But a wait with timeout
> > > seems worth investigating to me.    
> racy, a timeout is bound to break once it's in an overcommitted env.

Hm.  That's no less true at the management layer than it is at the qemu
layer.

> > If it's helpful, I'd add a query to check state
> > so management can figure out why doesn't guest see device yet.  
> that means mgmt would have to poll it and forward it to user
> somehow.

If that even makes sense.  In the case of Kata, it's supposed to be
autonomously creating the VM, so there's nothing meaningful it can
forward to the user other than "failed to create the container because
of some hotplug problem that means nothing to you".

>  [...]  
> I have more questions wrt the suggestion/workflow:
> * at what place would you suggest buffering it?
> * what would the request be in this case, i.e. create the PCI device anyway
> and try to signal the hotplug event later?
> * what would baremetal do in such a case?
> * what to do in case the guest is never ready; what should the user do then?
> * can such a device be removed?
> 
> not sure that all of this is worth the effort and added complexity.
> 
> alternatively:
> maybe ports can send QMP events about their state changes, which the end user
> would be able to see + an error like in this patch.
> 
> On top of it, mgmt could build a better UX, like retry/notify logic if
> that's what the user really wishes for and configures (it would be up to the
> user to define the behaviour).

That kind of makes sense if the user is explicitly requesting hotplugs,
but that's not necessarily the case.

-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Igor Mammedov 3 years, 6 months ago
On Wed, 28 Oct 2020 14:31:35 +1100
David Gibson <dgibson@redhat.com> wrote:

> On Tue, 27 Oct 2020 13:54:26 +0100
> Igor Mammedov <imammedo@redhat.com> wrote:
> 
> > On Tue, 27 Oct 2020 07:26:44 -0400
> > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > 
> >  [...]  
> > > > 
> > > > It certainly shouldn't wait an unbounded time.  But a wait with timeout
> > > > seems worth investigating to me.      
> > racy, a timeout is bound to break once it's in an overcommitted env.
> 
> Hm.  That's no less true at the management layer than it is at the qemu
> layer.
true, but it's user policy, which is defined by the user, not by QEMU.

> 
> > > If it's helpful, I'd add a query to check state
> > > so management can figure out why doesn't guest see device yet.    
> > that means mgmt would have to poll it and forward it to user
> > somehow.  
> 
> If that even makes sense.  In the case of Kata, it's supposed to be
> autonomously creating the VM, so there's nothing meaningful it can
> forward to the user other than "failed to create the container because
> of some hotplug problem that means nothing to you".
> 
> >  [...]  
> > I have more questions wrt the suggestion/workflow:
> > * at what place would you suggest buffering it?
> > * what would the request be in this case, i.e. create the PCI device anyway
> > and try to signal the hotplug event later?
> > * what would baremetal do in such a case?
> > * what to do in case the guest is never ready; what should the user do then?
> > * can such a device be removed?
> > 
> > not sure that all of this is worth the effort and added complexity.
> > 
> > alternatively:
> > maybe ports can send QMP events about their state changes, which the end user
> > would be able to see + an error like in this patch.
> > 
> > On top of it, mgmt could build a better UX, like retry/notify logic if
> > that's what the user really wishes for and configures (it would be up to the
> > user to define the behaviour).
> 
> That kind of makes sense if the user is explicitly requesting hotplugs,
> but that's not necessarily the case.
the user doesn't have to be a human, it could be some mgmt layer that would
automate the retry logic, depending on what the user actually needs for a
particular task (e.g. fail immediately, retry N times then fail, retry with
timeout then fail, don't care - succeed, ...). The point is for QEMU to provide
the means for mgmt to implement whatever policy the user needs.

PS:
but then, I know close to nothing about PCI, so all of the above might be nonsense.


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Wed, Oct 28, 2020 at 04:39:45PM +0100, Igor Mammedov wrote:
> On Wed, 28 Oct 2020 14:31:35 +1100
> David Gibson <dgibson@redhat.com> wrote:
> 
> > On Tue, 27 Oct 2020 13:54:26 +0100
> > Igor Mammedov <imammedo@redhat.com> wrote:
> > 
> > > On Tue, 27 Oct 2020 07:26:44 -0400
> > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > 
> > >  [...]  
> > > > > 
> > > > > It certainly shouldn't wait an unbounded time.  But a wait with timeout
> > > > > seems worth investigating to me.      
> > > racy, a timeout is bound to break once it's in an overcommitted env.
> > 
> > Hm.  That's no less true at the management layer than it is at the qemu
> > layer.
> true, but it's user policy, which is defined by the user, not by QEMU.
> 
> > 
> > > > If it's helpful, I'd add a query to check state
> > > > so management can figure out why doesn't guest see device yet.    
> > > that means mgmt would have to poll it and forward it to user
> > > somehow.  
> > 
> > If that even makes sense.  In the case of Kata, it's supposed to be
> > autonomously creating the VM, so there's nothing meaningful it can
> > forward to the user other than "failed to create the container because
> > of some hotplug problem that means nothing to you".
> > 
> > >  [...]  
> > > I have more questions wrt the suggestion/workflow:
> > > * at what place would you suggest buffering it?
> > > * what would the request be in this case, i.e. create the PCI device anyway
> > > and try to signal the hotplug event later?
> > > * what would baremetal do in such a case?
> > > * what to do in case the guest is never ready; what should the user do then?
> > > * can such a device be removed?
> > > 
> > > not sure that all of this is worth the effort and added complexity.
> > > 
> > > alternatively:
> > > maybe ports can send QMP events about their state changes, which the end user
> > > would be able to see + an error like in this patch.
> > > 
> > > On top of it, mgmt could build a better UX, like retry/notify logic if
> > > that's what the user really wishes for and configures (it would be up to the
> > > user to define the behaviour).
> > 
> > That kind of makes sense if the user is explicitly requesting hotplugs,
> > but that's not necessarily the case.
> the user doesn't have to be a human, it could be some mgmt layer that would
> automate the retry logic, depending on what the user actually needs for a
> particular task (e.g. fail immediately, retry N times then fail, retry with
> timeout then fail, don't care - succeed, ...). The point is for QEMU to provide
> the means for mgmt to implement whatever policy the user needs.

We are not coming up with new APIs here. Let's make existing ones
work reliably first. We can talk about a flag where it fails
instead of deferring hotplug, separately.

> PS:
> but then, I know close to nothing about PCI, so all of the above might be nonsense.




Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by Michael S. Tsirkin 3 years, 6 months ago
On Tue, Oct 27, 2020 at 01:54:26PM +0100, Igor Mammedov wrote:
> On Tue, 27 Oct 2020 07:26:44 -0400
> "Michael S. Tsirkin" <mst@redhat.com> wrote:
> 
> > On Mon, Oct 26, 2020 at 05:45:37PM +1100, David Gibson wrote:
> > > On Fri, 23 Oct 2020 09:26:48 +0300
> > > Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:
> > >   
> > > > Hi Michael,
> > > > 
> > > > On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > > 
> > > >  [...]  
> [...]
> > > > Simplistic does not mean wrong or incorrect.
> > > > I fail to see why it is not enough.
> > > > 
> > > > What can QEMU do better? Wait an unbounded time for the blinking to finish?
> > > 
> > > It certainly shouldn't wait an unbounded time.  But a wait with timeout
> > > seems worth investigating to me.  
> racy, a timeout is bound to break once it's in an overcommitted env.
> 
> > If it's helpful, I'd add a query to check state
> > so management can figure out why doesn't guest see device yet.
> that means mgmt would have to poll it and forward it to user
> somehow.
> 
> > But otherwise just buffer the request until such time as
> > we can deliver it to guest ...
> I have more questions wrt the suggestion/workflow:
> * at what place would you suggest buffering it?

PCIESlot maybe?

> * what would the request be in this case, i.e. create the PCI device anyway
> and try to signal the hotplug event later?

that was my idea, yes.

> * what would baremetal do in such a case?

exactly the same: a human would wait until the blinking stops.

> * what to do in case the guest is never ready; what should the user do then?

As in the guest is stuck? Do we care? It can't use the device.

> * can such a device be removed?

why not? device_del could complete immediately ...

> not sure that all of this is worth the effort and added complexity.
> 
> alternatively:
> maybe ports can send QMP events about their state changes, which the end user
> would be able to see + an error like in this patch.
> 
> On top of it, mgmt could build a better UX, like retry/notify logic if
> that's what the user really wishes for and configures (it would be up to the
> user to define the behaviour).

I'd say let's get expected behaviour for existing commands first.
We can add events and stuff on top.

> > > > What if we have a buggy guest with a kernel stuck in blinking?
> > > > Is it QEMU's responsibility to emulate the operator itself? Because the
> > > > operator
> > > > is the one who is supposed to wait.
> > > > 
> > > > 
> > > > Thanks,
> > > > Marcel
> > > > 
> > > > [...]  
> > > 
> > > 
> > > -- 
> > > David Gibson <dgibson@redhat.com>
> > > Principal Software Engineer, Virtualization, Red Hat  
> > 
> > 
> > 


Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Tue, 27 Oct 2020 09:02:06 -0400
"Michael S. Tsirkin" <mst@redhat.com> wrote:

> On Tue, Oct 27, 2020 at 01:54:26PM +0100, Igor Mammedov wrote:
>  [...]  
> > I have more questions wrt the suggestion/workflow:
> > * at what place would you suggest buffering it?  
> 
> PCIESlot maybe?
> 
> > * what would be the request in this case, i.e. create PCI device anyways
> > and try to signal hotplug event later?  
> 
> 
> that was my idea, yes.

Actually, I don't think that will quite work.  A whole chunk of the
problem here is that because the device is realized, the guest sees it
as part of its general scan *before* the hotplug event (ABP and PDC
interrupts) appears.  That makes the guest misinterpret the ABP as an
*unplug* request.

So delaying the interrupt without delaying the realize (or at least
filtering config space access to it based on... something) will just
trigger the same problem AFAICT.

> > * what would baremetal do in such a case?  
> 
> exactly the same, human would wait until blinking stops.
> 
> > * what to do in case guest is never ready, what user should do in such case?  
> 
> As in guest is stuck? Do we care? It can't use the device.
> 
> > * can such a device be removed?  
> 
> why not? device_del could complete immediately ...
> 
>  [...]  
> 
> I'd say let's get expected behaviour for existing commands first.
> We can add events and stuff on top.
> 
>  [...]  
>  [...]  
>  [...]  
> 


-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat
Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Posted by David Gibson 3 years, 6 months ago
On Thu, 22 Oct 2020 16:55:10 +0300
Marcel Apfelbaum <marcel.apfelbaum@gmail.com> wrote:

> Hi David, Michael,
> 
> On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> 
>  [...]  
> > >
> > > Probably the only way to handle for existing machine types.  
> >  
> 
> I agree
> 
> 
> > > For new ones, can't we queue it in host memory somewhere?  
> >
> >  
> I am not sure I understand what the flow would be.
>   - The user asks for a hotplug operation.
>   - QEMU defers the operation.
> After that the operation may still fail; how would the user know whether
> the operation succeeded or not?
> 
> 
> 
> > I'm not actually convinced we can't do that even for existing machine
> > types.  
> 
> 
> It is a guest-visible change; I don't think we can do it.

How is it a guest-visible change?

> > So I'm a bit hesitant to suggest going ahead with this without
> > looking a bit closer at whether we can implement a wait-for-ready in
> > qemu, rather than forcing every user of qemu (human or machine) to do
> > so.
> 
> While I agree it is a pain from the usability point of view, hotplug
> operations are allowed to fail. This is no more than a corner case;
> ensuring the right response (gracefully erroring out) may be enough.
> 
> Thanks,
> Marcel
> 
> 
> 
>  [...]  


-- 
David Gibson <dgibson@redhat.com>
Principal Software Engineer, Virtualization, Red Hat