Persistent Failed to attach device following transient Failed to read product/vendor ID

Pighin, Anthony (Nokia - CA/Ottawa) posted 1 patch 1 year, 9 months ago
Posted by Pighin, Anthony (Nokia - CA/Ottawa) 1 year, 9 months ago
I'm attempting to solve the issue reported here:

https://gitlab.com/libvirt/libvirt/-/issues/345

Hoping the originator of virHostdevDeleteMissingPCIDevices() will be able to comment, as I am still trying to reproduce the issue with additional debug in place.

diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
index c0ce867596..d43354963e 100644
--- a/src/hypervisor/virhostdev.c
+++ b/src/hypervisor/virhostdev.c
@@ -1101,11 +1101,11 @@ virHostdevReAttachPCIDevices(virHostdevManager *mgr,
         VIR_ERROR(_("Failed to allocate PCI device list: %s"),
                   virGetLastErrorMessage());
         virResetLastError();
-        return;
     }
-
-    virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
-                                     hostdevs, nhostdevs);
+    else {
+        virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
+                                        hostdevs, nhostdevs);
+    }

     /* Handle the case where PCI devices from the host went missing
      * during the domain lifetime */
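
Editorial sketch of the control-flow change in the hunk above, reduced to a standalone form. All names here (reattach_devices, delete_missing_devices, reattach_before, reattach_after, the counters) are hypothetical stand-ins and not libvirt functions; the assumption, taken from the hunk's trailing comment, is that the code following the patched lines is the cleanup for PCI devices that went missing (presumably the virHostdevDeleteMissingPCIDevices() call mentioned above).

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical counters standing in for the real side effects. */
static int reattach_calls;
static int cleanup_calls;

/* Stand-in for the re-attach step that needs the PCI device list. */
void reattach_devices(void) { reattach_calls++; }

/* Stand-in for the "devices went missing" cleanup after the hunk. */
void delete_missing_devices(void) { cleanup_calls++; }

/* Pre-patch shape: a failed list allocation returns early,
 * so the missing-device cleanup never runs. */
void reattach_before(bool list_ok)
{
    if (!list_ok) {
        fprintf(stderr, "Failed to allocate PCI device list\n");
        return;
    }
    reattach_devices();
    delete_missing_devices();
}

/* Post-patch shape: the re-attach step is skipped on failure,
 * but control still falls through to the cleanup. */
void reattach_after(bool list_ok)
{
    if (!list_ok) {
        fprintf(stderr, "Failed to allocate PCI device list\n");
    } else {
        reattach_devices();
    }
    delete_missing_devices();
}
```

Calling reattach_before(false) leaves cleanup_calls unchanged, while reattach_after(false) increments it without touching reattach_calls. That is the behavioural difference the patch is after, under the assumptions stated above.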
Re: Persistent Failed to attach device following transient Failed to read product/vendor ID
Posted by Michal Prívozník 1 year, 9 months ago
On 7/11/22 20:14, Pighin, Anthony (Nokia - CA/Ottawa) wrote:
> I'm attempting to solve the issue reported here:
> 
> https://gitlab.com/libvirt/libvirt/-/issues/345
> 
> Hoping the originator of virHostdevDeleteMissingPCIDevices() will be able to comment, as I am still trying to reproduce the issue with additional debug in place.
> 
> diff --git a/src/hypervisor/virhostdev.c b/src/hypervisor/virhostdev.c
> index c0ce867596..d43354963e 100644
> --- a/src/hypervisor/virhostdev.c
> +++ b/src/hypervisor/virhostdev.c
> @@ -1101,11 +1101,11 @@ virHostdevReAttachPCIDevices(virHostdevManager *mgr,
>          VIR_ERROR(_("Failed to allocate PCI device list: %s"),
>                    virGetLastErrorMessage());
>          virResetLastError();
> -        return;
>      }
> -
> -    virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
> -                                     hostdevs, nhostdevs);
> +    else {
> +        virHostdevReAttachPCIDevicesImpl(mgr, drv_name, dom_name, pcidevs,
> +                                        hostdevs, nhostdevs);
> +    }
> 
>      /* Handle the case where PCI devices from the host went missing
>       * during the domain lifetime */
> 

Yeah, this looks like a correct fix, but I'm trying to understand the
original problem more. In the GitLab issue you mention 'link bounce' - do
you mean PCIe link?

Michal
RE: Persistent Failed to attach device following transient Failed to read product/vendor ID
Posted by Pighin, Anthony (Nokia - CA/Ottawa) 1 year, 9 months ago
Correct, PCIe link bounce/flap. The attached PCIe device entered a failed state where it was repeatedly resetting, and therefore the link itself was going up and down.

-----Original Message-----
From: Michal Prívozník <mprivozn@redhat.com> 
Sent: Wednesday, July 20, 2022 11:07 AM
To: Pighin, Anthony (Nokia - CA/Ottawa) <anthony.pighin@nokia.com>; libvir-list@redhat.com
Subject: Re: Persistent Failed to attach device following transient Failed to read product/vendor ID

Yeah, this looks like a correct fix, but I'm trying to understand the original problem more. In the GitLab issue you mention 'link bounce' - do you mean PCIe link?

Michal