[PATCH] spapr: Don't request to unplug the same core twice

Greg Kurz posted 1 patch 4 years, 6 months ago
Test asan passed
Test checkpatch passed
Test FreeBSD passed
Test docker-mingw@fedora passed
Test docker-quick@centos7 passed
Test docker-clang@ubuntu passed
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/157185826035.3073024.1664101000438499392.stgit@bahia.lan
Maintainers: David Gibson <david@gibson.dropbear.id.au>
hw/ppc/spapr.c |    7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
[PATCH] spapr: Don't request to unplug the same core twice
Posted by Greg Kurz 4 years, 6 months ago
We must not call spapr_drc_detach() on a detached DRC otherwise bad things
can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
a CPU hotplug/unplug loop using QMP.

Signed-off-by: Greg Kurz <groug@kaod.org>
---
 hw/ppc/spapr.c |    7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index f9410d390a07..94f9d27096af 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
                           spapr_vcpu_id(spapr, cc->core_id));
     g_assert(drc);
 
-    spapr_drc_detach(drc);
-
-    spapr_hotplug_req_remove_by_index(drc);
+    if (!spapr_drc_unplug_requested(drc)) {
+        spapr_drc_detach(drc);
+        spapr_hotplug_req_remove_by_index(drc);
+    }
 }
 
 int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,


Re: [PATCH] spapr: Don't request to unplug the same core twice
Posted by David Gibson 4 years, 6 months ago
On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> a CPU hotplug/unplug loop using QMP.
> 
> Signed-off-by: Greg Kurz <groug@kaod.org>

Ouch, good catch.  Applied.

I wonder if we have the same problem with other DRC types.

> ---
>  hw/ppc/spapr.c |    7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index f9410d390a07..94f9d27096af 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
>                            spapr_vcpu_id(spapr, cc->core_id));
>      g_assert(drc);
>  
> -    spapr_drc_detach(drc);
> -
> -    spapr_hotplug_req_remove_by_index(drc);
> +    if (!spapr_drc_unplug_requested(drc)) {
> +        spapr_drc_detach(drc);
> +        spapr_hotplug_req_remove_by_index(drc);
> +    }
>  }
>  
>  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> 

-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson
Re: [PATCH] spapr: Don't request to unplug the same core twice
Posted by Greg Kurz 4 years, 6 months ago
On Thu, 24 Oct 2019 09:38:17 +1100
David Gibson <david@gibson.dropbear.id.au> wrote:

> On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> > We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> > a CPU hotplug/unplug loop using QMP.
> > 
> > Signed-off-by: Greg Kurz <groug@kaod.org>
> 
> Ouch, good catch.  Applied.
> 
> I wonder if we have the same problem with other DRC types.
> 

We don't have it with PHB and PCI types, through the same use of
spapr_drc_unplug_requested().

LMBs see to avoid it by failing device_del early if an unplug
request is already in progress:

    /*
     * An existing pending dimm state for this DIMM means that there is an
     * unplug operation in progress, waiting for the spapr_lmb_release
     * callback to complete the job (BQL can't cover that far). In this case,
     * bail out to avoid detaching DRCs that were already released.
     */
    if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
        error_setg(&local_err,
                   "Memory unplug already in progress for device %s",
                   dev->id);
        goto out;
    }

Not sure why we error out in this case instead of ignoring the unplug 
request.

> > ---
> >  hw/ppc/spapr.c |    7 ++++---
> >  1 file changed, 4 insertions(+), 3 deletions(-)
> > 
> > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > index f9410d390a07..94f9d27096af 100644
> > --- a/hw/ppc/spapr.c
> > +++ b/hw/ppc/spapr.c
> > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> >                            spapr_vcpu_id(spapr, cc->core_id));
> >      g_assert(drc);
> >  
> > -    spapr_drc_detach(drc);
> > -
> > -    spapr_hotplug_req_remove_by_index(drc);
> > +    if (!spapr_drc_unplug_requested(drc)) {
> > +        spapr_drc_detach(drc);
> > +        spapr_hotplug_req_remove_by_index(drc);
> > +    }
> >  }
> >  
> >  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > 
> 

Re: [PATCH] spapr: Don't request to unplug the same core twice
Posted by David Gibson 4 years, 5 months ago
On Thu, Oct 24, 2019 at 08:28:54AM +0200, Greg Kurz wrote:
> On Thu, 24 Oct 2019 09:38:17 +1100
> David Gibson <david@gibson.dropbear.id.au> wrote:
> 
> > On Wed, Oct 23, 2019 at 09:17:40PM +0200, Greg Kurz wrote:
> > > We must not call spapr_drc_detach() on a detached DRC otherwise bad things
> > > can happen, ie. QEMU hangs or crashes. This is easily demonstrated with
> > > a CPU hotplug/unplug loop using QMP.
> > > 
> > > Signed-off-by: Greg Kurz <groug@kaod.org>
> > 
> > Ouch, good catch.  Applied.
> > 
> > I wonder if we have the same problem with other DRC types.
> > 
> 
> We don't have it with PHB and PCI types, through the same use of
> spapr_drc_unplug_requested().
> 
> LMBs see to avoid it by failing device_del early if an unplug
> request is already in progress:
> 
>     /*
>      * An existing pending dimm state for this DIMM means that there is an
>      * unplug operation in progress, waiting for the spapr_lmb_release
>      * callback to complete the job (BQL can't cover that far). In this case,
>      * bail out to avoid detaching DRCs that were already released.
>      */
>     if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
>         error_setg(&local_err,
>                    "Memory unplug already in progress for device %s",
>                    dev->id);
>         goto out;
>     }
> 
> Not sure why we error out in this case instead of ignoring the unplug 
> request.

I suspect no particularly good reason, just history.  Everything's a
bit  different with LMBs because a single device_del will usually
remove a whole bunch of LMB DRCs.  In general the interfacing between
the qemu user side DIMM handling and the PAPR side LMB/DRC handling
is... pretty clunky.

> 
> > > ---
> > >  hw/ppc/spapr.c |    7 ++++---
> > >  1 file changed, 4 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > index f9410d390a07..94f9d27096af 100644
> > > --- a/hw/ppc/spapr.c
> > > +++ b/hw/ppc/spapr.c
> > > @@ -3741,9 +3741,10 @@ void spapr_core_unplug_request(HotplugHandler *hotplug_dev, DeviceState *dev,
> > >                            spapr_vcpu_id(spapr, cc->core_id));
> > >      g_assert(drc);
> > >  
> > > -    spapr_drc_detach(drc);
> > > -
> > > -    spapr_hotplug_req_remove_by_index(drc);
> > > +    if (!spapr_drc_unplug_requested(drc)) {
> > > +        spapr_drc_detach(drc);
> > > +        spapr_hotplug_req_remove_by_index(drc);
> > > +    }
> > >  }
> > >  
> > >  int spapr_core_dt_populate(SpaprDrc *drc, SpaprMachineState *spapr,
> > > 
> > 
> 



-- 
David Gibson			| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au	| minimalist, thank you.  NOT _the_ _other_
				| _way_ _around_!
http://www.ozlabs.org/~dgibson