[Qemu-devel] [PATCH] spapr: fix memory hot-unplugging

Laurent Vivier posted 1 patch 7 years ago
Patches applied successfully (tree, apply log)
git fetch https://github.com/patchew-project/qemu tags/patchew/20170327132242.17104-1-lvivier@redhat.com
Test s390x passed
There is a newer version of this series
hw/ppc/spapr_drc.c | 6 ++++++
1 file changed, 6 insertions(+)
[Qemu-devel] [PATCH] spapr: fix memory hot-unplugging
Posted by Laurent Vivier 7 years ago
If, once the kernel has booted, we try to remove a memory
hotplugged while the kernel was not started, QEMU crashes on
an assert:

    qemu-system-ppc64: hw/virtio/vhost.c:651:
                       vhost_commit: Assertion `r >= 0' failed.
    ...
    #4  in vhost_commit
    #5  in memory_region_transaction_commit
    #6  in pc_dimm_memory_unplug
    #7  in spapr_memory_unplug
    #8  spapr_machine_device_unplug
    #9  in hotplug_handler_unplug
    #10 in spapr_lmb_release
    #11 in detach
    #12 in set_allocation_state
    #13 in rtas_set_indicator
    ...

If we take a closer look to the guest kernel log, we can see when
we try to unplug the memory:

    pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)

What happens:

    1- The kernel has ignored the memory hotplug event because
       it was not started when it was generated.

    2- When we hot-unplug the memory,
       QEMU starts to remove the memory,
            generates an hot-unplug event,
        and signals the kernel of the incoming new event

    3- as the kernel is started, on the QEMU signal, it reads
       the event list, decodes the hotplug event and tries to
       finish the hotplugging.

    4- QEMU receives the hotplug notification while it
       is trying to hot-unplug the memory. This moves the memory
       DRC to an invalid state

This patch prevents this by not allowing to set the allocation
state to USABLE while the DRC is awaiting release and allocation.

RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
---
 hw/ppc/spapr_drc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
index 150f6bf..377ea65 100644
--- a/hw/ppc/spapr_drc.c
+++ b/hw/ppc/spapr_drc.c
@@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
         if (!drc->dev) {
             return RTAS_OUT_NO_SUCH_INDICATOR;
         }
+        if (drc->awaiting_release && drc->awaiting_allocation) {
+            /* kernel is acknowledging a previous hotplug event
+             * while we are already removing it.
+             */
+            return RTAS_OUT_NO_SUCH_INDICATOR;
+        }
     }
 
     if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
-- 
2.9.3


Re: [Qemu-devel] [PATCH] spapr: fix memory hot-unplugging
Posted by Laurent Vivier 7 years ago
Adding cc: Bharata.

On 27/03/2017 15:22, Laurent Vivier wrote:
> If, once the kernel has booted, we try to remove a memory
> hotplugged while the kernel was not started, QEMU crashes on
> an assert:
> 
>     qemu-system-ppc64: hw/virtio/vhost.c:651:
>                        vhost_commit: Assertion `r >= 0' failed.
>     ...
>     #4  in vhost_commit
>     #5  in memory_region_transaction_commit
>     #6  in pc_dimm_memory_unplug
>     #7  in spapr_memory_unplug
>     #8  spapr_machine_device_unplug
>     #9  in hotplug_handler_unplug
>     #10 in spapr_lmb_release
>     #11 in detach
>     #12 in set_allocation_state
>     #13 in rtas_set_indicator
>     ...
> 
> If we take a closer look to the guest kernel log, we can see when
> we try to unplug the memory:
> 
>     pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
> 
> What happens:
> 
>     1- The kernel has ignored the memory hotplug event because
>        it was not started when it was generated.
> 
>     2- When we hot-unplug the memory,
>        QEMU starts to remove the memory,
>             generates an hot-unplug event,
>         and signals the kernel of the incoming new event
> 
>     3- as the kernel is started, on the QEMU signal, it reads
>        the event list, decodes the hotplug event and tries to
>        finish the hotplugging.
> 
>     4- QEMU receives the hotplug notification while it
>        is trying to hot-unplug the memory. This moves the memory
>        DRC to an invalid state
> 
> This patch prevents this by not allowing to set the allocation
> state to USABLE while the DRC is awaiting release and allocation.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr_drc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 150f6bf..377ea65 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
>          if (!drc->dev) {
>              return RTAS_OUT_NO_SUCH_INDICATOR;
>          }
> +        if (drc->awaiting_release && drc->awaiting_allocation) {
> +            /* kernel is acknowledging a previous hotplug event
> +             * while we are already removing it.
> +             */

I'm wondering if I should set drc->awaiting_allocation to false here?

> +            return RTAS_OUT_NO_SUCH_INDICATOR;
> +        }
>      }
>  
>      if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
> 

Thanks,
Laurent

Re: [Qemu-devel] [PATCH] spapr: fix memory hot-unplugging
Posted by Michael Roth 7 years ago
Quoting Laurent Vivier (2017-03-27 08:22:42)
> If, once the kernel has booted, we try to remove a memory
> hotplugged while the kernel was not started, QEMU crashes on
> an assert:
> 
>     qemu-system-ppc64: hw/virtio/vhost.c:651:
>                        vhost_commit: Assertion `r >= 0' failed.
>     ...
>     #4  in vhost_commit
>     #5  in memory_region_transaction_commit
>     #6  in pc_dimm_memory_unplug
>     #7  in spapr_memory_unplug
>     #8  spapr_machine_device_unplug
>     #9  in hotplug_handler_unplug
>     #10 in spapr_lmb_release
>     #11 in detach
>     #12 in set_allocation_state
>     #13 in rtas_set_indicator
>     ...
> 
> If we take a closer look to the guest kernel log, we can see when
> we try to unplug the memory:
> 
>     pseries-hotplug-mem: Attempting to hot-add 4 LMB(s)
> 
> What happens:
> 
>     1- The kernel has ignored the memory hotplug event because
>        it was not started when it was generated.
> 
>     2- When we hot-unplug the memory,
>        QEMU starts to remove the memory,
>             generates an hot-unplug event,
>         and signals the kernel of the incoming new event
> 
>     3- as the kernel is started, on the QEMU signal, it reads
>        the event list, decodes the hotplug event and tries to
>        finish the hotplugging.
> 
>     4- QEMU receives the hotplug notification while it
>        is trying to hot-unplug the memory. This moves the memory
>        DRC to an invalid state

So, IIUC, it looks like the the memory is already onlined at boot-time, and
when it attempts to handle the hotplug event the kernel does the following:

static int dlpar_memory_add_by_count(u32 lmbs_to_add, struct property *prop)
{
        ...
        pr_info("Attempting to hot-add %d LMB(s)\n", lmbs_to_add);
        ...
        for (i = 0; i < num_lmbs && lmbs_to_add != lmbs_added; i++) {
                rc = dlpar_acquire_drc(lmbs[i].drc_index);
                if (rc)
                        continue;

                rc = dlpar_add_lmb(&lmbs[i]);
                if (rc) {
                        dlpar_release_drc(lmbs[i].drc_index);
                        continue;
                }

                lmbs_added++;

                /* Mark this lmb so we can remove it later if all of the
                 * requested LMBs cannot be added.
                 */
                lmbs[i].reserved = 1;
        }

dlpar_add_lmb likely fails because the memory was already onlined, and the
acquire/release cause allocation_state transitions of USABLE and UNUSABLE,
respectively. But since in the meantime we've marked the memory for unplug,
this UNUSABLE transition falsely indicates to QEMU that the guest has
acknowledged the unplug and released the memory, so we nuke it prematurely.

> 
> This patch prevents this by not allowing to set the allocation
> state to USABLE while the DRC is awaiting release and allocation.
> 
> RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1432382
> 
> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
> ---
>  hw/ppc/spapr_drc.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c
> index 150f6bf..377ea65 100644
> --- a/hw/ppc/spapr_drc.c
> +++ b/hw/ppc/spapr_drc.c
> @@ -135,6 +135,12 @@ static uint32_t set_allocation_state(sPAPRDRConnector *drc,
>          if (!drc->dev) {
>              return RTAS_OUT_NO_SUCH_INDICATOR;
>          }
> +        if (drc->awaiting_release && drc->awaiting_allocation) {
> +            /* kernel is acknowledging a previous hotplug event
> +             * while we are already removing it.
> +             */
> +            return RTAS_OUT_NO_SUCH_INDICATOR;
> +        }

So basically if the unplug happens soon-enough after the hotplug (or in the
special-case of hotplugging at boot-time, any time after hotplug), we force
the dlpar_acquire_drc to fail so we don't subsequently nuke it on release.
This might trip up test cases where it's common to fire off a ton of
plug/unplug and assume each operation succeeds in turn, but the new behavior
does seem appropriate since there's no reason acquire should be expected to
succeed if we've since marked the memory for removal; that choice is wholly
up to the platform up until the point the acquire actually succeeds.

Technically, though, it's the *release* part that's the real problem, since
that's what's getting misconstrued. Unfortunately I don't see any way to
address things on that end, so this approach seems reasonable.

As for the follow-up question I think clearing awaiting_allocation would be
necessary to allow subsequent unplug to succeed and not be stuck with the
unused DIMM until reset. My main concern there though is if any code ever
decided to retry the hotplug for some reason, they could bypass this check
and trigger the same error. If it was highly unlikely to occur I think I
would opt to simply clear awaiting_allocation, but I think this scenario
could be triggered by simply invoking:

  drmgr -c lmb -s ... -a

after the initial plug and unplug requests fail. Ideally we would keep the
awaiting_allocation && awaiting_release guard in place so dlpar_acquire_drc
never succeeds in this case, but still allow for unplug to succeed. One
*assumption* we can make use of is that acquire/release are done in pairs
by the hotplug code, so if acquire/allocation_state:USABLE always fail, then
a subsequent allocate_state:UNUSABLE would then likely be the result of the
unplug path where release can be reasonably issued in isolation (under the
assumption that hotplug handled the acquire).

So perhaps something like (untested):

       if (drc->awaiting_release && drc->awaiting_allocation) {
           /* kernel is acknowledging a previous hotplug event
            * while we are already removing it.
-           */
+           * it's safe to ignore awaiting_allocation here since we know the
+           * situation is predicated on the guest either already having done
+           * so (boot-time hotplug), or never being able to acquire in the
+           * first place (hotplug followed by immediate unplug).
+           */
+          drc->awaiting_allocation_skippable = true;
           return RTAS_OUT_NO_SUCH_INDICATOR;
       }

and then in detach():

    if (drc->awaiting_allocation) {
-       drc->awaiting_release = true;
-       trace_spapr_drc_awaiting_allocation(get_index(drc));
-       return;
+       if (!drc->awaiting_allocation_skippable) {
+           drc->awaiting_release = true;
+           trace_spapr_drc_awaiting_allocation(get_index(drc));
+           return;
+       }
    }

...

    drc->awaiting_release = false;
+   drc->awaiting_release_skippable = false;


>      }
> 
>      if (drc->type != SPAPR_DR_CONNECTOR_TYPE_PCI) {
> -- 
> 2.9.3
> 
>