From nobody Sun Apr 28 22:46:37 2024 Delivered-To: importer@patchew.org Received-SPF: pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) client-ip=208.118.235.17; envelope-from=qemu-devel-bounces+importer=patchew.org@nongnu.org; helo=lists.gnu.org; Authentication-Results: mx.zoho.com; spf=pass (zoho.com: domain of gnu.org designates 208.118.235.17 as permitted sender) smtp.mailfrom=qemu-devel-bounces+importer=patchew.org@nongnu.org; Return-Path: Received: from lists.gnu.org (lists.gnu.org [208.118.235.17]) by mx.zohomail.com with SMTPS id 1490703090622161.67917815391536; Tue, 28 Mar 2017 05:11:30 -0700 (PDT) Received: from localhost ([::1]:52957 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cspyD-00077Z-Ic for importer@patchew.org; Tue, 28 Mar 2017 08:11:29 -0400 Received: from eggs.gnu.org ([2001:4830:134:3::10]:35563) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1cspwV-00069J-Ae for qemu-devel@nongnu.org; Tue, 28 Mar 2017 08:09:44 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1cspwR-0002Qb-BT for qemu-devel@nongnu.org; Tue, 28 Mar 2017 08:09:43 -0400 Received: from mx1.redhat.com ([209.132.183.28]:35076) by eggs.gnu.org with esmtps (TLS1.0:DHE_RSA_AES_256_CBC_SHA1:32) (Exim 4.71) (envelope-from ) id 1cspwR-0002QP-2Z; Tue, 28 Mar 2017 08:09:39 -0400 Received: from smtp.corp.redhat.com (int-mx03.intmail.prod.int.phx2.redhat.com [10.5.11.13]) (using TLSv1.2 with cipher AECDH-AES256-SHA (256/256 bits)) (No client certificate requested) by mx1.redhat.com (Postfix) with ESMTPS id 21477C04D29A; Tue, 28 Mar 2017 12:09:38 +0000 (UTC) Received: from thinkpad.redhat.com (ovpn-116-101.ams2.redhat.com [10.36.116.101]) by smtp.corp.redhat.com (Postfix) with ESMTP id B225881C0C; Tue, 28 Mar 2017 12:09:35 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mx1.redhat.com 21477C04D29A Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; dmarc=none (p=none dis=none) header.from=redhat.com Authentication-Results: ext-mx07.extmail.prod.ext.phx2.redhat.com; spf=pass smtp.mailfrom=lvivier@redhat.com DKIM-Filter: OpenDKIM Filter v2.11.0 mx1.redhat.com 21477C04D29A From: Laurent Vivier To: Michael Roth Date: Tue, 28 Mar 2017 14:09:34 +0200 Message-Id: <20170328120934.29526-1-lvivier@redhat.com> X-Scanned-By: MIMEDefang 2.79 on 10.5.11.13 X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.31]); Tue, 28 Mar 2017 12:09:38 +0000 (UTC) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] [fuzzy] X-Received-From: 209.132.183.28 Subject: [Qemu-devel] [PATCH v2] spapr: fix memory hot-unplugging X-BeenThere: qemu-devel@nongnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Laurent Vivier , Thomas Huth , qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Bharata B Rao , David Gibson Errors-To: qemu-devel-bounces+importer=patchew.org@nongnu.org Sender: "Qemu-devel" X-ZohoMail: RSF_0 Z_629925259 SPT_0 Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 Content-Type: text/plain; charset="utf-8" If, once the kernel has booted, we try to remove a memory hotplugged while the kernel was not started, QEMU crashes on an assert: qemu-system-ppc64: hw/virtio/vhost.c:651: vhost_commit: Assertion `r >=3D 0' failed. ... #4 in vhost_commit #5 in memory_region_transaction_commit #6 in pc_dimm_memory_unplug #7 in spapr_memory_unplug #8 spapr_machine_device_unplug #9 in hotplug_handler_unplug #10 in spapr_lmb_release #11 in detach #12 in set_allocation_state #13 in rtas_set_indicator ... If we take a closer look to the guest kernel log, we can see when we try to unplug the memory: pseries-hotplug-mem: Attempting to hot-add 4 LMB(s) What happens: 1- The kernel has ignored the memory hotplug event because it was not started when it was generated. 2- When we hot-unplug the memory, QEMU starts to remove the memory, generates an hot-unplug event, and signals the kernel of the incoming new event 3- as the kernel is started, on the QEMU signal, it reads the event list, decodes the hotplug event and tries to finish the hotplugging. 4- QEMU receive the the hotplug notification while it is trying to hot-unplug the memory. This moves the memory DRC to an invalid state This patch prevents this by not allowing to set the allocation state to USABLE while the DRC is awaiting release. RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=3D1432382 Signed-off-by: Laurent Vivier --- v2: Add awaiting_allocation_skippable flag as suggested by Michael hw/ppc/spapr_drc.c | 20 +++++++++++++++++--- include/hw/ppc/spapr_drc.h | 1 + 2 files changed, 18 insertions(+), 3 deletions(-) diff --git a/hw/ppc/spapr_drc.c b/hw/ppc/spapr_drc.c index 150f6bf..a1cdc87 100644 --- a/hw/ppc/spapr_drc.c +++ b/hw/ppc/spapr_drc.c @@ -135,6 +135,17 @@ static uint32_t set_allocation_state(sPAPRDRConnector = *drc, if (!drc->dev) { return RTAS_OUT_NO_SUCH_INDICATOR; } + if (drc->awaiting_release && drc->awaiting_allocation) { + /* kernel is acknowledging a previous hotplug event + * while we are already removing it. + * it's safe to ignore awaiting_allocation here since we know = the + * situation is predicated on the guest either already having = done + * so (boot-time hotplug), or never being able to acquire in t= he + * first place (hotplug followed by immediate unplug). + */ + drc->awaiting_allocation_skippable =3D true; + return RTAS_OUT_NO_SUCH_INDICATOR; + } } =20 if (drc->type !=3D SPAPR_DR_CONNECTOR_TYPE_PCI) { @@ -436,9 +447,11 @@ static void detach(sPAPRDRConnector *drc, DeviceState = *d, } =20 if (drc->awaiting_allocation) { - drc->awaiting_release =3D true; - trace_spapr_drc_awaiting_allocation(get_index(drc)); - return; + if (!drc->awaiting_allocation_skippable) { + drc->awaiting_release =3D true; + trace_spapr_drc_awaiting_allocation(get_index(drc)); + return; + } } =20 drc->indicator_state =3D SPAPR_DR_INDICATOR_STATE_INACTIVE; @@ -448,6 +461,7 @@ static void detach(sPAPRDRConnector *drc, DeviceState *= d, } =20 drc->awaiting_release =3D false; + drc->awaiting_allocation_skippable =3D false; g_free(drc->fdt); drc->fdt =3D NULL; drc->fdt_start_offset =3D 0; diff --git a/include/hw/ppc/spapr_drc.h b/include/hw/ppc/spapr_drc.h index fa531d5..5524247 100644 --- a/include/hw/ppc/spapr_drc.h +++ b/include/hw/ppc/spapr_drc.h @@ -154,6 +154,7 @@ typedef struct sPAPRDRConnector { bool awaiting_release; bool signalled; bool awaiting_allocation; + bool awaiting_allocation_skippable; =20 /* device pointer, via link property */ DeviceState *dev; --=20 2.9.3