From: "Laszlo Ersek"
To: devel@edk2.groups.io
Cc: Ard Biesheuvel, Brijesh Singh, Erdem Aktas, Gerd Hoffmann,
    James Bottomley, Jiewen Yao, Jordan Justen, Min Xu, Oliver Steffen,
    Sebastien Boeuf, Tom Lendacky
Subject: [edk2-devel] [PATCH v2] OvmfPkg/PlatformInitLib: catch QEMU's CPU hotplug reg block regression
Date: Thu, 12 Jan 2023 09:28:45 +0100
Message-Id: <20230112082845.128463-1-lersek@redhat.com>
Reply-To: devel@edk2.groups.io,lersek@redhat.com

In QEMU v5.1.0, the CPU hotplug register block misbehaves: the negotiation
protocol is (effectively) broken such that it suggests that switching from
the legacy interface to the modern interface works, but in reality the
switch never happens. The symptom has been witnessed when using TCG
acceleration; KVM seems to mask the issue. The issue persists with the
following (latest) stable QEMU releases: v5.2.0, v6.2.0, v7.2.0. Currently
there is no stable release that addresses the problem.

The QEMU bug confuses the Present and Possible counting in function
PlatformMaxCpuCountInitialization(), in
"OvmfPkg/Library/PlatformInitLib/Platform.c". OVMF ends up with Present=0
Possible=1. This in turn further confuses MpInitLib in UefiCpuPkg (hence
firmware-time multiprocessing will be broken). Worse, CPU hot(un)plug with
SMI will be summarily broken in OvmfPkg/CpuHotplugSmm, which (considering
the privilege level of SMM) is not that great.

Detect the issue in PlatformMaxCpuCountInitialization(), and print an error
message and *hang* if the issue is present.

The problem was originally reported by Ard [0]. We analyzed it at [1] and
[2]. A QEMU patch was sent at [3]; now merged as commit dab30fbef389
("acpi: cpuhp: fix guest-visible maximum access size to the legacy reg
block", 2023-01-08), to be included in QEMU v8.0.0.

[0] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c2

[1] https://bugzilla.tianocore.org/show_bug.cgi?id=4234#c3

[2] IO port write width clamping differs between TCG and KVM
    http://mid.mail-archive.com/aaedee84-d3ed-a4f9-21e7-d221a28d1683@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00199.html

[3] acpi: cpuhp: fix guest-visible maximum access size to the legacy reg block
    http://mid.mail-archive.com/20230104090138.214862-1-lersek@redhat.com
    https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg00278.html

NOTE: PlatformInitLib is used in the following platform DSCs:

  OvmfPkg/AmdSev/AmdSevX64.dsc
  OvmfPkg/CloudHv/CloudHvX64.dsc
  OvmfPkg/IntelTdx/IntelTdxX64.dsc
  OvmfPkg/Microvm/MicrovmX64.dsc
  OvmfPkg/OvmfPkgIa32.dsc
  OvmfPkg/OvmfPkgIa32X64.dsc
  OvmfPkg/OvmfPkgX64.dsc

but I can only test this change with the last three platforms, running on
QEMU.

Test results:

  TCG  QEMU     OVMF     result
       patched  patched
  ---  -------  -------  -------------------------------------------------
  0    0        0        CPU counts OK (KVM masks the QEMU bug)
  0    0        1        CPU counts OK (KVM masks the QEMU bug)
  0    1        0        CPU counts OK (QEMU fix, but KVM masks the QEMU bug
                         anyway)
  0    1        1        CPU counts OK (QEMU fix, but KVM masks the QEMU bug
                         anyway)
  1    0        0        boot with broken CPU counts (original QEMU bug)
  1    0        1        broken CPU count caught (boot hangs)
  1    1        0        CPU counts OK (QEMU fix)
  1    1        1        CPU counts OK (QEMU fix)

Cc: Ard Biesheuvel
Cc: Brijesh Singh
Cc: Erdem Aktas
Cc: Gerd Hoffmann
Cc: James Bottomley
Cc: Jiewen Yao
Cc: Jordan Justen
Cc: Min Xu
Cc: Oliver Steffen
Cc: Sebastien Boeuf
Cc: Tom Lendacky
Bugzilla: https://bugzilla.tianocore.org/show_bug.cgi?id=4250
Reviewed-by: Gerd Hoffmann
Signed-off-by: Laszlo Ersek
---

Notes:
    v2:

    - V1 was at .

    - Repo: , branch: cpuhp-reg-catch-4250-v2

    - Remove KVM as a proposed workaround from the error message, because in
      the QEMU discussion, we had found that the KVM accelerator's behavior
      in QEMU (masking the problem) was not right, and that a fix for that
      had been in progress for quite some time.

    - Add the QEMU commit hash to the commit message, the code comment, and
      the error message.

    - Pick up Gerd's R-b; add Oliver to the Cc list.

 OvmfPkg/Library/PlatformInitLib/Platform.c | 35 ++++++++++++++++++++
 1 file changed, 35 insertions(+)

diff --git a/OvmfPkg/Library/PlatformInitLib/Platform.c b/OvmfPkg/Library/PlatformInitLib/Platform.c
index 3e13c5d4b34f..13348afb4890 100644
--- a/OvmfPkg/Library/PlatformInitLib/Platform.c
+++ b/OvmfPkg/Library/PlatformInitLib/Platform.c
@@ -541,6 +541,41 @@ PlatformMaxCpuCountInitialization (
         ASSERT (Selected == Possible || Selected == 0);
       } while (Selected > 0);
 
+      //
+      // Sanity check: we need at least 1 present CPU (CPU#0 is always present).
+      //
+      // The legacy-to-modern switching of the CPU hotplug register block got
+      // broken (for TCG) in QEMU v5.1.0. Refer to "IO port write width clamping
+      // differs between TCG and KVM" at
+      //
+      // or at
+      // .
+      //
+      // QEMU received the fix in commit dab30fbef389 ("acpi: cpuhp: fix
+      // guest-visible maximum access size to the legacy reg block",
+      // 2023-01-08), to be included in QEMU v8.0.0.
+      //
+      // If we're affected by this QEMU bug, then we must not continue: it
+      // confuses the multiprocessing in UefiCpuPkg/Library/MpInitLib, and
+      // breaks CPU hot(un)plug with SMI in OvmfPkg/CpuHotplugSmm.
+      //
+      if (Present == 0) {
+        DEBUG ((
+          DEBUG_ERROR,
+          "%a: Broken CPU hotplug register block: Present=%u Possible=%u.\n"
+          "%a: Update QEMU to v8, or to stable with dab30fbef389 backported.\n"
+          "%a: Refer to "
+          ".\n",
+          __FUNCTION__,
+          Present,
+          Possible,
+          __FUNCTION__,
+          __FUNCTION__
+          ));
+        ASSERT (FALSE);
+        CpuDeadLoop ();
+      }
+
       //
       // Sanity check: fw_cfg and the modern CPU hotplug interface should
       // return the same boot CPU count.
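
For readers who want to reason about the test matrix in the commit message without booting a guest, the following is a minimal standalone sketch in plain C. It is not edk2 code and is not part of the patch. The names Outcome, CPU_COUNTS, ModelCounts and ModelBoot are hypothetical, and the healthy case is assumed to be a single-CPU guest (Present=1 Possible=1). The sketch only encodes two statements from the commit message: the bug is guest-visible under TCG with an unfixed QEMU (Present=0 Possible=1), and a patched OVMF turns Present=0 into a hang instead of continuing the boot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical outcome labels for this model; not edk2 identifiers. */
typedef enum {
  OutcomeCountsOk,     /* Present >= 1: boot proceeds with sane CPU counts */
  OutcomeBrokenCounts, /* Present == 0 goes unnoticed: boot with broken counts */
  OutcomeBootHangs     /* Present == 0 is caught: error message, then hang */
} Outcome;

typedef struct {
  uint32_t Present;
  uint32_t Possible;
} CPU_COUNTS;

/*
 * Model of the counts the firmware ends up with. Per the commit message,
 * the affected case yields Present=0 Possible=1. The healthy values are an
 * assumption made for this sketch (a single-CPU guest).
 */
CPU_COUNTS
ModelCounts (bool Tcg, bool QemuFixed)
{
  CPU_COUNTS Counts = { 1, 1 };

  if (Tcg && !QemuFixed) {
    Counts.Present = 0;
  }

  return Counts;
}

/*
 * Model of the boot result for one row of the test matrix. The patched
 * firmware refuses to continue when no CPU is reported present.
 */
Outcome
ModelBoot (bool Tcg, bool QemuFixed, bool OvmfPatched)
{
  CPU_COUNTS Counts = ModelCounts (Tcg, QemuFixed);

  /* Invariant of the model: never more present CPUs than possible ones. */
  assert (Counts.Present <= Counts.Possible);

  if (Counts.Present == 0) {
    return OvmfPatched ? OutcomeBootHangs : OutcomeBrokenCounts;
  }

  return OutcomeCountsOk;
}
```

For example, ModelBoot (true, false, true) yields OutcomeBootHangs, which corresponds to the "1 0 1" row of the test results, while ModelBoot (true, false, false) yields OutcomeBrokenCounts, the "1 0 0" row.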