From: Cédric Le Goater
To: qemu-devel@nongnu.org
Cc: Alex Williamson, Amit Machhiwal, Cédric Le Goater
Subject: 
[PULL 04/50] vfio/spapr: Fix L2 crash with PCI device passthrough and memory > 128G
Date: Fri, 25 Apr 2025 10:45:57 +0200
Message-ID: <20250425084644.102196-5-clg@redhat.com>
In-Reply-To: <20250425084644.102196-1-clg@redhat.com>
References: <20250425084644.102196-1-clg@redhat.com>

From: Amit Machhiwal

An L2 KVM guest fails to boot inside a pSeries LPAR when booted with more
than 128 GB of memory and PCI device passthrough. The L2 guest also crashes
when it is booted with more than 128 GB of memory and a PCI device is
hotplugged later.

The issue arises from a conditional check for `levels > 1` in
`spapr_tce_create_table()` within L1 KVM.
This check is meant to prevent multi-level TCEs, which are not supported by
the PowerVM hypervisor. As a result, when QEMU makes a
`VFIO_IOMMU_SPAPR_TCE_CREATE` ioctl call with `levels > 1`, it triggers the
conditional check and returns `EINVAL`, causing the guest to crash with the
following errors:

 2025-03-04T06:36:36.133117Z qemu-system-ppc64: Failed to create a window, ret = -1 (Invalid argument)
 2025-03-04T06:36:36.133176Z qemu-system-ppc64: Failed to create SPAPR window: Invalid argument
 qemu: hardware error: vfio: DMA mapping failed, unable to continue

Fix this by checking the supported DDW "levels" returned by the
VFIO_IOMMU_SPAPR_TCE_GET_INFO ioctl before attempting the TCE create ioctl
in KVM.

The patch has been tested on KVM guests with memory configurations of up to
390 GB on PowerVM and up to 450 GB on bare-metal environments.

Signed-off-by: Amit Machhiwal
Reviewed-by: Cédric Le Goater
Link: https://lore.kernel.org/qemu-devel/20250408124042.2695955-3-amachhiw@linux.ibm.com
Signed-off-by: Cédric Le Goater
---
 hw/vfio/spapr.c | 36 +++++++++++++++++++++++++++---------
 1 file changed, 27 insertions(+), 9 deletions(-)

diff --git a/hw/vfio/spapr.c b/hw/vfio/spapr.c
index 3d6354134c3d517b810d28404f3e2a7eee9b1192..7e5cb95f6a48ff87a0fddd62f9c3c297b790049d 100644
--- a/hw/vfio/spapr.c
+++ b/hw/vfio/spapr.c
@@ -26,6 +26,7 @@ typedef struct VFIOSpaprContainer {
     VFIOContainer container;
     MemoryListener prereg_listener;
     QLIST_HEAD(, VFIOHostDMAWindow) hostwin_list;
+    unsigned int levels;
 } VFIOSpaprContainer;
 
 OBJECT_DECLARE_SIMPLE_TYPE(VFIOSpaprContainer, VFIO_IOMMU_SPAPR);
@@ -236,9 +237,11 @@ static bool vfio_spapr_create_window(VFIOContainer *container,
 {
     int ret = 0;
     VFIOContainerBase *bcontainer = &container->bcontainer;
+    VFIOSpaprContainer *scontainer = container_of(container, VFIOSpaprContainer,
+                                                  container);
     IOMMUMemoryRegion *iommu_mr = IOMMU_MEMORY_REGION(section->mr);
     uint64_t pagesize = memory_region_iommu_get_min_page_size(iommu_mr), pgmask;
-    unsigned entries, bits_total, bits_per_level, max_levels;
+    unsigned entries, bits_total, bits_per_level, max_levels, ddw_levels;
     struct vfio_iommu_spapr_tce_create create = { .argsz = sizeof(create) };
     long rampagesize = qemu_minrampagesize();
 
@@ -291,16 +294,29 @@ static bool vfio_spapr_create_window(VFIOContainer *container,
      */
     bits_per_level = ctz64(qemu_real_host_page_size()) + 8;
     create.levels = bits_total / bits_per_level;
-    if (bits_total % bits_per_level) {
-        ++create.levels;
-    }
-    max_levels = (64 - create.page_shift) / ctz64(qemu_real_host_page_size());
-    for ( ; create.levels <= max_levels; ++create.levels) {
-        ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
-        if (!ret) {
-            break;
+
+    ddw_levels = scontainer->levels;
+    if (ddw_levels > 1) {
+        if (bits_total % bits_per_level) {
+            ++create.levels;
         }
+        max_levels = (64 - create.page_shift) / ctz64(qemu_real_host_page_size());
+        for ( ; create.levels <= max_levels; ++create.levels) {
+            ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
+            if (!ret) {
+                break;
+            }
+        }
+    } else { /* ddw_levels == 1 */
+        if (create.levels > ddw_levels) {
+            error_setg_errno(errp, EINVAL, "Host doesn't support multi-level TCE tables"
+                             ". Use larger IO page size. Supported mask is 0x%lx",
+                             bcontainer->pgsizes);
+            return false;
+        }
+        ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_TCE_CREATE, &create);
     }
+
     if (ret) {
         error_setg_errno(errp, errno, "Failed to create a window, ret = %d", ret);
         return false;
@@ -501,6 +517,8 @@ static bool vfio_spapr_container_setup(VFIOContainerBase *bcontainer,
         goto listener_unregister_exit;
     }
 
+    scontainer->levels = info.ddw.levels;
+
     if (v2) {
         bcontainer->pgsizes = info.ddw.pgsizes;
         /*
-- 
2.49.0
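
For readers who want to trace the branch on the DDW levels by hand, the decision in the patched vfio_spapr_create_window() can be restated as a small standalone function. This is only a sketch: first_tce_levels() is a hypothetical name that does not exist in QEMU, bits_total and the host page shift are passed in as plain numbers instead of being derived from the window and host page size, and both the ioctl and the retry loop up to max_levels are left out.

```c
#include <stdio.h>

/*
 * Hypothetical helper, not part of QEMU. It mirrors the arithmetic of the
 * patch: it returns the first "levels" value the patched code would pass to
 * VFIO_IOMMU_SPAPR_TCE_CREATE, or -1 in the case where the patch reports
 * EINVAL without issuing the ioctl.
 *
 * bits_total      - number of TCE index bits (an input here, by assumption)
 * host_page_shift - ctz64() of the host page size, e.g. 16 for 64 KiB
 * ddw_levels      - levels reported by VFIO_IOMMU_SPAPR_TCE_GET_INFO
 */
static int first_tce_levels(unsigned bits_total, unsigned host_page_shift,
                            unsigned ddw_levels)
{
    unsigned bits_per_level = host_page_shift + 8;
    unsigned levels = bits_total / bits_per_level;

    if (ddw_levels > 1) {
        /* Multi-level host: round up, as the code did before the patch. */
        if (bits_total % bits_per_level) {
            ++levels;
        }
        return (int)levels;
    }

    /* Single-level host: refuse requests that would need more levels. */
    if (levels > ddw_levels) {
        return -1;
    }
    return (int)levels;
}
```

With a host page shift of 16, for instance, first_tce_levels(26, 16, 4) gives 2 and first_tce_levels(26, 16, 1) gives 1, while first_tce_levels(48, 16, 1) gives -1, which corresponds to the new "Host doesn't support multi-level TCE tables" error path.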