From: Raghavendra Rao Ananta
To: Oliver Upton, Marc Zyngier
Cc: Raghavendra Rao Anata, Mingwei Zhang, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Oliver Upton
Date: Thu, 13 Nov 2025 05:24:50 +0000
Message-ID: <20251113052452.975081-2-rananta@google.com>
In-Reply-To: <20251113052452.975081-1-rananta@google.com>
References: <20251113052452.975081-1-rananta@google.com>
Subject: [PATCH 1/3] KVM: arm64: Only drop references on empty tables in stage2_free_walker

From: Oliver Upton

A subsequent change to the way KVM frees stage-2s will invoke the free
walker on sub-ranges of the VM's IPA space, meaning there's potential
for only partially visiting a table's PTEs. Split the leaf and table
visitors and only drop references on a table when the page count
reaches 1, implying there are no valid PTEs that need to be visited.
Invalidate the table PTE to avoid traversing the stale reference.
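For illustration only (this invariant is implied by the existing stage-2
refcounting rather than spelled out in the diff): each counted PTE
installed in a table takes a reference on the page that holds it, and a
table page starts out with a single reference of its own, so the new
TABLE_POST visitor can treat a page count of 1 as proof that no live
PTEs remain below it:

	/* Sketch of the check the new TABLE_POST visitor performs. */
	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);

	if (mm_ops->page_count(childp) != 1)
		return 0;	/* live PTEs remain below this table */

	/* Only the table's self-reference is left; it is safe to free. */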
Signed-off-by: Oliver Upton
---
 arch/arm64/kvm/hyp/pgtable.c | 38 ++++++++++++++++++++++++++++++------
 1 file changed, 32 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index c351b4abd5dbf..6d6a23f7dedb6 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1535,20 +1535,46 @@ size_t kvm_pgtable_stage2_pgd_size(u64 vtcr)
 	return kvm_pgd_pages(ia_bits, start_level) * PAGE_SIZE;
 }
 
-static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
-			      enum kvm_pgtable_walk_flags visit)
+static int stage2_free_leaf(const struct kvm_pgtable_visit_ctx *ctx)
 {
 	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
 
-	if (!stage2_pte_is_counted(ctx->old))
+	mm_ops->put_page(ctx->ptep);
+	return 0;
+}
+
+static int stage2_free_table_post(const struct kvm_pgtable_visit_ctx *ctx)
+{
+	struct kvm_pgtable_mm_ops *mm_ops = ctx->mm_ops;
+	kvm_pte_t *childp = kvm_pte_follow(ctx->old, mm_ops);
+
+	if (mm_ops->page_count(childp) != 1)
 		return 0;
 
+	/*
+	 * Drop references and clear the now stale PTE to avoid rewalking the
+	 * freed page table.
+	 */
 	mm_ops->put_page(ctx->ptep);
+	mm_ops->put_page(childp);
+	kvm_clear_pte(ctx->ptep);
+	return 0;
+}
 
-	if (kvm_pte_table(ctx->old, ctx->level))
-		mm_ops->put_page(kvm_pte_follow(ctx->old, mm_ops));
+static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
+			      enum kvm_pgtable_walk_flags visit)
+{
+	if (!stage2_pte_is_counted(ctx->old))
+		return 0;
 
-	return 0;
+	switch (visit) {
+	case KVM_PGTABLE_WALK_LEAF:
+		return stage2_free_leaf(ctx);
+	case KVM_PGTABLE_WALK_TABLE_POST:
+		return stage2_free_table_post(ctx);
+	default:
+		return -EINVAL;
+	}
 }
 
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
-- 
2.51.2.1041.gc1ab5b90ca-goog
From: Raghavendra Rao Ananta
To: Oliver Upton, Marc Zyngier
Cc: Raghavendra Rao Anata, Mingwei Zhang, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Oliver Upton
Date: Thu, 13 Nov 2025 05:24:51 +0000
Message-ID: <20251113052452.975081-3-rananta@google.com>
In-Reply-To: <20251113052452.975081-1-rananta@google.com>
References: <20251113052452.975081-1-rananta@google.com>
Subject: [PATCH 2/3] KVM: arm64: Split kvm_pgtable_stage2_destroy()

Split kvm_pgtable_stage2_destroy() into two:

- kvm_pgtable_stage2_destroy_range(), which performs the page-table walk
  and frees the entries over a range of addresses.
- kvm_pgtable_stage2_destroy_pgd(), which frees the PGD.

This refactoring enables subsequent patches to free large page-tables in
chunks, calling cond_resched() between each chunk, to yield the CPU as
necessary. Existing callers of kvm_pgtable_stage2_destroy() that probably
cannot take advantage of this (such as nVHE) will continue to function
as is.
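For illustration only (not part of the diff below), a caller that owns an
unlinked stage-2 table is expected to invoke the two halves back to back;
the kvm_pgtable_stage2_destroy() wrapper kept by this patch does exactly
that, and patch 3/3 applies the same split in kvm_stage2_destroy(), chunking
the range walk:

	/* Sketch: intended use of the split API. */
	kvm_pgtable_stage2_destroy_range(pgt, 0, BIT(pgt->ia_bits));
	kvm_pgtable_stage2_destroy_pgd(pgt);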
Signed-off-by: Raghavendra Rao Ananta
Suggested-by: Oliver Upton
Link: https://lore.kernel.org/r/20250820162242.2624752-2-rananta@google.com
Signed-off-by: Oliver Upton
---
 arch/arm64/include/asm/kvm_pgtable.h | 30 ++++++++++++++++++++++++++++
 arch/arm64/include/asm/kvm_pkvm.h    |  4 +++-
 arch/arm64/kvm/hyp/pgtable.c         | 25 +++++++++++++++++++----
 arch/arm64/kvm/mmu.c                 | 12 +++++++++--
 arch/arm64/kvm/pkvm.c                | 11 ++++++++--
 5 files changed, 73 insertions(+), 9 deletions(-)

diff --git a/arch/arm64/include/asm/kvm_pgtable.h b/arch/arm64/include/asm/kvm_pgtable.h
index 2888b5d037573..1246216616b51 100644
--- a/arch/arm64/include/asm/kvm_pgtable.h
+++ b/arch/arm64/include/asm/kvm_pgtable.h
@@ -355,6 +355,11 @@ static inline kvm_pte_t *kvm_dereference_pteref(struct kvm_pgtable_walker *walke
 	return pteref;
 }
 
+static inline kvm_pte_t *kvm_dereference_pteref_raw(kvm_pteref_t pteref)
+{
+	return pteref;
+}
+
 static inline int kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
 {
 	/*
@@ -384,6 +389,11 @@ static inline kvm_pte_t *kvm_dereference_pteref(struct kvm_pgtable_walker *walke
 	return rcu_dereference_check(pteref, !(walker->flags & KVM_PGTABLE_WALK_SHARED));
 }
 
+static inline kvm_pte_t *kvm_dereference_pteref_raw(kvm_pteref_t pteref)
+{
+	return rcu_dereference_raw(pteref);
+}
+
 static inline int kvm_pgtable_walk_begin(struct kvm_pgtable_walker *walker)
 {
 	if (walker->flags & KVM_PGTABLE_WALK_SHARED)
@@ -551,6 +561,26 @@ static inline int kvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2
  */
 void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
 
+/**
+ * kvm_pgtable_stage2_destroy_range() - Destroy the unlinked range of addresses.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ * @addr:	Intermediate physical address at which to place the mapping.
+ * @size:	Size of the mapping.
+ *
+ * The page-table is assumed to be unreachable by any hardware walkers prior
+ * to freeing and therefore no TLB invalidation is performed.
+ */
+void kvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
+				      u64 addr, u64 size);
+
+/**
+ * kvm_pgtable_stage2_destroy_pgd() - Destroy the PGD of guest stage-2 page-table.
+ * @pgt:	Page-table structure initialised by kvm_pgtable_stage2_init*().
+ *
+ * It is assumed that the rest of the page-table is freed before this operation.
+ */
+void kvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt);
+
 /**
  * kvm_pgtable_stage2_free_unlinked() - Free an unlinked stage-2 paging structure.
  * @mm_ops:	Memory management callbacks.
diff --git a/arch/arm64/include/asm/kvm_pkvm.h b/arch/arm64/include/asm/kvm_pkvm.h
index 08be89c95466e..0aecd4ac5f45d 100644
--- a/arch/arm64/include/asm/kvm_pkvm.h
+++ b/arch/arm64/include/asm/kvm_pkvm.h
@@ -180,7 +180,9 @@ struct pkvm_mapping {
 
 int pkvm_pgtable_stage2_init(struct kvm_pgtable *pgt, struct kvm_s2_mmu *mmu,
 			     struct kvm_pgtable_mm_ops *mm_ops);
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt);
+void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
+				       u64 addr, u64 size);
+void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt);
 int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size, u64 phys,
 			    enum kvm_pgtable_prot prot, void *mc,
 			    enum kvm_pgtable_walk_flags flags);
diff --git a/arch/arm64/kvm/hyp/pgtable.c b/arch/arm64/kvm/hyp/pgtable.c
index 6d6a23f7dedb6..0882896dbf8f2 100644
--- a/arch/arm64/kvm/hyp/pgtable.c
+++ b/arch/arm64/kvm/hyp/pgtable.c
@@ -1577,21 +1577,38 @@ static int stage2_free_walker(const struct kvm_pgtable_visit_ctx *ctx,
 	}
 }
 
-void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+void kvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
+				      u64 addr, u64 size)
 {
-	size_t pgd_sz;
 	struct kvm_pgtable_walker walker = {
 		.cb	= stage2_free_walker,
 		.flags	= KVM_PGTABLE_WALK_LEAF |
 			  KVM_PGTABLE_WALK_TABLE_POST,
 	};
 
-	WARN_ON(kvm_pgtable_walk(pgt, 0, BIT(pgt->ia_bits), &walker));
+	WARN_ON(kvm_pgtable_walk(pgt, addr, size, &walker));
+}
+
+void kvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
+{
+	size_t pgd_sz;
+
 	pgd_sz = kvm_pgd_pages(pgt->ia_bits, pgt->start_level) * PAGE_SIZE;
-	pgt->mm_ops->free_pages_exact(kvm_dereference_pteref(&walker, pgt->pgd), pgd_sz);
+
+	/*
+	 * Since the pgtable is unlinked at this point, and not shared with
+	 * other walkers, safely dereference pgd with kvm_dereference_pteref_raw()
+	 */
+	pgt->mm_ops->free_pages_exact(kvm_dereference_pteref_raw(pgt->pgd), pgd_sz);
 	pgt->pgd = NULL;
 }
 
+void kvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	kvm_pgtable_stage2_destroy_range(pgt, 0, BIT(pgt->ia_bits));
+	kvm_pgtable_stage2_destroy_pgd(pgt);
+}
+
 void kvm_pgtable_stage2_free_unlinked(struct kvm_pgtable_mm_ops *mm_ops, void *pgtable, s8 level)
 {
 	kvm_pteref_t ptep = (kvm_pteref_t)pgtable;
diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index 7cc964af8d305..c2bc1eba032cd 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -904,6 +904,14 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 	return 0;
 }
 
+static void kvm_stage2_destroy(struct kvm_pgtable *pgt)
+{
+	unsigned int ia_bits = VTCR_EL2_IPA(pgt->mmu->vtcr);
+
+	KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, 0, BIT(ia_bits));
+	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
+}
+
 /**
  * kvm_init_stage2_mmu - Initialise a S2 MMU structure
  * @kvm:	The pointer to the KVM structure
@@ -980,7 +988,7 @@ int kvm_init_stage2_mmu(struct kvm *kvm, struct kvm_s2_mmu *mmu, unsigned long t
 	return 0;
 
 out_destroy_pgtable:
-	KVM_PGT_FN(kvm_pgtable_stage2_destroy)(pgt);
+	kvm_stage2_destroy(pgt);
 out_free_pgtable:
 	kfree(pgt);
 	return err;
@@ -1081,7 +1089,7 @@ void kvm_free_stage2_pgd(struct kvm_s2_mmu *mmu)
 	write_unlock(&kvm->mmu_lock);
 
 	if (pgt) {
-		KVM_PGT_FN(kvm_pgtable_stage2_destroy)(pgt);
+		kvm_stage2_destroy(pgt);
 		kfree(pgt);
 	}
 }
diff --git a/arch/arm64/kvm/pkvm.c b/arch/arm64/kvm/pkvm.c
index 24f0f8a8c943c..d7a0f69a99821 100644
--- a/arch/arm64/kvm/pkvm.c
+++ b/arch/arm64/kvm/pkvm.c
@@ -344,9 +344,16 @@ static int __pkvm_pgtable_stage2_unmap(struct kvm_pgtable *pgt, u64 start, u64 e
 	return 0;
 }
 
-void pkvm_pgtable_stage2_destroy(struct kvm_pgtable *pgt)
+void pkvm_pgtable_stage2_destroy_range(struct kvm_pgtable *pgt,
+				       u64 addr, u64 size)
 {
-	__pkvm_pgtable_stage2_unmap(pgt, 0, ~(0ULL));
+	__pkvm_pgtable_stage2_unmap(pgt, addr, addr + size);
+}
+
+void pkvm_pgtable_stage2_destroy_pgd(struct kvm_pgtable *pgt)
+{
+	/* Expected to be called after all pKVM mappings have been released. */
+	WARN_ON_ONCE(!RB_EMPTY_ROOT(&pgt->pkvm_mappings.rb_root));
 }
 
 int pkvm_pgtable_stage2_map(struct kvm_pgtable *pgt, u64 addr, u64 size,
-- 
2.51.2.1041.gc1ab5b90ca-goog
From: Raghavendra Rao Ananta
To: Oliver Upton, Marc Zyngier
Cc: Raghavendra Rao Anata, Mingwei Zhang, linux-arm-kernel@lists.infradead.org,
    kvmarm@lists.linux.dev, linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
    Oliver Upton
Date: Thu, 13 Nov 2025 05:24:52 +0000
Message-ID: <20251113052452.975081-4-rananta@google.com>
In-Reply-To: <20251113052452.975081-1-rananta@google.com>
References: <20251113052452.975081-1-rananta@google.com>
Subject: [PATCH 3/3] KVM: arm64: Reschedule as needed when destroying the stage-2 page-tables

When a large VM, specifically one that holds a significant number of PTEs,
gets abruptly destroyed, the following warning is seen during the
page-table walk:

 sched: CPU 0 need_resched set for > 100018840 ns (100 ticks) without schedule
 CPU: 0 UID: 0 PID: 9617 Comm: kvm_page_table_ Tainted: G O 6.16.0-smp-DEV #3 NONE
 Tainted: [O]=OOT_MODULE
 Call trace:
  show_stack+0x20/0x38 (C)
  dump_stack_lvl+0x3c/0xb8
  dump_stack+0x18/0x30
  resched_latency_warn+0x7c/0x88
  sched_tick+0x1c4/0x268
  update_process_times+0xa8/0xd8
  tick_nohz_handler+0xc8/0x168
  __hrtimer_run_queues+0x11c/0x338
  hrtimer_interrupt+0x104/0x308
  arch_timer_handler_phys+0x40/0x58
  handle_percpu_devid_irq+0x8c/0x1b0
  generic_handle_domain_irq+0x48/0x78
  gic_handle_irq+0x1b8/0x408
  call_on_irq_stack+0x24/0x30
  do_interrupt_handler+0x54/0x78
  el1_interrupt+0x44/0x88
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x84/0x88
  stage2_free_walker+0x30/0xa0 (P)
  __kvm_pgtable_walk+0x11c/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  __kvm_pgtable_walk+0x180/0x258
  kvm_pgtable_walk+0xc4/0x140
  kvm_pgtable_stage2_destroy+0x5c/0xf0
  kvm_free_stage2_pgd+0x6c/0xe8
  kvm_uninit_stage2_mmu+0x24/0x48
  kvm_arch_flush_shadow_all+0x80/0xa0
  kvm_mmu_notifier_release+0x38/0x78
  __mmu_notifier_release+0x15c/0x250
  exit_mmap+0x68/0x400
  __mmput+0x38/0x1c8
  mmput+0x30/0x68
  exit_mm+0xd4/0x198
  do_exit+0x1a4/0xb00
  do_group_exit+0x8c/0x120
  get_signal+0x6d4/0x778
  do_signal+0x90/0x718
  do_notify_resume+0x70/0x170
  el0_svc+0x74/0xd8
  el0t_64_sync_handler+0x60/0xc8
  el0t_64_sync+0x1b0/0x1b8

The warning is mostly seen on host kernels that are configured not to
force-preempt, such as CONFIG_PREEMPT_NONE=y.
To avoid this, instead of walking the entire page-table in one go, split
it into smaller ranges and call cond_resched() between each range. Since
the path is executed during VM destruction, after the page-table
structure is unlinked from the KVM MMU, relying on
cond_resched_rwlock_write() isn't necessary.

Signed-off-by: Raghavendra Rao Ananta
Suggested-by: Oliver Upton
Link: https://lore.kernel.org/r/20250820162242.2624752-3-rananta@google.com
Signed-off-by: Oliver Upton
---
 arch/arm64/kvm/mmu.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kvm/mmu.c b/arch/arm64/kvm/mmu.c
index c2bc1eba032cd..f86d17ad50a7f 100644
--- a/arch/arm64/kvm/mmu.c
+++ b/arch/arm64/kvm/mmu.c
@@ -904,11 +904,35 @@ static int kvm_init_ipa_range(struct kvm_s2_mmu *mmu, unsigned long type)
 	return 0;
 }
 
+/*
+ * Assume that @pgt is valid and unlinked from the KVM MMU to free the
+ * page-table without taking the kvm_mmu_lock and without performing any
+ * TLB invalidations.
+ *
+ * Also, the range of addresses can be large enough to cause need_resched
+ * warnings, for instance on CONFIG_PREEMPT_NONE kernels. Hence, invoke
+ * cond_resched() periodically to prevent hogging the CPU for a long time
+ * and schedule something else, if required.
+ */
+static void stage2_destroy_range(struct kvm_pgtable *pgt, phys_addr_t addr,
+				 phys_addr_t end)
+{
+	u64 next;
+
+	do {
+		next = stage2_range_addr_end(addr, end);
+		KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, addr,
+							     next - addr);
+		if (next != end)
+			cond_resched();
+	} while (addr = next, addr != end);
+}
+
 static void kvm_stage2_destroy(struct kvm_pgtable *pgt)
 {
 	unsigned int ia_bits = VTCR_EL2_IPA(pgt->mmu->vtcr);
 
-	KVM_PGT_FN(kvm_pgtable_stage2_destroy_range)(pgt, 0, BIT(ia_bits));
+	stage2_destroy_range(pgt, 0, BIT(ia_bits));
 	KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt);
 }
 
-- 
2.51.2.1041.gc1ab5b90ca-goog
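Taken together (illustration only, assembled from the hunks above rather
than quoted from any single file), the teardown path after this series
looks roughly like:

	kvm_free_stage2_pgd(mmu)
	  -> kvm_stage2_destroy(pgt)
	       -> stage2_destroy_range(pgt, 0, BIT(ia_bits))
	          /* chunked walk, cond_resched() between chunks */
	       -> KVM_PGT_FN(kvm_pgtable_stage2_destroy_pgd)(pgt)
	          /* the PGD is freed only after the rest of the table */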