From nobody Thu Apr 2 12:33:05 2026
Date: Thu, 26 Mar 2026 15:24:20 -0700
In-Reply-To: <20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com>
X-Mailing-List: linux-kernel@vger.kernel.org
Mime-Version: 1.0
References: <20260326-gmem-inplace-conversion-v4-0-e202fe950ffd@google.com>
X-Mailer: b4 0.14.3
Message-ID: <20260326-gmem-inplace-conversion-v4-11-e202fe950ffd@google.com>
Subject: [PATCH RFC v4 11/44] KVM: guest_memfd: Handle lru_add fbatch refcounts during conversion safety check
From: Ackerley Tng <ackerleytng@google.com>
To: aik@amd.com, andrew.jones@linux.dev, binbin.wu@linux.intel.com,
 brauner@kernel.org, chao.p.peng@linux.intel.com, david@kernel.org,
 ira.weiny@intel.com, jmattson@google.com, jroedel@suse.de,
 jthoughton@google.com, michael.roth@amd.com, oupton@kernel.org,
 pankaj.gupta@amd.com, qperret@google.com, rick.p.edgecombe@intel.com,
 rientjes@google.com, shivankg@amd.com, steven.price@arm.com,
 tabba@google.com, willy@infradead.org, wyihan@google.com,
 yan.y.zhao@intel.com, forkloop@google.com, pratyush@kernel.org,
 suzuki.poulose@arm.com, aneesh.kumar@kernel.org, Paolo Bonzini,
 Sean Christopherson, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
 Dave Hansen, x86@kernel.org, "H. Peter Anvin", Steven Rostedt,
 Masami Hiramatsu, Mathieu Desnoyers, Jonathan Corbet, Shuah Khan,
 Shuah Khan, Vishal Annapurve, Andrew Morton, Chris Li, Kairui Song,
 Kemeng Shi, Nhat Pham, Baoquan He, Barry Song, Axel Rasmussen,
 Yuanchu Xie, Wei Xu, Jason Gunthorpe, Vlastimil Babka
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
 linux-trace-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org,
 Ackerley Tng <ackerleytng@google.com>
Content-Type: text/plain; charset="utf-8"

When checking whether a guest_memfd folio is safe for conversion, its
refcount is examined. A folio may be present in a per-CPU lru_add
fbatch, which temporarily raises its refcount. This can produce a false
positive: the folio appears to be in use and the conversion is refused
even though it is otherwise safe.

The conversion process might not be running on the CPU that holds the
folio in its fbatch, so a simple per-CPU drain is insufficient. To
address this, drain all CPUs' lru_add fbatches if an unexpectedly high
refcount is encountered during the safety check. This is done at most
once per conversion request, and only if the folio in question may be
lru cached. guest_memfd folios are unevictable, so the lru_add fbatch
is the only fbatch they can reside in.

If the folio's refcount is still unexpected after draining, the
conversion is deemed truly unsafe.
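For illustration only (not part of the patch), the drain-once retry
described above can be modeled in plain userspace C. All names and the
single-folio state below are invented stand-ins, not kernel APIs:

```c
#include <stdbool.h>

/* Stand-in state: one folio whose refcount is temporarily elevated
 * because some CPU's lru_add fbatch still holds a reference to it.
 * The expected refcount for a safe, idle folio is 1 in this model. */
static int folio_refcount = 2;
static bool folio_in_lru_add_fbatch = true;

/* Models lru_add_drain_all(): flushing every CPU's lru_add fbatch
 * drops the transient reference the batch held on the folio. */
static void drain_all_lru_add_fbatches(void)
{
	if (folio_in_lru_add_fbatch) {
		folio_in_lru_add_fbatch = false;
		folio_refcount--;
	}
}

/* Drain-once retry: an unexpectedly high refcount triggers at most
 * one system-wide drain; only if the refcount is still elevated
 * afterwards is the folio declared unsafe to convert. */
static bool is_safe_for_conversion(int expected_refcount)
{
	bool lru_drained = false;

	for (;;) {
		if (folio_refcount == expected_refcount)
			return true;
		if (lru_drained)
			return false; /* still elevated after draining */
		drain_all_lru_add_fbatches();
		lru_drained = true;
	}
}
```

The transient fbatch reference alone does not fail the check (the drain
removes it and the re-check passes), while a genuinely extra reference
still fails after the single drain.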
Signed-off-by: Ackerley Tng <ackerleytng@google.com>
---
 mm/swap.c              |  2 ++
 virt/kvm/guest_memfd.c | 23 +++++++++++++++++------
 2 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/mm/swap.c b/mm/swap.c
index bb19ccbece464..4861661c71fab 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -37,6 +37,7 @@
 #include
 #include
 #include
+#include
 
 #include "internal.h"
 
@@ -898,6 +899,7 @@ void lru_add_drain_all(void)
 	lru_add_drain();
 }
 #endif /* CONFIG_SMP */
+EXPORT_SYMBOL_FOR_KVM(lru_add_drain_all);
 
 atomic_t lru_disable_count = ATOMIC_INIT(0);
 
diff --git a/virt/kvm/guest_memfd.c b/virt/kvm/guest_memfd.c
index 0cff9a85a4c53..20a09d9bbcd2b 100644
--- a/virt/kvm/guest_memfd.c
+++ b/virt/kvm/guest_memfd.c
@@ -8,6 +8,7 @@
 #include
 #include
 #include
+#include
 
 #include "kvm_mm.h"
 
@@ -571,25 +572,35 @@ static bool kvm_gmem_range_has_attributes(struct maple_tree *mt,
 	return true;
 }
 
-static bool kvm_gmem_is_safe_for_conversion(struct inode *inode, pgoff_t start,
-					    size_t nr_pages, pgoff_t *err_index)
+static bool kvm_gmem_is_safe_for_conversion(struct inode *inode,
+					    pgoff_t start, size_t nr_pages,
+					    pgoff_t *err_index)
 {
 	struct address_space *mapping = inode->i_mapping;
 	const int filemap_get_folios_refcount = 1;
 	pgoff_t last = start + nr_pages - 1;
 	struct folio_batch fbatch;
+	bool lru_drained = false;
 	bool safe = true;
 	int i;
 
 	folio_batch_init(&fbatch);
 	while (safe && filemap_get_folios(mapping, &start, last, &fbatch)) {
 
-		for (i = 0; i < folio_batch_count(&fbatch); ++i) {
+		for (i = 0; i < folio_batch_count(&fbatch);) {
 			struct folio *folio = fbatch.folios[i];
 
-			if (folio_ref_count(folio) !=
-			    folio_nr_pages(folio) + filemap_get_folios_refcount) {
-				safe = false;
+			safe = (folio_ref_count(folio) ==
+				folio_nr_pages(folio) +
+				filemap_get_folios_refcount);
+
+			if (safe) {
+				++i;
+			} else if (folio_may_be_lru_cached(folio) &&
+				   !lru_drained) {
+				lru_add_drain_all();
+				lru_drained = true;
+			} else {
 				*err_index = folio->index;
 				break;
 			}
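A note on the loop restructuring in the diff above: the index now
advances only when the current folio passes the check, so that after a
drain the same folio is re-examined rather than skipped. A hypothetical
userspace sketch of that shape (the array, counters, and names are
invented for illustration, not kernel code):

```c
#include <stdbool.h>
#include <stddef.h>

#define NFOLIOS 4

/* Hypothetical per-folio refcounts; index 2 carries a transient extra
 * reference that a one-time "drain" would remove. */
static int refcounts[NFOLIOS] = { 1, 1, 2, 1 };
static int drain_calls;

static void drain(void)
{
	drain_calls++;
	refcounts[2] = 1; /* the transient reference goes away */
}

/* Mirrors the patch's loop shape: i advances only when the current
 * element is safe; an unsafe element triggers at most one drain and
 * is then re-checked at the same index before being reported. */
static bool all_safe(size_t *err_index)
{
	bool drained = false;
	size_t i;

	for (i = 0; i < NFOLIOS;) {
		if (refcounts[i] == 1) {
			++i;
		} else if (!drained) {
			drain();
			drained = true;
		} else {
			*err_index = i;
			return false;
		}
	}
	return true;
}
```

With the transient reference at index 2, a single drain suffices and
the whole range passes; a refcount that stays elevated after the drain
is reported through `err_index`, matching the `*err_index =
folio->index` path in the patch.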
-- 
2.53.0.1018.g2bb0e51243-goog