From nobody Fri Jun 12 22:34:20 2026
Received: from mail-pg1-f196.google.com (mail-pg1-f196.google.com [209.85.215.196])
        (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits))
        (No client certificate requested)
        by smtp.subspace.kernel.org (Postfix) with ESMTPS id EFF134BC016
        for ; Tue, 12 May 2026 09:50:39 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.215.196
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
        t=1778579441; cv=none;
        b=FZ86S4gFQDy4WGmI158kWTc0UovupESaqgLfS3aOJ48MRT7TtIJJVt3J6+OtceyStjUTTq1APxX6fwV/u70u9wMyCO8wCfWYSjphtrGGjyRZdJZ1TaElj56nWOPOOD3KOKLmZNwNZi1R/KPvyyZYAUbo/UjZgxv4abJZhbP5XSA=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
        s=arc-20240116; t=1778579441; c=relaxed/simple;
        bh=sPvxdPqOZuNkZuZmOXE5cmQytAyQZtJcVovs1HFhHEM=;
        h=From:To:Cc:Subject:Date:Message-ID:MIME-Version;
        b=XKQf4r5DAPi+TFJ33rQ9qMt3zttL2TcoMg4Hol3TAtLKV24ip6TIBFtA6t4TSHoQBCNBUkNK/OLFivhnKYFs39E7pn+iQ9Vkr2cpEVYWnx7zV5ueXJIaIhThfuykBewiS0FcOPlgYVfk4Xq2wVtVZes5+ywoYkEC16JhAoZy9n4=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
        dmarc=pass (p=none dis=none) header.from=gmail.com;
        spf=pass smtp.mailfrom=gmail.com;
        dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=gKAMo3lC;
        arc=none smtp.client-ip=209.85.215.196
Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com
Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com
Authentication-Results: smtp.subspace.kernel.org;
        dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="gKAMo3lC"
Received: by mail-pg1-f196.google.com with SMTP id 41be03b00d2f7-c7ffe8eeaf2so2185704a12.0
        for ; Tue, 12 May 2026 02:50:39 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20251104; t=1778579439; x=1779184239; darn=vger.kernel.org;
        h=content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:from:to:cc:subject:date:message-id:reply-to;
        bh=ONOFtYpxI6P6xc2JJjMFTYUnZhNOQ0Ff9/nUOO77MMQ=;
        b=gKAMo3lCDrjOpYcFlAeKBInQYw+Vq5mSewup0Sa+7eidqyfADG4LpMq9jtPh9vydv1
         89R2dXQ8WHZWRmg2zRd5sR9/Ruo6eFT/QIFLjVuM9nfuZPFYtPe8FZQnSh9tm60JJDus
         6IbmtLdRRmCPsO5QGCbweJLDBLcQR12UUlssfvyrn1OSn/MdTtc48exTRpPfKcncx1RM
         7zk626VutYXyXC3QJmgZa6vBs4khu9wWShq26szw4ZZiKIG91lSSJP6Y0ocT4xNC+9qY
         Q/GvGuuxc9Yv7cHZOpliB1KoeXwyBz05n+hbMVMi54xAKNVvL2/0Tlse9Ruf0WUuZgTh
         wttQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20251104; t=1778579439; x=1779184239;
        h=content-transfer-encoding:mime-version:message-id:date:subject:cc
         :to:from:x-gm-gg:x-gm-message-state:from:to:cc:subject:date
         :message-id:reply-to;
        bh=ONOFtYpxI6P6xc2JJjMFTYUnZhNOQ0Ff9/nUOO77MMQ=;
        b=A9JBR12MshqUdjUkasVqBNNe6YmhNOMbUeZSPFHGcQh3y3s/VY5P+5fVVzAyAc/4pH
         YdouWi4HlI/WvT5IiaFOgjnMPxG7MORzT/aewyxyr6IgQceUNyiRFN0qBGsJHZEnYF7A
         p/8xqBYviRGzpRzKoQ6ANnvk4LpP19PSMv3VPHvCj5UnvXetxUwn8kK1I4YrK4/kND8h
         NNiX1zLUfMed4RhkIOWYT23elPVsXZb6yB3+tuirGYJxv5zYMHsnauJJrQ8EWGmwyJgM
         +f+VoMdlVbNGCRJI1/9q5tF1YuyEcZ1OmcsmF1uQHPFZbwkvL7OWYSKbW7E07+MtLjSu
         UL7Q==
X-Forwarded-Encrypted: i=1; AFNElJ85RxoK/bRyYXv1fWXYE+nhIpX17F3XWleDZZjCItJP6CgLNZHEceYBpr8glpH093vQbKrEs6pM0ov6PS8=@vger.kernel.org
X-Gm-Message-State: AOJu0YwF7V/K1I6Db+ZuQ6aXchKl9dlhSYKXLHgUshOn0VKnhuv9dz6i
        cLVT3WoYAmXr06V7Wl0fBHgrzlSvj8bM2fx+Fr8gYNt+47VFhKAUXvs5
X-Gm-Gg: Acq92OGc8X/UtTgvQNHz2VXeN7DUNjYyF1Jcrr8wb1SfeaZDRWS562TsqfgEmBKMIca
        qSWti2f2oi0LLIO/GAUwG4ECbcCnIxTYMojM2t5436z1ywj5byc4GCbolcTPX81/3CyqlO6u3pw
        FCQKStHDO1cNOpyPgZtX4d0BkBa/kcJWy/qduzaWuGfTTg6I8nOKNHFnifnZfx6BvL9BuGWnEQ2
        Iu9Ma09285TXeiJtuBjm67XbHF55FbffDYY+fqT5RZLOmyIdYK2JUv9vufdcmXPgIa2kn3jU7tj
        55LMyRIELy53WpECrNSdyMTtJHTdVe2Mo/9rHzipfBebnXdRSFG9Nn4Wo0YBUOsupYTXPN2JAwF
        890fye4bN1MUsjFz61pcQ6qkd9kibHCthXm32rQO7F2BOjoAQj/VHjC8poZ5xXLPpAFfBKoAtB3
        /Z745YAPUrf8tYLm29ab3wKSutDcOQQFE9yjQ+rw==
X-Received: by 2002:a05:6a20:9185:b0:39c:cdb:5d78 with SMTP id
        adf61e73a8af0-3aa5ab713c6mr29960557637.36.1778579439147;
        Tue, 12 May 2026 02:50:39 -0700 (PDT)
Received: from intel.company.local ([210.184.73.204])
        by smtp.gmail.com with ESMTPSA id d2e1a72fcca58-83965945101sm22238484b3a.13.2026.05.12.02.50.34
        (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256);
        Tue, 12 May 2026 02:50:38 -0700 (PDT)
From: Wandun Chen
X-Google-Original-From: Wandun Chen
To: linux-mm@kvack.org,
        linux-kernel@vger.kernel.org
Cc: akpm@linux-foundation.org,
        david@kernel.org,
        ljs@kernel.org,
        liam@infradead.org,
        vbabka@kernel.org,
        rppt@kernel.org,
        surenb@google.com,
        mhocko@suse.com
Subject: [PATCH] mm/memory: avoid unnecessary #PF on mTHP allocation race
Date: Tue, 12 May 2026 17:50:31 +0800
Message-ID: <20260512095031.1333997-1-chenwandun@lixiang.com>
X-Mailer: git-send-email 2.43.0
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain; charset="utf-8"

When an mTHP folio is allocated in do_anonymous_page() and the target
PTE range is no longer fully empty, the current code releases the folio
and returns. This makes the page fault look handled even though the PTE
for vmf->address itself may still be pte_none(), so the same access
immediately triggers another page fault.

The race looks like this. Take a 64KB mTHP as an example, with two
threads of the same process, a 4KB base page size,
range R = [X, X + 64KB) and X < Y < X + 64KB:

  CPU 0 (writer, faults at X)       CPU 1 (reader, faults at Y)
  --------------------------------  -----------------------------
  do_anonymous_page()               do_anonymous_page()
    alloc_anon_folio()
      pte_range_none(R) --> true
      vma_alloc_folio() --> 64KB
                                      pte_offset_map_lock(Y)
                                      install zero_pfn PTE at Y
                                      pte_unmap_unlock()
    pte_offset_map_lock(X)
    pte_range_none(R) -> false, Y is populated
    /* but pte at X is still none */
    goto release
    return 0

To avoid this, check whether vmf->address itself has been mapped. If it
has not, drop the folio and retry from alloc_anon_folio(). On retry,
alloc_anon_folio() re-checks pte_range_none() and falls back to a
smaller order, so the retry does not loop indefinitely.

Signed-off-by: Wandun Chen
---
Reproducer (not included in the patch, available on request):
two threads hammer the same 64K mTHP range, writer at offset 0,
reader at offset 32K, per-round barrier, 1024 rounds.

Minor faults before: writer=1951 reader=973 (927 extra faults)
Minor faults after:  writer=1024 reader=1022

I'm not sure how often this situation occurs in real workloads.
---
 mm/memory.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 0c9d9c2cbf0e..104f5be1de36 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5339,10 +5339,12 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
 	unsigned long addr = vmf->address;
+	unsigned long fault_offset;
 	struct folio *folio;
 	vm_fault_t ret = 0;
 	int nr_pages;
 	pte_t entry;
+	bool should_retry = false;
 
 	/* File mapping without ->vm_ops ? */
 	if (vma->vm_flags & VM_SHARED)
@@ -5389,6 +5391,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	ret = vmf_anon_prepare(vmf);
 	if (ret)
 		return ret;
+retry:
 	/* Returns NULL on OOM or ERR_PTR(-EAGAIN) if we must retry the fault */
 	folio = alloc_anon_folio(vmf);
 	if (IS_ERR(folio))
@@ -5413,14 +5416,26 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 		update_mmu_tlb(vma, addr, vmf->pte);
 		goto release;
 	} else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
-		update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
-		goto release;
+		fault_offset = (vmf->address - addr) >> PAGE_SHIFT;
+		if (!pte_none(ptep_get(vmf->pte + fault_offset))) {
+			update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
+			goto release;
+		}
+
+		should_retry = true;
 	}
 
 	ret = check_stable_address_space(vma->vm_mm);
 	if (ret)
 		goto release;
 
+	if (should_retry) {
+		pte_unmap_unlock(vmf->pte, vmf->ptl);
+		folio_put(folio);
+		should_retry = false;
+		goto retry;
+	}
+
 	/* Deliver the page fault to userland, check inside PT lock */
 	if (userfaultfd_missing(vma)) {
 		pte_unmap_unlock(vmf->pte, vmf->ptl);
-- 
2.43.0
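
For readers who want to reason about the new branch without a kernel
tree at hand, the decision it makes can be modelled in plain userspace
C. This is only a sketch under stated assumptions, not kernel code: the
enum, the model_* helpers and MODEL_PAGE_SHIFT are hypothetical names
invented here, the PTE range is reduced to an array of "populated"
flags, and a 4KB base page is assumed as in the example in the commit
message. The offset arithmetic mirrors the fault_offset computation in
the hunk above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
#define MODEL_PAGE_SHIFT 12	/* assumes 4KB base pages */

enum fault_action {
	MAP_FOLIO,	/* whole range still empty: map the large folio */
	RELEASE_FOLIO,	/* faulting PTE already populated: nothing left to do */
	RETRY_SMALLER,	/* a neighbour raced in, faulting PTE still empty */
};

/* Models pte_range_none(): true only if every entry in the range is empty. */
static bool model_range_none(const bool *populated, size_t nr_pages)
{
	for (size_t i = 0; i < nr_pages; i++)
		if (populated[i])
			return false;
	return true;
}

/*
 * Models the patched check for nr_pages > 1. range_start plays the role
 * of the folio-aligned addr, fault_addr the role of vmf->address.
 */
static enum fault_action model_decide(const bool *populated, size_t nr_pages,
				      unsigned long range_start,
				      unsigned long fault_addr)
{
	size_t fault_offset = (fault_addr - range_start) >> MODEL_PAGE_SHIFT;

	if (model_range_none(populated, nr_pages))
		return MAP_FOLIO;
	if (populated[fault_offset])
		return RELEASE_FOLIO;
	return RETRY_SMALLER;
}

/* Replays the 64KB example: writer faults at X, reader at X + 32KB. */
static void model_example(void)
{
	bool ptes[16] = { false };	/* 16 x 4KB = 64KB range */
	unsigned long x = 0x100000UL;	/* arbitrary 64KB-aligned address */

	assert(model_decide(ptes, 16, x, x) == MAP_FOLIO);

	ptes[8] = true;			/* reader installed a PTE at X + 32KB */
	assert(model_decide(ptes, 16, x, x) == RETRY_SMALLER);
	assert(model_decide(ptes, 16, x, x + 0x8000UL) == RELEASE_FOLIO);
}
```

Calling model_example() from a test harness exercises the three
outcomes. Before the patch, the RETRY_SMALLER case took the same path
as RELEASE_FOLIO, which is where the extra fault came from.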