From nobody Wed Feb 11 07:50:13 2026
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
	ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
	kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
	Barry Song
Subject: [PATCH 1/3] mm: set folio swapbacked iff folios are dirty in try_to_unmap_one
Date: Mon, 6 Jan 2025 16:17:09 +1300
Message-Id: <20250106031711.82855-2-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Barry Song

The refcount may be temporarily or long-term increased, but this does
not change the fundamental nature of the folio already being
lazy-freed. Therefore, we only reset 'swapbacked' when we are certain
the folio is dirty and not droppable.

Suggested-by: David Hildenbrand
Signed-off-by: Barry Song
---
 mm/rmap.c | 49 ++++++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index c6c4d4ea29a7..de6b8c34e98c 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,34 +1868,29 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 */
 			smp_rmb();
 
-			/*
-			 * The only page refs must be one from isolation
-			 * plus the rmap(s) (dropped by discard:).
-			 */
-			if (ref_count == 1 + map_count &&
-			    (!folio_test_dirty(folio) ||
-			     /*
-			      * Unlike MADV_FREE mappings, VM_DROPPABLE
-			      * ones can be dropped even if they've
-			      * been dirtied.
-			      */
-			     (vma->vm_flags & VM_DROPPABLE))) {
-				dec_mm_counter(mm, MM_ANONPAGES);
-				goto discard;
-			}
-
-			/*
-			 * If the folio was redirtied, it cannot be
-			 * discarded. Remap the page to page table.
-			 */
-			set_pte_at(mm, address, pvmw.pte, pteval);
-			/*
-			 * Unlike MADV_FREE mappings, VM_DROPPABLE ones
-			 * never get swap backed on failure to drop.
-			 */
-			if (!(vma->vm_flags & VM_DROPPABLE))
+			if (folio_test_dirty(folio) && !(vma->vm_flags & VM_DROPPABLE)) {
+				/*
+				 * redirtied either using the page table or a previously
+				 * obtained GUP reference.
+				 */
+				set_pte_at(mm, address, pvmw.pte, pteval);
 				folio_set_swapbacked(folio);
-			goto walk_abort;
+				goto walk_abort;
+			} else if (ref_count != 1 + map_count) {
+				/*
+				 * Additional reference. Could be a GUP reference or any
+				 * speculative reference. GUP users must mark the folio
+				 * dirty if there was a modification. This folio cannot be
+				 * reclaimed right now either way, so act just like nothing
+				 * happened.
+				 * We'll come back here later and detect if the folio was
+				 * dirtied when the additional reference is gone.
+				 */
+				set_pte_at(mm, address, pvmw.pte, pteval);
+				goto walk_abort;
+			}
+			dec_mm_counter(mm, MM_ANONPAGES);
+			goto discard;
 		}
 
 		if (swap_duplicate(entry) < 0) {
-- 
2.39.3 (Apple Git-146)
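For context, a minimal userspace sketch (not part of this series) of the MADV_FREE semantics the hunk above depends on. The mapping length and fill bytes are arbitrary; the point is that pages redirtied after MADV_FREE must keep their contents, which is why try_to_unmap_one() only restores the PTE and re-sets 'swapbacked' for dirty, non-VM_DROPPABLE folios:

	#include <stdio.h>
	#include <string.h>
	#include <sys/mman.h>

	#define LEN (4UL * 1024 * 1024)

	int main(void)
	{
		char *p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
		if (p == MAP_FAILED)
			return 1;

		memset(p, 0xaa, LEN);		/* every page is now dirty */
		madvise(p, LEN, MADV_FREE);	/* mark them lazily freeable */

		/*
		 * Redirtying cancels the lazy free for these pages: under
		 * memory pressure the kernel may drop the untouched half
		 * (it reads back as zeroes), but the rewritten half must
		 * keep its contents.
		 */
		memset(p, 0xbb, LEN / 2);

		printf("first byte: %#x, last byte: %#x\n", p[0], p[LEN - 1]);
		munmap(p, LEN);
		return 0;
	}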
From nobody Wed Feb 11 07:50:13 2026
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
	ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
	kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
	Barry Song, Catalin Marinas, Will Deacon, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, Dave Hansen, "H. Peter Anvin",
	Anshuman Khandual, Shaoqin Huang, Gavin Shan, Kefeng Wang,
	Mark Rutland, "Kirill A. Shutemov", Yosry Ahmed
Subject: [PATCH 2/3] mm: Support tlbbatch flush for a range of PTEs
Date: Mon, 6 Jan 2025 16:17:10 +1300
Message-Id: <20250106031711.82855-3-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Barry Song

This is a preparatory patch to support batch PTE unmapping in
`try_to_unmap_one`. It first introduces range handling for the
`tlbbatch` flush. For now, the range is always set to the size of
PAGE_SIZE.

Cc: Catalin Marinas
Cc: Will Deacon
Cc: Thomas Gleixner
Cc: Ingo Molnar
Cc: Borislav Petkov
Cc: Dave Hansen
Cc: "H. Peter Anvin"
Cc: Anshuman Khandual
Cc: Ryan Roberts
Cc: Shaoqin Huang
Cc: Gavin Shan
Cc: Kefeng Wang
Cc: Mark Rutland
Cc: David Hildenbrand
Cc: Lance Yang
Cc: "Kirill A. Shutemov"
Shutemov" Cc: Yosry Ahmed Signed-off-by: Barry Song Tested-by: kernel test robot --- arch/arm64/include/asm/tlbflush.h | 26 ++++++++++++++------------ arch/arm64/mm/contpte.c | 2 +- arch/x86/include/asm/tlbflush.h | 3 ++- mm/rmap.c | 12 +++++++----- 4 files changed, 24 insertions(+), 19 deletions(-) diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlb= flush.h index bc94e036a26b..f34e4fab5aa2 100644 --- a/arch/arm64/include/asm/tlbflush.h +++ b/arch/arm64/include/asm/tlbflush.h @@ -322,13 +322,6 @@ static inline bool arch_tlbbatch_should_defer(struct m= m_struct *mm) return true; } =20 -static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_ba= tch *batch, - struct mm_struct *mm, - unsigned long uaddr) -{ - __flush_tlb_page_nosync(mm, uaddr); -} - /* * If mprotect/munmap/etc occurs during TLB batched flushing, we need to * synchronise all the TLBI issued with a DSB to avoid the race mentioned = in @@ -448,7 +441,7 @@ static inline bool __flush_tlb_range_limit_excess(unsig= ned long start, return false; } =20 -static inline void __flush_tlb_range_nosync(struct vm_area_struct *vma, +static inline void __flush_tlb_range_nosync(struct mm_struct *mm, unsigned long start, unsigned long end, unsigned long stride, bool last_level, int tlb_level) @@ -460,12 +453,12 @@ static inline void __flush_tlb_range_nosync(struct vm= _area_struct *vma, pages =3D (end - start) >> PAGE_SHIFT; =20 if (__flush_tlb_range_limit_excess(start, end, pages, stride)) { - flush_tlb_mm(vma->vm_mm); + flush_tlb_mm(mm); return; } =20 dsb(ishst); - asid =3D ASID(vma->vm_mm); + asid =3D ASID(mm); =20 if (last_level) __flush_tlb_range_op(vale1is, start, pages, stride, asid, @@ -474,7 +467,7 @@ static inline void __flush_tlb_range_nosync(struct vm_a= rea_struct *vma, __flush_tlb_range_op(vae1is, start, pages, stride, asid, tlb_level, true, lpa2_is_enabled()); =20 - mmu_notifier_arch_invalidate_secondary_tlbs(vma->vm_mm, start, end); + mmu_notifier_arch_invalidate_secondary_tlbs(mm, start, end); } =20 static inline void __flush_tlb_range(struct vm_area_struct *vma, @@ -482,7 +475,7 @@ static inline void __flush_tlb_range(struct vm_area_str= uct *vma, unsigned long stride, bool last_level, int tlb_level) { - __flush_tlb_range_nosync(vma, start, end, stride, + __flush_tlb_range_nosync(vma->vm_mm, start, end, stride, last_level, tlb_level); dsb(ish); } @@ -533,6 +526,15 @@ static inline void __flush_tlb_kernel_pgtable(unsigned= long kaddr) dsb(ish); isb(); } + +static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_ba= tch *batch, + struct mm_struct *mm, + unsigned long uaddr, + unsigned long size) +{ + __flush_tlb_range_nosync(mm, uaddr, uaddr + size, + PAGE_SIZE, true, 3); +} #endif =20 #endif diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c index 55107d27d3f8..bcac4f55f9c1 100644 --- a/arch/arm64/mm/contpte.c +++ b/arch/arm64/mm/contpte.c @@ -335,7 +335,7 @@ int contpte_ptep_clear_flush_young(struct vm_area_struc= t *vma, * eliding the trailing DSB applies here. 
 	 */
 	addr = ALIGN_DOWN(addr, CONT_PTE_SIZE);
-	__flush_tlb_range_nosync(vma, addr, addr + CONT_PTE_SIZE,
+	__flush_tlb_range_nosync(vma->vm_mm, addr, addr + CONT_PTE_SIZE,
 				 PAGE_SIZE, true, 3);
 }
 
diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
index 69e79fff41b8..cda35f53f544 100644
--- a/arch/x86/include/asm/tlbflush.h
+++ b/arch/x86/include/asm/tlbflush.h
@@ -279,7 +279,8 @@ static inline u64 inc_mm_tlb_gen(struct mm_struct *mm)
 
 static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
 					     struct mm_struct *mm,
-					     unsigned long uaddr)
+					     unsigned long uaddr,
+					     unsigned long size)
 {
 	inc_mm_tlb_gen(mm);
 	cpumask_or(&batch->cpumask, &batch->cpumask, mm_cpumask(mm));
diff --git a/mm/rmap.c b/mm/rmap.c
index de6b8c34e98c..365112af5291 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -672,7 +672,8 @@ void try_to_unmap_flush_dirty(void)
 	(TLB_FLUSH_BATCH_PENDING_MASK / 2)
 
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 	struct tlbflush_unmap_batch *tlb_ubc = &current->tlb_ubc;
 	int batch;
@@ -681,7 +682,7 @@ static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
 	if (!pte_accessible(mm, pteval))
 		return;
 
-	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr);
+	arch_tlbbatch_add_pending(&tlb_ubc->arch, mm, uaddr, size);
 	tlb_ubc->flush_required = true;
 
 	/*
@@ -757,7 +758,8 @@ void flush_tlb_batched_pending(struct mm_struct *mm)
 }
 #else
 static void set_tlb_ubc_flush_pending(struct mm_struct *mm, pte_t pteval,
-				      unsigned long uaddr)
+				      unsigned long uaddr,
+				      unsigned long size)
 {
 }
 
@@ -1792,7 +1794,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 */
 		pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-		set_tlb_ubc_flush_pending(mm, pteval, address);
+		set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 	} else {
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 	}
@@ -2164,7 +2166,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		 */
 		pteval = ptep_get_and_clear(mm, address, pvmw.pte);
 
-		set_tlb_ubc_flush_pending(mm, pteval, address);
+		set_tlb_ubc_flush_pending(mm, pteval, address, PAGE_SIZE);
 	} else {
 		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 	}
-- 
2.39.3 (Apple Git-146)
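To illustrate why arch_tlbbatch_add_pending() now takes a size, here is a hypothetical per-page fallback, not taken from this series or any in-tree architecture, that a port without ranged invalidation could use. It reuses the __flush_tlb_page_nosync() helper that the arm64 hunk above removes:

	/*
	 * Hypothetical fallback: replay the old per-page behaviour over
	 * the whole batched range. arm64 instead folds the range into a
	 * single deferred __flush_tlb_range_nosync() call, and x86 can
	 * ignore 'size' because it invalidates by mm generation count.
	 */
	static inline void arch_tlbbatch_add_pending(struct arch_tlbflush_unmap_batch *batch,
						     struct mm_struct *mm,
						     unsigned long uaddr,
						     unsigned long size)
	{
		unsigned long addr;

		for (addr = uaddr; addr < uaddr + size; addr += PAGE_SIZE)
			__flush_tlb_page_nosync(mm, addr);
	}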
From nobody Wed Feb 11 07:50:13 2026
From: Barry Song <21cnbao@gmail.com>
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: linux-arm-kernel@lists.infradead.org, x86@kernel.org,
	linux-kernel@vger.kernel.org, ioworker0@gmail.com, david@redhat.com,
	ryan.roberts@arm.com, zhengtangquan@oppo.com, ying.huang@intel.com,
	kasong@tencent.com, chrisl@kernel.org, baolin.wang@linux.alibaba.com,
	Barry Song
Subject: [PATCH 3/3] mm: Support batched unmap for lazyfree large
 folios during reclamation
Date: Mon, 6 Jan 2025 16:17:11 +1300
Message-Id: <20250106031711.82855-4-21cnbao@gmail.com>
X-Mailer: git-send-email 2.39.3 (Apple Git-146)
In-Reply-To: <20250106031711.82855-1-21cnbao@gmail.com>
References: <20250106031711.82855-1-21cnbao@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

From: Barry Song

Currently, the PTEs and rmap of a large folio are removed one at a
time. This is not only slow but also causes the large folio to be
unnecessarily added to deferred_split, which can lead to races between
the deferred_split shrinker callback and memory reclamation. This patch
releases all PTEs and rmap entries in a batch. Currently, it only
handles lazyfree large folios.

The microbenchmark below tries to reclaim 128MB of lazyfree large
folios whose sizes are 64KiB (the header names were stripped in the
archive; the four includes are restored from the functions used):

 #include <stdio.h>
 #include <sys/mman.h>
 #include <string.h>
 #include <time.h>

 #define SIZE 128*1024*1024 // 128 MB

 unsigned long read_split_deferred()
 {
 	FILE *file = fopen("/sys/kernel/mm/transparent_hugepage"
 			"/hugepages-64kB/stats/split_deferred", "r");
 	if (!file) {
 		perror("Error opening file");
 		return 0;
 	}

 	unsigned long value;
 	if (fscanf(file, "%lu", &value) != 1) {
 		perror("Error reading value");
 		fclose(file);
 		return 0;
 	}

 	fclose(file);
 	return value;
 }

 int main(int argc, char *argv[])
 {
 	while(1) {
 		volatile int *p = mmap(0, SIZE, PROT_READ | PROT_WRITE,
 				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

 		memset((void *)p, 1, SIZE);

 		madvise((void *)p, SIZE, MADV_FREE);

 		clock_t start_time = clock();
 		unsigned long start_split = read_split_deferred();
 		madvise((void *)p, SIZE, MADV_PAGEOUT);
 		clock_t end_time = clock();
 		unsigned long end_split = read_split_deferred();

 		double elapsed_time = (double)(end_time - start_time) / CLOCKS_PER_SEC;
 		printf("Time taken by reclamation: %f seconds, split_deferred: %ld\n",
 			elapsed_time, end_split - start_split);

 		munmap((void *)p, SIZE);
 	}
 	return 0;
 }

w/o patch:
~ # ./a.out
Time taken by reclamation: 0.177418 seconds, split_deferred: 2048
Time taken by reclamation: 0.178348 seconds, split_deferred: 2048
Time taken by reclamation: 0.174525 seconds, split_deferred: 2048
Time taken by reclamation: 0.171620 seconds, split_deferred: 2048
Time taken by reclamation: 0.172241 seconds, split_deferred: 2048
Time taken by reclamation: 0.174003 seconds, split_deferred: 2048
Time taken by reclamation: 0.171058 seconds, split_deferred: 2048
Time taken by reclamation: 0.171993 seconds, split_deferred: 2048
Time taken by reclamation: 0.169829 seconds, split_deferred: 2048
Time taken by reclamation: 0.172895 seconds, split_deferred: 2048
Time taken by reclamation: 0.176063 seconds, split_deferred: 2048
Time taken by reclamation: 0.172568 seconds, split_deferred: 2048
Time taken by reclamation: 0.171185 seconds, split_deferred: 2048
Time taken by reclamation: 0.170632 seconds, split_deferred: 2048
Time taken by reclamation: 0.170208 seconds, split_deferred: 2048
Time taken by reclamation: 0.174192 seconds, split_deferred: 2048
...
w/ patch:
~ # ./a.out
Time taken by reclamation: 0.074231 seconds, split_deferred: 0
Time taken by reclamation: 0.071026 seconds, split_deferred: 0
Time taken by reclamation: 0.072029 seconds, split_deferred: 0
Time taken by reclamation: 0.071873 seconds, split_deferred: 0
Time taken by reclamation: 0.073573 seconds, split_deferred: 0
Time taken by reclamation: 0.071906 seconds, split_deferred: 0
Time taken by reclamation: 0.073604 seconds, split_deferred: 0
Time taken by reclamation: 0.075903 seconds, split_deferred: 0
Time taken by reclamation: 0.073191 seconds, split_deferred: 0
Time taken by reclamation: 0.071228 seconds, split_deferred: 0
Time taken by reclamation: 0.071391 seconds, split_deferred: 0
Time taken by reclamation: 0.071468 seconds, split_deferred: 0
Time taken by reclamation: 0.071896 seconds, split_deferred: 0
Time taken by reclamation: 0.072508 seconds, split_deferred: 0
Time taken by reclamation: 0.071884 seconds, split_deferred: 0
Time taken by reclamation: 0.072433 seconds, split_deferred: 0
Time taken by reclamation: 0.071939 seconds, split_deferred: 0
...

Signed-off-by: Barry Song
---
 mm/rmap.c | 48 ++++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 365112af5291..9424b96f8482 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1642,6 +1642,27 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
 #endif
 }
 
+/* We support batch unmapping of PTEs for lazyfree large folios */
+static inline bool can_batch_unmap_folio_ptes(unsigned long addr,
+			struct folio *folio, pte_t *ptep)
+{
+	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
+	int max_nr = folio_nr_pages(folio);
+	pte_t pte = ptep_get(ptep);
+
+	if (pte_none(pte))
+		return false;
+	if (!pte_present(pte))
+		return false;
+	if (!folio_test_anon(folio))
+		return false;
+	if (folio_test_swapbacked(folio))
+		return false;
+
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+			       NULL, NULL) == max_nr;
+}
+
 /*
  * @arg: enum ttu_flags will be passed to this argument
  */
@@ -1655,6 +1676,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	int nr_pages = 1;
 	unsigned long pfn;
 	unsigned long hsz = 0;
 
@@ -1780,6 +1802,15 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				hugetlb_vma_unlock_write(vma);
 			}
 			pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
+		} else if (folio_test_large(folio) &&
+			   can_batch_unmap_folio_ptes(address, folio, pvmw.pte)) {
+			nr_pages = folio_nr_pages(folio);
+			flush_cache_range(vma, range.start, range.end);
+			pteval = get_and_clear_full_ptes(mm, address, pvmw.pte, nr_pages, 0);
+			if (should_defer_flush(mm, flags))
+				set_tlb_ubc_flush_pending(mm, pteval, address, folio_size(folio));
+			else
+				flush_tlb_range(vma, range.start, range.end);
 		} else {
 			flush_cache_page(vma, address, pfn);
 			/* Nuke the page table entry. */
@@ -1875,7 +1906,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * redirtied either using the page table or a previously
 				 * obtained GUP reference.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				folio_set_swapbacked(folio);
 				goto walk_abort;
 			} else if (ref_count != 1 + map_count) {
@@ -1888,10 +1919,10 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 				 * We'll come back here later and detect if the folio was
 				 * dirtied when the additional reference is gone.
 				 */
-				set_pte_at(mm, address, pvmw.pte, pteval);
+				set_ptes(mm, address, pvmw.pte, pteval, nr_pages);
 				goto walk_abort;
 			}
-			dec_mm_counter(mm, MM_ANONPAGES);
+			add_mm_counter(mm, MM_ANONPAGES, -nr_pages);
 			goto discard;
 		}
 
@@ -1943,13 +1974,18 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			dec_mm_counter(mm, mm_counter_file(folio));
 		}
 discard:
-		if (unlikely(folio_test_hugetlb(folio)))
+		if (unlikely(folio_test_hugetlb(folio))) {
 			hugetlb_remove_rmap(folio);
-		else
-			folio_remove_rmap_pte(folio, subpage, vma);
+		} else {
+			folio_remove_rmap_ptes(folio, subpage, nr_pages, vma);
+			folio_ref_sub(folio, nr_pages - 1);
+		}
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_drain_local();
 		folio_put(folio);
+		/* We have already batched the entire folio */
+		if (nr_pages > 1)
+			goto walk_done;
 		continue;
walk_abort:
 		ret = false;
-- 
2.39.3 (Apple Git-146)
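For scale, 128MB of 64KiB folios is 2048 folios, which matches the per-iteration split_deferred delta of 2048 before the patch: every reclaimed folio was being queued for a deferred split. With the batched unmap, nothing is queued and reclamation time drops from roughly 0.17s to about 0.07s per iteration, around a 2.4x improvement on this microbenchmark.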