From nobody Wed Oct 8 04:10:18 2025
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
 Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Matthew Brost,
 Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
 Alistair Popple, Pedro Falcato, Rik van Riel, Harry Yoo, Lance Yang,
 Oscar Salvador, Lance Yang
Subject: [PATCH v2 1/4] mm: convert FPB_IGNORE_* into FPB_RESPECT_*
Date: Wed, 2 Jul 2025 12:49:23 +0200
Message-ID: <20250702104926.212243-2-david@redhat.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250702104926.212243-1-david@redhat.com>
References: <20250702104926.212243-1-david@redhat.com>

Respecting these PTE bits is the exception, so let's invert the meaning.

With this change, most callers don't have to pass any flags.

This is a preparation for splitting folio_pte_batch() into a non-inlined
variant that doesn't consume any flags.
Long-term, we want folio_pte_batch() to probably ignore most common PTE
bits (e.g., write/dirty/young/soft-dirty) that are not relevant for most
page table walkers: uffd-wp and protnone might be bits to consider in the
future. Only walkers that care about them can opt-in to respect them.

No functional change intended.

Reviewed-by: Lance Yang
Reviewed-by: Zi Yan
Reviewed-by: Oscar Salvador
Signed-off-by: David Hildenbrand
Reviewed-by: Dev Jain
---
 mm/internal.h  | 16 ++++++++--------
 mm/madvise.c   |  3 +--
 mm/memory.c    | 11 +++++------
 mm/mempolicy.c |  4 +---
 mm/mlock.c     |  3 +--
 mm/mremap.c    |  3 +--
 mm/rmap.c      |  3 +--
 7 files changed, 18 insertions(+), 25 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index e84217e27778d..170d55b6851ff 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -202,17 +202,17 @@ static inline void vma_close(struct vm_area_struct *vma)
 /* Flags for folio_pte_batch(). */
 typedef int __bitwise fpb_t;
 
-/* Compare PTEs after pte_mkclean(), ignoring the dirty bit. */
-#define FPB_IGNORE_DIRTY		((__force fpb_t)BIT(0))
+/* Compare PTEs respecting the dirty bit. */
+#define FPB_RESPECT_DIRTY		((__force fpb_t)BIT(0))
 
-/* Compare PTEs after pte_clear_soft_dirty(), ignoring the soft-dirty bit. */
-#define FPB_IGNORE_SOFT_DIRTY		((__force fpb_t)BIT(1))
+/* Compare PTEs respecting the soft-dirty bit. */
+#define FPB_RESPECT_SOFT_DIRTY		((__force fpb_t)BIT(1))
 
 static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
 {
-	if (flags & FPB_IGNORE_DIRTY)
+	if (!(flags & FPB_RESPECT_DIRTY))
 		pte = pte_mkclean(pte);
-	if (likely(flags & FPB_IGNORE_SOFT_DIRTY))
+	if (likely(!(flags & FPB_RESPECT_SOFT_DIRTY)))
 		pte = pte_clear_soft_dirty(pte);
 	return pte_wrprotect(pte_mkold(pte));
 }
@@ -236,8 +236,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  * pages of the same large folio.
  *
  * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
- * the accessed bit, writable bit, dirty bit (with FPB_IGNORE_DIRTY) and
- * soft-dirty bit (with FPB_IGNORE_SOFT_DIRTY).
+ * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
+ * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
  *
  * start_ptep must map any page of the folio. max_nr must be at least one and
  * must be limited by the caller so scanning cannot exceed a single page table.
diff --git a/mm/madvise.c b/mm/madvise.c
index e61e32b2cd91f..661bb743d2216 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -347,10 +347,9 @@ static inline int madvise_folio_pte_batch(unsigned long addr, unsigned long end,
 		pte_t pte, bool *any_young, bool *any_dirty)
 {
-	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	int max_nr = (end - addr) / PAGE_SIZE;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, fpb_flags, NULL,
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
 			       any_young, any_dirty);
 }
 
diff --git a/mm/memory.c b/mm/memory.c
index 0f9b32a20e5b7..0a47583ca9937 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -990,10 +990,10 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	 * by keeping the batching logic separate.
 	 */
 	if (unlikely(!*prealloc && folio_test_large(folio) && max_nr != 1)) {
-		if (src_vma->vm_flags & VM_SHARED)
-			flags |= FPB_IGNORE_DIRTY;
-		if (!vma_soft_dirty_enabled(src_vma))
-			flags |= FPB_IGNORE_SOFT_DIRTY;
+		if (!(src_vma->vm_flags & VM_SHARED))
+			flags |= FPB_RESPECT_DIRTY;
+		if (vma_soft_dirty_enabled(src_vma))
+			flags |= FPB_RESPECT_SOFT_DIRTY;
 
 		nr = folio_pte_batch(folio, addr, src_pte, pte, max_nr, flags,
 				     &any_writable, NULL, NULL);
@@ -1535,7 +1535,6 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 		struct zap_details *details, int *rss, bool *force_flush,
 		bool *force_break, bool *any_skipped)
 {
-	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct mm_struct *mm = tlb->mm;
 	struct folio *folio;
 	struct page *page;
@@ -1565,7 +1564,7 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 	 * by keeping the batching logic separate.
 	 */
 	if (unlikely(folio_test_large(folio) && max_nr != 1)) {
-		nr = folio_pte_batch(folio, addr, pte, ptent, max_nr, fpb_flags,
+		nr = folio_pte_batch(folio, addr, pte, ptent, max_nr, 0,
 				     NULL, NULL, NULL);
 
 		zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent, nr,
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1ff7b2174eb77..2a25eedc3b1c0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -675,7 +675,6 @@ static void queue_folios_pmd(pmd_t *pmd, struct mm_walk *walk)
 static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
-	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct vm_area_struct *vma = walk->vma;
 	struct folio *folio;
 	struct queue_pages *qp = walk->private;
@@ -713,8 +712,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 			continue;
 		if (folio_test_large(folio) && max_nr != 1)
 			nr = folio_pte_batch(folio, addr, pte, ptent,
-					     max_nr, fpb_flags,
-					     NULL, NULL, NULL);
+					     max_nr, 0, NULL, NULL, NULL);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
diff --git a/mm/mlock.c b/mm/mlock.c
index 3cb72b579ffd3..2238cdc5eb1c1 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -307,14 +307,13 @@ void munlock_folio(struct folio *folio)
 static inline unsigned int folio_mlock_step(struct folio *folio,
 		pte_t *pte, unsigned long addr, unsigned long end)
 {
-	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	unsigned int count = (end - addr) >> PAGE_SHIFT;
 	pte_t ptent = ptep_get(pte);
 
 	if (!folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, pte, ptent, count, fpb_flags, NULL,
+	return folio_pte_batch(folio, addr, pte, ptent, count, 0, NULL,
 			       NULL, NULL);
 }
 
diff --git a/mm/mremap.c b/mm/mremap.c
index 36585041c760d..d4d3ffc931502 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -173,7 +173,6 @@ static pte_t move_soft_dirty_pte(pte_t pte)
 static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr,
 		pte_t *ptep, pte_t pte, int max_nr)
 {
-	const fpb_t flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	struct folio *folio;
 
 	if (max_nr == 1)
@@ -183,7 +182,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
 	if (!folio || !folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, flags, NULL,
+	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
 			       NULL, NULL);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index 34311f654d0c2..98ed3e49e8bbe 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1849,7 +1849,6 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 		struct page_vma_mapped_walk *pvmw,
 		enum ttu_flags flags, pte_t pte)
 {
-	const fpb_t fpb_flags = FPB_IGNORE_DIRTY | FPB_IGNORE_SOFT_DIRTY;
 	unsigned long end_addr, addr = pvmw->address;
 	struct vm_area_struct *vma = pvmw->vma;
 	unsigned int max_nr;
@@ -1869,7 +1868,7 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
-	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, fpb_flags,
+	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, 0,
 			       NULL, NULL, NULL);
 }
 
-- 
2.49.0

From nobody Wed Oct 8 04:10:18 2025
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
 Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Matthew Brost,
 Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
 Alistair Popple, Pedro Falcato, Rik van Riel, Harry Yoo, Lance Yang,
 Oscar Salvador, Lance Yang
Subject: [PATCH v2 2/4] mm: smaller folio_pte_batch() improvements
Date: Wed, 2 Jul 2025 12:49:24 +0200
Message-ID: <20250702104926.212243-3-david@redhat.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250702104926.212243-1-david@redhat.com>
References: <20250702104926.212243-1-david@redhat.com>

Let's clean up a bit:

(1) No need for start_ptep vs. ptep anymore, we can simply use ptep.
(2) Let's switch to "unsigned int" for everything. Negative values do
    not make sense.

(3) We can simplify the code by leaving the pte unchanged after the
    pte_same() check.

(4) Clarify that we should never exceed a single VMA; it indicates a
    problem in the caller.

No functional change intended.

Reviewed-by: Lance Yang
Reviewed-by: Lorenzo Stoakes
Reviewed-by: Oscar Salvador
Signed-off-by: David Hildenbrand
Reviewed-by: Dev Jain
Reviewed-by: Zi Yan
---
 mm/internal.h | 37 +++++++++++++++----------------------
 1 file changed, 15 insertions(+), 22 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 170d55b6851ff..dba1346ded972 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -221,7 +221,7 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  * folio_pte_batch - detect a PTE batch for a large folio
  * @folio: The large folio to detect a PTE batch for.
  * @addr: The user virtual address the first page is mapped at.
- * @start_ptep: Page table pointer for the first entry.
+ * @ptep: Page table pointer for the first entry.
  * @pte: Page table entry for the first page.
  * @max_nr: The maximum number of table entries to consider.
  * @flags: Flags to modify the PTE batch semantics.
@@ -233,24 +233,24 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  * first one is dirty.
  *
  * Detect a PTE batch: consecutive (present) PTEs that map consecutive
- * pages of the same large folio.
+ * pages of the same large folio in a single VMA and a single page table.
  *
  * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
  * the accessed bit, writable bit, dirty bit (unless FPB_RESPECT_DIRTY is set)
  * and soft-dirty bit (unless FPB_RESPECT_SOFT_DIRTY is set).
  *
- * start_ptep must map any page of the folio. max_nr must be at least one and
- * must be limited by the caller so scanning cannot exceed a single page table.
+ * @ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single VMA and
+ * a single page table.
  *
  * Return: the number of table entries in the batch.
  */
-static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
-		pte_t *start_ptep, pte_t pte, int max_nr, fpb_t flags,
+static inline unsigned int folio_pte_batch(struct folio *folio, unsigned long addr,
+		pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags,
 		bool *any_writable, bool *any_young, bool *any_dirty)
 {
-	pte_t expected_pte, *ptep;
-	bool writable, young, dirty;
-	int nr, cur_nr;
+	unsigned int nr, cur_nr;
+	pte_t expected_pte;
 
 	if (any_writable)
 		*any_writable = false;
@@ -267,29 +267,22 @@ static inline int folio_pte_batch(struct folio *folio, unsigned long addr,
 	max_nr = min_t(unsigned long, max_nr,
 		       folio_pfn(folio) + folio_nr_pages(folio) - pte_pfn(pte));
 
-	nr = pte_batch_hint(start_ptep, pte);
+	nr = pte_batch_hint(ptep, pte);
 	expected_pte = __pte_batch_clear_ignored(pte_advance_pfn(pte, nr), flags);
-	ptep = start_ptep + nr;
+	ptep = ptep + nr;
 
 	while (nr < max_nr) {
 		pte = ptep_get(ptep);
-		if (any_writable)
-			writable = !!pte_write(pte);
-		if (any_young)
-			young = !!pte_young(pte);
-		if (any_dirty)
-			dirty = !!pte_dirty(pte);
-		pte = __pte_batch_clear_ignored(pte, flags);
 
-		if (!pte_same(pte, expected_pte))
+		if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte))
 			break;
 
 		if (any_writable)
-			*any_writable |= writable;
+			*any_writable |= pte_write(pte);
 		if (any_young)
-			*any_young |= young;
+			*any_young |= pte_young(pte);
 		if (any_dirty)
-			*any_dirty |= dirty;
+			*any_dirty |= pte_dirty(pte);
 
 		cur_nr = pte_batch_hint(ptep, pte);
 		expected_pte = pte_advance_pfn(expected_pte, cur_nr);
-- 
2.49.0

From nobody Wed Oct 8 04:10:18 2025
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton,
 "Liam R. Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn,
 Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan, Matthew Brost,
 Joshua Hahn, Rakie Kim, Byungchul Park, Gregory Price, Ying Huang,
 Alistair Popple, Pedro Falcato, Rik van Riel, Harry Yoo, Lance Yang,
 Oscar Salvador
Subject: [PATCH v2 3/4] mm: split folio_pte_batch() into folio_pte_batch()
 and folio_pte_batch_flags()
Date: Wed, 2 Jul 2025 12:49:25 +0200
Message-ID: <20250702104926.212243-4-david@redhat.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250702104926.212243-1-david@redhat.com>
References: <20250702104926.212243-1-david@redhat.com>

Many users (including upcoming ones) don't really need the flags etc.,
and can live with the possible overhead of a function call.

So let's provide a basic, non-inlined folio_pte_batch(), to avoid code
bloat while still providing a variant that optimizes out all flag checks
at runtime. folio_pte_batch_flags() will get inlined into
folio_pte_batch(), optimizing out any conditionals that depend on input
flags.
folio_pte_batch() will behave like folio_pte_batch_flags() when no flags are specified. It's okay to add new users of folio_pte_batch_flags(), but using folio_pte_batch() if applicable is preferred. So, before this change, folio_pte_batch() was inlined into the C file optimized by propagating constants within the resulting object file. With this change, we now also have a folio_pte_batch() that is optimized by propagating all constants. But instead of having one instance per object file, we have a single shared one. In zap_present_ptes(), where we care about performance, the compiler already seem to generate a call to a common inlined folio_pte_batch() variant, shared with fork() code. So calling the new non-inlined variant should not make a difference. While at it, drop the "addr" parameter that is unused. Suggested-by: Andrew Morton Link: https://lore.kernel.org/linux-mm/20250503182858.5a02729fcffd6d4723afc= fc2@linux-foundation.org/ Reviewed-by: Oscar Salvador Signed-off-by: David Hildenbrand Reviewed-by: Dev Jain Reviewed-by: Zi Yan --- mm/internal.h | 11 ++++++++--- mm/madvise.c | 4 ++-- mm/memory.c | 8 +++----- mm/mempolicy.c | 3 +-- mm/mlock.c | 3 +-- mm/mremap.c | 3 +-- mm/rmap.c | 3 +-- mm/util.c | 29 +++++++++++++++++++++++++++++ 8 files changed, 46 insertions(+), 18 deletions(-) diff --git a/mm/internal.h b/mm/internal.h index dba1346ded972..6c92956ac4fd9 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -218,9 +218,8 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte= , fpb_t flags) } =20 /** - * folio_pte_batch - detect a PTE batch for a large folio + * folio_pte_batch_flags - detect a PTE batch for a large folio * @folio: The large folio to detect a PTE batch for. - * @addr: The user virtual address the first page is mapped at. * @ptep: Page table pointer for the first entry. * @pte: Page table entry for the first page. * @max_nr: The maximum number of table entries to consider. 
@@ -243,9 +242,12 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pt= e, fpb_t flags) * must be limited by the caller so scanning cannot exceed a single VMA and * a single page table. * + * This function will be inlined to optimize based on the input parameters; + * consider using folio_pte_batch() instead if applicable. + * * Return: the number of table entries in the batch. */ -static inline unsigned int folio_pte_batch(struct folio *folio, unsigned l= ong addr, +static inline unsigned int folio_pte_batch_flags(struct folio *folio, pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags, bool *any_writable, bool *any_young, bool *any_dirty) { @@ -293,6 +295,9 @@ static inline unsigned int folio_pte_batch(struct folio= *folio, unsigned long ad return min(nr, max_nr); } =20 +unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte, + unsigned int max_nr); + /** * pte_move_swp_offset - Move the swap entry offset field of a swap pte * forward or backward by delta diff --git a/mm/madvise.c b/mm/madvise.c index 661bb743d2216..fe363a14daab3 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -349,8 +349,8 @@ static inline int madvise_folio_pte_batch(unsigned long= addr, unsigned long end, { int max_nr =3D (end - addr) / PAGE_SIZE; =20 - return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL, - any_young, any_dirty); + return folio_pte_batch_flags(folio, ptep, pte, max_nr, 0, NULL, + any_young, any_dirty); } =20 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, diff --git a/mm/memory.c b/mm/memory.c index 0a47583ca9937..26a82b82863b0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -995,8 +995,8 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struc= t vm_area_struct *src_vma if (vma_soft_dirty_enabled(src_vma)) flags |=3D FPB_RESPECT_SOFT_DIRTY; =20 - nr =3D folio_pte_batch(folio, addr, src_pte, pte, max_nr, flags, - &any_writable, NULL, NULL); + nr =3D folio_pte_batch_flags(folio, src_pte, pte, max_nr, flags, + &any_writable, NULL, 
+			NULL);
 	folio_ref_add(folio, nr);
 	if (folio_test_anon(folio)) {
 		if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
@@ -1564,9 +1564,7 @@ static inline int zap_present_ptes(struct mmu_gather *tlb,
 	 * by keeping the batching logic separate.
 	 */
 	if (unlikely(folio_test_large(folio) && max_nr != 1)) {
-		nr = folio_pte_batch(folio, addr, pte, ptent, max_nr, 0,
-				NULL, NULL, NULL);
-
+		nr = folio_pte_batch(folio, pte, ptent, max_nr);
 		zap_present_folio_ptes(tlb, vma, folio, page, pte, ptent,
 				nr, addr, details, rss, force_flush,
 				force_break, any_skipped);

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 2a25eedc3b1c0..eb83cff7db8c3 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -711,8 +711,7 @@ static int queue_folios_pte_range(pmd_t *pmd, unsigned long addr,
 		if (!folio || folio_is_zone_device(folio))
 			continue;
 		if (folio_test_large(folio) && max_nr != 1)
-			nr = folio_pte_batch(folio, addr, pte, ptent,
-					max_nr, 0, NULL, NULL, NULL);
+			nr = folio_pte_batch(folio, pte, ptent, max_nr);
 		/*
 		 * vm_normal_folio() filters out zero pages, but there might
 		 * still be reserved folios to skip, perhaps in a VDSO.
diff --git a/mm/mlock.c b/mm/mlock.c
index 2238cdc5eb1c1..a1d93ad33c6db 100644
--- a/mm/mlock.c
+++ b/mm/mlock.c
@@ -313,8 +313,7 @@ static inline unsigned int folio_mlock_step(struct folio *folio,
 	if (!folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, pte, ptent, count, 0, NULL,
-			NULL, NULL);
+	return folio_pte_batch(folio, pte, ptent, count);
 }
 
 static inline bool allow_mlock_munlock(struct folio *folio,

diff --git a/mm/mremap.c b/mm/mremap.c
index d4d3ffc931502..1f5bebbb9c0cb 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -182,8 +182,7 @@ static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long addr
 	if (!folio || !folio_test_large(folio))
 		return 1;
 
-	return folio_pte_batch(folio, addr, ptep, pte, max_nr, 0, NULL,
-			NULL, NULL);
+	return folio_pte_batch(folio, ptep, pte, max_nr);
 }
 
 static int move_ptes(struct pagetable_move_control *pmc,

diff --git a/mm/rmap.c b/mm/rmap.c
index 98ed3e49e8bbe..a15939453c41a 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1868,8 +1868,7 @@ static inline unsigned int folio_unmap_pte_batch(struct folio *folio,
 	if (pte_unused(pte))
 		return 1;
 
-	return folio_pte_batch(folio, addr, pvmw->pte, pte, max_nr, 0,
-			NULL, NULL, NULL);
+	return folio_pte_batch(folio, pvmw->pte, pte, max_nr);
 }
 
 /*

diff --git a/mm/util.c b/mm/util.c
index 0b270c43d7d12..cf41edceec7d2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1171,3 +1171,32 @@ int compat_vma_mmap_prepare(struct file *file, struct vm_area_struct *vma)
 	return 0;
 }
 EXPORT_SYMBOL(compat_vma_mmap_prepare);
+
+#ifdef CONFIG_MMU
+/**
+ * folio_pte_batch - detect a PTE batch for a large folio
+ * @folio: The large folio to detect a PTE batch for.
+ * @ptep: Page table pointer for the first entry.
+ * @pte: Page table entry for the first page.
+ * @max_nr: The maximum number of table entries to consider.
+ *
+ * This is a simplified variant of folio_pte_batch_flags().
+ *
+ * Detect a PTE batch: consecutive (present) PTEs that map consecutive
+ * pages of the same large folio in a single VMA and a single page table.
+ *
+ * All PTEs inside a PTE batch have the same PTE bits set, excluding the PFN,
+ * the accessed bit, writable bit, dirty bit and soft-dirty bit.
+ *
+ * ptep must map any page of the folio. max_nr must be at least one and
+ * must be limited by the caller so scanning cannot exceed a single VMA and
+ * a single page table.
+ *
+ * Return: the number of table entries in the batch.
+ */
+unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
+		unsigned int max_nr)
+{
+	return folio_pte_batch_flags(folio, ptep, pte, max_nr, 0, NULL, NULL, NULL);
+}
+#endif /* CONFIG_MMU */
-- 
2.49.0

From nobody Wed Oct 8 04:10:18 2025
From: David Hildenbrand
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, David Hildenbrand, Andrew Morton, "Liam R.
 Howlett", Lorenzo Stoakes, Vlastimil Babka, Jann Horn, Mike Rapoport,
 Suren Baghdasaryan, Michal Hocko, Zi Yan, Matthew Brost, Joshua Hahn,
 Rakie Kim, Byungchul Park, Gregory Price, Ying Huang, Alistair Popple,
 Pedro Falcato, Rik van Riel, Harry Yoo, Lance Yang, Oscar Salvador
Subject: [PATCH v2 4/4] mm: remove boolean output parameters from
 folio_pte_batch_ext()
Date: Wed, 2 Jul 2025 12:49:26 +0200
Message-ID: <20250702104926.212243-5-david@redhat.com>
In-Reply-To: <20250702104926.212243-1-david@redhat.com>
References: <20250702104926.212243-1-david@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"

Instead, let's just allow for specifying through flags whether we want
to have bits merged into the original PTE.

For the madvise() case, simplify by having only a single parameter for
merging young+dirty. For madvise_cold_or_pageout_pte_range() merging the
dirty bit is not required, but also not harmful. After all, this code is
not so performance critical that we really need to force every
micro-optimization.

As we now have two pte_t * parameters, use PageTable() to make sure we
are actually given a pointer to a copy of the PTE, not a pointer into
an actual page table.

Signed-off-by: David Hildenbrand
Reviewed-by: Dev Jain
Reviewed-by: Oscar Salvador
---
 mm/internal.h | 65 +++++++++++++++++++++++++++++++++------------------
 mm/madvise.c  | 26 ++++-----------------
 mm/memory.c   |  8 ++-----
 mm/util.c     |  2 +-
 4 files changed, 50 insertions(+), 51 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 6c92956ac4fd9..b7131bd3d1ad1 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -208,6 +208,18 @@ typedef int __bitwise fpb_t;
 /* Compare PTEs respecting the soft-dirty bit. */
 #define FPB_RESPECT_SOFT_DIRTY	((__force fpb_t)BIT(1))
 
+/*
+ * Merge PTE write bits: if any PTE in the batch is writable, modify the
+ * PTE at @ptentp to be writable.
+ */
+#define FPB_MERGE_WRITE		((__force fpb_t)BIT(2))
+
+/*
+ * Merge PTE young and dirty bits: if any PTE in the batch is young or dirty,
+ * modify the PTE at @ptentp to be young or dirty, respectively.
+ */
+#define FPB_MERGE_YOUNG_DIRTY	((__force fpb_t)BIT(3))
+
 static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
 {
 	if (!(flags & FPB_RESPECT_DIRTY))
@@ -220,16 +232,12 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
 /**
  * folio_pte_batch_flags - detect a PTE batch for a large folio
  * @folio: The large folio to detect a PTE batch for.
+ * @vma: The VMA. Only relevant with FPB_MERGE_WRITE, otherwise can be NULL.
  * @ptep: Page table pointer for the first entry.
- * @pte: Page table entry for the first page.
+ * @ptentp: Pointer to a COPY of the first page table entry whose flags this
+ *	    function updates based on @flags if appropriate.
  * @max_nr: The maximum number of table entries to consider.
  * @flags: Flags to modify the PTE batch semantics.
- * @any_writable: Optional pointer to indicate whether any entry except the
- *		  first one is writable.
- * @any_young: Optional pointer to indicate whether any entry except the
- *		  first one is young.
- * @any_dirty: Optional pointer to indicate whether any entry except the
- *		  first one is dirty.
  *
  * Detect a PTE batch: consecutive (present) PTEs that map consecutive
  * pages of the same large folio in a single VMA and a single page table.
@@ -242,28 +250,32 @@ static inline pte_t __pte_batch_clear_ignored(pte_t pte, fpb_t flags)
  * must be limited by the caller so scanning cannot exceed a single VMA and
  * a single page table.
  *
+ * Depending on the FPB_MERGE_* flags, the pte stored at @ptentp will
+ * be updated: it's crucial that a pointer to a COPY of the first
+ * page table entry, obtained through ptep_get(), is provided as @ptentp.
+ *
  * This function will be inlined to optimize based on the input parameters;
  * consider using folio_pte_batch() instead if applicable.
  *
  * Return: the number of table entries in the batch.
  */
 static inline unsigned int folio_pte_batch_flags(struct folio *folio,
-		pte_t *ptep, pte_t pte, unsigned int max_nr, fpb_t flags,
-		bool *any_writable, bool *any_young, bool *any_dirty)
+		struct vm_area_struct *vma, pte_t *ptep, pte_t *ptentp,
+		unsigned int max_nr, fpb_t flags)
 {
+	bool any_writable = false, any_young = false, any_dirty = false;
+	pte_t expected_pte, pte = *ptentp;
 	unsigned int nr, cur_nr;
-	pte_t expected_pte;
-
-	if (any_writable)
-		*any_writable = false;
-	if (any_young)
-		*any_young = false;
-	if (any_dirty)
-		*any_dirty = false;
 
 	VM_WARN_ON_FOLIO(!pte_present(pte), folio);
 	VM_WARN_ON_FOLIO(!folio_test_large(folio) || max_nr < 1, folio);
 	VM_WARN_ON_FOLIO(page_folio(pfn_to_page(pte_pfn(pte))) != folio, folio);
+	/*
+	 * Ensure this is a pointer to a copy, not a pointer into a page table.
+	 * If this is a stack value, it won't be a valid virtual address, but
+	 * that's fine because it also cannot be pointing into the page table.
+	 */
+	VM_WARN_ON(virt_addr_valid(ptentp) && PageTable(virt_to_page(ptentp)));
 
 	/* Limit max_nr to the actual remaining PFNs in the folio we could batch. */
 	max_nr = min_t(unsigned long, max_nr,
@@ -279,12 +291,12 @@ static inline unsigned int folio_pte_batch_flags(struct folio *folio,
 		if (!pte_same(__pte_batch_clear_ignored(pte, flags), expected_pte))
 			break;
 
-		if (any_writable)
-			*any_writable |= pte_write(pte);
-		if (any_young)
-			*any_young |= pte_young(pte);
-		if (any_dirty)
-			*any_dirty |= pte_dirty(pte);
+		if (flags & FPB_MERGE_WRITE)
+			any_writable |= pte_write(pte);
+		if (flags & FPB_MERGE_YOUNG_DIRTY) {
+			any_young |= pte_young(pte);
+			any_dirty |= pte_dirty(pte);
+		}
 
 		cur_nr = pte_batch_hint(ptep, pte);
 		expected_pte = pte_advance_pfn(expected_pte, cur_nr);
@@ -292,6 +304,13 @@ static inline unsigned int folio_pte_batch_flags(struct folio *folio,
 		nr += cur_nr;
 	}
 
+	if (any_writable)
+		*ptentp = pte_mkwrite(*ptentp, vma);
+	if (any_young)
+		*ptentp = pte_mkyoung(*ptentp);
+	if (any_dirty)
+		*ptentp = pte_mkdirty(*ptentp);
+
 	return min(nr, max_nr);
 }
 
diff --git a/mm/madvise.c b/mm/madvise.c
index fe363a14daab3..9de9b7c797c63 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -344,13 +344,12 @@ static inline bool can_do_file_pageout(struct vm_area_struct *vma)
 
 static inline int madvise_folio_pte_batch(unsigned long addr, unsigned long end,
 					  struct folio *folio, pte_t *ptep,
-					  pte_t pte, bool *any_young,
-					  bool *any_dirty)
+					  pte_t *ptentp)
 {
 	int max_nr = (end - addr) / PAGE_SIZE;
 
-	return folio_pte_batch_flags(folio, ptep, pte, max_nr, 0, NULL,
-			any_young, any_dirty);
+	return folio_pte_batch_flags(folio, NULL, ptep, ptentp, max_nr,
+			FPB_MERGE_YOUNG_DIRTY);
 }
 
 static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
@@ -488,13 +487,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	 * next pte in the range.
	 */
 	if (folio_test_large(folio)) {
-		bool any_young;
-
-		nr = madvise_folio_pte_batch(addr, end, folio, pte,
-					     ptent, &any_young, NULL);
-		if (any_young)
-			ptent = pte_mkyoung(ptent);
-
+		nr = madvise_folio_pte_batch(addr, end, folio, pte, &ptent);
 		if (nr < folio_nr_pages(folio)) {
 			int err;
 
@@ -724,11 +717,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 		 * next pte in the range.
 		 */
 		if (folio_test_large(folio)) {
-			bool any_young, any_dirty;
-
-			nr = madvise_folio_pte_batch(addr, end, folio, pte,
-						     ptent, &any_young, &any_dirty);
-
+			nr = madvise_folio_pte_batch(addr, end, folio, pte, &ptent);
 			if (nr < folio_nr_pages(folio)) {
 				int err;
 
@@ -753,11 +742,6 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				nr = 0;
 				continue;
 			}
-
-			if (any_young)
-				ptent = pte_mkyoung(ptent);
-			if (any_dirty)
-				ptent = pte_mkdirty(ptent);
 		}
 
 		if (folio_test_swapcache(folio) || folio_test_dirty(folio)) {

diff --git a/mm/memory.c b/mm/memory.c
index 26a82b82863b0..0269edb520987 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -972,10 +972,9 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 		pte_t *dst_pte, pte_t *src_pte, pte_t pte, unsigned long addr,
 		int max_nr, int *rss, struct folio **prealloc)
 {
+	fpb_t flags = FPB_MERGE_WRITE;
 	struct page *page;
 	struct folio *folio;
-	bool any_writable;
-	fpb_t flags = 0;
 	int err, nr;
 
 	page = vm_normal_page(src_vma, addr, pte);
@@ -995,8 +994,7 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 	if (vma_soft_dirty_enabled(src_vma))
 		flags |= FPB_RESPECT_SOFT_DIRTY;
 
-	nr = folio_pte_batch_flags(folio, src_pte, pte, max_nr, flags,
-			&any_writable, NULL, NULL);
+	nr = folio_pte_batch_flags(folio, src_vma, src_pte, &pte, max_nr, flags);
 	folio_ref_add(folio, nr);
 	if (folio_test_anon(folio)) {
 		if (unlikely(folio_try_dup_anon_rmap_ptes(folio, page,
@@ -1010,8 +1008,6 @@ copy_present_ptes(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma
 		folio_dup_file_rmap_ptes(folio, page, nr, dst_vma);
 		rss[mm_counter_file(folio)] += nr;
 	}
-	if (any_writable)
-		pte = pte_mkwrite(pte, src_vma);
 	__copy_present_ptes(dst_vma, src_vma, dst_pte, src_pte, pte,
 			addr, nr);
 	return nr;

diff --git a/mm/util.c b/mm/util.c
index cf41edceec7d2..ce826ca82a11d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -1197,6 +1197,6 @@ EXPORT_SYMBOL(compat_vma_mmap_prepare);
 unsigned int folio_pte_batch(struct folio *folio, pte_t *ptep, pte_t pte,
 		unsigned int max_nr)
 {
-	return folio_pte_batch_flags(folio, ptep, pte, max_nr, 0, NULL, NULL, NULL);
+	return folio_pte_batch_flags(folio, NULL, ptep, &pte, max_nr, 0);
 }
 #endif /* CONFIG_MMU */
-- 
2.49.0