From nobody Sun Feb 8 01:22:42 2026
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Shuang Zhai , Usama Arif
Subject: [PATCH v3 1/6] mm: free zapped tail pages when splitting isolated thp
Date: Tue, 13 Aug 2024 13:02:44 +0100
Message-ID: <20240813120328.1275952-2-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

From: Yu Zhao

If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao
Tested-by: Shuang Zhai
Signed-off-by: Usama Arif
Acked-by: Johannes Weiner
---
 mm/huge_memory.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 04ee8abd6475..85a424e954be 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3059,7 +3059,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int new_nr = 1 << new_order;
 	int order = folio_order(folio);
 	unsigned int nr = 1 << order;
+	struct folio_batch free_folios;
 
+	folio_batch_init(&free_folios);
 	/* complete memcg works before add pages to LRU */
 	split_page_memcg(head, order, new_order);
 
@@ -3143,6 +3145,26 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		if (subpage == page)
 			continue;
 		folio_unlock(new_folio);
+		/*
+		 * If a folio has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * folio was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && folio_ref_freeze(new_folio, 2)) {
+			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
+
+			folio_clear_active(new_folio);
+			folio_clear_unevictable(new_folio);
+			if (!folio_batch_add(&free_folios, new_folio)) {
+				mem_cgroup_uncharge_folios(&free_folios);
+				free_unref_folios(&free_folios);
+			}
+			continue;
+		}
 
 		/*
 		 * Subpages may be freed if there wasn't any mapping
@@ -3153,6 +3175,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (free_folios.nr) {
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+	}
 }
 
 /* Racy check whether the huge page can be split */
-- 
2.43.5
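For readers unfamiliar with the folio_batch idiom used in this patch: folios
are accumulated until folio_batch_add() reports the batch is full, the batch
is drained, and any remainder is drained once more after the loop. A minimal
sketch of that drain step, with a hypothetical helper name that is not part
of this patch, could look like:

  /*
   * Illustrative only: drain a batch of folios whose refcounts were
   * frozen above. Mirrors the two drain sites added in __split_huge_page().
   */
  static void free_frozen_folio_batch(struct folio_batch *fbatch)
  {
  	if (!folio_batch_count(fbatch))
  		return;
  	/* Uncharge from memcg first, then hand back to the page allocator. */
  	mem_cgroup_uncharge_folios(fbatch);
  	free_unref_folios(fbatch);
  }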
From nobody Sun Feb 8 01:22:42 2026
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Shuang Zhai , Usama Arif
Subject: [PATCH v3 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
Date: Tue, 13 Aug 2024 13:02:45 +0100
Message-ID: <20240813120328.1275952-3-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

From: Yu Zhao

Here, "unused" means containing only zeros and inaccessible to
userspace. When splitting an isolated thp under reclaim or migration,
mapping the unused subpages to the shared zeropage saves memory. This
is particularly helpful when the internal fragmentation of a thp is
high, i.e. it has many untouched subpages.

This is also a prerequisite for the THP low-utilization shrinker
introduced in a later patch, where underutilized THPs are split and
the zero-filled pages are freed, saving memory.
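As a rough sketch of the detection step described above (the helper name
here is made up; in the diff below the check is folded directly into
try_to_map_unused_to_zeropage()):

  /* Is this subpage entirely zero-filled? */
  static bool subpage_is_zero_filled(struct page *page)
  {
  	void *addr = kmap_local_page(page);
  	/* memchr_inv() returns NULL iff the whole page contains only zeros. */
  	bool zero_filled = !memchr_inv(addr, 0, PAGE_SIZE);

  	kunmap_local(addr);
  	return zero_filled;
  }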
Signed-off-by: Yu Zhao Tested-by: Shuang Zhai Signed-off-by: Usama Arif --- include/linux/rmap.h | 7 ++++- mm/huge_memory.c | 8 ++--- mm/migrate.c | 71 ++++++++++++++++++++++++++++++++++++++------ mm/migrate_device.c | 4 +-- 4 files changed, 74 insertions(+), 16 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0978c64f49d8..07854d1f9ad6 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -745,7 +745,12 @@ int folio_mkclean(struct folio *); int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t p= goff, struct vm_area_struct *vma); =20 -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed); +enum rmp_flags { + RMP_LOCKED =3D 1 << 0, + RMP_USE_SHARED_ZEROPAGE =3D 1 << 1, +}; + +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags= ); =20 /* * rmap_walk_control: To control rmap traversing for specific needs diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 85a424e954be..6df0e9f4f56c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2911,7 +2911,7 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma= , unsigned long addr, return false; } =20 -static void remap_page(struct folio *folio, unsigned long nr) +static void remap_page(struct folio *folio, unsigned long nr, int flags) { int i =3D 0; =20 @@ -2919,7 +2919,7 @@ static void remap_page(struct folio *folio, unsigned = long nr) if (!folio_test_anon(folio)) return; for (;;) { - remove_migration_ptes(folio, folio, true); + remove_migration_ptes(folio, folio, RMP_LOCKED | flags); i +=3D folio_nr_pages(folio); if (i >=3D nr) break; @@ -3129,7 +3129,7 @@ static void __split_huge_page(struct page *page, stru= ct list_head *list, =20 if (nr_dropped) shmem_uncharge(folio->mapping->host, nr_dropped); - remap_page(folio, nr); + remap_page(folio, nr, PageAnon(head) ? RMP_USE_SHARED_ZEROPAGE : 0); =20 /* * set page to its compound_head when split to non order-0 pages, so @@ -3424,7 +3424,7 @@ int split_huge_page_to_list_to_order(struct page *pag= e, struct list_head *list, if (mapping) xas_unlock(&xas); local_irq_enable(); - remap_page(folio, folio_nr_pages(folio)); + remap_page(folio, folio_nr_pages(folio), 0); ret =3D -EAGAIN; } =20 diff --git a/mm/migrate.c b/mm/migrate.c index 66a5f73ebfdf..3288ac041d03 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -178,13 +178,56 @@ void putback_movable_pages(struct list_head *l) } } =20 +static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvm= w, + struct folio *folio, + unsigned long idx) +{ + struct page *page =3D folio_page(folio, idx); + bool contains_data; + pte_t newpte; + void *addr; + + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!PageAnon(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page); + + if (PageMlocked(page) || (pvmw->vma->vm_flags & VM_LOCKED)) + return false; + + /* + * The pmd entry mapping the old thp was flushed and the pte mapping + * this subpage has been non present. If the subpage is only zero-filled + * then map it to the shared zeropage. 
+ */ + addr =3D kmap_local_page(page); + contains_data =3D memchr_inv(addr, 0, PAGE_SIZE); + kunmap_local(addr); + + if (contains_data || mm_forbids_zeropage(pvmw->vma->vm_mm)) + return false; + + newpte =3D pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), + pvmw->vma->vm_page_prot)); + set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); + + dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); + return true; +} + +struct rmap_walk_arg { + struct folio *folio; + bool map_unused_to_zeropage; +}; + /* * Restore a potential migration pte to a working pte entry */ static bool remove_migration_pte(struct folio *folio, - struct vm_area_struct *vma, unsigned long addr, void *old) + struct vm_area_struct *vma, unsigned long addr, void *arg) { - DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION); + struct rmap_walk_arg *rmap_walk_arg =3D arg; + DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | = PVMW_MIGRATION); =20 while (page_vma_mapped_walk(&pvmw)) { rmap_t rmap_flags =3D RMAP_NONE; @@ -208,6 +251,9 @@ static bool remove_migration_pte(struct folio *folio, continue; } #endif + if (rmap_walk_arg->map_unused_to_zeropage && + try_to_map_unused_to_zeropage(&pvmw, folio, idx)) + continue; =20 folio_get(folio); pte =3D mk_pte(new, READ_ONCE(vma->vm_page_prot)); @@ -286,14 +332,21 @@ static bool remove_migration_pte(struct folio *folio, * Get rid of all migration entries and replace them by * references to the indicated page. */ -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed) +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags) { + struct rmap_walk_arg rmap_walk_arg =3D { + .folio =3D src, + .map_unused_to_zeropage =3D flags & RMP_USE_SHARED_ZEROPAGE, + }; + struct rmap_walk_control rwc =3D { .rmap_one =3D remove_migration_pte, - .arg =3D src, + .arg =3D &rmap_walk_arg, }; =20 - if (locked) + VM_BUG_ON_FOLIO((flags & RMP_USE_SHARED_ZEROPAGE) && (src !=3D dst), src); + + if (flags & RMP_LOCKED) rmap_walk_locked(dst, &rwc); else rmap_walk(dst, &rwc); @@ -903,7 +956,7 @@ static int writeout(struct address_space *mapping, stru= ct folio *folio) * At this point we know that the migration attempt cannot * be successful. */ - remove_migration_ptes(folio, folio, false); + remove_migration_ptes(folio, folio, 0); =20 rc =3D mapping->a_ops->writepage(&folio->page, &wbc); =20 @@ -1067,7 +1120,7 @@ static void migrate_folio_undo_src(struct folio *src, struct list_head *ret) { if (page_was_mapped) - remove_migration_ptes(src, src, false); + remove_migration_ptes(src, src, 0); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); @@ -1305,7 +1358,7 @@ static int migrate_folio_move(free_folio_t put_new_fo= lio, unsigned long private, lru_add_drain(); =20 if (old_page_state & PAGE_WAS_MAPPED) - remove_migration_ptes(src, dst, false); + remove_migration_ptes(src, dst, 0); =20 out_unlock_both: folio_unlock(dst); @@ -1443,7 +1496,7 @@ static int unmap_and_move_huge_page(new_folio_t get_n= ew_folio, =20 if (page_was_mapped) remove_migration_ptes(src, - rc =3D=3D MIGRATEPAGE_SUCCESS ? dst : src, false); + rc =3D=3D MIGRATEPAGE_SUCCESS ? 
					dst : src, 0);
 
 unlock_put_anon:
 	folio_unlock(dst);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..8f875636b35b 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -424,7 +424,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
 			continue;
 
 		folio = page_folio(page);
-		remove_migration_ptes(folio, folio, false);
+		remove_migration_ptes(folio, folio, 0);
 
 		src_pfns[i] = 0;
 		folio_unlock(folio);
@@ -837,7 +837,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
 
 		src = page_folio(page);
 		dst = page_folio(newpage);
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, 0);
 		folio_unlock(src);
 
 		if (is_zone_device_page(page))
-- 
2.43.5
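To summarize the interface change in this patch: the bool "locked" argument
of remove_migration_ptes() becomes a flags bitmask built from the rmp_flags
enum added to rmap.h, so existing callers pass 0 or RMP_LOCKED where they
previously passed false or true, and only the THP split path additionally
sets RMP_USE_SHARED_ZEROPAGE. A hedged restatement of the calling
convention, taken from the hunks above:

  /* Old interface: remove_migration_ptes(src, dst, locked); */

  /* New interface: flags is a bitmask of enum rmp_flags. */
  remove_migration_ptes(folio, folio, 0);          /* was: false */
  remove_migration_ptes(folio, folio, RMP_LOCKED); /* was: true  */

  /* Split path only: also fold zero-filled subpages onto the zeropage. */
  remap_page(folio, nr, PageAnon(head) ? RMP_USE_SHARED_ZEROPAGE : 0);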
From nobody Sun Feb 8 01:22:42 2026
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Alexander Zhu , Usama Arif
Subject: [PATCH v3 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage
Date: Tue, 13 Aug 2024 13:02:46 +0100
Message-ID: <20240813120328.1275952-4-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

From: Alexander Zhu

When a THP is split, any subpage that is zero-filled will be mapped to
the shared zeropage, hence saving memory. Add a selftest to verify this
by allocating a zero-filled THP and comparing RssAnon before and after
the split.

Signed-off-by: Alexander Zhu
Acked-by: Rik van Riel
Signed-off-by: Usama Arif
---
 .../selftests/mm/split_huge_page_test.c | 71 +++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c    | 22 ++++++
 tools/testing/selftests/mm/vm_util.h    |  1 +
 3 files changed, 94 insertions(+)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e5e8dafc9d94..eb6d1b9fc362 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -84,6 +84,76 @@ static void write_debugfs(const char *fmt, ...)
write_file(SPLIT_DEBUGFS, input, ret + 1); } =20 +static char *allocate_zero_filled_hugepage(size_t len) +{ + char *result; + size_t i; + + result =3D memalign(pmd_pagesize, len); + if (!result) { + printf("Fail to allocate memory\n"); + exit(EXIT_FAILURE); + } + + madvise(result, len, MADV_HUGEPAGE); + + for (i =3D 0; i < len; i++) + result[i] =3D (char)0; + + return result; +} + +static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int= nr_hpages, size_t len) +{ + unsigned long rss_anon_before, rss_anon_after; + size_t i; + + if (!check_huge_anon(one_page, 4, pmd_pagesize)) { + printf("No THP is allocated\n"); + exit(EXIT_FAILURE); + } + + rss_anon_before =3D rss_anon(); + if (!rss_anon_before) { + printf("No RssAnon is allocated before split\n"); + exit(EXIT_FAILURE); + } + + /* split all THPs */ + write_debugfs(PID_FMT, getpid(), (uint64_t)one_page, + (uint64_t)one_page + len, 0); + + for (i =3D 0; i < len; i++) + if (one_page[i] !=3D (char)0) { + printf("%ld byte corrupted\n", i); + exit(EXIT_FAILURE); + } + + if (!check_huge_anon(one_page, 0, pmd_pagesize)) { + printf("Still AnonHugePages not split\n"); + exit(EXIT_FAILURE); + } + + rss_anon_after =3D rss_anon(); + if (rss_anon_after >=3D rss_anon_before) { + printf("Incorrect RssAnon value. Before: %ld After: %ld\n", + rss_anon_before, rss_anon_after); + exit(EXIT_FAILURE); + } +} + +void split_pmd_zero_pages(void) +{ + char *one_page; + int nr_hpages =3D 4; + size_t len =3D nr_hpages * pmd_pagesize; + + one_page =3D allocate_zero_filled_hugepage(len); + verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len); + printf("Split zero filled huge pages successful\n"); + free(one_page); +} + void split_pmd_thp(void) { char *one_page; @@ -431,6 +501,7 @@ int main(int argc, char **argv) =20 fd_size =3D 2 * pmd_pagesize; =20 + split_pmd_zero_pages(); split_pmd_thp(); split_pte_mapped_thp(); split_file_backed_thp(); diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests= /mm/vm_util.c index 5a62530da3b5..d8d0cf04bb57 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -12,6 +12,7 @@ =20 #define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_= size" #define SMAP_FILE_PATH "/proc/self/smaps" +#define STATUS_FILE_PATH "/proc/self/status" #define MAX_LINE_LENGTH 500 =20 unsigned int __page_size; @@ -171,6 +172,27 @@ uint64_t read_pmd_pagesize(void) return strtoul(buf, NULL, 10); } =20 +unsigned long rss_anon(void) +{ + unsigned long rss_anon =3D 0; + FILE *fp; + char buffer[MAX_LINE_LENGTH]; + + fp =3D fopen(STATUS_FILE_PATH, "r"); + if (!fp) + ksft_exit_fail_msg("%s: Failed to open file %s\n", __func__, STATUS_FILE= _PATH); + + if (!check_for_pattern(fp, "RssAnon:", buffer, sizeof(buffer))) + goto err_out; + + if (sscanf(buffer, "RssAnon:%10lu kB", &rss_anon) !=3D 1) + ksft_exit_fail_msg("Reading status error\n"); + +err_out: + fclose(fp); + return rss_anon; +} + bool __check_huge(void *addr, char *pattern, int nr_hpages, uint64_t hpage_size) { diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests= /mm/vm_util.h index 9007c420d52c..71b75429f4a5 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -39,6 +39,7 @@ unsigned long pagemap_get_pfn(int fd, char *start); void clear_softdirty(void); bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t le= n); uint64_t read_pmd_pagesize(void); +uint64_t rss_anon(void); bool check_huge_anon(void *addr, int 
nr_hpages, uint64_t hpage_size);
 bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size);
 bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
-- 
2.43.5
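A standalone user-space restatement of the measurement this selftest relies
on, independent of the kselftest harness: the new rss_anon() helper simply
parses the RssAnon field (in kB) out of /proc/self/status. A minimal sketch:

  #include <stdio.h>

  /* Return the calling process's RssAnon in kB, or 0 on error. */
  static unsigned long rss_anon_kb(void)
  {
  	char line[256];
  	unsigned long kb = 0;
  	FILE *fp = fopen("/proc/self/status", "r");

  	if (!fp)
  		return 0;
  	while (fgets(line, sizeof(line), fp)) {
  		if (sscanf(line, "RssAnon: %lu kB", &kb) == 1)
  			break;
  	}
  	fclose(fp);
  	return kb;
  }

The selftest takes this reading before and after splitting a zero-filled
THP and expects the value to drop once the subpages are remapped to the
shared zeropage.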
From nobody Sun Feb 8 01:22:42 2026
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v3 4/6] mm: Introduce a pageflag for partially mapped folios
Date: Tue, 13 Aug 2024 13:02:47 +0100
Message-ID: <20240813120328.1275952-5-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

Currently, folio->_deferred_list is used to keep track of partially
mapped folios that are going to be split under memory pressure. In the
next patch, all THPs that are faulted in or collapsed by khugepaged
will also be tracked using _deferred_list.

This patch introduces a pageflag so that partially mapped folios can be
distinguished from the other folios on the deferred list at split time
in deferred_split_scan. It is needed because __folio_remove_rmap
decrements _mapcount, _large_mapcount and _entire_mapcount, after which
the two kinds can no longer be told apart in deferred_split_scan.

Even though this adds an extra flag to track whether the folio is
partially mapped, there is no functional change intended with this
patch. The flag is not useful in this patch itself; it becomes useful
in the next patch, when _deferred_list also holds folios that are not
partially mapped.
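To make the new calling convention explicit (a short summary of the hunks
below, not additional code): deferred_split_folio() now takes a second
argument stating why the folio is being queued.

  /* rmap path: the folio just became partially mapped. */
  deferred_split_folio(folio, true);

  /* Later patches queue fully mapped THPs for the underutilization
   * shrinker without bumping the deferred-split counters: */
  deferred_split_folio(folio, false);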
Signed-off-by: Usama Arif --- include/linux/huge_mm.h | 4 ++-- include/linux/page-flags.h | 3 +++ mm/huge_memory.c | 21 +++++++++++++-------- mm/hugetlb.c | 1 + mm/internal.h | 4 +++- mm/memcontrol.c | 3 ++- mm/migrate.c | 3 ++- mm/page_alloc.c | 5 +++-- mm/rmap.c | 3 ++- mm/vmscan.c | 3 ++- 10 files changed, 33 insertions(+), 17 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4c32058cacfe..969f11f360d2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page) { return split_huge_page_to_list_to_order(page, NULL, 0); } -void deferred_split_folio(struct folio *folio); +void deferred_split_folio(struct folio *folio, bool partially_mapped); =20 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze, struct folio *folio); @@ -495,7 +495,7 @@ static inline int split_huge_page(struct page *page) { return 0; } -static inline void deferred_split_folio(struct folio *folio) {} +static inline void deferred_split_folio(struct folio *folio, bool partiall= y_mapped) {} #define split_huge_pmd(__vma, __pmd, __address) \ do { } while (0) =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index a0a29bd092f8..cecc1bad7910 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -182,6 +182,7 @@ enum pageflags { /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned =3D PG_active, PG_large_rmappable =3D PG_workingset, /* anon or file-backed */ + PG_partially_mapped, /* was identified to be partially mapped */ }; =20 #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -861,8 +862,10 @@ static inline void ClearPageCompound(struct page *page) ClearPageHead(page); } FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE) +FOLIO_FLAG(partially_mapped, FOLIO_SECOND_PAGE) #else FOLIO_FLAG_FALSE(large_rmappable) +FOLIO_FLAG_FALSE(partially_mapped) #endif =20 #define PG_head_mask ((1UL << PG_head)) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6df0e9f4f56c..c024ab0f745c 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3397,6 +3397,7 @@ int split_huge_page_to_list_to_order(struct page *pag= e, struct list_head *list, * page_deferred_list. 
*/ list_del_init(&folio->_deferred_list); + folio_clear_partially_mapped(folio); } spin_unlock(&ds_queue->split_queue_lock); if (mapping) { @@ -3453,11 +3454,12 @@ void __folio_undo_large_rmappable(struct folio *fol= io) if (!list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; list_del_init(&folio->_deferred_list); + folio_clear_partially_mapped(folio); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); } =20 -void deferred_split_folio(struct folio *folio) +void deferred_split_folio(struct folio *folio, bool partially_mapped) { struct deferred_split *ds_queue =3D get_deferred_split_queue(folio); #ifdef CONFIG_MEMCG @@ -3485,14 +3487,17 @@ void deferred_split_folio(struct folio *folio) if (folio_test_swapcache(folio)) return; =20 - if (!list_empty(&folio->_deferred_list)) - return; - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + if (partially_mapped) + folio_set_partially_mapped(folio); + else + folio_clear_partially_mapped(folio); if (list_empty(&folio->_deferred_list)) { - if (folio_test_pmd_mappable(folio)) - count_vm_event(THP_DEFERRED_SPLIT_PAGE); - count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); + if (partially_mapped) { + if (folio_test_pmd_mappable(folio)) + count_vm_event(THP_DEFERRED_SPLIT_PAGE); + count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); + } list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); ds_queue->split_queue_len++; #ifdef CONFIG_MEMCG @@ -3541,6 +3546,7 @@ static unsigned long deferred_split_scan(struct shrin= ker *shrink, } else { /* We lost race with folio_put() */ list_del_init(&folio->_deferred_list); + folio_clear_partially_mapped(folio); ds_queue->split_queue_len--; } if (!--sc->nr_to_scan) @@ -3558,7 +3564,6 @@ static unsigned long deferred_split_scan(struct shrin= ker *shrink, next: folio_put(folio); } - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); list_splice_tail(&list, &ds_queue->split_queue); spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 1fdd9eab240c..2ae2d9a18e40 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -1758,6 +1758,7 @@ static void __update_and_free_hugetlb_folio(struct hs= tate *h, free_gigantic_folio(folio, huge_page_order(h)); } else { INIT_LIST_HEAD(&folio->_deferred_list); + folio_clear_partially_mapped(folio); folio_put(folio); } } diff --git a/mm/internal.h b/mm/internal.h index 52f7fc4e8ac3..d64546b8d377 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -662,8 +662,10 @@ static inline void prep_compound_head(struct page *pag= e, unsigned int order) atomic_set(&folio->_entire_mapcount, -1); atomic_set(&folio->_nr_pages_mapped, 0); atomic_set(&folio->_pincount, 0); - if (order > 1) + if (order > 1) { INIT_LIST_HEAD(&folio->_deferred_list); + folio_clear_partially_mapped(folio); + } } =20 static inline void prep_compound_tail(struct page *head, int tail_idx) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e1ffd2950393..0fd95daecf9a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4669,7 +4669,8 @@ static void uncharge_folio(struct folio *folio, struc= t uncharge_gather *ug) VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); VM_BUG_ON_FOLIO(folio_order(folio) > 1 && !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); + !list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio), folio); =20 /* * Nobody should be changing or seriously looking at diff --git a/mm/migrate.c b/mm/migrate.c index 3288ac041d03..6e32098ac2dc 100644 --- a/mm/migrate.c +++ 
b/mm/migrate.c @@ -1734,7 +1734,8 @@ static int migrate_pages_batch(struct list_head *from, * use _deferred_list. */ if (nr_pages > 2 && - !list_empty(&folio->_deferred_list)) { + !list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio)) { if (!try_split_folio(folio, split_folios, mode)) { nr_failed++; stats->nr_thp_failed +=3D is_thp; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 408ef3d25cf5..a145c550dd2a 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -957,8 +957,9 @@ static int free_tail_page_prepare(struct page *head_pag= e, struct page *page) break; case 2: /* the second tail page: deferred_list overlaps ->mapping */ - if (unlikely(!list_empty(&folio->_deferred_list))) { - bad_page(page, "on deferred list"); + if (unlikely(!list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio))) { + bad_page(page, "partially mapped folio on deferred list"); goto out; } break; diff --git a/mm/rmap.c b/mm/rmap.c index a6b9cd0b2b18..9ad558c2bad0 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1579,7 +1579,8 @@ static __always_inline void __folio_remove_rmap(struc= t folio *folio, */ if (partially_mapped && folio_test_anon(folio) && list_empty(&folio->_deferred_list)) - deferred_split_folio(folio); + deferred_split_folio(folio, true); + __folio_mod_stat(folio, -nr, -nr_pmdmapped); =20 /* diff --git a/mm/vmscan.c b/mm/vmscan.c index 25e43bb3b574..25f4e8403f41 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1233,7 +1233,8 @@ static unsigned int shrink_folio_list(struct list_hea= d *folio_list, * Split partially mapped folios right away. * We can free the unmapped pages without IO. */ - if (data_race(!list_empty(&folio->_deferred_list)) && + if (data_race(!list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio)) && split_folio_to_list(folio, folio_list)) goto activate_locked; } --=20 2.43.5 From nobody Sun Feb 8 01:22:42 2026 Received: from mail-qk1-f179.google.com (mail-qk1-f179.google.com [209.85.222.179]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9912E199EB4; Tue, 13 Aug 2024 12:03:50 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=209.85.222.179 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723550633; cv=none; b=TiFpMBNiQLmKxTdV+nGEJ93Y0kZ+/M6QmGCZrsIlVNNFVIxboGm69wxM+Qp/chjOvusquvUPe9b2TT+DIIRRT/480tCLLIMrb9Uhtmjz11BJY4gzvVrIZSMgJ2jqQrhQHKHCRD/bztpsHPSuVn5Q2n7DtcJY4AOwGPeJWSmlK4s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1723550633; c=relaxed/simple; bh=wuwopemU10Mqsny+RRhShmjAob6TMoVAPVpJZJOv+bk=; h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References: MIME-Version; b=sED1tTny39uEtQhYscHR5g5Xpz4E62sE58LjsVgcAB4S9Zxlkvxn3v06gVrmsy9qUygOY3LaZV+JOClGXH1Kfnat2aw164NWys3iDeBx3Sskbi6usg7yP1WbtiahT7BIXEgDGKijDdfm2c2Yyl1DwjBUGM4TvsH4+zHun2o7Cs8= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com; spf=pass smtp.mailfrom=gmail.com; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b=CLJ86kFt; arc=none smtp.client-ip=209.85.222.179 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=gmail.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=gmail.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="CLJ86kFt" 
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v3 5/6] mm: split underutilized THPs
Date: Tue, 13 Aug 2024 13:02:48 +0100
Message-ID: <20240813120328.1275952-6-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

This is an attempt to mitigate the issue of running out of memory when
THP is always enabled. At runtime, whenever a THP is faulted in
(__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), it is added to _deferred_list.
Whenever memory reclaim happens in linux, the kernel runs the deferred_split shrinker which goes through the _deferred_list. If the folio was partially mapped, the shrinker attempts to split it. If the folio is not partially mapped, the shrinker checks if the THP was underutilized, i.e. how many of the base 4K pages of the entire THP were zero-filled. If this number goes above a certain threshold (decided by /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the shrinker will attempt to split that THP. Then at remap time, the pages that were zero-filled are mapped to the shared zeropage, hence saving memory. Suggested-by: Rik van Riel Co-authored-by: Johannes Weiner Signed-off-by: Usama Arif --- Documentation/admin-guide/mm/transhuge.rst | 6 ++ include/linux/khugepaged.h | 1 + include/linux/vm_event_item.h | 1 + mm/huge_memory.c | 76 ++++++++++++++++++++-- mm/khugepaged.c | 3 +- mm/vmstat.c | 1 + 6 files changed, 80 insertions(+), 8 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index 058485daf186..60522f49178b 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -447,6 +447,12 @@ thp_deferred_split_page splitting it would free up some memory. Pages on split queue are going to be split under memory pressure. =20 +thp_underutilized_split_page + is incremented when a huge page on the split queue was split + because it was underutilized. A THP is underutilized if the + number of zero pages in the THP is above a certain threshold + (/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none). + thp_split_pmd is incremented every time a PMD split into table of PTEs. This can happen, for instance, when application calls mprotect() or diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index f68865e19b0b..30baae91b225 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -4,6 +4,7 @@ =20 #include /* MMF_VM_HUGEPAGE */ =20 +extern unsigned int khugepaged_max_ptes_none __read_mostly; #ifdef CONFIG_TRANSPARENT_HUGEPAGE extern struct attribute_group khugepaged_attr_group; =20 diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index aae5c7c5cfb4..bf1470a7a737 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -105,6 +105,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_SPLIT_PAGE, THP_SPLIT_PAGE_FAILED, THP_DEFERRED_SPLIT_PAGE, + THP_UNDERUTILIZED_SPLIT_PAGE, THP_SPLIT_PMD, THP_SCAN_EXCEED_NONE_PTE, THP_SCAN_EXCEED_SWAP_PTE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index c024ab0f745c..6b32b2d4ab1e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1087,6 +1087,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct= vm_fault *vmf, update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); mm_inc_nr_ptes(vma->vm_mm); + deferred_split_folio(folio, false); spin_unlock(vmf->ptl); count_vm_event(THP_FAULT_ALLOC); count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC); @@ -3522,6 +3523,39 @@ static unsigned long deferred_split_count(struct shr= inker *shrink, return READ_ONCE(ds_queue->split_queue_len); } =20 +static bool thp_underutilized(struct folio *folio) +{ + int num_zero_pages =3D 0, num_filled_pages =3D 0; + void *kaddr; + int i; + + if (khugepaged_max_ptes_none =3D=3D HPAGE_PMD_NR - 1) + return false; + + for (i =3D 0; i < folio_nr_pages(folio); i++) { + kaddr =3D 
kmap_local_folio(folio, i * PAGE_SIZE); + if (!memchr_inv(kaddr, 0, PAGE_SIZE)) { + num_zero_pages++; + if (num_zero_pages > khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return true; + } + } else { + /* + * Another path for early exit once the number + * of non-zero filled pages exceeds threshold. + */ + num_filled_pages++; + if (num_filled_pages >=3D HPAGE_PMD_NR - khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return false; + } + } + kunmap_local(kaddr); + } + return false; +} + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc) { @@ -3555,17 +3589,45 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); =20 list_for_each_entry_safe(folio, next, &list, _deferred_list) { + bool did_split =3D false; + bool underutilized =3D false; + + if (folio_test_partially_mapped(folio)) + goto split; + underutilized =3D thp_underutilized(folio); + if (underutilized) + goto split; + continue; +split: if (!folio_trylock(folio)) - goto next; - /* split_huge_page() removes page from list on success */ - if (!split_folio(folio)) - split++; + continue; + did_split =3D !split_folio(folio); folio_unlock(folio); -next: - folio_put(folio); + if (did_split) { + /* Splitting removed folio from the list, drop reference here */ + folio_put(folio); + if (underutilized) + count_vm_event(THP_UNDERUTILIZED_SPLIT_PAGE); + split++; + } } + spin_lock_irqsave(&ds_queue->split_queue_lock, flags); - list_splice_tail(&list, &ds_queue->split_queue); + /* + * Only add back to the queue if folio is partially mapped. + * If thp_underutilized returns false, or if split_folio fails in + * the case it was underutilized, then consider it used and don't + * add it back to split_queue. + */ + list_for_each_entry_safe(folio, next, &list, _deferred_list) { + if (folio_test_partially_mapped(folio)) + list_move(&folio->_deferred_list, &ds_queue->split_queue); + else { + list_del_init(&folio->_deferred_list); + ds_queue->split_queue_len--; + } + folio_put(folio); + } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); =20 /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index cdd1d8655a76..02e1463e1a79 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -85,7 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait); * * Note that these are only respected if collapse was initiated by khugepa= ged. 
  */
-static unsigned int khugepaged_max_ptes_none __read_mostly;
+unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
@@ -1235,6 +1235,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
+	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
 	folio = NULL;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c3a402ea91f0..91cd7d4d482b 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1384,6 +1384,7 @@ const char * const vmstat_text[] = {
 	"thp_split_page",
 	"thp_split_page_failed",
 	"thp_deferred_split_page",
+	"thp_underutilized_split_page",
 	"thp_split_pmd",
 	"thp_scan_exceed_none_pte",
 	"thp_scan_exceed_swap_pte",
-- 
2.43.5
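The underutilization decision above reduces to simple threshold arithmetic
against max_ptes_none. A user-space model of that decision, assuming a 2MB
THP made up of 512 base pages (this is an illustration, not kernel code):

  #include <stdbool.h>

  #define HPAGE_PMD_NR 512

  static bool thp_is_underutilized(int nr_zero_filled, unsigned int max_ptes_none)
  {
  	/* max_ptes_none == 511 disables the check, as in thp_underutilized(). */
  	if (max_ptes_none == HPAGE_PMD_NR - 1)
  		return false;
  	/* Split once more than max_ptes_none subpages are zero-filled. */
  	return nr_zero_filled > (int)max_ptes_none;
  }

The kernel-side scan also exits early once the number of non-zero subpages
reaches HPAGE_PMD_NR - max_ptes_none, since the folio can then no longer
cross the threshold.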
From nobody Sun Feb 8 01:22:42 2026
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
    roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
    baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org,
    willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net,
    linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org,
    kernel-team@meta.com, Usama Arif
Subject: [PATCH v3 6/6] mm: add sysfs entry to disable splitting underutilized THPs
Date: Tue, 13 Aug 2024 13:02:49 +0100
Message-ID: <20240813120328.1275952-7-usamaarif642@gmail.com>
In-Reply-To: <20240813120328.1275952-1-usamaarif642@gmail.com>
References: <20240813120328.1275952-1-usamaarif642@gmail.com>

If disabled, THPs faulted in or collapsed will not be added to
_deferred_list, and therefore won't be considered for splitting under
memory pressure if underutilized.
Signed-off-by: Usama Arif --- mm/huge_memory.c | 26 ++++++++++++++++++++++++++ 1 file changed, 26 insertions(+) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 6b32b2d4ab1e..b4d72479330d 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -74,6 +74,7 @@ static unsigned long deferred_split_count(struct shrinker= *shrink, struct shrink_control *sc); static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc); +static bool split_underutilized_thp =3D true; =20 static atomic_t huge_zero_refcount; struct folio *huge_zero_folio __read_mostly; @@ -439,6 +440,27 @@ static ssize_t hpage_pmd_size_show(struct kobject *kob= j, static struct kobj_attribute hpage_pmd_size_attr =3D __ATTR_RO(hpage_pmd_size); =20 +static ssize_t split_underutilized_thp_show(struct kobject *kobj, + struct kobj_attribute *attr, char *buf) +{ + return sysfs_emit(buf, "%d\n", split_underutilized_thp); +} + +static ssize_t split_underutilized_thp_store(struct kobject *kobj, + struct kobj_attribute *attr, + const char *buf, size_t count) +{ + int err =3D kstrtobool(buf, &split_underutilized_thp); + + if (err < 0) + return err; + + return count; +} + +static struct kobj_attribute split_underutilized_thp_attr =3D __ATTR( + thp_low_util_shrinker, 0644, split_underutilized_thp_show, split_underuti= lized_thp_store); + static struct attribute *hugepage_attr[] =3D { &enabled_attr.attr, &defrag_attr.attr, @@ -447,6 +469,7 @@ static struct attribute *hugepage_attr[] =3D { #ifdef CONFIG_SHMEM &shmem_enabled_attr.attr, #endif + &split_underutilized_thp_attr.attr, NULL, }; =20 @@ -3475,6 +3498,9 @@ void deferred_split_folio(struct folio *folio, bool p= artially_mapped) if (folio_order(folio) <=3D 1) return; =20 + if (!partially_mapped && !split_underutilized_thp) + return; + /* * The try_to_unmap() in page reclaim path might reach here too, * this may cause a race condition to corrupt deferred split queue. --=20 2.43.5
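Example of toggling the new knob from user space. The path below is an
assumption based on the attribute name (thp_low_util_shrinker) and its
placement in hugepage_attr, which backs the top-level transparent_hugepage
sysfs directory; it is a sketch, not a documented interface.

  #include <stdio.h>

  int main(void)
  {
  	FILE *fp = fopen("/sys/kernel/mm/transparent_hugepage/thp_low_util_shrinker", "w");

  	if (!fp) {
  		perror("fopen");
  		return 1;
  	}
  	fputs("0\n", fp);	/* 0 disables the underutilized-THP shrinker */
  	fclose(fp);
  	return 0;
  }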