From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai, Usama Arif
Subject: [PATCH v4 1/6] mm: free zapped tail pages when splitting isolated thp
Date: Mon, 19 Aug 2024 03:30:54 +0100
Message-ID: <20240819023145.2415299-2-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

From: Yu Zhao

If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao
Tested-by: Shuang Zhai
Signed-off-by: Usama Arif
Acked-by: Johannes Weiner
---
 mm/huge_memory.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 04ee8abd6475..147655821f09 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3059,7 +3059,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int new_nr = 1 << new_order;
 	int order = folio_order(folio);
 	unsigned int nr = 1 << order;
+	struct folio_batch free_folios;
 
+	folio_batch_init(&free_folios);
 	/* complete memcg works before add pages to LRU */
 	split_page_memcg(head, order, new_order);
 
@@ -3143,6 +3145,27 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		if (subpage == page)
 			continue;
 		folio_unlock(new_folio);
+		/*
+		 * If a folio has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * folio was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && folio_ref_freeze(new_folio, 2)) {
+			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
+
+			folio_clear_active(new_folio);
+			folio_clear_unevictable(new_folio);
+			list_del(&new_folio->lru);
+			if (!folio_batch_add(&free_folios, new_folio)) {
+				mem_cgroup_uncharge_folios(&free_folios);
+				free_unref_folios(&free_folios);
+			}
+			continue;
+		}
 
 		/*
 		 * Subpages may be freed if there wasn't any mapping
@@ -3153,6 +3176,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (free_folios.nr) {
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+	}
 }
 
 /* Racy check whether the huge page can be split */
-- 
2.43.5

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai, Usama Arif
Subject: [PATCH v4 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
Date: Mon, 19 Aug 2024 03:30:55 +0100
Message-ID: <20240819023145.2415299-3-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

From: Yu Zhao

Here, "unused" means a subpage contains only zeros and is inaccessible
to userspace. When splitting an isolated thp under reclaim or migration,
such unused subpages can be mapped to the shared zeropage, hence saving
memory. This is particularly helpful when the internal fragmentation of
a thp is high, i.e. it has many untouched subpages.

This is also a prerequisite for the THP low-utilization shrinker
introduced in later patches, which splits underutilized THPs and frees
their zero-filled pages to save memory.
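For intuition, "unused" here boils down to "every byte of the 4K subpage reads as zero". Below is a minimal userspace sketch of that check, for illustration only; the kernel-side test in the diff further down uses memchr_inv() on each kmapped subpage, and the names in this sketch are made up for the example:

#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

#define EXAMPLE_PAGE_SIZE 4096UL

/* Illustrative equivalent of the kernel's memchr_inv(addr, 0, PAGE_SIZE) == NULL test. */
static bool page_is_zero_filled(const unsigned char *page)
{
	for (size_t i = 0; i < EXAMPLE_PAGE_SIZE; i++)
		if (page[i])
			return false;
	return true;
}

int main(void)
{
	unsigned char *page = calloc(1, EXAMPLE_PAGE_SIZE);

	if (!page)
		return 1;
	printf("all zeros: %d\n", page_is_zero_filled(page));	/* prints 1 */
	page[123] = 0xff;
	printf("all zeros: %d\n", page_is_zero_filled(page));	/* prints 0 */
	free(page);
	return 0;
}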
Signed-off-by: Yu Zhao Tested-by: Shuang Zhai Signed-off-by: Usama Arif --- include/linux/rmap.h | 7 ++++- mm/huge_memory.c | 8 ++--- mm/migrate.c | 72 ++++++++++++++++++++++++++++++++++++++------ mm/migrate_device.c | 4 +-- 4 files changed, 75 insertions(+), 16 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 0978c64f49d8..07854d1f9ad6 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -745,7 +745,12 @@ int folio_mkclean(struct folio *); int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t p= goff, struct vm_area_struct *vma); =20 -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed); +enum rmp_flags { + RMP_LOCKED =3D 1 << 0, + RMP_USE_SHARED_ZEROPAGE =3D 1 << 1, +}; + +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags= ); =20 /* * rmap_walk_control: To control rmap traversing for specific needs diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 147655821f09..2d77b5d2291e 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2911,7 +2911,7 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma= , unsigned long addr, return false; } =20 -static void remap_page(struct folio *folio, unsigned long nr) +static void remap_page(struct folio *folio, unsigned long nr, int flags) { int i =3D 0; =20 @@ -2919,7 +2919,7 @@ static void remap_page(struct folio *folio, unsigned = long nr) if (!folio_test_anon(folio)) return; for (;;) { - remove_migration_ptes(folio, folio, true); + remove_migration_ptes(folio, folio, RMP_LOCKED | flags); i +=3D folio_nr_pages(folio); if (i >=3D nr) break; @@ -3129,7 +3129,7 @@ static void __split_huge_page(struct page *page, stru= ct list_head *list, =20 if (nr_dropped) shmem_uncharge(folio->mapping->host, nr_dropped); - remap_page(folio, nr); + remap_page(folio, nr, PageAnon(head) ? RMP_USE_SHARED_ZEROPAGE : 0); =20 /* * set page to its compound_head when split to non order-0 pages, so @@ -3425,7 +3425,7 @@ int split_huge_page_to_list_to_order(struct page *pag= e, struct list_head *list, if (mapping) xas_unlock(&xas); local_irq_enable(); - remap_page(folio, folio_nr_pages(folio)); + remap_page(folio, folio_nr_pages(folio), 0); ret =3D -EAGAIN; } =20 diff --git a/mm/migrate.c b/mm/migrate.c index 66a5f73ebfdf..2d2e65d69427 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -178,13 +178,57 @@ void putback_movable_pages(struct list_head *l) } } =20 +static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvm= w, + struct folio *folio, + unsigned long idx) +{ + struct page *page =3D folio_page(folio, idx); + bool contains_data; + pte_t newpte; + void *addr; + + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!PageAnon(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page); + + if (PageMlocked(page) || (pvmw->vma->vm_flags & VM_LOCKED) || + mm_forbids_zeropage(pvmw->vma->vm_mm)) + return false; + + /* + * The pmd entry mapping the old thp was flushed and the pte mapping + * this subpage has been non present. If the subpage is only zero-filled + * then map it to the shared zeropage. 
+ */ + addr =3D kmap_local_page(page); + contains_data =3D memchr_inv(addr, 0, PAGE_SIZE); + kunmap_local(addr); + + if (contains_data) + return false; + + newpte =3D pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), + pvmw->vma->vm_page_prot)); + set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); + + dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); + return true; +} + +struct rmap_walk_arg { + struct folio *folio; + bool map_unused_to_zeropage; +}; + /* * Restore a potential migration pte to a working pte entry */ static bool remove_migration_pte(struct folio *folio, - struct vm_area_struct *vma, unsigned long addr, void *old) + struct vm_area_struct *vma, unsigned long addr, void *arg) { - DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION); + struct rmap_walk_arg *rmap_walk_arg =3D arg; + DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | = PVMW_MIGRATION); =20 while (page_vma_mapped_walk(&pvmw)) { rmap_t rmap_flags =3D RMAP_NONE; @@ -208,6 +252,9 @@ static bool remove_migration_pte(struct folio *folio, continue; } #endif + if (rmap_walk_arg->map_unused_to_zeropage && + try_to_map_unused_to_zeropage(&pvmw, folio, idx)) + continue; =20 folio_get(folio); pte =3D mk_pte(new, READ_ONCE(vma->vm_page_prot)); @@ -286,14 +333,21 @@ static bool remove_migration_pte(struct folio *folio, * Get rid of all migration entries and replace them by * references to the indicated page. */ -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed) +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags) { + struct rmap_walk_arg rmap_walk_arg =3D { + .folio =3D src, + .map_unused_to_zeropage =3D flags & RMP_USE_SHARED_ZEROPAGE, + }; + struct rmap_walk_control rwc =3D { .rmap_one =3D remove_migration_pte, - .arg =3D src, + .arg =3D &rmap_walk_arg, }; =20 - if (locked) + VM_BUG_ON_FOLIO((flags & RMP_USE_SHARED_ZEROPAGE) && (src !=3D dst), src); + + if (flags & RMP_LOCKED) rmap_walk_locked(dst, &rwc); else rmap_walk(dst, &rwc); @@ -903,7 +957,7 @@ static int writeout(struct address_space *mapping, stru= ct folio *folio) * At this point we know that the migration attempt cannot * be successful. */ - remove_migration_ptes(folio, folio, false); + remove_migration_ptes(folio, folio, 0); =20 rc =3D mapping->a_ops->writepage(&folio->page, &wbc); =20 @@ -1067,7 +1121,7 @@ static void migrate_folio_undo_src(struct folio *src, struct list_head *ret) { if (page_was_mapped) - remove_migration_ptes(src, src, false); + remove_migration_ptes(src, src, 0); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); @@ -1305,7 +1359,7 @@ static int migrate_folio_move(free_folio_t put_new_fo= lio, unsigned long private, lru_add_drain(); =20 if (old_page_state & PAGE_WAS_MAPPED) - remove_migration_ptes(src, dst, false); + remove_migration_ptes(src, dst, 0); =20 out_unlock_both: folio_unlock(dst); @@ -1443,7 +1497,7 @@ static int unmap_and_move_huge_page(new_folio_t get_n= ew_folio, =20 if (page_was_mapped) remove_migration_ptes(src, - rc =3D=3D MIGRATEPAGE_SUCCESS ? dst : src, false); + rc =3D=3D MIGRATEPAGE_SUCCESS ? 
							dst : src, 0);
 
 unlock_put_anon:
 	folio_unlock(dst);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..8f875636b35b 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -424,7 +424,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
 			continue;
 
 		folio = page_folio(page);
-		remove_migration_ptes(folio, folio, false);
+		remove_migration_ptes(folio, folio, 0);
 
 		src_pfns[i] = 0;
 		folio_unlock(folio);
@@ -837,7 +837,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
 
 		src = page_folio(page);
 		dst = page_folio(newpage);
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, 0);
 		folio_unlock(src);
 
 		if (is_zone_device_page(page))
-- 
2.43.5

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Alexander Zhu, Usama Arif
Subject: [PATCH v4 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage
Date: Mon, 19 Aug 2024 03:30:56 +0100
Message-ID: <20240819023145.2415299-4-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

From: Alexander Zhu

When a THP is split, any subpage that is zero-filled will be mapped to
the shared zeropage, hence saving memory. Add a selftest to verify this
by allocating a zero-filled THP and comparing RssAnon before and after
the split.

Signed-off-by: Alexander Zhu
Acked-by: Rik van Riel
Signed-off-by: Usama Arif
---
 .../selftests/mm/split_huge_page_test.c | 71 +++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c    | 22 ++++++
 tools/testing/selftests/mm/vm_util.h    |  1 +
 3 files changed, 94 insertions(+)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e5e8dafc9d94..eb6d1b9fc362 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -84,6 +84,76 @@ static void write_debugfs(const char *fmt, ...)
write_file(SPLIT_DEBUGFS, input, ret + 1); } =20 +static char *allocate_zero_filled_hugepage(size_t len) +{ + char *result; + size_t i; + + result =3D memalign(pmd_pagesize, len); + if (!result) { + printf("Fail to allocate memory\n"); + exit(EXIT_FAILURE); + } + + madvise(result, len, MADV_HUGEPAGE); + + for (i =3D 0; i < len; i++) + result[i] =3D (char)0; + + return result; +} + +static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int= nr_hpages, size_t len) +{ + unsigned long rss_anon_before, rss_anon_after; + size_t i; + + if (!check_huge_anon(one_page, 4, pmd_pagesize)) { + printf("No THP is allocated\n"); + exit(EXIT_FAILURE); + } + + rss_anon_before =3D rss_anon(); + if (!rss_anon_before) { + printf("No RssAnon is allocated before split\n"); + exit(EXIT_FAILURE); + } + + /* split all THPs */ + write_debugfs(PID_FMT, getpid(), (uint64_t)one_page, + (uint64_t)one_page + len, 0); + + for (i =3D 0; i < len; i++) + if (one_page[i] !=3D (char)0) { + printf("%ld byte corrupted\n", i); + exit(EXIT_FAILURE); + } + + if (!check_huge_anon(one_page, 0, pmd_pagesize)) { + printf("Still AnonHugePages not split\n"); + exit(EXIT_FAILURE); + } + + rss_anon_after =3D rss_anon(); + if (rss_anon_after >=3D rss_anon_before) { + printf("Incorrect RssAnon value. Before: %ld After: %ld\n", + rss_anon_before, rss_anon_after); + exit(EXIT_FAILURE); + } +} + +void split_pmd_zero_pages(void) +{ + char *one_page; + int nr_hpages =3D 4; + size_t len =3D nr_hpages * pmd_pagesize; + + one_page =3D allocate_zero_filled_hugepage(len); + verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len); + printf("Split zero filled huge pages successful\n"); + free(one_page); +} + void split_pmd_thp(void) { char *one_page; @@ -431,6 +501,7 @@ int main(int argc, char **argv) =20 fd_size =3D 2 * pmd_pagesize; =20 + split_pmd_zero_pages(); split_pmd_thp(); split_pte_mapped_thp(); split_file_backed_thp(); diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests= /mm/vm_util.c index 5a62530da3b5..d8d0cf04bb57 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -12,6 +12,7 @@ =20 #define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_= size" #define SMAP_FILE_PATH "/proc/self/smaps" +#define STATUS_FILE_PATH "/proc/self/status" #define MAX_LINE_LENGTH 500 =20 unsigned int __page_size; @@ -171,6 +172,27 @@ uint64_t read_pmd_pagesize(void) return strtoul(buf, NULL, 10); } =20 +unsigned long rss_anon(void) +{ + unsigned long rss_anon =3D 0; + FILE *fp; + char buffer[MAX_LINE_LENGTH]; + + fp =3D fopen(STATUS_FILE_PATH, "r"); + if (!fp) + ksft_exit_fail_msg("%s: Failed to open file %s\n", __func__, STATUS_FILE= _PATH); + + if (!check_for_pattern(fp, "RssAnon:", buffer, sizeof(buffer))) + goto err_out; + + if (sscanf(buffer, "RssAnon:%10lu kB", &rss_anon) !=3D 1) + ksft_exit_fail_msg("Reading status error\n"); + +err_out: + fclose(fp); + return rss_anon; +} + bool __check_huge(void *addr, char *pattern, int nr_hpages, uint64_t hpage_size) { diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests= /mm/vm_util.h index 9007c420d52c..71b75429f4a5 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -39,6 +39,7 @@ unsigned long pagemap_get_pfn(int fd, char *start); void clear_softdirty(void); bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t le= n); uint64_t read_pmd_pagesize(void); +uint64_t rss_anon(void); bool check_huge_anon(void *addr, int 
nr_hpages, uint64_t hpage_size); bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size); bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
-- 
2.43.5

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v4 4/6] mm: Introduce a pageflag for partially mapped folios
Date: Mon, 19 Aug 2024 03:30:57 +0100
Message-ID: <20240819023145.2415299-5-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

Currently folio->_deferred_list is used to keep track of partially_mapped
folios that are going to be split under memory pressure. In the next
patch, all THPs that are faulted in and collapsed by khugepaged will also
be tracked using _deferred_list.

This patch introduces a pageflag to distinguish partially mapped folios
from the other folios on the deferred list at split time in
deferred_split_scan(). It is needed because __folio_remove_rmap()
decrements _mapcount, _large_mapcount and _entire_mapcount, so those
counters can no longer be used to tell partially mapped folios apart in
deferred_split_scan().

Even though this introduces an extra flag to track whether the folio is
partially mapped, there is no functional change intended with this patch.
The flag is not used in this patch itself; it becomes useful in the next
patch, when _deferred_list also contains folios that are not partially
mapped.
Signed-off-by: Usama Arif --- include/linux/huge_mm.h | 4 ++-- include/linux/page-flags.h | 11 +++++++++++ mm/huge_memory.c | 23 ++++++++++++++++------- mm/internal.h | 4 +++- mm/memcontrol.c | 3 ++- mm/migrate.c | 3 ++- mm/page_alloc.c | 5 +++-- mm/rmap.c | 5 +++-- mm/vmscan.c | 3 ++- 9 files changed, 44 insertions(+), 17 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4c32058cacfe..969f11f360d2 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page) { return split_huge_page_to_list_to_order(page, NULL, 0); } -void deferred_split_folio(struct folio *folio); +void deferred_split_folio(struct folio *folio, bool partially_mapped); =20 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze, struct folio *folio); @@ -495,7 +495,7 @@ static inline int split_huge_page(struct page *page) { return 0; } -static inline void deferred_split_folio(struct folio *folio) {} +static inline void deferred_split_folio(struct folio *folio, bool partiall= y_mapped) {} #define split_huge_pmd(__vma, __pmd, __address) \ do { } while (0) =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index a0a29bd092f8..c3bb0e0da581 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -182,6 +182,7 @@ enum pageflags { /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned =3D PG_active, PG_large_rmappable =3D PG_workingset, /* anon or file-backed */ + PG_partially_mapped =3D PG_reclaim, /* was identified to be partially map= ped */ }; =20 #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -861,8 +862,18 @@ static inline void ClearPageCompound(struct page *page) ClearPageHead(page); } FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE) +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE) +/* + * PG_partially_mapped is protected by deferred_split split_queue_lock, + * so its safe to use non-atomic set/clear. + */ +__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE) +__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE) #else FOLIO_FLAG_FALSE(large_rmappable) +FOLIO_TEST_FLAG_FALSE(partially_mapped) +__FOLIO_SET_FLAG_NOOP(partially_mapped) +__FOLIO_CLEAR_FLAG_NOOP(partially_mapped) #endif =20 #define PG_head_mask ((1UL << PG_head)) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 2d77b5d2291e..70ee49dfeaad 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3398,6 +3398,7 @@ int split_huge_page_to_list_to_order(struct page *pag= e, struct list_head *list, * page_deferred_list. 
*/ list_del_init(&folio->_deferred_list); + __folio_clear_partially_mapped(folio); } spin_unlock(&ds_queue->split_queue_lock); if (mapping) { @@ -3454,11 +3455,13 @@ void __folio_undo_large_rmappable(struct folio *fol= io) if (!list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; list_del_init(&folio->_deferred_list); + __folio_clear_partially_mapped(folio); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); } =20 -void deferred_split_folio(struct folio *folio) +/* partially_mapped=3Dfalse won't clear PG_partially_mapped folio flag */ +void deferred_split_folio(struct folio *folio, bool partially_mapped) { struct deferred_split *ds_queue =3D get_deferred_split_queue(folio); #ifdef CONFIG_MEMCG @@ -3486,14 +3489,19 @@ void deferred_split_folio(struct folio *folio) if (folio_test_swapcache(folio)) return; =20 - if (!list_empty(&folio->_deferred_list)) - return; - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + if (partially_mapped) { + if (!folio_test_partially_mapped(folio)) { + __folio_set_partially_mapped(folio); + if (folio_test_pmd_mappable(folio)) + count_vm_event(THP_DEFERRED_SPLIT_PAGE); + count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); + } + } else { + /* partially mapped folios cannot become non-partially mapped */ + VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio); + } if (list_empty(&folio->_deferred_list)) { - if (folio_test_pmd_mappable(folio)) - count_vm_event(THP_DEFERRED_SPLIT_PAGE); - count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); ds_queue->split_queue_len++; #ifdef CONFIG_MEMCG @@ -3542,6 +3550,7 @@ static unsigned long deferred_split_scan(struct shrin= ker *shrink, } else { /* We lost race with folio_put() */ list_del_init(&folio->_deferred_list); + __folio_clear_partially_mapped(folio); ds_queue->split_queue_len--; } if (!--sc->nr_to_scan) diff --git a/mm/internal.h b/mm/internal.h index 52f7fc4e8ac3..27cbb5365841 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -662,8 +662,10 @@ static inline void prep_compound_head(struct page *pag= e, unsigned int order) atomic_set(&folio->_entire_mapcount, -1); atomic_set(&folio->_nr_pages_mapped, 0); atomic_set(&folio->_pincount, 0); - if (order > 1) + if (order > 1) { INIT_LIST_HEAD(&folio->_deferred_list); + __folio_clear_partially_mapped(folio); + } } =20 static inline void prep_compound_tail(struct page *head, int tail_idx) diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e1ffd2950393..0fd95daecf9a 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4669,7 +4669,8 @@ static void uncharge_folio(struct folio *folio, struc= t uncharge_gather *ug) VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); VM_BUG_ON_FOLIO(folio_order(folio) > 1 && !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); + !list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio), folio); =20 /* * Nobody should be changing or seriously looking at diff --git a/mm/migrate.c b/mm/migrate.c index 2d2e65d69427..ef4a732f22b1 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1735,7 +1735,8 @@ static int migrate_pages_batch(struct list_head *from, * use _deferred_list. 
 		 */
 		if (nr_pages > 2 &&
-		    !list_empty(&folio->_deferred_list)) {
+		    !list_empty(&folio->_deferred_list) &&
+		    folio_test_partially_mapped(folio)) {
 			if (!try_split_folio(folio, split_folios, mode)) {
 				nr_failed++;
 				stats->nr_thp_failed += is_thp;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 408ef3d25cf5..a145c550dd2a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -957,8 +957,9 @@ static int free_tail_page_prepare(struct page *head_page, struct page *page)
 		break;
 	case 2:
 		/* the second tail page: deferred_list overlaps ->mapping */
-		if (unlikely(!list_empty(&folio->_deferred_list))) {
-			bad_page(page, "on deferred list");
+		if (unlikely(!list_empty(&folio->_deferred_list) &&
+			     folio_test_partially_mapped(folio))) {
+			bad_page(page, "partially mapped folio on deferred list");
 			goto out;
 		}
 		break;
diff --git a/mm/rmap.c b/mm/rmap.c
index a6b9cd0b2b18..4c330635aa4e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1578,8 +1578,9 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 	 * Check partially_mapped first to ensure it is a large folio.
 	 */
 	if (partially_mapped && folio_test_anon(folio) &&
-	    list_empty(&folio->_deferred_list))
-		deferred_split_folio(folio);
+	    !folio_test_partially_mapped(folio))
+		deferred_split_folio(folio, true);
+
 	__folio_mod_stat(folio, -nr, -nr_pmdmapped);
 
 	/*
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 25e43bb3b574..25f4e8403f41 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1233,7 +1233,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 			 * Split partially mapped folios right away.
 			 * We can free the unmapped pages without IO.
 			 */
-			if (data_race(!list_empty(&folio->_deferred_list)) &&
+			if (data_race(!list_empty(&folio->_deferred_list) &&
+			    folio_test_partially_mapped(folio)) &&
 			    split_folio_to_list(folio, folio_list))
 				goto activate_locked;
 		}
-- 
2.43.5

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v4 5/6] mm: split underused THPs
Date: Mon, 19 Aug 2024 03:30:58 +0100
Message-ID: <20240819023145.2415299-6-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

This is an attempt to mitigate the issue of running out of memory when
THP is always enabled.
During runtime whenever a THP is being faulted in (__do_huge_pmd_anonymous_page) or collapsed by khugepaged (collapse_huge_page), the THP is added to _deferred_list. Whenever memory reclaim happens in linux, the kernel runs the deferred_split shrinker which goes through the _deferred_list. If the folio was partially mapped, the shrinker attempts to split it. If the folio is not partially mapped, the shrinker checks if the THP was underused, i.e. how many of the base 4K pages of the entire THP were zero-filled. If this number goes above a certain threshold (decided by /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the shrinker will attempt to split that THP. Then at remap time, the pages that were zero-filled are mapped to the shared zeropage, hence saving memory. Suggested-by: Rik van Riel Co-authored-by: Johannes Weiner Signed-off-by: Usama Arif --- Documentation/admin-guide/mm/transhuge.rst | 6 +++ include/linux/khugepaged.h | 1 + include/linux/vm_event_item.h | 1 + mm/huge_memory.c | 60 +++++++++++++++++++++- mm/khugepaged.c | 3 +- mm/vmstat.c | 1 + 6 files changed, 69 insertions(+), 3 deletions(-) diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/adm= in-guide/mm/transhuge.rst index 058485daf186..40741b892aff 100644 --- a/Documentation/admin-guide/mm/transhuge.rst +++ b/Documentation/admin-guide/mm/transhuge.rst @@ -447,6 +447,12 @@ thp_deferred_split_page splitting it would free up some memory. Pages on split queue are going to be split under memory pressure. =20 +thp_underused_split_page + is incremented when a huge page on the split queue was split + because it was underused. A THP is underused if the number of + zero pages in the THP is above a certain threshold + (/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none). + thp_split_pmd is incremented every time a PMD split into table of PTEs. 
This can happen, for instance, when application calls mprotect() or diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index f68865e19b0b..30baae91b225 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -4,6 +4,7 @@ =20 #include /* MMF_VM_HUGEPAGE */ =20 +extern unsigned int khugepaged_max_ptes_none __read_mostly; #ifdef CONFIG_TRANSPARENT_HUGEPAGE extern struct attribute_group khugepaged_attr_group; =20 diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index aae5c7c5cfb4..aed952d04132 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -105,6 +105,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_SPLIT_PAGE, THP_SPLIT_PAGE_FAILED, THP_DEFERRED_SPLIT_PAGE, + THP_UNDERUSED_SPLIT_PAGE, THP_SPLIT_PMD, THP_SCAN_EXCEED_NONE_PTE, THP_SCAN_EXCEED_SWAP_PTE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 70ee49dfeaad..f5363cf900f9 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1087,6 +1087,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct= vm_fault *vmf, update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); mm_inc_nr_ptes(vma->vm_mm); + deferred_split_folio(folio, false); spin_unlock(vmf->ptl); count_vm_event(THP_FAULT_ALLOC); count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC); @@ -3526,6 +3527,39 @@ static unsigned long deferred_split_count(struct shr= inker *shrink, return READ_ONCE(ds_queue->split_queue_len); } =20 +static bool thp_underused(struct folio *folio) +{ + int num_zero_pages =3D 0, num_filled_pages =3D 0; + void *kaddr; + int i; + + if (khugepaged_max_ptes_none =3D=3D HPAGE_PMD_NR - 1) + return false; + + for (i =3D 0; i < folio_nr_pages(folio); i++) { + kaddr =3D kmap_local_folio(folio, i * PAGE_SIZE); + if (!memchr_inv(kaddr, 0, PAGE_SIZE)) { + num_zero_pages++; + if (num_zero_pages > khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return true; + } + } else { + /* + * Another path for early exit once the number + * of non-zero filled pages exceeds threshold. + */ + num_filled_pages++; + if (num_filled_pages >=3D HPAGE_PMD_NR - khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return false; + } + } + kunmap_local(kaddr); + } + return false; +} + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc) { @@ -3559,13 +3593,35 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); =20 list_for_each_entry_safe(folio, next, &list, _deferred_list) { + bool did_split =3D false; + bool underused =3D false; + + if (!folio_test_partially_mapped(folio)) { + underused =3D thp_underused(folio); + if (!underused) + goto next; + } if (!folio_trylock(folio)) goto next; - /* split_huge_page() removes page from list on success */ - if (!split_folio(folio)) + if (!split_folio(folio)) { + did_split =3D true; + if (underused) + count_vm_event(THP_UNDERUSED_SPLIT_PAGE); split++; + } folio_unlock(folio); next: + /* + * split_folio() removes folio from list on success. + * Only add back to the queue if folio is partially mapped. + * If thp_underused returns false, or if split_folio fails + * in the case it was underused, then consider it used and + * don't add it back to split_queue. 
+		 */
+		if (!did_split && !folio_test_partially_mapped(folio)) {
+			list_del_init(&folio->_deferred_list);
+			ds_queue->split_queue_len--;
+		}
 		folio_put(folio);
 	}
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 6c42062478c1..2e138b22d939 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,7 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  *
  * Note that these are only respected if collapse was initiated by khugepaged.
  */
-static unsigned int khugepaged_max_ptes_none __read_mostly;
+unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
@@ -1235,6 +1235,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
+	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
 	folio = NULL;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index c3a402ea91f0..6060bb7bbb44 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1384,6 +1384,7 @@ const char * const vmstat_text[] = {
 	"thp_split_page",
 	"thp_split_page_failed",
 	"thp_deferred_split_page",
+	"thp_underused_split_page",
 	"thp_split_pmd",
 	"thp_scan_exceed_none_pte",
 	"thp_scan_exceed_swap_pte",
-- 
2.43.5

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v4 6/6] mm: add sysfs entry to disable splitting underused THPs
Date: Mon, 19 Aug 2024 03:30:59 +0100
Message-ID: <20240819023145.2415299-7-usamaarif642@gmail.com>
In-Reply-To: <20240819023145.2415299-1-usamaarif642@gmail.com>
References: <20240819023145.2415299-1-usamaarif642@gmail.com>

If disabled, THPs faulted in or collapsed will not be added to
_deferred_list, and therefore won't be considered for splitting under
memory pressure if underused.
Signed-off-by: Usama Arif
---
 Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++
 mm/huge_memory.c                           | 26 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 40741b892aff..02ae7bc9efbd 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -202,6 +202,16 @@ PMD-mappable transparent hugepage::
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+All THPs at fault and collapse time will be added to _deferred_list,
+and will therefore be split under memory pressure if they are considered
+"underused". A THP is underused if the number of zero-filled pages in
+the THP is above max_ptes_none (see below). It is possible to disable
+this behaviour by writing 0 to shrink_underused, and enable it by writing
+1 to it::
+
+	echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+	echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+
 khugepaged will be automatically started when PMD-sized THP is enabled
 (either of the per-size anon control or the top-level control are set to
 "always" or "madvise"), and it'll be automatically shutdown when
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index f5363cf900f9..5d67d3b3c1b2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -74,6 +74,7 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
 					  struct shrink_control *sc);
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc);
+static bool split_underused_thp = true;
 
 static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
@@ -439,6 +440,27 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
 	__ATTR_RO(hpage_pmd_size);
 
+static ssize_t split_underused_thp_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", split_underused_thp);
+}
+
+static ssize_t split_underused_thp_store(struct kobject *kobj,
+			     struct kobj_attribute *attr,
+			     const char *buf, size_t count)
+{
+	int err = kstrtobool(buf, &split_underused_thp);
+
+	if (err < 0)
+		return err;
+
+	return count;
+}
+
+static struct kobj_attribute split_underused_thp_attr = __ATTR(
+	shrink_underused, 0644, split_underused_thp_show, split_underused_thp_store);
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -447,6 +469,7 @@ static struct attribute *hugepage_attr[] = {
 #ifdef CONFIG_SHMEM
 	&shmem_enabled_attr.attr,
 #endif
+	&split_underused_thp_attr.attr,
 	NULL,
 };
 
@@ -3477,6 +3500,9 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_order(folio) <= 1)
 		return;
 
+	if (!partially_mapped && !split_underused_thp)
+		return;
+
 	/*
 	 * The try_to_unmap() in page reclaim path might reach here too,
 	 * this may cause a race condition to corrupt deferred split queue.
-- 
2.43.5
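As a closing note, one rough way to observe the series in action from userspace (an illustrative sketch, not part of the patches): patch 5 adds the thp_underused_split_page counter to /proc/vmstat, and the selftest in patch 3 compares RssAnon in /proc/<pid>/status, so sampling those before and after memory pressure with shrink_underused enabled shows whether underused THPs are being split and their zero-filled subpages freed. A small reader for the relevant counters might look like this:

#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: print the deferred-split and underused-split THP
 * counters from /proc/vmstat. Run before and after a reclaim cycle to
 * see how many THPs the shrinker split because they were underused.
 */
int main(void)
{
	char line[256];
	FILE *fp = fopen("/proc/vmstat", "r");

	if (!fp) {
		perror("/proc/vmstat");
		return 1;
	}
	while (fgets(line, sizeof(line), fp)) {
		if (!strncmp(line, "thp_deferred_split_page", 23) ||
		    !strncmp(line, "thp_underused_split_page", 24))
			fputs(line, stdout);
	}
	fclose(fp);
	return 0;
}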