From nobody Fri Dec 19 07:18:46 2025
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai,
	Usama Arif
Subject: [PATCH v5 1/6] mm: free zapped tail pages when splitting isolated thp
Date: Fri, 30 Aug 2024 11:03:35 +0100
Message-ID: <20240830100438.3623486-2-usamaarif642@gmail.com>
In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com>
References: <20240830100438.3623486-1-usamaarif642@gmail.com>

From: Yu Zhao

If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao
Tested-by: Shuang Zhai
Acked-by: Johannes Weiner
Signed-off-by: Usama Arif
---
 mm/huge_memory.c | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 15418ffdd377..0c48806ccb9a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -3170,7 +3170,9 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int new_nr = 1 << new_order;
 	int order = folio_order(folio);
 	unsigned int nr = 1 << order;
+	struct folio_batch free_folios;
 
+	folio_batch_init(&free_folios);
 	/* complete memcg works before add pages to LRU */
 	split_page_memcg(head, order, new_order);
 
@@ -3254,6 +3256,27 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		if (subpage == page)
 			continue;
 		folio_unlock(new_folio);
+		/*
+		 * If a folio has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * folio was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && folio_ref_freeze(new_folio, 2)) {
+			VM_WARN_ON_ONCE_FOLIO(folio_test_lru(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_test_large(new_folio), new_folio);
+			VM_WARN_ON_ONCE_FOLIO(folio_mapped(new_folio), new_folio);
+
+			folio_clear_active(new_folio);
+			folio_clear_unevictable(new_folio);
+			list_del(&new_folio->lru);
+			if (!folio_batch_add(&free_folios, new_folio)) {
+				mem_cgroup_uncharge_folios(&free_folios);
+				free_unref_folios(&free_folios);
+			}
+			continue;
+		}
 
 		/*
 		 * Subpages may be freed if there wasn't any mapping
@@ -3264,6 +3287,11 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (free_folios.nr) {
+		mem_cgroup_uncharge_folios(&free_folios);
+		free_unref_folios(&free_folios);
+	}
 }
 
 /* Racy check whether the huge page can be split */
-- 
2.43.5

From nobody Fri Dec 19 07:18:46 2025
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai,
	Usama Arif
Subject: [PATCH v5 2/6] mm: remap unused subpages to shared zeropage when splitting isolated thp
Date: Fri, 30 Aug 2024 11:03:36 +0100
Message-ID: <20240830100438.3623486-3-usamaarif642@gmail.com>
In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com>
References: <20240830100438.3623486-1-usamaarif642@gmail.com>

From: Yu Zhao

Here, "unused" means containing only zeros and inaccessible to
userspace. When splitting an isolated thp under reclaim or migration,
the unused subpages can be mapped to the shared zeropage, hence saving
memory. This is particularly helpful when the internal fragmentation of
a thp is high, i.e. it has many untouched subpages.

This is also a prerequisite for the THP low-utilization shrinker which
will be introduced in later patches, where underutilized THPs are
split, and the zero-filled pages are freed, saving memory.
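For illustration only (not part of the patch): a minimal sketch of the
zero-detection step that decides whether a subpage can be remapped to
the shared zeropage. It reuses the kmap_local_page()/memchr_inv()
pattern from the mm/migrate.c hunk below; the helper name is made up
here, the real check lives in try_to_map_unused_to_zeropage().

	/* Illustrative sketch only: a subpage is "unused" if it is all zeroes. */
	static bool subpage_is_zero_filled(struct page *page)
	{
		void *addr = kmap_local_page(page);
		/* memchr_inv() returns NULL when the whole page is zero-filled */
		bool contains_data = memchr_inv(addr, 0, PAGE_SIZE);

		kunmap_local(addr);
		return !contains_data;
	}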
Signed-off-by: Yu Zhao Tested-by: Shuang Zhai Signed-off-by: Usama Arif --- include/linux/rmap.h | 7 ++++- mm/huge_memory.c | 8 ++--- mm/migrate.c | 72 ++++++++++++++++++++++++++++++++++++++------ mm/migrate_device.c | 4 +-- 4 files changed, 75 insertions(+), 16 deletions(-) diff --git a/include/linux/rmap.h b/include/linux/rmap.h index 91b5935e8485..d5e93e44322e 100644 --- a/include/linux/rmap.h +++ b/include/linux/rmap.h @@ -745,7 +745,12 @@ int folio_mkclean(struct folio *); int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t p= goff, struct vm_area_struct *vma); =20 -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed); +enum rmp_flags { + RMP_LOCKED =3D 1 << 0, + RMP_USE_SHARED_ZEROPAGE =3D 1 << 1, +}; + +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags= ); =20 /* * rmap_walk_control: To control rmap traversing for specific needs diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 0c48806ccb9a..af60684e7c70 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3020,7 +3020,7 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma= , unsigned long addr, return false; } =20 -static void remap_page(struct folio *folio, unsigned long nr) +static void remap_page(struct folio *folio, unsigned long nr, int flags) { int i =3D 0; =20 @@ -3028,7 +3028,7 @@ static void remap_page(struct folio *folio, unsigned = long nr) if (!folio_test_anon(folio)) return; for (;;) { - remove_migration_ptes(folio, folio, true); + remove_migration_ptes(folio, folio, RMP_LOCKED | flags); i +=3D folio_nr_pages(folio); if (i >=3D nr) break; @@ -3240,7 +3240,7 @@ static void __split_huge_page(struct page *page, stru= ct list_head *list, =20 if (nr_dropped) shmem_uncharge(folio->mapping->host, nr_dropped); - remap_page(folio, nr); + remap_page(folio, nr, PageAnon(head) ? RMP_USE_SHARED_ZEROPAGE : 0); =20 /* * set page to its compound_head when split to non order-0 pages, so @@ -3542,7 +3542,7 @@ int split_huge_page_to_list_to_order(struct page *pag= e, struct list_head *list, if (mapping) xas_unlock(&xas); local_irq_enable(); - remap_page(folio, folio_nr_pages(folio)); + remap_page(folio, folio_nr_pages(folio), 0); ret =3D -EAGAIN; } =20 diff --git a/mm/migrate.c b/mm/migrate.c index 6f9c62c746be..d039863e014b 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -204,13 +204,57 @@ bool isolate_folio_to_list(struct folio *folio, struc= t list_head *list) return true; } =20 +static bool try_to_map_unused_to_zeropage(struct page_vma_mapped_walk *pvm= w, + struct folio *folio, + unsigned long idx) +{ + struct page *page =3D folio_page(folio, idx); + bool contains_data; + pte_t newpte; + void *addr; + + VM_BUG_ON_PAGE(PageCompound(page), page); + VM_BUG_ON_PAGE(!PageAnon(page), page); + VM_BUG_ON_PAGE(!PageLocked(page), page); + VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page); + + if (folio_test_mlocked(folio) || (pvmw->vma->vm_flags & VM_LOCKED) || + mm_forbids_zeropage(pvmw->vma->vm_mm)) + return false; + + /* + * The pmd entry mapping the old thp was flushed and the pte mapping + * this subpage has been non present. If the subpage is only zero-filled + * then map it to the shared zeropage. 
+ */ + addr =3D kmap_local_page(page); + contains_data =3D memchr_inv(addr, 0, PAGE_SIZE); + kunmap_local(addr); + + if (contains_data) + return false; + + newpte =3D pte_mkspecial(pfn_pte(my_zero_pfn(pvmw->address), + pvmw->vma->vm_page_prot)); + set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte); + + dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio)); + return true; +} + +struct rmap_walk_arg { + struct folio *folio; + bool map_unused_to_zeropage; +}; + /* * Restore a potential migration pte to a working pte entry */ static bool remove_migration_pte(struct folio *folio, - struct vm_area_struct *vma, unsigned long addr, void *old) + struct vm_area_struct *vma, unsigned long addr, void *arg) { - DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION); + struct rmap_walk_arg *rmap_walk_arg =3D arg; + DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | = PVMW_MIGRATION); =20 while (page_vma_mapped_walk(&pvmw)) { rmap_t rmap_flags =3D RMAP_NONE; @@ -234,6 +278,9 @@ static bool remove_migration_pte(struct folio *folio, continue; } #endif + if (rmap_walk_arg->map_unused_to_zeropage && + try_to_map_unused_to_zeropage(&pvmw, folio, idx)) + continue; =20 folio_get(folio); pte =3D mk_pte(new, READ_ONCE(vma->vm_page_prot)); @@ -312,14 +359,21 @@ static bool remove_migration_pte(struct folio *folio, * Get rid of all migration entries and replace them by * references to the indicated page. */ -void remove_migration_ptes(struct folio *src, struct folio *dst, bool lock= ed) +void remove_migration_ptes(struct folio *src, struct folio *dst, int flags) { + struct rmap_walk_arg rmap_walk_arg =3D { + .folio =3D src, + .map_unused_to_zeropage =3D flags & RMP_USE_SHARED_ZEROPAGE, + }; + struct rmap_walk_control rwc =3D { .rmap_one =3D remove_migration_pte, - .arg =3D src, + .arg =3D &rmap_walk_arg, }; =20 - if (locked) + VM_BUG_ON_FOLIO((flags & RMP_USE_SHARED_ZEROPAGE) && (src !=3D dst), src); + + if (flags & RMP_LOCKED) rmap_walk_locked(dst, &rwc); else rmap_walk(dst, &rwc); @@ -934,7 +988,7 @@ static int writeout(struct address_space *mapping, stru= ct folio *folio) * At this point we know that the migration attempt cannot * be successful. */ - remove_migration_ptes(folio, folio, false); + remove_migration_ptes(folio, folio, 0); =20 rc =3D mapping->a_ops->writepage(&folio->page, &wbc); =20 @@ -1098,7 +1152,7 @@ static void migrate_folio_undo_src(struct folio *src, struct list_head *ret) { if (page_was_mapped) - remove_migration_ptes(src, src, false); + remove_migration_ptes(src, src, 0); /* Drop an anon_vma reference if we took one */ if (anon_vma) put_anon_vma(anon_vma); @@ -1336,7 +1390,7 @@ static int migrate_folio_move(free_folio_t put_new_fo= lio, unsigned long private, lru_add_drain(); =20 if (old_page_state & PAGE_WAS_MAPPED) - remove_migration_ptes(src, dst, false); + remove_migration_ptes(src, dst, 0); =20 out_unlock_both: folio_unlock(dst); @@ -1474,7 +1528,7 @@ static int unmap_and_move_huge_page(new_folio_t get_n= ew_folio, =20 if (page_was_mapped) remove_migration_ptes(src, - rc =3D=3D MIGRATEPAGE_SUCCESS ? dst : src, false); + rc =3D=3D MIGRATEPAGE_SUCCESS ? 
dst : src, 0);
 
 unlock_put_anon:
 	folio_unlock(dst);
 
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 8d687de88a03..9cf26592ac93 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -424,7 +424,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
 			continue;
 
 		folio = page_folio(page);
-		remove_migration_ptes(folio, folio, false);
+		remove_migration_ptes(folio, folio, 0);
 
 		src_pfns[i] = 0;
 		folio_unlock(folio);
@@ -840,7 +840,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
 			dst = src;
 		}
 
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, 0);
 		folio_unlock(src);
 
 		if (folio_is_zone_device(src))
-- 
2.43.5

From nobody Fri Dec 19 07:18:46 2025
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Alexander Zhu,
	Usama Arif
Subject: [PATCH v5 3/6] mm: selftest to verify zero-filled pages are mapped to zeropage
Date: Fri, 30 Aug 2024 11:03:37 +0100
Message-ID: <20240830100438.3623486-4-usamaarif642@gmail.com>
In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com>
References: <20240830100438.3623486-1-usamaarif642@gmail.com>

From: Alexander Zhu

When a THP is split, any subpage that is zero-filled will be mapped to
the shared zeropage, hence saving memory. Add a selftest to verify this
by allocating a zero-filled THP and comparing RssAnon before and after
the split.

Signed-off-by: Alexander Zhu
Acked-by: Rik van Riel
Signed-off-by: Usama Arif
---
 .../selftests/mm/split_huge_page_test.c | 71 +++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c    | 22 ++++++
 tools/testing/selftests/mm/vm_util.h    |  1 +
 3 files changed, 94 insertions(+)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e5e8dafc9d94..eb6d1b9fc362 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -84,6 +84,76 @@ static void write_debugfs(const char *fmt, ...)
write_file(SPLIT_DEBUGFS, input, ret + 1); } =20 +static char *allocate_zero_filled_hugepage(size_t len) +{ + char *result; + size_t i; + + result =3D memalign(pmd_pagesize, len); + if (!result) { + printf("Fail to allocate memory\n"); + exit(EXIT_FAILURE); + } + + madvise(result, len, MADV_HUGEPAGE); + + for (i =3D 0; i < len; i++) + result[i] =3D (char)0; + + return result; +} + +static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int= nr_hpages, size_t len) +{ + unsigned long rss_anon_before, rss_anon_after; + size_t i; + + if (!check_huge_anon(one_page, 4, pmd_pagesize)) { + printf("No THP is allocated\n"); + exit(EXIT_FAILURE); + } + + rss_anon_before =3D rss_anon(); + if (!rss_anon_before) { + printf("No RssAnon is allocated before split\n"); + exit(EXIT_FAILURE); + } + + /* split all THPs */ + write_debugfs(PID_FMT, getpid(), (uint64_t)one_page, + (uint64_t)one_page + len, 0); + + for (i =3D 0; i < len; i++) + if (one_page[i] !=3D (char)0) { + printf("%ld byte corrupted\n", i); + exit(EXIT_FAILURE); + } + + if (!check_huge_anon(one_page, 0, pmd_pagesize)) { + printf("Still AnonHugePages not split\n"); + exit(EXIT_FAILURE); + } + + rss_anon_after =3D rss_anon(); + if (rss_anon_after >=3D rss_anon_before) { + printf("Incorrect RssAnon value. Before: %ld After: %ld\n", + rss_anon_before, rss_anon_after); + exit(EXIT_FAILURE); + } +} + +void split_pmd_zero_pages(void) +{ + char *one_page; + int nr_hpages =3D 4; + size_t len =3D nr_hpages * pmd_pagesize; + + one_page =3D allocate_zero_filled_hugepage(len); + verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len); + printf("Split zero filled huge pages successful\n"); + free(one_page); +} + void split_pmd_thp(void) { char *one_page; @@ -431,6 +501,7 @@ int main(int argc, char **argv) =20 fd_size =3D 2 * pmd_pagesize; =20 + split_pmd_zero_pages(); split_pmd_thp(); split_pte_mapped_thp(); split_file_backed_thp(); diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests= /mm/vm_util.c index 5a62530da3b5..d8d0cf04bb57 100644 --- a/tools/testing/selftests/mm/vm_util.c +++ b/tools/testing/selftests/mm/vm_util.c @@ -12,6 +12,7 @@ =20 #define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_= size" #define SMAP_FILE_PATH "/proc/self/smaps" +#define STATUS_FILE_PATH "/proc/self/status" #define MAX_LINE_LENGTH 500 =20 unsigned int __page_size; @@ -171,6 +172,27 @@ uint64_t read_pmd_pagesize(void) return strtoul(buf, NULL, 10); } =20 +unsigned long rss_anon(void) +{ + unsigned long rss_anon =3D 0; + FILE *fp; + char buffer[MAX_LINE_LENGTH]; + + fp =3D fopen(STATUS_FILE_PATH, "r"); + if (!fp) + ksft_exit_fail_msg("%s: Failed to open file %s\n", __func__, STATUS_FILE= _PATH); + + if (!check_for_pattern(fp, "RssAnon:", buffer, sizeof(buffer))) + goto err_out; + + if (sscanf(buffer, "RssAnon:%10lu kB", &rss_anon) !=3D 1) + ksft_exit_fail_msg("Reading status error\n"); + +err_out: + fclose(fp); + return rss_anon; +} + bool __check_huge(void *addr, char *pattern, int nr_hpages, uint64_t hpage_size) { diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests= /mm/vm_util.h index 9007c420d52c..2eaed8209925 100644 --- a/tools/testing/selftests/mm/vm_util.h +++ b/tools/testing/selftests/mm/vm_util.h @@ -39,6 +39,7 @@ unsigned long pagemap_get_pfn(int fd, char *start); void clear_softdirty(void); bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t le= n); uint64_t read_pmd_pagesize(void); +unsigned long rss_anon(void); bool check_huge_anon(void *addr, int 
nr_hpages, uint64_t hpage_size);
 bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size);
 bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
-- 
2.43.5

From nobody Fri Dec 19 07:18:46 2025
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 4/6] mm: Introduce a pageflag for partially mapped folios
Date: Fri, 30 Aug 2024 11:03:38 +0100
Message-ID: <20240830100438.3623486-5-usamaarif642@gmail.com>
In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com>
References: <20240830100438.3623486-1-usamaarif642@gmail.com>

Currently folio->_deferred_list is used to keep track of
partially_mapped folios that are going to be split under memory
pressure. In the next patch, all THPs that are faulted in and collapsed
by khugepaged are also going to be tracked using _deferred_list.

This patch introduces a pageflag to be able to distinguish between
partially mapped folios and others in the deferred_list at split time
in deferred_split_scan. It is needed because __folio_remove_rmap
decrements _mapcount, _large_mapcount and _entire_mapcount, hence it
won't be possible to distinguish between partially mapped folios and
others in deferred_split_scan.

Even though it introduces an extra flag to track whether the folio is
partially mapped, there is no functional change intended with this
patch and the flag is not useful in this patch itself; it will become
useful in the next patch when _deferred_list has non-partially-mapped
folios.
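For illustration only (not part of the patch): a sketch of the locking
rule for the new flag. The helper below is hypothetical; the real users
are deferred_split_folio() and deferred_split_scan() in the hunks that
follow, and the non-atomic set/clear is only safe because
split_queue_lock is held.

	/* Illustrative sketch only: flip the flag under split_queue_lock. */
	static void mark_folio_partially_mapped(struct deferred_split *ds_queue,
						struct folio *folio)
	{
		unsigned long flags;

		spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
		if (!folio_test_partially_mapped(folio))
			__folio_set_partially_mapped(folio);
		spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
	}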
Signed-off-by: Usama Arif --- include/linux/huge_mm.h | 4 ++-- include/linux/page-flags.h | 13 +++++++++++- mm/huge_memory.c | 41 ++++++++++++++++++++++++++++---------- mm/memcontrol.c | 3 ++- mm/migrate.c | 3 ++- mm/page_alloc.c | 5 +++-- mm/rmap.c | 5 +++-- mm/vmscan.c | 3 ++- 8 files changed, 56 insertions(+), 21 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 4da102b74a8c..0b0539f4ee1a 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -333,7 +333,7 @@ static inline int split_huge_page(struct page *page) { return split_huge_page_to_list_to_order(page, NULL, 0); } -void deferred_split_folio(struct folio *folio); +void deferred_split_folio(struct folio *folio, bool partially_mapped); =20 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd, unsigned long address, bool freeze, struct folio *folio); @@ -502,7 +502,7 @@ static inline int split_huge_page(struct page *page) { return 0; } -static inline void deferred_split_folio(struct folio *folio) {} +static inline void deferred_split_folio(struct folio *folio, bool partiall= y_mapped) {} #define split_huge_pmd(__vma, __pmd, __address) \ do { } while (0) =20 diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 2175ebceb41c..1b3a76710487 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -186,6 +186,7 @@ enum pageflags { /* At least one page in this folio has the hwpoison flag set */ PG_has_hwpoisoned =3D PG_active, PG_large_rmappable =3D PG_workingset, /* anon or file-backed */ + PG_partially_mapped =3D PG_reclaim, /* was identified to be partially map= ped */ }; =20 #define PAGEFLAGS_MASK ((1UL << NR_PAGEFLAGS) - 1) @@ -859,8 +860,18 @@ static inline void ClearPageCompound(struct page *page) ClearPageHead(page); } FOLIO_FLAG(large_rmappable, FOLIO_SECOND_PAGE) +FOLIO_TEST_FLAG(partially_mapped, FOLIO_SECOND_PAGE) +/* + * PG_partially_mapped is protected by deferred_split split_queue_lock, + * so its safe to use non-atomic set/clear. 
+ */ +__FOLIO_SET_FLAG(partially_mapped, FOLIO_SECOND_PAGE) +__FOLIO_CLEAR_FLAG(partially_mapped, FOLIO_SECOND_PAGE) #else FOLIO_FLAG_FALSE(large_rmappable) +FOLIO_TEST_FLAG_FALSE(partially_mapped) +__FOLIO_SET_FLAG_NOOP(partially_mapped) +__FOLIO_CLEAR_FLAG_NOOP(partially_mapped) #endif =20 #define PG_head_mask ((1UL << PG_head)) @@ -1171,7 +1182,7 @@ static __always_inline void __ClearPageAnonExclusive(= struct page *page) */ #define PAGE_FLAGS_SECOND \ (0xffUL /* order */ | 1UL << PG_has_hwpoisoned | \ - 1UL << PG_large_rmappable) + 1UL << PG_large_rmappable | 1UL << PG_partially_mapped) =20 #define PAGE_FLAGS_PRIVATE \ (1UL << PG_private | 1UL << PG_private_2) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index af60684e7c70..166f8810f3c6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -3503,7 +3503,11 @@ int split_huge_page_to_list_to_order(struct page *pa= ge, struct list_head *list, if (folio_order(folio) > 1 && !list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; - mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -= 1); + if (folio_test_partially_mapped(folio)) { + __folio_clear_partially_mapped(folio); + mod_mthp_stat(folio_order(folio), + MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); + } /* * Reinitialize page_deferred_list after removing the * page from the split_queue, otherwise a subsequent @@ -3570,13 +3574,18 @@ void __folio_undo_large_rmappable(struct folio *fol= io) spin_lock_irqsave(&ds_queue->split_queue_lock, flags); if (!list_empty(&folio->_deferred_list)) { ds_queue->split_queue_len--; - mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1= ); + if (folio_test_partially_mapped(folio)) { + __folio_clear_partially_mapped(folio); + mod_mthp_stat(folio_order(folio), + MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); + } list_del_init(&folio->_deferred_list); } spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); } =20 -void deferred_split_folio(struct folio *folio) +/* partially_mapped=3Dfalse won't clear PG_partially_mapped folio flag */ +void deferred_split_folio(struct folio *folio, bool partially_mapped) { struct deferred_split *ds_queue =3D get_deferred_split_queue(folio); #ifdef CONFIG_MEMCG @@ -3604,15 +3613,21 @@ void deferred_split_folio(struct folio *folio) if (folio_test_swapcache(folio)) return; =20 - if (!list_empty(&folio->_deferred_list)) - return; - spin_lock_irqsave(&ds_queue->split_queue_lock, flags); + if (partially_mapped) { + if (!folio_test_partially_mapped(folio)) { + __folio_set_partially_mapped(folio); + if (folio_test_pmd_mappable(folio)) + count_vm_event(THP_DEFERRED_SPLIT_PAGE); + count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); + mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1= ); + + } + } else { + /* partially mapped folios cannot become non-partially mapped */ + VM_WARN_ON_FOLIO(folio_test_partially_mapped(folio), folio); + } if (list_empty(&folio->_deferred_list)) { - if (folio_test_pmd_mappable(folio)) - count_vm_event(THP_DEFERRED_SPLIT_PAGE); - count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED); - mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, 1); list_add_tail(&folio->_deferred_list, &ds_queue->split_queue); ds_queue->split_queue_len++; #ifdef CONFIG_MEMCG @@ -3660,7 +3675,11 @@ static unsigned long deferred_split_scan(struct shri= nker *shrink, list_move(&folio->_deferred_list, &list); } else { /* We lost race with folio_put() */ - mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -= 
1); + if (folio_test_partially_mapped(folio)) { + __folio_clear_partially_mapped(folio); + mod_mthp_stat(folio_order(folio), + MTHP_STAT_NR_ANON_PARTIALLY_MAPPED, -1); + } list_del_init(&folio->_deferred_list); ds_queue->split_queue_len--; } diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 087a8cb1a6d8..e66da58a365d 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -4629,7 +4629,8 @@ static void uncharge_folio(struct folio *folio, struc= t uncharge_gather *ug) VM_BUG_ON_FOLIO(folio_test_lru(folio), folio); VM_BUG_ON_FOLIO(folio_order(folio) > 1 && !folio_test_hugetlb(folio) && - !list_empty(&folio->_deferred_list), folio); + !list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio), folio); =20 /* * Nobody should be changing or seriously looking at diff --git a/mm/migrate.c b/mm/migrate.c index d039863e014b..35cc9d35064b 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -1766,7 +1766,8 @@ static int migrate_pages_batch(struct list_head *from, * use _deferred_list. */ if (nr_pages > 2 && - !list_empty(&folio->_deferred_list)) { + !list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio)) { if (!try_split_folio(folio, split_folios, mode)) { nr_failed++; stats->nr_thp_failed +=3D is_thp; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index c2ffccf9d213..a82c221b7c2e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -962,8 +962,9 @@ static int free_tail_page_prepare(struct page *head_pag= e, struct page *page) break; case 2: /* the second tail page: deferred_list overlaps ->mapping */ - if (unlikely(!list_empty(&folio->_deferred_list))) { - bad_page(page, "on deferred list"); + if (unlikely(!list_empty(&folio->_deferred_list) && + folio_test_partially_mapped(folio))) { + bad_page(page, "partially mapped folio on deferred list"); goto out; } break; diff --git a/mm/rmap.c b/mm/rmap.c index 78529cf0fd66..a8797d1b3d49 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1579,8 +1579,9 @@ static __always_inline void __folio_remove_rmap(struc= t folio *folio, * Check partially_mapped first to ensure it is a large folio. */ if (partially_mapped && folio_test_anon(folio) && - list_empty(&folio->_deferred_list)) - deferred_split_folio(folio); + !folio_test_partially_mapped(folio)) + deferred_split_folio(folio, true); + __folio_mod_stat(folio, -nr, -nr_pmdmapped); =20 /* diff --git a/mm/vmscan.c b/mm/vmscan.c index f27792e77a0f..4ca612f7e473 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1238,7 +1238,8 @@ static unsigned int shrink_folio_list(struct list_hea= d *folio_list, * Split partially mapped folios right away. * We can free the unmapped pages without IO. 
	 */
-		if (data_race(!list_empty(&folio->_deferred_list)) &&
+		if (data_race(!list_empty(&folio->_deferred_list) &&
+		    folio_test_partially_mapped(folio)) &&
 		    split_folio_to_list(folio, folio_list))
 			goto activate_locked;
 	}
-- 
2.43.5

From nobody Fri Dec 19 07:18:46 2025
From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev,
	roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com,
	npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com,
	rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com,
	ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org,
	linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH v5 5/6] mm: split underused THPs
Date: Fri, 30 Aug 2024 11:03:39 +0100
Message-ID: <20240830100438.3623486-6-usamaarif642@gmail.com>
In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com>
References: <20240830100438.3623486-1-usamaarif642@gmail.com>

This is an attempt to mitigate the issue of running out of memory when
THP is always enabled. During runtime, whenever a THP is being faulted
in (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), the THP is added to _deferred_list. Whenever
memory reclaim happens in Linux, the kernel runs the deferred_split
shrinker which goes through the _deferred_list.

If the folio was partially mapped, the shrinker attempts to split it.
If the folio is not partially mapped, the shrinker checks whether the
THP was underused, i.e. how many of the base 4K pages of the entire THP
were zero-filled. If this number goes above a certain threshold
(decided by /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none),
the shrinker will attempt to split that THP. Then at remap time, the
pages that were zero-filled are mapped to the shared zeropage, hence
saving memory.

Suggested-by: Rik van Riel
Co-authored-by: Johannes Weiner
Signed-off-by: Usama Arif
---
 Documentation/admin-guide/mm/transhuge.rst |  6 +++
 include/linux/khugepaged.h                 |  1 +
 include/linux/vm_event_item.h              |  1 +
 mm/huge_memory.c                           | 60 +++++++++++++++++++++-
 mm/khugepaged.c                            |  3 +-
 mm/vmstat.c                                |  1 +
 6 files changed, 69 insertions(+), 3 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 56a086900651..aca0cff852b8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -471,6 +471,12 @@ thp_deferred_split_page
 	splitting it would free up some memory. Pages on split queue are
 	going to be split under memory pressure.
 
+thp_underused_split_page
+	is incremented when a huge page on the split queue was split
+	because it was underused.
A THP is underused if the number of + zero pages in the THP is above a certain threshold + (/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none). + thp_split_pmd is incremented every time a PMD split into table of PTEs. This can happen, for instance, when application calls mprotect() or diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h index f68865e19b0b..30baae91b225 100644 --- a/include/linux/khugepaged.h +++ b/include/linux/khugepaged.h @@ -4,6 +4,7 @@ =20 #include /* MMF_VM_HUGEPAGE */ =20 +extern unsigned int khugepaged_max_ptes_none __read_mostly; #ifdef CONFIG_TRANSPARENT_HUGEPAGE extern struct attribute_group khugepaged_attr_group; =20 diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h index aae5c7c5cfb4..aed952d04132 100644 --- a/include/linux/vm_event_item.h +++ b/include/linux/vm_event_item.h @@ -105,6 +105,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, THP_SPLIT_PAGE, THP_SPLIT_PAGE_FAILED, THP_DEFERRED_SPLIT_PAGE, + THP_UNDERUSED_SPLIT_PAGE, THP_SPLIT_PMD, THP_SCAN_EXCEED_NONE_PTE, THP_SCAN_EXCEED_SWAP_PTE, diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 166f8810f3c6..a97aeffc55d6 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1187,6 +1187,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct= vm_fault *vmf, update_mmu_cache_pmd(vma, vmf->address, vmf->pmd); add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR); mm_inc_nr_ptes(vma->vm_mm); + deferred_split_folio(folio, false); spin_unlock(vmf->ptl); count_vm_event(THP_FAULT_ALLOC); count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC); @@ -3652,6 +3653,39 @@ static unsigned long deferred_split_count(struct shr= inker *shrink, return READ_ONCE(ds_queue->split_queue_len); } =20 +static bool thp_underused(struct folio *folio) +{ + int num_zero_pages =3D 0, num_filled_pages =3D 0; + void *kaddr; + int i; + + if (khugepaged_max_ptes_none =3D=3D HPAGE_PMD_NR - 1) + return false; + + for (i =3D 0; i < folio_nr_pages(folio); i++) { + kaddr =3D kmap_local_folio(folio, i * PAGE_SIZE); + if (!memchr_inv(kaddr, 0, PAGE_SIZE)) { + num_zero_pages++; + if (num_zero_pages > khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return true; + } + } else { + /* + * Another path for early exit once the number + * of non-zero filled pages exceeds threshold. + */ + num_filled_pages++; + if (num_filled_pages >=3D HPAGE_PMD_NR - khugepaged_max_ptes_none) { + kunmap_local(kaddr); + return false; + } + } + kunmap_local(kaddr); + } + return false; +} + static unsigned long deferred_split_scan(struct shrinker *shrink, struct shrink_control *sc) { @@ -3689,13 +3723,35 @@ static unsigned long deferred_split_scan(struct shr= inker *shrink, spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags); =20 list_for_each_entry_safe(folio, next, &list, _deferred_list) { + bool did_split =3D false; + bool underused =3D false; + + if (!folio_test_partially_mapped(folio)) { + underused =3D thp_underused(folio); + if (!underused) + goto next; + } if (!folio_trylock(folio)) goto next; - /* split_huge_page() removes page from list on success */ - if (!split_folio(folio)) + if (!split_folio(folio)) { + did_split =3D true; + if (underused) + count_vm_event(THP_UNDERUSED_SPLIT_PAGE); split++; + } folio_unlock(folio); next: + /* + * split_folio() removes folio from list on success. + * Only add back to the queue if folio is partially mapped. 
+		 * If thp_underused returns false, or if split_folio fails
+		 * in the case it was underused, then consider it used and
+		 * don't add it back to split_queue.
+		 */
+		if (!did_split && !folio_test_partially_mapped(folio)) {
+			list_del_init(&folio->_deferred_list);
+			ds_queue->split_queue_len--;
+		}
 		folio_put(folio);
 	}
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 5bfb5594c604..bf1734e8e665 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,7 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
  *
  * Note that these are only respected if collapse was initiated by khugepaged.
  */
-static unsigned int khugepaged_max_ptes_none __read_mostly;
+unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
@@ -1237,6 +1237,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
+	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
 	folio = NULL;
diff --git a/mm/vmstat.c b/mm/vmstat.c
index f41984dc856f..bb081ae4d0ae 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1385,6 +1385,7 @@ const char * const vmstat_text[] = {
 	"thp_split_page",
 	"thp_split_page_failed",
 	"thp_deferred_split_page",
+	"thp_underused_split_page",
 	"thp_split_pmd",
 	"thp_scan_exceed_none_pte",
 	"thp_scan_exceed_swap_pte",
-- 
2.43.5

From nobody Fri Dec 19 07:18:46 2025
:message-id:date:subject:cc:to:from:from:to:cc:subject:date :message-id:reply-to; bh=y9HM33kRGSHCEaJuDM+XKG3QBl4KNlRa8+XSdjB42Js=; b=HqQVoxM1iGnK8tB1zhxSB4cpQUocIPw2Q8c8qOZxCRsw2ZcZSBYVXe8RwxvKtpKCWp 7I/VOGU5UFs2DJfyIVF8EmYuYYiDFLp0dnErSYi3cwXipyGGadf2AoXKi8JaaGiAOVkL rxOzKZvETbU4xWHOus27H6og33c/umyKo5WrliMLJ8DHlntEuV8tlLER4PAVNsdaf+aX bCY1JNaQRxIgN7r5xUaM/mSffnOrzxkKyPpHn6ihaS+CQ3BzWlk97GazS92E9iFSNGXF V4JIfCwVgPt3nS6ZBjAF9OJvsELHsPARkf5f582i5g6tMoTWemGNbxEbkYhWZB1AKAyM kNzA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1725012291; x=1725617091; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=y9HM33kRGSHCEaJuDM+XKG3QBl4KNlRa8+XSdjB42Js=; b=tUpDltv41EHl0Cje/m+Mfkghfma649wltoBIIL5EnDwQA/xVuV1eSgavJpvUE+rPrF ZXoxDkskWgPcPrNxT+OeHq7N+/PEX44m2QAGPMIOw6LQkCqQIs7MHy6AbrQelvObY5Oy fH7vQwu2Wukky8gfz1M3Hml2e0iOqoQMjTV4WDj6m4UsJnHSWUQAQrfQiRwZV3IygTN0 eQV0zGjJC6AxSrnL6OFq+zS8hzC8q5J8sPeU5m9hLrZCh4EhPcy0rqEP7/goylzSsWLK DVtMo0DOuF7FU5Nxyk0ricLuGhehCsv/tCAdTb7mHYgTqac9enpRi17SMzf+qTtQfJWH QhhQ== X-Forwarded-Encrypted: i=1; AJvYcCVhC7RmxpclTDOHZMhqd66oQ+szGv3vRlnB3ZC4JVKHdrNsyR4af3uPL4x7nFJ/cgz7RPIiWD0A8o7TkX3E@vger.kernel.org, AJvYcCX1hT1l234bvXppNc3W+FpfAp3+j2V8MgzPndh7JmSKjjGuf9N6ru1OpqZsiMsy3ZQbTB80tnvtt34=@vger.kernel.org X-Gm-Message-State: AOJu0Yzy9CrqMVwYLAZ55iiZOurRgsI8DWHXjp2DS8Ac7kTi3L6mvCOs 0op6vzFVAE69HvxFN6nGdry1/m/XAcefmTv4dRaztQkOjyuYSPnU X-Google-Smtp-Source: AGHT+IFnFxgDYehxBOjERbSY3E6IQRb/Eo5/olia1KlgsAQYx6fH7Gfn02t5h6FRR0dUftMkedAUyg== X-Received: by 2002:a05:6830:2682:b0:704:45b7:8ffc with SMTP id 46e09a7af769-70f5c49e963mr5658027a34.32.1725012290746; Fri, 30 Aug 2024 03:04:50 -0700 (PDT) Received: from localhost (fwdproxy-ash-013.fbsv.net. [2a03:2880:20ff:d::face:b00c]) by smtp.gmail.com with ESMTPSA id 6a1803df08f44-6c340c0020esm13219306d6.50.2024.08.30.03.04.49 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 30 Aug 2024 03:04:49 -0700 (PDT) From: Usama Arif To: akpm@linux-foundation.org, linux-mm@kvack.org Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, npache@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, ryncsn@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif Subject: [PATCH v5 6/6] mm: add sysfs entry to disable splitting underused THPs Date: Fri, 30 Aug 2024 11:03:40 +0100 Message-ID: <20240830100438.3623486-7-usamaarif642@gmail.com> X-Mailer: git-send-email 2.43.5 In-Reply-To: <20240830100438.3623486-1-usamaarif642@gmail.com> References: <20240830100438.3623486-1-usamaarif642@gmail.com> Precedence: bulk X-Mailing-List: linux-kernel@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset="utf-8" If disabled, THPs faulted in or collapsed will not be added to _deferred_list, and therefore won't be considered for splitting under memory pressure if underused. 
Signed-off-by: Usama Arif
---
 Documentation/admin-guide/mm/transhuge.rst | 10 +++++++++
 mm/huge_memory.c                           | 26 ++++++++++++++++++++++
 2 files changed, 36 insertions(+)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index aca0cff852b8..cfdd16a52e39 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -202,6 +202,16 @@ PMD-mappable transparent hugepage::
 
 	cat /sys/kernel/mm/transparent_hugepage/hpage_pmd_size
 
+All THPs at fault and collapse time will be added to _deferred_list,
+and will therefore be split under memory pressure if they are considered
+"underused". A THP is underused if the number of zero-filled pages in
+the THP is above max_ptes_none (see below). It is possible to disable
+this behaviour by writing 0 to shrink_underused, and enable it by writing
+1 to it::
+
+	echo 0 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+	echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused
+
 khugepaged will be automatically started when PMD-sized THP is enabled
 (either of the per-size anon control or the top-level control are set to
 "always" or "madvise"), and it'll be automatically shutdown when
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a97aeffc55d6..0993dfe9ae94 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -74,6 +74,7 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
 					  struct shrink_control *sc);
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc);
+static bool split_underused_thp = true;
 
 static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
@@ -440,6 +441,27 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
 	__ATTR_RO(hpage_pmd_size);
 
+static ssize_t split_underused_thp_show(struct kobject *kobj,
+			    struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", split_underused_thp);
+}
+
+static ssize_t split_underused_thp_store(struct kobject *kobj,
+			     struct kobj_attribute *attr,
+			     const char *buf, size_t count)
+{
+	int err = kstrtobool(buf, &split_underused_thp);
+
+	if (err < 0)
+		return err;
+
+	return count;
+}
+
+static struct kobj_attribute split_underused_thp_attr = __ATTR(
+	shrink_underused, 0644, split_underused_thp_show, split_underused_thp_store);
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -448,6 +470,7 @@ static struct attribute *hugepage_attr[] = {
 #ifdef CONFIG_SHMEM
 	&shmem_enabled_attr.attr,
 #endif
+	&split_underused_thp_attr.attr,
 	NULL,
 };
 
@@ -3601,6 +3624,9 @@ void deferred_split_folio(struct folio *folio, bool partially_mapped)
 	if (folio_order(folio) <= 1)
 		return;
 
+	if (!partially_mapped && !split_underused_thp)
+		return;
+
 	/*
 	 * The try_to_unmap() in page reclaim path might reach here too,
 	 * this may cause a race condition to corrupt deferred split queue.
-- 
2.43.5
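
A minimal usage sketch (not part of the series above), assuming the
shrink_underused knob, the khugepaged max_ptes_none threshold and the
thp_underused_split_page counter are exposed at the paths shown in the
documentation and vmstat hunks; the page counts in the comments assume
4 KiB pages and a 2 MiB PMD-sized THP::

	# Underused-THP splitting is on by default; 0 disables it, 1 enables it.
	cat /sys/kernel/mm/transparent_hugepage/shrink_underused
	echo 1 > /sys/kernel/mm/transparent_hugepage/shrink_underused

	# Threshold used by thp_underused(): a THP is considered underused
	# once it contains more zero-filled pages than this. At the default
	# of HPAGE_PMD_NR - 1 (511 with 4 KiB pages) the check is skipped
	# entirely, so a lower value is needed for the shrinker to act.
	cat /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

	# Number of THPs the deferred-split shrinker has split because they
	# were underused.
	grep thp_underused_split_page /proc/vmstat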