From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH 1/6] Revert "memcg: remove mem_cgroup_uncharge_list()"
Date: Tue, 30 Jul 2024 13:45:58 +0100
Message-ID: <20240730125346.1580150-2-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

mem_cgroup_uncharge_list will be needed in a later patch for an
optimization to free zapped tail pages when splitting isolated thp.

Signed-off-by: Usama Arif
---
 include/linux/memcontrol.h | 12 ++++++++++++
 mm/memcontrol.c            | 19 +++++++++++++++++++
 2 files changed, 31 insertions(+)

diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 07eadf7ecbba..cbaf0ea1b217 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -713,6 +713,14 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
 	__mem_cgroup_uncharge(folio);
 }
 
+void __mem_cgroup_uncharge_list(struct list_head *page_list);
+static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
+{
+	if (mem_cgroup_disabled())
+		return;
+	__mem_cgroup_uncharge_list(page_list);
+}
+
 void __mem_cgroup_uncharge_folios(struct folio_batch *folios);
 static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
@@ -1203,6 +1211,10 @@ static inline void mem_cgroup_uncharge(struct folio *folio)
 {
 }
 
+static inline void mem_cgroup_uncharge_list(struct list_head *page_list)
+{
+}
+
 static inline void mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
 }
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9b3ef3a70833..f568b9594c2b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4717,6 +4717,25 @@ void __mem_cgroup_uncharge(struct folio *folio)
 	uncharge_batch(&ug);
 }
 
+/**
+ * __mem_cgroup_uncharge_list - uncharge a list of page
+ * @page_list: list of pages to uncharge
+ *
+ * Uncharge a list of pages previously charged with
+ * __mem_cgroup_charge().
+ */
+void __mem_cgroup_uncharge_list(struct list_head *page_list)
+{
+	struct uncharge_gather ug;
+	struct folio *folio;
+
+	uncharge_gather_clear(&ug);
+	list_for_each_entry(folio, page_list, lru)
+		uncharge_folio(folio, &ug);
+	if (ug.memcg)
+		uncharge_batch(&ug);
+}
+
 void __mem_cgroup_uncharge_folios(struct folio_batch *folios)
 {
 	struct uncharge_gather ug;
-- 
2.43.0
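
To make the intent of the two reverts concrete, here is a minimal sketch of the call pattern they restore (it assumes both this revert and the following free_unref_page_list() revert are applied; pages_to_free is a caller-populated list, as in patch 3 of this series):

	/* Sketch only: batch-free a list of fully zapped pages. */
	LIST_HEAD(pages_to_free);

	/* ... caller moves folios that are no longer needed onto pages_to_free ... */

	if (!list_empty(&pages_to_free)) {
		/* drop the memcg charges for the whole list in one batch */
		mem_cgroup_uncharge_list(&pages_to_free);
		/* then hand the pages back to the page allocator */
		free_unref_page_list(&pages_to_free);
	}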

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH 2/6] Revert "mm: remove free_unref_page_list()"
Date: Tue, 30 Jul 2024 13:45:59 +0100
Message-ID: <20240730125346.1580150-3-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

free_unref_page_list will be needed in a later patch for an
optimization to free zapped tail pages when splitting isolated thp.

Signed-off-by: Usama Arif
---
 mm/internal.h   |  1 +
 mm/page_alloc.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/mm/internal.h b/mm/internal.h
index 7a3bcc6d95e7..259afe44dc88 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -680,6 +680,7 @@ extern int user_min_free_kbytes;
 
 void free_unref_page(struct page *page, unsigned int order);
 void free_unref_folios(struct folio_batch *fbatch);
+void free_unref_page_list(struct list_head *list);
 
 extern void zone_pcp_reset(struct zone *zone);
 extern void zone_pcp_disable(struct zone *zone);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aae00ba3b3bd..38832e6b1e6c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2774,6 +2774,24 @@ void free_unref_folios(struct folio_batch *folios)
 	folio_batch_reinit(folios);
 }
 
+void free_unref_page_list(struct list_head *list)
+{
+	struct folio_batch fbatch;
+
+	folio_batch_init(&fbatch);
+	while (!list_empty(list)) {
+		struct folio *folio = list_first_entry(list, struct folio, lru);
+
+		list_del(&folio->lru);
+		if (folio_batch_add(&fbatch, folio) > 0)
+			continue;
+		free_unref_folios(&fbatch);
+	}
+
+	if (fbatch.nr)
+		free_unref_folios(&fbatch);
+}
+
 /*
  * split_page takes a non-compound higher-order page, and splits it into
  * n (1<<order) sub-pages: page[0..n]
-- 
2.43.0

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai, Usama Arif
Subject: [PATCH 3/6] mm: free zapped tail pages when splitting isolated thp
Date: Tue, 30 Jul 2024 13:46:00 +0100
Message-ID: <20240730125346.1580150-4-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

From: Yu Zhao

If a tail page has only two references left, one inherited from the
isolation of its head and the other from lru_add_page_tail() which we
are about to drop, it means this tail page was concurrently zapped.
Then we can safely free it and save page reclaim or migration the
trouble of trying it.

Signed-off-by: Yu Zhao
Tested-by: Shuang Zhai
Signed-off-by: Usama Arif
---
 mm/huge_memory.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0167dc27e365..76a3b6a2b796 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2923,6 +2923,8 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	unsigned int new_nr = 1 << new_order;
 	int order = folio_order(folio);
 	unsigned int nr = 1 << order;
+	LIST_HEAD(pages_to_free);
+	int nr_pages_to_free = 0;
 
 	/* complete memcg works before add pages to LRU */
 	split_page_memcg(head, order, new_order);
@@ -3007,6 +3009,24 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		if (subpage == page)
 			continue;
 		folio_unlock(new_folio);
+		/*
+		 * If a tail page has only two references left, one inherited
+		 * from the isolation of its head and the other from
+		 * lru_add_page_tail() which we are about to drop, it means this
+		 * tail page was concurrently zapped. Then we can safely free it
+		 * and save page reclaim or migration the trouble of trying it.
+		 */
+		if (list && page_ref_freeze(subpage, 2)) {
+			VM_BUG_ON_PAGE(PageLRU(subpage), subpage);
+			VM_BUG_ON_PAGE(PageCompound(subpage), subpage);
+			VM_BUG_ON_PAGE(page_mapped(subpage), subpage);
+
+			ClearPageActive(subpage);
+			ClearPageUnevictable(subpage);
+			list_move(&subpage->lru, &pages_to_free);
+			nr_pages_to_free++;
+			continue;
+		}
 
 		/*
 		 * Subpages may be freed if there wasn't any mapping
@@ -3017,6 +3037,12 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 		 */
 		free_page_and_swap_cache(subpage);
 	}
+
+	if (!nr_pages_to_free)
+		return;
+
+	mem_cgroup_uncharge_list(&pages_to_free);
+	free_unref_page_list(&pages_to_free);
 }
 
 /* Racy check whether the huge page can be split */
-- 
2.43.0
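
For context, a userspace view of the situation this patch optimizes (an illustrative sketch only, not part of the patch; it assumes 2 MB PMD-sized THPs and that the partially mapped THP is later isolated and split, e.g. under memory pressure):

	#include <stdlib.h>
	#include <string.h>
	#include <sys/mman.h>
	#include <unistd.h>

	int main(void)
	{
		size_t thp_size = 2 * 1024 * 1024;	/* assumed PMD-sized THP */
		char *buf = aligned_alloc(thp_size, thp_size);

		madvise(buf, thp_size, MADV_HUGEPAGE);
		memset(buf, 1, thp_size);		/* fault the region in as a THP */

		/* zap the second half: those subpages become the "zapped tail pages" */
		madvise(buf + thp_size / 2, thp_size / 2, MADV_DONTNEED);

		/*
		 * When the THP is later isolated and split, the zapped subpages
		 * can be freed immediately instead of being put back on the LRU.
		 */
		pause();
		return 0;
	}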

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Shuang Zhai, Usama Arif
Subject: [PATCH 4/6] mm: don't remap unused subpages when splitting isolated thp
Date: Tue, 30 Jul 2024 13:46:01 +0100
Message-ID: <20240730125346.1580150-5-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

From: Yu Zhao

Here being unused means containing only zeros and inaccessible to
userspace. When splitting an isolated thp under reclaim or migration,
there is no need to remap its unused subpages because they can be
faulted in anew.
Not remapping them avoids writeback or copying during reclaim or
migration. This is particularly helpful when the internal
fragmentation of a thp is high, i.e. it has many untouched subpages.

This is also a prerequisite for the THP low utilization shrinker
introduced in later patches, where underutilized THPs are split and
the zero-filled split pages are freed, saving memory.

Signed-off-by: Yu Zhao
Tested-by: Shuang Zhai
Signed-off-by: Usama Arif
---
 include/linux/rmap.h |  2 +-
 mm/huge_memory.c     |  8 ++---
 mm/migrate.c         | 73 +++++++++++++++++++++++++++++++++++++++-----
 mm/migrate_device.c  |  4 +--
 4 files changed, 72 insertions(+), 15 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 0978c64f49d8..805ab09057ed 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -745,7 +745,7 @@ int folio_mkclean(struct folio *);
 int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 		      struct vm_area_struct *vma);
 
-void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked);
+void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked, bool unmap_unused);
 
 /*
  * rmap_walk_control: To control rmap traversing for specific needs
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 76a3b6a2b796..892467d85f3a 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2775,7 +2775,7 @@ bool unmap_huge_pmd_locked(struct vm_area_struct *vma, unsigned long addr,
 	return false;
 }
 
-static void remap_page(struct folio *folio, unsigned long nr)
+static void remap_page(struct folio *folio, unsigned long nr, bool unmap_unused)
 {
 	int i = 0;
 
@@ -2783,7 +2783,7 @@ static void remap_page(struct folio *folio, unsigned long nr)
 	if (!folio_test_anon(folio))
 		return;
 	for (;;) {
-		remove_migration_ptes(folio, folio, true);
+		remove_migration_ptes(folio, folio, true, unmap_unused);
 		i += folio_nr_pages(folio);
 		if (i >= nr)
 			break;
@@ -2993,7 +2993,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 
 	if (nr_dropped)
 		shmem_uncharge(folio->mapping->host, nr_dropped);
-	remap_page(folio, nr);
+	remap_page(folio, nr, PageAnon(head));
 
 	/*
 	 * set page to its compound_head when split to non order-0 pages, so
@@ -3286,7 +3286,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		if (mapping)
 			xas_unlock(&xas);
 		local_irq_enable();
-		remap_page(folio, folio_nr_pages(folio));
+		remap_page(folio, folio_nr_pages(folio), false);
 		ret = -EAGAIN;
 	}
 
diff --git a/mm/migrate.c b/mm/migrate.c
index b273bac0d5ae..f4f06bdded70 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -177,13 +177,61 @@ void putback_movable_pages(struct list_head *l)
 	}
 }
 
+static bool try_to_unmap_unused(struct page_vma_mapped_walk *pvmw,
+				struct folio *folio,
+				unsigned long idx)
+{
+	struct page *page = folio_page(folio, idx);
+	void *addr;
+	bool dirty;
+	pte_t newpte;
+
+	VM_BUG_ON_PAGE(PageCompound(page), page);
+	VM_BUG_ON_PAGE(!PageAnon(page), page);
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page);
+
+	if (PageMlocked(page) || (pvmw->vma->vm_flags & VM_LOCKED))
+		return false;
+
+	/*
+	 * The pmd entry mapping the old thp was flushed and the pte mapping
+	 * this subpage has been non present. Therefore, this subpage is
+	 * inaccessible. We don't need to remap it if it contains only zeros.
+	 */
+	addr = kmap_local_page(page);
+	dirty = memchr_inv(addr, 0, PAGE_SIZE);
+	kunmap_local(addr);
+
+	if (dirty)
+		return false;
+
+	pte_clear_not_present_full(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, false);
+
+	if (userfaultfd_armed(pvmw->vma)) {
+		newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
+					       pvmw->vma->vm_page_prot));
+		ptep_clear_flush(pvmw->vma, pvmw->address, pvmw->pte);
+		set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
+	}
+
+	dec_mm_counter(pvmw->vma->vm_mm, mm_counter(folio));
+	return true;
+}
+
+struct rmap_walk_arg {
+	struct folio *folio;
+	bool unmap_unused;
+};
+
 /*
  * Restore a potential migration pte to a working pte entry
  */
 static bool remove_migration_pte(struct folio *folio,
-		struct vm_area_struct *vma, unsigned long addr, void *old)
+		struct vm_area_struct *vma, unsigned long addr, void *arg)
 {
-	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
+	struct rmap_walk_arg *rmap_walk_arg = arg;
+	DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		rmap_t rmap_flags = RMAP_NONE;
@@ -207,6 +255,8 @@ static bool remove_migration_pte(struct folio *folio,
 			continue;
 		}
 #endif
+		if (rmap_walk_arg->unmap_unused && try_to_unmap_unused(&pvmw, folio, idx))
+			continue;
 
 		folio_get(folio);
 		pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
@@ -285,13 +335,20 @@ static bool remove_migration_pte(struct folio *folio,
  * Get rid of all migration entries and replace them by
  * references to the indicated page.
  */
-void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
+void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked, bool unmap_unused)
 {
+	struct rmap_walk_arg rmap_walk_arg = {
+		.folio = src,
+		.unmap_unused = unmap_unused,
+	};
+
 	struct rmap_walk_control rwc = {
 		.rmap_one = remove_migration_pte,
-		.arg = src,
+		.arg = &rmap_walk_arg,
 	};
 
+	VM_BUG_ON_FOLIO(unmap_unused && src != dst, src);
+
 	if (locked)
 		rmap_walk_locked(dst, &rwc);
 	else
@@ -904,7 +961,7 @@ static int writeout(struct address_space *mapping, struct folio *folio)
 	 * At this point we know that the migration attempt cannot
 	 * be successful.
 	 */
-	remove_migration_ptes(folio, folio, false);
+	remove_migration_ptes(folio, folio, false, false);
 
 	rc = mapping->a_ops->writepage(&folio->page, &wbc);
 
@@ -1068,7 +1125,7 @@ static void migrate_folio_undo_src(struct folio *src,
 				   struct list_head *ret)
 {
 	if (page_was_mapped)
-		remove_migration_ptes(src, src, false);
+		remove_migration_ptes(src, src, false, false);
 	/* Drop an anon_vma reference if we took one */
 	if (anon_vma)
 		put_anon_vma(anon_vma);
@@ -1306,7 +1363,7 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
 		lru_add_drain();
 
 	if (old_page_state & PAGE_WAS_MAPPED)
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, false, false);
 
 out_unlock_both:
 	folio_unlock(dst);
@@ -1444,7 +1501,7 @@ static int unmap_and_move_huge_page(new_folio_t get_new_folio,
 
 	if (page_was_mapped)
 		remove_migration_ptes(src,
-			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+			rc == MIGRATEPAGE_SUCCESS ? dst : src, false, false);
 
 unlock_put_anon:
 	folio_unlock(dst);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6d66dc1c6ffa..a1630d8e0d95 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -424,7 +424,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
 			continue;
 
 		folio = page_folio(page);
-		remove_migration_ptes(folio, folio, false);
+		remove_migration_ptes(folio, folio, false, false);
 
 		src_pfns[i] = 0;
 		folio_unlock(folio);
@@ -837,7 +837,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
 
 		src = page_folio(page);
 		dst = page_folio(newpage);
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, false, false);
 		folio_unlock(src);
 
 		if (is_zone_device_page(page))
-- 
2.43.0
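
The zero-detection idiom that try_to_unmap_unused() relies on, shown in isolation for clarity (a sketch; the patch open-codes this rather than adding such a helper):

	/* Return true if the given page contains only zero bytes. */
	static bool page_is_zero_filled(struct page *page)
	{
		void *addr = kmap_local_page(page);
		/* memchr_inv() returns NULL when every byte matches the given value */
		bool zero_filled = !memchr_inv(addr, 0, PAGE_SIZE);

		kunmap_local(addr);
		return zero_filled;
	}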

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Alexander Zhu, Usama Arif
Subject: [PATCH 5/6] mm: add selftests to split_huge_page() to verify unmap/zap of zero pages
Date: Tue, 30 Jul 2024 13:46:02 +0100
Message-ID: <20240730125346.1580150-6-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

From: Alexander Zhu

Self tests to verify the RssAnon value to make sure zero pages are
not remapped except in the case of userfaultfd. Also includes a self
test for the userfaultfd use case.

Signed-off-by: Alexander Zhu
Signed-off-by: Usama Arif
Acked-by: Rik van Riel
---
 .../selftests/mm/split_huge_page_test.c | 113 ++++++++++++++++++
 tools/testing/selftests/mm/vm_util.c    |  22 ++++
 tools/testing/selftests/mm/vm_util.h    |   1 +
 3 files changed, 136 insertions(+)

diff --git a/tools/testing/selftests/mm/split_huge_page_test.c b/tools/testing/selftests/mm/split_huge_page_test.c
index e5e8dafc9d94..da271ad6ff11 100644
--- a/tools/testing/selftests/mm/split_huge_page_test.c
+++ b/tools/testing/selftests/mm/split_huge_page_test.c
@@ -17,6 +17,8 @@
 #include
 #include
 #include
+#include
+#include
 #include "vm_util.h"
 #include "../kselftest.h"
 
@@ -84,6 +86,115 @@ static void write_debugfs(const char *fmt, ...)
 	write_file(SPLIT_DEBUGFS, input, ret + 1);
 }
 
+static char *allocate_zero_filled_hugepage(size_t len)
+{
+	char *result;
+	size_t i;
+
+	result = memalign(pmd_pagesize, len);
+	if (!result) {
+		printf("Fail to allocate memory\n");
+		exit(EXIT_FAILURE);
+	}
+
+	madvise(result, len, MADV_HUGEPAGE);
+
+	for (i = 0; i < len; i++)
+		result[i] = (char)0;
+
+	return result;
+}
+
+static void verify_rss_anon_split_huge_page_all_zeroes(char *one_page, int nr_hpages, size_t len)
+{
+	uint64_t rss_anon_before, rss_anon_after;
+	size_t i;
+
+	if (!check_huge_anon(one_page, 4, pmd_pagesize)) {
+		printf("No THP is allocated\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rss_anon_before = rss_anon();
+	if (!rss_anon_before) {
+		printf("No RssAnon is allocated before split\n");
+		exit(EXIT_FAILURE);
+	}
+
+	/* split all THPs */
+	write_debugfs(PID_FMT, getpid(), (uint64_t)one_page,
+		      (uint64_t)one_page + len, 0);
+
+	for (i = 0; i < len; i++)
+		if (one_page[i] != (char)0) {
+			printf("%ld byte corrupted\n", i);
+			exit(EXIT_FAILURE);
+		}
+
+	if (!check_huge_anon(one_page, 0, pmd_pagesize)) {
+		printf("Still AnonHugePages not split\n");
+		exit(EXIT_FAILURE);
+	}
+
+	rss_anon_after = rss_anon();
+	if (rss_anon_after >= rss_anon_before) {
+		printf("Incorrect RssAnon value. Before: %ld After: %ld\n",
+		       rss_anon_before, rss_anon_after);
+		exit(EXIT_FAILURE);
+	}
+}
+
+void split_pmd_zero_pages(void)
+{
+	char *one_page;
+	int nr_hpages = 4;
+	size_t len = nr_hpages * pmd_pagesize;
+
+	one_page = allocate_zero_filled_hugepage(len);
+	verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len);
+	printf("Split zero filled huge pages successful\n");
+	free(one_page);
+}
+
+void split_pmd_zero_pages_uffd(void)
+{
+	char *one_page;
+	int nr_hpages = 4;
+	size_t len = nr_hpages * pmd_pagesize;
+	long uffd; /* userfaultfd file descriptor */
+	struct uffdio_api uffdio_api;
+	struct uffdio_register uffdio_register;
+
+	/* Create and enable userfaultfd object.
+	 */
+	uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
+	if (uffd == -1) {
+		perror("userfaultfd");
+		exit(1);
+	}
+
+	uffdio_api.api = UFFD_API;
+	uffdio_api.features = 0;
+	if (ioctl(uffd, UFFDIO_API, &uffdio_api) == -1) {
+		perror("ioctl-UFFDIO_API");
+		exit(1);
+	}
+
+	one_page = allocate_zero_filled_hugepage(len);
+
+	uffdio_register.range.start = (unsigned long)one_page;
+	uffdio_register.range.len = len;
+	uffdio_register.mode = UFFDIO_REGISTER_MODE_WP;
+	if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register) == -1) {
+		perror("ioctl-UFFDIO_REGISTER");
+		exit(1);
+	}
+
+	verify_rss_anon_split_huge_page_all_zeroes(one_page, nr_hpages, len);
+	printf("Split zero filled huge pages with uffd successful\n");
+	free(one_page);
+}
+
 void split_pmd_thp(void)
 {
 	char *one_page;
@@ -431,6 +542,8 @@ int main(int argc, char **argv)
 
 	fd_size = 2 * pmd_pagesize;
 
+	split_pmd_zero_pages();
+	split_pmd_zero_pages_uffd();
 	split_pmd_thp();
 	split_pte_mapped_thp();
 	split_file_backed_thp();
diff --git a/tools/testing/selftests/mm/vm_util.c b/tools/testing/selftests/mm/vm_util.c
index 5a62530da3b5..7b7e763ba8e3 100644
--- a/tools/testing/selftests/mm/vm_util.c
+++ b/tools/testing/selftests/mm/vm_util.c
@@ -12,6 +12,7 @@
 
 #define PMD_SIZE_FILE_PATH "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size"
 #define SMAP_FILE_PATH "/proc/self/smaps"
+#define STATUS_FILE_PATH "/proc/self/status"
 #define MAX_LINE_LENGTH 500
 
 unsigned int __page_size;
@@ -171,6 +172,27 @@ uint64_t read_pmd_pagesize(void)
 	return strtoul(buf, NULL, 10);
 }
 
+uint64_t rss_anon(void)
+{
+	uint64_t rss_anon = 0;
+	FILE *fp;
+	char buffer[MAX_LINE_LENGTH];
+
+	fp = fopen(STATUS_FILE_PATH, "r");
+	if (!fp)
+		ksft_exit_fail_msg("%s: Failed to open file %s\n", __func__, STATUS_FILE_PATH);
+
+	if (!check_for_pattern(fp, "RssAnon:", buffer, sizeof(buffer)))
+		goto err_out;
+
+	if (sscanf(buffer, "RssAnon:%10ld kB", &rss_anon) != 1)
+		ksft_exit_fail_msg("Reading status error\n");
+
+err_out:
+	fclose(fp);
+	return rss_anon;
+}
+
 bool __check_huge(void *addr, char *pattern, int nr_hpages,
 		  uint64_t hpage_size)
 {
diff --git a/tools/testing/selftests/mm/vm_util.h b/tools/testing/selftests/mm/vm_util.h
index 9007c420d52c..71b75429f4a5 100644
--- a/tools/testing/selftests/mm/vm_util.h
+++ b/tools/testing/selftests/mm/vm_util.h
@@ -39,6 +39,7 @@ unsigned long pagemap_get_pfn(int fd, char *start);
 void clear_softdirty(void);
 bool check_for_pattern(FILE *fp, const char *pattern, char *buf, size_t len);
 uint64_t read_pmd_pagesize(void);
+uint64_t rss_anon(void);
 bool check_huge_anon(void *addr, int nr_hpages, uint64_t hpage_size);
 bool check_huge_file(void *addr, int nr_hpages, uint64_t hpage_size);
 bool check_huge_shmem(void *addr, int nr_hpages, uint64_t hpage_size);
-- 
2.43.0
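
A minimal sketch of how the new rss_anon() helper is meant to be used by a test (it mirrors the verify_rss_anon_split_huge_page_all_zeroes() flow above; ksft_exit_fail_msg() comes from the kselftest harness):

	uint64_t rss_before, rss_after;

	rss_before = rss_anon();
	/* ... trigger the THP split via the split_huge_pages debugfs file ... */
	rss_after = rss_anon();

	/* zero-filled subpages must not be remapped, so anonymous RSS should drop */
	if (rss_after >= rss_before)
		ksft_exit_fail_msg("RssAnon did not drop: before %lu kB, after %lu kB\n",
				   (unsigned long)rss_before, (unsigned long)rss_after);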

From: Usama Arif
To: akpm@linux-foundation.org, linux-mm@kvack.org
Cc: hannes@cmpxchg.org, riel@surriel.com, shakeel.butt@linux.dev, roman.gushchin@linux.dev, yuzhao@google.com, david@redhat.com, baohua@kernel.org, ryan.roberts@arm.com, rppt@kernel.org, willy@infradead.org, cerasuolodomenico@gmail.com, corbet@lwn.net, linux-kernel@vger.kernel.org, linux-doc@vger.kernel.org, kernel-team@meta.com, Usama Arif
Subject: [PATCH 6/6] mm: split underutilized THPs
Date: Tue, 30 Jul 2024 13:46:03 +0100
Message-ID: <20240730125346.1580150-7-usamaarif642@gmail.com>
In-Reply-To: <20240730125346.1580150-1-usamaarif642@gmail.com>
References: <20240730125346.1580150-1-usamaarif642@gmail.com>

This is an attempt to mitigate the issue of running out of memory when
THP is always enabled. During runtime, whenever a THP is being faulted
in (__do_huge_pmd_anonymous_page) or collapsed by khugepaged
(collapse_huge_page), the THP is added to _deferred_list. Whenever
memory reclaim happens, the kernel runs the deferred_split shrinker,
which goes through the _deferred_list.

If the folio was partially mapped, the shrinker attempts to split it.
A new boolean is added to distinguish partially mapped folios from
others in the deferred_list at split time in deferred_split_scan. It
is needed because __folio_remove_rmap decrements the folio mapcount
elements, so without the boolean it would not be possible to tell
partially mapped folios apart in deferred_split_scan.

If folio->_partially_mapped is not set, the shrinker checks whether
the THP was underutilized, i.e. how many of the base 4K pages of the
entire THP were zero-filled. If this number goes above a certain
threshold (decided by
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none), the
shrinker will attempt to split that THP. Then at remap time, the pages
that were zero-filled are not remapped, hence saving memory.

Suggested-by: Rik van Riel
Co-authored-by: Johannes Weiner
Signed-off-by: Usama Arif
---
 Documentation/admin-guide/mm/transhuge.rst |   6 ++
 include/linux/huge_mm.h                    |   4 +-
 include/linux/khugepaged.h                 |   1 +
 include/linux/mm_types.h                   |   2 +
 include/linux/vm_event_item.h              |   1 +
 mm/huge_memory.c                           | 118 ++++++++++++++++++---
 mm/hugetlb.c                               |   1 +
 mm/internal.h                              |   4 +-
 mm/khugepaged.c                            |   3 +-
 mm/memcontrol.c                            |   3 +-
 mm/migrate.c                               |   3 +-
 mm/rmap.c                                  |   2 +-
 mm/vmscan.c                                |   3 +-
 mm/vmstat.c                                |   1 +
 14 files changed, 130 insertions(+), 22 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 058485daf186..24eec1c03ad8 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -447,6 +447,12 @@ thp_deferred_split_page
 	splitting it would free up some memory. Pages on split queue are
 	going to be split under memory pressure.
 
+thp_underutilized_split_page
+	is incremented when a huge page on the split queue was split
+	because it was underutilized.
+	A THP is underutilized if the number of zero pages in the THP
+	is above a certain threshold
+	(/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none).
+
 thp_split_pmd
 	is incremented every time a PMD split into table of PTEs.
 	This can happen, for instance, when application calls mprotect() or
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index e25d9ebfdf89..00af84aa88ea 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -321,7 +321,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return split_huge_page_to_list_to_order(page, NULL, 0);
 }
-void deferred_split_folio(struct folio *folio);
+void deferred_split_folio(struct folio *folio, bool partially_mapped);
 
 void __split_huge_pmd(struct vm_area_struct *vma, pmd_t *pmd,
 		unsigned long address, bool freeze, struct folio *folio);
@@ -484,7 +484,7 @@ static inline int split_huge_page(struct page *page)
 {
 	return 0;
 }
-static inline void deferred_split_folio(struct folio *folio) {}
+static inline void deferred_split_folio(struct folio *folio, bool partially_mapped) {}
 #define split_huge_pmd(__vma, __pmd, __address)	\
 	do { } while (0)
 
diff --git a/include/linux/khugepaged.h b/include/linux/khugepaged.h
index f68865e19b0b..30baae91b225 100644
--- a/include/linux/khugepaged.h
+++ b/include/linux/khugepaged.h
@@ -4,6 +4,7 @@
 
 #include	/* MMF_VM_HUGEPAGE */
 
+extern unsigned int khugepaged_max_ptes_none __read_mostly;
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 extern struct attribute_group khugepaged_attr_group;
 
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 485424979254..443026cf763e 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -311,6 +311,7 @@ typedef struct {
 * @_hugetlb_cgroup_rsvd: Do not use directly, use accessor in hugetlb_cgroup.h.
 * @_hugetlb_hwpoison: Do not use directly, call raw_hwp_list_head().
 * @_deferred_list: Folios to be split under memory pressure.
+ * @_partially_mapped: Folio was partially mapped.
 * @_unused_slab_obj_exts: Placeholder to match obj_exts in struct slab.
 *
 * A folio is a physically, virtually and logically contiguous set
@@ -393,6 +394,7 @@ struct folio {
 			unsigned long _head_2a;
 	/* public: */
 			struct list_head _deferred_list;
+			bool _partially_mapped;
 	/* private: the union with struct page is transitional */
 		};
 		struct page __page_2;
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index aae5c7c5cfb4..bf1470a7a737 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -105,6 +105,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 		THP_SPLIT_PAGE,
 		THP_SPLIT_PAGE_FAILED,
 		THP_DEFERRED_SPLIT_PAGE,
+		THP_UNDERUTILIZED_SPLIT_PAGE,
 		THP_SPLIT_PMD,
 		THP_SCAN_EXCEED_NONE_PTE,
 		THP_SCAN_EXCEED_SWAP_PTE,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 892467d85f3a..3305e6d0b90e 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -73,6 +73,7 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
 					  struct shrink_control *sc);
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc);
+static bool split_underutilized_thp = true;
 
 static atomic_t huge_zero_refcount;
 struct folio *huge_zero_folio __read_mostly;
@@ -438,6 +439,27 @@ static ssize_t hpage_pmd_size_show(struct kobject *kobj,
 static struct kobj_attribute hpage_pmd_size_attr =
 	__ATTR_RO(hpage_pmd_size);
 
+static ssize_t split_underutilized_thp_show(struct kobject *kobj,
+					    struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", split_underutilized_thp);
+}
+
+static ssize_t split_underutilized_thp_store(struct kobject *kobj,
+					     struct kobj_attribute *attr,
+					     const char *buf, size_t count)
+{
+	int err = kstrtobool(buf, &split_underutilized_thp);
+
+	if (err < 0)
+		return err;
+
+	return count;
+}
+
+static struct kobj_attribute split_underutilized_thp_attr = __ATTR(
+	thp_low_util_shrinker, 0644, split_underutilized_thp_show, split_underutilized_thp_store);
+
 static struct attribute *hugepage_attr[] = {
 	&enabled_attr.attr,
 	&defrag_attr.attr,
@@ -446,6 +468,7 @@ static struct attribute *hugepage_attr[] = {
 #ifdef CONFIG_SHMEM
 	&shmem_enabled_attr.attr,
 #endif
+	&split_underutilized_thp_attr.attr,
 	NULL,
 };
 
@@ -1002,6 +1025,7 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
 		add_mm_counter(vma->vm_mm, MM_ANONPAGES, HPAGE_PMD_NR);
 		mm_inc_nr_ptes(vma->vm_mm);
+		deferred_split_folio(folio, false);
 		spin_unlock(vmf->ptl);
 		count_vm_event(THP_FAULT_ALLOC);
 		count_mthp_stat(HPAGE_PMD_ORDER, MTHP_STAT_ANON_FAULT_ALLOC);
@@ -3259,6 +3283,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
 		 * page_deferred_list.
 		 */
 		list_del_init(&folio->_deferred_list);
+		folio->_partially_mapped = false;
 	}
 	spin_unlock(&ds_queue->split_queue_lock);
 	if (mapping) {
@@ -3315,11 +3340,12 @@ void __folio_undo_large_rmappable(struct folio *folio)
 	if (!list_empty(&folio->_deferred_list)) {
 		ds_queue->split_queue_len--;
 		list_del_init(&folio->_deferred_list);
+		folio->_partially_mapped = false;
 	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 }
 
-void deferred_split_folio(struct folio *folio)
+void deferred_split_folio(struct folio *folio, bool partially_mapped)
 {
 	struct deferred_split *ds_queue = get_deferred_split_queue(folio);
 #ifdef CONFIG_MEMCG
@@ -3334,6 +3360,9 @@ void deferred_split_folio(struct folio *folio)
 	if (folio_order(folio) <= 1)
 		return;
 
+	if (!partially_mapped && !split_underutilized_thp)
+		return;
+
 	/*
 	 * The try_to_unmap() in page reclaim path might reach here too,
 	 * this may cause a race condition to corrupt deferred split queue.
@@ -3347,14 +3376,14 @@ void deferred_split_folio(struct folio *folio)
 	if (folio_test_swapcache(folio))
 		return;
 
-	if (!list_empty(&folio->_deferred_list))
-		return;
-
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
+	folio->_partially_mapped = partially_mapped;
 	if (list_empty(&folio->_deferred_list)) {
-		if (folio_test_pmd_mappable(folio))
-			count_vm_event(THP_DEFERRED_SPLIT_PAGE);
-		count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
+		if (partially_mapped) {
+			if (folio_test_pmd_mappable(folio))
+				count_vm_event(THP_DEFERRED_SPLIT_PAGE);
+			count_mthp_stat(folio_order(folio), MTHP_STAT_SPLIT_DEFERRED);
+		}
 		list_add_tail(&folio->_deferred_list, &ds_queue->split_queue);
 		ds_queue->split_queue_len++;
 #ifdef CONFIG_MEMCG
@@ -3379,6 +3408,39 @@ static unsigned long deferred_split_count(struct shrinker *shrink,
 	return READ_ONCE(ds_queue->split_queue_len);
 }
 
+static bool thp_underutilized(struct folio *folio)
+{
+	int num_zero_pages = 0, num_filled_pages = 0;
+	void *kaddr;
+	int i;
+
+	if (khugepaged_max_ptes_none == HPAGE_PMD_NR - 1)
+		return false;
+
+	for (i = 0; i < folio_nr_pages(folio); i++) {
+		kaddr = kmap_local_folio(folio, i * PAGE_SIZE);
+		if (memchr_inv(kaddr, 0, PAGE_SIZE) == NULL) {
+			num_zero_pages++;
+			if (num_zero_pages > khugepaged_max_ptes_none) {
+				kunmap_local(kaddr);
+				return true;
+			}
+		} else {
+			/*
+			 * Another path for early exit once the number
+			 * of non-zero filled pages exceeds threshold.
+			 */
+			num_filled_pages++;
+			if (num_filled_pages >= HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+				kunmap_local(kaddr);
+				return false;
+			}
+		}
+		kunmap_local(kaddr);
+	}
+	return false;
+}
+
 static unsigned long deferred_split_scan(struct shrinker *shrink,
 					 struct shrink_control *sc)
 {
@@ -3403,6 +3465,7 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 		} else {
 			/* We lost race with folio_put() */
 			list_del_init(&folio->_deferred_list);
+			folio->_partially_mapped = false;
 			ds_queue->split_queue_len--;
 		}
 		if (!--sc->nr_to_scan)
@@ -3411,18 +3474,45 @@ static unsigned long deferred_split_scan(struct shrinker *shrink,
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
 	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+		bool did_split = false;
+		bool underutilized = false;
+
+		if (folio->_partially_mapped)
+			goto split;
+		underutilized = thp_underutilized(folio);
+		if (underutilized)
+			goto split;
+		continue;
+split:
 		if (!folio_trylock(folio))
-			goto next;
-		/* split_huge_page() removes page from list on success */
-		if (!split_folio(folio))
-			split++;
+			continue;
+		did_split = !split_folio(folio);
 		folio_unlock(folio);
-next:
-		folio_put(folio);
+		if (did_split) {
+			/* Splitting removed folio from the list, drop reference here */
+			folio_put(folio);
+			if (underutilized)
+				count_vm_event(THP_UNDERUTILIZED_SPLIT_PAGE);
+			split++;
+		}
 	}
 
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
-	list_splice_tail(&list, &ds_queue->split_queue);
+	/*
+	 * Only add back to the queue if folio->_partially_mapped is set.
+	 * If thp_underutilized returns false, or if split_folio fails in
+	 * the case it was underutilized, then consider it used and don't
+	 * add it back to split_queue.
+	 */
+	list_for_each_entry_safe(folio, next, &list, _deferred_list) {
+		if (folio->_partially_mapped)
+			list_move(&folio->_deferred_list, &ds_queue->split_queue);
+		else {
+			list_del_init(&folio->_deferred_list);
+			ds_queue->split_queue_len--;
+		}
+		folio_put(folio);
+	}
 	spin_unlock_irqrestore(&ds_queue->split_queue_lock, flags);
 
 	/*
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5a32157ca309..df2da47d0637 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1758,6 +1758,7 @@ static void __update_and_free_hugetlb_folio(struct hstate *h,
 		free_gigantic_folio(folio, huge_page_order(h));
 	} else {
 		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio->_partially_mapped = false;
 		folio_put(folio);
 	}
 }
diff --git a/mm/internal.h b/mm/internal.h
index 259afe44dc88..8fc072cc3023 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -657,8 +657,10 @@ static inline void prep_compound_head(struct page *page, unsigned int order)
 	atomic_set(&folio->_entire_mapcount, -1);
 	atomic_set(&folio->_nr_pages_mapped, 0);
 	atomic_set(&folio->_pincount, 0);
-	if (order > 1)
+	if (order > 1) {
 		INIT_LIST_HEAD(&folio->_deferred_list);
+		folio->_partially_mapped = false;
+	}
 }
 
 static inline void prep_compound_tail(struct page *head, int tail_idx)
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index f3b3db104615..5a434fdbc1ef 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,7 +85,7 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
 *
 * Note that these are only respected if collapse was initiated by khugepaged.
 */
-static unsigned int khugepaged_max_ptes_none __read_mostly;
+unsigned int khugepaged_max_ptes_none __read_mostly;
 static unsigned int khugepaged_max_ptes_swap __read_mostly;
 static unsigned int khugepaged_max_ptes_shared __read_mostly;
 
@@ -1235,6 +1235,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_trans_huge_deposit(mm, pmd, pgtable);
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
+	deferred_split_folio(folio, false);
 	spin_unlock(pmd_ptl);
 
 	folio = NULL;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index f568b9594c2b..2ee61d619d86 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -4651,7 +4651,8 @@ static void uncharge_folio(struct folio *folio, struct uncharge_gather *ug)
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 	VM_BUG_ON_FOLIO(folio_order(folio) > 1 &&
 			!folio_test_hugetlb(folio) &&
-			!list_empty(&folio->_deferred_list), folio);
+			!list_empty(&folio->_deferred_list) &&
+			folio->_partially_mapped, folio);
 
 	/*
 	 * Nobody should be changing or seriously looking at
diff --git a/mm/migrate.c b/mm/migrate.c
index f4f06bdded70..2731ac20ff33 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1734,7 +1734,8 @@ static int migrate_pages_batch(struct list_head *from,
 			 * use _deferred_list.
 			 */
 			if (nr_pages > 2 &&
-			    !list_empty(&folio->_deferred_list)) {
+			    !list_empty(&folio->_deferred_list) &&
+			    folio->_partially_mapped) {
 				if (try_split_folio(folio, split_folios) == 0) {
 					nr_failed++;
 					stats->nr_thp_failed += is_thp;
diff --git a/mm/rmap.c b/mm/rmap.c
index 2630bde38640..1b5418121965 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1582,7 +1582,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
 	 */
 	if (folio_test_anon(folio) && partially_mapped &&
 	    list_empty(&folio->_deferred_list))
-		deferred_split_folio(folio);
+		deferred_split_folio(folio, true);
 	}
 	__folio_mod_stat(folio, -nr, -nr_pmdmapped);
 
diff --git a/mm/vmscan.c b/mm/vmscan.c
index c89d0551655e..1bee9b1262f6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1233,7 +1233,8 @@ static unsigned int shrink_folio_list(struct list_head *folio_list,
 				 * Split partially mapped folios right away.
 				 * We can free the unmapped pages without IO.
 				 */
-				if (data_race(!list_empty(&folio->_deferred_list)) &&
+				if (data_race(!list_empty(&folio->_deferred_list) &&
+				    folio->_partially_mapped) &&
 				    split_folio_to_list(folio, folio_list))
 					goto activate_locked;
 			}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index 5082431dad28..525fad4a1d6d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1367,6 +1367,7 @@ const char * const vmstat_text[] = {
 	"thp_split_page",
 	"thp_split_page_failed",
 	"thp_deferred_split_page",
+	"thp_underutilized_split_page",
 	"thp_split_pmd",
 	"thp_scan_exceed_none_pte",
 	"thp_scan_exceed_swap_pte",
-- 
2.43.0
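
To summarize the tunables this patch wires up (restated from the diff above, not additional behaviour): the shrinker can be toggled through the new thp_low_util_shrinker sysfs attribute, and the split threshold reuses khugepaged's max_ptes_none. For a PMD-sized THP of HPAGE_PMD_NR subpages, the decision reduces to roughly:

	/*
	 * Sketch of the underutilization check: a THP is split once more than
	 * max_ptes_none of its subpages are entirely zero-filled.  With 4K base
	 * pages, HPAGE_PMD_NR is 512, so the default max_ptes_none = 511 turns
	 * the check off, while e.g. max_ptes_none = 409 splits THPs that are
	 * more than ~80% zero-filled.
	 */
	underutilized = num_zero_pages > khugepaged_max_ptes_none;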