From nobody Wed Dec 17 12:13:33 2025
From: Baolin Wang
To: akpm@linux-foundation.org, hughd@google.com
Cc: willy@infradead.org, david@redhat.com, wangkefeng.wang@huawei.com,
 21cnbao@gmail.com, ryan.roberts@arm.com, ioworker0@gmail.com,
 da.gomez@samsung.com, baolin.wang@linux.alibaba.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Subject: [PATCH v2 1/5] mm: factor out the order calculation into a new helper
Date: Tue, 12 Nov 2024 15:45:48 +0800
Message-Id: <582997bd09b17a292124ea47dabc2ea5642daade.1731397290.git.baolin.wang@linux.alibaba.com>
X-Mailer: git-send-email 2.39.3

Factor out the order calculation into a new helper, which can be reused by
shmem in the following patch.
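For readers following along, here is a minimal user-space sketch of what the
factored-out helper computes. It is an illustration only, not kernel code:
PAGE_SHIFT is assumed to be 12 (4 KiB pages), and ilog2_sketch() is a
hypothetical stand-in for the kernel's ilog2().

```c
#include <stddef.h>

/* Assumption: 4 KiB base pages. The kernel takes PAGE_SHIFT from the arch. */
#define PAGE_SHIFT 12

/* Hypothetical stand-in for the kernel's ilog2(): floor(log2(n)) for n > 0. */
static unsigned int ilog2_sketch(size_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/*
 * Same logic as the helper in the patch: round the size down to a power
 * of two and express it as a page order, with order 0 for anything that
 * fits in a single page.
 */
static unsigned int filemap_get_order(size_t size)
{
	unsigned int shift = ilog2_sketch(size);

	if (shift <= PAGE_SHIFT)
		return 0;

	return shift - PAGE_SHIFT;
}

/*
 * With PAGE_SHIFT == 12:
 *   filemap_get_order(4096)    == 0   (one page)
 *   filemap_get_order(8191)    == 0   (rounds down to 4 KiB)
 *   filemap_get_order(8192)    == 1
 *   filemap_get_order(65536)   == 4
 *   filemap_get_order(2097152) == 9   (2 MiB)
 */
```

Because the size is rounded down, a non-power-of-two size never yields an
order larger than the size itself can hold.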
Suggested-by: Matthew Wilcox
Signed-off-by: Baolin Wang
Reviewed-by: Barry Song
Reviewed-by: David Hildenbrand
Reviewed-by: Daniel Gomez
---
 include/linux/pagemap.h | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index bcf0865a38ae..d796c8a33647 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -727,6 +727,16 @@ typedef unsigned int __bitwise fgf_t;
 
 #define FGP_WRITEBEGIN (FGP_LOCK | FGP_WRITE | FGP_CREAT | FGP_STABLE)
 
+static inline unsigned int filemap_get_order(size_t size)
+{
+	unsigned int shift = ilog2(size);
+
+	if (shift <= PAGE_SHIFT)
+		return 0;
+
+	return shift - PAGE_SHIFT;
+}
+
 /**
  * fgf_set_order - Encode a length in the fgf_t flags.
  * @size: The suggested size of the folio to create.
@@ -740,11 +750,11 @@ typedef unsigned int __bitwise fgf_t;
  */
 static inline fgf_t fgf_set_order(size_t size)
 {
-	unsigned int shift = ilog2(size);
+	unsigned int order = filemap_get_order(size);
 
-	if (shift <= PAGE_SHIFT)
+	if (!order)
 		return 0;
-	return (__force fgf_t)((shift - PAGE_SHIFT) << 26);
+	return (__force fgf_t)(order << 26);
 }
 
 void *filemap_get_entry(struct address_space *mapping, pgoff_t index);
-- 
2.39.3

From nobody Wed Dec 17 12:13:33 2025
From: Baolin Wang
Subject: [PATCH v2 2/5] mm: shmem: change shmem_huge_global_enabled() to return huge order bitmap
Date: Tue, 12 Nov 2024 15:45:49 +0800

Change shmem_huge_global_enabled() to return the suitable huge order bitmap,
and return 0 if huge pages are not allowed. This is a preparation for
supporting the allocation of various huge orders for tmpfs in the following
patches.

No functional changes.

Signed-off-by: Baolin Wang
Acked-by: David Hildenbrand
---
 mm/shmem.c | 40 ++++++++++++++++++++--------------------
 1 file changed, 20 insertions(+), 20 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 579e58cb3262..86b2e417dc6f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -549,37 +549,37 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
-static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-				      loff_t write_end, bool shmem_huge_force,
-				      unsigned long vm_flags)
+static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+					      loff_t write_end, bool shmem_huge_force,
+					      unsigned long vm_flags)
 {
 	loff_t i_size;
 
 	if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
-		return false;
+		return 0;
 	if (!S_ISREG(inode->i_mode))
-		return false;
+		return 0;
 	if (shmem_huge == SHMEM_HUGE_DENY)
-		return false;
+		return 0;
 	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
-		return true;
+		return BIT(HPAGE_PMD_ORDER);
 
 	switch (SHMEM_SB(inode->i_sb)->huge) {
 	case SHMEM_HUGE_ALWAYS:
-		return true;
+		return BIT(HPAGE_PMD_ORDER);
 	case SHMEM_HUGE_WITHIN_SIZE:
 		index = round_up(index + 1, HPAGE_PMD_NR);
 		i_size = max(write_end, i_size_read(inode));
 		i_size = round_up(i_size, PAGE_SIZE);
 		if (i_size >> PAGE_SHIFT >= index)
-			return true;
+			return BIT(HPAGE_PMD_ORDER);
 		fallthrough;
 	case SHMEM_HUGE_ADVISE:
 		if (vm_flags & VM_HUGEPAGE)
-			return true;
+			return BIT(HPAGE_PMD_ORDER);
 		fallthrough;
 	default:
-		return false;
+		return 0;
 	}
 }
 
@@ -774,11 +774,11 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 	return 0;
 }
 
-static bool shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
-				      loff_t write_end, bool shmem_huge_force,
-				      unsigned long vm_flags)
+static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
+					      loff_t write_end, bool shmem_huge_force,
+					      unsigned long vm_flags)
 {
-	return false;
+	return 0;
 }
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -1682,21 +1682,21 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	unsigned long mask = READ_ONCE(huge_shmem_orders_always);
 	unsigned long within_size_orders = READ_ONCE(huge_shmem_orders_within_size);
 	unsigned long vm_flags = vma ? vma->vm_flags : 0;
-	bool global_huge;
+	unsigned int global_orders;
 	loff_t i_size;
 	int order;
 
 	if (thp_disabled_by_hw() || (vma && vma_thp_disabled(vma, vm_flags)))
 		return 0;
 
-	global_huge = shmem_huge_global_enabled(inode, index, write_end,
-						shmem_huge_force, vm_flags);
+	global_orders = shmem_huge_global_enabled(inode, index, write_end,
+						  shmem_huge_force, vm_flags);
 	if (!vma || !vma_is_anon_shmem(vma)) {
 		/*
 		 * For tmpfs, we now only support PMD sized THP if huge page
 		 * is enabled, otherwise fallback to order 0.
 		 */
-		return global_huge ? BIT(HPAGE_PMD_ORDER) : 0;
+		return global_orders;
 	}
 
 	/*
@@ -1729,7 +1729,7 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 	if (vm_flags & VM_HUGEPAGE)
 		mask |= READ_ONCE(huge_shmem_orders_madvise);
 
-	if (global_huge)
+	if (global_orders > 0)
 		mask |= READ_ONCE(huge_shmem_orders_inherit);
 
 	return THP_ORDERS_ALL_FILE_DEFAULT & mask;
-- 
2.39.3

From nobody Wed Dec 17 12:13:33 2025
From: Baolin Wang
Subject: [PATCH v2 3/5] mm: shmem: add large folio support for tmpfs
Date: Tue, 12 Nov 2024 15:45:50 +0800

Add large folio support for the tmpfs write and fallocate paths, matching the
same high-order preference mechanism that the iomap buffered IO path uses in
__filemap_get_folio().

Add shmem_mapping_size_orders() to get a hint for the orders of the folio
based on the file size, which takes care of the mapping requirements.

Traditionally, tmpfs only supported PMD-sized huge folios. However, now that
other file systems support large folios of any size, and anonymous memory has
been extended to support mTHP, we should not restrict tmpfs to allocating only
PMD-sized huge folios, which would make it a special case.
Instead, we should allow tmpfs to allocate large folios of any size.

Considering that tmpfs already has the 'huge=' option to control huge folio
allocation, we can extend the 'huge=' option to allow huge folios of any
size. The semantics of the 'huge=' mount option are:

huge=never: no huge folios of any size
huge=always: huge folios of any size
huge=within_size: like 'always' but respect the i_size
huge=advise: like 'always' if requested with fadvise()/madvise()

Note: for tmpfs mmap() faults, due to the lack of a write size hint, PMD-sized
huge folios are still allocated if huge=always/within_size/advise is set.

Moreover, the 'deny' and 'force' testing options controlled by
'/sys/kernel/mm/transparent_hugepage/shmem_enabled' retain the same
semantics. 'deny' disables large folios of any size for tmpfs, while 'force'
enables PMD-sized large folios for tmpfs.

Co-developed-by: Daniel Gomez
Signed-off-by: Daniel Gomez
Signed-off-by: Baolin Wang
---
 mm/shmem.c | 91 +++++++++++++++++++++++++++++++++++++++++++++---------
 1 file changed, 77 insertions(+), 14 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 86b2e417dc6f..a3203cf8860f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -549,10 +549,50 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
 
+/**
+ * shmem_mapping_size_orders - Get allowable folio orders for the given file size.
+ * @mapping: Target address_space.
+ * @index: The page index.
+ * @write_end: end of a write, could extend inode size.
+ *
+ * This returns huge orders for folios (when supported) based on the file size
+ * which the mapping currently allows at the given index. The index is relevant
+ * due to alignment considerations the mapping might have. The returned order
+ * may be less than the size passed.
+ *
+ * Return: The orders.
+ */
+static inline unsigned int
+shmem_mapping_size_orders(struct address_space *mapping, pgoff_t index, loff_t write_end)
+{
+	unsigned int order;
+	size_t size;
+
+	if (!mapping_large_folio_support(mapping) || !write_end)
+		return 0;
+
+	/* Calculate the write size based on the write_end */
+	size = write_end - (index << PAGE_SHIFT);
+	order = filemap_get_order(size);
+	if (!order)
+		return 0;
+
+	/* If we're not aligned, allocate a smaller folio */
+	if (index & ((1UL << order) - 1))
+		order = __ffs(index);
+
+	order = min_t(size_t, order, MAX_PAGECACHE_ORDER);
+	return order > 0 ? BIT(order + 1) - 1 : 0;
+}
+
 static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 					      loff_t write_end, bool shmem_huge_force,
+					      struct vm_area_struct *vma,
 					      unsigned long vm_flags)
 {
+	unsigned long within_size_orders;
+	unsigned int order;
+	pgoff_t aligned_index;
 	loff_t i_size;
 
 	if (HPAGE_PMD_ORDER > MAX_PAGECACHE_ORDER)
@@ -564,15 +604,41 @@ static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index
 	if (shmem_huge_force || shmem_huge == SHMEM_HUGE_FORCE)
 		return BIT(HPAGE_PMD_ORDER);
 
+	/*
+	 * The huge order allocation for anon shmem is controlled through
+	 * the mTHP interface, so we still use PMD-sized huge order to
+	 * check whether global control is enabled.
+	 *
+	 * For tmpfs mmap()'s huge order, we still use PMD-sized order to
+	 * allocate huge pages due to lack of a write size hint.
+	 *
+	 * Otherwise, tmpfs will allow getting a highest order hint based on
+	 * the size of write and fallocate paths, then will try each allowable
+	 * huge orders.
+	 */
 	switch (SHMEM_SB(inode->i_sb)->huge) {
 	case SHMEM_HUGE_ALWAYS:
-		return BIT(HPAGE_PMD_ORDER);
-	case SHMEM_HUGE_WITHIN_SIZE:
-		index = round_up(index + 1, HPAGE_PMD_NR);
-		i_size = max(write_end, i_size_read(inode));
-		i_size = round_up(i_size, PAGE_SIZE);
-		if (i_size >> PAGE_SHIFT >= index)
+		if (vma)
 			return BIT(HPAGE_PMD_ORDER);
+
+		return shmem_mapping_size_orders(inode->i_mapping, index, write_end);
+	case SHMEM_HUGE_WITHIN_SIZE:
+		if (vma)
+			within_size_orders = BIT(HPAGE_PMD_ORDER);
+		else
+			within_size_orders = shmem_mapping_size_orders(inode->i_mapping,
+								       index, write_end);
+
+		order = highest_order(within_size_orders);
+		while (within_size_orders) {
+			aligned_index = round_up(index + 1, 1 << order);
+			i_size = max(write_end, i_size_read(inode));
+			i_size = round_up(i_size, PAGE_SIZE);
+			if (i_size >> PAGE_SHIFT >= aligned_index)
+				return within_size_orders;
+
+			order = next_order(&within_size_orders, order);
+		}
 		fallthrough;
 	case SHMEM_HUGE_ADVISE:
 		if (vm_flags & VM_HUGEPAGE)
@@ -776,6 +842,7 @@ static unsigned long shmem_unused_huge_shrink(struct shmem_sb_info *sbinfo,
 
 static unsigned int shmem_huge_global_enabled(struct inode *inode, pgoff_t index,
 					      loff_t write_end, bool shmem_huge_force,
+					      struct vm_area_struct *vma,
 					      unsigned long vm_flags)
 {
 	return 0;
@@ -1173,7 +1240,7 @@ static int shmem_getattr(struct mnt_idmap *idmap,
 	generic_fillattr(idmap, request_mask, inode, stat);
 	inode_unlock_shared(inode);
 
-	if (shmem_huge_global_enabled(inode, 0, 0, false, 0))
+	if (shmem_huge_global_enabled(inode, 0, 0, false, NULL, 0))
 		stat->blksize = HPAGE_PMD_SIZE;
 
 	if (request_mask & STATX_BTIME) {
@@ -1690,14 +1757,10 @@ unsigned long shmem_allowable_huge_orders(struct inode *inode,
 		return 0;
 
 	global_orders = shmem_huge_global_enabled(inode, index, write_end,
-						  shmem_huge_force, vm_flags);
-	if (!vma || !vma_is_anon_shmem(vma)) {
-		/*
-		 * For tmpfs, we now only support PMD sized THP if huge page
-		 * is enabled, otherwise fallback to order 0.
-		 */
+						  shmem_huge_force, vma, vm_flags);
+	/* Tmpfs huge pages allocation */
+	if (!vma || !vma_is_anon_shmem(vma))
 		return global_orders;
-	}
 
 	/*
 	 * Following the 'deny' semantics of the top level, force the huge
-- 
2.39.3

From nobody Wed Dec 17 12:13:33 2025
From: Baolin Wang
Subject: [PATCH v2 4/5] mm: shmem: add a kernel command line to change the default huge policy for tmpfs
Date: Tue, 12 Nov 2024 15:45:51 +0800
Message-Id: <64091a3d5a8c5edb0461fae203cfcf6f302a19ce.1731397290.git.baolin.wang@linux.alibaba.com>

tmpfs can now allocate large folios of any size, but the default huge policy
is still 'never'. Adding a new command line parameter to change the default
huge policy therefore makes it easier to use large folios for tmpfs. It is
similar to the 'transparent_hugepage_shmem' cmdline for shmem.
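As a rough user-space model of the behaviour described above (a sketch under
stated assumptions, not the kernel implementation): the boot parameter only
replaces the default, and an explicit 'huge=' mount option still wins. The
names parse_policy(), set_boot_default() and effective_policy() are
hypothetical, as are the enum values; the kernel uses shmem_parse_huge(),
setup_transparent_hugepage_tmpfs() and the SHMEM_SEEN_HUGE check in
shmem_fill_super().

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical policy values for this sketch; not the kernel's constants. */
enum huge_policy { HUGE_NEVER, HUGE_ALWAYS, HUGE_WITHIN_SIZE, HUGE_ADVISE };

/* Default used when a mount does not pass 'huge='; starts out as 'never'. */
static int boot_default = HUGE_NEVER;

/* Map a policy name to its value, or return -1 if it is not recognised. */
static int parse_policy(const char *str)
{
	static const char *const names[] = {
		"never", "always", "within_size", "advise",
	};
	size_t i;

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++)
		if (strcmp(str, names[i]) == 0)
			return (int)i;
	return -1;
}

/*
 * Models the boot parameter handler: an unparsable value is ignored and
 * the previous default is kept.
 */
static bool set_boot_default(const char *str)
{
	int huge = parse_policy(str);

	if (huge < 0)
		return false;

	boot_default = huge;
	return true;
}

/*
 * Models the mount-time choice: an explicit mount option takes precedence,
 * otherwise the boot-time default applies.
 */
static int effective_policy(bool mount_opt_seen, int mount_opt)
{
	return mount_opt_seen ? mount_opt : boot_default;
}
```

For example, after set_boot_default("within_size"), a mount without 'huge='
gets HUGE_WITHIN_SIZE, while a mount with 'huge=advise' still gets
HUGE_ADVISE.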
Signed-off-by: Baolin Wang
---
 .../admin-guide/kernel-parameters.txt      |  7 ++++++
 Documentation/admin-guide/mm/transhuge.rst |  6 +++++
 mm/shmem.c                                 | 23 ++++++++++++++++++-
 3 files changed, 35 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b48d744d99b0..007e6cfada3e 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -6943,6 +6943,13 @@
 			See Documentation/admin-guide/mm/transhuge.rst
 			for more details.
 
+	transparent_hugepage_tmpfs= [KNL]
+			Format: [always|within_size|advise|never]
+			Can be used to control the default hugepage allocation policy
+			for the tmpfs mount.
+			See Documentation/admin-guide/mm/transhuge.rst
+			for more details.
+
 	trusted.source= [KEYS]
 			Format:
 			This parameter identifies the trust source as a backend
diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 5034915f4e8e..9ae775eaacbe 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,12 @@ allocation policy for the internal shmem mount by using the kernel parameter
 seven valid policies for shmem (``always``, ``within_size``, ``advise``,
 ``never``, ``deny``, and ``force``).
 
+Similarly to ``transparent_hugepage_shmem``, you can control the default
+hugepage allocation policy for the tmpfs mount by using the kernel parameter
+``transparent_hugepage_tmpfs=``, where ```` is one of the
+four valid policies for tmpfs (``always``, ``within_size``, ``advise``,
+``never``). The tmpfs mount default policy is ``never``.
+
 In the same manner as ``thp_anon`` controls each supported anonymous THP
 size, ``thp_shmem`` controls each supported shmem THP size. ``thp_shmem``
 has the same format as ``thp_anon``, but also supports the policy
diff --git a/mm/shmem.c b/mm/shmem.c
index a3203cf8860f..021760e91cea 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -548,6 +548,7 @@ static bool shmem_confirm_swap(struct address_space *mapping,
 /* ifdef here to avoid bloating shmem.o when not necessary */
 
 static int shmem_huge __read_mostly = SHMEM_HUGE_NEVER;
+static int tmpfs_huge __read_mostly = SHMEM_HUGE_NEVER;
 
 /**
  * shmem_mapping_size_orders - Get allowable folio orders for the given file size.
@@ -4780,7 +4781,12 @@ static int shmem_fill_super(struct super_block *sb, struct fs_context *fc)
 	sbinfo->gid = ctx->gid;
 	sbinfo->full_inums = ctx->full_inums;
 	sbinfo->mode = ctx->mode;
-	sbinfo->huge = ctx->huge;
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	if (ctx->seen & SHMEM_SEEN_HUGE)
+		sbinfo->huge = ctx->huge;
+	else
+		sbinfo->huge = tmpfs_huge;
+#endif
 	sbinfo->mpol = ctx->mpol;
 	ctx->mpol = NULL;
 
@@ -5259,6 +5265,21 @@ static int __init setup_transparent_hugepage_shmem(char *str)
 }
 __setup("transparent_hugepage_shmem=", setup_transparent_hugepage_shmem);
 
+static int __init setup_transparent_hugepage_tmpfs(char *str)
+{
+	int huge;
+
+	huge = shmem_parse_huge(str);
+	if (huge < 0) {
+		pr_warn("transparent_hugepage_tmpfs= cannot parse, ignored\n");
+		return huge;
+	}
+
+	tmpfs_huge = huge;
+	return 1;
+}
+__setup("transparent_hugepage_tmpfs=", setup_transparent_hugepage_tmpfs);
+
 static char str_dup[PAGE_SIZE] __initdata;
 static int __init setup_thp_shmem(char *str)
 {
-- 
2.39.3

From nobody Wed Dec 17 12:13:33 2025
From: Baolin Wang
Subject: [PATCH v2 5/5] docs: tmpfs: update the huge folios policy for tmpfs and shmem
Date: Tue, 12 Nov 2024 15:45:52 +0800

From: David Hildenbrand

Update the huge folios policy for tmpfs and shmem.

Signed-off-by: David Hildenbrand
Signed-off-by: Baolin Wang
---
 Documentation/admin-guide/mm/transhuge.rst | 58 +++++++++++++++-------
 1 file changed, 41 insertions(+), 17 deletions(-)

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index 9ae775eaacbe..ba6edff728ed 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -358,8 +358,21 @@ default to ``never``.
 Hugepages in tmpfs/shmem
 ========================
 
-You can control hugepage allocation policy in tmpfs with mount option
-``huge=``. It can have following values:
+Traditionally, tmpfs only supported a single huge page size ("PMD"). Today,
+it also supports smaller sizes just like anonymous memory, often referred
+to as "multi-size THP" (mTHP). Huge pages of any size are commonly
+represented in the kernel as "large folios".
+
+While there is fine control over the huge page sizes to use for the internal
+shmem mount (see below), ordinary tmpfs mounts will make use of all available
+huge page sizes without any control over the exact sizes, behaving more like
+other file systems.
+
+tmpfs mounts
+------------
+
+The THP allocation policy for tmpfs mounts can be adjusted using the mount
+option: ``huge=``. It can have following values:
 
 always
     Attempt to allocate huge pages every time we need a new page;
@@ -374,19 +387,19 @@ within_size
 advise
     Only allocate huge pages if requested with fadvise()/madvise();
 
-The default policy is ``never``.
+Remember, that the kernel may use huge pages of all available sizes, and
+that no fine control as for the internal tmpfs mount is available.
+
+The default policy in the past was ``never``, but it can now be adjusted
+using the kernel parameter ``transparent_hugepage_tmpfs=``.
 
 ``mount -o remount,huge= /mountpoint`` works fine after mount: remounting
 ``huge=never`` will not attempt to break up huge pages at all, just stop more
 from being allocated.
 
-There's also sysfs knob to control hugepage allocation policy for internal
-shmem mount: /sys/kernel/mm/transparent_hugepage/shmem_enabled. The mount
-is used for SysV SHM, memfds, shared anonymous mmaps (of /dev/zero or
-MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.
-
-In addition to policies listed above, shmem_enabled allows two further
-values:
+In addition to policies listed above, the sysfs knob
+/sys/kernel/mm/transparent_hugepage/shmem_enabled will affect the
+allocation policy of tmpfs mounts, when set to the following values:
 
 deny
     For use in emergencies, to force the huge option off from
@@ -394,13 +407,24 @@ deny
 force
     Force the huge option on for all - very useful for testing;
 
-Shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob to
-control mTHP allocation:
-'/sys/kernel/mm/transparent_hugepage/hugepages-kB/shmem_enabled',
-and its value for each mTHP is essentially consistent with the global
-setting. An 'inherit' option is added to ensure compatibility with these
-global settings. Conversely, the options 'force' and 'deny' are dropped,
-which are rather testing artifacts from the old ages.
+shmem / internal tmpfs
+----------------------
+The mount internal tmpfs mount is used for SysV SHM, memfds, shared anonymous
+mmaps (of /dev/zero or MAP_ANONYMOUS), GPU drivers' DRM objects, Ashmem.
+
+To control the THP allocation policy for this internal tmpfs mount, the
+sysfs knob /sys/kernel/mm/transparent_hugepage/shmem_enabled and the knobs
+per THP size in
+'/sys/kernel/mm/transparent_hugepage/hugepages-kB/shmem_enabled'
+can be used.
+
+The global knob has the same semantics as the ``huge=`` mount options
+for tmpfs mounts, except that the different huge page sizes can be controlled
+individually, and will only use the setting of the global knob when the
+per-size knob is set to 'inherit'.
+
+The options 'force' and 'deny' are dropped for the individual sizes, which
+are rather testing artifacts from the old ages.
 
 always
     Attempt to allocate huge pages every time we need a new page;
-- 
2.39.3
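
To make the order hint from patch 3/5 easier to follow, below is a minimal
user-space sketch of the arithmetic in shmem_mapping_size_orders(). It is an
illustration under stated assumptions, not the kernel code: PAGE_SHIFT is
assumed to be 12, MAX_PAGECACHE_ORDER is assumed to be 9, the
mapping_large_folio_support() check is left out, and the helper names
ilog2_sketch(), lowest_set_bit() and size_orders_sketch() are hypothetical.

```c
#include <stddef.h>

/* Assumptions for this sketch: 4 KiB pages and a page cache limit of order 9. */
#define PAGE_SHIFT 12
#define MAX_PAGECACHE_ORDER 9

/* Hypothetical stand-in for ilog2(): floor(log2(n)), 0 for n <= 1. */
static unsigned int ilog2_sketch(size_t n)
{
	unsigned int log = 0;

	while (n >>= 1)
		log++;
	return log;
}

/* Hypothetical stand-in for __ffs(): index of the lowest set bit, n != 0. */
static unsigned int lowest_set_bit(unsigned long n)
{
	unsigned int bit = 0;

	while (!(n & 1UL)) {
		n >>= 1;
		bit++;
	}
	return bit;
}

/*
 * Returns a bitmap with bits 0..order set, where order is the largest
 * folio order suggested by the write size and allowed by the alignment
 * of index. Assumes write_end lies beyond the start of page 'index'.
 */
static unsigned int size_orders_sketch(unsigned long index, size_t write_end)
{
	unsigned int shift, order;
	size_t size;

	if (!write_end)
		return 0;

	/* Write size measured from the start of the page at 'index'. */
	size = write_end - ((size_t)index << PAGE_SHIFT);
	shift = ilog2_sketch(size);
	if (shift <= PAGE_SHIFT)
		return 0;
	order = shift - PAGE_SHIFT;

	/* An unaligned index limits the order to its lowest set bit. */
	if (index & ((1UL << order) - 1))
		order = lowest_set_bit(index);

	if (order > MAX_PAGECACHE_ORDER)
		order = MAX_PAGECACHE_ORDER;

	return order > 0 ? (1U << (order + 1)) - 1 : 0;
}

/*
 * With the assumptions above:
 *   size_orders_sketch(0, 65536)   == 0x1f   64 KiB write at index 0: orders 0..4
 *   size_orders_sketch(4, 81920)   == 0x7    same size at index 4: orders 0..2
 *   size_orders_sketch(1, 69632)   == 0      odd index: small pages only
 *   size_orders_sketch(0, 4194304) == 0x3ff  4 MiB write: capped at order 9
 */
```

The returned value is a bitmap rather than a single order, which is why the
within_size case in shmem_huge_global_enabled() can walk it with
highest_order() and next_order().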