From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 97D8BC64EC4 for ; Thu, 9 Mar 2023 23:06:41 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231430AbjCIXGk (ORCPT ); Thu, 9 Mar 2023 18:06:40 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:53200 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231361AbjCIXGU (ORCPT ); Thu, 9 Mar 2023 18:06:20 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DAA74F4033 for ; Thu, 9 Mar 2023 15:06:05 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=H478EBfRc8YOP2keWn4dK4BIB7gGzmccWr6qJ1Nxgx8=; b=3S7FD3TXf9UwH4vZnN664p7upk YngkEO7ZO7iHs5EUB/KYcNJUo/BBNpSQIKzIX2veruPnCMScaqst8M6NFr920RF+Ka8GmIiv23fXF e+foHWC2ZT4xOTXoeG61Ldgxwl1krdl21xo/MYEYnmV491LOq4xx+g2ILjTCD4JWvm+gt/0lL8SzB 7Bm9YAxy2MY4dP5fTR9VKWl1wIcNDc+iMCjK0hDYqtGPmSX4QavVkCiT3qr4JIHo/0u50GiK8JKGd LgqymKO7cQtlBE5EemwB6Nd2OqE+Og8haRrr/W8LciuaFC6uSXoTtZsSZUNijvsMRv9frTHiLnack 9cLWBkwg==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRO-3B; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, 
linux-kernel@vger.kernel.org, David Hildenbrand Subject: [PATCH v2 1/6] shmem: remove check for folio lock on writepage() Date: Thu, 9 Mar 2023 15:05:40 -0800 Message-Id: <20230309230545.2930737-2-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Matthew notes we should not need to check the folio lock on the writepage() callback so remove it. This sanity check has been lingering since linux-history days. We remove this as we tidy up the writepage() callback to make things a bit clearer. Suggested-by: Matthew Wilcox Acked-by: David Hildenbrand Reviewed-by: Christian Brauner Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso --- mm/shmem.c | 1 - 1 file changed, 1 deletion(-) diff --git a/mm/shmem.c b/mm/shmem.c index 1af85259b6fc..7fff1a3af092 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1354,7 +1354,6 @@ static int shmem_writepage(struct page *page, struct = writeback_control *wbc) folio_clear_dirty(folio); } =20 - BUG_ON(!folio_test_locked(folio)); mapping =3D folio->mapping; index =3D folio->index; inode =3D mapping->host; --=20 2.39.1 From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E2FD3C61DA4 for ; Thu, 9 Mar 2023 23:06:19 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231347AbjCIXGS (ORCPT ); Thu, 9 Mar 2023 18:06:18 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52806 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id 
S231246AbjCIXGJ (ORCPT ); Thu, 9 Mar 2023 18:06:09 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 249A7F865E for ; Thu, 9 Mar 2023 15:05:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=Mj38/Shbs9ZyLGljvL78FJsMrZNw4adJNB7STaV6cuA=; b=seeJm9gYP9E41NpNLIQfn02A+C oxlZ3k6wzYiLUH1KFiSb05cnPDxxE85TUCsL05PdvvOaU6QIuFDUpzxwIZeM5WsxwhAmZzqgZSSl7 8hBYfYuPOrxwUcjZKpqMjt3Nb6f22NzLkFsgK7K/Y5Lb/XJs0ZAhpfXLhj0DRTUwjaOTXI8FGUT5d 44OtbSzlkiJoESExutB36lEeI9dejYMgU7S7R7ApLodCoJtM4HPACPX51Ftk7Pyfct5xSDZ0+nutl pXExNQXMnX+lTj1Bareamc66niruXfSdjnmbxhqrUnXtYCLtmkZtYwcGSf+KatBCDrZ3pe1ZlEhyX pd/xldcA==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRU-5W; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org, David Hildenbrand Subject: [PATCH v2 2/6] shmem: set shmem_writepage() variables early Date: Thu, 9 Mar 2023 15:05:41 -0800 Message-Id: <20230309230545.2930737-3-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" shmem_writepage() sets up variables typically used *after* 
a possible huge page split. However even if that does happen the address space mapping should not change, and the inode does not change either. So it should be safe to set that from the very beginning. This commit makes no functional changes. Acked-by: David Hildenbrand Reviewed-by: Christian Brauner Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso --- mm/shmem.c | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 7fff1a3af092..2b9ff585a553 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1334,9 +1334,9 @@ int shmem_unuse(unsigned int type) static int shmem_writepage(struct page *page, struct writeback_control *wb= c) { struct folio *folio =3D page_folio(page); - struct shmem_inode_info *info; - struct address_space *mapping; - struct inode *inode; + struct address_space *mapping =3D folio->mapping; + struct inode *inode =3D mapping->host; + struct shmem_inode_info *info =3D SHMEM_I(inode); swp_entry_t swap; pgoff_t index; =20 @@ -1354,10 +1354,7 @@ static int shmem_writepage(struct page *page, struct= writeback_control *wbc) folio_clear_dirty(folio); } =20 - mapping =3D folio->mapping; index =3D folio->index; - inode =3D mapping->host; - info =3D SHMEM_I(inode); if (info->flags & VM_LOCKED) goto redirty; if (!total_swap_pages) --=20 2.39.1 From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id C07D9C61DA4 for ; Thu, 9 Mar 2023 23:06:16 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231312AbjCIXGO (ORCPT ); Thu, 9 Mar 2023 18:06:14 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52802 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231231AbjCIXGJ (ORCPT ); Thu, 9 Mar 2023 18:06:09 -0500 Received: 
from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 11F70F8654 for ; Thu, 9 Mar 2023 15:05:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=lRTyWhPNnRtMIqcninG/zeE0F44SVFjwfZYisJiUrvs=; b=gLrCmQ+JAmVqsGLnc+ORmTR9Po KUk+ypQwAKuob3XUb+wNe97TWEZv4xENuSB54H1MRcT5n5U2JpvIE5Ysf4oRoQmn4KwoZpT+heKBD bJ6A1bK+Fe0UV0icaRTaGBI8yOXczM+rW0owdM7y9w7NnixJe59wgw89zRYFYjKVtKrBHNwm3xGlj dpat9O+ynl7cjNJqggoOfVLoW7YnoumCLQM0WzApY+vgIlEep002GRlUkPdvy2pYCtGaOcxyTvp0h Q6sQ0lq3b4JFMxBExpRhPEOpHbcolHLjsglmub8cSWnmdm4QziVbGrmC7mKCC6Gz0/DlbVYM3a1DO VlWyRLaw==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRW-7E; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org, David Hildenbrand Subject: [PATCH v2 3/6] shmem: move reclaim check early on writepages() Date: Thu, 9 Mar 2023 15:05:42 -0800 Message-Id: <20230309230545.2930737-4-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" i915_gem requires huge folios to be split when swapping. 
However, the check that ensures writepage() is used only for swap purposes comes later, after the split. Avoid the splits if we're not being called for reclaim, even if they should in theory not happen. This makes the conditions easier to follow in shmem_writepage(). Acked-by: David Hildenbrand Reviewed-by: Yosry Ahmed Reviewed-by: Christian Brauner Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso --- mm/shmem.c | 22 ++++++++++------------ 1 file changed, 10 insertions(+), 12 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 2b9ff585a553..68e9970baf1e 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1340,6 +1340,16 @@ static int shmem_writepage(struct page *page, struct= writeback_control *wbc) swp_entry_t swap; pgoff_t index; =20 + /* + * Our capabilities prevent regular writeback or sync from ever calling + * shmem_writepage; but a stacking filesystem might use ->writepage of + * its underlying filesystem, in which case tmpfs should write out to + * swap only in response to memory pressure, and not for the writeback + * threads or sync. + */ + if (WARN_ON_ONCE(!wbc->for_reclaim)) + goto redirty; + /* * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages, @@ -1360,18 +1370,6 @@ static int shmem_writepage(struct page *page, struct= writeback_control *wbc) if (!total_swap_pages) goto redirty; =20 - /* - * Our capabilities prevent regular writeback or sync from ever calling - * shmem_writepage; but a stacking filesystem might use ->writepage of - * its underlying filesystem, in which case tmpfs should write out to - * swap only in response to memory pressure, and not for the writeback - * threads or sync. - */ - if (!wbc->for_reclaim) { - WARN_ON_ONCE(1); /* Still happens? Tell us about it! 
*/ - goto redirty; - } - /* * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC * value into swapfile.c, the only way we can correctly account for a --=20 2.39.1 From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id E0957C64EC4 for ; Thu, 9 Mar 2023 23:06:12 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231282AbjCIXGL (ORCPT ); Thu, 9 Mar 2023 18:06:11 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52800 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S230474AbjCIXGJ (ORCPT ); Thu, 9 Mar 2023 18:06:09 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id DB152F4B65 for ; Thu, 9 Mar 2023 15:05:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=nd2m/LNu9NMKcIyRptke6PFaE+tBCNb9V+MN5pDSpQU=; b=3HA+jE0dyswGu9J5ZVVPfPe329 BO74aE5mhVip8O7KCZwd+SF7eFwb27HJyop6K4tGVHCtQXPIHeTkMP2VAkR0T3MojwU8x6u+tf0Dv nBlE2vsMycFdc8mJdNhg9Wjv8EuDGp2Liutocgdvmqk3KDlQD+PgebAyTzprcitwS5UM8JZ4yfkgZ UNrRRD4Z1LIzm7QVLI/cT8FeeIAguMrwrgUAi/3NZY3KeND9TvJ0wUDcAtY9ObElBVCcs+W7kwkZ9 7EyooKOu7ISN/RzkBo8MmRK2IeqGOi5Rh2g77AKzK4W7UpYmr2grBlhpxHPHisC4onhlVIrgPbgEq 5dxQmXXA==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRY-8d; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, 
p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org, David Hildenbrand Subject: [PATCH v2 4/6] shmem: skip page split if we're not reclaiming Date: Thu, 9 Mar 2023 15:05:43 -0800 Message-Id: <20230309230545.2930737-5-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In theory, when info->flags & VM_LOCKED is set, shmem_writepage() should not be called at all, so verify this with a WARN_ON_ONCE(). Since we should not be swapping in that case, it is best to also avoid the folio split. So move the check earlier to avoid folio splits in case it's a dubious call. We also have a similar early bail when !total_swap_pages, so move that earlier as well to avoid the possible folio split in the same situation. 
Acked-by: David Hildenbrand Reviewed-by: Christian Brauner Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso Reviewed-by: Yosry Ahmed --- mm/shmem.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/mm/shmem.c b/mm/shmem.c index 68e9970baf1e..dfd995da77b4 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -1350,6 +1350,12 @@ static int shmem_writepage(struct page *page, struct= writeback_control *wbc) if (WARN_ON_ONCE(!wbc->for_reclaim)) goto redirty; =20 + if (WARN_ON_ONCE(info->flags & VM_LOCKED)) + goto redirty; + + if (!total_swap_pages) + goto redirty; + /* * If /sys/kernel/mm/transparent_hugepage/shmem_enabled is "always" or * "force", drivers/gpu/drm/i915/gem/i915_gem_shmem.c gets huge pages, @@ -1365,10 +1371,6 @@ static int shmem_writepage(struct page *page, struct= writeback_control *wbc) } =20 index =3D folio->index; - if (info->flags & VM_LOCKED) - goto redirty; - if (!total_swap_pages) - goto redirty; =20 /* * This is somewhat ridiculous, but without plumbing a SWAP_MAP_FALLOC --=20 2.39.1 From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id AB235C61DA4 for ; Thu, 9 Mar 2023 23:06:23 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231284AbjCIXGW (ORCPT ); Thu, 9 Mar 2023 18:06:22 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52848 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231272AbjCIXGL (ORCPT ); Thu, 9 Mar 2023 18:06:11 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 210E1F4D93 for ; Thu, 9 Mar 2023 15:06:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; 
s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: Reply-To:Content-Type:Content-ID:Content-Description; bh=KhIOryYi/WjMt1G9oWTtfKzfZj1AUYZk+i6xawkBgBo=; b=oZkr/qx3wEVfj1+ZVhCpyCYUCM jPh9BtYO/JSVE8KrDiP2o4+eSvNFTDhlZZMAGxJNUISY7r5Y2UQ9kJ7YZ7cFxmcydxWagnOYed0Og wDztp8ZfqI7TdvS/7rrodGjB1hRbxyZoDbRSj2Mhdxfex4+EkVe/kcw4urS7yVzUCrelwyN4OH53M pAguCzkK7m39yc+D18Yqzw4LPOmZDxW1iRJDItZjlzSejD1aNkbTxRX0HfkUWjGtOOtW3KhfjHCIZ HVtJ04r4g8q08N0npmkuwuhmFDnKnU1F4T7o6wh3xbBAlS8F2RA2HFNXQ2VjZe2ygU7wxl1qrFmMu jmuVMOqQ==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRa-Az; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org, David Hildenbrand Subject: [PATCH v2 5/6] shmem: update documentation Date: Thu, 9 Mar 2023 15:05:44 -0800 Message-Id: <20230309230545.2930737-6-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" Update the docs to reflect a bit better why some folks prefer tmpfs over ramfs and clarify a bit more about the difference between brd ramdisks. While at it, add THP docs for tmpfs, both the mount options and the sysfs file. 
Reviewed-by: Christian Brauner Reviewed-by: David Hildenbrand Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso --- Documentation/filesystems/tmpfs.rst | 57 +++++++++++++++++++++++++---- 1 file changed, 49 insertions(+), 8 deletions(-) diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystem= s/tmpfs.rst index 0408c245785e..1ec9a9f8196b 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -13,14 +13,25 @@ everything stored therein is lost. =20 tmpfs puts everything into the kernel internal caches and grows and shrinks to accommodate the files it contains and is able to swap -unneeded pages out to swap space. It has maximum size limits which can -be adjusted on the fly via 'mount -o remount ...' - -If you compare it to ramfs (which was the template to create tmpfs) -you gain swapping and limit checking. Another similar thing is the RAM -disk (/dev/ram*), which simulates a fixed size hard disk in physical -RAM, where you have to create an ordinary filesystem on top. Ramdisks -cannot swap and you do not have the possibility to resize them. +unneeded pages out to swap space, and supports THP. + +tmpfs extends ramfs with a few userspace configurable options listed and +explained further below, some of which can be reconfigured dynamically on = the +fly using a remount ('mount -o remount ...') of the filesystem. A tmpfs +filesystem can be resized but it cannot be resized to a size below its cur= rent +usage. tmpfs also supports POSIX ACLs, and extended attributes for the +trusted.* and security.* namespaces. ramfs does not use swap and you cannot +modify any parameter for a ramfs filesystem. The size limit of a ramfs +filesystem is how much memory you have available, and so care must be take= n if +used so to not run out of memory. + +An alternative to tmpfs and ramfs is to use brd to create RAM disks +(/dev/ram*), which allows you to simulate a block device disk in physical = RAM. 
+To write data you would just then need to create an regular filesystem on = top +this ramdisk. As with ramfs, brd ramdisks cannot swap. brd ramdisks are al= so +configured in size at initialization and you cannot dynamically resize the= m. +Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely o= n the +block layer at all. =20 Since tmpfs lives completely in the page cache and on swap, all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in @@ -85,6 +96,36 @@ mount with such options, since it allows any user with w= rite access to use up all the memory on the machine; but enhances the scalability of that instance in a system with many CPUs making intensive use of it. =20 +tmpfs also supports Transparent Huge Pages which requires a kernel +configured with CONFIG_TRANSPARENT_HUGEPAGE and with huge supported for +your system (has_transparent_hugepage(), which is architecture specific). +The mount options for this are: + +=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +huge=3D0 never: disables huge pages for the mount +huge=3D1 always: enables huge pages for the mount +huge=3D2 within_size: only allocate huge pages if the page will be + fully within i_size, also respect fadvise()/madvise() hints. +huge=3D3 advise: only allocate huge pages if requested with + fadvise()/madvise() +=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D + +There is a sysfs file which you can also use to control system wide THP +configuration for all tmpfs mounts, the file is: + +/sys/kernel/mm/transparent_hugepage/shmem_enabled + +This sysfs file is placed on top of THP sysfs directory and so is register= ed +by THP code. 
It is however only used to control all tmpfs mounts with one +single knob. Since it controls all tmpfs mounts it should only be used eit= her +for emergency or testing purposes. The values you can set for shmem_enable= d are: + +=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D +-1 deny: disables huge on shm_mnt and all mounts, for + emergency use +-2 force: enables huge on shm_mnt and all mounts, w/o needing + option, for testing +=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 tmpfs has a mount option to set the NUMA memory allocation policy for all files in that instance (if CONFIG_NUMA is enabled) - which can be --=20 2.39.1 From nobody Thu Sep 11 00:11:09 2025 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3A468C64EC4 for ; Thu, 9 Mar 2023 23:06:26 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S231377AbjCIXGY (ORCPT ); Thu, 9 Mar 2023 18:06:24 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:52846 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231267AbjCIXGL (ORCPT ); Thu, 9 Mar 2023 18:06:11 -0500 Received: from bombadil.infradead.org (bombadil.infradead.org [IPv6:2607:7c80:54:3::133]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0059DF2FAB for ; Thu, 9 Mar 2023 15:06:00 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=bombadil.20210309; h=Sender:Content-Transfer-Encoding: MIME-Version:References:In-Reply-To:Message-Id:Date:Subject:Cc:To:From: 
Reply-To:Content-Type:Content-ID:Content-Description; bh=Y21J21Mx6XlkkY3LHzdeVvTlVfIT0jycn4uNZ988WD8=; b=ZOkHpRV5sG86BmDY3iU6uooKNB X3EgaVH1y2vZ8Y60RVSjgd8vXePnDDwNwiZgCL7Jx8kSkkxi+M7ICS9cWp6Fqj0QL4IDr4lusiybL GLwkVmNemW5q94eEDaG1NQg8bd9eI2rT2SekcRvywb7SEIjLWXO+AMm+dE/HTYrzSyv7kJFiQ3IT7 kw27XPA3Hw7D4RlPo7C+KcXnezJquG3ev5odFrxa+d/5hFE3F2qs5HH38Y+MSrpkapTY1e3q5UsHR y/yyIBml3owJtYGV0QLnLnl/Py4djMGQXd5bdOyoAC6P1eeSB5GjL+5yYR5HToQmYWrP/c52xdF+j j4mPV3yQ==; Received: from mcgrof by bombadil.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1paPKW-00CIRc-CU; Thu, 09 Mar 2023 23:05:48 +0000 From: Luis Chamberlain To: hughd@google.com, akpm@linux-foundation.org, willy@infradead.org, brauner@kernel.org Cc: linux-mm@kvack.org, p.raghav@samsung.com, da.gomez@samsung.com, a.manzanares@samsung.com, dave@stgolabs.net, yosryahmed@google.com, keescook@chromium.org, mcgrof@kernel.org, patches@lists.linux.dev, linux-kernel@vger.kernel.org Subject: [PATCH v2 6/6] shmem: add support to ignore swap Date: Thu, 9 Mar 2023 15:05:45 -0800 Message-Id: <20230309230545.2930737-7-mcgrof@kernel.org> X-Mailer: git-send-email 2.37.1 In-Reply-To: <20230309230545.2930737-1-mcgrof@kernel.org> References: <20230309230545.2930737-1-mcgrof@kernel.org> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: Luis Chamberlain Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Content-Type: text/plain; charset="utf-8" In doing experimentations with shmem having the option to avoid swap becomes a useful mechanism. One of the *raves* about brd over shmem is you can avoid swap, but that's not really a good reason to use brd if we can instead use shmem. Using brd has its own good reasons to exist, but just because "tmpfs" doesn't let you do that is not a great reason to avoid it if we can easily add support for it. I don't add support for reconfiguring incompatible options, but if we really wanted to we can add support for that. 
To avoid swap we use mapping_set_unevictable() upon inode creation, and put a WARN_ON_ONCE() stop-gap on writepages() for reclaim. Acked-by: Christian Brauner Signed-off-by: Luis Chamberlain Reviewed-by: Davidlohr Bueso --- Documentation/filesystems/tmpfs.rst | 9 ++++++--- Documentation/mm/unevictable-lru.rst | 2 ++ include/linux/shmem_fs.h | 1 + mm/shmem.c | 28 +++++++++++++++++++++++++++- 4 files changed, 36 insertions(+), 4 deletions(-) diff --git a/Documentation/filesystems/tmpfs.rst b/Documentation/filesystem= s/tmpfs.rst index 1ec9a9f8196b..f18f46be5c0c 100644 --- a/Documentation/filesystems/tmpfs.rst +++ b/Documentation/filesystems/tmpfs.rst @@ -13,7 +13,8 @@ everything stored therein is lost. =20 tmpfs puts everything into the kernel internal caches and grows and shrinks to accommodate the files it contains and is able to swap -unneeded pages out to swap space, and supports THP. +unneeded pages out to swap space, if swap was enabled for the tmpfs +mount. tmpfs also supports THP. =20 tmpfs extends ramfs with a few userspace configurable options listed and explained further below, some of which can be reconfigured dynamically on = the @@ -33,8 +34,8 @@ configured in size at initialization and you cannot dynam= ically resize them. Contrary to brd ramdisks, tmpfs has its own filesystem, it does not rely o= n the block layer at all. =20 -Since tmpfs lives completely in the page cache and on swap, all tmpfs -pages will be shown as "Shmem" in /proc/meminfo and "Shared" in +Since tmpfs lives completely in the page cache and optionally on swap, +all tmpfs pages will be shown as "Shmem" in /proc/meminfo and "Shared" in free(1). Notice that these counters also include shared memory (shmem, see ipcs(1)). The most reliable way to get the count is using df(1) and du(1). @@ -83,6 +84,8 @@ nr_inodes The maximum number of inodes for this instance= . 
The default is half of the number of your physical RAM pages, or (on a machine with highmem) the number of lowmem RAM pages, whichever is the lower. +noswap Disables swap. Remounts must respect the original settings. + By default swap is enabled. =3D=3D=3D=3D=3D=3D=3D=3D=3D =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D= =3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D=3D =20 These parameters accept a suffix k, m or g for kilo, mega and giga and diff --git a/Documentation/mm/unevictable-lru.rst b/Documentation/mm/unevic= table-lru.rst index 92ac5dca420c..d5ac8511eb67 100644 --- a/Documentation/mm/unevictable-lru.rst +++ b/Documentation/mm/unevictable-lru.rst @@ -42,6 +42,8 @@ The unevictable list addresses the following classes of u= nevictable pages: =20 * Those owned by ramfs. =20 + * Those owned by tmpfs with the noswap mount option. + * Those mapped into SHM_LOCK'd shared memory regions. =20 * Those mapped into VM_LOCKED [mlock()ed] VMAs. 
diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h index 103d1000a5a2..50bf82b36995 100644 --- a/include/linux/shmem_fs.h +++ b/include/linux/shmem_fs.h @@ -45,6 +45,7 @@ struct shmem_sb_info { kuid_t uid; /* Mount uid for root directory */ kgid_t gid; /* Mount gid for root directory */ bool full_inums; /* If i_ino should be uint or ino_t */ + bool noswap; /* ignores VM reclaim / swap requests */ ino_t next_ino; /* The next per-sb inode number to use */ ino_t __percpu *ino_batch; /* The next per-cpu inode number to use */ struct mempolicy *mpol; /* default memory policy for mappings */ diff --git a/mm/shmem.c b/mm/shmem.c index dfd995da77b4..2e122c72b375 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -119,10 +119,12 @@ struct shmem_options { bool full_inums; int huge; int seen; + bool noswap; #define SHMEM_SEEN_BLOCKS 1 #define SHMEM_SEEN_INODES 2 #define SHMEM_SEEN_HUGE 4 #define SHMEM_SEEN_INUMS 8 +#define SHMEM_SEEN_NOSWAP 16 }; =20 #ifdef CONFIG_TMPFS @@ -1337,6 +1339,7 @@ static int shmem_writepage(struct page *page, struct = writeback_control *wbc) struct address_space *mapping =3D folio->mapping; struct inode *inode =3D mapping->host; struct shmem_inode_info *info =3D SHMEM_I(inode); + struct shmem_sb_info *sbinfo =3D SHMEM_SB(inode->i_sb); swp_entry_t swap; pgoff_t index; =20 @@ -1350,7 +1353,7 @@ static int shmem_writepage(struct page *page, struct = writeback_control *wbc) if (WARN_ON_ONCE(!wbc->for_reclaim)) goto redirty; =20 - if (WARN_ON_ONCE(info->flags & VM_LOCKED)) + if (WARN_ON_ONCE((info->flags & VM_LOCKED) || sbinfo->noswap)) goto redirty; =20 if (!total_swap_pages) @@ -2487,6 +2490,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap= *idmap, struct super_block shmem_set_inode_flags(inode, info->fsflags); INIT_LIST_HEAD(&info->shrinklist); INIT_LIST_HEAD(&info->swaplist); + if (sbinfo->noswap) + mapping_set_unevictable(inode->i_mapping); simple_xattrs_init(&info->xattrs); cache_no_acl(inode); 
mapping_set_large_folios(inode->i_mapping); @@ -3574,6 +3579,7 @@ enum shmem_param { Opt_uid, Opt_inode32, Opt_inode64, + Opt_noswap, }; =20 static const struct constant_table shmem_param_enums_huge[] =3D { @@ -3595,6 +3601,7 @@ const struct fs_parameter_spec shmem_fs_parameters[] = =3D { fsparam_u32 ("uid", Opt_uid), fsparam_flag ("inode32", Opt_inode32), fsparam_flag ("inode64", Opt_inode64), + fsparam_flag ("noswap", Opt_noswap), {} }; =20 @@ -3678,6 +3685,10 @@ static int shmem_parse_one(struct fs_context *fc, st= ruct fs_parameter *param) ctx->full_inums =3D true; ctx->seen |=3D SHMEM_SEEN_INUMS; break; + case Opt_noswap: + ctx->noswap =3D true; + ctx->seen |=3D SHMEM_SEEN_NOSWAP; + break; } return 0; =20 @@ -3776,6 +3787,14 @@ static int shmem_reconfigure(struct fs_context *fc) err =3D "Current inum too high to switch to 32-bit inums"; goto out; } + if ((ctx->seen & SHMEM_SEEN_NOSWAP) && ctx->noswap && !sbinfo->noswap) { + err =3D "Cannot disable swap on remount"; + goto out; + } + if (!(ctx->seen & SHMEM_SEEN_NOSWAP) && !ctx->noswap && sbinfo->noswap) { + err =3D "Cannot enable swap on remount if it was disabled on first mount= "; + goto out; + } =20 if (ctx->seen & SHMEM_SEEN_HUGE) sbinfo->huge =3D ctx->huge; @@ -3796,6 +3815,10 @@ static int shmem_reconfigure(struct fs_context *fc) sbinfo->mpol =3D ctx->mpol; /* transfers initial ref */ ctx->mpol =3D NULL; } + + if (ctx->noswap) + sbinfo->noswap =3D true; + raw_spin_unlock(&sbinfo->stat_lock); mpol_put(mpol); return 0; @@ -3850,6 +3873,8 @@ static int shmem_show_options(struct seq_file *seq, s= truct dentry *root) seq_printf(seq, ",huge=3D%s", shmem_format_huge(sbinfo->huge)); #endif shmem_show_mpol(seq, sbinfo->mpol); + if (sbinfo->noswap) + seq_printf(seq, ",noswap"); return 0; } =20 @@ -3893,6 +3918,7 @@ static int shmem_fill_super(struct super_block *sb, s= truct fs_context *fc) ctx->inodes =3D shmem_default_max_inodes(); if (!(ctx->seen & SHMEM_SEEN_INUMS)) ctx->full_inums =3D 
IS_ENABLED(CONFIG_TMPFS_INODE64); + sbinfo->noswap =3D ctx->noswap; } else { sb->s_flags |=3D SB_NOUSER; } --=20 2.39.1
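For readers following the remount rules in patch 6/6, here is a minimal userspace sketch of the noswap compatibility check that the diff adds to shmem_reconfigure(). The names `opts`, `sb_state`, `SEEN_NOSWAP` and `check_noswap_remount()` are hypothetical stand-ins for illustration, not kernel API; only the two conditions and the error strings are taken from the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SEEN_NOSWAP 16 /* mirrors SHMEM_SEEN_NOSWAP from the patch */

/* Hypothetical stand-in for the parsed options (struct shmem_options). */
struct opts {
	int seen;    /* bitmask of options given on this (re)mount */
	bool noswap; /* true if "noswap" was passed */
};

/* Hypothetical stand-in for the superblock state (struct shmem_sb_info). */
struct sb_state {
	bool noswap; /* setting chosen at first mount */
};

/*
 * Returns NULL when the remount request is compatible with the original
 * mount, otherwise the error string the patch would report.
 */
static const char *check_noswap_remount(const struct opts *ctx,
					const struct sb_state *sb)
{
	/* noswap requested now, but the first mount had swap enabled */
	if ((ctx->seen & SEEN_NOSWAP) && ctx->noswap && !sb->noswap)
		return "Cannot disable swap on remount";
	/* noswap omitted now, but the first mount had swap disabled */
	if (!(ctx->seen & SEEN_NOSWAP) && !ctx->noswap && sb->noswap)
		return "Cannot enable swap on remount if it was disabled on first mount";
	return NULL;
}
```

Under these assumptions, a remount succeeds only when it repeats the original choice: passing noswap again on a noswap mount, or omitting it on a mount that has swap enabled.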