From nobody Fri May  9 02:35:11 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 9176E38DE1;
	Wed,  2 Apr 2025 14:08:21 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 0EC29C4CEDD;
	Wed,  2 Apr 2025 14:08:16 +0000 (UTC)
From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org,
	jack@suse.cz
Cc: Christian Brauner <brauner@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-efi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	mcgrof@kernel.org,
	hch@infradead.org,
	david@fromorbit.com,
	rafael@kernel.org,
	djwong@kernel.org,
	pavel@kernel.org,
	peterz@infradead.org,
	mingo@redhat.com,
	will@kernel.org,
	boqun.feng@gmail.com
Subject: [PATCH v2 1/4] fs: add owner of freeze/thaw
Date: Wed,  2 Apr 2025 16:07:31 +0200
Message-ID: <20250402-work-freeze-v2-1-6719a97b52ac@kernel.org>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
References: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
X-Mailer: b4 0.15-dev-42535
X-Developer-Key: i=brauner@kernel.org; a=openpgp;
 fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624
Content-Transfer-Encoding: quoted-printable

For some kernel subsystems it is paramount that they are guaranteed to be
the owner of a freeze in order to avoid any risk of deadlocks. This is the
case for the power subsystem. Enable it to recognize whether it actually
froze the filesystem.

If userspace has 10 filesystems and suspend/hibernate manages to freeze 5
and then fails on the 6th for whatever odd reason (current or future),
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again: while it's unlikely that a new filesystem got
added in the meantime, it still cannot tell on which filesystems the power
subsystem actually managed to take a freeze reference that needs to be
dropped during thaw.

There are various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing. Alternatively, make sure that the power subsystem only freezes
filesystems it can freeze exclusively and mark such filesystems as owned
by power for the duration of the suspend or resume cycle. I opted for the
latter as that seemed the cleaner approach, even if it means more code
changes.
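The ownership rules can be illustrated with a small userspace model. This is a hedged sketch, not kernel code: struct model_sb and the model_* helpers are hypothetical stand-ins for struct sb_writers, may_freeze(), may_unfreeze(), freeze_super() and thaw_super(). The userspace branch of model_may_freeze() is assumed to mirror the kernel branch visible in the diff, and the real functions additionally take s_umount, sync the filesystem and call ->freeze_fs().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Flag values copied from enum freeze_holder in this patch. */
enum freeze_holder {
	FREEZE_HOLDER_KERNEL	= (1U << 0),
	FREEZE_HOLDER_USERSPACE	= (1U << 1),
	FREEZE_MAY_NEST		= (1U << 2),
	FREEZE_EXCL		= (1U << 3),
};

/* Hypothetical stand-in for the freeze fields of struct sb_writers. */
struct model_sb {
	int freeze_kcount;		/* kernel freeze references */
	int freeze_ucount;		/* userspace freeze references */
	const void *freeze_owner;	/* non-NULL while exclusively frozen */
};

/* Mirrors may_freeze(); the userspace branch is an assumption. */
static bool model_may_freeze(const struct model_sb *sb, unsigned int who)
{
	if (who & FREEZE_EXCL) {
		/* Exclusive freezes are kernel-only and need an unfrozen sb. */
		if (!(who & FREEZE_HOLDER_KERNEL) ||
		    (who & ~(FREEZE_EXCL | FREEZE_HOLDER_KERNEL)))
			return false;
		return sb->freeze_kcount + sb->freeze_ucount == 0;
	}
	/* Nobody may stack a freeze on top of an exclusive one. */
	if (sb->freeze_owner)
		return false;
	if (who & FREEZE_HOLDER_KERNEL)
		return (who & FREEZE_MAY_NEST) || sb->freeze_kcount == 0;
	if (who & FREEZE_HOLDER_USERSPACE)
		return (who & FREEZE_MAY_NEST) || sb->freeze_ucount == 0;
	return false;
}

/* Mirrors may_unfreeze() from this patch, minus the WARN_ON_ONCE()s. */
static bool model_may_unfreeze(const struct model_sb *sb, unsigned int who,
			       const void *owner)
{
	if (who & FREEZE_EXCL) {
		if (!sb->freeze_owner || !(who & FREEZE_HOLDER_KERNEL) ||
		    (who & ~(FREEZE_EXCL | FREEZE_HOLDER_KERNEL)))
			return false;
		return sb->freeze_owner == owner;
	}
	/* A non-exclusive thaw must not undo someone's exclusive freeze. */
	return sb->freeze_owner == NULL;
}

/* Bookkeeping only: no locking, syncing or ->freeze_fs() call. */
static int model_freeze(struct model_sb *sb, unsigned int who,
			const void *owner)
{
	if (!model_may_freeze(sb, who))
		return -1;
	if (who & FREEZE_HOLDER_KERNEL)
		sb->freeze_kcount++;
	else
		sb->freeze_ucount++;
	sb->freeze_owner = owner;
	return 0;
}

static int model_thaw(struct model_sb *sb, unsigned int who,
		      const void *owner)
{
	int *count = (who & FREEZE_HOLDER_KERNEL) ? &sb->freeze_kcount
						  : &sb->freeze_ucount;

	if (*count == 0 || !model_may_unfreeze(sb, who, owner))
		return -1;
	(*count)--;
	/* The owner is cleared once the last freeze reference is dropped. */
	if (sb->freeze_kcount + sb->freeze_ucount == 0)
		sb->freeze_owner = NULL;
	return 0;
}
```

With these rules the rollback problem becomes simple: power can walk every superblock and issue an exclusive thaw with its own owner cookie. Superblocks frozen by userspace, or by another exclusive owner, reject that thaw instead of losing a reference they still need.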

Signed-off-by: Christian Brauner <brauner@kernel.org>
---
 fs/f2fs/gc.c                |  6 ++--
 fs/gfs2/super.c             | 20 ++++++------
 fs/gfs2/sys.c               |  4 +--
 fs/ioctl.c                  |  8 ++---
 fs/super.c                  | 76 ++++++++++++++++++++++++++++++++++++---------
 fs/xfs/scrub/fscounters.c   |  4 +--
 fs/xfs/xfs_notify_failure.c |  6 ++--
 include/linux/fs.h          | 13 +++++---
 8 files changed, 95 insertions(+), 42 deletions(-)

diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
index 2b8f9239bede..3e8af62c9e15 100644
--- a/fs/f2fs/gc.c
+++ b/fs/f2fs/gc.c
@@ -2271,12 +2271,12 @@ int f2fs_resize_fs(struct file *filp, __u64 block_count)
 	if (err)
 		return err;
 
-	err = freeze_super(sbi->sb, FREEZE_HOLDER_USERSPACE);
+	err = freeze_super(sbi->sb, FREEZE_HOLDER_USERSPACE, NULL);
 	if (err)
 		return err;
 
 	if (f2fs_readonly(sbi->sb)) {
-		err = thaw_super(sbi->sb, FREEZE_HOLDER_USERSPACE);
+		err = thaw_super(sbi->sb, FREEZE_HOLDER_USERSPACE, NULL);
 		if (err)
 			return err;
 		return -EROFS;
@@ -2333,6 +2333,6 @@ int f2fs_resize_fs(struct file *filp, __u64 block_count)
 out_err:
 	f2fs_up_write(&sbi->cp_global_sem);
 	f2fs_up_write(&sbi->gc_lock);
-	thaw_super(sbi->sb, FREEZE_HOLDER_USERSPACE);
+	thaw_super(sbi->sb, FREEZE_HOLDER_USERSPACE, NULL);
 	return err;
 }
diff --git a/fs/gfs2/super.c b/fs/gfs2/super.c
index 44e5658b896c..519943189109 100644
--- a/fs/gfs2/super.c
+++ b/fs/gfs2/super.c
@@ -674,7 +674,7 @@ static int gfs2_sync_fs(struct super_block *sb, int wait)
 	return sdp->sd_log_error;
 }
 
-static int gfs2_do_thaw(struct gfs2_sbd *sdp)
+static int gfs2_do_thaw(struct gfs2_sbd *sdp, enum freeze_holder who, const void *freeze_owner)
 {
 	struct super_block *sb = sdp->sd_vfs;
 	int error;
@@ -682,7 +682,7 @@ static int gfs2_do_thaw(struct gfs2_sbd *sdp)
 	error = gfs2_freeze_lock_shared(sdp);
 	if (error)
 		goto fail;
-	error = thaw_super(sb, FREEZE_HOLDER_USERSPACE);
+	error = thaw_super(sb, who, freeze_owner);
 	if (!error)
 		return 0;
 
@@ -703,14 +703,14 @@ void gfs2_freeze_func(struct work_struct *work)
 	if (test_bit(SDF_FROZEN, &sdp->sd_flags))
 		goto freeze_failed;
 
-	error = freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+	error = freeze_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
 	if (error)
 		goto freeze_failed;
 
 	gfs2_freeze_unlock(sdp);
 	set_bit(SDF_FROZEN, &sdp->sd_flags);
 
-	error = gfs2_do_thaw(sdp);
+	error = gfs2_do_thaw(sdp, FREEZE_HOLDER_USERSPACE, NULL);
 	if (error)
 		goto out;
 
@@ -731,7 +731,8 @@ void gfs2_freeze_func(struct work_struct *work)
  *
  */
 
-static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who)
+static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who,
+			     const void *freeze_owner)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	int error;
@@ -744,7 +745,7 @@ static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who)
 	}
 
 	for (;;) {
-		error = freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+		error = freeze_super(sb, who, freeze_owner);
 		if (error) {
 			fs_info(sdp, "GFS2: couldn't freeze filesystem: %d\n",
 				error);
@@ -758,7 +759,7 @@ static int gfs2_freeze_super(struct super_block *sb, enum freeze_holder who)
 			break;
 		}
 
-		error = gfs2_do_thaw(sdp);
+		error = gfs2_do_thaw(sdp, who, freeze_owner);
 		if (error)
 			goto out;
 
@@ -799,7 +800,8 @@ static int gfs2_freeze_fs(struct super_block *sb)
  *
  */
 
-static int gfs2_thaw_super(struct super_block *sb, enum freeze_holder who)
+static int gfs2_thaw_super(struct super_block *sb, enum freeze_holder who,
+			   const void *freeze_owner)
 {
 	struct gfs2_sbd *sdp = sb->s_fs_info;
 	int error;
@@ -814,7 +816,7 @@ static int gfs2_thaw_super(struct super_block *sb, enum freeze_holder who)
 	atomic_inc(&sb->s_active);
 	gfs2_freeze_unlock(sdp);
 
-	error = gfs2_do_thaw(sdp);
+	error = gfs2_do_thaw(sdp, who, freeze_owner);
 
 	if (!error) {
 		clear_bit(SDF_FREEZE_INITIATOR, &sdp->sd_flags);
diff --git a/fs/gfs2/sys.c b/fs/gfs2/sys.c
index ecc699f8d9fc..748125653d6c 100644
--- a/fs/gfs2/sys.c
+++ b/fs/gfs2/sys.c
@@ -174,10 +174,10 @@ static ssize_t freeze_store(struct gfs2_sbd *sdp, const char *buf, size_t len)
 
 	switch (n) {
 	case 0:
-		error = thaw_super(sdp->sd_vfs, FREEZE_HOLDER_USERSPACE);
+		error = thaw_super(sdp->sd_vfs, FREEZE_HOLDER_USERSPACE, NULL);
 		break;
 	case 1:
-		error = freeze_super(sdp->sd_vfs, FREEZE_HOLDER_USERSPACE);
+		error = freeze_super(sdp->sd_vfs, FREEZE_HOLDER_USERSPACE, NULL);
 		break;
 	default:
 		return -EINVAL;
diff --git a/fs/ioctl.c b/fs/ioctl.c
index c91fd2b46a77..bedc83fc2f20 100644
--- a/fs/ioctl.c
+++ b/fs/ioctl.c
@@ -396,8 +396,8 @@ static int ioctl_fsfreeze(struct file *filp)
 
 	/* Freeze */
 	if (sb->s_op->freeze_super)
-		return sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE);
-	return freeze_super(sb, FREEZE_HOLDER_USERSPACE);
+		return sb->s_op->freeze_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
+	return freeze_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
 }
 
 static int ioctl_fsthaw(struct file *filp)
@@ -409,8 +409,8 @@ static int ioctl_fsthaw(struct file *filp)
 
 	/* Thaw */
 	if (sb->s_op->thaw_super)
-		return sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE);
-	return thaw_super(sb, FREEZE_HOLDER_USERSPACE);
+		return sb->s_op->thaw_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
+	return thaw_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
 }
 
 static int ioctl_file_dedupe_range(struct file *file,
diff --git a/fs/super.c b/fs/super.c
index 3c4a496d6438..3ddded4360c6 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -39,7 +39,8 @@
 #include <uapi/linux/mount.h>
 #include "internal.h"
 
-static int thaw_super_locked(struct super_block *sb, enum freeze_holder who);
+static int thaw_super_locked(struct super_block *sb, enum freeze_holder who,
+			     const void *freeze_owner);
 
 static LIST_HEAD(super_blocks);
 static DEFINE_SPINLOCK(sb_lock);
@@ -1148,7 +1149,7 @@ static void do_thaw_all_callback(struct super_block *sb, void *unused)
 	if (IS_ENABLED(CONFIG_BLOCK))
 		while (sb->s_bdev && !bdev_thaw(sb->s_bdev))
 			pr_warn("Emergency Thaw on %pg\n", sb->s_bdev);
-	thaw_super_locked(sb, FREEZE_HOLDER_USERSPACE);
+	thaw_super_locked(sb, FREEZE_HOLDER_USERSPACE, NULL);
 	return;
 }
 
@@ -1195,9 +1196,9 @@ static void filesystems_freeze_callback(struct super_block *sb, void *unused)
 		return;
=20
 	if (sb->s_op->freeze_super)
-		sb->s_op->freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL);
+		sb->s_op->freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
 	else
-		freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL);
+		freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
 
 	deactivate_super(sb);
 }
@@ -1217,9 +1218,9 @@ static void filesystems_thaw_callback(struct super_block *sb, void *unused)
 		return;
=20
 	if (sb->s_op->thaw_super)
-		sb->s_op->thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL);
+		sb->s_op->thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
 	else
-		thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL);
+		thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
 
 	deactivate_super(sb);
 }
@@ -1522,10 +1523,10 @@ static int fs_bdev_freeze(struct block_device *bdev)
 
 	if (sb->s_op->freeze_super)
 		error = sb->s_op->freeze_super(sb,
-				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
 	else
 		error = freeze_super(sb,
-				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
 	if (!error)
 		error =3D sync_blockdev(bdev);
 	deactivate_super(sb);
@@ -1571,10 +1572,10 @@ static int fs_bdev_thaw(struct block_device *bdev)
 
 	if (sb->s_op->thaw_super)
 		error = sb->s_op->thaw_super(sb,
-				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
 	else
 		error = thaw_super(sb,
-				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE);
+				FREEZE_MAY_NEST | FREEZE_HOLDER_USERSPACE, NULL);
 	deactivate_super(sb);
 	return error;
 }
@@ -1946,7 +1947,7 @@ static int wait_for_partially_frozen(struct super_block *sb)
 }
 
 #define FREEZE_HOLDERS (FREEZE_HOLDER_KERNEL | FREEZE_HOLDER_USERSPACE)
-#define FREEZE_FLAGS (FREEZE_HOLDERS | FREEZE_MAY_NEST)
+#define FREEZE_FLAGS (FREEZE_HOLDERS | FREEZE_MAY_NEST | FREEZE_EXCL)
 
 static inline int freeze_inc(struct super_block *sb, enum freeze_holder who)
 {
@@ -1977,6 +1978,21 @@ static inline bool may_freeze(struct super_block *sb, enum freeze_holder who)
 	WARN_ON_ONCE((who & ~FREEZE_FLAGS));
 	WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
 
+	if (who & FREEZE_EXCL) {
+		if (WARN_ON_ONCE(!(who & FREEZE_HOLDER_KERNEL)))
+			return false;
+
+		if (who & ~(FREEZE_EXCL | FREEZE_HOLDER_KERNEL))
+			return false;
+
+		return (sb->s_writers.freeze_kcount +
+			sb->s_writers.freeze_ucount) == 0;
+	}
+
+	/* This filesystem is already exclusively frozen. */
+	if (sb->s_writers.freeze_owner)
+		return false;
+
 	if (who & FREEZE_HOLDER_KERNEL)
 		return (who & FREEZE_MAY_NEST) ||
 		       sb->s_writers.freeze_kcount == 0;
@@ -1986,10 +2002,30 @@ static inline bool may_freeze(struct super_block *sb, enum freeze_holder who)
 	return false;
 }
 
+static inline bool may_unfreeze(struct super_block *sb, enum freeze_holder who,
+				const void *freeze_owner)
+{
+	WARN_ON_ONCE((who & ~FREEZE_FLAGS));
+	WARN_ON_ONCE(hweight32(who & FREEZE_HOLDERS) > 1);
+
+	if (who & FREEZE_EXCL) {
+		if (WARN_ON_ONCE(sb->s_writers.freeze_owner == NULL))
+			return false;
+		if (WARN_ON_ONCE(!(who & FREEZE_HOLDER_KERNEL)))
+			return false;
+		if (who & ~(FREEZE_EXCL | FREEZE_HOLDER_KERNEL))
+			return false;
+		return sb->s_writers.freeze_owner == freeze_owner;
+	}
+
+	return sb->s_writers.freeze_owner == NULL;
+}
+
 /**
  * freeze_super - lock the filesystem and force it into a consistent state
  * @sb: the super to lock
  * @who: context that wants to freeze
+ * @freeze_owner: owner of the freeze
  *
  * Syncs the super to make sure the filesystem is consistent and calls the fs's
  * freeze_fs.  Subsequent calls to this without first thawing the fs may return
@@ -2041,7 +2077,7 @@ static inline bool may_freeze(struct super_block *sb, enum freeze_holder who)
  * Return: If the freeze was successful zero is returned. If the freeze
  *         failed a negative error code is returned.
  */
-int freeze_super(struct super_block *sb, enum freeze_holder who)
+int freeze_super(struct super_block *sb, enum freeze_holder who, const void *freeze_owner)
 {
 	int ret;
 
@@ -2075,6 +2111,7 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
 	if (sb_rdonly(sb)) {
 		/* Nothing to do really... */
 		WARN_ON_ONCE(freeze_inc(sb, who) > 1);
+		sb->s_writers.freeze_owner = freeze_owner;
 		sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 		wake_up_var(&sb->s_writers.frozen);
 		super_unlock_excl(sb);
@@ -2122,6 +2159,7 @@ int freeze_super(struct super_block *sb, enum freeze_holder who)
 	 * when frozen is set to SB_FREEZE_COMPLETE, and for thaw_super().
 	 */
 	WARN_ON_ONCE(freeze_inc(sb, who) > 1);
+	sb->s_writers.freeze_owner = freeze_owner;
 	sb->s_writers.frozen = SB_FREEZE_COMPLETE;
 	wake_up_var(&sb->s_writers.frozen);
 	lockdep_sb_freeze_release(sb);
@@ -2136,13 +2174,17 @@ EXPORT_SYMBOL(freeze_super);
  * removes that state without releasing the other state or unlocking the
  * filesystem.
  */
-static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
+static int thaw_super_locked(struct super_block *sb, enum freeze_holder who,
+			     const void *freeze_owner)
 {
 	int error = -EINVAL;
 
 	if (sb->s_writers.frozen != SB_FREEZE_COMPLETE)
 		goto out_unlock;
 
+	if (!may_unfreeze(sb, who, freeze_owner))
+		goto out_unlock;
+
 	/*
 	 * All freezers share a single active reference.
 	 * So just unlock in case there are any left.
@@ -2152,6 +2194,7 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
 
 	if (sb_rdonly(sb)) {
 		sb->s_writers.frozen = SB_UNFROZEN;
+		sb->s_writers.freeze_owner = NULL;
 		wake_up_var(&sb->s_writers.frozen);
 		goto out_deactivate;
 	}
@@ -2169,6 +2212,7 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
 	}
 
 	sb->s_writers.frozen = SB_UNFROZEN;
+	sb->s_writers.freeze_owner = NULL;
 	wake_up_var(&sb->s_writers.frozen);
 	sb_freeze_unlock(sb, SB_FREEZE_FS);
 out_deactivate:
@@ -2184,6 +2228,7 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
  * thaw_super -- unlock filesystem
  * @sb: the super to thaw
  * @who: context that wants to freeze
+ * @freeze_owner: owner of the freeze
  *
  * Unlocks the filesystem and marks it writeable again after freeze_super()
  * if there are no remaining freezes on the filesystem.
@@ -2197,13 +2242,14 @@ static int thaw_super_locked(struct super_block *sb, enum freeze_holder who)
  * have been frozen through the block layer via multiple block devices.
  * The filesystem remains frozen until all block devices are unfrozen.
  */
-int thaw_super(struct super_block *sb, enum freeze_holder who)
+int thaw_super(struct super_block *sb, enum freeze_holder who,
+	       const void *freeze_owner)
 {
 	if (!super_lock_excl(sb)) {
 		WARN_ON_ONCE("Dying superblock while thawing!");
 		return -EINVAL;
 	}
-	return thaw_super_locked(sb, who);
+	return thaw_super_locked(sb, who, freeze_owner);
 }
 EXPORT_SYMBOL(thaw_super);
 
diff --git a/fs/xfs/scrub/fscounters.c b/fs/xfs/scrub/fscounters.c
index e629663e460a..9b598c5790ad 100644
--- a/fs/xfs/scrub/fscounters.c
+++ b/fs/xfs/scrub/fscounters.c
@@ -123,7 +123,7 @@ xchk_fsfreeze(
 {
 	int			error;
 
-	error = freeze_super(sc->mp->m_super, FREEZE_HOLDER_KERNEL);
+	error = freeze_super(sc->mp->m_super, FREEZE_HOLDER_KERNEL, NULL);
 	trace_xchk_fsfreeze(sc, error);
 	return error;
 }
@@ -135,7 +135,7 @@ xchk_fsthaw(
 	int			error;
 
 	/* This should always succeed, we have a kernel freeze */
-	error = thaw_super(sc->mp->m_super, FREEZE_HOLDER_KERNEL);
+	error = thaw_super(sc->mp->m_super, FREEZE_HOLDER_KERNEL, NULL);
 	trace_xchk_fsthaw(sc, error);
 	return error;
 }
diff --git a/fs/xfs/xfs_notify_failure.c b/fs/xfs/xfs_notify_failure.c
index ed8d8ed42f0a..3545dc1d953c 100644
--- a/fs/xfs/xfs_notify_failure.c
+++ b/fs/xfs/xfs_notify_failure.c
@@ -127,7 +127,7 @@ xfs_dax_notify_failure_freeze(
 	struct super_block	*sb = mp->m_super;
 	int			error;
 
-	error = freeze_super(sb, FREEZE_HOLDER_KERNEL);
+	error = freeze_super(sb, FREEZE_HOLDER_KERNEL, NULL);
 	if (error)
 		xfs_emerg(mp, "already frozen by kernel, err=%d", error);
 
@@ -143,7 +143,7 @@ xfs_dax_notify_failure_thaw(
 	int			error;
 
 	if (kernel_frozen) {
-		error = thaw_super(sb, FREEZE_HOLDER_KERNEL);
+		error = thaw_super(sb, FREEZE_HOLDER_KERNEL, NULL);
 		if (error)
 			xfs_emerg(mp, "still frozen after notify failure, err=%d",
 				error);
@@ -153,7 +153,7 @@ xfs_dax_notify_failure_thaw(
 	 * Also thaw userspace call anyway because the device is about to be
 	 * removed immediately.
 	 */
-	thaw_super(sb, FREEZE_HOLDER_USERSPACE);
+	thaw_super(sb, FREEZE_HOLDER_USERSPACE, NULL);
 }
 
 static int
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 1aa578412f1b..b379a46b5576 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1307,6 +1307,7 @@ struct sb_writers {
 	unsigned short			frozen;		/* Is sb frozen? */
 	int				freeze_kcount;	/* How many kernel freeze requests? */
 	int				freeze_ucount;	/* How many userspace freeze requests? */
+	const void			*freeze_owner;	/* Owner of the freeze */
 	struct percpu_rw_semaphore	rw_sem[SB_FREEZE_LEVELS];
 };
 
@@ -2270,6 +2271,7 @@ extern loff_t vfs_dedupe_file_range_one(struct file *src_file, loff_t src_pos,
  * @FREEZE_HOLDER_KERNEL: kernel wants to freeze or thaw filesystem
  * @FREEZE_HOLDER_USERSPACE: userspace wants to freeze or thaw filesystem
  * @FREEZE_MAY_NEST: whether nesting freeze and thaw requests is allowed
+ * @FREEZE_EXCL: whether actual freezing must be done by the caller
  *
  * Indicate who the owner of the freeze or thaw request is and whether
  * the freeze needs to be exclusive or can nest.
@@ -2283,6 +2285,7 @@ enum freeze_holder {
 	FREEZE_HOLDER_KERNEL	= (1U << 0),
 	FREEZE_HOLDER_USERSPACE	= (1U << 1),
 	FREEZE_MAY_NEST		= (1U << 2),
+	FREEZE_EXCL		= (1U << 3),
 };
 
 struct super_operations {
@@ -2296,9 +2299,9 @@ struct super_operations {
 	void (*evict_inode) (struct inode *);
 	void (*put_super) (struct super_block *);
 	int (*sync_fs)(struct super_block *sb, int wait);
-	int (*freeze_super) (struct super_block *, enum freeze_holder who);
+	int (*freeze_super) (struct super_block *, enum freeze_holder who, const void *owner);
 	int (*freeze_fs) (struct super_block *);
-	int (*thaw_super) (struct super_block *, enum freeze_holder who);
+	int (*thaw_super) (struct super_block *, enum freeze_holder who, const void *owner);
 	int (*unfreeze_fs) (struct super_block *);
 	int (*statfs) (struct dentry *, struct kstatfs *);
 	int (*remount_fs) (struct super_block *, int *, char *);
@@ -2706,8 +2709,10 @@ extern int unregister_filesystem(struct file_system_type *);
 extern int vfs_statfs(const struct path *, struct kstatfs *);
 extern int user_statfs(const char __user *, struct kstatfs *);
 extern int fd_statfs(int, struct kstatfs *);
-int freeze_super(struct super_block *super, enum freeze_holder who);
-int thaw_super(struct super_block *super, enum freeze_holder who);
+int freeze_super(struct super_block *super, enum freeze_holder who,
+		 const void *freeze_owner);
+int thaw_super(struct super_block *super, enum freeze_holder who,
+	       const void *freeze_owner);
 extern __printf(2, 3)
 int super_setup_bdi_name(struct super_block *sb, char *fmt, ...);
 extern int super_setup_bdi(struct super_block *sb);

-- 
2.47.2
From nobody Fri May  9 02:35:11 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 169DB23BD18;
	Wed,  2 Apr 2025 14:08:25 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 9973DC4CEEC;
	Wed,  2 Apr 2025 14:08:21 +0000 (UTC)
From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org,
	jack@suse.cz
Cc: Christian Brauner <brauner@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-efi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	mcgrof@kernel.org,
	hch@infradead.org,
	david@fromorbit.com,
	rafael@kernel.org,
	djwong@kernel.org,
	pavel@kernel.org,
	peterz@infradead.org,
	mingo@redhat.com,
	will@kernel.org,
	boqun.feng@gmail.com
Subject: [PATCH v2 2/4] fs: allow all writers to be frozen
Date: Wed,  2 Apr 2025 16:07:32 +0200
Message-ID: <20250402-work-freeze-v2-2-6719a97b52ac@kernel.org>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
References: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
X-Mailer: b4 0.15-dev-42535
X-Developer-Key: i=brauner@kernel.org; a=openpgp;
 fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624
Content-Transfer-Encoding: quoted-printable

During suspend/hibernate we need to be able to freeze all writers.
Otherwise tasks such as systemd-journald that mmap a file and write to it
will not be frozen after we've already frozen the filesystem.

This carries some risk of being unable to freeze processes in case a
process has acquired SB_FREEZE_PAGEFAULT under mmap_sem or
SB_FREEZE_INTERNAL under some other filesystem-specific lock. If the
filesystem is frozen, a task can block on the frozen filesystem with,
e.g., mmap_sem held. If some other task then blocks on grabbing that
mmap_sem, hibernation will fail because it is unable to hibernate a task
holding mmap_sem. This could be fixed by making a range of
filesystem-related locks use freezable sleeping. That's impractical and
not warranted just for suspend/hibernate. Assume that this is an
infrequent problem; userspace is given a way to skip filesystem freezing
through a sysfs file.

Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 include/linux/fs.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index b379a46b5576..1edcba3cd68e 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1781,8 +1781,7 @@ static inline void __sb_end_write(struct super_block *sb, int level)
 
 static inline void __sb_start_write(struct super_block *sb, int level)
 {
-	percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1,
-				   level == SB_FREEZE_WRITE);
+	percpu_down_read_freezable(sb->s_writers.rw_sem + level - 1, true);
 }
 
 static inline bool __sb_start_write_trylock(struct super_block *sb, int level)

-- 
2.47.2
From nobody Fri May  9 02:35:11 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id E504923AE8D;
	Wed,  2 Apr 2025 14:08:30 +0000 (UTC)
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 1CAA8C4CEDD;
	Wed,  2 Apr 2025 14:08:25 +0000 (UTC)
From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org,
	jack@suse.cz
Cc: Christian Brauner <brauner@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-efi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	mcgrof@kernel.org,
	hch@infradead.org,
	david@fromorbit.com,
	rafael@kernel.org,
	djwong@kernel.org,
	pavel@kernel.org,
	peterz@infradead.org,
	mingo@redhat.com,
	will@kernel.org,
	boqun.feng@gmail.com
Subject: [PATCH v2 3/4] power: freeze filesystems during suspend/resume
Date: Wed,  2 Apr 2025 16:07:33 +0200
Message-ID: <20250402-work-freeze-v2-3-6719a97b52ac@kernel.org>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
References: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
X-Mailer: b4 0.15-dev-42535
X-Developer-Key: i=brauner@kernel.org; a=openpgp;
 fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624
Content-Transfer-Encoding: quoted-printable

Now that all the pieces are in place, allow the power subsystem to
freeze/thaw filesystems during suspend/resume. Filesystems are only
frozen and thawed if the power subsystem actually owns the freeze.

We could bubble up errors and fail suspend/resume if the error isn't
EBUSY (i.e., the filesystem is already frozen), but I don't think that
is worth it. Filesystem freezing during suspend/resume is best-effort:
if the user has 500 ext4 filesystems mounted and 4 fail to freeze for
whatever reason, we simply skip them.
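To make the best-effort policy concrete, here is a minimal userspace C
model of it. It is a sketch under stated assumptions, not kernel code:
struct fake_sb, try_freeze() and freeze_all_best_effort() are
hypothetical names, and the error codes only mimic a refused freeze.
Every superblock is attempted; a failure is counted and skipped instead
of aborting the transition.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a superblock; not the kernel's struct super_block. */
struct fake_sb {
	const char *name;
	bool frozen;
	int freeze_error;	/* error a freeze attempt reports, 0 for success */
};

/* Hypothetical freeze attempt: -EBUSY if already frozen, else the canned error. */
static int try_freeze(struct fake_sb *sb)
{
	if (sb->frozen)
		return -EBUSY;
	if (sb->freeze_error)
		return sb->freeze_error;
	sb->frozen = true;
	return 0;
}

/*
 * Best-effort freeze: every superblock is attempted and failures are
 * only counted. Nothing is propagated, so suspend/hibernate proceeds.
 */
static size_t freeze_all_best_effort(struct fake_sb *sbs, size_t n)
{
	size_t skipped = 0;

	assert(sbs != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (try_freeze(&sbs[i]) != 0)
			skipped++;	/* skip this one and carry on */
	}
	return skipped;
}
```

With four superblocks where one reports -EIO and one is already frozen,
freeze_all_best_effort() returns 2 and still freezes the other two.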

What we have now is already a big improvement, so let's see how we fare
with it before making our lives even harder (and uglier) than we have
to.

We add a new sysfs knob, /sys/power/freeze_filesystems, that allows
userspace to enable freezing of filesystems during suspend/hibernate. It
accepts 0 or 1 and defaults to off for now. The thaw logic doesn't need
to check whether freezing is enabled because the power subsystem
exclusively owns the filesystems it froze for the duration of
suspend/hibernate and simply skips filesystems it didn't freeze.
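For checking which inputs the knob accepts, the validation done by
freeze_filesystems_store() in the diff below can be approximated in
userspace. This is a hypothetical sketch: parse_freeze_knob() is an
invented helper, and strtoul() stands in for kstrtoul(), so the extra
checks for leading characters and the trailing newline are there to
approximate kstrtoul()'s stricter parsing.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Hypothetical userspace approximation of the checks in
 * freeze_filesystems_store(); strtoul() stands in for kstrtoul().
 * Accepts "0" or "1", optionally with a leading '+' and a single
 * trailing newline (as written by "echo 1 > ..."), and rejects
 * everything else with -EINVAL.
 */
static int parse_freeze_knob(const char *buf, bool *enabled)
{
	const char *p = buf;
	char *end;
	unsigned long val;

	assert(buf != NULL && enabled != NULL);

	/* kstrtoul() rejects leading whitespace and '-'; strtoul() would not. */
	if (*p == '+')
		p++;
	if (*p < '0' || *p > '9')
		return -EINVAL;

	errno = 0;
	val = strtoul(buf, &end, 10);
	if (errno == ERANGE)
		return -EINVAL;

	/* Allow exactly one trailing newline and nothing after it. */
	if (*end == '\n')
		end++;
	if (*end != '\0')
		return -EINVAL;

	/* Mirrors "if (val > 1) return -EINVAL;" in the patch. */
	if (val > 1)
		return -EINVAL;

	*enabled = !!val;
	return 0;
}
```

So "echo 1 > /sys/power/freeze_filesystems" enables the feature, while
writing 2, a negative number or arbitrary text is refused.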

It is also technically possible that filesystem_freeze_enabled is true
and power freezes the filesystems, but that another process disables
filesystem_freeze_enabled before all processes are frozen. If power were
to place the filesystems_thaw() call under filesystem_freeze_enabled, it
would fail to thaw the filesystems it froze. The exclusive holder
mechanism makes it possible to iterate through the list unconditionally
on thaw, making sure that no filesystems are left frozen.
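The ownership argument can be illustrated with a small userspace model.
This is a hypothetical sketch: struct model_sb, model_freeze_excl(),
model_thaw_excl() and model_suspend_cycle() are invented names, the
holder pointer only plays the role that filesystems_freeze_ptr plays in
the patch, and the error codes are the model's own choice rather than
what freeze_super()/thaw_super() return. Freezing is gated on the knob,
thawing is not, and a filesystem frozen by a different holder is left
alone.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a superblock's freeze state; not kernel code. */
struct model_sb {
	const void *freeze_holder;	/* NULL while not frozen */
};

/* Hypothetical holder token, playing the role of filesystems_freeze_ptr. */
static const char power_holder[] = "filesystems_freeze";

/* Exclusive freeze: refused if anybody already holds the freeze. */
static int model_freeze_excl(struct model_sb *sb, const void *holder)
{
	assert(holder != NULL);
	if (sb->freeze_holder)
		return -EBUSY;
	sb->freeze_holder = holder;
	return 0;
}

/* Exclusive thaw: only the holder that froze the filesystem may thaw it. */
static int model_thaw_excl(struct model_sb *sb, const void *holder)
{
	if (sb->freeze_holder != holder)
		return -EINVAL;	/* not ours: leave it alone */
	sb->freeze_holder = NULL;
	return 0;
}

/*
 * Freezing is gated on the knob, thawing is not. If another task turns
 * the knob off mid-transition, the thaw loop still runs.
 */
static void model_suspend_cycle(struct model_sb *sbs, size_t n, bool *knob,
				bool knob_disabled_midway)
{
	if (*knob) {
		for (size_t i = 0; i < n; i++)
			model_freeze_excl(&sbs[i], power_holder);
	}

	if (knob_disabled_midway)
		*knob = false;

	/* ... suspend and resume would happen here ... */

	for (size_t i = 0; i < n; i++)
		model_thaw_excl(&sbs[i], power_holder);
}
```

If the knob is switched off between freeze and thaw, the unconditional
thaw still releases every filesystem the power holder froze, while a
filesystem frozen by another holder stays frozen.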

Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/super.c               | 14 ++++++++++----
 kernel/power/hibernate.c | 16 +++++++++++++++-
 kernel/power/main.c      | 31 +++++++++++++++++++++++++++++++
 kernel/power/power.h     |  4 ++++
 kernel/power/suspend.c   |  7 +++++++
 5 files changed, 67 insertions(+), 5 deletions(-)

diff --git a/fs/super.c b/fs/super.c
index 3ddded4360c6..b4bdbc509dba 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -1187,6 +1187,8 @@ static inline bool get_active_super(struct super_block *sb)
 	return active;
 }
 
+static const char *filesystems_freeze_ptr = "filesystems_freeze";
+
 static void filesystems_freeze_callback(struct super_block *sb, void *unused)
 {
 	if (!sb->s_op->freeze_fs && !sb->s_op->freeze_super)
@@ -1196,9 +1198,11 @@ static void filesystems_freeze_callback(struct super_block *sb, void *unused)
 		return;
 
 	if (sb->s_op->freeze_super)
-		sb->s_op->freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
+		sb->s_op->freeze_super(sb, FREEZE_EXCL | FREEZE_HOLDER_KERNEL,
+				       filesystems_freeze_ptr);
 	else
-		freeze_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
+		freeze_super(sb, FREEZE_EXCL | FREEZE_HOLDER_KERNEL,
+			     filesystems_freeze_ptr);
 
 	deactivate_super(sb);
 }
@@ -1218,9 +1222,11 @@ static void filesystems_thaw_callback(struct super_block *sb, void *unused)
 		return;
 
 	if (sb->s_op->thaw_super)
-		sb->s_op->thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
+		sb->s_op->thaw_super(sb, FREEZE_EXCL | FREEZE_HOLDER_KERNEL,
+				     filesystems_freeze_ptr);
 	else
-		thaw_super(sb, FREEZE_MAY_NEST | FREEZE_HOLDER_KERNEL, NULL);
+		thaw_super(sb, FREEZE_EXCL | FREEZE_HOLDER_KERNEL,
+			   filesystems_freeze_ptr);
 
 	deactivate_super(sb);
 }
diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index 50ec26ea696b..37d733945c59 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -777,6 +777,8 @@ int hibernate(void)
 		goto Restore;
 
 	ksys_sync_helper();
+	if (filesystem_freeze_enabled)
+		filesystems_freeze();
 
 	error = freeze_processes();
 	if (error)
@@ -845,6 +847,7 @@ int hibernate(void)
 	/* Don't bother checking whether freezer_test_done is true */
 	freezer_test_done =3D false;
  Exit:
+	filesystems_thaw();
 	pm_notifier_call_chain(PM_POST_HIBERNATION);
  Restore:
 	pm_restore_console();
@@ -881,6 +884,9 @@ int hibernate_quiet_exec(int (*func)(void *data), void *data)
 	if (error)
 		goto restore;
 
+	if (filesystem_freeze_enabled)
+		filesystems_freeze();
+
 	error =3D freeze_processes();
 	if (error)
 		goto exit;
@@ -940,6 +946,7 @@ int hibernate_quiet_exec(int (*func)(void *data), void *data)
 	thaw_processes();
 
 exit:
+	filesystems_thaw();
 	pm_notifier_call_chain(PM_POST_HIBERNATION);
 
 restore:
@@ -1028,19 +1035,26 @@ static int software_resume(void)
 	if (error)
 		goto Restore;
 
+	if (filesystem_freeze_enabled)
+		filesystems_freeze();
+
 	pm_pr_dbg("Preparing processes for hibernation restore.\n");
 	error = freeze_processes();
-	if (error)
+	if (error) {
+		filesystems_thaw();
 		goto Close_Finish;
+	}
 
 	error =3D freeze_kernel_threads();
 	if (error) {
 		thaw_processes();
+		filesystems_thaw();
 		goto Close_Finish;
 	}
 
 	error =3D load_image_and_restore();
 	thaw_processes();
+	filesystems_thaw();
  Finish:
 	pm_notifier_call_chain(PM_POST_RESTORE);
  Restore:
diff --git a/kernel/power/main.c b/kernel/power/main.c
index 6254814d4817..0b0e76324c43 100644
--- a/kernel/power/main.c
+++ b/kernel/power/main.c
@@ -962,6 +962,34 @@ power_attr(pm_freeze_timeout);
 
 #endif	/* CONFIG_FREEZER*/
 
+#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
+bool filesystem_freeze_enabled = false;
+
+static ssize_t freeze_filesystems_show(struct kobject *kobj,
+				       struct kobj_attribute *attr, char *buf)
+{
+	return sysfs_emit(buf, "%d\n", filesystem_freeze_enabled);
+}
+
+static ssize_t freeze_filesystems_store(struct kobject *kobj,
+					struct kobj_attribute *attr,
+					const char *buf, size_t n)
+{
+	unsigned long val;
+
+	if (kstrtoul(buf, 10, &val))
+		return -EINVAL;
+
+	if (val > 1)
+		return -EINVAL;
+
+	filesystem_freeze_enabled = !!val;
+	return n;
+}
+
+power_attr(freeze_filesystems);
+#endif /* CONFIG_SUSPEND || CONFIG_HIBERNATION */
+
 static struct attribute * g[] = {
 	&state_attr.attr,
 #ifdef CONFIG_PM_TRACE
@@ -991,6 +1019,9 @@ static struct attribute * g[] = {
 #endif
 #ifdef CONFIG_FREEZER
 	&pm_freeze_timeout_attr.attr,
+#endif
+#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
+	&freeze_filesystems_attr.attr,
 #endif
 	NULL,
 };
diff --git a/kernel/power/power.h b/kernel/power/power.h
index c352dea2f67b..2eb81662b8fa 100644
--- a/kernel/power/power.h
+++ b/kernel/power/power.h
@@ -18,6 +18,10 @@ struct swsusp_info {
 	unsigned long		size;
 } __aligned(PAGE_SIZE);
 
+#if defined(CONFIG_SUSPEND) || defined(CONFIG_HIBERNATION)
+extern bool filesystem_freeze_enabled;
+#endif
+
 #ifdef CONFIG_HIBERNATION
 /* kernel/power/snapshot.c */
 extern void __init hibernate_reserved_size_init(void);
diff --git a/kernel/power/suspend.c b/kernel/power/suspend.c
index 8eaec4ab121d..76b141b9aac0 100644
--- a/kernel/power/suspend.c
+++ b/kernel/power/suspend.c
@@ -30,6 +30,7 @@
 #include <trace/events/power.h>
 #include <linux/compiler.h>
 #include <linux/moduleparam.h>
+#include <linux/fs.h>
 
 #include "power.h"
 
@@ -374,6 +375,8 @@ static int suspend_prepare(suspend_state_t state)
 	if (error)
 		goto Restore;
=20
+	if (filesystem_freeze_enabled)
+		filesystems_freeze();
 	trace_suspend_resume(TPS("freeze_processes"), 0, true);
 	error = suspend_freeze_processes();
 	trace_suspend_resume(TPS("freeze_processes"), 0, false);
@@ -550,6 +553,7 @@ int suspend_devices_and_enter(suspend_state_t state)
 static void suspend_finish(void)
 {
 	suspend_thaw_processes();
+	filesystems_thaw();
 	pm_notifier_call_chain(PM_POST_SUSPEND);
 	pm_restore_console();
 }
@@ -588,6 +592,8 @@ static int enter_state(suspend_state_t state)
 		ksys_sync_helper();
 		trace_suspend_resume(TPS("sync_filesystems"), 0, false);
 	}
+	if (filesystem_freeze_enabled)
+		filesystems_freeze();
 
 	pm_pr_dbg("Preparing system for sleep (%s)\n", mem_sleep_labels[state]);
 	pm_suspend_clear_flags();
@@ -609,6 +615,7 @@ static int enter_state(suspend_state_t state)
 	pm_pr_dbg("Finishing wakeup.\n");
 	suspend_finish();
  Unlock:
+	filesystems_thaw();
 	mutex_unlock(&system_transition_mutex);
 	return error;
 }

-- 
2.47.2
From nobody Fri May  9 02:35:11 2025
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org
 [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id AD5C823C8D9;
	Wed,  2 Apr 2025 14:08:34 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
 arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1743602914; cv=none;
 b=WZ4LCYbV1OaA8NSG0hsdyVpu4YgOdaMzctkUMs8QddIZ4jyLX/qTOESZS9fVlFYI2B6Fmc/cTbzctimSVVokoqRzLvkfCPdTcun9IXsICPZpGcnybOAdKRJle4iCdb0L7nOTDAgNZa/C50uTfQk3D+J8ydUn7QElvs0z+B87nME=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1743602914; c=relaxed/simple;
	bh=0aHy5ajqVlSYjtIMTF8ISiWDHRrTnKpi5swMB8X3rKk=;
	h=From:To:Cc:Subject:Date:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
 b=qgcnIE2sE9Na9h120jKV3uPLL7Ds62EgRKmtK0yl4D3a95aH4RsO2XDqd/pZvD3qV4DX6PV6f3vlzKhEVN3w83JjcV9atJq87YFfbU0VeSK3uxGJM00iDReMHticint/lVEQgJcx14yWcfXt+xsXJA35rQsIegDZWsahsaNHN5g=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
 dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b=hYUCxm6C; arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
 header.b="hYUCxm6C"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id D22BAC4CEEA;
	Wed,  2 Apr 2025 14:08:30 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1743602914;
	bh=0aHy5ajqVlSYjtIMTF8ISiWDHRrTnKpi5swMB8X3rKk=;
	h=From:To:Cc:Subject:Date:In-Reply-To:References:From;
	b=hYUCxm6Cvxxh9ALBDNsw2gTo/C9Bs+kj4vej2Bqw1fdb8VCIHrzbKzRt7ofD1DArY
	 TLUqKSJsVOa6wGAqYxIThUiekCttoN7gl9X/JXrdQpR2I/jORO95H3DD4n80tx6wjB
	 /EW2NxVVu5FXz6LvDH3IWAy/zjkNVq6oJCAwCFb0qYEtAP1klohRkBnuVaTY2TEo2A
	 TB6Bowy+dm+sW3MwFNojvgeX+Hxg3HGnOV67h2pD0AQxEn0rP12VWo9SWgrZD9s32B
	 dbyGWzz/jxXYv14KOBkhnqy+4fOI3NLDTR2WByBKVi1H3yoeHEA54mSv0z3kn+hbVQ
	 yQM4yYdNjEqeA==
From: Christian Brauner <brauner@kernel.org>
To: linux-fsdevel@vger.kernel.org,
	jack@suse.cz
Cc: Christian Brauner <brauner@kernel.org>,
	Ard Biesheuvel <ardb@kernel.org>,
	linux-efi@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	James Bottomley <James.Bottomley@hansenpartnership.com>,
	mcgrof@kernel.org,
	hch@infradead.org,
	david@fromorbit.com,
	rafael@kernel.org,
	djwong@kernel.org,
	pavel@kernel.org,
	peterz@infradead.org,
	mingo@redhat.com,
	will@kernel.org,
	boqun.feng@gmail.com
Subject: [PATCH v2 4/4] kernfs: add warning about implementing freeze/thaw
Date: Wed,  2 Apr 2025 16:07:34 +0200
Message-ID: <20250402-work-freeze-v2-4-6719a97b52ac@kernel.org>
X-Mailer: git-send-email 2.47.2
In-Reply-To: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
References: <20250402-work-freeze-v2-0-6719a97b52ac@kernel.org>
Precedence: bulk
X-Mailing-List: linux-kernel@vger.kernel.org
List-Id: <linux-kernel.vger.kernel.org>
List-Subscribe: <mailto:linux-kernel+subscribe@vger.kernel.org>
List-Unsubscribe: <mailto:linux-kernel+unsubscribe@vger.kernel.org>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
X-Mailer: b4 0.15-dev-42535
X-Developer-Signature: v=1; a=openpgp-sha256; l=1636; i=brauner@kernel.org;
 h=from:subject:message-id; bh=0aHy5ajqVlSYjtIMTF8ISiWDHRrTnKpi5swMB8X3rKk=;
 b=owGbwMvMwCU28Zj0gdSKO4sYT6slMaS/dVm1t/V8uXejQgD3OdddRV+v/V/NoBCzuO5FeXTjq
 +MOM2SZO0pZGMS4GGTFFFkc2k3C5ZbzVGw2ytSAmcPKBDKEgYtTACaS8IrhnwrflB/xbU99xN52
 HlctKfcX/Hc8zvuhzT75bjsmBT2GNQz/KxwDBF8fP3rMMmPrxPU7TXsTZlzmup3w4EqFxp/omTJ
 buQA=
X-Developer-Key: i=brauner@kernel.org; a=openpgp;
 fpr=4880B8C9BD0E5106FC070F4F7B3C391EFEA93624
Content-Transfer-Encoding: quoted-printable

Sysfs is built on top of kernfs, and sysfs provides the power management
infrastructure that supports suspend/hibernate through writes to various
files in /sys/power/. As filesystems may be automatically frozen during
suspend/hibernate, implementing freeze/thaw support for kernfs
generically would cause deadlocks: the task initiating
suspend/hibernation holds a VFS lock and would then wait for that same
lock to be released. If freeze/thaw support for kernfs is needed, talk
to the VFS maintainers.
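The self-deadlock can be reduced to a counting argument. The following
userspace model is a hypothetical illustration, assuming (as a
simplification) that the VFS lock held by the writing task can be
represented as a count of in-progress writes that a freeze must wait to
drain; struct model_kernfs_sb and both functions are invented for this
sketch. Because the task that starts suspend is itself one of those
writers, the count can never reach zero while it waits.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of a kernfs superblock: a freeze must wait until
 * no writes are in progress. This is a simplification, not the real
 * VFS freeze protection.
 */
struct model_kernfs_sb {
	int writes_in_progress;
};

/* A freeze can only complete once every in-progress write has finished. */
static bool model_freeze_can_complete(const struct model_kernfs_sb *sb)
{
	assert(sb->writes_in_progress >= 0);
	return sb->writes_in_progress == 0;
}

/*
 * Model of "echo mem > /sys/power/state" with kernfs freezing enabled:
 * the writer itself starts the freeze, so its own write is still in
 * progress while the freeze waits. Returns false where the real kernel
 * would block forever.
 */
static bool model_suspend_via_sysfs(struct model_kernfs_sb *sysfs_sb)
{
	bool completes;

	sysfs_sb->writes_in_progress++;	/* the write to /sys/power/state begins */
	completes = model_freeze_can_complete(sysfs_sb);
	sysfs_sb->writes_in_progress--;	/* in the real deadlock this is never reached */
	return completes;
}
```

An idle sysfs superblock could be frozen, but the model reports that a
freeze started from inside a write to /sys/power/state never completes.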

Signed-off-by: Christian Brauner <brauner@kernel.org>
Reviewed-by: Jan Kara <jack@suse.cz>
---
 fs/kernfs/mount.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index 1358c21837f1..d2073bb2b633 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -62,6 +62,21 @@ const struct super_operations kernfs_sops = {
 
 	.show_options	= kernfs_sop_show_options,
 	.show_path	= kernfs_sop_show_path,
+
+	/*
+	 * sysfs is built on top of kernfs and sysfs provides the power
+	 * management infrastructure to support suspend/hibernate by
+	 * writing to various files in /sys/power/. As filesystems may
+	 * be automatically frozen during suspend/hibernate, implementing
+	 * freeze/thaw support for kernfs generically would cause
+	 * deadlocks: the task initiating suspend/hibernation holds a VFS
+	 * lock that it would then wait on to be released. If freeze/thaw
+	 * support for kernfs is needed, talk to the VFS maintainers.
+	 */
+	.freeze_fs	= NULL,
+	.unfreeze_fs	= NULL,
+	.freeze_super	= NULL,
+	.thaw_super	= NULL,
 };
 
 static int kernfs_encode_fh(struct inode *inode, __u32 *fh, int *max_len,

-- 
2.47.2